IamAGP commented Sep 27, 2025

Summary

Adds a new synthetic text2cypher dataset generated using Claude 3.5 Haiku via AWS Bedrock, expanding the model coverage in this repository.

What's New

  • New model: First Claude 3.5 Haiku dataset in the collection
  • Budget-friendly: Configurable sample sizes with cost estimation
  • Enhanced validation: Improved false_schema detection (goes beyond what the existing Opus notebook implements)
  • Methodology: Follows established patterns from existing synthetic datasets
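
For reviewers unfamiliar with the false_schema flag, here is a minimal sketch of how such a check could work. It is not the notebook's actual implementation: the function names, the regular expressions, and the assumption that the schema is available as plain sets of node labels and relationship types are all hypothetical. The regexes are a heuristic and can misfire on Cypher inside string literals or on list slices.

```python
import re

# Heuristic patterns (assumptions, not taken from the notebook):
# node labels such as (m:Movie) or (:Person:Actor)
NODE_PATTERN = re.compile(r"\(\s*\w*((?:\s*:\s*[A-Za-z_]\w*)+)")
# relationship types such as [:ACTED_IN] or [r:ACTED_IN|DIRECTED]
REL_PATTERN = re.compile(
    r"\[\s*\w*\s*:\s*([A-Za-z_]\w*(?:\s*\|\s*:?\s*[A-Za-z_]\w*)*)"
)


def find_unknown_schema_elements(cypher, node_labels, rel_types):
    """Return labels and relationship types used in `cypher` but absent from the schema."""
    used_labels = set()
    for match in NODE_PATTERN.finditer(cypher):
        # ":Person:Actor" -> {"Person", "Actor"}
        used_labels.update(p.strip() for p in match.group(1).split(":") if p.strip())

    used_rels = set()
    for match in REL_PATTERN.finditer(cypher):
        # "ACTED_IN|:DIRECTED" -> {"ACTED_IN", "DIRECTED"}
        used_rels.update(p.strip(" :") for p in match.group(1).split("|"))

    return {
        "labels": used_labels - set(node_labels),
        "relationships": used_rels - set(rel_types),
    }


def has_false_schema(cypher, node_labels, rel_types):
    """True if the query references any label or relationship type not in the schema."""
    unknown = find_unknown_schema_elements(cypher, node_labels, rel_types)
    return bool(unknown["labels"] or unknown["relationships"])
```

A check like this only covers labels and relationship types; property names would need a separate pass against the schema.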

Files Added

  • datasets/synthetic_haiku35_demodbs/ - Complete dataset folder, containing:
      • README.md - Documentation following project conventions
      • anthropic_text2cypher_haiku35.ipynb - Generation notebook with AWS Bedrock integration
      • text2cypher_haiku35.csv - Sample dataset (10 questions for validation)
      • Supporting CSV files (text2cypher_questions.csv, text2cypher_schemas.csv)

Testing

  • ✅ Tested with 10 sample questions
  • ✅ All validation metrics working (syntax_error, timeout, returns_results, false_schema)
  • ✅ Successfully connects to Claude 3.5 Haiku on AWS Bedrock
  • ✅ Neo4j demo database validation working
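
As a rough illustration of how the execution-based flags above (syntax_error, timeout, returns_results) might be derived from a single query run, consider the sketch below. It assumes a hypothetical `run_query` callable that returns a list of records and raises on failure; `QuerySyntaxError` is a stand-in for whatever exception the database driver actually raises, and none of these names come from the notebook. The false_schema flag is left out because it needs schema information rather than an execution result.

```python
from dataclasses import dataclass
from typing import Callable, List


class QuerySyntaxError(Exception):
    """Hypothetical stand-in for the driver's Cypher syntax-error exception."""


@dataclass
class ValidationResult:
    syntax_error: bool = False
    timeout: bool = False
    returns_results: bool = False


def validate_query(cypher: str, run_query: Callable[[str], List[dict]]) -> ValidationResult:
    """Run `cypher` through `run_query` and record which validation flags apply."""
    result = ValidationResult()
    try:
        records = run_query(cypher)
    except QuerySyntaxError:
        result.syntax_error = True
    except TimeoutError:
        result.timeout = True
    else:
        # The query executed; flag whether it produced at least one record.
        result.returns_results = len(records) > 0
    return result
```

Passing the runner in as a callable keeps the classification logic testable without a live Neo4j instance; in practice `run_query` would wrap a driver session against the demo database.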

Value Added

  • Provides a cost-effective alternative to larger models
  • Enables comparison studies between different model sizes
  • Maintains research reproducibility with clear methodology
  • Ready for larger scale generation when needed

- First Haiku 3.5 dataset in the collection
- Follows established methodology from other synthetic datasets
- Includes proper false_schema detection (improvement over the Opus notebook)
- Budget-friendly approach with configurable sample sizes
- Generated and validated 10 sample questions as proof of concept
- Ready for larger scale generation
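
To make the budget-friendly claim concrete, a cost estimate for a given sample size could be computed along the following lines. This is a sketch only: the function name and all parameters are hypothetical, and the per-1,000-token prices must be supplied by the caller from current AWS Bedrock pricing, since no prices are stated in this PR.

```python
def estimate_cost(num_samples, avg_input_tokens, avg_output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the total cost of generating `num_samples` queries.

    Prices are per 1,000 tokens and are caller-supplied placeholders,
    not actual Bedrock rates.
    """
    input_cost = num_samples * avg_input_tokens / 1000 * input_price_per_1k
    output_cost = num_samples * avg_output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost
```

Scaling from the 10-question proof of concept to a larger run then only requires changing `num_samples` and checking the estimate before starting generation.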

Contributes to issue: Expanding model coverage for text2cypher research