Examples for evaluating generative AI use cases on Amazon Bedrock and Amazon SageMaker.
- Examples showing how ROUGE scores are computed over text
- Examples showing how BERTScore is computed over text
- Guidance on which use cases fit each metric
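To make the metric concrete, here is a minimal, simplified sketch of how a ROUGE-1 F1 score can be computed: unigram overlap between a candidate and a reference, with plain whitespace tokenization and no stemming. Real evaluations would use a full implementation (for example the `rouge-score` package), which also handles ROUGE-2, ROUGE-L, and stemming.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference.

    Tokenization is plain lowercase whitespace splitting; no stemming is applied,
    unlike full ROUGE implementations.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the candidate `"the cat sat"` against the reference `"the cat sat on the mat"` gives perfect precision but only 0.5 recall, so the F1 is about 0.67.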
- Implements the RAGAS framework for baseline testing of Amazon Bedrock Knowledge Bases
- Measures retrieval accuracy and relevance
- Evaluates context precision and faithfulness
- Uses RAGAS to find optimal query-time parameters for knowledge bases:
  - Number of retrieved results
  - Choice of generating model
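The parameter search above can be sketched as a simple grid sweep. This is a hypothetical helper, not code from the repo: `score_fn` stands in for whatever runs one RAGAS evaluation against a knowledge base configured with a given number of retrieved results (e.g. Bedrock's `numberOfResults` retrieval setting) and a given generating model, returning one aggregate score where higher is better.

```python
from itertools import product
from typing import Callable, Dict, Iterable

def sweep_kb_parameters(
    num_results_options: Iterable[int],
    model_ids: Iterable[str],
    score_fn: Callable[[int, str], float],
) -> Dict:
    """Evaluate every (number-of-results, model) pair and return the best one.

    score_fn is a placeholder for a function that runs a RAGAS evaluation for
    one configuration and returns a single aggregate score (higher is better).
    """
    best = {"score": float("-inf")}
    for n, model_id in product(num_results_options, model_ids):
        score = score_fn(n, model_id)
        if score > best["score"]:
            best = {"numberOfResults": n, "modelId": model_id, "score": score}
    return best
```

Exhaustive search is reasonable here because the grid is small (a handful of retrieval sizes times a handful of candidate models); each cell is one evaluation run.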
- Integrates with Amazon Bedrock Guardrails
- Implements RAGAS safety metrics
- Measure guardrail accuracy by analyzing tradeoffs between over-filtering (false positives) and under-filtering (false negatives).
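The over-/under-filtering tradeoff can be summarized with a small confusion-matrix helper. This is an illustrative sketch (not the repo's implementation): it compares guardrail verdicts against ground-truth labels, where `True` means the content should be blocked.

```python
from typing import Dict, Sequence

def guardrail_accuracy(decisions: Sequence[bool], labels: Sequence[bool]) -> Dict:
    """Compare guardrail verdicts (True = blocked) with ground truth (True = harmful).

    False positives are over-filtering (safe content blocked); false negatives
    are under-filtering (harmful content let through).
    """
    tp = sum(d and l for d, l in zip(decisions, labels))
    fp = sum(d and not l for d, l in zip(decisions, labels))      # over-filtering
    fn = sum((not d) and l for d, l in zip(decisions, labels))    # under-filtering
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall,
            "false_positives": fp, "false_negatives": fn}
```

Tuning a guardrail then becomes a question of which error is costlier: raising precision reduces over-filtering, while raising recall reduces under-filtering.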
- Evaluates models on meeting summarization tasks using the MeetingBank dataset
- Support for both Amazon Bedrock and external models (Google Gemini)
- Pre-generation of model responses for evaluation
- Integration with Amazon Bedrock's evaluation capabilities
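Pre-generating responses means calling each model once per prompt and persisting the outputs, so evaluation can be rerun without repeating (and paying for) inference. A minimal sketch, assuming a JSONL output format and an `invoke_fn` placeholder for the actual model call (e.g. a wrapper around Bedrock's Converse API or a Gemini client):

```python
import json
from typing import Callable, Sequence

def pregenerate_responses(prompts: Sequence[str],
                          invoke_fn: Callable[[str], str],
                          out_path: str) -> None:
    """Run each prompt through the model once and persist prompt/response pairs
    as JSONL, so the evaluation step can reuse them without re-invoking the model.

    invoke_fn is a placeholder for a real model call (Bedrock, Gemini, etc.).
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            record = {"prompt": prompt, "response": invoke_fn(prompt)}
            f.write(json.dumps(record) + "\n")
```

Decoupling generation from scoring this way also lets the same cached responses be fed to several evaluators (ROUGE, BERTScore, LLM-as-judge) without drift between runs.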
Contributions are welcome: open an Issue or a Pull Request.
This project is licensed under the terms described in the LICENSE file in the repository.