-
Notifications
You must be signed in to change notification settings - Fork 238
Knowledge beta improvements #2545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… (TI-3, TI-5) - Add pattern filtering foundation with --include/--exclude CLI flags (TI-3) - Implement knowledge settings integration with database system (TI-5) - Add comprehensive pattern filter module with glob-style support - Enhance file processor with async pattern filtering capabilities - Add extensive test coverage for pattern filtering functionality - Update knowledge CLI with improved error handling and validation - Add settings support for chunk size, overlap, and file limits
…filtering - Update API examples to use new AddContextRequest structure - Add comprehensive pattern filtering documentation with examples - Replace sync API references with async equivalents - Add new features: pattern filtering and database settings integration - Remove outdated sync method references and examples
- Replace non-existent Os::test_default() with Os::new().await.unwrap() - Update test functions to be async and await create_test_os() - Fix test assertions to match actual default config values: - chunk_size: 512 (not 1000) - chunk_overlap: 128 (not 200) - max_files: 10000 (not 5000) - Remove unused mut from Os creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Remove unused self parameter from PatternFilter::matches_pattern method - Update documentation to correctly describe *.rs pattern as recursive - Fix Rust compiler warning about unused self argument
…dge add - Add pattern validation in add_context method to fail fast on invalid patterns - Modify SemanticSearchError::InvalidArgument display to avoid redundant prefixes for pattern errors - Simplify error message chain to show clean pattern validation errors - Add test for pattern validation functionality Fixes verbose error messages like: 'Error: Failed to add: Failed to start indexing: Invalid argument: Invalid include pattern...' Now shows clean: 'Error: Invalid include pattern...' Other InvalidArgument errors still get appropriate prefixes.
Replace |e| SemanticSearchError::InvalidArgument(e) with SemanticSearchError::InvalidArgument
Replace escaped backslashes with backticks to show **/*.ts correctly instead of displaying literal backslashes in help output.
|
||
#[test] | ||
fn test_absolute_vs_relative_path_handling() { | ||
// Test the fix for Brandon's issue: absolute paths from walkdir should work with relative patterns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't include my name here haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol
assert!(result.is_ok()); | ||
let cli = result.unwrap(); | ||
|
||
if let KnowledgeSubcommand::Cancel { operation_id } = cli.knowledge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Without operation_id - should cancel most recent operation
this test isn't checking any functionality, all it's doing is checking that operation_id is set. Same format as any clap parse test, so can we remove some of the noise here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
|
||
#[test] | ||
fn test_help_text_markdown_rendering() { | ||
// Test to reproduce Brandon's issue: "clap is trying to parse these `*` as markdown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove my name :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing
println!("Add subcommand help output:\n{}", help_output); | ||
|
||
// Check that our fix is working - patterns should be properly quoted | ||
if help_output.contains("\"**/*.ts\"") && help_output.contains("\"**/*.md\"") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need a test
// Validate patterns early to fail fast | ||
if let Some(ref include_patterns) = request.include_patterns { | ||
crate::pattern_filter::PatternFilter::new(include_patterns, &[]) | ||
.map_err(SemanticSearchError::InvalidArgument)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error should have context about what argument is invalid, right?
@@ -40,7 +40,7 @@ You can now control which files are indexed using include and exclude patterns: | |||
`/knowledge add "docs" /path/to/project --include "**/*.md" --include "**/*.txt" --exclude "node_modules/**"` | |||
|
|||
Pattern examples: | |||
- `*.rs` - All Rust files in current directory | |||
- `*.rs` - All Rust files in all directories recursively |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird docs here - if both *.rs
and **/*.rs
have the same behavior, then shouldn't we just have one line that says something like:
- `*.rs` - All Rust files in all directories recursively (equivalent to `**/*.rs`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix
|
||
#[test] | ||
fn test_rust_file_detection() { | ||
assert_eq!(get_file_type(&PathBuf::from("main.rs")), FileType::Code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
way too much noise in these tests, can we cut this down? Just take a slice of str's to expected file types. It just needs a single test
assert!(filter.should_include(&PathBuf::from("main.rs"))); // Current dir - should match | ||
|
||
// These should NOT match with *.rs (only * not **) | ||
// If they do match, then the documentation is wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertion here?
Also what is the actual behavior of *.rs
then? Should it match files recursively in subdirectories or not?
use super::*; | ||
|
||
#[test] | ||
fn test_pattern_filter_creation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to reduce noise in these tests as well?
Pattern Filtering System (TI-3)
• CLI Integration: Added --include and --exclude flags to /knowledge add command
• Pattern Filter Module: New pattern_filter.rs with comprehensive glob-style pattern matching
• Async File Processing: Enhanced file_processor.rs with async pattern filtering capabilities
Minor fixes (TI-2)
• Rename from contexts from knowledge bases: Changed the semantic on how we refer to the knowledge entries.
• "Are you sure?" is Microsft-esque and very un-Amazonian.: Changed confirmation to a more amaz
• *Increased amount of extensions supported: Added more text extensions to list of supported files.
Move to right path for knowledge (TI-1)
• Automatic migration and new path for knowledge bases: Now they will live in ./aws/amazonq/knowledge_bases
Database Settings Integration (TI-5)
• Settings Schema: Added new settings for KnowledgeChunkSize, KnowledgeChunkOverlap, KnowledgeMaxFiles, KnowledgeDefaultIncludePatterns, KnowledgeDefaultExcludePatterns
• Helper Method: Added get_int_or() utility method for cleaner database settings access
• Configuration Integration: Knowledge store now reads chunk size, overlap, and file limits from database settings
🔧 API Improvements
Clean POJO-Based API
• New Request Structure: Replaced messy method overloads with AddContextRequest POJO
• Removed Deprecated Methods: Eliminated add_context_from_path() and add_context_from_path_with_patterns()
• Single Clean Method: New add_context(AddContextRequest) method handles all scenarios
• Better Naming: Renamed IndexingParams to AddContextRequest for external-facing API
Code Organization
• Extracted Migration Logic: Moved legacy knowledge base migration code to separate method
• Removed Verbose Patterns: Eliminated repetitive get_int().map_or() database access patterns
• Cleaner Error Handling: Improved error messages and validation throughout
🎨 User Experience Improvements
CLI Enhancements
• Better Styling: Changed add operation from green OperationResult::Success to neutral OperationResult::Info
• Improved Messages: Enhanced feedback messages with pattern filtering information
• Consistent Behavior: Both add and update operations now use consistent styling
Pattern Filtering UX
• Flexible Options: Support for include-only, exclude-only, or combined filtering
• Clear Feedback: Operations show which patterns are applied
• Database Defaults: Automatic loading of default patterns from database settings
API Consistency
• Method Signatures: Standardized all context operations to use structured request objects
• Error Handling: Consistent error types and messages across all operations
• Type Safety: Proper type definitions for all public API structures
📚 Documentation Updates
README Overhaul
• Async API Examples: Updated all code examples to use new async AddContextRequest API
• Pattern Filtering Guide: Comprehensive documentation with glob syntax and examples
• Feature Updates: Added pattern filtering and database settings to features list
• Removed Outdated Info: Eliminated references to deprecated sync API methods