Skip to content

Knowledge beta improvements #2545

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

kensave
Copy link
Contributor

@kensave kensave commented Aug 11, 2025

Pattern Filtering System (TI-3)

CLI Integration: Added --include and --exclude flags to /knowledge add command
Pattern Filter Module: New pattern_filter.rs with comprehensive glob-style pattern matching
Async File Processing: Enhanced file_processor.rs with async pattern filtering capabilities

Minor fixes (TI-2)

Rename from contexts from knowledge bases: Changed the semantic on how we refer to the knowledge entries.
"Are you sure?" is Microsft-esque and very un-Amazonian.: Changed confirmation to a more amaz
• *Increased amount of extensions supported: Added more text extensions to list of supported files.

Move to right path for knowledge (TI-1)

Automatic migration and new path for knowledge bases: Now they will live in ./aws/amazonq/knowledge_bases

Database Settings Integration (TI-5)

Settings Schema: Added new settings for KnowledgeChunkSize, KnowledgeChunkOverlap, KnowledgeMaxFiles, KnowledgeDefaultIncludePatterns, KnowledgeDefaultExcludePatterns
Helper Method: Added get_int_or() utility method for cleaner database settings access
Configuration Integration: Knowledge store now reads chunk size, overlap, and file limits from database settings

🔧 API Improvements

Clean POJO-Based API

New Request Structure: Replaced messy method overloads with AddContextRequest POJO
Removed Deprecated Methods: Eliminated add_context_from_path() and add_context_from_path_with_patterns()
Single Clean Method: New add_context(AddContextRequest) method handles all scenarios
Better Naming: Renamed IndexingParams to AddContextRequest for external-facing API

Code Organization

Extracted Migration Logic: Moved legacy knowledge base migration code to separate method
Removed Verbose Patterns: Eliminated repetitive get_int().map_or() database access patterns
Cleaner Error Handling: Improved error messages and validation throughout

🎨 User Experience Improvements

CLI Enhancements

Better Styling: Changed add operation from green OperationResult::Success to neutral OperationResult::Info
Improved Messages: Enhanced feedback messages with pattern filtering information
Consistent Behavior: Both add and update operations now use consistent styling

Pattern Filtering UX

Flexible Options: Support for include-only, exclude-only, or combined filtering
Clear Feedback: Operations show which patterns are applied
Database Defaults: Automatic loading of default patterns from database settings

API Consistency

Method Signatures: Standardized all context operations to use structured request objects
Error Handling: Consistent error types and messages across all operations
Type Safety: Proper type definitions for all public API structures

📚 Documentation Updates

README Overhaul

Async API Examples: Updated all code examples to use new async AddContextRequest API
Pattern Filtering Guide: Comprehensive documentation with glob syntax and examples
Feature Updates: Added pattern filtering and database settings to features list
Removed Outdated Info: Eliminated references to deprecated sync API methods

… (TI-3, TI-5)

- Add pattern filtering foundation with --include/--exclude CLI flags (TI-3)
- Implement knowledge settings integration with database system (TI-5)
- Add comprehensive pattern filter module with glob-style support
- Enhance file processor with async pattern filtering capabilities
- Add extensive test coverage for pattern filtering functionality
- Update knowledge CLI with improved error handling and validation
- Add settings support for chunk size, overlap, and file limits
…filtering

- Update API examples to use new AddContextRequest structure
- Add comprehensive pattern filtering documentation with examples
- Replace sync API references with async equivalents
- Add new features: pattern filtering and database settings integration
- Remove outdated sync method references and examples
- Replace non-existent Os::test_default() with Os::new().await.unwrap()
- Update test functions to be async and await create_test_os()
- Fix test assertions to match actual default config values:
  - chunk_size: 512 (not 1000)
  - chunk_overlap: 128 (not 200)
  - max_files: 10000 (not 5000)
- Remove unused mut from Os creation
Copy link
Contributor

@brandonskiser brandonskiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/knowledge cancel actually cancels all ongoing operations, not just the most recent
Screenshot 2025-08-11 at 3 34 21 PM

An error using --include syntax seems to stall the operation
Screenshot 2025-08-11 at 4 01 58 PM

- Remove unused self parameter from PatternFilter::matches_pattern method
- Update documentation to correctly describe *.rs pattern as recursive
- Fix Rust compiler warning about unused self argument
…dge add

- Add pattern validation in add_context method to fail fast on invalid patterns
- Modify SemanticSearchError::InvalidArgument display to avoid redundant prefixes for pattern errors
- Simplify error message chain to show clean pattern validation errors
- Add test for pattern validation functionality

Fixes verbose error messages like:
'Error: Failed to add: Failed to start indexing: Invalid argument: Invalid include pattern...'

Now shows clean:
'Error: Invalid include pattern...'

Other InvalidArgument errors still get appropriate prefixes.
Replace |e| SemanticSearchError::InvalidArgument(e) with SemanticSearchError::InvalidArgument
Replace escaped backslashes with backticks to show **/*.ts correctly
instead of displaying literal backslashes in help output.

#[test]
fn test_absolute_vs_relative_path_handling() {
// Test the fix for Brandon's issue: absolute paths from walkdir should work with relative patterns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't include my name here haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol

assert!(result.is_ok());
let cli = result.unwrap();

if let KnowledgeSubcommand::Cancel { operation_id } = cli.knowledge {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Without operation_id - should cancel most recent operation

this test isn't checking any functionality, all it's doing is checking that operation_id is set. Same format as any clap parse test, so can we remove some of the noise here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense


#[test]
fn test_help_text_markdown_rendering() {
// Test to reproduce Brandon's issue: "clap is trying to parse these `*` as markdown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove my name :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing

println!("Add subcommand help output:\n{}", help_output);

// Check that our fix is working - patterns should be properly quoted
if help_output.contains("\"**/*.ts\"") && help_output.contains("\"**/*.md\"") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need a test

// Validate patterns early to fail fast
if let Some(ref include_patterns) = request.include_patterns {
crate::pattern_filter::PatternFilter::new(include_patterns, &[])
.map_err(SemanticSearchError::InvalidArgument)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should have context about what argument is invalid, right?

@@ -40,7 +40,7 @@ You can now control which files are indexed using include and exclude patterns:
`/knowledge add "docs" /path/to/project --include "**/*.md" --include "**/*.txt" --exclude "node_modules/**"`

Pattern examples:
- `*.rs` - All Rust files in current directory
- `*.rs` - All Rust files in all directories recursively
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird docs here - if both *.rs and **/*.rs have the same behavior, then shouldn't we just have one line that says something like:

- `*.rs` - All Rust files in all directories recursively (equivalent to `**/*.rs`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix


#[test]
fn test_rust_file_detection() {
assert_eq!(get_file_type(&PathBuf::from("main.rs")), FileType::Code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

way too much noise in these tests, can we cut this down? Just take a slice of str's to expected file types. It just needs a single test

assert!(filter.should_include(&PathBuf::from("main.rs"))); // Current dir - should match

// These should NOT match with *.rs (only * not **)
// If they do match, then the documentation is wrong
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertion here?

Also what is the actual behavior of *.rs then? Should it match files recursively in subdirectories or not?

use super::*;

#[test]
fn test_pattern_filter_creation() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible to reduce noise in these tests as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants