Commit 800a555

add mo-diag tool in the python sdk (#22629)
add mo-diag tool in the python sdk

Approved by: @LeftHandCold
1 parent 3a1ee5d commit 800a555

18 files changed: +4463 -42 lines changed

clients/python/MANIFEST.in

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ include LICENSE
77
# Include requirements
88
include requirements.txt
99

10+
# Include mo_diag script
11+
include mo_diag.py
12+
1013
# Include examples (recommended for user experience)
1114
recursive-include examples *.py
1215

clients/python/README.md

Lines changed: 344 additions & 4 deletions
@@ -45,6 +45,13 @@ A comprehensive Python SDK for MatrixOne that provides SQLAlchemy-like interface
4545
- 📚 **SQLAlchemy Integration**: Seamless SQLAlchemy integration with enhanced ORM features
4646
- 🔗 **Enhanced Query Building**: Advanced query building with logical operations (logical_and, logical_or, logical_not)
4747
- 🪝 **Connection Hooks**: Pre/post connection hooks for custom initialization logic
48+
- 🛠️ **mo-diag CLI Tool**: Interactive diagnostic tool for MatrixOne database maintenance
49+
- Index health monitoring and verification
50+
- IVF/HNSW vector index status inspection
51+
- Table statistics and metadata analysis
52+
- Interactive shell with Tab completion and command history
53+
- Non-interactive mode for scripting and automation
54+
- Batch operations on tables and indexes
4855

4956
## 🚀 Installation
5057

@@ -379,12 +386,336 @@ else:
379386
print(f"Running release version: {client.get_backend_version()}")
380387
```
381388

382-
## Advanced Features
389+
## 🛠️ mo-diag - Interactive Diagnostic Tool
383390

384-
### PITR (Point-in-Time Recovery)
391+
The `mo-diag` command-line tool provides an interactive shell for diagnosing and maintaining MatrixOne databases, with a special focus on vector indexes, secondary indexes, and table statistics.
392+
393+
### Installation
394+
395+
After installing the SDK, `mo-diag` is automatically available as a command:
396+
397+
```bash
398+
pip install matrixone-python-sdk
399+
mo-diag --help
400+
```
401+
402+
### Quick Start
403+
404+
#### Interactive Mode
405+
406+
Launch the interactive shell to execute multiple diagnostic commands:
407+
408+
```bash
409+
# Connect to the default host and port (localhost:6001)
410+
mo-diag --database test
411+
412+
# Connect to remote database
413+
mo-diag --host 192.168.1.100 --port 6001 --user admin --password secret --database production
414+
```
415+
416+
**Interactive Features**:
417+
- 🔍 **Tab Completion**: Press `Tab` to auto-complete commands, table names, and database names
418+
- ⬆️⬇️ **Command History**: Use arrow keys to browse command history
419+
- 🔎 **History Search**: Press `Ctrl+R` to search command history
420+
- 🎨 **Colored Output**: Clear visual feedback with syntax highlighting
421+
- 💾 **Persistent History**: Command history saved to `~/.mo_diag_history`
422+
423+
#### Non-Interactive Mode
424+
425+
Execute single commands directly for scripting and automation:
426+
427+
```bash
428+
# Check IVF index status
429+
mo-diag -d test -c "show_ivf_status"
430+
431+
# Get detailed table statistics
432+
mo-diag -d test -c "show_table_stats my_table -a -d"
433+
434+
# Execute SQL query
435+
mo-diag -d test -c "sql SELECT COUNT(*) FROM my_table"
436+
437+
# Flush table and all indexes
438+
mo-diag -d test -c "flush_table my_table"
439+
```
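
When the non-interactive mode is driven from Python rather than a shell script, the same invocations can be assembled as an argument list. The following is a minimal sketch, not part of the SDK: the helper names `build_mo_diag_argv` and `run_mo_diag` are hypothetical, only the flags listed under Command-Line Options are used, and it assumes `mo-diag` exits with a non-zero status on failure, which this README does not state.

```python
import subprocess
from typing import List, Optional


def build_mo_diag_argv(
    command: str,
    database: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one non-interactive mo-diag call.

    Hypothetical helper: options left as None are omitted, so the
    tool's own defaults apply.
    """
    argv = ["mo-diag"]
    if host is not None:
        argv += ["--host", host]
    if port is not None:
        argv += ["--port", str(port)]
    if user is not None:
        argv += ["--user", user]
    if password is not None:
        argv += ["--password", password]
    if database is not None:
        argv += ["-d", database]
    # The whole diagnostic command is passed as a single argument to -c.
    argv += ["-c", command]
    return argv


def run_mo_diag(command: str, **options) -> str:
    """Run one mo-diag command and return its standard output as text."""
    result = subprocess.run(
        build_mo_diag_argv(command, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: a non-zero exit status signals failure.
    if result.returncode != 0:
        raise RuntimeError(f"mo-diag failed: {result.stderr.strip()}")
    return result.stdout
```

For example, `run_mo_diag("show_ivf_status", database="test")` would run the first command shown above. Passing the arguments as a list avoids shell quoting problems when a table name or SQL statement contains spaces.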
440+
441+
### Available Commands
442+
443+
#### Index Management
444+
445+
**`show_indexes <table> [database]`**
446+
- Display all indexes for a table including IVF, HNSW, Fulltext, and regular indexes
447+
- Shows physical table names, index types, and statistics
448+
- Includes object counts, row counts, and sizes for vector/fulltext indexes
449+
450+
```
451+
MO-DIAG[test]> show_indexes ivf_health_demo_docs
452+
453+
📊 Secondary Indexes for 'test.ivf_health_demo_docs'
454+
455+
*************************** 1. row ***************************
456+
Index Name: idx_embedding_ivf_v2
457+
Algorithm: ivfflat
458+
Table Type: metadata
459+
Physical Table: __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
460+
Columns: embedding
461+
Statistics:
462+
- Objects: 1
463+
- Rows: 7
464+
- Compressed Size: 940 B
465+
- Original Size: 1.98 KB
466+
467+
*************************** 2. row ***************************
468+
Index Name: idx_embedding_ivf_v2
469+
Algorithm: ivfflat
470+
Table Type: centroids
471+
Physical Table: __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1
472+
Columns: embedding
473+
Statistics:
474+
- Objects: 1
475+
- Rows: 17
476+
- Compressed Size: 3.09 KB
477+
- Original Size: 6.83 KB
478+
```
479+
480+
**`show_all_indexes [database]`**
481+
- Health report for all tables with secondary indexes in the database
482+
- Row count consistency checks
483+
- IVF/HNSW/Fulltext index status
484+
485+
```
486+
MO-DIAG[test]> show_all_indexes
487+
488+
📊 Index Health Report for Database 'test':
489+
═══════════════════════════════════════════════════════════════════════════
490+
491+
🟢 HEALTHY TABLES (1):
492+
─────────────────────────────────────────────────────────────────────────
493+
Table Name | Indexes | Row Count | Notes
494+
──────────────────────────────────────────────────────────────────────────
495+
ivf_health_demo_docs | 1 | ✓ 1000 rows | IVF: 17 centroids, 1000 vectors
496+
```
497+
498+
**`verify_counts <table> [database]`**
499+
- Verify row count consistency between main table and all its secondary indexes
500+
- Highlights any mismatches
501+
502+
```
503+
MO-DIAG[test]> verify_counts my_table
504+
505+
📊 Row Count Verification for 'test.my_table'
506+
════════════════════════════════════════════════════════════════════════════
507+
Main table: 10,000 rows
508+
────────────────────────────────────────────────────────────────────────────
509+
✓ __mo_index_secondary_xxx: 10,000 rows
510+
✓ __mo_index_unique_yyy: 10,000 rows
511+
════════════════════════════════════════════════════════════════════════════
512+
✅ PASSED: All index tables match (10,000 rows)
513+
```
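
For automated checks, the report text can be scanned for index tables whose row count differs from the main table. This is a rough sketch that assumes the output keeps the layout of the sample above (a `Main table: N rows` line followed by one `__mo_index_...: N rows` line per index table); the function name `find_count_mismatches` is hypothetical, and the real output format may differ between versions.

```python
import re
from typing import List, Tuple

# Patterns modelled on the sample verify_counts output shown above.
MAIN_RE = re.compile(r"Main table:\s*([\d,]+)\s+rows")
INDEX_RE = re.compile(r"(__mo_index_\S+):\s*([\d,]+)\s+rows")


def find_count_mismatches(report: str) -> List[Tuple[str, int, int]]:
    """Return (index_table, index_rows, main_rows) for every index table
    whose row count differs from the main table's row count."""
    main_match = MAIN_RE.search(report)
    if main_match is None:
        raise ValueError("no 'Main table' line found in report")
    main_rows = int(main_match.group(1).replace(",", ""))

    mismatches = []
    for name, rows in INDEX_RE.findall(report):
        count = int(rows.replace(",", ""))
        if count != main_rows:
            mismatches.append((name, count, main_rows))
    return mismatches
```

An empty list means every index table listed in the report matched the main table.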
514+
515+
#### Vector Index Monitoring
516+
517+
**`show_ivf_status [database] [-v] [-t table]`**
518+
- Display IVF index building status and centroid distribution
519+
- `-v`: Verbose mode with detailed centroid information
520+
- `-t <table>`: Filter by specific table
521+
522+
```
523+
MO-DIAG[test]> show_ivf_status
524+
525+
📊 IVF Index Status in 'test':
526+
════════════════════════════════════════════════════════════════════════════
527+
Table | Index | Column | Centroids | Vectors | Balance | Status
528+
────────────────────────────────────────────────────────────────────────────
529+
ivf_health_demo_docs | idx_embedding_ivf_v2 | embedding | 17 | 1,000 | 2.35 | ✓ active
530+
```
531+
532+
#### Table Statistics
533+
534+
**`show_table_stats <table> [database] [-t] [-a] [-d]`**
535+
- Display table metadata and statistics
536+
- `-t`: Include tombstone statistics
537+
- `-a`: Include all indexes (shown as a hierarchical view when combined with `-d`)
538+
- `-d`: Show detailed object lists
539+
540+
```
541+
MO-DIAG[test]> show_table_stats ivf_health_demo_docs -a -d
542+
543+
Table: ivf_health_demo_docs
544+
Objects: 1 | Rows: 1,000 | Null: 0 | Original: 176.03 KB | Compressed: 156.24 KB
545+
546+
Objects:
547+
Object Name | Rows | Null Cnt | Original Size | Compressed Size
548+
─────────────────────────────────────────────────────────────────────────────────────────────────────
549+
0199e729-642e-71e0-b338-67c4980ee294_00000 | 1000 | 0 | 176.03 KB | 156.24 KB
550+
551+
Index: idx_embedding_ivf_v2
552+
└─ Physical Table (metadata): __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
553+
Objects: 1 | Rows: 7 | Null: 0 | Original: 1.98 KB | Compressed: 940 B
554+
555+
Objects:
556+
Object Name | Rows | Null Cnt | Original Size | Compressed Size
557+
─────────────────────────────────────────────────────────────────────────────────────────────────────
558+
0199e729-642a-7d36-ac37-0ae17325f7ec_00000 | 7 | 0 | 1.98 KB | 940 B
559+
```
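
The sizes in this output are human-readable strings, so comparing them in a script means converting them back to bytes first. The sketch below is one way to do that; `parse_size` and `compression_ratio` are hypothetical helpers, and the sketch assumes that `KB` means 1024 bytes, which this README does not state.

```python
import re

# Assumption: binary multiples (1 KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE_RE = re.compile(r"^\s*([\d.,]+)\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as '1.98 KB' or '940 B' to whole bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return round(number * _UNITS[match.group(2).upper()])


def compression_ratio(original: str, compressed: str) -> float:
    """Return original size divided by compressed size."""
    return parse_size(original) / parse_size(compressed)
```

Because the displayed values are already rounded to two decimals, the byte counts recovered this way are approximate. With the table-level figures from the sample (`176.03 KB` original, `156.24 KB` compressed), `compression_ratio` gives roughly 1.13.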
560+
561+
#### Database Operations
562+
563+
**`flush_table <table> [database]`**
564+
- Flush main table and all its secondary index physical tables
565+
- Includes IVF metadata/centroids/entries, HNSW, Fulltext, and regular indexes
566+
- Requires sys user privileges
567+
568+
```
569+
MO-DIAG[test]> flush_table ivf_health_demo_docs
570+
571+
🔄 Flushing table: test.ivf_health_demo_docs
572+
✓ Main table flushed: ivf_health_demo_docs
573+
📋 Found 3 index physical tables
574+
575+
Index: idx_embedding_ivf_v2
576+
✓ metadata: __mo_index_secondary_xxx
577+
✓ centroids: __mo_index_secondary_yyy
578+
✓ entries: __mo_index_secondary_zzz
579+
580+
📊 Summary:
581+
Main table: ✓ flushed
582+
Index tables: 3/3 flushed successfully
583+
```
584+
585+
**`tables [database]`**
586+
- List all tables in current or specified database
587+
588+
**`databases`**
589+
- List all databases (highlights current database)
590+
591+
**`use <database>`**
592+
- Switch to a different database
593+
594+
**`sql <SQL statement>`**
595+
- Execute arbitrary SQL query
596+
597+
```
598+
MO-DIAG[test]> sql SELECT COUNT(*) FROM my_table
599+
600+
col0
601+
----
602+
10000
603+
604+
1 row(s) returned
605+
```
606+
607+
#### Utility Commands
608+
609+
**`history [n | -c]`**
610+
- Show last n commands (default: 20)
611+
- `-c`: Clear command history
612+
613+
**`help [command]`**
614+
- Show help for all commands or specific command
615+
616+
**`exit` / `quit`**
617+
- Exit the interactive shell
618+
619+
### Command-Line Options
620+
621+
```
622+
usage: mo-diag [-h] [--host HOST] [--port PORT] [--user USER]
623+
[--password PASSWORD] [--database DATABASE]
624+
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
625+
[--command COMMAND]
626+
627+
options:
628+
--host HOST Database host (default: localhost)
629+
--port PORT Database port (default: 6001)
630+
--user USER Database user (default: root)
631+
--password PASSWORD Database password (default: 111)
632+
--database DATABASE Database name (optional)
633+
-d DATABASE Short form of --database
634+
--log-level LEVEL Logging level (default: ERROR)
635+
--command COMMAND Execute single command and exit
636+
-c COMMAND Short form of --command
637+
```
638+
639+
### Use Cases
640+
641+
#### 1. Monitor IVF Index Health in Production
642+
643+
```bash
644+
# Quick check on index status
645+
mo-diag -d production -c "show_ivf_status -v"
646+
647+
# Detailed index inspection with physical table stats
648+
mo-diag -d production -c "show_indexes my_vector_table"
649+
650+
# Verify centroid distribution balance
651+
mo-diag -d production -c "show_ivf_status -t my_vector_table -v"
652+
```
653+
654+
#### 2. Debug Index Count Mismatches
655+
656+
```bash
657+
# Check row consistency
658+
mo-diag -d test -c "verify_counts my_table"
659+
660+
# Get health report for all indexes
661+
mo-diag -d test -c "show_all_indexes"
662+
663+
# Flush to sync if needed
664+
mo-diag -d test -c "flush_table my_table"
665+
```
666+
667+
#### 3. Analyze Table Storage
668+
669+
```bash
670+
# Get table statistics with all indexes
671+
mo-diag -d test -c "show_table_stats my_table -a"
672+
673+
# Detailed object-level analysis
674+
mo-diag -d test -c "show_table_stats my_table -a -d"
675+
676+
# Include tombstone statistics
677+
mo-diag -d test -c "show_table_stats my_table -a -t -d"
678+
```
679+
680+
#### 4. Automated Maintenance Scripts
681+
682+
```bash
683+
#!/bin/bash
684+
# daily_index_check.sh
685+
686+
DATABASES=("prod_db1" "prod_db2" "prod_db3")
687+
688+
for db in "${DATABASES[@]}"; do
689+
echo "Checking $db..."
690+
mo-diag -d "$db" -c "show_all_indexes" >> /var/log/mo_diag_daily.log
691+
mo-diag -d "$db" -c "show_ivf_status" >> /var/log/mo_diag_daily.log
692+
done
693+
```
694+
695+
### Tips and Best Practices
696+
697+
1. **Regular Health Checks**: Run `show_all_indexes` daily to catch index issues early
698+
2. **Monitor IVF Balance**: Use `show_ivf_status -v` to ensure even centroid distribution
699+
3. **Before Major Operations**: Always run `verify_counts` before bulk updates or migrations
700+
4. **Production Debugging**: Use non-interactive mode for logging and monitoring
701+
5. **Tab Completion**: Leverage Tab completion to avoid typos in table/database names
702+
6. **Command History**: Use `Ctrl+R` to quickly find and re-execute previous diagnostic commands
703+
7. **Flush When Needed**: If you notice index count mismatches, running `flush_table` can help bring the index tables back in sync
704+
705+
### Troubleshooting
706+
707+
**Issue**: Tab completion not working
708+
- **Solution**: Ensure `prompt_toolkit>=3.0.0` is installed: `pip install "prompt_toolkit>=3.0.0"`
709+
710+
**Issue**: "Unknown database" error
711+
- **Solution**: Create the database first, connecting without `-d` because the target database does not exist yet: `mo-diag -c "sql CREATE DATABASE IF NOT EXISTS test"`
712+
713+
**Issue**: Permission denied for flush operations
714+
- **Solution**: Connect as sys user or a user with sufficient privileges
715+
716+
**Issue**: IVF index shows 0 centroids
717+
- **Solution**: The index might be empty or still building. Check with `show_table_stats <table> -a`
385718

386-
```python
387-
# Create PITR for cluster
388719
pitr = client.pitr.create_cluster_pitr(
389720
name='cluster_pitr',
390721
range_value=7,
@@ -438,6 +769,15 @@ subscription = client.pubsub.create_subscription(
438769
)
439770
```
440771

772+
773+
## Advanced Features
774+
775+
### PITR (Point-in-Time Recovery)
776+
777+
```python
778+
# Create PITR for cluster
779+
780+
441781
## Configuration
442782

443783
### Connection Parameters
