Commit ffe9351

Fix python sdk and update docs (#22614)
Approved by: @dengn
1 parent dd4bc63 commit ffe9351

11 files changed: +746 -238 lines changed

README.md

Lines changed: 39 additions & 0 deletions
@@ -51,6 +51,7 @@ Contents

* [What is MatrixOne](#what-is-matrixone)
* [Get Started in 60 Seconds](#️-get-started-in-60-seconds)
* [Tutorials & Demos](#-tutorials--demos)
* [Installation & Deployment](#️-installation--deployment)
* [Architecture](#architecture)
* [Python SDK](#python-sdk)
@@ -233,6 +234,44 @@ for row in results.rows:

📖 **[Python SDK Documentation →](clients/python/README.md)**

## 📚 Tutorials & Demos

Ready to dive deeper? Explore our comprehensive collection of hands-on tutorials and real-world demos:

### 🎯 Getting Started Tutorials

| Tutorial | Language/Framework | Description |
|----------|-------------------|-------------|
| [Java CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-java-crud-demo/) | Java | Java application development |
| [SpringBoot and JPA CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/springboot-hibernate-crud-demo/) | Java | SpringBoot with Hibernate/JPA |
| [PyMySQL CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-python-crud-demo/) | Python | Basic database operations with Python |
| [SQLAlchemy CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/sqlalchemy-python-crud-demo/) | Python | Python with SQLAlchemy ORM |
| [Django CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/django-python-crud-demo/) | Python | Django web framework |
| [Golang CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-golang-crud-demo/) | Go | Go application development |
| [Gorm CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/gorm-golang-crud-demo/) | Go | Go with Gorm ORM |
| [C# CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/c-net-crud-demo/) | C# | .NET application development |
| [TypeScript CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/typescript-crud-demo/) | TypeScript | TypeScript application development |

### 🚀 Advanced Features Tutorials

| Tutorial | Use Case | Related MatrixOne Features |
|----------|----------|---------------------------|
| [Pinecone-Compatible Vector Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/pinecone-vector-demo/) | AI & Search | vector search, Pinecone-compatible API |
| [IVF Index Health Monitoring](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/ivf-index-health-demo/) | AI & Search | vector search, IVF index |
| [HNSW Vector Index](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/hnsw-vector-demo/) | AI & Search | vector search, HNSW index |
| [Fulltext Natural Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-natural-search-demo/) | AI & Search | fulltext search, natural language |
| [Fulltext Boolean Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-boolean-search-demo/) | AI & Search | fulltext search, boolean operators |
| [Fulltext JSON Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-json-search-demo/) | AI & Search | fulltext search, JSON data |
| [Hybrid Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/hybrid-search-demo/) | AI & Search | hybrid search, vector + fulltext + SQL |
| [RAG Application Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/rag-demo/) | AI & Search | RAG, vector search, fulltext search |
| [Picture(Text)-to-Picture Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/search-picture-demo/) | AI & Search | multimodal search, image similarity |
| [Dify Integration Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/dify-mo-demo/) | AI & Search | AI platform integration |
| [HTAP Application Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/htap-demo/) | Performance | HTAP, real-time analytics |
| [Instant Clone for Multi-Team Development](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/efficient-clone-demo/) | Performance | instant clone, Git for Data |
| [Safe Production Upgrade with Instant Rollback](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/snapshot-rollback-demo/) | Performance | snapshot, rollback, Git for Data |

📖 **[View All Tutorials →](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/snapshot-rollback-demo/)**

## 🛠️ <a id="installation--deployment">Installation & Deployment</a>

MatrixOne supports multiple installation methods. Choose the one that best fits your needs:

README_CN.md

Lines changed: 39 additions & 0 deletions
@@ -52,6 +52,7 @@

* [What is MatrixOne?](#what-is-matrixone)
* [Get Started in 60 Seconds](#️-60秒快速上手)
* [Tutorials & Examples](#-教程与示例)
* [Installation & Deployment](#️-安装与部署)
* [Architecture](#architecture)
* [Python SDK](#python-sdk)
@@ -227,6 +228,44 @@ for row in results.rows:

📖 **[Python SDK Documentation →](clients/python/README.md)**

## 📚 Tutorials & Examples

Dive deeper into MatrixOne! Browse our comprehensive hands-on tutorials and real-world examples:

### 🎯 Getting Started Tutorials

| Tutorial | Language/Framework | Description |
|----------|-------------------|-------------|
| [Java CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-java-crud-demo/) | Java | Java application development |
| [SpringBoot and JPA CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/springboot-hibernate-crud-demo/) | Java | SpringBoot + Hibernate/JPA |
| [PyMySQL CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-python-crud-demo/) | Python | Basic database operations with Python |
| [SQLAlchemy CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/sqlalchemy-python-crud-demo/) | Python | Python + SQLAlchemy ORM |
| [Django CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/django-python-crud-demo/) | Python | Django web framework |
| [Golang CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/develop-golang-crud-demo/) | Go | Go application development |
| [Gorm CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/gorm-golang-crud-demo/) | Go | Go + Gorm ORM |
| [C# CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/c-net-crud-demo/) | C# | .NET application development |
| [TypeScript CRUD Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/typescript-crud-demo/) | TypeScript | TypeScript application development |

### 🚀 Advanced Features Tutorials

| Tutorial | Use Case | Related MatrixOne Features |
|----------|----------|---------------------------|
| [Pinecone-Compatible Vector Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/pinecone-vector-demo/) | AI & Search | vector search, Pinecone-compatible API |
| [IVF Index Health Monitoring](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/ivf-index-health-demo/) | AI & Search | vector search, IVF index |
| [HNSW Vector Index](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/hnsw-vector-demo/) | AI & Search | vector search, HNSW index |
| [Fulltext Natural Language Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-natural-search-demo/) | AI & Search | fulltext search, natural language |
| [Fulltext Boolean Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-boolean-search-demo/) | AI & Search | fulltext search, boolean operators |
| [Fulltext JSON Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/fulltext-json-search-demo/) | AI & Search | fulltext search, JSON data |
| [Hybrid Search](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/hybrid-search-demo/) | AI & Search | hybrid search, vector + fulltext + SQL |
| [RAG Application Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/rag-demo/) | AI & Search | RAG, vector search, fulltext search |
| [Image and Text Search Application](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/search-picture-demo/) | AI & Search | multimodal search, image similarity |
| [Dify Integration Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/dify-mo-demo/) | AI & Search | AI platform integration |
| [HTAP Application Demo](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/htap-demo/) | Performance | HTAP, real-time analytics |
| [Instant Clone for Multi-Team Development](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/efficient-clone-demo/) | Performance | instant clone, Git for Data |
| [Safe Production Upgrade with Instant Rollback](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/snapshot-rollback-demo/) | Performance | snapshot, rollback, Git for Data |

📖 **[View All Tutorials →](https://docs.matrixorigin.cn/en/v25.3.0.2/MatrixOne/Tutorial/snapshot-rollback-demo/)**

## 🛠️ <a id="installation--deployment">Installation & Deployment</a>

MatrixOne supports multiple installation methods. Choose the one that best fits your needs:

clients/python/docs/best_practices.rst

Lines changed: 283 additions & 0 deletions
@@ -893,6 +893,289 @@ Monitoring and Logging
        slow_query_threshold=1.0  # Log queries > 1 second
    )

Index Maintenance Best Practices
----------------------------------

⭐ **Critical for Production**: Regular index maintenance ensures optimal performance, especially for vector indexes.

IVF Index Creation Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. important::
   **Critical Issue: Index Creation Timing**

   IVF indexes should be created **AFTER** inserting initial data for optimal clustering:

.. code-block:: python

    # ✅ CORRECT ORDER:
    client.create_table(Document)
    client.batch_insert(Document, initial_data)  # Insert first
    client.vector_ops.create_ivf("documents", "idx", "embedding", lists=50)  # Index last

    # Then continue normal operations
    client.insert(Document, new_doc)  # ✅ IVF supports dynamic updates

.. code-block:: python

    # ❌ AVOID: Creating index on empty table
    client.create_table(Document)
    client.vector_ops.create_ivf("documents", "idx", "embedding", lists=50)
    client.batch_insert(Document, data)  # Poor initial clustering

**Why?** Initial data helps the IVF algorithm create better-balanced clusters.

**Key Difference from HNSW**:

* **IVF**: Insert data → Create index → Continue updates ✅ (dynamic)
* **HNSW**: Insert ALL data → Create index → Read-only 🚧 (static, updates coming soon)
932+
933+
IVF Index Health Monitoring
934+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
935+
936+
.. code-block:: python
937+
938+
import math
939+
from datetime import datetime
940+
941+
def monitor_ivf_health(client, table_name, column_name, expected_lists):
942+
"""
943+
Monitor IVF index health - CRITICAL for production vector search.
944+
945+
Args:
946+
client: MatrixOne client
947+
table_name: Table with IVF index
948+
column_name: Vector column name
949+
expected_lists: Expected number of centroids
950+
"""
951+
# ✅ GOOD: Get comprehensive IVF statistics
952+
stats = client.vector_ops.get_ivf_stats(table_name, column_name)
953+
954+
distribution = stats['distribution']
955+
centroid_counts = distribution['centroid_count']
956+
957+
# Calculate health metrics
958+
total_centroids = len(centroid_counts)
959+
total_vectors = sum(centroid_counts)
960+
min_count = min(centroid_counts) if centroid_counts else 0
961+
max_count = max(centroid_counts) if centroid_counts else 0
962+
avg_count = total_vectors / total_centroids if total_centroids > 0 else 0
963+
964+
# ⭐ KEY METRIC: Balance ratio
965+
balance_ratio = max_count / min_count if min_count > 0 else float('inf')
966+
967+
# Health assessment
968+
print(f"\n{'='*60}")
969+
print(f"IVF Health Report - {table_name}.{column_name}")
970+
print(f"Timestamp: {datetime.now().isoformat()}")
971+
print(f"{'='*60}")
972+
print(f"Total Centroids: {total_centroids} (expected: {expected_lists})")
973+
print(f"Total Vectors: {total_vectors}")
974+
print(f"Avg/Centroid: {avg_count:.2f}")
975+
print(f"Balance Ratio: {balance_ratio:.2f}")
976+
977+
# Status assessment (threshold: <2.0 good, >2.5 rebuild)
978+
if balance_ratio < 2.0:
979+
status = "✅ HEALTHY"
980+
action = "Continue monitoring"
981+
elif balance_ratio < 2.5:
982+
status = "⚠️ FAIR"
983+
action = "Plan rebuild"
984+
else:
985+
status = "❌ CRITICAL"
986+
action = "Rebuild immediately"
987+
988+
print(f"Status: {status}")
989+
print(f"Action: {action}")
990+
print(f"{'='*60}\n")
991+
992+
return {
993+
'balance_ratio': balance_ratio,
994+
'total_vectors': total_vectors,
995+
'status': status,
996+
'action': action
997+
}
998+
999+
# ✅ GOOD: Regular health checks (schedule daily/weekly)
1000+
health = monitor_ivf_health(
1001+
client,
1002+
"documents",
1003+
"embedding",
1004+
expected_lists=100
1005+
)
1006+
1007+
# ✅ GOOD: Automated alerting
1008+
if health['balance_ratio'] > 2.5:
1009+
# Send alert (email, Slack, PagerDuty, etc.)
1010+
print(f"🚨 ALERT: Index needs attention! Balance ratio: {health['balance_ratio']:.2f}")
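
The balance-ratio thresholds used in the health report can also be factored into a small pure function, which makes them testable without a live database. This is a minimal sketch: ``classify_ivf_balance`` is a hypothetical helper name, not part of the MatrixOne SDK, and it assumes the per-centroid counts have already been extracted from the statistics as a plain list of integers.

```python
def classify_ivf_balance(centroid_counts, healthy_below=2.0, critical_from=2.5):
    """Return (balance_ratio, status) for a list of per-centroid vector counts.

    Hypothetical helper; the default thresholds mirror the health report
    (< 2.0 healthy, 2.0-2.5 fair, >= 2.5 critical).
    """
    if not centroid_counts or min(centroid_counts) <= 0:
        # No centroids, or at least one empty centroid: treat as worst case.
        return float("inf"), "CRITICAL"
    ratio = max(centroid_counts) / min(centroid_counts)
    if ratio < healthy_below:
        return ratio, "HEALTHY"
    if ratio < critical_from:
        return ratio, "FAIR"
    return ratio, "CRITICAL"
```

Keeping the classification separate from the printing and the SDK call means the thresholds can be tuned per workload and covered by unit tests.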
1011+
1012+
IVF Index Rebuild Strategy
1013+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1014+
1015+
.. code-block:: python
1016+
1017+
def rebuild_ivf_index(client, table_name, column_name, index_name):
1018+
"""
1019+
Rebuild IVF index with optimal parameters.
1020+
1021+
When to rebuild:
1022+
- Balance ratio > 2.5
1023+
- After bulk inserts (>20% new data)
1024+
- Query performance degradation
1025+
- After major deletes or updates
1026+
"""
1027+
print(f"Rebuilding IVF index: {table_name}.{column_name}")
1028+
1029+
# ✅ GOOD: Get current stats before rebuild
1030+
old_stats = client.vector_ops.get_ivf_stats(table_name, column_name)
1031+
old_counts = old_stats['distribution']['centroid_count']
1032+
total_vectors = sum(old_counts)
1033+
old_balance = max(old_counts) / min(old_counts) if min(old_counts) > 0 else float('inf')
1034+
1035+
print(f" Old stats: {total_vectors} vectors, balance {old_balance:.2f}")
1036+
1037+
# ✅ GOOD: Calculate optimal lists parameter
1038+
# Rule: lists = √N to 4×√N (where N = total vectors)
1039+
optimal_lists = int(math.sqrt(total_vectors) * 2) # Using 2×√N
1040+
optimal_lists = max(10, min(optimal_lists, 1000)) # Clamp between 10-1000
1041+
1042+
print(f" Calculated optimal lists: {optimal_lists}")
1043+
1044+
# ✅ GOOD: Drop and recreate index
1045+
try:
1046+
# Drop old index
1047+
client.vector_ops.drop(table_name, index_name)
1048+
print(f" ✓ Dropped old index")
1049+
1050+
# Recreate with optimal parameters
1051+
client.vector_ops.create_ivf(
1052+
table_name,
1053+
name=index_name,
1054+
column=column_name,
1055+
lists=optimal_lists,
1056+
op_type="vector_l2_ops"
1057+
)
1058+
print(f" ✓ Created new index with {optimal_lists} lists")
1059+
1060+
# ✅ GOOD: Verify new index health
1061+
import time
1062+
time.sleep(2) # Give index time to stabilize
1063+
1064+
new_stats = client.vector_ops.get_ivf_stats(table_name, column_name)
1065+
new_counts = new_stats['distribution']['centroid_count']
1066+
new_balance = max(new_counts) / min(new_counts) if min(new_counts) > 0 else float('inf')
1067+
1068+
improvement = ((old_balance - new_balance) / old_balance * 100)
1069+
1070+
print(f"\nRebuild Results:")
1071+
print(f" Old balance: {old_balance:.2f}")
1072+
print(f" New balance: {new_balance:.2f}")
1073+
print(f" Improvement: {improvement:.1f}%")
1074+
1075+
if new_balance < 2.0:
1076+
print(f" ✅ Index is now healthy!")
1077+
else:
1078+
print(f" ⚠️ Consider adjusting lists parameter")
1079+
1080+
except Exception as e:
1081+
print(f" ❌ Rebuild failed: {e}")
1082+
raise
1083+
1084+
# Usage in production
1085+
# ✅ GOOD: Schedule during low-traffic periods
1086+
# ✅ GOOD: Check health first, rebuild only if needed
1087+
health = monitor_ivf_health(client, "documents", "embedding", expected_lists=100)
1088+
if health['balance_ratio'] > 2.5:
1089+
rebuild_ivf_index(client, "documents", "embedding", "idx_embedding_ivf")
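
The "when to rebuild" criteria from the docstring can be turned into an explicit check. The sketch below covers only the two criteria that are easy to evaluate numerically (balance ratio and share of new data). ``should_rebuild_ivf`` and its parameters are hypothetical names, and it assumes ">20% new data" is measured relative to the vector count at the time the index was last built.

```python
def should_rebuild_ivf(balance_ratio, vectors_at_build, vectors_now,
                       max_balance=2.5, max_growth=0.20):
    """Return a list of reasons to rebuild; an empty list means no rebuild.

    Hypothetical helper, not part of the MatrixOne SDK.
    """
    reasons = []
    if balance_ratio > max_balance:
        reasons.append(f"balance ratio {balance_ratio:.2f} > {max_balance}")
    if vectors_at_build > 0:
        # Growth is measured against the size at the last build (assumption).
        growth = (vectors_now - vectors_at_build) / vectors_at_build
        if growth > max_growth:
            reasons.append(f"{growth:.0%} new data since last build")
    return reasons
```

Returning the reasons rather than a bare boolean makes the decision easy to log alongside the rebuild itself.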

IVF Index Parameter Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import math

    # ✅ GOOD: Calculate optimal lists (guideline: <1K: 10-20, 1K-100K: 50-200, >100K: √N to 4×√N)
    total_vectors = 50000
    optimal_lists = int(math.sqrt(total_vectors) * 2)  # Using 2×√N: √50000 ≈ 223.6, so 447 lists

    client.vector_ops.create_ivf(
        "large_table",
        name="idx_vectors",
        column="embedding",
        lists=optimal_lists,
        op_type="vector_l2_ops"
    )
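
The tiered guideline in the comment above can be written as a lookup that returns a range rather than a single value. This is only a sketch of that guideline: ``recommended_lists_range`` is a hypothetical name, and the handling of the exact boundaries (1,000 and 100,000 vectors) is an assumption, since the guideline does not say which tier they belong to.

```python
import math

def recommended_lists_range(total_vectors):
    """Return an inclusive (low, high) range for the IVF `lists` parameter.

    Hypothetical helper following the guideline:
    <1K: 10-20, 1K-100K: 50-200, >100K: √N to 4×√N.
    """
    if total_vectors < 1_000:
        return (10, 20)
    if total_vectors <= 100_000:
        return (50, 200)
    root = math.isqrt(total_vectors)  # integer floor of √N
    return (root, 4 * root)
```

A value picked from the returned range is a starting point; the balance ratio after the build is what indicates whether it suits the data.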

Fulltext Index Maintenance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from matrixone import FulltextParserType

    # ✅ GOOD: BM25 for most cases, choose parser by content type
    client.fulltext_index.create("articles", "idx_content", ["title", "content"], algorithm="BM25")

    # For Chinese: NGRAM parser
    client.fulltext_index.create("chinese_docs", "idx_cn", "content", algorithm="BM25",
                                 parser=FulltextParserType.NGRAM)

    # For JSON: JSON parser (indexes values, not keys)
    client.fulltext_index.create("json_docs", "idx_json", "data", algorithm="BM25",
                                 parser=FulltextParserType.JSON)

HNSW Index Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from sqlalchemy import BigInteger, Column
    from sqlalchemy.orm import declarative_base
    from matrixone.sqlalchemy_ext import create_vector_column

    Base = declarative_base()

    # ✅ GOOD: HNSW requires BigInteger primary key
    class Document(Base):
        __tablename__ = 'documents'
        id = Column(BigInteger, primary_key=True)  # Must be BigInteger
        embedding = create_vector_column(128, 'f32')

    # ✅ GOOD: Current workflow
    client.create_table(Document)
    client.batch_insert(Document, all_documents)  # Insert data first

    client.vector_ops.enable_hnsw()
    client.vector_ops.create_hnsw(Document, "idx_embedding", "embedding", m=16)

    # 🚧 Coming Soon: Dynamic updates after index creation
    # Current workaround: Drop index → Modify data → Recreate index

Batch Operation Size Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # ✅ GOOD: Optimal batch sizes for different operations

    # For inserts: 1000-10000 rows per batch
    batch_size = 5000
    for i in range(0, len(large_dataset), batch_size):
        batch = large_dataset[i:i + batch_size]
        client.batch_insert("table_name", batch)
        print(f"Inserted batch {i//batch_size + 1}")

    # For vector data: smaller batches (vectors are larger)
    vector_batch_size = 1000
    for i in range(0, len(vector_data), vector_batch_size):
        batch = vector_data[i:i + vector_batch_size]
        client.batch_insert("vectors_table", batch)

    # ❌ AVOID: Too large batches (memory issues)
    # client.batch_insert("table", million_rows)  # May cause OOM

    # ❌ AVOID: Too small batches (performance issues)
    # for row in data:
    #     client.insert("table", row)  # Very slow!
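
The slicing loops above require the whole dataset to be in memory as a list. When rows come from a generator or a file, the same batching can be done lazily. The helper below is a generic sketch using only the standard library; ``chunked`` is a hypothetical name and not part of the MatrixOne SDK.

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch
```

Each yielded batch could then be passed to ``client.batch_insert("table_name", batch)`` exactly as in the loops above, with only one batch held in memory at a time.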

Error Handling Best Practices
------------------------------
