Commit 82d5abe

Author: Aydin Abiar

remove superfluous/LLM verbosity

Signed-off-by: Aydin Abiar <[email protected]>
1 parent 33715fb commit 82d5abe

2 files changed: +5 -56 lines changed

doc/source/data/examples/unstructured-data-ingestion/content/README.md

Lines changed: 2 additions & 27 deletions
@@ -843,7 +843,7 @@ quality_distribution.write_parquet(
 )
 ```
 
-## Verification and Summary
+## Verification
 
 After writing data to the warehouse, verify everything worked correctly. This section demonstrates:
 
@@ -919,7 +919,7 @@ for i, record in enumerate(samples):
 print(f"\t{i+1}. Doc: {doc_id}, Category: {category}, Words: {words}, Quality: {quality}")
 ```
 
-## Summary and Next Steps
+## Summary
 
 You have built a complete end-to-end document ingestion pipeline using Ray Data. This section reviews what you learned and where to go from here.
 
@@ -1071,28 +1071,3 @@ This pipeline demonstrated all major Ray Data operations:
 - Monitoring and debugging
 - Scalability considerations
 
-### Next Steps
-
-**Extend This Pipeline:**
-1. Add LLM-based content analysis (replace pattern matching)
-2. Implement named entity recognition (NER)
-3. Add sentiment analysis for customer documents
-4. Create vector embeddings for semantic search
-5. Integrate with Delta Lake or Apache Iceberg
-
-**Learn More Ray Data:**
-- **Batch Inference**: Process documents with ML models
-- **Data Quality**: Advanced validation patterns
-- **Performance Tuning**: Optimize for your workload
-- **Integration**: Connect to Snowflake, Databricks, etc.
-
-### Resources
-
-- **Ray Data Documentation**: https://docs.ray.io/en/latest/data/data.html
-- **Ray Data Examples**: https://docs.ray.io/en/latest/data/examples/examples.html
-- **Ray Dashboard Guide**: https://docs.ray.io/en/latest/ray-observability/getting-started.html
-- **Anyscale Platform**: https://docs.anyscale.com/
-
----
-
-**You're now ready to build production-scale document ingestion pipelines with Ray Data!**

doc/source/data/examples/unstructured-data-ingestion/content/unstructured-data-ingestion.ipynb

Lines changed: 3 additions & 29 deletions
@@ -1001,7 +1001,7 @@
 "id": "6de9c8b7",
 "metadata": {},
 "source": [
-"## Verification and Summary\n",
+"## Verification\n",
 "\n",
 "After writing data to the warehouse, verify everything worked correctly. This section demonstrates:\n",
 "\n",
@@ -1093,7 +1093,7 @@
 "id": "81f2d389",
 "metadata": {},
 "source": [
-"## Summary and Next Steps\n",
+"## Summary\n",
 "\n",
 "You have built a complete end-to-end document ingestion pipeline using Ray Data. This section reviews what you learned and where to go from here.\n",
 "\n",
@@ -1243,33 +1243,7 @@
 "- Error handling approaches\n",
 "- Resource optimization\n",
 "- Monitoring and debugging\n",
-"- Scalability considerations\n",
-"\n",
-"### Next Steps\n",
-"\n",
-"**Extend This Pipeline:**\n",
-"1. Add LLM-based content analysis (replace pattern matching)\n",
-"2. Implement named entity recognition (NER)\n",
-"3. Add sentiment analysis for customer documents\n",
-"4. Create vector embeddings for semantic search\n",
-"5. Integrate with Delta Lake or Apache Iceberg\n",
-"\n",
-"**Learn More Ray Data:**\n",
-"- **Batch Inference**: Process documents with ML models\n",
-"- **Data Quality**: Advanced validation patterns\n",
-"- **Performance Tuning**: Optimize for your workload\n",
-"- **Integration**: Connect to Snowflake, Databricks, etc.\n",
-"\n",
-"### Resources\n",
-"\n",
-"- **Ray Data Documentation**: https://docs.ray.io/en/latest/data/data.html\n",
-"- **Ray Data Examples**: https://docs.ray.io/en/latest/data/examples/examples.html\n",
-"- **Ray Dashboard Guide**: https://docs.ray.io/en/latest/ray-observability/getting-started.html\n",
-"- **Anyscale Platform**: https://docs.anyscale.com/\n",
-"\n",
-"---\n",
-"\n",
-"**You're now ready to build production-scale document ingestion pipelines with Ray Data!**"
+"- Scalability considerations\n"
 ]
 }
 ],
