Automatically detect and extract tables from Excel, CSV, and text files.
## What is GridGulp?
-GridGulp finds tables in your spreadsheets - even when there are multiple tables on one sheet or when tables don't start at cell A1. It comes with reasonable defaults and is fully configurable.
+GridGulp finds tables in your spreadsheets, even when
GridGulp provides two ways to work with detected tables:
-1. **Table Ranges** - Lightweight metadata about where tables are located (e.g., "A1:E100")
+1. **Table Ranges** - JSON metadata about where tables are located (e.g., "A1:E100")
- Fast and memory-efficient
-- Perfect for mapping table locations or visualizing spreadsheet structure
+- Perfect for agent use as tools - mapping table locations or visualizing spreadsheet structure
- No actual data is loaded into memory
2. **DataFrames** - The actual data extracted from those ranges as pandas DataFrames
- Contains the full data with proper types
- Ready for analysis, transformation, or export
-- Requires more memory but provides full data access
Choose based on your needs:
-- Use **ranges only** when you need to know where tables are or want to process them later
+- Use **ranges only** when you need to know where tables are and want to submit to other tasks - for example, a downstream process to infer purpose / intent based on data content
- Use **DataFrames** when you need to analyze or transform the actual data
### Getting Table Ranges Only
@@ -185,14 +211,14 @@ if all_dataframes:
- **Smart Headers** - Detects single and multi-row headers automatically
- **Multiple Tables** - Handles sheets with multiple separate tables
- **Quality Scoring** - Confidence scores for each detected table
-- **Fast** - Processes most files in under a second
+- **Fast** - Processes 1M+ cells/second for simple tables, 100K+ cells/second for complex tables
## Documentation
- [Full Usage Guide](docs/USAGE_GUIDE.md) - Detailed examples and configuration
- [API Reference](docs/API_REFERENCE.md) - Complete API documentation
- [Architecture](docs/ARCHITECTURE.md) - How GridGulp works internally
-- [Testing Guide](docs/TESTING_WITH_SCRIPT.md) - Test spreadsheets in bulk with the unified test script
+- [Testing Guide](docs/TESTING_GUIDE.md) - Test spreadsheets in bulk with the unified test script