1 | 1 | # coorddetect |
2 | 2 |
3 | | -**coorddetect** automatically detects X, Y, Z coordinate columns from messy CSV |
4 | | -and Excel files and provides spatial diagnostics for QA/QC workflows. |
5 | | - |
6 | | -## Features |
7 | | -- Automatic XYZ detection |
8 | | -- Coordinate system inference |
9 | | -- Confidence score |
10 | | -- Robust bounding box |
11 | | -- Convex hull geometry |
12 | | -- Point density metrics |
13 | | -- Batch processing (CSV & Excel) |
14 | | -- CLI support |
| 3 | +`coorddetect` automatically detects **X, Y, Z coordinate columns** from messy tabular |
| 4 | +data (CSV and Excel) using a robust, file-order–preserving heuristic. |
| 5 | +It also provides optional **spatial QA/QC diagnostics** such as robust bounding boxes, |
| 6 | +convex hull geometry, and point density metrics. |
| 7 | + |
| 8 | +The XYZ detection logic does **not** rely on column names and is designed for
| 9 | +real-world GIS, survey, and geomatics datasets.
| 10 | + |
| 11 | +--- |
15 | 12 |
16 | 13 | ## Installation |
| 14 | + |
17 | 15 | ```bash |
18 | 16 | pip install coorddetect |
19 | | -"# coorddetect" |
20 | | -"# coorddetect" |
21 | | -"# coorddetect" |
22 | | -"# coorddetect" |
| 17 | +``` |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Quick Usage |
| 22 | + |
| 23 | +```python |
| 24 | +import pandas as pd |
| 25 | +from coorddetect import detect_xyz |
| 26 | + |
| 27 | +df = pd.read_csv("points.csv") |
| 28 | + |
| 29 | +xyz, meta = detect_xyz(df) |
| 30 | + |
| 31 | +print(xyz.head()) |
| 32 | +print("Selected XYZ columns:", meta["selected_columns"]) |
| 33 | +``` |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Preserve an ID Column |
| 38 | + |
| 39 | +```python |
| 40 | +xyz, meta = detect_xyz( |
| 41 | + df, |
| 42 | + id_col="WallStationID" |
| 43 | +) |
| 44 | +``` |
| 45 | + |
| 46 | +**Output columns** |
| 47 | +``` |
| 48 | +ID, X, Y, Z |
| 49 | +``` |
| 50 | + |
| 51 | +--- |
| 52 | + |
| 53 | +## Spatial Diagnostics (All Features) |
| 54 | + |
| 55 | +By default, `detect_xyz()` can return: |
| 56 | + |
| 57 | +- **Robust bounding box** |
| 58 | +- **Convex hull geometry** |
| 59 | +- **Point density metrics** |
| 60 | + |
| 61 | +```python |
| 62 | +xyz, meta = detect_xyz( |
| 63 | + df, |
| 64 | + return_bounds=True, |
| 65 | + robust_bounds=True, |
| 66 | + bounds_quantiles=(0.01, 0.99), |
| 67 | + return_hull=True, |
| 68 | + hull_dim=2, # 2 = XY hull, 3 = XYZ hull |
| 69 | + return_density=True |
| 70 | +) |
| 71 | + |
| 72 | +print("Bounds:", meta["bounds"]) |
| 73 | +print("Hull dimension:", meta["convex_hull_dim"]) |
| 74 | +print("Density:", meta["density"]) |
| 75 | +``` |
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +## Robust vs Raw Bounding Box |
| 80 | + |
| 81 | +```python |
| 82 | +# Outlier-resistant (recommended) |
| 83 | +xyz, meta = detect_xyz( |
| 84 | + df, |
| 85 | + robust_bounds=True, |
| 86 | + bounds_quantiles=(0.01, 0.99) |
| 87 | +) |
| 88 | + |
| 89 | +# Raw min / max bounds |
| 90 | +xyz, meta = detect_xyz( |
| 91 | + df, |
| 92 | + robust_bounds=False |
| 93 | +) |
| 94 | +``` |
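
To illustrate why quantile-based bounds resist outliers, here is a minimal pure-Python sketch. It does not call `coorddetect`: `quantile`, `raw_bounds`, and `quantile_bounds` are hypothetical helpers, and the linearly interpolated quantile is an assumption about how such bounds might be computed, not the library's confirmed method.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already-sorted list (0 <= q <= 1)."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def raw_bounds(values):
    """Plain min/max: a single blunder stretches the box."""
    return min(values), max(values)


def quantile_bounds(values, quantiles=(0.01, 0.99)):
    """Bounds taken at the given lower/upper quantiles instead of min/max."""
    s = sorted(values)
    return quantile(s, quantiles[0]), quantile(s, quantiles[1])


# 100 well-behaved eastings plus one gross blunder (e.g. an extra digit)
xs = [500000.0 + i for i in range(100)] + [5000000.0]

raw = raw_bounds(xs)
robust = quantile_bounds(xs)
print("raw:   ", raw)
print("robust:", robust)
```

In this sketch the raw upper bound is dragged out to the blunder, while the quantile-based upper bound stays within the main cluster of values.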
| 95 | + |
| 96 | +--- |
| 97 | + |
| 98 | +## Disable Diagnostics (Pure XYZ Detection) |
| 99 | + |
| 100 | +```python |
| 101 | +xyz, meta = detect_xyz( |
| 102 | + df, |
| 103 | + return_bounds=False, |
| 104 | + return_hull=False, |
| 105 | + return_density=False |
| 106 | +) |
| 107 | +``` |
| 108 | + |
| 109 | +--- |
| 110 | + |
| 111 | +## Batch Processing (CSV & Excel) |
| 112 | + |
| 113 | +Process all `.csv`, `.xlsx`, and `.xls` files in a folder. |
| 114 | + |
| 115 | +```python |
| 116 | +from coorddetect import detect_xyz_batch |
| 117 | + |
| 118 | +results = detect_xyz_batch( |
| 119 | + input_folder="data/raw", |
| 120 | + output_folder="data/out", |
| 121 | + recursive=True, |
| 122 | + export_format="csv", # or "xlsx" |
| 123 | + excel_sheet=None, # sheet name or index for Excel |
| 124 | + id_col="WallStationID", |
| 125 | + return_bounds=True, |
| 126 | + robust_bounds=True, |
| 127 | + return_hull=True, |
| 128 | + hull_dim=2, |
| 129 | + return_density=True |
| 130 | +) |
| 131 | + |
| 132 | +print(results) |
| 133 | +``` |
| 134 | + |
| 135 | +Each file is exported as: |
| 136 | +``` |
| 137 | +<original_name>_xyz.csv |
| 138 | +``` |
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## Command Line Interface (CLI) |
| 143 | + |
| 144 | +After installation, the CLI is available: |
| 145 | + |
| 146 | +```bash |
| 147 | +coorddetect INPUT_FOLDER OUTPUT_FOLDER |
| 148 | +``` |
| 149 | + |
| 150 | +### Common Examples |
| 151 | + |
| 152 | +```bash |
| 153 | +# Process a folder recursively |
| 154 | +coorddetect data/raw data/out --recursive |
| 155 | + |
| 156 | +# Export Excel instead of CSV |
| 157 | +coorddetect data/raw data/out --export-format xlsx |
| 158 | + |
| 159 | +# Use raw min/max bounds |
| 160 | +coorddetect data/raw data/out --raw-bounds |
| 161 | + |
| 162 | +# Use 3D convex hull |
| 163 | +coorddetect data/raw data/out --hull-dim 3 |
| 164 | + |
| 165 | +# Disable hull and density metrics |
| 166 | +coorddetect data/raw data/out --no-hull --no-density |
| 167 | +``` |
| 168 | + |
| 169 | +### View all options |
| 170 | + |
| 171 | +```bash |
| 172 | +coorddetect --help |
| 173 | +``` |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## Notes |
| 178 | + |
| 179 | +- Convex hull and point density metrics require `scipy` |
| 180 | +- Convex hull requires at least 3 non-collinear points (2D) or 4 non-coplanar points (3D)
| 181 | +- All diagnostics are computed **after** XYZ detection |
| 182 | +- XYZ detection logic does **not** rely on column names |
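
The minimum-point requirement for a 2D hull can be seen in a small pure-Python sketch. It uses Andrew's monotone chain algorithm rather than `scipy`, and `hull_2d`, `polygon_area`, and `point_density` are hypothetical helpers for illustration only. Defining density as points per unit of hull area is an assumption, not necessarily how `coorddetect` computes it.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_2d(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        raise ValueError("2D convex hull needs at least 3 distinct points")
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    if len(hull) < 3:
        raise ValueError("points are collinear; the hull has zero area")
    return hull


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_density(points):
    """Points per unit of hull area (assumed definition)."""
    return len(points) / polygon_area(hull_2d(points))


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0)]
hull = hull_2d(square)
density = point_density(square)
print("hull vertices:", hull)
print("density:", density)

# Three collinear points cannot form a 2D hull
try:
    hull_2d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
    collinear_rejected = False
except ValueError:
    collinear_rejected = True
print("collinear input rejected:", collinear_rejected)
```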
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +## Applications |
| 187 | + |
| 188 | +- GIS applications |
| 189 | +- Survey data validation |
| 190 | +- Geomatics research pipelines |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +## License |
| 195 | + |
| 196 | +MIT |