
Commit 1db501f

Feature/smart ohlcv conversion (#24)
feat: add smart OHLCV conversion with automatic detection and analysis (#24)

## Overview
Comprehensive enhancement of OHLCV data handling with intelligent automatic conversion, format detection, and market analysis capabilities.

## Core Changes

### New DataConverter Module (`src/pynecore/core/data_converter.py`)
- Automatic format detection for CSV, JSON, and TXT files
- Smart symbol and provider detection from filename patterns
  - Supports exchanges: BINANCE, BYBIT, CAPITALCOM, etc.
  - Handles forex pairs and crypto symbols
  - Complex pattern matching (e.g., `ccxt_BYBIT_BTC_USDT.csv`)
- Automatic SymInfo configuration file generation

### Enhanced OHLCV File Processing (`src/pynecore/core/ohlcv_file.py`)
- 1080 lines changed with major enhancements:
  - **Tick Size Detection**: Multi-method analysis using histogram clustering
  - **Trading Hours Detection**: Automatic detection from trading data
    - Distinguishes 24/7 crypto from business hours
    - Smart data sufficiency checks
  - **Interval Auto-Correction**: Fixes gaps at data start
  - **TXT File Support**: Custom parser replacing the csv module
  - **Improved CSV Output**: Better formatting for readability
  - **Refactored Timestamp Parsing**: 80+ lines of duplication removed

### CLI Command Updates
- **data.py** (201 lines changed):
  - `convert-from`: Auto-detects symbol, provider, timeframe
  - `convert-to`: Accepts direct file paths
  - Removed redundant parameter requirements
- **run.py**: Automatic conversion of non-OHLCV files

### Test Coverage
- **test_001_ohlcv_file.py**: 898 lines added for OHLCV enhancements
- **test_004_data_converter.py**: 297 lines of new tests
- **test_005_symbol_type_detection.py**: 229 lines for symbol detection

## Minor Fixes
- pytest.ini: Fixed test discovery glob pattern
- Removed redundant type checks and validation
- Cast buffer in `struct.unpack` to fix IDE warnings

## Documentation
- Updated CLI documentation for data and run commands
- Added examples for automatic detection features

## Breaking Changes
- `pyne data convert-from`: Removed timeframe parameter (auto-detected)
- `pyne data convert-to`: Changed to accept direct file paths
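
The filename-based symbol and provider detection described above can be approximated with a few string rules. The sketch below is a hypothetical reimplementation, not PyneCore's `DataConverter` code: the helper names `detect_symbol_and_provider` and `split_joined_symbol`, the provider set, and the quote-currency list are assumptions, chosen so that the filename examples in the updated docs resolve the same way.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical lookup tables for this sketch; the real DataConverter may differ.
KNOWN_PROVIDERS = {"binance", "bybit", "capitalcom", "ccxt", "kraken"}
# Checked in order: the first quote currency that is a suffix of the token wins.
QUOTE_CURRENCIES = ("USDT", "USDC", "USD", "EUR", "GBP", "JPY", "CHF", "BTC", "ETH")


def split_joined_symbol(token: str) -> str:
    """Split e.g. 'BTCUSDT' into 'BTC/USDT' using a known quote-currency suffix."""
    for quote in QUOTE_CURRENCIES:
        if token.endswith(quote) and len(token) > len(quote):
            return f"{token[:-len(quote)]}/{quote}"
    return token  # unknown quote currency: leave the token unchanged


def detect_symbol_and_provider(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: guess (symbol, provider) from a data file name."""
    parts = [p for p in re.split(r"[_\-\s]+", Path(filename).stem) if p]
    provider: Optional[str] = None
    symbol_parts: List[str] = []
    for part in parts:
        lowered = part.lower()
        if lowered in KNOWN_PROVIDERS:
            # Prefer a concrete exchange over a library prefix such as "ccxt".
            if provider is None or provider == "ccxt":
                provider = lowered
        elif re.fullmatch(r"\d+[smhdwSMHDW]?", part):
            continue  # timeframe token such as "1h", "1D" or "60"
        else:
            symbol_parts.append(part.upper())

    if len(symbol_parts) == 2:  # separated form, e.g. EUR_USD or BTC_USDT
        return f"{symbol_parts[0]}/{symbol_parts[1]}", provider
    if len(symbol_parts) == 1:  # joined form, e.g. BTCUSDT or EURUSD
        return split_joined_symbol(symbol_parts[0]), provider
    return None, provider


for name in ("BTCUSDT.csv", "EUR_USD.csv", "ccxt_BYBIT_BTC_USDT.csv",
             "BINANCE_ETHUSDT_1h.csv", "capitalcom_EURUSD.csv"):
    print(name, detect_symbol_and_provider(name))
```

Suffix matching on joined symbols is inherently ambiguous when a base asset happens to end in the letters of a quote currency, so a fuller implementation would probably also consult known symbol lists.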
1 parent e2bea15 commit 1db501f


12 files changed, +3396 / -232 lines changed


docs/cli/data.md

Lines changed: 47 additions & 16 deletions
@@ -140,25 +140,33 @@ PyneCore uses a binary format (`.ohlcv`) for storing OHLCV data efficiently. How
 
 ### Converting to Other Formats
 
-The `convert-to` command converts PyneCore format to CSV or JSON:
+The `convert-to` command converts PyneCore OHLCV format to CSV or JSON:
 
 ```bash
-pyne data convert-to PROVIDER [OPTIONS]
+pyne data convert-to OHLCV_FILE [OPTIONS]
 ```
 
+Where `OHLCV_FILE` is the path to the OHLCV file to convert.
+
 Options:
-- `--symbol`, `-s`: Symbol to convert
-- `--timeframe`, `-tf`: Timeframe in TradingView format
-- `--format`, `-f`: Output format (csv, json)
+- `--format`, `-f`: Output format (csv or json, default: csv)
 - `--as-datetime`, `-dt`: Save timestamp as datetime instead of UNIX timestamp
 
+The command automatically:
+- Adds `.ohlcv` extension if not specified
+- Creates output file with the same name but different extension
+- Looks in `workdir/data/` if only filename is provided
+
 Example:
 ```bash
-# Convert Bitcoin data to CSV
-pyne data convert-to ccxt --symbol "BINANCE:BTC/USDT" --timeframe "1D" --format "csv"
+# Convert OHLCV file to CSV
+pyne data convert-to BTCUSDT_1D.ohlcv
+
+# Convert to JSON with human-readable dates
+pyne data convert-to BTCUSDT_1D.ohlcv --format json --as-datetime
 
-# Convert with human-readable dates
-pyne data convert-to ccxt --symbol "BINANCE:BTC/USDT" --timeframe "1D" --format "csv" --as-datetime
+# Short form (extension optional)
+pyne data convert-to BTCUSDT_1D -f csv -dt
 ```
 
 ### Converting from Other Formats
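
The three automatic path rules listed for `convert-to` can be expressed as a small path-resolution function. This is an illustrative sketch only: `resolve_convert_to_paths` and its `workdir` parameter are hypothetical names, and the real command may locate the working directory differently.

```python
from pathlib import Path
from typing import Tuple


def resolve_convert_to_paths(ohlcv_file: str, fmt: str = "csv",
                             workdir: Path = Path("workdir")) -> Tuple[Path, Path]:
    """Hypothetical helper: return (input .ohlcv path, output path)."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported output format: {fmt}")
    path = Path(ohlcv_file)
    # Rule 1: append ".ohlcv" unless the name already ends with it.
    if path.suffix != ".ohlcv":
        path = path.with_name(path.name + ".ohlcv")
    # Rule 2: a bare file name (no directory part) is looked up in workdir/data/.
    if len(path.parts) == 1:
        path = workdir / "data" / path
    # Rule 3: the output keeps the same name and only swaps the extension.
    return path, path.with_suffix(f".{fmt}")


src, out = resolve_convert_to_paths("BTCUSDT_1D", fmt="json")
print(src, out)
```

Paths that already contain a directory component are used as given, which matches the documented behavior of only searching `workdir/data/` for bare file names.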
@@ -172,21 +180,44 @@ pyne data convert-from FILE_PATH [OPTIONS]
 Where `FILE_PATH` is the path to the CSV or JSON file to convert.
 
 Options:
-- `--provider`, `-p`: Data provider name (can be any name, defaults to "custom")
-- `--symbol`, `-s`: Symbol name
-- `--timeframe`, `-tf`: Timeframe in TradingView format
-- `--fmt`, `-f`: Input format (csv, json) - defaults to the file extension if not specified
+- `--provider`, `-p`: Data provider name (defaults to auto-detected from filename)
+- `--symbol`, `-s`: Symbol name (defaults to auto-detected from filename)
 - `--timezone`, `-tz`: Timezone of the timestamps (defaults to UTC)
 
+**Automatic Detection Features:**
+- **Symbol Detection**: The command automatically detects symbols from common filename patterns
+- **Provider Detection**: Recognizes provider names in filenames (BINANCE, BYBIT, CAPITALCOM, etc.)
+- **Format Support**: Supports CSV and JSON files, auto-detected from file extension
+
+**Filename Pattern Examples:**
+- `BTCUSDT.csv` → Symbol: BTC/USDT
+- `EUR_USD.csv` → Symbol: EUR/USD
+- `ccxt_BYBIT_BTC_USDT.csv` → Symbol: BTC/USDT, Provider: bybit
+- `BINANCE_ETHUSDT_1h.csv` → Symbol: ETH/USDT, Provider: binance
+- `capitalcom_EURUSD.csv` → Symbol: EUR/USD, Provider: capitalcom
+
 Example:
 ```bash
-# Convert CSV to PyneCore format
-pyne data convert-from ./data/btcusd.csv --symbol "CUSTOM:BTC/USD" --timeframe "1D"
+# Convert CSV with automatic detection
+pyne data convert-from ./data/BTCUSDT.csv  # Auto-detects BTC/USDT
+
+# Override auto-detected values if needed
+pyne data convert-from ./data/btcusd.csv --symbol "BTC/USD" --provider "kraken"
 
 # Convert with timezone specification
-pyne data convert-from ./data/eurusd.csv --symbol "CUSTOM:EUR/USD" --timeframe "60" --timezone "Europe/London"
+pyne data convert-from ./data/eurusd.csv --timezone "Europe/London"
 ```
 
+**Generated TOML Configuration:**
+
+After conversion, a TOML configuration file is automatically generated with:
+- **Smart Symbol Type Detection**: Automatically identifies forex, crypto, or other asset types
+- **Tick Size Analysis**: Analyzes price data to determine the minimum price increment
+- **Opening Hours Detection**: Detects trading hours from actual trading activity
+- **Interval Detection**: Automatically determines the timeframe from timestamp intervals
+
+The generated TOML file includes all detected information and can be manually adjusted if needed.
+
 ## Data File Structure
 
 PyneCore uses a structured approach to store OHLCV data:
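
Two of the analyses the generated TOML is said to contain, interval detection and tick size, can be approximated with simple heuristics. The sketch below takes the most common gap between consecutive timestamps as the interval, which also tolerates a gap at the start of the data, and takes the tick size as the smallest power of ten that represents every observed price. It is a deliberately simpler stand-in: the commit describes histogram clustering for tick size, which this does not implement, and both function names are hypothetical.

```python
from collections import Counter
from decimal import Decimal
from typing import Sequence


def detect_interval_seconds(timestamps: Sequence[int]) -> int:
    """Hypothetical helper: the most common gap between consecutive timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:]) if b > a]
    if not deltas:
        raise ValueError("need at least two increasing timestamps")
    # Taking the mode ignores occasional larger gaps (missing bars, weekends,
    # a ragged start of the file) that would fool a first-difference check.
    return Counter(deltas).most_common(1)[0][0]


def detect_tick_size(prices: Sequence[str]) -> Decimal:
    """Hypothetical helper: smallest power of ten that represents every price.

    Prices are passed as text (as read from CSV) to avoid float rounding;
    only finite numeric strings are supported.
    """
    max_decimals = 0
    for text in prices:
        # normalize() drops trailing zeros, so "1.1050" counts as 3 decimals.
        exponent = Decimal(text).normalize().as_tuple().exponent
        max_decimals = max(max_decimals, -exponent)
    return Decimal(1).scaleb(-max_decimals)


hourly = [0, 7200, 10800, 14400, 18000]  # a 2h gap at the start, then 1h bars
print(detect_interval_seconds(hourly), detect_tick_size(["1.1050", "1.10512"]))
```

The decimal-count heuristic likely over-reports precision for instruments whose tick is not a power of ten (for example 0.25 or 5), which is the kind of case a clustering approach is meant to handle.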

docs/cli/run.md

Lines changed: 49 additions & 3 deletions
@@ -27,7 +27,7 @@ pyne run SCRIPT DATA [OPTIONS]
 
 Where:
 - `SCRIPT`: Path to the PyneCore script (.py) or Pine Script (.pine) file
-- `DATA`: Path to the OHLCV data (.ohlcv) file
+- `DATA`: Path to the data file (.ohlcv, .csv, .json, or .txt)
 - `OPTIONS`: Additional options to customize the execution
 
 ## Simple Example
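
With `DATA` now accepting several formats, the run command has to decide whether a file can be used directly or must be converted first. A hypothetical dispatch by file extension might look like the sketch below; `plan_data_input` is an invented name, and treating a bare name as `.ohlcv` is an assumption based on the earlier note that the `.ohlcv` extension may be omitted.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical mapping built from the extensions named in the docs.
CONVERTIBLE = {".csv": "csv", ".json": "json", ".txt": "txt"}


def plan_data_input(data: str) -> Tuple[str, Path]:
    """Hypothetical helper: return (detected format, .ohlcv path the script reads)."""
    path = Path(data)
    suffix = path.suffix.lower()
    if suffix in ("", ".ohlcv"):
        # Assumption: a bare name refers to an existing .ohlcv file, used as-is.
        return "ohlcv", path.with_suffix(".ohlcv")
    if suffix in CONVERTIBLE:
        # Convert first; the binary file keeps the source file's name.
        return CONVERTIBLE[suffix], path.with_suffix(".ohlcv")
    raise ValueError(f"unsupported data file extension: {path.suffix}")


print(plan_data_input("BTCUSDT.csv"))
```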
@@ -81,15 +81,61 @@ Example with API key:
 pyne run my_strategy.pine eurusd_data.ohlcv --api-key "your-api-key"
 ```
 
+## Automatic Data Conversion
+
+The `run` command now supports automatic conversion of non-OHLCV data formats. When you provide a CSV, JSON, or TXT file, the system automatically:
+
+1. **Detects the file format** from the extension
+2. **Analyzes the filename** to extract symbol and provider information
+3. **Converts the data** to OHLCV format
+4. **Generates a TOML configuration** with detected parameters
+5. **Runs the script** with the converted data
+
+### Supported Formats and Detection
+
+The automatic conversion supports:
+- **CSV files**: Standard comma-separated values
+- **JSON files**: JSON formatted OHLCV data
+- **TXT files**: Tab, semicolon, or pipe-delimited data (coming soon)
+
+### Filename Pattern Detection
+
+The system recognizes common filename patterns:
+- `BTCUSDT.csv` → Symbol: BTC/USDT
+- `EUR_USD.json` → Symbol: EUR/USD
+- `ccxt_BYBIT_BTC_USDT.csv` → Symbol: BTC/USDT, Provider: bybit
+- `BINANCE_ETHUSDT_1h.csv` → Symbol: ETH/USDT, Provider: binance
+
+### Example with Automatic Conversion
+
+```bash
+# Run a script with CSV data (automatic conversion)
+pyne run my_strategy.py BTCUSDT.csv
+
+# The system will:
+# 1. Detect BTC/USDT as the symbol
+# 2. Convert CSV to OHLCV format
+# 3. Generate BTCUSDT.toml with symbol info
+# 4. Run the script with converted data
+```
+
+### Advanced Analysis During Conversion
+
+When converting data, the system performs advanced analysis:
+- **Tick Size Detection**: Analyzes price movements to determine minimum price increment
+- **Trading Hours Detection**: Identifies when the market is actively trading
+- **Interval Auto-Correction**: Detects and fixes incorrect timeframe settings
+- **Symbol Type Detection**: Identifies forex, crypto, or other asset types
+
 ## Command Arguments
 
 The `run` command has two required arguments:
 
 - `SCRIPT`: The script file to run. If only a filename is provided, it will be searched in the `workdir/scripts/` directory.
-- `DATA`: The OHLCV data file to use. If only a filename is provided, it will be searched in the `workdir/data/` directory.
+- `DATA`: The data file to use. Supports .ohlcv, .csv, .json formats. If only a filename is provided, it will be searched in the `workdir/data/` directory.
 
 <small>
-Note: you don't need to write the `.py` and `.ohlcv` extensions in the command.
+Note: you don't need to write the file extensions in the command.
 </small>
 
 ## Command Options
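
The trading-hours detection is described as distinguishing 24/7 crypto markets from business-hours markets, with a data sufficiency check. One simple way to approximate the 24/7 part is to measure what share of bars fall on Saturday or Sunday (UTC). This heuristic, its thresholds, and the `looks_like_24_7` name are assumptions for illustration, not the logic in `ohlcv_file.py`.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence

WEEK = 7 * 24 * 3600


def looks_like_24_7(timestamps: Sequence[int], min_bars: int = 100,
                    threshold: float = 0.15) -> Optional[bool]:
    """Hypothetical heuristic on ascending UNIX timestamps: True for
    round-the-clock markets, False for markets closed at weekends,
    None when the data is too thin to judge."""
    # Sufficiency check: enough bars, spanning at least one full week.
    if len(timestamps) < min_bars or timestamps[-1] - timestamps[0] < WEEK:
        return None
    weekend_bars = sum(
        1 for ts in timestamps
        if datetime.fromtimestamp(ts, tz=timezone.utc).weekday() >= 5  # Sat, Sun
    )
    # A 24/7 market puts about 2/7 (~29%) of its bars on weekends, while
    # forex and stock data put few or none there.
    return weekend_bars / len(timestamps) >= threshold


start = 1704067200  # 2024-01-01 00:00 UTC
crypto = [start + 3600 * i for i in range(24 * 14)]  # two weeks of hourly bars
weekdays_only = [ts for ts in crypto
                 if datetime.fromtimestamp(ts, tz=timezone.utc).weekday() < 5]
print(looks_like_24_7(crypto), looks_like_24_7(weekdays_only), looks_like_24_7(crypto[:50]))
```

Forex data usually contains a few Sunday-evening UTC bars around the weekly open, which is why the threshold sits well below 2/7 rather than at zero.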

pytest.ini

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ log_cli_level = DEBUG
 log_cli_format = %(asctime)s %(levelname)6s %(module_func_line)30s - %(message)s
 log_cli_date_format = %Y-%m-%d %H:%M:%S
 
-addopts = --import-mode=importlib -rs -x --spec --ignore-glob="**/data/**"
+addopts = --import-mode=importlib -rs -x --spec --ignore-glob="**/data/*modified.py"
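
The `pytest.ini` change narrows which files are skipped during collection. pytest documents `--ignore-glob` as Unix shell-style wildcard matching; the sketch below assumes `fnmatch` semantics with the pattern anchored at the invocation directory, and uses a hypothetical `/repo` root and hypothetical file paths to show how the old and new patterns differ.

```python
import os
from fnmatch import fnmatch

OLD_PATTERN = "**/data/**"
NEW_PATTERN = "**/data/*modified.py"


def is_ignored(path: str, pattern: str, root: str = "/repo") -> bool:
    """Approximate an --ignore-glob check; root is a hypothetical rootdir."""
    # Anchor the relative pattern at the invocation directory, then match in
    # shell style: fnmatch's "*" also crosses "/", so "**" behaves like "*".
    return fnmatch(path, os.path.join(root, pattern))


# Hypothetical files under a tests/data/ directory.
for path in ("/repo/tests/data/test_data_helpers.py",
             "/repo/tests/data/strategy_modified.py"):
    print(path, is_ignored(path, OLD_PATTERN), is_ignored(path, NEW_PATTERN))
```

Under these assumptions the old pattern skips every file below any `data/` directory, while the new one skips only files whose names end in `modified.py`.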
