File: `AGENTS.md` (new file)

# AGENTS.md

Guidelines for AI agents working with the qfeval-data repository.

## Repository Overview

**qfeval-data** is a Python library for handling financial time series data. It provides the `Data` class, a specialized data structure built on PyTorch tensors for efficient manipulation of timestamped, symbol-indexed financial data (OHLCV).

For detailed specifications, see the `docs/` directory:
- `docs/README.md` - Documentation index (Japanese: `docs/README.ja.md`)
- `docs/data.md` - Complete Data class API reference (Japanese: `docs/data.ja.md`)
- `docs/flattener.md` - Flattener class reference (Japanese: `docs/flattener.ja.md`)
- `docs/util.md` - Utility functions reference (Japanese: `docs/util.ja.md`)
- `docs/examples.md` - Practical examples and recipes (Japanese: `docs/examples.ja.md`)

## Codebase Structure

```
qfeval-data/
├── qfeval_data/ # Main package
│ ├── __init__.py # Exports: Data, Flattener, __version__
│ ├── data.py # Core Data class
│ ├── flattener.py # Tensor flattening utilities
│ ├── util.py # Helper functions
│ ├── plot.py # Visualization (requires matplotlib)
│ └── version.py # Version string
├── tests/ # pytest test suite
└── pyproject.toml # Project configuration
```

## Development Commands

<!-- test:skip -->
```bash
# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=qfeval_data

# Linting and formatting
black qfeval_data tests
isort qfeval_data tests
flake8 qfeval_data tests

# Type checking
mypy qfeval_data
```

## Code Style

- Formatter: `black` (line-length: 80)
- Import sorting: `isort`
- Linting: `flake8`
- Type checking: `mypy` (strict mode)

## Key Design Patterns

1. **Lazy slicing**: Data slicing creates views without copying tensors
2. **Sorted indexes**: Timestamps and symbols are always sorted internally
3. **Method chaining**: Most methods return `Data` for fluent API
4. **PyTorch backend**: Columns are PyTorch tensors, so operations can run on GPU as well as CPU
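As a rough analogy for lazy slicing (not qfeval's actual code), the standard-library sketch below shows what "a view without copying" means: slicing a `memoryview` shares the underlying buffer, so a write to the buffer is visible through the slice. Basic slicing of a PyTorch tensor also returns a view, which is presumably what `Data` slicing relies on.

```python
from array import array

# Flat buffer standing in for one column's values (illustrative numbers).
buffer = array("d", [100.0, 101.0, 102.0, 103.0])

# Slicing a memoryview creates a view over the same memory; nothing is copied.
window = memoryview(buffer)[1:3]

# A write to the underlying buffer shows up through the view.
buffer[1] = 999.0
print(window.tolist())  # [999.0, 102.0]
```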

---

## Using qfeval_data.Data (PyPI Package)

This section is for agents that consume qfeval-data as a dependency.

<!-- test:setup
import numpy as np
import pandas as pd
import torch
from qfeval_data import Data, Flattener

# Create sample OHLCV data for examples
def create_sample_data():
    timestamps = np.array(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
        dtype="datetime64[D]",
    )
    symbols = np.array(["AAPL", "GOOG"])
    tensors = {
        "open": torch.tensor([[100.0, 200.0], [101.0, 201.0], [102.0, 202.0], [103.0, 203.0]]),
        "high": torch.tensor([[105.0, 205.0], [106.0, 206.0], [107.0, 207.0], [108.0, 208.0]]),
        "low": torch.tensor([[98.0, 198.0], [99.0, 199.0], [100.0, 200.0], [101.0, 201.0]]),
        "close": torch.tensor([[104.0, 204.0], [105.0, 205.0], [106.0, 206.0], [107.0, 207.0]]),
        "volume": torch.tensor([[1e6, 5e5], [1.1e6, 5.5e5], [1.2e6, 6e5], [1.3e6, 6.5e5]]),
    }
    return Data.from_tensors(tensors, timestamps, symbols)

data = create_sample_data()
-->

### Installation

<!-- test:skip -->
```bash
pip install qfeval-data

# With plotting support
pip install qfeval-data[plot]
```

### Core Concepts

The `Data` class wraps a dictionary of PyTorch tensors indexed by:
- **timestamps**: `np.ndarray[datetime64]` (sorted)
- **symbols**: `np.ndarray[str]` (sorted)
- **columns**: Named tensors with shape `(num_timestamps, num_symbols, *extra_dims)`
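The layout can be pictured with a plain-Python sketch. The hypothetical `to_grid` helper below pivots long-format `(timestamp, symbol, value)` rows into one dense column of shape `(num_timestamps, num_symbols)` with both indexes sorted. It assumes missing cells are NaN (which `dropna`, `fillna`, and `count` suggest); it illustrates the concept and is not how qfeval builds `Data`.

```python
import math

# Hypothetical long-format rows: (timestamp, symbol, value).
rows = [
    ("2024-01-02", "GOOG", 205.0),
    ("2024-01-01", "AAPL", 104.0),
    ("2024-01-02", "AAPL", 105.0),
    ("2024-01-01", "GOOG", 204.0),
    ("2024-01-03", "AAPL", 106.0),  # no GOOG row for this date
]


def to_grid(records):
    """Pivot (timestamp, symbol, value) records into a dense grid.

    Returns sorted timestamps, sorted symbols, and a nested list of shape
    (num_timestamps, num_symbols) in which missing cells are NaN.
    """
    timestamps = sorted({t for t, _, _ in records})
    symbols = sorted({s for _, s, _ in records})
    t_pos = {t: i for i, t in enumerate(timestamps)}
    s_pos = {s: j for j, s in enumerate(symbols)}
    grid = [[math.nan] * len(symbols) for _ in timestamps]
    for t, s, v in records:
        grid[t_pos[t]][s_pos[s]] = v
    return timestamps, symbols, grid


grid_ts, grid_syms, grid = to_grid(rows)
print(grid_ts)    # ['2024-01-01', '2024-01-02', '2024-01-03']
print(grid_syms)  # ['AAPL', 'GOOG']
print(grid)       # [[104.0, 204.0], [105.0, 205.0], [106.0, nan]]
```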

### Creating Data Objects

```python
from qfeval_data import Data
import pandas as pd
import numpy as np
import torch

# From pandas DataFrame (requires "timestamp" and "symbol" columns)
df = pd.DataFrame({
    "timestamp": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "symbol": ["AAPL", "GOOG", "AAPL", "GOOG"],
    "open": [100.0, 200.0, 101.0, 201.0],
    "close": [105.0, 205.0, 106.0, 206.0],
})
data = Data.from_dataframe(df)

# From tensors directly
tensors = {
    "open": torch.tensor([[100.0, 200.0], [101.0, 201.0]]),
    "close": torch.tensor([[105.0, 205.0], [106.0, 206.0]]),
}
timestamps = np.array(["2024-01-01", "2024-01-02"], dtype="datetime64[D]")
symbols = np.array(["AAPL", "GOOG"])
data = Data.from_tensors(tensors, timestamps, symbols)
```

### Accessing Data

```python
# Column access
opens = data.get("open") # Single column
data.open # Attribute access shortcut

# Slicing (lazy - no data copy)
subset = data[:10, :] # First 10 timestamps
subset = data["2024-01-01", :] # By timestamp value
subset = data[:, "AAPL"] # Single symbol
subset = data[:, ["AAPL", "GOOG"]] # Multiple symbols

# Properties
data.timestamps # np.ndarray of timestamps
data.symbols # np.ndarray of symbols
data.columns # List of column names
data.shape # (num_timestamps, num_symbols)
data.tensors # Dict[str, Tensor] after slicing
```
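Because both indexes are sorted, a label such as a timestamp can be mapped to an integer position by binary search. A minimal sketch with the standard `bisect` module; `label_to_pos` and `label_range` are hypothetical helpers, and the inclusive end of `label_range` follows pandas `.loc` conventions, which may not match qfeval's label slicing.

```python
from bisect import bisect_left, bisect_right

# Sorted timestamp index; ISO-8601 date strings sort chronologically.
ts_index = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]


def label_to_pos(sorted_index, label):
    """Integer position of an exact label, found by binary search."""
    pos = bisect_left(sorted_index, label)
    if pos == len(sorted_index) or sorted_index[pos] != label:
        raise KeyError(label)
    return pos


def label_range(sorted_index, start, stop):
    """Positional slice covering labels in [start, stop], inclusive."""
    lo = bisect_left(sorted_index, start)
    hi = bisect_right(sorted_index, stop)
    return slice(lo, hi)


print(label_to_pos(ts_index, "2024-01-03"))  # 2
print(ts_index[label_range(ts_index, "2024-01-02", "2024-01-03")])
# ['2024-01-02', '2024-01-03']
```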

### Arithmetic Operations

All arithmetic is element-wise on tensors:

```python
returns = (data.close / data.open) - 1
mask = data.close > data.open # Boolean Data
```

### Time Series Operations

```python
data.shift(1) # Shift forward by 1 timestamp
data.pct_change() # Percent change
data.diff() # Difference
data.cumsum() # Cumulative sum
data.moving_average(2) # 2-period moving average
```
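The semantics of these operations along the timestamp axis can be sketched on a single symbol's series. This assumes pandas-style conventions (NaN padding at the start, trailing windows); qfeval's exact edge handling may differ, and the helper functions are hypothetical.

```python
import math
from itertools import accumulate

# Hypothetical single-symbol series (AAPL close in the sample data).
aapl_close = [104.0, 105.0, 106.0, 107.0]


def shift(xs, n=1):
    """Move values n >= 0 steps later in time; the first n slots become NaN."""
    n = min(n, len(xs))
    return [math.nan] * n + xs[: len(xs) - n]


def diff(xs, n=1):
    """x[t] - x[t - n]; NaN where no earlier value exists."""
    return [a - b for a, b in zip(xs, shift(xs, n))]


def pct_change(xs, n=1):
    """x[t] / x[t - n] - 1 (assumes nonzero prices)."""
    return [a / b - 1 for a, b in zip(xs, shift(xs, n))]


def cumsum(xs):
    """Running total along the timestamp axis."""
    return list(accumulate(xs))


def moving_average(xs, window):
    """Trailing mean of the last `window` values; NaN until the window fills."""
    return [
        sum(xs[t + 1 - window : t + 1]) / window if t + 1 >= window else math.nan
        for t in range(len(xs))
    ]


print(shift(aapl_close))              # [nan, 104.0, 105.0, 106.0]
print(diff(aapl_close))               # [nan, 1.0, 1.0, 1.0]
print(moving_average(aapl_close, 2))  # [nan, 104.5, 105.5, 106.5]
```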

### Aggregation (axis: 0=timestamp, 1=symbol, None=both)

```python
data.mean(axis=0) # Mean across timestamps
data.sum(axis=1) # Sum across symbols
data.std() # Std dev across all
data.min(axis=0)
data.max(axis=0)
data.count() # Count non-NaN
```
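To make the `axis` convention concrete, the hypothetical `grid_mean` below reduces a `(num_timestamps, num_symbols)` grid the same three ways, using the close prices from the sample data. It does not model NaN handling, which may differ in qfeval.

```python
# Close prices from the sample data: rows are timestamps, columns are
# symbols (AAPL, GOOG), i.e. shape (num_timestamps, num_symbols).
close_grid = [
    [104.0, 204.0],
    [105.0, 205.0],
    [106.0, 206.0],
    [107.0, 207.0],
]


def grid_mean(grid, axis=None):
    """Mean over timestamps (axis=0), symbols (axis=1), or both (None)."""
    if axis == 0:  # collapse timestamps: one value per symbol
        return [sum(col) / len(col) for col in zip(*grid)]
    if axis == 1:  # collapse symbols: one value per timestamp
        return [sum(row) / len(row) for row in grid]
    flat = [x for row in grid for x in row]  # collapse both axes
    return sum(flat) / len(flat)


print(grid_mean(close_grid, axis=0))  # [105.5, 205.5]
print(grid_mean(close_grid, axis=1))  # [154.0, 155.0, 156.0, 157.0]
print(grid_mean(close_grid))          # 155.5
```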

### Missing Value Handling

```python
data.dropna(axis=0, how="any") # Drop timestamps with any NaN
data.fillna(0.0) # Fill NaN with value
data.fillna(method="ffill") # Forward fill
```
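A plain-Python sketch of forward filling and `dropna(axis=0, how=...)` on a single column grid, assuming pandas-like semantics: forward fill runs along timestamps per symbol, and leading gaps stay NaN. The helpers are hypothetical; the real methods presumably also update the timestamp index, which this sketch omits.

```python
import math

nan = math.nan
# Hypothetical grid with gaps: rows are timestamps, columns are symbols.
gappy = [
    [100.0, nan],
    [nan, 201.0],
    [102.0, 202.0],
]


def ffill(grid):
    """Forward-fill NaN down each symbol's column (along timestamps)."""
    out = [row[:] for row in grid]  # copy so the input is left unchanged
    for t in range(1, len(out)):
        for s, x in enumerate(out[t]):
            if math.isnan(x):
                out[t][s] = out[t - 1][s]  # may itself still be NaN
    return out


def dropna_rows(grid, how="any"):
    """Drop timestamps (rows) containing any / all NaN values."""
    check = any if how == "any" else all
    return [row for row in grid if not check(math.isnan(x) for x in row)]


print(ffill(gappy))        # [[100.0, nan], [100.0, 201.0], [102.0, 202.0]]
print(dropna_rows(gappy))  # [[102.0, 202.0]]
```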

### Financial Metrics

```python
data.close.annualized_return()
data.close.annualized_volatility()
data.close.annualized_sharpe_ratio()
data.close.maximum_drawdown()
data.close.metrics() # All metrics combined
```
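The exact formulas qfeval uses are not stated here. As a reference point, a sketch of two common textbook definitions: maximum drawdown as the largest decline from a running peak, and an annualized Sharpe ratio from simple returns with a zero risk-free rate and 252 periods per year. The sign convention, return type (simple vs. log), and annualization factor are all assumptions that may differ from the library.

```python
import math
import statistics


def maximum_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst


def annualized_sharpe(prices, periods_per_year=252):
    """Sharpe ratio of simple period returns with a zero risk-free rate.

    periods_per_year=252 assumes daily trading data.
    """
    rets = [b / a - 1 for a, b in zip(prices, prices[1:])]
    ratio = statistics.mean(rets) / statistics.stdev(rets)
    return ratio * math.sqrt(periods_per_year)


# A fall from the 120 peak to 80 is a drawdown of 40 / 120, about 0.333.
print(maximum_drawdown([100.0, 120.0, 90.0, 110.0, 80.0]))
print(annualized_sharpe([100.0, 102.0, 101.0, 104.0]))
```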

### Resampling

```python
data.daily()
data.weekly()
data.monthly()
data.yearly()
```
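Resampling daily OHLCV bars usually follows a first/max/min/last/sum rule per field. The hypothetical `resample_monthly` sketch below applies that rule to one symbol; whether qfeval's `monthly()` aggregates each column this way, and how it labels the resulting periods, is an assumption.

```python
from datetime import date

# Hypothetical daily bars for one symbol: (date, open, high, low, close, volume).
daily_bars = [
    (date(2024, 1, 30), 100.0, 105.0, 98.0, 104.0, 1.0e6),
    (date(2024, 1, 31), 104.0, 108.0, 103.0, 107.0, 1.2e6),
    (date(2024, 2, 1), 107.0, 110.0, 101.0, 102.0, 0.9e6),
]


def resample_monthly(bars):
    """Aggregate daily OHLCV bars into one bar per (year, month)."""
    groups = {}
    for bar in sorted(bars):  # sort chronologically by date
        groups.setdefault((bar[0].year, bar[0].month), []).append(bar)
    return [
        (
            period,
            g[0][1],  # open: first open of the period
            max(b[2] for b in g),  # high: highest high
            min(b[3] for b in g),  # low: lowest low
            g[-1][4],  # close: last close of the period
            sum(b[5] for b in g),  # volume: total volume
        )
        for period, g in groups.items()
    ]


for bar in resample_monthly(daily_bars):
    print(bar)
# ((2024, 1), 100.0, 108.0, 98.0, 107.0, 2200000.0)
# ((2024, 2), 107.0, 110.0, 101.0, 102.0, 900000.0)
```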

### Conversion

```python
df = data.to_dataframe() # pandas DataFrame (long format)
csv = data.to_csv() # CSV string
```
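"Long format" means one row per `(timestamp, symbol)` pair with one field per column. A standard-library sketch of that reshaping and its CSV form; the header names mirror the `timestamp`/`symbol` columns that `from_dataframe` expects, but the exact column order and number formatting of `to_dataframe()`/`to_csv()` are assumptions.

```python
import csv
import io

# Hypothetical wide layout: one (timestamps x symbols) grid per column.
ts = ["2024-01-01", "2024-01-02"]
syms = ["AAPL", "GOOG"]
col_grids = {
    "open": [[100.0, 200.0], [101.0, 201.0]],
    "close": [[105.0, 205.0], [106.0, 206.0]],
}


def to_long_rows(timestamps, symbols, grids):
    """Yield one row per (timestamp, symbol) pair, one field per column."""
    names = list(grids)
    for i, t in enumerate(timestamps):
        for j, s in enumerate(symbols):
            yield [t, s] + [grids[name][i][j] for name in names]


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["timestamp", "symbol", *col_grids])
writer.writerows(to_long_rows(ts, syms, col_grids))
print(out.getvalue(), end="")
# timestamp,symbol,open,close
# 2024-01-01,AAPL,100.0,105.0
# 2024-01-01,GOOG,200.0,205.0
# 2024-01-02,AAPL,101.0,106.0
# 2024-01-02,GOOG,201.0,206.0
```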

### Device and Dtype

```python
data.to(torch.float64) # Change dtype
data.device # Current device
data.dtype # Current dtype
```

### Method Chaining Example

```python
result = (
    data
    .get(["open", "close"])
    .dropna()
    .pct_change()
    .fillna(0.0)
    .mean(axis=1)
)
```

### Flattener Utility

Convert between `Data` (timestamp/symbol indexed) and flat `Tensor` (batch indexed):

```python
from qfeval_data import Flattener

flattener = Flattener(data.close)
flat_tensor = flattener.flatten(data.close) # Data -> Tensor
restored = flattener.unflatten(flat_tensor, "prices") # Tensor -> Data
```
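Conceptually, flattening maps each `(timestamp, symbol)` cell to a position in a 1-D batch and remembers the mapping so results can be scattered back. A plain-Python sketch of that idea; `flatten`, `unflatten`, and the `drop_nan` option are hypothetical, and whether qfeval's `Flattener` skips NaN cells is not stated in this document.

```python
import math

nan = math.nan


def flatten(grid, drop_nan=False):
    """Flatten a (timestamps, symbols) grid into a 1-D batch.

    Returns the batch values plus the (t, s) coordinate of each element,
    which is what makes the inverse mapping possible.
    """
    pairs = [
        ((t, s), x)
        for t, row in enumerate(grid)
        for s, x in enumerate(row)
        if not (drop_nan and math.isnan(x))
    ]
    return [x for _, x in pairs], [c for c, _ in pairs]


def unflatten(values, coords, shape):
    """Scatter batch values back into a NaN-initialized grid of `shape`."""
    out = [[math.nan] * shape[1] for _ in range(shape[0])]
    for x, (t, s) in zip(values, coords):
        out[t][s] = x
    return out


grid2 = [[104.0, 204.0], [105.0, nan]]
batch, coords = flatten(grid2, drop_nan=True)
print(batch)  # [104.0, 204.0, 105.0]
doubled = [2 * x for x in batch]  # stands in for any per-element model
print(unflatten(doubled, coords, (2, 2)))  # [[208.0, 408.0], [210.0, nan]]
```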

### Further Documentation

For complete API documentation and more examples, see:
- `docs/data.md` - Full Data class reference with all methods and parameters
- `docs/flattener.md` - Flattener class details
- `docs/examples.md` - Practical recipes for common tasks
File: `README.ja.md` (new file)

# qfeval-data

[[English](README.md)]

[![python](https://img.shields.io/badge/python-%3E=3.9-blue.svg)](https://pypi.org/project/qfeval_data/)
[![pypi](https://img.shields.io/pypi/v/qfeval_data.svg)](https://pypi.org/project/qfeval_data/)
[![CI](https://github.com/pfnet-research/qfeval-data/actions/workflows/ci-python.yaml/badge.svg)](https://github.com/pfnet-research/qfeval-data/actions/workflows/ci-python.yaml)
[![codecov](https://codecov.io/gh/pfnet-research/qfeval-data/graph/badge.svg?token=5A02B1JV7V)](https://codecov.io/gh/pfnet-research/qfeval-data)
[![downloads](https://img.shields.io/pypi/dm/qfeval_data)](https://pypi.org/project/qfeval_data)
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

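qfeval is a framework for processing financial time series, developed by the Preferred Networks finance team.
It includes data format specifications, a set of classes and functions for efficiently handling financial time series data, and a framework for evaluating financial time series models.

Within qfeval, qfeval-data provides a dataframe for efficiently handling financial time series data.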

## Installation

```bash
pip install qfeval_data
```

## Usage

See [docs/README.ja.md](docs/README.ja.md) for detailed documentation.

## Release Process

1. Create a `release/X.X.X` branch.
2. The version.yaml (Bump) workflow runs and opens a pull request titled `Bumping version from Z.Z.Z to X.X.X`. Merge it.
3. Create a pull request that merges the `release/X.X.X` branch into `master` (keeping the title `Release/X.X.X` is fine).
4. Get approval from another team member, then merge the `Release/X.X.X` pull request.
5. Wait for the [Release workflow](https://github.com/pfnet-research/qfeval-data/actions/workflows/release.yaml) to finish, then confirm that the new version appears on PyPI at [qfeval/data](https://pypi.org/project/qfeval_data/#history).
File: `README.md` (modified)

# qfeval-data

[[日本語](README.ja.md)]

[![python](https://img.shields.io/badge/python-%3E=3.9-blue.svg)](https://pypi.org/project/qfeval_data/)
[![pypi](https://img.shields.io/pypi/v/qfeval_data.svg)](https://pypi.org/project/qfeval_data/)
[![CI](https://github.com/pfnet-research/qfeval-data/actions/workflows/ci-python.yaml/badge.svg)](https://github.com/pfnet-research/qfeval-data/actions/workflows/ci-python.yaml)
[![codecov](https://codecov.io/gh/pfnet-research/qfeval-data/graph/badge.svg?token=5A02B1JV7V)](https://codecov.io/gh/pfnet-research/qfeval-data)
[![downloads](https://img.shields.io/pypi/dm/qfeval_data)](https://pypi.org/project/qfeval_data)
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

qfeval is a framework developed by Preferred Networks' Financial Solutions team for processing financial time series data.
It includes data format specifications, a set of classes and functions for efficiently handling financial time series data, and a framework for evaluating financial time series models.

```bash
pip install qfeval_data
```

## Usage
See [docs/README.md](docs/README.md) for detailed documentation.

## Release Process

1. Create a `release/X.X.X` branch.
2. The version.yaml (Bump) workflow will run and create a pull request titled `Bumping version from Z.Z.Z to X.X.X`. Merge this PR.
3. Create a pull request to merge the `release/X.X.X` branch into `master` (the title `Release/X.X.X` is fine).
4. Get approval from another team member and merge the `Release/X.X.X` pull request.
5. Wait for the [Release workflow](https://github.com/pfnet-research/qfeval-data/actions/workflows/release.yaml) to complete, then verify the new version appears on PyPI at [qfeval/data](https://pypi.org/project/qfeval_data/#history).