
Adding edf2asc conversion #29

Open
christian-oreilly wants to merge 2 commits into scott-huberty:main from lina-usc:main

Conversation

@christian-oreilly

ENH: Add EDF-to-ASCII (.asc) converter

Summary

Adds a to_asc() function that converts EyeLink EDF binary files to ASCII (.asc) format, producing output equivalent to SR Research's proprietary edf2asc command-line tool. This is a pure-Python, streaming, single-pass converter that uses the existing edfapi ctypes bindings.

Verified against 270 EDF files: 269/270 pass a line-by-line comparison against reference .asc files generated by the official edf2asc tool (the remaining file is a corrupt EDF that edfapi cannot open).

Motivation

  • Users currently need the proprietary edf2asc tool (which is Windows-only or requires a separate SR Research installation) to get ASCII output from EDF files.
  • Having to_asc() in eyelinkio lets users convert EDF to ASCII in a single Python call, on any platform where edfapi is available.

Changes

New files

| File | Description |
|---|---|
| src/eyelinkio/edf/to_asc.py (461 lines) | Core converter module. Streams through EDF elements via edf_get_next_data() and writes ASC output using a handler-per-event-type architecture. |
| src/eyelinkio/tests/test_to_asc.py (387 lines) | Comprehensive test suite that compares generated output line-by-line against a reference .asc file, with documented tolerances for known edfapi version differences. |
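
To make the handler-per-event-type architecture concrete, here is a minimal single-pass dispatch sketch. It is not the code in to_asc.py: the (type_name, payload) element tuples, the handler names, and the output formatting are hypothetical stand-ins for the records the converter reads through edf_get_next_data().

```python
from typing import Callable, Iterable, Iterator

# Hypothetical element shape: (type_name, payload). The type names echo
# edfapi constants but are plain strings here, for illustration only.
Element = tuple[str, dict]


def handle_message(p: dict) -> str:
    # Hypothetical MSG formatting: timestamp followed by the message text.
    return f"MSG\t{p['time']} {p['text']}"


def handle_sample(p: dict) -> str:
    # Hypothetical sample formatting: timestamp, gaze x/y, pupil.
    return f"{p['time']}\t{p['gx']:8.1f}\t{p['gy']:8.1f}\t{p['pa']:8.1f}"


# One handler per element type; types without a handler are skipped.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "MESSAGEEVENT": handle_message,
    "SAMPLE_TYPE": handle_sample,
}


def convert(elements: Iterable[Element]) -> Iterator[str]:
    """Single pass: each element is formatted as soon as it is read."""
    for kind, payload in elements:
        handler = HANDLERS.get(kind)
        if handler is not None:
            yield handler(payload)
```

A dictionary keyed by element type keeps each ASC line format in its own function, and because convert() is a generator, lines are produced as elements arrive instead of being collected in memory first.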

Modified files

| File | Change |
|---|---|
| src/eyelinkio/edf/__init__.py | Export to_asc |
| src/eyelinkio/__init__.py | Export to_asc at package level |
| src/eyelinkio/edf/read.py | Add EDF.to_asc() convenience method; store _fpath on the EDF instance |
| src/eyelinkio/edf/_edf2py.py | Replace bare assert with raise OSError for missing edfapi library |
| src/eyelinkio/tests/test_edf.py | Change raise ValueError to continue for unexpected EDF files in test loops (allows the test data directory to contain additional .edf files without breaking tests) |
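
The assert-to-OSError change matters because assert statements are removed when Python runs with -O, and an AssertionError does not signal a missing library the way callers expect. A minimal sketch of the pattern follows; the load_edfapi() helper and its use of ctypes.util.find_library() are assumptions for illustration, not the code in _edf2py.py.

```python
import ctypes
import ctypes.util


def load_edfapi(name: str = "edfapi") -> ctypes.CDLL:
    """Hypothetical loader: locate a shared library or raise OSError.

    Unlike `assert path is not None`, this check survives `python -O`.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"Could not find the {name!r} shared library; "
                      "is edfapi installed?")
    return ctypes.CDLL(path)
```

ctypes.CDLL itself raises OSError when a library fails to load, so callers can handle both failure modes with a single except OSError.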

Usage

```python
import eyelinkio

# Standalone function
eyelinkio.to_asc("recording.edf")                    # → recording.asc
eyelinkio.to_asc("recording.edf", "output.asc")      # → output.asc

# From an EDF object
edf = eyelinkio.read_edf("recording.edf")
edf.to_asc()                                          # → recording.asc

# Control INPUT field inclusion (for compatibility with edf2asc 3.1)
eyelinkio.to_asc("recording.edf", include_input=False)
```

What the converter handles

The converter produces all ASC output sections in chronological order:

  • Preamble: CONVERTED FROM header with the edfapi version, and the raw EDF preamble
  • Recording blocks: START / END with config lines (PRESCALER, VPRESCALER, PUPIL, EVENTS, SAMPLES)
  • Sample data: tab-separated timestamp, gaze x/y, pupil, input (optional), HTARGET (if available), and status markers
  • Events: SFIX/EFIX, SSACC/ESACC, SBLINK/EBLINK with correct spacing and field formatting
  • Messages: MSG lines with original timestamps
  • Button/Input: BUTTON lines (decomposed from bit masks) and INPUT lines
  • Missing data: gaze values ≥ 1e8 rendered as .; the HTARGET sentinel (-32768) rendered as . with M............ status
  • Resolution tracking: per-sample rx/ry accumulated for the END RES averages; start/end resolution used for ESACC amplitude computation
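
The missing-data rules above can be sketched as two small formatters. The thresholds (1e8 for gaze, -32768 for HTARGET) come from the list; the 7-character gaze field width and the function names are assumptions for illustration, not the converter's actual code.

```python
MISSING_GAZE_THRESHOLD = 1e8  # gaze values at or above this are treated as missing
HTARGET_MISSING = -32768      # sentinel for an unavailable HTARGET value


def fmt_gaze(value: float, width: int = 7) -> str:
    """Format one gaze coordinate; missing data becomes a right-aligned '.'.

    The default width of 7 is an assumption, not taken from edf2asc.
    """
    if value >= MISSING_GAZE_THRESHOLD:
        return ".".rjust(width)
    return f"{value:{width}.1f}"


def fmt_htarget(value: int) -> str:
    """Format one HTARGET value, rendering the sentinel as '.'."""
    return "." if value == HTARGET_MISSING else str(value)
```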

Known limitations: edfapi version differences

Background

The converter uses SR Research's edfapi C library (via ctypes) to read EDF files. The reference .asc files used for validation were generated by the official edf2asc tool. There are two edfapi versions in play:

  • edfapi 3.1 (Win32) — used by edf2asc to generate 266 of 270 reference files
  • edfapi 4.2 (macOS/Linux) — used by this converter at runtime

Both versions read the same EDF binary files, so any data stored in the EDF (timestamps, gaze coordinates, pupil size, messages, event boundaries) should be identical. However, per-sample angular resolution (FSAMPLE.rx, FSAMPLE.ry) is not stored in the EDF: edfapi computes it at read time from calibration data and the current gaze position. The two edfapi versions use different internal algorithms for this computation, producing different resolution values for the same sample.

How we proved this

We compared the raw data output from both edfapi versions across 270 EDF files (72,946 samples total):

  1. Gaze coordinates (gx, gy): 100.0% exact match across all 72,946 samples. Every single gaze value read by edfapi 4.2 is bit-for-bit identical to the value read by edfapi 3.1. This confirms both versions faithfully decode what is stored in the EDF binary.

  2. Pupil size (pa): 100.0% exact match (same reasoning — stored in EDF, not computed).

  3. Timestamps, messages, event boundaries: 100.0% exact match.

  4. Angular resolution (rx, ry): differs between versions. At screen center the versions agree to within 0.01%, but at screen edges edfapi 4.2 can return values up to ~90 px/deg while edfapi 3.1 caps around 55–60 px/deg. This is consistent with different internal algorithms for mapping gaze position to angular resolution, not a bug in either version.

What this affects in the ASC output

The resolution difference propagates to exactly two ASC output fields:

1. END RES — average resolution across a recording

The END line includes two resolution values that are the mean of all per-sample rx/ry values in that recording block. Since the per-sample resolution is computed differently, the averages differ.
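
A minimal sketch of how such a per-block average can be accumulated in a streaming converter. The class name is hypothetical, and returning None for a block with no samples is this sketch's own choice, not documented edf2asc behavior.

```python
from typing import Optional


class ResolutionAccumulator:
    """Running mean of per-sample rx/ry for one recording block (sketch)."""

    def __init__(self) -> None:
        self.sum_rx = 0.0
        self.sum_ry = 0.0
        self.count = 0

    def add(self, rx: float, ry: float) -> None:
        # Called once per sample between START and END.
        self.sum_rx += rx
        self.sum_ry += ry
        self.count += 1

    def means(self) -> Optional[tuple[float, float]]:
        # Assumption: an empty block yields None rather than a number.
        if self.count == 0:
            return None
        return self.sum_rx / self.count, self.sum_ry / self.count
```

Because the mean is taken over every sample, a systematic per-sample difference between edfapi versions (such as the larger edge values returned by edfapi 4.2) shifts the END RES average for the whole block.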

Measured distribution across 9,394 END lines from 270 files:

| Percentile | Relative error |
|---|---|
| p50 | 0.77% |
| p75 | 1.65% |
| p90 | 3.24% |
| p95 | 4.90% |
| p99 | 12.14% |
| max | 82.94% |

The large tail values (>10%) occur in recordings where gaze frequently moves to screen edges, where the resolution computation diverges most between edfapi versions. Recordings where gaze stays near screen center show <1% difference.
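
A distribution like the one in the table can be computed from (converted, reference) value pairs as sketched below. The PR does not say which percentile definition was used; nearest-rank is this sketch's choice, and both helper names are hypothetical.

```python
import math


def relative_error_pct(ours: float, reference: float) -> float:
    """Relative error of a converted value against a non-zero reference, in percent."""
    return abs(ours - reference) / abs(reference) * 100.0


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list, with p in [0, 100]."""
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least p% of the data at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]
```

For example, the errors for all END RES values would be collected with relative_error_pct() and then summarized with percentile(errors, 50), percentile(errors, 99), and max(errors).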

2. ESACC amplitude — saccade size in degrees

Saccade amplitude is computed as sqrt((dx/res_x)² + (dy/res_y)²) where res_x/res_y are the resolution values at the saccade start and end points. Since resolution differs between edfapi versions, the degree-converted amplitude differs too. Measured up to ~6% difference across 106 files. The gaze coordinates themselves (saccade start/end positions in pixels) match exactly.
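
A minimal sketch of the amplitude formula, assuming each axis uses the mean of the start- and end-point resolutions (the text above does not state exactly how the two values are combined). The function name is hypothetical.

```python
import math


def saccade_amplitude_deg(
    sx: float, sy: float, ex: float, ey: float,
    start_rx: float, start_ry: float, end_rx: float, end_ry: float,
) -> float:
    """Saccade amplitude in degrees from pixel endpoints and px/deg resolution.

    Assumption: each axis divides by the mean of its start and end resolution.
    """
    res_x = (start_rx + end_rx) / 2.0
    res_y = (start_ry + end_ry) / 2.0
    return math.hypot((ex - sx) / res_x, (ey - sy) / res_y)
```

With identical pixel endpoints the result scales inversely with resolution: doubling every resolution value halves the amplitude. This is why a resolution difference between edfapi versions shows up in this field even though the gaze coordinates match exactly.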

Tolerance approach

Given these findings, the test comparator applies the following rules:

Strictly verified (must match exactly):

  • Sample timestamps, gaze x/y coordinates, HTARGET values
  • Event timestamps, gaze coordinates
  • START / PRESCALER / VPRESCALER / PUPIL / EVENTS / SAMPLES config lines
  • MSG / BUTTON / INPUT lines

Resolution-dependent fields (accept any difference, verify non-resolution fields):

| Field | What is tolerated | What must still match |
|---|---|---|
| END RES (last 2 fields of END line) | Any difference in the resolution averages | Timestamp, SAMPLES, EVENTS, and all other END fields |
| ESACC amplitude (field 7) | Any difference in degree-converted amplitude | Eye, start time, end time, duration, start gaze x/y, end gaze x/y |
| ESACC pvel (field 8) | ±1 integer difference | All other fields |

The rationale: these tolerances relax only fields that are mathematically derived from the edfapi-computed resolution. All fields stored in the EDF binary are still verified exactly, so a converter bug that corrupts timestamps, gaze coordinates, event boundaries, or messages would still be caught.
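
A minimal sketch of a line comparator implementing these rules. It assumes ASC lines are tab-separated and that the field numbers in the table are zero-based indices after splitting on tabs (so an ESACC line's head, "ESACC R  <start>", is index 0). The function name is hypothetical and this is not the code in test_to_asc.py.

```python
def asc_lines_match(ours: str, ref: str) -> bool:
    """Compare one ASC line to the reference, relaxing only
    resolution-derived fields (tab-separated layout assumed)."""
    a = ours.rstrip("\n").split("\t")
    b = ref.rstrip("\n").split("\t")
    if len(a) != len(b):
        return False
    if a[0].startswith("END") and b[0].startswith("END"):
        # Last two fields are the RES averages: ignore them entirely.
        return a[:-2] == b[:-2]
    if a[0].startswith("ESACC") and b[0].startswith("ESACC") and len(a) > 8:
        # Index 7 (amplitude) is ignored; everything else except index 8 must match.
        if a[:7] != b[:7] or a[9:] != b[9:]:
            return False
        # Index 8 (peak velocity) may differ by at most 1.
        try:
            return abs(int(a[8]) - int(b[8])) <= 1
        except ValueError:
            # Non-integer values (e.g. missing-data markers) must match exactly.
            return a[8].strip() == b[8].strip()
    return a == b
```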

Other minor cross-version differences

These affect specific edge cases and are documented in test_to_asc.py:

| Difference | Scope | Explanation |
|---|---|---|
| Pupil value when gaze is missing | 152/270 files (~91k samples) | edf2asc 3.1 zeroes the pupil field when gaze is missing (.); edfapi 4.2 keeps the actual pupil value. The pupil hardware reading is valid even during blinks. |
| Gaze zeroing during non-tracked-eye blinks | 2/270 files | edf2asc 3.1 may zero the tracked eye's gaze during a non-tracked-eye blink (e.g., SBLINK R in a left-eye-only recording). edfapi 4.2 keeps the valid tracked-eye data. |
| Non-tracked-eye blink events | 2/270 files | edfapi 3.1 emits SBLINK R/EBLINK R in left-eye-only recordings; edfapi 4.2 omits these (producing 1 fewer line). |
| Event/sample ordering at blink boundaries | 2/270 files | The edfapi versions may deliver SBLINK/EBLINK events and adjacent sample lines in slightly different order. |
| Sample status markers | Cosmetic only | ... vs I.. vs M............: different status-string conventions between versions. |
| EFIX/ESACC/EBLINK duration overflow | 2/270 files | When sttime is near UINT32_MAX (4294967295), duration computation requires unsigned 32-bit arithmetic. Both versions compute valid durations, with minor rounding differences at the overflow boundary. |
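
The unsigned 32-bit arithmetic mentioned in the last row can be sketched as modular subtraction. elapsed_ms() is a hypothetical name; it computes only the raw end-minus-start span, and any further convention edf2asc applies when printing durations is outside this sketch.

```python
UINT32_MOD = 2 ** 32  # event times are unsigned 32-bit values


def elapsed_ms(sttime: int, entime: int) -> int:
    """End minus start with unsigned 32-bit wraparound, like C uint32_t
    subtraction. Python's % with a positive modulus never returns a
    negative number, so a counter that wrapped past 0 still works."""
    return (entime - sttime) % UINT32_MOD
```

ctypes.c_uint32(entime - sttime).value gives the same result, since ctypes does not range-check integer conversions.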

Testing methodology

The test suite (test_to_asc.py) performs line-by-line comparison against reference .asc files, implementing the tolerances described above. The test_to_asc_matches_reference() function can be called with custom paths for batch validation, which is how the 270-file comprehensive test was run.
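
The exact signature of test_to_asc_matches_reference() is not shown here, so the hypothetical helper below covers only the pairing step a batch run needs: matching each recording.edf to a reference recording.asc by file stem. Each resulting pair would then be handed to the comparison.

```python
from pathlib import Path


def pair_edf_with_reference(edf_dir: Path, asc_dir: Path) -> list[tuple[Path, Path]]:
    """Hypothetical helper: match each *.edf to a same-stem *.asc.

    EDF files without a reference .asc are skipped rather than failing.
    """
    pairs = []
    for edf_path in sorted(Path(edf_dir).glob("*.edf")):
        asc_path = Path(asc_dir) / (edf_path.stem + ".asc")
        if asc_path.exists():
            pairs.append((edf_path, asc_path))
    return pairs
```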

Test plan

  • pytest src/eyelinkio/tests/test_to_asc.py — 3 tests pass (creates file, line counts, line-by-line match)
  • pytest src/eyelinkio/tests/test_edf.py — existing tests pass (no regressions)
  • Comprehensive validation against 270 EDF files — 269/270 pass (1 corrupt EDF)

@christian-oreilly
Author

@scott-huberty In case you are interested in merging this into your main package: I implemented it to streamline our data processing for Q1K. It is merged in my fork, so we can use it from there, but I thought you might be interested in merging it back into the main project.

@scott-huberty
Copy link
Owner

Would love to; it would actually add the "O" part to EyeLinkIO ;) Thanks Christian, I'll take a closer look tomorrow or Sunday.
