
Add multi-output stream writers and -O CLI option #115

Open
amc-corey-cox wants to merge 7 commits into main from multi-output-stream-writers
@amc-corey-cox

Summary

  • Add class-based StreamWriter ABC with JSONStreamWriter, JSONLStreamWriter, YAMLStreamWriter implementations alongside the existing function-based API
  • Modify TabularStreamWriter to inherit from StreamWriter (backward compatible — stream() is an alias for process())
  • Add MultiStreamWriter for fan-out to multiple output formats in a single pass
  • Add make_stream_writer() factory and EXTENSION_FORMAT_MAP
  • Add -O / --additional-output repeatable CLI option to map-data for writing additional output files with format inferred from extension
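
To make the class-based shape above concrete, here is a minimal sketch. Only the method names `process()` and `stream()` come from the summary; the class names, the signatures, the `finalize()` hook, and the idea that a chunk is an iterable of dicts are assumptions for illustration and may not match the PR's code.

```python
import json
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator


class SketchStreamWriter(ABC):
    """Hypothetical base class: turn a chunk of records into text fragments."""

    @abstractmethod
    def process(self, chunk: Iterable[dict]) -> Iterator[str]:
        """Yield output text for one chunk of records."""

    def finalize(self) -> Iterator[str]:
        """Yield any closing text (assumed hook; empty by default)."""
        return iter(())


class SketchJSONLWriter(SketchStreamWriter):
    """One JSON object per line; needs no preamble or postamble."""

    def process(self, chunk: Iterable[dict]) -> Iterator[str]:
        for record in chunk:
            yield json.dumps(record) + "\n"

    # Backward-compatible alias. The summary describes this pattern for
    # TabularStreamWriter; it is shown here on a simpler writer.
    stream = process


writer = SketchJSONLWriter()
lines = list(writer.process([{"id": 1}, {"id": 2}]))
```

Because `stream` is bound to the same function object as `process`, callers written against the older name keep working without a wrapper.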

Test plan

  • All 277 existing tests pass (backward compatibility verified)
  • 26 new unit tests for JSONStreamWriter, JSONLStreamWriter, YAMLStreamWriter, TabularStreamWriter chunk API, make_stream_writer factory, MultiStreamWriter
  • 3 new CLI integration tests for -O flag (multi-output with -o, format inference, stdout + -O)
  • Manual smoke test: linkml-map map-data -T <spec> -s <schema> -f jsonl -O /tmp/out.tsv -O /tmp/out.json <input.tsv>

Closes #114

🤖 Generated with Claude Code

Add class-based StreamWriter ABC with JSON, JSONL, YAML, and Tabular
implementations alongside the existing function-based API. Add
MultiStreamWriter for fan-out to multiple output formats in a single
pass, and a -O/--additional-output CLI option for map-data.

Closes #114

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a class-based streaming writer API (including multi-output fan-out) and exposes it via a new repeatable -O/--additional-output CLI option on map-data, enabling simultaneous emission of multiple output formats in a single transform pass.

Changes:

  • Introduces StreamWriter ABC and concrete JSONStreamWriter, JSONLStreamWriter, YAMLStreamWriter; updates TabularStreamWriter to implement the new interface while keeping stream() for backward compatibility.
  • Adds make_stream_writer(), EXTENSION_FORMAT_MAP, and MultiStreamWriter to fan out chunk streams to multiple file outputs (with tabular header rewrite support).
  • Extends linkml-map map-data with -O/--additional-output for additional output files with format inferred from file extension, plus new unit/integration tests.
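
Format inference from a file extension could look roughly like the sketch below. The mapping contents, the `SKETCH_EXTENSION_FORMAT_MAP` name, and the `infer_format()` helper are assumptions made for illustration; the PR's actual `EXTENSION_FORMAT_MAP` and `make_stream_writer()` may differ in both content and signature.

```python
from pathlib import Path

# Assumed contents; the real EXTENSION_FORMAT_MAP may list other extensions.
SKETCH_EXTENSION_FORMAT_MAP = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".tsv": "tsv",
    ".csv": "csv",
}


def infer_format(path: str) -> str:
    """Return the output format implied by a file's extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return SKETCH_EXTENSION_FORMAT_MAP[suffix]
    except KeyError:
        # Fail loudly rather than guessing a format for an unknown extension.
        raise ValueError(f"Cannot infer output format from {path!r}") from None


# Paths taken from the manual smoke test in the PR description.
additional_outputs = ["/tmp/out.tsv", "/tmp/out.json"]
formats = {p: infer_format(p) for p in additional_outputs}
```

Keeping the lookup in one table lets the primary output and every `-O` target share the same inference rule.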

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/linkml_map/writers/output_streams.py | Adds StreamWriter hierarchy, factory, extension map, and MultiStreamWriter with header-rewrite post-processing. |
| src/linkml_map/writers/__init__.py | Re-exports the new writer classes/factory/constants as part of the public writers API. |
| src/linkml_map/cli/cli.py | Adds -O/--additional-output option and multi-output streaming paths for tabular/directory inputs. |
| tests/test_writers/test_output_streams.py | Adds unit tests for new class-based writers, factory behavior, and multi-output fan-out. |
| tests/test_cli/test_cli_tabular.py | Adds CLI integration tests validating -O behavior with file output and stdout output. |

- JSONStreamWriter: add _started flag so preamble is emitted exactly
  once even when empty chunks precede real data
- YAMLStreamWriter: skip empty chunks to avoid emitting key: [] preamble
  that breaks subsequent continuation chunks
- Add encoding="utf-8" to rewrite-header open() calls in
  MultiStreamWriter and _rewrite_tabular_headers
- Add edge-case tests for empty chunks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
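
The empty-chunk fix for the JSON writer can be illustrated with the sketch below. Only the `_started` flag is named in the commit message; the class name, the record counter, and the `finalize()` method are assumptions, and the real `JSONStreamWriter` may emit a different preamble.

```python
import json
from collections.abc import Iterable, Iterator


class SketchJSONArrayWriter:
    """Hypothetical writer that streams records as a single JSON array."""

    def __init__(self) -> None:
        self._started = False  # has the opening "[" been emitted yet?
        self._count = 0  # records written so far

    def process(self, chunk: Iterable[dict]) -> Iterator[str]:
        for record in chunk:
            if not self._started:
                # Emitted lazily, so empty chunks never trigger the preamble.
                yield "[\n"
                self._started = True
            prefix = ",\n" if self._count else ""
            self._count += 1
            yield prefix + json.dumps(record)

    def finalize(self) -> Iterator[str]:
        # Emit a valid empty array if no record ever arrived.
        yield "\n]\n" if self._started else "[]\n"


writer = SketchJSONArrayWriter()
parts = []
for chunk in ([], [{"id": 1}], [], [{"id": 2}]):
    parts.extend(writer.process(chunk))
parts.extend(writer.finalize())
text = "".join(parts)
```

Tying the preamble to the first record instead of the first chunk means that leading or interleaved empty chunks cannot duplicate the opening bracket.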

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
amc-corey-cox and others added 3 commits February 25, 2026 14:43
Make MultiStreamWriter accept Path | TextIO targets so it can handle
stdout directly, removing the need for the separate
_write_multi_to_stdout_and_files() and _rewrite_tabular_headers()
functions in the CLI. Convert PR #115 test classes to functional pytest
style and add a test for file-handle targets via StringIO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
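
Accepting `Path | TextIO` targets might work along the lines of the sketch below: paths are opened and closed by the writer, while handles passed in by the caller (such as stdout) are used directly and left open. The `fan_out()` function and the per-record formatter callables are hypothetical stand-ins and do not reflect the real `MultiStreamWriter` interface.

```python
import io
import json
import tempfile
from contextlib import ExitStack
from pathlib import Path
from typing import Callable, Iterable, TextIO, Union

Target = Union[Path, TextIO]
Formatter = Callable[[dict], str]


def fan_out(chunks: Iterable[list[dict]], outputs: list[tuple[Target, Formatter]]) -> None:
    """Write every record of every chunk to each target in a single pass."""
    with ExitStack() as stack:
        handles = []
        for target, fmt in outputs:
            if isinstance(target, Path):
                # Paths are opened here and closed when the stack exits.
                handle = stack.enter_context(target.open("w", encoding="utf-8"))
            else:
                # Caller-owned handles (e.g. sys.stdout) are left open.
                handle = target
            handles.append((handle, fmt))
        for chunk in chunks:
            for record in chunk:
                for handle, fmt in handles:
                    handle.write(fmt(record))


# Usage: a StringIO stands in for stdout, alongside one file target.
buffer = io.StringIO()
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "out.jsonl"
    fan_out(
        [[{"id": 1}], [{"id": 2}]],
        [
            (buffer, lambda r: f"{r['id']}\n"),
            (out_path, lambda r: json.dumps(r) + "\n"),
        ],
    )
    file_text = out_path.read_text(encoding="utf-8")
```

Treating both kinds of target uniformly is what allows a separate stdout-specific code path to be dropped.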

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

- Use csv module in rewrite_header_and_pad to handle quoted fields
- Reuse EXTENSION_FORMAT_MAP for primary output format inference
- Remove unused Path import from test_output_streams.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
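
The reason for moving to the `csv` module can be seen in the sketch below: a quoted field may contain the delimiter, so splitting lines on tabs would miscount columns. The function here is a hypothetical stand-in for `rewrite_header_and_pad`; its signature and the assumption that it works on in-memory text are mine, not the PR's.

```python
import csv
import io


def sketch_rewrite_header_and_pad(text: str, header: list[str], delimiter: str = "\t") -> str:
    """Replace the first row with `header` and pad short rows with empty cells."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:  # skip the stale header
        # Rows written before a column appeared get empty trailing cells.
        writer.writerow(row + [""] * (len(header) - len(row)))
    return out.getvalue()


# The second line holds a quoted field with an embedded tab.
original = 'id\tnote\n1\t"tab\there"\n2\tplain\textra\n'
fixed = sketch_rewrite_header_and_pad(original, ["id", "note", "tag"])
```

Parsing with `csv.reader` keeps the embedded tab inside a single field, so the padding count is computed from the true number of columns in each row.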
Development

Successfully merging this pull request may close these issues.

Add multi-output stream writers for simultaneous format output

2 participants