Add multi-output stream writers and -O CLI option #115
Open
amc-corey-cox wants to merge 7 commits into main
Add class-based `StreamWriter` ABC with JSON, JSONL, YAML, and Tabular implementations alongside the existing function-based API. Add `MultiStreamWriter` for fan-out to multiple output formats in a single pass, and a `-O`/`--additional-output` CLI option for `map-data`.

Closes #114

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Adds a class-based streaming writer API (including multi-output fan-out) and exposes it via a new repeatable -O/--additional-output CLI option on map-data, enabling simultaneous emission of multiple output formats in a single transform pass.
Changes:
- Introduces the `StreamWriter` ABC and concrete `JSONStreamWriter`, `JSONLStreamWriter`, `YAMLStreamWriter`; updates `TabularStreamWriter` to implement the new interface while keeping `stream()` for backward compatibility.
- Adds `make_stream_writer()`, `EXTENSION_FORMAT_MAP`, and `MultiStreamWriter` to fan out chunk streams to multiple file outputs (with tabular header rewrite support).
- Extends `linkml-map map-data` with `-O`/`--additional-output` for additional output files with format inferred from file extension, plus new unit/integration tests.
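
For readers unfamiliar with the pattern, the single-pass fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchWriter`, `SketchJSONLWriter`, `SketchTSVWriter`, and `fan_out` are hypothetical names, and the `process()`/`finalize()` split is an assumption about how a chunk-based writer interface might look.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, TextIO


class SketchWriter(ABC):
    """Hypothetical stand-in for a chunk-based stream writer interface."""

    @abstractmethod
    def process(self, chunk: list[dict[str, Any]]) -> Iterator[str]:
        """Yield output text for one chunk of records."""

    def finalize(self) -> Iterator[str]:
        """Yield any closing text; nothing by default."""
        return iter(())


class SketchJSONLWriter(SketchWriter):
    def process(self, chunk):
        # One JSON document per line.
        for record in chunk:
            yield json.dumps(record) + "\n"


class SketchTSVWriter(SketchWriter):
    def __init__(self) -> None:
        self._columns = None  # fixed from the first record seen

    def process(self, chunk):
        for record in chunk:
            if self._columns is None:
                self._columns = list(record)
                yield "\t".join(self._columns) + "\n"
            yield "\t".join(str(record.get(c, "")) for c in self._columns) + "\n"


def fan_out(
    chunks: Iterable[list[dict[str, Any]]],
    targets: list[tuple[SketchWriter, TextIO]],
) -> None:
    """Consume the chunk stream once, feeding every (writer, handle) pair."""
    for chunk in chunks:
        for writer, handle in targets:
            handle.writelines(writer.process(chunk))
    for writer, handle in targets:
        handle.writelines(writer.finalize())
```

The point of the design is that the (potentially expensive) transform producing `chunks` runs once, no matter how many output formats are requested.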
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/linkml_map/writers/output_streams.py` | Adds `StreamWriter` hierarchy, factory, extension map, and `MultiStreamWriter` with header-rewrite post-processing. |
| `src/linkml_map/writers/__init__.py` | Re-exports the new writer classes/factory/constants as part of the public writers API. |
| `src/linkml_map/cli/cli.py` | Adds `-O`/`--additional-output` option and multi-output streaming paths for tabular/directory inputs. |
| `tests/test_writers/test_output_streams.py` | Adds unit tests for new class-based writers, factory behavior, and multi-output fan-out. |
| `tests/test_cli/test_cli_tabular.py` | Adds CLI integration tests validating `-O` behavior with file output and stdout output. |
- `JSONStreamWriter`: add `_started` flag so preamble is emitted exactly once even when empty chunks precede real data
- `YAMLStreamWriter`: skip empty chunks to avoid emitting `key: []` preamble that breaks subsequent continuation chunks
- Add `encoding="utf-8"` to rewrite-header `open()` calls in `MultiStreamWriter` and `_rewrite_tabular_headers`
- Add edge-case tests for empty chunks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
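
To illustrate the empty-chunk issue this commit addresses, here is a minimal sketch of a streaming JSON writer that tracks whether its preamble has been written. `SketchJSONArrayWriter` and its `key` parameter are hypothetical and only approximate the idea of a `_started` flag; the actual `JSONStreamWriter` may be structured differently.

```python
import json
from typing import Any, Iterator


class SketchJSONArrayWriter:
    """Hypothetical writer that emits {"<key>": [ ... ]} incrementally."""

    def __init__(self, key: str = "items") -> None:
        self.key = key
        self._started = False  # True once the preamble has been written

    def _preamble(self) -> str:
        return "{" + json.dumps(self.key) + ": ["

    def process(self, chunk: list[dict[str, Any]]) -> Iterator[str]:
        # An empty chunk yields nothing, so it can never trigger the
        # preamble or leave a stray separator behind.
        for record in chunk:
            if not self._started:
                yield self._preamble()
                self._started = True
            else:
                yield ", "
            yield json.dumps(record)

    def finalize(self) -> Iterator[str]:
        # If no record was ever seen, still emit a valid empty document.
        if not self._started:
            yield self._preamble()
        yield "]}"
```

Keying the preamble on "first record seen" rather than "first chunk seen" is what makes leading empty chunks harmless in this sketch.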
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make `MultiStreamWriter` accept `Path | TextIO` targets so it can handle stdout directly, removing the need for the separate `_write_multi_to_stdout_and_files()` and `_rewrite_tabular_headers()` functions in the CLI. Convert PR #115 test classes to functional pytest style and add a test for file-handle targets via `StringIO`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
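
One common way to support mixed `Path | TextIO` targets is to open only the paths and leave caller-supplied handles alone. The following is a minimal sketch under that assumption; `write_to_targets` and the `Target` alias are hypothetical names, not part of this PR.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Iterable, TextIO, Union

# Hypothetical alias: a target is either a filesystem path or an open text handle.
Target = Union[Path, TextIO]


def write_to_targets(lines: Iterable[str], targets: list[Target]) -> None:
    """Write every line to every target, opening (and closing) only the paths."""
    with ExitStack() as stack:
        handles: list[TextIO] = []
        for target in targets:
            if isinstance(target, Path):
                # Files we open are registered with the stack and closed on exit.
                handles.append(stack.enter_context(target.open("w", encoding="utf-8")))
            else:
                # Caller-owned handles (e.g. sys.stdout, StringIO) are never closed here.
                handles.append(target)
        for line in lines:
            for handle in handles:
                handle.write(line)
```

With this shape, stdout needs no special-case code path: passing `sys.stdout` alongside one or more `Path` objects is enough.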
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use `csv` module in `rewrite_header_and_pad` to handle quoted fields
- Reuse `EXTENSION_FORMAT_MAP` for primary output format inference
- Remove unused `Path` import from `test_output_streams.py`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
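
The reason for going through the `csv` module rather than splitting on the delimiter is that a quoted field may itself contain the delimiter. A minimal sketch of that idea follows; `rewrite_header_and_pad_sketch` is a hypothetical function operating on in-memory text, and the signature of the real `rewrite_header_and_pad` is not shown in this PR page.

```python
import csv
import io


def rewrite_header_and_pad_sketch(
    text: str, columns: list[str], delimiter: str = "\t"
) -> str:
    """Replace the first row with `columns` and pad short rows with empty cells."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")

    writer.writerow(columns)
    next(reader, None)  # discard the stale header row, if any
    for row in reader:
        # csv.reader has already unquoted fields, so len(row) is the true
        # column count even when a field contains the delimiter.
        writer.writerow(row + [""] * (len(columns) - len(row)))
    return out.getvalue()
```

A naive `line.split("\t")` would miscount columns for a row such as `1<TAB>"has<TAB>tab"`, which is the kind of case the quoted-field handling is meant to cover.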
Summary
- Add `StreamWriter` ABC with `JSONStreamWriter`, `JSONLStreamWriter`, `YAMLStreamWriter` implementations alongside the existing function-based API
- Update `TabularStreamWriter` to inherit from `StreamWriter` (backward compatible: `stream()` is an alias for `process()`)
- Add `MultiStreamWriter` for fan-out to multiple output formats in a single pass
- Add `make_stream_writer()` factory and `EXTENSION_FORMAT_MAP`
- Add `-O`/`--additional-output` repeatable CLI option to `map-data` for writing additional output files with format inferred from extension

Test plan
- Unit tests: `JSONStreamWriter`, `JSONLStreamWriter`, `YAMLStreamWriter`, `TabularStreamWriter` chunk API, `make_stream_writer` factory, `MultiStreamWriter`
- CLI integration tests: `-O` flag (multi-output with `-o`, format inference, stdout + `-O`)
- `linkml-map map-data -T <spec> -s <schema> -f jsonl -O /tmp/out.tsv -O /tmp/out.json <input.tsv>`

Closes #114
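
Format inference from a file extension, as used by `-O`, can be sketched as a simple lookup. The mapping below is an assumption for illustration only: `SKETCH_EXTENSION_FORMAT_MAP` and `infer_format` are hypothetical, and the actual contents of `EXTENSION_FORMAT_MAP` are not listed on this page.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical mapping; the real EXTENSION_FORMAT_MAP may differ.
SKETCH_EXTENSION_FORMAT_MAP = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".tsv": "tsv",
    ".csv": "csv",
}


def infer_format(path: Union[str, Path], default: Optional[str] = None) -> str:
    """Return the output format implied by the file extension of `path`."""
    suffix = Path(path).suffix.lower()
    if suffix in SKETCH_EXTENSION_FORMAT_MAP:
        return SKETCH_EXTENSION_FORMAT_MAP[suffix]
    if default is not None:
        return default
    raise ValueError(f"Cannot infer output format from extension {suffix!r}")
```

Under these assumptions, the two `-O` targets in the command above would resolve to `"tsv"` and `"json"` respectively, independently of the `-f jsonl` primary format.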
🤖 Generated with Claude Code