Commit d3875c9

Fix a bug in MultiSourceSeq2Seq related to preprocessing.
1 parent 8f6f261 commit d3875c9

1 file changed (+3, -1 lines changed)

roosterize/ml/naming/MultiSourceSeq2Seq.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -347,7 +347,9 @@ def process_data_impl(
         # Inputs
         all_inputs: Dict[str, List[List[str]]] = self.get_all_inputs(lemmas, docs_sub_tokenizers)
         for input_type, src_sentences in all_inputs.items():
-            IOUtils.dump(output_processed_data_dir/f"src.{input_type}.txt", src_sentences, IOUtils.Format.txtList)
+            IOUtils.dump(output_processed_data_dir/f"src.{input_type}.txt",
+                         "".join([" ".join(sent) + "\n" for sent in src_sentences]),
+                         IOUtils.Format.txt)
 
         # Outputs
         IOUtils.dump(
```
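The old call handed the nested token lists (`List[List[str]]`) to `IOUtils.Format.txtList`. The new call flattens each sentence into a single space-separated line and writes the result as plain text with `IOUtils.Format.txt`. Below is a minimal sketch of that difference in plain Python, without `IOUtils`. It assumes, and the commit does not confirm, that a `txtList`-style writer emits `str(item)` on each line. Under that assumption a token list would be written as a Python list literal such as `['my', 'lemma']` rather than `my lemma`. The helpers `to_txt_list_style`, `to_tokenized_lines`, and `read_tokenized_lines`, along with the sample data, are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def to_txt_list_style(items: List[object]) -> str:
    # ASSUMPTION: hypothetical stand-in for a writer that emits str(item) per line.
    # This is NOT the real IOUtils.Format.txtList implementation.
    return "".join(str(item) + "\n" for item in items)


def to_tokenized_lines(sentences: List[List[str]]) -> str:
    # Same expression as the line added in the commit:
    # tokens joined by single spaces, one sentence per line.
    return "".join([" ".join(sent) + "\n" for sent in sentences])


def read_tokenized_lines(text: str) -> List[List[str]]:
    # Inverse of to_tokenized_lines; valid only if no token contains a space or newline.
    return [line.split(" ") for line in text.splitlines()]


# Hypothetical sample input; in the real code it comes from self.get_all_inputs(...).
all_inputs: Dict[str, List[List[str]]] = {
    "name": [["my", "lemma"], ["add", "comm"]],
}

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    for input_type, src_sentences in all_inputs.items():
        # Mirrors the output path pattern src.{input_type}.txt from the diff.
        path = out_dir / f"src.{input_type}.txt"
        path.write_text(to_tokenized_lines(src_sentences))
        written = path.read_text()
        print(written, end="")  # one space-separated sentence per line
        assert read_tokenized_lines(written) == src_sentences

# Under the assumed txtList behaviour, each line would be a list literal instead.
print(to_txt_list_style(all_inputs["name"]), end="")
```

The round trip through `read_tokenized_lines` holds only when no token contains whitespace. That is presumably true of sub-tokenized inputs, but it is an assumption and is not checked by this commit.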
