Conversation
The line number still doesn't match - it should say line 5 to match your example. Perhaps there is a different linter that reports things better? And/or it would have been useful to also dump the offending line?
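A minimal sketch of what dumping the offending line could look like; the `report_line` helper name is illustrative, not something from this PR:

```python
def report_line(yaml_text: str, lineno: int) -> str:
    """Return a small excerpt around a 1-based line number for error messages."""
    lines = yaml_text.splitlines()
    start = max(lineno - 2, 0)  # one line of context before the target
    excerpt = []
    for i, line in enumerate(lines[start:lineno + 1], start=start):
        marker = ">>" if i + 1 == lineno else "  "  # flag the offending line
        excerpt.append(f"{marker} {i + 1}: {line}")
    return "\n".join(excerpt)
```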
That's because the following is a valid yaml, where `model` simply ends up with an empty value:

```yaml
type: sft
micro_batch_size: 36
model:
type: "liger"
```

But we can extend our custom yaml loader to raise errors on empty values:

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        # PyYAML silently accepts duplicate keys (the last one wins),
        # so reject them explicitly and point at the offending line.
        mapping = set()
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                raise ValueError(f"Duplicate '{key}' key found in YAML on line {key_node.start_mark.line + 1}.")
            mapping.add(key)
        return super().construct_mapping(node, deep)

    def process_empty_scalar(self, mark):
        # The parser calls this wherever it would insert an implicit empty
        # scalar, e.g. a key with no value; turn that into a hard error.
        raise ValueError(f"Empty value found in YAML on line {mark.line + 1}.")


example = """\
type: sft
micro_batch_size: 36
model:
type: "liger"
"""

data = yaml.safe_load(example)
print("Successfully loaded yaml:", data)
data = yaml.load(example, Loader=UniqueKeyLoader)
```

This outputs:

Updated the PR to include this change. Also, I realize this is now reporting an error on line 4 and not line 5 - but line 5 is not incorrect in the yaml. Thoughts?
Yeah, as you said this is not helping and makes things even more confusing, since there is no problem with that line. I didn't realize that the misformatted yaml example we work with is valid yaml when there is no schema. Thank you for pointing that out. OK, how about we go back then to the normal parser stage and figure out how to get it to report the correct line number? If that's not possible, perhaps we could at least dump the problematic context?
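If we return to the plain parser stage, PyYAML's `MarkedYAMLError` already carries both things asked for here: a `problem_mark` with the line/column and a `get_snippet()` method that dumps the offending context. A minimal sketch, assuming the config is loaded from a string; the `load_config_text` wrapper name is illustrative:

```python
import yaml


def load_config_text(text: str) -> dict:
    """Parse yaml, turning parser errors into line + context messages."""
    try:
        return yaml.safe_load(text)
    except yaml.MarkedYAMLError as e:
        mark = e.problem_mark or e.context_mark
        if mark is None:
            raise
        # get_snippet() renders the offending line with a '^' pointer under it.
        # It needs the full input buffer, so it returns None for streamed
        # input; loading from a string as here keeps it available.
        snippet = mark.get_snippet() or ""
        raise ValueError(
            f"Invalid YAML on line {mark.line + 1}, column {mark.column + 1}: "
            f"{e.problem}\n{snippet}"
        ) from e
```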
Addresses #158
If we make a mistake in our yaml config like the one below, we now catch the error before launching all processes. This gives faster feedback on config problems:
Bad config yaml:

```yaml
type: sft
micro_batch_size: 36
model:
type: "liger" # Accidental dedent
name_or_path: Felladrin/Llama-160M-Chat-v1
attn_implementation: sdpa
data:
  sources:
    - HuggingFaceH4/ultrachat_200k:train
  max_length: 1024
```

Error:
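A minimal sketch of how the fail-fast check described here could be wired in, assuming the `UniqueKeyLoader` from the conversation above; the module path and the `validate_config` helper name are hypothetical:

```python
import sys

import yaml

from loader import UniqueKeyLoader  # hypothetical home of the custom loader


def validate_config(path: str) -> dict:
    """Fail fast on config problems before any worker processes launch."""
    with open(path) as f:
        text = f.read()
    try:
        # Load from a string so error marks retain the full input buffer.
        return yaml.load(text, Loader=UniqueKeyLoader)
    except (yaml.YAMLError, ValueError) as e:
        sys.exit(f"Invalid config {path}: {e}")


if __name__ == "__main__":
    print("Config OK:", validate_config(sys.argv[1]))
```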