Early catch config errors by sfc-gh-mwyatt · Pull Request #159 · snowflakedb/ArcticTraining

sfc-gh-mwyatt · 2025-04-14T23:16:24Z

Addresses #158

If we make a mistake in our yaml config like below, we now catch this error before launching all processes. This gives faster feedback on config problems:

Bad config yaml:

type: sft
micro_batch_size: 36

model:
type: "liger" # Accidental dedent
  name_or_path: Felladrin/Llama-160M-Chat-v1
  attn_implementation: sdpa

data:
  sources:
    - HuggingFaceH4/ultrachat_200k:train
  max_length: 1024

Error:

yaml.parser.ParserError: while parsing a block mapping
  in "repro.yaml", line 1, column 1
expected <block end>, but found '<block mapping start>'
  in "repro.yaml", line 6, column 3

sfc-gh-sbekman · 2025-04-15T00:20:28Z

the line number doesn't match still - it should say line 5 to match your example

perhaps there is a different linter that reports things better?

and/or it'd have been useful to also dump the offensive line?

sfc-gh-mwyatt · 2025-04-15T15:43:13Z

the line number doesn't match still - it should say line 5 to match your example

perhaps there is a different linter that reports things better?

and/or it'd have been useful to also dump the offensive line?

That's because the following is a valid yaml, where model gets the value None:

type: sft
micro_batch_size: 36

model:
type: "liger"

But we can extend out custom yaml loader to raise errors on empty values:

import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = set()
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                raise ValueError(f"Duplicate '{key}' key found in YAML on line {key_node.start_mark.line + 1}.")
            mapping.add(key)
        return super().construct_mapping(node, deep)

    def process_empty_scalar(self, mark):
        raise ValueError(f"Empty value found in YAML on line {mark.line + 1}.")

example = """\
type: sft
micro_batch_size: 36

model:
type: "liger"
"""

data = yaml.safe_load(example)
print("Successfully loaded yaml:", data)

data = yaml.load(example, Loader=UniqueKeyLoader)

This outputs:

❯ python yaml_fail.py
Successfully loaded yaml: {'type': 'liger', 'micro_batch_size': 36, 'model': None}
Traceback (most recent call last):
  File "/code/users/mwyatt/llama-4-eval/yaml_fail.py", line 27, in <module>
    data = yaml.load(example, Loader=UniqueKeyLoader)
  ...
  File "/code/users/mwyatt/llama-4-eval/yaml_fail.py", line 14, in process_empty_scalar
    raise ValueError(f"Empty value found in YAML on line {mark.line + 1}.")
ValueError: Empty value found in YAML on line 4.

Updated the PR to include this change.

Also, I realize this is now reporting an error on line 4 and not line 5 - but line 5 is not incorrect in the yaml. Thoughts?

sfc-gh-sbekman · 2025-04-15T17:06:48Z

Yeah, as you said this is not helping and makes things even more confusing, since there is no problem with model: but with subsequent line.

I didn't think that the misformatted yaml example we work with is a valid yaml when there is no schema. Thank you for pointing that out.

OK, how about we go back then to the normal parser stage and figure out how to get it to report the correct line number?

If that's not possible, perhaps at least dumping the problematic context?

load config before launching deepspeed

ccbfe4b

sfc-gh-mwyatt requested review from sfc-gh-caxu, sfc-gh-jrasley, sfc-gh-sbekman, sfc-gh-srajbhandari and sfc-gh-zhyao as code owners April 14, 2025 23:16

error on empty yaml value

e98c533

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early catch config errors#159

Early catch config errors#159
sfc-gh-mwyatt wants to merge 2 commits intomainfrom
mwyatt/check-config-sooner

sfc-gh-mwyatt commented Apr 14, 2025

Uh oh!

sfc-gh-sbekman commented Apr 15, 2025 •

edited

Loading

Uh oh!

sfc-gh-mwyatt commented Apr 15, 2025 •

edited

Loading

Uh oh!

sfc-gh-sbekman commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sfc-gh-mwyatt commented Apr 14, 2025

Uh oh!

sfc-gh-sbekman commented Apr 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sfc-gh-mwyatt commented Apr 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sfc-gh-sbekman commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sfc-gh-sbekman commented Apr 15, 2025 •

edited

Loading

sfc-gh-mwyatt commented Apr 15, 2025 •

edited

Loading