Describe the bug
When a source document contains a table preceded by some text, SDG fails with `failed to generate data with exception: list index out of range`.
To Reproduce
Steps to reproduce the behavior:
- Create a Markdown file in a git repo, such as https://github.com/cfchase/sample-md/blob/main/README.md:

```markdown
Hello World

| Hello | Hello |
|-------|-------|
| World | World |
```

- Create a qna.yaml in your taxonomy referring to the Markdown file, such as https://github.com/cfchase/sample-md/blob/main/qna.yaml:

```yaml
#~/.local/share/instructlab/taxonomy/knowledge/qna.yaml
...snip...
document:
  repo: 'https://github.com/cfchase/sample-md.git'
  commit: b5bbdd7516fd5f06956f2a1e3f207790a750c00e
  patterns:
    - 'README.md'
```

- Run `ilab data generate`
- See error: `failed to generate data with exception: list index out of range`
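The shape of the failing input suggests an indexing assumption somewhere in document chunking. Purely as an illustration of that failure class, here is a minimal, hypothetical sketch: it is not the actual instructlab-sdg code, and every name in it is made up. Logic that assumes a document *starts* with its table reads fixed positions out of the first block, and a shorter leading text block makes those reads run off the end.

```python
# Hypothetical sketch of the failure class -- NOT the actual
# instructlab-sdg code. A chunker that assumes the table is the
# first block of the document indexes past the end of a shorter
# leading text block.

TABLE = "| Hello | Hello |\n|-------|-------|\n| World | World |"
NO_PRECEDING_TEXT = TABLE
PRECEDING_TEXT = "Hello World\n\n" + TABLE  # shape from the repro README

def first_table(md: str) -> tuple[str, str, str]:
    blocks = md.split("\n\n")
    lines = blocks[0].splitlines()       # buggy: assumes block 0 is the table
    return lines[0], lines[1], lines[2]  # header, separator, first body row

first_table(NO_PRECEDING_TEXT)  # works: the table really is block 0
first_table(PRECEDING_TEXT)     # IndexError: list index out of range
```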
Expected behavior
SDG continues past document ingestion instead of failing.
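For comparison, a bounds-checked variant of the same hypothetical lookup ingests this document cleanly. This only sketches the expected behavior under the assumptions above; it is not the project's actual fix.

```python
def first_table_safe(md: str) -> list[str] | None:
    # Scan for the block that actually starts with '|' instead of
    # assuming the table is block 0, and check it has at least a
    # header and a separator row before indexing into it.
    for block in md.split("\n\n"):
        lines = block.splitlines()
        if len(lines) >= 2 and lines[0].lstrip().startswith("|"):
            return lines
    return None  # no table found: skip table handling, don't crash

first_table_safe("Hello World\n\n| Hello | Hello |\n|-------|-------|\n| World | World |")
first_table_safe("Hello World")  # returns None instead of raising
```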
Command Used
`ilab data generate --pipeline=simple`
Device Info:
- Hardware Specs: Apple M3 Pro chip, 36 GB memory
- OS Version: macOS 15.3
- Python Version: Python 3.11.9
- InstructLab Version:

```
sys.version: 3.11.9 (main, Aug 26 2024, 10:26:18) [Clang 15.0.0 (clang-1500.3.9.4)]
sys.platform: darwin
os.name: posix
platform.release: 24.3.0
platform.machine: arm64
platform.node: cchase-mac
platform.python_version: 3.11.9
platform.cpu_brand: Apple M3 Pro
memory.total: 36.00 GB
memory.available: 12.11 GB
memory.used: 18.85 GB
InstructLab:
  instructlab.version: 0.23.0rc1.dev124
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.7.1.dev46
  instructlab-training.version: 0.7.0
Torch:
  torch.version: 2.4.1
  torch.backends.cpu.capability: NO AVX
  torch.version.cuda: None
  torch.version.hip: None
  torch.cuda.available: False
  torch.backends.cuda.is_built: False
  torch.backends.mps.is_built: True
  torch.backends.mps.is_available: True
llama_cpp_python:
  llama_cpp_python.version: 0.3.6
  llama_cpp_python.supports_gpu_offload: True
```