Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA

environment

python=3.12.4 torch=2.4.0 peft=0.15.2 vllm=0.6.2 tqdm=4.66.5 rapidfuzz=3.13.0 scikit-learn sacrebleu rouge-score openai=1.78.1 datetime jsonlines

script

./experiments/final_test.sh

python src/pipeline_op_qa.py \
    --file_name 'tabfact-test-clean' \
    --task_type 'TabFact' \
    --model_path '{"base_url": "http://10.10.10.8:8880/v1", "model": "~/pretrain/Qwen2.5-7B-Instruct", "tokenizer": "~/pretrain/Qwen2.5-7B-Instruct"}' \
    --save_dir 'experiments/baseline-7B/' \
    --num_workers 10

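file_name: name of the JSON file in './data'. Each record has the fields 'id', 'table_path', 'table_caption' (TabFact only), and 'query'.
task_type: WikiTQ or TabFact.
model_path: JSON string with the keys 'base_url', 'model', and 'tokenizer', pointing at a local vLLM server or an online API service.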
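
The --model_path value is a JSON string nested inside shell quotes, which is easy to mistype by hand. The sketch below shows one way to build the same command from Python and what a record with the documented fields could look like. The record values are made up for illustration, and the endpoint settings are copied from the command above; whether the repository expects exactly this record layout on disk is an assumption.

```python
import json
import shlex

# Hypothetical TabFact record with the documented fields; every value is invented.
record = {
    "id": "example-0",
    "table_path": "data/tables/example.csv",
    "table_caption": "example caption",  # documented as TabFact only
    "query": "the table has more than two rows",
}

# Endpoint settings copied from the command above; adjust them to your own server.
model_path = json.dumps({
    "base_url": "http://10.10.10.8:8880/v1",
    "model": "~/pretrain/Qwen2.5-7B-Instruct",
    "tokenizer": "~/pretrain/Qwen2.5-7B-Instruct",
})

# Same arguments as the shell command, kept as a list so no manual quoting is needed.
command = [
    "python", "src/pipeline_op_qa.py",
    "--file_name", "tabfact-test-clean",
    "--task_type", "TabFact",
    "--model_path", model_path,
    "--save_dir", "experiments/baseline-7B/",
    "--num_workers", "10",
]

print(json.dumps(record))
print(shlex.join(command))  # shell-safe string that can be pasted into a terminal
```

Passing the list to subprocess.run(command, check=True) from the repository root would launch the same run without going through the shell.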

Code Structure

src/
  pipeline_op_qa.py            Main inference code.
  operator_call.py             Operator prompt.
  operators_merge.py           Operation merge.
  orm.py                       ms-swift GRPO reward function (ORM plugin) for operator-call based TableQA.
  cell_annotator.py            TabFact cell annotation helper (adds "cells" field).
  data_select.py               Dataset filtering helper (length/verifiable filtering).
  utils/
    evaluator.py               Metrics for WikiTQ/TabFact/FeTaQA (and related scoring helpers).
    model_infer.py             OpenAI-compatible/vLLM client.
    table_serialize.py         CSV reader with header dedup; markdown table serializer; dicts_to_list.
    postprocess.py             Answer extraction from LLM outputs.
    tableqa.py                 Prompt templates.
    data_conversion.py         Numeric/date parsing utilities used by operators.
    aggregation.py             Aggregate helper (sum/avg/min/max/count; disabled by default).
    add_column.py              Operator add_column.
    select_columns.py          Operator select_column.
    sort_table.py              Operator sort_by.
    group_by.py                Operator group_by.
    column_filter.py           Operator filter.
    clean_column.py            Operator clean_column.
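
To make the role of table_serialize.py more concrete, here is a minimal sketch of reading a CSV with duplicate header names and serializing it as a markdown table. It is not the repository's code: the function names dedup_headers, read_table, and to_markdown and the "_1" suffix scheme are assumptions chosen for illustration.

```python
import csv
import io


def dedup_headers(headers):
    """Make column names unique by appending _1, _2, ... to later repeats."""
    used = set()
    result = []
    for name in headers:
        candidate, suffix = name, 1
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


def read_table(csv_text):
    """Parse CSV text into (unique headers, list of data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return dedup_headers(rows[0]), rows[1:]


def to_markdown(headers, rows):
    """Serialize headers and rows as a pipe-delimited markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


csv_text = "year,score,year\n2001,3,2002\n"
headers, rows = read_table(csv_text)
markdown = to_markdown(headers, rows)
print(markdown)
```

Unique column names matter here because operators such as select or sort address columns by name, so a repeated header would be ambiguous.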
