MCTidy: Automating the Reorganization of Model Cards to Enforce Template Consistency

The 48 verified model cards are available in the verified_model_cards directory. It contains 48 subdirectories, each named after its model card (the model id with '/' replaced by '@'). Each subdirectory contains 11 files:

1_raw.md: The raw readme file after download.
2_original_model_card.md: The readme file after processing (removing the YAML metadata at the beginning, which is not visible in the UI).
3_model_generated_info_list.md: The checklist of information generated by GPT-4o mini from the original model card.
4_corrected_info_list.md: The checklist of information after manual correction.
5_reorganized_model_card.md: The model card after reorganization by Gemini 2 Flash Thinking.
6_manually_removed_extra_information.md: The reorganized model card after manually removing extra information from it.
7_manually_removed_misinterpretation.md: The reorganized model card after manually removing misinterpretations from it.
8_manually_added_missing_information.md: The reorganized model card after manually adding missing information to it.
9_gemini_jury_result.json: The content misplacement verification results of the reorganized model card by Gemini 2.5 Pro.
10_o4_mini_jury_result.json: The content misplacement verification results of the reorganized model card by O4-mini.
11_deepseek_r1_jury_result.json: The content misplacement verification results of the reorganized model card by DeepSeek R1.
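
The layout above can be checked with a short script. This is a minimal sketch, assuming the character replaced in the model id is the `/` that separates organization and model name on Hugging Face; the helper names `model_id_to_dir_name` and `missing_files` are hypothetical and are not part of the repository.

```python
from pathlib import Path

# File names exactly as listed above.
EXPECTED_FILES = [
    "1_raw.md",
    "2_original_model_card.md",
    "3_model_generated_info_list.md",
    "4_corrected_info_list.md",
    "5_reorganized_model_card.md",
    "6_manually_removed_extra_information.md",
    "7_manually_removed_misinterpretation.md",
    "8_manually_added_missing_information.md",
    "9_gemini_jury_result.json",
    "10_o4_mini_jury_result.json",
    "11_deepseek_r1_jury_result.json",
]


def model_id_to_dir_name(model_id: str) -> str:
    """Map a model id to its directory name (assumption: '/' becomes '@')."""
    return model_id.replace("/", "@")


def missing_files(card_dir: Path) -> list[str]:
    """Return the expected files that are absent from one model card directory."""
    return [name for name in EXPECTED_FILES if not (card_dir / name).is_file()]
```

For example, `missing_files(Path("verified_model_cards") / model_id_to_dir_name("some-org/some-model"))` would return an empty list for a complete directory; `some-org/some-model` is a placeholder id.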

Setup environment

Install the required packages

pip install -r requirements.txt

Tested with Python 3.11; earlier versions have not been tested.

Select model cards from Hugging Face repositories

Run data_collector/repo_lister.py to list all the models available on Hugging Face. The model list will be written to all_models.csv inside the data directory.

python data_collector/repo_lister.py

Run data_collector/repo_selector.py to order the models and select the top 1000. The list of top models will be saved in data/top_1000_models.csv.

python data_collector/repo_selector.py

Run data_collector/repo_readme_collector.py to download the readme files of the selected top 1000 models. The readme files will be saved inside the data/readmes directory: the raw files in data/readmes/raw and the processed files in data/readmes/processed.

python data_collector/repo_readme_collector.py
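
The processing step removes the YAML metadata at the beginning of each readme. A minimal sketch of that idea is shown below, assuming the metadata is a block delimited by `---` lines at the very top of the file; the function name `strip_yaml_front_matter` is hypothetical, and the repository's own processing may do more than this.

```python
def strip_yaml_front_matter(text: str) -> str:
    """Remove a leading '---' ... '---' metadata block, if one is present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no front matter at the top of the file
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Drop the block and any blank lines that directly follow it.
            return "".join(lines[i + 1:]).lstrip("\n")
    return text  # unterminated block: leave the file unchanged
```

Files without a leading `---` line, or with an unterminated block, are returned unchanged so that ordinary horizontal rules further down the readme are not affected.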

Run data_collector/readme_selector.py to process the readme files and automatically select quality model cards. The list will be saved in data/top_one_model_per_organization.csv.

python data_collector/readme_selector.py
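
The output file name suggests that one model is kept per organization. The sketch below illustrates that kind of selection under two assumptions: the input list is already ordered from best to worst, and the organization is the part of the model id before the `/`. The function name `top_one_per_organization` is hypothetical, and the script's actual quality criteria are not reproduced here.

```python
def top_one_per_organization(model_ids: list[str]) -> list[str]:
    """Keep the first model seen for each organization, preserving input order."""
    seen_orgs = set()
    selected = []
    for model_id in model_ids:
        # Ids without a '/' are treated as their own organization.
        org = model_id.split("/", 1)[0]
        if org not in seen_orgs:
            seen_orgs.add(org)
            selected.append(model_id)
    return selected
```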

Manually verify the models listed in data/top_one_model_per_organization.csv and list the unwanted models in data/excluding_repos.csv. If there are no unwanted models, leave the file empty except for a model_id header. Then run data_collector/exclude_unwanted_repos.py to get the final list of quality model cards, saved in data/selected_repos.csv.

python data_collector/exclude_unwanted_repos.py
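
The exclusion step amounts to filtering one list of model ids by another. A minimal sketch follows, assuming both CSV files carry a `model_id` column (the README states this only for data/excluding_repos.csv); the helper names `read_model_ids` and `exclude_repos` are hypothetical.

```python
import csv
import io


def read_model_ids(csv_text: str) -> list[str]:
    """Read the model_id column from CSV text; a header-only file yields []."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["model_id"] for row in reader if row.get("model_id")]


def exclude_repos(candidates: list[str], excluded: list[str]) -> list[str]:
    """Return the candidates that are not in the exclusion list, in order."""
    excluded_set = set(excluded)
    return [model_id for model_id in candidates if model_id not in excluded_set]
```

With a header-only exclusion file, `read_model_ids` returns an empty list and every candidate is kept.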

Reorganize model cards

Insert your API key into the GEMINI_API_KEY placeholder in the util/constants.py file to enable Gemini model access. Then run model_card_reorganizer/gemini_reorganizer.py to reorganize the selected model cards. The reorganized model cards will be saved inside the data/readmes/reorganized directory.

python model_card_reorganizer/gemini_reorganizer.py

The reorganization instruction and the template structure with section descriptions are available in model_card_reorganizer/gemini_prompt_template.md and model_card_reorganizer/model_card_template_with_description.md, respectively.

Verify Reorganization

Make checklist

Run model_card_info_lister/gpt_4o_mini_lister.py to create checklists of information from the original model cards. The checklists will be saved in the data/readmes/info_list directory.

python model_card_info_lister/gpt_4o_mini_lister.py

The instruction for the checklist creation is available in model_card_info_lister/system_instruction.md.

Run LLM Jury

Run Gemini 2.5 Pro Juror

Run relevance_verifier/gemini_relevance_verifier.py to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

python relevance_verifier/gemini_relevance_verifier.py

Run O4-mini Juror

Run relevance_verifier/o_mini_relevance_verifier.py to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

python relevance_verifier/o_mini_relevance_verifier.py

Run DeepSeek R1 Juror

Run relevance_verifier/deepseek_relevance_verifier.py to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

python relevance_verifier/deepseek_relevance_verifier.py
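
How the three jurors' results are combined is not described here. One common option for a three-member jury is a simple majority vote, sketched below as an illustration only. The verdict labels and the function name `majority_verdict` are hypothetical, and the structure of the jury result JSON files is not assumed.

```python
from collections import Counter


def majority_verdict(verdicts: list[str]) -> str:
    """Return the label chosen by more than half of the jurors, else 'no_majority'."""
    if not verdicts:
        return "no_majority"
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > len(verdicts) / 2 else "no_majority"
```

With three jurors, any label that at least two of them agree on is returned; a three-way split yields `no_majority`.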
