
asgaardlab/model-card-reorganization


MCTidy: Automating the Reorganization of Model Cards to Enforce Template Consistency

The 48 verified model cards are available in the verified_model_cards directory. It contains 48 subdirectories, one per model card, each named after the model ID with the '/' replaced by '@'. Each subdirectory contains the following 11 files:

  1. 1_raw.md: The raw README file as downloaded.
  2. 2_original_model_card.md: The README after processing, with the YAML metadata block at the top removed (it is not rendered in the Hugging Face UI).
  3. 3_model_generated_info_list.md: The checklist of information that GPT-4o mini generated from the original model card.
  4. 4_corrected_info_list.md: The checklist of information after manual correction.
  5. 5_reorganized_model_card.md: The model card as reorganized by Gemini 2 Flash Thinking.
  6. 6_manually_removed_extra_information.md: The reorganized model card after manually removing extra information.
  7. 7_manually_removed_misinterpretation.md: The reorganized model card after manually removing misinterpretations.
  8. 8_manually_added_missing_information.md: The reorganized model card after manually adding missing information.
  9. 9_gemini_jury_result.json: The content-misplacement verification results for the reorganized model card from Gemini 2.5 Pro.
  10. 10_o4_mini_jury_result.json: The content-misplacement verification results for the reorganized model card from o4-mini.
  11. 11_deepseek_r1_jury_result.json: The content-misplacement verification results for the reorganized model card from DeepSeek R1.
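To check that a local copy of the dataset is complete, a small script can walk verified_model_cards and report which of the 11 files above are missing from each directory. This is a minimal sketch, not part of the repository; the helper names (card_dir_name, missing_files, check_all) are hypothetical, and card_dir_name assumes the directory name is the model ID with '/' replaced by '@'.

```python
from pathlib import Path

# File names each verified model card directory is expected to hold (from the list above).
EXPECTED_FILES = [
    "1_raw.md",
    "2_original_model_card.md",
    "3_model_generated_info_list.md",
    "4_corrected_info_list.md",
    "5_reorganized_model_card.md",
    "6_manually_removed_extra_information.md",
    "7_manually_removed_misinterpretation.md",
    "8_manually_added_missing_information.md",
    "9_gemini_jury_result.json",
    "10_o4_mini_jury_result.json",
    "11_deepseek_r1_jury_result.json",
]


def card_dir_name(model_id: str) -> str:
    """Map a Hugging Face model ID like 'org/model' to a directory name (assumes '/' becomes '@')."""
    return model_id.replace("/", "@")


def missing_files(card_dir: Path) -> list[str]:
    """Return the expected files that are absent from one model card directory."""
    return [name for name in EXPECTED_FILES if not (card_dir / name).is_file()]


def check_all(root: Path) -> dict[str, list[str]]:
    """Map each incomplete subdirectory of root to the files it is missing."""
    report = {}
    for card_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = missing_files(card_dir)
        if missing:
            report[card_dir.name] = missing
    return report
```

Calling check_all(Path("verified_model_cards")) from the repository root should return an empty dict when every directory is complete.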

Setup environment

Install the required packages

pip install -r requirements.txt

Tested with Python 3.11; compatibility with earlier Python versions has not been tested.

Select model cards from Hugging Face repositories

  1. Run data_collector/repo_lister.py to list all models available on Hugging Face. The list is saved to data/all_models.csv.
python data_collector/repo_lister.py
  2. Run data_collector/repo_selector.py to sort the models and select the top 1000. The selected models are saved to data/top_1000_models.csv.
python data_collector/repo_selector.py
  3. Run data_collector/repo_readme_collector.py to download the README files of the top 1000 models into the data/readmes directory. Raw READMEs are saved in data/readmes/raw, and processed READMEs in data/readmes/processed.
python data_collector/repo_readme_collector.py
  4. Run data_collector/readme_selector.py to process the READMEs and automatically select quality model cards. The list is saved to data/top_one_model_per_organization.csv.
python data_collector/readme_selector.py
  5. Manually review the models listed in data/top_one_model_per_organization.csv and list any unwanted models in data/excluding_repos.csv. If there are none, leave the file with only the model_id header row. Then run data_collector/exclude_unwanted_repos.py to produce the final list of selected quality model cards in data/selected_repos.csv.
python data_collector/exclude_unwanted_repos.py
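The exclusion step amounts to a set difference over model IDs that preserves the original ranking. The sketch below illustrates it under two assumptions: that data/top_one_model_per_organization.csv also has a model_id column (only excluding_repos.csv is documented to have one), and that the output keeps the same single-column layout. It is not the repository's exclude_unwanted_repos.py, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def read_model_ids(csv_path: Path) -> list[str]:
    """Read the model_id column of a CSV file that has a header row."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return [row["model_id"] for row in csv.DictReader(f)]


def exclude_unwanted(candidates: list[str], unwanted: list[str]) -> list[str]:
    """Drop unwanted model IDs while keeping the candidates' original order."""
    unwanted_set = set(unwanted)
    return [model_id for model_id in candidates if model_id not in unwanted_set]


def write_model_ids(csv_path: Path, model_ids: list[str]) -> None:
    """Write model IDs under a single model_id header."""
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["model_id"])
        writer.writerows([model_id] for model_id in model_ids)
```

Because csv.DictReader yields no rows for a file that holds only the model_id header, an "empty" data/excluding_repos.csv excludes nothing, which matches the instruction in step 5.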

Reorganize model cards

Before running the reorganizer, replace the GEMINI_API_KEY placeholder in util/constants.py with your API key to enable Gemini model access. Then run model_card_reorganizer/gemini_reorganizer.py to reorganize the selected model cards. The reorganized model cards are saved in the data/readmes/reorganized directory.

python model_card_reorganizer/gemini_reorganizer.py

The reorganization instructions are in model_card_reorganizer/gemini_prompt_template.md, and the template structure with section descriptions is in model_card_reorganizer/model_card_template_with_description.md.
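The overall flow can be pictured as: combine the instruction file, the section template, and one processed README into a prompt, send it to the model, and write the reply to data/readmes/reorganized. The sketch below is a hypothetical illustration, not the repository's gemini_reorganizer.py: the prompt layout in build_prompt is assumed, processed READMEs are assumed to be *.md files, and the reorganize callable stands in for whatever function actually calls Gemini.

```python
from pathlib import Path
from typing import Callable


def build_prompt(instruction: str, template: str, model_card: str) -> str:
    """Join the instruction, the section template, and one model card (hypothetical prompt layout)."""
    return "\n\n".join([
        instruction.strip(),
        "## Template\n" + template.strip(),
        "## Model card\n" + model_card.strip(),
    ])


def reorganize_all(processed_dir: Path, output_dir: Path,
                   reorganize: Callable[[str], str]) -> list[str]:
    """Reorganize each processed README that has no output yet; return the file names written."""
    written = []
    for src in sorted(processed_dir.glob("*.md")):
        dst = output_dir / src.name
        if dst.exists():
            # Skip cards finished in an earlier run so an interrupted run can resume.
            continue
        output_dir.mkdir(parents=True, exist_ok=True)
        dst.write_text(reorganize(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(src.name)
    return written
```

Passing the model call in as a plain function keeps the file handling testable without an API key; skipping existing outputs is a design choice of this sketch that avoids paying twice for cards already reorganized.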

Verify Reorganization

Make checklist

Run model_card_info_lister/gpt_4o_mini_lister.py to create a checklist of the information in each original model card. The checklists are saved in the data/readmes/info_list directory.

python model_card_info_lister/gpt_4o_mini_lister.py

The instructions for checklist creation are in model_card_info_lister/system_instruction.md.
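A checklist is most useful once it is split into individual items that can be ticked off against the reorganized card. The sketch below assumes a hypothetical format of one item per Markdown bullet, optionally with a [ ] checkbox; the actual format is defined by system_instruction.md. The verbatim, case-insensitive match in items_not_found is a naive first-pass heuristic of this sketch, not the repository's verification method, since paraphrased content will be reported as missing.

```python
import re

# Hypothetical checklist format: one item per Markdown bullet, optionally with a checkbox.
BULLET = re.compile(r"^\s*[-*]\s+(?:\[[ xX]\]\s+)?(.*\S)\s*$")


def parse_checklist(markdown: str) -> list[str]:
    """Extract the text of each bulleted checklist item, ignoring headings and prose."""
    items = []
    for line in markdown.splitlines():
        match = BULLET.match(line)
        if match:
            items.append(match.group(1))
    return items


def items_not_found(items: list[str], reorganized_card: str) -> list[str]:
    """Naive first pass: items whose text does not appear verbatim (case-insensitive) in the card."""
    haystack = reorganized_card.lower()
    return [item for item in items if item.lower() not in haystack]
```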

Run LLM Jury

Run Gemini 2.5 Pro Juror

Run relevance_verifier/gemini_relevance_verifier.py to check every section of each reorganized model card for misplaced content. The verification results will be saved in ``.

python relevance_verifier/gemini_relevance_verifier.py

Run o4-mini Juror

Run relevance_verifier/o_mini_relevance_verifier.py to check every section of each reorganized model card for misplaced content. The verification results will be saved in ``.

python relevance_verifier/o_mini_relevance_verifier.py

Run DeepSeek R1 Juror

Run relevance_verifier/deepseek_relevance_verifier.py to check every section of each reorganized model card for misplaced content. The verification results will be saved in ``.

python relevance_verifier/deepseek_relevance_verifier.py
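With three jurors, their per-section verdicts can be combined into one decision per section. The sketch below assumes, hypothetically, that each juror's JSON file (for example 9_gemini_jury_result.json in a verified card directory) maps section names to a boolean that is true when the section contains misplaced content, and it uses a simple majority vote. Neither the schema nor the voting rule is documented here, so treat both as assumptions.

```python
import json
from collections import Counter
from pathlib import Path

JUROR_FILES = [
    "9_gemini_jury_result.json",
    "10_o4_mini_jury_result.json",
    "11_deepseek_r1_jury_result.json",
]


def majority_verdicts(verdicts_per_juror: list[dict[str, bool]]) -> dict[str, bool]:
    """Flag a section as misplaced when more than half of the jurors flag it."""
    votes = Counter()
    for verdicts in verdicts_per_juror:
        for section, misplaced in verdicts.items():
            if misplaced:
                votes[section] += 1
    sections = {section for verdicts in verdicts_per_juror for section in verdicts}
    threshold = len(verdicts_per_juror) / 2
    return {section: votes[section] > threshold for section in sorted(sections)}


def load_card_verdicts(card_dir: Path) -> dict[str, bool]:
    """Load the three juror files of one card (hypothetical schema) and combine them."""
    results = [json.loads((card_dir / name).read_text(encoding="utf-8")) for name in JUROR_FILES]
    return majority_verdicts(results)
```

With three jurors the threshold is 1.5, so a section is flagged only when at least two jurors agree, which keeps a single juror's false positive from deciding the outcome.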
