The 48 verified model cards are available in the `verified_model_cards` directory. It contains 48 subdirectories, each named after its model ID with the '/' replaced by '@'. Each subdirectory contains 11 files:
- `1_raw.md`: The raw README file as downloaded.
- `2_original_model_card.md`: The README file after processing (removing the YAML metadata at the beginning, which is not visible in the UI).
- `3_model_generated_info_list.md`: The checklist of information generated by GPT 4o mini from the original model card.
- `4_corrected_info_list.md`: The checklist of information after manual correction.
- `5_reorganized_model_card.md`: The model card as reorganized by Gemini 2 Flash Thinking.
- `6_manually_removed_extra_information.md`: The reorganized model card after manually removing extra information from it.
- `7_manually_removed_misinterpretation.md`: The reorganized model card after manually removing misinterpretations from it.
- `8_manually_added_missing_information.md`: The reorganized model card after manually adding missing information to it.
- `9_gemini_jury_result.json`: The content misplacement verification results for the reorganized model card from Gemini 2.5 Pro.
- `10_o4_mini_jury_result.json`: The content misplacement verification results for the reorganized model card from O4-mini.
- `11_deepseek_r1_jury_result.json`: The content misplacement verification results for the reorganized model card from DeepSeek R1.
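The layout above can be checked with a short script. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, it assumes the character replaced by '@' in directory names is the '/' of a Hugging Face model ID, and `strip_yaml_front_matter` only approximates the processing that turns `1_raw.md` into `2_original_model_card.md` (the actual script may handle more cases).

```python
from pathlib import Path

# The 11 file names listed above, in order.
EXPECTED_FILES = [
    "1_raw.md",
    "2_original_model_card.md",
    "3_model_generated_info_list.md",
    "4_corrected_info_list.md",
    "5_reorganized_model_card.md",
    "6_manually_removed_extra_information.md",
    "7_manually_removed_misinterpretation.md",
    "8_manually_added_missing_information.md",
    "9_gemini_jury_result.json",
    "10_o4_mini_jury_result.json",
    "11_deepseek_r1_jury_result.json",
]


def model_id_to_dir_name(model_id: str) -> str:
    """Assumption: '/' in the model ID is replaced with '@'."""
    return model_id.replace("/", "@")


def dir_name_to_model_id(dir_name: str) -> str:
    """Inverse of model_id_to_dir_name under the same assumption."""
    return dir_name.replace("@", "/")


def missing_files(card_dir: Path) -> list[str]:
    """Return the expected file names that are absent from card_dir."""
    return [name for name in EXPECTED_FILES if not (card_dir / name).is_file()]


def strip_yaml_front_matter(text: str) -> str:
    """Drop a leading '---' ... '---' YAML block, if one is present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[i + 1:]).lstrip("\n")
    return text  # no closing delimiter found: leave the text unchanged
```

For example, calling `missing_files(path)` on each subdirectory of `verified_model_cards` should return an empty list if the directory is complete.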
Install the required packages
pip install -r requirements.txt
Tested with Python 3.11; compatibility with earlier versions has not been tested.
- Run `data_collector/repo_lister.py` to list all the models available on Hugging Face. A file named `all_models.csv` with the model list will be created inside the `data` directory.
python data_collector/repo_lister.py
- Run `data_collector/repo_selector.py` to order the models and select the top 1000 models. The list of the top models will be saved in `data/top_1000_models.csv`.
python data_collector/repo_selector.py
- Run `data_collector/repo_readme_collector.py` to download the README files of the selected top 1000 models. The README files will be saved inside the `data/readmes` directory: the raw README files in `data/readmes/raw` and the further processed README files in `data/readmes/processed`.
python data_collector/repo_readme_collector.py
- Run `data_collector/readme_selector.py` to process the model cards and automatically select the quality ones. The list will be saved in `data/top_one_model_per_organization.csv`.
python data_collector/readme_selector.py
- Manually verify the models listed in `data/top_one_model_per_organization.csv` and list the unwanted models in `data/excluding_repos.csv`. If you don't have any unwanted models, leave the file empty apart from a `model_id` header. Then run `data_collector/exclude_unwanted_repos.py` to get the final list of selected quality model cards, saved in `data/selected_repos.csv`.
python data_collector/exclude_unwanted_repos.py
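The exclusion step amounts to filtering one list of model IDs by another. The sketch below illustrates the idea only; it is not the code in `exclude_unwanted_repos.py`. The function names are hypothetical, and it assumes that the candidate CSV also has a `model_id` column (the source only states this for `data/excluding_repos.csv`).

```python
import csv
import io


def read_model_ids(csv_text: str) -> list[str]:
    """Read the model_id column from CSV text; a header-only file yields []."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["model_id"] for row in reader if row.get("model_id")]


def exclude_repos(candidates: list[str], excluded: list[str]) -> list[str]:
    """Keep candidates, in their original order, that are not excluded."""
    excluded_set = set(excluded)
    return [model_id for model_id in candidates if model_id not in excluded_set]


# Illustrative data only; these model IDs are made up.
candidates_csv = "model_id\norg-a/model-1\norg-b/model-2\n"
excluded_csv = "model_id\norg-b/model-2\n"

selected = exclude_repos(read_model_ids(candidates_csv), read_model_ids(excluded_csv))
print(selected)
```

With a header-only exclusion file, `read_model_ids` returns an empty list and every candidate is kept, which matches the "leave it empty" case described above.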
Run `model_card_reorganizer/gemini_reorganizer.py` to reorganize the selected model cards. The reorganized model cards will be saved inside the `data/readmes/reorganized` directory. Insert your API key into the `GEMINI_API_KEY` placeholder in the `util/constants.py` file to enable Gemini model access.
python model_card_reorganizer/gemini_reorganizer.py
The reorganization instruction and the template structure with section descriptions are available in `model_card_reorganizer/gemini_prompt_template.md` and `model_card_reorganizer/model_card_template_with_description.md`, respectively.
Run `model_card_info_lister/gpt_4o_mini_lister.py` to create checklists of information from the original model cards. The checklists will be saved in the `data/readmes/info_list` directory.
python model_card_info_lister/gpt_4o_mini_lister.py
The instruction for the checklist creation is available in `model_card_info_lister/system_instruction.md`.
Run `relevance_verifier/gemini_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.
python relevance_verifier/gemini_relevance_verifier.py
Run `relevance_verifier/o_mini_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.
python relevance_verifier/o_mini_relevance_verifier.py
Run `relevance_verifier/deepseek_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.
python relevance_verifier/deepseek_relevance_verifier.py
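The three verifier scripts can also be launched one after another from a small driver. This is a convenience sketch under assumptions, not something the repository provides: `build_commands` and `run_all` are hypothetical names, and it assumes the scripts take no arguments and are run from the repository root, as in the commands above.

```python
import subprocess
import sys

# Script paths as given in the commands above.
VERIFIER_SCRIPTS = [
    "relevance_verifier/gemini_relevance_verifier.py",
    "relevance_verifier/o_mini_relevance_verifier.py",
    "relevance_verifier/deepseek_relevance_verifier.py",
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Build one argument list per verifier script."""
    return [[python, script] for script in VERIFIER_SCRIPTS]


def run_all(commands: list[list[str]], runner=subprocess.run) -> None:
    """Run the commands in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Passing `check=True` makes `subprocess.run` raise `CalledProcessError` when a script exits with a non-zero status, so a failed verifier is not silently skipped.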