LLM Metadata Repository

This repository provides structured, machine-readable metadata for a wide range of large language models (LLMs). It is designed to support tools and applications that need detailed information about model capabilities, configurations, and supported parameters. It is used by BasiliskLLM and the OpenAI NVDA Add-on.

This metadata enables:

Dynamic population of model selection UIs
Feature-aware prompting and parameter tuning
Compatibility and capability checks for downstream tools

The metadata is stored in JSON format and is inspired by the OpenRouter API model listing schema.

Each JSON file in the data/ directory contains a list of model objects with the following structure:

{
  "id": "gpt-5",
  "name": "GPT-5",
  "description": "OpenAI’s most advanced model...",
  "created": 1754587413,
  "context_length": 400000,
  "architecture": {
    "modality": "text+image->text",
    "input_modalities": ["text", "image", "file"],
    "output_modalities": ["text"],
    "tokenizer": "GPT",
    "instruct_type": null
  },
  "top_provider": {
    "context_length": 400000,
    "max_completion_tokens": 128000,
    "is_moderated": true
  },
  "supported_parameters": [
    "max_tokens",
    "temperature",
    "response_format",
    "structured_outputs"
  ]
}
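As a rough illustration of consuming these files, the sketch below reads every *.json file in data/ and concatenates the model lists. The function name load_models is hypothetical (not part of this repository), and it assumes each file holds a top-level JSON list as described above.

```python
import json
from pathlib import Path


def load_models(data_dir: str = "data") -> list[dict]:
    """Load every model object from the *.json files in data_dir.

    Hypothetical helper: assumes each file contains a top-level list
    of model objects, as described in this README.
    """
    models: list[dict] = []
    # Sort the paths so the combined list has a stable, predictable order.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            entries = json.load(f)
        if not isinstance(entries, list):
            raise ValueError(f"{path}: expected a list of model objects")
        models.extend(entries)
    return models
```

For example, load_models() run from the repository root would return one flat list that a model selection UI could iterate over.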

Metadata Fields

Top-Level Fields

id: Unique model identifier (e.g., gpt-4-turbo)
name: Human-readable model name
description: Summary of model capabilities and use cases
created: Unix timestamp (in seconds) of the model's release, used to sort models by release date; models with -latest in the id are listed first
context_length: Maximum context window size (in tokens)
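A minimal sketch of checking the top-level fields and turning created into a calendar date. The helper names and the expected Python types (for example, created as an integer) are assumptions inferred from the example above, not a published schema. It also assumes created is in seconds, which matches the magnitude of the example value.

```python
from datetime import datetime, timezone

# Assumed Python types for the top-level fields listed above.
TOP_LEVEL_TYPES = {
    "id": str,
    "name": str,
    "description": str,
    "created": int,
    "context_length": int,
}


def check_top_level(model: dict) -> list[str]:
    """Return a list of problems in a model's top-level fields (empty if none)."""
    problems = []
    for field, expected in TOP_LEVEL_TYPES.items():
        value = model.get(field)  # None when the field is missing
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{field}: expected {expected.__name__}, got {value!r}")
    return problems


def release_date(model: dict) -> datetime:
    """Convert the Unix timestamp in `created` to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(model["created"], tz=timezone.utc)
```

With the GPT-5 example above, release_date(...).date() gives 2025-08-07.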

architecture

modality: Overall input/output modality (e.g., text->text, text+image->text)
input_modalities: Supported input types (e.g., text, image, file)
output_modalities: Supported output types (e.g., text)
tokenizer: Tokenizer type used by the model (e.g., GPT)
instruct_type: Instruction tuning format (e.g., chatml, alpaca), or null if not applicable
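A small sketch of a capability check based on the architecture fields. Both helper names are hypothetical. Note that in the example above input_modalities includes "file" while the modality string only says text+image, so the sketch treats input_modalities as the authoritative list and uses the modality string only for display.

```python
def parse_modality(modality: str) -> tuple[list[str], list[str]]:
    """Split a modality string such as "text+image->text" into input and output types."""
    inputs, sep, outputs = modality.partition("->")
    if not sep:
        raise ValueError(f"modality {modality!r} has no '->' separator")
    return inputs.split("+"), outputs.split("+")


def accepts_input(model: dict, kind: str) -> bool:
    """True if the model lists `kind` (e.g. "image") among its input modalities."""
    return kind in model["architecture"].get("input_modalities", [])
```

A client could call accepts_input(model, "image") before offering an image attachment option.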

top_provider

context_length: Maximum context length supported by the top provider
max_completion_tokens: Maximum number of tokens in a single response
is_moderated: Indicates whether the model is subject to content moderation
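One way a client might use top_provider is to cap a requested completion length before sending a request. This is a sketch only: clamp_max_tokens is a hypothetical helper, and the fallback to context_length when max_completion_tokens is missing or null is an assumption, not documented behavior of this repository.

```python
def clamp_max_tokens(model: dict, requested: int) -> int:
    """Cap a requested completion length at the top provider's limit.

    Assumption: if max_completion_tokens is missing or null, fall back to
    the top provider's context_length, then the model's context_length.
    """
    provider = model.get("top_provider") or {}
    limit = (
        provider.get("max_completion_tokens")
        or provider.get("context_length")
        or model.get("context_length")
    )
    if limit is None:
        # No limit known at all: pass the request through unchanged.
        return requested
    return min(requested, limit)
```

With the GPT-5 example above, a request for 200000 tokens would be reduced to 128000.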

supported_parameters

A list of tunable parameters supported by the model, such as:

max_tokens
temperature
top_p
frequency_penalty
presence_penalty
tools
seed
response_format
structured_outputs

Data Sources

Metadata is curated from official provider documentation and APIs. Official model listings:

OpenAI: Models · API Reference
Anthropic: Models Overview · API Models
Mistral AI: Models · Model Comparison
xAI: Models and Pricing · Release Notes
Google: Gemini API Models · Vertex AI Models
DeepSeek: API Documentation · Model List

Data may also be synchronized from the OpenRouter API, which aggregates models from multiple providers.
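A rough sketch of projecting OpenRouter entries onto the schema used here. It assumes the public listing at https://openrouter.ai/api/v1/models returns a JSON object with a "data" array whose entries use the same field names as this repository; extra fields (such as pricing) are dropped. It does not rewrite ids, and OpenRouter ids typically carry a provider prefix (for example openai/gpt-5), so some manual mapping may still be needed. Both function names are hypothetical, and this is not the repository's actual sync process.

```python
import json
from urllib.request import urlopen

# Assumed endpoint and response shape: {"data": [ ...model objects... ]}.
OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"

# Fields kept from each entry, following the schema in this README.
TOP_LEVEL = ("id", "name", "description", "created", "context_length")
ARCHITECTURE = ("modality", "input_modalities", "output_modalities", "tokenizer", "instruct_type")
TOP_PROVIDER = ("context_length", "max_completion_tokens", "is_moderated")


def to_repo_schema(entry: dict) -> dict:
    """Keep only the fields described in this README; missing ones become None."""
    model = {key: entry.get(key) for key in TOP_LEVEL}
    arch = entry.get("architecture") or {}
    provider = entry.get("top_provider") or {}
    model["architecture"] = {key: arch.get(key) for key in ARCHITECTURE}
    model["top_provider"] = {key: provider.get(key) for key in TOP_PROVIDER}
    model["supported_parameters"] = list(entry.get("supported_parameters") or [])
    return model


def fetch_openrouter_models(url: str = OPENROUTER_MODELS_URL) -> list[dict]:
    """Download the listing (network access required) and convert every entry."""
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return [to_repo_schema(entry) for entry in payload["data"]]
```

Converted entries should still be reviewed against official provider documentation before they are committed.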

Contributing

Contributions are welcome! To add or update metadata for a model:

Fork the repository
Add or edit the appropriate JSON file in the data/ directory
Submit a pull request

Please ensure your JSON is valid and follows the schema outlined above. Prefer official provider documentation when updating model IDs, descriptions, or parameters.
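Before opening a pull request, a quick local check can catch invalid JSON, missing fields, and duplicate ids. The sketch below is hypothetical (this repository does not necessarily ship such a script); the required-key set is taken from the schema above.

```python
import json
from pathlib import Path

# Required keys, taken from the schema described in this README.
REQUIRED = {
    "id", "name", "description", "created", "context_length",
    "architecture", "top_provider", "supported_parameters",
}


def validate_file(path: Path) -> list[str]:
    """Return human-readable problems in one data/*.json file (empty if none)."""
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]
    if not isinstance(entries, list):
        return [f"{path}: top level must be a list"]
    problems, seen = [], set()
    for i, model in enumerate(entries):
        if not isinstance(model, dict):
            problems.append(f"{path}[{i}]: expected an object")
            continue
        missing = REQUIRED - model.keys()
        if missing:
            problems.append(f"{path}[{i}]: missing {sorted(missing)}")
        model_id = model.get("id")
        # id must be unique; skip the check when id is absent (already reported).
        if model_id is not None:
            if model_id in seen:
                problems.append(f"{path}[{i}]: duplicate id {model_id!r}")
            seen.add(model_id)
    return problems
```

Running it over every file, for example [p for f in sorted(Path("data").glob("*.json")) for p in validate_file(f)], should yield an empty list for a clean tree.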

Models are sorted with *-latest aliases first, then by release date (newest first). After editing, run python sort_models.py to restore this order.
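The ordering rule can be expressed as a single sort key. This is a sketch of the rule as described, not the contents of sort_models.py; ordering the -latest aliases among themselves by created is an assumption.

```python
def sort_key(model: dict) -> tuple[int, int]:
    """-latest aliases first, then newest release first (larger created first)."""
    is_latest = "-latest" in model["id"]
    return (0 if is_latest else 1, -model["created"])


def sort_models(models: list[dict]) -> list[dict]:
    """Return a new list in the README's documented order."""
    return sorted(models, key=sort_key)
```

Because Python's sort is stable, models with identical keys keep their original relative order.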
