
🤖 Autoupdate: Epoch #5594

Merged
veronikasamborska1994 merged 30 commits into master from auto-epoch on Feb 27, 2026

Conversation

owidbot (Contributor) commented Feb 1, 2026

No description provided.

owidbot (Contributor, Author) commented Feb 1, 2026

Quick links (staging server):

Site · Dev Site · Preview · Admin · Wizard · Docs

Login: ssh owid@staging-site-auto-epoch

chart-diff: ✅
  • 4/4 reviewed charts
  • Modified: 4/4
  • New: 0/0
  • Rejected: 0
  • Data changes: 3
  • Metadata changes: 3
data-diff: ❌ Found differences
= Dataset garden/artificial_intelligence/2025-03-12/epoch
  ~ Table epoch (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Dim days_since_1949
+       + New values: 9 / 981 (0.92%)
                           model  days_since_1949
                   GPT-3.5 Turbo            27191
          GPT-3.5 Turbo Instruct            27298
                          o1-pro            27836
                   GPT-5.2 Codex            28110
                    MiniMax-M2.1            28115
-       - Removed values: 4 / 981 (0.41%)
                                        model  days_since_1949
                      GPT-3.5 (davinci-002)\n            26994
                                GPT-3.5 Turbo            26996
                            GPT-4o (May 2024)            27526
          Qwen3-235B-A22B-Instruct (Jul 2025)            27964
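The "New values" / "Removed values" listings above amount to a set difference on the table's key dimensions. A minimal pandas sketch of that check, using toy data drawn from the rows above (this is an illustration, not OWID's actual diff tooling):

```python
import pandas as pd

# Toy before/after snapshots keyed by (model, days_since_1949);
# the real tables have ~981 rows.
old = pd.DataFrame({
    "model": ["GPT-3.5 Turbo", "GPT-4o (May 2024)"],
    "days_since_1949": [26996, 27526],
})
new = pd.DataFrame({
    "model": ["GPT-3.5 Turbo", "o1-pro"],
    "days_since_1949": [27191, 27836],
})

old_keys = set(zip(old["model"], old["days_since_1949"]))
new_keys = set(zip(new["model"], new["days_since_1949"]))

added = sorted(new_keys - old_keys)    # key combinations only in the new snapshot
removed = sorted(old_keys - new_keys)  # key combinations only in the old snapshot
print(f"New values: {len(added)} / {len(new)}")
print(f"Removed values: {len(removed)} / {len(old)}")
```

Note that "GPT-3.5 Turbo" appears on both sides but with a different `days_since_1949`, so it shows up as both a removed and a new key — exactly the pattern visible in the tables above when a model's publication date is corrected.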
    ~ Dim model
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model
                     27191          GPT-3.5 Turbo
                     27298 GPT-3.5 Turbo Instruct
                     27836                 o1-pro
                     28110          GPT-5.2 Codex
                     28115           MiniMax-M2.1
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model
                     26994             GPT-3.5 (davinci-002)\n
                     26996                       GPT-3.5 Turbo
                     27526                   GPT-4o (May 2024)
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)
    ~ Column domain (new data, changed data)
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model           domain
                     27191          GPT-3.5 Turbo         Language
                     27298 GPT-3.5 Turbo Instruct         Language
                     27836                 o1-pro Multiple domains
                     28110          GPT-5.2 Codex         Language
                     28115           MiniMax-M2.1         Language
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model           domain
                     26994             GPT-3.5 (davinci-002)\n         Language
                     26996                       GPT-3.5 Turbo         Language
                     27526                   GPT-4o (May 2024) Multiple domains
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)         Language
    ~ Column organization_categorization (new data, changed data)
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model organization_categorization
                     27191          GPT-3.5 Turbo                    Industry
                     27298 GPT-3.5 Turbo Instruct                    Industry
                     27836                 o1-pro                    Industry
                     28110          GPT-5.2 Codex                    Industry
                     28115           MiniMax-M2.1                    Industry
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model organization_categorization
                     26994             GPT-3.5 (davinci-002)\n                    Industry
                     26996                       GPT-3.5 Turbo                    Industry
                     27526                   GPT-4o (May 2024)                    Industry
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                    Industry
    ~ Column parameters (new data, changed data)
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model   parameters
                     27191          GPT-3.5 Turbo  20000000000
                     27298 GPT-3.5 Turbo Instruct  20000000000
                     27836                 o1-pro         <NA>
                     28110          GPT-5.2 Codex         <NA>
                     28115           MiniMax-M2.1 229000000000
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model   parameters
                     26994             GPT-3.5 (davinci-002)\n         <NA>
                     26996                       GPT-3.5 Turbo  20000000000
                     27526                   GPT-4o (May 2024)         <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025) 235000000000
    ~ Column publication_date (new data, changed data)
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model publication_date
                     27191          GPT-3.5 Turbo       2023-06-13
                     27298 GPT-3.5 Turbo Instruct       2023-09-28
                     27836                 o1-pro       2025-03-19
                     28110          GPT-5.2 Codex       2025-12-18
                     28115           MiniMax-M2.1       2025-12-23
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model publication_date
                     26994             GPT-3.5 (davinci-002)\n       2022-11-28
                     26996                       GPT-3.5 Turbo       2022-11-30
                     27526                   GPT-4o (May 2024)       2024-05-13
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)       2025-07-25
    ~ Column training_computation_petaflop (new data, changed data)
+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model  training_computation_petaflop
                     27191          GPT-3.5 Turbo                           <NA>
                     27298 GPT-3.5 Turbo Instruct                           <NA>
                     27836                 o1-pro                           <NA>
                     28110          GPT-5.2 Codex                           <NA>
                     28115           MiniMax-M2.1                           <NA>
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model  training_computation_petaflop
                     26994             GPT-3.5 (davinci-002)\n                   2577999872.0
                     26996                       GPT-3.5 Turbo                           <NA>
                     27526                   GPT-4o (May 2024)                           <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                   4752000000.0
        ~ Changed values: 3 / 981 (0.31%)
           days_since_1949          model  training_computation_petaflop -  training_computation_petaflop +
                     27751    DeepSeek-V3                     3407800064.0                     3300000000.0
                     27828 Hunyuan-TurboS                             <NA>                     5400000000.0
                     28082         Olmo 3                             <NA>                     1100000000.0
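The "Changed values" rows are a join of the two snapshots on the key, keeping rows where the old and new values disagree (with a transition from NA to a value counted as a change). A rough pandas sketch, using the DeepSeek-V3 / Hunyuan-TurboS / Olmo 3 rows above as toy data (again an illustration, not the actual tooling):

```python
import pandas as pd

before = pd.DataFrame({
    "model": ["DeepSeek-V3", "Hunyuan-TurboS", "Olmo 3"],
    "training_computation_petaflop": [3407800064.0, None, None],
}).set_index("model")
after = pd.DataFrame({
    "model": ["DeepSeek-V3", "Hunyuan-TurboS", "Olmo 3"],
    "training_computation_petaflop": [3300000000.0, 5400000000.0, 1100000000.0],
}).set_index("model")

# Suffixes mirror the "- / +" column headers in the diff output.
joined = before.join(after, lsuffix=" -", rsuffix=" +")

# .ne() treats NaN as unequal to everything, so NA -> value counts as changed.
changed = joined[joined["training_computation_petaflop -"].ne(
    joined["training_computation_petaflop +"])]
print(f"Changed values: {len(changed)} / {len(joined)}")
```

With strict `.ne()` semantics a row where both sides are NaN would also be flagged; a real diff would first drop rows where both sides are missing, or compare with a NaN-aware equality.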
    ~ Column training_dataset_size__gradients (changed metadata, new data, changed data)
+       + {}
-       - title: Training dataset size
-       - description_short: |-
-       -   The number of unique data points used to train the model. Each domain has a specific data point unit; for example, for vision it is images, for language it is words, and for games it is timesteps. This means systems can only be compared directly within the same domain.
-       - description_key:
-       -   - |-
-       -     Training data size measures the volume of unique examples used to train an AI model during its learning phase. It represents the total number of distinct data points the model learns from, counted only once regardless of how many times they're seen during training.
-       -   - |-
-       -     To understand this concept, imagine teaching someone to identify different bird species. Each unique bird photo you show them is one piece of training data. If you show 100 different photos, your training data size is 100, even if you review those same photos multiple times.
-       -   - |-
-       -     Since datasets vary by domain, there's no universal unit for measuring size. Text models might count tokens, image models count pictures, and video models count clips. Epoch AI typically uses the smallest unit that triggers a model update during training. For language models that predict the next word, this would be individual tokens.
-       -   - |-
-       -     Training data size directly impacts model performance. Larger datasets enable deeper learning and more nuanced pattern recognition, allowing models to identify subtle distinctions and handle diverse real-world scenarios more effectively.
-       - unit: unique datapoints
-       - display:
-       -   numDecimalPlaces: 0
-       -   zeroDay: '1949-01-01'
-       -   yearIsDay: true
-       - processing_level: major
-       - presentation:
-       -   topic_tags:
-       -     - Artificial Intelligence

+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model  training_dataset_size__gradients
                     27191          GPT-3.5 Turbo                               NaN
                     27298 GPT-3.5 Turbo Instruct                               NaN
                     27836                 o1-pro                               NaN
                     28110          GPT-5.2 Codex                               NaN
                     28115           MiniMax-M2.1                               NaN
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model  training_dataset_size__gradients
                     26994             GPT-3.5 (davinci-002)\n                              <NA>
                     26996                       GPT-3.5 Turbo                              <NA>
                     27526                   GPT-4o (May 2024)                              <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                    36000000000000
        ~ Changed values: 656 / 981 (66.87%)
           days_since_1949                  model  training_dataset_size__gradients -  training_dataset_size__gradients +
                     12661                ASE+ACE                              500000                                 NaN
                     21735 Denoising Autoencoders                             7840000                                 NaN
                     25017       NoisyNet-Dueling                           320000000                                 NaN
                     26792                    UL2                       1000000000000                                 NaN
                     27877        Qwen3-235B-A22B                      36000000000000                                 NaN
    ~ Column training_dataset_size__total (changed metadata, new data, changed data)
-       - {}
+       + title: Training dataset size
+       + description_short: |-
+       +   The number of unique data points used to train the model. Each domain has a specific data point unit; for example, for vision it is images, for language it is words, and for games it is timesteps. This means systems can only be compared directly within the same domain.
+       + description_key:
+       +   - |-
+       +     Training data size measures the volume of unique examples used to train an AI model during its learning phase. It represents the total number of distinct data points the model learns from, counted only once regardless of how many times they're seen during training.
+       +   - |-
+       +     To understand this concept, imagine teaching someone to identify different bird species. Each unique bird photo you show them is one piece of training data. If you show 100 different photos, your training data size is 100, even if you review those same photos multiple times.
+       +   - |-
+       +     Since datasets vary by domain, there's no universal unit for measuring size. Text models might count tokens, image models count pictures, and video models count clips. Epoch AI typically uses the smallest unit that triggers a model update during training. For language models that predict the next word, this would be individual tokens.
+       +   - |-
+       +     Training data size directly impacts model performance. Larger datasets enable deeper learning and more nuanced pattern recognition, allowing models to identify subtle distinctions and handle diverse real-world scenarios more effectively.
+       + unit: unique datapoints
+       + display:
+       +   numDecimalPlaces: 0
+       +   zeroDay: '1949-01-01'
+       +   yearIsDay: true
+       + processing_level: major
+       + presentation:
+       +   topic_tags:
+       +     - Artificial Intelligence

+       + New values: 9 / 981 (0.92%)
           days_since_1949                  model  training_dataset_size__total
                     27191          GPT-3.5 Turbo                          <NA>
                     27298 GPT-3.5 Turbo Instruct                          <NA>
                     27836                 o1-pro                          <NA>
                     28110          GPT-5.2 Codex                          <NA>
                     28115           MiniMax-M2.1                          <NA>
-       - Removed values: 4 / 981 (0.41%)
           days_since_1949                               model  training_dataset_size__total
                     26994             GPT-3.5 (davinci-002)\n                           NaN
                     26996                       GPT-3.5 Turbo                           NaN
                     27526                   GPT-4o (May 2024)                           NaN
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                           NaN
        ~ Changed values: 656 / 981 (66.87%)
           days_since_1949                  model  training_dataset_size__total -  training_dataset_size__total +
                     12661                ASE+ACE                             NaN                          500000
                     21735 Denoising Autoencoders                             NaN                         7840000
                     25017       NoisyNet-Dueling                             NaN                       320000000
                     26792                    UL2                             NaN                   1000000000000
                     27877        Qwen3-235B-A22B                             NaN                  36000000000000
= Dataset garden/artificial_intelligence/2025-03-12/epoch_aggregates_affiliation
  ~ Table epoch_aggregates_affiliation (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Column cumulative_count (changed metadata, changed data)
-       -   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 4 / 305 (1.31%)
           year organization_categorization  cumulative_count -  cumulative_count +
           2022                    Industry                 284                 283
           2023                    Industry                 351                 352
           2024                    Industry                 432                 433
           2025                    Industry                 515                 520
    ~ Column yearly_count (changed metadata, changed data)
-       -   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 3 / 305 (0.98%)
           year organization_categorization  yearly_count -  yearly_count +
           2022                    Industry              51              50
           2023                    Industry              67              69
           2025                    Industry              83              87
= Dataset garden/artificial_intelligence/2025-03-12/epoch_aggregates_domain
  ~ Table epoch_aggregates_domain (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Column cumulative_count (changed metadata, changed data)
-       -   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 6 / 793 (0.76%)
           year     domain  cumulative_count -  cumulative_count +
           2022   Language                 255                 254
           2023   Language                 329                 330
           2024   Language                 406                 407
           2025   Language                 484                 489
           2025 Multimodal                 103                 104
    ~ Column yearly_count (changed metadata, changed data)
-       -   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 5 / 793 (0.63%)
           year      domain  yearly_count -  yearly_count +
           2022    Language              48              47
           2023    Language              74              76
           2025    Language              78              82
           2025 Mathematics               3               4
           2025  Multimodal              34              35
= Dataset garden/artificial_intelligence/2025-03-12/epoch_compute_intensive
  ~ Table epoch_compute_intensive (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Dim days_since_1949
+       + New values: 9 / 485 (1.86%)
                               model  days_since_1949
                       GPT-3.5 Turbo            27191
              GPT-3.5 Turbo Instruct            27298
                              o1-pro            27836
          Qwen3-235B-A22B (Jul 2025)            27964
                              Olmo 3            28082
-       - Removed values: 8 / 485 (1.65%)
                                        model  days_since_1949
                                GPT-3.5 Turbo            26996
                            GPT-4o (May 2024)            27526
                  Gemini 2.5 Flash (Apr 2025)            27865
          Qwen3-235B-A22B-Instruct (Jul 2025)            27964
                  Gemini 2.5 Flash (Sep 2025)            28026
    ~ Dim model
+       + New values: 9 / 485 (1.86%)
           days_since_1949                      model
                     27191              GPT-3.5 Turbo
                     27298     GPT-3.5 Turbo Instruct
                     27836                     o1-pro
                     27964 Qwen3-235B-A22B (Jul 2025)
                     28082                     Olmo 3
-       - Removed values: 8 / 485 (1.65%)
           days_since_1949                               model
                     26996                       GPT-3.5 Turbo
                     27526                   GPT-4o (May 2024)
                     27865         Gemini 2.5 Flash (Apr 2025)
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)
                     28026         Gemini 2.5 Flash (Sep 2025)
    ~ Column domain (new data, changed data)
+       + New values: 9 / 485 (1.86%)
           days_since_1949                      model                          domain
                     27191              GPT-3.5 Turbo                        Language
                     27298     GPT-3.5 Turbo Instruct                        Language
                     27836                     o1-pro Language,Mathematics,Multimodal
                     27964 Qwen3-235B-A22B (Jul 2025)                        Language
                     28082                     Olmo 3                        Language
-       - Removed values: 8 / 485 (1.65%)
           days_since_1949                               model                                  domain
                     26996                       GPT-3.5 Turbo                                Language
                     27526                   GPT-4o (May 2024) Multimodal,Language,Audio,Speech,Vision
                     27865         Gemini 2.5 Flash (Apr 2025) Language,Multimodal,Vision,Speech,Video
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                                Language
                     28026         Gemini 2.5 Flash (Sep 2025) Language,Multimodal,Vision,Speech,Video
    ~ Column parameters (new data, changed data)
+       + New values: 9 / 485 (1.86%)
           days_since_1949                      model   parameters
                     27191              GPT-3.5 Turbo  20000000000
                     27298     GPT-3.5 Turbo Instruct  20000000000
                     27836                     o1-pro         <NA>
                     27964 Qwen3-235B-A22B (Jul 2025) 235000000000
                     28082                     Olmo 3  32000000000
-       - Removed values: 8 / 485 (1.65%)
           days_since_1949                               model   parameters
                     26996                       GPT-3.5 Turbo  20000000000
                     27526                   GPT-4o (May 2024)         <NA>
                     27865         Gemini 2.5 Flash (Apr 2025)         <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025) 235000000000
                     28026         Gemini 2.5 Flash (Sep 2025)         <NA>
    ~ Column publication_date (new data, changed data)
+       + New values: 9 / 485 (1.86%)
           days_since_1949                      model publication_date
                     27191              GPT-3.5 Turbo       2023-06-13
                     27298     GPT-3.5 Turbo Instruct       2023-09-28
                     27836                     o1-pro       2025-03-19
                     27964 Qwen3-235B-A22B (Jul 2025)       2025-07-25
                     28082                     Olmo 3       2025-11-20
-       - Removed values: 8 / 485 (1.65%)
           days_since_1949                               model publication_date
                     26996                       GPT-3.5 Turbo       2022-11-30
                     27526                   GPT-4o (May 2024)       2024-05-13
                     27865         Gemini 2.5 Flash (Apr 2025)       2025-04-17
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)       2025-07-25
                     28026         Gemini 2.5 Flash (Sep 2025)       2025-09-25
    ~ Column training_computation_petaflop (new data, changed data)
+       + New values: 9 / 485 (1.86%)
           days_since_1949                      model  training_computation_petaflop
                     27191              GPT-3.5 Turbo                           <NA>
                     27298     GPT-3.5 Turbo Instruct                           <NA>
                     27836                     o1-pro                           <NA>
                     27964 Qwen3-235B-A22B (Jul 2025)                   4752000000.0
                     28082                     Olmo 3                   1100000000.0
-       - Removed values: 8 / 485 (1.65%)
           days_since_1949                               model  training_computation_petaflop
                     26996                       GPT-3.5 Turbo                           <NA>
                     27526                   GPT-4o (May 2024)                           <NA>
                     27865         Gemini 2.5 Flash (Apr 2025)                           <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                   4752000000.0
                     28026         Gemini 2.5 Flash (Sep 2025)                           <NA>
        ~ Changed values: 2 / 485 (0.41%)
           days_since_1949          model  training_computation_petaflop -  training_computation_petaflop +
                     27751    DeepSeek-V3                     3407800064.0                     3300000000.0
                     27828 Hunyuan-TurboS                             <NA>                     5400000000.0
= Dataset garden/artificial_intelligence/2025-03-12/epoch_compute_intensive_countries
  ~ Table epoch_compute_intensive_countries (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Column cumulative_count (changed metadata, changed data)
-       -   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 12 / 154 (7.79%)
           year                    country  cumulative_count -  cumulative_count +
           2023 All large-scale AI systems                 163                 164
           2023              United States                  73                  74
           2024             United Kingdom                  22                   8
           2025             United Kingdom                  32                   8
           2025              United States                 214                 213
    ~ Column yearly_count (changed metadata, changed data)
-       -   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 9 / 154 (5.84%)
           year                    country  yearly_count -  yearly_count +
           2022              United States              23              22
           2023 All large-scale AI systems             119             121
           2023              United States              46              48
           2025             United Kingdom              10               0
           2025              United States              60              58
= Dataset garden/artificial_intelligence/2025-03-12/epoch_compute_intensive_domain
  ~ Table epoch_compute_intensive_domain (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Column cumulative_count (changed metadata, changed data)
-       -   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 13 / 91 (14.29%)
           year                     domain  cumulative_count -  cumulative_count +
           2023 All large-scale AI systems                 163                 164
           2024 All large-scale AI systems                 331                 332
           2025 All large-scale AI systems                 484                 485
           2025                     Speech                  30                  27
           2025                      Video                  64                  61
    ~ Column yearly_count (changed metadata, changed data)
-       -   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2026 data is incomplete and was last updated 26 January 2026.
+       +   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2026 data is incomplete and was last updated 27 February 2026.

        ~ Changed values: 9 / 91 (9.89%)
           year                     domain  yearly_count -  yearly_count +
           2022                   Language              24              23
           2023 All large-scale AI systems             119             121
           2025                Mathematics               1               2
           2025                      Video              35              32
           2025                     Vision              51              48
= Dataset garden/artificial_intelligence/2025-03-12/epoch_regressions
  ~ Table epoch_regressions (changed metadata)
-     -     date_accessed: '2026-01-26'
+     +     date_accessed: '2026-02-27'
    ~ Dim days_since_1949
+       + New values: 10 / 993 (1.01%)
                                model  days_since_1949
                        GPT-3.5 Turbo            27191
               GPT-3.5 Turbo Instruct            27298
          4.2x/year between 2010–2025            27828
                        GPT-5.2 Codex            28110
                         MiniMax-M2.1            28115
-       - Removed values: 5 / 993 (0.50%)
                                        model  days_since_1949
                      GPT-3.5 (davinci-002)\n            26994
                                GPT-3.5 Turbo            26996
                            GPT-4o (May 2024)            27526
                  4.2x/year between 2010–2025            27823
          Qwen3-235B-A22B-Instruct (Jul 2025)            27964
    ~ Dim model
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model
                     27191               GPT-3.5 Turbo
                     27298      GPT-3.5 Turbo Instruct
                     27828 4.2x/year between 2010–2025
                     28110               GPT-5.2 Codex
                     28115                MiniMax-M2.1
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model
                     26994             GPT-3.5 (davinci-002)\n
                     26996                       GPT-3.5 Turbo
                     27526                   GPT-4o (May 2024)
                     27823         4.2x/year between 2010–2025
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)
    ~ Column domain (new data, changed data)
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model   domain
                     27191               GPT-3.5 Turbo Language
                     27298      GPT-3.5 Turbo Instruct Language
                     27828 4.2x/year between 2010–2025      NaN
                     28110               GPT-5.2 Codex Language
                     28115                MiniMax-M2.1 Language
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model           domain
                     26994             GPT-3.5 (davinci-002)\n         Language
                     26996                       GPT-3.5 Turbo         Language
                     27526                   GPT-4o (May 2024) Multiple domains
                     27823         4.2x/year between 2010–2025              NaN
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)         Language
    ~ Column organization_categorization (new data, changed data)
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model organization_categorization
                     27191               GPT-3.5 Turbo                    Industry
                     27298      GPT-3.5 Turbo Instruct                    Industry
                     27828 4.2x/year between 2010–2025                         NaN
                     28110               GPT-5.2 Codex                    Industry
                     28115                MiniMax-M2.1                    Industry
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model organization_categorization
                     26994             GPT-3.5 (davinci-002)\n                    Industry
                     26996                       GPT-3.5 Turbo                    Industry
                     27526                   GPT-4o (May 2024)                    Industry
                     27823         4.2x/year between 2010–2025                         NaN
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                    Industry
    ~ Column parameters (new data, changed data)
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model      parameters
                     27191               GPT-3.5 Turbo   20000000000.0
                     27298      GPT-3.5 Turbo Instruct   20000000000.0
                     27828 4.2x/year between 2010–2025            <NA>
                     28110               GPT-5.2 Codex            <NA>
                     28115                MiniMax-M2.1  229000003584.0
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model      parameters
                     26994             GPT-3.5 (davinci-002)\n            <NA>
                     26996                       GPT-3.5 Turbo   20000000000.0
                     27526                   GPT-4o (May 2024)            <NA>
                     27823         4.2x/year between 2010–2025            <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)  235000004608.0
        ~ Changed values: 2 / 993 (0.20%)
           days_since_1949                       model   parameters -   parameters +
                     22280 2.0x/year between 2010–2025    372285.1875   370840.96875
                     27828 2.0x/year between 2010–2025  19731761152.0  19801747456.0
    ~ Column publication_date (new data, changed data)
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model publication_date
                     27191               GPT-3.5 Turbo       2023-06-13
                     27298      GPT-3.5 Turbo Instruct       2023-09-28
                     27828 4.2x/year between 2010–2025              NaT
                     28110               GPT-5.2 Codex       2025-12-18
                     28115                MiniMax-M2.1       2025-12-23
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model publication_date
                     26994             GPT-3.5 (davinci-002)\n       2022-11-28
                     26996                       GPT-3.5 Turbo       2022-11-30
                     27526                   GPT-4o (May 2024)       2024-05-13
                     27823         4.2x/year between 2010–2025              NaT
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)       2025-07-25
    ~ Column training_computation_petaflop (new data, changed data)
+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model  training_computation_petaflop
                     27191               GPT-3.5 Turbo                           <NA>
                     27298      GPT-3.5 Turbo Instruct                           <NA>
                     27828 4.2x/year between 2010–2025                    606301504.0
                     28110               GPT-5.2 Codex                           <NA>
                     28115                MiniMax-M2.1                           <NA>
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model  training_computation_petaflop
                     26994             GPT-3.5 (davinci-002)\n                   2577999872.0
                     26996                       GPT-3.5 Turbo                           <NA>
                     27526                   GPT-4o (May 2024)                           <NA>
                     27823         4.2x/year between 2010–2025                    597218112.0
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                   4752000000.0
        ~ Changed values: 4 / 993 (0.40%)
           days_since_1949                       model  training_computation_petaflop -  training_computation_petaflop +
                     22412 4.2x/year between 2010–2025                         0.196284                         0.194301
                     27751                 DeepSeek-V3                     3407800064.0                     3300000000.0
                     27828              Hunyuan-TurboS                             <NA>                     5400000000.0
                     28082                      Olmo 3                             <NA>                     1100000000.0
    ~ Column training_dataset_size__gradients (changed metadata, new data, changed data)
+       + {}
-       - title: Training dataset size
-       - description_short: |-
-       -   The number of unique data points used to train the model. Each domain has a specific data point unit; for example, for vision it is images, for language it is words, and for games it is timesteps. This means systems can only be compared directly within the same domain.
-       - description_key:
-       -   - |-
-       -     Training data size measures the volume of unique examples used to train an AI model during its learning phase. It represents the total number of distinct data points the model learns from, counted only once regardless of how many times they're seen during training.
-       -   - |-
-       -     To understand this concept, imagine teaching someone to identify different bird species. Each unique bird photo you show them is one piece of training data. If you show 100 different photos, your training data size is 100, even if you review those same photos multiple times.
-       -   - |-
-       -     Since datasets vary by domain, there's no universal unit for measuring size. Text models might count tokens, image models count pictures, and video models count clips. Epoch AI typically uses the smallest unit that triggers a model update during training. For language models that predict the next word, this would be individual tokens.
-       -   - |-
-       -     Training data size directly impacts model performance. Larger datasets enable deeper learning and more nuanced pattern recognition, allowing models to identify subtle distinctions and handle diverse real-world scenarios more effectively.
-       - unit: unique datapoints
-       - display:
-       -   numDecimalPlaces: 0
-       -   zeroDay: '1949-01-01'
-       -   yearIsDay: true
-       - processing_level: major
-       - presentation:
-       -   topic_tags:
-       -     - Artificial Intelligence

+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model  training_dataset_size__gradients
                     27191               GPT-3.5 Turbo                               NaN
                     27298      GPT-3.5 Turbo Instruct                               NaN
                     27828 4.2x/year between 2010–2025                               NaN
                     28110               GPT-5.2 Codex                               NaN
                     28115                MiniMax-M2.1                               NaN
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model  training_dataset_size__gradients
                     26994             GPT-3.5 (davinci-002)\n                              <NA>
                     26996                       GPT-3.5 Turbo                              <NA>
                     27526                   GPT-4o (May 2024)                              <NA>
                     27823         4.2x/year between 2010–2025                              <NA>
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                  36000000638976.0
        ~ Changed values: 660 / 993 (66.47%)
           days_since_1949                                    model  training_dataset_size__gradients -  training_dataset_size__gradients +
                     11893                          Kohonen network                              4000.0                                 NaN
                     22537                Pooling CNN (Caltech 101)                              3060.0                                 NaN
                     25443 (ensemble): AWD-LSTM-DOC (fin) × 5 (WT2)                           2000000.0                                 NaN
                     26980                                   EVA-01                        7577600000.0                                 NaN
                     27057                        BLIP-2 (Q-Former)                        2321999872.0                                 NaN
    ~ Column training_dataset_size__total (changed metadata, new data, changed data)
-       - {}
+       + title: Training dataset size
+       + description_short: |-
+       +   The number of unique data points used to train the model. Each domain has a specific data point unit; for example, for vision it is images, for language it is words, and for games it is timesteps. This means systems can only be compared directly within the same domain.
+       + description_key:
+       +   - |-
+       +     Training data size measures the volume of unique examples used to train an AI model during its learning phase. It represents the total number of distinct data points the model learns from, counted only once regardless of how many times they're seen during training.
+       +   - |-
+       +     To understand this concept, imagine teaching someone to identify different bird species. Each unique bird photo you show them is one piece of training data. If you show 100 different photos, your training data size is 100, even if you review those same photos multiple times.
+       +   - |-
+       +     Since datasets vary by domain, there's no universal unit for measuring size. Text models might count tokens, image models count pictures, and video models count clips. Epoch AI typically uses the smallest unit that triggers a model update during training. For language models that predict the next word, this would be individual tokens.
+       +   - |-
+       +     Training data size directly impacts model performance. Larger datasets enable deeper learning and more nuanced pattern recognition, allowing models to identify subtle distinctions and handle diverse real-world scenarios more effectively.
+       + unit: unique datapoints
+       + display:
+       +   numDecimalPlaces: 0
+       +   zeroDay: '1949-01-01'
+       +   yearIsDay: true
+       + processing_level: major
+       + presentation:
+       +   topic_tags:
+       +     - Artificial Intelligence

+       + New values: 10 / 993 (1.01%)
           days_since_1949                       model  training_dataset_size__total
                     27191               GPT-3.5 Turbo                          <NA>
                     27298      GPT-3.5 Turbo Instruct                          <NA>
                     27828 4.2x/year between 2010–2025                          <NA>
                     28110               GPT-5.2 Codex                          <NA>
                     28115                MiniMax-M2.1                          <NA>
-       - Removed values: 5 / 993 (0.50%)
           days_since_1949                               model  training_dataset_size__total
                     26994             GPT-3.5 (davinci-002)\n                           NaN
                     26996                       GPT-3.5 Turbo                           NaN
                     27526                   GPT-4o (May 2024)                           NaN
                     27823         4.2x/year between 2010–2025                           NaN
                     27964 Qwen3-235B-A22B-Instruct (Jul 2025)                           NaN
        ~ Changed values: 660 / 993 (66.47%)
           days_since_1949                                    model  training_dataset_size__total -  training_dataset_size__total +
                     11893                          Kohonen network                             NaN                          4000.0
                     22537                Pooling CNN (Caltech 101)                             NaN                          3060.0
                     25443 (ensemble): AWD-LSTM-DOC (fin) × 5 (WT2)                             NaN                       2000000.0
                     26980                                   EVA-01                             NaN                    7577600000.0
                     27057                        BLIP-2 (Q-Former)                             NaN                    2321999872.0
= Dataset garden/artificial_intelligence/2025-10-10/epoch_gpus
  ~ Table epoch_gpus (changed metadata)
-     -     date_accessed: '2025-10-10'
+     +     date_accessed: '2026-02-27'
    ~ Dim days_since_2000
+       + New values: 7 / 116 (6.03%)
                           hardware_name  days_since_2000
          NVIDIA GeForce GTX Titan Black             5162
                            NVIDIA GB200             9177
                     AMD Instinct MI350X             9294
                     AMD Instinct MI355X             9294
                        Amazon Trainium3             9467
-       - Removed values: 4 / 116 (3.45%)
                        hardware_name  days_since_2000
               NVIDIA GTX Titan Black             5162
          NVIDIA GB200 NVL2 (per GPU)             9177
                          NVIDIA B300             9358
               NVIDIA Blackwell Ultra             9365
    ~ Dim hardware_name
+       + New values: 7 / 116 (6.03%)
           days_since_2000                  hardware_name
                      5162 NVIDIA GeForce GTX Titan Black
                      9177                   NVIDIA GB200
                      9294            AMD Instinct MI350X
                      9294            AMD Instinct MI355X
                      9467               Amazon Trainium3
-       - Removed values: 4 / 116 (3.45%)
           days_since_2000               hardware_name
                      5162      NVIDIA GTX Titan Black
                      9177 NVIDIA GB200 NVL2 (per GPU)
                      9358                 NVIDIA B300
                      9365      NVIDIA Blackwell Ultra
    ~ Column comp_performance_per_dollar (new data, changed data)
+       + New values: 7 / 116 (6.03%)
           days_since_2000                  hardware_name  comp_performance_per_dollar
                      5162 NVIDIA GeForce GTX Titan Black                   4263595166
                      9177                   NVIDIA GB200                         <NA>
                      9294            AMD Instinct MI350X                         <NA>
                      9294            AMD Instinct MI355X                         <NA>
                      9467               Amazon Trainium3                         <NA>
-       - Removed values: 4 / 116 (3.45%)
           days_since_2000               hardware_name  comp_performance_per_dollar
                      5162      NVIDIA GTX Titan Black                   4263595166
                      9177 NVIDIA GB200 NVL2 (per GPU)                         <NA>
                      9358                 NVIDIA B300                         <NA>
                      9365      NVIDIA Blackwell Ultra                         <NA>
        ~ Changed values: 1 / 116 (0.86%)
           days_since_2000         hardware_name  comp_performance_per_dollar -  comp_performance_per_dollar +
                      8298 NVIDIA H100 SXM5 80GB                     1403078342                     1857837012
    ~ Column manufacturer (new data, changed data)
+       + New values: 7 / 116 (6.03%)
           days_since_2000                  hardware_name manufacturer
                      5162 NVIDIA GeForce GTX Titan Black       NVIDIA
                      9177                   NVIDIA GB200       NVIDIA
                      9294            AMD Instinct MI350X          AMD
                      9294            AMD Instinct MI355X          AMD
                      9467               Amazon Trainium3   Amazon AWS
-       - Removed values: 4 / 116 (3.45%)
           days_since_2000               hardware_name manufacturer
                      5162      NVIDIA GTX Titan Black       NVIDIA
                      9177 NVIDIA GB200 NVL2 (per GPU)       NVIDIA
                      9358                 NVIDIA B300       NVIDIA
                      9365      NVIDIA Blackwell Ultra       NVIDIA
= Dataset garden/artificial_intelligence/2026-01-30/frontiermath
  ~ Table epoch_benchmark_data (changed metadata)
-     -     date_accessed: '2026-01-30'
+     +     date_accessed: '2026-02-27'
    ~ Dim release_date
+       + New values: 9 / 83 (10.84%)
                   model_version release_date
                       Kimi K2P5   2026-01-27
                   Claude Opus 4   2026-02-05
              Claude Opus 4, 64K   2026-02-05
            Claude Sonnet 4, 16K   2026-02-17
          Gemini 3.1 Pro preview   2026-02-19
    ~ Dim model_version
+       + New values: 9 / 83 (10.84%)
          release_date          model_version
            2026-01-27              Kimi K2P5
            2026-02-05          Claude Opus 4
            2026-02-05     Claude Opus 4, 64K
            2026-02-17   Claude Sonnet 4, 16K
            2026-02-19 Gemini 3.1 Pro preview
    ~ Column mean_score (new data)
+       + New values: 9 / 83 (10.84%)
          release_date          model_version  mean_score
            2026-01-27              Kimi K2P5   27.900002
            2026-02-05          Claude Opus 4   38.275864
            2026-02-05     Claude Opus 4, 64K   39.655174
            2026-02-17   Claude Sonnet 4, 16K   32.400002
            2026-02-19 Gemini 3.1 Pro preview   36.899998


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2026-02-27 14:40:29 UTC
Execution time: 5.86 seconds

@veronikasamborska1994 veronikasamborska1994 force-pushed the auto-epoch branch 2 times, most recently from e15626d to ffe1618 Compare February 24, 2026 09:44
@veronikasamborska1994 veronikasamborska1994 merged commit 756c1c9 into master Feb 27, 2026
4 of 5 checks passed
@veronikasamborska1994 veronikasamborska1994 deleted the auto-epoch branch February 27, 2026 14:41