Pull requests: openai/evals
Add tnengoy_citations.dev.v0 (model-graded factuality eval)
    #1603 opened Oct 12, 2025 by TheodorNEngoy

Add context poisoning quarantine evaluation for testing prompt injection vulnerabilities
    #1597 opened Aug 16, 2025 by jscaldwell55

Fix AttributeError: Update OpenAI error imports (Closes #1564)
    #1577 opened Jan 27, 2025 by SaiKrishna-KK (6 of 13 tasks)

Fix TypeError in add_token_usage_to_result when non-integer usage data is present
    #1574 opened Jan 4, 2025 by masihmoloodian

Add support for new models (gpt-4o, o1-preview and o1-mini)
    #1558 opened Sep 15, 2024 by sakher

Bugfixing completion stats break with new reasoning tokens release
    #1555 opened Sep 13, 2024 by lucapericlp

Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
    #1551 opened Aug 25, 2024 by RobinWitch (13 tasks done)

Fix the is_chat_model function to work with gpt-4o
    #1550 opened Aug 22, 2024 by LoryPack (3 tasks done)

Added Icelandic QA evaluation data from news texts
    #1548 opened Aug 20, 2024 by thorunna (12 of 13 tasks)

Added Icelandic QA evaluation data from Wikipedia
    #1547 opened Aug 20, 2024 by thorunna (12 of 13 tasks)

Updating make-me-say to be compatible with Solvers
    #1546 opened Aug 18, 2024 by lennart-finke (1 task done)

Fix Information exposure alert through an exception #1543
    #1545 opened Aug 8, 2024 by arpitjain099 (13 tasks done)

Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers
    #1537 opened Jun 24, 2024 by sakher

Update README: Add Langtrace as an Eval vendor
    #1531 opened May 21, 2024 by karthikscale3 (5 of 13 tasks)

Added Quran Eval & Simple Fact Model-Graded Definition
    #1511 opened Apr 1, 2024 by sakher (13 tasks done)