arthur0211/Evals-Domain-Expert

LLM Evals for Domain Experts


A standalone, privacy-first LLM evaluation tool for domain experts

Zero installation • No dependencies • Complete privacy • Professional UX



🎯 Overview

LLM Evals for Domain Experts is a sophisticated evaluation interface designed specifically for domain experts who need to assess Large Language Model outputs efficiently and securely. Built as a single, self-contained HTML file, it requires no installation, setup, or external dependencies.

✨ Key Features

  • 🔒 Complete Privacy: All data stays on your local machine - no uploads, no tracking
  • 📱 Zero Installation: Just download and open in any modern browser
  • 🌍 Multilingual Support: English, Portuguese, and Spanish interfaces
  • ⚡ Professional UX: Keyboard shortcuts, batch operations, and streamlined workflows
  • 📊 Rich Analytics: Built-in statistics, approval rates, and performance metrics
  • 💾 Smart Export: CSV/JSON export with customizable separators and metadata
  • 🎨 Modern UI: Dark/light themes with responsive design for all screen sizes

🚀 Quick Start

Download and Run

  1. Download llm-evals-standalone.html
  2. Double-click to open in your browser
  3. Import your CSV file with input and llm_output columns
  4. Start evaluating immediately!

No Installation Required ✅

  • No Python, Node.js, or package managers needed
  • No server setup or configuration
  • No internet connection required after download
  • Works on Windows, macOS, Linux, and mobile devices

📋 How It Works

1. Import Your Data

  • Upload any CSV file with input and llm_output columns
  • Supports files up to 2MB with up to 500 items
  • Auto-detects previously exported evaluations for continuity
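As an illustrative sketch (a hypothetical helper, not the tool's internal code), the required-column check performed on import could look like this:

```javascript
// Hypothetical sketch: verify that a CSV header row carries the two
// required columns before starting an evaluation session.
function validateHeader(headerRow) {
  const required = ["input", "llm_output"];
  const columns = headerRow.split(",").map((c) => c.trim().toLowerCase());
  const missing = required.filter((c) => !columns.includes(c));
  return { ok: missing.length === 0, missing };
}

// A header with extra, auto-detected columns still passes:
validateHeader("input,llm_output,evaluation,labels");
// → { ok: true, missing: [] }
```

Extra columns (such as previously exported evaluations) are simply carried along rather than rejected.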

2. Evaluate Efficiently

  • Approve/Reject: Quick ✅/❌ decisions with visual feedback
  • Label Classification:
    • Error Labels: Hallucination, Factually Incorrect, Ignored Instructions, etc.
    • Quality Labels: Gold Standard, Accurate & Relevant, Creative & Innovative, etc.
  • Rich Annotations: Add ideal responses and detailed comments
  • Keyboard Shortcuts: Navigate, evaluate, and label without touching the mouse

3. Track Progress

  • Real-time statistics and approval rates
  • Session timing and performance metrics
  • Visual progress indicators
  • Auto-save functionality

4. Export Results

  • Multiple Formats: CSV (recommended) or JSON
  • Custom Separators: Comma, semicolon, pipe, triple-pipe, tab, or custom
  • Rich Metadata: Timestamps, evaluator info, and session statistics
  • Smart Naming: Auto-generated filenames with evaluation summary

🎮 Advanced Features

⌨️ Keyboard Shortcuts

Shortcut       Action
/              Navigate between items
/              Approve / Reject
1-6            Quick label selection
Ctrl+E         Export evaluations
Ctrl+N         New session
G+[number]     Go to specific item
Ctrl+F         Search in data
Ctrl+Z         Undo last action

🔍 Smart Navigation

  • Jump to Item: Go directly to any item number
  • Search Functionality: Find specific content across inputs/outputs
  • Batch Operations: Approve/reject remaining items in bulk
  • Undo System: Reverse recent evaluation decisions

📊 Analytics Dashboard

  • Completion Rates: Track evaluation progress
  • Time Metrics: Average time per item and total session time
  • Quality Distribution: Breakdown of labels and decisions
  • Gold Standard Tracking: Identify exceptional responses
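The dashboard arithmetic behind completion and approval rates can be sketched as follows (field names are assumptions for illustration, not the tool's actual internals):

```javascript
// Hypothetical sketch of the analytics math: each record is assumed to
// store its decision in an `evaluation` field ("approved"/"rejected").
function sessionStats(items) {
  const evaluated = items.filter((i) => i.evaluation);
  const approved = evaluated.filter((i) => i.evaluation === "approved");
  return {
    completionRate: items.length ? evaluated.length / items.length : 0,
    approvalRate: evaluated.length ? approved.length / evaluated.length : 0,
  };
}

sessionStats([
  { evaluation: "approved" },
  { evaluation: "rejected" },
  {}, // not yet evaluated
]);
// → completionRate ≈ 0.667, approvalRate = 0.5
```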

🌐 Internationalization

Full interface localization in three languages:

  • 🇺🇸 English: Default interface language
  • 🇧🇷 Portuguese: Complete Brazilian Portuguese translation
  • 🇪🇸 Spanish: Full Spanish interface support

Language switching is instant and preserves all evaluation progress.


🔒 Privacy & Security

Complete Data Privacy

  • No Server Communication: Everything runs locally in your browser
  • No Data Upload: Your evaluations never leave your machine
  • No Tracking: Zero analytics, cookies, or user tracking
  • Offline Capable: Works without internet connection

Security Benefits

  • Air-Gapped Operation: Perfect for sensitive or confidential data
  • GDPR Compliant: No personal data collection or processing
  • Enterprise Safe: No external dependencies or security risks
  • Audit Friendly: Single file makes security review trivial

💼 Perfect for Domain Experts

Why Domain Experts Love This Tool

  • 🧠 Cognitive Scientists: Evaluate reasoning and logical consistency
  • 📚 Content Specialists: Assess accuracy and domain knowledge
  • 🎯 Product Managers: Review user experience and feature requests
  • 🔬 Researchers: Conduct systematic LLM capability studies
  • 👩‍💼 Business Analysts: Evaluate commercial viability of AI responses
  • 🎓 Educators: Grade and assess AI-generated educational content

Professional Workflows

  • Blind Evaluation: Evaluate without bias using randomized presentation
  • Inter-Rater Reliability: Multiple experts can evaluate the same dataset
  • Longitudinal Studies: Track model improvements over time
  • A/B Testing: Compare different models or prompts
  • Quality Assurance: Systematic review of production AI outputs

📈 Use Cases

🏢 Enterprise

  • Model Comparison: Evaluate different LLMs for specific use cases
  • Quality Control: Systematic review of AI-generated content
  • Performance Monitoring: Track model degradation or improvement
  • Compliance Checking: Ensure AI outputs meet regulatory standards

🔬 Research

  • Academic Studies: Systematic evaluation for research papers
  • Benchmark Creation: Build custom evaluation datasets
  • Capability Assessment: Test specific model capabilities
  • Error Analysis: Identify patterns in model failures

🎯 Product Development

  • User Acceptance Testing: Evaluate AI features with domain experts
  • Feature Validation: Test new AI capabilities before release
  • Customer Feedback: Structure evaluation of user-reported issues
  • Competitive Analysis: Compare your AI against competitors

🛠️ Technical Specifications

Browser Compatibility

  • Chrome/Edge: Full feature support including custom scrollbars
  • Firefox: Complete functionality with standard scrollbars
  • Safari: Full compatibility on macOS and iOS
  • Mobile Browsers: Responsive design for tablet and phone evaluation

Performance

  • File Size: ~4MB standalone file (includes all dependencies)
  • Memory Usage: Optimized for large datasets (up to the 500-item import limit)
  • Load Time: Instant startup, no network requests
  • Responsiveness: Smooth interactions even with large datasets

Data Format Requirements

input,llm_output
"Your question or prompt here","AI model response here"
"Another input","Another response"

Optional columns (automatically detected):

  • evaluation: Previous evaluation status
  • labels: Previous label classifications
  • ideal_output: Previous ideal response annotations
  • comments: Previous evaluation comments

🔧 Customization

CSV Export Options

  • Separators: Comma, semicolon, pipe, triple-pipe (|||), tab, or custom
  • Metadata: Optional timestamps and session statistics
  • Filtering: Export only evaluated items or include all
  • Naming: Automatic filename generation with evaluation summary
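A custom separator interacts with field quoting: any field containing the separator itself must be wrapped in quotes. A minimal sketch of separator-aware quoting in RFC 4180 style (a hypothetical helper, not the tool's actual export routine):

```javascript
// Sketch of separator-aware CSV quoting: wrap fields that contain the
// separator, a quote, or a newline; double any embedded quotes.
function toCsvRow(fields, sep = ",") {
  return fields
    .map((f) => {
      const s = String(f ?? "");
      return s.includes(sep) || /["\n]/.test(s)
        ? '"' + s.replace(/"/g, '""') + '"'
        : s;
    })
    .join(sep);
}

toCsvRow(["plain", 'has "quotes"', "a|b"], "|");
// → 'plain|"has ""quotes"""|"a|b"'
```

The same function handles the triple-pipe (|||) or tab separators unchanged, since it only tests for the separator string it is given.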

Interface Customization

  • Themes: Professional light and dark modes
  • Font Scaling: Adjustable text size (80% to 150%)
  • Language: Switch between English, Portuguese, and Spanish
  • Layout: Responsive design adapts to screen size

📊 Export Formats

CSV Format (Recommended)

input,llm_output,evaluation,labels,ideal_output,comments,evaluation_timestamp
"Input text","Output text","approved","accurate-relevant,well-structured","Ideal response","Great answer","2024-12-30T10:30:00Z"

JSON Format (Advanced)

[
  {
    "input": "Input text",
    "llm_output": "Output text", 
    "evaluation": "approved",
    "labels": "accurate-relevant,well-structured",
    "ideal_output": "Ideal response",
    "comments": "Great answer",
    "evaluation_timestamp": "2024-12-30T10:30:00Z"
  }
]
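The JSON export is convenient for downstream analysis. For example, tallying label frequencies across a session could be sketched like this (a hypothetical post-processing script, assuming the record shape shown above with labels as a comma-separated string):

```javascript
// Hypothetical post-processing sketch: count how often each label
// appears across exported records; `labels` is a comma-separated string.
function labelCounts(records) {
  const counts = {};
  for (const r of records) {
    for (const label of (r.labels || "").split(",").filter(Boolean)) {
      counts[label] = (counts[label] || 0) + 1;
    }
  }
  return counts;
}

labelCounts([
  { labels: "accurate-relevant,well-structured" },
  { labels: "accurate-relevant" },
  { labels: "" },
]);
// → { "accurate-relevant": 2, "well-structured": 1 }
```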

🤝 Contributing

We welcome contributions to make this tool even better for domain experts!

Areas for Contribution

  • New Languages: Add translations for additional languages
  • Label Categories: Suggest domain-specific evaluation labels
  • Export Formats: Add support for specialized export formats
  • UI Improvements: Enhance user experience and accessibility
  • Documentation: Improve guides and examples

How to Contribute

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes to llm-evals-standalone.html
  4. Test thoroughly across different browsers
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License Summary

  • Commercial Use: Use in commercial projects
  • Modification: Modify and adapt to your needs
  • Distribution: Share with colleagues and teams
  • Private Use: Use internally within organizations
  • Liability: No warranty or liability from authors

🆘 Support

Getting Help

  • 📖 Documentation: This README covers most use cases
  • 🐛 Bug Reports: Open an issue with detailed reproduction steps
  • 💡 Feature Requests: Share your ideas for improvements
  • ❓ Questions: Ask in GitHub Discussions

Common Solutions

  • File Won't Load: Ensure your CSV has input and llm_output columns
  • Slow Performance: Try smaller batches (under 500 items) for optimal speed
  • Export Issues: Check that you have evaluation data before exporting
  • Browser Issues: Use Chrome/Edge for the best experience

🙏 Acknowledgments

Built for domain experts who need:

  • Privacy: Complete data control and security
  • Simplicity: No technical barriers to evaluation
  • Efficiency: Professional workflows and keyboard shortcuts
  • Flexibility: Customizable labels and export options
  • Reliability: Offline operation and data integrity

Made with ❤️ for the AI evaluation community


⭐ Star this repository if it helps your evaluation workflows!

Report Bug • Request Feature • Documentation
