awesome-open-source-llms/details/vicuna.md at master · ever-works/awesome-open-source-llms

Overview

Vicuna is an open-source chatbot model created by fine-tuning LLaMA on user-shared conversations from ShareGPT. Developed by UC Berkeley, CMU, Stanford, and UC San Diego researchers (LMSYS Org), Vicuna demonstrated that high-quality instruction-following models could be created through fine-tuning open base models.

Architecture

Base Models: LLaMA-7B and LLaMA-13B
Fine-tuning Data: 70,000 user-shared ChatGPT conversations
Data Source: ShareGPT repository
Model Type: Instruction-tuned conversational model

Key Features

Achieves 90%+ of ChatGPT quality (GPT-4 evaluation)
Strong conversational abilities
Effective instruction following
Cost-effective to create and deploy
Open-source alternative to proprietary chatbots
Multiple size options (7B, 13B)

Performance Highlights

Quality Assessment

90%+ ChatGPT Quality: According to GPT-4 as judge
Outperforms LLaMA and Stanford Alpaca in 90%+ of cases
Competitive with GPT-3.5-level performance
Strong multi-turn dialogue capabilities

Benchmark Comparisons

Superior to base LLaMA on conversational tasks
Better than Alpaca on complex instructions
Effective across diverse conversation types

Training Methodology

Data Collection

70,000 conversations from ShareGPT
Real user interactions with ChatGPT
Diverse topics and conversation styles
High-quality instruction-response pairs

Fine-tuning Process

Supervised fine-tuning on conversational data
Optimization for multi-turn dialogue
Training to follow instructions effectively
Relatively low-cost fine-tuning process

Model Variants

Vicuna-7B

Based on LLaMA-7B
More efficient, suitable for resource-constrained scenarios
Good performance for size

Vicuna-13B

Based on LLaMA-13B
Better quality and reasoning
More capable in complex conversations
Flagship variant

Historical Significance

Early Success in Open-Source Chatbots

Demonstrated viability of instruction tuning
Showed high-quality data matters more than quantity
Inspired wave of open-source chat models
Proved open models could approach proprietary quality

Influence on the Field

Popularized ShareGPT as training data source
Validated fine-tuning approach for chat models
Contributed to development of evaluation methodologies
Inspired subsequent models (many used similar approaches)

Evaluation Methodology

GPT-4 as a Judge

Used GPT-4 to evaluate response quality
Comparison with ChatGPT and other models
Assessment of instruction following
Analysis of multi-turn conversations

Deployment Options

Self-hosting on consumer or enterprise GPUs
Relatively low resource requirements (7B variant)
Compatible with standard frameworks
Integration with chatbot interfaces
Available through Hugging Face

Use Cases

Conversational AI applications
Research on instruction-tuned models
Educational chatbot applications
Cost-effective alternatives to proprietary APIs
Foundation for further fine-tuning
Academic research and development

LMSYS Organization

Developed by researchers from:

UC Berkeley
Carnegie Mellon University (CMU)
Stanford University
UC San Diego

LMSYS Org continues to advance open-source AI through projects like Chatbot Arena.

Legacy and Modern Context

While newer models have surpassed Vicuna:

Remains historically significant
Demonstrated successful approach still used today
Showed path forward for open-source chatbots
Proved quality could be achieved with modest resources
Inspired ecosystem of instruction-tuned models

Relationship to Other Models

Based on LLaMA

Leveraged Meta's strong base model
Demonstrated value of base model + fine-tuning

Compared with Alpaca

Both built on LLaMA
Vicuna used conversational data (ShareGPT)
Alpaca used synthetic data (Self-Instruct)
Vicuna showed superior conversational abilities

Influenced Subsequent Models

Many models adopted ShareGPT-style training
Validation of GPT-4-as-judge methodology
Template for open-source chat model development

Licensing

Inherits license restrictions from base LLaMA model:

Research and educational use
License terms evolved with LLaMA versions
Check current licensing for specific variants

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overview

Architecture

Key Features

Performance Highlights

Quality Assessment

Benchmark Comparisons

Training Methodology

Data Collection

Fine-tuning Process

Model Variants

Vicuna-7B

Vicuna-13B

Historical Significance

Early Success in Open-Source Chatbots

Influence on the Field

Evaluation Methodology

GPT-4 as a Judge

Deployment Options

Use Cases

LMSYS Organization

Legacy and Modern Context

Relationship to Other Models

Based on LLaMA

Compared with Alpaca

Influenced Subsequent Models

Licensing

FilesExpand file tree

vicuna.md

Latest commit

History

vicuna.md

File metadata and controls

Overview

Architecture

Key Features

Performance Highlights

Quality Assessment

Benchmark Comparisons

Training Methodology

Data Collection

Fine-tuning Process

Model Variants

Vicuna-7B

Vicuna-13B

Historical Significance

Early Success in Open-Source Chatbots

Influence on the Field

Evaluation Methodology

GPT-4 as a Judge

Deployment Options

Use Cases

LMSYS Organization

Legacy and Modern Context

Relationship to Other Models

Based on LLaMA

Compared with Alpaca

Influenced Subsequent Models

Licensing