NLP From Scratch

This is the project repo for 11-711 ANLP Fall 24 Project 2. The specs can be found in the file Info.md.

How to use

First, it is recommended that you use a machine with a CUDA-capable GPU with more than 20 GB of memory and at least 50 GB of free disk space. You also need a LangChain API key and a Hugging Face token that can access Llama-3.1 and Llama-3.2. Then install all the required packages (make sure you are in the root directory of this repo):

pip install -r requirements.txt
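
The disk-space and credential prerequisites above can be checked before a long run with a small script. This is a minimal sketch, not part of the repo: the function name `preflight` is hypothetical, and the environment variable names `HF_TOKEN` and `LANGCHAIN_API_KEY` are assumptions about how the credentials are supplied. Adjust them to match how you actually pass the token and key. GPU memory is not checked here.

```python
import os
import shutil


def preflight(path=".", min_free_gb=50.0,
              required_env=("HF_TOKEN", "LANGCHAIN_API_KEY")):
    """Return a list of human-readable problems; an empty list means all checks passed.

    The environment variable names are assumptions, not names confirmed by this repo.
    """
    problems = []

    # Free disk space on the filesystem that contains `path`, in gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free at {path!r}, {min_free_gb:.0f} GB recommended"
        )

    # Credentials are expected as non-empty environment variables (assumption).
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable {name} is not set")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

An empty result only means these two checks passed; it does not confirm that the token has actually been granted access to the Llama models.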

To run the RAG pipeline, in the root directory, execute

python pipeline/rag_pipeline_new.py \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--dtype float16 \
--embedding_model_name sentence-transformers/all-MiniLM-L6-v2 \
--embedding_dim 384 \
--splitter_type recursive \
--chunk_size 1000 \
--chunk_overlap 200 \
--text_files_path data/crawled/crawled_text_data \
--top_k_search 3 \
--retriever_type FAISS \
--rerank_model_name ms-marco-MiniLM-L-12-v2 \
--hypo False \
--output_file output/baseline_rag.csv

Please see pipeline/rag_pipeline_new.py for a full list of available argument options.
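
To build intuition for how --chunk_size and --chunk_overlap interact, here is a simplified illustration. It is not the pipeline's actual splitter: the recursive splitter also tries to break on natural separators such as paragraphs and sentences, whereas this sketch uses plain fixed-size windows. It also assumes sizes are counted in characters. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows, each sharing
    `chunk_overlap` characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 2500-character dummy document made of repeating digits.
document = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(document)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With the values from the command above, each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next. A larger overlap reduces the chance that a relevant passage is cut in half, at the cost of more chunks to embed and index.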

To run the evaluation, in the root directory, execute

python evaluation/evaluate.py --combined_dir output/baseline_rag.csv --output_dir results/baseline_rag.json

Members

The team members are (ordered by last name, then first name):

Haojun Liu (haojunli)

Qingyang Liu (qliu3)

Chenglin Zhang (chengliz)
