Loreer is an AI assistant designed to provide comprehensive knowledge of fictional lore across movies, series, games, and books. Powered by Large Language Models (LLMs), Loreer uses a Retrieval-Augmented Generation (RAG) system to deliver precise, relevant information. The current RAG system is built on League of Legends (LoL) lore and can accurately answer questions about LoL lore.
Below is a high-level overview of the system.
- Comprehensive lore knowledge
- Powered by a state-of-the-art LLM
- Efficient data processing and embedding
- Local RAG system integration
Name | Objective | Path |
---|---|---|
00-parse_xml_dumb.ipynb | Parses the XML dump and removes unnecessary pages | Link |
01-preprocess_data.ipynb | Cleans the data and splits it into chunks (sketched below) | Link |
02-embed_chunks.ipynb | Embeds the chunks into vector embeddings for fast retrieval | Link |
03-pure_LLM.ipynb | Runs a quantized Llama 3 8B locally using llama.cpp | Link |
04-RAG_system.ipynb | Integrates the embedded query and the LLM into a single prompt | Link |
web_app.py | A Streamlit web app that uses the RAG system as its backend | Link |
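To give a feel for the chunking step (01-preprocess_data.ipynb), here is a minimal sketch using LangChain's `RecursiveCharacterTextSplitter` (see the document-transformers link in the resources below). The chunk size, overlap, separators, and file path are illustrative assumptions, not the notebook's actual settings.

```python
# Minimal sketch of the chunking step; parameters are illustrative only.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # target characters per chunk (assumption)
    chunk_overlap=64,    # overlap preserves context across chunk borders
    separators=["\n\n", "\n", ". ", " "],
)

# hypothetical path to the cleaned wiki text
with open("lol_wiki_cleaned.txt", encoding="utf-8") as f:
    text = f.read()

chunks = splitter.split_text(text)
print(f"{len(chunks)} chunks, first chunk starts with: {chunks[0][:100]}...")
```

Chunk size is a real tuning knob for retrieval quality; see the chunking-strategies and ideal-chunk-size links in the resources below.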
A web app built with Streamlit serves as the UI. To run the web app, simply run the command below.
streamlit run web_app.py
A RAG system can involve several models; this repo uses the following:
Name | Objective | Link |
---|---|---|
meta-llama-3-8b-instruct.Q4_K_M.gguf | A quantized Llama 3 8B model in GGUF format for use with llama.cpp | HF |
Alibaba-NLP/gte-base-en-v1.5 | An embedding model that supports a context length of up to 8192 tokens and ranks high on the MTEB leaderboard | HF |
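As a sketch of how the embedding model serves both indexing and query-time retrieval: the snippet below uses `sentence-transformers` with brute-force cosine similarity over normalized embeddings. The repo's actual vector store and similarity settings may differ; the example chunks and query are made up.

```python
# Sketch: embed chunks, then retrieve the closest ones for a query.
import numpy as np
from sentence_transformers import SentenceTransformer

# gte-base-en-v1.5 ships custom model code, hence trust_remote_code=True.
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

chunks = ["Jinx is a manic criminal from Zaun.", "Garen is a Demacian soldier."]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

query_vec = model.encode(["Who is Jinx?"], normalize_embeddings=True)[0]
scores = chunk_vecs @ query_vec          # cosine similarity (vectors are unit-norm)
top_k = np.argsort(scores)[::-1][:1]     # indices of the best-matching chunks
print([chunks[i] for i in top_k])
```

A dedicated vector database (see the resources below) replaces the brute-force dot product once the corpus grows large.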
For data, the LoL Wiki provides an official data dump, which was parsed, cleaned, and embedded into a vector database.
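The parsing step (00-parse_xml_dumb.ipynb) is not reproduced here, but a MediaWiki export dump can be streamed with the standard library. The namespace version, filename, and page filter below are assumptions based on the generic MediaWiki export schema, not the repo's actual code.

```python
# Sketch: stream pages out of a MediaWiki XML dump.
# The export namespace version varies between dumps; 0.11 is an assumption --
# check the root tag of your dump.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.11/}"

def iter_pages(path):
    for _, elem in ET.iterparse(path):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            # skip non-content pages (an illustrative filter)
            if not title.startswith(("File:", "Template:", "Category:")):
                yield title, text
            elem.clear()  # free memory as we stream

for title, text in iter_pages("lol_wiki_dump.xml"):  # hypothetical filename
    print(title, len(text))
    break
```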
All links to download the models and the data are available in the repo.
The repository's code serves as a solid foundation for extending this RAG system to other Wiki/Fandom domains.
I managed to run the quantized Llama 3 on a laptop with 16 GB of RAM and an RTX 3060 with 6 GB of VRAM. The whole RAG pipeline (retrieval, information extraction and summarization, prompt answering) takes about 2-10 seconds to answer a query; it is astonishing how much you can get out of a mid-tier laptop.
This performance is largely attributable to llama.cpp, which provides C++ inference for large language models (LLMs) and avoids much of the overhead, slowness, and resource consumption of a pure-Python inference stack.
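For completeness, here is a hedged sketch of local inference plus prompt assembly using `llama-cpp-python`, one common Python binding for llama.cpp. The repo may invoke llama.cpp differently; the prompt template, `n_gpu_layers`, and generation parameters are illustrative assumptions.

```python
# Sketch: answer a query with retrieved context via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="meta-llama-3-8b-instruct.Q4_K_M.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload as many layers as fit on the 6 GB GPU
)

retrieved = "Jinx is a manic and impulsive criminal from Zaun."  # from the retriever
question = "Who is Jinx?"

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer using only the provided lore context."},
        {"role": "user", "content": f"Context:\n{retrieved}\n\nQuestion: {question}"},
    ],
    max_tokens=256,
    temperature=0.2,   # low temperature keeps answers close to the context
)
print(out["choices"][0]["message"]["content"])
```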
- https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/
- https://www.youtube.com/watch?v=u5Vcrwpzoz8&t=11s
- https://www.youtube.com/watch?v=qN_2fnOPY-M&t=1s
- https://vickiboykis.com/what_are_embeddings/
- https://huggingface.co/spaces/mteb/leaderboard - Massive Text Embedding Benchmark (MTEB) leaderboard
- https://www.pinecone.io/learn/chunking-strategies/
- https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/
- https://learn.microsoft.com/en-us/ai/playbook/technology-guidance/generative-ai/working-with-llms/prompt-engineering
- https://en.wikipedia.org/wiki/Vector_database - vector databases for big documents
- https://www.mixedbread.ai/blog/mxbai-rerank-v1
- https://huggingface.co/TheBloke - quantized models
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
- https://www.promptingguide.ai/ & https://www.promptingguide.ai/introduction/examples
- https://github.com/brexhq/prompt-engineering
- https://arxiv.org/abs/2401.14423
- https://www.anthropic.com/news/prompt-engineering-for-business-performance
- https://huyenchip.com/2024/01/16/sampling.html
- https://www.llamaindex.ai/blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5
This project is licensed under the MIT License.