Super Analyst is an advanced text-to-SQL application that combines the Vanna.ai framework, OpenAI's GPT-4 language model, and ChromaDB vector storage. It generates complex SQL queries from natural language questions, providing an intuitive and efficient way to interact with databases.
Super Analyst translates natural language questions into SQL using:
- Vanna.ai: Framework for developing NLP (Natural Language Processing) applications.
- OpenAI's GPT-4: LLM (Large Language Model) used to understand questions and generate SQL queries.
- ChromaDB: Vector storage to manage and retrieve query embeddings.
- Python: Programming language used for the application.
- Vanna.ai webapp: Web interface for interacting with the system.
The model has been trained with a vast amount of SQL queries and questions to enhance its ability to understand and generate SQL from complex questions.
The project is composed of the following main components:
- Frontend: Developed with the Vanna.ai webapp, it provides a friendly interface for users to interact with the system.
- Backend:
  - API: Responsible for receiving questions and returning generated SQL queries.
  - Vanna.ai Framework: Facilitates communication between the interface and the LLM.
  - GPT-4: Language model that interprets questions and generates the corresponding SQL query.
  - ChromaDB: Used to store and retrieve embedding vectors for fast and accurate queries.
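The flow through these components can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's code or Vanna.ai's internals: the bag-of-words `embed` function stands in for a real embedding model, `ExampleStore` stands in for ChromaDB, the `llm` callable stands in for a GPT-4 request, and the table names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExampleStore:
    """In-memory stand-in for the vector store (ChromaDB in the project)."""

    def __init__(self):
        self._items = []  # list of (embedding, question, sql)

    def add(self, question, sql):
        self._items.append((embed(question), question, sql))

    def most_similar(self, question, k=2):
        """Return the k stored (question, sql) pairs closest to the new question."""
        query = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [(stored_q, sql) for _, stored_q, sql in ranked[:k]]


def build_prompt(question, examples):
    """Assemble a few-shot prompt from the retrieved examples."""
    lines = ["Write a SQL query that answers the question.", ""]
    for example_q, example_sql in examples:
        lines += [f"Q: {example_q}", f"SQL: {example_sql}", ""]
    lines += [f"Q: {question}", "SQL:"]
    return "\n".join(lines)


def generate_sql(question, store, llm):
    """Retrieve similar examples, build the prompt, and ask the model for SQL."""
    examples = store.most_similar(question)
    return llm(build_prompt(question, examples)).strip()


# Usage with hypothetical tables and a stubbed model in place of GPT-4.
store = ExampleStore()
store.add("How many customers are there?", "SELECT COUNT(*) FROM customers;")
store.add("List all product names", "SELECT name FROM products;")


def fake_llm(prompt):
    return "SELECT COUNT(*) FROM orders;"


print(generate_sql("How many orders are there?", store, fake_llm))
```

In the application itself these steps are delegated to Vanna.ai, ChromaDB, and GPT-4; the sketch only shows how a question, the retrieved examples, and the model fit together.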
Requirements:
- Python 3.12.4
- Dependencies listed in the file requirements.txt
- OpenAI GPT-4 API Key
- ChromaDB Configuration
Clone the repository:
git clone https://github.com/seuusuario/super-analyst.git
cd super-analyst
In the root environments folder, create three ".env" files for the application settings with the following variables:
- api-config.env
API_KEY="your-openAI-apiKey"
- db-config.env
DB_HOST="your-MySQL-Database"
DB_NAME="your-database-name"
DB_USER="your-database-user"
DB_PASS="your-database-password"
DB_PORT=your-database-port
- user-config.env
EMAIL="yourEmail@YourDomain.com"
PASSWORD="YourPassword"
Install the dependencies:
pip install -r requirements.txt
To start the application, use the command:
python main.py
If you prefer to run through Docker, just enter the repository root folder and run the command:
docker run -d -p 8084:8084 --name super-analyst superanalyst
Access the webapp through the browser at the URL http://localhost:8084, log in with the test username and password that you configured in user-config.env and test the application.
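To check the three ".env" files before starting the application, a small loader along these lines may help. It is a sketch using only the standard library and is not necessarily how main.py reads its settings. It assumes the files live in a folder named `environments` at the repository root, and the function names `parse_env` and `load_settings` are hypothetical.

```python
from pathlib import Path

# File names taken from the setup steps above.
ENV_FILES = ("api-config.env", "db-config.env", "user-config.env")


def parse_env(text):
    """Parse KEY=value lines, skipping blanks and comments and stripping quotes."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(env_dir="environments"):
    """Merge the three files into one dict (assumption: folder name 'environments')."""
    settings = {}
    for name in ENV_FILES:
        text = (Path(env_dir) / name).read_text(encoding="utf-8")
        settings.update(parse_env(text))
    # DB_PORT is written without quotes, so convert it to an integer here.
    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings


# Example call, run from the repository root once the files exist:
# settings = load_settings()
# print(settings["DB_HOST"], settings["DB_PORT"])
```

A missing file raises `FileNotFoundError`, a missing `DB_PORT` raises `KeyError`, and a non-numeric `DB_PORT` raises `ValueError`, which makes configuration mistakes visible before the application tries to connect.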
ADR 001: Choice of Language Model
- Decision: Use GPT-4 from OpenAI.
- Justification: GPT-4 offers superior text comprehension and generation capabilities compared to previous versions and other available models.
ADR 002: Vector Storage
- Decision: Adopt ChromaDB.
- Justification: ChromaDB was chosen for its performance in embedding storage and retrieval operations.
ADR 003: NLP Framework
- Decision: Choose Vanna.ai.
- Justification: Vanna.ai provides a robust framework for integrating language models and developing NLP applications.
Contributions are welcome! To contribute to the project, follow these steps:
- Fork the repository.
- Create a branch for your feature (git checkout -b feature/novafeature).
- Commit your changes (git commit -am 'Add new feature').
- Push to the repository (git push origin feature/novafeature).
- Create a Pull Request.
For questions or suggestions, please contact rpdesenvolvimento92@gmail.com.