This project is a GenAI-powered assistant that analyzes SEC 10-K filings and other financial documents to extract key risks, litigation, and compliance issues. It uses Retrieval-Augmented Generation (RAG), tracks LLM prompts/responses with MLflow, and provides an interactive Streamlit UI for analysts.
- Ingests real SEC filings and sample filings
- RAG with FAISS for document retrieval
- LLM (Hugging Face) for risk extraction and summarization
- MLflow for prompt/response tracking
- Streamlit interface for interactive Q&A
- Prompt templates for risk/compliance
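
As a rough illustration of the ingestion feature, the sketch below builds a request against SEC EDGAR's public submissions endpoint and filters a response for 10-K filings. It is not taken from `ingestion/sec_ingest.py`; the function names are hypothetical, and the response shape (parallel arrays under `filings.recent`) is an assumption to check against the EDGAR documentation. The SEC asks automated clients to send a descriptive `User-Agent`.

```python
import urllib.request


def submissions_url(cik):
    """EDGAR submissions URL for a company; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def build_request(cik, user_agent):
    """Prepare (but do not send) a request carrying a descriptive User-Agent."""
    return urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": user_agent}
    )


def select_form(submissions, form_type="10-K"):
    """Pick filings of one form type, assuming parallel arrays under filings.recent."""
    recent = submissions.get("filings", {}).get("recent", {})
    rows = zip(
        recent.get("form", []),
        recent.get("accessionNumber", []),
        recent.get("primaryDocument", []),
    )
    return [
        {"accession": accession, "document": document}
        for form, accession, document in rows
        if form == form_type
    ]
```

Sending the request (for example with `urllib.request.urlopen`) and saving documents under `data/sec/` is left out, since the actual ingestion script may handle rate limits and storage differently.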
graph TD
A[SEC Filings/Docs] -->|"Ingestion"| B[Chunking & Embedding]
B -->|"Vectors"| C[FAISS Vector Store]
C -->|"Retrieval"| D[Relevant Chunks]
D -->|"Prompt"| E["LLM (Hugging Face)"]
E -->|"Answer"| F[Streamlit UI]
E -->|"Tracking"| G[MLflow]
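
The chunking and retrieval stages in the diagram can be sketched in plain Python. This is a simplified stand-in, not the project's code: a word-count vector replaces the real embedding model, and a brute-force cosine search replaces the FAISS index. The function names and the chunk sizes are illustrative assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase term counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline the vectors would come from an embedding model and be stored in a FAISS index, which serves the same nearest-neighbour lookup far more efficiently on large filing sets.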
- ingestion/ - Data ingestion scripts
- rag/ - RAG and vector store logic
- llm/ - LLM and prompt templates
- mlflow_tracking/ - MLflow tracking server, logs, and setup scripts
- airflow/ - Airflow DAGs for pipeline orchestration
- api/app.py - Streamlit app (main entry point)
- data/sec/ - Downloaded and sample SEC filings
- Install dependencies:
  pip install -r requirements.txt
- Initialize Airflow:
  export AIRFLOW_HOME=$(pwd)/airflow
  airflow db init
  airflow webserver -p 8080 &
  airflow scheduler &
- In the Airflow UI (http://localhost:8080), enable and trigger the genai_risk_pipeline DAG for end-to-end orchestration (ingestion, embedding, LLM, MLflow, Streamlit).
- Install dependencies:
  pip install -r requirements.txt
- Start MLflow tracking server:
  bash mlflow_tracking/mlflow_setup.sh  # or manually:
  mlflow server --backend-store-uri sqlite:///mlflow_tracking/mlflow.db --default-artifact-root ./mlflow_tracking --host 0.0.0.0 --port 5000
- Ingest SEC filings:
  python ingestion/sec_ingest.py
- Embed and store:
  python rag/embed_and_store.py
- Run the Streamlit UI:
  streamlit run api/app.py
- Use the UI to search for litigation, risk, and compliance issues in filings
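
Once the MLflow server is running, each LLM call can be logged as one run. The sketch below shows one possible shape for that, assuming the `mlflow` package is installed and the server listens on port 5000 as in the steps above. The record fields, the experiment name `risk-analyst`, and both function names are hypothetical, not the project's actual tracking code.

```python
import hashlib


def build_tracking_record(prompt, response, model_name, latency_s):
    """Collect what one LLM call should record; field names are illustrative."""
    return {
        "params": {
            "model_name": model_name,
            # A short hash groups runs that share a prompt without storing it as a param.
            "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12],
        },
        "metrics": {
            "latency_s": float(latency_s),
            "prompt_chars": float(len(prompt)),
            "response_chars": float(len(response)),
        },
        "texts": {"prompt.txt": prompt, "response.txt": response},
    }


def log_to_mlflow(record, tracking_uri="http://localhost:5000", experiment="risk-analyst"):
    """Send one record to MLflow; needs the mlflow package and a reachable server."""
    import mlflow  # imported lazily so build_tracking_record works without MLflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(record["params"])
        mlflow.log_metrics(record["metrics"])
        # Full prompt and response are stored as text artifacts of the run.
        for filename, text in record["texts"].items():
            mlflow.log_text(text, filename)
```

Keeping the full prompt and response as artifacts rather than parameters avoids MLflow's length limits on parameter values and keeps the run table readable.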
- Build the image:
  docker build -t genai-risk-analyst .
- Run the container:
  docker run -p 8501:8501 -p 5000:5000 genai-risk-analyst
- Access Streamlit at http://localhost:8501 and MLflow at http://localhost:5000
- Start all services:
docker-compose up --build
- Access the UIs:
- Streamlit: http://localhost:8501
- MLflow: http://localhost:5000
- Airflow: http://localhost:8080
- In Airflow, enable and trigger the genai_risk_pipeline DAG for end-to-end orchestration.
- List potential litigation risks in a 10-K
- Compare debt covenants across quarters
- Summarize ESG compliance sections
- Detect unusual spikes in transaction logs
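
Questions like these reach the LLM wrapped in a prompt template together with the retrieved chunks. The following is a minimal sketch of such a template, assuming a simple "excerpts plus question" layout. The wording, the `RISK_PROMPT` name, and the `max_chars` budget are illustrative and not the templates shipped in `llm/`.

```python
from string import Template

# Hypothetical template; the project's own prompt wording may differ.
RISK_PROMPT = Template(
    "You are a financial risk analyst reviewing SEC filings.\n"
    "Answer using only the excerpts below. "
    "If they do not contain the answer, say so.\n\n"
    "Excerpts:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(question, chunks, max_chars=2000):
    """Number the retrieved chunks and fill the template, within a size budget."""
    numbered = []
    used = 0
    for index, chunk in enumerate(chunks, start=1):
        entry = f"[{index}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the character budget
        numbered.append(entry)
        used += len(entry)
    return RISK_PROMPT.substitute(
        context="\n".join(numbered), question=question.strip()
    )


if __name__ == "__main__":
    print(build_prompt(
        "List potential litigation risks in a 10-K",
        ["A patent suit is pending.", "The company is subject to a regulatory inquiry."],
    ))
```

Numbering the excerpts lets the model cite which chunk supports each claim, which makes answers easier for an analyst to verify against the filing.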