Commit 5cd0fbc: Merge branch 'main' into main
Parents: b332f15, d77b7a1

File tree

137 files changed: +8878 / -7500 lines


.cursor/mcp.json

Lines changed: 7 additions & 0 deletions

```json
{
  "mcpServers": {
    "oso": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}
```
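The `mcp.json` above registers a single MCP server, `oso`, reachable over SSE at a local URL. As a rough illustration of how such a config is consumed, the helper below (a hypothetical name, not part of any real tool) parses a config of this shape and returns the URL for a named server:

```python
import json

def mcp_server_url(config_text: str, server: str) -> str:
    """Return the URL configured for `server` in an mcp.json-style config.

    Illustrative helper only; real MCP clients read this file themselves.
    """
    config = json.loads(config_text)
    return config["mcpServers"][server]["url"]

config_text = """
{
  "mcpServers": {
    "oso": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}
"""

print(mcp_server_url(config_text, "oso"))  # http://127.0.0.1:8000/sse
```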
Lines changed: 25 additions & 0 deletions

```markdown
---
description:
globs:
alwaysApply: true
---
**Strict Cursor Rules for Querying the Pyoso Data Lake**

1. **Input → `gather_all_entities()`**

   * Pass the *unaltered* natural-language query (NLQ) directly into `gather_all_entities()` **every time**.

2. **Entities → `query_text2sql_agent()`**

   * Feed **only** the exact output from `gather_all_entities()` into `query_text2sql_agent()`.
   * Do **not** edit, reorder, or add to the entity list.

3. **SQL String → Further Work**

   * The response from `query_text2sql_agent()` is your finalized, properly formatted SQL query.
   * Use this SQL string *as-is* for any subsequent execution or analysis.

---

**Workflow Summary:**
NL query **→** (`gather_all_entities()` + `query_text2sql_agent()`) **→** correct SQL out
```
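The two-step workflow in this rule file can be sketched in plain Python. `nlq_to_sql` and the stub tools below are illustrative stand-ins (the real `gather_all_entities` and `query_text2sql_agent` are MCP tools, not Python functions), shown only to make the strict data flow visible:

```python
# Sketch of the strict two-step pipeline: the NLQ goes in unaltered,
# and the entity list is forwarded verbatim to the SQL agent.
def nlq_to_sql(nlq, gather_all_entities, query_text2sql_agent):
    entities = gather_all_entities(nlq)    # step 1: NLQ passed unaltered
    return query_text2sql_agent(entities)  # step 2: entity list used as-is

# Stub tools with invented outputs, for demonstration only:
def fake_gather(nlq):
    return ["collection:optimism"]

def fake_text2sql(entities):
    return "SELECT * FROM projects -- " + entities[0]

print(nlq_to_sql("list optimism projects", fake_gather, fake_text2sql))
# SELECT * FROM projects -- collection:optimism
```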
Lines changed: 45 additions & 0 deletions

````markdown
---
description:
globs:
alwaysApply: true
---
#### 0. One-time setup

Always start by defining the pyoso client with the OSO_API_KEY:

```python
import os

from pyoso import Client

client = Client(os.getenv("OSO_API_KEY"))  # never hard-code keys
```

---

#### 1. Generate SQL

Call the `generate_sql` MCP tool and pass in the user's NL query. This returns the proper SQL to use going forward.

Never call the MCP tool from Python code; use it yourself to gather the proper SQL query. Only the resulting SQL query should be written into the code.

```python
sql_query = 'output of generate_sql MCP tool'
```

---

#### 2. Run the SQL query across the DB

Pass the `sql_query` gathered above into the pyoso client defined above via `.to_pandas()`, which returns a DataFrame result of the query across pyoso's data lake.

```python
df = client.to_pandas(sql_query)
```

---

#### 3. (Optional) Analysis

Now, based on the user's request, you are free to continue working with the final DataFrame and run any additional analysis the user might want done on it.

---
````
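Putting the three steps of this rule together, a minimal end-to-end sketch looks like the following. A stub stands in for the real `pyoso.Client` so no API key or network access is needed; the SQL string, table, and column names are invented for illustration:

```python
import pandas as pd

class StubClient:
    """Stand-in for pyoso.Client(os.getenv("OSO_API_KEY")).

    The real client executes the SQL against pyoso's data lake; this
    stub returns a fixed DataFrame so the flow can be demonstrated.
    """
    def to_pandas(self, sql_query):
        return pd.DataFrame({"project_name": ["example"], "star_count": [42]})

# Step 1: SQL string as produced by the generate_sql MCP tool (invented here)
sql_query = "SELECT project_name, star_count FROM projects LIMIT 1"

# Step 2: run the query and get a DataFrame back
client = StubClient()
df = client.to_pandas(sql_query)
print(df.shape)  # (1, 2)

# Step 3: any further analysis operates on df
print(df["star_count"].sum())  # 42
```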

.dockerignore

Lines changed: 0 additions & 4 deletions

```diff
@@ -67,10 +67,6 @@ Dockerfile
 *.pyc
 **/__pycache__

-# dbt
-**/target/
-**/dbt_packages/
-
 # Cloudquery
 .cq/
 **/.cq
```

.env.example

Lines changed: 33 additions & 0 deletions

```diff
@@ -35,6 +35,16 @@ DAGSTER__CLICKHOUSE__HOST=
 DAGSTER__CLICKHOUSE__USER=
 DAGSTER__CLICKHOUSE__PASSWORD=

+# MCP + Text2SQL Agent
+AGENT_VECTOR_STORE__TYPE=local
+AGENT_VECTOR_STORE__DIR=/path/to/your/vector/storage/directory
+
+AGENT_LLM__TYPE=google_genai
+AGENT_LLM__GOOGLE_API_KEY=your_google_genai_api_key_here
+
+AGENT_OSO_API_KEY=your_oso_api_key_here
+AGENT_ARIZE_PHOENIX_USE_CLOUD=0
+
 ###################
 # DEPRECATED
 ###################
@@ -52,5 +62,28 @@ CLOUDSQL_INSTANCE_ID=
 CLOUDSQL_DB_NAME=
 CLOUDSQL_DB_PASSWORD=
 CLOUDSQL_DB_USER=
+
 # Solves the issue of not being able to import the metrics service module when running dagster locally
 PYTHONPATH=warehouse/metrics-service
+
+# Setup agent
+AGENT_OSO_API_KEY=
+# If using a local llm, this will attempt to use ollama with llama3.2:3b
+# See warehouse/oso_agent/util/config.py for more options
+AGENT_LLM__TYPE=local
+
+# To use google's gen ai llm uncomment and add your google api key from google ai studio
+#AGENT_LLM__TYPE=google_genai
+#AGENT_LLM__GOOGLE_API_KEY=
+
+AGENT_VECTOR_STORE__TYPE=local
+# If using the local vector store, set this to a local directory for storing the
+# vector store on disk
+AGENT_VECTOR_STORE__DIR=
+
+# For google vertex ai vector search, set the following options
+#AGENT_VECTOR_STORE__TYPE=google_genai
+#AGENT_VECTOR_STORE__GCS_BUCKET=
+#AGENT_VECTOR_STORE__PROJECT_ID=
+#AGENT_VECTOR_STORE__INDEX_ID=
+#AGENT_VECTOR_STORE__ENDPOINT_ID=
```
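For illustration, the `AGENT_*` settings added above could be read into a plain dict like this. The helper `load_agent_config` is hypothetical (the repo's actual loader lives in `warehouse/oso_agent/util/config.py`, per the comments in the diff); the defaults mirror the example file's local LLM and local vector store:

```python
import os

def load_agent_config(env=None):
    # Hypothetical reader for the AGENT_* settings shown above; defaults
    # mirror .env.example (local LLM, local vector store).
    env = os.environ if env is None else env
    return {
        "oso_api_key": env.get("AGENT_OSO_API_KEY", ""),
        "llm_type": env.get("AGENT_LLM__TYPE", "local"),
        "vector_store_type": env.get("AGENT_VECTOR_STORE__TYPE", "local"),
        "vector_store_dir": env.get("AGENT_VECTOR_STORE__DIR", ""),
    }

cfg = load_agent_config({"AGENT_LLM__TYPE": "google_genai"})
print(cfg["llm_type"])           # google_genai
print(cfg["vector_store_type"])  # local
```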

.github/dependabot.yml

Lines changed: 19 additions & 0 deletions

```yaml
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

version: 2
updates:
  - package-ecosystem: "npm" # See documentation for possible values
    directories:
      - "/"
      - "**/*"
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip" # See documentation for possible values
    directories:
      - "/"
      - "**/*"
    schedule:
      interval: "weekly"
```

.github/workflows/ci-default.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -13,6 +13,7 @@ env:
   NEXT_PUBLIC_DB_GRAPHQL_URL: ${{ vars.NEXT_PUBLIC_DB_GRAPHQL_URL }}
   HASURA_URL: ${{ vars.HASURA_URL }}
   OSO_API_KEY: "test"
+  OSO_AGENT_URL: "http://localhost:8888/"
   NEXT_PUBLIC_ALGOLIA_APPLICATION_ID: "test"
   NEXT_PUBLIC_ALGOLIA_API_KEY: "test"
   NEXT_PUBLIC_ALGOLIA_INDEX: "test"
```

.github/workflows/test-deploy-owners.yml

Lines changed: 4 additions & 4 deletions

```diff
@@ -45,10 +45,10 @@ jobs:
           cd ops/external-prs &&
           pnpm tools common is-repo-admin ${{ github.event.pull_request.user.login }} --output-file $GITHUB_OUTPUT

-      - name: Auto-approve PR if conditions are met
-        run: |
-          cd ops/external-prs &&
-          pnpm tools common attempt-auto-approve ${{ github.event.pull_request.number }}
+      # - name: Auto-approve PR if conditions are met
+      #   run: |
+      #     cd ops/external-prs &&
+      #     pnpm tools common attempt-auto-approve ${{ github.event.pull_request.number }}

       # - name: Login to google
       #   uses: "google-github-actions/auth@v2"
```

.gitignore

Lines changed: 0 additions & 5 deletions

```diff
@@ -57,11 +57,6 @@ node_modules
 # cloudquery
 .cq/

-# poetry.lock files outside of the root. This is to maintain the monorepo with
-# poetry.
-*/**/poetry.lock
-target/
-dbt_packages/

 # Python
 *.pyc
```

.lintstagedrc

Lines changed: 0 additions & 4 deletions

```diff
@@ -12,9 +12,5 @@
     "uv run ruff check --fix --force-exclude",
     "uv run isort",
     "pnpm pyright"
-  ],
-  "warehouse/dbt/**/*.sql": [
-    "uv run sqlfluff fix -f",
-    "uv run sqlfluff lint"
   ]
 }
```
