Open Source Observer is a free analytics suite that helps funders measure the impact of open source software contributions to the health of their ecosystem.

## Organization

The monorepo is organized as follows:

- `/apps`: The OSO apps
  - `/cli`: The `oso` cli for setting up and using various tools in this repository
  - `/docs`: documentation (Docusaurus)
    - [on Cloudflare](https://docs.opensource.observer/) - Production build
  - `/frontend`: frontend application (Next.js)
    - [on Vercel](https://www.opensource.observer) - Production build
  - `/hasura-clickhouse`: API service (Hasura+Clickhouse) - Production
  - `/hasura-trino`: API service (Hasura+Trino) - Production
- `/docker`: Docker files
- `/lib`: Common libraries
- `/oss-artifact-validators`: Simple library to validate different properties of an "artifact"
- `/oso_common`: Python module for common tools across all python tools
- `/utils`: Common TypeScript utilities used in the monorepo
- `/ops`: Our ops related code
  - `/external-prs`: GitHub app for validating pull requests
  - `/helm-charts`: Helm charts for Kubernetes
  - `/k8s-*`: Kubernetes configuration
  - `/kind`: Local Kind configuration
  - `/opsscripts`: Python module of various ops related tools
  - `/tf-modules`: Terraform modules
- `/warehouse`: All code specific to the data warehouse, plus other tools to manage warehouse pipelines
  - `/dbt`: dbt configuration
  - `/docker`: Docker configuration
  - `/metrics_tools`: Python utilities for managing data
  - `/oso_dagster`: Dagster configuration for orchestrating software-defined assets
  - `/oso_sqlmesh`: sqlmesh configuration
  - `/pyoso`: Python package for `pyoso`

## Quickstart

Before you begin you'll need the following on your system:

- Python >=3.11 (see [here](https://www.python.org/downloads/))
- Python uv >= 0.6 (see [here](https://pypi.org/project/uv/))
- git (see [here](https://github.com/git-guides/install-git))
- BigQuery access (see [here](https://docs.opensource.observer/docs/get-started/#login-to-bigquery) if you don't have it set up already)
- gcloud (see [here](https://cloud.google.com/sdk/docs/install))
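
As a quick sanity check, you can confirm the tools are available from your shell before continuing (a minimal sketch; `node` and `pnpm` are assumed to already be installed for the Node.js steps below):

```bash
# Confirm the prerequisites are on your PATH and meet the minimum versions
python3 --version   # expect >= 3.11
uv --version        # expect >= 0.6
git --version
gcloud --version
node --version
pnpm --version
```
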
### Setup dependencies

First, authenticate with `gcloud`:

```bash
gcloud auth application-default login
```

Then install the Node.js dependencies:
```
pnpm install
```

Also install the python dependencies:

```bash
uv sync
```

You will also need to set up `dbt` to connect to Google BigQuery for running the data pipeline. The following wizard will copy a small playground dataset to your personal Google account and set up `dbt` for you.

```bash
uv run oso lets_go
```

:::tip
The script is idempotent, so you can safely run it again if you encounter any issues.
:::

## Frontend Development

### Setup and build the frontend

First, make sure the environment variables are set for `./apps/frontend`.
Take a look at `./apps/frontend/.env.local.example` for the complete list.

- You can either set these yourself (e.g. in CI/CD)
- or copy the file to `.env.local` and populate it.
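
For the copy-and-populate route, something like this is enough to get started (a sketch; fill in the placeholder values afterwards):

```bash
# Copy the example env file into place, then edit the placeholder values
cp ./apps/frontend/.env.local.example ./apps/frontend/.env.local
```
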

Then, to do a turbo build of all apps, run the following:

```bash
pnpm install
pnpm build
```

The resulting static site can be found in `./build/`.

### Running the prod server

If you've already run the build, you can use `pnpm serve` to serve the built files.
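
For example, after a successful build:

```bash
# Serve the static files produced by the build
pnpm serve
```
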

### Running the frontend dev server

To run a dev server that watches for changes across code and Plasmic, run:

```bash
pnpm dev:frontend
```

## dbt Development

Our datasets are public! If you'd like to use them directly, as opposed to adding to our data pipeline, check out [our documentation](https://docs.opensource.observer/docs/get-started/).

Once installation has completed you can enter the virtual environment.

```bash
$ source .venv/bin/activate
```

From here you should have dbt on your path.

```bash
$ which dbt
```

_This should return something like `opensource-observer/oso/.venv/bin/dbt`_

### Authenticating to BigQuery

If you have write access to the dataset then you can connect to it by setting
the `opensource_observer` profile in `dbt`. Inside `~/.dbt/profiles.yml` (create
it if it isn't there), add the following:

```yaml
opensource_observer:
  outputs:
    production:
      type: bigquery
      dataset: oso
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
    playground:
      type: bigquery
      dataset: oso_playground
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
  # By default we target the playground. It's less costly and also safer to write
  # there while developing
  target: playground
```
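
Once the profile is in place, you can confirm that dbt picks it up and can reach BigQuery with dbt's built-in connection check (a sketch; run it from the directory containing the project's `dbt_project.yml`):

```bash
# Validate profiles.yml and test the BigQuery connection for the active target
dbt debug
```
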

### Setting up VS Code

The [Power User for dbt core](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user) extension is pretty helpful.

You'll need the path to your virtual environment, which you can get by running

```bash
echo 'import sys; print(sys.prefix)' | uv run -
```

Then in VS Code:

- Install the extension
- Open the command palette and enter "Python: Select Interpreter"
- Select "Enter interpreter path..."
- Enter the path from the uv command above

Check that you have a little check mark next to "dbt" in the bottom bar.

### Running dbt

Once you've updated any models you can run dbt _within the virtual environment_ by simply calling:

```bash
$ dbt run
```

:::tip
Note: If you configured the dbt profile as shown in this document,
this `dbt run` will write to the `opensource-observer.oso_playground` dataset.
:::

It is usually best to target a specific model so that runs don't take so long on some of our materializations:

```bash
$ dbt run --select {name_of_the_model}
```

## sqlmesh Development

### Running sqlmesh

For faster development of new models, we rely on duckdb as a local development
environment. While this introduces the need to ensure we have macros to
compensate for differences between environments, the simple deployment allows
for fast iteration of ideas given that the required macros exist.

To get started, we first need to load localized data:

```bash
# Ensure we're logged into Google
gcloud auth application-default login

# Run the initialization of the data pull
oso local initialize --max-results-per-query 10000 --max-days 3
```

This will download 3 days of time series data, with an approximate maximum of
10000 rows in each table that is not time-series defined. You can change these
limits, or unset them if you're certain that your system can handle a larger
download, but this initialization step is required.
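
As an illustration, a larger local sample (assuming your machine and connection can handle the extra download) might look like:

```bash
# Pull a week of time series data with a higher per-table row cap
oso local initialize --max-results-per-query 100000 --max-days 7
```
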

Once all of the data has been downloaded you can run sqlmesh like so:

```bash
oso local sqlmesh [...any sqlmesh args...]
```

This is a convenience function for running sqlmesh locally. It is equivalent to running this series of commands:

```bash
cd warehouse/oso_sqlmesh
sqlmesh [...any sqlmesh args...]
```

So running:

```bash
oso local sqlmesh plan
```

would be equivalent to:

```bash
cd warehouse/oso_sqlmesh
sqlmesh plan
```

However, the real reason for this convenience function is for executing sqlmesh
against a local trino, as detailed in the next section.

### Running sqlmesh on a local trino

Be warned: running a local trino requires running Kubernetes on your machine using
[kind](https://kind.sigs.k8s.io/). While it isn't intended to be a heavyweight
implementation, it takes more resources than simply running with duckdb.
However, in order to simulate and test running against trino as it does on
the production OSO deployment, we need to have things wired properly with Kubernetes. To initialize everything, simply do:

```bash
oso ops cluster-setup
```

This can take a while so please be patient, but it will generate a local
registry that is used when running the trino deployment with the metrics
calculation service (MCS) deployed. This is to test that the process works and to ensure
that the MCS has the proper version deployed. Eventually this can/will be used
to test the dagster deployment.

Once everything is set up, things should be running in the kind cluster
`oso-local-test-cluster`. Normally, you'd need to ensure that you forward the
right ports so that you can access the cluster to run the sqlmesh jobs, but the
convenience functions we created to run sqlmesh ensure that this is done
automatically. However, before running sqlmesh you will need to initialize the
data in trino.

Much like running against a local duckdb, the local trino can also be initialized on the CLI like so:

```bash
oso local initialize --local-trino
```

Once completed, trino will be configured to have the proper source data for sqlmesh.

Finally, to run `sqlmesh plan`, do this:

```bash
oso local sqlmesh --local-trino plan
```

The `--local-trino` option should be passed before any sqlmesh args. Otherwise, you can call any command or use any flags from sqlmesh after the `sqlmesh` keyword in the command invocation. So to call `sqlmesh run` you'd simply do:

```bash
oso local sqlmesh --local-trino run
```

Please note, you may periodically be logged out of the local kind cluster; just
run `oso ops cluster-setup` again if that happens.
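
If you're unsure whether you're still connected, a quick check looks something like this (a sketch; kind names kube contexts `kind-<cluster>`, so the context name below is inferred from the cluster name above):

```bash
# Confirm the local cluster exists and that kubectl still points at it
kind get clusters
kubectl config current-context   # expect: kind-oso-local-test-cluster
```
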
## Reference Playbooks
For setup and common operations for each subproject, navigate into the respective directory and check out the `README.md`.