
Commit df08a9c

docs: refactoring docs (#3406)
* docs: get data refactor to include pyoso
* docs: update dbt guide
* docs: add placeholders in contribute-models
* docs: reorganize contribute-data
* docs: update README
* fix: docs broken links
1 parent d2b7182 commit df08a9c

File tree

28 files changed: +660 / -770 lines changed


README.md

Lines changed: 15 additions & 254 deletions
@@ -13,25 +13,30 @@ Open Source Observer is a free analytics suite that helps funders measure the im

 - `/apps`: The OSO apps
 - `/docs`: documentation (Docusaurus)
-- [on Vercel](https://docs.opensource.observer/docs) - Production build
+- [on Cloudflare](https://docs.opensource.observer/) - Production build
 - `/frontend`: frontend application (Next.js)
 - [on Vercel](https://www.opensource.observer) - Production build
-- `/hasura`: API service (Hasura) - Production
-- `/cli`: The `oso` cli for setting up and using various tools in this repository
+- `/hasura-clickhouse`: API service (Hasura+Clickhouse) - Production
+- `/hasura-trino`: API service (Hasura+Trino) - Production
 - `/docker`: Docker files
 - `/lib`: Common libraries
 - `/oss-artifact-validators`: Simple library to validate different properties of an "artifact"
-- `/oso_common` - Python module for common tools across all python tools
+- `/utils` - Common TypeScript utilities used in the monorepo
+- `/ops`: Our ops related code
+- `/external-prs`: GitHub app for validating pull requests
+- `/help-charts`: Helm charts for Kubernetes
+- `/k8s-*`: Kubernetes configuration
+- `/kind`: Local Kind configuration
+- `/opsscripts`: Python module of various ops related tools
+- `/tf-modules`: Terraform modules
 - `/warehouse`: All code specific to the data warehouse
 - `/dbt`: dbt configuration
+- `/docker`: Docker configuration
+- `/metrics_tools`: Python utilities for managing data
 - `/oso_dagster`: Dagster configuration for orchestrating software-defined assets
 - `/oso_sqlmesh`: sqlmesh configuration
+- `/pyoso`: Python package for `pyoso`
 - Also contains other tools to manage warehouse pipelines
-- `/ops`: Our ops related code
-- `/external-prs`: GitHub app for validating pull requests
-- `/k8s-*`: Kubernetes configuration
-- `/tf-modules`: Terraform modules
-- `/opstools`: Python module of various ops related tools

 ## Quickstart

@@ -44,18 +49,10 @@ Before you begin you'll need the following on your system:
 - Python >=3.11 (see [here](https://www.python.org/downloads/))
 - Python uv >= 0.6 (see [here](https://pypi.org/project/uv/))
 - git (see [here](https://github.com/git-guides/install-git))
-- BigQuery access (see [here](https://docs.opensource.observer/docs/get-started/#login-to-bigquery) if you don't have it setup already)
-- gcloud (see [here](https://cloud.google.com/sdk/docs/install))

 ### Setup dependencies

-First, authenticate with `gcloud`:
-
-```bash
-gcloud auth application-default login
-```
-
-Then install Node.js dependencies
+To install Node.js dependencies

 ```
 pnpm install
@@ -67,242 +64,6 @@ Also install the python dependencies
 uv sync
 ```

-You will also need to setup `dbt` to connect to Google BigQuery for running the data pipeline. The following wizard will copy a small playground dataset to your personal Google account and setup `dbt` for you.
-
-```bash
-uv run oso lets_go
-```
-
-:::tip
-The script is idempotent, so you can safely run it again
-if you encounter any issues.
-:::
-
-## Frontend Development
-
-### Setup and build the frontend
-
-First, make sure the environment variables are set for `./apps/frontend`.
-Take a look at `./apps/frontend/.env.local.example` for the complete list.
-
-- You can either set these yourself (e.g. in CI/CD)
-- or copy the file to `.env.local` and populate it.
-
-Then do a turbo build of all apps, run the following:
-
-```bash
-pnpm install
-pnpm build
-```
-
-The resulting static site can be found in `./build/`.
-
-### Running the prod server
-
-If you've already run the build, you can use `pnpm serve` to serve the built files
-
-### Running the frontend dev server
-
-To run a dev server that watches for changes across code and Plasmic, run:
-
-```bash
-pnpm dev:frontend
-```
-
-## dbt Development
-
-Our datasets are public! If you'd like to use them directly as opposed to adding to our
-dbt models, checkout [our docs!](https://docs.opensource.observer/docs/get-started/)
-
-### Using the virtual environment
-
-Once installation has completed you can enter the virtual environment.
-
-```bash
-$ source .venv/bin/activate
-```
-
-From here you should have dbt on your path.
-
-```bash
-$ which dbt
-```
-
-_This should return something like `opensource-observer/oso/.venv/bin/dbt`_
-
-### Authenticating to bigquery
-
-If you have write access to the dataset then you can connect to it by setting
-the `opensource_observer` profile in `dbt`. Inside `~/.dbt/profiles.yml` (create
-it if it isn't there), add the following:
-
-```yaml
-opensource_observer:
-  outputs:
-    production:
-      type: bigquery
-      dataset: oso
-      job_execution_time_seconds: 300
-      job_retries: 1
-      location: US
-      method: oauth
-      project: opensource-observer
-      threads: 32
-    playground:
-      type: bigquery
-      dataset: oso_playground
-      job_execution_time_seconds: 300
-      job_retries: 1
-      location: US
-      method: oauth
-      project: opensource-observer
-      threads: 32
-  # By default we target the playground. it's less costly and also safer to write
-  # there while developing
-  target: playground
-```
-
-### Setting up VS Code
-
-The [Power User for dbt core](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user) extension is pretty helpful.
-
-You'll need the path to your virtual environment, which you can get by running
-
-```bash
-echo 'import sys; print(sys.prefix)' | uv run -
-```
-
-Then in VS Code:
-
-- Install the extension
-- Open the command pallet, enter "Python: select interpreter"
-- Select "Enter interpreter path..."
-- Enter the path from the uv command above
-
-Check that you have a little check mark next to "dbt" in the bottom bar.
-
-### Running dbt
-
-Once you've updated any models you can run dbt _within the virtual environment_ by simply calling:
-
-```bash
-$ dbt run
-```
-
-:::tip
-Note: If you configured the dbt profile as shown in this document,
-this `dbt run` will write to the `opensource-observer.oso_playground` dataset.
-:::
-
-It is likely best to target a specific model so things don't take so long on some of our materializations:
-
-```
-$ dbt run --select {name_of_the_model}
-```
-
-## sqlmesh Development
-
-### Running sqlmesh
-
-For faster development of new models, we rely on duckdb as a local development
-environment. While this introduces the need to ensure we have macros to
-compensate for differences between environments, the simple deployment allows
-for fast iteration of ideas given that the required macros exist.
-
-To get started we need to load localized data:
-
-To do that we do:
-
-```bash
-# Ensure we're logged into google
-gcloud auth application-default login
-
-# Run the initialization of the data pull
-oso local initialize --max-results-per-query 10000 --max-days 3
-```
-
-This will download 3 days of time series data with an approximate maximum of
-10000 rows in each table that is not time series defined. You can change these
-or unset them if you're certain that your system can handle a larger data
-download but this will be required.
-
-Once all of the data has been downloaded you can now run sqlmesh like so:
-
-```bash
-oso local sqlmesh [...any sqlmesh args...]
-```
-
-This is a convenience function for running sqlmesh locally. This is equivalent to running this series of commands:
-
-```bash
-cd warehouse/oso_sqlmesh
-sqlmesh [...any sqlmesh args... ]
-```
-
-So running:
-
-```bash
-oso local sqlmesh plan
-```
-
-Would be equivalent to
-
-```bash
-cd warehouse/oso_sqlmesh
-sqlmesh plan
-```
-
-However, the real reason for this convenience function is for executing sqlmesh
-against a local trino as detailed in this next section.
-
-### Running sqlmesh on a local trino
-
-Be warned, running local trino requires running kubernetes on your machine using
-[kind](https://kind.sigs.k8s.io/). While it isn't intended to be a heavy weight
-implementation, it takes more resources than simply running with duckdb.
-However, in order to simulate and test this running against trino as it does on
-the production OSO deployment, we need to have things wired properly with kubernetes. To initialize everything simply do:
-
-```bash
-oso ops cluster-setup
-```
-
-This can take a while so please be patient, but it will generate a local
-registry that is used when running the trino deployment with the metrics
-calculation service deployed. This is to test that process works and to ensure
-that the MCS has the proper version deployed. Eventually this can/will be used
-to test the dagster deployment.
-
-Once everything is setup, things should be running in the kind cluster
-`oso-local-test-cluster`. Normally, you'd need to ensure that you forward the
-right ports so that you can access the cluster to run the sqlmesh jobs but the
-convenience functions we created to run sqlmesh ensure that this is done
-automatically. However before running sqlmesh you will need to initialize the
-data in trino.
-
-Much like running against a local duckdb the local trino can also be initialized with on the CLI like so:
-
-```bash
-oso local initialize --local-trino
-```
-
-Once completed, trino will be configured to have the proper source data for sqlmesh.
-
-Finally, to run `sqlmesh plan` do this:
-
-```bash
-oso local sqlmesh --local-trino plan
-```
-
-The `--local-trino` option should be passed before any sqlmesh args. Otherwise, you can call any command or use any flags from sqlmesh after the `sqlmesh` keyword in the command invocation. So to call `sqlmesh run` you'd simply do:
-
-```bash
-oso local sqlmesh --local-trino run
-```
-
-Please note, you may periodically be logged out of the local kind cluster, just
-run `oso ops cluster-setup` again if that happens.
-
 ## Reference Playbooks

 For setup and common operations for each subproject, navigate into the respective directory and check out the `README.md`.
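With the BigQuery, gcloud, and dbt bootstrap sections removed, the quickstart that remains in the README is just the two dependency installs kept above. A minimal sketch of that remaining flow (the `source .venv/bin/activate` step is optional and assumes `uv` created the default `.venv` directory):

```bash
# Install Node.js workspace dependencies for the pnpm monorepo
pnpm install

# Install Python dependencies into the project's virtual environment
uv sync

# Optional: activate the environment uv created (assumes the default .venv location)
source .venv/bin/activate
```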

apps/docs/docs/contribute-data/api-crawling/graphql-api.md

Lines changed: 2 additions & 2 deletions
@@ -174,11 +174,11 @@ open_collective_transactions = graphql_factory(

 :::tip
 If you have not setup your local Dagster environment yet, please follow
-our [quickstart guide](../../guides/dagster/index.md).
+our [quickstart guide](../setup/index.md).
 :::

 After having your Dagster instance running, follow the
-[Dagster Asset Guide](../../guides/dagster/index.md) to materialize the assets.
+[Dagster Asset Guide](../setup/index.md) to materialize the assets.
 Our example assets are located under `assets/open_collective/transactions`.

 ![Dagster Open Collective Asset List](crawl-api-graphql-pipeline.png)
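Both API-crawling guides now defer to the relocated setup guide (`../setup/index.md`) for materializing assets in Dagster. As a hedged illustration only (the module path and asset selection below are assumptions, not taken from the guide), materializing the example asset from a local checkout typically looks like:

```bash
# Launch the local Dagster UI and materialize the asset from the Assets page
dagster dev

# Or materialize directly from the CLI; the module path and asset selection
# here are illustrative guesses and may not match the OSO Dagster project layout
dagster asset materialize -m oso_dagster.definitions --select "open_collective/transactions"
```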

apps/docs/docs/contribute-data/api-crawling/rest-api.md

Lines changed: 2 additions & 2 deletions
@@ -132,11 +132,11 @@ pipeline, the data will be ingested into your OSO warehouse.

 :::tip
 If you have not setup your local Dagster environment yet, please follow
-our [quickstart guide](../../guides/dagster/index.md).
+our [quickstart guide](../setup/index.md).
 :::

 After having your Dagster instance running, follow the
-[Dagster Asset Guide](../../guides/dagster/index.md) to materialize the assets.
+[Dagster Asset Guide](../setup/index.md) to materialize the assets.
 Our example assets are located under `assets/defillama/tvl`.

 ![Dagster DefiLlama Asset List](crawl-api-example-defillama.png)

apps/docs/docs/contribute-data/bigquery.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ the US multi-region.
 If you want OSO to host a copy of
 the dataset in the US multi-region,
 see our guide on
-[BigQuery Data Transfer Service](../guides/bq-data-transfer.md).
+[BigQuery Data Transfer Service](./bq-data-transfer.md).

 ## Make the data available in the US region

@@ -29,7 +29,7 @@ you can do this directly from the

 OSO will also copy certain valuable datasets into the
 `opensource-observer` project via the BigQuery Data Transfer Service
-See the guide on [BigQuery Data Transfer Service](../guides/bq-data-transfer.md)
+See the guide on [BigQuery Data Transfer Service](./bq-data-transfer.md)
 add dataset replication as a Dagster asset to OSO.

 ## Make the data accessible to our Google service account
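For context on the BigQuery Data Transfer Service referenced in this hunk: a dataset copy into the US multi-region can also be configured straight from the `bq` CLI. This is a generic sketch with placeholder project and dataset names, not the Dagster-based replication flow the linked guide describes:

```bash
# Create a Data Transfer Service config that copies a dataset into a US multi-region project.
# All project and dataset names below are placeholders.
bq mk --transfer_config \
  --data_source=cross_region_copy \
  --project_id=my-destination-project \
  --target_dataset=my_dataset_us \
  --display_name="Copy my_dataset to US" \
  --params='{"source_project_id":"my-source-project","source_dataset_id":"my_dataset"}'
```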
File renamed without changes.

apps/docs/docs/contribute-data/gcs.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ by OSO, please reach out to us on
 [Discord](https://www.opensource.observer/discord).

 If you prefer to handle the data storage yourself, check out the
-[Connect via BigQuery guide](../guides/bq-data-transfer.md).
+[Connect via BigQuery guide](./bq-data-transfer.md).

 ## Schedule periodic dumps to GCS

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "label": "Getting Started with Dagster",
+  "position": 0,
+  "link": {
+    "type": "doc",
+    "id": "index"
+  }
+}
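This new file follows the standard Docusaurus sidebar convention: autogenerated sidebars read `_category_.json`, so the relocated Dagster setup docs are labeled "Getting Started with Dagster", pinned to the top of their section (`position: 0`), and the category link points at the section's `index` doc.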
File renamed without changes.
File renamed without changes.

apps/docs/docs/guides/dagster/dagster_deployments.png renamed to apps/docs/docs/contribute-data/setup/dagster_deployments.png

File renamed without changes.
