Commit b252a87

docs: include dependency graph use cases (#2684)
* feat(docs): add dependencies tutorial
* refactor(docs): merge funding tutorials and add tabs
* chore: update sidebar and indices
* fix(docs): move to tabs in collection view
* fix(docs): implement tabs on project tutorial
1 parent 03343cd commit b252a87

8 files changed (+1244, -367 lines)

apps/docs/docs/tutorials/collection-view.md renamed to apps/docs/docs/tutorials/collection-view.mdx

Lines changed: 186 additions & 151 deletions

Lines changed: 396 additions & 0 deletions
@@ -0,0 +1,396 @@
---
title: Map the Software Supply Chain
sidebar_position: 3
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Trace the dependencies in a software bill of materials (SBOM) for a given repository and assign weights or other metrics to each node. New to OSO? Check out our [Getting Started guide](../get-started/index.md) to set up your BigQuery or API access.

![Dependency Graph](dependency-graph.png)

## Getting Started

Before running any analysis, you'll need to set up your environment:

<Tabs>
<TabItem value="sql" label="SQL">

If you haven't already, subscribe to OSO public datasets in BigQuery by clicking the "Subscribe" button on our [Datasets page](../integrate/datasets/#oso-production-data-pipeline).

You can run all queries in this guide directly in the [BigQuery console](https://console.cloud.google.com/bigquery).

</TabItem>
<TabItem value="python" label="Python">

Start your Python notebook with the following:

```python
from google.cloud import bigquery
import pandas as pd
import os

# Replace the placeholders below with your own values
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your/credentials.json'  # path to your credentials JSON
GCP_PROJECT = 'your-gcp-project'  # your GCP project name

client = bigquery.Client(project=GCP_PROJECT)
```

For more details on setting up Python notebooks, see our guide on [writing Python notebooks](../integrate/python-notebooks.md).

</TabItem>
</Tabs>

## Identify Repositories and Packages

### Repository Metadata

Get metadata and basic stats about a repository using OSO's indexed data:

<Tabs>
<TabItem value="sql" label="SQL">

```sql
select *
from `oso_production.repositories_v0`
where artifact_url = 'https://github.com/ethereum/go-ethereum'
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select *
from `oso_production.repositories_v0`
where artifact_url = 'https://github.com/ethereum/go-ethereum'
"""
df = client.query(query).to_dataframe()
```

</TabItem>
</Tabs>
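
If you start from a list of GitHub URLs, you may want the owner and repository name separately, since later queries in this guide filter on `from_artifact_namespace` and `from_artifact_name` rather than on URLs. The helper below is a hypothetical, minimal sketch: it assumes URLs of the form `https://github.com/<owner>/<repo>` and lowercases both parts to match the lowercase `owner/repo` strings used in the Deep Funding query below.

```python
from urllib.parse import urlparse


def split_github_url(url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into a lowercased (owner, repo) pair.

    Hypothetical helper: assumes URLs like https://github.com/<owner>/<repo>.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    # Ignore empty path segments so trailing slashes are tolerated
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>, got: {url}")
    owner, repo = parts[0], parts[1]
    # Clone URLs may end in .git
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner.lower(), repo.lower()


print(split_github_url("https://github.com/ethereum/go-ethereum"))
# ('ethereum', 'go-ethereum')
```

Joining each pair back as `f"{owner}/{repo}"` gives strings in the same shape as the repository list in the Deep Funding query.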

### SBOMs (Package Dependencies)

OSO uses software bill of materials (SBOM) data from GitHub to identify package dependencies. This data doesn't distinguish between direct and indirect dependencies, but it's a good starting point for mapping the software supply chain. The example below returns the SBOM entries for a single repository, identified by its artifact ID:

<Tabs>
<TabItem value="sql" label="SQL">

```sql
select *
from `oso_production.sboms_v0`
where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI='
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select *
from `oso_production.sboms_v0`
where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI='
"""
df = client.query(query).to_dataframe()
```

</TabItem>
</Tabs>
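
Once the SBOM rows are loaded, a useful first summary is the number of distinct packages per ecosystem (`to_package_artifact_source`). The sketch below does this in plain Python over a few hypothetical rows shaped like the `sboms_v0` columns used in this guide, so it runs without BigQuery. Because the SBOM data mixes direct and indirect dependencies, these counts include transitive packages.

```python
from collections import defaultdict

# Hypothetical sample rows with two of the `oso_production.sboms_v0` columns used in this guide
sbom_rows = [
    {"to_package_artifact_source": "NPM", "to_package_artifact_name": "pkg-a"},
    {"to_package_artifact_source": "NPM", "to_package_artifact_name": "pkg-a"},  # repeated entry
    {"to_package_artifact_source": "NPM", "to_package_artifact_name": "pkg-b"},
    {"to_package_artifact_source": "GO", "to_package_artifact_name": "example.com/pkg-c"},
]


def distinct_packages_by_source(rows):
    """Count distinct package names per package source (ecosystem)."""
    packages = defaultdict(set)
    for row in rows:
        packages[row["to_package_artifact_source"]].add(row["to_package_artifact_name"])
    return {source: len(names) for source, names in packages.items()}


print(distinct_packages_by_source(sbom_rows))
# {'NPM': 2, 'GO': 1}
```

With the `df` returned by the query above, `df.groupby('to_package_artifact_source')['to_package_artifact_name'].nunique()` computes the same counts in pandas.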

### Package Maintainers

OSO uses [Open Source Insights (deps.dev)](https://deps.dev) data to identify the repository that maintains a given package. Based on our testing, this covers approximately 90% of packages:

<Tabs>
<TabItem value="sql" label="SQL">

```sql
select
  package_artifact_source,
  package_artifact_name,
  package_owner_project_id,
  package_owner_artifact_namespace,
  package_owner_artifact_name
from `oso_production.package_owners_v0`
where package_artifact_name = '@libp2p/echo'
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select
  package_artifact_source,
  package_artifact_name,
  package_owner_project_id,
  package_owner_artifact_namespace,
  package_owner_artifact_name
from `oso_production.package_owners_v0`
where package_artifact_name = '@libp2p/echo'
"""
df = client.query(query).to_dataframe()
```

</TabItem>
</Tabs>
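
Since roughly 10% of packages may have no resolved owner, you might check coverage for your own package list before relying on the mapping. The sketch below is a hypothetical helper that computes the share of distinct packages with a non-null `package_owner_artifact_namespace`, the same condition the Deep Funding query below filters on. It assumes an unresolved owner appears as a null namespace, keys packages by name only for brevity, and runs on made-up sample rows.

```python
# Hypothetical rows shaped like `oso_production.package_owners_v0`;
# None marks a package whose owning repository could not be resolved.
package_rows = [
    {"package_artifact_name": "pkg-a", "package_owner_artifact_namespace": "org-a"},
    {"package_artifact_name": "pkg-b", "package_owner_artifact_namespace": None},
    {"package_artifact_name": "pkg-c", "package_owner_artifact_namespace": "org-b"},
    {"package_artifact_name": "pkg-d", "package_owner_artifact_namespace": "org-b"},
]


def owner_coverage(rows):
    """Return the fraction of distinct packages with a known owner namespace."""
    has_owner = {}
    for row in rows:
        name = row["package_artifact_name"]
        known = row["package_owner_artifact_namespace"] is not None
        # A package counts as covered if any of its rows names an owner
        has_owner[name] = has_owner.get(name, False) or known
    if not has_owner:
        return 0.0
    return sum(has_owner.values()) / len(has_owner)


print(f"{owner_coverage(package_rows):.0%} of packages have a known owner")
# 75% of packages have a known owner
```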

### Build a Deep Funding Graph

This example demonstrates how to create a dependency graph for a group of related repositories, such as the one used by [Deep Funding](https://deepfunding.org). The analysis maps relationships between key Ethereum repositories and their package dependencies:

<Tabs>
<TabItem value="sql" label="SQL">

```sql
select distinct
  sboms.from_artifact_namespace as seed_repo_owner,
  sboms.from_artifact_name as seed_repo_name,
  sboms.to_package_artifact_name as package_name,
  package_owners.package_owner_artifact_namespace as package_repo_owner,
  package_owners.package_owner_artifact_name as package_repo_name,
  sboms.to_package_artifact_source as package_source
from `oso_production.sboms_v0` sboms
join `oso_production.package_owners_v0` package_owners
  on
    sboms.to_package_artifact_name = package_owners.package_artifact_name
    and sboms.to_package_artifact_source = package_owners.package_artifact_source
where
  sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP')
  and package_owners.package_owner_artifact_namespace is not null
  and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name)
    in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2',
      'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum',
      'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon',
      'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project',
      'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm',
      'eth-infinitism/account-abstraction','safe-global/safe-smart-account',
      'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo')
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select distinct
  sboms.from_artifact_namespace as seed_repo_owner,
  sboms.from_artifact_name as seed_repo_name,
  sboms.to_package_artifact_name as package_name,
  package_owners.package_owner_artifact_namespace as package_repo_owner,
  package_owners.package_owner_artifact_name as package_repo_name,
  sboms.to_package_artifact_source as package_source
from `oso_production.sboms_v0` sboms
join `oso_production.package_owners_v0` package_owners
  on
    sboms.to_package_artifact_name = package_owners.package_artifact_name
    and sboms.to_package_artifact_source = package_owners.package_artifact_source
where
  sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP')
  and package_owners.package_owner_artifact_namespace is not null
  and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name)
    in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2',
      'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum',
      'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon',
      'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project',
      'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm',
      'eth-infinitism/account-abstraction','safe-global/safe-smart-account',
      'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo')
"""
df = client.query(query).to_dataframe()
```

You can go further and build a directed network graph from this data with NetworkX:

```python
import networkx as nx

# turn each node into a GitHub URL
gh = 'https://github.com/'
df['seed_repo_url'] = df.apply(lambda x: f"{gh}{x['seed_repo_owner']}/{x['seed_repo_name']}", axis=1)
df['package_repo_url'] = df.apply(lambda x: f"{gh}{x['package_repo_owner']}/{x['package_repo_name']}", axis=1)

# Store in a Network Graph
G = nx.DiGraph()

# Seed repos are level 1
for repo_url in df['seed_repo_url'].unique():
    G.add_node(repo_url, level=1)

# Package repos are level 2, unless they are already seed repos
for repo_url in df['package_repo_url'].unique():
    if repo_url not in G.nodes:
        G.add_node(repo_url, level=2)

# One directed edge per seed repo -> package repo relationship
for _, row in df.iterrows():
    G.add_edge(
        row['seed_repo_url'],
        row['package_repo_url'],
        relation=row['package_source']
    )

# Placeholder for adding weights to the graph
global_weight = 0
for u, v in G.edges:
    G[u][v]['weight'] = global_weight
```

</TabItem>
</Tabs>
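
The Python example above sets every edge weight to a placeholder value of 0. As one simple baseline, assumed here for illustration rather than taken from Deep Funding, you could split a weight of 1 evenly across each seed repository's dependencies. The sketch works on plain `(seed_repo_url, package_repo_url)` pairs so it runs without BigQuery or NetworkX; the function name and repository URLs are hypothetical.

```python
from collections import defaultdict


def uniform_edge_weights(edges):
    """Split a weight of 1.0 evenly across each source node's outgoing edges.

    `edges` is an iterable of (seed_repo_url, package_repo_url) pairs.
    Returns a dict mapping each (source, target) edge to its weight.
    """
    targets = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)  # de-duplicate repeated edges
    weights = {}
    for source, dests in targets.items():
        for target in dests:
            weights[(source, target)] = 1.0 / len(dests)
    return weights


# Hypothetical edges; in the notebook, G.edges provides the real ones
edges = [
    ("https://github.com/org-a/seed-1", "https://github.com/org-x/dep-1"),
    ("https://github.com/org-a/seed-1", "https://github.com/org-y/dep-2"),
    ("https://github.com/org-b/seed-2", "https://github.com/org-x/dep-1"),
]
weights = uniform_edge_weights(edges)
print(weights[("https://github.com/org-a/seed-1", "https://github.com/org-x/dep-1")])
# 0.5
```

To apply the result to the graph built above, `nx.set_edge_attributes(G, uniform_edge_weights(G.edges), 'weight')` would overwrite the placeholder weights. A uniform split gives every dependency of a seed repository the same weight regardless of how widely it is used; usage signals such as the dependent counts in the next section are one way to refine it.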

For more examples of dependency analysis, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph).

## Weight Nodes and Edges

### Most Used Dependencies

Find the most commonly used dependencies across all projects in OSO. This query joins package ownership data with SBOM data to count how many distinct projects depend on each package, keeping only packages whose maintainer is a project in OSO:

<Tabs>
<TabItem value="sql" label="SQL">

```sql
select
  p.project_id,
  pkgs.package_artifact_source,
  pkgs.package_artifact_name,
  count(distinct sboms.from_project_id) as num_dependents
from `oso_production.package_owners_v0` pkgs
join `oso_production.sboms_v0` sboms
  on pkgs.package_artifact_name = sboms.to_package_artifact_name
  and pkgs.package_artifact_source = sboms.to_package_artifact_source
join `oso_production.projects_v1` p
  on pkgs.package_owner_project_id = p.project_id
where pkgs.package_owner_project_id is not null
group by 1,2,3
order by 4 desc
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select
  p.project_id,
  pkgs.package_artifact_source,
  pkgs.package_artifact_name,
  count(distinct sboms.from_project_id) as num_dependents
from `oso_production.package_owners_v0` pkgs
join `oso_production.sboms_v0` sboms
  on pkgs.package_artifact_name = sboms.to_package_artifact_name
  and pkgs.package_artifact_source = sboms.to_package_artifact_source
join `oso_production.projects_v1` p
  on pkgs.package_owner_project_id = p.project_id
where pkgs.package_owner_project_id is not null
group by 1,2,3
order by 4 desc
"""
df = client.query(query).to_dataframe()

# Optional: Display top dependencies
print("Top 10 most used dependencies:")
print(df.head(10))
```

</TabItem>
</Tabs>
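
The `num_dependents` column can also serve as a node weight in the dependency graph. One simple option, assumed here for illustration and not an OSO-defined metric, is to divide each package's dependent count by the total so the weights sum to 1, optionally applying `log(1 + n)` first so a few very popular packages don't dominate. The sketch below uses hypothetical package names and counts.

```python
import math


def dependent_weights(counts, log_scale=False):
    """Normalize dependent counts into weights that sum to 1.

    counts: dict mapping a package name to its number of dependent projects.
    log_scale: if True, apply log(1 + n) first to dampen very popular packages.
    """
    scaled = {pkg: math.log1p(n) if log_scale else float(n) for pkg, n in counts.items()}
    total = sum(scaled.values())
    if total == 0:
        return {pkg: 0.0 for pkg in counts}
    return {pkg: value / total for pkg, value in scaled.items()}


# Hypothetical counts, shaped like (package_artifact_name, num_dependents) rows
counts = {"pkg-a": 30, "pkg-b": 10, "pkg-c": 0}
print(dependent_weights(counts))
# {'pkg-a': 0.75, 'pkg-b': 0.25, 'pkg-c': 0.0}
```

With the DataFrame from the query above, the linear version is simply `df['num_dependents'] / df['num_dependents'].sum()`.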
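
### Downstream Impact

This more advanced example relates onchain projects to the development tools they depend on. It selects projects in the `op-retrofunding-4` collection with at least 1,000 transactions over the past six months and at least 420 addresses over the past 90 days, follows their SBOM dependencies to the projects that maintain those packages, and returns onchain metrics for the former alongside code metrics for the latter:
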
<Tabs>
<TabItem value="sql" label="SQL">

```sql
select
  onchain_projects.project_name as `onchain_builder`,
  onchain_metrics.event_source as `network`,
  onchain_metrics.address_count_90_days,
  onchain_metrics.gas_fees_sum_6_months,
  onchain_metrics.transaction_count_6_months as transactions_6_months,
  code_metrics.project_name as `dev_tool_maintainer`,
  package_owners.package_artifact_source as `package_source`,
  code_metrics.active_developer_count_6_months,
  code_metrics.contributor_count_6_months,
  code_metrics.commit_count_6_months,
  code_metrics.opened_issue_count_6_months,
  code_metrics.opened_pull_request_count_6_months,
  code_metrics.fork_count,
  code_metrics.star_count,
  code_metrics.last_updated_at_date
from `oso_production.sboms_v0` sboms
join `oso_production.projects_v1` onchain_projects
  on sboms.from_project_id = onchain_projects.project_id
join `oso_production.projects_by_collection_v1` projects_by_collection
  on onchain_projects.project_id = projects_by_collection.project_id
join `oso_production.onchain_metrics_by_project_v1` onchain_metrics
  on onchain_projects.project_id = onchain_metrics.project_id
join `oso_production.package_owners_v0` package_owners
  on sboms.to_package_artifact_name = package_owners.package_artifact_name
  and sboms.to_package_artifact_source = package_owners.package_artifact_source
join `oso_production.code_metrics_by_project_v1` code_metrics
  on package_owners.package_owner_project_id = code_metrics.project_id
where
  projects_by_collection.collection_name = 'op-retrofunding-4'
  and onchain_metrics.transaction_count_6_months >= 1000
  and onchain_metrics.address_count_90_days >= 420
```

</TabItem>
<TabItem value="python" label="Python">

```python
query = """
select
  onchain_projects.project_name as `onchain_builder`,
  onchain_metrics.event_source as `network`,
  onchain_metrics.address_count_90_days,
  onchain_metrics.gas_fees_sum_6_months,
  onchain_metrics.transaction_count_6_months as transactions_6_months,
  code_metrics.project_name as `dev_tool_maintainer`,
  package_owners.package_artifact_source as `package_source`,
  code_metrics.active_developer_count_6_months,
  code_metrics.contributor_count_6_months,
  code_metrics.commit_count_6_months,
  code_metrics.opened_issue_count_6_months,
  code_metrics.opened_pull_request_count_6_months,
  code_metrics.fork_count,
  code_metrics.star_count,
  code_metrics.last_updated_at_date
from `oso_production.sboms_v0` sboms
join `oso_production.projects_v1` onchain_projects
  on sboms.from_project_id = onchain_projects.project_id
join `oso_production.projects_by_collection_v1` projects_by_collection
  on onchain_projects.project_id = projects_by_collection.project_id
join `oso_production.onchain_metrics_by_project_v1` onchain_metrics
  on onchain_projects.project_id = onchain_metrics.project_id
join `oso_production.package_owners_v0` package_owners
  on sboms.to_package_artifact_name = package_owners.package_artifact_name
  and sboms.to_package_artifact_source = package_owners.package_artifact_source
join `oso_production.code_metrics_by_project_v1` code_metrics
  on package_owners.package_owner_project_id = code_metrics.project_id
where
  projects_by_collection.collection_name = 'op-retrofunding-4'
  and onchain_metrics.transaction_count_6_months >= 1000
  and onchain_metrics.address_count_90_days >= 420
"""
df = client.query(query).to_dataframe()

# Optional: Add visualization code
import plotly.express as px

# Example visualization
fig = px.scatter(
    df,
    x='address_count_90_days',
    y='transactions_6_months',
    size='gas_fees_sum_6_months',
    hover_data=['onchain_builder', 'dev_tool_maintainer']
)
fig.show()
```

</TabItem>
</Tabs>
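
To turn the downstream impact rows into a score per development tool, you might aggregate onchain activity by `dev_tool_maintainer`. The query can return several rows for the same builder and maintainer (for example, one per package the maintainer publishes), so the sketch below first de-duplicates `(onchain_builder, network, dev_tool_maintainer)` combinations, then sums `transactions_6_months` and counts distinct downstream builders. Column names follow the query's aliases; the rows and the function name are hypothetical, and summing transactions is only one possible weighting.

```python
from collections import defaultdict

# Hypothetical rows using the column aliases from the downstream impact query
rows = [
    {"onchain_builder": "builder-1", "network": "network-a", "transactions_6_months": 5000, "dev_tool_maintainer": "tool-x"},
    # Same builder, network, and maintainer again (e.g. a second package from tool-x)
    {"onchain_builder": "builder-1", "network": "network-a", "transactions_6_months": 5000, "dev_tool_maintainer": "tool-x"},
    {"onchain_builder": "builder-2", "network": "network-b", "transactions_6_months": 2000, "dev_tool_maintainer": "tool-x"},
    {"onchain_builder": "builder-2", "network": "network-b", "transactions_6_months": 2000, "dev_tool_maintainer": "tool-y"},
]


def downstream_impact(rows):
    """Sum downstream transactions and count distinct builders per dev tool maintainer."""
    seen = set()
    totals = defaultdict(lambda: {"transactions_6_months": 0, "builders": set()})
    for row in rows:
        key = (row["onchain_builder"], row["network"], row["dev_tool_maintainer"])
        if key in seen:
            continue  # skip duplicate builder/network/maintainer combinations
        seen.add(key)
        entry = totals[row["dev_tool_maintainer"]]
        entry["transactions_6_months"] += row["transactions_6_months"]
        entry["builders"].add(row["onchain_builder"])
    return {
        maintainer: {
            "transactions_6_months": entry["transactions_6_months"],
            "downstream_builders": len(entry["builders"]),
        }
        for maintainer, entry in totals.items()
    }


print(downstream_impact(rows))
```

With the query result, `downstream_impact(df.to_dict('records'))` applies the same logic to the DataFrame.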

You can go even further in your analysis by joining on other OSO datasets. For more examples, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph).