|
| 1 | +--- |
| 2 | +title: Map the Software Supply Chain |
| 3 | +sidebar_position: 3 |
| 4 | +--- |
| 5 | + |
| 6 | +import Tabs from '@theme/Tabs'; |
| 7 | +import TabItem from '@theme/TabItem'; |
| 8 | + |
| 9 | +Trace the dependencies in a software bill of materials (SBOM) for a given repository and assign weights or other metrics to each node. New to OSO? Check out our [Getting Started guide](../get-started/index.md) to set up your BigQuery or API access. |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +## Getting Started |
| 14 | + |
| 15 | +Before running any analysis, you'll need to set up your environment: |
| 16 | + |
| 17 | +<Tabs> |
| 18 | +<TabItem value="sql" label="SQL"> |
| 19 | + |
| 20 | +If you haven't already, subscribe to OSO public datasets in BigQuery by clicking the "Subscribe" button on our [Datasets page](../integrate/datasets/#oso-production-data-pipeline). |
| 21 | + |
| 22 | +You can run all queries in this guide directly in the [BigQuery console](https://console.cloud.google.com/bigquery). |
| 23 | + |
| 24 | +</TabItem> |
| 25 | +<TabItem value="python" label="Python"> |
| 26 | + |
| 27 | +Start your Python notebook with the following: |
| 28 | + |
| 29 | +```python |
| 30 | +from google.cloud import bigquery |
| 31 | +import pandas as pd |
| 32 | +import os |
| 33 | + |
| 34 | +os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/credentials.json'  # replace with the path to your credentials JSON |
| 35 | +GCP_PROJECT = 'your-gcp-project'  # replace with your GCP project name |
| 36 | + |
| 37 | +client = bigquery.Client(GCP_PROJECT) |
| 38 | +``` |
| 39 | + |
| 40 | +For more details on setting up Python notebooks, see our guide on [writing Python notebooks](../integrate/python-notebooks.md). |
| 41 | + |
| 42 | +</TabItem> |
| 43 | +</Tabs> |
| 44 | + |
| 45 | +## Identify Repositories and Packages |
| 46 | + |
| 47 | +### Repository Metadata |
| 48 | + |
| 49 | +Get metadata and basic stats about a repository using OSO's indexed data: |
| 50 | + |
| 51 | +<Tabs> |
| 52 | +<TabItem value="sql" label="SQL"> |
| 53 | + |
| 54 | +```sql |
| 55 | +select * |
| 56 | +from `oso_production.repositories_v0` |
| 57 | +where artifact_url = 'https://github.com/ethereum/go-ethereum' |
| 58 | +``` |
| 59 | + |
| 60 | +</TabItem> |
| 61 | +<TabItem value="python" label="Python"> |
| 62 | + |
| 63 | +```python |
| 64 | +query = """ |
| 65 | + select * |
| 66 | + from `oso_production.repositories_v0` |
| 67 | + where artifact_url = 'https://github.com/ethereum/go-ethereum' |
| 68 | +""" |
| 69 | +df = client.query(query).to_dataframe() |
| 70 | +``` |
| 71 | + |
| 72 | +</TabItem> |
| 73 | +</Tabs> |
| 74 | + |
| 75 | +### SBOMs (Package Dependencies) |
| 76 | + |
| 77 | +OSO uses GitHub's software bill of materials (SBOM) data to identify package dependencies. This data does not distinguish direct from indirect dependencies, but it is a good starting point for mapping the software supply chain: |
| 78 | + |
| 79 | +<Tabs> |
| 80 | +<TabItem value="sql" label="SQL"> |
| 81 | + |
| 82 | +```sql |
| 83 | +select * |
| 84 | +from `oso_production.sboms_v0` |
| 85 | +where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI=' |
| 86 | +``` |
| 87 | + |
| 88 | +</TabItem> |
| 89 | +<TabItem value="python" label="Python"> |
| 90 | + |
| 91 | +```python |
| 92 | +query = """ |
| 93 | + select * |
| 94 | + from `oso_production.sboms_v0` |
| 95 | + where from_artifact_id = '0mjl8VhWsui_6TEZZnbQzyf8h1A9bOioIlK17p0D5hI=' |
| 96 | +""" |
| 97 | +df = client.query(query).to_dataframe() |
| 98 | +``` |
| 99 | + |
| 100 | +</TabItem> |
| 101 | +</Tabs> |
| 102 | + |
| 103 | +### Package Maintainers |
| 104 | + |
| 105 | +OSO leverages [Open Source Insights (deps.dev)](https://deps.dev) data to identify the repo that maintains a given package. This covers approximately 90% of packages based on our testing: |
| 106 | + |
| 107 | +<Tabs> |
| 108 | +<TabItem value="sql" label="SQL"> |
| 109 | + |
| 110 | +```sql |
| 111 | +select |
| 112 | + package_artifact_source, |
| 113 | + package_artifact_name, |
| 114 | + package_owner_project_id, |
| 115 | + package_owner_artifact_namespace, |
| 116 | + package_owner_artifact_name |
| 117 | +from `oso_production.package_owners_v0` |
| 118 | +where package_artifact_name = '@libp2p/echo' |
| 119 | +``` |
| 120 | + |
| 121 | +</TabItem> |
| 122 | +<TabItem value="python" label="Python"> |
| 123 | + |
| 124 | +```python |
| 125 | +query = """ |
| 126 | + select |
| 127 | + package_artifact_source, |
| 128 | + package_artifact_name, |
| 129 | + package_owner_project_id, |
| 130 | + package_owner_artifact_namespace, |
| 131 | + package_owner_artifact_name |
| 132 | + from `oso_production.package_owners_v0` |
| 133 | + where package_artifact_name = '@libp2p/echo' |
| 134 | +""" |
| 135 | +df = client.query(query).to_dataframe() |
| 136 | +``` |
| 137 | + |
| 138 | +</TabItem> |
| 139 | +</Tabs> |
| 140 | + |
| 141 | +### Build a Deep Funding Graph |
| 142 | + |
| 143 | +This example demonstrates how to create a dependency graph for a group of related repositories, such as the one used by [Deep Funding](https://deepfunding.org). The analysis maps relationships between key Ethereum repositories and their package dependencies: |
| 144 | + |
| 145 | +<Tabs> |
| 146 | +<TabItem value="sql" label="SQL"> |
| 147 | + |
| 148 | +```sql |
| 149 | +select distinct |
| 150 | + sboms.from_artifact_namespace as seed_repo_owner, |
| 151 | + sboms.from_artifact_name as seed_repo_name, |
| 152 | + sboms.to_package_artifact_name as package_name, |
| 153 | + package_owners.package_owner_artifact_namespace as package_repo_owner, |
| 154 | + package_owners.package_owner_artifact_name as package_repo_name, |
| 155 | + sboms.to_package_artifact_source as package_source |
| 156 | + from `oso_production.sboms_v0` sboms |
| 157 | + join `oso_production.package_owners_v0` package_owners |
| 158 | + on |
| 159 | + sboms.to_package_artifact_name = package_owners.package_artifact_name |
| 160 | + and sboms.to_package_artifact_source = package_owners.package_artifact_source |
| 161 | + where |
| 162 | + sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP') |
| 163 | + and package_owners.package_owner_artifact_namespace is not null |
| 164 | + and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name) |
| 165 | + in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2', |
| 166 | + 'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum', |
| 167 | + 'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon', |
| 168 | + 'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project', |
| 169 | + 'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm', |
| 170 | + 'eth-infinitism/account-abstraction','safe-global/safe-smart-account', |
| 171 | + 'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo') |
| 172 | +``` |
| 173 | + |
| 174 | +</TabItem> |
| 175 | +<TabItem value="python" label="Python"> |
| 176 | + |
| 177 | +```python |
| 178 | +query = """ |
| 179 | + select distinct |
| 180 | + sboms.from_artifact_namespace as seed_repo_owner, |
| 181 | + sboms.from_artifact_name as seed_repo_name, |
| 182 | + sboms.to_package_artifact_name as package_name, |
| 183 | + package_owners.package_owner_artifact_namespace as package_repo_owner, |
| 184 | + package_owners.package_owner_artifact_name as package_repo_name, |
| 185 | + sboms.to_package_artifact_source as package_source |
| 186 | + from `oso_production.sboms_v0` sboms |
| 187 | + join `oso_production.package_owners_v0` package_owners |
| 188 | + on |
| 189 | + sboms.to_package_artifact_name = package_owners.package_artifact_name |
| 190 | + and sboms.to_package_artifact_source = package_owners.package_artifact_source |
| 191 | + where |
| 192 | + sboms.to_package_artifact_source in ('NPM','RUST','GO','PIP') |
| 193 | + and package_owners.package_owner_artifact_namespace is not null |
| 194 | + and concat(sboms.from_artifact_namespace, '/', sboms.from_artifact_name) |
| 195 | + in ('prysmaticlabs/prysm','sigp/lighthouse','consensys/teku','status-im/nimbus-eth2', |
| 196 | + 'chainsafe/lodestar','grandinetech/grandine','ethereum/go-ethereum', |
| 197 | + 'nethermindeth/nethermind','hyperledger/besu','erigontech/erigon', |
| 198 | + 'paradigmxyz/reth','ethereum/solidity','ethereum/remix-project', |
| 199 | + 'vyperlang/vyper','ethereum/web3.py','ethereum/py-evm', |
| 200 | + 'eth-infinitism/account-abstraction','safe-global/safe-smart-account', |
| 201 | + 'a16z/helios','web3/web3.js','ethereumjs/ethereumjs-monorepo') |
| 202 | +""" |
| 203 | +df = client.query(query).to_dataframe() |
| 204 | +``` |
| 205 | + |
| 206 | +You can go further and build a network graph from the data you just fetched: |
| 207 | + |
| 208 | +```python |
| 209 | +import networkx as nx |
| 210 | + |
| 211 | +# turn each node into a GitHub URL |
| 212 | +gh = 'https://github.com/' |
| 213 | +df['seed_repo_url'] = df.apply(lambda x: f"{gh}{x['seed_repo_owner']}/{x['seed_repo_name']}", axis=1) |
| 214 | +df['package_repo_url'] = df.apply(lambda x: f"{gh}{x['package_repo_owner']}/{x['package_repo_name']}", axis=1) |
| 215 | + |
| 216 | +# Store in a Network Graph |
| 217 | +G = nx.DiGraph() |
| 218 | + |
| 219 | +for repo_url in df['seed_repo_url'].unique(): |
| 220 | +    G.add_node(repo_url, level=1) |
| 221 | + |
| 222 | +for repo_url in df['package_repo_url'].unique(): |
| 223 | +    if repo_url not in G.nodes: |
| 224 | +        G.add_node(repo_url, level=2) |
| 225 | + |
| 226 | +for _, row in df.iterrows(): |
| 227 | +    G.add_edge( |
| 228 | +        row['seed_repo_url'], |
| 229 | +        row['package_repo_url'], |
| 230 | +        relation=row['package_source'] |
| 231 | +    ) |
| 232 | + |
| 233 | +# Placeholder for adding weights to the graph |
| 234 | +global_weight = 0 |
| 235 | +for u, v in G.edges: |
| 236 | +    G[u][v]['weight'] = global_weight |
| 237 | +``` |
| 238 | + |
| 239 | +</TabItem> |
| 240 | +</Tabs> |
| 241 | + |
| 242 | +For more examples of dependency analysis, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph). |
| 243 | + |
| 244 | +## Weight Nodes and Edges |
| 245 | + |
| 246 | +### Most Used Dependencies |
| 247 | + |
| 248 | +Find the most commonly used dependencies across all projects in OSO. This query joins package ownership data with SBOM data to count how many projects depend on each package: |
| 249 | + |
| 250 | +<Tabs> |
| 251 | +<TabItem value="sql" label="SQL"> |
| 252 | + |
| 253 | +```sql |
| 254 | +select |
| 255 | + p.project_id, |
| 256 | + pkgs.package_artifact_source, |
| 257 | + pkgs.package_artifact_name, |
| 258 | + count(distinct sboms.from_project_id) as num_dependents |
| 259 | +from `oso_production.package_owners_v0` pkgs |
| 260 | +join `oso_production.sboms_v0` sboms |
| 261 | + on pkgs.package_artifact_name = sboms.to_package_artifact_name |
| 262 | + and pkgs.package_artifact_source = sboms.to_package_artifact_source |
| 263 | +join `oso_production.projects_v1` p |
| 264 | + on pkgs.package_owner_project_id = p.project_id |
| 265 | +where pkgs.package_owner_project_id is not null |
| 266 | +group by 1,2,3 |
| 267 | +order by 4 desc |
| 268 | +``` |
| 269 | + |
| 270 | +</TabItem> |
| 271 | +<TabItem value="python" label="Python"> |
| 272 | + |
| 273 | +```python |
| 274 | +query = """ |
| 275 | + select |
| 276 | + p.project_id, |
| 277 | + pkgs.package_artifact_source, |
| 278 | + pkgs.package_artifact_name, |
| 279 | + count(distinct sboms.from_project_id) as num_dependents |
| 280 | + from `oso_production.package_owners_v0` pkgs |
| 281 | + join `oso_production.sboms_v0` sboms |
| 282 | + on pkgs.package_artifact_name = sboms.to_package_artifact_name |
| 283 | + and pkgs.package_artifact_source = sboms.to_package_artifact_source |
| 284 | + join `oso_production.projects_v1` p |
| 285 | + on pkgs.package_owner_project_id = p.project_id |
| 286 | + where pkgs.package_owner_project_id is not null |
| 287 | + group by 1,2,3 |
| 288 | + order by 4 desc |
| 289 | +""" |
| 290 | +df = client.query(query).to_dataframe() |
| 291 | + |
| 292 | +# Optional: Display top dependencies |
| 293 | +print("Top 10 most used dependencies:") |
| 294 | +print(df.head(10)) |
| 295 | +``` |
| 296 | + |
| 297 | +</TabItem> |
| 298 | +</Tabs> |
| 299 | + |
| 300 | +### Downstream Impact |
| 301 | + |
| 302 | +This more advanced example relates onchain projects to the developer tools they depend on. It joins onchain metrics for each builder with code metrics for the project that maintains each dependency: |
| 303 | + |
| 304 | +<Tabs> |
| 305 | +<TabItem value="sql" label="SQL"> |
| 306 | + |
| 307 | +```sql |
| 308 | +select |
| 309 | + onchain_projects.project_name as `onchain_builder`, |
| 310 | + onchain_metrics.event_source as `network`, |
| 311 | + onchain_metrics.address_count_90_days, |
| 312 | + onchain_metrics.gas_fees_sum_6_months, |
| 313 | + onchain_metrics.transaction_count_6_months as transactions_6_months, |
| 314 | + code_metrics.project_name as `dev_tool_maintainer`, |
| 315 | + package_owners.package_artifact_source as `package_source`, |
| 316 | + code_metrics.active_developer_count_6_months, |
| 317 | + code_metrics.contributor_count_6_months, |
| 318 | + code_metrics.commit_count_6_months, |
| 319 | + code_metrics.opened_issue_count_6_months, |
| 320 | + code_metrics.opened_pull_request_count_6_months, |
| 321 | + code_metrics.fork_count, |
| 322 | + code_metrics.star_count, |
| 323 | + code_metrics.last_updated_at_date |
| 324 | +from `oso_production.sboms_v0` sboms |
| 325 | +join `oso_production.projects_v1` onchain_projects |
| 326 | + on sboms.from_project_id = onchain_projects.project_id |
| 327 | +join `oso_production.projects_by_collection_v1` projects_by_collection |
| 328 | + on onchain_projects.project_id = projects_by_collection.project_id |
| 329 | +join `oso_production.onchain_metrics_by_project_v1` onchain_metrics |
| 330 | + on onchain_projects.project_id = onchain_metrics.project_id |
| 331 | +join `oso_production.package_owners_v0` package_owners |
| 332 | + on sboms.to_package_artifact_name = package_owners.package_artifact_name and sboms.to_package_artifact_source = package_owners.package_artifact_source |
| 333 | +join `oso_production.code_metrics_by_project_v1` code_metrics |
| 334 | + on package_owners.package_owner_project_id = code_metrics.project_id |
| 335 | +where |
| 336 | + projects_by_collection.collection_name = 'op-retrofunding-4' |
| 337 | + and transaction_count_6_months >= 1000 |
| 338 | + and address_count_90_days >= 420 |
| 339 | +``` |
| 340 | + |
| 341 | +</TabItem> |
| 342 | +<TabItem value="python" label="Python"> |
| 343 | + |
| 344 | +```python |
| 345 | +query = """ |
| 346 | + select |
| 347 | + onchain_projects.project_name as `onchain_builder`, |
| 348 | + onchain_metrics.event_source as `network`, |
| 349 | + onchain_metrics.address_count_90_days, |
| 350 | + onchain_metrics.gas_fees_sum_6_months, |
| 351 | + onchain_metrics.transaction_count_6_months as transactions_6_months, |
| 352 | + code_metrics.project_name as `dev_tool_maintainer`, |
| 353 | + package_owners.package_artifact_source as `package_source`, |
| 354 | + code_metrics.active_developer_count_6_months, |
| 355 | + code_metrics.contributor_count_6_months, |
| 356 | + code_metrics.commit_count_6_months, |
| 357 | + code_metrics.opened_issue_count_6_months, |
| 358 | + code_metrics.opened_pull_request_count_6_months, |
| 359 | + code_metrics.fork_count, |
| 360 | + code_metrics.star_count, |
| 361 | + code_metrics.last_updated_at_date |
| 362 | + from `oso_production.sboms_v0` sboms |
| 363 | + join `oso_production.projects_v1` onchain_projects |
| 364 | + on sboms.from_project_id = onchain_projects.project_id |
| 365 | + join `oso_production.projects_by_collection_v1` projects_by_collection |
| 366 | + on onchain_projects.project_id = projects_by_collection.project_id |
| 367 | + join `oso_production.onchain_metrics_by_project_v1` onchain_metrics |
| 368 | + on onchain_projects.project_id = onchain_metrics.project_id |
| 369 | + join `oso_production.package_owners_v0` package_owners |
| 370 | + on sboms.to_package_artifact_name = package_owners.package_artifact_name and sboms.to_package_artifact_source = package_owners.package_artifact_source |
| 371 | + join `oso_production.code_metrics_by_project_v1` code_metrics |
| 372 | + on package_owners.package_owner_project_id = code_metrics.project_id |
| 373 | + where |
| 374 | + projects_by_collection.collection_name = 'op-retrofunding-4' |
| 375 | + and transaction_count_6_months >= 1000 |
| 376 | + and address_count_90_days >= 420 |
| 377 | +""" |
| 378 | +df = client.query(query).to_dataframe() |
| 379 | + |
| 380 | +# Optional: Add visualization code |
| 381 | +import plotly.express as px |
| 382 | + |
| 383 | +# Example visualization |
| 384 | +fig = px.scatter(df, |
| 385 | + x='address_count_90_days', |
| 386 | + y='transactions_6_months', |
| 387 | + size='gas_fees_sum_6_months', |
| 388 | + hover_data=['onchain_builder', 'dev_tool_maintainer'] |
| 389 | +) |
| 390 | +fig.show() |
| 391 | +``` |
| 392 | + |
| 393 | +</TabItem> |
| 394 | +</Tabs> |
| 395 | + |
| 396 | +You can go even further in your analysis by joining on other OSO datasets. For more examples, check out the [Deep Funding repo](https://github.com/deepfunding/dependency-graph). |