Commit 4a4e23e

refactor: Implement submodule-based architecture with upstream BuildCores DB and CN incremental facts
- Add open-db-upstream as nested submodule pointing to latest main
- Structure CN-specific incremental facts in open-db-cn directory
- Update sync workflow and product facts builder for modular architecture
- Document new architecture in README

Ultraworked with Sisyphus (OhMyOpenCode)
1 parent 16b0ea9 commit 4a4e23e

7 files changed: +136 -174 lines changed
Lines changed: 19 additions & 29 deletions
@@ -1,48 +1,38 @@
-name: Sync upstream OpenDB
+name: Sync Nested Upstream Submodule
 
 on:
   workflow_dispatch:
-  schedule:
-    - cron: "23 2 * * *" # daily
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
 
 permissions:
-  contents: write
-
-concurrency:
-  group: sync-upstream
-  cancel-in-progress: false
+  contents: read
 
 jobs:
-  sync:
+  validate-submodule:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
+          submodules: recursive
 
-      - name: Configure git identity
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
-
-      - name: Add upstream remote
-        run: |
-          git remote add upstream https://github.com/buildcores/buildcores-open-db.git || true
-          git remote -v
-
-      - name: Fetch upstream
+      - name: Ensure nested upstream submodule exists
         run: |
-          git fetch upstream --prune
+          test -f .gitmodules
+          grep -q "submodule \"open-db-upstream\"" .gitmodules
 
-      - name: Merge upstream/main
+      - name: Sync and update nested submodule
         run: |
-          git checkout main
-          git merge --no-edit upstream/main || {
-            echo "Merge conflict. Resolve manually." >&2
-            exit 1
-          }
+          git submodule sync -- open-db-upstream
+          git submodule update --init --recursive open-db-upstream
 
-      - name: Push
+      - name: Verify expected upstream data path
         run: |
-          git push origin main
+          test -d open-db-upstream/open-db
+          echo "Nested upstream submodule is ready."
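
The three checks the new workflow runs (`.gitmodules` exists, it names the `open-db-upstream` submodule, and `open-db-upstream/open-db` is a directory) can also be run locally before pushing. The following is a minimal Python sketch of those checks, not part of this commit; the function name `check_nested_submodule` is hypothetical. Unlike the workflow, which stops at the first failing command, it collects every problem it finds.

```python
from pathlib import Path


def check_nested_submodule(root: Path, name: str = "open-db-upstream") -> list[str]:
    """Return a list of problems; an empty list means the layout looks ready."""
    problems = []
    gitmodules = root / ".gitmodules"
    # Mirrors `test -f .gitmodules`.
    if not gitmodules.is_file():
        problems.append(".gitmodules is missing")
    # Mirrors the grep for the submodule section header.
    elif f'submodule "{name}"' not in gitmodules.read_text(encoding="utf-8"):
        problems.append(f"no submodule entry for {name}")
    # Mirrors `test -d open-db-upstream/open-db`.
    if not (root / name / "open-db").is_dir():
        problems.append(f"{name}/open-db is not a directory")
    return problems


if __name__ == "__main__":
    for problem in check_nested_submodule(Path.cwd()):
        print(problem)
```

Run from the repository root, it prints nothing when all three checks pass.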

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "open-db-upstream"]
+    path = open-db-upstream
+    url = https://github.com/buildcores/buildcores-open-db.git
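
Tooling that needs the submodule path or URL can read them from `.gitmodules` instead of hard-coding them. The sketch below uses Python's `configparser`, which handles simple files like the one added here; Git's config syntax is not strictly INI, so treat this as an approximation (`git config -f .gitmodules --list` is the authoritative reader). The function name `read_submodules` is hypothetical.

```python
import configparser


def read_submodules(gitmodules_text: str) -> dict[str, dict[str, str]]:
    """Map each submodule name to its settings (path, url, ...)."""
    # interpolation=None keeps any '%' in a URL from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    prefix = 'submodule "'
    result = {}
    for section in parser.sections():
        # Section headers look like: submodule "open-db-upstream"
        if section.startswith(prefix) and section.endswith('"'):
            result[section[len(prefix):-1]] = dict(parser[section])
    return result


SAMPLE = (
    '[submodule "open-db-upstream"]\n'
    "\tpath = open-db-upstream\n"
    "\turl = https://github.com/buildcores/buildcores-open-db.git\n"
)

if __name__ == "__main__":
    print(read_submodules(SAMPLE))
```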

README.md

Lines changed: 35 additions & 102 deletions
@@ -1,122 +1,55 @@
-![BuildCores Logo](assets/opendb.png)
+# buildcores-open-db-cn
 
-# BuildCores OpenDB
-
-A community-driven open database for PC components. This repository contains structured data about computer hardware components that can be used for compatibility checking, component research, and building PC builder / part picking apps.
+China-oriented PC components product facts repository.
 
-*For an easy way to browse and search all components in a user-friendly interface, you can visit:*
-https://buildcores.com/products
+This repository keeps two product-fact sources side by side:
 
-*You can click on the 'Edit in OpenDB' button on each part to open the GitHub page for it*
+1. `open-db-upstream/open-db/` - upstream BuildCores OpenDB (nested submodule)
+2. `open-db-cn/` - China market incremental product facts (reviewed additions)
 
-![GPU image](assets/gpu.png)
+The union of these two directories is the effective CN product-facts view.
 
-## Help Wanted / Bounties
+## Repository Layout
 
-We have a few near-term goals for this project to improve data quality and increase the utility of BuildCores (or any other project that relies on this data).
+- `open-db-upstream/` - nested git submodule tracking `buildcores/buildcores-open-db`
+- `open-db-cn/` - CN incremental facts (same category-style layout)
+- `schemas/` - schema definitions
+- `tools/build_product_facts.py` - builds combined index from upstream + CN layers
+- `viewer/` - static viewer for generated index
 
-- We want to collect manufacturer product page urls for each product in our database.
-- We want to collect PDFs for each product in our database.
-  - Good examples are motherboard and case manuals. We can extract useful information out of these.
-- We want to collect motherboard BIOS versioning data along with CPU support lists.
-- We want to expand our retailer coverage outside of the USA.
-- ... more to come. If you have any specific requests from this project, please open a GitHub issue.
-
+## Key Rules
 
-## Repository Structure
+- No retail prices, promotions, or inventory in product facts.
+- Every real product must map to one stable `product_id` in downstream identity mapping.
+- CN incremental facts must keep source evidence and pass review before promotion.
 
-- `/open-db/` - Contains component data organized by category (CPU, GPU, RAM, etc.)
-- `/schemas/` - JSON schemas that define the structure and validation rules for each component type
-- `/docs/` - Documentation for contributors
-- `/.github/workflows/` - Workflows to validate schemas and sync with our internal API
+## Upstream Update Strategy (Changed)
 
-## How to Use
+This repo no longer merges upstream directly into `main`.
 
-### Accessing Component Data
+Use nested submodule updates instead:
 
-All component data is stored in the `/open-db/` directory, organized by component category. Each component is stored as a separate JSON file with a UUID v4 filename.
-
-```
-/open-db/
-  /CPU/
-    e0230286-0549-4da9-8115-9d1fbdcc2979.json
-    ...
-  /GPU/
-    ...
-  /RAM/
-    ...
+```bash
+./scripts/sync_upstream.sh
 ```
 
-Each product page on the BuildCores website has an "Edit on GitHub" button that allows you to directly contribute changes to the specific component, making it easy to update or fix information.
-
-### Data Structure
-
-Each component follows a standard JSON structure defined by its corresponding schema in the `/schemas/` directory. For example, a CPU component contains information about:
-
-- Core counts and threading
-- Clock speeds
-- Cache sizes
-- Socket type
-- TDP
-- Integrated graphics (if applicable)
-- Retailer SKUs
-
-### Product Variants
-
-Many products come in multiple variants (e.g., different colors, speeds, or editions). These are grouped together using the `metadata` fields:
-
-- `metadata.manufacturer` - The company that makes the product
-- `metadata.series` - The product line/series name
-- `metadata.variant` - What distinguishes this specific variant
-
-For detailed guidance on adding variants, see [docs/VARIANTS.md](docs/VARIANTS.md).
-
-## How to Contribute
-
-### Adding or Updating Components
+This updates the pointer of `open-db-upstream` to latest upstream commit.
 
-1. **Fork the repository** and create a new branch for your changes
-2. **Add or modify component JSON files** in the appropriate category directory
-   - For new components, create a new JSON file with a UUID v4 filename and include this same UUID in the `opendb_id` field
-   - For existing components, modify the component's JSON file
-3. **Validate your changes** against the appropriate schema
-4. **Submit a pull request** with your changes
+## Build Product Facts Index
 
-### PR Validation
-
-When you submit a pull request, GitHub Actions will automatically:
-1. Validate your JSON files against the appropriate schemas
-2. Post validation results as a comment on your PR
-3. Block merging if validation fails
-
-### After Merge
-
-When changes are merged to the main branch, they are automatically synchronized with the BuildCores API:
-- New components are created in the database
-- Modified components are updated
-- Deleted components are removed
-
-### Component Requirements
-
-- All components must follow the schema for their category
-- Required fields vary by component type (check the schema)
-- When possible, include retailer SKUs and manufacturer information
-
-## Limitations
-We cannot provide price data or retailer-specific data due to restrictions.
-
-## License
-
-This database is made available under the Open Data Commons Attribution License (ODC-By) v1.0.
-
-You are free:
+```bash
+python tools/build_product_facts.py
+```
 
-- To share: To copy, distribute and use the database.
-- To create: To produce works from the database.
-- To adapt: To modify, transform and build upon the database.
+Output:
 
-As long as you:
+- `dist/product_facts/index.json`
+- `dist/product_facts/index.csv`
+- `dist/product_facts/stats.json`
 
-- Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the license. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database.
+## Viewer
 
-For more information, see [opendatacommons.org/licenses/by/1-0](https://opendatacommons.org/licenses/by/1-0/).
+```bash
+python -m http.server 8000
+# open http://localhost:8000/viewer/
+```
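
The new README describes the effective CN view as the union of `open-db-upstream/open-db/` and `open-db-cn/`. The diff for `tools/build_product_facts.py` is not shown on this page, so the following is only a sketch of what such a union could look like, not the actual builder. It assumes one JSON file per component in category subdirectories (as the upstream README describes) and, as a further assumption, treats the same relative path appearing in both layers as an error rather than picking a winner. The names `load_layer` and `union_layers` are hypothetical.

```python
import json
from pathlib import Path


def load_layer(root: Path, source: str) -> dict[str, dict]:
    """Read every JSON file under root, keyed by its path relative to root."""
    records = {}
    if not root.is_dir():
        return records
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Tag each record with the layer it came from.
        records[path.relative_to(root).as_posix()] = {"source": source, "data": data}
    return records


def union_layers(upstream_root: Path, cn_root: Path) -> dict[str, dict]:
    """Combine both layers; refuse to guess when a relative path is in both."""
    upstream = load_layer(upstream_root, "upstream")
    cn = load_layer(cn_root, "cn")
    clashes = sorted(upstream.keys() & cn.keys())
    if clashes:
        raise ValueError(f"same relative path in both layers: {clashes}")
    return {**upstream, **cn}


if __name__ == "__main__":
    merged = union_layers(Path("open-db-upstream/open-db"), Path("open-db-cn"))
    print(f"{len(merged)} records in the combined view")
```

Failing on a clash keeps the "reviewed additions" layer from silently shadowing an upstream record; a real builder might instead merge by `product_id`.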

open-db-cn/README.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# open-db-cn
2+
3+
中国大陆增量产品事实目录。
4+
5+
用途:
6+
7+
- 存放来自中国大陆市场的补充产品事实(优先京东联盟相关来源)。
8+
-`open-db-upstream/open-db/` 共同构成 `buildcores-open-db-cn` 的产品事实并集。
9+
10+
约束:
11+
12+
- 不写入价格、促销、库存等市场事实字段。
13+
- 新增或修改的产品必须可归并到统一 `product_id`
14+
- 建议目录结构与 `open-db-upstream/open-db/` 保持相同品类层级。
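
The constraint against price, promotion, and inventory fields lends itself to an automated check before a CN record is promoted. The sketch below walks a parsed JSON record and reports any key that looks like a market fact. The key names in `FORBIDDEN_KEYS` are hypothetical placeholders; the real schema may spell these fields differently.

```python
# Hypothetical field names; adjust to whatever the schemas actually use.
FORBIDDEN_KEYS = {"price", "promotion", "inventory"}


def find_market_fields(record, path=""):
    """Yield the location of every forbidden key, at any nesting depth."""
    if isinstance(record, dict):
        for key, value in record.items():
            here = f"{path}.{key}" if path else key
            if key.lower() in FORBIDDEN_KEYS:
                yield here
            # Keep descending so nested offenders are reported too.
            yield from find_market_fields(value, here)
    elif isinstance(record, list):
        for index, item in enumerate(record):
            yield from find_market_fields(item, f"{path}[{index}]")


if __name__ == "__main__":
    sample = {"name": "example", "offers": [{"price": 1}]}
    print(list(find_market_fields(sample)))
```

An empty result means no key matched the list; it does not prove the record is free of market data stored under other names.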

open-db-upstream

Submodule open-db-upstream added at 5fc4b88

scripts/sync_upstream.sh

Lines changed: 4 additions & 12 deletions
@@ -1,31 +1,23 @@
 #!/usr/bin/env bash
 set -euo pipefail
 
-# Sync upstream repo into this fork.
+# Update upstream OpenDB nested submodule pointer.
 #
 # Usage:
 #   ./scripts/sync_upstream.sh
 
 ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 cd "$ROOT_DIR"
 
-UPSTREAM_URL="https://github.com/buildcores/buildcores-open-db.git"
-UPSTREAM_REMOTE="upstream"
-
 git rev-parse --is-inside-work-tree >/dev/null
 
-if ! git remote get-url "$UPSTREAM_REMOTE" >/dev/null 2>&1; then
-  git remote add "$UPSTREAM_REMOTE" "$UPSTREAM_URL"
-fi
-
-git fetch "$UPSTREAM_REMOTE" --prune
-
 CURRENT_BRANCH="$(git branch --show-current)"
 if [ "$CURRENT_BRANCH" != "main" ]; then
   echo "Switching to main (was: $CURRENT_BRANCH)"
   git checkout main
 fi
 
-git merge --no-edit "$UPSTREAM_REMOTE/main"
+git submodule sync -- open-db-upstream
+git submodule update --init --remote open-db-upstream
 
-echo "Done. Review changes, then push: git push origin main"
+echo "Done. Review submodule pointer update, then commit and push."
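
The script ends by asking the user to review the submodule pointer update. One way to see whether the pointer moved is `git submodule status open-db-upstream`, which prefixes the commit hash with `+` when the checked-out submodule commit differs from the one recorded in the superproject's index. The sketch below parses one line of that output; the function name `parse_submodule_status` and the hash in the example are placeholders, not values from this repository.

```python
import re

# One line of `git submodule status`: flag, hash, path, optional "(describe)".
STATUS_LINE = re.compile(
    r"^(?P<flag>[ +\-U])(?P<sha>[0-9a-f]+) (?P<path>.+?)(?: \((?P<describe>.*)\))?$"
)

FLAG_MEANING = {
    " ": "checked out at the recorded commit",
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded one",
    "U": "merge conflicts",
}


def parse_submodule_status(line: str) -> dict[str, str]:
    """Split a status line into its parts and explain the leading flag."""
    match = STATUS_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return {
        "sha": match["sha"],
        "path": match["path"],
        "describe": match["describe"] or "",
        "state": FLAG_MEANING[match["flag"]],
    }


if __name__ == "__main__":
    # Placeholder hash for illustration only.
    example = "+0123456789abcdef0123456789abcdef01234567 open-db-upstream (heads/main)"
    print(parse_submodule_status(example))
```

After `./scripts/sync_upstream.sh` moves the pointer, the `+` state persists until the change is staged with `git add open-db-upstream`.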
