Commit e9e9911 (1 parent: ad04064)

csv-to-knowledge-graph (#52)

* move across csv-to-knowledge-graph
* ci: add testing workflows for csv-to-kg
* ci: *
* ci: *
* ci: only on changes to path
* ci: add backend build
* ci: *
* ci: *
* gitignore files
* trunk fixes

File tree: 164 files changed, +27708 −1 lines changed
Lines changed: 28 additions & 0 deletions

```yaml
name: Setup Project
description: This action sets up the project with specified versions of PNPM and Node.js

inputs:
  pnpm-version:
    description: The version of pnpm to use
    required: false
    default: 10.6.0
  node-version:
    description: The version of Node.js to use
    required: false
    default: 22.13.0

runs:
  using: composite
  steps:
    - uses: pnpm/action-setup@v2
      with:
        version: ${{ inputs.pnpm-version }}
    - uses: actions/setup-node@v3
      with:
        node-version: ${{ inputs.node-version }}
        cache: pnpm
        cache-dependency-path: ./csv-to-knowledge-graph/pnpm-lock.yaml
    - name: Install dependencies
      run: pnpm install
      shell: bash
      working-directory: ./csv-to-knowledge-graph
```
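This composite action can be consumed from any workflow job in the repository; a minimal sketch (the job name and the input override are illustrative, not part of this commit):

```yaml
jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Path is resolved relative to the repository root after checkout
      - uses: ./.github/actions/csv-to-kg
        with:
          pnpm-version: 10.6.0 # optional; this is already the default
```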

.github/workflows/csv-to-kg.yml

Lines changed: 144 additions & 0 deletions

```yaml
name: Test and Build CSV to Knowledge Graph

permissions:
  contents: read
  packages: write
  actions: read

on:
  workflow_dispatch:
  push:
    branches: [main]
    paths:
      - csv-to-knowledge-graph/**
      - .github/actions/csv-to-kg/**
      - .github/workflows/csv-to-kg.yml
  pull_request:
    branches: [main]
    paths:
      - csv-to-knowledge-graph/**
      - .github/actions/csv-to-kg/**
      - .github/workflows/csv-to-kg.yml

env:
  MODUS_DIR: ""

jobs:
  build-all:
    if: "!contains(github.event.head_commit.message, '[skip-ci]')"
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./csv-to-knowledge-graph
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/csv-to-kg
      - name: Build All
        run: pnpm build

  package-tests:
    if: "!contains(github.event.head_commit.message, '[skip-ci]')"
    runs-on: ubuntu-latest
    needs: build-all
    defaults:
      run:
        working-directory: ./csv-to-knowledge-graph
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/csv-to-kg
      - name: Build Packages
        run: pnpm build:packages
      - name: Test Packages
        run: pnpm test:packages

  integration-tests:
    if: "!contains(github.event.head_commit.message, '[skip-ci]')"
    runs-on: ubuntu-latest
    needs: build-all
    defaults:
      run:
        working-directory: ./csv-to-knowledge-graph
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/csv-to-kg
      - name: Build Packages
        run: pnpm build:packages
      - name: Setup Dgraph
        run: |
          # Create directory for Dgraph data
          mkdir -p ~/csv_graph

          # Ensure Docker Compose is installed
          docker compose --version

          # Start Dgraph using the existing docker compose file
          docker compose up -d

          # Wait for Dgraph to be ready
          echo "Waiting for Dgraph to start..."

          # Retry health check up to 10 times with 5 second intervals
          for i in {1..10}; do
            if curl -s localhost:8080/health > /dev/null; then
              echo "Dgraph is ready!"
              break
            fi

            if [ $i -eq 10 ]; then
              echo "Dgraph failed to start. Showing logs:"
              docker compose logs
              exit 1
            fi

            echo "Waiting for Dgraph to be ready (attempt $i/10)..."
            sleep 5
          done

          # Wait for Dgraph to be ready (adjust sleep time as needed)
          echo "Waiting for Dgraph to start..."
          sleep 30

          # Verify Dgraph is running
          curl -s localhost:8080/health || (docker compose logs && exit 1)

      - name: Test Integration
        run: pnpm test:integration

      - name: Cleanup Dgraph
        if: always()
        run: |
          echo "Stopping Dgraph containers..."
          docker compose down
          echo "Dgraph containers stopped."

  modus-build:
    name: modus build
    if: "!contains(github.event.head_commit.message, '[skip-ci]')"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Set Modus directory
        run: |
          echo "MODUS_DIR=./csv-to-knowledge-graph/modus" >> "$GITHUB_ENV"

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: "22"

      - name: Setup Go
        uses: actions/setup-go@v5

      - name: Setup TinyGo
        uses: acifani/setup-tinygo@v2
        with:
          tinygo-version: 0.34.0

      - name: Build project
        run: npx -p @hypermode/modus-cli -y modus build
        working-directory: ${{ env.MODUS_DIR }}
        shell: bash
```
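The Setup Dgraph step waits for the health endpoint with an inline retry loop; the same pattern can be factored into a small POSIX-shell helper (a sketch, not part of the commit):

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between tries; give up with a non-zero exit after ATTEMPTS failures.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# e.g. reproduce the workflow's 10-attempt, 5-second-interval wait:
# retry 10 5 curl -s localhost:8080/health
```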

.gitignore

Lines changed: 2 additions & 0 deletions

```diff
@@ -30,3 +30,5 @@ go.work.sum
 **/build
 **/node_modules
 **/.next
+dist/
+repomix-output.xml
```

.trunk/configs/.markdownlint.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,2 +1,3 @@
 # Prettier friendly markdownlint config (all formatting rules disabled)
 extends: markdownlint/style/prettier
+MD033: false
```

README.md

Lines changed: 6 additions & 1 deletion

```diff
@@ -3,13 +3,18 @@
 A set of tools and experimental projects for Dgraph.
 
 ## [data-import/csv-to-rdf](./data-import/csv-to-rdf/README.md)
+
 import tools: handle any set of CSV files and produce triples in RDF format using mapping template files.
 
 ## [docker/foaf_graph](./docker/foaf_graph/README.md)
+
 A self-contained (Docker) Friend-of-a-Friend graph comprising a three-group Dgraph cluster pre-populated with data. Also in the Docker image is an instance of Jupyter Lab. Several Notebooks illustrate querying with GraphQL and DQL.
 
 ## [docker/standalone_bulk_loader](./docker/standalone_bulk_loader/README.md)
-A Docker image for learning Dgraph which automatically `bulk loads` the data and schemas present in the `import` folder on start up if no data is present (no p directory).
 
+A Docker image for learning Dgraph which automatically `bulk loads` the data and schemas present in the `import` folder on start up if no data is present (no p directory).
 
+## [csv-to-knowledge-graph](./csv-to-knowledge-graph/README.md)
 
+Create Dgraph backed knowledge graphs from CSV files.
+Built with Hypermode and powered by AI.
```

csv-to-knowledge-graph/README.md

Lines changed: 164 additions & 0 deletions
# csv-to-knowledge-graph

<p align=center>
  <img width="80%" src="docs/images/banner-dark.png#gh-dark-mode-only" alt="csv-to-knowledge-graph"/>
  <img width="80%" src="docs/images/banner-white.png#gh-light-mode-only" alt="csv-to-knowledge-graph"/>
</p>

<div align=center>
  <h3>Create Dgraph backed knowledge graphs from CSV files.</h3>

  <p>Built with <a href="https://hypermode.com/">Hypermode</a> and powered by AI.</p>

  <p>👉 <a href="https://csv-to-knowledge-graph-frontend.vercel.app">Import my CSV now!</a></p>
</div>
## Table of Contents

- [csv-to-knowledge-graph](#csv-to-knowledge-graph)
  - [Table of Contents](#table-of-contents)
  - [Features](#features)
  - [Introduction](#introduction)
    - [What problem does this solve?](#what-problem-does-this-solve)
    - [How does it work?](#how-does-it-work)
  - [Under the Hood](#under-the-hood)
    - [AI-Powered Analysis](#ai-powered-analysis)
    - [RDF Generation](#rdf-generation)
    - [Query Generation](#query-generation)
    - [CSV-to-RDF Library](#csv-to-rdf-library)
    - [RDF-to-Dgraph Library](#rdf-to-dgraph-library)
  - [Usage](#usage)
  - [Powered by Hypermode](#powered-by-hypermode)

> Looking to contribute? Check out the [Contributing Guide](docs/CONTRIBUTING.md) for more information on how to get started with development.
## Features

- 🚀 **Browser-Based CSV Processing** - Upload CSV files directly in your browser.

- 🧠 **AI-Powered Graph Generation** - Auto-detect entities and relationships from CSV columns.

- 🔍 **Interactive Graph Visualization** - Zoom, pan, and reposition nodes in your knowledge graph.

- 🔄 **RDF Template Generation** - Create RDF templates from your graph structure.

- 📝 **RDF Data Conversion** - Transform CSV to RDF with real-time progress tracking.

- 🔌 **Dgraph Integration** - Connect, test, and import data to your Dgraph instance.

- 💡 **DQL Query Generation** - Get auto-generated queries specific to your schema.

- 🔗 **Ratel Support** - Open queries in Dgraph's Ratel UI with one click.

- 🧩 **Modular Architecture** - Separate packages for CSV-to-RDF, RDF-to-Dgraph, and graph handling.
## Introduction

### What problem does this solve?

Getting data into Dgraph is unnecessarily complicated. Currently, you need to:

- Manually create schema files that correctly model your data
- Learn RDF (Resource Description Framework) format and its quirks
- Run complicated command-line tools with cryptic options
- Write custom scripts to transform your data that often break

This creates a steep learning curve that prevents many organizations from using graph databases effectively. CSV to Knowledge Graph removes these barriers by providing a simple, visual way to transform regular CSV files into graph data.

### How does it work?

1. **Upload your CSV**: Simply drag and drop your CSV file into the browser
2. **AI analyzes your data**: Our AI examines your column names to automatically identify entities and relationships
3. **Visual graph preview**: See and interact with the proposed knowledge graph structure
4. **Generate RDF**: Convert your CSV data to the RDF format Dgraph requires
5. **One-click import**: Connect to your Dgraph instance and import with a single click
Here's a simple example of how a CSV file:

```csv
Order_ID,Customer_Name,Product_Name,Quantity,Price
ORD-001,John Smith,Wireless Earbuds,1,79.99
ORD-002,Sarah Johnson,Smart Watch,1,249.99
```

gets transformed into RDF triples:

```
<_:Customer_John_Smith> <dgraph.type> "Customer" .
<_:Customer_John_Smith> <Customer.name> "John Smith" .

<_:Order_ORD-001> <dgraph.type> "Order" .
<_:Order_ORD-001> <Order.id> "ORD-001" .
<_:Order_ORD-001> <PLACED_BY> <_:Customer_John_Smith> .

<_:Product_Wireless_Earbuds> <dgraph.type> "Product" .
<_:Product_Wireless_Earbuds> <Product.name> "Wireless Earbuds" .
<_:Product_Wireless_Earbuds> <Product.price> "79.99" .
<_:Product_Wireless_Earbuds> <Product.quantity> "1" .

<_:Product_Wireless_Earbuds> <BELONGS_TO_ORDER> <_:Order_ORD-001> .
```

<p align=center style="margin-top: 20px; margin-bottom: 20px;">
  <img width="80%" src="docs/images/root-graph.png" alt="csv-to-knowledge-graph"/>
</p>
## Under the Hood

### AI-Powered Analysis

Our AI analyzes your CSV column names to intelligently identify entities, attributes, and potential relationships. It recognizes patterns like `Customer_Name` or `Order_ID` to generate a coherent graph structure that represents the real-world relationships in your data.

This analysis happens entirely in your browser. The AI examines column naming patterns, value types, and contextual relationships to build a comprehensive graph model that serves as the foundation for RDF generation.

### RDF Generation

The RDF generation pipeline automatically creates a template that maps CSV data to a valid RDF format compatible with Dgraph. This includes:

- Generating appropriate entity types based on column groupings
- Creating unambiguous relationship predicates with proper direction
- Defining attribute mappings between CSV columns and RDF properties
- Setting correct data types for each attribute

The template is then applied to your CSV data, converting each row into a set of RDF triples that preserve the semantic structure of your information.
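The template idea can be sketched in a few lines of TypeScript. The placeholder syntax here (`{Column}` for a raw value, `{Column!}` for an identifier-safe value) and the function names are hypothetical, not the library's actual template format:

```typescript
type Row = Record<string, string>;

// Hypothetical sanitizer: make a CSV value safe for a blank-node label,
// e.g. "John Smith" -> "John_Smith".
const idSafe = (value: string): string => value.replace(/\s+/g, "_");

// Apply a line-oriented RDF template to one CSV row: {Col!} placeholders
// become identifier-safe values, {Col} placeholders become raw values.
function applyTemplate(template: string[], row: Row): string[] {
  return template.map((line) =>
    line
      .replace(/\{(\w+)!\}/g, (_, col) => idSafe(row[col] ?? ""))
      .replace(/\{(\w+)\}/g, (_, col) => row[col] ?? ""),
  );
}

const template = [
  '<_:Customer_{Customer_Name!}> <dgraph.type> "Customer" .',
  '<_:Customer_{Customer_Name!}> <Customer.name> "{Customer_Name}" .',
];

const triples = applyTemplate(template, { Customer_Name: "John Smith" });
```

Running such a template over every CSV row (and deduplicating repeated subjects) yields triples of the shape shown in the earlier example.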
### Query Generation

Once your data is imported, the application automatically generates useful Dgraph Query Language (DQL) queries customized to your specific data model. These queries are designed to:

- Showcase common query patterns for your specific entity types
- Demonstrate traversal across relationships in your knowledge graph
- Include appropriate filters and aggregations based on your data structure

Each query can be immediately executed or opened in Dgraph's Ratel UI for further exploration and modification.
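For the example schema shown earlier, a generated query might resemble this DQL sketch (illustrative only; the actual generated queries depend on your data model):

```
{
  orders(func: type(Order)) {
    Order.id
    PLACED_BY {
      Customer.name
    }
  }
}
```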
### CSV-to-RDF Library

The `csv-to-rdf` library processes CSV files entirely in the browser, using a template-based approach to transform tabular data into RDF triples. It features:

- Memory-efficient chunking to handle large CSV files without browser crashes
- Real-time progress tracking ideal for responsive UI feedback
- Template-based transformation that replaces column placeholders with actual values
- Support for complex entity relationships and data type conversions
- Streaming processing with minimal memory footprint
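The chunking pattern behind the first two bullets can be sketched as follows (hypothetical names; the library's real interface is richer):

```typescript
// Yield fixed-size slices of the parsed rows so only one chunk's worth of
// output needs to exist in memory at a time.
function* inChunks<T>(rows: T[], size: number): Generator<T[]> {
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Process rows chunk by chunk, invoking a progress callback after each
// chunk so a UI can render real-time feedback.
function processRows(
  rows: string[][],
  onProgress: (done: number, total: number) => void,
  chunkSize = 2,
): number {
  let done = 0;
  for (const chunk of inChunks(rows, chunkSize)) {
    // ...convert each row in `chunk` to RDF triples here...
    done += chunk.length;
    onProgress(done, rows.length);
  }
  return done;
}

const progress: number[] = [];
const total = processRows([["a"], ["b"], ["c"], ["d"], ["e"]], (done) => {
  progress.push(done);
});
// progress is [2, 4, 5]
```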
### RDF-to-Dgraph Library

The `rdf-to-dgraph` library enables direct browser-to-Dgraph communication without requiring a backend server:

- Uses standard browser fetch APIs to connect directly to Dgraph's HTTP endpoints
- Handles authentication, schema setup, and data import through a browser-compatible interface
- Provides detailed import statistics and progress tracking
- Automatically sets up appropriate relationship directives in the schema
- Works around browser limitations by breaking large mutations into manageable chunks
- Fetches current schema information and node type counts for query generation
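A minimal sketch of the direct-import path, assuming Dgraph Alpha's standard HTTP `/mutate` endpoint (the helper names are illustrative, not the library's API):

```typescript
// Wrap a batch of N-Quad triples in the body shape Dgraph's HTTP /mutate
// endpoint accepts when Content-Type is application/rdf.
function buildMutation(triples: string[]): string {
  return `{ set {\n${triples.join("\n")}\n} }`;
}

// Hypothetical import helper: POST one chunk of triples straight from the
// browser; commitNow=true commits the mutation in the same request.
async function importChunk(dgraphUrl: string, triples: string[]): Promise<void> {
  const res = await fetch(`${dgraphUrl}/mutate?commitNow=true`, {
    method: "POST",
    headers: { "Content-Type": "application/rdf" },
    body: buildMutation(triples),
  });
  if (!res.ok) throw new Error(`Dgraph import failed: HTTP ${res.status}`);
}

const body = buildMutation(['<_:a> <dgraph.type> "Customer" .']);
```

Splitting a large triple set across repeated `importChunk` calls is one way to keep individual requests small, as the chunked-mutation bullet above describes.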
## Usage

👉 [Import my CSV now!](https://csv-to-knowledge-graph-frontend.vercel.app)

## Powered by Hypermode

Built with ❤️ by [Hypermode](https://hypermode.com/).
