Commit d41e246

Merge pull request #695 from dgraph-io/raphael/olap
create learning on olap from blog post
2 parents 48cde7a + 8f9d28f commit d41e246

2 files changed

+211
-0
lines changed

content/learn/data-engineer/_index.md

Lines changed: 2 additions & 0 deletions
@@ -23,3 +23,5 @@ with Dgraph from your application.
### In this section

- [Unlocking Analytical Power with Dgraph]({{< relref "analytical-power-dgraph.md" >}}) - A technical guide on using Dgraph for OLAP use cases and analytical solutions
Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+++
title = "Unlocking Analytical Power with Dgraph"
description = "A technical guide on using Dgraph for Online Analytical Processing (OLAP) use cases, leveraging graph structure and DQL for comprehensive analytical solutions."
date = "2024-01-15T10:00:00Z"
type = "learn"
weight = 4

[menu.learn]
parent = "learn-data-engineer"
name = "Analytical Power with Dgraph"
weight = 4

+++

In this guide, we explore how Dgraph, a graph database optimized for Online Transaction Processing (OLTP) and deeply nested queries, can also serve Online Analytical Processing (OLAP) use cases. Through examples and practical techniques, we show how to design analytical queries in Dgraph without adding a separate OLAP system.

## What is OLTP vs. OLAP?

**OLTP (Online Transaction Processing)** focuses on processing day-to-day transactions, while **OLAP (Online Analytical Processing)** is geared toward analyzing data from multiple sources to support business decision-making.

Dgraph, though primarily designed for OLTP, can address many OLAP needs through its graph structure and DQL (Dgraph Query Language).

## Relationships Form the Dimensionality

In Dgraph, relationships between nodes naturally form the dimensions required for OLAP-style analysis. For example, the edge from a project to its school makes the school a dimension along which projects can be counted.

DQL's aggregation and math functions, combined with thoughtful graph design, let you build analytical queries directly within Dgraph.

The examples below use a dataset about donations to public schools in the U.S., built from public data provided by DonorsChoose.org in a [Kaggle project dataset](https://www.kaggle.com/datasets/hanselhansel/donorschoose). You can also find the data ready to load into Dgraph in the [Dgraph benchmarks GitHub repository](https://github.com/hypermodeinc/dgraph-benchmarks/tree/main/donors).

## Example: Basic Count of Projects per School

To count the number of projects per school, you can use the following DQL query. The `~` prefix follows the `Project.school` edge in reverse, which requires that predicate to be declared with `@reverse` in the schema:

```graphql
{
  stats(func: type(School)) {
    School.name
    count(~Project.school)
  }
}
```

This query returns school names and the corresponding project counts:

```json
{
  "data": {
    "stats": [
      { "School.name": "Abbott Middle School", "count(~Project.school)": 16 },
      { "School.name": "Lincoln Elementary School", "count(~Project.school)": 7 },
      { "School.name": "Rosemont Early Education Center", "count(~Project.school)": 5 }
    ]
  }
}
```

## Customizing Query Results for Visualization

DQL aliases let you shape the query response into the format a visualization tool expects. For instance, to use the query result in a Python script with Plotly, you can rename the fields in the query:

```graphql
{
  school(func: type(School)) {
    category: School.name
    value: count(~Project.school)
  }
}
```

Using this result, you can create a bar chart in Python:

```python
import plotly.express as px
import pandas as pd

def bar_chart(payload, title='Bar Chart'):
    # Flatten the list under the "school" query block into category/value columns
    df = pd.json_normalize(payload['school'])
    # Horizontal bars: categories on the y-axis, counts on the x-axis
    fig = px.bar(df, y='category', x='value', title=title, orientation='h', text_auto=True)
    fig.show()

# Query result (the contents of the response's "data" field)
res = {
    "school": [
        {"category": "Abbott Middle School", "value": 16},
        {"category": "Lincoln Elementary School", "value": 7},
        {"category": "Rosemont Early Education Center", "value": 5}
    ]
}

bar_chart(res, "Number of Projects per School")
```

## Advanced Aggregations and Variables

Query variables add flexibility: a value computed in one block can be reused for filtering, ordering, or output in another. In the example below, the `var` block stores each school's project count in the value variable `c`. The second block selects those schools with `uid(c)`, orders them by `val(c)` in descending order, and returns the count:

```graphql
{
  var(func: type(School)) {
    c as count(~Project.school)
  }
  serie(func: uid(c), orderdesc: val(c)) {
    category: School.name
    project_count: val(c)
  }
}
```
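One way to picture what the two blocks do is to treat the value variable `c` as a map from each school's UID to its project count. The following is a plain-Python analogy under that assumption, not a description of Dgraph internals. The UIDs and the helper names `project_counts` and `ordered_series` are hypothetical; the counts mirror the sample response shown earlier.

```python
# Conceptual sketch only: mimics how the value variable `c` is used.
# The UIDs are hypothetical; the counts mirror the earlier sample response.
schools = {
    "0x1": {"School.name": "Lincoln Elementary School", "projects": 7},
    "0x2": {"School.name": "Abbott Middle School", "projects": 16},
    "0x3": {"School.name": "Rosemont Early Education Center", "projects": 5},
}

def project_counts(nodes):
    """Analogy for `c as count(~Project.school)`: map each UID to a value."""
    return {uid: node["projects"] for uid, node in nodes.items()}

def ordered_series(nodes, c):
    """Analogy for `serie(func: uid(c), orderdesc: val(c))`."""
    # uid(c): the UIDs that have a value; orderdesc: val(c): sort by that value
    uids = sorted(c, key=lambda uid: c[uid], reverse=True)
    return [
        {"category": nodes[uid]["School.name"], "project_count": c[uid]}
        for uid in uids
    ]

c = project_counts(schools)
serie = ordered_series(schools, c)
```

Here `serie` lists the schools from most to fewest projects, which is the ordering the DQL query asks for.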
109+
110+
## Grouping and Filtering by Dimensions
111+
112+
Dgraph's [@groupby directive]({{< relref "/dql/query/directive/groupby.md" >}}) allows for powerful OLAP-style groupings. Here's an example of counting nodes by type:
113+
114+
```graphql
115+
{
116+
stats(func: has(dgraph.type)) @groupby(dgraph.type) {
117+
count: count(uid)
118+
}
119+
}
120+
```
121+
122+
The response includes counts for each type, such as City, School, and Project. Additionally, you can use filtering to focus on specific dimensions.
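The grouped response usually needs flattening before it can be charted. The sketch below assumes the groups are returned as a list under an `@groupby` key inside the `stats` block; check that shape against the response from your Dgraph version. The helper name `groupby_counts` is hypothetical, and the sample counts are made up for illustration.

```python
def groupby_counts(payload, block="stats", key="dgraph.type"):
    """Flatten an @groupby response into {group value: count}.

    Assumes each group is an object holding the grouped predicate and
    the aliased `count` field, nested under an "@groupby" list.
    """
    counts = {}
    for entry in payload.get(block, []):
        for group in entry.get("@groupby", []):
            counts[group[key]] = group["count"]
    return counts

# Illustrative payload; the counts are made up.
sample = {
    "stats": [
        {
            "@groupby": [
                {"dgraph.type": "City", "count": 3},
                {"dgraph.type": "School", "count": 5},
                {"dgraph.type": "Project", "count": 28},
            ]
        }
    ]
}

type_counts = groupby_counts(sample)
```

The resulting dictionary maps each type name to its node count and can be passed to a charting helper directly.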

## Complex Aggregations: Hierarchical Data

To analyze hierarchical data, such as the number and sum of donations by state and city, you can write a query that traverses the relationships and aggregates at each level. In the `var` block below, each level sums the values computed one level down, so `s2` and `c2` hold per-city totals and `s3` and `c3` hold per-state totals:

```graphql
{
  var(func: type(State)) {
    city: ~City.state {
      ~School.city {
        School.projects {
          Project.donations {
            a as Donation.amount
          }
          s as sum(val(a))
          c as count(Project.donations)
        }
        s1 as sum(val(s))
        c1 as sum(val(c))
      }
      s2 as sum(val(s1))
      c2 as sum(val(c1))
    }
    s3 as sum(val(s2))
    c3 as sum(val(c2))
  }
  stats(func: type(State)) {
    state: State.name
    amount: val(s3)
    count: val(c3)
    city: ~City.state {
      City.name
      amount: val(s2)
      count: val(c2)
    }
  }
}
```
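A nested state and city response like this is often easier to chart once it is flattened into one row per city. The following is a minimal sketch, assuming the response keys match the aliases in the query (`state`, `city`, `City.name`, `amount`, `count`). The helper name `flatten_state_city` is hypothetical, and the sample figures are made up.

```python
def flatten_state_city(payload):
    """Turn the nested stats response into one flat row per city."""
    rows = []
    for state in payload.get("stats", []):
        for city in state.get("city", []):
            rows.append({
                "state": state.get("state"),
                "city": city.get("City.name"),
                # Default to 0 in case a city has no donations
                "amount": city.get("amount", 0),
                "count": city.get("count", 0),
            })
    return rows

# Illustrative payload; the figures are made up.
sample = {
    "stats": [
        {
            "state": "California",
            "amount": 800.0,
            "count": 12,
            "city": [
                {"City.name": "Los Angeles", "amount": 500.0, "count": 8},
                {"City.name": "Oakland", "amount": 300.0, "count": 4},
            ],
        }
    ]
}

rows = flatten_state_city(sample)
```

Each row carries both the state and the city, so the list can feed a hierarchical chart or be grouped again in the application.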
161+
162+
## Multi-Dimensional Analysis
163+
164+
When multiple dimensions, such as school and category, are involved but not directly related in the graph, you can split the analysis into multiple queries and combine the results in your application. Here's an example query for donations per school within a specific category:
165+
166+
```graphql
167+
query stat_per_school_for_category($category: string) {
168+
var(func: eq(Category.name, $category)) {
169+
c1_projects as ~Project.category {
170+
c1_schools as Project.school
171+
}
172+
}
173+
stats(func: uid(c1_schools)) {
174+
School.name
175+
total_donation: sum(val(c1_projects))
176+
}
177+
}
178+
```
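Combining the per-category results in the application can be as simple as tagging each school row with the category it was queried for. The sketch below assumes the query has been run once per category and the responses collected in a dictionary keyed by category name. The helper name `combine_by_category` is hypothetical, and the sample values are illustrative.

```python
def combine_by_category(results_by_category):
    """Merge one query response per category into flat rows."""
    rows = []
    for category, payload in results_by_category.items():
        for school in payload.get("stats", []):
            rows.append({
                "Category": category,
                "School": school["School.name"],
                # Schools without donations may lack the aggregate field
                "Total Donation": school.get("total_donation", 0),
            })
    return rows

# Illustrative responses, one per category value passed as $category.
results = {
    "Literacy": {"stats": [{"School.name": "Abbott Middle", "total_donation": 500}]},
    "Math": {"stats": [{"School.name": "Lincoln Elementary", "total_donation": 300}]},
}

data = combine_by_category(results)
```

The rows in `data` have one entry per school and category pair, which is the shape a scatter or bubble chart expects.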
179+
180+
The results can then be visualized as a bubble chart in Python:
181+
182+
```python
183+
import plotly.express as px
184+
import pandas as pd
185+
186+
# Example data
187+
data = [
188+
{"Category": "Literacy", "School": "Abbott Middle", "Total Donation": 500},
189+
{"Category": "Math", "School": "Lincoln Elementary", "Total Donation": 300}
190+
]
191+
192+
df = pd.DataFrame(data)
193+
fig = px.scatter(
194+
df, x='School', y='Category', size='Total Donation', title='Donations by School and Category'
195+
)
196+
fig.show()
197+
```
198+
199+
## Conclusion
200+
201+
Dgraph's flexible graph model and powerful DQL capabilities make it a great choice for analytical use cases. By leveraging its inherent relationships, variables, and aggregation functions, you can create insightful and efficient OLAP-style analyses directly within Dgraph. Whether it's basic counts, hierarchical aggregations, or multi-dimensional data, Dgraph offers a seamless and performant solution for your analytical needs.
202+
203+
## Related Topics
204+
205+
- [DQL Query Language]({{< relref "/dql/_index.md" >}})
206+
- [Aggregation Functions]({{< relref "/dql/query/functions.md#aggregation-functions" >}})
207+
- [@groupby Directive]({{< relref "/dql/query/directive/groupby.md" >}})
208+
- [Query Variables]({{< relref "/dql/query/variables.md" >}})
209+
- [Dgraph Overview]({{< relref "/dgraph-overview.md" >}})
