+++
title = "Unlocking Analytical Power with Dgraph"
description = "A technical guide on using Dgraph for Online Analytical Processing (OLAP) use cases, leveraging graph structure and DQL for comprehensive analytical solutions."
date = "2024-01-15T10:00:00Z"
type = "learn"
weight = 4

[menu.learn]
  parent = "learn-data-engineer"
  name = "Analytical Power with Dgraph"
  weight = 4

+++

In this guide, we explore how Dgraph, a graph database optimized for Online Transaction Processing (OLTP) and deeply nested queries, can also serve Online Analytical Processing (OLAP) use cases. We highlight Dgraph's analytical capabilities through examples and practical techniques for building analytical solutions without adding a separate OLAP system.

## OLTP vs. OLAP

**OLTP (Online Transaction Processing)** focuses on processing day-to-day transactions, while **OLAP (Online Analytical Processing)** is geared toward analyzing data from multiple sources to support business decision-making.

Although Dgraph is primarily designed for OLTP, its graph structure and DQL (Dgraph Query Language) let it address many OLAP needs.

## Relationships Form the Dimensionality

In Dgraph, relationships between nodes naturally form the dimensions required for OLAP-style analysis. For example, the edge from each project to its school lets you aggregate projects per school, and the chain from school to city to state provides a geographic hierarchy.

DQL's aggregation and math functions, combined with thoughtful graph design, let you build a complete analytical solution directly within Dgraph.

The examples below use a dataset about donations to U.S. public schools, built from public data provided by DonorsChoose.org in a [Kaggle project dataset](https://www.kaggle.com/datasets/hanselhansel/donorschoose). The data, ready to load into Dgraph, is also available in the [Dgraph benchmarks GitHub repository](https://github.com/hypermodeinc/dgraph-benchmarks/tree/main/donors).

## Example: Basic Count of Projects per School

To count the number of projects per school, use the following DQL query. The `~Project.school` reverse edge follows `Project.school` backward, from each school to the projects that reference it. Reverse traversal requires the `@reverse` directive on `Project.school` in the schema.

```graphql
{
  stats(func: type(School)) {
    School.name
    count(~Project.school)
  }
}
```

This query returns school names and the corresponding project counts:

```json
{
  "data": {
    "stats": [
      { "School.name": "Abbott Middle School", "count(~Project.school)": 16 },
      { "School.name": "Lincoln Elementary School", "count(~Project.school)": 7 },
      { "School.name": "Rosemont Early Education Center", "count(~Project.school)": 5 }
    ]
  }
}
```
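Without an alias, the count appears under the generated key `count(~Project.school)`. Application code must then refer to that exact string. The following is a minimal sketch that assumes you already have the response as text. The sample simply copies the result above. The helper `project_counts` is a hypothetical name, and it extracts `(school, count)` pairs:

```python
import json

# Response copied from the example above; Dgraph wraps query results in "data".
RESPONSE = json.loads("""
{
  "data": {
    "stats": [
      {"School.name": "Abbott Middle School", "count(~Project.school)": 16},
      {"School.name": "Lincoln Elementary School", "count(~Project.school)": 7},
      {"School.name": "Rosemont Early Education Center", "count(~Project.school)": 5}
    ]
  }
}
""")

# Key generated for the unaliased count aggregate in this query.
COUNT_KEY = "count(~Project.school)"


def project_counts(response):
    """Return (school name, project count) pairs from the "stats" block.

    Hypothetical helper: a missing count key falls back to 0.
    """
    rows = response["data"]["stats"]
    return [(row["School.name"], row.get(COUNT_KEY, 0)) for row in rows]


for name, count in project_counts(RESPONSE):
    print(f"{name}: {count}")
```

You can avoid a hardcoded key such as `COUNT_KEY` by aliasing the count, as the next section shows.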

## Customizing Query Results for Visualization

DQL lets you alias predicates and aggregates, so you can shape the response to match the format a visualization tool expects. For instance, to use the result in a Python script with Plotly, rename the fields to `category` and `value`:

```graphql
{
  school(func: type(School)) {
    category: School.name
    value: count(~Project.school)
  }
}
```

Using this result, you can create a horizontal bar chart in Python:

```python
import plotly.express as px
import pandas as pd

def bar_chart(payload, title='Bar Chart'):
    # Each entry of the "school" block becomes one row with "category" and "value" columns
    df = pd.json_normalize(payload['school'])
    fig = px.bar(df, y='category', x='value', title=title, orientation='h', text_auto=True)
    fig.show()

# Query result: the contents of the "data" field of the Dgraph response
res = {
    "school": [
        {"category": "Abbott Middle School", "value": 16},
        {"category": "Lincoln Elementary School", "value": 7},
        {"category": "Rosemont Early Education Center", "value": 5}
    ]
}

bar_chart(res, "Number of Projects per School")
```
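The example above hardcodes `res`. You can instead fetch it from a running cluster, and one option is Dgraph's HTTP API. The following is a minimal sketch that uses only the Python standard library. It assumes an Alpha node serving HTTP at `http://localhost:8080` that accepts a raw DQL body on `/query` with `Content-Type: application/dql`. Check the HTTP API documentation for your Dgraph version. `build_query_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed address of a Dgraph Alpha HTTP query endpoint; change for your deployment.
DGRAPH_QUERY_URL = "http://localhost:8080/query"

SCHOOL_QUERY = """
{
  school(func: type(School)) {
    category: School.name
    value: count(~Project.school)
  }
}
"""


def build_query_request(query, url=DGRAPH_QUERY_URL):
    """Build (but do not send) a POST request carrying a raw DQL query."""
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/dql"},
        method="POST",
    )


def run_query(query, url=DGRAPH_QUERY_URL, timeout=30):
    """Send the query and return the "data" field of the JSON response."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Treat a non-empty "errors" list in the response body as a failure.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]


# Usage (requires a running Dgraph instance):
# res = run_query(SCHOOL_QUERY)
# bar_chart(res, "Number of Projects per School")
```

The request is built separately from the network call. This lets you inspect or test the request before you send it.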

## Advanced Aggregations and Variables

Query variables let one block compute values that another block reuses for filtering, ordering, or display. The following query counts projects per school in a `var` block and then lists the schools ordered by that count, highest first:

```graphql
{
  var(func: type(School)) {
    c as count(~Project.school)
  }
  serie(func: uid(c), orderdesc: val(c)) {
    category: School.name
    project_count: val(c)
  }
}
```
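In concept, the `var` block computes one value of `c` per school, and the `serie` block sorts schools by that value. The following pure-Python sketch mirrors those two steps on a hypothetical in-memory list of `Project.school` edges. The edge list and the function names are invented for illustration. The sketch shows only the semantics of the query and does not describe how Dgraph evaluates it.

```python
from collections import Counter

# Hypothetical sample: each tuple is a Project.school edge (project id, school name).
PROJECT_SCHOOL_EDGES = [
    ("p1", "Abbott Middle School"),
    ("p2", "Abbott Middle School"),
    ("p3", "Lincoln Elementary School"),
    ("p4", "Abbott Middle School"),
    ("p5", "Lincoln Elementary School"),
    ("p6", "Rosemont Early Education Center"),
]


def projects_per_school(edges):
    """Analogue of `c as count(~Project.school)`: incoming project edges per school."""
    return Counter(school for _project, school in edges)


def serie(counts):
    """Analogue of `orderdesc: val(c)`: one row per school, highest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [{"category": school, "project_count": n} for school, n in ordered]


for row in serie(projects_per_school(PROJECT_SCHOOL_EDGES)):
    print(row)
```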

## Grouping and Filtering by Dimensions

Dgraph's [@groupby directive]({{< relref "/dql/query/directive/groupby.md" >}}) groups the nodes of a block by the value of a predicate and computes aggregates for each group, which is the basis of OLAP-style grouping. This query counts nodes by type:

```graphql
{
  stats(func: has(dgraph.type)) @groupby(dgraph.type) {
    count: count(uid)
  }
}
```

The response contains one group per type, such as `City`, `School`, and `Project`, each with its node count. To focus on specific dimensions, add an `@filter` to the root block so that only the matching nodes are grouped.
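Grouped results are nested under an `@groupby` key. The following sketch flattens them into a `{type: count}` dictionary. It assumes a response shape in which each entry of the `stats` block holds an `@groupby` list, and each group carries the grouping predicate plus the aliased `count`. The sample counts are invented for illustration, and `counts_by_type` is a hypothetical helper.

```python
# Hypothetical response with invented counts; real values depend on your data.
SAMPLE_RESPONSE = {
    "data": {
        "stats": [
            {
                "@groupby": [
                    {"dgraph.type": "City", "count": 3},
                    {"dgraph.type": "School", "count": 5},
                    {"dgraph.type": "Project", "count": 28},
                ]
            }
        ]
    }
}


def counts_by_type(response, block="stats", key="dgraph.type"):
    """Flatten the @groupby groups of one query block into {group value: count}.

    "count" matches the alias used in the query (count: count(uid)).
    """
    result = {}
    for entry in response["data"][block]:
        for group in entry.get("@groupby", []):
            result[group[key]] = group["count"]
    return result


print(counts_by_type(SAMPLE_RESPONSE))
```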

## Complex Aggregations: Hierarchical Data

To analyze hierarchical data, such as the number and sum of donations by state and city, write a query that walks from each state through its cities, schools, and projects down to the individual donations. The query then rolls the values back up. Each `sum(val(...))` aggregates the variable defined one level deeper. Donation amounts become project totals (`s`, `c`), then school totals (`s1`, `c1`), city totals (`s2`, `c2`), and state totals (`s3`, `c3`). A second block reads the state and city totals:

```graphql
{
  var(func: type(State)) {
    city: ~City.state {
      ~School.city {
        School.projects {
          Project.donations {
            a as Donation.amount
          }
          s as sum(val(a))
          c as count(Project.donations)
        }
        s1 as sum(val(s))
        c1 as sum(val(c))
      }
      s2 as sum(val(s1))
      c2 as sum(val(c1))
    }
    s3 as sum(val(s2))
    c3 as sum(val(c2))
  }
  stats(func: type(State)) {
    state: State.name
    amount: val(s3)
    count: val(c3)
    city: ~City.state {
      City.name
      amount: val(s2)
      count: val(c2)
    }
  }
}
```
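Each level of the query aggregates the variable from the level below it. The following pure-Python model of the same roll-up shows this on a small hypothetical nested dataset (state, city, school, project, donation amounts). All names and numbers are invented. The model illustrates only the order of aggregation, because Dgraph computes these values inside the query.

```python
# Hypothetical nested data mirroring the traversal in the query above:
# state -> cities -> schools -> projects -> donation amounts.
TREE = {
    "State A": {
        "City 1": {"School X": {"P1": [10.0, 15.0], "P2": [5.0]}},
        "City 2": {"School Y": {"P3": [20.0]}, "School Z": {"P4": []}},
    },
}


def roll_up(tree):
    """Return per-state and per-city (amount, count) totals, summed level by level."""
    stats = {}
    for state, cities in tree.items():
        state_amount, state_count = 0.0, 0  # like s3, c3
        city_stats = {}
        for city, schools in cities.items():
            city_amount, city_count = 0.0, 0  # like s2, c2
            for projects in schools.values():
                school_amount, school_count = 0.0, 0  # like s1, c1
                for donations in projects.values():
                    school_amount += sum(donations)  # project total, like s
                    school_count += len(donations)  # donations per project, like c
                city_amount += school_amount
                city_count += school_count
            city_stats[city] = (city_amount, city_count)
            state_amount += city_amount
            state_count += city_count
        stats[state] = {"amount": state_amount, "count": state_count, "cities": city_stats}
    return stats


print(roll_up(TREE))
```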
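
## Multi-Dimensional Analysis

Sometimes several dimensions, such as school and category, are involved but not directly related in the graph. In that case, split the analysis into multiple queries, for example one per category, and combine the results in your application. The following parameterized query computes total donations per school for the projects in a given category:

```graphql
query stat_per_school_for_category($category: string) {
  # Projects in the requested category
  var(func: eq(Category.name, $category)) {
    c1_projects as ~Project.category
  }
  # Total donation amount per project
  var(func: uid(c1_projects)) {
    Project.donations {
      a as Donation.amount
    }
    project_total as sum(val(a))
  }
  # Propagate project totals to their schools
  var(func: uid(c1_projects)) {
    c1_schools as Project.school {
      school_total as math(project_total)
    }
  }
  stats(func: uid(c1_schools), orderdesc: val(school_total)) {
    School.name
    total_donation: val(school_total)
  }
}
```

The second `var` block sums donation amounts per project. In the third block, `math(project_total)` inside `Project.school` propagates each project's total to its school. When several projects in the category belong to the same school, the propagated values are summed, which gives a per-school total for that category.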
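Each run of the query covers a single category, so the application must merge the runs. The following minimal sketch assumes you have already collected the response `data` dict for each category, for example by running the query with a different `$category` value each time. It builds the rows used for the bubble chart. The sample values are hypothetical, and `combine_category_results` is an illustrative name.

```python
def combine_category_results(results_by_category):
    """Merge per-category query results into flat rows for plotting.

    `results_by_category` maps a category name to the "data" field returned
    by stat_per_school_for_category for that category.
    """
    rows = []
    for category, data in results_by_category.items():
        for school in data.get("stats", []):
            rows.append(
                {
                    "Category": category,
                    "School": school["School.name"],
                    # Defensive default: treat a missing aggregate as 0.
                    "Total Donation": school.get("total_donation", 0),
                }
            )
    return rows


# Hypothetical per-category results, shaped like the "stats" block of the query.
RESULTS = {
    "Literacy": {"stats": [{"School.name": "Abbott Middle", "total_donation": 500}]},
    "Math": {"stats": [{"School.name": "Lincoln Elementary", "total_donation": 300}]},
}

for row in combine_category_results(RESULTS):
    print(row)
```

The resulting list has the same shape as the example `data` used for the bubble chart. You can pass it directly to `pd.DataFrame`.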

The results can then be visualized as a bubble chart in Python:

```python
import plotly.express as px
import pandas as pd

# Example data
data = [
    {"Category": "Literacy", "School": "Abbott Middle", "Total Donation": 500},
    {"Category": "Math", "School": "Lincoln Elementary", "Total Donation": 300}
]

df = pd.DataFrame(data)
fig = px.scatter(
    df, x='School', y='Category', size='Total Donation', title='Donations by School and Category'
)
fig.show()
```

## Conclusion

Dgraph's graph model and DQL make it a practical option for many analytical use cases. Combine the relationships already present in your graph with variables and aggregation functions to run OLAP-style analyses directly within Dgraph. These range from basic counts and hierarchical roll-ups to multi-dimensional breakdowns assembled from several queries, all without moving data into a separate OLAP system.

## Related Topics

- [DQL Query Language]({{< relref "/dql/_index.md" >}})
- [Aggregation Functions]({{< relref "/dql/query/functions.md#aggregation-functions" >}})
- [@groupby Directive]({{< relref "/dql/query/directive/groupby.md" >}})
- [Query Variables]({{< relref "/dql/query/variables.md" >}})
- [Dgraph Overview]({{< relref "/dgraph-overview.md" >}})