Commit d9ac8e7

Visualizing tabular data
1 parent a26469b commit d9ac8e7

4 files changed, +334 -0 lines changed

docs/_quarto.yml

Lines changed: 9 additions & 0 deletions
@@ -92,6 +92,15 @@ book:
           text: "Exporting Data Frames"
 
 
+    - part: "Visualizing Tabular Data"
+      chapters:
+        - href: notes/dataviz-tabular/overview.qmd
+          text: "Data Visualization Overview"
+        - href: notes/dataviz-tabular/trendlines.qmd
+          text: "Charts with Trendlines"
+        - href: notes/dataviz-tabular/candlesticks.qmd
+          text: "Candlestick Charts"
+
 
     - part: "Time-series Data Analysis"
       chapters:
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
format:
  html:
    code-fold: false
jupyter: python3
execute:
  cache: true # re-render only when source changes
---

# Candlestick Charts, Revisited

We have [previously studied](https://prof-rossetti.github.io/intro-software-dev-python-book/notes/dataviz/candlesticks.html) how to create candlestick charts using OHLC data. We can do this with tabular data as well.

Converting the OHLC data to a `DataFrame` object:

```{python}
from pandas import DataFrame

ohlc_data = [
    {"date": "2030-03-16", "open": 236.2800, "high": 240.0550, "low": 235.9400, "close": 237.7100, "volume": 28092196},
    {"date": "2030-03-15", "open": 234.9600, "high": 235.1850, "low": 231.8100, "close": 234.8100, "volume": 26042669},
    {"date": "2030-03-12", "open": 234.0100, "high": 235.8200, "low": 233.2300, "close": 235.7500, "volume": 22653662},
    {"date": "2030-03-11", "open": 234.9600, "high": 239.1700, "low": 234.3100, "close": 237.1300, "volume": 29907586},
    {"date": "2030-03-10", "open": 237.0000, "high": 237.0000, "low": 232.0400, "close": 232.4200, "volume": 29746812}
]
df = DataFrame(ohlc_data)
df.head()
```
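The OHLC records above are listed newest-first, and their dates are plain strings. Plotly can usually interpret ISO-formatted date strings on its own, so this step is optional. As a hedged sketch (the parsing and sorting are our own addition, not part of the original example), we could convert the dates with `pandas.to_datetime` and sort the rows chronologically before charting:

```python
from pandas import DataFrame, to_datetime

ohlc_data = [
    {"date": "2030-03-16", "open": 236.2800, "high": 240.0550, "low": 235.9400, "close": 237.7100, "volume": 28092196},
    {"date": "2030-03-15", "open": 234.9600, "high": 235.1850, "low": 231.8100, "close": 234.8100, "volume": 26042669},
    {"date": "2030-03-12", "open": 234.0100, "high": 235.8200, "low": 233.2300, "close": 235.7500, "volume": 22653662},
    {"date": "2030-03-11", "open": 234.9600, "high": 239.1700, "low": 234.3100, "close": 237.1300, "volume": 29907586},
    {"date": "2030-03-10", "open": 237.0000, "high": 237.0000, "low": 232.0400, "close": 232.4200, "volume": 29746812},
]
df = DataFrame(ohlc_data)

# Parse the "YYYY-MM-DD" strings into datetime values
df["date"] = to_datetime(df["date"])

# Sort the rows oldest-first, then renumber the index 0, 1, 2, ...
df = df.sort_values(by="date").reset_index(drop=True)

print(df[["date", "open", "close"]])
```

The `reset_index(drop=True)` call only renumbers the rows. It is not required for charting, but it keeps `df.head()` output easy to read.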

Because `Candlestick` is a class from the Graph Objects module (`plotly.graph_objects`) rather than a Plotly Express function, it does not accept a `DataFrame` object. Instead, we pass the individual columns directly:

```{python}
from plotly.graph_objects import Figure, Candlestick

stick = Candlestick(x=df["date"],
                    open=df["open"],
                    high=df["high"],
                    low=df["low"],
                    close=df["close"]
)

fig = Figure(data=[stick])
fig.update_layout(title="Example Candlestick Chart")
fig.show()
```
Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
---
format:
  html:
    code-fold: false
jupyter: python3
execute:
  cache: true # re-render only when source changes
---

# Data Visualization with Tabular Data

We have [previously covered](https://prof-rossetti.github.io/intro-software-dev-python-book/notes/dataviz/overview.html) how to create data visualizations using the `plotly` package.

In that introductory chapter, we passed simple lists to the chart-making functions. However, the `plotly` package also provides an easy-to-use, intuitive interface for working with tabular data.

Now that we know how to work with `DataFrame` objects, let's revisit each of the previous examples, but this time using tabular data.

### Line Charts, Revisited

Starting with the same kind of example data as before, this time we construct a `DataFrame` object from it. We can do this because the data is in an eligible format, in this case a list of dictionaries:

```{python}
from pandas import DataFrame

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-05", "stock_price_usd": 142.42},
    {"date": "2020-10-06", "stock_price_usd": 135.35},
    {"date": "2020-10-07", "stock_price_usd": 160.60},
    {"date": "2020-10-08", "stock_price_usd": 162.62},
]

df = DataFrame(line_data)
df.head()
```
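A list of dictionaries is not the only format the `DataFrame` constructor accepts. As a small sketch using just the first three rows for brevity, a dictionary of lists, with one list per column, produces an equivalent `DataFrame`:

```python
from pandas import DataFrame

# Row-oriented: one dictionary per row (the format used above)
rows = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
]

# Column-oriented: one list per column, holding the same values
columns = {
    "date": ["2020-10-01", "2020-10-02", "2020-10-03"],
    "stock_price_usd": [100.00, 101.01, 120.20],
}

df_from_rows = DataFrame(rows)
df_from_columns = DataFrame(columns)

print(df_from_rows.equals(df_from_columns))  # True
print(df_from_rows.shape)                    # (3, 2)
```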

Because the data is now in a `DataFrame`, we get to skip the mapping step from the previous chapter and move directly to the chart-making step.
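For comparison, here is roughly what a mapping step can look like without `pandas`, assuming list comprehensions like the ones below (the earlier chapter's exact code may differ). Each `DataFrame` column already holds the same values, in the same order. Only the first three rows are used, for brevity:

```python
from pandas import DataFrame

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
]

# Mapping step: pull one plain list per axis out of the list of dictionaries
dates = [row["date"] for row in line_data]
prices = [row["stock_price_usd"] for row in line_data]

# With a DataFrame, each column already contains those values
df = DataFrame(line_data)
dates_match = df["date"].tolist() == dates
prices_match = df["stock_price_usd"].tolist() == prices

print(dates_match, prices_match)  # True True
```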

We now have two options for passing this data to the chart-making function: a `Series` oriented approach or a `DataFrame` oriented approach.

#### `Series` Oriented Approach

In the `Series` oriented approach, we pass individual columns as the `x` and `y` parameters. This works because each column is a `Series`, which is list-like:

```{python}
from plotly.express import line

fig = line(x=df["date"], y=df["stock_price_usd"], height=350,
    title="Stock Prices over Time",
    labels={"x": "Date", "y": "Stock Price ($)"}
)
fig.show()
```
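To see what "list-like" means here, a quick check with `pandas.api.types.is_list_like` can help. In this small sketch (first three rows only), a column selected with `df["date"]` is a `Series`, which has a length, can be iterated, and carries the column name. The string `"date"` on its own is not list-like:

```python
from pandas import DataFrame, Series
from pandas.api.types import is_list_like

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
]
df = DataFrame(line_data)

date_column = df["date"]  # selecting one column gives a Series

print(isinstance(date_column, Series))  # True
print(len(date_column))                 # 3
print(date_column.name)                 # date
print(is_list_like(date_column))        # True
print(is_list_like("date"))             # False: a column *name* is just a string
```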

#### `DataFrame` Oriented Approach

Alternatively, we can use a `DataFrame` oriented approach, where we pass the `DataFrame` as the first parameter to the chart-making function:

```{python}
from plotly.express import line

fig = line(df, x="date", y="stock_price_usd", height=350,
    title="Stock Prices over Time",
    labels={"date": "Date", "stock_price_usd": "Stock Price ($)"}
)
fig.show()
```

Notice that when we pass the `DataFrame` as the first parameter, the `x` and `y` parameters become string column names in that `DataFrame`, to be plotted on the x and y axes respectively. The keys of the `labels` parameter now reference those column names as well.

For the remaining examples, we will use this `DataFrame` oriented approach.

### Bar Charts, Revisited

Constructing a `DataFrame` from the raw data:

```{python}
bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212}
]
df = DataFrame(bar_data)
df.head()
```

Charting the data:

```{python}
from plotly.express import bar

fig = bar(df, x="genre", y="viewers", height=350,
    title="Viewership by Genre",
    labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()
```

#### Horizontal Bar Chart, Revisited

With categorical data, a horizontal bar chart can be a better choice than a vertical one. Ideally, the bars are sorted with the largest on top, which makes the "top genres" easy to spot at a glance.

Before charting, we use a `pandas` sorting operation to get the bars in the right order:

```{python}
df.sort_values(by="viewers", inplace=True)
df.head()
```
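`inplace=True` modifies `df` itself. A common alternative is to keep the sorted copy that `sort_values` returns and leave the original untouched. This sketch rebuilds the same `bar_data` and shows that the ascending sort leaves the largest genre in the last row:

```python
from pandas import DataFrame

bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212},
]
df = DataFrame(bar_data)

# sort_values returns a new, sorted DataFrame and leaves df unchanged
sorted_df = df.sort_values(by="viewers")

print(sorted_df["genre"].tolist())
# ['Romantic Comedy', 'Thriller', 'Mystery', 'Documentary', 'Action', 'Fantasy', 'Sci-Fi']
print(df["genre"].iloc[0])  # Thriller (original order kept)
```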

:::{.callout-warning title="Important Note"}
Notice that to get the bars in DESCENDING order (largest on top), we sort the data in ASCENDING order.

This can seem counter-intuitive: on a horizontal bar chart, `plotly` places the first category at the bottom of the y-axis and works upward, so the last row passed in ends up at the top.
:::

```{python}
fig = bar(df, y="genre", x="viewers", orientation="h", height=350,
    title="Viewership by Genre",
    labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()
```

### Scatter Plots, Revisited

Constructing a `DataFrame` from raw data:

```{python}
scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)
df.head()
```

Plotting the data:

```{python}
from plotly.express import scatter

fig = scatter(df, x="income", y="life_expectancy", height=350,
    title="Life Expectancy by Income",
    labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"}
)
fig.show()
```
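The upward drift in the scatter plot can also be summarized numerically. Here is a minimal sketch with `Series.corr`, which computes the Pearson correlation coefficient by default. Values near 1 indicate a strong positive linear relationship:

```python
from pandas import DataFrame

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)

# Pearson correlation between the two columns (pandas' default method)
r = df["income"].corr(df["life_expectancy"])
print(round(r, 3))
```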

### Pie Charts, Revisited

Constructing a `DataFrame` from raw data:

```{python}
pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15}
]
df = DataFrame(pie_data)
df.head()
```

Charting the data. With `pie`, the `names` parameter identifies the column holding each slice's label, and `values` identifies the column holding each slice's size:

```{python}
from plotly.express import pie

fig = pie(df, names="company", values="market_share", height=350,
    title="Market Share by Company",
    labels={"company": "Company", "market_share": "Market Share"}
)
fig.show()
```
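A pie chart displays each value as a share of the total of all the values, so the `market_share` column does not need to sum to 1 (raw counts would work too). This sketch reproduces the percentages that the slices represent:

```python
from pandas import DataFrame

pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15},
]
df = DataFrame(pie_data)

# Each slice's percentage is its value divided by the sum of all values
total = df["market_share"].sum()
df["percent"] = df["market_share"] / total * 100

print(df)
```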

### Histograms, Revisited

Constructing a `DataFrame` from raw data:

```{python}
histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
df = DataFrame(histo_data)
df.head()
```

Charting the data:

```{python}
from plotly.express import histogram

fig = histogram(df, x="average_opinion", height=350, nbins=5,
    title="User Average Opinions",
    labels={"average_opinion": "Average Opinion"}
)
fig.show()
```
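Plotly treats `nbins` as a maximum and may choose its own "nice" bin edges, so the bars will not necessarily line up with five equal-width bins. As a rough cross-check in `pandas`, `pandas.cut` can split the same range into five equal-width bins and count the values in each:

```python
from pandas import DataFrame, cut

histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
df = DataFrame(histo_data)

# Five equal-width bins spanning the minimum to the maximum value
bins = cut(df["average_opinion"], bins=5)

# sort=False keeps the bins in left-to-right order, including empty ones
counts = bins.value_counts(sort=False)
print(counts)
```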
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
format:
  html:
    code-fold: false
jupyter: python3
execute:
  cache: true # re-render only when source changes
---

# Scatter Plot with Trendlines, Revisited

We have [previously studied](https://prof-rossetti.github.io/intro-software-dev-python-book/notes/dataviz/trendlines.html) how to create scatter plots with trendlines. We can do this with tabular data as well.

Constructing a `DataFrame` from raw data:

```{python}
from pandas import DataFrame

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)
df.head()
```

Linear trend, fitted with ordinary least squares (`trendline="ols"`):

```{python}
from plotly.express import scatter

fig = scatter(df, x="income", y="life_expectancy", height=350,
    title="Life Expectancy by Income",
    labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"},
    trendline="ols", trendline_color_override="red"
)
fig.show()
```
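Plotly computes the `"ols"` trendline with the `statsmodels` package, which must be installed, by fitting a straight line y = a + b·x with ordinary least squares. As a cross-check that does not need `statsmodels`, `numpy.polyfit` with degree 1 fits the same kind of least-squares line. This is a swapped-in tool for illustration, and its coefficients should agree with plotly's fit up to floating-point rounding:

```python
import numpy as np
from pandas import DataFrame

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)

# Least-squares fit of a degree-1 polynomial; coefficients come back highest degree first
slope, intercept = np.polyfit(df["income"], df["life_expectancy"], deg=1)

print(f"intercept: {intercept:.2f} years")
print(f"slope: {slope * 1000:.3f} years per $1,000 of income")
```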

Non-linear trend, using locally weighted smoothing (`trendline="lowess"`):

```{python}
fig = scatter(df, x="income", y="life_expectancy", height=350,
    title="Life Expectancy by Income",
    labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"},
    trendline="lowess", trendline_color_override="red"
)
fig.show()
```
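LOWESS, which plotly also computes through `statsmodels`, fits many small, locally weighted regressions, which is why the red line can bend. In recent plotly versions, the size of each local neighborhood can be tuned with `trendline_options=dict(frac=...)`. For intuition only, this sketch uses a much simpler local smoother: a centered rolling mean over each point and its two neighbors. This is a different technique from LOWESS and will not reproduce plotly's line:

```python
from pandas import DataFrame

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data).sort_values(by="income")

# Average each point with its immediate neighbors (window of 3, centered);
# the first and last rows lack a full window, so they come out as NaN
df["rolling_mean"] = df["life_expectancy"].rolling(window=3, center=True).mean()

print(df)
```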
