SELECT
'PGCon' as conference_name,
2023 as edition,
'Data Science with Postgres: Introductory Training' as workshop_title,
'Jônatas Davi Paganini' as author,
'jonatas@timescale.com' as author_mail;
Is NY getting warmer?
WITH ny_summer AS (
SELECT *
FROM public.weather_metrics
WHERE city_name = 'New York'
AND EXTRACT(MONTH FROM time) IN (6, 7, 8) -- Summer months (June, July, August)
AND EXTRACT(HOUR FROM (time + timezone_shift::text::interval)) BETWEEN 12 AND 16 -- Hot hours (12:00 PM to 4:00 PM)
)
SELECT time_bucket('1 year', time) AS x,
AVG(temp_c) AS y
FROM ny_summer
GROUP BY 1
ORDER BY 1;
WITH ny_winter AS (
SELECT *
FROM public.weather_metrics
WHERE city_name = 'New York'
AND EXTRACT(MONTH FROM time) IN (12, 1, 2) -- Winter months (December, January, February)
AND EXTRACT(HOUR FROM (time + timezone_shift::text::interval)) BETWEEN 0 AND 4 -- Cold hours (12:00 AM to 4:00 AM)
)
SELECT time_bucket('1 year', time) AS x,
AVG(temp_c) AS y
FROM ny_winter
GROUP BY 1
ORDER BY 1;
- PostgreSQL since 2004.
- Backend developer
- Ruby/Shell/PostgreSQL/Vim
- Developer Advocate at Timescale
- Introduction to Data Science - 1h
- Exploring the Time Series dataset - 1h
- Practical exercises with the dataset - 1h
- Knowledge sharing - 0.5h
We can take short breaks each hour.
- Data science and its applications in various domains
  - Today's focus will be weather analysis.
- Introduction to SQL and PostgreSQL
  - Overview of SQL, its syntax, and use cases.
  - Introduction to the PostgreSQL database and its features.
- Overview of TimescaleDB and its features
  - What TimescaleDB is and how it extends PostgreSQL for time-series data.
  - The features and benefits of using TimescaleDB for time-series data analysis.
Data science is an interdisciplinary field that involves using statistical and computational methods to extract insights and knowledge from data.
In the context of weather analysis, data science can be used to:
- analyze weather metrics
- predict future weather patterns
We're not going to focus on prediction today.
The source is OpenWeather:
- Free data from the entire world.
- Free API.
- Statistics from anywhere.
- Time-series data.
The focus will be weather metrics.
Interact with the OpenWeather dataset via psql:
psql openweather
Use createdb openweather in case you don't have the database yet.
You can use your favorite tool if you want ;)
If you don't have timescaledb installed on your database, enable the extension:
CREATE EXTENSION timescaledb;
You can skip this step if you're using Timescale Cloud or the Timescale Docker images.
Download the repository:
https://github.com/jonatas/sql-data-science-training
Use the schema.sql file for the next few steps if you want. Our steps will be:
- Create table
- Create indices
- Transform the table into hypertable
CREATE TABLE public.weather_metrics (
"time" timestamp without time zone NOT NULL,
timezone_shift integer,
city_name text,
temp_c double precision,
feels_like_c double precision,
temp_min_c double precision,
temp_max_c double precision,
pressure_hpa double precision,
humidity_percent double precision,
wind_speed_ms double precision,
wind_deg integer,
rain_1h_mm double precision,
rain_3h_mm double precision,
snow_1h_mm double precision,
snow_3h_mm double precision,
clouds_percent integer,
weather_type_id integer
);
SELECT create_hypertable('weather_metrics', 'time',
  chunk_time_interval => INTERVAL '1 month');
We'll use the CSVs in the data folder from
https://github.com/jonatas/sql-data-science-training.
\i schema.sql
\COPY weather_metrics FROM './data/nairobi.csv' DELIMITER ',' CSV HEADER;
\COPY weather_metrics FROM './data/new_york.csv' DELIMITER ',' CSV HEADER;
\COPY weather_metrics FROM './data/toronto.csv' DELIMITER ',' CSV HEADER;
\COPY weather_metrics FROM './data/stockholm.csv' DELIMITER ',' CSV HEADER;
\COPY weather_metrics FROM './data/princeton.csv' DELIMITER ',' CSV HEADER;
\COPY weather_metrics FROM './data/vienna.csv' DELIMITER ',' CSV HEADER;
The \COPY commands should be executed one at a time.
In psql we can enable timing to check the performance of every command:
\timing
SELECT count(1) FROM weather_metrics;
TimescaleDB offers a different counting approach that returns an approximation of the real count:
SELECT approximate_row_count('weather_metrics');
$ md-show global_warming.md "postgres://jonatasdp@localhost:5432/openweather"
SELECT
x, random() as y
FROM
generate_series(
now() - INTERVAL '20 seconds',
now(), INTERVAL '1 second') x;
SELECT
'sin' as name,
array_agg(now() + make_interval(hours=>i)) as x,
array_agg(sin(i/ 12.0 * PI())) as y
FROM generate_series(1,168) i
UNION ALL
SELECT
'cos' as name,
array_agg(now() + make_interval(hours=>i)) as x,
array_agg(cos(i/ 12.0 * PI())) as y
FROM generate_series(1,168) i;
The type column refers to the chart type.
SELECT
'bar' as type,
array_agg(random() * 100) as y,
array_agg(g) as x
FROM
generate_series(
now() - INTERVAL '1 hour',
now(),
INTERVAL '1 minute') g GROUP BY 1;
The title column injects the title into the chart layout.
SELECT
'Total records per city' as title,
'bar' as type,
city_name as x,
count(*) y
FROM weather_metrics
GROUP BY 1,2,3;
The name column sets the series name.
WITH resume as (
SELECT city_name AS name,
time_bucket('1 year', time) AS x,
avg(temp_c) as y
FROM weather_metrics
GROUP BY 1,2
ORDER BY 1,2
)
SELECT name, array_agg(x) as x, array_agg(y) as y
FROM resume
GROUP BY 1;
select time_bucket('1 month', time) as x,
AVG(temp_c) as y
FROM weather_metrics
WHERE time BETWEEN '2022-01-01' and '2023-01-02'
AND city_name = 'New York'
GROUP BY 1
ORDER BY 1;
select time_bucket('1 hour', time) as x,
avg(temp_c) as y
from weather_metrics
where time between '2022-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1
order by 1;
How many records are available per year?
select time_bucket('1 year', time) as x,
count(*) as y,
'bar' as type
from weather_metrics
group by 1,3
order by 1;
Describe the weather_metrics table:
\d weather_metrics
The Hypertable 5WH!
- Who: The timescaledb extension
- What: Hypertable
- When: you need to handle time-series data (insert, select, update, delete)
- Where: In your PostgreSQL database
- Why: to optimize time-series throughput
- How: using table partitions to compress, parallelize and manage smaller chunks of data.
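To peek at those partitions, TimescaleDB exposes helper functions and an information view; a quick sketch against the weather_metrics hypertable (names are from the TimescaleDB 2.x API):

```sql
-- List the chunks (child partitions) backing the hypertable
SELECT show_chunks('weather_metrics');

-- Chunk metadata: the time range covered by each partition
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'weather_metrics'
ORDER BY range_start;
```

With our 1-month chunk_time_interval, each chunk should cover roughly one month of data.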
\d+ weather_metrics
Let's start exploring the time-series data and answer a few questions.
Answer with SQL:
- How can we adapt the code to work with each city's time zone?
- How many different cities are available?
- When does the data start and when does it end?
- How many records do we have per city?
- What was the average temperature of New York in January over the last 10 years?
- What are the hottest and coldest cities we're tracking?
- Which city gets the most rain?
- Choose a city and investigate its seasons.
SELECT time + timezone_shift::text::interval AS time,
temp_c AS temperature,
city_name AS city
FROM weather_metrics ORDER BY 1 DESC LIMIT 5;
What are the names of the available cities?
SELECT DISTINCT city_name FROM weather_metrics;
How many different cities are available?
SELECT COUNT(DISTINCT city_name) FROM weather_metrics;
When does the data start and when does it end?
SELECT MIN(time), MAX(time) FROM weather_metrics;
How many records do we have per city?
SELECT city_name, count(*) FROM weather_metrics GROUP BY 1;
What was the average temperature over all time?
select city_name, avg(temp_c) from weather_metrics group by 1;
Time-series analysis in SQL and TimescaleDB
What was the average temperature of NY in January of 2022?
select avg(temp_c) from weather_metrics
where city_name = 'New York'
and time between '2022-01-01' and '2022-01-31';
What was the average temperature of New York in January over the last 10 years?
What are the hottest and coldest cities we're tracking?
Which city gets the most rain?
Choose a city and investigate its seasons.
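As one possible sketch for the rain question (assuming that summing the hourly rain_1h_mm column is an acceptable proxy for total rainfall):

```sql
-- Which city gets the most rain? Total the hourly rainfall per city.
SELECT city_name, SUM(rain_1h_mm) AS total_rain_mm
FROM weather_metrics
GROUP BY 1
ORDER BY 2 DESC NULLS LAST;
```

The NULLS LAST matters because hours without rain may be recorded as NULL rather than zero.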
Create views for:
- ny view: fix the time zone shift
- ny_winter: filter only winter days
- ny_summer: filter only summer days
Use CTEs for minor queries that you're not going to explore further.
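A minimal sketch of those views, reusing the timezone_shift cast from the earlier query (the column list is a suggestion; add more columns as needed):

```sql
-- ny: New York readings with the time shifted to local time
CREATE VIEW ny AS
SELECT time + timezone_shift::text::interval AS time, temp_c
FROM weather_metrics
WHERE city_name = 'New York';

-- ny_summer: only the summer months, built on top of ny
CREATE VIEW ny_summer AS
SELECT * FROM ny WHERE EXTRACT(MONTH FROM time) IN (6, 7, 8);

-- ny_winter: only the winter months
CREATE VIEW ny_winter AS
SELECT * FROM ny WHERE EXTRACT(MONTH FROM time) IN (12, 1, 2);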
A hypertable partitions data by time interval.
Let's run EXPLAIN ANALYZE in the previous query:
EXPLAIN ANALYZE
select avg(temp_c) from weather_metrics
where city_name = 'New York'
and time between '2022-01-01' and '2022-01-31';
Understanding a bit of the execution plan:
EXPLAIN SELECT count(1) FROM weather_metrics;
Partitioned tables can divide and conquer!
->  Parallel Append  (cost=0.29..56957.24 rows=1023114 width=0)
      ->  Parallel Index Only Scan using _hyper_1_...idx on _hyper_1_...
Get average of temperature grouped by one hour.
SELECT time_bucket('1 hour', time) AS bucket,
avg(temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01' AND '2022-06-02'
GROUP BY 1 ORDER BY 1;
The time_bucket function also supports timestamps with time zones.
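For example, recent TimescaleDB versions accept a time zone argument so buckets align to local midnight; since our time column is a plain timestamp, the cast below is just a sketch:

```sql
-- Daily buckets aligned to New York local time (time_bucket's
-- timezone argument requires a timestamptz input)
SELECT time_bucket('1 day', time::timestamptz, 'America/New_York') AS bucket,
       avg(temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
GROUP BY 1 ORDER BY 1;
```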
Now, let's get a bit more detail by adding the min and max:
SELECT time_bucket('1 hour'::interval, time) AS bucket,
avg(temp_c)::numeric(4,2),
min(temp_c), max(temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01' AND '2022-06-02'
GROUP BY 1 ORDER BY 1;
Candlestick pattern: capture values grouped by a timeframe.
OHLC = Open, High, Low, Close
SELECT time_bucket('1 hour'::interval, time) AS bucket,
first(temp_c, time) as open,
max(temp_c) as high,
min(temp_c) as low,
last(temp_c, time) as close
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 06:00:00' AND '2022-06-02 12:00:00'
GROUP BY 1 ORDER BY 1;
You can also use first and last to find the open/close inside a time_bucket.
Later we'll see candlestick_agg from the toolkit extension.
Now, we can also check the standard deviation:
SELECT time_bucket('1 hour'::interval, time) AS bucket,
avg(temp_c)::numeric(4,2),
min(temp_c), max(temp_c), stddev(temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01' AND '2022-06-02'
GROUP BY 1 ORDER BY 1;
Now let's go deeper into the individual values inside one hour:
SELECT time_bucket('1 hour'::interval, time) AS bucket,
array_agg( temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 00:00:00' AND '2022-06-01 01:00:00'
GROUP BY 1 ORDER BY 1;
Enable the toolkit extension:
CREATE EXTENSION timescaledb_toolkit;
- Statistical functions
To get an overview of the percentile_agg function:
SELECT time_bucket('1 hour'::interval, time) AS bucket,
percentile_agg( temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1;
The _agg suffix indicates that several statistical aggregates
can be pre-computed, saving computation later.
They also support the pipeline operator.
Extracting the median from the percentile:
SELECT time_bucket('1 month'::interval, time) AS x,
approx_percentile(0.5, percentile_agg( temp_c)) as y
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2021-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1;
The toolkit allows functional programming in SQL with the pipeline operator ->.
SELECT time_bucket('1 month'::interval, time) AS x,
percentile_agg( temp_c) -> approx_percentile(0.5) as y
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2021-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1;
Now, getting the quartiles and the median from percentiles:
SELECT time_bucket('1 month'::interval, time) AS bucket,
approx_percentile(0.25, percentile_agg( temp_c)) AS q1,
approx_percentile(0.5, percentile_agg( temp_c)) AS median,
approx_percentile(0.75, percentile_agg( temp_c)) AS q3
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2021-06-01 00:00:00' AND '2022-06-01 01:00:00'
GROUP BY 1 ORDER BY 1;
Pre-computing the aggregation in a CTE lets us reuse the calculated percentile_agg:
WITH one_month AS (
SELECT time_bucket('1 month'::interval, time) AS bucket,
percentile_agg( temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2021-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1
)
SELECT bucket as x,
approx_percentile(0.25, percentile_agg) AS y,
approx_percentile(0.5, percentile_agg) AS y_median,
approx_percentile(0.75, percentile_agg) AS y_q3,
approx_percentile(0.99, percentile_agg) AS y_99
FROM one_month;
Statistical aggregates in one or two dimensions pre-compute a summary of statistics.
SELECT time_bucket('1 hour'::interval, time) AS bucket,
stats_agg( temp_c) AS hourly_agg
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1;
Compute an average from stats aggs:
SELECT time_bucket('1 hour'::interval, time) AS x,
average(stats_agg( temp_c)) AS y
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1;
Using a CTE to reuse the pre-computed stats aggs:
WITH one_day AS (
SELECT time_bucket('1 hour'::interval, time) AS bucket,
stats_agg( temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2022-06-01 00:00:00-04' AND '2022-06-01 23:59:59-04'
GROUP BY 1
ORDER BY 1
)
SELECT bucket as x, average(stats_agg) as y FROM one_day;
Rollup can combine stats aggs across different time frames:
WITH hourly AS (
SELECT time_bucket('1 hour'::interval, time) AS hour_bucket,
stats_agg( temp_c)
FROM weather_metrics
WHERE city_name = 'New York'
AND time between '2022-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1
)
SELECT time_bucket('1 day', hour_bucket) as x,
average(rollup(stats_agg)) as y
FROM hourly GROUP BY 1 ORDER BY 1;
Cascading rollups can reuse previous stats aggs:
WITH hourly AS ( SELECT time_bucket('1 hour'::interval, time) AS bucket,
stats_agg( temp_c) AS hourly_agg
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2021-06-01 00:00:00' AND '2022-07-01 01:00:00'
GROUP BY 1 ORDER BY 1
),
daily AS ( SELECT time_bucket('1 day', bucket) AS bucket,
rollup(hourly_agg) AS daily_agg
FROM hourly GROUP BY 1
),
monthly AS ( SELECT time_bucket('1 month', bucket) AS bucket,
rollup(daily_agg) AS monthly_agg
FROM daily GROUP BY 1
)
SELECT bucket as x, average(monthly_agg) as y from monthly;
Adding variance and stddev without an expensive computing process:
-- previous stats aggs example
SELECT bucket,
average(monthly_agg),
variance(monthly_agg),
stddev(monthly_agg)
FROM monthly;
Querying the number of values from pre-computed stats aggs:
WITH hourly AS (
SELECT time_bucket('1 hour'::interval, time) AS bucket,
stats_agg( temp_c) AS hourly_agg
FROM weather_metrics
WHERE city_name = 'New York'
AND time BETWEEN '2023-02-01 00:00:00' AND '2023-02-01 12:00:00'
GROUP BY 1 ORDER BY 1
)
SELECT bucket, average(hourly_agg), num_vals(hourly_agg) from hourly;
AKA Continuous Aggregates ;)
Materialized views for hypertables.
CREATE MATERIALIZED VIEW ny_hourly_agg
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour'::interval, time) AS bucket,
stats_agg( temp_c) AS hourly_agg
FROM weather_metrics
WHERE city_name = 'New York'
GROUP BY 1;
Materialized data can be combined with real-time data from open timeframes.
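That real-time behavior can be toggled, and a refresh policy keeps the materialized part up to date. A sketch against the ny_hourly_agg view above (the intervals are arbitrary choices):

```sql
-- Serve only materialized data, skipping the real-time union with raw rows
ALTER MATERIALIZED VIEW ny_hourly_agg SET (timescaledb.materialized_only = true);

-- Switch back to real-time aggregation
ALTER MATERIALIZED VIEW ny_hourly_agg SET (timescaledb.materialized_only = false);

-- Refresh the last month of buckets every hour, leaving the
-- most recent hour to the real-time part
SELECT add_continuous_aggregate_policy('ny_hourly_agg',
  start_offset => INTERVAL '1 month',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```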
CREATE MATERIALIZED VIEW ny_daily_agg
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day',bucket),
rollup(hourly_agg) AS daily_agg
FROM ny_hourly_agg group by 1;
CREATE MATERIALIZED VIEW weather_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour'::interval, time),
city_name,
candlestick_agg(time, temp_c, 1) as candlestick
FROM weather_metrics
GROUP BY 1,2 WITH DATA;
SELECT
time_bucket as x,
array_agg(open(candlestick)) as open,
array_agg(high(candlestick)) as high,
array_agg(low(candlestick)) as low,
array_agg(close(candlestick)) as close
FROM weather_hourly
WHERE city_name = 'New York'
GROUP BY 1
ORDER BY time_bucket DESC LIMIT 30;
WITH ny AS (
SELECT
time_bucket ('1 month', time),
avg(temp_c)
FROM
weather_metrics
WHERE
city_name = 'New York'
AND time BETWEEN '2010-01-01' AND '2021-01-01'
GROUP BY 1
),
nai AS (
SELECT
time_bucket ('1 month', time),
avg(temp_c)
FROM
weather_metrics
WHERE
city_name = 'Nairobi'
AND time BETWEEN '2010-01-01' AND '2021-01-01'
GROUP BY 1
)
SELECT
time_bucket ('1 y', ny.time_bucket) as x,
covariance(stats_agg(ny.avg, nai.avg)) as y,
corr(stats_agg(ny.avg, nai.avg)) as y_corr
FROM ny
JOIN nai ON ny.time_bucket = nai.time_bucket
GROUP BY 1;
Enable tablefunc to use crosstab:
CREATE EXTENSION tablefunc;
WITH city_names AS (
SELECT DISTINCT city_name as name
FROM weather_metrics order by 1
)
SELECT a.name as first, b.name as second
FROM city_names a
CROSS JOIN city_names b;
SELECT * FROM crosstab($$
WITH city_names AS (
SELECT DISTINCT city_name as name
FROM weather_metrics order by 1
),
pairs as (
SELECT a.name as first, b.name as second
FROM city_names a
CROSS JOIN city_names b
),
summary AS (
SELECT time_bucket('1 h', time), city_name,
avg(temp_c)
FROM weather_metrics
WHERE time BETWEEN '2010-01-01' AND '2021-01-01'
GROUP BY 1,2
ORDER BY 1,2)
SELECT
a.city_name as first, b.city_name as second,
corr(stats_agg(a.avg, b.avg))
FROM pairs
JOIN summary a ON (pairs.first = a.city_name)
JOIN summary b ON (pairs.second = b.city_name AND a.time_bucket = b.time_bucket)
--WHERE b.city_name = 'New York' and a.city_name = 'Nairobi'
GROUP BY 1,2
order by 1, 2
$$::text,
'select distinct city_name from weather_metrics order by 1'::text
) as ct(city_name text,
"Nairobi" double precision,
"New York" double precision,
"Princeton" double precision,
"Stockholm" double precision,
"Toronto" double precision,
"Vienna" double precision);
with resume as (
select time_bucket('1 hour', time),
percentile_agg(temp_c)
from weather_metrics
where time between '2023-01-02' and '2023-01-03'
and city_name = 'New York'
group by 1 order by 1
)
select 'highest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.99, percentile_agg)) as y
from resume group by 1
union all
select 'lowest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.01, percentile_agg)) as y
from resume group by 1
order by 1,2;
with resume as (
select time_bucket('1 month', time),
percentile_agg(temp_c)
from weather_metrics
where time between '2010-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1 order by 1
)
select 'highest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.99, percentile_agg)) as y
from resume group by 1
union all
select 'lowest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.01, percentile_agg)) as y
from resume group by 1
order by 1,2;
with resume as (
select time_bucket('1 year', time),
percentile_agg(temp_c)
from weather_metrics
where time between '1978-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1 order by 1
)
select 'highest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.99, percentile_agg)) as y
from resume group by 1
union all
select 'lowest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.01, percentile_agg)) as y
from resume group by 1
order by 1,2;
with resume as (
select time_bucket('1 month', time),
percentile_agg(temp_c),
avg(temp_c) as avg
from weather_metrics
where time between '2022-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1 order by 1
),
median as (
select 'median' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, percentile_agg)) as y
from resume
),
average as (
select 'average' as name,
array_agg(time_bucket) as x,
array_agg(avg) as y
from resume
)
SELECT * FROM median UNION ALL
SELECT * FROM average;
with resume as (
select time_bucket('1 day', time),
percentile_agg(temp_c),
avg(temp_c) as avg
from weather_metrics
where time between '2022-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1 order by 1
),
median as (
select 'median' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, percentile_agg)) as y
from resume
),
average as (
select 'average' as name,
array_agg(time_bucket) as x,
array_agg(avg) as y
from resume
)
SELECT * FROM median
UNION ALL
SELECT * FROM average;
with resume as (
select time_bucket('1 year', time),
percentile_agg(temp_c),
uddsketch(200, 0.001, temp_c),
tdigest(200, temp_c),
avg(temp_c)
from weather_metrics
where -- time between '2022-01-01' and '2023-01-02' and
city_name = 'New York'
group by 1 order by 1
),
median as (
select 'median' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, percentile_agg)) as y
from resume
),
median_uddsketch as (
select 'median uddsketch' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, uddsketch)) as y
from resume
),
median_tdigest as (
select 'median tdigest' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, tdigest)) as y
from resume
),
average as (
select 'average' as name,
array_agg(time_bucket) as x,
array_agg(avg) as y
from resume
)
SELECT * FROM median UNION ALL
SELECT * FROM median_uddsketch UNION ALL
SELECT * FROM median_tdigest UNION ALL
SELECT * FROM average;
with resume as (
select time_bucket('1 month', time),
percentile_agg(temp_c),
avg(temp_c) as avg
from weather_metrics
where time between '2022-01-01' and '2023-01-02'
and city_name = 'New York'
group by 1 order by 1
),
p1 as (
select 'p1' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.01, percentile_agg)) as y
from resume
),
p99 as (
select 'p99' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.99, percentile_agg)) as y
from resume
),
median as (
select 'median' as name,
array_agg(time_bucket) as x,
array_agg(approx_percentile(0.5, percentile_agg)) as y
from resume
),
average as (
select 'average' as name,
array_agg(time_bucket) as x,
array_agg(avg) as y
from resume
)
SELECT * FROM median UNION ALL
SELECT * FROM average UNION ALL
SELECT * FROM p1 UNION ALL
SELECT * FROM p99;
with ny as (
select (lttb(time, temp_c, 300) -> unnest()).*
from weather_metrics
where time between '1978-01-01' and '2023-01-01'
and city_name = 'New York'
) select ny.time as x, ny.value as y from ny;
What if I compare LTTB with averages from buckets?
WITH resume AS (
SELECT city_name as name, (lttb(time, temp_c, 720) -> unnest()).*
FROM weather_metrics
WHERE time BETWEEN '2022-01-01' AND '2023-01-01'
GROUP BY 1
) SELECT name,
array_agg(time) AS x,
array_agg(value) as y
FROM resume
GROUP BY 1;
Practical exercises to perform time-series analysis in SQL and TimescaleDB.
- Hands-on exercises to perform time-series analysis on the weather dataset.
- Aggregating data with the toolkit.
- Approximate percentiles
- Creating candlesticks (OHLC charts)
- Correlation matrix: creating a correlation matrix in PostgreSQL from the correlation coefficient
- Downsampling: using the LTTB function to reduce large datasets without losing visual similarity with the original data
- Presentations and knowledge sharing by the participants
Opportunities for students to present their findings and share their knowledge with the rest of the class.
- https://github.com/timescale/timescaledb
- https://github.com/timescale/timescaledb-toolkit
- https://timescale.com/community
- https://docs.timescale.com/
- @jonatasdp on {Twitter,Linkedin}
- Github: @jonatas