Commit d7c1f1b

PostgreSQL database for Open Data
1 parent 042c316 commit d7c1f1b

18 files changed: 474 additions, 3 deletions

opendata-python/Makefile

Lines changed: 3 additions & 1 deletion
@@ -2,7 +2,9 @@ venv:
 	pipenv install tox tox-pyenv twine
 
 test: venv
-	pipenv run tox
+	docker-compose -f docker-compose.test.yaml up -d
+	- pipenv run tox
+	docker-compose down
 
 build: venv
 	pipenv run python setup.py sdist bdist_wheel
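
Note on the `test` target above: `docker-compose up -d` returns once the container has started, which can be before PostgreSQL accepts connections. Below is a minimal sketch of a readiness check that a test fixture could run first. `wait_for_port` is a hypothetical helper, not part of this commit, and the host and port passed to it are assumptions that would have to match the compose file in use. An open TCP port is only a rough signal, since the server may still be initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A fixture could call `wait_for_port('localhost', 5432)` before creating tables and fail early if it returns `False`.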

opendata-python/README.md

Lines changed: 62 additions & 0 deletions
@@ -166,3 +166,65 @@ activities[99].metadata
 ...
 }}
 ```
+
+### Connecting to a PostgreSQL database
+Although having all the Open Data files available as plain files on your computer has advantages (especially for less tech-savvy users), querying the data is slow and can be complicated.
+To overcome this, it is possible to store all the data in a [PostgreSQL](https://www.postgresql.org/) database as well.
+
+Setting up PostgreSQL (documentation [here](https://www.postgresql.org/docs/11/tutorial-install.html)) can be a hassle, so there is a `docker-compose.yaml` included in this repository that *should* work out of the box by running `docker-compose up` in the directory where the file is stored.
+I am not going down the rabbit hole of explaining how to install Docker and docker-compose here (a quick search will yield enough results for that). One comment: on macOS and Linux installation is mostly painless; on Windows it is not always, and I would advise against using Docker there.
+As an alternative, you can use a local installation of PostgreSQL (the library assumes username=opendata, password=password, database name=opendata by default).
+
+When PostgreSQL is installed correctly and running, inserting data into the database is as easy as:
+```python
+from opendata import OpenData
+from opendata.db.main import OpenDataDB
+from opendata.models import LocalAthlete
+
+od = OpenData()
+opendatadb = OpenDataDB()
+opendatadb.create_tables()  # This is only needed once
+
+athlete = od.get_remote_athlete('0031326c-e796-4f35-8f25-d3937edca90f')
+
+opendatadb.insert_athlete(athlete)
+```
+Please note: This only inserts the athlete into the database, not the activities for this athlete.
+To add all the activities too:
+```python
+for activity in athlete.activities():
+    opendatadb.insert_activity(activity, athlete)
+```
+
+At this point there are two tables in the opendata database: "athletes" and "activities".
+The database schemas for both tables can be viewed [here](opendata/db/models.py).
+
+If you are familiar with raw SQL you can query the database directly, but if you prefer to stay in Python land, I've got you covered too: under the hood this library uses the [SQLAlchemy](https://www.sqlalchemy.org/) ORM.
+For some general documentation on how that works, see [here](https://docs.sqlalchemy.org/en/latest/orm/tutorial.html).
+Querying the data is possible using SQLAlchemy's query language (documentation [here](https://docs.sqlalchemy.org/en/latest/orm/query.html)).
+
+For example, to get a count of all activities that have power:
+```python
+from opendata.db import models
+from sqlalchemy.sql import not_
+
+session = opendatadb.get_session()
+session.query(models.Activity).filter(not_(models.Activity.power.all('nan'))).count()
+```
+
+Filters can be [chained](https://docs.sqlalchemy.org/en/latest/glossary.html#term-method-chaining) to apply multiple filters in one query:
+```python
+from datetime import datetime
+
+from opendata.db import models
+from sqlalchemy.sql import not_
+
+session = opendatadb.get_session()
+session.query(models.Activity).filter(models.Activity.datetime <= datetime(2017, 1, 1)).\
+    filter(not_(models.Activity.power.all('nan'))).count()
+```
+
+You can also query for nested keys/values in the JSON columns "meta" and "metrics" (the metadata is stored in a column named "meta" because SQLAlchemy reserves the `metadata` attribute name on declarative models):
+```python
+session.query(models.Activity).filter(models.Activity.metrics.contains({'workout_time': '2703.00000'})).count()
+```
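
Note on the "activities that have power" filter in the README hunk above: as I read it, `not_(... .power.all('nan'))` keeps activities whose power array is not entirely NaN. The same predicate in plain Python, as a sketch only. `has_power` is a hypothetical helper and not part of the library, and the sample data is made up.

```python
import math


def has_power(power):
    """True if the list holds at least one value that is not NaN."""
    if power is None:
        # A NULL array never satisfies the SQL filter either.
        return False
    return any(not math.isnan(value) for value in power)


activities = {
    'with_power': [210.0, float('nan'), 195.0],
    'without_power': [float('nan'), float('nan')],
}
# Only the first activity contains at least one real power sample.
count = sum(has_power(power) for power in activities.values())
```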
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+version: '3.3'
+
+services:
+  postgres:
+    image: postgres
+    restart: always
+    ports:
+      - "5433:5432"
+    environment:
+      POSTGRES_USER: opendata
+      POSTGRES_PASSWORD: password
+      POSTGRES_DB: opendata
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+version: '3.3'
+
+services:
+  postgres:
+    image: postgres
+    restart: always
+    ports:
+      - "5432:5432"
+    volumes:
+      - ./postgres-data:/var/lib/postgresql/data
+    environment:
+      POSTGRES_USER: opendata
+      POSTGRES_PASSWORD: password
+      POSTGRES_DB: opendata

opendata-python/opendata/conf.py

Lines changed: 6 additions & 1 deletion
@@ -28,5 +28,10 @@
     data_prefix='data',
     metadata_prefix='metadata',
     datasets_prefix='datasets',
-    local_storage=config['Storage']['local_storage_path']
+    local_storage=config['Storage']['local_storage_path'],
+    db_host='localhost',
+    db_port='5432',
+    db_user='opendata',
+    db_password='password',
+    db_name='opendata',
 )
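
Note: these settings end up in a SQLAlchemy connection URL. Below is a sketch of how such a URL can be assembled with only the standard library. `build_db_url` is a hypothetical helper, and percent-encoding the credentials is my addition so that characters such as `@` or `/` in a password do not break the URL.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding user and password."""
    return (
        f'postgresql://{quote_plus(user)}:{quote_plus(password)}'
        f'@{host}:{port}/{database}'
    )


# Values taken from the defaults added in this hunk.
url = build_db_url('opendata', 'password', 'localhost', '5432', 'opendata')
```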

opendata-python/opendata/db/__init__.py

Whitespace-only changes.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+csv_to_db_mapping = {
+    'secs': 'time',
+    'km': 'distance',
+    'spd': 'speed',
+    'power': 'power',
+    'cad': 'cadence',
+    'hr': 'heartrate',
+    'alt': 'altitude',
+    'slope': 'slope',
+    'temp': 'temperature',
+}
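
Note: the mapping above translates CSV column names into database column names. Below is a sketch of how it can be applied, with a plain dict of lists standing in for the activity data (the library itself reads `activity.data[column].values`, which suggests a pandas DataFrame). `rename_columns` is a hypothetical helper, and `lat` is a made-up unmapped column.

```python
csv_to_db_mapping = {
    'secs': 'time',
    'km': 'distance',
    'spd': 'speed',
    'power': 'power',
    'cad': 'cadence',
    'hr': 'heartrate',
    'alt': 'altitude',
    'slope': 'slope',
    'temp': 'temperature',
}


def rename_columns(data, mapping=csv_to_db_mapping):
    """Keep only mapped columns and return them under their database names."""
    return {
        mapping[name]: values
        for name, values in data.items()
        if name in mapping
    }


# 'lat' has no entry in the mapping, so it is dropped.
row = rename_columns({
    'secs': [0, 1, 2],
    'hr': [120, 121, 123],
    'lat': [52.1, 52.1, 52.1],
})
```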
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+from contextlib import contextmanager
+
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from opendata.conf import settings
+from opendata.utils import filename_to_datetime
+from . import models
+from .constants import csv_to_db_mapping
+
+
+class OpenDataDB:
+    def __init__(self, host=settings.db_host, port=settings.db_port,
+                 user=settings.db_user, password=settings.db_password,
+                 database=settings.db_name):
+        self.host = host
+        self.port = port
+        self.user = user
+        self.password = password
+        self.database = database
+        self.Session = sessionmaker()
+
+    def get_engine(self):
+        return create_engine(
+            f'postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.database}'
+        )
+
+    @contextmanager
+    def engine(self):
+        engine = self.get_engine()
+        yield engine
+        engine.dispose()
+
+    def get_session(self):
+        return self.Session(bind=self.get_engine())
+
+    @contextmanager
+    def session(self):
+        session = self.get_session()
+        yield session
+        session.close()
+
+    def create_tables(self):
+        with self.session() as session, self.engine() as engine:
+            models.Base.metadata.create_all(engine)
+            session.commit()
+
+    def insert_athlete(self, athlete):
+        with self.session() as session:
+            session.add(models.Athlete(
+                id=athlete.id,
+                meta=athlete.metadata
+            ))
+            session.commit()
+
+    def insert_activity(self, activity, athlete=None):
+        with self.session() as session:
+            if activity.metadata is not None \
+                    and 'METRICS' in activity.metadata:
+                metrics = activity.metadata.pop('METRICS')
+            else:
+                metrics = None
+
+            db_activity = models.Activity(
+                id=activity.id,
+                datetime=filename_to_datetime(activity.id),
+                meta=activity.metadata,
+                metrics=metrics,
+            )
+
+            if athlete is not None:
+                db_activity.athlete = athlete.id
+
+            for column in csv_to_db_mapping.keys():
+                if column in activity.data:
+                    setattr(
+                        db_activity,
+                        csv_to_db_mapping[column],
+                        activity.data[column].values.tolist()
+                    )
+
+            session.add(db_activity)
+            session.commit()
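
Note on the `session()` and `engine()` context managers above: as written, the cleanup after `yield` is skipped if the `with` body raises. Below is a sketch of the usual `try`/`finally` pattern. `FakeSession` and `session_scope` are hypothetical stand-ins so the example runs without a database.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session; only records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def session_scope(factory=FakeSession):
    session = factory()
    try:
        yield session
    finally:
        # Runs on normal exit and when the with-body raises.
        session.close()


leaked = None
try:
    with session_scope() as session:
        leaked = session
        raise ValueError('insert failed')
except ValueError:
    pass
# At this point leaked.closed is True even though the body raised.
```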
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+from sqlalchemy import Column, Float, ForeignKey, String
+from sqlalchemy.dialects import postgresql
+from sqlalchemy.types import DateTime
+from sqlalchemy.ext.declarative import declarative_base
+from sqlalchemy.orm import relationship
+
+Base = declarative_base()
+
+
+class Athlete(Base):
+    __tablename__ = 'athletes'
+
+    id = Column(String, primary_key=True)
+    meta = Column(postgresql.JSONB)
+    activities = relationship('Activity')
+
+    def __repr__(self):
+        return f'<Athlete({self.id})>'
+
+
+class Activity(Base):
+    __tablename__ = 'activities'
+
+    id = Column(String, primary_key=True)
+    athlete = Column(String, ForeignKey('athletes.id'))
+    datetime = Column(DateTime)
+
+    meta = Column(postgresql.JSONB)
+    metrics = Column(postgresql.JSONB)
+
+    time = Column(postgresql.ARRAY(Float, dimensions=1))
+    distance = Column(postgresql.ARRAY(Float, dimensions=1))
+    speed = Column(postgresql.ARRAY(Float, dimensions=1))
+    power = Column(postgresql.ARRAY(Float, dimensions=1))
+    cadence = Column(postgresql.ARRAY(Float, dimensions=1))
+    heartrate = Column(postgresql.ARRAY(Float, dimensions=1))
+    altitude = Column(postgresql.ARRAY(Float, dimensions=1))
+    slope = Column(postgresql.ARRAY(Float, dimensions=1))
+    temperature = Column(postgresql.ARRAY(Float, dimensions=1))
+
+    def __repr__(self):
+        return f'<Activity({self.id})>'

opendata-python/opendata/utils.py

Lines changed: 5 additions & 1 deletion
@@ -11,8 +11,12 @@ def date_string_to_filename(date_string):
     return suffix + '.csv'
 
 
+def filename_to_datetime(filename):
+    return datetime.strptime(filename, FILENAME_FORMAT_WITH_EXTENSION)
+
+
 def filename_to_date_string(filename):
-    dt = datetime.strptime(filename, FILENAME_FORMAT_WITH_EXTENSION)
+    dt = filename_to_datetime(filename)
     return dt.strftime(DATE_STRING_FORMAT) + 'UTC'
 
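
Note: a self-contained sketch of the new `filename_to_datetime` helper. The value of `FILENAME_FORMAT_WITH_EXTENSION` is not shown in this diff, so the format string below is an assumption chosen for illustration, as is the sample filename.

```python
from datetime import datetime

# Assumed format; the real constant is defined elsewhere in opendata/utils.py.
FILENAME_FORMAT_WITH_EXTENSION = '%Y_%m_%d_%H_%M_%S.csv'


def filename_to_datetime(filename):
    """Parse an activity filename into a naive datetime."""
    return datetime.strptime(filename, FILENAME_FORMAT_WITH_EXTENSION)


parsed = filename_to_datetime('2017_03_04_05_06_07.csv')
```

Because `strptime` raises `ValueError` on a mismatch, a filename that does not follow the format fails loudly instead of yielding a wrong timestamp.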
