Commit df6ada5

More robust parsing of user counts (#113)
The latest userstats-relay-country.csv file has a row whose user count is written in scientific notation. Parsing the field as a float before converting it to an int prevents the failure on that row.
1 parent 223f606 commit df6ada5

1 file changed

+3
-1
lines changed

tornettools/stage.py

Lines changed: 3 additions & 1 deletion
@@ -60,7 +60,9 @@ def stage_users(args, min_unix_time, max_unix_time):

 date = str(parts[0]) # like '2019-01-01'
 country_code = str(parts[1]) # like 'us'
-user_count = int(parts[2]) # like '14714'
+# At least one float has been observed in the file:
+# <https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40121>
+user_count = int(float(parts[2])) # like '14714' or '2e+05'

 dt = datetime.strptime(date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
 unix_time = int(dt.strftime("%s")) # returns stamp like 1548910800
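
To illustrate why the change works, here is a minimal sketch of the same conversion in isolation. The helper name `parse_user_count` is hypothetical and is not part of tornettools; the two sample strings come from the comments in the diff above.

```python
def parse_user_count(field: str) -> int:
    """Hypothetical helper mirroring the patched line: int(float(parts[2]))."""
    # int() rejects strings in scientific notation such as '2e+05',
    # while float() accepts both '14714' and '2e+05'.
    return int(float(field))


for raw in ("14714", "2e+05"):
    print(raw, "->", parse_user_count(raw))

# The pre-patch conversion fails on the scientific-notation value.
try:
    int("2e+05")
except ValueError as err:
    print("int() alone fails:", err)
```

Note that `int(float(...))` truncates toward zero, so a fractional value such as '1.9' would become 1. That seems acceptable for user counts, but it is a behavior of this sketch rather than something the commit states.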
