Decoupled Tracab dat / json from meta data file types #364
probberechts merged 13 commits into PySport:master
|
It seems my local version and the GitHub tests don't agree on what UTC time is... |
|
Does anyone have any idea what could be causing this timezone/conversion mismatch? When I test it locally this assertion is true: But here the automated tests seem to be in disagreement by 1 hour and want it to be |
|
I think it is this line: `parse(meta_data.match.attrib["dtDate"]).astimezone(timezone.utc)`. You first parse the date using your local timezone, then the `astimezone` call converts it to UTC. I think the correct implementation is `parse(meta_data.match.attrib["dtDate"]).replace(tzinfo=timezone.utc)` |
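A minimal sketch of the difference, for reference. It uses the standard library's `datetime.fromisoformat` rather than the `parse` call from the PR, and the timestamp string is a made-up example, not a value from a real Tracab file.

```python
from datetime import datetime, timezone

# Hypothetical timestamp; the real value comes from the metadata file.
raw = "2023-12-15T20:32:20"
naive = datetime.fromisoformat(raw)  # naive datetime, no tzinfo attached

# astimezone() treats a naive datetime as local time and converts it,
# so the resulting hour depends on the machine's timezone.
shifted = naive.astimezone(timezone.utc)

# replace() attaches UTC without touching the clock reading,
# so the result is the same on every machine.
fixed = naive.replace(tzinfo=timezone.utc)

print(shifted.isoformat())  # varies with the local timezone
print(fixed.isoformat())
```

This would explain why a laptop in one timezone and a CI runner set to UTC disagree by a fixed offset: only the `astimezone` result depends on where the code runs.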
|
Thanks, totally didn't notice that! |
|
FWIW, this should now be fixed |
de6bbf3 to 5562837
|
I noticed there was a merge conflict, I've resolved that now (in 2 attempts 🤣). |
|
I've refactored this but I still have to test a few things. I was also wondering whether there is documentation about these (meta)data formats and whether Tracab uses a particular name / version number / code to refer to them. |
|
I see the JSON file does not have a version, but the XML seems to have one, although I'm not sure how consistent its use is and my only reference points are an old file from 2019 with However, I wrote that test with Perhaps in general we can start to make a point of paying more attention to version numbers (if/when they are provided). Both at PySport (when talking with providers) and in the Common Data Format we're at least trying to get vendors to use versioning. |
|
@probberechts I made a small fix inside Potentially, we should also discuss how we deal with this for games with extra time. |
I noticed that the Tracab data loader enforced a fixed pairing: whenever a `.dat` tracking file was provided it could only be combined with a `.xml` meta data file, and a `.json` tracking file always had to be paired with a `.json` meta data file.

I found a situation where this was not the case (i.e. we had `.dat` tracking data and `.json` meta data), and thus the parser broke.

Concretely this means:

- In `tracab.load` (more specifically inside `identify_deserializer`) we no longer use the combination of `meta_data_extension` and `raw_data_extension` to see which Deserializer we need. This now depends only on the `raw_data_extension`.
- Pass the `meta_data_extension` to the deserializer. I chose to do this by adding a new parameter called `meta_data_extension` to both `TRACABDatDeserializer` and `TRACABJSONDeserializer`. (Note: I tried to do this differently, but because we already open the meta data inside `load` and pass it directly to the deserializer as the `TRACABInputs()`, we can't retrieve the `meta_data_extension` after that step.)
- `load_meta_data`, which takes `self.meta_data_extension` and `inputs.meta_data` and returns this ugly thing. (I can clean this up by returning a dictionary with these values or something, but I'm not sure if that's necessary.)
- The `load_meta_data` step made me realize that "Enriched metadata with date, game_week and game_id" #340 had an oversight and did not include the UTC date and game_id in the `TRACABDatDeserializer`. This means I had to edit line 176 in `kloppy/tests/test_tracab.py` to make the test reflect UTC time. (This is the test that currently fails, though.)
- `load_meta_data` lives in a newly created `helpers.py` file inside `kloppy/infra/serializers/tracking/tracab/` and acts as a simple switchboard that picks either `load_meta_data_xml` or `load_meta_data_json` based on the `meta_data_extension`. (This could be done differently, but I'm not sure if that's necessary.)
- The `helpers.py` file also contains both functions for parsing the corresponding meta data file.
- `meta_tracking_assertions`. I did this because we're running the same asserts 4 times on 4 different combinations of tracking and meta data.
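The decoupling described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function bodies, return values, and the `_META_LOADERS` table are hypothetical stand-ins, and only the names `identify_deserializer`, `load_meta_data`, `load_meta_data_xml` and `load_meta_data_json` come from the description.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical stand-ins for the real parsers; they only tag the format.
def load_meta_data_xml(meta_data: bytes) -> dict:
    return {"format": "xml", "size": len(meta_data)}


def load_meta_data_json(meta_data: bytes) -> dict:
    return {"format": "json", "size": len(meta_data)}


# Hypothetical lookup table: meta data extension -> parser function.
_META_LOADERS: Dict[str, Callable[[bytes], dict]] = {
    ".xml": load_meta_data_xml,
    ".json": load_meta_data_json,
}


def load_meta_data(meta_data_extension: str, meta_data: bytes) -> dict:
    """Switchboard: choose the meta data parser from the extension alone."""
    try:
        loader = _META_LOADERS[meta_data_extension.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported meta data extension: {meta_data_extension!r}"
        ) from None
    return loader(meta_data)


def identify_deserializer(raw_data_filename: str) -> str:
    """Choose the deserializer from the tracking file extension only."""
    extension = Path(raw_data_filename).suffix.lower()
    if extension == ".dat":
        return "TRACABDatDeserializer"
    if extension == ".json":
        return "TRACABJSONDeserializer"
    raise ValueError(f"Unsupported tracking data extension: {extension!r}")


# A .dat tracking file can now be combined with .json meta data.
print(identify_deserializer("tracking.dat"))
print(load_meta_data(".json", b"{}")["format"])
```

Because the two choices are made independently, all four combinations of tracking and meta data extensions go through the same two functions.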