-
Notifications
You must be signed in to change notification settings - Fork 24
Issue 507 - unify CSIRO and CORIOLIX regex parse transforms #509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
| kwargs: | ||
| return_das_record: true | ||
| data_id: gnsspo112593 # Overrides or fills in data_id | ||
| field_patterns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having field_patterns as a kwarg is a pretty big departure from device definitions. Why the change?
Would we now have to write all our definitions into the config files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, dang - I hadn't realized that, in the last iteration, I dropped the definition_path argument. The intent was to mirror the functionality of parse_transform, where you have the option of either specifying a device definition_path (as we've traditionally done with NBP), or providing the field_patterns. I'll go back and stick it back in.
Big error on my part!
| ) | ||
|
|
||
| # 2. Initialize the Converter (if requested) | ||
| self.converter = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nesting transforms like this seems like somewhat of a departure from core design, would it be better to have this as a util? Or call it in the voyage config?
In the ConvertFieldsTransform why do lat_lon_fields need their own variable? Seems like you are trying to do something extra there with renaming. I think you should just put this logic in for all type conversions and then put lat an lon in the type_map. I would also add a way to do this logic from the dict e.g.
{'heave': {'data_type': 'float', 'units': 'm', 'unconverted_name': 'raw_heave'}}
This would provide a path for those using the other system to change to the dict system.
Also you don't currently have a way for us to convert our lats and lons with (i.e. no option for a dict?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Slipping the converter in there felt like a way to let the transform mirror the parse_transform functionality, where what gets returned is a fully-fledged record with everything having its correct type. Had I thought about more clearly upfront, I wouldn't have created a separate ConvertFormatTransform, and just embedded it in the regex parser.
-
lat_lon_fields is definitely a hack. I wanted an easy way to turn the CORIOLIX-extracted lat+dir fields into something that could be plotted by the Grafana geo tile. I like your idea of folding it into the dict using a lat_lon data type, accompanied by something like unconverted_name and unconverted_direction. I'll see if I can get that coded up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (1) is true, and you should put it in the parser so it keeps the loggers making sense to how they are described in documentation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've whipped up a variation for (2) that feels true to form. In fields you can specify
Latitude:
data_type: 'nmea_lat'
direction_field: 'NorS'
and ConvertFieldsTransform recognizes nmea_lat and nmea_lon as special data types that incorporate a direction field to do the conversion. I think that makes it consistent with other field handling.
| # At this point 'parsed_record' is likely a DASRecord (if we used converter) | ||
| # or a dict (if we didn't use converter and didn't ask for DASRecord). | ||
|
|
||
| if self.return_json: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to return so many formats? We are doing a lot of checking for these types and sending of these types, would it be easier to just keep to a DAS record and have a transform defined in the config to a json etc at the end if that is what people wanted for their other systems?
If these formats are needed can this logic be put in a base class and reused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, yeah. Mostly for (probably imagined) backward compatibility. I should have had the default return type of ParseTransform be DASRecord way back, but I didn't, so feel compelled to have all those options. But you're right (why am I not surprised? :) - this is a new transform, so I don't have to feel compelled to be backward compatible with anything. I'll have it just return DASRecords - thanks!
| else: | ||
| data_id = parsed_record.get('data_id', None) | ||
| if not data_id: | ||
| data_id = 'unknown' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should throw a warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
| # path: test/logger/utils/test_regex_parser.py -> root (4 levels up) | ||
| sys.path.append(dirname(dirname(dirname(dirname(realpath(__file__)))))) | ||
|
|
||
| from logger.utils.regex_parser import RegexParser # noqa: E402 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same file as test_regex_parser.py?
…. Removes dict and JSON return options.
…ror on conflict with field_patterns.
- Refactor RegexParseTransform to support full device definitions (match data_id to device/type). - Fix read_config.py to strip whitespace from include patterns, enabling correct nested config loading. - Refactor ConvertFieldsTransform to support declarative 'nmea_lat'/'nmea_lon' types in 'fields' dict, replacing legacy 'lat_lon_fields'. - Update RegexParseTransform to use declarative configuration for cached converters. - Update unit tests for RegexParseTransform and ConvertFieldsTransform to verify new functionality and ensure flake8 compliance.
- Added warning log when data_id is not found in record or configuration and defaults to 'unknown'. - Respects the 'quiet' flag.
This is a first pass at a RegexParseTransform that unifies the functionality of CORIOLIX and CSIRO transforms (#507 ).
For simplicity, at the moment it doesn't try to perfectly mimic the calling conventions of either, though that could be done at the cost of a little complexity at initialization.
I'd love to get everyones' thoughts on this direction before adding it to the dev branch.