Skip to content

Conversation

@davidpablocohn
Copy link
Collaborator

@davidpablocohn davidpablocohn commented Jan 13, 2026

This is a first pass at a RegexParseTransform that unifies the functionality of CORIOLIX and CSIRO transforms (#507 ).

For simplicity, at the moment it doesn't try to perfectly mimic the calling conventions of either, though that could be done at the cost of a little complexity at initialization.

I'd love to get everyones' thoughts on this direction before adding it to the dev branch.

kwargs:
return_das_record: true
data_id: gnsspo112593 # Overrides or fills in data_id
field_patterns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having field_patterns as a kwarg is a pretty big departure from device definitions. Why the change?
Would we now have to write all our definitions into the config files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, dang - I hadn't realized that, in the last iteration, I dropped the definition_path argument. The intent was to mirror the functionality of parse_transform, where you have the option of either specifying a device definition_path (as we've traditionally done with NBP), or providing the field_patterns. I'll go back and stick it back in.

Big error on my part!

)

# 2. Initialize the Converter (if requested)
self.converter = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nesting transforms like this seems like somewhat of a departure from core design, would it be better to have this as a util? Or call it in the voyage config?
In the ConvertFieldsTransform why do lat_lon_fields need their own variable? Seems like you are trying to do something extra there with renaming. I think you should just put this logic in for all type conversions and then put lat an lon in the type_map. I would also add a way to do this logic from the dict e.g.
{'heave': {'data_type': 'float', 'units': 'm', 'unconverted_name': 'raw_heave'}}
This would provide a path for those using the other system to change to the dict system.
Also you don't currently have a way for us to convert our lats and lons with (i.e. no option for a dict?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Slipping the converter in there felt like a way to let the transform mirror the parse_transform functionality, where what gets returned is a fully-fledged record with everything having its correct type. Had I thought about more clearly upfront, I wouldn't have created a separate ConvertFormatTransform, and just embedded it in the regex parser.

  2. lat_lon_fields is definitely a hack. I wanted an easy way to turn the CORIOLIX-extracted lat+dir fields into something that could be plotted by the Grafana geo tile. I like your idea of folding it into the dict using a lat_lon data type, accompanied by something like unconverted_name and unconverted_direction. I'll see if I can get that coded up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (1) is true, and you should put it in the parser so it keeps the loggers making sense to how they are described in documentation!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've whipped up a variation for (2) that feels true to form. In fields you can specify

Latitude:
  data_type: 'nmea_lat'
  direction_field: 'NorS'

and ConvertFieldsTransform recognizes nmea_lat and nmea_lon as special data types that incorporate a direction field to do the conversion. I think that makes it consistent with other field handling.

# At this point 'parsed_record' is likely a DASRecord (if we used converter)
# or a dict (if we didn't use converter and didn't ask for DASRecord).

if self.return_json:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to return so many formats? We are doing a lot of checking for these types and sending of these types, would it be easier to just keep to a DAS record and have a transform defined in the config to a json etc at the end if that is what people wanted for their other systems?
If these formats are needed can this logic be put in a base class and reused?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, yeah. Mostly for (probably imagined) backward compatibility. I should have had the default return type of ParseTransform be DASRecord way back, but I didn't, so feel compelled to have all those options. But you're right (why am I not surprised? :) - this is a new transform, so I don't have to feel compelled to be backward compatible with anything. I'll have it just return DASRecords - thanks!

else:
data_id = parsed_record.get('data_id', None)
if not data_id:
data_id = 'unknown'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should throw a warning

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

# path: test/logger/utils/test_regex_parser.py -> root (4 levels up)
sys.path.append(dirname(dirname(dirname(dirname(realpath(__file__))))))

from logger.utils.regex_parser import RegexParser # noqa: E402
Copy link
Collaborator

@ellapiet ellapiet Jan 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same file as test_regex_parser.py?

- Refactor RegexParseTransform to support full device definitions (match data_id to device/type).
- Fix read_config.py to strip whitespace from include patterns, enabling correct nested config loading.
- Refactor ConvertFieldsTransform to support declarative 'nmea_lat'/'nmea_lon' types in 'fields' dict, replacing legacy 'lat_lon_fields'.
- Update RegexParseTransform to use declarative configuration for cached converters.
- Update unit tests for RegexParseTransform and ConvertFieldsTransform to verify new functionality and ensure flake8 compliance.
- Added warning log when data_id is not found in record or configuration and defaults to 'unknown'.
- Respects the 'quiet' flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants