-
Notifications
You must be signed in to change notification settings - Fork 1
Add file converters #129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file converters #129
Conversation
|
I'm reviewing this today, but @Relm-Arrowny are you happy that this is the way we should do the file converting? |
It seems good—very flexible, and it allows us to add any converter later. My only small niggle is the FILE_TO_CONVERTER_MAP. Do you think the client should have the option to provide the desired converter? |
We were thinking that a converter always turns a file into json, which means a file shouldn't need more than one converter and so the option is just "use converter or don't" |
olliesilvester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is a great start, I like how you've done this.
As well as the comments, please could you:
- Add a couple of system tests to make sure the client can be used as expected with some of these formatters. Don't need 100% coverage though
- Update the docs: The first bullet point of future features can now be a current feature :) . Also could you add something small to the main guide on how a developer should add to the file converts. Probably only needs to be a few sentences.
|
Also: |
Ah I should probably remove that test as it involves reading the filesystem. It's quite nice to have locally though, as it checks everything in the map can be converted without errors. Is there a way to disable it in CI but keep the test? |
2a89404 to
4275240
Compare
docs/how-to/config-server-guide.md
Outdated
|
|
||
| # File converters | ||
|
|
||
| Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention here that the converters should always be going to json, and that if someone has a usecase where json isn't working, please make an issue so we can discuss it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please could you put this part beneath the paragraph about getting return types:
Suggestion:
By default, this will return the file’s raw string output - which includes things like linebreaks. It is up to you to format this output - using splitlines() will get you a list where each item is a line in the file. get_file_contents has a desired_return_type parameter where you can instead ask for a dict or bytes. The server will try and do the conversion, and respond with an HTTP error if the requested file is incompatible with that type. Custom file converters [link to file converters subsection] can be used to specify how a file should be converted to a dict
then directly below
File converters When requesting a file as a dict, the server will check against [this dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) to see if there is a defined way to convert this file to json. Custom formatters should be added as required along with updates to the dictionary. Please create an issue if there is a usecase where you'd like to convert a file to a non-json format. If there is no custom converter, the server will attempt to use 'json.loads' on the file contents. A request for str or bytes bypasses converters.
| is in BEAMLINE_PARAMETER_KEYWORDS, it leaves it as a string. Otherwise, it is | ||
| converted to a number or bool.""" | ||
| lines = contents.splitlines() | ||
| config_lines_sep_key_and_value = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| config_lines_sep_key_and_value = [ | |
| #Get lists of parameter keys and values | |
| config_lines_sep_key_and_value = [ |
c06f41b to
2da1ffd
Compare
oliwenmandiamond
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but the structure of the repo is very rigid and can see it quickly becoming unmanageable with the number of converters and models added. Please consider my comments on how to scale it better with a plugin structure
| return self | ||
|
|
||
|
|
||
| class GenericLut(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class GenericLut(BaseModel): | |
| class GenericLookupTable(BaseModel): |
Class names should be explicit. Variable names should be short e.g to use it it would be like this
generic_lut = GenericLookupTable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to make this even more generic! I would let you specify the type that the GenericLookupTable should get
e.g
from typing import TypeVar, Generic
T = TypeVar('T')
class GenericLookupTable(BaseModel, Generic(T)):
column_names: list[str]
rows: list[list[T]]
...Then I can do this
generic_lut1 = GenericLookupTable[float | int](...)
generic_lut2 = GenericLookupTable[str](...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. @jacob720 and I were also discussing the ideal API to use a LUT from bluesky/dodal to get everything really nicely typed. I will put a prototype beneath here and spin it into a new issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPECIFIC_DICT_KEYS = Literal["millimetres", "electron_volts"]
Num = int | float
# Runtime error if specific dict isn't using SPECIFIC_DICT_KEYS as its fields
class SpecificDict(TypedDict):
electron_volts: list[Num]
millimetres: list[Num]
# Runtime error if lists don't have the same length
class SpecificLut(BaseModel):
lut_dict: SpecificDict
def convert_values(
self,
known_unit: SPECIFIC_DICT_KEYS,
known_value: Num,
desired_unit: SPECIFIC_DICT_KEYS,
) -> Num:
_closest_value = min(
self.lut_dict[known_unit], key=lambda x: abs(x - known_value)
)
_index = self.lut_dict[known_unit].index(_closest_value)
_converted_value = self.lut_dict[desired_unit][_index]
return _converted_value
my_dict = SpecificDict({"electron_volts": [1, 2, 3], "millimetres": [5, 6, 7]})
my_lut = SpecificLut(lut_dict=my_dict)
# Now I can convert things and get static typing errors if I try and use the wrong units
val_ev = 1.1
val_mm = my_lut.convert_values(
"electron_volts", val_ev, "millimetres"
) # Typing is happy
val_mm = my_lut.convert_values(
"electron_volts", val_ev, "pretty flowers"
) # Typing is unhappy
The ugly part is having to write the literals twice, but there's no avoiding that right now since Python TypeDict's don't give you a type for the dict keys. I think the API to convert is quite nice other than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will put this into a separate issue though. This PR is big enough 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this
The column names are generated dynamically based on the literal
COLUMN_NAMES = Literal["angle_deg", "gap_mm"]
class SpecificLut(BaseModel):
rows: list[list[int | float]]
@model_validator(mode="after")
def make_column_names(self):
self._column_names: list[COLUMN_NAMES] = list(get_args(COLUMN_NAMES))
return self
def get(
self,
column_name: COLUMN_NAMES,
value: int | float,
target_column_name: COLUMN_NAMES,
):
column_index = self._column_names.index(column_name)
target_column_index = self._column_names.index(target_column_name)
for row in enumerate(self.rows):
if row[column_index] == value:
return row[target_column_index]
raise ValueError(f"Value not found in table for column {column_name}")
my_lut = SpecificLut(rows=[[]])
my_lut.get(column_name="not_accepted", value=5, target_column_name="gap_mm")So you get static typing but have one source of truth for column names. I think there might be a way to do the same thing with enums too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing!
| @@ -0,0 +1,88 @@ | |||
| from typing import Any | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file will grow very big so I think this isn't going to be very scalable.
I would separate out the converters and models into a module for the thing it is used for e.g something like below:
daq_config_server/serializer_plugins/
├── display_config/
│ ├── models.py
│ └── _converters.py
├── generic_lookup_table/
│ ├── models.py
│ └── _converters.py
├── undulator_lookup_table/
│ ├── models.py
│ └── _converters.py
...
By organizing each "example" as its own module, you can keep the functionality isolated and focused which will help prevent one large file that becomes hard to manage in the future if it is dumped with all the converters and models from every beamline.
Maybe serializer_plugins is not the correct name, but is used to illustrate my point.
Then in daq_config_server and dodal we can then do
from daq_config_server.serializer_plugins.display_config.models import DisplayConfigDataOr with shortcuts via __init__.py file
from daq_config_server.serializer_plugins.display_config import DisplayConfigDataThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good, I'll create an issue for it and do seperately just in the interest of getting this one merged soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return self | ||
|
|
||
|
|
||
| class GenericLut(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. @jacob720 and I were also discussing the ideal API to use a LUT from bluesky/dodal to get everything really nicely typed. I will put a prototype beneath here and spin it into a new issue
| return self | ||
|
|
||
|
|
||
| class GenericLut(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPECIFIC_DICT_KEYS = Literal["millimetres", "electron_volts"]
Num = int | float
# Runtime error if specific dict isn't using SPECIFIC_DICT_KEYS as its fields
class SpecificDict(TypedDict):
electron_volts: list[Num]
millimetres: list[Num]
# Runtime error if lists don't have the same length
class SpecificLut(BaseModel):
lut_dict: SpecificDict
def convert_values(
self,
known_unit: SPECIFIC_DICT_KEYS,
known_value: Num,
desired_unit: SPECIFIC_DICT_KEYS,
) -> Num:
_closest_value = min(
self.lut_dict[known_unit], key=lambda x: abs(x - known_value)
)
_index = self.lut_dict[known_unit].index(_closest_value)
_converted_value = self.lut_dict[desired_unit][_index]
return _converted_value
my_dict = SpecificDict({"electron_volts": [1, 2, 3], "millimetres": [5, 6, 7]})
my_lut = SpecificLut(lut_dict=my_dict)
# Now I can convert things and get static typing errors if I try and use the wrong units
val_ev = 1.1
val_mm = my_lut.convert_values(
"electron_volts", val_ev, "millimetres"
) # Typing is happy
val_mm = my_lut.convert_values(
"electron_volts", val_ev, "pretty flowers"
) # Typing is unhappy
The ugly part is having to write the literals twice, but there's no avoiding that right now since Python TypeDict's don't give you a type for the dict keys. I think the API to convert is quite nice other than that.
olliesilvester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's looking good. A few more comments but I think we're close now, and everyone seems to be happy with this solution
| (file-converters)= | ||
| # File converters | ||
|
|
||
| Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Converters can convert config files to a `dict` or `pydantic model`, and the same `pydantic model` can be reconstructed client-side by the `get_file_contents` method. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still have converters to get things into a dict? I thought we'd just leave the dict requesting logic as it was before this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do, I'd need to make some more pydantic models which I'm happy to do. But for example, currently there's a general xml to dict converter that will work on all xml files, which could be useful to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in person: We think this is fine for now.
However, I am wondering if someone requests a file as a dict, and a converter to a basemodel for that file exists. They'd probably expect their file to be dumbly converted with json.loads(raw_contents), whereas in reality they'd get a dict which is formatted for the base model.
Could make a separate issue for this and think about it later? After typing this I'm leaning towards not having specific converts for when you request a dict.
olliesilvester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now, and very solid set of tests. I think just address the overload comment and I'm happy to merge. Let's do separate issues for: Oli's comment on restructuring files, making specific LUTs, and whether or not we should keep converters to dicts
| (file-converters)= | ||
| # File converters | ||
|
|
||
| Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Converters can convert config files to a `dict` or `pydantic model`, and the same `pydantic model` can be reconstructed client-side by the `get_file_contents` method. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in person: We think this is fine for now.
However, I am wondering if someone requests a file as a dict, and a converter to a basemodel for that file exists. They'd probably expect their file to be dumbly converted with json.loads(raw_contents), whereas in reality they'd get a dict which is formatted for the base model.
Could make a separate issue for this and think about it later? After typing this I'm leaning towards not having specific converts for when you request a dict.
Co-authored-by: olliesilvester <[email protected]>
Co-authored-by: olliesilvester <[email protected]>
|
Have fixed the overloads |
olliesilvester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for doing this! I'll write a few issues that this one has spawned. IIRC we have:
- Improving file structure (Oli's comment)
- Thinking more about dict converters
- Making a new type of test which talks to the real server so that we don't need to copy the all the whitelisted files to
test_data - Making specific LUT models
Changes addressed or put into other issues
Fixes #127
Requires #128
Adds some file converters to turn files of various formats into dicts. If a request asks for a dict, the config server will run the requested file through the converter corresponding to that file in
FILE_TO_CONVERTER_MAP. If the file is not in the map, it will attempt tojson.loads(file_contents)as happened before.