Skip to content

Conversation

@jacob720
Copy link
Collaborator

Fixes #127

Requires #128

Adds some file converters to turn files of various formats into dicts. If a request asks for a dict, the config server will run the requested file through the converter corresponding to that file in FILE_TO_CONVERTER_MAP. If the file is not in the map, it will attempt to json.loads(file_contents) as happened before.

@olliesilvester
Copy link
Collaborator

I'm reviewing this today, but @Relm-Arrowny are you happy that this is the way we should do the file converting?

@Relm-Arrowny
Copy link
Collaborator

I'm reviewing this today, but @Relm-Arrowny are you happy that this is the way we should do the file converting?

It seems good—very flexible, and it allows us to add any converter later. My only small niggle is the FILE_TO_CONVERTER_MAP. Do you think the client should have the option to provide the desired converter?

@olliesilvester
Copy link
Collaborator

I'm reviewing this today, but @Relm-Arrowny are you happy that this is the way we should do the file converting?

It seems good—very flexible, and it allows us to add any converter later. My only small niggle is the FILE_TO_CONVERTER_MAP. Do you think the client should have the option to provide the desired converter?

We were thinking that a converter always turns a file into json, which means a file shouldn't need more than one converter and so the option is just "use converter or don't"

Copy link
Collaborator

@olliesilvester olliesilvester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is a great start, I like how you've done this.

As well as the comments, please could you:

  • Add a couple of system tests to make sure the client can be used as expected with some of these formatters. Don't need 100% coverage though
  • Update the docs: The first bullet point of future features can now be a current feature :) . Also could you add something small to the main guide on how a developer should add to the file converts. Probably only needs to be a few sentences.

@olliesilvester
Copy link
Collaborator

Also: test_all_files_in_dict_can_be_parsed_with_no_errors - FileNotFoundError: [Errno 2] No such file or directory: '/dls_sw/i23/software/aithre/aithre_display.configuration

@jacob720
Copy link
Collaborator Author

Also: test_all_files_in_dict_can_be_parsed_with_no_errors - FileNotFoundError: [Errno 2] No such file or directory: '/dls_sw/i23/software/aithre/aithre_display.configuration

Ah I should probably remove that test as it involves reading the filesystem. It's quite nice to have locally though, as it checks everything in the map can be converted without errors. Is there a way to disable it in CI but keep the test?


# File converters

Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention here that the converters should always be going to json, and that if someone has a usecase where json isn't working, please make an issue so we can discuss it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please could you put this part beneath the paragraph about getting return types:

Suggestion:

By default, this will return the file’s raw string output - which includes things like linebreaks. It is up to you to format this output - using splitlines() will get you a list where each item is a line in the file. get_file_contents has a desired_return_type parameter where you can instead ask for a dict or bytes. The server will try and do the conversion, and respond with an HTTP error if the requested file is incompatible with that type. Custom file converters [link to file converters subsection] can be used to specify how a file should be converted to a dict

then directly below

File converters When requesting a file as a dict, the server will check against [this dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) to see if there is a defined way to convert this file to json. Custom formatters should be added as required along with updates to the dictionary. Please create an issue if there is a usecase where you'd like to convert a file to a non-json format. If there is no custom converter, the server will attempt to use 'json.loads' on the file contents. A request for str or bytes bypasses converters.

is in BEAMLINE_PARAMETER_KEYWORDS, it leaves it as a string. Otherwise, it is
converted to a number or bool."""
lines = contents.splitlines()
config_lines_sep_key_and_value = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
config_lines_sep_key_and_value = [
#Get lists of parameter keys and values
config_lines_sep_key_and_value = [

Copy link
Collaborator

@oliwenmandiamond oliwenmandiamond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but the structure of the repo is very rigid and can see it quickly becoming unmanageable with the number of converters and models added. Please consider my comments on how to scale it better with a plugin structure

return self


class GenericLut(BaseModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class GenericLut(BaseModel):
class GenericLookupTable(BaseModel):

Class names should be explicit. Variable names should be short e.g to use it it would be like this

generic_lut = GenericLookupTable()

Copy link
Collaborator

@oliwenmandiamond oliwenmandiamond Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to make this even more generic! I would let you specify the type that the GenericLookupTable should get

e.g

from typing import TypeVar, Generic

T = TypeVar('T')

class GenericLookupTable(BaseModel, Generic(T)):
    column_names: list[str]
    rows: list[list[T]]
    ...

Then I can do this

generic_lut1 = GenericLookupTable[float | int](...)
generic_lut2 = GenericLookupTable[str](...)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. @jacob720 and I were also discussing the ideal API to use a LUT from bluesky/dodal to get everything really nicely typed. I will put a prototype beneath here and spin it into a new issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPECIFIC_DICT_KEYS = Literal["millimetres", "electron_volts"]

Num = int | float
# Runtime error if specific dict isn't using SPECIFIC_DICT_KEYS as its fields


class SpecificDict(TypedDict):
    electron_volts: list[Num]
    millimetres: list[Num]


# Runtime error if lists don't have the same length


class SpecificLut(BaseModel):
    lut_dict: SpecificDict

    def convert_values(
        self,
        known_unit: SPECIFIC_DICT_KEYS,
        known_value: Num,
        desired_unit: SPECIFIC_DICT_KEYS,
    ) -> Num:
        _closest_value = min(
            self.lut_dict[known_unit], key=lambda x: abs(x - known_value)
        )
        _index = self.lut_dict[known_unit].index(_closest_value)
        _converted_value = self.lut_dict[desired_unit][_index]
        return _converted_value


my_dict = SpecificDict({"electron_volts": [1, 2, 3], "millimetres": [5, 6, 7]})

my_lut = SpecificLut(lut_dict=my_dict)

# Now I can convert things and get static typing errors if I try and use the wrong units
val_ev = 1.1
val_mm = my_lut.convert_values(
    "electron_volts", val_ev, "millimetres"
)  # Typing is happy

val_mm = my_lut.convert_values(
    "electron_volts", val_ev, "pretty flowers"
)  # Typing is unhappy

The ugly part is having to write the literals twice, but there's no avoiding that right now since Python TypeDict's don't give you a type for the dict keys. I think the API to convert is quite nice other than that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will put this into a separate issue though. This PR is big enough 😆

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@jacob720 jacob720 Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this

The column names are generated dynamically based on the literal

COLUMN_NAMES = Literal["angle_deg", "gap_mm"]


class SpecificLut(BaseModel):
    rows: list[list[int | float]]

    @model_validator(mode="after")
    def make_column_names(self):
        self._column_names: list[COLUMN_NAMES] = list(get_args(COLUMN_NAMES))
        return self

    def get(
        self,
        column_name: COLUMN_NAMES,
        value: int | float,
        target_column_name: COLUMN_NAMES,
    ):
        column_index = self._column_names.index(column_name)
        target_column_index = self._column_names.index(target_column_name)
        for row in enumerate(self.rows):
            if row[column_index] == value:
                return row[target_column_index]
        raise ValueError(f"Value not found in table for column {column_name}")


my_lut = SpecificLut(rows=[[]])
my_lut.get(column_name="not_accepted", value=5, target_column_name="gap_mm")

So you get static typing but have one source of truth for column names. I think there might be a way to do the same thing with enums too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!

@@ -0,0 +1,88 @@
from typing import Any
Copy link
Collaborator

@oliwenmandiamond oliwenmandiamond Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will grow very big so I think this isn't going to be very scalable.

I would separate out the converters and models into a module for the thing it is used for e.g something like below:

daq_config_server/serializer_plugins/
  ├── display_config/
  │     ├── models.py
  │     └── _converters.py
  ├── generic_lookup_table/
  │     ├── models.py
  │     └── _converters.py
  ├── undulator_lookup_table/
  │     ├── models.py
  │     └── _converters.py
   ...

By organizing each "example" as its own module, you can keep the functionality isolated and focused which will help prevent one large file that becomes hard to manage in the future if it is dumped with all the converters and models from every beamline.

Maybe serializer_plugins is not the correct name, but is used to illustrate my point.

Then in daq_config_server and dodal we can then do

from daq_config_server.serializer_plugins.display_config.models import DisplayConfigData

Or with shortcuts via __init__.py file

from daq_config_server.serializer_plugins.display_config import DisplayConfigData

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds good, I'll create an issue for it and do seperately just in the interest of getting this one merged soon

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return self


class GenericLut(BaseModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. @jacob720 and I were also discussing the ideal API to use a LUT from bluesky/dodal to get everything really nicely typed. I will put a prototype beneath here and spin it into a new issue

return self


class GenericLut(BaseModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPECIFIC_DICT_KEYS = Literal["millimetres", "electron_volts"]

Num = int | float
# Runtime error if specific dict isn't using SPECIFIC_DICT_KEYS as its fields


class SpecificDict(TypedDict):
    electron_volts: list[Num]
    millimetres: list[Num]


# Runtime error if lists don't have the same length


class SpecificLut(BaseModel):
    lut_dict: SpecificDict

    def convert_values(
        self,
        known_unit: SPECIFIC_DICT_KEYS,
        known_value: Num,
        desired_unit: SPECIFIC_DICT_KEYS,
    ) -> Num:
        _closest_value = min(
            self.lut_dict[known_unit], key=lambda x: abs(x - known_value)
        )
        _index = self.lut_dict[known_unit].index(_closest_value)
        _converted_value = self.lut_dict[desired_unit][_index]
        return _converted_value


my_dict = SpecificDict({"electron_volts": [1, 2, 3], "millimetres": [5, 6, 7]})

my_lut = SpecificLut(lut_dict=my_dict)

# Now I can convert things and get static typing errors if I try and use the wrong units
val_ev = 1.1
val_mm = my_lut.convert_values(
    "electron_volts", val_ev, "millimetres"
)  # Typing is happy

val_mm = my_lut.convert_values(
    "electron_volts", val_ev, "pretty flowers"
)  # Typing is unhappy

The ugly part is having to write the literals twice, but there's no avoiding that right now since Python TypeDict's don't give you a type for the dict keys. I think the API to convert is quite nice other than that.

Copy link
Collaborator

@olliesilvester olliesilvester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's looking good. A few more comments but I think we're close now, and everyone seems to be happy with this solution

(file-converters)=
# File converters

Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Converters can convert config files to a `dict` or `pydantic model`, and the same `pydantic model` can be reconstructed client-side by the `get_file_contents` method. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still have converters to get things into a dict? I thought we'd just leave the dict requesting logic as it was before this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do, I'd need to make some more pydantic models which I'm happy to do. But for example, currently there's a general xml to dict converter that will work on all xml files, which could be useful to keep.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in person: We think this is fine for now.

However, I am wondering if someone requests a file as a dict, and a converter to a basemodel for that file exists. They'd probably expect their file to be dumbly converted with json.loads(raw_contents), whereas in reality they'd get a dict which is formatted for the base model.

Could make a separate issue for this and think about it later? After typing this I'm leaning towards not having specific converts for when you request a dict.

Copy link
Collaborator

@olliesilvester olliesilvester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now, and very solid set of tests. I think just address the overload comment and I'm happy to merge. Let's do separate issues for: Oli's comment on restructuring files, making specific LUTs, and whether or not we should keep converters to dicts

(file-converters)=
# File converters

Converters can be used to turn a file into a standard format server-side, reducing the complexity of reading config files client-side. Converters can convert config files to a `dict` or `pydantic model`, and the same `pydantic model` can be reconstructed client-side by the `get_file_contents` method. Available converters exist [here](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._converters.py) - see if there's a suitable converter you can use before adding your own. [This dictionary](https://github.com/DiamondLightSource/daq-config-server/blob/main/converters._file_converter_map.py) maps files to converters. Add the path of your config file and a suitable converter to this dictionary and it will automatically be used by the config server when a request for that file is made.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in person: We think this is fine for now.

However, I am wondering if someone requests a file as a dict, and a converter to a basemodel for that file exists. They'd probably expect their file to be dumbly converted with json.loads(raw_contents), whereas in reality they'd get a dict which is formatted for the base model.

Could make a separate issue for this and think about it later? After typing this I'm leaning towards not having specific converts for when you request a dict.

@jacob720
Copy link
Collaborator Author

jacob720 commented Dec 4, 2025

Have fixed the overloads

Copy link
Collaborator

@olliesilvester olliesilvester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for doing this! I'll write a few issues that this one has spawned. IIRC we have:

  • Improving file structure (Oli's comment)
  • Thinking more about dict converters
  • Making a new type of test which talks to the real server so that we don't need to copy the all the whitelisted files to test_data
  • Making specific LUT models

@olliesilvester olliesilvester dismissed oliwenmandiamond’s stale review December 4, 2025 14:16

Changes addressed or put into other issues

@jacob720 jacob720 merged commit de81d49 into main Dec 4, 2025
21 checks passed
@jacob720 jacob720 deleted the 127_add_converters branch December 4, 2025 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MX file converters

5 participants