Allow supplemental API data to be passed to ColdfrontFetchProcessor #282
QuanMPhm wants to merge 1 commit into CCI-MOC:main
This allows passing data for billable projects not currently in Coldfront, namely bare metal projects.
invoice.PI_FIELD: pi_name,
invoice.INSTITUTION_ID_FIELD: "N/A",
invoice.CLUSTER_NAME_FIELD: cluster_name,
invoice.IS_COURSE_FIELD: False,  # (TODO) Quan Assuming supplemental data does not contain course info?
What do you mean by "Assuming supplemental data does not contain course info?"
@joachimweyl The supplemental data's purpose is to provide data that the Coldfront API would normally have (an allocation's name, its PI, whether it belongs to a course). The supplemental data file currently has no column indicating whether a project belongs to a course. I wanted to ask whether we can assume that projects listed in this file are never in courses.
@joachimweyl Adding the extra column to indicate course-membership is simple.
Please check the supplemental data file against the original; it looks like it is out of date.
I don't know that we can assume they are never courses, but so far, none of them have been. I would say it is worth adding the column just in case.
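A minimal sketch of how such a column could be read, assuming a hypothetical `Is Course` column and a hypothetical `Project Name` column (neither name comes from the actual supplemental file). A blank or missing value falls back to `False`, which matches what has been true of these projects so far.

```python
import csv
import io

# Hypothetical column names; the real supplemental file may use different ones.
PROJECT_COLUMN = "Project Name"
IS_COURSE_COLUMN = "Is Course"

# Spellings accepted as "this project is a course" (an assumption for this sketch).
TRUE_VALUES = {"true", "yes", "1"}


def parse_is_course(row: dict) -> bool:
    """Return the course flag for a row, defaulting to False when absent or blank."""
    raw = (row.get(IS_COURSE_COLUMN) or "").strip().lower()
    return raw in TRUE_VALUES


# Illustrative data only; these project names are made up.
sample_csv = io.StringIO(
    "Project Name,Is Course\n"
    "bm-project-a,\n"
    "bm-project-b,True\n"
)
course_flags = {
    row[PROJECT_COLUMN]: parse_is_course(row) for row in csv.DictReader(sample_csv)
}
```

Defaulting to `False` keeps older rows valid without having to backfill the column right away.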
        row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]
    ),
    axis=1,
)
Suggestion: what you are doing is not incorrect, but you could do the following, which is more pandas-like:
in_time_range_mask = (
(supplemental_df[SUPPLEMENTAL_START_DATE] <= invoice_settings.invoice_month) &
(invoice_settings.invoice_month <= supplemental_df[SUPPLEMENTAL_END_DATE])
)

This way we don't have to operate row by row. We are not running into performance issues, but this is more of a vector operation.
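To make the comparison concrete, here is a small self-contained sketch of both approaches on made-up data. The column names, the sample rows, and the `"YYYY-MM"` string format for `invoice_month` are assumptions for illustration; the PR's actual constants and settings object are not reproduced here.

```python
import pandas

# Hypothetical column names and month format. "YYYY-MM" strings compare
# chronologically, so plain string comparison is enough for this sketch.
SUPPLEMENTAL_START_DATE = "Start Date"
SUPPLEMENTAL_END_DATE = "End Date"
invoice_month = "2024-03"

supplemental_df = pandas.DataFrame(
    {
        "Project": ["p1", "p2", "p3"],
        SUPPLEMENTAL_START_DATE: ["2024-01", "2024-04", "2023-06"],
        SUPPLEMENTAL_END_DATE: ["2024-06", "2024-12", "2024-02"],
    }
)

# Row-by-row version: calls the lambda once per row.
row_wise_mask = supplemental_df.apply(
    lambda row: row[SUPPLEMENTAL_START_DATE]
    <= invoice_month
    <= row[SUPPLEMENTAL_END_DATE],
    axis=1,
)

# Vectorized version: two whole-column comparisons combined with &.
vectorized_mask = (supplemental_df[SUPPLEMENTAL_START_DATE] <= invoice_month) & (
    invoice_month <= supplemental_df[SUPPLEMENTAL_END_DATE]
)

active_projects = supplemental_df[vectorized_mask]["Project"].tolist()
```

Both masks select the same rows; the vectorized form simply avoids the per-row Python call.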
def get_supplement_api_data(self) -> pandas.DataFrame:
    supplemental_df = pandas.DataFrame()
    if invoice_settings.supplement_api_data_filepath:
I think you should log when a config file is loaded and when it's not loaded.
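One way this could look, as a sketch using only the standard library rather than the PR's pandas-based loader. The function name `load_supplemental_rows` and the log messages are hypothetical.

```python
import csv
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def load_supplemental_rows(filepath: Optional[str]) -> list:
    """Load supplemental rows from a CSV file, logging whether a file was used."""
    if not filepath:
        # No path configured: say so explicitly instead of silently skipping.
        logger.info("No supplemental API data file configured; skipping.")
        return []
    with open(filepath, newline="") as f:
        rows = list(csv.DictReader(f))
    logger.info("Loaded %d supplemental rows from %s", len(rows), filepath)
    return rows
```

Logging both branches makes it clear from the run output whether supplemental data contributed to an invoice.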
    ].itertuples(index=False, name=None)
)

def get_supplement_api_data(self) -> pandas.DataFrame:
Can you add a docstring here, and in general going forward?
I ask because I was wondering why supplemental data like PI name and institution name are time-bound; after digging around, it turns out this is a design decision. The docstring could convey that useful information (even linking to the issue or comment where we made this non-obvious choice).
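A sketch of the kind of docstring being requested. The function here is a simplified standalone stand-in, not the PR's method: its signature and the `start` / `end` row keys are hypothetical, and the issue link is left as a placeholder to be filled in by the author.

```python
def get_supplement_api_data(supplemental_rows: list, invoice_month: str) -> list:
    """Return the supplemental rows that apply to ``invoice_month``.

    Supplemental rows stand in for data the Coldfront API would normally
    provide (allocation name, PI, course membership) for billable projects
    that are not in Coldfront, such as bare metal projects.

    Each row is time-bound by a start and end date. This is a deliberate
    design decision rather than an accident of the file format; link the
    issue or comment that records the decision here.
    """
    # Hypothetical keys: "start" and "end" hold "YYYY-MM" strings.
    return [
        row
        for row in supplemental_rows
        if row["start"] <= invoice_month <= row["end"]
    ]
```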
)
in_time_range_mask = supplemental_df.apply(
    lambda row: _is_in_time_range(
        row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]
The supplemental data as of now [1] has no start or end date, so this would just error out.
What is the behaviour supposed to be when no start or end date is found? Error out, log, or assume the data is applicable? (cc: @joachimweyl)
[1] https://github.com/CCI-MOC/invoicing-private-data/blob/main/project_api_data.csv
Log, and we should add dates. We will need to work with RH to obtain dates. The template is supposed to gather this, but all of these projects predate the template.
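A possible shape for the "log" behaviour, as a sketch. The reply above does not say whether a row without dates should still be billed; treating it as applicable is an assumption made here, and the function signature is hypothetical. Missing values are modelled as `None` or an empty string; values read through pandas would arrive as NaN and need their own check.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def is_in_time_range(
    start: Optional[str],
    end: Optional[str],
    invoice_month: str,
    project: str = "<unknown>",
) -> bool:
    """Check whether invoice_month falls within [start, end], given "YYYY-MM" strings."""
    if not start or not end:
        logger.warning(
            "Supplemental row for %s has no start or end date; "
            "assuming it applies to %s",
            project,
            invoice_month,
        )
        # Assumption: a row without dates is treated as applicable.
        return True
    return start <= invoice_month <= end
```

If the opposite policy is preferred, returning `False` in the missing-date branch would exclude such rows while keeping the warning.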
@QuanMPhm quick question. Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended to YAML too)?
@knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do?
@QuanMPhm I'd rather do a one-time conversion of the CSV to YAML than maintain two separate data formats. I would suggest trying to mimic the ColdFront API output as closely as possible within the JSON.
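A one-time conversion could start from something like the sketch below, which targets JSON because the standard library has no YAML writer. It keeps the CSV column names as keys; renaming them to match the ColdFront API output is a further step that is not attempted here, since that output is not shown in this thread. The sample columns and values are made up.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text to a JSON array with one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Illustrative data only; not the real supplemental file.
sample_csv = "Project Name,PI\nbm-project-a,pi@example.edu\n"
converted = json.loads(csv_to_json(sample_csv))
```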
Closes #239. Related to https://github.com/CCI-MOC/invoicing-private-data/pull/68. This allows passing data for billable projects not currently in Coldfront, namely bare metal projects.
This will finally allow billing of bare metal projects.