Allow supplemental API data to be passed to ColdfrontFetchProcessor #282

Open
QuanMPhm wants to merge 1 commit into CCI-MOC:main from QuanMPhm:239/bm

Conversation

@QuanMPhm
Contributor

Closes #239. Related to https://github.com/CCI-MOC/invoicing-private-data/pull/68. This allows passing data for billable projects not currently in Coldfront, namely bare metal projects.

This will finally allow billing of Bare Metal projects

@QuanMPhm QuanMPhm requested review from knikolla and naved001 March 21, 2026 19:28
invoice.PI_FIELD: pi_name,
invoice.INSTITUTION_ID_FIELD: "N/A",
invoice.CLUSTER_NAME_FIELD: cluster_name,
invoice.IS_COURSE_FIELD: False, # (TODO) Quan Assuming supplemental data does not contain course info?

Contributor

What do you mean by "Assuming supplemental data does not contain course info?"

Contributor Author

@joachimweyl The supplemental data's purpose is to provide data that the Coldfront API would normally have (an allocation's name, its PI, whether it belongs to a course). The current supplemental data file does not have a column to indicate whether a project belongs to a course. I wanted to ask whether we can assume that projects listed in this file are never in courses.

@joachimweyl Adding the extra column to indicate course-membership is simple.

Contributor

Please check the supplemental data file against the original; it looks like it is out of date.
I don't know that we can assume they are never courses, but so far, none of them have been. I would say it is worth adding the column just in case.
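
A minimal sketch of how an optional course column could be read, assuming the supplemental file is loaded into a pandas DataFrame. The column name `Is Course` and the helper `course_flags` are hypothetical and do not appear in the PR; rows default to "not a course" when the column or a value is missing.

```python
import pandas

# Hypothetical column name; the real supplemental file may use a different one.
IS_COURSE_COLUMN = "Is Course"


def course_flags(supplemental_df: pandas.DataFrame) -> pandas.Series:
    """Return one boolean per row, defaulting to False when no flag is given."""
    if IS_COURSE_COLUMN not in supplemental_df.columns:
        # Older files without the column: assume no project is a course.
        return pandas.Series(False, index=supplemental_df.index)
    # eq(True) maps missing values to False instead of propagating them.
    return supplemental_df[IS_COURSE_COLUMN].eq(True)
```

With this shape, existing files keep working unchanged and a project only counts as a course when the file says so explicitly.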

row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]
),
axis=1,
)
Collaborator

Suggestion: not that what you are doing is incorrect, but you could do the following, which is more pandas-like:

in_time_range_mask = (
    (supplemental_df[SUPPLEMENTAL_START_DATE] <= invoice_settings.invoice_month)
    & (invoice_settings.invoice_month <= supplemental_df[SUPPLEMENTAL_END_DATE])
)

This way we don't have to operate row by row. Not that we are running into performance issues but this is more of a vector operation.
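
For comparison, here is a self-contained sketch of the row-wise and vectorized forms side by side. The column names, the sample rows, and the `YYYY-MM` string format for dates are assumptions made for illustration; the PR's actual constants and date handling may differ.

```python
import pandas

# Placeholder values, not the PR's real constants or data.
SUPPLEMENTAL_START_DATE = "Start Date"
SUPPLEMENTAL_END_DATE = "End Date"
invoice_month = "2026-03"

supplemental_df = pandas.DataFrame(
    {
        "Project": ["bm-a", "bm-b", "bm-c"],
        SUPPLEMENTAL_START_DATE: ["2025-01", "2026-04", "2026-03"],
        SUPPLEMENTAL_END_DATE: ["2026-12", "2026-12", "2026-03"],
    }
)


def _is_in_time_range(start: str, end: str) -> bool:
    # Zero-padded YYYY-MM strings sort chronologically, so string comparison works.
    return start <= invoice_month <= end


# Row-wise: calls the Python function once per row.
row_mask = supplemental_df.apply(
    lambda row: _is_in_time_range(
        row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]
    ),
    axis=1,
)

# Vectorized: two whole-column comparisons combined with a boolean AND.
vector_mask = (supplemental_df[SUPPLEMENTAL_START_DATE] <= invoice_month) & (
    invoice_month <= supplemental_df[SUPPLEMENTAL_END_DATE]
)

active_projects = supplemental_df.loc[vector_mask, "Project"].tolist()
```

Both masks select the same rows; the vectorized form simply avoids the per-row function call.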


def get_supplement_api_data(self) -> pandas.DataFrame:
supplemental_df = pandas.DataFrame()
if invoice_settings.supplement_api_data_filepath:
Collaborator

I think you should log when a config file is loaded and when it's not loaded.
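
One possible shape for that logging, written as a standalone helper rather than the PR's method. The function name `load_supplemental_data` and the log messages are hypothetical, and the file is assumed to be a CSV readable by `pandas.read_csv`.

```python
import logging
from typing import Optional

import pandas

logger = logging.getLogger(__name__)


def load_supplemental_data(filepath: Optional[str]) -> pandas.DataFrame:
    """Load the optional supplemental CSV, logging which branch was taken."""
    if not filepath:
        # Nothing configured: say so explicitly instead of returning silently.
        logger.info("No supplemental API data file configured; skipping")
        return pandas.DataFrame()

    supplemental_df = pandas.read_csv(filepath)
    logger.info(
        "Loaded %d supplemental rows from %s", len(supplemental_df), filepath
    )
    return supplemental_df
```

Logging both branches makes it visible in the run output whether supplemental data contributed to a given invoice.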

].itertuples(index=False, name=None)
)

def get_supplement_api_data(self) -> pandas.DataFrame:
Collaborator

Can you add a docstring here and in general going forward?

The reason I ask is that I was wondering why supplemental data like PI name and institution name is time-bound, but after digging around, it turns out it's a design decision. So, the docstring could convey that useful information (even linking to the issue or comment where we made this non-obvious choice).

)
in_time_range_mask = supplemental_df.apply(
lambda row: _is_in_time_range(
row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]
Collaborator

The supplemental data as of now [1] has no start and end dates, so this would just error out.

What is the behaviour supposed to be when no start or end date is found? Erroring out? Logging? Or assuming the data is applicable? (cc: @joachimweyl)

[1] https://github.com/CCI-MOC/invoicing-private-data/blob/main/project_api_data.csv

Contributor

Log, and we should add dates. We will need to work with RH to obtain dates. The template is supposed to be gathering this, but all of these projects predate the template.
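
A sketch of one way to log instead of erroring when dates are absent. It assumes that rows without a start or end date are treated as applicable, which this thread does not settle. The column names and the helper `in_time_range_mask` are hypothetical, and dates are assumed to be `YYYY-MM` strings.

```python
import logging

import pandas

logger = logging.getLogger(__name__)

# Hypothetical column names; the PR defines its own constants.
START_COLUMN = "Start Date"
END_COLUMN = "End Date"


def in_time_range_mask(
    supplemental_df: pandas.DataFrame, invoice_month: str
) -> pandas.Series:
    """Boolean mask of rows applicable to invoice_month, tolerating missing dates."""
    columns = supplemental_df.columns
    if START_COLUMN not in columns or END_COLUMN not in columns:
        # Assumption: with no date columns at all, every row is applicable.
        logger.warning(
            "Supplemental data has no start/end date columns; "
            "treating all %d rows as applicable",
            len(supplemental_df),
        )
        return pandas.Series(True, index=supplemental_df.index)

    start = supplemental_df[START_COLUMN]
    end = supplemental_df[END_COLUMN]
    missing = start.isna() | end.isna()
    if missing.any():
        logger.warning(
            "%d supplemental rows lack a start or end date", int(missing.sum())
        )

    # Filling a missing bound with the invoice month leaves that side unbounded.
    start = start.fillna(invoice_month)
    end = end.fillna(invoice_month)
    return (start <= invoice_month) & (invoice_month <= end)
```

If the opposite policy is chosen (rows without dates are excluded), only the fill step needs to change.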

@knikolla
Contributor

@QuanMPhm quick question. Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended to YAML too)?

@QuanMPhm
Contributor Author

> Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended to YAML too)?

@knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do?

@knikolla
Contributor

> > Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended to YAML too)?
>
> @knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do?

@QuanMPhm I'd rather do a one-time conversion of the CSV to YAML than maintain two separate data formats. I would suggest mimicking the ColdFront API output as closely as possible within the JSON.
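
A rough sketch of what such a one-time conversion might look like, using only the standard library and targeting JSON. The helper `convert_csv_to_json`, the CSV column names, and the output keys are all placeholders; the real keys would need to be copied from an actual ColdFront API response, which is not shown in this thread.

```python
import csv
import io
import json


def convert_csv_to_json(csv_text: str, column_to_key: dict) -> str:
    """Rename CSV columns to target keys and return the rows as a JSON array."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        # Keep only the mapped columns, stored under their new key names.
        {key: row[column] for column, key in column_to_key.items()}
        for row in reader
    ]
    return json.dumps(records, indent=2)


# Placeholder mapping; not the real ColdFront field names.
example_mapping = {"Project": "project_name", "PI": "pi"}
example_json = convert_csv_to_json(
    "Project,PI\nbm-a,pi@example.edu\n", example_mapping
)
```

Keeping the column-to-key mapping as data makes it easy to adjust once the target shape is confirmed.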

@QuanMPhm
Contributor Author

QuanMPhm commented Mar 24, 2026

@knikolla @naved001 Here's the PR to do the one-time conversion

Development

Successfully merging this pull request may close these issues.

Fill out Project Name and PI column using bm_projects.csv file in nonbillable repo
