Refactor type handling in invoices by QuanMPhm · Pull Request #201 · CCI-MOC/invoicing

QuanMPhm · 2025-05-27T04:15:10Z

Closes #191. More details in the commit message.

Currently, the test cases are failing because the PI-specific invoice injects an incompatible value into the invoice dataframe. After some thought, I think the PI invoice could use some refactoring in that spot as well. I'll leave this as a draft for now while I work on refactoring the PI invoice.

larsks · 2025-06-06T14:24:16Z

process_report/invoices/invoice.py

+        self.data[column_name] = value
+        self.data[column_name] = self.data[column_name].astype(column_type)


I'm a little puzzled about what's happening here: we're setting self.data[column_name], and then immediately setting it to another value. I guess this is to have access to the astype method, but is there a way of accomplishing this that is more intuitive?

Looking through the docs a bit, I think we could write this as:

self.data[column_name] = pandas.Series(value, dtype=column_type)

My simple test:

>>> BALANCE_FIELD_TYPE = pandas.ArrowDtype(pyarrow.decimal128(21, 2))
>>> df = pandas.DataFrame()
>>> df['foo'] = pandas.Series([1,2,3], dtype=BALANCE_FIELD_TYPE)
>>> df
    foo
0  1.00
1  2.00
2  3.00
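One thing that may be worth checking before swapping the two-step assignment for the one-step form: the two behave differently when `value` is a scalar and the frame already has rows. Whether `_create_column` is ever called with a scalar is not shown in this thread, so the sketch below is only an assumption about possible inputs. The column names and the sample frame are made up for illustration, and `BooleanDtype` is used instead of the Arrow decimal type so the example does not need `pyarrow`.

```python
import pandas

BOOL_FIELD_TYPE = pandas.BooleanDtype()

# Hypothetical three-row frame standing in for self.data.
df = pandas.DataFrame({"Invoice Month": ["2025-05", "2025-05", "2025-05"]})

# Two-step form from the diff: assign (the scalar is broadcast), then cast.
df["two_step"] = False
df["two_step"] = df["two_step"].astype(BOOL_FIELD_TYPE)

# One-step form with a list-like of matching length.
df["one_step_list"] = pandas.Series([True, False, True], dtype=BOOL_FIELD_TYPE)

# A bare scalar becomes a length-1 Series; index alignment on assignment
# then leaves the remaining rows as missing values.
df["scalar_unindexed"] = pandas.Series(False, dtype=BOOL_FIELD_TYPE)

# Passing the frame's index makes the constructor broadcast the scalar.
df["scalar_indexed"] = pandas.Series(False, index=df.index, dtype=BOOL_FIELD_TYPE)

missing_rows = int(df["scalar_unindexed"].isna().sum())
```

If scalars are a possible input, passing `index=self.data.index` to the `pandas.Series` constructor would keep the one-step form equivalent to the current two-step code.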

larsks · 2025-06-06T14:26:47Z

process_report/tests/base.py

+### Invoice column types
+BOOL_FIELD_TYPE = pandas.BooleanDtype()
+STRING_FIELD_TYPE = pandas.StringDtype()
+BALANCE_FIELD_TYPE = pandas.ArrowDtype(pyarrow.decimal128(21, 2))
+###


Shouldn't we be importing these from process_report.invoices.invoice rather than re-defining them here?

QuanMPhm · 2025-06-17

My idea was that the tests should check whether the invoicing code can handle input dataframes with specific datatypes. If I imported these types from process_report.invoices.invoice, I wouldn't be testing type compatibility. If a future change accidentally or intentionally changes the type of STRING_FIELD_TYPE or BALANCE_FIELD_TYPE, I would like my tests to show me that there is a typing incompatibility.
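A minimal sketch of that reasoning, under stated assumptions: `make_flag_column` and `is_compatible` are hypothetical stand-ins and are not functions from the repository. The idea is that a dtype declared on the test side stays fixed, so a change to the production dtype shows up as a mismatch instead of silently following an imported constant.

```python
import pandas

# Test-side declaration, deliberately not imported from the production module.
TEST_BOOL_FIELD_TYPE = pandas.BooleanDtype()


def make_flag_column(production_dtype):
    """Hypothetical stand-in for production code that creates a typed column."""
    return pandas.Series([True, False, None], dtype=production_dtype)


def is_compatible(series):
    """Return True when the column's dtype equals the test-side declaration."""
    return TEST_BOOL_FIELD_TYPE == series.dtype


# Production dtype as it is defined today.
current = make_flag_column(pandas.BooleanDtype())

# Simulated future change of the production dtype (an assumption for illustration).
drifted = make_flag_column("object")

matches_now = is_compatible(current)
matches_after_drift = is_compatible(drifted)
```

Had the test imported the constant, both comparisons would use the same object on each side and the drift would go unnoticed.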

Previously, there was no consistent method to create new invoice columns with an explicit type. This has been the source of some confusion when debugging and writing test cases. As such, several changes have been added to reduce confusion over the typing of invoice columns:

- A function (`_create_column`) has been added to `invoice.py` to provide a consistent way of creating new columns with an explicit type.
- Many fields will now use Pandas' built-in extension types [1], such as `BooleanDtype`, which allows them to be nullable. This also enforces typing, as assigning incompatible values to these columns will raise a type error.
- Decimal precision has been increased to 21. This is to allow `int64` values to be converted to `pyarrow.decimal128(21, 2)`. This is useful for writing test cases (no longer have to write `[1.0, 2.0]`), and in case our invoices have balance fields that resemble integers.
- A new test case base class is made with a function to create a dataframe with standardized column types. This allows creating invoice dataframes that more closely resemble what our pipeline will see during execution.
- Removed redundant code in `_get_test_invoice()` of many test classes.
- `PIInvoice` will cast the dataframe to `StringDtype` to prevent TypeError with the new Pandas extension types.

[1] https://pandas.pydata.org/docs/user_guide/basics.html#dtypes
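The choice of 21 for the precision can be checked with a little arithmetic. The commit message only says that 21 allows `int64` values to be converted; the digit count below is an interpretation of that statement, not something taken from the code. A decimal type with precision `p` and scale `s` has `p - s` digits left for the integer part, and the largest `int64` value needs 19 digits, so a scale of 2 requires a precision of at least 21. The helper name `digits_needed` is made up for this sketch, which uses only the standard library.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
SCALE = 2  # fractional digits, as in decimal128(21, 2)


def digits_needed(value: int, scale: int) -> int:
    """Total decimal digits required to store `value` with `scale` fractional digits."""
    return len(str(abs(value))) + scale


integer_digits = len(str(INT64_MAX))
required_precision = max(
    digits_needed(INT64_MAX, SCALE),
    digits_needed(INT64_MIN, SCALE),
)

# Cross-check with the decimal module: quantizing to two places
# yields a value with exactly that many significant digits.
quantized = Decimal(INT64_MAX).quantize(Decimal("0.01"))
quantized_digits = len(quantized.as_tuple().digits)
```

Here `integer_digits` is 19 and `required_precision` is 21, which matches the precision chosen in the commit.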

QuanMPhm requested review from knikolla and naved001 May 27, 2025 04:15

QuanMPhm mentioned this pull request May 27, 2025

Resolved pandas deprecation warnings during pipeline run #198

Merged

QuanMPhm requested a review from larsks May 27, 2025 18:29

QuanMPhm force-pushed the 191/typing branch 2 times, most recently from e78e897 to fd5a1ad Compare May 28, 2025 16:49

larsks reviewed Jun 6, 2025

QuanMPhm force-pushed the 191/typing branch from fd5a1ad to a8b2042 Compare June 17, 2025 15:44

QuanMPhm requested a review from larsks June 17, 2025 15:44

QuanMPhm marked this pull request as draft July 10, 2025 14:17

QuanMPhm wants to merge 1 commit into CCI-MOC:main from QuanMPhm:191/typing
