Conversation
Pull request overview
This PR updates emod-api to v3.1.0 and removes pandas, shapely, and scikit-learn from the runtime dependency set by rewriting affected implementation and test code to use the Python stdlib (notably `csv`) and NumPy.
Changes:
- Bump package version to `3.1.0` and drop `pandas`, `shapely`, and `scikit-learn` from `pyproject.toml` dependencies.
- Replace pandas-based CSV parsing/writing in demographics, gridding, and channel report code paths with stdlib `csv`.
- Remove deprecated mortality-generation APIs and remove legacy `dtk_tools` demographics leftovers and pandas-based channel report CSV utilities/tests.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `pyproject.toml` | Version bump to 3.1.0 and removes pandas/shapely/scikit-learn deps. |
| `emod_api/__init__.py` | Version bump to 3.1.0. |
| `emod_api/demographics/demographics.py` | Replaces pandas CSV ingestion with `csv` reader; affects node creation from CSV. |
| `emod_api/demographics/service/service.py` | Rewrites `_create_grid_files` to avoid pandas, uses `csv` + NumPy aggregation. |
| `emod_api/demographics/service/grid_construction.py` | Removes shapely/pandas usage and returns dict-based grid representation. |
| `emod_api/channelreports/channels.py` | Removes pandas dependency; rewrites `to_csv` using `csv.writer`. |
| `emod_api/channelreports/icj_to_csv.py` | Deletes pandas-based inset chart JSON→CSV helper. |
| `emod_api/demographics/demographics_base.py` | Removes deprecated `infer_natural_mortality` implementation and sklearn imports. |
| `emod_api/demographics/calculators.py` | Removes deprecated mortality-over-time generator that relied on pandas. |
| `tests/test_demographics.py` | Replaces pandas-based CSV assertions with stdlib `csv`; removes deprecated mortality test. |
| `tests/test_migration.py` | Replaces pandas CSV read with stdlib `csv` validation. |
| `tests/test_demog_from_pop.py` | Reworks test to avoid pandas and align with new grid CSV outputs. |
| `tests/test_channel_reports.py` | Removes pandas-dependent channel dataframe and inset-chart CSV conversion tests. |
| `tests/data/demographics/demog_in.csv` | Minor data edit (string field content change). |
| `tests/data/demographics/No_Site_grid.csv` | Adds reference grid output used by updated population-grid tests. |
| `emod_api/.dtk_tools/...` | Deletes legacy demographics tooling code pulled from dtk-tools. |
```python
print("Creating grid...")
```
construct() uses print() for status output. This is library code and will emit noise for all callers (including tests/CLI usage) and is hard to control. Prefer the project’s logging approach (e.g., logging.getLogger(__name__) at debug/info) or remove these statements.
```diff
@@ -152,50 +152,92 @@ def get_value(row, headers):
     return None
```
from_csv() defines a nested get_value() helper that is no longer used after the switch away from pandas. This is dead code now and makes the method harder to follow; it should be removed (or re-used if intended).
```python
csv_obj.writerow(channel_names)
for row_idx in range(self._numTimeSteps):
    csv_obj.writerow([self[cname][row_idx] for cname in channel_names])
```
In the transpose branch, range(self._numTimeSteps) will raise AttributeError because _numTimeSteps is not a ChannelReport attribute (time steps live on self._header.num_time_steps / self.num_time_steps). Also, the previous pandas implementation wrote an explicit timestep index column; this implementation drops it, which is a breaking change for consumers reading the transposed CSV. Use self.num_time_steps and consider writing a leading "timestep" column to preserve the prior format.
Suggested change:

```diff
-csv_obj.writerow(channel_names)
-for row_idx in range(self._numTimeSteps):
-    csv_obj.writerow([self[cname][row_idx] for cname in channel_names])
+# Include an explicit timestep column to match the previous pandas-based format
+csv_obj.writerow(["timestep"] + list(channel_names))
+for row_idx in range(self.num_time_steps):
+    csv_obj.writerow([row_idx] + [self[cname][row_idx] for cname in channel_names])
```
```diff
+for csv_row in csv_obj:
+    lat_dict[float(csv_row[ni_idx])] = float(csv_row[lat_idx])
+    lon_dict[float(csv_row[ni_idx])] = float(csv_row[lon_idx])
+    pop_dict[float(csv_row[ni_idx])] = float(csv_row[pop_idx])

 # checking if we have the same number of nodes and the number of rows in csv file
-self.assertEqual(len(csv_df), len(demog.nodes))
+self.assertEqual(len(pop_dict), len(demog.nodes))

-for index, row in csv_df.iterrows():
-    pop = int(row['pop'])
-    lat = float(row['lat'])
-    lon = float(row['lon'])
-    node_id = int(row['node_id'])
-    self.assertEqual(pop, demog.nodes[index].node_attributes.initial_population)
-    self.assertEqual(lat, demog.nodes[index].node_attributes.latitude)
-    self.assertEqual(lon, demog.nodes[index].node_attributes.longitude)
-    self.assertEqual(node_id, demog.nodes[index].forced_id)
+for node_id in pop_dict:
+    self.assertEqual(pop_dict[node_id], demog.get_node_by_id(node_id).node_attributes.initial_population)
+    self.assertEqual(lat_dict[node_id], demog.get_node_by_id(node_id).node_attributes.latitude)
+    self.assertEqual(lon_dict[node_id], demog.get_node_by_id(node_id).node_attributes.longitude)
+    self.assertEqual(node_id, demog.get_node_by_id(node_id).forced_id)
```
CSV parsing here stores node_id keys as float(...), but Demographics.get_node_by_id() expects IDs that match the internal int node IDs exactly; using floats will cause lookups to fail and raise UnknownNodeException. Parse node_id as int(...) and keep dict keys as ints (and pass node_id= explicitly for clarity).
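A sketch of the suggested fix, with a small in-memory CSV standing in for the real grid file (the column layout is hypothetical): parsing `node_id` as `int` keeps the dict keys matching the internal integer node IDs.

```python
import csv
import io

# Hypothetical CSV content mimicking the grid output layout.
csv_text = "node_id,lat,lon,pop\n1,12.5,41.9,100\n2,12.6,42.0,250\n"

pop_dict, lat_dict, lon_dict = {}, {}, {}
csv_obj = csv.reader(io.StringIO(csv_text))
headers = next(csv_obj)
ni_idx = headers.index('node_id')
lat_idx = headers.index('lat')
lon_idx = headers.index('lon')
pop_idx = headers.index('pop')
for csv_row in csv_obj:
    # Parse node_id as int so keys compare equal to the internal node IDs.
    node_id = int(csv_row[ni_idx])
    lat_dict[node_id] = float(csv_row[lat_idx])
    lon_dict[node_id] = float(csv_row[lon_idx])
    pop_dict[node_id] = float(csv_row[pop_idx])
```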
```python
for row_val in csv_obj:
    lat.append(float(row_val[lat_idx]))
    lon.append(float(row_val[lon_idx]))
    if (pop_idx):
```
pop_idx is a column index (0-based). The condition if (pop_idx): will be false when the population column is the first column (index 0), causing the code to ignore the provided population values and fall back to 5.5. Use an explicit pop_idx is not None check instead (and keep the index value even when it is 0).
Suggested change:

```diff
-    if (pop_idx):
+    if pop_idx is not None:
```
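A tiny illustration of the pitfall (header layout is hypothetical): a column index of 0 is falsy, so a truthiness test and an explicit `None` check disagree exactly when the column comes first.

```python
headers = ["pop", "lat", "lon"]      # population happens to be the first column
pop_idx = headers.index("pop")       # index 0

uses_pop_truthy = bool(pop_idx)      # False: 0 is falsy, column wrongly skipped
uses_pop_none = pop_idx is not None  # True: 0 is still a valid column index
```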
```diff
@@ -85,59 +87,13 @@ def construct(x_min, y_min, x_max, y_max):
         gcids.append(cell_id)
         idy += 1
```
The grid cell IDs stored in grid_dict['gcid'] are off by one compared to the IDs stored in grid_id_2_cell_id: you set grid_id_2_cell_id[grid_id] = cell_id and then increment cell_id before appending to gcids. This makes the gcid list inconsistent with the mapping used by point_2_grid_cell_id_lookup, and breaks downstream joins/aggregation that assume the IDs match.
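A minimal sketch of the consistent ordering the comment calls for: record the current `cell_id` in both containers before incrementing (the grid IDs here are hypothetical; the real loop iterates over grid coordinates).

```python
grid_id_2_cell_id = {}
gcids = []
cell_id = 0
for grid_id in ["a1", "a2", "b1"]:       # hypothetical grid cell keys
    grid_id_2_cell_id[grid_id] = cell_id
    gcids.append(cell_id)                # append before incrementing...
    cell_id += 1                         # ...so mapping and list agree
```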
```python
node = Node(lat_val, lon_val, pop_val, node_attributes=node_attributes, forced_id=forced_id, meta=dict())
out_nodes.append(node)

print(out_nodes)
```
print(out_nodes) will dump the entire node list to stdout every time Demographics.from_csv() is called, which can be extremely large and will pollute logs/console output. This should be removed or replaced with a bounded debug log message (e.g., count of nodes).
Suggested change:

```diff
-print(out_nodes)
+print(f"Created {len(out_nodes)} nodes from {input_file}.")
```
```python
with open(input_file) as csv_file:
    csv_obj = csv.reader(csv_file, dialect='unix')
    headers = next(csv_obj, None)
    ni_idx = headers.index('node_id')
    br_idx = headers.index('birth_rate')
    for csv_row in csv_obj:
        br_dict[float(csv_row[ni_idx])] = float(csv_row[br_idx])

# Compare csv birth rates with demographics file
for node_id in br_dict:
    birth_rate = br_dict[node_id]
    self.assertAlmostEqual(demog.get_node_by_id(node_id=node_id).birth_rate, birth_rate)
```
Same issue as above: br_dict uses float(csv_row[ni_idx]) for node IDs, but get_node_by_id() requires the exact int ID. Use int(...) for node_id keys to avoid UnknownNodeException during the assertions.
In `tests/test_migration.py` (comment marked outdated):
```python
num_row = 0
for csv_row in csv_obj:
    for row_val in csv_row:
        assert(len(row_val) > 0)
```
This test uses a bare assert(...) inside a unittest.TestCase method. Prefer self.assertTrue(...) / self.assertGreater(...) so the assertion is never skipped under python -O and failure messages are consistent with the rest of the suite.
Suggested change:

```diff
-        assert(len(row_val) > 0)
+        self.assertGreater(len(row_val), 0)
```
Bumps version to v3.1.0.
Removes pandas, shapely, and scikit-learn as dependencies.
Removes deprecated `infer_natural_mortality`, `generate_mortality_over_time_from_data`, and demographics leftovers from dtk_tools.
Retains `_create_grid_files` even though it's definitely incorrect: see EMOD-Hub/issues-and-discussions#82