Experimental optimisations, proposed by ChatGPT. #337


Merged: 15 commits merged into main from fandango_experiment on Aug 7, 2025

Conversation

neilwalkinshaw
Contributor

@neilwalkinshaw neilwalkinshaw commented Jul 11, 2025

Prompt: "Can you speed this up?", pasting in CausalDag.enumerate_minimal_adjustment_sets.

Summary of changes:

- moral_graph[t] instead of nx.neighbors(...): accesses the adjacency dict directly, avoiding the overhead of the nx.neighbors call.
- update(...) instead of a union with a comprehension: avoids building intermediate sets.
- Eliminated intermediate list conversions: avoids unnecessary copies.
- Renamed variables (pbd_graph, ancestor_graph, etc.) and stored intermediate results: improves clarity and prevents redundant calls.
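The adjacency-access and update(...) points can be sketched on a plain dict-of-sets graph standing in for the NetworkX moral graph (the real change swaps nx.neighbors(graph, node) for graph[node]; the helper and variable names below are illustrative, not the library's):

```python
# A plain dict-of-sets graph as a stand-in for a NetworkX moral graph.
moral_graph = {
    "t": {"a", "b"},
    "a": {"t", "b"},
    "b": {"t", "a"},
}

def closure_slow(graph, seeds):
    # Before: a comprehension-built union creates a throwaway set per node.
    result = set(seeds)
    for node in seeds:
        result = result.union({n for n in graph[node]})
    return result

def closure_fast(graph, seeds):
    # After: direct adjacency lookup plus in-place update, no intermediates.
    result = set(seeds)
    for node in seeds:
        result.update(graph[node])
    return result

print(closure_fast(moral_graph, {"t"}))  # {'t', 'a', 'b'} in some order
```

Both versions compute the same closure; the second simply never allocates the intermediate sets.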

Same applied to list_all_min_sep

Summary of changes:

- graph[node] instead of nx.neighbors(graph, node): faster direct adjacency access.
- treatment_component = ... followed by break: avoids unnecessary iteration once the component containing the treatment is found.
- sample(sorted(...), 1)[0]: returns a single value rather than a set, avoiding later unpacking.
- Removed repeated set(...) wrapping: reduces garbage-collection pressure and memory allocations.

Same applied to constructive_backdoor_criterion.

- Avoids repeated set(self.nodes) calls: self.nodes is probably already a set-like iterable.
- Combines descendant updates efficiently: reduces the overhead of set.union with unpacking.
- Avoids constructing the logger message unless needed: a significant saving when the logging level is above INFO.
- Clearer and faster condition check with & (set intersection): cleaner than chaining issubset(difference(...)).
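The last two points can be sketched as follows; the set names (proper_causal_path, adjustment_set) are illustrative, not the library's variables:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("causal_dag_sketch")

proper_causal_path = {"m1", "m2"}
adjustment_set = {"z1", "m1"}

# %-style arguments are only interpolated if INFO is actually enabled,
# so nothing is formatted here while the level is WARNING.
logger.info("Adjustment set %s intersects the proper causal path %s",
            adjustment_set, proper_causal_path)

# A non-empty intersection reads more directly than chaining
# issubset/difference calls to express the same condition.
overlaps = bool(adjustment_set & proper_causal_path)
print(overlaps)  # True: m1 lies on the proper causal path
```

With f-string or %-formatted message strings built eagerly, the formatting cost is paid even when the record is discarded; passing the arguments separately defers it.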


github-actions bot commented Jul 11, 2025

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 32 0 0.95s
✅ PYTHON pylint 32 0 5.45s

See detailed report in MegaLinter reports



codecov bot commented Jul 11, 2025

Codecov Report

❌ Patch coverage is 98.75000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (ad50cfb) to head (9816b98).
⚠️ Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
causal_testing/specification/causal_dag.py 98.63% 1 Missing ⚠️

@@            Coverage Diff             @@
##             main     #337      +/-   ##
==========================================
- Coverage   95.78%   95.67%   -0.12%     
==========================================
  Files          27       27              
  Lines        1638     1618      -20     
==========================================
- Hits         1569     1548      -21     
- Misses         69       70       +1     
Files with missing lines Coverage Δ
causal_testing/main.py 96.17% <100.00%> (ø)
...sal_testing/surrogate/causal_surrogate_assisted.py 100.00% <100.00%> (ø)
causal_testing/testing/metamorphic_relation.py 100.00% <100.00%> (ø)
causal_testing/specification/causal_dag.py 98.88% <98.63%> (-0.62%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jmafoster1 jmafoster1 marked this pull request as ready for review August 1, 2025 15:29
@jmafoster1 jmafoster1 requested a review from f-allian August 1, 2025 15:35
@jmafoster1
Contributor

I've been through this. Most of the optimisations were fine. Some were a little odd. I also took the opportunity to add some extra ones of my own. Also, most notably, I made this class a proper subclass of nx.DiGraph so we don't have to keep accessing .graph all the time. It doesn't make it any faster, but it does make it much cleaner.
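The subclassing change can be sketched with a minimal stand-in base class (the real change subclasses nx.DiGraph; the stub below just keeps the sketch self-contained, and the method names are hypothetical):

```python
class DiGraph:  # minimal stand-in for nx.DiGraph
    def __init__(self):
        self.succ = {}

    def add_edge(self, u, v):
        self.succ.setdefault(u, set()).add(v)
        self.succ.setdefault(v, set())

    def nodes(self):
        return set(self.succ)

# Before: a wrapper, so every graph operation goes through `.graph`.
class WrappedCausalDag:
    def __init__(self):
        self.graph = DiGraph()

    def node_count(self):
        return len(self.graph.nodes())  # extra indirection on every access

# After: a proper subclass; graph methods are available directly.
class CausalDag(DiGraph):
    def node_count(self):
        return len(self.nodes())  # no `.graph` hop

dag = CausalDag()
dag.add_edge("t", "y")
print(dag.node_count())  # → 2
```

As the comment above says, this is a cleanliness win rather than a speed win: the indirection disappears from every call site.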

Contributor

@f-allian f-allian left a comment


The suggestions all look reasonable to me. This is exactly what I was looking into some time ago when I tried to speed up identification for large DAGs (see related issue #259). The use of generators makes sense here, and removing the NetworkX bottlenecks will also likely speed things up. But without a detailed before/after profiling analysis, we won't know exactly how much this has optimised the identification of large DAGs.
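The generator point can be illustrated in miniature: a caller that stops early never pays for the full enumeration (the helper names below are hypothetical, not the library's API):

```python
def candidate_sets_eager(variables):
    # Eager: the whole list is built before the caller sees anything.
    return [frozenset({v}) for v in variables]

def candidate_sets_lazy(variables):
    # Lazy: each candidate is produced only when the caller asks for it.
    for v in variables:
        yield frozenset({v})

gen = candidate_sets_lazy(["z1", "z2", "z3"])
first = next(gen)  # only the first candidate has been computed so far
print(first)  # frozenset({'z1'})
```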

@jmafoster1
Contributor

we won't know exactly how much this will have optimised the identification of large DAGs.

My admittedly limited test runs indicate not much! 🙃 But at least the code is cleaner now.

@jmafoster1 jmafoster1 merged commit 0ad05e2 into main Aug 7, 2025
22 checks passed
@jmafoster1 jmafoster1 deleted the fandango_experiment branch August 7, 2025 07:13
3 participants