Skip to content

Conversation

@timosachsenberg
Copy link
Collaborator

@timosachsenberg timosachsenberg commented Dec 30, 2025

Summary

  • Add support for using C++ std::hash<T> specializations directly in Python bindings via the wrap-hash directive
  • Add cpp_hash declaration (aliased from std::hash) in generated Cython code
  • Support # wrap-hash:\n# std syntax to automatically use std::hash<T>
  • Maintain backward compatibility with existing expression-based wrap-hash

Test plan

  • Tested with OpenMS pyopenms build
  • Verified hash functions work correctly for Peak1D, Peak2D, ChromatogramPeak, AASequence, EmpiricalFormula, PeptideHit, Software, Adduct
  • Verified equal objects produce equal hashes
  • Verified objects work correctly in Python set and dict

Usage

In .pxd files:

cdef cppclass MyClass:
    # wrap-hash:
    #  std

This generates:

def __hash__(self):
    cdef cpp_hash[MyClass] hasher
    return hasher(deref(self.inst.get()))

The cpp_hash name is used instead of hash to avoid conflicts with Python's built-in hash() function.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Flexible hash generation for wrapped C++ classes (expression-based or std::hash-backed) and improved integral-type handling for index/bounds checks.
  • Documentation
    • Expanded wrap-hash guidance with two-mode descriptions and usage examples.
  • Tests
    • New test suite validating hash behavior, equality, set/dict semantics, and hash consistency across scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

This commit adds support for using C++ std::hash<T> specializations
directly in Python bindings via the wrap-hash directive.

Changes:
- Add `cpp_hash` declaration (aliased from std::hash) in generated code
- Support `# wrap-hash:\n#  std` syntax to automatically use std::hash<T>
- Maintain backward compatibility with existing expression-based wrap-hash

Usage in .pxd files:
```
cdef cppclass MyClass:
    # wrap-hash:
    #  std
```

This generates:
```python
def __hash__(self):
    cdef cpp_hash[MyClass] hasher
    return hasher(deref(self.inst.get()))
```

The `cpp_hash` name is used instead of `hash` to avoid conflicts with
Python's built-in hash() function.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

📝 Walkthrough

Walkthrough

Adds std::hash support to the wrapper generator: CodeGenerator now emits either a std::hash-based hash or an expression-based hash depending on wrap_hash. Introduces a Cython extern for std::hash, a helper to detect integral C++ types, test headers/pyd and tests exercising hashing behavior.

Changes

Cohort / File(s) Summary
Code generation & std::hash integration
autowrap/CodeGenerator.py
Branches hash generation for wrapped classes: if wrap_hash[0] is "std" or empty, emit a std::hash-based __hash__ using a cpp_hash wrapper; otherwise emit an expression-based __hash__ returning hash(deref(self.inst.get()).<expr>). Adds _is_integral_type() helper used for bounds checks. Inserts a public Cython extern declaration (cppclass cpp_hash "std::hash" [T]) into default C imports.
C++ test types
tests/test_files/hash_test.hpp
New C++ header declaring ExprHashClass, StdHashClass, TemplatedHashClass<T> and std::hash specializations for StdHashClass and TemplatedHashClass<int>.
Cython declarations for tests
tests/test_files/hash_test.pxd
New Cython extern declarations binding the three C++ test classes for wrapper generation.
Runtime tests
tests/test_hash.py
New Python test module that generates the wrapper, imports it, and validates expression-based and std::hash-based hashing semantics (consistency, equality, set/dict behavior) across the three test classes.
Documentation
docs/README.md
Expanded documentation for wrap-hash directive: two hashing modes (expression and std::hash), requirements (equality), examples, and guidance.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • jpfeuffer

Poem

🐰 I nibble bytes and hop through code,
I wrap your hashes down the road.
Std::hash or expr in hand,
Bindings snug across the land.
Hooray — the wrappers bounce with glee!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.69% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding std::hash support to the wrap-hash directive, which is the primary feature introduced across all modified files.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
autowrap/CodeGenerator.py (1)

1689-1695: Consider separating unrelated changes into different PRs.

The _is_integral_type helper method is used for operator[] bounds checking, which appears unrelated to the std::hash support feature described in the PR title and objectives. While the implementation is correct, bundling unrelated improvements can make code review more difficult and obscure git history.

For maintainability, consider keeping PRs focused on a single logical change. If this bounds checking improvement was intentional, it might be worth mentioning in the PR description.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18a9698 and d36d4a5.

📒 Files selected for processing (1)
  • autowrap/CodeGenerator.py
🧰 Additional context used
🧬 Code graph analysis (1)
autowrap/CodeGenerator.py (1)
autowrap/Code.py (1)
  • add (56-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.1.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.12)
🔇 Additional comments (2)
autowrap/CodeGenerator.py (2)

2106-2114: LGTM! Well-designed std::hash declaration.

The cpp_hash declaration correctly wraps std::hash with proper template syntax, exception handling, and naming that avoids conflicts with Python's built-in hash(). The signature matches C++ requirements perfectly.


674-700: std::hash specializations are correctly invoked via cpp_hash template, but ensure types using wrap-hash: std have proper C++ specializations.

The implementation correctly declares and uses cpp_hash[$cy_type] from the C++ <functional> namespace. However, no current code in the repository uses wrap-hash: std—existing test cases all use custom hash expressions. If developers add wrap-hash: std to any class definitions, they must ensure those C++ types have corresponding std::hash specializations defined.

- Add hash_test.hpp with C++ classes and std::hash specializations
- Add hash_test.pxd with wrap-hash directive examples:
  - Expression-based hash (regression test for old behavior)
  - std::hash-based hash (new behavior)
  - Template class with std::hash
- Add test_hash.py with comprehensive tests for hash functionality
- Update docs/README.md with wrap-hash documentation:
  - Describe both expression-based and std::hash modes
  - Add examples for both modes
  - Include C++ std::hash specialization example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Change $cpp_name to $cy_type in the inline comment to match the
actual template parameter used in the generated code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
autowrap/CodeGenerator.py (1)

1689-1695: Good helper for integral type detection.

The _is_integral_type method provides a comprehensive set of integral types for bounds checking in operator[].

Consider adding intptr_t and uintptr_t to the set for completeness, as they're also integral types sometimes used for indexing:

🔎 Optional enhancement
 def _is_integral_type(self, cpp_type):
     """Check if a CppType represents an integral (integer-like) type."""
     integral_base_types = {'int', 'long', 'short', 'char', 'size_t', 'ptrdiff_t',
                            'int8_t', 'int16_t', 'int32_t', 'int64_t',
                            'uint8_t', 'uint16_t', 'uint32_t', 'uint64_t',
-                           'ssize_t'}
+                           'ssize_t', 'intptr_t', 'uintptr_t'}
     return cpp_type.base_type in integral_base_types
docs/README.md (1)

180-188: Clear documentation of the two hash modes.

The updated description effectively explains both expression-based and std::hash-based hashing, including the requirement for operator==.

Consider adding a cautionary note that when using std mode, C++ compilation will fail if std::hash<YourClass> is not specialized. This would help users debug issues more quickly:

🔎 Suggested enhancement
   - **std::hash-based**: Use `std` to leverage the C++ `std::hash<T>` template
     specialization for the class. This requires that `std::hash<YourClass>` is
-    specialized in your C++ code.
+    specialized in your C++ code. If the specialization is missing, C++ 
+    compilation will fail with a template instantiation error.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d36d4a5 and 7be068d.

📒 Files selected for processing (5)
  • autowrap/CodeGenerator.py
  • docs/README.md
  • tests/test_files/hash_test.hpp
  • tests/test_files/hash_test.pxd
  • tests/test_hash.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_hash.py
🧰 Additional context used
🧬 Code graph analysis (2)
autowrap/CodeGenerator.py (1)
autowrap/Code.py (1)
  • add (56-75)
tests/test_files/hash_test.hpp (1)
tests/test_files/wrapped_container_test.hpp (1)
  • h1 (42-46)
🪛 markdownlint-cli2 (0.18.1)
docs/README.md

252-252: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)


256-256: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


264-264: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.1.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.11)
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.10)
🔇 Additional comments (7)
autowrap/CodeGenerator.py (2)

2107-2115: Correct std::hash exposure via cpp_hash.

The extern declaration properly exposes std::hash<T> as cpp_hash to avoid conflicts with Python's built-in hash() function. The nogil specification and exception handling are appropriate.


674-700: Good implementation of the two-mode hash strategy.

The logic correctly distinguishes between std::hash-based and expression-based hashing. The case-insensitive check for "std" is a nice touch.

When hash_expr is "std" or empty, the generated code assumes std::hash<$cy_type> exists. This requirement is already documented in the README: users must provide a std::hash specialization in their C++ code when using this mode. If a user omits the specialization, the C++ compiler will fail with an explicit template error, making the issue immediately visible—not a silent failure.

docs/README.md (1)

252-281: Excellent examples demonstrating both hash modes.

The code examples clearly show how to use expression-based and std::hash-based hashing, including a complete C++ std::hash specialization. This will be very helpful for users.

tests/test_files/hash_test.hpp (3)

1-27: Well-designed test class for expression-based hashing.

ExprHashClass provides a clean test case for the legacy expression-based hash mode with proper value semantics and equality operators.


29-62: Good std::hash specialization implementation.

StdHashClass and its std::hash specialization correctly demonstrate the new std::hash-based mode. The hash function properly combines id and label hashes using XOR with bit shift, which is a standard approach.


64-93: Excellent coverage of template class hashing.

TemplatedHashClass<T> with its std::hash<TemplatedHashClass<int>> specialization validates that the feature works correctly with template instantiations.

tests/test_files/hash_test.pxd (1)

1-54: Comprehensive test declarations covering both hash modes.

The Cython declarations properly test both expression-based (getValue() for ExprHashClass) and std::hash-based (std for StdHashClass and TemplatedHashClass) hashing. The wrap-instances directive correctly specifies the template instantiation, and wrap-ignore on copy constructors is appropriate.

timosachsenberg added a commit to OpenMS/OpenMS that referenced this pull request Dec 30, 2025
This commit adds portable hash functions using std::hash specializations
for many OpenMS classes, enabling use in unordered containers and Python
set/dict collections.

C++ Changes:
- Add std::hash specializations using FNV-1a algorithm via HashUtils.h
- Classes include: Peak1D, Peak2D, ChromatogramPeak, MobilityPeak1D/2D,
  DPosition, DateTime, AASequence, EmpiricalFormula, PeptideHit,
  ProteinHit, PeptideIdentification, ProteinIdentification, Precursor,
  Product, Software, SpectrumSettings, ChromatogramSettings, CVTerm,
  CVTermList, MetaInfo, MetaInfoInterface, IonSource, IonDetector,
  MassAnalyzer, Residue, Element, Ribonucleotide, ResidueModification,
  ModificationDefinition, DigestionEnzyme, NASequence, Adduct, Compomer,
  ChargePair, ConvexHull2D, ParamValue, DataValue, IsotopeDistribution,
  ReactionMonitoringTransition, IncludeExcludeTarget, AnnotatedMSRun,
  TargetedExperimentHelper nested classes, and more

- Add comprehensive unit tests for all hash implementations

pyOpenMS Changes:
- Add wrap-hash: std directive to 40+ pxd files to expose C++ std::hash
  to Python via autowrap (requires OpenMS/autowrap#227)
- Classes now hashable in Python: can be used in set() and dict()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
timosachsenberg added a commit to OpenMS/OpenMS that referenced this pull request Dec 30, 2025
This commit adds portable hash functions using std::hash specializations
for many OpenMS classes, enabling use in unordered containers and Python
set/dict collections.

C++ Changes:
- Add std::hash specializations using FNV-1a algorithm via HashUtils.h
- Classes include: Peak1D, Peak2D, ChromatogramPeak, MobilityPeak1D/2D,
  DPosition, DateTime, AASequence, EmpiricalFormula, PeptideHit,
  ProteinHit, PeptideIdentification, ProteinIdentification, Precursor,
  Product, Software, SpectrumSettings, ChromatogramSettings, CVTerm,
  CVTermList, MetaInfo, MetaInfoInterface, IonSource, IonDetector,
  MassAnalyzer, Residue, Element, Ribonucleotide, ResidueModification,
  ModificationDefinition, DigestionEnzyme, NASequence, Adduct, Compomer,
  ChargePair, ConvexHull2D, ParamValue, DataValue, IsotopeDistribution,
  ReactionMonitoringTransition, IncludeExcludeTarget, AnnotatedMSRun,
  TargetedExperimentHelper nested classes, and more

- Add comprehensive unit tests for all hash implementations

pyOpenMS Changes:
- Add wrap-hash: std directive to 40+ pxd files to expose C++ std::hash
  to Python via autowrap (requires OpenMS/autowrap#227)
- Classes now hashable in Python: can be used in set() and dict()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
timosachsenberg added a commit to OpenMS/OpenMS that referenced this pull request Dec 30, 2025
This commit adds portable hash functions using std::hash specializations
for many OpenMS classes, enabling use in unordered containers and Python
set/dict collections.

C++ Changes:
- Add std::hash specializations using FNV-1a algorithm via HashUtils.h
- Classes include: Peak1D, Peak2D, ChromatogramPeak, MobilityPeak1D/2D,
  DPosition, DateTime, AASequence, EmpiricalFormula, PeptideHit,
  ProteinHit, PeptideIdentification, ProteinIdentification, Precursor,
  Product, Software, SpectrumSettings, ChromatogramSettings, CVTerm,
  CVTermList, MetaInfo, MetaInfoInterface, IonSource, IonDetector,
  MassAnalyzer, Residue, Element, Ribonucleotide, ResidueModification,
  ModificationDefinition, DigestionEnzyme, NASequence, Adduct, Compomer,
  ChargePair, ConvexHull2D, ParamValue, DataValue, IsotopeDistribution,
  ReactionMonitoringTransition, IncludeExcludeTarget, AnnotatedMSRun,
  TargetedExperimentHelper nested classes, and more

- Add comprehensive unit tests for all hash implementations

pyOpenMS Changes:
- Add wrap-hash: std directive to 40+ pxd files to expose C++ std::hash
  to Python via autowrap (requires OpenMS/autowrap#227)
- Classes now hashable in Python: can be used in set() and dict()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@jpfeuffer jpfeuffer merged commit 2faef85 into master Dec 30, 2025
11 checks passed
timosachsenberg added a commit to OpenMS/OpenMS that referenced this pull request Dec 30, 2025
This commit adds portable hash functions using std::hash specializations
for many OpenMS classes, enabling use in unordered containers and Python
set/dict collections.

C++ Changes:
- Add std::hash specializations using FNV-1a algorithm via HashUtils.h
- Classes include: Peak1D, Peak2D, ChromatogramPeak, MobilityPeak1D/2D,
  DPosition, DateTime, AASequence, EmpiricalFormula, PeptideHit,
  ProteinHit, PeptideIdentification, ProteinIdentification, Precursor,
  Product, Software, SpectrumSettings, ChromatogramSettings, CVTerm,
  CVTermList, MetaInfo, MetaInfoInterface, IonSource, IonDetector,
  MassAnalyzer, Residue, Element, Ribonucleotide, ResidueModification,
  ModificationDefinition, DigestionEnzyme, NASequence, Adduct, Compomer,
  ChargePair, ConvexHull2D, ParamValue, DataValue, IsotopeDistribution,
  ReactionMonitoringTransition, IncludeExcludeTarget, AnnotatedMSRun,
  TargetedExperimentHelper nested classes, and more

- Add comprehensive unit tests for all hash implementations

pyOpenMS Changes:
- Add wrap-hash: std directive to 40+ pxd files to expose C++ std::hash
  to Python via autowrap (requires OpenMS/autowrap#227)
- Classes now hashable in Python: can be used in set() and dict()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants