@BobTheBuidler
Contributor

@BobTheBuidler BobTheBuidler commented Aug 16, 2025

This PR adds a new primitive covering all argument combinations of int.to_bytes.
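For reference, the argument combinations in question can be sketched in plain Python. This is only an illustration of the built-in's behavior, not the primitive added by the PR; the helper names `encode` and `overflows` are made up for this example. Both `length` and `byteorder` are passed explicitly so the sketch does not depend on the defaults available in newer Python versions.

```python
def encode(n: int, length: int, byteorder: str, signed: bool = False) -> bytes:
    # `signed` is keyword-only on int.to_bytes, so it is forwarded by name.
    return n.to_bytes(length, byteorder, signed=signed)


def overflows(n: int, length: int, byteorder: str, signed: bool = False) -> bool:
    # int.to_bytes raises OverflowError when the value does not fit.
    try:
        encode(n, length, byteorder, signed)
    except OverflowError:
        return True
    return False


# Positional length and byteorder, default signed=False.
assert encode(255, 2, "big") == b"\x00\xff"
assert encode(255, 2, "little") == b"\xff\x00"

# Keyword arguments are accepted for length and byteorder as well.
assert (255).to_bytes(length=2, byteorder="big", signed=False) == b"\x00\xff"

# Negative values need signed=True; they are encoded as two's complement.
assert encode(-1, 2, "big", signed=True) == b"\xff\xff"
assert overflows(-1, 2, "big")

# A value that does not fit in `length` bytes also overflows.
assert overflows(256, 1, "big")
assert overflows(128, 1, "big", signed=True)
```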

@BobTheBuidler BobTheBuidler marked this pull request as ready for review August 17, 2025 16:39
Member

@ilevkivskyi ilevkivskyi left a comment

Nice! I will keep this open for a day or two in case @JukkaL has some comments.

Member

@ilevkivskyi ilevkivskyi left a comment

On a second look, I think I have some more questions.

// int.to_bytes(length, byteorder, signed=False)
PyObject *CPyTagged_ToBytes(CPyTagged self, Py_ssize_t length, PyObject *byteorder, int signed_flag) {
    PyObject *pyint = CPyTagged_StealAsObject(self);
    if (!PyLong_Check(pyint)) {

Member

On second thought, all these type checks look unnecessary; normally the Python wrappers should do them. You can probably verify this by adding some run tests with Anys in them.

Contributor Author

what, like this?

def f(x: Any) -> bytes:
    return int.to_bytes(x)

Contributor Author

Based on @JukkaL's response to a similar question on #19673, I think we can safely remove this check, since CPyTagged_StealAsObject guarantees the type.

Collaborator

I think CPyTagged_StealAsObject is not correct there, since it will transfer the ownership of the parameter, and this can cause a double free. CPyTagged_AsObject will return a new reference which you can decref at the end of the function.

Member

@BobTheBuidler

> what, like this?

First, not just self. Second, I think you should try something more like this:

def to_bytes(n: int, length: int, byteorder: str = "little", signed: bool = False) -> bytes:
    return n.to_bytes(length, byteorder, signed=signed)

x: Any = "no"
bad: Any = "way"
to_bytes(x, bad)

and check that a TypeError is raised even before execution reaches your specialized code.
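The idea behind that check can be sketched in plain Python. The snippet below is only an emulation: `wrapper` and `raises_type_error` are hypothetical stand-ins for the argument validation that the compiled function's Python-facing wrapper is expected to perform, and the error message format is invented for this example. It assumes the wrapper validates each argument against the annotated type before the native body runs.

```python
from typing import Any


def to_bytes(n: int, length: int, byteorder: str = "little", signed: bool = False) -> bytes:
    return n.to_bytes(length, byteorder, signed=signed)


# Hypothetical: the annotated parameter types, in positional order.
_EXPECTED = (int, int, str, bool)


def wrapper(*args: Any) -> bytes:
    # Emulates a wrapper that rejects badly typed arguments up front,
    # so the specialized code never sees them.
    for position, (value, expected) in enumerate(zip(args, _EXPECTED)):
        if not isinstance(value, expected):
            raise TypeError(
                f"argument {position}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return to_bytes(*args)


def raises_type_error(*args: Any) -> bool:
    try:
        wrapper(*args)
    except TypeError:
        return True
    return False


x: Any = "no"
bad: Any = "way"
assert raises_type_error(x, bad)        # rejected at the first argument
assert raises_type_error(255, bad)      # rejected at the second argument
assert wrapper(255, 2, "big") == b"\x00\xff"
```

Note that calling the undecorated `to_bytes(x, bad)` in the interpreter fails with an AttributeError instead, because str has no `to_bytes` method, so the TypeError behavior can only be confirmed by running the compiled version.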

Contributor Author

I could implement this test, but wouldn't we then just be testing the standard python-wrapper type validation functionality, as opposed to some specific functionality related to this PR?

I can still add the tests accordingly, I just want to make sure we have the same understanding of things before I proceed.

Member

> I could implement this test, but wouldn't we then just be testing the standard python-wrapper type validation functionality, as opposed to some specific functionality related to this PR?

That's exactly my point. This whole thread started from me saying "On second thought, all these type checks look unnecessary; normally the Python wrappers should do them. You can probably verify this by adding some run tests with Anys in them."

Member

To be clear, you don't need to add the tests to the PR, but you can (if you want) check that I am right by running such a test locally.

assert to_bytes(255, 2, "big") == b'\x00\xff'
assert to_bytes(255, 2, "little") == b'\xff\x00'
assert to_bytes(-1, 2, "big", True) == b'\xff\xff'
assert to_bytes(0, 1, "big") == b'\x00'

Member

Maybe also test calling the to_bytes() function from interpreted code.

Contributor Author

@ilevkivskyi how would I implement that? Is there a good example I can look at?

@BobTheBuidler
Contributor Author

1.18 vs this branch

There was a ~5-8% increase in speed for various permutations of the int_to_big_endian benchmark, which includes not only a call to to_bytes but a decent bit of other stuff as well.

Okay, with the specializer this becomes a 9% improvement for all cases
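For anyone wanting to reproduce this kind of comparison, a rough timing harness might look like the sketch below. The body of `int_to_big_endian` here is an assumption made for illustration (a minimal-length big-endian encoding); the actual benchmark is not shown in this thread and does more work than a single to_bytes call. `time_call` is likewise a hypothetical helper. The same script would be run once interpreted and once compiled, and the timings compared.

```python
import timeit


def int_to_big_endian(value: int) -> bytes:
    # Assumed stand-in: encode using the fewest bytes that fit,
    # with at least one byte so that 0 becomes b"\x00".
    return value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")


def time_call(value: int, number: int = 100_000) -> float:
    # Total seconds spent on `number` calls with the given value.
    return timeit.timeit(lambda: int_to_big_endian(value), number=number)


if __name__ == "__main__":
    for sample in (0, 255, 2**64, 2**255):
        print(sample.bit_length(), "bits:", time_call(sample))
```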

@ilevkivskyi
Member

@BobTheBuidler just in case you didn't notice, some tests are failing now.

@ilevkivskyi
Member

@BobTheBuidler please ping me when this is ready for review/merge.

@BobTheBuidler
Contributor Author

That's odd. I fixed the kw-only arg in the to_bytes stub, but doing so breaks the IR. Is there a way for me to cover this case with a method_op?

@ilevkivskyi this open question is my only remaining blocker; I'm unsure how to navigate this situation.

@ilevkivskyi
Member

I don't know; maybe add some prints to specialize_int_to_bytes. For starters, I think self is not included in CallExpr.args for methods, but maybe this is something else.
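The point about the receiver can be illustrated by analogy. The sketch below uses Python's standard-library ast module, not mypy's own node classes, so it only mirrors the shape being described: for a method call, the receiver lives on the callee expression, while the argument list holds just the explicit arguments, with keyword arguments tracked separately. The helper name `split_method_call` is made up for this example.

```python
import ast


def split_method_call(source: str):
    """Split 'recv.method(args...)' into receiver, method name, and arguments."""
    call = ast.parse(source, mode="eval").body
    assert isinstance(call, ast.Call)
    assert isinstance(call.func, ast.Attribute)
    receiver = ast.unparse(call.func.value)            # the "self" expression
    positional = [ast.unparse(arg) for arg in call.args]
    keywords = [kw.arg for kw in call.keywords]        # keyword argument names
    return receiver, call.func.attr, positional, keywords


receiver, method, positional, keywords = split_method_call(
    "n.to_bytes(2, 'big', signed=True)"
)
assert receiver == "n"
assert method == "to_bytes"
assert len(positional) == 2        # the receiver is not counted here
assert keywords == ["signed"]
```

A specializer that indexes into the argument list as if the receiver were element 0, or that expects `signed` to arrive positionally, would be off by one or miss the keyword entirely, which is the kind of mismatch the suggested prints would reveal.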
