Skip to content

Conversation

@HSY-999
Copy link

@HSY-999 HSY-999 commented Dec 20, 2025

related to #3467

i notice that _data_structures.DictionaryObject.clone() was calling self.__class__(), which looked incorrect to me. calling self.__class__() runs both __new__(cls, *args, **kwargs) and __init__(self). we are handling cloning the values in d__._clone(self) later, so we do not need to run __init__(self) constructor as we set the values of the instance using the original object.

this had a side effects by breaking PageObject cloning, but creating the __new__(cls, *args, **kwargs) function and adding the attribute names to the instance seem to resolve it. this should not change behavior of PageObject instantiation, because __init__(self) is called after __new__(cls, *args, **kwargs)

@codecov
Copy link

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.30%. Comparing base (97d47a0) to head (1fe2e98).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3577   +/-   ##
=======================================
  Coverage   97.30%   97.30%           
=======================================
  Files          56       56           
  Lines        9838     9844    +6     
  Branches     1790     1790           
=======================================
+ Hits         9573     9579    +6     
  Misses        157      157           
  Partials      108      108           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@stefan6419846 stefan6419846 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I have left some comments.

Additionally, could you please clarify whether this has further side effects? Does exposing the new method have the potential to break existing code? Are there any other aspects to keep in mind?

"DictionaryObject",
self._reference_clone(self.__class__(), pdf_dest, force_duplicate),
self._reference_clone(self.__new__(self.__class__), pdf_dest, force_duplicate),
# self.__new__(self.__class__) because we want instance of type __class__,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should go before, not after the code.

pypdf/_page.py Outdated
original_page: "PageObject" # very local use in writer when appending

def __new__(cls, *args: tuple[Any], **kwargs: dict[Any, Any]) -> "PageObject":
"""__new__ used here to make sure instance.pdf attribute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an explaining comment, not a docstring, thus please make it a "simple" comment. Are we able to re-use some of the values used here in the __init__ method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what values do you mean? i would want __init__ to handle initialization of attribute because that is a python standard.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values initialized to None here.



def test_merge_page_with_annotation():
# added and adapted from issue #3467
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is superfluous.

@HSY-999
Copy link
Author

HSY-999 commented Dec 23, 2025

Thanks for the PR. I have left some comments.

Additionally, could you please clarify whether this has further side effects? Does exposing the new method have the potential to break existing code? Are there any other aspects to keep in mind?

i do not think it is possible to break code. __new__(cls, *args ,**kwargs) is always call before __init__(self) function. the attributes of PageObject are always set to None or initial value. all places where PageObject attributes are used expect either some value or None.

a potential side effect of the change would be that the values of pdf indirect_reference inline_images are always declared. PageObject could never throw AttributeError, so if behavior in the code base needs AttributeError to throw, it would change that behavior. i do not see that in the code base, however.

for other DictionaryObject, the code in __class__() already call __new__(cls, *args, **kwargs). we should not see different behavior as the key values pair were already being replace in _clone(). the code might be slightly faster because we do not do unnecessary initialization of attribute.

@stefan6419846
Copy link
Collaborator

Thanks for the heads-up. Given your comment about __new__ which should not have to care about the actual parameters, why do we have to initialize the three attributes in __new__ instead of using the default __new__ implementation? I am still not sure whether omitting __init__ does not really have any side effects, especially when it might set internal attributes for further processing or similar.

@HSY-999
Copy link
Author

HSY-999 commented Dec 24, 2025

Thanks for the heads-up. Given your comment about __new__ which should not have to care about the actual parameters, why do we have to initialize the three attributes in __new__ instead of using the default __new__ implementation? I am still not sure whether omitting __init__ does not really have any side effects, especially when it might set internal attributes for further processing or similar.

i review python documentation for __new__(cls). the default object.__new__(cls, *args, **kwargs) do not set the pdf inline_image indirect_reference attributes however, i was not correct about not call __init__ inside __new__. that is fine and expect practice so long as super().__new__(cls, *args, **kwargs) return instance to type cls.

i will rewrite code so PageObject.__init__() is called in __new__.

@HSY-999
Copy link
Author

HSY-999 commented Dec 24, 2025

i will rewrite code so PageObject.__init__() is called in __new__.

this cause side effect of double init when create object like PageObject() so it will not work as solution.

let me reexamine _clone function and see if change needs to be made.

@HSY-999
Copy link
Author

HSY-999 commented Dec 27, 2025

@stefan6419846 i have done more reading the codebase for maybe side effects. it seems that _clone() does not handle copy of attribute to clone objects. for example, calling clone() on a PageObject with self.pdf set on original object does not copy value to clone object. it also never allow that feature with current main branch.

there are more class that do not initialized instance variable also. StreamObject which is child of DictionaryObject have two instance variable do not get initialized and would riase error if accessed.

i want to ask about _clone(), is cloning a object attribute wrong to do? or can i add line to _clone() function to clone __dict__ from original to clone object so attributes are same for attributes other indirect_reference. this fix both raise error and not initialized variable

@stefan6419846
Copy link
Collaborator

Sorry, I am having some trouble to follow your comment. Could you please elaborate and reference some code and/or provide some example code showing the behavior in question?

@HSY-999
Copy link
Author

HSY-999 commented Dec 31, 2025

Sorry, I am having some trouble to follow your comment. Could you please elaborate and reference some code and/or provide some example code showing the behavior in question?

for key, value in vars(src).items():
    if key != "indirect_reference":
        setattr(self, key, value)

i want to place at the end of _clone() in DictionaryObject. i ignore indirect_reference because it set other place. if we add this code, i can remove __new__ function in PageObject.

@stefan6419846
Copy link
Collaborator

Thanks for the explanation. Given that we iterate over all values with your proposed vars() approach, why cannot we use the proper __init__ way directly? Is vars() really cleaner than passing the correct parameters as keyword arguments and avoiding explicitly calling __new__ completely?

@stefan6419846 stefan6419846 added the needs-discussion The PR/issue needs more discussion before we can continue label Jan 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-discussion The PR/issue needs more discussion before we can continue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants