Resource Leak in xml.etree.ElementTree.iterparse When Exiting Loop Early #125397

@allrob23

Bug report

Bug description:

Description:

There is a potential resource leak in the xml.etree.ElementTree.iterparse function when the parsing loop is exited prematurely using a break or return statement. In such cases, the underlying XML parser's close() method is never called, leading to unreleased resources.

Details:

In the implementation of iterparse, the function eventually calls pullparser._close_and_return_root() at line 1259 in ElementTree.py, which internally invokes the parser's close() method, ensuring the parser is properly closed.

However, if the user exits the loop early, the generator never advances past line 1253, so pullparser._close_and_return_root() at line 1259 is never reached and the parser's close() method is not executed.
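
The mechanism can be illustrated with a toy generator. This is only a sketch of the control flow described above, not the actual iterparse code: the function names and log strings are hypothetical, and the finally clause stands in for whatever cleanup the real generator already performs.

```python
def events_then_cleanup(items, log):
    """Toy stand-in for the generator inside iterparse (hypothetical)."""
    try:
        for item in items:
            yield item
        # Plays the role of pullparser._close_and_return_root():
        # only reached when the consumer exhausts the generator.
        log.append("after-loop cleanup")
    finally:
        # Runs on normal completion and when the generator is closed early.
        log.append("finally cleanup")


def consume(stop_early):
    log = []
    gen = events_then_cleanup(["greeting", "farewell"], log)
    for _ in gen:
        if stop_early:
            break
    # Explicitly close the generator; garbage collection would otherwise
    # do this to an abandoned generator at some later point.
    gen.close()
    return log


print(consume(stop_early=False))
print(consume(stop_early=True))
```

Code placed after the loop is skipped on early exit, while code in the finally clause still runs once the generator is closed.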

Steps to Reproduce:

  1. Create an XML file (example.xml):
<?xml version="1.0"?>
<root>
    <greeting>Hello, World!</greeting>
    <farewell>Goodbye!</farewell>
</root>
  2. Run the following Python script (example.py):
import xml.etree.ElementTree as ET

class MyParser(ET.XMLParser):
    def feed(self, data):
        print('--> XMLParser.feed', data)
        return super().feed(data)

    def close(self):
        print('--> XMLParser.close')
        return super().close()

it = ET.iterparse('./example.xml', parser=MyParser())

def parseXml():
    for _, elem in it:
        if elem.tag == "greeting":
            print("Greeting:", elem.text)
            # Uncomment one of the following lines to exit the loop early
            # return
            # break
        elif elem.tag == "farewell":
            print("Farewell:", elem.text)
        elem.clear()

parseXml()
  3. Observe the output:
  • When the loop runs to completion (no early exit):
--> XMLParser.feed b'<?xml version="1.0"?>\n<root>\n    <greeting>Hello, World!</greeting>\n    <farewell>Goodbye!</farewell>\n</root>'
Greeting: Hello, World!
Farewell: Goodbye!
--> XMLParser.close
  • When exiting the loop early (uncomment return or break):
--> XMLParser.feed b'<?xml version="1.0"?>\n<root>\n    <greeting>Hello, World!</greeting>\n    <farewell>Goodbye!</farewell>\n</root>'
Greeting: Hello, World!

Impact:

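Because close() is never called on the parser, resources it holds may not be released, which is most noticeable with large XML files or many parsing operations. The parse is also never finalized, so any logic a parser subclass places in close() (as MyParser does above) is silently skipped, which can lead to unexpected behavior.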

Possible Solutions:

• Ensure Parser Closure in iterparse:
Modify the iterparse function to guarantee that pullparser._close_and_return_root() is called, even if the loop is exited early. Adding it to the finally block could work.

• Update Documentation:
If modifying the code is not feasible, the documentation should clearly state that users must handle the parser's closure when exiting the loop prematurely.
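
Until one of these is adopted, a user-side workaround might look like the sketch below. It is not a proposed patch to ElementTree.py: closing_iterparse and IdempotentParser are hypothetical names, and suppressing ET.ParseError is an assumption made here because close() may raise on a document that was only partially fed.

```python
import contextlib
import io
import xml.etree.ElementTree as ET


class IdempotentParser(ET.XMLParser):
    """Hypothetical helper: records whether close() ran and ignores repeats."""

    closed = False

    def close(self):
        if self.closed:
            return None
        self.closed = True
        return super().close()


@contextlib.contextmanager
def closing_iterparse(source, events=None):
    """Hypothetical wrapper that closes the parser even on early exit."""
    parser = IdempotentParser()
    try:
        yield ET.iterparse(source, events=events, parser=parser), parser
    finally:
        # If iterparse already closed the parser, this call is a no-op.
        # A partially fed document may make close() raise ParseError.
        with contextlib.suppress(ET.ParseError):
            parser.close()


xml_data = b"<root><greeting>Hello, World!</greeting><farewell>Goodbye!</farewell></root>"

with closing_iterparse(io.BytesIO(xml_data)) as (event_iter, parser):
    for _, elem in event_iter:
        if elem.tag == "greeting":
            break  # early exit; the context manager still closes the parser

print("parser closed:", parser.closed)
```

The wrapper relies on the same parser argument used in the reproduction script, so it shares that argument's limitations.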

Thank you for considering this issue. Please let me know if additional information is required.

CPython versions tested on:

3.12

Operating systems tested on:

Linux, macOS

Labels:

extension-modules, stdlib, topic-XML, type-bug