Bug report
Bug description:
Description:
There is a potential resource leak in the xml.etree.ElementTree.iterparse function when the parsing loop is exited prematurely using a break or return statement. In such cases, the underlying XML parser's close() method is never called, leading to unreleased resources.
Details:
In the implementation of iterparse, the generator eventually calls pullparser._close_and_return_root() at line 1259 in ElementTree.py, which internally invokes the parser's close() method, ensuring the parser is properly closed.
However, if the user exits the loop early, the generator function returns at line 1253, and pullparser._close_and_return_root() is never called. As a result, the parser's close() method is never executed.
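The mechanism can be illustrated without touching ElementTree at all. The following is a minimal sketch, not the actual iterparse code: read_events is a hypothetical toy generator whose cleanup step sits after the loop, which is the pattern the report describes. When the consumer breaks out and the generator is closed, execution never reaches that step.

```python
def read_events(items, log):
    """Toy generator (hypothetical): cleanup is placed after the loop."""
    for item in items:
        yield item
    # Reached only when the consumer exhausts the generator.
    log.append("close")


# Case 1: the consumer runs the loop to completion.
log_full = []
for _ in read_events([1, 2, 3], log_full):
    pass

# Case 2: the consumer exits early.
log_early = []
gen = read_events([1, 2, 3], log_early)
for _ in gen:
    break
# Closing the generator raises GeneratorExit at the suspended yield,
# so the cleanup line after the loop is skipped.
gen.close()

print(log_full)   # ['close']
print(log_early)  # []
```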
Steps to Reproduce:
- Create an XML file (example.xml):
<?xml version="1.0"?>
<root>
<greeting>Hello, World!</greeting>
<farewell>Goodbye!</farewell>
</root>
- Run the following Python script (example.py):
import xml.etree.ElementTree as ET

class MyParser(ET.XMLParser):
    # Log every feed() and close() call made on the parser.
    def feed(self, data):
        print('--> XMLParser.feed', data)
        return super().feed(data)

    def close(self):
        print('--> XMLParser.close')
        return super().close()

it = ET.iterparse('./example.xml', parser=MyParser())

def parseXml():
    for _, elem in it:
        if elem.tag == "greeting":
            print("Greeting:", elem.text)
            # Uncomment one of the following lines to exit the loop early
            # return
            # break
        elif elem.tag == "farewell":
            print("Farewell:", elem.text)
        elem.clear()

parseXml()
- Observe the Output:
- When the loop runs to completion (no early exit):
--> XMLParser.feed b'<?xml version="1.0"?>\n<root>\n <greeting>Hello, World!</greeting>\n <farewell>Goodbye!</farewell>\n</root>'
Greeting: Hello, World!
Farewell: Goodbye!
--> XMLParser.close
- When exiting the loop early (uncomment return or break):
--> XMLParser.feed b'<?xml version="1.0"?>\n<root>\n <greeting>Hello, World!</greeting>\n <farewell>Goodbye!</farewell>\n</root>'
Greeting: Hello, World!
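For convenience, here is a file-free variant of the reproduction that records the calls in a list instead of printing them. It is only a sketch: RecordingParser and run are hypothetical names, and the document is read from an in-memory io.BytesIO buffer. On the 3.12 build used for this report, the early-exit run is expected to record no close call; other versions may differ, so the sketch prints the result rather than assuming it.

```python
import io
import xml.etree.ElementTree as ET

XML = (b"<root><greeting>Hello, World!</greeting>"
       b"<farewell>Goodbye!</farewell></root>")


class RecordingParser(ET.XMLParser):
    """Hypothetical helper: remembers which parser methods were called."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def feed(self, data):
        self.calls.append("feed")
        return super().feed(data)

    def close(self):
        self.calls.append("close")
        return super().close()


def run(stop_early):
    parser = RecordingParser()
    for _, elem in ET.iterparse(io.BytesIO(XML), parser=parser):
        if stop_early and elem.tag == "greeting":
            break
    return parser.calls


full_calls = run(stop_early=False)
early_calls = run(stop_early=True)
print("full run: ", full_calls)
print("early exit:", early_calls)
```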
Impact:
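Because close() is never called after an early exit, whatever cleanup the parser performs in close() is skipped. This can lead to resource leaks, especially when dealing with large XML files or many parsing operations. Custom parsers that rely on close() to finalize their state may also be left partially processed and behave unexpectedly.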
Possible Solutions:
• Ensure Parser Closure in iterparse:
Modify the iterparse function to guarantee that pullparser._close_and_return_root() is called, even if the loop is exited early. Adding it to the finally block could work.
• Update Documentation:
If modifying the code is not feasible, the documentation should clearly state that users must handle the parser's closure when exiting the loop prematurely.
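To illustrate the first option, here is a rough sketch of the try/finally pattern using stand-ins rather than the real iterparse internals. FakeParser and iter_events are hypothetical names. One caveat a real fix would have to handle: calling close() on an actual XMLParser that has only been fed part of a document may raise ParseError, so the cleanup would likely need to tolerate that.

```python
class FakeParser:
    """Hypothetical stand-in for the parser; records whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def iter_events(chunks, parser):
    """Toy generator: cleanup lives in finally, so it runs on early exit too."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        parser.close()


parser = FakeParser()
seen = []
gen = iter_events(["a", "b", "c"], parser)
for chunk in gen:
    seen.append(chunk)
    break
# Closing the generator raises GeneratorExit at the yield,
# which triggers the finally block.
gen.close()

print(seen)           # ['a']
print(parser.closed)  # True
```

With this shape, the cleanup runs when the generator is closed or garbage-collected, not at the moment of the break itself, so callers who need deterministic cleanup would still close the iterator explicitly.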
Thank you for considering this issue. Please let me know if additional information is required.
CPython versions tested on:
3.12
Operating systems tested on:
Linux, macOS