Release 5.0 in the works #511
dgunning
announced in
Announcements
Replies: 4 comments 1 reply
-
An excellent package, well done man!
-
Brilliant idea, great work you are doing!
-
Great work. I would love to test.
-
I would love to test too. So many tokens wasted with my LLM on sub-tables. Honestly, this is one of the Python packages that shows it's possible to balance AI assistance with personal understanding.
-
I am planning on doing a Release 5.0 in a couple of weeks. The most important change is a rewritten HTML parser, `edgar.documents`, which will replace the current one in `edgar.files.html`.
HTML parsing is an important part of edgartools. It is used to extract clean text and tables from filing documents, which matters in this new LLM era where users need to feed clean text to LLMs for analysis. Also important is the ability to segment HTML and find subsections of filings, e.g. Item 4 in a 10-K.
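To make the "clean text" and "segment by item" ideas concrete, here is a minimal standard-library sketch. It is not the `edgar.documents` implementation and does not use the edgartools API; the names `TextExtractor`, `html_to_text`, `split_items`, and the sample HTML are all hypothetical, and the heading pattern assumes each "Item N." heading starts its own line of text.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.parts)


# Hypothetical heading pattern: "Item 1.", "Item 1A.", "Item 4." at line start.
ITEM_HEADING = re.compile(r"^Item\s+(\d+[A-Z]?)\.", re.IGNORECASE | re.MULTILINE)


def split_items(text):
    """Map each item number (e.g. '1A', '4') to the text under its heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).upper()] = text[m.start():end].strip()
    return sections


# Made-up sample standing in for a filing document.
sample = """<html><body>
<style>p {color: red}</style>
<p>Item 1. Business</p><p>We make widgets.</p>
<p>Item 4. Mine Safety Disclosures</p><p>Not applicable.</p>
</body></html>"""

sections = split_items(html_to_text(sample))
print(sections["4"])
```

Real filings are much messier than this: headings are often split across inline tags, repeated in a table of contents, or laid out inside tables, which is part of why a dedicated parser is needed.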
The old parser was extraordinarily complex, and we were reaching the limit of what we could fix in it: every bug required understanding the code and then finding a fix that worked within its design. The new parser is also complex, but it was built from scratch with AI assistance (using Claude), and some of the old design flaws were ironed out.
That being said, testing the new parser and making sure it is at least as good as, and ideally better than, the old parser took a lot of time, which is why it has been in the codebase without a switchover for a few months. I think it is time to cut over and invite testing.
The plan is to do a couple of 5.0 release candidates and invite testing. The old parser will remain for a while (deprecated), so it will be available in case there are issues with the new parser. This discussion is to invite feedback on the release.