WIP SRUopener #682

Draft
wants to merge 2 commits into
base: master

Conversation

dr0i
Member

@dr0i dr0i commented Mar 28, 2025

This is a draft and WIP.
@TobiasNx you can use it for functional testing.

Resolves #510.

@dr0i dr0i requested a review from TobiasNx March 28, 2025 12:18
@dr0i dr0i changed the title WIP SRUopener (#510) WIP SRUopener Mar 28, 2025
@dr0i dr0i moved this to Review in Metafacture Mar 28, 2025
@TobiasNx
Contributor

Nice, this seems to work. +1
The printed logs are a bit cryptic, though:

, startRecord=1, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=1001
, startRecord=1001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=2001
, startRecord=2001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=3001
, startRecord=3001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=4001
, startRecord=4001, maximumRecords=1000, istream.length=437

@TobiasNx
Contributor

@dr0i is this still in review?

@dr0i
Member Author

dr0i commented Apr 10, 2025

As we found out in #510 this PR needs a complete redesign.

@dr0i dr0i force-pushed the 510-addSruOpener branch from ecd9c8c to c3f3ad6 Compare April 10, 2025 13:32
@dr0i dr0i force-pushed the 510-addSruOpener branch from 84d6845 to 3dc0416 Compare June 2, 2025 14:24
@dr0i
Member Author

dr0i commented Jun 2, 2025

@TobiasNx can you do functional tests before I go on here? Have a look at the @Description to see how it works (hint: it is "stream" based, i.e. it works differently from how the OAI-PMH opener works at the moment).
I've added the class to flux-commands.
[edit]: and ignore the failing editorconfigChecker for now.

@dr0i dr0i requested a review from TobiasNx June 2, 2025 14:38
@TobiasNx
Contributor

TobiasNx commented Jun 4, 2025

@dr0i I tried to install the dist (https://metafacture.github.io/metafacture-documentation/docs/flux/Flux-User-Guide.html#build-from-local-distribution) to try the runner for functional testing, but it runs into errors:

$ ./gradlew installDist

> Configure project :
HEAD has no annotated tags
No SCM tag found. Making a snapshot build
Feature branch found
Version is feature-510-addSruOpener-SNAPSHOT

[Incubating] Problems report is available at: file:///home/user/git/metafacture-core/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.13/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

When I run flux.sh, it outputs the following:

$ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.metafacture.runner.Flux.main(Flux.java:62)
Caused by: org.metafacture.commons.reflection.ReflectionException: Class not found: org.metafacture.io.
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:70)
        at org.metafacture.commons.reflection.ObjectFactory.loadClassesFromMap(ObjectFactory.java:57)
        at org.metafacture.flux.parser.FluxProgramm.<clinit>(FluxProgramm.java:54)
        ... 1 more
Caused by: java.lang.ClassNotFoundException: org.metafacture.io.
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:67)
        ... 3 more

Can you help? (For comparison, I tested the current master: there both $ ./gradlew installDist and $ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh work.)

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 4, 2025
@dr0i
Member Author

dr0i commented Jun 5, 2025

Ah, I accidentally removed the TarReader.
Try again please.

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 5, 2025
@dr0i dr0i force-pushed the 510-addSruOpener branch from 4a00838 to ac80718 Compare June 10, 2025 13:02
@TobiasNx
Contributor

TobiasNx commented Jun 18, 2025

The current version gets stuck in an endless SRU request loop: after finishing all requests it starts again at record 1. It does not matter whether a total number of records is given or not:

e.g.

"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;
"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10",total="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;

Both result in the following output. Note that recordPosition 1 turns up again after the expected last recordPosition 8:

<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
    </datafield>
    <datafield ind1=" " ind2=" " tag="035">
      <subfield code="a">(DE-101)042278333</subfield>
...
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">k</subfield>
      <subfield code="a">Internationaler Sozialistenkongress</subfield>
      <subfield code="0">(DE-588c)4021089-3</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>8</recordPosition></record></records><echoedSearchRetrieveRequest><version>1.1</version><query>WOE=sozialistenkongress and COD=s</query><xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/><startRecord>6</startRecord><maximumRecords>5</maximumRecords><recordSchema>MARC21plus-xml</recordSchema></echoedSearchRetrieveRequest></searchRetrieveResponse>
<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
...
    </datafield>
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">c</subfield>
      <subfield code="a">Bern / Internationaler Sozialistenkongress &lt;1919&gt;</subfield>
      <subfield code="0">(DE-588c)4227833-8</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>1</recordPosition></record><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">1267605979</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20230329111229.0</controlfield>
    <controlfield tag="008">220908n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">1267605979</subfield>
      <subfield code="0">http://d-nb.info/gnd/1267605979</subfield>
      <subfield code="2">gnd</subfield>
...

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 18, 2025
Contributor

@TobiasNx TobiasNx left a comment

It seems that the SRU opener gets stuck in an infinite loop. See: #682 (comment)

@dr0i
Member Author

dr0i commented Jun 20, 2025

The infinite loop should be fixed with b92238b, please try again @TobiasNx.
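
The diff of b92238b is not shown in this thread. As a rough sketch of the termination condition such a paging loop needs, the hypothetical helper below (SruPaging.startRecords is an illustration, not code from the PR) lists the startRecord values to request. It stops at the smaller of the server-reported numberOfRecords and the user-supplied total, so it never wraps around to record 1:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the PR's implementation.
final class SruPaging {

    // Returns the 1-based startRecord of every page that has to be requested.
    static List<Integer> startRecords(final int numberOfRecords, final int maximumRecords, final int total) {
        if (maximumRecords <= 0) {
            throw new IllegalArgumentException("maximumRecords must be positive");
        }
        final int limit = Math.min(numberOfRecords, total);
        final List<Integer> starts = new ArrayList<>();
        // long counter: avoids int overflow when limit is close to Integer.MAX_VALUE
        for (long start = 1; start <= limit; start += maximumRecords) {
            starts.add((int) start);
        }
        return starts;
    }

    public static void main(final String[] args) {
        // 8 hits, page size 5, no explicit total
        System.out.println(startRecords(8, 5, Integer.MAX_VALUE)); // prints [1, 6]
    }
}
```

With numberOfRecords=8 and maximumRecords=10 this yields a single request, which is the behaviour expected for the example query above.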

@dr0i dr0i removed their assignment Jun 20, 2025
@TobiasNx TobiasNx self-requested a review June 20, 2025 11:37
Contributor

@TobiasNx TobiasNx left a comment

Nice, looks good! :) +1

@TobiasNx
Contributor

Both Flux examples from my earlier comment (with and without total) now log only one object batch, and only 8 records are fetched. This is good!

@TobiasNx
Contributor

One thing that just came to my mind: the SRU opener needs a way to provide a user agent.
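
A minimal sketch of what that could look like with plain java.net, assuming the opener creates its connection via URL.openConnection(). The class name, the method name and the agent string are made up for illustration and are not part of the PR:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical illustration, not the PR's implementation.
final class UserAgentExample {

    // Prepares a connection that will send the given User-Agent header.
    // openConnection() only creates the object; no request is sent here.
    static URLConnection openWithUserAgent(final String url, final String userAgent) {
        try {
            final URLConnection connection = new URL(url).openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        }
        catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final URLConnection connection =
            openWithUserAgent("https://services.dnb.de/sru/zdb", "metafacture-core");
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

The header has to be set before the connection is actually opened, e.g. before getInputStream() is called.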

@dr0i dr0i assigned dr0i and unassigned TobiasNx Jun 23, 2025
@dr0i dr0i moved this from Review to Working in Metafacture Jun 23, 2025
@dr0i
Member Author

dr0i commented Jun 23, 2025

Code review: @fsteeg

@TobiasNx
Contributor

TobiasNx commented Jun 24, 2025

This seems to be the same issue as in #694

Details

It seems that some kind of broken XML is passed through:

"https://services.dnb.de/sru/zdb"
| open-sru(query="dnb.isil%3DDE-Sol1",RecordSchema="MARC21plus-xml",Version="1.1",total="2")
| decode-xml
| handle-generic-xml
| encode-json // same is true for encode-yaml
| print
;

This creates excessive indentation.

e.g.:

                                                            "datafield" : {
                                                                "ind1" : " ",
                                                                "ind2" : "2",
                                                                "tag" : "852",
                                                                "subfield" : {
                                                                  "code" : "b",
                                                                  "value" : "Fachbibliothek Altertumswissenschaften, Zeitschriften"
                                                                },
                                                                "subfield" : {
                                                                  "code" : "9",
                                                                  "value" : "09"
                                                                }
                                                              },
                                                              "datafield" : {
                                                                "ind1" : " ",
                                                                "ind2" : " ",
                                                                "tag" : "852",
                                                                "subfield" : {
                                                                  "code" : "a",
                                                                  "value" : "AT-UBI"
                                                                }
                                                              },
                                                              "datafield" : {
                                                                "ind1" : "0",
                                                                "ind2" : "0",
                                                                "tag" : "859",
                                                                "subfield" : {
                                                                  "code" : "8",
                                                                  "value" : "1.1\\x"
                                                                },
                                                                "subfield" : {
                                                                  "code" : "i",
                                                                  "value" : "2011"
                                                                }
                                                              },
                                                              "datafield" : {
                                                                "ind1" : "3",
                                                                "ind2" : "0",
                                                                "tag" : "866",
                                                                "subfield" : {
                                                                  "code" : "a",
                                                                  "value" : "2011(2012)"
                                                                }
                                                              }
                                                            }
,
                                                            "" : {
                                                              "type" : "Holdings",
                                                              "leader" : {
                                                                "value" : "00000ny  a22000003n 4500"
                                                              },
                                                              "controlfield" : {
                                                                "tag" : "001",
                                                                "value" : "282147888"
                                                              },
                                                              "controlfield" : {
                                                                "tag" : "003",
                                                                "value" : "DE-101"
                                                              },
                                                              "controlfield" : {
                                                                "tag" : "005",
                                                                "value" : "20240131214804.0"
                                                              },
                                                              "controlfield" : {
                                                                "tag" : "008",
                                                                "value" : "140130||||||||||||||||ger|||||||"
                                                              },

Using list-fix-paths suggests that something is wrong with the incoming XML:

"https://services.dnb.de/sru/zdb"
| open-sru(query="dnb.isil%3DDE-Sol1",RecordSchema="MARC21plus-xml",Version="1.1",total="2")
| decode-xml
| handle-generic-xml
| list-fix-paths
| print
;

Exception:

Exception in thread "main" java.lang.IllegalStateException: Entity starts and ends are not balanced
        at org.metafacture.metafix.Metafix.endRecord(Metafix.java:321)
        at org.metafacture.metafix.MetafixStreamAnalyzer.endRecord(MetafixStreamAnalyzer.java:91)
        at org.metafacture.metafix.ListFixPaths.endRecord(ListFixPaths.java:31)
        at org.metafacture.xml.GenericXmlHandler.endElement(GenericXmlHandler.java:200)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
        at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:89)
        at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:44)
        at org.metafacture.io.SruOpener.process(SruOpener.java:171)
        at org.metafacture.io.SruOpener.process(SruOpener.java:39)
        at org.metafacture.flux.parser.StringSender.process(StringSender.java:43)
        at org.metafacture.flux.parser.Flow.start(Flow.java:118)
        at org.metafacture.flux.parser.FluxProgramm.start(FluxProgramm.java:168)
        at org.metafacture.runner.Flux.main(Flux.java:87)

@dr0i
Member Author

dr0i commented Jun 27, 2025

@TobiasNx Can you note here that this is not a bug but is caused by marcxmlplus (or so)?

@TobiasNx
Contributor

@dr0i The behaviour I reported was not the problem from the workshop. But the indentation behaviour I reported is related to handle-generic-xml, so it is not a bug in the SRU opener.

dr0i added 2 commits July 10, 2025 13:27
Every single output is a valid XML by itself.
@dr0i dr0i force-pushed the 510-addSruOpener branch from 841fae3 to 65e7592 Compare July 10, 2025 11:28
@dr0i
Member Author

dr0i commented Jul 10, 2025

Hi @blackwinter, if you have some time: can you implement tests here? Functionally, the module is ready.

@dr0i dr0i assigned blackwinter and unassigned dr0i Jul 10, 2025
@dr0i dr0i moved this from Working to Selected in Metafacture Jul 10, 2025
Member

@blackwinter blackwinter left a comment

Made a first pass and left some comments. We should discuss tests after the open questions are resolved.

Comment on lines +22 to +23
api project(':metafacture-formatting')
api project(':metafacture-xml')
Member

Why are these added dependencies needed?

Member

Fails Checkstyle check.

@@ -0,0 +1,234 @@
/* Copyright 2013 Pascal Christoph.
* Licensed under the Eclipse Public License 1.0 */
Member

Why the different license? Metafacture is APL.

private int maximumRecords = MAXIMUM_RECORDS;
private int startRecord = START_RECORD;
private int totalRecords = Integer.MAX_VALUE;
int numberOfRecords = Integer.MAX_VALUE;
Member

Why is this variable package-private?

*
* @param totalRecords total number of records to be retrieved
*/
public void setTotal(final String totalRecords) {
Member

Why are these String setters instead of int?

DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document xmldoc = docBuilder.parse(inputStreamOfURl);

Transformer t = TransformerFactory.newInstance().newTransformer();
Member

Ditto.

int nextRecordPosition = getIntegerValueFromElement(xmldoc,"nextRecordPosition", totalRecords);

recordsRetrieved = recordsRetrieved + nextRecordPosition - recordPosition;
startRecord = nextRecordPosition; // grenzwert : wenn maximumRcords > als in echt
Member

Translate comment, fix typo.
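
The body of getIntegerValueFromElement is not quoted in this review. As a sketch under that caveat, a helper with the same shape (document, element name, fallback value) could be written with the JDK's DOM API as follows. The class name SruResponseValues and both method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration, not the PR's implementation.
final class SruResponseValues {

    // Parses an SRU response namespace-aware, so local names can be matched.
    static Document parse(final String xml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        }
        catch (final Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the integer content of the first element with the given local
    // name, or defaultValue if the element is missing or not a number.
    static int intValue(final Document doc, final String localName, final int defaultValue) {
        final NodeList nodes = doc.getElementsByTagNameNS("*", localName);
        if (nodes.getLength() == 0) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(nodes.item(0).getTextContent().trim());
        }
        catch (final NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(final String[] args) {
        final Document doc = parse("<searchRetrieveResponse xmlns=\"http://www.loc.gov/zing/srw/\">"
            + "<numberOfRecords>8</numberOfRecords></searchRetrieveResponse>");
        System.out.println(intValue(doc, "numberOfRecords", -1));     // prints 8
        System.out.println(intValue(doc, "nextRecordPosition", -1));  // prints -1
    }
}
```

The fallback matters on the last page, where a response may carry no nextRecordPosition element at all.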


private InputStream retrieveUrl(StringBuilder srUrl, int startRecord, int maximumRecords) throws IOException {
final URL urlToOpen =
new URL(srUrl.toString() + "&maximumRecords=" + maximumRecords + "&startRecord=" + startRecord);
Member

maximumRecords is invariant and could be set on the base URL (srUrl).
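
One way to read this suggestion, sketched with hypothetical helper names (SruUrls, baseUrl and pageUrl do not appear in the PR): everything that stays constant, including maximumRecords, goes into the base URL once, and only startRecord is appended per page.

```java
// Hypothetical illustration, not the PR's implementation.
final class SruUrls {

    // Built once: endpoint, (already URL-encoded) query and the page size.
    static String baseUrl(final String endpoint, final String encodedQuery, final int maximumRecords) {
        return endpoint + "?query=" + encodedQuery
            + "&operation=searchRetrieve"
            + "&maximumRecords=" + maximumRecords;
    }

    // Built per request: only the start position changes between pages.
    static String pageUrl(final String baseUrl, final int startRecord) {
        return baseUrl + "&startRecord=" + startRecord;
    }

    public static void main(final String[] args) {
        final String base = baseUrl("https://services.dnb.de/sru/zdb", "dnb.isil%3DDE-Sol1", 1000);
        System.out.println(pageUrl(base, 1));
        System.out.println(pageUrl(base, 1001));
    }
}
```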

Comment on lines +64 to +68
int offset = 0;
for (int i = 0; i < size; ++i) {
resultCollector.append(buffer, offset, size - offset);
offset = i + 1;
}
Member

What is this loop supposed to do? It seems to append all suffixes of buffer to resultCollector, thus resulting in an OOM in test().

Suggested change
int offset = 0;
for (int i = 0; i < size; ++i) {
resultCollector.append(buffer, offset, size - offset);
offset = i + 1;
}
resultCollector.append(buffer, 0, size);
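
To make the difference concrete, here is a small standalone comparison of the two variants. It assumes that buffer is a char[] and resultCollector a StringBuilder, which the quoted lines suggest but do not show. The class and method names are illustrative only:

```java
// Hypothetical illustration of the reviewed loop versus the suggested change.
final class AppendLoopDemo {

    // The loop as quoted in the review: appends buffer[0..size), then
    // buffer[1..size), then buffer[2..size) and so on, i.e. every suffix.
    static String withLoop(final char[] buffer, final int size) {
        final StringBuilder resultCollector = new StringBuilder();
        int offset = 0;
        for (int i = 0; i < size; ++i) {
            resultCollector.append(buffer, offset, size - offset);
            offset = i + 1;
        }
        return resultCollector.toString();
    }

    // The suggested change: append the filled part of the buffer exactly once.
    static String withSingleAppend(final char[] buffer, final int size) {
        return new StringBuilder().append(buffer, 0, size).toString();
    }

    public static void main(final String[] args) {
        final char[] buffer = "abc".toCharArray();
        System.out.println(withLoop(buffer, buffer.length));         // prints abcbcc
        System.out.println(withSingleAppend(buffer, buffer.length)); // prints abc
    }
}
```

For a buffer of n characters the loop appends n(n+1)/2 characters instead of n, so the collected result grows quadratically with the buffer size.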


while (!stopRetrieving && recordsRetrieved < totalRecords && (startRecord < numberOfRecords)) {
InputStream inputStream = getXmlDocsViaSru(srUrl);
getReceiver().process(new InputStreamReader(inputStream));
Member

What are the repercussions of passing each page as an individual document to downstream consumers? Would it make more sense to pass either the result as a whole, or each record individually?
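
For the "each record individually" option, a rough sketch of what splitting one page could look like with the JDK's DOM and Transformer APIs. This is only an illustration of the alternative raised here, not a proposal for the actual implementation; SruRecordSplitter and splitRecords are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration, not the PR's implementation.
final class SruRecordSplitter {

    private static final String SRW_NS = "http://www.loc.gov/zing/srw/";

    // Serializes every SRU <record> of one response page as its own XML string.
    // Matching on the SRW namespace skips the nested MARC <record> elements.
    static List<String> splitRecords(final String pageXml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            final Document page = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(pageXml.getBytes(StandardCharsets.UTF_8)));
            final NodeList records = page.getElementsByTagNameNS(SRW_NS, "record");
            final Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            final List<String> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                final StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(records.item(i)), new StreamResult(writer));
                result.add(writer.toString());
            }
            return result;
        }
        catch (final Exception e) {
            throw new IllegalArgumentException("Could not split SRU page", e);
        }
    }

    public static void main(final String[] args) {
        final String page = "<searchRetrieveResponse xmlns=\"" + SRW_NS + "\"><records>"
            + "<record><recordData><record xmlns=\"http://www.loc.gov/MARC21/slim\">"
            + "<controlfield tag=\"001\">042278333</controlfield></record></recordData></record>"
            + "<record><recordData><record xmlns=\"http://www.loc.gov/MARC21/slim\">"
            + "<controlfield tag=\"001\">1267605979</controlfield></record></recordData></record>"
            + "</records></searchRetrieveResponse>";
        splitRecords(page).forEach(System.out::println);
    }
}
```

The trade-off is that this parses each page into a DOM before passing anything on, whereas handing the raw page stream downstream keeps the opener itself free of XML handling.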

@blackwinter blackwinter assigned dr0i and unassigned blackwinter Jul 22, 2025
Projects
Status: Selected
Development

Successfully merging this pull request may close these issues.

Add SRU opener / open-sru
3 participants