
mrc file with utf-8 BOM fails #161

@patrickzurek

JIRA issue created by: rcook
Originally opened: 2012-06-26 09:32 AM

Issue body:
Migrated from Google Code issue http://code.google.com/p/xcoaitoolkit/issues/detail?id=86, which has attachments.

Reported by project member banderson@library.rochester.edu, Jul 21, 2011

The attached file starts with the 3-byte UTF-8 byte order mark (EF BB BF):
http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

When I run convertload.sh on it, I get:

ERROR - [LIB] MarcException unable to parse record length. NumberFormatException For input string: "03".

I'm not sure if we should support this or not, but let's decide.
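
To confirm whether a given file is affected, the three leading bytes described above can be checked directly. The following is only a minimal sketch: the class and method names (`BomCheck`, `startsWithUtf8Bom`, `fileStartsWithUtf8Bom`) are hypothetical and are not part of the toolkit or marc4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the OAI Toolkit or marc4j.
class BomCheck {
    // The UTF-8 byte order mark: EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns true if the given bytes begin with EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] head) {
        if (head.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            // Mask to compare as unsigned values; Java bytes are signed.
            if ((head[i] & 0xFF) != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to the first three bytes of a file and checks them. */
    static boolean fileStartsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = 0;
            while (n < 3) {
                int r = in.read(head, n, 3 - n);
                if (r < 0) {
                    break; // file is shorter than three bytes
                }
                n += r;
            }
            return n == 3 && startsWithUtf8Bom(head);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomCheck <path-to-marc-file>
        Path file = Paths.get(args[0]);
        System.out.println(file + " starts with UTF-8 BOM: " + fileStartsWithUtf8Bom(file));
    }
}
```

Running this against a file before passing it to convertload.sh would show whether the BOM is present; it does not say anything about other encoding problems in the file.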

Attachment: randy_urresearch.mrk (28.0 KB)

Comment 1 by project member rcook@library.rochester.edu, Jul 21, 2011

Nate reported a possibly similar failure with URResearch/IR+. The failure in IR+ was caused by a byte order mark embedded in the file. I'm guessing the same error would occur with XC since it also uses marc4j. If this is the same issue, Nate has offered to give some advice on how to fix it. Please let me know if it is similar so we can get the right discussions going.

Comment 2 by project member banderson@library.rochester.edu, Jul 21, 2011

Yes, this is the same issue that Nate encountered. I just spoke w/ him. He modified the file using MarcEdit to make it work. Steps involved:

  1. Open MarcEdit (I used version 5.5.4218.36332)
  2. File -> MARC Tools -> MarcBreaker
    a) Input File: point to the attached file
    b) Output File: give it a new name
    c) check "Translate to UTF-8"
    d) click "Execute"
  3. Open the new file in MarcEditor
    a) File -> Compile file into MARC
    b) save the file somewhere (this file will then work in the oai-toolkit)
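
If we decide to support such files instead of requiring the manual MarcEdit steps above, one possible approach is to drop the BOM from the input stream before the MARC reader sees it. This is a sketch under assumptions, not the toolkit's actual code: `BomSkippingStream` and `skipUtf8Bom` are hypothetical names, and it only removes the three leading bytes. It does not perform the "Translate to UTF-8" step from the workaround.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of the OAI Toolkit or marc4j.
class BomSkippingStream {
    /**
     * Wraps a stream so that a leading EF BB BF is consumed if present.
     * Any other leading bytes are pushed back and left untouched.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: return the bytes so the caller sees the original data.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream could then be handed to whichever marc4j reader the toolkit constructs (for example, the `MarcStreamReader(InputStream)` constructor). Where that reader is created in the convertload.sh code path is not shown in this issue, so the integration point would need to be confirmed.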
