Commit 19d3be1

Handle papers with multiple content types. (#15)
1 parent fa6de1c commit 19d3be1

1 file changed: +4 -1 lines changed

src/parserindexer/journalparser.py

Lines changed: 4 additions & 1 deletion
@@ -22,7 +22,10 @@ def parse_file(self, path):
         # (3) Input file is a PDF
         parsed = super(JournalParser, self).parse_file(path)
         pdf_md = parsed['metadata']
-        assert pdf_md['Content-Type'] == JournalParser._PDF_TYPE
+        if type(pdf_md['Content-Type']) == list:
+            assert JournalParser._PDF_TYPE in pdf_md['Content-Type']
+        else:
+            assert pdf_md['Content-Type'] == JournalParser._PDF_TYPE
         # Why would we check that it's already been parsed before doing so?
         #assert JournalParser._JOURNAL_PARSER in set(pdf_md['X-Parsed-By'])
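
For illustration only, the same check could be written as one helper that accepts either a single content type or a list of them. This is a sketch, not part of the commit: the name `has_content_type` is hypothetical, and `PDF_TYPE = 'application/pdf'` is an assumed stand-in for `JournalParser._PDF_TYPE`, whose actual value does not appear in this diff.

```python
# Hypothetical helper, not part of the commit.
# PDF_TYPE is an assumed stand-in for JournalParser._PDF_TYPE.
PDF_TYPE = 'application/pdf'


def has_content_type(value, expected=PDF_TYPE):
    """Return True if `expected` equals `value` or is an element of it.

    `value` may be a single string or a list/tuple of strings, mirroring
    the two shapes of pdf_md['Content-Type'] handled in the diff.
    """
    if isinstance(value, (list, tuple)):
        return expected in value
    return value == expected


# Both metadata shapes pass the same check.
print(has_content_type('application/pdf'))
print(has_content_type(['text/plain', 'application/pdf']))
```

With such a helper, the branch in `parse_file` would collapse to a single `assert has_content_type(pdf_md['Content-Type'])`. Using `isinstance` instead of `type(...) == list` also accepts list subclasses and tuples, which may or may not matter for the metadata this parser sees.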
