The following wiki lists all output formats, platforms, and other stakeholders that relate to the development of the [[open]] source metadata management system Thoth. Wikis marked with an asterisk (*) are identified in the WP5 Scoping Report and are indexed as an issue in the Thoth Project. Wikis marked with a caret (^) have been successfully implemented in Thoth.
The transfer protocols and file specifications listed in this wiki are those that Thoth currently supports or plans to support. Any updates are welcome at info@thoth.pub.
The Thoth User Manual, for publishers and other creators of metadata records in Thoth, can be found here.
Data formats are the formats in which the textual content of an [[object]] is commonly digitally encoded. The Thoth project is agnostic as to which data format is used.
- [[AZW3]]
- [[EPUB]]
- [[HTML]]
- [[KFX]]
- [[MOBI]]
- [[PDF]]
- [[XML]]
Metadata formats are the formats in which the [[metadata]] of an [[object]] are stored. Metadata formats have been developed as part of software packages or as consortial standards and may be constructed on top of higher-level metadata syntax. Different stakeholders in the [[scholcomm pipeline]] use different metadata formats for different purposes. Moreover, different stakeholders maintain different subsets of the metadata potentially contained in specific formats. Thoth aims for a holistic approach, being able to export as many formats in as many flavors as possible. [[Dublin Core]] defines minimal metadata requirements that can be expressed in [[RDF]], [[JSON]], or [[XML]] based formats. [[CRediT]]* defines guidelines for the definition of contributor metadata.
- [[CSV]]*^ or [[XLSX]]
- [[KBART]]*^
- [[DCAT]]
- [[JSON]]^
- [[commonmeta.org]]
- [[MAB2]]
- [[MARC 21]]
- [[RDF]]
- [[BIBFRAME 2.0]]*
- [[SKOS]]
- [[Rioxx]]
- [[XML]]
- [[MABXML]]
- [[MARCXML]]*
- [[ONIX 2.1]]*^
- [[ONIX 3.0]]*^
- [[BITS XML]]
- [[OPDS]]
- [[WikiData]]
- [[WikiCite]]
- [[XML]]
- [[OAI-PMH]]
- [[RSS]]
- [[BibTeX]]*^
- [[CSL]]
- [[ENL]]
- [[ENW]]
- [[RefMan]]
- [[RIS]]
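As an illustration of how one record can be expressed in several of the formats listed above, the following minimal sketch serializes the same metadata as [[Dublin Core]] elements in both [[JSON]] and [[XML]]. The record, its field names, and the `metadata` wrapper element are hypothetical and chosen for illustration only; this is not Thoth's schema or export code.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record; keys are Dublin Core element names, values are made up.
record = {
    "title": "An Example Monograph",
    "creator": ["Author, Alice", "Editor, Bob"],
    "publisher": "Example Press",
    "date": "2021",
    "identifier": "https://doi.org/10.0000/example",
}


def to_dc_json(rec):
    """Serialize the record as JSON with 'dc:'-prefixed keys."""
    return json.dumps({f"dc:{key}": value for key, value in rec.items()}, sort_keys=True)


def to_dc_xml(rec):
    """Serialize the record as XML, one Dublin Core element per value."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for key, value in rec.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(root, f"{{{DC_NS}}}{key}").text = item
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_dc_json(record))
    print(to_dc_xml(record))
```

Repeatable elements such as `creator` are written once per value in XML and as an array in JSON, which is one reason the same record can look quite different across formats.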
Persistent identifiers allow the unique identification of an [[object]] such as a book or chapter, institution, project, or [[maker]].
- [[DOI]]^
- [[ISBN]]^
- [[ISSN]]^
- [[LCCN]]^
- [[OAI Identifier]]
- [[OCLC Number]]^
- [[CrossRef]]*^
- [[DataCite]]
- [[EZID]]
- [[Bowker]] (USA)
- List of National Agencies (Wikipedia)
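Several of the identifiers above carry a built-in check digit that can be verified before a record is stored. The sketch below shows the standard ISBN-13 check: the first twelve digits are weighted alternately by 1 and 3, and the thirteenth digit must bring the total to a multiple of 10. The function name is hypothetical, and the sketch checks only the arithmetic, not whether the ISBN has actually been assigned.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Return True if the string is a 13-digit ISBN with a correct check digit."""
    # Hyphens and spaces are presentation only; strip them before checking.
    stripped = isbn.replace("-", "").replace(" ", "")
    if len(stripped) != 13 or not (stripped.isascii() and stripped.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(stripped[:12])
    )
    expected_check = (10 - total % 10) % 10
    return expected_check == int(stripped[12])


if __name__ == "__main__":
    print(isbn13_is_valid("978-3-16-148410-0"))
```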
There are several databases of institution identifiers. The largest is GRID (98,598 records), which is freely downloadable. Funder Registry (21,242 records) is run by Crossref, which provides Crossref funder IDs. An open and community-led option is ROR, whose database is correlated with GRID, Wikidata, and ISNI.
- [[Funder Registry]]*^
- [[GRID]] (superseded by [[ROR]])
- [[ISNI]]
- [[ROR]]*^
- [[Wikidata]]
- [[RAiD]]*
- [[ORCiD]]*^
- [[ISNI]]*
Most subject identifiers are copyrighted schemata with limited reach, especially the schemes of [[Bowker]] and [[Amazon]]. The only open standard is [[Thema]].
- [[BIC]]^ (superseded by [[Thema]])
- [[BISAC]]^
- [[Bowker]]
- [[FAST]]
- [[LCC]]
- [[Thema]]^
- [[DORA]]
- [[Dublin Core]]
- [[FAIR]]
- [[Metadata2020]]
The following categorization into Content Platforms and Distributors and Catalogs and Indices has been adapted from Michael Clarke and Laura Ricci's draft report OA Books Supply Chain Mapping.
| Name | Governance | Membership | OA | non-OA | Ingest |
|---|---|---|---|---|---|
| [[EBSCO ebooks]]*^ | Commercial | Y | Y | Y | Push |
| [[JSTOR]]*^ | Non-Profit | Y | Y | Y | Push |
| [[Project MUSE]]*^ | Non-Profit | Y | Y | Y | Push |
| [[ProQuest Ebook Central]]^ | Commercial | Y | Y | Y | Push |
| Name | Governance | Membership | OA | non-OA | Ingest |
|---|---|---|---|---|---|
| [[Cairn]] | | | | | |
| [[CEEOL]] | | | | | |
| [[CLACSO]] | Non-Profit | ? | Y | N | ? |
| [[Érudit]] | Non-Profit | ? | Y | Y | ? |
| [[Finna.fi]] | Non-Profit | ? | Y | Y | ? |
| [[Hathi Trust Digital Library]] | Non-Profit | ? | Y | N | Push |
| [[Internet Archive]]*^ | Non-Profit | N | Y | Y | Push |
| [[OAPEN]]*^ | Non-Profit | Y | Y | N | Push |
| [[Open Bookshelf]] (DPLA) | Non-Profit | ? | Y | Y | ? |
| [[OpenEdition]] | Non-Profit | ? | Y | Y | ? |
| [[Open Library]] | Non-Profit | Y | Y | Y | Push |
| [[Project Gutenberg]] | Non-Profit | N | Y | N | Push |
| [[ScIELO Books]] | Non-Profit | ? | Y | Y | ? |
| [[Standard Ebooks]] | Non-Profit | ? | Y | Y | ? |
| [[TOME]] (figshare) | Non-Profit | ? | Y | Y | ? |
| [[UC Digitalis]] | | | | | |
| [[Verdensbiblioteket]] | Non-Profit | N | Y | N | ? |
| [[Wikibooks]] | Non-Profit | N | Y | N | ? |
| [[Wikisource]] | Non-Profit | N | Y | N | ? |
| Name | Governance | Membership | OA | non-OA | Ingest |
|---|---|---|---|---|---|
| [[Academia.edu]] | For Profit | Y | Y | Y | Push |
| [[ResearchGate]] | For Profit | Y | Y | Y | Push |
| Name | Governance | Membership | OA | non-OA | Ingest |
|---|---|---|---|---|---|
| [[Aaaaarg.fail]]* | Shadow | N | Y | Y | Push |
| [[Library Genesis]]* | Shadow | N | Y | Y | Push |
| [[Memory of the World]]* | Shadow | N | Y | Y | Push |
Check the List of National and State Libraries (Wiki)
- [[British Library]]*
- [[Library of Congress]]*
- [[National Library of France]]
- [[National Library of Germany]]
- [[National Library of Scotland]]
- [[National Library of Spain]]
- [[National Library of Sweden]]
- [[Europeana]]
- [[Digital Public Library of America]] (DPLA)
- [[Google Play Books]]*
- [[Kindle]]
- [[Kobo]]
- [[Nook]]
Ebook distributors differ from Digital Libraries in the sense that they do not claim to offer a scholarly function, be that to research institutions or to the general public. Distributors repackage and normalize ebook metadata. Most ebook distributors operate some form of monetization scheme, which may not be hospitable to OA books.
- [[Axiell]]
- [[Open Edition]]
- [[Open Research Library]]
- [[OverDrive]]*^
- [[RNIB Bookshare]]
- [[StreetLib]]
- [[Unglue.it]]
Platforms that aggregate and host metadata, promoting discovery of particular titles.
Catalog management systems for individual libraries.
- [[ExLibris]] Alma (ProQuest)
- [[Folio]]
- [[Koha]]
- [[VuFind]]
- [[Evergreen]]
- [[Worldshare]] (OCLC)
Library-agnostic, global content indices.
- [[Central KnowledgeBase]] (ProQuest ExLibris Alma)
- [[BDSLive]]*
- [[EBSCO Knowledge Base]]
- [[JISC Knowledge Base+]] (KB+)
- [[ProQuest 360 Core]]
- [[WikiData]]
- [[WorldCat KnowledgeBase]] (OCLC)
- [[Diamond Discovery Hub]] (OpenAIRE/EOSC/DIAMAS/CRAFT-OA)
- [[DOAB]]*^
- [[Unpaywall]]
Search engines built on top of publication databases and tailored toward researchers. See Jeroen Bosman's excellent Scholarly search engine comparison for useful information regarding availability of publication types, etc. See also the recent comparative study by Visser, Van Eck, and Waltman.
- [[BASE]] (Bielefeld Academic Search Engine)
- [[Connected Papers]]
- [[Dimensions]]
- [[FAIRsFAIR]]
- [[FatCat]] (Internet Archive)
- [[FederatedFAIR]]
- [[IEEE Xplore]]
- [[Inciteful]]
- [[Internet Archive Scholar]] (Internet Archive)
- [[ISIDORE]] (huma-num)
- [[Lens]]
- [[LexisNexis]]
- [[Library Hub Discover]]* (Jisc)
- [[MathSciNet]]
- [[OpenAIRE Research Graph]]
- [[OpenAlex]]
- [[Open Research Knowledge graph]]
- [[OpenTexts.World]]
- [[Orion Search]]
- [[PubMed]]
- [[ResearchGraph]]
- [[S2ORC - Semantic Scholar Open Research Corpus]]
- [[ScienceOpen]]
- [[SciFinder]]
- [[Scopus]] (RELX)
- [[Semantic Scholar]]
- [[SHARE]] (ARL/COS)
- [[Summon]] (ProQuest)
- [[Triple]]
- [[Web of Science]] (Clarivate)
- [[WorldCat Discovery]]* (OCLC)
- [[Google Scholar]]
- [[Microsoft Academic]]
- [[Semantic Scholar]]
Bibliographies managed by scholarly organizations related to a specific field of inquiry.
- [[MLA Bibliography]]
- [[Database of Medieval Digital Resources]] (Medieval Academy of America)
All currently available bibliographic management platforms support import from [[BibTeX]] and [[RIS]]. Overall, these appear to be the two major formats. An excellent overview of every single platform is provided on the OpenOffice wiki.
| Name | Target User | Type | [[BibTeX]] | [[RIS]] |
|---|---|---|---|---|
| [[BibDesk]] | Individual | Open Source | Y | Y |
| [[Bibliographix]] | Individual | Open Source | Y | Y |
| [[Biblioscape]] | Individual | Commercial | Y | Y |
| [[Bibus]] | Individual | Open Source | ? | ? |
| [[Citavi]] | Individual | Commercial | Y | Y |
| [[EndNote]] | Individual | Commercial | ? | Y |
| [[Mendeley]] | Individual | Commercial | Y | Y |
| [[Papers]] | Individual | Commercial | Y | Y |
| [[RefWorks]] | Individual | Commercial | ? | Y |
| [[Zotero]]* | Individual | Open Source | Y | Y |
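Because [[BibTeX]] and [[RIS]] are the two formats these platforms share, a single record rendered in both reaches most of them. The sketch below is a minimal illustration of what such output looks like; the record, its field names, and the citation key are hypothetical, and the code does not escape special characters or cover every tag either format defines.

```python
# Hypothetical book record; values are invented for illustration.
record = {
    "title": "An Example Monograph",
    "authors": ["Author, Alice", "Editor, Bob"],
    "publisher": "Example Press",
    "year": "2021",
    "doi": "10.0000/example",
}


def to_ris(rec):
    """Render the record as RIS: one 'TAG  - value' line per field."""
    lines = ["TY  - BOOK"]  # record type comes first
    lines += [f"AU  - {author}" for author in rec["authors"]]  # one AU line per author
    lines += [
        f"TI  - {rec['title']}",
        f"PB  - {rec['publisher']}",
        f"PY  - {rec['year']}",
        f"DO  - {rec['doi']}",
        "ER  - ",  # end-of-record marker
    ]
    return "\n".join(lines)


def to_bibtex(rec, key):
    """Render the record as a BibTeX @book entry with the given citation key."""
    fields = {
        "title": rec["title"],
        "author": " and ".join(rec["authors"]),  # BibTeX separates names with 'and'
        "publisher": rec["publisher"],
        "year": rec["year"],
        "doi": rec["doi"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@book{{{key},\n{body}\n}}"


if __name__ == "__main__":
    print(to_ris(record))
    print(to_bibtex(record, "example2021"))
```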
- [[Bibliosuite]]
- [[BookSonix]]
- [[Consonance]]
- [[Firebrand]]
- [[Klopotek]]
Publishing platforms allow authors, editors, and publishers to collaborate in a digital, in-browser environment, with a potential to radically transform publishing production pipelines.
- [[Editoria]]
- [[Fulcrum]]
- [[Manifold]]
- [[Open Monograph Press]] (PKP)
- [[Pressbooks]]
- [[PubPub]]
- [[Scalar]]
Only two main print book distributors are available to OA publishers who wish to use print-on-demand for their hardcopy publications: [[Amazon KDP]] and [[Ingram Lightning Source]]. Both require the manual input of metadata and the upload of print-ready PDF files, without apparent batch or automated upload options.
Open Educational Resources (OER) focus mainly on textbooks rather than scholarly publications.
| Name | Governance | Membership | OA | non-OA | Ingest |
|---|---|---|---|---|---|
| [[BCcampus OpenEd]] | Public | N | Y | N | Push |
| [[Merlot]] | Public | Y | Y | N | Push |
| [[OER Commons]] | Non-Profit | Y | Y | N | Push |
| [[Open Textbook Library]] | Non-Profit | N | Y | N | Push |
- [[OASIS]]
- [[Mason OER Metafinder]]
- [[OERWorldmap]]
- [[Pressbooks Directory]]
For open archiving, the Open Archival Information System ([[OAIS]]) reference model is defined as an ISO standard (ISO 14721).
See also list: https://coptr.digipres.org/Category:Preservation_System
- [[Archive-It]]
- [[Archivematica]]
- [[Arkivum Perpetua]]
- [[APTrust]]
- [[CINES]]
- [[CLOCKSS]]
- [[Conifer]]
- [[Digital Bedrock]]
- [[Emulation as a Service Infrastructure (EaaSI)]]
- [[Fulcrum]]
- [[GitHub Archive Programme]]
- [[HathiTrust]]
- [[Internet Archive]]
- [[LIBSAFE]]
- [[LOCKSS]]
- [[MetaArchive]]
- [[Perma.cc]]
- [[Portico]]
- [[Preservica]]
- [[ReplayWeb.Page]]
- [[RODA]]
- [[Samvera]]
- [[Wayback Machine]]
- [[WebCite]]
- [[Zenodo]]
- [[XML]]
- [[METS]]
- [[PREMIS]]
- [[SWORD]]
- [[ARC]]
- [[BagIt]]
- [[HAR]]
- [[WACZ]]
- [[WARC]]
- [[WBN]]
- [[TAR]]
- [[ZIP]]
- [[G4]]
- [[ITU]]
- [[JP2 (JPEG 2000 part 1)]]
- [[TIFF]]
- [[CDX]]
- [[Unicode OCR]] (??)
- [[DSpace]]
- [[EPrints]]
- [[Figshare]]
- [[LibreCat]]
- [[Invenio]] (Zenodo)
