Add dataset: chronicling_america #85

### A URL for this dataset

https://chroniclingamerica.loc.gov/about/api/#bulk-data

### Dataset description

Chronicling America is a Library of Congress project to digitise historic newspapers. The collection is mostly in English but also includes other languages. Breakdown by language: https://public.tableau.com/app/profile/chronicling.america#!/vizhome/ChroniclingAmericaLanguageCoverageBubble/All_Lang

The data can be accessed through bulk downloads or an API. The API may be the most helpful route (via a dataset loading script) because this dataset is not static: more titles are digitised and added on a rolling basis.

The 'newspapers' API (https://chroniclingamerica.loc.gov/newspapers.json) is probably the best starting point. It returns a list of newspaper titles for which digital content is held. Each title, e.g. https://chroniclingamerica.loc.gov/lccn/sn86072192.json, has its own record containing metadata about that title.

<img width="549" alt="Screenshot 2022-09-27 at 16 32 26" src="https://user-images.githubusercontent.com/8995957/192570176-760181e5-6b04-405b-923d-2658c88ed2eb.png">

This API also contains all the issues for that title. Each issue gives you a set of pages. Each page has the plain text generated by OCR for that page, e.g. https://chroniclingamerica.loc.gov/lccn/sn82014726/1888-04-07/ed-1/seq-1/ocr.txt, and a link to the image of that page, e.g. https://chroniclingamerica.loc.gov/lccn/sn82014726/1888-04-07/ed-1/seq-1.jp2.
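The two example URLs above share one page stem, so a loading script could derive both from the same four values. Below is a minimal sketch, assuming the `lccn/<id>/<date>/ed-<n>/seq-<n>` pattern seen in those examples holds for every page. The function name `page_urls` and its parameters are hypothetical and used only for illustration.

```python
from datetime import date

BASE_URL = "https://chroniclingamerica.loc.gov"


def page_urls(lccn: str, issued: date, edition: int, sequence: int) -> dict:
    """Build the OCR text and image URLs for a single newspaper page.

    Assumption: the URL layout is inferred from the two example URLs in
    this issue, not taken from official API documentation.
    """
    stem = f"{BASE_URL}/lccn/{lccn}/{issued.isoformat()}/ed-{edition}/seq-{sequence}"
    return {
        "ocr_text": f"{stem}/ocr.txt",  # plain text generated by OCR
        "image": f"{stem}.jp2",         # JPEG 2000 scan of the page
    }


urls = page_urls("sn82014726", date(1888, 4, 7), edition=1, sequence=1)
print(urls["ocr_text"])
print(urls["image"])
```

Called with the values from the example (`sn82014726`, 1888-04-07, edition 1, sequence 1), this reproduces the two URLs quoted above.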

My suggested approach to loading this dataset would be to call `https://chroniclingamerica.loc.gov/newspapers.json` at the start of the script and then, depending on filters defined in the loading script (e.g. a start/end date of interest), build up a list of relevant URLs for the text/images of each page.
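The filtering step of that approach might look roughly like the sketch below. It works on already-fetched issue records, so it makes no network calls. The field name `date_issued`, the `.json` suffix on page URLs, and the helper names `filter_issues` and `page_resource_urls` are all assumptions for illustration and should be checked against a live API response.

```python
from datetime import date
from typing import Dict, Iterable, Iterator, Optional


def filter_issues(
    issues: Iterable[dict],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Iterator[dict]:
    """Yield only the issues published within [start, end] (inclusive).

    Assumption: each issue record has a "date_issued" field holding an
    ISO date string such as "1888-04-07".
    """
    for issue in issues:
        issued = date.fromisoformat(issue["date_issued"])
        if start is not None and issued < start:
            continue
        if end is not None and issued > end:
            continue
        yield issue


def page_resource_urls(page_json_url: str) -> Dict[str, str]:
    """Derive the OCR text and image URLs from a page's JSON URL.

    Assumption: page URLs end in ".json" and share a stem with the
    ocr.txt and .jp2 URLs shown in this issue.
    """
    stem = page_json_url[:-len(".json")] if page_json_url.endswith(".json") else page_json_url
    return {"ocr_text": f"{stem}/ocr.txt", "image": f"{stem}.jp2"}


# Hand-written sample records standing in for a fetched title record.
sample_issues = [
    {"date_issued": "1887-12-31", "url": "https://chroniclingamerica.loc.gov/lccn/sn82014726/1887-12-31/ed-1.json"},
    {"date_issued": "1888-04-07", "url": "https://chroniclingamerica.loc.gov/lccn/sn82014726/1888-04-07/ed-1.json"},
]
selected = list(filter_issues(sample_issues, start=date(1888, 1, 1), end=date(1888, 12, 31)))
```

In a real loading script the issue records would come from the JSON for each title, and the filters would be exposed as configuration options.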

If you want to work on this dataset, please cc @davanstrien and @albertvillanova! 

### Dataset modality

Mixed

### Dataset licence

Other license

### Other licence

https://chroniclingamerica.loc.gov/about/#rights

### How can you access this data

Via an open API

### size of dataset

>10GB

### Confirm the dataset has an open licence

- [X] To the best of my knowledge, this dataset is accessible via an open licence

### Contact details for data custodian

_No response_
