Added Psychological Science scraper by chartgerink · Pull Request #40 · ContentMine/journal-scrapers

chartgerink · 2015-07-25T08:52:28Z

Hi,

I attempted to write my first scraper, following your scraperJSON template, and succeeded for the most part. I have also included test links. I tried to scrape as much information as possible, and list the problems I ran into below, FYI.

Kind regards,
Chris

The Introduction is not a defined section; it is only marked by paragraph numbers (a SAGE thing).
Supplementary materials are hosted at a separate location AND bundle all files of one issue. I have not found an easy way to download these (also a SAGE thing).
I have not yet succeeded in downloading figures and tables.
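
For readers unfamiliar with the format, here is a minimal sketch of what a scraperJSON-style definition looks like, written as a Python dict with a small structural check. It is not the contents of the scraper in this PR: the key names (`url`, `elements`, `selector`, `attribute`, `download`) are my reading of the scraperJSON layout, and the URL pattern, XPath selectors, and the helpers `check_definition` and `matches` are hypothetical.

```python
import re

# Hypothetical scraperJSON-style definition. The URL pattern and the XPath
# selectors are placeholders, not taken from the real scraper file.
definition = {
    "url": r"example\.org/content",  # regex deciding which pages this scraper handles
    "elements": {
        "title": {
            "selector": "//meta[@name='citation_title']",
            "attribute": "content",
        },
        "fulltext_pdf": {
            "selector": "//meta[@name='citation_pdf_url']",
            "attribute": "content",
            "download": True,  # assumed flag: fetch the linked file
        },
    },
}


def check_definition(defn):
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    url = defn.get("url")
    if not isinstance(url, str):
        problems.append("'url' must be a string holding a regular expression")
    else:
        try:
            re.compile(url)
        except re.error as exc:
            problems.append(f"'url' is not a valid regex: {exc}")
    elements = defn.get("elements")
    if not isinstance(elements, dict) or not elements:
        problems.append("'elements' must be a non-empty object")
        return problems
    for name, spec in elements.items():
        if not isinstance(spec, dict) or not isinstance(spec.get("selector"), str):
            problems.append(f"element '{name}' needs a string 'selector'")
    return problems


def matches(defn, page_url):
    """True if the definition's URL pattern is found in page_url."""
    return re.search(defn["url"], page_url) is not None
```

A check like this only catches malformed definitions; whether the selectors actually find anything on a publisher's page still has to be verified against the test links.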


tarrow · 2016-06-06T15:41:42Z

I rewrote this into #44 with a rebase: it needed a rebase before the 176 commits could be merged into master.

chartgerink · 2016-08-18T09:38:02Z

I did some additional checking of the scrapers. I removed tf.json because it conflicted with taylorfrancis.json (which is a clearer filename, I think) and because taylorfrancis.json performed better. I incorporated the code from tf.json for the tables. TaylorFrancis really acts oddly, so we need to check that at some point.

I also checked and updated the wiley, sage, springer, and elsevier scrapers. Elsevier contains almost no metadata, so the scraper only uses HTML and PDF extraction. I also incorporated some changes, but they eliminated a lot of metadata scraping and renamed elements. Are we still adhering to the scraperJSON standard, or did that become a thing of the past?

Sorry for the number of commits, I forgot about this. I can also create a new fork to make things easier and open a new PR. Let me know.

tarrow · 2016-08-18T09:55:23Z

We're still adhering to the scraperJSON standard, not least because QS and thresher are the reference implementations, so basically if it works it's scraperJSON :).

If you could create a new branch from the current origin/master and cherry-pick the changes you've just made onto it, that would be awesome! Otherwise I can do that and make another PR. Let me know if you have problems.

tarrow · 2016-08-18T10:42:24Z

Great! I'll merge now! :)

blahah and others added 30 commits May 26, 2014 23:54

Update README.md

1ecac9e

update example

e05e2a3

fix README formatting and typo

a35fecd

add html and text special attributes to README

acb9e7c

Update README.md

a2a6d26

Create science_direct.json

ebea30a

Initial work on science direct/elsevier scraper.

Merge pull request #5 from ianthe/master

f6380f9

Create science_direct.json

travis setup

40b80a4

auto test generation script

13ef95b

move scrapers to subdir

073a4da

test generator script fixes

4962bab

self-populating tests and peerj example

cdb542b

Merge branch 'master' of github.com:ContentMine/journal-scrapers

3118404

move sciencedirect to scrapers

0ee30be

fix tmpdir use

c1cafdd

debug test generator tmpdir error

1cdbd3b

test set for peerj scraper

9c82fa0

fix test generator - now working

4e94869

fix test runner - now working

ad734f3

attempted fix for travis dependency install

86c3735

remove unneeded prints from tests

1ce66d6

tests for plos scraper

bf2c926

another attempted travis install fix

9b1f20d

delete wayward results file

aa0cfc6

add .gitignore

998b0de

add travis badge and explanation to README

fb463a8

tidy formatting in README

ab88faf

add science direct tests

8e9d287

Merge branch 'master' of https://github.com/ContentMine/journal-scrapers

04902e3

add CC0 license

da2a0f2

blahah and others added 22 commits August 8, 2015 04:53

Rename PNAS fulltext downloads (ContentMine/quickscrape#54)

4782e3d

Fix fulltext HTML capture for PNAS

e674803

IJSEM scraper

f79917c

First run at Psychological Science scraper (added json file, test links and ran test sometime before breaking everything

659573b

Modified psychologicalscience.json with some dynamic selectors

e91dc98

Ran tests for Psych Science scraper

acaf40a

Upgrade to new container-based Travis CI

1be754a

Remove unnecessary setup

c92010e

Flushing test changes

252e17f

Merge branch 'master' of https://github.com/contentmine/journal-scrapers

83c3cfd

Rename psychscience scraper to sage scraper

83c9d92

Minor update sage

2cc51db

Update test links sage

6349134

Add springer test urls

bc83997

Add taylorfrancis test urls

93f5287

Add wiley test urls

066890b

Add APA test urls

4f36179

First run at wiley scraper (fails figure download and pdf download)

b186403

Add elsevier, wiley, springer, sage; init apa

a215a42

minor changes to elsevier and springer, while testing

9d36fa4

Add TaylorFrancis scraper

7e861d1

Minor updates to springer and taylorfrancis definitions

187b323

petermr force-pushed the master branch from 1a1ab44 to fbedb6e Compare June 6, 2016 14:31

Check and update taylorfrancis, sage, elsevier, springer, wiley scrapers

a8ebe3e

tarrow merged commit 2cf1206 into ContentMine:master Aug 18, 2016

tarrow mentioned this pull request Aug 18, 2016

Revert "Added Psychological Science scraper" #47

Merged

tarrow merged 177 commits into ContentMine:master from chartgerink:master


13 participants
