Commit aeec310

Merge pull request #4 from MoseleyBioinformaticsLab/merge_enhancement
Merge enhancement
2 parents 4776d06 + ab9dfaf commit aeec310

172 files changed (+311485, -12366 lines changed)
.gitignore

Lines changed: 3 additions & 1 deletion
@@ -12,4 +12,6 @@ dist/
 coverage.xml
 .coverage
 htmlcov/
-README_old.rst
+README_old.rst
+testing_scratch/
+tests/testing_files/new_intermediate_results/

README.rst

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ and use the example there to create it initially. The add_authors command can he
 with building the Authors section if you already have a csv file with author
 information. A good tool to help track down pesky JSON syntax errors is `here <https://csvjson.com/json_validator>`__.
 There are also examples in the `example_configs <https://github.com/MoseleyBioinformaticsLab/academic_tracker/tree/main/example_configs>`__
-directory of the GitHub repo. There are also more example in the supplemental
+directory of the GitHub repo. There are also more examples in the supplemental
 material of the paper https://doi.org/10.6084/m9.figshare.19412165.


docs/api.rst

Lines changed: 2 additions & 0 deletions
@@ -31,5 +31,7 @@ API
 :members:
 .. automodule:: academic_tracker.webio
 :members:
+.. automodule:: academic_tracker.emails_and_reports_helpers
+:members:


docs/changelog.rst

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
Change Log
2+
==========
3+
4+
Version 2.0.0
5+
~~~~~~~~~~~~~
6+
7+
Changes
8+
-------
9+
In the 1.0.0 version each source was queried in a certain order and if later sources found the
10+
same publicaiton as a previous one it was simply ignored. Now a best attempt is made to try and
11+
merge information from the previous source with information from later sources. An additional
12+
"queried_sources" attribute was added to the publication object created for each publication to
13+
indicate all of the sources where the publication was found. It is a list field, and each source
14+
is appended to it as it is found.
15+
16+
Enhancements
17+
------------
18+
A "references" attribute was added to the publication object for each publication and the references
19+
for the publication will appear there if available. It is a list of objects that have the attributes
20+
"citation", "title", "PMID", "PMCID", and "DOI". Fields that can't be determined will have a null value.
21+
22+
More information is able to be obtained from PubMed, such as DOI author affiliations, and author ORCIDs.
23+
24+
Collective authors can now be specified and are handled appropriately when present on information from
25+
queried sources.
26+
27+
All new publication attributes were added to the reporting and the documentation updated.
28+
29+
The raw queries from each source can now be saved using the --save-all-queries option. An "all_results.json"
30+
file will be saved in the output if the option is given.
31+
32+
The --keep-duplicates option was added to reference_search. This allows the user to force the search
33+
not to drop what it deems as duplicates. The default is that they are still dropped automatically, but
34+
this option allows for an override when the program thinks, incorrectly, that 2 references are the same.
35+
36+
Bug Fixes
37+
---------
38+
Crossref publication dates will now have day and month when available. A bug made it so only the year
39+
was captured even if month and day were available.
40+
41+
42+
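
The merge behavior described in the new change log can be pictured with a small sketch. This is not Academic Tracker's actual merge code: the function name ``merge_publication``, the ``doi`` and ``title`` keys, and the rule "only fill fields that are empty" are assumptions made for illustration. Only the "queried_sources" list and the reference attributes ("citation", "title", "PMID", "PMCID", "DOI") come from the change log itself.

```python
import copy


def merge_publication(existing, incoming, source):
    """Hypothetical helper: fill empty fields of `existing` from `incoming`.

    Values already present in `existing` are kept, and `source` is appended
    to the "queried_sources" list, as the change log describes.
    """
    merged = copy.deepcopy(existing)  # leave the caller's record untouched
    for key, value in incoming.items():
        if key == "queried_sources":
            continue  # handled separately below
        # Assumed rule: treat None and empty containers/strings as "missing".
        if merged.get(key) in (None, "", [], {}):
            merged[key] = value
    merged.setdefault("queried_sources", [])
    if source not in merged["queried_sources"]:
        merged["queried_sources"].append(source)
    return merged


# Illustrative records; all values are made up.
pubmed_record = {
    "title": "An example title",
    "doi": None,
    "references": [],
    "queried_sources": ["PubMed"],
}
crossref_record = {
    "title": "An example title",
    "doi": "10.1000/example",
    "references": [
        {"citation": "Doe J. Example. 2020.", "title": "Example",
         "PMID": None, "PMCID": None, "DOI": None},
    ],
}

merged = merge_publication(pubmed_record, crossref_record, "Crossref")
print(merged["queried_sources"])
```

The later source only contributes values the earlier source lacked, which is one plausible reading of "a best attempt is made to try and merge information"; the real rules may differ.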

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ Documentation index:
 api
 license
 todo
+changelog


 Indices and tables

docs/jsonschema.rst

Lines changed: 10 additions & 155 deletions
@@ -136,7 +136,11 @@ to search for goes. Every author in this section will be queried during author_s

 The first_name and last_name attributes are for the author's first and last names
 respectively, and are used to validate that the author under search is the same
-as the queried author.
+as the queried author. There is a special type of author known as collective authors.
+These are not individuals, but are instead a collective and are published that way.
+Use the collective_name attribute to indicate that an author is a collective. This
+attribute takes priority, so if it is present the author will be treated as a collective
+author even if they have first_name and last_name attributes.

 pubmed_name_search is used as the query string when querying sources. This is so
 the user can specify exactly what to query rather than simply querying the first
@@ -170,161 +174,12 @@ gen_reports_and_emails_auth

 Validating Schema
 -----------------
-.. code-block:: console

-{
-"$schema": "https://json-schema.org/draft/2020-12/schema",
-"title": "Configuration JSON",
-"description": "Input file that contains information for how the program should run.",
-
-"type": "object",
-"properties": {
-"project_descriptions" : {
-"type": "object",
-"minProperties": 1,
-"additionalProperties": {
-"type":"object",
-"properties":{
-"grants": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
-"cutoff_year": {"type": "integer"},
-"affiliations": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
-"project_report": {"type": "object",
-"properties":{
-"columns": {"type": "object",
-"minProperties":1,
-"additionalProperties": {"type": "string", "minLength":1}},
-"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"separator":{"type":"string", "maxLength":1, "minLength":1},
-"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"file_format":{"type":"string", "enum":["csv", "xlsx"]},
-"filename":{"type":"string", "minLength":1},
-"template": {"type": "string", "minLength":1},
-"from_email": {"type": "string", "format": "email"},
-"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"email_body": {"type": "string", "minLength":1},
-"email_subject": {"type": "string", "minLength":1},},
-"dependentRequired":{
-"from_email": ["email_body", "email_subject"],
-"to_email": ["from_email", "email_body", "email_subject"]}},
-"collaborator_report": {"type": "object",
-"properties":{
-"columns": {"type": "object",
-"minProperties":1,
-"additionalProperties": {"type": "string", "minLength":1}},
-"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"separator":{"type":"string", "maxLength":1, "minLength":1},
-"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"file_format":{"type":"string", "enum":["csv", "xlsx"]},
-"filename":{"type":"string", "minLength":1},
-"template": {"type": "string", "minLength":1},
-"from_email": {"type": "string", "format": "email"},
-"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"email_body": {"type": "string", "minLength":1},
-"email_subject": {"type": "string", "minLength":1},},
-"dependentRequired":{
-"from_email": ["email_body", "email_subject"],
-"to_email": ["from_email", "email_body", "email_subject"]},},
-"authors": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
-},
-
-"required": ["grants", "affiliations"]
-}
-},
-
-"ORCID_search" : {"type":"object",
-"properties": {
-"ORCID_key": {"type": "string", "minLength":1},
-"ORCID_secret": {"type": "string", "minLength":1}},
-"required": ["ORCID_key", "ORCID_secret"]},
-"PubMed_search" : {"type":"object",
-"properties": {
-"PubMed_email": {"type": "string", "format":"email"}},
-"required":["PubMed_email"]},
-"Crossref_search" : {"type":"object",
-"properties": {
-"mailto_email": {"type": "string", "format":"email"}},
-"required":["mailto_email"]},
-"summary_report" : {"type": "object",
-"properties":{
-"columns": {"type": "object",
-"minProperties":1,
-"additionalProperties": {"type": "string", "minLength":1}},
-"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"separator":{"type":"string", "maxLength":1, "minLength":1},
-"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"file_format":{"type":"string", "enum":["csv", "xlsx"]},
-"filename":{"type":"string", "minLength":1},
-"template": {"type": "string", "minLength":1},
-"from_email": {"type": "string", "format": "email"},
-"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"email_body": {"type": "string", "minLength":1},
-"email_subject": {"type": "string", "minLength":1},},
-"dependentRequired":{
-"from_email": ["email_body", "email_subject", "to_email"]}},
-"Authors" : { "type": "object",
-"minProperties": 1,
-"additionalProperties": {
-"type": "object",
-"properties":{
-"first_name": {"type": "string", "minLength":1},
-"last_name":{"type": "string", "minLength":1},
-"pubmed_name_search": {"type": "string", "minLength":1},
-"email":{"type": "string", "format":"email"},
-"ORCID":{"type": "string", "pattern":"^\d{4}-\d{4}-\d{4}-\d{3}[0,1,2,3,4,5,6,7,8,9,X]$"},
-"grants": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
-"cutoff_year": {"type": "integer"},
-"affiliations": {"type": "array", "minItems":1, "items": {"type": "string", "minLength": 1}},
-"scholar_id": {"type": "string", "minLength":1},
-"project_report": {"type": "object",
-"properties":{
-"columns": {"type": "object",
-"minProperties":1,
-"additionalProperties": {"type": "string", "minLength":1}},
-"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"separator":{"type":"string", "maxLength":1, "minLength":1},
-"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"file_format":{"type":"string", "enum":["csv", "xlsx"]},
-"filename":{"type":"string", "minLength":1},
-"template": {"type": "string", "minLength":1},
-"from_email": {"type": "string", "format": "email"},
-"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"email_body": {"type": "string", "minLength":1},
-"email_subject": {"type": "string", "minLength":1},},
-"dependentRequired":{
-"from_email": ["email_body", "email_subject"],
-"to_email": ["from_email", "email_body", "email_subject"]}},
-"collaborator_report": {"type": "object",
-"properties":{
-"columns": {"type": "object",
-"minProperties":1,
-"additionalProperties": {"type": "string", "minLength":1}},
-"sort": {"type": "array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"separator":{"type":"string", "maxLength":1, "minLength":1},
-"column_order":{"type":"array", "uniqueItems":True, "items": {"type": "string", "minLength":1}, "minItems":1},
-"file_format":{"type":"string", "enum":["csv", "xlsx"]},
-"filename":{"type":"string", "minLength":1},
-"template": {"type": "string", "minLength":1},
-"from_email": {"type": "string", "format": "email"},
-"cc_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"to_email": {"type": "array", "items": {"type": "string", "format": "email"}},
-"email_body": {"type": "string", "minLength":1},
-"email_subject": {"type": "string", "minLength":1},},
-"dependentRequired":{
-"from_email": ["email_body", "email_subject"],
-"to_email": ["from_email", "email_body", "email_subject"]},},
-},
-"required" : ["first_name", "last_name", "pubmed_name_search"]
-
-}
-}
-
-},
-"required": ["project_descriptions", "ORCID_search", "PubMed_search", "Crossref_search", "Authors"]
-}
+.. literalinclude:: ../src/academic_tracker/tracker_schema.py
+:start-at: config_schema
+:end-before: ## config_end
+:language: none
+


 Example
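
As a rough illustration of the collective author rule added to docs/jsonschema.rst in this commit (collective_name takes priority over first_name and last_name), the sketch below classifies entries of an Authors section. The helper name ``is_collective`` and the sample author entries are hypothetical; only the attribute names collective_name, first_name, last_name, and pubmed_name_search come from the documentation text.

```python
def is_collective(author_attributes):
    """Hypothetical check: an author is a collective if collective_name is present.

    Per the documentation, collective_name takes priority even when
    first_name and last_name are also given.
    """
    return "collective_name" in author_attributes


# Made-up Authors section for illustration only.
authors = {
    "Jane Doe": {
        "first_name": "Jane",
        "last_name": "Doe",
        "pubmed_name_search": "Jane Doe",
    },
    "Example Consortium": {
        "collective_name": "Example Consortium",
        # Present, but ignored for classification because collective_name wins.
        "first_name": "Example",
        "last_name": "Consortium",
        "pubmed_name_search": "Example Consortium",
    },
}

kinds = {
    name: "collective" if is_collective(attributes) else "individual"
    for name, attributes in authors.items()
}
print(kinds)
```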
