Commit e84df19

Authored by Michael-D-Johnson (Michael D. Johnson), max-zilla, and lmarini
Update user_id in extractions collection (#145)
* adding script to add user_id to documents in extractions collection if it does not exist
* moving authorID == null declaration within loop. renaming findAuthor... to foundAuthor... to be more consistent with foundFile
* updating comments. adding update if job_id exists
* moving UpdateUserId.js to scripts/updates. Updated documentation for script and added contributor

Co-authored-by: Michael D. Johnson <[email protected]>
Co-authored-by: Max Burnette <[email protected]>
Co-authored-by: Luigi Marini <[email protected]>
1 parent fc57aa2 commit e84df19

4 files changed: 53 additions and 2 deletions
CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 
 ### Added
 - Added support for Amplitude clickstream tracking. See Admin -> Customize to configure Amplitude apikey.
+- UpdateUserId.js to scripts/updates. This code adds user_id to each document in extractions collection in mongodb.
+  user_id is taken from author id in uploads.files if exists, else it is taken from author id in datasets collection.
 - Ability to submit multiple selected files within a dataset to an extractor.
 
 ### Fixed

CONTRIBUTORS.md

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Following is a list of contributors in alphabetical order:
 - Mario Felarca
 - Max Burnette
 - Michal Ondrejcek
+- Michael Johnson
 - Michelle Pitcel
 - Mike Bobak
 - Mike Lambert

scripts/updates/README.md

Lines changed: 4 additions & 2 deletions
@@ -9,6 +9,8 @@ are named with the actual name of the update .js
 - update-avatar-url-to-https.js
 
 
-MISCALANOUS SCRIPTS:
+MISCELLANEOUS SCRIPTS:
 
-fix-counts.js: script to redo the counts in clowder
+- fix-counts.js: script to redo the counts in clowder
+
+- UpdateUserId.js: Adds user_id to documents in extractions collection in clowder mongo db. Uses author id in uploads.files if exists, else it takes the author id from datasets collection. Usage: mongo clowder UpdateUserId.js
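
The lookup order in the README entry (author id from uploads.files first, then from the owning dataset) can be illustrated outside MongoDB. The following is a minimal sketch, not the script itself: the arrays `uploadsFiles` and `datasets` are hypothetical in-memory stand-ins for the two collections, and `resolveAuthorId` is an illustrative name that does not appear in the commit.

```javascript
// Hypothetical in-memory stand-ins for the uploads.files and datasets collections.
const uploadsFiles = [
  { _id: "f1", author: { _id: "u1" } },
  { _id: "f2", author: { _id: null } }, // file exists but has no author id
];
const datasets = [
  { _id: "d1", author: { _id: "u2" }, files: ["f2", "f3"] },
];

// Returns the author id for a file: uploads.files first, then the dataset
// that lists the file. Returns null when neither source yields an id.
function resolveAuthorId(fileId) {
  const file = uploadsFiles.find((f) => f._id === fileId);
  let authorId = file && file.author ? file.author._id : null;
  if (authorId == null) {
    const dataset = datasets.find((d) => d.files.includes(fileId));
    authorId = dataset && dataset.author ? dataset.author._id : null;
  }
  return authorId == null ? null : authorId;
}

console.log(resolveAuthorId("f1")); // author taken from uploads.files
console.log(resolveAuthorId("f2")); // falls back to the dataset author
console.log(resolveAuthorId("f4")); // unknown file, no author found
```

The sketch guards against a missing `author` sub-document before reading `_id`; whether such documents occur in a real Clowder database is an assumption.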

scripts/updates/UpdateUserId.js

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+/***
+This code iterates through each document where user_id does not exist and checks if file_id exists
+in uploads.files collection. If a file exists, it grabs the author._id for use as user_id. If it
+does not exist (or if author._id is null), it searches for the file_id in the datasets collection.
+If found, it gets the author._id for use as the user_id. If an author id is found and not null
+an update to the extractions collection is made by adding user_id: author._id.
+***/
+
+db.extractions.find({"user_id":{$exists: 0}}).forEach(function(ext) {
+    let authorID = null;
+    // Looping through each extraction where user_id doesn't exist,
+    // if file_id found in uploads.files, get author._id
+    let foundFile = db.uploads.files.findOne({"_id": ext.file_id})
+    if (foundFile != null) {
+        authorID = foundFile.author._id;
+    }
+
+    // If file not found in uploads.files or if author._id doesn't exist,
+    // look up file_id in datasets, get author.id if found
+    if (foundFile == null || authorID == null) {
+        let foundAuthorInDatasets = db.datasets.findOne({"files": {$in: [ext.file_id]}});
+        if (foundAuthorInDatasets != null) {
+            authorID = foundAuthorInDatasets.author._id;
+        }
+    }
+    if (authorID != null) {
+        // If job_id exists update author._id for all documents with job_id,
+        // else update based on the current document id.
+        if (ext.job_id != null) {
+            // Update user_id for entry in extractions database
+            db.extractions.update({"job_id": ext.job_id}, {
+                "$set": {
+                    "user_id": authorID
+                }
+            });
+        }
+        else {
+            // Update user_id for entry in extractions database
+            db.extractions.update({"_id": ext._id}, {
+                "$set": {
+                    "user_id": authorID
+                }
+            });
+        }
+    }
+});
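
The update step distinguishes two cases: when an extraction has a job_id, the comments say user_id should be set on all documents sharing that job_id; otherwise only the current document is updated. The sketch below shows that intent on a plain array. The `extractions` array and the `assignUserId` helper are hypothetical and not part of the commit. In the legacy mongo shell, `db.collection.update()` modifies only the first matching document unless `{multi: true}` is passed, so reaching every document for a job in one call would likely need that option or `updateMany`.

```javascript
// Hypothetical in-memory stand-in for the extractions collection.
const extractions = [
  { _id: "e1", file_id: "f1", job_id: "j1" },
  { _id: "e2", file_id: "f1", job_id: "j1" },
  { _id: "e3", file_id: "f1" }, // no job_id
];

// Sets user_id on every extraction sharing ext.job_id, or on the single
// extraction matched by _id when ext has no job_id.
// Returns the number of documents modified.
function assignUserId(docs, ext, authorId) {
  const matches = ext.job_id != null
    ? (d) => d.job_id === ext.job_id
    : (d) => d._id === ext._id;
  let updated = 0;
  for (const d of docs) {
    if (matches(d)) {
      d.user_id = authorId;
      updated += 1;
    }
  }
  return updated;
}

const byJob = assignUserId(extractions, extractions[0], "u1"); // e1 and e2
const byId = assignUserId(extractions, extractions[2], "u1");  // e3 only
console.log(byJob, byId);
```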
