
Commit 35145e5

Merge pull request #11489 from QualitativeDataRepository/counter_script_updates

MDC: script updates for counter-processor-1.06+

2 parents 2fb4a0d + 41b2450, commit 35145e5

File tree: 4 files changed (+39, −12 lines)
Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+Not sure what is already in the notes w.r.t. counter-processor updates (separate repo) so I'll try to be ~complete here
+
+### Make Data Count improvements
+
+Counter-processor, used to power Make Data Count metrics in Dataverse, is now maintained in the https://github.com/gdcc/counter-processor repository. Multiple improvements to efficiency and scalability have been made. The example counter_daily.sh and counter_weekly.sh scripts that automate running counter-processor, available from the MDC section of the Dataverse Guides (https://guides.dataverse.org/en/latest/admin/make-data-count.html), have been updated to work with the latest counter-processor release and also include minor improvements.
```
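Since the release note (and the guide it links to) says these scripts are meant to run as cron jobs, a minimal crontab sketch may help; the install paths and run times below are illustrative assumptions, not part of this commit:

```shell
# Hypothetical crontab entries (paths and schedule are assumptions):
# run counter_daily.sh nightly at 01:00, counter_weekly.sh on Sundays at 02:00
0 1 * * * /usr/local/etc/counter_daily.sh
0 2 * * 0 /usr/local/etc/counter_weekly.sh
```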
Lines changed: 30 additions & 10 deletions

```diff
@@ -1,15 +1,17 @@
 #! /bin/bash
+#counter_daily.sh
 
 COUNTER_PROCESSOR_DIRECTORY="/usr/local/counter-processor-1.06"
 MDC_LOG_DIRECTORY="/usr/local/payara6/glassfish/domains/domain1/logs/mdc"
-
-# counter_daily.sh
+COUNTER_PROCESSOR_TMP_DIRECTORY="/tmp"
+# If you wish to keep the logs, use a directory that is not periodically cleaned, e.g.
+#COUNTER_PROCESSOR_TMP_DIRECTORY="/usr/local/counter-processor-1.06/tmp"
 
 cd $COUNTER_PROCESSOR_DIRECTORY
 
-echo >>/tmp/counter_daily.log
-date >>/tmp/counter_daily.log
-echo >>/tmp/counter_daily.log
+echo >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
+date >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
+echo >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
 
 # "You should run Counter Processor once a day to create reports in SUSHI (JSON) format that are saved to disk for Dataverse to process and that are sent to the DataCite hub."
 
@@ -22,15 +24,33 @@ d=$(date -I -d "$YEAR_MONTH-01")
 while [ "$(date -d "$d" +%Y%m%d)" -le "$(date -d "$LAST" +%Y%m%d)" ];
 do
   if [ -f "$MDC_LOG_DIRECTORY/counter_$d.log" ]; then
-#   echo "Found counter_$d.log"
+    echo "Found counter_$d.log"
   else
-    touch "$MDC_LOG_DIRECTORY/counter_$d.log"
+    touch "$MDC_LOG_DIRECTORY/counter_$d.log"
   fi
   d=$(date -I -d "$d + 1 day")
 done
 
 #run counter-processor as counter user
 
-sudo -u counter YEAR_MONTH=$YEAR_MONTH python3 main.py >>/tmp/counter_daily.log
-
-curl -X POST "http://localhost:8080/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=/tmp/make-data-count-report.json"
+sudo -u counter YEAR_MONTH=$YEAR_MONTH python3 main.py >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
+
+# Process all make-data-count-report.json.* files
+for report_file in $COUNTER_PROCESSOR_TMP_DIRECTORY/make-data-count-report.json.*; do
+  if [ -f "$report_file" ]; then
+    echo "Processing $report_file" >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
+    curl -X POST "http://localhost:8080/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=$report_file"
+    echo "Finished processing $report_file" >>$COUNTER_PROCESSOR_TMP_DIRECTORY/counter_daily.log
+
+    # Extract the base filename and the extension
+    file_base=$(basename "$report_file" | sed 's/\.json\..*//')
+    file_ext=$(echo "$report_file" | sed -n 's/.*\.json\.\(.*\)/\1/p')
+    echo $file_base
+    echo $file_ext
+    # Remove the old file if it exists
+    rm -f $COUNTER_PROCESSOR_TMP_DIRECTORY/${file_base}.${YEAR_MONTH}.json.${file_ext}
+
+    # Move the processed file
+    mv $report_file $COUNTER_PROCESSOR_TMP_DIRECTORY/${file_base}.${YEAR_MONTH}.json.${file_ext}
+  fi
+done
```
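The new loop in counter_daily.sh renames each processed report by splicing `$YEAR_MONTH` into the filename via two `sed` expressions. A standalone sketch of that filename handling, run on a hypothetical report name (the loop above matches `make-data-count-report.json.*`, i.e. possibly several numbered report files per run):

```shell
# Illustrative values; the real script derives YEAR_MONTH and globs for files.
YEAR_MONTH="2025-05"
report_file="/tmp/make-data-count-report.json.1"

# Same extraction logic as in the script above:
# strip everything from ".json." onward to get the base name,
# and capture everything after ".json." as the extension.
file_base=$(basename "$report_file" | sed 's/\.json\..*//')
file_ext=$(echo "$report_file" | sed -n 's/.*\.json\.\(.*\)/\1/p')
new_name="/tmp/${file_base}.${YEAR_MONTH}.json.${file_ext}"

echo "$file_base"   # make-data-count-report
echo "$file_ext"    # 1
echo "$new_name"    # /tmp/make-data-count-report.2025-05.json.1
```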

doc/sphinx-guides/source/_static/util/counter_weekly.sh

Lines changed: 2 additions & 1 deletion

```diff
@@ -6,6 +6,7 @@
 
 # A recursive method to process each Dataverse
 processDV () {
+echo "Running counter_weekly.sh on $(date)"
 echo "Processing Dataverse ID#: $1"
 
 #Call the Dataverse API to get the contents of the Dataverse (without credentials, this will only list published datasets and dataverses
@@ -44,5 +45,5 @@ done
 
 }
 
-# Call the function on the root dataverse to start processing
+# Call the function on the root dataverse to start processing
 processDV 1
```
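For context, counter_weekly.sh walks the collection tree by having processDV call itself on each child Dataverse. A minimal sketch of that recursion pattern, with a hypothetical `children` stub standing in for the real Dataverse contents API call:

```shell
# children() is a made-up stand-in for the API lookup in counter_weekly.sh;
# it returns the child IDs of a toy tree: 1 -> {2, 3}, 2 -> {4}.
children () {
  case "$1" in
    1) echo "2 3" ;;
    2) echo "4" ;;
    *) echo "" ;;
  esac
}

visited=""
processDV () {
  local child
  visited="$visited $1"
  for child in $(children "$1"); do
    processDV "$child"   # recurse into each child collection
  done
}

processDV 1
echo "$visited"   # depth-first order: 1 2 4 3
```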

doc/sphinx-guides/source/admin/make-data-count.rst

Lines changed: 2 additions & 1 deletion

```diff
@@ -130,7 +130,7 @@ Populate Views and Downloads Nightly
 
 Running ``main.py`` to create the SUSHI JSON file and the subsequent calling of the Dataverse Software API to process it should be added as a cron job.
 
-The Dataverse Software provides example scripts that run the steps to process new accesses and uploads and update your Dataverse installation's database :download:`counter_daily.sh <../_static/util/counter_daily.sh>` and to retrieve citations for all Datasets from DataCite :download:`counter_weekly.sh <../_static/util/counter_weekly.sh>`. These scripts should be configured for your environment and can be run manually or as cron jobs.
+The Dataverse Software provides an example script that runs the steps to process new accesses and uploads and update your Dataverse installation's database: :download:`counter_daily.sh <../_static/util/counter_daily.sh>`. The script should be configured for your environment and can be run manually or as a cron job.
 
 Sending Usage Metrics to the DataCite Hub
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -160,6 +160,7 @@ To confirm that the environment variable was set properly, you can use echo like
 ``echo $DOI``
 
 On some periodic basis (perhaps weekly) you should call the following curl command for each published dataset to update the list of citations that have been made for that dataset.
+The example :download:`counter_weekly.sh <../_static/util/counter_weekly.sh>` script will do this for you. The script should be configured for your environment and can be run manually or as a cron job.
 
 ``curl -X POST "http://localhost:8080/api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=$DOI"``
```
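The guide's curl command must be issued once per published dataset. A hedged sketch of a loop over dataset PIDs; the DOI values here are made up, and in practice you would collect the PIDs of your published datasets (counter_weekly.sh does this by walking the collection tree):

```shell
# The endpoint path is the one documented in the guide; the DOIs are
# illustrative placeholders, not real datasets.
API_BASE="http://localhost:8080/api/admin/makeDataCount"

for DOI in "doi:10.5072/FK2/AAAAAA" "doi:10.5072/FK2/BBBBBB"; do
  url="$API_BASE/:persistentId/updateCitationsForDataset?persistentId=$DOI"
  echo "$url"
  # curl -X POST "$url"   # uncomment to actually update citations
done
```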
