Commit 27e51b0

Update tutorial.md
finalising OR review
1 parent 719b6bd commit 27e51b0

1 file changed

+45
-6
lines changed
  • topics/digital-humanities/tutorials/open-refine-tutorial

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

Lines changed: 45 additions & 6 deletions
@@ -415,7 +415,9 @@ In this case, be sure to check out our other tutorials, particularly the introdu
415415
>
416416
> ![Workflow imported to Galaxy](images/workflow.png)
417417
>
418-
> Let's assume that you have imported a workflow to your Galaxy account.
418+
> This is one way of importing a workflow to your account.
419+
>
420+
> Let's assume you have done this and imported the workflow to your Galaxy account.
419421
> 1. You can find all workflows available to you by clicking on the Workflows Icon ({% icon galaxy-workflows-activity %}) on the left panel.
420422
>
421423
> ![Workflows button](images/workflows.png)
@@ -425,20 +427,57 @@ In this case, be sure to check out our other tutorials, particularly the introdu
425427
> ![Select this workflow](images/select_workflow.png)
426428
>
427429
> 3. Determine the inputs as follows:
428-
> Input: `openrefine-Galaxy file.tsv`
430+
>
431+
> Input: `openrefine-Galaxy file.tsv`. This is the file you cleaned in OpenRefine.
432+
>
429433
> stop_words_english: `stop_words_english.txt`, which is the file we provided to you in this tutorial.
430434
>
431435
> ![Determine the inputs of the workflow](images/workflow_inputs.png)
432436
>
433-
> 5. Click on the `Run Workflow` button at the top.
434-
> 6. You can follow the stages of different jobs (computational tasks). They will be created, scheduled, executed, and completed. When everything is green, your workflow has run fully and the results are ready.
437+
> 4. Click on the `Run Workflow` button at the top.
438+
> 5. You can follow the stages of different jobs (computational tasks). They will be created, scheduled, executed, and completed. When everything is green, your workflow has run fully and the results are ready.
435439
>
436440
> ![Overview of the workflow](images/workflow_overview.png)
437441
>
438442
{: .hands_on}
439443
440-
What can you see here? To follow along, we made all substeps of the task available as outputs. To answer our question of which year most elements in the museum derive from, we first remove the column of production time from the table and filter only the dates that derive from specific years, not year ranges. Regular expressions help clean remaining inconsistencies in the dataset. Sorting the production date in descending order reveals that one faulty dataset, which is supposed to have been created in 2041, is part of the table. We remove it. Datamash allows for summarising how many elements arrived at the museum in each year. The ascending order, we visualise in a bar chart. To determine from which year most objects originate, we use another sorting order. We parse the input as a conditional statement to search for object descriptions from the objects of that year. In our case, this is 1969. From all object descriptions from 1969, we create a word cloud using the offered stop word list.
441-
As a result, we find that most objects from the museum are negatives from Davis Mist, which he created that year and donated to the museum.
444+
What can you see here? To follow along, we made all substeps of the task available as outputs.
445+
To answer our question of which year most elements in the museum derive from, we first cut the column of production time from the table.
446+
You can see this in the file `Cut on Data (Number)`.
447+
From this, we filter only the dates that derive from specific years, not year ranges. (See `Filter Tabular on Data (Number)`.)
448+
You can click the circular-arrow button of this dataset (`Run job again`) to see exactly which input was used to exclude year ranges.
449+
Regular expressions help clean remaining inconsistencies in the dataset. (Dataset: `Column Regex Find And Replace on Data (Number)`)
450+
Sorting the production date in descending order, as done in dataset `Sort on data (lowest Number)`, reveals one faulty entry in the table, supposedly created in 2041.
451+
We remove it in the next step with the tool `Remove beginning`.
452+
453+
The tool **Datamash** allows for summarising how many elements arrived at the museum in each year. (Dataset: `Datamash on data (Number)`.)
454+
After we apply this tool, the dataset is no longer 7738 lines long, but only 259.
455+
This is because the number of times each year appeared in the table was summed up in a second column.
456+
Sorting in ascending order (`Sort on data (Number)`) shows a chronological dataset with the earliest entries at the beginning and the most recent entries at the end of the table.
457+
We can easily visualise this in a (particularly crowded) bar chart directly within Galaxy. (`Bar chart on data (Number)`)
458+
But this is not the best view for showing which year most objects derive from.
459+
460+
To determine from which year most objects originate, we use another sorting order (`Sort on Data (highest number)`).
461+
462+
> <question-title></question-title>
463+
>
464+
> 1. From what year does the museum have the most objects?
465+
>
466+
> > <solution-title></solution-title>
467+
> >
468+
> > 1. The dataset `Sort on data (highest number)` shows the number of objects by year, sorted from most to least. The first row notes 288 items for the year 1969. This is the year from which the museum has the most (clearly datable) objects.
469+
> >
470+
> {: .solution}
471+
{: .question}
472+
473+
The next four steps parse this year as a conditional statement step by step. (`Select first on data (Number)`, `Cut on data (Number)`, `Parse parameter value on data (Number)` and `Compose text parameter value`.)
474+
This means that even if you upload another dataset, the year with the highest count is always selected and taken as an input for the next steps.
475+
476+
Based on this input, which is the year with the highest count, `Search in textfiles on data (Number)` now searches for object descriptions from the 288 objects of the most prominent year.
477+
478+
From all object descriptions from that year, we create a word cloud using the provided stop word list.
479+
This helps us quickly determine what kinds of objects the museum has from this popular year.
480+
The dataset `Word cloud image` shows that most of the museum's objects from that year are negatives by Davis Mist, a famous Australian photographer, who created them that year and donated them to the museum.
442481
443482
![Word cloud of objects' descriptions](images/display_1969.png)
444483
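For readers who want to check the logic outside Galaxy, the core of the workflow described in this change (count objects per exact year, pick the most frequent year, then count words in that year's descriptions minus stop words) can be approximated in plain Python. This is only a rough sketch under stated assumptions: the function names `most_common_year` and `word_frequencies` and the toy rows are hypothetical, not taken from the workflow, and the sketch leaves out the regex clean-up and the removal of the faulty 2041 entry.

```python
import re
from collections import Counter


def most_common_year(production_dates):
    """Count exact four-digit years (year ranges are skipped) and
    return the (year, count) pair with the highest count."""
    exact_year = re.compile(r"^\d{4}$")
    counts = Counter(
        d.strip() for d in production_dates if exact_year.match(d.strip())
    )
    return counts.most_common(1)[0]


def word_frequencies(descriptions, stop_words):
    """Lower-case the descriptions, split them into words, and count
    every word that is not in the stop word list."""
    words = re.findall(r"[a-z']+", " ".join(descriptions).lower())
    return Counter(w for w in words if w not in stop_words)


# Hypothetical toy rows: (production date, object description).
rows = [
    ("1969", "Negative of a fashion model"),
    ("1969", "Negative of a street scene"),
    ("1950-1960", "Wooden chair"),
    ("1920", "Glass plate negative"),
]

year, count = most_common_year(date for date, _ in rows)
freqs = word_frequencies(
    (desc for date, desc in rows if date == year),
    stop_words={"of", "a", "the"},
)
print(year, count)
print(freqs.most_common(3))
```

Because the year is computed rather than hard-coded, the same sketch would pick a different year for a different input table, which mirrors the parameterised steps of the workflow.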
