Commit 1b9825b

authored
Update tutorial.md
1 parent 3eb321d commit 1b9825b

File tree

1 file changed

+15
-10
lines changed
  • topics/digital-humanities/tutorials/open-refine-tutorial

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

@@ -95,8 +95,8 @@ We suggest that you download the data from the Zenodo record as explained below.
9595
>
9696
> {% snippet faqs/galaxy/datasets_import_from_data_library.md %}
9797
>
98-
> 4. Ensure that the datatype of "phm_collection_adapted" is "tsv". Otherwise, use convert datatype.
99-
> 5. Verify that the datatype of "stopwords-en" is "txt". If not, convert the datatype.
98+
> 3. Ensure that the datatype of "phm_collection_adapted" is "tsv". Otherwise, use convert datatype.
99+
> 4. Verify that the datatype of "stopwords-en" is "txt". If not, convert the datatype.
100100
>
101101
> {% snippet faqs/galaxy/datasets_change_datatype.md datatype="datatypes" %}
102102
>
@@ -224,13 +224,13 @@ Take a look at the `Categories` column of your dataset. Most objects were attrib
224224
> Many different categories describe the object. You may notice duplicates categorising the same object twice.
225225
> We also want to remove those to ensure we only have unique categories that describe a single object.
226226
>
227-
> 6. Click on the triangle on the left of `Categories`, hover over `Edit cells`, and click on `Transform...`.
227+
> 5. Click on the triangle on the left of `Categories`, hover over `Edit cells`, and click on `Transform...`.
228228
>
229229
> ![Edit cells Categories](images/filter_grel.png)
230230
>
231231
> ![Transform Categories](images/filter_grel2.png)
232232
>
233-
> 7. In the new window, use the following text `value.split('|').uniques().join('|')` as "Expression" and click on `OK`.
233+
> 6. In the new window, use the following text `value.split('|').uniques().join('|')` as "Expression" and click on `OK`.
234234
>
235235
{: .hands_on}
236236
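
Note on the hunk above: the GREL expression `value.split('|').uniques().join('|')` splits a cell on the pipe character, drops repeated entries, and joins the rest back together. A minimal Python sketch of the same idea follows; the function name and the sample value are hypothetical, and keeping the first-seen order is an assumption of this sketch rather than a documented guarantee of `uniques()`.

```python
def dedupe_categories(value: str, sep: str = "|") -> str:
    """Remove repeated entries from a separator-delimited cell value.

    Hypothetical helper that mimics the effect of the GREL expression
    value.split('|').uniques().join('|'); it is not OpenRefine code.
    """
    # dict.fromkeys keeps the first occurrence of each entry, in order.
    unique_entries = dict.fromkeys(value.split(sep))
    return sep.join(unique_entries)


if __name__ == "__main__":
    # Hypothetical cell content with one duplicated category.
    print(dedupe_categories("Numismatics|Medals|Numismatics"))
```

A cell without duplicates comes back unchanged, so applying the transformation to the whole column is safe for rows that are already clean.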
@@ -319,7 +319,7 @@ You can now see, from which category the museum has the most objects, one of our
319319
320320
## Clustering
321321
322-
The clustering allows you to solve issues regarding case inconsistencies, incoherent use of either the singular or plural form, and simple spelling mistakes.
322+
The clustering allows you to solve issues regarding case inconsistencies, incoherent use of either the singular or plural form, and simple spelling mistakes. We apply those to the object categories for the next step of cleaning.
323323
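
Note on the clustering paragraph above: one common way to group such variants is key collision, where every value is reduced to a normalised key and values that share a key become merge candidates. The Python sketch below loosely follows the idea behind a "fingerprint" key; it is an illustration under stated assumptions, not OpenRefine's implementation, and the function names and sample values are hypothetical. It catches differences in case, punctuation and word order only. Singular/plural variants and spelling mistakes usually need a different keying function or a nearest-neighbour method.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalised key (hypothetical helper)."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation, then sort and de-duplicate the tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose keys collide; keep only groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Hypothetical category labels, not taken from the tutorial dataset.
    print(cluster(["Glass Plate", "glass plate", "plate, glass", "Coins"]))
```

Each returned group corresponds to one merge suggestion; whether to accept it remains a manual decision, which is why the tutorial warns about overly aggressive settings.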
324324
> <hands-on-title>Clustering of similar categories</hands-on-title>
325325
>
@@ -332,19 +332,24 @@ The clustering allows you to solve issues regarding case inconsistencies, incohe
332332
>
333333
> ![Clustered and merged similar Categories](images/cluster2.png)
334334
>
335-
> 4. Here, you can see different suggestions from OpenRefine to cluster different categories and merge them into one. In our tutorial, we merge all of the suggestions by clicking on `select > all` and then clicking on `Merge selected and re-cluster`.
336-
>
337-
> ![Join multi-valued cells on Categories](images/join.png)
335+
> 4. Here, you can see different suggestions from OpenRefine to cluster different categories and merge them into one. In our tutorial, we merge all the suggestions by clicking on `Select all` and then clicking on `Merge selected and re-cluster`.
338336
>
339337
> 5. Now, you can close the clustering window by clicking on `close`.
340338
>
341-
> Be careful! Some methods are too aggressive, so you might end up clustering values that do not belong together. Now that the values have been clustered individually, we can put them back together in a single cell.
339+
> Be careful with clustering! Some settings are very aggressive, so you might end up clustering values that do not belong together!
340+
>
341+
> Now that the different categories have been clustered individually, we can reassemble them in the respective object single cell.
342+
>
342343
> 6. Click the Categories triangle and hover over the `Edit cells` and click on `Join multi-valued cells`.
343-
> 7. Choose the pipe character (`\|`) as a separator and click on `OK`.
344+
> 7. Choose the pipe character (`|`) as a separator and click on `OK`.
345+
>
346+
> ![Join multi-valued cells on Categories](images/join.png)
347+
>
344348
> The rows now look like before, with a multi-valued Categories field.
345349
>
346350
{: .hands_on}
347351
352+
You have now successfully split, cleaned and re-joined the various categories of objects in the museum's metadata! Congratulations.
348353
When you’re happy with your analysis results, choose whether to export the dataset into your Galaxy history or download it directly onto your computer.
349354
350355
## Exporting your data back to Galaxy
