Commit 82482e4

Update tutorial.md
1 parent 9ef0a16 commit 82482e4

1 file changed: 7 additions, 3 deletions
  • topics/digital-humanities/tutorials/open-refine-tutorial

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

@@ -96,7 +96,7 @@ We suggest that you download the data from the Zenodo record as explained below.
 > {% snippet faqs/galaxy/datasets_import_from_data_library.md %}
 >
 > 4. Ensure that the datatype of "phm_collection_adapted" is "tsv". Otherwise, use convert datatype.
-> 5. Check that the datatype of "stopwords-en" is txt. If not, convert the datatype.
+> 5. Verify that the datatype of "stopwords-en" is "txt". If not, convert the datatype.
 >
 > {% snippet faqs/galaxy/datasets_change_datatype.md datatype="datatypes" %}
 >
@@ -205,11 +205,15 @@ The dataset no longer contains duplicates based on the Record ID. However, we ne
 
 ## Use GREL
 
+There are many ways to manipulate your dataset in OpenRefine. One of them is the Google Refine Expression Language (GREL). With GREL you can, for example, create custom facets or add columns by fetching URLs. Here, we will use it to find and replace errors. For more information, refer to the [GREL documentation](https://openrefine.org/docs/manual/expressions).
+
+Take a look at the `Categories` column of your dataset. Most objects were assigned to several categories, separated by "\|". However, several cells contain "\|\|" instead of "\|". We want to unify those.
+
 > <hands-on-title>Find and replace typos using GREL</hands-on-title>
 >
-> To remove the occurance of double pipe \|\| from the file we can do the following:
+> To replace each double pipe "\|\|" in the dataset with a single pipe, do the following:
 > 1. Click on the triangle on the left of `Categories` and select `Text filter`.
-> 2. On the left, using the `Facet/Filter` section, search for the occurrence of \| and \|\|. There are 71061 rows with \| and 9 rows with \|\|. We want to remove these 9 lines as they are there by mistake.
+> 2. On the left, in the `Facet/Filter` section, search for "\|" and then for "\|\|". There are 71061 rows with "\|" and 9 rows with "\|\|". The double pipes in these nine rows were entered by mistake, so we want to replace them.
 > 3. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
 > 4. In the new window, use the following text `value.replace('||', '|')` as "Expression" and click on `OK`.
 >
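To make the find-and-replace step concrete outside OpenRefine, here is a minimal Python sketch, assuming the data is a tab-separated file with a `Categories` column whose values are separated by "|". The sample rows and the helpers `load_rows`, `count_rows_containing`, and `fix_double_pipes` are hypothetical and exist only for illustration. The sketch is a rough analogue of the text filter and of the GREL expression `value.replace('||', '|')`, not a description of how OpenRefine implements them.

```python
import csv
import io

# Hypothetical sample rows shaped like the tutorial's TSV; the real
# "phm_collection_adapted" dataset is NOT reproduced here.
SAMPLE_TSV = (
    "Record ID\tCategories\n"
    "1\tcoins|medals\n"
    "2\ttextiles||costume\n"
    "3\tphotographs\n"
)


def load_rows(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def count_rows_containing(rows, column, needle):
    """Count rows whose cell in `column` contains `needle` (roughly a text filter)."""
    return sum(1 for row in rows if needle in row[column])


def fix_double_pipes(rows, column):
    """Python analogue of the GREL expression value.replace('||', '|')."""
    for row in rows:
        row[column] = row[column].replace("||", "|")
    return rows


rows = load_rows(SAMPLE_TSV)

# Before the fix, splitting on "|" yields an empty category for the bad cell.
print(rows[1]["Categories"].split("|"))  # ['textiles', '', 'costume']
print(count_rows_containing(rows, "Categories", "|"))   # 2 (includes the "||" row)
print(count_rows_containing(rows, "Categories", "||"))  # 1

fix_double_pipes(rows, "Categories")
print(count_rows_containing(rows, "Categories", "||"))  # 0
print(rows[1]["Categories"].split("|"))  # ['textiles', 'costume']
```

The `split` output illustrates why the double pipe is worth fixing: splitting on "|" turns it into an empty category. Note that Python's `str.replace` works on non-overlapping matches, so a run of three pipes would only shrink to two and would need a second pass.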
