
Commit 9dc8cd4

authored
Update tutorial.md
1 parent 82482e4 commit 9dc8cd4

File tree

1 file changed

+12
-6
lines changed
  • topics/digital-humanities/tutorials/open-refine-tutorial

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

Lines changed: 12 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -207,30 +207,36 @@ The dataset no longer contains duplicates based on the Record ID. However, we ne
207207
208208
There are many ways to manipulate your dataset in OpenRefine. One of them is the Google Refine Expression Language (GREL). With the help of GREL, you can, for example, create custom facets or add columns by fetching URLs. We will use it to find and replace errors. For more information, refer to the [GREL documentation](https://openrefine.org/docs/manual/expressions).
209209
210-
Take a look at the `Categories` column of your dataset. Most objects were attributed to various categories, separated by "\|". However, several fields contain "\|\|" instead of "\|". We want to unify those.
210+
Take a look at the `Categories` column of your dataset. Most objects were attributed to various categories, separated by "\|". However, several fields contain "\|\|" instead of "\|" as a separator. We want to unify those.
211211
212212
> <hands-on-title>Find and replace typos using GREL</hands-on-title>
213213
>
214214
> To remove the occurance of double pipe "\|\|" from the file we can do the following:
215215
> 1. Click on the triangle on the left of `Categories` and select `Text filter`.
216-
> 2. On the left, using the `Facet/Filter` section, search for the occurrence of "\|" and "\|\|". There are 71061 rows with "\|" and 9 rows with "\|\|". We want to remove these nine lines as they were added by mistake.
217-
> 3. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
216+
> 2. On the left, using the `Facet/Filter` section, search for the occurrence of \| and \|\|. There are 71061 rows with "\|" and 9 rows with "\|\|". We would like to remove these nine lines, as they were added by mistake.
217+
> 3. Click on the triangle on the left of `Categories`, hover over `Edit cells`, and click on `Transform...`.
218218
> 4. In the new window, use the following text `value.replace('||', '|')` as "Expression" and click on `OK`.
219219
>
220220
> ![Custom text transform on column Categories](images/filter_grel3.png)
221221
>
222-
> We can also remove the double occurrence of the same for different entries as follows:
222+
> The expression replaces \|\| with \|. If you search for the occurrence of \|\| again, you will no longer get any results.
223223
>
224-
> 5. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
224+
> There are currently many different categories within one cell, which is not so easy to work with.
225+
> We, therefore, split the values of the `Categories` column up into individual cells. This is possible by using the pipe character.
226+
> That way, we can also remove double occurrences of the same categories for one object.
227+
>
228+
> 6. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
225229
>
226230
> ![Edit cells Categories](images/filter_grel.png)
227231
>
228232
> ![Transform Categories](images/filter_grel2.png)
229233
>
230-
> 6. In the new window, use the following text `split('|').uniques().join('|')` as "Expression" and click on `OK`.value.
234+
> 7. In the new window, use the following text `value.split('|').uniques().join('|')` as "Expression" and click on `OK`.
231235
>
232236
{: .hands_on}
233237
238+
These expressions split categories at the pipe separator and join the unique ones within this column. As a result, duplicate categories for one object are deleted.
239+
234240
> <question-title></question-title>
235241
>
236242
> 1. How many cells had duplicated categories?
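
For readers less familiar with GREL, the two expressions in this diff, `value.replace('||', '|')` and `value.split('|').uniques().join('|')`, can be approximated in Python. This is only a rough sketch: the function names and the sample cell value are hypothetical, and it assumes that `uniques()` keeps the first occurrence of each category in its original order.

```python
def collapse_double_pipes(value: str) -> str:
    """Rough analogue of the GREL expression value.replace('||', '|')."""
    # str.replace substitutes every non-overlapping occurrence, left to right.
    return value.replace("||", "|")


def unique_categories(value: str, sep: str = "|") -> str:
    """Rough analogue of value.split('|').uniques().join('|')."""
    # dict.fromkeys drops repeated entries while keeping first-seen order
    # (assumed here to match what uniques() does in OpenRefine).
    return sep.join(dict.fromkeys(value.split(sep)))


# Hypothetical cell value, not taken from the tutorial dataset.
cell = "Glass||Ceramics|Glass"
cleaned = unique_categories(collapse_double_pipes(cell))
print(cleaned)
```

Running the two steps in this order mirrors the tutorial: the separator is unified first, so the later split does not produce empty entries between two adjacent pipes.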
