Commit 69ee03c

Update tutorial.md
1 parent 9dc8cd4 commit 69ee03c

1 file changed: +5 -5 lines changed

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

Lines changed: 5 additions & 5 deletions
@@ -221,9 +221,8 @@ Take a look at the `Categories` column of your dataset. Most objects were attrib
 >
 > The expression replaces \|\| with \|. If you search for the occurrence of \|\| again, you will no longer get any results.
 >
-> There are currently many different categories within one cell, which is not so easy to work with.
-> We, therefore, split the values of the `Categories` column up into individual cells. This is possible by using the pipe character.
-> That way, we can also remove double occurrences of the same categories for one object.
+> Many different categories describe the object. You may notice duplicates categorising the same object twice.
+> We also want to remove those to ensure we only have unique categories that describe a single object.
 >
 > 6. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
 >
@@ -250,10 +249,11 @@ These expressions split categories at the pipe separator and join the unique one

 ## Atomization

+Once the duplicate records have been removed, we can examine the content of the "Categories" column more closely. Different categories are separated from each other by pipe (\|).
+Each entry can be assigned to more than one category. To leverage those keywords, the values in the Categories column must be split into individual cells using the pipe character.
+
 > <hands-on-title>Atomization</hands-on-title>
 >
-> Once the duplicate records have been removed, we can have a closer look at the Categories column. Different categories are separated from each other by pipe (\|). Each entry can have more
-> than one category. In order to analyze in detail the use of the keywords, the values of the Categories column need to be split up into individual cells on the basis of the pipe character.
 > 1. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Split multi-valued cells...`.
 >
 > ![Atomization of Categories](images/split_multi_valued_cells.png)
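
For readers who want to see what the two operations touched by this change amount to outside OpenRefine, here is a minimal Python sketch. It is not part of the tutorial or of this commit. The function names `dedupe_categories` and `atomize` and the sample row are hypothetical. It assumes rows are plain dictionaries and that categories are separated by a single pipe character.

```python
def dedupe_categories(cell: str, sep: str = "|") -> str:
    """Return the cell with repeated categories removed, keeping first-seen order."""
    # Skip empty parts so that a leftover double separator ("||") does not
    # produce an empty category.
    unique = dict.fromkeys(part for part in cell.split(sep) if part)
    return sep.join(unique)


def atomize(rows, column="Categories", sep="|"):
    """Yield one row per category value found in `column`."""
    for row in rows:
        for category in row[column].split(sep):
            # Assumption: copy the other fields onto every new row. This may
            # differ from how OpenRefine lays out the extra rows.
            yield {**row, column: category}


# Hypothetical sample row, for illustration only.
sample = [{"id": 1, "Categories": dedupe_categories("Coins|Numismatics|Coins")}]
for row in atomize(sample):
    print(row)
```

Running the de-duplication first and the split second mirrors the order of the tutorial: unique categories within each cell, then one category per cell.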
