topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md
Lines changed: 5 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -221,9 +221,8 @@ Take a look at the `Categories` column of your dataset. Most objects were attrib
221
221
>
222
222
> The expression replaces \|\| with \|. If you search for the occurrence of \|\| again, you will no longer get any results.
223
223
>
224
-
> There are currently many different categories within one cell, which is not so easy to work with.
225
-
> We, therefore, split the values of the `Categories` column up into individual cells. This is possible by using the pipe character.
226
-
> That way, we can also remove double occurrences of the same categories for one object.
224
+
> Each object is described by several categories. You may notice duplicates, where the same category is listed twice for one object.
225
+
> We want to remove these duplicates so that each object keeps only unique categories.
227
226
>
228
227
> 6. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
229
228
>
@@ -250,10 +249,11 @@ These expressions split categories at the pipe separator and join the unique one
250
249
251
250
## Atomization
252
251
252
+
Once the duplicate records have been removed, we can examine the content of the `Categories` column more closely. The categories within a cell are separated from each other by a pipe character (\|).
253
+
Each entry can be assigned to more than one category. To work with these keywords individually, the values in the `Categories` column must be split into individual cells at the pipe character.
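
To make the idea of atomization concrete outside of OpenRefine, here is a minimal Python sketch of splitting a pipe-separated column into one value per row. The function name `atomize`, the `Record ID` column, and the sample records are hypothetical and only serve as an illustration. Note one assumption: this sketch repeats the other columns on every new row, whereas OpenRefine may display the split values differently.

```python
def atomize(rows, column="Categories", sep="|"):
    """Yield one row per separated value found in `column`."""
    for row in rows:
        for value in row[column].split(sep):
            value = value.strip()
            if value:  # skip empty fragments, e.g. from a trailing separator
                # Copy the row and overwrite the multi-valued cell with one value.
                yield {**row, column: value}


# Hypothetical sample data, not taken from the tutorial's dataset.
records = [
    {"Record ID": "1", "Categories": "Numismatics|Coins|Bronze"},
    {"Record ID": "2", "Categories": "Clothing"},
]

atomized = list(atomize(records))
for row in atomized:
    print(row["Record ID"], row["Categories"])
```

Each category now sits in its own row, which is the shape you need in order to count or filter individual keywords.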
254
+
253
255
> <hands-on-title>Atomization</hands-on-title>
254
256
>
255
-
> Once the duplicate records have been removed, we can have a closer look at the Categories column. Different categories are separated from each other by pipe (\|). Each entry can have more
256
-
> than one category. In order to analyze in detail the use of the keywords, the values of the Categories column need to be split up into individual cells on the basis of the pipe character.
257
257
> 1. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Split multi-valued cells...`.
258
258
>
259
259
> 