
Commit a2d5ef1

fix more numbering by indenting
1 parent 9664403 commit a2d5ef1

1 file changed, +31 -30 lines changed

topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

@@ -151,33 +151,33 @@ Great, now that the dataset is in OpenRefine, we can start cleaning it.
 >
 > 1. Click on the triangle on the left of `Record ID`.
 >
-> ![Sort Record ID](images/sort.png)
+> ![Sort Record ID](images/sort.png)
 >
 > 2. Click on `Sort...`.
 >
 > 3. Select `numbers` and click on `OK`.
 >
-> ![Sort Record ID options](images/sort2.png)
+> ![Sort Record ID options](images/sort2.png)
 >
 > 4. Above the table, click on `Sort` and select `Reorder rows permanently`.
 >
-> ![Sort Record ID reorder permanently](images/sort3.png)
+> ![Sort Record ID reorder permanently](images/sort3.png)
 >
 > 5. Click on the triangle left of the `Record ID` column. Hover over `Edit cells` and select `Blank down`.
 >
-> ![Blank down Record ID](images/sort4.png)
+> ![Blank down Record ID](images/sort4.png)
 >
 > 6. Click on the triangle left of the `Record ID` column. Hover over `Facet`, then move your mouse to `Customized facets` and select `Facet by blank (null or empty string)`.
 >
-> ![Facet by blank Record ID](images/sort5.png)
+> ![Facet by blank Record ID](images/sort5.png)
 >
 > 7. On the left, a new option appears under `Facet/Filter` with the title `Record ID`. Click on `true`.
 >
-> ![Facet by blank true Record ID](images/sort6.png)
+> ![Facet by blank true Record ID](images/sort6.png)
 >
 > 8. Click on the triangle to the left of the column called `All`. Hover over `Edit rows`, and select `remove matching rows`.
 >
-> ![Remove matching rows Record ID](images/deduplicate.png)
+> ![Remove matching rows Record ID](images/deduplicate.png)
 >
 > 9. Close the `Facet` by clicking on the cross (x) to see all rows.
 >
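The `Record ID` steps above follow a common OpenRefine de-duplication recipe: sort, make the order permanent, blank down repeated IDs, facet on blank cells, and remove the matching rows. A minimal Python sketch of the same logic over a list of dictionaries is shown below. The function names are hypothetical, and it assumes that every non-blank `Record ID` parses as an integer and that blank IDs sort last.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def is_blank(value: Optional[str]) -> bool:
    # "Facet by blank" treats both null and the empty string as blank.
    return value is None or value == ""


def deduplicate_by_record_id(rows: List[Row], column: str = "Record ID") -> List[Row]:
    # Steps 1-4: numeric sort made permanent (assumption: blank IDs placed last).
    ordered = sorted(
        rows,
        key=lambda r: (is_blank(r.get(column)), 0 if is_blank(r.get(column)) else int(r[column])),
    )
    # Step 5: "Blank down" clears a cell that repeats the value directly above it.
    blanked: List[Row] = []
    previous: Optional[str] = None
    for row in ordered:
        current = row.get(column)
        new_row = dict(row)
        if not is_blank(current) and current == previous:
            new_row[column] = None
        previous = current
        blanked.append(new_row)
    # Steps 6-8: facet on blank Record ID = true, then remove the matching rows.
    return [r for r in blanked if not is_blank(r.get(column))]


rows = [
    {"Record ID": "3", "Title": "c"},
    {"Record ID": "1", "Title": "a"},
    {"Record ID": "3", "Title": "c (duplicate)"},
]
print([r["Title"] for r in deduplicate_by_record_id(rows)])  # ['a', 'c']
```

Like the facet, the final filter also drops rows whose `Record ID` was blank to begin with, and the first row of each run of equal IDs is the one that survives.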
@@ -206,16 +206,17 @@ The dataset does not contain any more blank rows now. But we need to do more cle
 > 3. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
 > 4. In the new window, use the following text `value.replace('||', '|')` as "Expression" and click on `OK`.
 >
-> ![Custom text transform on column Categories](images/filter_grel3.png)
+> ![Custom text transform on column Categories](images/filter_grel3.png)
+>
+> We can also remove the double occurrence of the same for different entries as follows:
 >
-> We can also remove the double occurrence of the same for different entries as follows:
 > 5. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Transform...`.
 >
-> ![Edit cells Categories](images/filter_grel.png)
+> ![Edit cells Categories](images/filter_grel.png)
 >
-> ![Transform Categories](images/filter_grel2.png)
+> ![Transform Categories](images/filter_grel2.png)
 >
-> 2. In the new window, use the following text `split('|').uniques().join('|')` as "Expression" and click on `OK`.value.
+> 6. In the new window, use the following text `split('|').uniques().join('|')` as "Expression" and click on `OK`.value.
 >
 {: .hands_on}
 
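The two Transform expressions above operate on each cell's `value`. As a rough Python analogue (not GREL itself), `str.replace` mirrors `value.replace('||', '|')`, and splitting on the pipe, dropping repeats, and re-joining mirrors the `split('|').uniques().join('|')` chain. The helper names are hypothetical. This sketch keeps the first occurrence of each category in its original position; whether OpenRefine's `uniques()` preserves order the same way is an assumption we have not checked.

```python
def collapse_double_pipes(value: str) -> str:
    # Like value.replace('||', '|'): every non-overlapping '||' becomes '|'.
    return value.replace("||", "|")


def unique_categories(value: str, separator: str = "|") -> str:
    # Split on the literal pipe, drop repeated entries, then join them again.
    parts = value.split(separator)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return separator.join(dict.fromkeys(parts))


cell = "Art||History|Art"
print(unique_categories(collapse_double_pipes(cell)))  # Art|History
```

The order of the two steps matters in this sketch: without collapsing `||` first, the empty string between the two pipes survives as a category of its own (`unique_categories("Art||History|Art")` returns `"Art||History"`).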
@@ -238,11 +239,11 @@ The dataset does not contain any more blank rows now. But we need to do more cle
 > than one category. In order to analyze in detail the use of the keywords, the values of the Categories column need to be split up into individual cells on the basis of the pipe character.
 > 1. Click on the triangle on the left of `Categories`, hover over `edit cells`, and click on `Split multi-valued cells...`.
 >
-> ![Atomization of Categories](images/split_multi_valued_cells.png)
+> ![Atomization of Categories](images/split_multi_valued_cells.png)
 >
 > 2. Define the `Separator` as `\|` (pipe). Click on `OK`.
 >
-> ![Facet Blank of atomized Categories](images/facet_categories_blank.png)
+> ![Facet Blank of atomized Categories](images/facet_categories_blank.png)
 >
 {: .hands_on}
 
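`Split multi-valued cells` turns one row into several, one per category. The sketch below shows that reshaping in Python. It assumes the separator is the literal pipe character (which the step writes as `\|`) and follows OpenRefine's record layout: only the first row of a record keeps the other columns, while the extra rows leave them empty. `split_multi_valued` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def split_multi_valued(rows: List[Row], column: str, separator: str = "|") -> List[Row]:
    out: List[Row] = []
    for row in rows:
        value = row.get(column)
        # Blank cells are passed through unchanged as a single row.
        parts = value.split(separator) if value else [value]
        for index, part in enumerate(parts):
            if index == 0:
                new_row = dict(row)  # first row keeps every other column
            else:
                new_row = {key: None for key in row}  # continuation rows stay empty
            new_row[column] = part
            out.append(new_row)
    return out


rows = [{"Record ID": "1", "Categories": "Art|History"}]
for r in split_multi_valued(rows, "Categories"):
    print(r)
# {'Record ID': '1', 'Categories': 'Art'}
# {'Record ID': None, 'Categories': 'History'}
```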
@@ -270,15 +271,15 @@ Now, let's use faceting based on text.
 > 1. Click on the triangle on the left of `Categories`, hover over `facet`, and click on`Text facet`.
 > 2. On the left panel, it mentions the total number of choices. The default value of `count limit` is low for this dataset, and we should increase it. Click on `Set choice count limit`.
 >
-> ![Text faceting of atomized Categories](images/text_facet.png)
+> ![Text faceting of atomized Categories](images/text_facet.png)
 >
 > 3. Enter `5000` as the new limit and click on `Ok`.
 >
-> ![Increasing the limit of text facetring](images/text_facet2.png)
+> ![Increasing the limit of text facetring](images/text_facet2.png)
 >
 > 4. Now, you see all categories. Click on `count` to see the categories sorted in descending order.
 >
-> ![Text faceting of atomized Categories sorted by count](images/text_facet3.png)
+> ![Text faceting of atomized Categories sorted by count](images/text_facet3.png)
 >
 {: .hands_on}
 
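The text facet above is essentially a frequency count over the atomized `Categories` cells, and sorting by `count` orders it from most to least frequent. A Python sketch of that idea follows. `text_facet` is a hypothetical name, and `choice_count_limit` only imitates what steps 2 to 4 describe: when there are more distinct choices than the limit, the facet does not list them.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def text_facet(
    values: Iterable[Optional[str]], choice_count_limit: int = 5000
) -> Optional[List[Tuple[str, int]]]:
    # Count each distinct non-blank value (OpenRefine also lists blanks as their own choice).
    counts = Counter(v for v in values if v not in (None, ""))
    if len(counts) > choice_count_limit:
        # Too many distinct choices: nothing is listed until the limit is raised.
        return None
    # Highest count first; ties keep the order in which values were first seen.
    return counts.most_common()


categories = ["Art", "History", "Art", "Music", "Art", "History"]
print(text_facet(categories))  # [('Art', 3), ('History', 2), ('Music', 1)]
print(text_facet(categories, choice_count_limit=2))  # None
```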
@@ -303,21 +304,21 @@ The clustering allows you to solve issues regarding case inconsistencies, incohe
 > 1. Click on the `Cluster` button on the left in the `Facet/Filter` tab.
 > 2. Use `Key collision` as clustering method. Change the Keying function to `n-Gram fingerprint` and change the n-Gram size to `3`.
 >
-> ![Cluster and edit column Categories](images/cluster.png)
+> ![Cluster and edit column Categories](images/cluster.png)
 >
 > 3. Click on the `cluster` button in the middle window.
 >
-> ![Clustered and merged similar Categories](images/cluster2.png)
+> ![Clustered and merged similar Categories](images/cluster2.png)
 >
 > 4. Here, you can see different suggestions from OpenRefine to cluster different categories and merge them into one. In our tutorial, we merge all of the suggestions by clicking on `select > all` and then clicking on `Merge selected and re-cluster`.
 >
-> ![Join multi-valued cells on Categories](images/join.png)
+> ![Join multi-valued cells on Categories](images/join.png)
 >
 > 5. Now, you can close the clustering window by clicking on `close`.
 >
-> Be careful! Some methods are too aggressive, so you might end up clustering values that do not belong together. Now that the values have been clustered individually, we can put them back together in a single cell.
-> 1. Click the Categories triangle and hover over the `Edit cells` and click on `Join multi-valued cells`.
-> 2. Choose the pipe character (`\|`) as a separator and click on `OK`.
+> Be careful! Some methods are too aggressive, so you might end up clustering values that do not belong together. Now that the values have been clustered individually, we can put them back together in a single cell.
+> 6. Click the Categories triangle and hover over the `Edit cells` and click on `Join multi-valued cells`.
+> 7. Choose the pipe character (`\|`) as a separator and click on `OK`.
 > The rows now look like before, with a multi-valued Categories field.
 >
 {: .hands_on}
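Key-collision clustering with an n-gram fingerprint computes a normalized key for every value and groups values whose keys are equal. The join step then folds a record's rows back into one pipe-separated cell. Both are sketched below in Python. The helper names are hypothetical. The fingerprint follows the general recipe OpenRefine describes (lowercase, strip punctuation and whitespace, take sorted unique n-grams, fold to ASCII), but the details, including treating strings shorter than `n` as their own key, are our assumptions. The join assumes `Record ID` is the key column that starts each record. Choosing the merged spelling per cluster is left out.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def ngram_fingerprint(value: str, n: int = 3) -> str:
    s = value.lower()
    s = re.sub(r"[\W_]+", "", s)  # drop whitespace, punctuation and control characters
    # Approximate ASCII folding, e.g. "é" -> "e".
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    if len(s) < n:
        return s  # assumption: strings shorter than n are their own key
    grams = {s[i:i + n] for i in range(len(s) - n + 1)}
    return "".join(sorted(grams))  # sorted, de-duplicated n-grams


def key_collision_clusters(values: List[str], n: int = 3) -> List[List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[ngram_fingerprint(value, n)].append(value)
    # A cluster is a key shared by two or more distinct spellings.
    return [sorted(set(group)) for group in groups.values() if len(set(group)) > 1]


def join_multi_valued(rows: List[Row], column: str, key_column: str = "Record ID",
                      separator: str = "|") -> List[Row]:
    joined: List[Row] = []
    for row in rows:
        if row.get(key_column) not in (None, ""):
            joined.append(dict(row))  # a non-blank key starts a new record
        elif joined and row.get(column) not in (None, ""):
            # Rows before the first record are ignored in this sketch.
            previous = joined[-1].get(column)
            value = row[column]
            joined[-1][column] = value if previous in (None, "") else f"{previous}{separator}{value}"
    return joined


print(key_collision_clusters(["Digital Humanities", "digital humanities", "Digital-Humanities", "History"]))
# [['Digital Humanities', 'Digital-Humanities', 'digital humanities']]
print(join_multi_valued([{"Record ID": "1", "Categories": "Art"},
                         {"Record ID": None, "Categories": "History"}], "Categories"))
# [{'Record ID': '1', 'Categories': 'Art|History'}]
```

Because every spelling that normalizes to the same n-gram set collides, this method can also group values that only look alike, which is why the warning above to review suggestions before merging applies.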
@@ -331,12 +332,12 @@ When you’re happy with your analysis results, choose whether to export the dat
 > 1. Click on `Export` at the top of the table.
 > 2. Select `Galaxy exporter`. Wait a few seconds. In a new page, you will see a text as follows: "Dataset has been exported to Galaxy, please close this tab". When you see this, you can close that tab. Alternatively, you can download your cleaned dataset in various formats such as CSV, TSV, and Excel. You can also close the extra tab that contains OpenRefine and click on the orange item `OpenRefine on data [and a number]`. You do not need it for your next steps
 >
-> ![Export results of OpenRefine](images/export_results3.png)
+> ![Export results of OpenRefine](images/export_results3.png)
 >
 > 3. You can find a new dataset in your Galaxy History (with a green background) that contains your cleaned dataset for further analysis.
 > 4. You can click on the eye icon ({% icon galaxy-eye %}) and investigate the table.
 >
-> ![Cleaned dataset](images/dataset_cleaned.png)
+> ![Cleaned dataset](images/dataset_cleaned.png)
 >
 {: .hands_on}
 
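Later in this diff, the workflow takes the exported table as `openrefine-Galaxy file.tsv`, a tab-separated file. To illustrate what such a file contains, here is a sketch that writes rows as TSV with Python's standard `csv` module. `rows_to_tsv` is a hypothetical helper, and OpenRefine's own exporters may differ in details such as quoting.

```python
import csv
import io
from typing import Dict, List, Optional


def rows_to_tsv(rows: List[Dict[str, Optional[str]]], fieldnames: List[str]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
        restval="",             # missing columns become empty cells
        extrasaction="ignore",  # columns not listed in fieldnames are skipped
    )
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty strings
    return buffer.getvalue()


# Prints a header line and one data line, separated by tabs.
print(rows_to_tsv([{"Record ID": "1", "Categories": "Art|History"}], ["Record ID", "Categories"]), end="")
```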
@@ -345,7 +346,7 @@ When you’re happy with your analysis results, choose whether to export the dat
 > 1. Click on `Undo/Redo` on the left panel.
 > 2. Click on `Extract...`.
 >
-> ![Extract OpenRefine](images/extract_tasks.png)
+> ![Extract OpenRefine](images/extract_tasks.png)
 >
 > 3. Click on the steps that you want to extract. Here, we selected everything.
 > 4. Click on `Export`. Give your file a name to save it on your computer.
379380
> Let's assume that you have imported a workflow to your Galaxy account.
380381
> 1. You can find all workflows available to you by clicking on the Workflows Icon ({% icon galaxy-workflows-activity %}) on the left panel.
381382
>
382-
> ![Workflows button](images/workflows.png)
383+
> ![Workflows button](images/workflows.png)
383384
>
384385
> 2. Then, you can select and run different workflows (if you have any workflows in your account). Here, let's click on the Run button ({% icon workflow-run %}) of the workflow we provided to you in this tutorial.
385386
>
386-
> ![Select this workflow](images/select_workflow.png)
387+
> ![Select this workflow](images/select_workflow.png)
387388
>
388389
> 3. Determine the inputs as follows:
389390
> Input: `openrefine-Galaxy file.tsv`
390391
> stop_words_english: `stop_words_english.txt`, which is the file we provided to you in this tutorial.
391392
>
392-
> ![Determine the inputs of the workflow](images/workflow_inputs.png)
393+
> ![Determine the inputs of the workflow](images/workflow_inputs.png)
393394
>
394395
> 5. Click on the `Run Workflow` button at the top.
395396
> 6. You can follow the stages of different jobs (computational tasks). They will be created, scheduled, executed, and completed. When everything is green, your workflow has run fully and the results are ready.
396397
>
397-
> ![Overview of the workflow](images/workflow_overview.png)
398+
> ![Overview of the workflow](images/workflow_overview.png)
398399
>
399400
{: .hands_on}
400401
