Commit 9ef0a16

Update tutorial.md
1 parent 186ee21 commit 9ef0a16

1 file changed (+15, -9 lines)
topics/digital-humanities/tutorials/open-refine-tutorial/tutorial.md

@@ -22,6 +22,8 @@ contributions:
 - dianichj
 - dadrasarmin
 - Sch-Da
+reviewing:
+- Sch-Da
 funding:
 - nfdi4culture
 requirements:
@@ -93,13 +95,15 @@ We suggest that you download the data from the Zenodo record as explained below.
 >
 > {% snippet faqs/galaxy/datasets_import_from_data_library.md %}
 >
-> 3. **Rename** {% icon galaxy-pencil %} the dataset: "**Powerhouse Museum metadata**."
-> 4. Ensure that the datatype is "tsv". Otherwise, use convert datatype.
+> 4. Ensure that the datatype of "phm_collection_adapted" is "tsv". Otherwise, use convert datatype.
+> 5. Check that the datatype of "stopwords-en" is txt. If not, convert the datatype.
 >
 > {% snippet faqs/galaxy/datasets_change_datatype.md datatype="datatypes" %}
 >
 {: .hands_on}
 
+In this first part, we will focus on working with the metadata from the Powerhouse Museum. The additional
+
 # Use OpenRefine to explore and clean your dataset
 
 Access OpenRefine as an interactive tool in Galaxy and explore your data.
@@ -108,14 +112,14 @@ Access OpenRefine as an interactive tool in Galaxy and explore your data.
 
 > <hands-on-title>Opening the dataset with OpenRefine</hands-on-title>
 >
-> 1. Open the {% tool [OpenRefine](interactive_tool_openrefine) %}: Working with messy data
+> 1. Open the tool {% tool [OpenRefine](interactive_tool_openrefine) %}: Working with messy data
 > - *"Input file in tabular format"*: `openrefine-phm-collection.tsv`
 >
 > 2. Click on "Run Tool".
 >
 > ![OpenRefine tool interface in Galaxy](images/openrefine.png)
 >
-> 3. After around 30 seconds, using the interactive tools section on the left panel, you can open OpenRefine by clicking on its name. Make sure to wait until you see the symbol with an arrow > pointing outside the box, which allows you to start OpenRefine in a new tab.
+> 3. After around 30 seconds, a red dot appears over the interactive tools section on the left panel. Click on "interactive tools". A new window opens. Make sure to wait until you see the symbol with an arrow pointing outside the box, which indicates that you can start OpenRefine in a new tab. Now you can open OpenRefine by clicking on its name.
 >
 > ![Open OpenRefine tool as an Interactive tool](images/interactive_tools.png)
 >
@@ -146,9 +150,11 @@ Access OpenRefine as an interactive tool in Galaxy and explore your data.
 
 Great, now that the dataset is in OpenRefine, we can start cleaning it.
 
-## Remove blank rows
+## Remove duplicates
+
+In large datasets, errors are common. Some basic cleaning exercises can help enhance the data quality. One of those steps is to remove duplicate entries.
 
-> <hands-on-title>Removing the blank rows</hands-on-title>
+> <hands-on-title>Removing duplicates</hands-on-title>
 >
 > 1. Click on the triangle on the left of `Record ID`.
 >
@@ -172,15 +178,15 @@ Great, now that the dataset is in OpenRefine, we can start cleaning it.
 >
 > ![Facet by blank Record ID](images/sort5.png)
 >
-> 7. On the left, a new option appears under `Facet/Filter` with the title `Record ID`. Click on `true`.
+> 7. On the left, a new option appears under `Facet/Filter` with the title `Record ID`. It shows two choices, `true` and `false`. Click on `true`.
 >
 > ![Facet by blank true Record ID](images/sort6.png)
 >
 > 8. Click on the triangle to the left of the column called `All`. Hover over `Edit rows`, and select `remove matching rows`.
 >
 > ![Remove matching rows Record ID](images/deduplicate.png)
 >
-> 9. Close the `Facet` by clicking on the cross (x) to see all rows.
+> 9. Close the `Record ID` under `Facet/Filter` by clicking on the cross (x) to see all rows.
 >
 {: .hands_on}
 
@@ -195,7 +201,7 @@ Great, now that the dataset is in OpenRefine, we can start cleaning it.
 > {: .solution}
 {: .question}
 
-The dataset does not contain any more blank rows now. But we need to do more cleaning to improve the dataset.
+The dataset no longer contains duplicates based on the Record ID. However, we need to perform further cleaning to enhance the dataset.
 
 ## Use GREL
 
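Steps 2 to 6 of the "Removing duplicates" hands-on (old lines 155 to 171) fall outside this diff, so the exact OpenRefine operations are not visible here. Assuming they sort on `Record ID` and apply OpenRefine's *Blank down* before the *Facet by blank* step that the diff does show, the deduplication logic can be sketched in Python as below. The function `deduplicate_by_key` and the sample rows are hypothetical illustrations, not OpenRefine code.

```python
import csv
import io


def deduplicate_by_key(rows, key):
    """Hypothetical sketch: sort, blank down, facet by blank, remove matching rows.

    Assumes the elided tutorial steps sort on `key` and apply Blank down;
    this mimics the idea and is not OpenRefine's own implementation.
    """
    # Assumed step: sort so that rows sharing the same key become adjacent.
    ordered = sorted(rows, key=lambda row: row[key])

    # Assumed "Blank down": empty the key cell when it repeats the previous value.
    blanked = []
    previous = None
    for row in ordered:
        row = dict(row)  # copy, so the caller's rows are left untouched
        if row[key] == previous:
            row[key] = ""
        else:
            previous = row[key]
        blanked.append(row)

    # "Facet by blank" -> `true`, then "Remove matching rows":
    # keep only the rows whose key cell is not blank.
    return [row for row in blanked if row[key] != ""]


# A tiny made-up TSV sample with one duplicated and one missing Record ID.
sample = "Record ID\tObject Title\n12\tCup\n7\tVase\n12\tCup\n\tUntitled\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
unique = deduplicate_by_key(rows, "Record ID")
print([row["Record ID"] for row in unique])  # ['12', '7']
```

Sorting first is what turns Blank down into a deduplication step, because Blank down only compares each cell with the cell directly above it. Rows whose `Record ID` was already empty are also dropped by the facet step, both in the sketch and in the hands-on.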