In this first part, we will focus on working with the metadata from the Powerhouse Museum. The additional
# Use OpenRefine to explore and clean your dataset
Access OpenRefine as an interactive tool in Galaxy and explore your data.
> <hands-on-title>Opening the dataset with OpenRefine</hands-on-title>
>
> 1. Open the tool {% tool [OpenRefine](interactive_tool_openrefine) %}: Working with messy data
>    - *"Input file in tabular format"*: `openrefine-phm-collection.tsv`
>
> 2. Click on "Run Tool".
>
> 
>
> 3. After around 30 seconds, a red dot appears over the interactive tools section on the left panel. Click on "interactive tools". A new window opens. Make sure to wait until you see the symbol with an arrow pointing outside the box, which indicates that you can start OpenRefine in a new tab. Now you can open OpenRefine by clicking on its name.
>
> 
>
Great, now that the dataset is in OpenRefine, we can start cleaning it.
## Remove duplicates
In large datasets, errors are common. Some basic cleaning exercises can help enhance the data quality. One of those steps is to remove duplicate entries.
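The idea behind this step can be sketched outside OpenRefine as well. The snippet below is an illustration only, not part of the tutorial workflow: it shows first-occurrence deduplication on a few made-up rows (the field names and values are invented, not records from `openrefine-phm-collection.tsv`).

```python
# Hypothetical example rows: (record id, object name). These are
# invented for illustration, not data from the Powerhouse Museum set.
rows = [
    ("9", "Carriage clock"),
    ("10", "Tea cup"),
    ("9", "Carriage clock"),  # exact duplicate of the first row
]

seen = set()
deduplicated = []
for row in rows:
    if row not in seen:  # keep only the first occurrence of each row
        seen.add(row)
        deduplicated.append(row)

print(deduplicated)
```

OpenRefine performs the same kind of exact-match comparison when you blank down and remove matching rows; the GUI steps in the next box achieve this without writing any code.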
> <hands-on-title>Removing duplicates</hands-on-title>