|
58 | 58 | "* `scientificName`: the accepted scientific name of the species\n",
|
59 | 59 | "* `decimalLatitude`/`decimalLongitude`: coordinates of the occurrence in WGS84 format\n",
|
60 | 60 | "* `sex`: either `male` or `female` to characterize the sex of the occurrence\n",
|
61 |
| - "* `occurrenceID`: a identifier within the data set to identify the individual records\n", |
| 61 | + "* `occurrenceID`: an identifier within the data set to identify the individual records\n", |
62 | 62 | "* `datasetName`: a static string defining the source of the data\n",
|
63 | 63 | "\n",
|
64 | 64 | "Furthermore, additional information concerning the taxonomy will be added using an external API service"
|
|
529 | 529 | "cell_type": "markdown",
|
530 | 530 | "metadata": {},
|
531 | 531 | "source": [
|
532 |
| - "To check what the frequency of occurrences is for male/female of the categories, a bar chart is one possible representation:" |
| 532 | + "To check what the frequency of occurrences is for male/female of the categories, a bar chart is a possible representation:" |
533 | 533 | ]
|
534 | 534 | },
|
535 | 535 | {
|
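For reference, the frequency check described in this markdown cell can be sketched as below. This is only a minimal illustration: the `survey` frame and its values are made up for the example, and only the `sex` column name comes from the notebook text.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the values are illustrative only.
survey = pd.DataFrame({"sex": ["male", "female", "male"]})

# Count how often each category occurs.
counts = survey["sex"].value_counts()

# With matplotlib installed, the counts could then be drawn as a bar chart:
# counts.plot(kind="bar")
```

Counting first and plotting second keeps the numbers available for inspection even when no plotting backend is present.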
|
859 | 859 | "cell_type": "markdown",
|
860 | 860 | "metadata": {},
|
861 | 861 | "source": [
|
862 |
| - "There apparently exists a double entry: `'DM and SH'`, which basically defines two records and should be decoupled to two individual records (i.e. rows). Hence, we should be able to create a additional row based on this split. To do so, Pandas provides a dedicated function since version 0.25, called `explode`. Starting from a small subset example:" |
| 862 | + "There apparently exists a double entry: `'DM and SH'`, which basically defines two records and should be decoupled to two individual records (i.e. rows). Hence, we should be able to create an additional row based on this split. To do so, Pandas provides a dedicated function since version 0.25, called `explode`. Starting from a small subset example:" |
863 | 863 | ]
|
864 | 864 | },
|
865 | 865 | {
|
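For reference, a minimal sketch of the decoupling that this cell describes, assuming pandas 0.25 or later. The `example` frame and the column name `species` are hypothetical; the notebook text only states that the value `'DM and SH'` holds two records.

```python
import pandas as pd

# Hypothetical small subset; column names are assumptions for illustration.
example = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species": ["DM", "DM and SH", "SH"],
})

# Split the combined entry into a list of values...
example["species"] = example["species"].str.split(" and ")

# ...and give every list element its own row (other columns are repeated).
decoupled = example.explode("species").reset_index(drop=True)
```

Note that `record_id` 2 now appears on two rows, which is why a fresh identifier is needed after the split.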
|
1050 | 1050 | "cell_type": "markdown",
|
1051 | 1051 | "metadata": {},
|
1052 | 1052 | "source": [
|
1053 |
| - "The function takes a `DataFrame` as input, splits the record into separate rows and returns an updated `DataFrame`. We can use this function to get an update of the `DataFrame`, with the an additional row (observation) added by decoupling the specific field. Let's apply this new function." |
| 1053 | + "The function takes a `DataFrame` as input, splits the record into separate rows and returns an updated `DataFrame`. We can use this function to get an update of the `DataFrame`, with an additional row (observation) added by decoupling the specific field. Let's apply this new function." |
1054 | 1054 | ]
|
1055 | 1055 | },
|
1056 | 1056 | {
|
|
1358 | 1358 | "cell_type": "markdown",
|
1359 | 1359 | "metadata": {},
|
1360 | 1360 | "source": [
|
1361 |
| - "The `record_id` is no longer a unique identifier for each observation after the decoupling of this data set. We will make a new data set specific identifier, by adding a column called `occurrenceID` that takes a new counter as identifier. As a simply and straightforward approach, we will use a new counter for the whole dataset, starting with 1:" |
| 1361 | + "The `record_id` is no longer a unique identifier for each observation after the decoupling of this data set. We will make a new data set specific identifier, by adding a column called `occurrenceID` that takes a new counter as identifier. As a simple and straightforward approach, we will use a new counter for the whole dataset, starting with 1:" |
1362 | 1362 | ]
|
1363 | 1363 | },
|
1364 | 1364 | {
|
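For reference, one possible way to add the counter described in this cell. It is a sketch under assumptions: the `survey` frame below is a made-up stand-in, and only the column names `record_id` and `occurrenceID` are taken from the notebook text.

```python
import pandas as pd

# Hypothetical data after decoupling: record_id 2 occurs twice.
survey = pd.DataFrame({"record_id": [1, 2, 2, 3]})

# New data set specific identifier: a simple counter starting at 1.
survey["occurrenceID"] = range(1, len(survey) + 1)
```

Unlike `record_id`, the new column is unique per row regardless of how many rows the decoupling step added.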
|