Skip to content

Commit d92510d

Browse files
authored
Merge pull request #81 from LimaRAF/dev
Dev
2 parents 2a84d3f + e1ee4f0 commit d92510d

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

54 files changed

+1293
-598
lines changed

DESCRIPTION

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,7 @@ Suggests:
5151
remotes,
5252
rmarkdown,
5353
sp,
54+
s2,
5455
stats,
5556
testthat (>= 3.0.0),
5657
utils

NEWS.md

Lines changed: 42 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,8 @@
1-
# plantR 0.1.4
1+
# plantR NEWS
2+
3+
<br/>
4+
5+
## version 0.1.4
26

37
* Fixing an issue related to non-NA type specifications in `validateTax()` and
48
`checkList()` used to validate species identifications and detect type
@@ -7,9 +11,12 @@ specimens.
711
* Fixing a small issue related to non-preserved specimens observation in
812
`validateTax()` to assign confidence levels to species identifications.
913

10-
* Inclusion of the (simple) internal function `plotCoord()` to help the
14+
* New (simple) internal function `plotCoord()` to help the
1115
visualization of the coordinate validation process
1216

17+
* New (simple) internal function `fixEncoding()` to help solving
18+
common encoding problems of text in 'latin1'
19+
1320
* Inclusion of an argument to make optional the prints of functions:
1421
`prepFamily()`, `summaryData()`, `summaryFlags()` and `validateTax()`
1522

@@ -25,21 +32,34 @@ separately for the taxonomic, locality and coordinate information. We also fixed
2532
a bug related to the homogenization of the geographical and locality information
2633

2734
* Function `prepTDWG()` (and consequently `prepName()`) now provides the
28-
argument 'pretty' which controls the way in which the output names are
29-
presented. By default the function returns, as before, names in a 'pretty' way
30-
(i.e. only the first letter of names capitalized, initials separated by points
31-
and no spaces, and family name prepositions in lower cases). But now the
32-
function also returns the names in the desired format but presented in the same
33-
way as the input names
35+
argument 'pretty' which controls how the output names are presented. By default
36+
the function returns, as before, names in a 'pretty' way (i.e. only the first
37+
letter of names capitalized, initials separated by points and no spaces, and
38+
family name prepositions in lower cases). But now the function also returns the
39+
names in the desired format but presented in the same way as the input names
3440

35-
* Minor improvements in functions `formatDwc()`, `checkList()`, ,
36-
`fixSpecies()`, `prepName()`, `getPrep()`, `getInit()`, `lastName()`,
37-
`prepFamily()` and `addRank()`.
41+
* Function `formatDwc()` now accepts data downloaded from the BIEN database and
42+
it includes a dataset-specific option to solve common latin1 encoding problems
43+
44+
* Function `fixName()` now includes an option to detect and solve potentially
45+
problematic cases when the name notation uses commas to separate multiple
46+
people's names. This option is controlled by the new argument `bad.comma`
47+
48+
* Minor improvements in `checkList()`, `fixSpecies()`, `prepName()`,
49+
`getPrep()`, `getInit()`, `getYear()`, `colNumber()`, `lastName()`,
50+
`checkCoord()`, `mahalanobisDist()`, `prepFamily()` and `addRank()`.
3851

3952
* Adding tests to most of the package functions
4053

54+
* Solving some minor problems in the world map objects to make them compatible
55+
to the use of the spherical geometry operators of package __s2__, which is now
56+
the default of package __sf__ version >= 1.0
57+
4158

42-
# plantR 0.1.3
59+
60+
<br/>
61+
62+
## version 0.1.3
4363

4464
* Fixing bugs in `formatDwc()` related to the difference in the number of
4565
columns that speciesLink returns for queries using different taxa and for the
@@ -57,7 +77,9 @@ binding of columns with different data types.
5777
`formatTax()`
5878

5979

60-
# plantR 0.1.2
80+
<br/>
81+
82+
## version 0.1.2
6183

6284
* New tutorial on how __plantR__ can be used to update databases of biological
6385
collections (currently in Portuguese, only)
@@ -83,9 +105,11 @@ prioritize the merge of taxonomic information within duplicates
83105
list of taxonomists related to the new tutorial
84106

85107

86-
# plantR 0.1.1
108+
<br/>
87109

88-
* Added function `readData()` to read DwC-A zip files from GBIF
110+
## version 0.1.1
111+
112+
* New function `readData()` to read DwC-A zip files from GBIF
89113

90114
* `checkCoord()` now supports user provided maps
91115

@@ -94,6 +118,8 @@ list of taxonomists related to the new tutorial
94118
* Added a `NEWS.md` file to track changes to the package.
95119

96120

97-
# plantR 0.1.0
121+
<br/>
122+
123+
## version 0.1.0
98124

99125
* The first public version of the package in GitHub.

R/accessory_geo.R

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -154,7 +154,7 @@ minDist <- function(lon, lat, min.dist = 0.001, output = NULL) {
154154
#' (`method`= 'robust'). The two methods take into account the geographical
155155
#' center of the coordinates distribution and the spatial covariance between
156156
#' them. But they vary in the way the covariance matrix of the distribution is
157-
#' defined: the classic method uses an approach based on Pearsons method,
157+
#' defined: the classic method uses an approach based on Pearson's method,
158158
#' while the robust method uses a Minimum Covariance Determinant (MCD)
159159
#' estimator.
160160
#'
@@ -321,13 +321,21 @@ mahalanobisDist <- function(lon, lat, method = NULL, n.min = 5, digs = 4,
321321
if (class(rob) == "try-error") {
322322
df1$lon2 <- jitter(df1$lon, factor = 0.001)
323323
df1$lat2 <- jitter(df1$lat, factor = 0.001)
324-
rob <- robustbase::covMcd(df1[use_these, c("lon2", "lat2")],
325-
alpha = 1 / 2, tol = 1e-20)
326-
res0 <- cbind.data.frame(dup.coord.ID = df1$dup.coord.ID,
327-
res = sqrt(stats::mahalanobis(df1[, c("lon2", "lat2")],
328-
center = rob$center,
329-
cov = rob$cov, tol=1e-20)))
324+
rob <- suppressWarnings(try(
325+
robustbase::covMcd(df1[use_these, c("lon2", "lat2")],
326+
alpha = 1 / 2, tol = 1e-20), TRUE))
327+
if (class(rob) == "try-error") {
328+
res0 <- cbind.data.frame(dup.coord.ID = df1$dup.coord.ID,
329+
res = NA_character_)
330+
} else {
331+
res0 <- cbind.data.frame(dup.coord.ID = df1$dup.coord.ID,
332+
res = sqrt(stats::mahalanobis(df1[, c("lon2", "lat2")],
333+
center = rob$center,
334+
cov = rob$cov, tol=1e-20)))
335+
}
336+
330337
} else {
338+
331339
if (length(rob$singularity) > 0) {
332340
df1$lon2 <- jitter(df1$lon, factor = 0.005)
333341
df1$lat2 <- jitter(df1$lat, factor = 0.005)

R/checkBorders.R

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,8 @@
77
#'
88
#' @author Andrea Sánchez-Tapia & Sara Mortara
99
#'
10+
#' @encoding UTF-8
11+
#'
1012
#' @keywords internal
1113
#'
1214
shares_border <- function(country1 = "brazil",
@@ -76,6 +78,8 @@ shares_border <- function(country1 = "brazil",
7678
#'
7779
#' @author Andrea Sánchez-Tapia & Sara Mortara
7880
#'
81+
#' @encoding UTF-8
82+
#'
7983
#' @export checkBorders
8084
#'
8185
checkBorders <- function(x,

R/checkCoord.R

Lines changed: 25 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,7 @@
6060
#' the validation of the locality information (see function `getLoc()` and the
6161
#' default __plantR__ maps 'worldMap' and 'latamMap').
6262
#'
63+
#' @encoding UTF-8
6364
#'
6465
#' @export checkCoord
6566
#'
@@ -99,7 +100,7 @@ checkCoord <- function(x,
99100
##Preliminary edits
100101
cols.x <- names(x) # original data column names
101102
x$tmp.order <- 1:nrow(x)
102-
x[, str.name][x[, str.name] %in% "no_loc"] <- NA #porque nao é pais (rafl: concordo, mas não achei nehuma funcao onde esse 'no_loc' é gerado; melhor alterar direto na função que obtém o string, getLoc()?)
103+
x[, str.name][x[, str.name] %in% "no_loc"] <- NA #porque nao eh pais (rafl: concordo, mas nao achei nehuma funcao onde esse 'no_loc' eh gerado; melhor alterar direto na funcao que obtem o string, getLoc()?)
103104

104105
##Defining the country, state and county columns
105106
x <- tidyr::separate(
@@ -119,7 +120,6 @@ checkCoord <- function(x,
119120
paste0("ok_", x[ids.gazet, res.gazet], "_gazet")
120121
ids.no.coord <- x[, orig.coord] %in% "no_coord"
121122
geo.check[ids.no.coord] <- "no_cannot_check"
122-
#rafl: checar com mais dados se pode ter NAs ou outras classes
123123

124124
## Subsetting data for geographical checking
125125
tmp <- x[is.na(geo.check), ]
@@ -166,6 +166,21 @@ checkCoord <- function(x,
166166
join = sf::st_intersects))
167167
names(tmp)[which(names(tmp) == "NAME_0")] <- "pais_wo"
168168

169+
##Solving misterious problems with the country map (could not iolate the problem)
170+
check_these <- grepl("\\.[0-9]", rownames(tmp))
171+
if (any(check_these)){
172+
tmp$keep_these <- rep(TRUE, dim(tmp)[1])
173+
dup.orders <- tmp$tmp.order[check_these]
174+
for(i in seq_along(dup.orders)) {
175+
dups.i <- tmp[tmp$tmp.order %in% dup.orders[i], ]
176+
dups.i$keep_these[dups.i$country.new != dups.i$pais_wo] <- FALSE
177+
if (all(dups.i$keep_these))
178+
dups.i$keep_these[-1] <- FALSE
179+
tmp$keep_these[tmp$tmp.order %in% dup.orders[i]] <- dups.i$keep_these
180+
}
181+
tmp <- tmp[tmp$keep_these, ]
182+
}
183+
169184
##Defining which coordinates fall into the sea (i.e. original coordinates but no country, state or county)
170185
geo.check[is.na(geo.check)][is.na(tmp$pais_wo)] <- "sea"
171186

@@ -215,33 +230,32 @@ checkCoord <- function(x,
215230
# cria o vetor para checar
216231
x2$loc.coord <- paste(x2$NAME_0, x2$NAME_1, x2$NAME_2, sep = "_")
217232
x2$loc.coord[x2$loc.coord %in% "NA_NA_NA"] <- NA_character_
218-
x2$loc.coord <- gsub("_NA_NA$", "", x2$loc.coord, perl = TRUE) #rafl: necessário, certo?
219-
x2$loc.coord <- gsub("_NA$", "", x2$loc.coord, perl = TRUE) #rafl: necessário, certo?
220-
# ast: na real loc.coord nao é usado mais.então tudo isto poderia sumir.
221-
# rafl: vdd, mas acho legal a possibilidade de retornar essa info. Pode ajudar na gestão/correção de coleções.
233+
x2$loc.coord <- gsub("_NA_NA$", "", x2$loc.coord, perl = TRUE)
234+
x2$loc.coord <- gsub("_NA$", "", x2$loc.coord, perl = TRUE)
222235

223236
# recupera todas as linhas
224237
x3 <- suppressMessages(
225238
dplyr::left_join(x,
226239
x2[,c("tmp.order",
227240
"NAME_0", "NAME_1", "NAME_2", "NAME_3",
228241
"loc.coord")]))
229-
#ast: eu nao sei se vc está tirando colunas aqui mas pelo menos tirei o by que ia criar colunas duplicadas.
230-
#rafl: ok! removi o geo.check e adicionei o suppressWarnings
231242

232243
### GEO-VALIDATION STEPS ###
233244
##1- Validating the coordinates at different levels - exact matches
234245
#1.1 Country-level: good country? All countries
235246
x3$country.check <- dplyr::if_else(x3$country.gazet == x3$NAME_0,
236-
"ok_country", "bad_country", missing = "no_country")
247+
"ok_country", "bad_country",
248+
missing = "no_country")
237249

238250
#1.2 State-level: good state? All countries
239251
x3$state.check <- dplyr::if_else(x3$state.gazet == x3$NAME_1,
240-
"ok_state", "bad_state", missing = "no_state")
252+
"ok_state", "bad_state",
253+
missing = "no_state")
241254

242255
#1.3 County-level. All countries
243256
x3$county.check <- dplyr::if_else(x3$county.gazet == x3$NAME_2,
244-
"ok_county", "bad_county", missing = "no_county")
257+
"ok_county", "bad_county",
258+
missing = "no_county")
245259

246260
## Updating geo.check
247261
tmp1 <- apply(x3[ , c("country.check",

R/checkInverted.R

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -43,10 +43,12 @@
4343
#'
4444
#' @importFrom sf st_as_sf st_crs st_set_crs st_coordinates st_join st_intersects st_geometry
4545
#'
46-
#' @export checkInverted
47-
#'
4846
#' @author Andrea Sánchez-Tapia, Sara Mortara & Renato A. F. de Lima
4947
#'
48+
#' @encoding UTF-8
49+
#'
50+
#' @export checkInverted
51+
#'
5052
checkInverted <- function(x,
5153
check.names = c("geo.check", "border.check", "shore.check"),
5254
country.gazetteer = "country.gazet",
@@ -127,7 +129,7 @@ checkInverted <- function(x,
127129
tmp$inv_lat <- -tmp[, lat]
128130
inv_lon <- "inv_lon"
129131
inv_lat <- "inv_lat"
130-
#rafl: no codigo antigo eu fazia apenas os casos 1,2,3 e 4. Se me lembro bem, não achei os casos 5 e 6. Mas o 7 deve ter...
132+
#rafl: no codigo antigo eu fazia apenas os casos 1,2,3 e 4. Se me lembro bem, nao achei os casos 5 e 6. Mas o 7 deve ter...
131133
types <- list(invert_lon = c(inv_lon, lat),
132134
invert_lat = c(lon, inv_lat),
133135
invert_both = c(inv_lon, inv_lat), #signos

R/checkOut.R

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@
8787
#' Conservation, 252: 108825.
8888
#'
8989
#' Liu, C., White, M., and Newell, G. 2018. Detecting outliers in species
90-
#' distribution data. Journal of Biogeography, 45(1): 164176.
90+
#' distribution data. Journal of Biogeography, 45(1): 164-176.
9191
#'
9292
#'
9393
#' @seealso
@@ -105,7 +105,7 @@ checkOut <- function(x,
105105
tax.name = "scientificName.new",
106106
geo.name = "geo.check",
107107
cult.name = "cult.check",
108-
n.min = 5,
108+
n.min = 6,
109109
center = "median",
110110
geo.patt = "ok_",
111111
cult.patt = NA,
@@ -117,11 +117,14 @@ checkOut <- function(x,
117117
robust.cut <- out.check <- NULL
118118

119119
## check input
120-
if (!class(x) == "data.frame")
120+
if (!class(x)[1] == "data.frame")
121121
stop("Input object needs to be a data frame!")
122122

123+
if (dim(x)[1] == 0)
124+
stop("Input data frame is empty!")
125+
123126
if (!all(c(lat, lon) %in% colnames(x)))
124-
stop("Coordinate column names do not match those of the input object: please rename or specify the correct names")
127+
stop("Coordinates column names do not match those of the input object: please rename or specify the correct names")
125128

126129
if (!tax.name %in% colnames(x)) {
127130
rm.tax <- TRUE
@@ -157,7 +160,6 @@ checkOut <- function(x,
157160
cult.patt = cult.patt),
158161
by = c("tax.wrk")]
159162

160-
161163
dt[!is.na(lon.wrk) & !is.na(lat.wrk),
162164
maha.robust := mahalanobisDist(lon.wrk, lat.wrk, n.min = n.min,
163165
method = "robust",

R/checkShore.R

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,8 @@
4545
#'
4646
#' @author Andrea Sánchez-Tapia, Sara Mortara & Renato A. Ferreira de Lima
4747
#'
48+
#' @encoding UTF-8
49+
#'
4850
#' @export checkShore
4951
checkShore <- function(x,
5052
geo.check = "geo.check",
@@ -77,7 +79,7 @@ checkShore <- function(x,
7779
if (type == "buffer") {
7880
land <- landBuff
7981
test_shore <-
80-
suppressMessages(sf::st_intersects(tmp, land, by_element = TRUE))
82+
suppressMessages(sf::st_intersects(tmp, land))
8183
shore.check <- lengths(test_shore) == 1
8284
}
8385

0 commit comments

Comments
 (0)