|
| 1 | +--- |
| 2 | +title: "cppRouting package" |
| 3 | +author: "Vincent LARMET" |
| 4 | +date: "15 juin 2019" |
| 5 | +output: github_document |
| 6 | +--- |
| 7 | + |
| 8 | +```{r setup, include=FALSE} |
| 9 | +knitr::opts_chunk$set(echo = TRUE) |
| 10 | +knitr::opts_knit$set(root.dir = 'D:/users/vlarmet/Dijkstra/exemple/data') |
| 11 | +``` |
| 12 | + |
| 13 | +#Package presentation |
| 14 | + |
| 15 | +`cppRouting` is an `R` package which provide functions to calculate distances, shortest paths and isochrones/isodistances on non-negative weighted graphs. |
| 16 | +For now, `cppRouting` can implement : |
| 17 | + |
| 18 | + - uni-directional Dijkstra algorithm, |
| 19 | + - bi-directional Dijkstra algorithm, |
| 20 | + - uni-directional A* algorithm |
| 21 | + - New bi-directional A* algorithm (Piljs & Post, 2009 : see http://repub.eur.nl/pub/16100/ei2009-10.pdf) |
| 22 | + |
| 23 | +All these functions are written in C++ and use std::priority_queue container from the Standard Template Library. |
| 24 | +This package have been made with `Rcpp` and `parallel` packages. |
| 25 | + |
| 26 | +#Install |
| 27 | +###Stable version from CRAN |
| 28 | +```{r,eval=FALSE} |
| 29 | +install.packages("cppRouting") |
| 30 | +``` |
| 31 | +###or from github |
| 32 | +```{r,eval=FALSE} |
| 33 | +library(devtools) |
| 34 | +devtools::install_github("vlarmet/cppRouting") |
| 35 | +``` |
| 36 | +#Data |
| 37 | +The data presented here is the official french road network describing over 500000 km of roads. |
| 38 | +All data used in this README are free and can be downloaded here : |
| 39 | + |
| 40 | +- roads : http://professionnels.ign.fr/route500 |
| 41 | +- general practitioners location : https://www.insee.fr/fr/statistiques/3568614?sommaire=3568656#consulter |
| 42 | +- maternity wards location : https://www.insee.fr/fr/statistiques/3568611?sommaire=3568656#dictionnaire |
| 43 | +- shapefile of the ~36000 communes in France : http://professionnels.ign.fr/adminexpress |
| 44 | + |
| 45 | +Graph data have been preprocessed for more readability (see data_preparation.R). |
| 46 | + |
| 47 | +The final graph is composed of 234615 nodes and 685118 edges. |
| 48 | +Data has to be a 3 columns data.frame or matrix containing from, to and a cost/distance column. Here the cost is the time needed to travel in each edges (in minutes). From and to are vertices IDs (character or numeric). |
| 49 | + |
| 50 | +#Main functions |
| 51 | + |
| 52 | +`cppRouting` package provide these functions : |
| 53 | + |
| 54 | +- `get_distance_matrix` : compute distance matrix (between all combinations origin-destination nodes - *one-to-many*), |
| 55 | +- `get_distance_pair` : compute distances between origin and destination by pair (*one-to-one*), |
| 56 | +- `get_path_pair` : compute shortest paths between origin and destination by pair (*one-to-one*), |
| 57 | +- `get_multi_paths` : compute shortest paths between all origin nodes and all destination nodes (*one-to-many*), |
| 58 | +- `get_isochrone` : compute isochrones/isodistances with one or multiple breaks. |
| 59 | + |
| 60 | +The choice between all the algorithms is available for *one-to-one* calculation like `get_distance_pair` and `get_path_pair`. |
| 61 | +In these functions, uni-directional Dijkstra algorithm is stopped when the destination node is reached. |
| 62 | +`A*` and `NBA*` are relevant if geographic coordinates of all nodes are provided. Note that coordinates should be expressed in a **projection system**. |
| 63 | +To be accurate and efficient, `A*` and `NBA*` algorithms should use an admissible heuristic function (here the Euclidean distance), e.g cost and heuristic function must be expressed in the same unit. |
| 64 | +In `cppRouting`, heuristic function `h` is defined such that : h(xi,yi,xdestination,ydestination)/k, with a constant k; so in the case where coordinates are expressed in meters and cost is expressed in time, k is the maximum speed allowed on the road. By default, constant is 1 and is designed for graphs with cost expressed in the same unit than coordinates (for example in meters). |
| 65 | + |
| 66 | +If coordinates cannot be provided, bi-directional Dijkstra algorithm can offer a good alternative to A* in terms of performance. |
| 67 | + |
| 68 | +##Examples |
| 69 | +###Prepare data |
| 70 | +```{r pressure, echo=TRUE,message=FALSE} |
| 71 | +library(cppRouting) |
| 72 | +library(dplyr) |
| 73 | +library(sf) |
| 74 | +library(ggplot2) |
| 75 | +library(concaveman) |
| 76 | +library(ggmap) |
| 77 | +
|
| 78 | +#Reading french road data |
| 79 | +roads<-read.csv("roads.csv",colClasses = c("character","character","numeric")) |
| 80 | +#Shapefile data of communes |
| 81 | +com<-read_sf("com_simplified_geom.shp") |
| 82 | +#Correspondance file between communes and nodes in the graph |
| 83 | +ndcom<-read.csv("node_commune.csv",colClasses = c("character","character","numeric")) |
| 84 | +#General practitioners locations |
| 85 | +med<-read.csv("doctor.csv",colClasses = c("character","numeric","character","numeric")) |
| 86 | +#Import nodes coordinates (projected in EPSG : 2154) |
| 87 | +coord<-read.csv("coordinates.csv",colClasses = c("character","numeric","numeric")) |
| 88 | +``` |
| 89 | + |
| 90 | +####Head of road network data |
| 91 | +```{r , echo=TRUE,message=FALSE} |
| 92 | +head(roads) |
| 93 | +``` |
| 94 | + |
| 95 | +####Head of coordinates data |
| 96 | +```{r , echo=TRUE,message=FALSE} |
| 97 | +head(coord) |
| 98 | +``` |
| 99 | + |
| 100 | + |
| 101 | +###Instantiate the graph |
| 102 | +```{r,echo=TRUE} |
| 103 | +#Instantiate a graph with coordinates |
| 104 | +graph<-makegraph(roads,directed = T,coords = coord) |
| 105 | +``` |
| 106 | +###Distances by pairs between nodes (*one-to-one*) |
| 107 | +####Using uni-directional Dijkstra algorithm |
| 108 | +```{r,echo=TRUE} |
| 109 | +#Generate 2000 random origin and destination nodes |
| 110 | +origin<-sample(roads$from,2000) |
| 111 | +destination<-sample(roads$from,2000) |
| 112 | +
|
| 113 | +#Uni-directional : single core |
| 114 | +system.time( |
| 115 | +pair_dijkstra<-get_distance_pair(graph,origin,destination) |
| 116 | +) |
| 117 | +
|
| 118 | +``` |
| 119 | +####Using bi-directional Dijkstra algorithm |
| 120 | +```{r,} |
| 121 | +#Bi-directional : single core |
| 122 | +system.time( |
| 123 | +pair_bidijkstra<-get_distance_pair(graph,origin,destination,algorithm = "bi") |
| 124 | +) |
| 125 | +
|
| 126 | +``` |
| 127 | +####Using A* algorithm |
| 128 | +Coordinates are defined in meters and max speed is 110km/h; so for the heuristic function to be admissible, the constant equal 110/0.06 : |
| 129 | +```{r,echo=TRUE} |
| 130 | +#A* single node |
| 131 | +system.time( |
| 132 | +pair_astar<-get_distance_pair(graph,origin,destination,algorithm = "A*",constant = 110/0.06) |
| 133 | +) |
| 134 | +
|
| 135 | +``` |
| 136 | +####Using NBA* algorithm |
| 137 | +```{r,echo=TRUE} |
| 138 | +#NBA* single node |
| 139 | +system.time( |
| 140 | +pair_nba<-get_distance_pair(graph,origin,destination,algorithm = "NBA",constant = 110/0.06) |
| 141 | +) |
| 142 | +
|
| 143 | +``` |
| 144 | +####Output |
| 145 | +```{r,echo=TRUE} |
| 146 | +head(cbind(pair_dijkstra,pair_bidijkstra,pair_astar,pair_nba)) |
| 147 | +
|
| 148 | +``` |
| 149 | +#####In `get_distance_pair` function, all the algorithms can be ran in parallel by setting TRUE to allcores argument. |
| 150 | + |
| 151 | + |
| 152 | +###Compute isochrones |
| 153 | +Let's compute isochrones around Dijon city |
| 154 | +```{r,echo=TRUE,message=FALSE,warning=FALSE} |
| 155 | +#Compute isochrones |
| 156 | +iso<-get_isochrone(graph,from = "205793",lim = c(15,25,45,60,90,120)) |
| 157 | +#Convert nodes to concave polygons with concaveman package |
| 158 | +poly<-lapply(iso[[1]],function(x){ |
| 159 | + x<-data.frame(noeuds=x,stringsAsFactors = F) |
| 160 | + x<-left_join(x,coord,by=c("noeuds"="ID")) |
| 161 | + return(concaveman(summarise(st_as_sf(x,coords=c("X","Y"),crs=2154)))) |
| 162 | +}) |
| 163 | +
|
| 164 | +poly<-do.call(rbind,poly) |
| 165 | +poly$time<-as.factor(names(iso[[1]])) |
| 166 | +#Multipolygon |
| 167 | +poly2<-st_cast(poly,"MULTIPOLYGON") |
| 168 | +poly2$time<-reorder(poly2$time,c(120,90,60,45,25,15)) |
| 169 | +#Reproject for plotting |
| 170 | +poly2<-st_transform(poly2,"+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") |
| 171 | +#Import map backgroung |
| 172 | +dijon=get_map(location=c(lon=5.041140,lat=47.323025),zoom=7, source="google",maptype = "toner-2010") |
| 173 | +#Plot the map |
| 174 | +p<-ggmap(dijon)+ |
| 175 | + geom_sf(data=poly2,aes(fill=time),alpha=.8,inherit.aes = FALSE)+ |
| 176 | + scale_fill_brewer(palette = "YlOrRd")+ |
| 177 | + labs(fill="Minutes")+ |
| 178 | + ggtitle("Isochrones around Dijon")+ |
| 179 | + theme(axis.text.x = element_blank(), |
| 180 | + axis.text.y = element_blank(), |
| 181 | + axis.ticks = element_blank(), |
| 182 | + axis.title.y=element_blank(),axis.title.x=element_blank()) |
| 183 | +p |
| 184 | +
|
| 185 | +``` |
| 186 | + |
| 187 | + |
| 188 | +#Applications |
| 189 | +## Application 1 : Calculate Two Step Floating Catchment Areas (2SFCA) of general practitioners in France |
| 190 | + |
| 191 | +2SFCA method is explained here : https://en.wikipedia.org/wiki/Two-step_floating_catchment_area_method |
| 192 | + |
| 193 | + |
| 194 | +###First step |
| 195 | +Isochrones are calculated with the `cppRouting` function `get_isochrone` |
| 196 | +```{r,echo=TRUE} |
| 197 | +#Isochrone around doctor locations with time limit of 15 minutes |
| 198 | +iso<-get_isochrone(graph,from = ndcom[ndcom$com %in% med$CODGEO,"id_noeud"],lim = 15) |
| 199 | +#Convert list to long data frame |
| 200 | +df<-stack(setNames(iso, seq_along(iso))) |
| 201 | +df$ind<-rep(names(iso),times=sapply(iso,length)) |
| 202 | +df<-df[df$values %in% ndcom$id_noeud,] |
| 203 | +#Joining and summing population located in each isochrone |
| 204 | +df<-left_join(df,ndcom[,c("id_noeud","POPULATION")],by=c("values"="id_noeud")) |
| 205 | +df<-df %>% group_by(ind) %>% |
| 206 | + summarise(pop=sum(POPULATION)) |
| 207 | +#Joining number of doctors |
| 208 | +df<-left_join(df,med[,c("id_noeud","NB_D201")],by=c("ind"="id_noeud")) |
| 209 | +#Calculate ratios |
| 210 | +df$ratio<-df$NB_D201/df$pop |
| 211 | +``` |
| 212 | + |
| 213 | +###Second step |
| 214 | +```{r,echo=TRUE,message=FALSE} |
| 215 | +#Isochrone around each commune with time limit of 15 minutes (few seconds to compute) |
| 216 | +iso2<-get_isochrone(graph,from=ndcom$id_noeud,lim = 15) |
| 217 | +#Convert list to long data frame |
| 218 | +df2<-stack(setNames(iso2, seq_along(iso2))) |
| 219 | +df2$ind<-rep(names(iso2),times=sapply(iso2,length)) |
| 220 | +#Joining and summing ratios calculated in first step |
| 221 | +df2<-left_join(df2,df[,c("ind","ratio")],by=c("values"="ind")) |
| 222 | +df2<-df2 %>% group_by(ind) %>% |
| 223 | + summarise(sfca=sum(ratio,na.rm=T)) |
| 224 | +``` |
| 225 | + |
| 226 | +###Plot the map for Bourgogne-Franche-Comte region |
| 227 | +```{r,echo=TRUE,message=FALSE} |
| 228 | +#Joining commune IDs to nodes |
| 229 | +df2<-left_join(df2,ndcom[,c("id_noeud","com")],by=c("ind"="id_noeud")) |
| 230 | +#Joining 2SFCA to shapefile |
| 231 | +com<-left_join(com,df2[,c("com","sfca")],by=c("INSEE_COM"="com")) |
| 232 | +#Plot for one region |
| 233 | +p<-ggplot()+ |
| 234 | + geom_sf(data=com[com$NOM_REG=="BOURGOGNE-FRANCHE-COMTE",],aes(fill=sfca),colour=NA)+ |
| 235 | + coord_sf(datum=NA)+ |
| 236 | + scale_fill_gradient(low="#BB2528",high = "#FFFF66")+ |
| 237 | + labs(fill="2SFCA")+ |
| 238 | + ggtitle("2SFCA applied to general practitioners") |
| 239 | +p |
| 240 | +``` |
| 241 | + |
| 242 | +##Application 2 : Calculate the minimum travel time to the closest maternity ward in France |
| 243 | +```{r,echo=TRUE,message=FALSE} |
| 244 | +#Import materinty ward locations |
| 245 | +maternity<-read.csv("maternity.csv",colClasses = c("character","numeric")) |
| 246 | +``` |
| 247 | + |
| 248 | +###Shortest travel time matrix |
| 249 | +The shortest travel time is computed with the `cppRouting` function `get_distance_matrix`. |
| 250 | +In order to compute multiple distances from one source, original uni-directional Dijkstra algorithm is ran without early stopping. |
| 251 | +We compute travel time from all commune nodes to all maternity ward nodes (e.g ~36000*400 distances). |
| 252 | +```{r,echo=TRUE,message=FALSE,warning=FALSE,include=TRUE} |
| 253 | +#Distance matrix (around 10 minutes to compute) |
| 254 | +dists<-get_distance_matrix(graph, |
| 255 | + from=ndcom$id_noeud, |
| 256 | + to=ndcom$id_noeud[ndcom$com %in% maternity$CODGEO], |
| 257 | + allcores=TRUE) |
| 258 | +#We extract each minimum travel time for all the communes |
| 259 | +dists2<-data.frame(node=ndcom$id_noeud,mindist=apply(dists,1,min,na.rm=T)) |
| 260 | +#Joining commune IDs to nodes |
| 261 | +dists2<-left_join(dists2,ndcom[,c("id_noeud","com")],by=c("node"="id_noeud")) |
| 262 | +#Joining minimum travel time to the shapefile |
| 263 | +com<-left_join(com,dists2[,c("com","mindist")],by=c("INSEE_COM"="com")) |
| 264 | +``` |
| 265 | + |
| 266 | +#Plot the map of minimum travel time in Bourgogne-Franche-Comte region |
| 267 | +```{r,echo=TRUE,message=FALSE} |
| 268 | +p<-ggplot()+ |
| 269 | + geom_sf(data=com[com$NOM_REG=="BOURGOGNE-FRANCHE-COMTE",],aes(fill=mindist),colour=NA)+ |
| 270 | + coord_sf(datum=NA)+ |
| 271 | + scale_fill_gradient(low="#009900",high="#003300")+ |
| 272 | + labs(fill="Minutes")+ |
| 273 | + ggtitle("Travel time to the closest maternity ward") |
| 274 | +p |
| 275 | +``` |
| 276 | + |
| 277 | + |
| 278 | +#Benchmark with other R packages |
| 279 | +To show the efficiency of `cppRouting`, we can make some benchmarking with the famous R package `igraph`, and the `dodgr` package which provide highly optimized heaps. |
| 280 | + |
| 281 | +###Distance matrix : one core |
| 282 | +```{r,echo=TRUE,warning=FALSE,message=FALSE} |
| 283 | +library(igraph) |
| 284 | +library(dodgr) |
| 285 | +#Sampling 1000 random origin/destination nodes (1000000 distances to compute) |
| 286 | +origin<-sample(unique(roads$from),1000,replace = F) |
| 287 | +destination<-sample(unique(roads$from),1000,replace = F) |
| 288 | +``` |
| 289 | + |
| 290 | +```{r, echo=TRUE,warning=FALSE} |
| 291 | +#igraph |
| 292 | +graph_igraph<-graph_from_data_frame(roads,directed = TRUE) |
| 293 | +
|
| 294 | +system.time( |
| 295 | + test_igraph<-distances(graph_igraph,origin,to=destination,weights = E(graph_igraph)$weight,mode="out") |
| 296 | +) |
| 297 | +
|
| 298 | +
|
| 299 | +#dodgr |
| 300 | +#Adding coordinates to data |
| 301 | +roads2<-roads |
| 302 | +colnames(roads2)[3]<-"dist" |
| 303 | +roads2<-left_join(roads2,coord,by=c("from"="ID")) |
| 304 | +colnames(roads2)[4:5]<-c("from_lon","from_lat") |
| 305 | +roads2<-left_join(roads2,coord,by=c("to"="ID")) |
| 306 | +colnames(roads2)[6:7]<-c("to_lon","to_lat") |
| 307 | +colnames(roads2)[1:2]<-c("from_id","to_id") |
| 308 | +roads2$from_id<-as.character(roads2$from_id) |
| 309 | +roads2$to_id<-as.character(roads2$to_id) |
| 310 | +
|
| 311 | +system.time( |
| 312 | +test_dodgr<-dodgr_dists(graph=data.frame(roads2),from=origin,to=destination,parallel=FALSE) |
| 313 | +) |
| 314 | +
|
| 315 | +#cppRouting |
| 316 | +system.time( |
| 317 | +test_cpp<-get_distance_matrix(graph,origin,destination,allcores = FALSE) |
| 318 | +) |
| 319 | +``` |
| 320 | +####Ouput |
| 321 | +```{r,echo=TRUE} |
| 322 | +head(cbind(test_igraph[,1],test_dodgr[,1],test_cpp[,1])) |
| 323 | +``` |
| 324 | + |
| 325 | +###Distance matrix : parallel |
| 326 | +```{r,echo=TRUE,warning=FALSE} |
| 327 | +#dodgr |
| 328 | +system.time( |
| 329 | +test_dodgr<-dodgr_dists(graph=data.frame(roads2),from=origin,to=destination,parallel=TRUE) |
| 330 | +) |
| 331 | +
|
| 332 | +#cppRouting |
| 333 | +system.time( |
| 334 | +test_cpp<-get_distance_matrix(graph,origin,destination,allcores = TRUE) |
| 335 | +) |
| 336 | +``` |
| 337 | + |
| 338 | +##Benchmark on computing shortest paths by pairs |
| 339 | +```{r,echo=TRUE,warning=FALSE} |
| 340 | +#Sampling 500 random origin/destination nodes |
| 341 | +origin<-sample(unique(roads$from),500,replace = F) |
| 342 | +destination<-sample(unique(roads$from),500,replace = F) |
| 343 | +#dodgr |
| 344 | +system.time( |
| 345 | +test_dodgr<-dodgr_paths(graph=data.frame(roads2),from=origin,to=destination,pairwise = TRUE) |
| 346 | +) |
| 347 | +
|
| 348 | +#cppRouting |
| 349 | +system.time( |
| 350 | +test_cpp<-get_path_pair(graph,origin,destination,algorithm = "NBA",constant=110/0.06) |
| 351 | +) |
| 352 | +``` |
| 353 | +###Test similarity of the first travel |
| 354 | +```{r,echo=TRUE,warning=FALSE} |
| 355 | +#Number of nodes |
| 356 | +length(test_dodgr[[1]][[1]]) |
| 357 | +length(test_cpp[[1]]) |
| 358 | +
|
| 359 | +#Setdiff |
| 360 | +setdiff(test_dodgr[[1]][[1]],test_cpp[[1]]) |
| 361 | +
|
| 362 | +``` |
| 363 | + |
| 364 | +#New algorithms `cppRouting` will provide in the future |
| 365 | + |
| 366 | +- Detours admitting shortest paths : finding the nodes that are reachable under a fixed detour time around the shortest path |
| 367 | +- Graph simplification by removing irrelevant nodes in order to compute in a faster way the shortest distance or travel time |
| 368 | +- Contraction hierarchies implementation |
0 commit comments