Skip to content

Commit 82811b0

Browse files
authored
Add files via upload
1 parent 00a829a commit 82811b0

File tree

2 files changed

+424
-42
lines changed

2 files changed

+424
-42
lines changed

README.Rmd

Lines changed: 368 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,368 @@
1+
---
2+
title: "cppRouting package"
3+
author: "Vincent LARMET"
4+
date: "15 juin 2019"
5+
output: github_document
6+
---
7+
8+
```{r setup, include=FALSE}
9+
knitr::opts_chunk$set(echo = TRUE)
10+
knitr::opts_knit$set(root.dir = 'D:/users/vlarmet/Dijkstra/exemple/data')
11+
```
12+
13+
#Package presentation
14+
15+
`cppRouting` is an `R` package which provide functions to calculate distances, shortest paths and isochrones/isodistances on non-negative weighted graphs.
16+
For now, `cppRouting` can implement :
17+
18+
- uni-directional Dijkstra algorithm,
19+
- bi-directional Dijkstra algorithm,
20+
- uni-directional A* algorithm
21+
- New bi-directional A* algorithm (Piljs & Post, 2009 : see http://repub.eur.nl/pub/16100/ei2009-10.pdf)
22+
23+
All these functions are written in C++ and use std::priority_queue container from the Standard Template Library.
24+
This package have been made with `Rcpp` and `parallel` packages.
25+
26+
#Install
27+
###Stable version from CRAN
28+
```{r,eval=FALSE}
29+
install.packages("cppRouting")
30+
```
31+
###or from github
32+
```{r,eval=FALSE}
33+
library(devtools)
34+
devtools::install_github("vlarmet/cppRouting")
35+
```
36+
#Data
37+
The data presented here is the official french road network describing over 500000 km of roads.
38+
All data used in this README are free and can be downloaded here :
39+
40+
- roads : http://professionnels.ign.fr/route500
41+
- general practitioners location : https://www.insee.fr/fr/statistiques/3568614?sommaire=3568656#consulter
42+
- maternity wards location : https://www.insee.fr/fr/statistiques/3568611?sommaire=3568656#dictionnaire
43+
- shapefile of the ~36000 communes in France : http://professionnels.ign.fr/adminexpress
44+
45+
Graph data have been preprocessed for more readability (see data_preparation.R).
46+
47+
The final graph is composed of 234615 nodes and 685118 edges.
48+
Data has to be a 3 columns data.frame or matrix containing from, to and a cost/distance column. Here the cost is the time needed to travel in each edges (in minutes). From and to are vertices IDs (character or numeric).
49+
50+
#Main functions
51+
52+
`cppRouting` package provide these functions :
53+
54+
- `get_distance_matrix` : compute distance matrix (between all combinations origin-destination nodes - *one-to-many*),
55+
- `get_distance_pair` : compute distances between origin and destination by pair (*one-to-one*),
56+
- `get_path_pair` : compute shortest paths between origin and destination by pair (*one-to-one*),
57+
- `get_multi_paths` : compute shortest paths between all origin nodes and all destination nodes (*one-to-many*),
58+
- `get_isochrone` : compute isochrones/isodistances with one or multiple breaks.
59+
60+
The choice between all the algorithms is available for *one-to-one* calculation like `get_distance_pair` and `get_path_pair`.
61+
In these functions, uni-directional Dijkstra algorithm is stopped when the destination node is reached.
62+
`A*` and `NBA*` are relevant if geographic coordinates of all nodes are provided. Note that coordinates should be expressed in a **projection system**.
63+
To be accurate and efficient, `A*` and `NBA*` algorithms should use an admissible heuristic function (here the Euclidean distance), e.g cost and heuristic function must be expressed in the same unit.
64+
In `cppRouting`, heuristic function `h` is defined such that : h(xi,yi,xdestination,ydestination)/k, with a constant k; so in the case where coordinates are expressed in meters and cost is expressed in time, k is the maximum speed allowed on the road. By default, constant is 1 and is designed for graphs with cost expressed in the same unit than coordinates (for example in meters).
65+
66+
If coordinates cannot be provided, bi-directional Dijkstra algorithm can offer a good alternative to A* in terms of performance.
67+
68+
##Examples
69+
###Prepare data
70+
```{r pressure, echo=TRUE,message=FALSE}
71+
library(cppRouting)
72+
library(dplyr)
73+
library(sf)
74+
library(ggplot2)
75+
library(concaveman)
76+
library(ggmap)
77+
78+
#Reading french road data
79+
roads<-read.csv("roads.csv",colClasses = c("character","character","numeric"))
80+
#Shapefile data of communes
81+
com<-read_sf("com_simplified_geom.shp")
82+
#Correspondance file between communes and nodes in the graph
83+
ndcom<-read.csv("node_commune.csv",colClasses = c("character","character","numeric"))
84+
#General practitioners locations
85+
med<-read.csv("doctor.csv",colClasses = c("character","numeric","character","numeric"))
86+
#Import nodes coordinates (projected in EPSG : 2154)
87+
coord<-read.csv("coordinates.csv",colClasses = c("character","numeric","numeric"))
88+
```
89+
90+
####Head of road network data
91+
```{r , echo=TRUE,message=FALSE}
92+
head(roads)
93+
```
94+
95+
####Head of coordinates data
96+
```{r , echo=TRUE,message=FALSE}
97+
head(coord)
98+
```
99+
100+
101+
###Instantiate the graph
102+
```{r,echo=TRUE}
103+
#Instantiate a graph with coordinates
104+
graph<-makegraph(roads,directed = T,coords = coord)
105+
```
106+
###Distances by pairs between nodes (*one-to-one*)
107+
####Using uni-directional Dijkstra algorithm
108+
```{r,echo=TRUE}
109+
#Generate 2000 random origin and destination nodes
110+
origin<-sample(roads$from,2000)
111+
destination<-sample(roads$from,2000)
112+
113+
#Uni-directional : single core
114+
system.time(
115+
pair_dijkstra<-get_distance_pair(graph,origin,destination)
116+
)
117+
118+
```
119+
####Using bi-directional Dijkstra algorithm
120+
```{r,}
121+
#Bi-directional : single core
122+
system.time(
123+
pair_bidijkstra<-get_distance_pair(graph,origin,destination,algorithm = "bi")
124+
)
125+
126+
```
127+
####Using A* algorithm
128+
Coordinates are defined in meters and max speed is 110km/h; so for the heuristic function to be admissible, the constant equal 110/0.06 :
129+
```{r,echo=TRUE}
130+
#A* single node
131+
system.time(
132+
pair_astar<-get_distance_pair(graph,origin,destination,algorithm = "A*",constant = 110/0.06)
133+
)
134+
135+
```
136+
####Using NBA* algorithm
137+
```{r,echo=TRUE}
138+
#NBA* single node
139+
system.time(
140+
pair_nba<-get_distance_pair(graph,origin,destination,algorithm = "NBA",constant = 110/0.06)
141+
)
142+
143+
```
144+
####Output
145+
```{r,echo=TRUE}
146+
head(cbind(pair_dijkstra,pair_bidijkstra,pair_astar,pair_nba))
147+
148+
```
149+
#####In `get_distance_pair` function, all the algorithms can be ran in parallel by setting TRUE to allcores argument.
150+
151+
152+
###Compute isochrones
153+
Let's compute isochrones around Dijon city
154+
```{r,echo=TRUE,message=FALSE,warning=FALSE}
155+
#Compute isochrones
156+
iso<-get_isochrone(graph,from = "205793",lim = c(15,25,45,60,90,120))
157+
#Convert nodes to concave polygons with concaveman package
158+
poly<-lapply(iso[[1]],function(x){
159+
x<-data.frame(noeuds=x,stringsAsFactors = F)
160+
x<-left_join(x,coord,by=c("noeuds"="ID"))
161+
return(concaveman(summarise(st_as_sf(x,coords=c("X","Y"),crs=2154))))
162+
})
163+
164+
poly<-do.call(rbind,poly)
165+
poly$time<-as.factor(names(iso[[1]]))
166+
#Multipolygon
167+
poly2<-st_cast(poly,"MULTIPOLYGON")
168+
poly2$time<-reorder(poly2$time,c(120,90,60,45,25,15))
169+
#Reproject for plotting
170+
poly2<-st_transform(poly2,"+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
171+
#Import map backgroung
172+
dijon=get_map(location=c(lon=5.041140,lat=47.323025),zoom=7, source="google",maptype = "toner-2010")
173+
#Plot the map
174+
p<-ggmap(dijon)+
175+
geom_sf(data=poly2,aes(fill=time),alpha=.8,inherit.aes = FALSE)+
176+
scale_fill_brewer(palette = "YlOrRd")+
177+
labs(fill="Minutes")+
178+
ggtitle("Isochrones around Dijon")+
179+
theme(axis.text.x = element_blank(),
180+
axis.text.y = element_blank(),
181+
axis.ticks = element_blank(),
182+
axis.title.y=element_blank(),axis.title.x=element_blank())
183+
p
184+
185+
```
186+
187+
188+
#Applications
189+
## Application 1 : Calculate Two Step Floating Catchment Areas (2SFCA) of general practitioners in France
190+
191+
2SFCA method is explained here : https://en.wikipedia.org/wiki/Two-step_floating_catchment_area_method
192+
193+
194+
###First step
195+
Isochrones are calculated with the `cppRouting` function `get_isochrone`
196+
```{r,echo=TRUE}
197+
#Isochrone around doctor locations with time limit of 15 minutes
198+
iso<-get_isochrone(graph,from = ndcom[ndcom$com %in% med$CODGEO,"id_noeud"],lim = 15)
199+
#Convert list to long data frame
200+
df<-stack(setNames(iso, seq_along(iso)))
201+
df$ind<-rep(names(iso),times=sapply(iso,length))
202+
df<-df[df$values %in% ndcom$id_noeud,]
203+
#Joining and summing population located in each isochrone
204+
df<-left_join(df,ndcom[,c("id_noeud","POPULATION")],by=c("values"="id_noeud"))
205+
df<-df %>% group_by(ind) %>%
206+
summarise(pop=sum(POPULATION))
207+
#Joining number of doctors
208+
df<-left_join(df,med[,c("id_noeud","NB_D201")],by=c("ind"="id_noeud"))
209+
#Calculate ratios
210+
df$ratio<-df$NB_D201/df$pop
211+
```
212+
213+
###Second step
214+
```{r,echo=TRUE,message=FALSE}
215+
#Isochrone around each commune with time limit of 15 minutes (few seconds to compute)
216+
iso2<-get_isochrone(graph,from=ndcom$id_noeud,lim = 15)
217+
#Convert list to long data frame
218+
df2<-stack(setNames(iso2, seq_along(iso2)))
219+
df2$ind<-rep(names(iso2),times=sapply(iso2,length))
220+
#Joining and summing ratios calculated in first step
221+
df2<-left_join(df2,df[,c("ind","ratio")],by=c("values"="ind"))
222+
df2<-df2 %>% group_by(ind) %>%
223+
summarise(sfca=sum(ratio,na.rm=T))
224+
```
225+
226+
###Plot the map for Bourgogne-Franche-Comte region
227+
```{r,echo=TRUE,message=FALSE}
228+
#Joining commune IDs to nodes
229+
df2<-left_join(df2,ndcom[,c("id_noeud","com")],by=c("ind"="id_noeud"))
230+
#Joining 2SFCA to shapefile
231+
com<-left_join(com,df2[,c("com","sfca")],by=c("INSEE_COM"="com"))
232+
#Plot for one region
233+
p<-ggplot()+
234+
geom_sf(data=com[com$NOM_REG=="BOURGOGNE-FRANCHE-COMTE",],aes(fill=sfca),colour=NA)+
235+
coord_sf(datum=NA)+
236+
scale_fill_gradient(low="#BB2528",high = "#FFFF66")+
237+
labs(fill="2SFCA")+
238+
ggtitle("2SFCA applied to general practitioners")
239+
p
240+
```
241+
242+
##Application 2 : Calculate the minimum travel time to the closest maternity ward in France
243+
```{r,echo=TRUE,message=FALSE}
244+
#Import materinty ward locations
245+
maternity<-read.csv("maternity.csv",colClasses = c("character","numeric"))
246+
```
247+
248+
###Shortest travel time matrix
249+
The shortest travel time is computed with the `cppRouting` function `get_distance_matrix`.
250+
In order to compute multiple distances from one source, original uni-directional Dijkstra algorithm is ran without early stopping.
251+
We compute travel time from all commune nodes to all maternity ward nodes (e.g ~36000*400 distances).
252+
```{r,echo=TRUE,message=FALSE,warning=FALSE,include=TRUE}
253+
#Distance matrix (around 10 minutes to compute)
254+
dists<-get_distance_matrix(graph,
255+
from=ndcom$id_noeud,
256+
to=ndcom$id_noeud[ndcom$com %in% maternity$CODGEO],
257+
allcores=TRUE)
258+
#We extract each minimum travel time for all the communes
259+
dists2<-data.frame(node=ndcom$id_noeud,mindist=apply(dists,1,min,na.rm=T))
260+
#Joining commune IDs to nodes
261+
dists2<-left_join(dists2,ndcom[,c("id_noeud","com")],by=c("node"="id_noeud"))
262+
#Joining minimum travel time to the shapefile
263+
com<-left_join(com,dists2[,c("com","mindist")],by=c("INSEE_COM"="com"))
264+
```
265+
266+
#Plot the map of minimum travel time in Bourgogne-Franche-Comte region
267+
```{r,echo=TRUE,message=FALSE}
268+
p<-ggplot()+
269+
geom_sf(data=com[com$NOM_REG=="BOURGOGNE-FRANCHE-COMTE",],aes(fill=mindist),colour=NA)+
270+
coord_sf(datum=NA)+
271+
scale_fill_gradient(low="#009900",high="#003300")+
272+
labs(fill="Minutes")+
273+
ggtitle("Travel time to the closest maternity ward")
274+
p
275+
```
276+
277+
278+
#Benchmark with other R packages
279+
To show the efficiency of `cppRouting`, we can make some benchmarking with the famous R package `igraph`, and the `dodgr` package which provide highly optimized heaps.
280+
281+
###Distance matrix : one core
282+
```{r,echo=TRUE,warning=FALSE,message=FALSE}
283+
library(igraph)
284+
library(dodgr)
285+
#Sampling 1000 random origin/destination nodes (1000000 distances to compute)
286+
origin<-sample(unique(roads$from),1000,replace = F)
287+
destination<-sample(unique(roads$from),1000,replace = F)
288+
```
289+
290+
```{r, echo=TRUE,warning=FALSE}
291+
#igraph
292+
graph_igraph<-graph_from_data_frame(roads,directed = TRUE)
293+
294+
system.time(
295+
test_igraph<-distances(graph_igraph,origin,to=destination,weights = E(graph_igraph)$weight,mode="out")
296+
)
297+
298+
299+
#dodgr
300+
#Adding coordinates to data
301+
roads2<-roads
302+
colnames(roads2)[3]<-"dist"
303+
roads2<-left_join(roads2,coord,by=c("from"="ID"))
304+
colnames(roads2)[4:5]<-c("from_lon","from_lat")
305+
roads2<-left_join(roads2,coord,by=c("to"="ID"))
306+
colnames(roads2)[6:7]<-c("to_lon","to_lat")
307+
colnames(roads2)[1:2]<-c("from_id","to_id")
308+
roads2$from_id<-as.character(roads2$from_id)
309+
roads2$to_id<-as.character(roads2$to_id)
310+
311+
system.time(
312+
test_dodgr<-dodgr_dists(graph=data.frame(roads2),from=origin,to=destination,parallel=FALSE)
313+
)
314+
315+
#cppRouting
316+
system.time(
317+
test_cpp<-get_distance_matrix(graph,origin,destination,allcores = FALSE)
318+
)
319+
```
320+
####Ouput
321+
```{r,echo=TRUE}
322+
head(cbind(test_igraph[,1],test_dodgr[,1],test_cpp[,1]))
323+
```
324+
325+
###Distance matrix : parallel
326+
```{r,echo=TRUE,warning=FALSE}
327+
#dodgr
328+
system.time(
329+
test_dodgr<-dodgr_dists(graph=data.frame(roads2),from=origin,to=destination,parallel=TRUE)
330+
)
331+
332+
#cppRouting
333+
system.time(
334+
test_cpp<-get_distance_matrix(graph,origin,destination,allcores = TRUE)
335+
)
336+
```
337+
338+
##Benchmark on computing shortest paths by pairs
339+
```{r,echo=TRUE,warning=FALSE}
340+
#Sampling 500 random origin/destination nodes
341+
origin<-sample(unique(roads$from),500,replace = F)
342+
destination<-sample(unique(roads$from),500,replace = F)
343+
#dodgr
344+
system.time(
345+
test_dodgr<-dodgr_paths(graph=data.frame(roads2),from=origin,to=destination,pairwise = TRUE)
346+
)
347+
348+
#cppRouting
349+
system.time(
350+
test_cpp<-get_path_pair(graph,origin,destination,algorithm = "NBA",constant=110/0.06)
351+
)
352+
```
353+
###Test similarity of the first travel
354+
```{r,echo=TRUE,warning=FALSE}
355+
#Number of nodes
356+
length(test_dodgr[[1]][[1]])
357+
length(test_cpp[[1]])
358+
359+
#Setdiff
360+
setdiff(test_dodgr[[1]][[1]],test_cpp[[1]])
361+
362+
```
363+
364+
#New algorithms `cppRouting` will provide in the future
365+
366+
- Detours admitting shortest paths : finding the nodes that are reachable under a fixed detour time around the shortest path
367+
- Graph simplification by removing irrelevant nodes in order to compute in a faster way the shortest distance or travel time
368+
- Contraction hierarchies implementation

0 commit comments

Comments
 (0)