|
| 1 | +# Latency map of the internet |
| 2 | + |
| 3 | +Here we process data from the [RIPE Atlas](https://www.ripe.net/analyse/internet-measurements/ripe-atlas/), which is one of the largest internet measurement networks. The result is a data table of ASN-to-ASN round-trip times (RTT) for pings. Each [Autonomous System](https://en.wikipedia.org/wiki/Autonomous_system_(Internet)) is identified by an ASN (autonomous system number). |
| 4 | + |
| 5 | +These data supply the latencies for a mock-up of a synthetic version of the Cardano `mainnet`, used in simulation studies of Ouroboros protocol design and development. |
| 6 | + |
| 7 | + |
| 8 | +## Data files |
| 9 | + |
| 10 | +- [asn\_rtt\_stat.csv.gz](asn_rtt_stat.csv.gz) ASN-to-ASN latencies |
| 11 | +- [country\_rtt\_stat.csv.gz](country_rtt_stat.csv.gz) country-to-country latencies |
| 12 | +- [intra\_rtt\_stat.csv.gz](intra_rtt_stat.csv.gz) intra-ASN latencies |
| 13 | + |
| 14 | + |
| 15 | +## Data dictionary |
| 16 | + |
| 17 | +Latencies are assumed to be symmetric, so the data files record only the "upper triangle" of the latency matrix: each pair appears once, with `asn1` <= `asn2` (or `cty1` <= `cty2`). |
| 18 | + |
| 19 | +| Field | Units | Description | |
| 20 | +|----------------|----------|--------------------------------------------------------------------| |
| 21 | +| `asn1`, `asn2` | n/a | The autonomous system numbers between which latencies are measured | |
| 22 | +| `cty1`, `cty2` | n/a | The countries between which latencies are measured | |
| 23 | +| `rtt_cnt` | unitless | The number of pings in the raw dataset. | |
| 24 | +| `rtt_avg` | ms | The mean round-trip time (RTT) of pings between the two locations. |
| 25 | +| `rtt_std` | ms | The sample standard deviation of the RTT ping measurements. | |
| 26 | +| `rtt_min` | ms | The minimum of the RTT ping measurements in the sample. |
| 27 | +| `rtt_max` | ms | The maximum of the RTT ping measurements in the sample. | |
| 28 | + |
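As a minimal sketch of how a consumer might read the upper-triangle layout, the shell function below looks up the mean RTT for an ASN pair given in either order. The name `lookup_rtt` is hypothetical, and the column positions assume the `asn1,asn2,rtt_cnt,rtt_avg,rtt_std,rtt_min,rtt_max` order produced by the export step in this document.

```bash
# Hypothetical helper (not part of the original recipe): print rtt_avg for an
# ASN pair, reading the ASN-to-ASN CSV (with its header row) from standard input.
lookup_rtt() {
  a="$1"
  b="$2"
  # The files store each pair once with asn1 <= asn2, so order the pair first.
  if [ "$a" -gt "$b" ]; then
    t="$a"; a="$b"; b="$t"
  fi
  # Assumed column order: asn1,asn2,rtt_cnt,rtt_avg,rtt_std,rtt_min,rtt_max
  awk -F, -v a="$a" -v b="$b" 'NR > 1 && $1 == a && $2 == b { print $4 }'
}
```

One possible invocation is `zcat asn_rtt_stat.csv.gz | lookup_rtt 64500 64496`, where the two ASNs are placeholders; the function prints nothing when the pair is absent from the file.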
| 29 | + |
| 30 | +## Summary statistics |
| 31 | + |
| 32 | + |
| 33 | +### ASN to ASN |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | + |
| 38 | +### Country to country |
| 39 | + |
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +### Intra-ASN |
| 44 | + |
| 45 | +| Mean | Standard deviation | Minimum | Maximum | |
| 46 | +|--------:|-------------------:|--------:|------------:| |
| 47 | +| 80.4 ms | 103.5 ms | 0 ms | 249625.7 ms | |
| 48 | + |
| 49 | + |
| 50 | +## Data processing |
| 51 | + |
| 52 | +This section provides a reproducible recipe for creating or updating the data files. |
| 53 | + |
| 54 | + |
| 55 | +### Schema |
| 56 | + |
| 57 | +Use PostgreSQL to process the data. |
| 58 | + |
| 59 | +First we create a schema for data processing. |
| 60 | + |
| 61 | +```sql |
| 62 | +create schema asn; |
| 63 | +``` |
| 64 | + |
| 65 | + |
| 66 | +### Ping measurements |
| 67 | + |
| 68 | +Download the `ping` measurements for several days from the RIPE Atlas. The origin address, destination address, and RTT of each ping that reports an RTT are extracted from these files; only IPv4 addresses are kept. |
| 69 | + |
| 70 | +```bash |
| 71 | +for d in 2025-03-29 2025-03-30 2025-03-31 2025-04-18 2025-04-19 |
| 72 | +do |
| 73 | + for t in 0000 0100 0200 0300 0400 0500 0600 0700 0800 0900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 |
| 74 | + do |
| 75 | + curl -sS "https://data-store.ripe.net/datasets/atlas-daily-dumps/${d}/ping-${d}T${t}.bz2" \ |
| 76 | + | bunzip2 -c \ |
| 77 | + | jq -r ' |
| 78 | + select(.result) |
| 79 | + | select(.src_addr) |
| 80 | + | select(.dst_addr) |
| 81 | + | select(.src_addr | contains(".")) |
| 82 | + | select(.dst_addr | contains(".")) |
| 83 | + | .src_addr as $src |
| 84 | + | .dst_addr as $dst |
| 85 | + | [ |
| 86 | + .result[] |
| 87 | + | select(.rtt) |
| 88 | + | ($src + "," + $dst + "," + (.rtt | tostring)) |
| 89 | + ] |
| 90 | + | .[] |
| 91 | + ' \ |
| 92 | + | pigz -9c \ |
| 93 | + > "ping-${d}T${t}.gz" |
| 94 | + sleep 10s |
| 95 | + done |
| 96 | +done |
| 97 | +``` |
| 98 | + |
| 99 | +Create a table for the `ping` measurements. |
| 100 | + |
| 101 | +```sql |
| 102 | +create table asn.ping ( |
| 103 | + src inet |
| 104 | +, dst inet |
| 105 | +, rtt real |
| 106 | +); |
| 107 | +``` |
| 108 | + |
| 109 | +Load the data extracts into the table. |
| 110 | + |
| 111 | +```bash |
| 112 | +for f in ping-*T*.gz |
| 113 | +do |
| 114 | + echo "$f" |
| 115 | + zcat "$f" > tmp.csv |
| 116 | + psql -c "\\copy asn.ping from 'tmp.csv' csv" |
| 117 | +done |
| 118 | +rm tmp.csv |
| 119 | +``` |
| 120 | + |
| 121 | +```sql |
| 122 | +select count(*) from asn.ping; |
| 123 | +``` |
| 124 | + |
| 125 | +```console |
| 126 | + count |
| 127 | +------------ |
| 128 | + 2480314130 |
| 129 | +(1 row) |
| 130 | +``` |
| 131 | + |
| 132 | + |
| 133 | +### IP ranges for ASNs |
| 134 | + |
| 135 | +The [iptoasn](https://iptoasn.com) website provides a table of the IP ranges corresponding to each ASN. (Note that the RIPE Atlas's *probe* data file does not contain sufficient information for reconstructing destination ASNs in the `ping` data, so we use this alternative dataset.) |
| 136 | + |
| 137 | +Download the IP to ASN correspondence table. |
| 138 | + |
| 139 | +```bash |
| 140 | +curl -sS -o ip2asn-v4.tsv.gz https://iptoasn.com/data/ip2asn-v4.tsv.gz |
| 141 | +zcat ip2asn-v4.tsv.gz \ |
| 142 | +| gawk ' |
| 143 | +BEGIN { |
| 144 | + FS="\t" |
| 145 | + OFS="," |
| 146 | + print "start", "end", "asn", "country", "label" |
| 147 | +} |
| 148 | +{ |
| 149 | + print $1, $2, $3, $4, "\"" $5 "\"" |
| 150 | +} |
| 151 | +' \ |
| 152 | +> ip2asn.csv |
| 153 | +``` |
| 154 | + |
| 155 | +Load that download into a table. |
| 156 | + |
| 157 | +```sql |
| 158 | +create table asn.ip2asn ( |
| 159 | + ip_start inet |
| 160 | +, ip_end inet |
| 161 | +, asn bigint |
| 162 | +, country varchar(8) |
| 163 | +, organization text |
| 164 | +); |
| 165 | + |
| 166 | +\copy asn.ip2asn from 'ip2asn.csv' csv header |
| 167 | +``` |
| 168 | + |
| 169 | +```console |
| 170 | +INSERT 0 56 |
| 171 | +``` |
| 172 | + |
| 173 | + |
| 174 | +### Known IP addresses |
| 175 | + |
| 176 | +For efficiency we create a table of all of the distinct IP addresses seen in the `ping` data, so that the costly `BETWEEN` range lookup runs once per address rather than once per ping. |
| 177 | + |
| 178 | +```sql |
| 179 | +create table asn.addr as |
| 180 | +select src as ip from asn.ping |
| 181 | +union |
| 182 | +select dst from asn.ping |
| 183 | +; |
| 184 | +``` |
| 185 | + |
| 186 | +```console |
| 187 | +SELECT 73754 |
| 188 | +``` |
| 189 | + |
| 190 | + |
| 191 | +### Correspondence tables for known IP addresses |
| 192 | + |
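| 193 | +Now identify which ASN(s) correspond to each IP address and note the country code. An address that falls within more than one range yields one row per match, and an address that matches no range is dropped by the inner join. |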
| 194 | + |
| 195 | +```sql |
| 196 | +create table asn.ips as |
| 197 | +select |
| 198 | + ip |
| 199 | + , asn |
| 200 | + , country |
| 201 | + from asn.addr |
| 202 | + inner join asn.ip2asn |
| 203 | + on ip between ip_start and ip_end |
| 204 | +; |
| 205 | +``` |
| 206 | + |
| 207 | +```console |
| 208 | +SELECT 82710 |
| 209 | +``` |
| 210 | + |
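The range join above can also be illustrated outside the database. The following is a rough sketch, not the method used to build the data files: a hypothetical shell function `ip_to_asn` that converts dotted-quad IPv4 addresses to integers and scans the `start,end,asn,country,label` rows of `ip2asn.csv` for ranges containing a given address. It is a linear scan, so it is suitable only for spot checks.

```bash
# Hypothetical helper (illustration only): print "asn,country" for every range
# that contains the given IPv4 address. Reads ip2asn.csv (with its header row)
# from standard input.
ip_to_asn() {
  awk -F, -v ip="$1" '
    # Convert a dotted-quad IPv4 address to an integer for range comparison.
    function ip2int(s,    p) {
      split(s, p, ".")
      return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
    }
    BEGIN { target = ip2int(ip) }
    # Same inclusive test as "ip between ip_start and ip_end" in the SQL.
    NR > 1 && ip2int($1) <= target && target <= ip2int($2) { print $3 "," $4 }
  '
}
```

For example, `ip_to_asn 192.0.2.17 < ip2asn.csv` would print one line per matching range, mirroring the way the SQL join can return more than one row for an address.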
| 211 | + |
| 212 | +## RTT statistics |
| 213 | + |
| 214 | +Summarize the `ping` statistics. We tabulate the minimum, maximum, mean, and standard deviation so that we can later sample from a truncated Gaussian distribution when generating synthetic data. |
| 215 | + |
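To make the intended use of these four statistics concrete, here is a minimal sketch of drawing one RTT from a truncated Gaussian by rejection sampling. The function name `sample_rtt`, the Box-Muller transform, the retry limit of 1000, and the optional seed argument are all assumptions made for illustration; the simulation code that consumes these files may sample differently.

```bash
# Hypothetical sampler (illustration only): draw one RTT in milliseconds from a
# Gaussian with the given mean and standard deviation, rejecting draws that
# fall outside [min, max].
# Usage: sample_rtt rtt_avg rtt_std rtt_min rtt_max [seed]
sample_rtt() {
  awk -v mu="$1" -v sigma="$2" -v lo="$3" -v hi="$4" -v seed="${5:-1}" '
    BEGIN {
      srand(seed)
      pi = atan2(0, -1)
      for (i = 0; i < 1000; i++) {
        u1 = 1 - rand()          # in (0, 1], which avoids log(0)
        u2 = rand()
        z = sqrt(-2 * log(u1)) * cos(2 * pi * u2)   # Box-Muller standard normal
        x = mu + sigma * z
        if (x >= lo && x <= hi) { printf "%.3f\n", x; exit 0 }
      }
      exit 1                     # no draw landed inside the bounds
    }'
}
```

A call such as `sample_rtt 80.4 103.5 0 500` uses the mean and standard deviation from the summary table with an arbitrary upper bound. Rows whose `rtt_std` is empty (a single measurement) would need separate handling, for instance by returning `rtt_avg` directly.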
| 216 | + |
| 217 | +### ASNs |
| 218 | + |
| 219 | +```sql |
| 220 | +create table asn.asn_stat as |
| 221 | +select |
| 222 | + least(src.asn, dst.asn) as asn1 |
| 223 | + , greatest(src.asn, dst.asn) as asn2 |
| 224 | + , count(*) as rtt_cnt |
| 225 | + , avg(rtt) as rtt_avg |
| 226 | + , stddev(rtt) as rtt_std |
| 227 | + , min(rtt) as rtt_min |
| 228 | + , max(rtt) as rtt_max |
| 229 | + from asn.ping as ping |
| 230 | + inner join asn.ips as src |
| 231 | + on src.ip = ping.src |
| 232 | + inner join asn.ips as dst |
| 233 | + on dst.ip = ping.dst |
| 234 | + group by 1, 2 |
| 235 | +; |
| 236 | + |
| 237 | +\copy asn.asn_stat to 'asn_rtt_stat.csv' csv header |
| 238 | +``` |
| 239 | + |
| 240 | +```console |
| 241 | +COPY 422372 |
| 242 | +``` |
| 243 | + |
| 244 | +```bash |
| 245 | +gzip -9v asn_rtt_stat.csv |
| 246 | +``` |
| 247 | + |
| 248 | + |
| 249 | +### Countries |
| 250 | + |
| 251 | +```sql |
| 252 | +create table asn.cty_stat as |
| 253 | +select |
| 254 | + least(src.country, dst.country) as cty1 |
| 255 | + , greatest(src.country, dst.country) as cty2 |
| 256 | + , count(*) as rtt_cnt |
| 257 | + , avg(rtt) as rtt_avg |
| 258 | + , stddev(rtt) as rtt_std |
| 259 | + , min(rtt) as rtt_min |
| 260 | + , max(rtt) as rtt_max |
| 261 | + from asn.ping as ping |
| 262 | + inner join asn.ips as src |
| 263 | + on src.ip = ping.src |
| 264 | + inner join asn.ips as dst |
| 265 | + on dst.ip = ping.dst |
| 266 | + group by 1, 2 |
| 267 | +; |
| 268 | + |
| 269 | +\copy asn.cty_stat to 'country_rtt_stat.csv' csv header |
| 270 | +``` |
| 271 | + |
| 272 | +```console |
| 273 | +COPY 6827 |
| 274 | +``` |
| 275 | + |
| 276 | +```bash |
| 277 | +gzip -9v country_rtt_stat.csv |
| 278 | +``` |
| 279 | + |
| 280 | + |
| 281 | +### Intra-ASN |
| 282 | + |
| 283 | +```sql |
| 284 | +create table asn.intra_stat as |
| 285 | +select |
| 286 | + count(*) as rtt_cnt |
| 287 | + , avg(rtt) as rtt_avg |
| 288 | + , stddev(rtt) as rtt_std |
| 289 | + , min(rtt) as rtt_min |
| 290 | + , max(rtt) as rtt_max |
| 291 | + from asn.ping as ping |
| 292 | + inner join asn.ips as src |
| 293 | + on src.ip = ping.src |
| 294 | + inner join asn.ips as dst |
| 295 | + on dst.ip = ping.dst |
| 296 | + where src.asn = dst.asn |
| 297 | +; |
| 298 | + |
| 299 | +\copy asn.intra_stat to 'intra_rtt_stat.csv' csv header |
| 300 | +``` |
| 301 | + |
| 302 | +```console |
| 303 | +COPY 1 |
| 304 | +``` |
| 305 | + |
| 306 | +```bash |
| 307 | +gzip -9v intra_rtt_stat.csv |
| 308 | +``` |
| 309 | + |