Commit 01fc589

Merge #16702: p2p: supplying and using asmap to improve IP bucketing in addrman
3c1bc40 Add extra logging of asmap use and bucketing (Gleb Naumenko)
e4658aa Return mapped AS in RPC call getpeerinfo (Gleb Naumenko)
ec45646 Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko)
8feb4e4 Add asmap utility which queries a mapping (Gleb Naumenko)

Pull request description:

  This PR attempts to solve the problem explained in #16599.
  A particular attack which encouraged us to work on this issue is explained here [[Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/)] (by @muoitranduc)

  Instead of relying on /16 prefix to diversify the connections every node creates, we would instead rely on the (ip -> ASN) mapping, if this mapping is provided.

  A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2MB (awesome!).

  Here I suggest adding a field to peers.dat which would represent a hash of asmap file used while serializing addrman (or 0 for /16 prefix legacy approach).
  In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
  I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.

  TODO:
  - ~~more unit tests~~
  - ~~find a way to test the code without including >1 MB mapping file in the repo.~~
  - find a way to check that mapping file is not corrupted (checksum?)
  - comments and separate tests for asmap.cpp
  - make python code for .map generation public
  - figure out asmap distribution (?)

  ~Interesting corner case: I’m using std::hash to compute a fingerprint of asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of asmap will change? Does it even matter?~

ACKs for top commit:
  laanwj:
    re-ACK 3c1bc40
  jamesob:
    ACK 3c1bc40 ([`jamesob/ackr/16702.3.naumenkogs.p2p_supplying_and_using`](https://github.com/jamesob/bitcoin/tree/ackr/16702.3.naumenkogs.p2p_supplying_and_using))
  jonatack:
    ACK 3c1bc40

Tree-SHA512: e2dc6171188d5cdc2ab2c022fa49ed73a14a0acb8ae4c5ffa970172a0365942a249ad3d57e5fb134bc156a3492662c983f74bd21e78d316629dcadf71576800c
2 parents c434282 + 3c1bc40 commit 01fc589
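
The grouping change described in the pull request can be pictured with a small standalone sketch. This is not the code from the commit: `AsnLookup`, `GroupKey`, and the tag bytes `0xFF` / `0x01` are hypothetical names and values chosen for illustration, and only IPv4 is handled. It assumes that an ASN of 0 means "no mapping found", in which case the legacy /16 prefix is used.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical lookup: returns the ASN for an IPv4 address, or 0 if unmapped.
using AsnLookup = std::function<uint32_t(const std::array<uint8_t, 4>&)>;

// Returns the byte string used to group an address for bucketing.
// Addresses that share a group key compete for the same limited set of buckets.
std::vector<uint8_t> GroupKey(const std::array<uint8_t, 4>& ip, const AsnLookup& lookup)
{
    if (lookup) {
        const uint32_t asn = lookup(ip);
        if (asn != 0) {
            // Assumed tag byte 0xFF marks an ASN-based group (illustrative value).
            return {0xFF, uint8_t(asn), uint8_t(asn >> 8), uint8_t(asn >> 16), uint8_t(asn >> 24)};
        }
    }
    // Legacy behaviour: group by the /16 prefix (first two octets).
    // Assumed tag byte 0x01 marks a prefix-based group (illustrative value).
    return {0x01, ip[0], ip[1]};
}
```

With an empty `AsnLookup`, two addresses in the same /16 get the same key. With a lookup that maps both to the same AS, addresses from unrelated prefixes also share a key, which is the diversification effect the PR is after.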

15 files changed, +631 -93 lines changed

src/Makefile.am

Lines changed: 2 additions & 0 deletions
@@ -210,6 +210,7 @@ BITCOIN_CORE_H = \
   txmempool.h \
   ui_interface.h \
   undo.h \
+  util/asmap.h \
   util/bip32.h \
   util/bytevectorhash.h \
   util/check.h \
@@ -510,6 +511,7 @@ libbitcoin_util_a_SOURCES = \
   support/cleanse.cpp \
   sync.cpp \
   threadinterrupt.cpp \
+  util/asmap.cpp \
   util/bip32.cpp \
   util/bytevectorhash.cpp \
   util/error.cpp \

src/Makefile.test.include

Lines changed: 11 additions & 1 deletion
@@ -82,7 +82,8 @@ JSON_TEST_FILES = \
   test/data/tx_invalid.json \
   test/data/tx_valid.json
 
-RAW_TEST_FILES =
+RAW_TEST_FILES = \
+  test/data/asmap.raw
 
 GENERATED_TEST_FILES = $(JSON_TEST_FILES:.json=.json.h) $(RAW_TEST_FILES:.raw=.raw.h)
 
@@ -635,3 +636,12 @@ endif
 	 echo "};};"; \
 	} > "$@.new" && mv -f "$@.new" "$@"
 	@echo "Generated $@"
+
+%.raw.h: %.raw
+	@$(MKDIR_P) $(@D)
+	@{ \
+	 echo "static unsigned const char $(*F)_raw[] = {" && \
+	 $(HEXDUMP) -v -e '8/1 "0x%02x, "' -e '"\n"' $< | $(SED) -e 's/0x  ,//g' && \
+	 echo "};"; \
+	} > "$@.new" && mv -f "$@.new" "$@"
+	@echo "Generated $@"

src/addrman.cpp

Lines changed: 44 additions & 14 deletions
@@ -7,20 +7,27 @@
 
 #include <hash.h>
 #include <serialize.h>
+#include <logging.h>
 
-int CAddrInfo::GetTriedBucket(const uint256& nKey) const
+int CAddrInfo::GetTriedBucket(const uint256& nKey, const std::vector<bool> &asmap) const
 {
     uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetCheapHash();
-    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetCheapHash();
-    return hash2 % ADDRMAN_TRIED_BUCKET_COUNT;
+    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup(asmap) << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetCheapHash();
+    int tried_bucket = hash2 % ADDRMAN_TRIED_BUCKET_COUNT;
+    uint32_t mapped_as = GetMappedAS(asmap);
+    LogPrint(BCLog::NET, "IP %s mapped to AS%i belongs to tried bucket %i.\n", ToStringIP(), mapped_as, tried_bucket);
+    return tried_bucket;
 }
 
-int CAddrInfo::GetNewBucket(const uint256& nKey, const CNetAddr& src) const
+int CAddrInfo::GetNewBucket(const uint256& nKey, const CNetAddr& src, const std::vector<bool> &asmap) const
 {
-    std::vector<unsigned char> vchSourceGroupKey = src.GetGroup();
-    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetCheapHash();
+    std::vector<unsigned char> vchSourceGroupKey = src.GetGroup(asmap);
+    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup(asmap) << vchSourceGroupKey).GetCheapHash();
     uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetCheapHash();
-    return hash2 % ADDRMAN_NEW_BUCKET_COUNT;
+    int new_bucket = hash2 % ADDRMAN_NEW_BUCKET_COUNT;
+    uint32_t mapped_as = GetMappedAS(asmap);
+    LogPrint(BCLog::NET, "IP %s mapped to AS%i belongs to new bucket %i.\n", ToStringIP(), mapped_as, new_bucket);
+    return new_bucket;
 }
 
 int CAddrInfo::GetBucketPosition(const uint256 &nKey, bool fNew, int nBucket) const
@@ -153,7 +160,7 @@ void CAddrMan::MakeTried(CAddrInfo& info, int nId)
     assert(info.nRefCount == 0);
 
     // which tried bucket to move the entry to
-    int nKBucket = info.GetTriedBucket(nKey);
+    int nKBucket = info.GetTriedBucket(nKey, m_asmap);
     int nKBucketPos = info.GetBucketPosition(nKey, false, nKBucket);
 
     // first make space to add it (the existing tried entry there is moved to new, deleting whatever is there).
@@ -169,7 +176,7 @@ void CAddrMan::MakeTried(CAddrInfo& info, int nId)
         nTried--;
 
         // find which new bucket it belongs to
-        int nUBucket = infoOld.GetNewBucket(nKey);
+        int nUBucket = infoOld.GetNewBucket(nKey, m_asmap);
         int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
         ClearNew(nUBucket, nUBucketPos);
         assert(vvNew[nUBucket][nUBucketPos] == -1);
@@ -233,7 +240,7 @@ void CAddrMan::Good_(const CService& addr, bool test_before_evict, int64_t nTime
         return;
 
     // which tried bucket to move the entry to
-    int tried_bucket = info.GetTriedBucket(nKey);
+    int tried_bucket = info.GetTriedBucket(nKey, m_asmap);
     int tried_bucket_pos = info.GetBucketPosition(nKey, false, tried_bucket);
 
     // Will moving this address into tried evict another entry?
@@ -301,7 +308,7 @@ bool CAddrMan::Add_(const CAddress& addr, const CNetAddr& source, int64_t nTimeP
         fNew = true;
     }
 
-    int nUBucket = pinfo->GetNewBucket(nKey, source);
+    int nUBucket = pinfo->GetNewBucket(nKey, source, m_asmap);
     int nUBucketPos = pinfo->GetBucketPosition(nKey, true, nUBucket);
     if (vvNew[nUBucket][nUBucketPos] != nId) {
         bool fInsert = vvNew[nUBucket][nUBucketPos] == -1;
@@ -439,7 +446,7 @@ int CAddrMan::Check_()
             if (vvTried[n][i] != -1) {
                 if (!setTried.count(vvTried[n][i]))
                     return -11;
-                if (mapInfo[vvTried[n][i]].GetTriedBucket(nKey) != n)
+                if (mapInfo[vvTried[n][i]].GetTriedBucket(nKey, m_asmap) != n)
                     return -17;
                 if (mapInfo[vvTried[n][i]].GetBucketPosition(nKey, false, n) != i)
                     return -18;
@@ -545,7 +552,7 @@ void CAddrMan::ResolveCollisions_()
             CAddrInfo& info_new = mapInfo[id_new];
 
             // Which tried bucket to move the entry to.
-            int tried_bucket = info_new.GetTriedBucket(nKey);
+            int tried_bucket = info_new.GetTriedBucket(nKey, m_asmap);
             int tried_bucket_pos = info_new.GetBucketPosition(nKey, false, tried_bucket);
             if (!info_new.IsValid()) { // id_new may no longer map to a valid address
                 erase_collision = true;
@@ -609,10 +616,33 @@ CAddrInfo CAddrMan::SelectTriedCollision_()
     CAddrInfo& newInfo = mapInfo[id_new];
 
     // which tried bucket to move the entry to
-    int tried_bucket = newInfo.GetTriedBucket(nKey);
+    int tried_bucket = newInfo.GetTriedBucket(nKey, m_asmap);
     int tried_bucket_pos = newInfo.GetBucketPosition(nKey, false, tried_bucket);
 
     int id_old = vvTried[tried_bucket][tried_bucket_pos];
 
     return mapInfo[id_old];
 }
+
+std::vector<bool> CAddrMan::DecodeAsmap(fs::path path)
+{
+    std::vector<bool> bits;
+    FILE *filestr = fsbridge::fopen(path, "rb");
+    CAutoFile file(filestr, SER_DISK, CLIENT_VERSION);
+    if (file.IsNull()) {
+        LogPrintf("Failed to open asmap file from disk.\n");
+        return bits;
+    }
+    fseek(filestr, 0, SEEK_END);
+    int length = ftell(filestr);
+    LogPrintf("Opened asmap file %s (%d bytes) from disk.\n", path, length);
+    fseek(filestr, 0, SEEK_SET);
+    char cur_byte;
+    for (int i = 0; i < length; ++i) {
+        file >> cur_byte;
+        for (int bit = 0; bit < 8; ++bit) {
+            bits.push_back((cur_byte >> bit) & 1);
+        }
+    }
+    return bits;
+}
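
`DecodeAsmap` expands each byte of the file into eight booleans, least significant bit first, and returns an empty vector when the file cannot be opened. The bit order can be checked in isolation with a sketch that uses only the standard library. `BytesToBits` and `DecodeBitsFromFile` are hypothetical names for illustration; the commit itself goes through `fsbridge::fopen` and `CAutoFile` instead of `std::ifstream`.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Expand bytes into bits, least significant bit of each byte first,
// mirroring the inner loop of DecodeAsmap.
std::vector<bool> BytesToBits(const std::vector<uint8_t>& bytes)
{
    std::vector<bool> bits;
    bits.reserve(bytes.size() * 8);
    for (uint8_t byte : bytes) {
        for (int bit = 0; bit < 8; ++bit) {
            bits.push_back((byte >> bit) & 1);
        }
    }
    return bits;
}

// Read a whole binary file and decode it; an unreadable file gives an
// empty vector, matching the "no asmap" convention used by the commit.
std::vector<bool> DecodeBitsFromFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    const std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    return BytesToBits(bytes);
}
```

For the two bytes `0x01, 0x80` this produces sixteen bits in which only index 0 and index 15 are set, so a consumer of the bit vector must read each byte from its low bit upward.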

src/addrman.h

Lines changed: 76 additions & 25 deletions
@@ -12,11 +12,17 @@
 #include <sync.h>
 #include <timedata.h>
 #include <util/system.h>
+#include <clientversion.h>
 
 #include <map>
 #include <set>
 #include <stdint.h>
 #include <vector>
+#include <iostream>
+#include <streams.h>
+#include <fs.h>
+#include <hash.h>
+
 
 /**
  * Extended statistics about a CAddress
@@ -68,15 +74,15 @@ class CAddrInfo : public CAddress
     }
 
     //! Calculate in which "tried" bucket this entry belongs
-    int GetTriedBucket(const uint256 &nKey) const;
+    int GetTriedBucket(const uint256 &nKey, const std::vector<bool> &asmap) const;
 
     //! Calculate in which "new" bucket this entry belongs, given a certain source
-    int GetNewBucket(const uint256 &nKey, const CNetAddr& src) const;
+    int GetNewBucket(const uint256 &nKey, const CNetAddr& src, const std::vector<bool> &asmap) const;
 
     //! Calculate in which "new" bucket this entry belongs, using its default source
-    int GetNewBucket(const uint256 &nKey) const
+    int GetNewBucket(const uint256 &nKey, const std::vector<bool> &asmap) const
     {
-        return GetNewBucket(nKey, source);
+        return GetNewBucket(nKey, source, asmap);
     }
 
     //! Calculate in which position of a bucket to store this entry.
@@ -170,6 +176,7 @@ static const int64_t ADDRMAN_TEST_WINDOW = 40*60; // 40 minutes
  */
 class CAddrMan
 {
+friend class CAddrManTest;
 protected:
     //! critical section to protect the inner data structures
     mutable RecursiveMutex cs;
@@ -264,9 +271,29 @@ class CAddrMan
     void SetServices_(const CService &addr, ServiceFlags nServices) EXCLUSIVE_LOCKS_REQUIRED(cs);
 
 public:
+    // Compressed IP->ASN mapping, loaded from a file when a node starts.
+    // Should be always empty if no file was provided.
+    // This mapping is then used for bucketing nodes in Addrman.
+    //
+    // If asmap is provided, nodes will be bucketed by
+    // AS they belong to, in order to make impossible for a node
+    // to connect to several nodes hosted in a single AS.
+    // This is done in response to Erebus attack, but also to generally
+    // diversify the connections every node creates,
+    // especially useful when a large fraction of nodes
+    // operate under a couple of cloud providers.
+    //
+    // If a new asmap was provided, the existing records
+    // would be re-bucketed accordingly.
+    std::vector<bool> m_asmap;
+
+    // Read asmap from provided binary file
+    static std::vector<bool> DecodeAsmap(fs::path path);
+
+
     /**
      * serialized format:
-     * * version byte (currently 1)
+     * * version byte (1 for pre-asmap files, 2 for files including asmap version)
      * * 0x20 + nKey (serialized as if it were a vector, for backward compatibility)
      * * nNew
      * * nTried
@@ -298,7 +325,7 @@ class CAddrMan
     {
         LOCK(cs);
 
-        unsigned char nVersion = 1;
+        unsigned char nVersion = 2;
         s << nVersion;
         s << ((unsigned char)32);
         s << nKey;
@@ -341,6 +368,13 @@ class CAddrMan
                 }
             }
         }
+        // Store asmap version after bucket entries so that it
+        // can be ignored by older clients for backward compatibility.
+        uint256 asmap_version;
+        if (m_asmap.size() != 0) {
+            asmap_version = SerializeHash(m_asmap);
+        }
+        s << asmap_version;
     }
 
     template<typename Stream>
@@ -349,7 +383,6 @@ class CAddrMan
         LOCK(cs);
 
         Clear();
-
         unsigned char nVersion;
         s >> nVersion;
         unsigned char nKeySize;
@@ -379,16 +412,6 @@ class CAddrMan
             mapAddr[info] = n;
             info.nRandomPos = vRandom.size();
             vRandom.push_back(n);
-            if (nVersion != 1 || nUBuckets != ADDRMAN_NEW_BUCKET_COUNT) {
-                // In case the new table data cannot be used (nVersion unknown, or bucket count wrong),
-                // immediately try to give them a reference based on their primary source address.
-                int nUBucket = info.GetNewBucket(nKey);
-                int nUBucketPos = info.GetBucketPosition(nKey, true, nUBucket);
-                if (vvNew[nUBucket][nUBucketPos] == -1) {
-                    vvNew[nUBucket][nUBucketPos] = n;
-                    info.nRefCount++;
-                }
-            }
         }
         nIdCount = nNew;
 
@@ -397,7 +420,7 @@ class CAddrMan
         for (int n = 0; n < nTried; n++) {
             CAddrInfo info;
             s >> info;
-            int nKBucket = info.GetTriedBucket(nKey);
+            int nKBucket = info.GetTriedBucket(nKey, m_asmap);
             int nKBucketPos = info.GetBucketPosition(nKey, false, nKBucket);
             if (vvTried[nKBucket][nKBucketPos] == -1) {
                 info.nRandomPos = vRandom.size();
@@ -413,20 +436,48 @@ class CAddrMan
         }
         nTried -= nLost;
 
-        // Deserialize positions in the new table (if possible).
+        // Store positions in the new table buckets to apply later (if possible).
+        std::map<int, int> entryToBucket; // Represents which entry belonged to which bucket when serializing
+
         for (int bucket = 0; bucket < nUBuckets; bucket++) {
             int nSize = 0;
             s >> nSize;
             for (int n = 0; n < nSize; n++) {
                 int nIndex = 0;
                 s >> nIndex;
                 if (nIndex >= 0 && nIndex < nNew) {
-                    CAddrInfo &info = mapInfo[nIndex];
-                    int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
-                    if (nVersion == 1 && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 && info.nRefCount < ADDRMAN_NEW_BUCKETS_PER_ADDRESS) {
-                        info.nRefCount++;
-                        vvNew[bucket][nUBucketPos] = nIndex;
-                    }
+                    entryToBucket[nIndex] = bucket;
+                }
+            }
+        }
+
+        uint256 supplied_asmap_version;
+        if (m_asmap.size() != 0) {
+            supplied_asmap_version = SerializeHash(m_asmap);
+        }
+        uint256 serialized_asmap_version;
+        if (nVersion > 1) {
+            s >> serialized_asmap_version;
+        }
+
+        for (int n = 0; n < nNew; n++) {
+            CAddrInfo &info = mapInfo[n];
+            int bucket = entryToBucket[n];
+            int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
+            if (nVersion == 2 && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 &&
+                info.nRefCount < ADDRMAN_NEW_BUCKETS_PER_ADDRESS && serialized_asmap_version == supplied_asmap_version) {
+                // Bucketing has not changed, using existing bucket positions for the new table
+                vvNew[bucket][nUBucketPos] = n;
+                info.nRefCount++;
+            } else {
+                // In case the new table data cannot be used (nVersion unknown, bucket count wrong or new asmap),
+                // try to give them a reference based on their primary source address.
+                LogPrint(BCLog::ADDRMAN, "Bucketing method was updated, re-bucketing addrman entries from disk\n");
+                bucket = info.GetNewBucket(nKey, m_asmap);
+                nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
+                if (vvNew[bucket][nUBucketPos] == -1) {
+                    vvNew[bucket][nUBucketPos] = n;
+                    info.nRefCount++;
                 }
             }
         }
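
The file-level part of the reuse-or-re-bucket decision in the deserializer can be isolated as a small predicate. The sketch below is a simplification under stated assumptions: `SavedTableInfo` and `CanReuseNewBuckets` are hypothetical names, and the asmap fingerprint is modelled as a string rather than a `uint256`. It leaves out the two per-entry conditions from the real code (the target slot being free and `nRefCount` staying below `ADDRMAN_NEW_BUCKETS_PER_ADDRESS`).

```cpp
#include <string>

// Hypothetical summary of what was read from the serialized addrman file.
struct SavedTableInfo {
    int format_version;            // 1 = pre-asmap file, 2 = file with asmap version
    int new_bucket_count;          // number of "new" buckets stored in the file
    std::string asmap_fingerprint; // empty when the file was written without an asmap
};

// Stored bucket positions are only trusted when the format, the table size,
// and the asmap used for grouping all match the running node; otherwise
// every entry has to be re-bucketed from its source address.
bool CanReuseNewBuckets(const SavedTableInfo& saved,
                        int expected_bucket_count,
                        const std::string& current_fingerprint)
{
    return saved.format_version == 2 &&
           saved.new_bucket_count == expected_bucket_count &&
           saved.asmap_fingerprint == current_fingerprint;
}
```

One consequence visible in the diff is that a version 1 file never satisfies the check, so the first load after upgrading re-buckets all "new" entries even when no asmap is supplied.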
