Commit c3c49ed

Merge pull request #58 from Blockchain-Technology-Lab/docs
Add initial docs
2 parents a534b16 + 3ea6ef7 commit c3c49ed

8 files changed: +381 -2 lines changed

.github/workflows/docs.yml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
name: Docs
on:
  push:
    branches: [ main ]
    paths: [ docs/** ]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v2
      - run: pip install --upgrade pip && pip install mkdocs mkdocs-gen-files
      - run: git config user.name 'github-actions[bot]' && git config user.email 'github-actions[bot]@users.noreply.github.com'
      - name: Publish docs
        run: mkdocs gh-deploy

data_collection_scripts/big_query_balance_data.py

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@

 Attention! Before running this script, you need to generate service account credentials from Google, as described
 here (https://developers.google.com/workspace/guides/create-credentials#service-account) and save your key in the
-root directory of the project under the name 'google-service-account-key-0.json'. Any additional keys should be
-named 'google-service-account-key-1.json', 'google-service-account-key-2.json', etc.
+data_collection_scripts directory of the project under the name 'google-service-account-key-0.json'. Any additional
+keys should be named 'google-service-account-key-1.json', 'google-service-account-key-2.json', etc.
 """
 import google.cloud.bigquery as bq
 import csv

docs/contribute.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# How to contribute

You can contribute to the tool by adding support for a ledger, updating the
mapping process for an existing ledger, or adding a new metric. In all cases,
the information should be submitted via a GitHub PR.

...

docs/data.md

Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
# Data collection

Currently, the data for the analysis of the different ledgers is collected through
[Google BigQuery](https://console.cloud.google.com/bigquery).


## Queries

One can retrieve the data directly from BigQuery using the queries below:

### Bitcoin

```
WITH double_entry_book AS (
    SELECT array_to_string(inputs.addresses, ",") as address, inputs.type, -inputs.value as value
    FROM `bigquery-public-data.crypto_bitcoin.inputs` as inputs
    WHERE block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT array_to_string(outputs.addresses, ",") as address, outputs.type, outputs.value as value
    FROM `bigquery-public-data.crypto_bitcoin.outputs` as outputs
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, type, sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
HAVING balance > 0
ORDER BY balance DESC
```
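
The double-entry pattern used by this query (and by the other queries of the same shape) can be illustrated outside BigQuery. The following is a minimal Python sketch and not part of the project: the function name `balances_from_entries` and the sample rows are hypothetical. Inputs are booked as negative values, outputs as positive values, and the balance of each (address, type) pair is the sum of its entries.

```python
from collections import defaultdict


def balances_from_entries(inputs, outputs):
    """Return {(address, type): balance} for positive balances, largest first.

    `inputs` and `outputs` are iterables of (address, type, value) tuples,
    mirroring the columns selected in the SQL query above.
    """
    book = defaultdict(int)
    for address, addr_type, value in inputs:
        book[(address, addr_type)] -= value  # spent value leaves the address
    for address, addr_type, value in outputs:
        book[(address, addr_type)] += value  # received value enters the address
    # Equivalent of HAVING balance > 0 followed by ORDER BY balance DESC.
    positive = {key: bal for key, bal in book.items() if bal > 0}
    return dict(sorted(positive.items(), key=lambda item: item[1], reverse=True))


# Hypothetical sample data, for illustration only.
sample_outputs = [
    ("addr_a", "pubkeyhash", 50),
    ("addr_b", "pubkeyhash", 30),
    ("addr_a", "pubkeyhash", 20),
]
sample_inputs = [
    ("addr_b", "pubkeyhash", 30),
    ("addr_a", "pubkeyhash", 10),
]
result = balances_from_entries(sample_inputs, sample_outputs)
```

In this sample, `addr_b` has spent everything it received, so it is filtered out in the same way the `HAVING` clause removes it from the query result.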

### Bitcoin Cash

```
WITH double_entry_book AS (
    SELECT array_to_string(inputs.addresses, ",") as address, inputs.type, -inputs.value as value
    FROM `bigquery-public-data.crypto_bitcoin_cash.inputs` as inputs
    WHERE block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT array_to_string(outputs.addresses, ",") as address, outputs.type, outputs.value as value
    FROM `bigquery-public-data.crypto_bitcoin_cash.outputs` as outputs
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, type, sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
HAVING balance > 0
ORDER BY balance DESC
```

### Cardano

```
SELECT *
FROM
(
    WITH blocks AS (
        SELECT
            slot_no AS block_number,
            block_time
        FROM `iog-data-analytics.cardano_mainnet.block`
        WHERE block_time < "{{timestamp}}"
    ),
    OUTPUTS AS (
        SELECT
            slot_no as output_slot_number,
            CAST(JSON_VALUE(a, '$.out_address') AS STRING) AS address,
            CAST(JSON_VALUE(a, '$.out_idx') AS INT64) as out_idx,
            CAST(JSON_VALUE(a, '$.out_value') AS INT64 ) AS value
        FROM `iog-data-analytics.cardano_mainnet.vw_tx_in_out_with_inputs_value`
        JOIN blocks ON block_number = slot_no
        JOIN UNNEST(JSON_QUERY_ARRAY(outputs)) AS a
    ),
    INPUTS AS (
        SELECT
            address,
            CAST(JSON_VALUE(i, '$.out_value') AS INT64 ) AS value
        FROM `iog-data-analytics.cardano_mainnet.vw_tx_in_out_with_inputs_value`
        JOIN OUTPUTS ON slot_no = output_slot_number
        JOIN UNNEST(JSON_QUERY_ARRAY(inputs)) AS i ON CAST(JSON_VALUE(i, '$.in_idx') AS INT64) = OUTPUTS.out_idx
    ),
    INCOMING AS (
        SELECT address, SUM(CAST(value AS numeric)) as sum_incoming
        FROM INPUTS
        GROUP BY address
    ),
    OUTGOING AS (
        SELECT address, SUM(CAST(value AS numeric)) as sum_outgoing
        FROM OUTPUTS
        GROUP BY address
    )
    SELECT i.address, i.sum_incoming - o.sum_outgoing AS balance
    FROM INCOMING AS i
    JOIN OUTGOING AS o ON i.address = o.address
)
WHERE balance > 0
ORDER BY balance DESC
```

### Dogecoin

```
WITH double_entry_book AS (
    SELECT array_to_string(inputs.addresses, ",") as address, inputs.type, -inputs.value as value
    FROM `bigquery-public-data.crypto_dogecoin.inputs` as inputs
    WHERE block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT array_to_string(outputs.addresses, ",") as address, outputs.type, outputs.value as value
    FROM `bigquery-public-data.crypto_dogecoin.outputs` as outputs
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, type, sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
HAVING balance > 0
ORDER BY balance DESC
```

### Ethereum

```
WITH double_entry_book AS (
    SELECT to_address as address, value AS value
    FROM `bigquery-public-data.crypto_ethereum.traces`
    WHERE to_address IS NOT null
    AND status = 1
    AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS null)
    AND block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT from_address as address, -value AS value
    FROM `bigquery-public-data.crypto_ethereum.traces`
    WHERE from_address IS NOT null
    AND status = 1
    AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS null)
    AND block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT miner AS address, sum(cast(receipt_gas_used as numeric) * cast(gas_price as numeric)) AS value
    FROM `bigquery-public-data.crypto_ethereum.transactions` AS transactions
    JOIN `bigquery-public-data.crypto_ethereum.blocks` AS blocks on blocks.number = transactions.block_number
    WHERE transactions.block_timestamp < "{{timestamp}}"
    GROUP BY blocks.miner
    UNION ALL
    SELECT from_address AS address, -(cast(receipt_gas_used as numeric) * cast(gas_price as numeric)) AS value
    FROM `bigquery-public-data.crypto_ethereum.transactions`
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, sum(value) AS balance
FROM double_entry_book
GROUP BY address
HAVING balance > 0
ORDER BY balance DESC
```

### Litecoin

```
WITH double_entry_book AS (
    SELECT array_to_string(inputs.addresses, ",") as address, inputs.type, -inputs.value as value
    FROM `bigquery-public-data.crypto_litecoin.inputs` as inputs
    WHERE block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT array_to_string(outputs.addresses, ",") as address, outputs.type, outputs.value as value
    FROM `bigquery-public-data.crypto_litecoin.outputs` as outputs
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, type, sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
HAVING balance > 0
ORDER BY balance DESC
```

### Tezos

```
WITH double_entry_book as (
    SELECT IF(kind = 'contract', contract, delegate) AS address, change AS value
    FROM `public-data-finance.crypto_tezos.balance_updates`
    WHERE (status IS NULL OR status = 'applied') AND (timestamp < "{{timestamp}}")
    UNION ALL
    SELECT address, balance_change
    FROM `public-data-finance.crypto_tezos.migrations`
    WHERE timestamp < "{{timestamp}}"
)
SELECT address, SUM(value) AS balance
FROM double_entry_book
GROUP BY address
HAVING balance > 0
ORDER BY balance DESC
```

### Zcash

```
WITH double_entry_book AS (
    SELECT array_to_string(inputs.addresses, ",") as address, inputs.type, -inputs.value as value
    FROM `bigquery-public-data.crypto_zcash.inputs` as inputs
    WHERE block_timestamp < "{{timestamp}}"
    UNION ALL
    SELECT array_to_string(outputs.addresses, ",") as address, outputs.type, outputs.value as value
    FROM `bigquery-public-data.crypto_zcash.outputs` as outputs
    WHERE block_timestamp < "{{timestamp}}"
)
SELECT address, type, sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
HAVING balance > 0
ORDER BY balance DESC
```
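
Each query above contains a `{{timestamp}}` placeholder that has to be replaced with a concrete snapshot date before the query can run. A minimal Python sketch of that substitution follows; the helper name `fill_timestamp` and the shortened sample query are hypothetical and are not taken from the project's script.

```python
from datetime import date


def fill_timestamp(query_template: str, snapshot: date) -> str:
    """Replace every {{timestamp}} placeholder with an ISO-formatted date."""
    return query_template.replace("{{timestamp}}", snapshot.isoformat())


# Shortened, hypothetical query used only to demonstrate the substitution.
template = 'SELECT address FROM some_table WHERE block_timestamp < "{{timestamp}}"'
query = fill_timestamp(template, date(2023, 1, 1))
```

The resulting string could then be pasted into the BigQuery console or, assuming the `google-cloud-bigquery` package and valid credentials, submitted with `google.cloud.bigquery.Client().query(query)`.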

## Automating the data collection process

Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is
also possible to automate the process using a
[script](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/data_collection_scripts/big_query_balance_data.py)
and collect all relevant data in one go. Executing this script will run queries
from [this file](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/data_collection_scripts/queries.yaml).

IMPORTANT: the script uses service account credentials for authentication. Before running it, you need to
generate the relevant credentials from Google, as described
[here](https://developers.google.com/workspace/guides/create-credentials#service-account), and save your key in the
`data_collection_scripts/` directory of the project under the name `google-service-account-key-0.json`. Any additional
keys should be named `google-service-account-key-1.json`, `google-service-account-key-2.json`, and so on.
There is a
[sample file](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/data_collection_scripts/google-service-account-key-SAMPLE.json)
that you can consult, which shows what your credentials are supposed to look like (note that it is for
informational purposes only; this file is not used in the code).
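
To make the naming scheme concrete, the sketch below lists key files in index order. It is an illustration only: the helper `find_key_files` and its regular expression are hypothetical and do not describe how the project's script actually loads its keys.

```python
import re
from pathlib import Path

# Matches google-service-account-key-<number>.json; the SAMPLE file does not match.
KEY_PATTERN = re.compile(r"google-service-account-key-(\d+)\.json")


def find_key_files(directory):
    """Return key file paths in `directory`, sorted by their numeric index."""
    indexed = []
    for path in Path(directory).iterdir():
        match = KEY_PATTERN.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort numerically so that key-10 comes after key-2.
    return [path for _, path in sorted(indexed)]
```

Calling `find_key_files("data_collection_scripts")` from the root directory would return the numbered key files and skip `google-service-account-key-SAMPLE.json`.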

Once you have set up the credentials, you can run the following command from the root
directory to retrieve data for all supported blockchains:

`python -m data_collection_scripts.big_query_balance_data`

There are also three command line arguments that can be used to customize the data collection process:

- `ledgers` accepts any number of the supported ledgers (case-insensitive). For example, adding `--ledgers bitcoin`
  results in collecting data only for Bitcoin, while `--ledgers Bitcoin Ethereum` would collect data for
  Bitcoin and Ethereum. If the `ledgers` argument is omitted, then the default value is used, which
  is taken from the
  [configuration file](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/config.yaml)
  and typically corresponds to all supported blockchains.
- `snapshot_dates` accepts any number of dates formatted as YYYY-MM-DD, YYYY-MM, or YYYY. Data is then collected for
  the specified date(s). Again, if this argument is omitted, the default value is taken from the
  [configuration file](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/config.yaml).
- `--force-query` forces the collection of all raw data files, even if some or all of the files already
  exist. By default, this flag is set to False and the script only fetches data for a blockchain if the
  corresponding file does not already exist.
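
The behaviour of these three arguments can be sketched with Python's `argparse`. This is a hedged illustration, not the project's parser: the flag spelling `--snapshot_dates`, the helper names `valid_date` and `build_parser`, and the `None` defaults (standing in for values read from the configuration file) are all assumptions.

```python
import argparse
from datetime import datetime


def valid_date(text):
    """Accept dates formatted as YYYY-MM-DD, YYYY-MM, or YYYY."""
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            datetime.strptime(text, fmt)
            return text
        except ValueError:
            continue
    raise argparse.ArgumentTypeError(f"invalid date: {text!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Collect balance data (sketch)")
    # Lower-casing each value makes the ledger names case-insensitive.
    parser.add_argument("--ledgers", nargs="*", type=str.lower, default=None)
    # Assumed flag spelling; None stands in for the configuration default.
    parser.add_argument("--snapshot_dates", nargs="*", type=valid_date, default=None)
    # A boolean flag that is False unless it is passed explicitly.
    parser.add_argument("--force-query", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--ledgers", "Bitcoin", "Ethereum", "--snapshot_dates", "2023", "2023-06"]
)
```

With this sketch, omitting `--force-query` leaves `args.force_query` as `False`, which corresponds to the default described above of skipping files that already exist.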

docs/index.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Tokenomics Blockchain Decentralization - Documentation

This is the documentation for the Tokenomics Decentralization Analysis tool developed by the University of Edinburgh's
Blockchain Technology Lab. The tool analyzes the token distribution of various blockchains and measures their
levels of decentralization.

The relevant source code is available on [GitHub](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization).

## Overview

Currently, the supported ledgers are:

- Bitcoin
- Bitcoin Cash
- Dogecoin
- Ethereum
- Litecoin
- Tezos

We intend to add more ledgers to this list in the future.

## Contributing

This is an open source project licensed under the terms and conditions of the
[MIT license](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/LICENSE) and
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
Everyone is welcome to contribute to it by proposing or implementing their
ideas. Example contributions include, but are not limited to, reporting
potential bugs, supplying useful information for the clustering of supported
ledgers, adding support for a new ledger, or making the code more efficient.
All contributions to the project will also be covered by the above-mentioned
licenses.

When making changes in the code, contributors are required to fork the project's repository first and then issue a pull
request with their changes. Each PR will be reviewed before being merged to the main branch. Bugs can be reported
on the [Issues](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/issues) page.
Other comments and ideas can be brought up in the project's
[Discussions](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/discussions/).

For more information on how to make specific contributions, see [How to Contribute](contribute.md).

docs/metrics.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Metrics

The metrics that have been implemented so far are the following:

1. **Nakamoto coefficient**: The Nakamoto coefficient represents the minimum number of entities that
   collectively produce more than 50% of the total blocks within a given timeframe. The output of the metric is an
   integer.
2. **Gini coefficient**: The Gini coefficient represents the degree of inequality in block production. The
   output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in
   the system produce the same number of blocks) and values close to 1 indicate inequality (one entity
   produces most or all blocks).
3. **Entropy**: Entropy represents the expected amount of information in the distribution of blocks across entities.
   The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization
   (lower predictability). Entropy is parameterized by a base rate α, which defines different types of entropy:
   - α = -1: min entropy
   - α = 0: Hartley entropy
   - α = 1: Shannon entropy (this is used by default)
   - α = 2: collision entropy
4. **HHI**: The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the
   squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the
   metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities produce a similar
   number of blocks) and values close to 10000 indicate high concentration (one entity produces most or all blocks).
   The U.S. Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets):
   - (0, 1500): Competitive market
   - [1500, 2500]: Moderately concentrated market
   - (2500, 10000]: Highly concentrated market
5. **Theil index**: The Theil index is another measure of entropy which is intended to capture the lack of diversity,
   or the redundancy, in a population. In practice, it is calculated as the maximum possible entropy minus the observed
   entropy. The output is a real number. Values close to 0 indicate equality and values towards infinity indicate
   inequality. Therefore, a high Theil index suggests a population that is highly centralized.
6. **Max power ratio**: The max power ratio represents the share of blocks that are produced by the most "powerful"
   entity, i.e. the entity that produces the most blocks. The output of the metric is a decimal number in [0,1].
7. **Tau-decentralization index**: The tau-decentralization index is a generalization of the Nakamoto coefficient.
   It is defined as the minimum number of entities that collectively produce more than a given threshold of the total
   blocks within a given timeframe. The threshold parameter is a decimal in [0, 1] (0.66 by default) and the output of
   the metric is an integer.
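
The definitions above can be made concrete with a short Python sketch. It follows the textual definitions only and is not the project's implementation; all function names and the sample distribution are hypothetical, and `blocks` is simply a list of per-entity amounts.

```python
import math


def shares(blocks):
    """Fraction of the total attributed to each entity."""
    total = sum(blocks)
    return [b / total for b in blocks]


def tau_index(blocks, threshold=0.66):
    """Minimum number of entities whose combined share exceeds `threshold`."""
    total = sum(blocks)
    running = 0
    for count, amount in enumerate(sorted(blocks, reverse=True), start=1):
        running += amount
        if running > threshold * total:
            return count
    return None  # only reachable when threshold >= 1


def nakamoto_coefficient(blocks):
    """The tau index with a 50% threshold."""
    return tau_index(blocks, threshold=0.5)


def gini(blocks):
    """Gini coefficient computed from the sorted values."""
    xs = sorted(blocks)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def hhi(blocks):
    """Sum of squared percentage shares, in (0, 10000]."""
    return sum((100 * s) ** 2 for s in shares(blocks))


def shannon_entropy(blocks):
    """Shannon entropy in bits."""
    return -sum(s * math.log2(s) for s in shares(blocks) if s > 0)


def theil_index(blocks):
    """Maximum possible entropy minus observed entropy (natural log)."""
    observed = -sum(s * math.log(s) for s in shares(blocks) if s > 0)
    return math.log(len(blocks)) - observed


def max_power_ratio(blocks):
    """Share held by the single largest entity."""
    return max(blocks) / sum(blocks)


# Hypothetical distribution over four entities.
sample = [40, 30, 20, 10]
```

For this sample, the two largest entities together hold 70% of the total, so both the Nakamoto coefficient and the tau index at the default threshold of 0.66 equal 2, and the HHI is 1600 + 900 + 400 + 100 = 3000.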

docs/setup.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Setup

## Installation

To install the tokenomics decentralization analysis tool, simply clone this GitHub repository:

    git clone https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization.git

The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally.

The [requirements file](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/requirements.txt) lists
the dependencies of the project.
Make sure you have all of them installed before running the scripts. To install
all of them in one go, run the following command from the root directory of the
project:

    python -m pip install -r requirements.txt


## Execution

The tokenomics decentralization analysis tool is a CLI tool.
The following process describes the most typical workflow.

...

mkdocs.yml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
site_name: Tokenomics Blockchain Decentralization - Docs
nav:
  - Home: index.md
  - How to use: setup.md
  - Data Collection: data.md
  - Metrics: metrics.md
  - How to contribute: contribute.md
theme: readthedocs
