
Commit d60b78f

Merge pull request #63 from Blockchain-Technology-Lab/docs
Update documentation
2 parents 81346fd + a1b8af6 commit d60b78f

4 files changed: +155 −70 lines changed

docs/contribute.md

Lines changed: 73 additions & 1 deletion
@@ -4,4 +4,76 @@

You can contribute to the tool by adding support for a ledger, updating the mapping process for an existing ledger, or adding a new metric. In all cases, the information should be submitted via a GitHub PR.

## Add support for ledgers

You can add support for a ledger that is not already supported as follows.

### Mapping information

In the directory `mapping_information/`, there exist two folders: `addresses` and `special_addresses`.

`addresses` contains information about the owner or manager of an address. This information should be publicly available and verifiable; for example, it may come from a public explorer, social media or forum posts, articles, etc. Each file in this folder is named `<project_name>.json` (for the corresponding ledger) and contains a dictionary where the key is the address and the value is a dictionary with the following information:

(i) the name of the entity (that controls the address);
(ii) the source of the information (e.g., an explorer's URL);
(iii) (optional) a boolean value `is_contract` (if omitted, it is assumed false);
(iv) (optional) `extra_info` that might be relevant or interesting (not used for the analysis).
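To illustrate the schema described above, a hypothetical entry of an `addresses/<project_name>.json` file can be sketched in Python; the address, entity name, and source URL are made up for illustration.

```python
import json

# Hypothetical entry for mapping_information/addresses/<project_name>.json,
# following the schema described above; the concrete values are illustrative,
# not real mapping data.
entry = {
    "1ExampleAddressXXXXXXXXXXXXXXXXXXX": {
        "name": "Example Exchange",
        "source": "https://explorer.example.com/address/1ExampleAddressXXXXXXXXXXXXXXXXXXX",
        "is_contract": False,          # optional; assumed false if omitted
        "extra_info": "cold wallet",   # optional; not used in the analysis
    }
}

print(json.dumps(entry, indent=4))
```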
`special_addresses` contains information about addresses that should be treated specially, e.g., excluded from the analysis. This includes burn addresses, protocol-related addresses (e.g., Ethereum's staking contract), treasury addresses, etc. Here each file is named `<project_name>.json` and contains a list of dictionaries with the following information:

(i) the address;
(ii) the source of the information;
(iii) `extra_info` which describes the reason why the address is special.
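A sketch of one such list entry, following the schema above; the example uses Ethereum's staking (deposit) contract, which the text mentions, but the `source` URL and phrasing are illustrative.

```python
import json

# Hypothetical entry for mapping_information/special_addresses/ethereum.json;
# the structure follows the description above, the values are illustrative.
special_entries = [
    {
        "address": "0x00000000219ab540356cBB839Cbe05303d7705Fa",
        "source": "https://etherscan.io/address/0x00000000219ab540356cBB839Cbe05303d7705Fa",
        "extra_info": "Ethereum staking (deposit) contract",
    }
]

print(json.dumps(special_entries, indent=4))
```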
To contribute mapping information you can either update an existing file, by changing and/or adding some entries, or create a new file for a newly-supported ledger.

### Price information

The directory `price_data/` contains information about the supported ledgers' market price. Each file in this folder is named `<project_name>.csv` (for the corresponding ledger). The csv file has no header and each line contains two comma-separated values:

(i) a day (in the form YYYY-MM-DD);
(ii) the USD market price of the token on the set day.

To contribute price information you can either update an existing file, by adding entries for days where data is missing, or create a new file for a newly-supported ledger and add historical price data.
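A hypothetical fragment of such a CSV, parsed with Python's standard `csv` module; the price values are illustrative, not real market data.

```python
import csv
import io

# A made-up fragment of a price_data/<project_name>.csv file: no header,
# each row is a YYYY-MM-DD day and the USD market price on that day.
csv_text = "2023-01-01,16547.5\n2023-01-02,16688.47\n"

prices = {}
for day, usd_price in csv.reader(io.StringIO(csv_text)):
    prices[day] = float(usd_price)

print(prices["2023-01-01"])
```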
## Add metrics

To add a new metric, you should do the following steps.

First, create a relevant function in the script `tokenomics_decentralization/metrics.py`. The function should be named `compute_{metric_name}` and is given two parameters:

(i) a list of tuples, where each tuple's first value is a numeric type that defines the balance of an address;
(ii) an integer that defines the circulation (that is, the sum of all address balances).
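A minimal sketch of what such a function could look like, using a max-power-ratio-style metric as an example; the function body is illustrative, not the repository's actual implementation, and nothing is assumed about the tuple entries beyond the first value.

```python
# Sketch of a metric function following the interface described above:
# `balance_entries` is a list of tuples whose first value is an address
# balance, and `circulation` is the sum of all balances.
def compute_max_power_ratio(balance_entries, circulation):
    if circulation == 0:
        return 0
    return max(entry[0] for entry in balance_entries) / circulation

print(compute_max_power_ratio([(60,), (30,), (10,)], 100))  # → 0.6
```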
Second, import this new function in `tokenomics_decentralization/analyze.py`. In this file, include the function as a value in the dictionary `compute_functions` of the `analyze_snapshot` function, using as a key the name of the function (which will be used in the config file).
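A sketch of this registration step; the surrounding code of `analyze_snapshot` is elided here, and the dictionary shown is a stand-in for the repository's actual `compute_functions` mapping (the key name and metric function are illustrative).

```python
# Illustrative stand-in for the registration described above.
def compute_max_power_ratio(balance_entries, circulation):
    if circulation == 0:
        return 0
    return max(entry[0] for entry in balance_entries) / circulation

# In analyze.py, the new function is added to the compute_functions
# dictionary, keyed by the name that will appear in the config file.
compute_functions = {
    'max power ratio': compute_max_power_ratio,
}

print(compute_functions['max power ratio']([(5,), (5,)], 10))  # → 0.5
```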
Third, add the name of the metric (which was used as the key to the dictionary in `analyze.py`) to the file `config.yaml` under `metrics`. You can optionally also add it under the plot parameters, if you want it to be included in the plots by default.
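For illustration, the `metrics` section of `config.yaml` could then look as follows; this is a sketch, and the metric names listed are examples rather than the repository's exact entries.

```yaml
# Illustrative fragment of config.yaml; metric names are examples.
metrics:
  - gini
  - max power ratio
```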
Finally, you should add unit tests for the new metric [here](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/tree/main/tests) and update the [corresponding documentation page](https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization/blob/main/docs/metrics.md).

docs/data.md

Lines changed: 0 additions & 49 deletions
@@ -46,55 +46,6 @@ WITH double_entry_book AS (

ORDER BY balance DESC
```

The following Cardano section was removed:

### Cardano

```
SELECT *
FROM
(
    WITH blocks AS (
        SELECT
            slot_no AS block_number,
            block_time
        FROM `iog-data-analytics.cardano_mainnet.block`
        WHERE block_time < "{{timestamp}}"
    ),
    OUTPUTS AS (
        SELECT
            slot_no AS output_slot_number,
            CAST(JSON_VALUE(a, '$.out_address') AS STRING) AS address,
            CAST(JSON_VALUE(a, '$.out_idx') AS INT64) AS out_idx,
            CAST(JSON_VALUE(a, '$.out_value') AS INT64) AS value
        FROM `iog-data-analytics.cardano_mainnet.vw_tx_in_out_with_inputs_value`
        JOIN blocks ON block_number = slot_no
        JOIN UNNEST(JSON_QUERY_ARRAY(outputs)) AS a
    ),
    INPUTS AS (
        SELECT
            address,
            CAST(JSON_VALUE(i, '$.out_value') AS INT64) AS value
        FROM `iog-data-analytics.cardano_mainnet.vw_tx_in_out_with_inputs_value`
        JOIN OUTPUTS ON slot_no = output_slot_number
        JOIN UNNEST(JSON_QUERY_ARRAY(inputs)) AS i ON CAST(JSON_VALUE(i, '$.in_idx') AS INT64) = OUTPUTS.out_idx
    ),
    INCOMING AS (
        SELECT address, SUM(CAST(value AS numeric)) AS sum_incoming
        FROM INPUTS
        GROUP BY address
    ),
    OUTGOING AS (
        SELECT address, SUM(CAST(value AS numeric)) AS sum_outgoing
        FROM OUTPUTS
        GROUP BY address
    )
    SELECT i.address, i.sum_incoming - o.sum_outgoing AS balance
    FROM INCOMING AS i
    JOIN OUTGOING AS o ON i.address = o.address
)
WHERE balance > 0
ORDER BY balance DESC
```

### Dogecoin

```

docs/metrics.md

Lines changed: 13 additions & 18 deletions
@@ -2,24 +2,19 @@

The metrics that have been implemented so far are the following:

1. **Nakamoto coefficient**: The Nakamoto coefficient represents the minimum number of entities that collectively control more than 50% of all tokens in circulation at a given point in time. The output of the metric is an integer.
2. **Gini coefficient**: The Gini coefficient represents the degree of inequality in token ownership. The output of the metric is a decimal number in [0,1]. Values close to 0 indicate equality (all entities in the system control the same amount of assets) and values close to 1 indicate inequality (one entity holds most or all tokens).
3. **Entropy**: Shannon entropy represents the expected amount of information in the distribution of tokens across entities. The output of the metric is a real number. Typically, a higher value of entropy indicates higher decentralization (lower predictability).
4. **HHI**: The Herfindahl-Hirschman Index (HHI) is a measure of market concentration. It is defined as the sum of the squares of the market shares (as whole numbers, e.g. 40 for 40%) of the entities in the system. The output of the metric is a real number in (0, 10000]. Values close to 0 indicate low concentration (many entities hold a similar number of tokens) and values close to 10000 indicate high concentration (one entity controls most or all tokens). The U.S. Department of Justice has set the following thresholds for interpreting HHI values (in traditional markets):
    - (0, 1500): Competitive market
    - [1500, 2500]: Moderately concentrated market

@@ -28,9 +23,9 @@

...or the redundancy, in a population. In practice, it is calculated as the maximum possible entropy minus the observed entropy. The output is a real number. Values close to 0 indicate equality and values towards infinity indicate inequality. Therefore, a high Theil Index suggests a population that is highly centralized.
6. **Max power ratio**: The max power ratio represents the share of tokens that are owned by the most "powerful" entity, i.e. the wealthiest entity. The output of the metric is a decimal number in [0,1].
7. **Tau-decentralization index**: The tau-decentralization index is a generalization of the Nakamoto coefficient. It is defined as the minimum number of entities that collectively control more than a given threshold of the total tokens in circulation. The threshold parameter is a decimal in [0, 1] (0.66 by default) and the output of the metric is an integer.
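Two of the definitions above, the Nakamoto coefficient (with its tau-decentralization generalization) and the HHI, can be illustrated on a toy token distribution. This is a sketch of the definitions only, not the tool's implementation.

```python
# Toy illustration of the definitions above; not the repository's code.
def nakamoto_coefficient(balances, threshold=0.5):
    # Minimum number of entities jointly controlling strictly more than
    # `threshold` of the circulating tokens. threshold=0.5 gives the
    # Nakamoto coefficient; other values give the tau-decentralization index.
    total = sum(balances)
    running, count = 0, 0
    for balance in sorted(balances, reverse=True):
        running += balance
        count += 1
        if running > threshold * total:
            return count
    return count

def hhi(balances):
    # Sum of squared market shares, expressed as whole-number percentages.
    total = sum(balances)
    return sum((100 * balance / total) ** 2 for balance in balances)

balances = [50, 30, 10, 10]
print(nakamoto_coefficient(balances))  # → 2
print(hhi(balances))                   # → 3600.0
```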

docs/setup.md

Lines changed: 69 additions & 2 deletions
@@ -16,10 +16,77 @@ project:

python -m pip install -r requirements.txt

## Execution

The tokenomics decentralization analysis tool is a CLI tool. To run the tool, simply do:

python run.py

The execution is controlled and parameterized by the configuration file `config.yaml` as follows.
`metrics` defines the metrics that should be computed in the analysis. By default all supported metrics are included here (to add support for a new metric see the [contributions page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/contribute/)).

`ledgers` defines the ledgers that should be analyzed. By default, all supported ledgers are included here (to add support for a new ledger see the [contributions page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/contribute/)).
`execution_flags` defines various flags that control the data handling:

* `force_map_addresses`: the address helper data from the directory `mapping_information` is re-computed; you should set this flag to true if the data has been updated since the last execution for the given ledger
* `force_map_balances`: the balance data of the ledger's addresses is recomputed; you should set this flag to true if the data has been updated since the last execution for the given ledger
* `force_analyze`: the metrics are recomputed; you should set this flag to true if any type of data has been updated since the last execution for the given ledger
`analyze_flags` defines various analysis-related flags:

* `no_clustering`: a boolean that disables clustering of addresses (under the same entity, as defined in the mapping information)
* `top_limit_type`: a string taking one of two values (`absolute` or `percentage`) that enables applying a threshold on the addresses that will be considered
* `top_limit_value`: the value of the top limit that should be applied; if 0, then no limit is used (regardless of the value of `top_limit_type`); if the type is `absolute`, then the `top_limit_value` should be an integer (e.g., if set to 100, then only the 100 wealthiest entities/addresses will be considered in the analysis); if the type is `percentage`, then the `top_limit_value` should be a decimal in [0, 1] (e.g., if set to 0.50, then only the top 50% of wealthiest entities/addresses will be considered)
* `exclude_contract_addresses`: a boolean value that enables the exclusion of contract addresses from the analysis
* `exclude_below_usd_cent`: a boolean value that enables the exclusion of addresses whose balance at the analyzed point in time was less than $0.01 (based on the historical price information in the directory `price_data`)
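A sketch of what these flags could look like in `config.yaml`; the key names follow the text above, but the concrete values and layout are illustrative assumptions, not recommended defaults.

```yaml
# Illustrative fragment of config.yaml; values are examples only.
execution_flags:
  force_map_addresses: false
  force_map_balances: false
  force_analyze: false

analyze_flags:
  no_clustering: false
  top_limit_type: absolute
  top_limit_value: 0
  exclude_contract_addresses: false
  exclude_below_usd_cent: false
```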
`snapshot_dates` and `granularity` control the snapshots for which an analysis will be performed. `granularity` is a string that can be empty or one of `day`, `week`, `month`, `year`. If granularity is empty, then `snapshot_dates` define the exact time points for which an analysis will be conducted, in the form YYYY-MM-DD. Otherwise, if granularity is set, then the two farthest entries in `snapshot_dates` define the timeframe over which the analysis will be conducted, at the set granularity. For example, if the farthest points are `2010` and `2023` and the granularity is set to `month`, then (the first day of) every month in the years 2010-2023 (inclusive) will be analyzed.
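The month-granularity behavior described above can be sketched as follows; this is an illustration of the described logic, not the tool's code.

```python
from datetime import date

# With granularity "month", the first day of every month between the two
# farthest snapshot dates (inclusive) is analyzed, as described above.
def monthly_snapshots(start, end):
    snapshots = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        snapshots.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return snapshots

dates = monthly_snapshots(date(2010, 1, 1), date(2010, 4, 1))
print([d.isoformat() for d in dates])  # → ['2010-01-01', '2010-02-01', '2010-03-01', '2010-04-01']
```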
`input_directories` and `output_directories` are both lists of directories that define the source of data. `input_directories` defines the directories that contain raw address balance information, as obtained from BigQuery or a full node (for more information about this see the [data collection page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/data/)). `output_directories` defines the directories to store the databases which contain the mapping information and analyzed data. The first entry in the output directories is also used to store the output files of the analysis and the plots.
Finally, `plot_parameters` contains various parameters that control the type of plots that will be produced and the data they include.
...
