Commit d29b9ec

GIP-0083: Substreams On The Network. (#63)
* First draft of Substreams On The Network.
* Do NOT suggest enabling rewards.
* Remove artifact
* Move to GIP 0082. Address Pablo's comments.
* Massive improvements to the GIP, clarified what is part of the specs, what is informational. Linked to documents with payments flow details.
* Add final commitment.
* another update
* 0083 instead
* Added a nudge to GIP-0056 Permissionless Payers.
* More details about pricing, and ECON details.
* Fix yaml issue
* Some updates after ffeedback.
* Rework the attestation, no more DataEdge, a Deployment ID would be best, and can be allocated and disputed on. More clarity on the attestation signature payload, including the module_hash being covered, which would facilitate pinning down culprits.
* Unescessarry.
* Update 0083.md Added details about network subgraph, fixed a few typos.
* Update 0083.md Take out arbitration charter modifications.
* Update 0009-arbitration-charter.md Revert arbitration charter changes.
1 parent 8df3b0e commit d29b9ec

1 file changed

+232
-0
lines changed

gips/0083.md

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
1+
---
2+
GIP: "0083"
3+
Title: Make Substreams an official The Graph Network product
4+
Authors: Alexandre Bourget <[email protected]>
5+
Created: 2024-10-27
6+
Updated: 2024-10-27
7+
Stage: Draft
8+
Category: Protocol Logic, Protocol Interfaces
9+
Implementations: https://thegraph.market, https://substreams.dev, https://github.com/streamingfast/substreams
10+
---
11+
12+
# Abstract
13+
14+
This GIP proposes the formal recognition of Substreams as an official product within The Graph Network. Upon approval, this GIP represents the community's consensus to fully integrate Substreams into the network's suite of products, documentation, and marketing materials. While this integration paves the way for future indexing rewards, this GIP does not authorize such rewards.
15+
16+
# Motivation
17+
18+
The Graph ecosystem has successfully developed and deployed Substreams over several years, demonstrating significant performance improvements for indexing, particularly with large datasets and complex computations. However, the lack of formal network approval has limited its full integration and adoption.
19+
20+
Official recognition would:
21+
1. Increase developer and user trust in the product
22+
2. Enable confident marketing and go-to-market efforts
23+
3. Allow the community to fully leverage Substreams' performance benefits
24+
4. Maintain The Graph's decentralization and security principles
25+
5. Strengthen The Graph's position as a premier blockchain infrastructure provider
26+
27+
# Non-normative sections
28+
29+
IMPORTANT: Non-normative (or Informative) sections were added for the sake of clarity, providing clearer examples, but are not part of the specification proposed in this GIP. They also do not hold back the value accrued to the Network if this proposal is approved.
30+
31+
# Specification
32+
33+
## Permissionless Discovery
34+
35+
This section describes how Indexers may be listed on the Payment Gateway UI.
36+
37+
Substreams providers MUST register their service on-chain by publishing a _Substreams Service Deployment_ manifest to IPFS, and [`allocate`ing](https://github.com/graphprotocol/contracts/blob/main/packages/contracts/contracts/staking/Staking.sol#L296) to it through the Staking contract on the Arbitrum chain.
38+
39+
Here is a sample _Substreams Service Deployment_ manifest:
40+
```yaml
41+
specVersion: 0.0.5
42+
description: "Substreams Data Service for MyNetwork"
43+
service:
44+
type: substreams-v1
45+
endpoint: https://mynetwork.substreams.example.com
46+
network: mynetwork
47+
provider:
48+
id: my-company
49+
name: My Company
50+
logo: https://company.example.com/logo.png
51+
```
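NON-NORMATIVE: A gateway or registration tool could sanity-check such a manifest before relying on it. The following is a minimal sketch, assuming the manifest has already been parsed into a Python dictionary. The function name `validate_service_manifest` and the choice of which fields are treated as required are illustrative assumptions, not part of this specification.

```python
# Fields assumed to be required for illustration; the GIP does not
# define a formal schema for the manifest.
REQUIRED_SERVICE_FIELDS = ("type", "endpoint", "network")


def validate_service_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    service = manifest.get("service")
    if not isinstance(service, dict):
        return ["missing 'service' section"]

    errors = []
    for field in REQUIRED_SERVICE_FIELDS:
        if not service.get(field):
            errors.append(f"missing service.{field}")

    # Only the `substreams-v1` service type is described in this document.
    service_type = service.get("type")
    if service_type and service_type != "substreams-v1":
        errors.append(f"unsupported service.type: {service_type}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a UI show everything that needs fixing at once.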
52+
53+
The service `substreams-v1` means that the endpoint makes available a gRPC endpoint responding to these methods:
54+
55+
- sf.substreams.rpc.v2.Stream/Block
56+
- sf.substreams.rpc.v2.EndpointInfo/Info
57+
58+
as specified by https://github.com/streamingfast/substreams/blob/develop/proto/sf/substreams/rpc/v2/service.proto, which is updated from time to time in backwards-compatible ways.
59+
60+
The Payment Gateway MUST listen to the Arbitrum Staking contract for such registrations (through a streamed Substreams module, a Subgraph such as the canonical Network Subgraph, or other means), and update its local view of the network.
61+
62+
The Payment Gateway CAN have systems to perform health checks (e.g. on `/health` or `/info` endpoints), or other checks to ensure active, up-to-date and properly configured providers are offered to end users. A Payment Gateway CAN check block height to ensure the backing provider is close to chain head before offering it to users.
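NON-NORMATIVE: The block-height check mentioned above could be as simple as comparing the provider's reported block against the known chain head. This is a sketch only: the function name `is_near_chain_head` and the default tolerance of 10 blocks are assumptions chosen for illustration, and how the two block numbers are obtained is left to the gateway.

```python
def is_near_chain_head(provider_block: int, chain_head_block: int,
                       max_lag_blocks: int = 10) -> bool:
    """Return True when the provider lags the chain head by at most
    `max_lag_blocks` blocks (hypothetical tolerance)."""
    if provider_block < 0 or chain_head_block < 0:
        raise ValueError("block numbers must be non-negative")
    # A provider that is slightly ahead of our view of the head is fine.
    return chain_head_block - provider_block <= max_lag_blocks
```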
63+
64+
65+
## Indexer Selection Algorithm (ISA)
66+
67+
To provide consumers with fair access to Substreams providers, an Indexer Selection Algorithm is needed.
68+
69+
Given that there's no gateway involved with Substreams (it's direct point-to-point between consumer and provider), the existing Indexer Selection Algorithm (ISA), which runs at query time, cannot be applied to Substreams. Therefore, a round-robin selection of providers is offered on the front-end of the Payment Gateway (https://thegraph.market), ensuring fair distribution of request loads, while still allowing consumers to choose their provider.
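NON-NORMATIVE: The round-robin behaviour described above can be sketched as follows. The class and method names are hypothetical, and this is not the implementation used by The Graph Market. It only illustrates rotating through providers while honouring an explicit consumer choice.

```python
from itertools import cycle


class RoundRobinSelector:
    """Rotate through providers; an explicit preference bypasses rotation."""

    def __init__(self, providers):
        self._providers = list(providers)
        if not self._providers:
            raise ValueError("at least one provider is required")
        self._cycle = cycle(self._providers)

    def next_provider(self, preferred=None):
        # A consumer may still pick a specific provider, for example one
        # that already has pre-cached data for their dataset.
        if preferred is not None:
            if preferred not in self._providers:
                raise KeyError(preferred)
            return preferred
        return next(self._cycle)
```

An explicit choice does not advance the rotation, so consumers who accept the default still see an even distribution.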
70+
71+
> NON-NORMATIVE: A reason for selecting a provider could be that it has pre-cached data for a given dataset, which would speed up and lower the costs for a consumer.
72+
73+
> NON-NORMATIVE: Future GIPs will propose an improved ISA, based on better data and better metrics, which might be desirable to consumers.
74+
75+
## Payments
76+
77+
Payments are handled by the Payment Gateway, which is responsible for collecting payments from consumers and distributing them to indexers.
78+
79+
> INFORMATIVE: The Graph Market acts as the first Payment Gateway. Such a gateway DOES NOT route traffic, but does route payments and receive consumption signals from Indexers.
80+
81+
Such a Payment Gateway is responsible for invoking the `collect` function in the staking contract, thereby transferring funds from Consumers to Indexers.
82+
83+
This builds on work done in [GIP-0056 Permissionless Payers](https://github.com/graphprotocol/graph-improvement-proposals/blob/main/gips/0056-permissionless-payers.md), which opened up an opportunity for such new data services to use the network for payment.
84+
85+
> INFORMATIVE: [TAP](https://docs.rs/tap_core/latest/tap_core/index.html) is not required for Substreams to honor the promises of the protocol. However, as TAP brings augmented trust minimization properties, it would be incorporated in future work.
86+
87+
After collecting consumption signals, the Payment Gateway MUST sum up everything that is due to the Indexer, and transfer a payment to the Indexer's wallet, using the `collect` function in the staking contract.
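NON-NORMATIVE: The summing step can be illustrated with a small aggregation. In this sketch, the shape of a consumption signal (an indexer address paired with an integer amount in the token's smallest unit) is an assumption made for the example, as is the function name `total_due_per_indexer`. The on-chain `collect` call itself is not shown.

```python
from collections import defaultdict


def total_due_per_indexer(signals):
    """Sum amounts owed per indexer.

    `signals` is an iterable of (indexer_address, amount) pairs, where
    `amount` is an integer in the token's smallest unit (assumed shape).
    """
    totals = defaultdict(int)
    for indexer, amount in signals:
        if amount < 0:
            raise ValueError("amounts must be non-negative")
        totals[indexer] += amount
    return dict(totals)
```

Using integers in the smallest token unit avoids the rounding errors that floating-point amounts would introduce.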
88+
89+
### Current implementation and future opportunities
90+
91+
THIS SECTION IS NON-NORMATIVE
92+
93+
An example of the payment flow today, which ends up abiding by this specification (calling `collect`) can be seen here:
94+
95+
https://github.com/streamingfast/network-payments-cli
96+
97+
It involves a small dance: publishing a manifest similar to:
98+
99+
```yaml
100+
specVersion: 0.0.5
101+
description: "thegraph.market Payment Gateway Usage"
102+
usage:
103+
serviceType: substreams-v1
104+
network: mainnet
105+
nonce: some-uuid
106+
```
107+
108+
then allocating to it and sharing the allocation ID with the Payment Gateway, in order for the Indexer to receive payment. Note that this isn't the same thing as the _Substreams Service Deployment_ manifest: this one is short-lived so as not to be exposed to the curation fee market.
109+
110+
This flow is expected to be simplified in the future, with the Payment Gateway being able to automatically detect new Substreams services, and allocate to them, without manual intervention.
111+
112+
Future opportunities would include self-serve APIs for Indexers to claim payment and do the dance automatically. Thus an Indexer would be able to lower the risks of not receiving payments from the Gateway.
113+
114+
Eventually, TAP could be used to send payments alongside blockchain data payloads (BlockScopedData), or some other trust-minimized payment mechanism.
115+
116+
117+
## Pricing
118+
119+
Pricing is currently outside of the scope of this proposal. The protocol (contracts) itself does not provide pricing mechanisms, so this responsibility is passed to the Payment Gateway.
120+
121+
The Payment Gateway MUST transfer funds to the Indexer according to a Pricing Period. Ideally this period is the shortest possible, to reduce counterparty risks.
122+
123+
124+
### Current implementation and future opportunities
125+
126+
THIS SECTION IS NON-NORMATIVE
127+
128+
Current pricing is set at a fixed price, defined on The Graph Market (https://thegraph.market). It has been measured by the Core Devs to have unit economics that make sense for operators, and will be revised as the system matures.
129+
130+
Consumption is measured in terms of terabytes read from the backing stores and/or egress bytes. Fees accumulate over a period of time, until payment is settled. This period will start at 1 month and be shortened progressively as the system matures, to reduce counterparty risks.
131+
132+
Although initial integration starts with a simple pricing strategy, the integration of Substreams on The Graph Network by no means limits the future opportunities to adjust and augment the business models and pricing strategies.
133+
134+
Future GIPs will be proposed to enable more complex pricing strategies, such as pay-per-cpu-time, pay-per-storage, pay-per-query, pay-per-block, pay-per-attestation, etc.
135+
136+
### Curation
137+
138+
THIS SECTION IS NON-NORMATIVE
139+
140+
Curation can happen on the published Subgraph manifest above, although a nonce may be added to avoid paying curation fees. Because there are no indexer rewards, curation is not adding value, therefore it is reasonable for participants to want to avoid it.
141+
142+
143+
144+
## Economic Security and Disputes
145+
146+
Economic security is achieved through slashing and disputes, similar to subgraphs as of Jan 24th 2025. Indexers providing incorrect data CAN have their stake slashed. Therefore, to be an Indexer recognized by the Payment Gateway, one must stake the Network token, just like for subgraphs.
147+
148+
The mechanism is similar to that of subgraphs: Indexers provide signed attestations attached to allocations, and an Arbiter validates those attestations and slashes the Indexer's stake if necessary.
149+
150+
### Attestations
151+
152+
Because Substreams has no queries as subgraphs do, Proofs of Indexing for Substreams are always null.
153+
154+
> [!NOTE]
155+
> This is why enabling indexing rewards for Substreams is not part of this GIP. Future GIPs, potentially with Horizon, will propose a mechanism to reward Indexers for providing Substreams services.
156+
157+
Substreams endpoints provide signed attestations with each block's response, as per [the `attestation` field](https://github.com/streamingfast/substreams/blob/develop/proto/sf/substreams/rpc/v2/service.proto#L96) in the `BlockScopedData` response, matching the operator key specified in [the `attestation_public_key` field](https://github.com/streamingfast/substreams/blob/develop/proto/sf/substreams/rpc/v2/service.proto#L106) of the `SessionInit` response.
158+
159+
The payload signed over is composed of:
160+
- the Substreams output module's payload, hashed with SHA256 (32 bytes), followed by:
161+
- the 'b' character, followed by:
162+
- the block ID as UTF-8 encoded string, followed by:
163+
- the 'm' character, followed by:
164+
- the module hash for the top-level Substreams module being attested (32 bytes)
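NON-NORMATIVE: The byte layout listed above can be assembled as in the sketch below. The function name `build_attestation_payload` is hypothetical. The sketch only builds the bytes to be signed; the signature scheme and the derivation of the attestation key are not covered here.

```python
import hashlib


def build_attestation_payload(output_payload: bytes, block_id: str,
                              module_hash: bytes) -> bytes:
    """Concatenate the fields that are signed over, in the listed order."""
    if len(module_hash) != 32:
        raise ValueError("module hash must be 32 bytes")
    # SHA256 of the output module's payload (32 bytes).
    output_digest = hashlib.sha256(output_payload).digest()
    return (
        output_digest
        + b"b"
        + block_id.encode("utf-8")  # block ID as a UTF-8 encoded string
        + b"m"
        + module_hash               # hash of the top-level module (32 bytes)
    )
```

Because the module hash is part of the signed bytes, an attestation is bound to the exact module that produced the output, not just to the block.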
165+
166+
The attestation payload MUST be signed by a special _attestation key_, derived from the operator's key, so that disputes can be opened on the _Substreams Service Deployment ID_.
167+
168+
A valid _attestation key_, attached to such an allocation MUST be verified by the Payment Gateway, before giving the assurance to Consumers that the Indexer can be slashed for misbehaving.
169+
170+
An Arbiter MUST be able to validate payloads, through multiple providers and investigation, and after judgement, be able to slash the Indexer's stake, using the information provided in those payloads via on-chain allocations.
171+
172+
> INFORMATIVE: The module hash covers everything needed for deterministic execution, and nothing more. It is easily computed by all known compliant Substreams libraries.
173+
174+
175+
#### Methods of analysis
176+
177+
THIS SECTION IS NON-NORMATIVE
178+
179+
This section specifies potential methods an Arbiter can use to analyze disputes. Other more optimal tools could be written, and are out of scope of this specification.
180+
181+
**Detecting Faults:** Discrepancies in the data returned to a Fisherman or a Consumer by different providers, for the same stream and block, are grounds for opening an investigation.
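A Fisherman could detect such discrepancies by hashing what each provider returned for the same module and block, then looking for disagreement. The following is a sketch under stated assumptions: the tuple shape of a response and the name `find_discrepancies` are invented for illustration and do not correspond to an existing tool.

```python
import hashlib
from collections import defaultdict


def find_discrepancies(responses):
    """Group providers by the digest of the payload they returned.

    `responses` is an iterable of
    (provider, module_hash, block_id, payload_bytes) tuples (assumed shape).
    Returns only the (module_hash, block_id) keys where providers disagree,
    mapped to {payload_digest: [providers]}.
    """
    digests = defaultdict(lambda: defaultdict(list))
    for provider, module_hash, block_id, payload in responses:
        digest = hashlib.sha256(payload).hexdigest()
        digests[(module_hash, block_id)][digest].append(provider)
    return {
        key: dict(by_digest)
        for key, by_digest in digests.items()
        if len(by_digest) > 1
    }
```

A disagreement only shows that at least one provider is wrong. Deciding which one remains the Arbiter's job.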
182+
183+
The Arbiter can use the `substreams gui` tool or the `substreams run --show-attestations` tool to inspect attestations and contents, and compare stream data at specific block ranges, verifying data integrity and determinism.
184+
185+
`substreams tools` will contain a way to validate those attestations and help with investigations. This tool will take the disputed content (BlockScopedData) from the replay.log, which contains the attestations over that content, and validate them so that outputs can be compared across providers, which may ultimately lead to slashing an Indexer.
186+
187+
```
188+
substreams tools validate-attestation --indexer-address 0x123123123 --attestation eth:120398102398102938109283012983 --replay-log-file ./path/to/replay.log
189+
```
190+
191+
This command will process all messages in the provided replay log and validate their attestations. Replay logs can be generated using `substreams gui --with-replay`.
192+
193+
The Payment Gateway can keep an in-memory mapping that links the operator key to the staking key.
194+
195+
- The `setOperator` function ([Staking.sol#L226](https://github.com/graphprotocol/contracts/blob/ce3ec16484dacc89a1b9cf08455256be830790e3/packages/contracts/contracts/staking/Staking.sol#L226)) links the operator key to the staking key and emits the `SetOperator` event ([IStakingBase.sol#L99](https://github.com/graphprotocol/contracts/blob/main/packages/contracts/contracts/staking/IStakingBase.sol#L99)).
196+
197+
Such a system can hydrate from the Network Subgraph.
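Such an in-memory mapping could look like the sketch below. The class `OperatorRegistry` is hypothetical, and the event fields used here (`indexer`, `operator`, `allowed`) are an assumption about the `SetOperator` event that should be checked against the contract source linked above.

```python
from collections import defaultdict


class OperatorRegistry:
    """Track which indexers (staking keys) have authorized an operator."""

    def __init__(self):
        self._indexers_by_operator = defaultdict(set)

    def apply_set_operator(self, indexer: str, operator: str, allowed: bool):
        """Apply one SetOperator event (assumed fields), in chain order."""
        indexer, operator = indexer.lower(), operator.lower()
        if allowed:
            self._indexers_by_operator[operator].add(indexer)
        else:
            self._indexers_by_operator[operator].discard(indexer)

    def indexers_for(self, operator: str) -> set:
        """Return a copy of the indexers that authorized this operator."""
        return set(self._indexers_by_operator.get(operator.lower(), set()))
```

Addresses are lower-cased so that checksummed and non-checksummed forms of the same address map to the same entry.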
198+
199+
200+
### Tools
201+
202+
THIS SECTION IS NON-NORMATIVE. Different tools can exist to do those operations. We are listing some for informational purposes here.
203+
204+
For Indexers, in order to register a new Substreams provider on-chain, and show up on the Payment Gateway, you can use commands such as:
205+
206+
```
207+
substreams network register --operator-wallet 0x123123123 --operator-priv-key-file my.key --service substreams-v1 --endpoint mainnet.eth.example.com --provider-logo https://www.example.com/static/mylogo.png --provider-id example-company --provider-name "Example Company" --network ethereum
208+
```
209+
210+
This command allows providers to register their service, specifying their operator wallet, private key file, the service type (`substreams-v1`), the endpoint URL, logo, name, and the supported network. The registration data is signed by the operator's key. An update to an existing registration can be achieved by re-running the command with updated parameters, based on the `service`, `providerId` and `network` triple. A corresponding `unregister` or `revoke` command would be added to allow providers to take down their service from the front-end.
211+
212+
213+
# Dependencies & Backwards Compatibility
214+
215+
The only dependency for this GIP is the ratification of the changes to GIP-0009 as proposed in this Pull Request.
216+
217+
Most of the elements described above are already rolled out and currently work.
218+
219+
To account for signed attestations, some backwards-compatible fields were added to the Substreams requests and responses. These fields should not affect the current Substreams rollout, but will be necessary for those wanting to join the Network.
220+
221+
222+
# Risks and Security Considerations
223+
224+
Risks associated with data integrity and malicious actors will be mitigated through signed attestations, the arbitration process, and economic security enforced via slashing.
225+
226+
There are some centralization risks in the Payment Gateway until TAP is integrated. For instance, Indexers need to trust the Gateway for payment. In the same vein, there is some counterparty risk for Indexers, as they have to trust that the Payment Gateway _will pay_ - this can be mitigated by requesting payment early and often.
227+
228+
No other risks have been identified during the consultation period.
229+
230+
# Copyright Waiver
231+
232+
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
