2021-12-11: Filecoin Data Storage Drop Analysis #407
-
x-posting from #slingshot-announcements
Dear Slingshot community,
As you saw from the above post, over this weekend several collocated storage providers terminated many PiBs of sectors containing Slingshot data. Storage outages happen frequently in storage networks, datacenters, and similar systems, and they require repair and recovery as a result. It was expected that this would happen on Filecoin too (hence the idea of repair mining in the original Filecoin whitepaper).
We are initiating an effort to Repair and Restore the data that was contained in these terminated sectors. Repair is the process of creating new replicas by retrieving a surviving replica and making new deals with one or more storage providers. Restore is the process in which network participants make new deals by getting the data from the original source of the public Slingshot dataset.
We will publish more information about the Repair and Restore program later this afternoon. In the meantime, the Slingshot Admin Team has been doing an additional investigation into the incident:
A few implications:
Slingshot Phase 2 has largely been a huge success, but we know there is a need to adjust the program as we enter the new year and take stock of the goals that matter most for Filecoin data onboarding in 2022. We also need to account for the actions that led to this data loss. We are taking the following actions immediately:
We are truly looking forward to building repair functionality into Filecoin, and demonstrating the resilience of the Filecoin network in the face of these kinds of events. We’ll be sending out more information over the next couple of days as we begin the Repair and Restore effort to recover several PBs of public good data on the Filecoin network.
Thanks,
-
TL;DR: We believe one or more Filecoin storage providers (SPs) storing replicas of data for the Slingshot competition terminated a number of sectors yesterday. Slingshot clients involved will need to make new redundant deals with different storage providers over the next few days to restore their target data replication rate.
An event where an individual storage provider has a failure and loses data due to force majeure is a normal occurrence for all storage networks. For example, in global storage networks, individual machines and even full data centers may have extended outages, which are mitigated by having many globally redundant replicas.
The Filecoin network similarly helps ensure resiliency despite localized failures by:
On Dec 11, 2021, one or more Filecoin Storage Providers (SPs) terminated around 465K sectors containing ~420K deals. We think this is related to an SP that has claimed all of its sector data and original raw data were lost due to a fire (Slack thread source). The full list of the storage providers that are involved in this event can be found here. As a consequence of terminating their sectors, these storage providers burned 56K FIL as penalties taken from their available balance, locked rewards, and collateral.


Based on the initial investigation, we believe these SPs are participating in Slingshot and the terminated sectors mostly contain replicas of data stored for the Slingshot competition. According to Slingshot’s rules, this data must be stored redundantly in 10 independent replicas across 4+ miners (rules), so the impact of a single copy going offline is minimal.
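As a rough illustration, the replication rule quoted above can be expressed as a simple check. This is a minimal sketch, not Slingshot tooling: the function name and the input shape (one miner ID per live replica of a piece) are assumptions made for the example, and the thresholds are taken from the rule as stated in this post.

```python
from typing import Iterable

MIN_REPLICAS = 10  # replicas required by the rule as quoted in this post
MIN_MINERS = 4     # distinct miners required by the rule as quoted in this post


def meets_replication_rule(replica_miners: Iterable[str]) -> bool:
    """Return True if the live replicas of one piece satisfy the rule.

    `replica_miners` is a hypothetical input: one miner ID per live replica.
    """
    miners = list(replica_miners)
    enough_replicas = len(miners) >= MIN_REPLICAS
    enough_miners = len(set(miners)) >= MIN_MINERS
    return enough_replicas and enough_miners
```

Under this sketch, a piece with ten replicas all held by the same miner fails the check, which is why replicas on collocated providers offer less protection than the raw replica count suggests.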
The Slingshot team is currently working on gathering a list of deal IDs and piece CIDs (initial draft here) that were in the terminated sectors to initiate normal repair actions and maintain data replication rates. The Slingshot clients involved will need to make new redundant deals with different storage providers over the next few days to refresh their target data replication rate. In the future, with the addition of a programmability layer like FVM, this process can be smoothly automated via “repair” contracts and won’t need client engagement.
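The manual repair step described above could be planned along these lines. This is only a sketch under stated assumptions: the deal record layout `(deal_id, piece_cid, miner_id)`, the function name `plan_repairs`, and the default target of 10 replicas are all hypothetical and do not reflect the Slingshot team's actual process or any Filecoin API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (deal_id, piece_cid, miner_id)
Deal = Tuple[int, str, str]

TARGET_REPLICAS = 10  # assumed target, matching the rule quoted earlier


def plan_repairs(
    deals: Iterable[Deal],
    terminated_deal_ids: Set[int],
    target: int = TARGET_REPLICAS,
) -> Dict[str, int]:
    """Map each affected piece CID to the number of new deals it needs."""
    live = defaultdict(int)  # piece CID -> count of surviving deals
    affected = set()         # piece CIDs that lost at least one deal
    for deal_id, piece_cid, _miner_id in deals:
        if deal_id in terminated_deal_ids:
            affected.add(piece_cid)
        else:
            live[piece_cid] += 1
    # Only report affected pieces that are now below the target.
    return {
        cid: target - live[cid]
        for cid in sorted(affected)
        if live[cid] < target
    }
```

A client could feed the published list of terminated deal IDs into such a function to see how many new deals each piece needs. The new deals should go to storage providers other than the ones that terminated.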
This is a great reminder for all Filecoin network participants to maintain healthy replication across many independent storage providers, which provides resiliency to faults and allows smooth repair!