Does anybody use Lotus/Filecoin storage for enterprise class storage requirements? #7860
-
My knowledge is fairly superficial, but my understanding is that Filecoin is one thing and IPFS is another. As far as I understand, Lotus is intended to be a Filecoin miner, so there is a difference between simply making disk space available and making that space available for a fee. In this sense a similar technology, but a competitor to IPFS, is BTFS (BitTorrent File System); since torrenting is much older, I think the concept is easier to grasp. As a comparison, in BitTorrent it is possible to share (seed) disk space without remuneration, and that happens "instantly", whereas being remunerated in BTT is another story. As I said, my knowledge is superficial, but I think that explains my view: it is one thing to simply join the IPFS network and provide space, and quite another to receive Filecoin for it. Lotus aims to mine Filecoin by providing storage space in the IPFS ecosystem. https://docs.filecoin.io/about-filecoin/ipfs-and-filecoin/
From your description, your need may be covered by IPFS alone: https://ipfs.io/#install
See more at https://proto.school/anatomy-of-a-cid/01
-
There are a number of initiatives running right now to provide enterprise-class data storage capabilities to large customers. I think you are getting a mistaken impression of the throughput that Filecoin provides, regardless of file size. You can see the accelerator we are about to kick off at ESPA; take a look. The more difficult issues that Filecoin needs to improve on or create:
-
I've read through the docs a couple of times, and I'm wondering: does anybody use Lotus/Filecoin for large-scale/enterprise data storage requirements?
For example, a typical single project at my company may generate an average of about 5TB of data per day, and we have anywhere from 20-50 projects ongoing simultaneously. Each 5TB daily dataset might comprise somewhere between 1,000 and 1,000,000 files, depending on the project type. On big days, a project's dataset may spike 3x-4x. We have multiple 10GbE connections to backbone and typically move data to S3 storage at about 600-800MB/s per project. Everything runs 24/7.
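For reference, here is the back-of-the-envelope math on that workload (a sketch using only the figures above; the constants are assumptions from this post, not measurements):

```go
// Rough ingest arithmetic for the workload described above.
// All figures are taken from this post; nothing here is measured.
package main

import "fmt"

func main() {
	const (
		tb            = 1e12 // bytes (decimal TB)
		mb            = 1e6  // bytes (decimal MB)
		perProjectTB  = 5.0  // average data per project per day
		secondsPerDay = 86400.0
	)

	perProjectMBps := perProjectTB * tb / secondsPerDay / mb
	fmt.Printf("average per-project ingest: ~%.0f MB/s\n", perProjectMBps) // ~58 MB/s

	for _, projects := range []float64{20, 50} {
		fmt.Printf("%2.0f projects: ~%.1f GB/s sustained, ~%.1f GB/s on 3-4x spike days\n",
			projects,
			projects*perProjectMBps/1000,
			projects*perProjectMBps*3.5/1000)
	}
}
```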
Reading the docs and tutorials, it seems like the data duplication and processing overhead in Lotus results in an astoundingly slow scheme that would make it all but impossible to push even a tiny fraction of the data we generate into a Filecoin storage environment, even if (and this is the whole point of why I'm researching this) we were the storage provider/miner for our own dataset and could bypass transmitting the data via Lotus to the Tier-1 storage.
I don't get any sense from the tutorial/docs as to why some of the processing overhead in Lotus is so severe. Quotes from the tutorial:
A few minutes per file? What if it's 50 files of 100 GB, or 1,000 files of 5 GB, or a million files of 5 MB each? Can this procedure be batched and parallelized to dramatically reduce the overhead?
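For what it's worth, the kind of batching I have in mind is just a bounded worker pool over the file list (a sketch; `processFile` is a hypothetical placeholder for whatever per-file preparation step Lotus requires):

```go
// Sketch of batching/parallelism over a file list with a bounded worker pool.
// processFile is a hypothetical placeholder for the per-file preparation step
// (hashing, piece generation, import) — not an actual Lotus API.
package main

import (
	"fmt"
	"sync"
)

func processFile(path string) error {
	// placeholder: hash / prepare / import the file here
	fmt.Println("processed", path)
	return nil
}

func main() {
	files := []string{"a.dat", "b.dat", "c.dat"} // stand-in for the real file list
	const workers = 16

	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				if err := processFile(path); err != nil {
					fmt.Println("error:", path, err)
				}
			}
		}()
	}

	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
}
```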
I assume that was a very slow test environment? I can generate an md5 or xxhash64 hash of a 7.5 GB file in a matter of seconds reading from NAS over a 10GbE network, in a split second from a 12-drive spinning RAID6, and in a fraction of that time from an SSD array. Why is the Lotus/Filecoin payload size calculation so slow?
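For comparison, the baseline I'm measuring against is just a streaming hash over the file (a minimal sketch using the Go standard library; the file path is a placeholder):

```go
// Minimal baseline: stream a large file through a hash and time it.
// The path is a placeholder for a real file on NAS / RAID / SSD.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

func main() {
	const path = "/mnt/nas/sample.dat" // placeholder

	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := sha256.New()
	start := time.Now()
	n, err := io.Copy(h, f) // streams the whole file through the hash
	if err != nil {
		log.Fatal(err)
	}
	elapsed := time.Since(start)

	fmt.Printf("%x  %d bytes in %s (%.0f MB/s)\n",
		h.Sum(nil), n, elapsed, float64(n)/1e6/elapsed.Seconds())
}
```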
24 hours to seal a 5GB test file? How does that scale if the data set is anywhere near the requirements I describe?
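Rough arithmetic on that (again a sketch using only the figures in this post; it assumes sealing time stays at ~24 hours per 5 GB piece and ignores Lotus's actual sector sizes, batching, and pipeline parallelism):

```go
// Rough sealing-concurrency arithmetic based on the figures in this post.
// Assumes ~24 h per 5 GB piece; ignores real sector sizes and pipelining.
package main

import "fmt"

func main() {
	const (
		dailyTBPerProject = 5.0  // average data per project per day
		pieceGB           = 5.0  // size of the test file from the tutorial
	)

	piecesPerDay := dailyTBPerProject * 1000 / pieceGB // 1000 pieces/day/project
	// Each piece takes ~24 h, so keeping up means roughly that many
	// sealing jobs running in parallel, per project.
	fmt.Printf("~%.0f concurrent 24h sealing jobs per project just to keep up\n", piecesPerDay)
	fmt.Printf("~%.0f to ~%.0f across 20-50 projects\n", piecesPerDay*20, piecesPerDay*50)
}
```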
TLDR: At this point in Lotus/Filecoin deployment, what is the highest throughput use case currently being implemented on the Lotus/Filecoin network?
Also, regarding the GPU requirements, is mining at all feasible in a Mac environment (the latest Mac Pros with AMD GPUs), or does mining require Linux/NVIDIA due to the CUDA dependencies? (Please don't tell me I need to run Windows. :-)
Has anyone done a write up on use cases that approach these requirements?
Thank you!