@nickyc975
Contributor

This pull request adds support for file segments to Transfer Engine and Mooncake Store, then introduces a generic NVMeoF transport implementation to Transfer Engine.

Please read the RFC (#780) and the documents in this PR for more information.

@nickyc975 nickyc975 force-pushed the jinlong/nvmeof-upstream branch from 561c299 to ab2a023 Compare August 29, 2025 09:55
@stmatengss stmatengss self-assigned this Aug 29, 2025
@nickyc975 nickyc975 force-pushed the jinlong/nvmeof-upstream branch from ab2a023 to 8ada073 Compare August 29, 2025 15:09
@SgtPepperr
Contributor

Hello, I'm very happy to see the code support for NVMe over Fabrics as a transport. However, I have a small question to ask. Will adding NVMe segments here conflict with the currently implemented tiered caching mechanism #578 , Mooncake+3FS, or are the two compatible?

@nickyc975
Contributor Author

Hello, the two should be compatible by design. Although not recommended, NVMe segments can be used with 3FS, just like memory segments.

local_buffer_size, protocol,
rdma_devices, master_server_addr);
})
.def("setup_with_files",
Collaborator

Could we add more parameters in setup or use environment variables to complete the file setup?

Contributor Author

Yes, we can.
The only difference between setup_with_files and setup is that setup_with_files takes a files argument instead of the global_segment_size argument. We can add the files argument as the last argument of setup for compatibility. Do you think this is better than adding setup_with_files?
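
As a rough illustration of the "extra trailing argument" option discussed above, here is a minimal plain C++ sketch. The function name `setup_sketch`, the `SetupResult` struct, and the rule that a non-empty `files` list takes precedence over `global_segment_size` are all assumptions made for illustration; the real `setup` signature in this PR has more parameters and may behave differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type, used only to make the sketch observable.
struct SetupResult {
    bool uses_files;            // true when file segments were requested
    std::size_t segment_bytes;  // memory segment size, 0 when files are used
    std::size_t file_count;     // number of file segments requested
};

// Hypothetical stand-in for setup(): `files` is appended as the last,
// defaulted argument so that existing callers keep compiling unchanged.
SetupResult setup_sketch(std::size_t global_segment_size,
                         const std::vector<std::string>& files = {}) {
    if (!files.empty()) {
        // Assumption: a non-empty file list replaces the memory segment.
        return SetupResult{true, 0, files.size()};
    }
    return SetupResult{false, global_segment_size, 0};
}
```

Existing callers would keep calling `setup_sketch(size)`, while new callers pass the file list as the final argument.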

@nickyc975 nickyc975 requested a review from stmatengss September 8, 2025 02:04
@stmatengss
Collaborator

Currently working on releasing 0.3.5 and will review this PR ASAP. @nickyc975

@tianlang-wq

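1. Is transfer over NVMeoF part of the client persistence logic?
2. Can it replace the client-side persistence logic? Apart from a DFS, there seems to be no way to unify the persistence layer across all clients.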

@nickyc975
Contributor Author

nickyc975 commented Sep 22, 2025

The NVMeoF transport is independent of the current client persistence logic, and for now it cannot replace the existing persistence logic.

Comment on lines 61 to 62
std::string segment_name, FileBufferID file_id,
void* buffer_ptr, std::size_t size,
Collaborator

To avoid adding a new parameter, could we combine FileBufferID and buffer_ptr into a union or an optional?

Contributor Author

Actually the file_id parameter can be removed according to #883 . I am working on it.
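
For reference, the union/optional idea raised in this thread could be sketched with `std::variant` (C++17), so that a single parameter carries either a file buffer ID or a memory pointer. Everything here is hypothetical: `FileBufferID` is aliased to an integer only so the sketch compiles, and `BufferRef` and `describeBuffer` are illustrative names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in; the real FileBufferID type is defined in the PR.
using FileBufferID = std::uint64_t;

// One parameter that holds either a file buffer ID or a memory pointer.
using BufferRef = std::variant<FileBufferID, void*>;

// Hypothetical helper: reports which kind of buffer the caller passed.
std::string describeBuffer(const BufferRef& ref, std::size_t size) {
    assert(size > 0 && "buffer size must be non-zero");
    return std::visit(
        [](auto&& value) -> std::string {
            using T = std::decay_t<decltype(value)>;
            if constexpr (std::is_same_v<T, FileBufferID>) {
                return "file:" + std::to_string(value);
            } else {
                return value == nullptr ? "memory:null" : "memory";
            }
        },
        ref);
}
```

A variant keeps the parameter count unchanged and makes the two cases mutually exclusive at the type level, at the cost of a `std::visit` (or `std::get_if`) at each use site.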

std::vector<BufferDesc> buffers;
// this is for nvmeof.
std::vector<NVMeoFBufferDesc> nvmeof_buffers;
// Generic file buffers.
Collaborator

Should this be guarded by #ifdef USE_NVMEOF_GENERIC?

Contributor Author

Logically, we can have other transports that support file buffers. Therefore, the file buffer components are intentionally kept separate from the NVMeoF components.

@stmatengss
Collaborator

Could you resolve the conflicts? thx. @nickyc975

@nickyc975
Contributor Author

Done.

@stmatengss
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a generic NVMeoF transport implementation, which is a significant and well-structured feature addition. The changes are extensive, touching documentation, CMake build files, C++ source code for both the store and transfer engine, and Python bindings. The overall implementation looks solid. I've identified a few areas for improvement, primarily related to maintainability and documentation clarity. Specifically, I've pointed out a typo in the documentation, suggested refactoring a couple of functions that use goto to improve readability, and recommended adding more detailed comments to explain a workaround for CacheLib's memory allocation.

```bash
./build/mooncake-transfer-engine/example/transfer_engine_nvmeof_generic_bench \
--local_server_name=127.0.0.1:8081 \
--metadata_server=http://127.0.0.0.0:8080/metadata \
```
Contributor

medium

There appears to be a typo in the IP address for the metadata server. It's written as 127.0.0.0.0, but it should likely be 127.0.0.1 to match the other examples and standard loopback usage.

Suggested change
- --metadata_server=http://127.0.0.0.0:8080/metadata \
+ --metadata_server=http://127.0.0.1:8080/metadata \

Comment on lines 82 to 83
// Add a padding to base to support zero-based buffers.
auto padded_base = base + facebook::cachelib::Slab::kSize;
Contributor

medium

The comment "Add a padding to base to support zero-based buffers" is a bit brief. To improve maintainability, could you add a more detailed explanation for why this padding is necessary for CacheLib? For example, explaining if CacheLib has issues with zero addresses or if this is to avoid some other specific behavior would be very helpful for future developers who might need to work with this code.

Comment on lines 42 to 79
static size_t getFileSize(const std::string& file) {
    size_t size = 0;
    struct stat st;
    int rc;

    int fd = open(file.c_str(), O_RDONLY);
    if (fd < 0) {
        LOG(ERROR) << "Failed to open file " << file << ", errno=" << errno;
        goto out;
    }

    rc = fstat(fd, &st);
    if (rc < 0) {
        LOG(ERROR) << "Failed fstat on file " << file << ", errno=" << errno;
        goto close_file;
    }

    if (S_ISLNK(st.st_mode)) {
        LOG(ERROR) << "File " << file << " is a symbolic link";
        goto close_file;
    }

    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
        rc = ioctl(fd, BLKGETSIZE64, &size);
        if (rc < 0) {
            LOG(ERROR) << "Failed ioctl on file " << file
                       << ", errno=" << errno;
            size = 0;
        }
    } else if (S_ISREG(st.st_mode)) {
        size = st.st_size;
    }

close_file:
    close(fd);
out:
    return size;
}
Contributor

medium

The getFileSize function uses goto statements for flow control. While functional, this can make the code harder to read and maintain. Consider refactoring to avoid goto, for example by using early returns. This will improve code clarity and align with modern C++ best practices.

static size_t getFileSize(const std::string& file) {
    size_t size = 0;
    int fd = open(file.c_str(), O_RDONLY);
    if (fd < 0) {
        LOG(ERROR) << "Failed to open file " << file << ", errno=" << errno;
        return 0;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        LOG(ERROR) << "Failed fstat on file " << file << ", errno=" << errno;
        close(fd);
        return 0;
    }

    if (S_ISLNK(st.st_mode)) {
        LOG(ERROR) << "File " << file << " is a symbolic link";
        close(fd);
        return 0;
    }

    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
        if (ioctl(fd, BLKGETSIZE64, &size) < 0) {
            LOG(ERROR) << "Failed ioctl on file " << file
                       << ", errno=" << errno;
            size = 0;
        }
    } else if (S_ISREG(st.st_mode)) {
        size = st.st_size;
    }

    close(fd);
    return size;
}
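
Another way to drop the `goto` labels without repeating `close(fd)` on every error path is a small RAII wrapper around the descriptor. The sketch below is only an illustration under assumptions: `ScopedFd` and `getFileSizeRaii` are hypothetical names, the logging calls are omitted to keep the block self-contained, it assumes a Linux build (for `BLKGETSIZE64`), and it reads the device size into a `uint64_t` before converting to `size_t`.

```cpp
#include <fcntl.h>
#include <linux/fs.h>  // BLKGETSIZE64 (Linux only)
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: closes the descriptor when it goes out of scope.
class ScopedFd {
   public:
    explicit ScopedFd(int fd) : fd_(fd) {}
    ~ScopedFd() {
        if (fd_ >= 0) close(fd_);
    }
    ScopedFd(const ScopedFd&) = delete;
    ScopedFd& operator=(const ScopedFd&) = delete;

    bool valid() const { return fd_ >= 0; }
    int get() const { return fd_; }

   private:
    int fd_;
};

// Same checks as the reviewed function, minus logging; returns 0 on failure.
size_t getFileSizeRaii(const std::string& file) {
    ScopedFd fd(open(file.c_str(), O_RDONLY));
    if (!fd.valid()) return 0;

    struct stat st;
    if (fstat(fd.get(), &st) < 0) return 0;
    if (S_ISLNK(st.st_mode)) return 0;

    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
        std::uint64_t bytes = 0;
        if (ioctl(fd.get(), BLKGETSIZE64, &bytes) < 0) return 0;
        return static_cast<size_t>(bytes);
    }
    if (S_ISREG(st.st_mode)) return static_cast<size_t>(st.st_size);
    return 0;
}
```

With the destructor handling cleanup, each failure path becomes a plain `return 0`, and a later change cannot forget to close the descriptor.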

Comment on lines 265 to 292
static size_t getFileSize(const std::string &file) {
    size_t size = 0;
    struct stat st;

    int fd = open(file.c_str(), O_RDONLY);
    if (fd < 0) {
        goto err_out;
    }

    if (fstat(fd, &st) != 0) {
        goto err_close_file;
    }

    if (S_ISLNK(st.st_mode)) {
        goto err_close_file;
    }

    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
        ioctl(fd, BLKGETSIZE64, &size);
    } else if (S_ISREG(st.st_mode)) {
        size = st.st_size;
    }

err_close_file:
    close(fd);
err_out:
    return size;
}
Contributor

medium

This getFileSize function uses goto for error handling. To improve code readability and maintainability, consider refactoring it to use early returns instead. This makes the control flow more straightforward and is generally considered a better practice in modern C++.

static size_t getFileSize(const std::string &file) {
    size_t size = 0;
    struct stat st;

    int fd = open(file.c_str(), O_RDONLY);
    if (fd < 0) {
        return 0;
    }

    if (fstat(fd, &st) != 0) {
        close(fd);
        return 0;
    }

    if (S_ISLNK(st.st_mode)) {
        close(fd);
        return 0;
    }

    if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
        ioctl(fd, BLKGETSIZE64, &size);
    } else if (S_ISREG(st.st_mode)) {
        size = st.st_size;
    }

    close(fd);
    return size;
}

@nickyc975 nickyc975 force-pushed the jinlong/nvmeof-upstream branch from 52d7c85 to 41cdb49 Compare October 15, 2025 06:27