Skip to content
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions devtools/etdump/buffer_data_sink.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree.
*/

#include <executorch/devtools/etdump/buffer_data_sink.h>
#include <executorch/devtools/etdump/utils.h>

using ::executorch::runtime::Error;
using ::executorch::runtime::Result;

namespace executorch {
namespace etdump {

Result<size_t> BufferDataSink::write(const void* ptr, size_t length) {
if (length == 0) {
return static_cast<size_t>(-1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an earlier comment:

@dbort:

size_t is unsigned, so -1 will look like a very large positive number: 18 exabytes on a 64-bit machine. If this is part of the contract, please document it in the base class. But I'd recommend just returning offset_ in this case since it's a zero-sized buffer and the offset doesn't matter.

@Gasoonjia:

let me create another diff for solving that; this function copy&paste from copy_tensor_to_debug_buffer function in ETDumpGen; updating the returning data here may introduce BP issue

What is "BP"? A behavior change in this new function will not affect anyone, because it is not used yet. If some future caller needs this (size_t)-1 behavior for some reason, then they can deal with that themselves: they know when they're writing zero bytes. I'd prefer not to land potentially-dangerous logic like this, because it might never be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry that's a typo. I wanted to type "BC", which refers to "Backward-Compatibility" issue.

This is not a new function. It has existed in the ETDumpGen class (https://github.com/pytorch/executorch/blob/main/devtools/etdump/etdump_flatcc.cpp#L511) for writing tensors into user-owned buffer, and our buffer_data_sink extracted that part of logic into our class for maintaining the current behavior.

Therefore, I would like to reproduce the existing logic first. I know it is not good enough. Will work on it in the following diffs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a new function that no-one uses yet. If ETDumpGen::copy_tensor_to_debug_buffer() calls this, then you can isolate the old behavior there, like

size_t ETDumpGen::copy_tensor_to_debug_buffer(executorch::aten::Tensor tensor) {
  // (comment explaining this special case)
  if (tensor.nbytes() == 0) {
    return static_cast<size_t>(-1);
  }
  return data_sink_->write(tensor.const_data_ptr(), tensor.nbytes());
}

This way, the new method can behave properly from the start, and other users/subclasses of DataSinkBase don't need to duplicate the old behavior.

uint8_t* offset_ptr =
internal::align_pointer(debug_buffer_.data() + offset_, 64);

// Zero out the padding between data blobs.
size_t n_zero_pad = offset_ptr - debug_buffer_.data() - offset_;
memset(debug_buffer_.data() + offset_, 0, n_zero_pad);

offset_ = (offset_ptr - debug_buffer_.data()) + length;

// Raise access error if offset_ is out of range.
if (offset_ > debug_buffer_.size()) {
return Error::OutOfResources;
}

memcpy(offset_ptr, ptr, length);
return (size_t)(offset_ptr - debug_buffer_.data());
}

Result<size_t> BufferDataSink::get_storage_size() const {
return debug_buffer_.size();
}

size_t BufferDataSink::get_used_bytes() const {
return offset_;
}

} // namespace etdump
} // namespace executorch
81 changes: 81 additions & 0 deletions devtools/etdump/buffer_data_sink.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree.
*/

#pragma once

#include <executorch/devtools/etdump/data_sink_base.h>
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/span.h>

namespace executorch {
namespace etdump {

/**
* BufferDataSink is a concrete implementation of the DataSinkBase class,
* designed to store debug data in a pre-allocated, user-owned buffer. This
* class provides methods to write raw data and tensor data into the buffer,
* ensuring proper alignment and managing padding as needed. It is the standard
* DataSink used by ETDumpGen.
*/
class BufferDataSink : public DataSinkBase {
public:
/**
* Constructs a BufferDataSink with a given buffer.
*
* @param[in] buffer A Span object representing the buffer where data will be
* stored.
*/
explicit BufferDataSink(::executorch::runtime::Span<uint8_t> buffer)
: debug_buffer_(buffer), offset_(0) {}

BufferDataSink(const BufferDataSink&) = delete;
BufferDataSink& operator=(const BufferDataSink&) = delete;
BufferDataSink(BufferDataSink&&) = default;
BufferDataSink& operator=(BufferDataSink&&) = default;

~BufferDataSink() override = default;

/**
* Write data into the debug buffer and return the offset of the starting
* location of the data within the buffer.
*
* @param[in] ptr A pointer to the data to be written into the storage.
* @param[in] size The size of the data in bytes.
* @return A Result object containing either:
* - The offset of the starting location of the data within the
* debug buffer, or
* - An error code indicating the failure reason, if any issue
* occurs during the write process.
*/
::executorch::runtime::Result<size_t> write(const void* ptr, size_t size)
override;

/**
* Retrieves the total size of the buffer.
*
* @return A Result object containing the total size of the buffer in bytes.
*/
::executorch::runtime::Result<size_t> get_storage_size() const override;

/**
* Retrieves the number of bytes currently used in the buffer.
*
* @return The amount of data currently stored in the buffer in bytes.
*/
size_t get_used_bytes() const override;

private:
// A Span object representing the buffer used for storing debug data.
::executorch::runtime::Span<uint8_t> debug_buffer_;

// The offset of the next available location in the buffer.
size_t offset_;
};

} // namespace etdump
} // namespace executorch
70 changes: 70 additions & 0 deletions devtools/etdump/data_sink_base.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree.
*/

#pragma once

#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/result.h>

namespace executorch {
namespace etdump {

/**
* DataSinkBase is an abstract class that users can inherit and implement
* to customize the storage and management of debug data in ETDumpGen. This
* class provides a basic and essential interface for writing datablob to a
* user-defined storage, retrieving storage capacity, and tracking the amount of
* data stored.
*/
class DataSinkBase {
public:
/**
* Virtual destructor to ensure proper cleanup of derived classes.
*/
virtual ~DataSinkBase() = default;
/**
* Write data into the debug storage. This method should be implemented
* by derived classes to handle the specifics of data storage.
*
* This function should return the offset of the starting location of the
* data within the debug storage if the write operation succeeds, or an
* Error code if any issue occurs during the write process.
*
* @param[in] ptr A pointer to the data to be written into the storage.
* @param[in] length The size of the data in bytes.
* @return A Result object containing either:
* - The offset of the starting location of the data within the
* debug storage, which will be recorded in the corresponding
* metadata of ETDump, or
* - An error code indicating the failure reason, if any issue
* occurs during the write process.
*/
virtual ::executorch::runtime::Result<size_t> write(
const void* ptr,
size_t length) = 0;
/**
* Get the maximum capacity of the debug storage in bytes.
* Should return Error::NotSupported if the capacity is not available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this might return NotSupported, then users of the base class can't depend on it. Ideally, base classes should only contain methods that all subclasses will be able to implement.

I recommend removing this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but some of users need to get the size of debug buffer from ETDumpGen (https://github.com/pytorch/executorch/blob/main/devtools/etdump/etdump_flatcc.cpp#L646). Under current datasink scenerio, we need to have a way for retreiving the size of every datasink.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see one use of get_debug_buffer_size today, at https://fburl.com/code/3od7ch5t; and that is a bad use of the API. We should talk to them and see if they can do it in a different way, removing the only usage of this method.

we need to have a way for retreiving the size of every datasink

This just isn't possible for all datasinks. If the old ETDump interface needs to support it for now, then we can find a way to isolate that knowledge in the ETDump code, instead of requiring all DataSinks to follow the old pattern.

* (e.g. unbounded storage like internet or file)
*
* @return A Result object containing either:
* - The total size of the debug storage, or
* - An error code indicating the failure reason, if any issue
* occurs when retrieving the storage capacity.
*/
virtual ::executorch::runtime::Result<size_t> get_storage_size() const = 0;
/**
* Get the number of bytes currently used in the debug storage.
*
* @return The amount of data currently stored in bytes.
*/
virtual size_t get_used_bytes() const = 0;
};

} // namespace etdump
} // namespace executorch
20 changes: 6 additions & 14 deletions devtools/etdump/etdump_flatcc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include <executorch/devtools/etdump/emitter.h>
#include <executorch/devtools/etdump/etdump_schema_flatcc_builder.h>
#include <executorch/devtools/etdump/etdump_schema_flatcc_reader.h>
#include <executorch/devtools/etdump/utils.h>
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/exec_aten/util/scalar_type_util.h>
#include <executorch/runtime/platform/assert.h>
Expand Down Expand Up @@ -94,16 +95,6 @@ etdump_Tensor_ref_t add_tensor_entry(
return etdump_Tensor_end(builder_);
}

static uint8_t* alignPointer(void* ptr, size_t alignment) {
intptr_t addr = reinterpret_cast<intptr_t>(ptr);
if ((addr & (alignment - 1)) == 0) {
// Already aligned.
return reinterpret_cast<uint8_t*>(ptr);
}
addr = (addr | (alignment - 1)) + 1;
return reinterpret_cast<uint8_t*>(addr);
}

} // namespace

// Constructor implementation
Expand All @@ -113,9 +104,10 @@ ETDumpGen::ETDumpGen(Span<uint8_t> buffer) {
// Initialize the flatcc builder_ using the buffer and buffer size.

if (buffer.data() != nullptr) {
builder_ = (struct flatcc_builder*)alignPointer(buffer.data(), 64);
uintptr_t buffer_with_builder =
(uintptr_t)alignPointer(builder_ + sizeof(struct flatcc_builder), 64);
builder_ =
(struct flatcc_builder*)internal::align_pointer(buffer.data(), 64);
uintptr_t buffer_with_builder = (uintptr_t)internal::align_pointer(
builder_ + sizeof(struct flatcc_builder), 64);
size_t builder_size =
(size_t)(buffer_with_builder - (uintptr_t)buffer.data());
size_t min_buf_size = max_alloc_buf_size + builder_size;
Expand Down Expand Up @@ -513,7 +505,7 @@ size_t ETDumpGen::copy_tensor_to_debug_buffer(executorch::aten::Tensor tensor) {
return static_cast<size_t>(-1);
}
uint8_t* offset_ptr =
alignPointer(debug_buffer_.data() + debug_buffer_offset_, 64);
internal::align_pointer(debug_buffer_.data() + debug_buffer_offset_, 64);
debug_buffer_offset_ = (offset_ptr - debug_buffer_.data()) + tensor.nbytes();
ET_CHECK_MSG(
debug_buffer_offset_ <= debug_buffer_.size(),
Expand Down
47 changes: 47 additions & 0 deletions devtools/etdump/targets.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -87,8 +87,54 @@ def define_common_targets():
exported_external_deps = ["flatccrt"],
)

runtime.cxx_library(
name = "utils",
srcs = [],
exported_headers = [
"utils.h",
],
visibility = [

],
)

for aten_mode in (True, False):
aten_suffix = "_aten" if aten_mode else ""

runtime.cxx_library(
name = "data_sink_base" + aten_suffix,
exported_headers = [
"data_sink_base.h",
],
exported_deps = [
"//executorch/runtime/core/exec_aten/util:scalar_type_util" + aten_suffix,
],
visibility = [
"//executorch/...",
"@EXECUTORCH_CLIENTS",
],
)

runtime.cxx_library(
name = "buffer_data_sink" + aten_suffix,
headers = [
"buffer_data_sink.h",
],
srcs = [
"buffer_data_sink.cpp",
],
deps = [
":utils",
],
exported_deps = [
"//executorch/runtime/core/exec_aten:lib" + aten_suffix,
":data_sink_base" + aten_suffix,
],
visibility = [
"//executorch/...",
"@EXECUTORCH_CLIENTS",
],
)
runtime.cxx_library(
name = "etdump_flatcc" + aten_suffix,
srcs = [
Expand All @@ -106,6 +152,7 @@ def define_common_targets():
],
exported_deps = [
":etdump_schema_flatcc",
":utils",
"//executorch/runtime/core:event_tracer" + aten_suffix,
"//executorch/runtime/core/exec_aten/util:scalar_type_util" + aten_suffix,
],
Expand Down
Loading
Loading