Skip to content

Conversation

@aharbii
Copy link

@aharbii aharbii commented Jan 18, 2025

This is an initial PR for review.

  • Adding required TensorFlow protobufs
  • Adding basic TensorBoard logging functionality to test the integration and setup.

Copy link
Member

@eugeneo eugeneo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredible start! Protobufs are really confusing in the start.

My comments are mostly to align our coding styles.

private:
TBoardFile(std::ofstream file) : file_(std::move(file)) {}

void AddScalar(const std::string& tag, double value, uint32_t step);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void AddScalar(const std::string& tag, double value, uint32_t step);
void AddScalar(std::string_view tag, double value, uint32_t step);

Can also be made public, replacing RecordLoss and RecordScalar

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of std::string_view cannot be done in the functions deals directly with TensorFlow API as it cannot be passed as arguments for the functions. and we will end up explicitly casting to std::string preventing benefiting of string_view usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_tag takes rvalue reference and copies anyway. Constructing std::string when calling it is fine. Having std::string_view in API has a lot of benefits as it allows callers use whatever string they like, not just std::string.

for (int i = 0; i < 10; ++i) {
tboard_file->RecordLoss(0.5 * i, i);
}
SUCCEED() << "Loss recorded";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know it was recorded? :) Call succeeded, but it does not guarantee the data is in the file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testing methodology is still not yet defined, as we have several points, but I think eventually we will need a way to parse the output file for testing.

As currently the only way of validating the logic is to manually check the output tfevents file with TensorBoard viewer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We need to have a way to verify stuff like that early on.


// If the path is a directory, make the directory if it doesn't exist.
if (path.find('/') != std::string::npos) {
std::string_view directory = path.substr(0, path.find_last_of('/'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::filesystem::path has a lot of functionality for path manipulation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@eugeneo eugeneo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Steady progress.

Comment on lines +10 to +12
#ifdef _WIN32
#include <winsock2.h>
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#ifdef _WIN32
#include <winsock2.h>
#endif

I don't think this is being used. In general, there should be no network access in this module.

Comment on lines +37 to +40
std::string timestamp =
absl::FormatTime("%Y%m%d-%H%M%S", absl::Now(), absl::UTCTimeZone());
std::filesystem::path events_directory =
std::filesystem::path(logs_directory) / std::filesystem::path(timestamp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory should come from the outside (e.g. as a function parameter). This is a library so it should be as idempotent as possible - e.g. calling the API with same arguments should yield the same result. E.g. rerunning same training (say, because we crashed) should be able to override/add to docs.

Comment on lines +48 to +52
std::string hostname = "localhost";
char hostname_buffer[HOST_NAME_MAX + 1];
if (gethostname(hostname_buffer, sizeof(hostname_buffer)) == 0) {
hostname = hostname_buffer;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, client should provide this.

if (!encoded_image.ok()) {
LOG(ERROR) << "Failed to Add image \"" << image_name
<< "\": " << encoded_image.status();
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an error, it needs to be propagated. AddImage should return absl::Status

uint32_t crc_buffer = crc32c::Crc32c((uint8_t*)buffer.data(), buffer_length);
crc_buffer = TBoardFile::Mask(crc_buffer);

file_.write((char*)&buffer_length, sizeof(size_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file_.write((char*)&buffer_length, sizeof(size_t));
file_.write(static_cast<char*>(&buffer_length), sizeof(size_t));

Please use appropriate C++ cast. This helps prevent UB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why the char? Char can be signed on some platforms or even be over 1 byte long.

I do not the file format, so I apologize if the comment is wrong.

file_.write((char*)&crc_buffer, sizeof(uint32_t));
}

absl::StatusOr<std::string> TBoardFile::EncodeImage(std::string_view image) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not need to be a member function. Move it into the anonymous namespace at the top of the file so it does not pollute the symbols table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not read image files here. Require users to provide images as std::span<std::byte>. Images may not exist in the local fs or may be stored in some special format.

// Motivation: it is problematic to compute the CRC of a string that
// contains embedded CRCs. Therefore we recommend that CRCs stored
// somewhere (e.g., in files) should be masked before being stored.
uint32_t TBoardFile::Mask(uint32_t crc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move into anonymous namespace, does not need to be a method.

private:
TBoardFile(std::ofstream file) : file_(std::move(file)) {}

void AddScalar(const std::string& tag, double value, uint32_t step);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_tag takes rvalue reference and copies anyway. Constructing std::string when calling it is fine. Having std::string_view in API has a lot of benefits as it allows callers use whatever string they like, not just std::string.

for (int i = 0; i < 10; ++i) {
tboard_file->RecordLoss(0.5 * i, i);
}
SUCCEED() << "Loss recorded";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We need to have a way to verify stuff like that early on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants