Skip to content

Conversation

@asudarsa
Copy link
Contributor

@asudarsa asudarsa commented Oct 2, 2024

SYCL objects are normally associated with properties that provide additional information about those objects. This information should be preserved during compilation and made available to the SYCL runtime via property sets. This PR provides an implementation of a simple library which allows to construct and serialize/de-serialize a sequence of typed property sets, where each property is a <name,typed value> pair.

PropertyValue set format:
'['']'
='|'
='|'
...
'['']'
='|'
where
, are strings
- string representation of the property type
- string representation of the property value.

For example:
[Staff/Ages]
person1=1|20
person2=1|25
[Staff/Experience]
person1=1|1
person2=1|2
person3=1|12

This library will be used in the SYCL offloading compilation flow to pass information about specialization constants, etc, to the SYCL runtime.

SYCL objects are normally associated with properties that provide additional information about those objects. This information should be preserved during compilation and made available to the SYCL runtime via property sets.
This PR provides an implementation of a simple library which allows to construct and serialize/de-serialize a sequence of typed property sets, where each property is a <name,typed value> pair.

PropertyValue set format:
   '['<PropertyValue set name>']'
   <property name>=<property type>'|'<property value>
   <property name>=<property type>'|'<property value>
   ...
   '['<PropertyValue set name>']'
   <property name>=<property type>'|'<property value>
where
   <PropertyValue set name>, <property name> are strings
   <property type> - string representation of the property type
   <property value> - string representation of the property value.

For example:
[Staff/Ages]
person1=1|20
person2=1|25
[Staff/Experience]
person1=1|1
person2=1|2
person3=1|12

This library will be used in the SYCL offloading compilation flow to pass information about specialization constants, etc, to the SYCL runtime.

Signed-off-by: Arvind Sudarsanam <[email protected]>
@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2024

@llvm/pr-subscribers-llvm-support

Author: Arvind Sudarsanam (asudarsa)

Changes

SYCL objects are normally associated with properties that provide additional information about those objects. This information should be preserved during compilation and made available to the SYCL runtime via property sets. This PR provides an implementation of a simple library which allows to construct and serialize/de-serialize a sequence of typed property sets, where each property is a <name,typed value> pair.

PropertyValue set format:
'['<PropertyValue set name>']'
<property name>=<property type>'|'<property value>
<property name>=<property type>'|'<property value>
...
'['<PropertyValue set name>']'
<property name>=<property type>'|'<property value>
where
<PropertyValue set name>, <property name> are strings
<property type> - string representation of the property type
<property value> - string representation of the property value.

For example:
[Staff/Ages]
person1=1|20
person2=1|25
[Staff/Experience]
person1=1|1
person2=1|2
person3=1|12

This library will be used in the SYCL offloading compilation flow to pass information about specialization constants, etc, to the SYCL runtime.


Patch is 20.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110771.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Support/Base64.h (+26)
  • (added) llvm/include/llvm/Support/PropertySetIO.h (+256)
  • (modified) llvm/lib/Support/CMakeLists.txt (+1)
  • (added) llvm/lib/Support/PropertySetIO.cpp (+212)
  • (modified) llvm/unittests/Support/CMakeLists.txt (+1)
  • (added) llvm/unittests/Support/PropertySetIOTest.cpp (+72)
diff --git a/llvm/include/llvm/Support/Base64.h b/llvm/include/llvm/Support/Base64.h
index 3d96884749b32f..91317ffeca4170 100644
--- a/llvm/include/llvm/Support/Base64.h
+++ b/llvm/include/llvm/Support/Base64.h
@@ -56,6 +56,32 @@ template <class InputBytes> std::string encodeBase64(InputBytes const &Bytes) {
 
 llvm::Error decodeBase64(llvm::StringRef Input, std::vector<char> &Output);
 
+// General-purpose Base64 encoder/decoder class wrapper.
+class Base64 {
+public:
+  using byte = uint8_t;
+
+  // Get the size of the encoded byte sequence of given size.
+  static size_t getEncodedSize(size_t SrcSize);
+
+  // Encode a byte sequence of given size into an output stream.
+  // Returns the number of bytes in the encoded result.
+  static size_t encode(const byte *Src, raw_ostream &Out, size_t SrcSize);
+
+  // Get the size of the encoded byte sequence of given size.
+  static size_t getDecodedSize(size_t SrcSize);
+
+  // Decode a sequence of given size into a pre-allocated memory.
+  // Returns the number of bytes in the decoded result or 0 in case of error.
+  static Expected<size_t> decode(const char *Src, byte *Dst, size_t SrcSize);
+
+  // Allocate minimum required amount of memory and decode a sequence of given
+  // size into it.
+  // Returns the decoded result. The size can be obtained via getDecodedSize.
+  static Expected<std::unique_ptr<byte[]>> decode(const char *Src,
+                                                  size_t SrcSize);
+};
+
 } // end namespace llvm
 
 #endif
diff --git a/llvm/include/llvm/Support/PropertySetIO.h b/llvm/include/llvm/Support/PropertySetIO.h
new file mode 100644
index 00000000000000..9f02cfd31f54ec
--- /dev/null
+++ b/llvm/include/llvm/Support/PropertySetIO.h
@@ -0,0 +1,256 @@
+//==-- PropertySetIO.h -- models a sequence of property sets and their I/O -==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Models a sequence of property sets and their input and output operations.
+// TODO use Yaml as I/O engine.
+// PropertyValue set format:
+//   '['<PropertyValue set name>']'
+//   <property name>=<property type>'|'<property value>
+//   <property name>=<property type>'|'<property value>
+//   ...
+//   '['<PropertyValue set name>']'
+//   <property name>=<property type>'|'<property value>
+// where
+//   <PropertyValue set name>, <property name> are strings
+//   <property type> - string representation of the property type
+//   <property value> - string representation of the property value.
+//
+// For example:
+// [Staff/Ages]
+// person1=1|20
+// person2=1|25
+// [Staff/Experience]
+// person1=1|1
+// person2=1|2
+// person3=1|12
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_PROPERTYSETIO_H
+#define LLVM_SUPPORT_PROPERTYSETIO_H
+
+#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/xxhash.h"
+
+namespace llvm {
+namespace util {
+
+// Represents a property value. PropertyValue name is stored in the encompassing
+// container.
+class PropertyValue {
+public:
+  // Type of the size of the value. Value size gets serialized along with the
+  // value data in some cases for later reading at runtime, so size_t is not
+  // suitable as its size varies.
+  using SizeTy = uint64_t;
+  using byte = uint8_t;
+
+  // Defines supported property types
+  enum Type { first = 0, NONE = first, UINT32, BYTE_ARRAY, last = BYTE_ARRAY };
+
+  // Translates C++ type to the corresponding type tag.
+  template <typename T> static Type getTypeTag();
+
+  // Casts from int value to a type tag.
+  static Expected<Type> getTypeTag(int T) {
+    if (T < first || T > last)
+      return createStringError(std::error_code(), "bad property type ", T);
+    return static_cast<Type>(T);
+  }
+
+  ~PropertyValue() {
+    if ((getType() == BYTE_ARRAY) && Val.ByteArrayVal)
+      delete[] Val.ByteArrayVal;
+  }
+
+  PropertyValue() = default;
+  PropertyValue(Type T) : Ty(T) {}
+
+  PropertyValue(uint32_t Val) : Ty(UINT32), Val({Val}) {}
+  PropertyValue(const byte *Data, SizeTy DataBitSize);
+  template <typename C, typename T = typename C::value_type>
+  PropertyValue(const C &Data)
+      : PropertyValue(reinterpret_cast<const byte *>(Data.data()),
+                      Data.size() * sizeof(T) * /* bits in one byte */ 8) {}
+  PropertyValue(const llvm::StringRef &Str)
+      : PropertyValue(reinterpret_cast<const byte *>(Str.data()),
+                      Str.size() * sizeof(char) * /* bits in one byte */ 8) {}
+  PropertyValue(const PropertyValue &P);
+  PropertyValue(PropertyValue &&P);
+
+  PropertyValue &operator=(PropertyValue &&P);
+
+  PropertyValue &operator=(const PropertyValue &P);
+
+  // get property value as unsigned 32-bit integer
+  uint32_t asUint32() const {
+    if (Ty != UINT32)
+      llvm_unreachable("must be UINT32 value");
+    return Val.UInt32Val;
+  }
+
+  // Get raw data size in bits.
+  SizeTy getByteArraySizeInBits() const {
+    if (Ty != BYTE_ARRAY)
+      llvm_unreachable("must be BYTE_ARRAY value");
+    SizeTy Res = 0;
+
+    for (size_t I = 0; I < sizeof(SizeTy); ++I)
+      Res |= (SizeTy)Val.ByteArrayVal[I] << (8 * I);
+    return Res;
+  }
+
+  // Get byte array data size in bytes.
+  SizeTy getByteArraySize() const {
+    SizeTy SizeInBits = getByteArraySizeInBits();
+    constexpr unsigned int MASK = 0x7;
+    return ((SizeInBits + MASK) & ~MASK) / 8;
+  }
+
+  // Get byte array data size in bytes, including the leading bytes encoding the
+  // size.
+  SizeTy getRawByteArraySize() const {
+    return getByteArraySize() + sizeof(SizeTy);
+  }
+
+  // Get byte array data including the leading bytes encoding the size.
+  const byte *asRawByteArray() const {
+    if (Ty != BYTE_ARRAY)
+      llvm_unreachable("must be BYTE_ARRAY value");
+    return Val.ByteArrayVal;
+  }
+
+  // Get byte array data excluding the leading bytes encoding the size.
+  const byte *asByteArray() const {
+    if (Ty != BYTE_ARRAY)
+      llvm_unreachable("must be BYTE_ARRAY value");
+    return Val.ByteArrayVal + sizeof(SizeTy);
+  }
+
+  bool isValid() const { return getType() != NONE; }
+
+  // set property value; the 'T' type must be convertible to a property type tag
+  template <typename T> void set(T V) {
+    if (getTypeTag<T>() != Ty)
+      llvm_unreachable("invalid type tag for this operation");
+    getValueRef<T>() = V;
+  }
+
+  Type getType() const { return Ty; }
+
+  SizeTy size() const {
+    switch (Ty) {
+    case UINT32:
+      return sizeof(Val.UInt32Val);
+    case BYTE_ARRAY:
+      return getRawByteArraySize();
+    default:
+      llvm_unreachable_internal("unsupported property type");
+    }
+  }
+
+private:
+  template <typename T> T &getValueRef();
+  void copy(const PropertyValue &P);
+
+  Type Ty = NONE;
+  // TODO: replace this union with std::variant when uplifting to C++17
+  union {
+    uint32_t UInt32Val;
+    // Holds first sizeof(size_t) bytes of size followed by actual raw data.
+    byte *ByteArrayVal;
+  } Val;
+};
+
+/// Structure for specialization of DenseMap in PropertySetRegistry.
+struct PropertySetKeyInfo {
+  static unsigned getHashValue(const SmallString<16> &K) { return xxHash64(K); }
+
+  static SmallString<16> getEmptyKey() { return SmallString<16>(""); }
+
+  static SmallString<16> getTombstoneKey() { return SmallString<16>("_"); }
+
+  static bool isEqual(StringRef L, StringRef R) { return L == R; }
+};
+
+using PropertyMapTy = DenseMap<SmallString<16>, unsigned, PropertySetKeyInfo>;
+/// A property set. Preserves insertion order when iterating elements.
+using PropertySet = MapVector<SmallString<16>, PropertyValue, PropertyMapTy>;
+
+/// A registry of property sets. Maps a property set name to its
+/// content.
+///
+/// The order of keys is preserved and corresponds to the order of insertion.
+class PropertySetRegistry {
+public:
+  using MapTy = MapVector<SmallString<16>, PropertySet, PropertyMapTy>;
+
+  // Specific property category names used by tools.
+  static constexpr char SYCL_SPECIALIZATION_CONSTANTS[] =
+      "SYCL/specialization constants";
+  static constexpr char SYCL_SPEC_CONSTANTS_DEFAULT_VALUES[] =
+      "SYCL/specialization constants default values";
+  static constexpr char SYCL_DEVICELIB_REQ_MASK[] = "SYCL/devicelib req mask";
+  static constexpr char SYCL_KERNEL_PARAM_OPT_INFO[] = "SYCL/kernel param opt";
+  static constexpr char SYCL_PROGRAM_METADATA[] = "SYCL/program metadata";
+  static constexpr char SYCL_MISC_PROP[] = "SYCL/misc properties";
+  static constexpr char SYCL_ASSERT_USED[] = "SYCL/assert used";
+  static constexpr char SYCL_EXPORTED_SYMBOLS[] = "SYCL/exported symbols";
+  static constexpr char SYCL_IMPORTED_SYMBOLS[] = "SYCL/imported symbols";
+  static constexpr char SYCL_DEVICE_GLOBALS[] = "SYCL/device globals";
+  static constexpr char SYCL_DEVICE_REQUIREMENTS[] = "SYCL/device requirements";
+  static constexpr char SYCL_HOST_PIPES[] = "SYCL/host pipes";
+  static constexpr char SYCL_VIRTUAL_FUNCTIONS[] = "SYCL/virtual functions";
+
+  /// Function for bulk addition of an entire property set in the given
+  /// \p Category .
+  template <typename MapTy> void add(StringRef Category, const MapTy &Props) {
+    assert(PropSetMap.find(Category) == PropSetMap.end() &&
+           "category already added");
+    auto &PropSet = PropSetMap[Category];
+
+    for (const auto &Prop : Props)
+      PropSet.insert_or_assign(Prop.first, PropertyValue(Prop.second));
+  }
+
+  /// Adds the given \p PropVal with the given \p PropName into the given \p
+  /// Category .
+  template <typename T>
+  void add(StringRef Category, StringRef PropName, const T &PropVal) {
+    auto &PropSet = PropSetMap[Category];
+    PropSet.insert({PropName, PropertyValue(PropVal)});
+  }
+
+  /// Parses from the given \p Buf a property set registry.
+  static Expected<std::unique_ptr<PropertySetRegistry>>
+  read(const MemoryBuffer *Buf);
+
+  /// Dumps the property set registry to the given \p Out stream.
+  void write(raw_ostream &Out) const;
+
+  MapTy::const_iterator begin() const { return PropSetMap.begin(); }
+  MapTy::const_iterator end() const { return PropSetMap.end(); }
+
+  /// Retrieves a property set with given \p Name .
+  PropertySet &operator[](StringRef Name) { return PropSetMap[Name]; }
+  /// Constant access to the underlying map.
+  const MapTy &getPropSets() const { return PropSetMap; }
+
+private:
+  MapTy PropSetMap;
+};
+
+} // namespace util
+
+} // namespace llvm
+
+#endif // #define LLVM_SUPPORT_PROPERTYSETIO_H
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 97188b0672f032..0a4d86bfca5121 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -217,6 +217,7 @@ add_llvm_component_library(LLVMSupport
   Parallel.cpp
   PluginLoader.cpp
   PrettyStackTrace.cpp
+  PropertySetIO.cpp  
   RandomNumberGenerator.cpp
   Regex.cpp
   RewriteBuffer.cpp
diff --git a/llvm/lib/Support/PropertySetIO.cpp b/llvm/lib/Support/PropertySetIO.cpp
new file mode 100644
index 00000000000000..2fe7cac00fb144
--- /dev/null
+++ b/llvm/lib/Support/PropertySetIO.cpp
@@ -0,0 +1,212 @@
+//==- PropertySetIO.cpp - models a sequence of property sets and their I/O -==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/PropertySetIO.h"
+
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/Base64.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LineIterator.h"
+
+#include <memory>
+#include <string>
+
+using namespace llvm::util;
+using namespace llvm;
+
+namespace {
+
+using byte = Base64::byte;
+
+::llvm::Error makeError(const Twine &Msg) {
+  return createStringError(std::error_code{}, Msg);
+}
+
+} // anonymous namespace
+
+Expected<std::unique_ptr<PropertySetRegistry>>
+PropertySetRegistry::read(const MemoryBuffer *Buf) {
+  auto Res = std::make_unique<PropertySetRegistry>();
+  PropertySet *CurPropSet = nullptr;
+
+  for (line_iterator LI(*Buf); !LI.is_at_end(); LI++) {
+    // see if this line starts a new property set
+    if (LI->starts_with("[")) {
+      // yes - parse the category (property name)
+      auto EndPos = LI->rfind(']');
+      if (EndPos == StringRef::npos)
+        return makeError("invalid line: " + *LI);
+      StringRef Category = LI->substr(1, EndPos - 1);
+      CurPropSet = &(*Res)[Category];
+      continue;
+    }
+    if (!CurPropSet)
+      return makeError("property category missing");
+    // parse name and type+value
+    auto Parts = LI->split('=');
+
+    if (Parts.first.empty() || Parts.second.empty())
+      return makeError("invalid property line: " + *LI);
+    auto TypeVal = Parts.second.split('|');
+
+    if (TypeVal.first.empty() || TypeVal.second.empty())
+      return makeError("invalid property value: " + Parts.second);
+    APInt Tint;
+
+    // parse type
+    if (TypeVal.first.getAsInteger(10, Tint))
+      return makeError("invalid property type: " + TypeVal.first);
+    Expected<PropertyValue::Type> Ttag =
+        PropertyValue::getTypeTag(static_cast<int>(Tint.getSExtValue()));
+    StringRef Val = TypeVal.second;
+
+    if (!Ttag)
+      return Ttag.takeError();
+    PropertyValue Prop(Ttag.get());
+
+    // parse value depending on its type
+    switch (Ttag.get()) {
+    case PropertyValue::Type::UINT32: {
+      APInt ValV;
+      if (Val.getAsInteger(10, ValV))
+        return createStringError(std::error_code{},
+                                 "invalid property value: ", Val.data());
+      Prop.set(static_cast<uint32_t>(ValV.getZExtValue()));
+      break;
+    }
+    case PropertyValue::Type::BYTE_ARRAY: {
+      Expected<std::unique_ptr<byte[]>> DecArr =
+          Base64::decode(Val.data(), Val.size());
+      if (!DecArr)
+        return DecArr.takeError();
+      Prop.set(DecArr.get().release());
+      break;
+    }
+    default:
+      return createStringError(std::error_code{},
+                               "unsupported property type: ", Ttag.get());
+    }
+    (*CurPropSet)[Parts.first] = std::move(Prop);
+  }
+  if (!CurPropSet)
+    return makeError("invalid property set registry");
+
+  return Expected<std::unique_ptr<PropertySetRegistry>>(std::move(Res));
+}
+
+namespace llvm {
+// output a property to a stream
+raw_ostream &operator<<(raw_ostream &Out, const PropertyValue &Prop) {
+  Out << static_cast<int>(Prop.getType()) << "|";
+  switch (Prop.getType()) {
+  case PropertyValue::Type::UINT32:
+    Out << Prop.asUint32();
+    break;
+  case PropertyValue::Type::BYTE_ARRAY: {
+    util::PropertyValue::SizeTy Size = Prop.getRawByteArraySize();
+    Base64::encode(Prop.asRawByteArray(), Out, (size_t)Size);
+    break;
+  }
+  default:
+    llvm_unreachable(
+        ("unsupported property type: " + utostr(Prop.getType())).c_str());
+  }
+  return Out;
+}
+} // namespace llvm
+
+void PropertySetRegistry::write(raw_ostream &Out) const {
+  for (const auto &PropSet : PropSetMap) {
+    Out << "[" << PropSet.first << "]\n";
+
+    for (const auto &Props : PropSet.second) {
+      Out << Props.first << "=" << Props.second << "\n";
+    }
+  }
+}
+
+namespace llvm {
+namespace util {
+
+template <> uint32_t &PropertyValue::getValueRef<uint32_t>() {
+  return Val.UInt32Val;
+}
+
+template <> byte *&PropertyValue::getValueRef<byte *>() {
+  return Val.ByteArrayVal;
+}
+
+template <> PropertyValue::Type PropertyValue::getTypeTag<uint32_t>() {
+  return UINT32;
+}
+
+template <> PropertyValue::Type PropertyValue::getTypeTag<byte *>() {
+  return BYTE_ARRAY;
+}
+
+PropertyValue::PropertyValue(const byte *Data, SizeTy DataBitSize) {
+  constexpr int ByteSizeInBits = 8;
+  Ty = BYTE_ARRAY;
+  SizeTy DataSize = (DataBitSize + (ByteSizeInBits - 1)) / ByteSizeInBits;
+  constexpr size_t SizeFieldSize = sizeof(SizeTy);
+
+  // Allocate space for size and data.
+  Val.ByteArrayVal = new byte[SizeFieldSize + DataSize];
+
+  // Write the size into first bytes.
+  for (size_t I = 0; I < SizeFieldSize; ++I) {
+    Val.ByteArrayVal[I] = (byte)DataBitSize;
+    DataBitSize >>= ByteSizeInBits;
+  }
+  // Append data.
+  std::memcpy(Val.ByteArrayVal + SizeFieldSize, Data, DataSize);
+}
+
+PropertyValue::PropertyValue(const PropertyValue &P) { *this = P; }
+
+PropertyValue::PropertyValue(PropertyValue &&P) { *this = std::move(P); }
+
+PropertyValue &PropertyValue::operator=(PropertyValue &&P) {
+  copy(P);
+
+  if (P.getType() == BYTE_ARRAY)
+    P.Val.ByteArrayVal = nullptr;
+  P.Ty = NONE;
+  return *this;
+}
+
+PropertyValue &PropertyValue::operator=(const PropertyValue &P) {
+  if (P.getType() == BYTE_ARRAY)
+    *this = PropertyValue(P.asByteArray(), P.getByteArraySizeInBits());
+  else
+    copy(P);
+  return *this;
+}
+
+void PropertyValue::copy(const PropertyValue &P) {
+  Ty = P.Ty;
+  Val = P.Val;
+}
+
+constexpr char PropertySetRegistry::SYCL_SPECIALIZATION_CONSTANTS[];
+constexpr char PropertySetRegistry::SYCL_DEVICELIB_REQ_MASK[];
+constexpr char PropertySetRegistry::SYCL_SPEC_CONSTANTS_DEFAULT_VALUES[];
+constexpr char PropertySetRegistry::SYCL_KERNEL_PARAM_OPT_INFO[];
+constexpr char PropertySetRegistry::SYCL_PROGRAM_METADATA[];
+constexpr char PropertySetRegistry::SYCL_MISC_PROP[];
+constexpr char PropertySetRegistry::SYCL_ASSERT_USED[];
+constexpr char PropertySetRegistry::SYCL_EXPORTED_SYMBOLS[];
+constexpr char PropertySetRegistry::SYCL_IMPORTED_SYMBOLS[];
+constexpr char PropertySetRegistry::SYCL_DEVICE_GLOBALS[];
+constexpr char PropertySetRegistry::SYCL_DEVICE_REQUIREMENTS[];
+constexpr char PropertySetRegistry::SYCL_HOST_PIPES[];
+constexpr char PropertySetRegistry::SYCL_VIRTUAL_FUNCTIONS[];
+
+} // namespace util
+} // namespace llvm
diff --git a/llvm/unittests/Support/CMakeLists.txt b/llvm/unittests/Support/CMakeLists.txt
index d64f89847aa8e7..0261ecdd7ea0bd 100644
--- a/llvm/unittests/Support/CMakeLists.txt
+++ b/llvm/unittests/Support/CMakeLists.txt
@@ -69,6 +69,7 @@ add_llvm_unittest(SupportTests
   PerThreadBumpPtrAllocatorTest.cpp
   ProcessTest.cpp
   ProgramTest.cpp
+  PropertySetIOTest.cpp
   RegexTest.cpp
   ReverseIterationTest.cpp
   ReplaceFileTest.cpp
diff --git a/llvm/unittests/Support/PropertySetIOTest.cpp b/llvm/unittests/Support/PropertySetIOTest.cpp
new file mode 100644
index 00000000000000..74ade9f9dcc219
--- /dev/null
+++ b/llvm/unittests/Support/PropertySetIOTest.cpp
@@ -0,0 +1,72 @@
+//===- llvm/unittest/Support/PropertySetIO.cpp - Property set I/O tests ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/PropertySetIO.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::util;
+
+namespace {
+
+TEST(PropertySet, IntValuesIO) {
+  // '1' in '1|20' means 'integer property'
+  auto Content = "[Staff/Ages]\n"
+                 "person1=1|20\n"
+                 "person2=1|25\n"
+                 "[Staff/Experience]\n"
+                 "person1=1|1\n"
+                 "person2=1|2\n"
+                 "person3=1|12\n";
+  auto MemBuf = MemoryBuffer::getMemBuffer(Content);
+  // Parse a property set registry
+  auto PropSetsPtr = PropertySetRegistry::read(MemBuf.get());
+
+  if (!PropSetsPtr)
+    FAIL() << "PropertySetRegistry::read failed\n";
+
+  std::string Serialized;
+  {
+    llvm::raw_string_ostream OS(Serialized);
+    // Serialize
+    PropSetsPtr->get()->write(OS);
+  }
+  // Check that the original and the serialized version are equal
+  ASSERT_EQ(Serialized, Content);
+}
+
+TEST(PropertySet, ByteArrayValuesIO) {
+  // '2' in '2|...' means 'byte array prope...
[truncated]

@asudarsa
Copy link
Contributor Author

asudarsa commented Oct 2, 2024

@jhuber6

Can you please take a look at this PR when convenient? This support is required to add SYCL offloading compilation support.
It is specifically used in the sycl-post-link step.

Thanks

Signed-off-by: Arvind Sudarsanam <[email protected]>
@asudarsa asudarsa requested a review from jhuber6 October 2, 2024 01:50
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me some background on the problem we're trying to solve here? This seems to be the serialization of some kind of binary metadata. I think the go-to solution for that is stuff like yaml2obj or obj2yaml. For example, that's what I did for the OffloadBinary class and it's what AMDGPU uses for its kernel metadata.

@asudarsa
Copy link
Contributor Author

asudarsa commented Oct 2, 2024

Can you give me some background on the problem we're trying to solve here? This seems to be the serialization of some kind of binary metadata. I think the go-to solution for that is stuff like yaml2obj or obj2yaml. For example, that's what I did for the OffloadBinary class and it's what AMDGPU uses for its kernel metadata.

Hi @jhuber6

You are right. We are trying to represent a unique configuration of kernel metadata using a specific data structure called PropertySet. We provide encode/decode functionality using Base64 representation so that the metadata can be serialized and written to a string that can be included in the StringMap of the OffloadingImage. The SYCL offload wrapper decodes this string and the decoded property set is included into SYCL binary image. SYCL runtime has been programmed to identify the property sets and extract required information during runtime.
This support was added to our downstream repository way back in 2021 and has provided us with a well-oiled infrastructure to transmit kernel properties across multiple stages of compilation and runtime.
I briefly looked at yaml support to represent kernel metadata and it does seem like a possible fit for our purposes.

Our current objective is to get an end-to-end compilation+runtime flow for SYCL available in upstream LLVM. Once available, we can surely look at both approaches and try to come up with a unified solution.

Please let me know if this is agreeable.
Thanks again for your feedback thus far.

Sincerely

@asudarsa asudarsa requested a review from jhuber6 October 2, 2024 17:43
@@ -0,0 +1,259 @@
//==-- PropertySetIO.h -- models a sequence of property sets and their I/O -==//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is support where this should go? Feels more like an object thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'Support' is a better option. It is just yet another data format.

Thanks

~PropertyValue() {
if (std::holds_alternative<byte *>(Val)) {
byte *ByteArrayVal = std::get<byte *>(Val);
delete ByteArrayVal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use unique_ptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I am using std::variant. I looked up some documentation and this was one of the suggested ways to 'free' memory. I am not sure how to use unique_ptr here.
Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have used unique_ptr inside the 'read' routine. That's the only place where we allocate memory for 'Val'. So, by transferring the ownership back to caller from 'read' routine, the caller now has full ownership of the memory allocated to 'Val'.
Hope that's agreeable.
Thanks

PropSetsPtr->get()->write(OS);
}
// Check that the original and the serialized version are equal
ASSERT_EQ(Serialized, Content);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ASSERT_EQ(Serialized, Content);
EXPECT_EQ(Serialized, Content);

copy(P);

if (P.getType() == BYTE_ARRAY) {
auto &ByteArrayVal = std::get<byte *>(P.Val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto &ByteArrayVal = std::get<byte *>(P.Val);
auto *ByteArrayVal = std::get<byte *>(P.Val);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the reference here as we want to set it to nullptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified code to access P.Val directly.

Thanks

@asudarsa
Copy link
Contributor Author

asudarsa commented Oct 3, 2024

Hi @arsenm

Thanks very much for your time and feedback. I have addressed all your feedback in the latest commit, except for two.

  1. Renaming with SYCL prefix
  2. Adding tests to check for error messages

I will work on these and post a new commit soon.

Thanks again
Sincerely

@github-actions
Copy link

github-actions bot commented Oct 3, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: Arvind Sudarsanam <[email protected]>
Copy link
Contributor

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see motivation for adding an additional base64 implementation to LLVM. If improvements to the existing base64 implementation are desired or warranted, please evolve the existing implementation rather than adding an additional one.

@asudarsa
Copy link
Contributor Author

asudarsa commented Oct 3, 2024

Hi @tahonermann

Thanks for the feedback. There is some history behind the two different implementations. One was submitted directly upstream and the other one was developed downstream.
I do agree with your suggestion. Let me try to use the implementation available upstream and see if it will work for us.

Thanks

@asudarsa asudarsa requested review from arsenm and jhuber6 October 9, 2024 20:54
if (std::holds_alternative<std::byte *>(Val)) {
auto ByteArrayVal = std::get<std::byte *>(Val);
if (ByteArrayVal)
delete[] ByteArrayVal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just use unique_ptr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arsenm

I will give it another shot. I had a bit of an issue in getting unique_ptr to work with remainder of the code. I must be missing something. I will put up a change soon. Thanks again for looking into this.

Sincerely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arsenm

I added use of unique_ptr in 0d5af17
It was a good learning curve for me.

Thanks
Sincerely

@llvm-beanz
Copy link
Collaborator

In the meanwhile, I have added some description inside the summary of this PR. Hope that answers some of your queries here.

The description in the summary doesn't really explain how this is used other than "to pass information about specialization constants, etc, to the SYCL runtime."

That's not really descriptive, and doesn't explain why this new textual format was designed. Some specific questions:

  1. Is this text format part of the SYCL specification, or is it an implementation detail?
  2. If it is an implementation detail, can it be changed to a standard defined textual format (YAML, JSON, XML)? I ask this specifically because custom serialization formats are often great targets for security vulnerabilities, so using a standardized and defined format is preferable to making your own. Obviously if this is a required part of the SYCL specification that has different considerations.
  3. How is this used during the compiler? Specifically, where are they generated, and where are they consumed?

else if (Ty == ByteArray)
Val = (std::byte *)(0);
else
llvm_unreachable_internal("unsupported SYCL property type");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just assert(Ty != None) at the top of the function? Also if this needs to handle all possible cases of Ty a switch would be preferable so that you get diagnostics for missing cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used switch-case instead of if-then-else. Thanks

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 15, 2024

2. If it is an implementation detail, can it be changed to a standard defined textual format (YAML, JSON, XML)? I ask this specifically because custom serialization formats are often great targets for security vulnerabilities, so using a standardized and defined format is preferable to making your own. Obviously if this is a required part of the SYCL specification that has different considerations.

My gut instinct is that this should probably be a binary format w/ all the YAML serialization all the others get. But my assumption is that SYCL's implementation uses this somehow so I'm unsure how much work it is to rip it all up and do something else.

@asudarsa
Copy link
Contributor Author

Hi @llvm-beanz

Thanks so much for guiding this code review process. I have tried to answer your questions here.

  1. Is this text format part of the SYCL specification, or is it an implementation detail?
    This text format is not part of the SYCL specification. It is used to represent SYCL properties which is part of the SYCL specification, but the format itself is more of an implementation detail.
  1. If it is an implementation detail, can it be changed to a standard defined textual format (YAML, JSON, XML)? I ask this specifically because custom serialization formats are often great targets for security vulnerabilities, so using a standardized and defined format is preferable to making your own. Obviously if this is a required part of the SYCL specification that has different considerations.
    One of my colleagues had tried to switch to YAML some time ago, but that effort got stuck pretty quickly. The main problem that we encountered with switching to YAML was lack of flexibility: we don't know how many SYCL properties there will be and SYCL properties within a single set (which could have been represented as an array) may have different types (which makes it hard to represent them as an array).
  1. How is this used during the compiler? Specifically, where are they generated, and where are they consumed?

Here is a concise sequence of steps done during SYCL device code linking.

clang: .cpp -> .bc
llvm-link: .bc (multiple) -> .bc (single)
sycl-post-link: .bc (single) -> .bc + .prop (**Generation of data in PropertySetIO format**) 
device-compiler/spirv-llvm-translator here to transform device code: .bc -> .image
clang-offload-wrapper: .image + .prop -> .bc (later linked with host code) (**Consumption of data in PropertySetIO format**) 

Hope this gives a better picture of why we need SYCLPropertySetIO and how it is used.

Thanks
Sincerely

Signed-off-by: Arvind Sudarsanam <[email protected]>
@jhuber6
Copy link
Contributor

jhuber6 commented Oct 16, 2024

sycl-post-link: .bc (single) -> .bc + .prop (Generation of data in PropertySetIO format)
device-compiler/spirv-llvm-translator here to transform device code: .bc -> .image

Ideally this will be a single step once the backend target is done.

clang-offload-wrapper: .image + .prop -> .bc (later linked with host code) (Consumption of data in PropertySetIO format)

I think I deleted this in community LLVM ages ago, was planning on adding it back in as an llvm tool but never got around to doing it.

@asudarsa
Copy link
Contributor Author

sycl-post-link: .bc (single) -> .bc + .prop (Generation of data in PropertySetIO format)
device-compiler/spirv-llvm-translator here to transform device code: .bc -> .image

Ideally this will be a single step once the backend target is done.

clang-offload-wrapper: .image + .prop -> .bc (later linked with host code) (Consumption of data in PropertySetIO format)

I think I deleted this in community LLVM ages ago, was planning on adding it back in as an llvm tool but never got around to doing it.

Yes. That's the objective. We will try to minimize number of steps once the SPIR-V backend is ready. We also do not intend to use 'clang-offload-wrapper' tool. Wrapping will be done using API calls (as is done in the community).

Thanks @jhuber6

@llvm-beanz
Copy link
Collaborator

One of my colleagues had tried to switch to YAML some time ago, but that effort got stuck pretty quickly. The main problem that we encountered with switching to YAML was lack of flexibility: we don't know how many SYCL properties there will be and SYCL properties within a single set (which could have been represented as an array) may have different types (which makes it hard to represent them as an array).

YAML files can contain multiple "documents" as well. I'm not sure I see how YAML is unsuitable for this use case.

That aside, it really seems to me that this design decision has a bunch of different implications.

If we go forward with this solution I think we need a lot more tests of the parser itself since parsing is generally a source of security bugs.

@asudarsa
Copy link
Contributor Author

One of my colleagues had tried to switch to YAML some time ago, but that effort got stuck pretty quickly. The main problem that we encountered with switching to YAML was lack of flexibility: we don't know how many SYCL properties there will be and SYCL properties within a single set (which could have been represented as an array) may have different types (which makes it hard to represent them as an array).

YAML files can contain multiple "documents" as well. I'm not sure I see how YAML is unsuitable for this use case.

That aside, it really seems to me that this design decision has a bunch of different implications.

If we go forward with this solution I think we need a lot more tests of the parser itself since parsing is generally a source of security bugs.

Hi @llvm-beanz

Thanks for your feedback. Just a heads up. I am currently looking at adding more tests.
A quick addon comment. One of our primary objective is to provide a fully functional SYCL offloading support and PropertySetIO plays a small, but important role in our compilation flow. So, it will be great to have this upstreamed to unblock other parts of our upstreaming process. One good source of confidence for our implementation is that it has been present in our intel/llvm compiler for a while. However, I completely agree with your pointer that this will benefit from additional testing.

Thanks

@bader bader added the SYCL https://registry.khronos.org/SYCL label Oct 21, 2024
@asudarsa
Copy link
Contributor Author

Hi @llvm-beanz

First of all, thanks so much for your continued reviews here.
For adding more testing, I looked at unittest/Support/YAMLIOTest.cpp as a reference and I also looked at the use-cases of PropertySetIO in our downstream SYCL compiler.
I have managed to add several test cases covering the use-cases in the SYCL compiler and also several test cases that check if errors are captured as expected.
Kindly take a look when convenient. Please let me know if there are gaps and I will try to handle them.

Thanks again
Sincerely

Signed-off-by: Arvind Sudarsanam <[email protected]>
@asudarsa
Copy link
Contributor Author

Hi @tahonermann, I have addressed your concern about 'additional base64 implementation' in de19806. I have completely removed the 'additional' base64 implementation and have refactored the PropertySetIO implementation to use community implementation. It will be great if you can take a look when convenient and verify if I have addressed your concerns adequately. Thanks so much.

@asudarsa
Copy link
Contributor Author

Hi @bader, Can you please provide your review when convenient for this PR as a SYCL owner?

Thanks

@asudarsa
Copy link
Contributor Author

Hi @arsenm

Thanks very much for your feedback thus far. Based on your feedback, I am now using 'unique_ptr' in my implementation. I have also tried to address all other concerns to my best ability. Can you please take another look when convenient and see if I have adequately addressed all concerns?

Sincerely

@asudarsa
Copy link
Contributor Author

Hi @jhuber6

Thanks again for feedback and guidance thus far. It will be great if you can take a look at this PR when convenient and see if it is in good shape. i have addressed all concerns to my best of ability.

Sincerely

}
EXPECT_EQ(Serialized, Expected);
}
} // namespace
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this?
[SomeValue][OhNoAnotherValue]\nbleh=oops

@asudarsa
Copy link
Contributor Author

Closing this as we are working on alternate approach to implement SYCL property sets.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:support SYCL https://registry.khronos.org/SYCL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants