-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[llvm]Add a simple Telemetry framework #102323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 46 commits
dbb8b15
6e43f67
e25f5fc
ad3906c
24d07d6
0057bcf
750e4ac
1378ed4
02e750e
63e99fc
fa88512
0866f64
a8523f7
47e8b06
690c6ab
db668f4
0290a14
14b5234
6ca87e5
4155add
11ead4a
48228ee
ed8a3f1
990e1ca
479cd13
6d8aee2
3a17dbb
a6a6c7b
f30dec6
fd3da20
60616ef
8c0ac5a
c8be829
1393c1f
eb07577
0376abc
e2f7c23
17dfac7
1ea0906
39fd0e7
67b7688
efd25d8
4a8276f
a16344b
26ee5eb
b766e3c
c8cddab
5f8d65f
dffeacd
398ed3e
e462908
df8a9af
70f4742
8eab77a
f9e1a65
2a1fbe5
6e3a85d
f9b1cce
0f38a74
ad57099
a2fcd4d
628e7e9
6ea4e92
08cf32f
3c52401
07f06c0
06e746e
2476294
5ce406c
fd57d06
a8a878b
54eeaba
32dfc6a
0893c5b
155f243
b255e3a
14d1c92
bf2172d
c2dcb7d
bbe9009
e2036c6
05947d5
f1100bb
2f64b29
a5df7a0
585fee8
3812cda
0cdc240
da8056f
c116074
a4e2799
2763d46
75c1b4b
9780ec7
4715743
d231bf2
1941b1f
bd3df5e
4351b94
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,214 @@ | ||
| =========================== | ||
| Telemetry framework in LLVM | ||
| =========================== | ||
|
|
||
| .. contents:: | ||
| :local: | ||
|
|
||
| .. toctree:: | ||
| :hidden: | ||
|
|
||
| =========================== | ||
| Telemetry framework in LLVM | ||
| =========================== | ||
|
|
||
| Objective | ||
| ========= | ||
|
|
||
| Provides a common framework in LLVM for collecting various usage and performance | ||
| metrics. | ||
| It is located at `llvm/Telemetry/Telemetry.h` | ||
|
|
||
| Characteristics | ||
| --------------- | ||
| * Configurable and extensible by: | ||
|
|
||
| * Tools: any tool that wants to use Telemetry can extend and customize it. | ||
| * Vendors: Toolchain vendors can also provide custom implementation of the | ||
| library, which could either override or extend the given tool's upstream | ||
| implementation, to best fit their organization's usage and privacy models. | ||
| * End users of such tool can also configure Telemetry (as allowed by their | ||
| vendor). | ||
|
|
||
jh7370 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| Important notes | ||
| ---------------- | ||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| * There is no concrete implementation of a Telemetry library in upstream LLVM. | ||
| We only provide the abstract API here. Any tool that wants telemetry will | ||
| implement one. | ||
|
|
||
| The rationale for this is that, all the tools in LLVM are very different in | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| what they care about (what/where/when to instrument data). Hence, it might not | ||
| be practical to have a single implementation. | ||
| However, in the future, if we see enough common pattern, we can extract them | ||
| into a shared place. This is TBD - contributions are welcomed. | ||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| * No implementation of Telemetry in upstream LLVM shall store any of the | ||
| collected data due to privacy and security reasons: | ||
|
|
||
| * Different organizations have different privacy models: | ||
|
|
||
| * Which data is sensitive, which is not? | ||
| * Whether it is acceptable for instrumented data to be stored anywhere? | ||
| (to a local file, what not?) | ||
|
|
||
| * Data ownership and data collection consents are hard to accommodate from | ||
| LLVM developers' point of view: | ||
|
|
||
| * Eg., data collected by Telemetry is not neccessarily owned by the user | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| of an LLVM tool with Telemetry enabled, hence the user's consent to data | ||
| collection is not meaningful. On the other hand, LLVM developers have no | ||
| reasonable ways to request consent from the "real" owners. | ||
|
|
||
|
|
||
| High-level design | ||
| ================= | ||
|
|
||
| Key components | ||
| -------------- | ||
|
|
||
| The framework consists of four important classes: | ||
|
|
||
| * `llvm::telemetry::Telemeter`: The class responsible for collecting and | ||
|
||
| transmitting telemetry data. This is the main point of interaction between the | ||
| framework and any tool that wants to enable telemery. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| * `llvm::telemetry::TelemetryInfo`: Data courier | ||
| * `llvm::telemetry::Destination`: Data sink to which the Telemetry framework | ||
| sends data. | ||
| Its implementation is transparent to the framework. | ||
| It is up to the vendor to decide which pieces of data to forward and where | ||
| to forward them to their final storage. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| * `llvm::telemetry::Config`: Configurations on the `Telemeter`. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| .. image:: llvm_telemetry_design.png | ||
|
|
||
| How to implement and interact with the API | ||
| ------------------------------------------ | ||
|
|
||
| To use Telemetry in your tool, you need to provide a concrete implementation of the `Telemeter` class and `Destination`. | ||
|
|
||
| 1) Define a custom `Telemeter` and `Destination` | ||
|
|
||
| .. code-block:: c++ | ||
|
|
||
| // This destination just prints the given entry to a stdout. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| // In "real life", this would be where you forward the data to your | ||
| // custom data storage. | ||
| class MyStdoutDestination : public llvm::telemetry::Destination { | ||
|
||
| public: | ||
| Error emitEntry(const TelemetryInfo* Entry) override { | ||
| return sendToBlackBox(Entry); | ||
|
|
||
| } | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| private: | ||
| Error sendToBlackBox(const TelemetryInfo* Entry) { | ||
|
||
| // This could send the data anywhere. | ||
| // But we're simply sending it to stdout for the example. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| llvm::outs() << entryToString(Entry) << "\n"; | ||
| return llvm::success(); | ||
| } | ||
|
|
||
| std::string entryToString(const TelemetryInfo* Entry) { | ||
| // make a string-representation of the given entry. | ||
| } | ||
| }; | ||
|
|
||
| // This defines a custom TelemetryInfo that has an addition Msg field. | ||
| struct MyTelemetryInfo : public llvm::telemetry::TelemetryInfo { | ||
| std::string Msg; | ||
|
|
||
| json::Object serializeToJson() const { | ||
|
||
| json::Object Ret = TelemeteryInfo::serializeToJson(); | ||
| Ret.emplace_back("MyMsg", Msg); | ||
| return std::move(Ret); | ||
| } | ||
|
|
||
| // TODO: implement getKind() and classof() to support dyn_cast operations. | ||
|
||
| }; | ||
|
|
||
| class MyTelemeter : public llvm::telemery::Telemeter { | ||
| public: | ||
| static std::unique_ptr<MyTelemeter> createInstatnce(llvm::telemetry::Config* config) { | ||
| // If Telemetry is not enabled, then just return null; | ||
| if (!config->EnableTelemetry) return nullptr; | ||
|
|
||
| std::make_unique<MyTelemeter>(); | ||
| } | ||
| MyTelemeter() = default; | ||
|
||
|
|
||
| void logStartup(llvm::StringRef ToolName, TelemetryInfo* Entry) override { | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| if (MyTelemetryInfo* M = dyn_cast<MyTelemetryInfo>(Entry)) { | ||
| M->Msg = "Starting up tool with name: " + ToolName; | ||
| emitToAllDestinations(M); | ||
| } else { | ||
| emitToAllDestinations(Entry); | ||
| } | ||
| } | ||
|
|
||
| void logExit(llvm::StringRef ToolName, TelemetryInfo* Entry) override { | ||
| if (MyTelemetryInfo* M = dyn_cast<MyTelemetryInfo>(Entry)) { | ||
| M->Msg = "Exitting tool with name: " + ToolName; | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| emitToAllDestinations(M); | ||
| } else { | ||
| emitToAllDestinations(Entry); | ||
| } | ||
| } | ||
|
|
||
| void addDestination(Destination* dest) override { | ||
| destinations.push_back(dest); | ||
| } | ||
|
|
||
| // You can also define additional instrumentation points.) | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| void logAdditionalPoint(TelemetryInfo* Entry) { | ||
| // .... code here | ||
| } | ||
| private: | ||
| void emitToAllDestinations(const TelemetryInfo* Entry) { | ||
| // Note: could do this in parallel, if needed. | ||
|
||
| for (Destination* Dest : Destinations) | ||
| Dest->emitEntry(Entry); | ||
| } | ||
| std::vector<Destination> Destinations; | ||
| }; | ||
|
|
||
| 2) Use the library in your tool. | ||
|
|
||
| Logging the tool init-process: | ||
|
|
||
| .. code-block:: c++ | ||
|
|
||
| // At tool's init code | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| auto StartTime = std::chrono::time_point<std::chrono::steady_clock>::now(); | ||
| llvm::telemetry::Config MyConfig = makeConfig(); // build up the appropriate Config struct here. | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| auto Telemeter = MyTelemeter::createInstance(&MyConfig); | ||
| std::string CurrentSessionId = ...; // Make some unique ID corresponding to the current session here. | ||
|
||
|
|
||
| // Any other tool's init code can go here | ||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| // ... | ||
|
|
||
| // Finally, take a snapshot of the time now so we know how long it took the | ||
| // init process to finish | ||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| auto EndTime = std::chrono::time_point<std::chrono::steady_clock>::now(); | ||
| MyTelemetryInfo Entry; | ||
| Entry.SessionId = CurrentSessionId ; // Assign some unique ID here. | ||
|
||
| Entry.Stats = {StartTime, EndTime}; | ||
|
||
| Telemeter->logStartup("MyTool", &Entry); | ||
|
|
||
| Similar code can be used for logging the tool's exit. | ||
|
|
||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| Additionally, at any other point in the tool's lifetime, it can also log telemetry: | ||
|
|
||
| .. code-block:: c++ | ||
|
|
||
| // At some execution point: | ||
| auto StartTime = std::chrono::time_point<std::chrono::steady_clock>::now(); | ||
|
|
||
| // ... other events happening here | ||
|
|
||
| auto EndTime = std::chrono::time_point<std::chrono::steady_clock>::now(); | ||
|
||
| MyTelemetryInfo Entry; | ||
| Entry.SessionId = CurrentSessionId ; // Assign some unique ID here. | ||
| Entry.Stats = {StartTime, EndTime}; | ||
| Telemeter->logAdditionalPoint(&Entry); | ||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,136 @@ | ||||||||||
| //===- llvm/Telemetry/Telemetry.h - Telemetry -------------------*- C++ -*-===// | ||||||||||
|
||||||||||
| // | ||||||||||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||||||||||
| // See https://llvm.org/LICENSE.txt for license information. | ||||||||||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||||||||||
| // | ||||||||||
| //===----------------------------------------------------------------------===// | ||||||||||
| /// | ||||||||||
| /// \file | ||||||||||
| /// This file provides the basic framework for Telemetry | ||||||||||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| /// | ||||||||||
| /// It comprises of three important structs/classes: | ||||||||||
| /// | ||||||||||
| /// - Telemeter: The class responsible for collecting and forwarding | ||||||||||
| /// telemery data. | ||||||||||
| /// - TelemetryInfo: data courier | ||||||||||
| /// - TelemetryConfig: this stores configurations on Telemeter. | ||||||||||
|
||||||||||
| /// | ||||||||||
| /// Refer to its documentation at llvm/docs/Telemetry.rst for more details. | ||||||||||
| //===---------------------------------------------------------------------===// | ||||||||||
|
|
||||||||||
| #ifndef LLVM_TELEMETRY_TELEMETRY_H | ||||||||||
| #define LLVM_TELEMETRY_TELEMETRY_H | ||||||||||
|
|
||||||||||
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| #include <chrono> | ||||||||||
| #include <ctime> | ||||||||||
| #include <memory> | ||||||||||
| #include <optional> | ||||||||||
| #include <string> | ||||||||||
|
|
||||||||||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||
| #include "llvm/ADT/StringExtras.h" | ||||||||||
| #include "llvm/ADT/StringRef.h" | ||||||||||
| #include "llvm/Support/Error.h" | ||||||||||
| #include "llvm/Support/JSON.h" | ||||||||||
|
||||||||||
|
|
||||||||||
| namespace llvm { | ||||||||||
| namespace telemetry { | ||||||||||
|
|
||||||||||
| /// Configuration for the Telemeter class. | ||||||||||
|
||||||||||
| /// This stores configurations from both users and vendors and is passed | ||||||||||
| /// to the Telemeter upon construction. (Any changes to the config after | ||||||||||
| /// the Telemeter's construction will not have effect on it). | ||||||||||
|
||||||||||
| /// the Telemeter's construction will not have effect on it). | |
| /// the Telemeter's construction will not have effect any on it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment below about the mechanism for setting this being vendor-specific, perhaps this should be an abstract method isTelemetryEnabled or similar. In this class's current state, the tool has to specifically opt in or out of telemetry, whereas I think the decision needs to depend on how the vendor wants to control it. With the abstract method, the vendor will define a Config subclass that specifies how this can be determined, e.g. via reading an environment variable. My only concern here is how would a "can enable/disable it on the command-line" behaviour be implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern here is how would a "can enable/disable it on the command-line" behaviour be implemented?
Short answer to your question: Yes, definitely.
The idea is to have a telemetry::makeConfig() function that can be overridden by tool or vendors (the default impl will, of course, return a mostly empty config with telemetry disabled).
In the makeConfig() function, the vendor can set whatever value they want - they could also extend the class to have additional flags, if needed.
Does that address your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of, but not entirely.
Two thoughts:
- By making the
EnableTelemetrymember a public non-const variable, it's possible for tool code to change its value, regardless of the intended behaviour by the vendor. Obviously, this would be incorrect to do, but a developer adding telemetry might not realise that. Making it an abstract method would force the behaviour to be defined in the downstream vendor and make it unmodifiable. - How would
makeConfigaccess options specified on the command-line if they're not stored in global variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can make
EnableTelemetrya const variable. - I'm not sure I understand the issue here. In my mind, the functin's impl can read from the commandline variable if it wants.
Eg.:
// Upstream (ie., default) impl
std::unique_ptr<Config> llvm::telemetry::makeConfig() {
return std::Make_unique<Config>{/*EnableTelemetry*/false};
}
// In some tool specific sub-directory:
std::unique_ptr<Config> mytool::telemetry::makeConfig() {
InputArgList args = ....;
bool enableTelemetry = args.getLastArgValue(OPT_EnableTelemetry);
return std::make_unique<Config>{enableTelemetry);
}
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this contain a list of telemetry::Destination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this field intended to contain the description of the Destinations. In other words, the Config struct is intended to contain descriptions and configurations from the tool (and its user) - it is then passed to the Telemetry framework to construct the object(s).
Because Destination classes are defined in vendor plugins, it's not ideal to make the tool (upstrream code) or users build up the Destinations themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm beginning to think that this field doesn't belong in the generic upstream implementation. I think we can all agree that a field for enabling telemetry is common, though I also think it's fair to say that the mechanism for setting it will be vendor-specific. However, the destinations feel entirely like they should be vendor side, given we're not providing any concrete destinations in the upstream implementation.
For each destination a vendor defines, they'll either have it enabled by default, or disabled by default, with a user-facing option to enable it. It may be the case that they have NO destinations that are user-configurable, in which case this field is entirely dead weight. Even if it is configurable by the user somehow, it would seem like the logical way of doing this will be implementation dependent and therefore should be part of the vendor's Config subclass. For example, they might prefer a couple of boolean fields "enableStdoutLogging", "sendDataToServer" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @jh7370 comment. I don't think this belongs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, can remove it.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// expectend to be extended by subclasses which may have | |
| /// expected to be extended by subclasses which may have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// from the tool being monitored to the Telemery framework. | |
| /// from the tool being monitored to the Telemetry framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
| /// Downstream implementations can add more fields as needed to describe | |
| /// additional events. | |
| /// Downstream implementations can define subclasses to provide different bits | |
| /// of information. |
I'm suggesting this, because in our downstream implementation, we have lots of event types, each of which would be a variation of TelemtryInfo but with different data, whereas the way the comment is currently phrased implies there should be one event class, with lots of different fields, to cover all the different events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - yes, the intent is to allow for different event classes , not one class with a bunch of fields.
(I meant to write "Subclasses can add more fields as needed ...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we've discussed time points before and I've been giving it a bit more thought today. I think there are three broad time-related variations of a piece of telemetry data:
- Things that describe some state related to the telemetry session that are not specific to a point in time.
- Things that describe some state at a specific point in time, or that describe an occurrence of something at a specific point of time.
- Things that describe the same as 2, but over a specific time range rather than a single point in time.
For a given event type, I don't think it makes sense for it to support more than one of these variations. So I think that implies either three subclasses, or probably simpler, three constructors for TelemetryInfo, like in the following pseudo-code.
TelemetryInfo(); // Sets timepoints to null.
TelemetryInfo(time_point tp); // Sets "start" time to tp, and end time to null.
TelemetryInfo(time_point start, time_point end); // Sets the corresponding start and end times.
I feel like this is generic enough that it deserves to be in the base framework, rather than having most/all downstream vendors having to implement the same things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly prefer to not include time in the base implementation. Our downstream implementation exclusively uses counts and has a different mechanism for collecting performance telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're looking at an outdated snapshot of the patch somehow.
In my tree, the TelemetryInfo class currently only has one field, which is the SessionId.
Would that resolve your concern here?
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function still feels wrong to me. I reckon it should look like this:
virtual void serialize(Serializer &serializer) const = 0;
Concrete implementations populate it with functions like this:
void serialize(Serializer &serializer) const override {
serializer.write("key1", value);
serializer.write("key2", value2);
}
The Serializer class is then an abstract class, with concrete implementations defined in partnership with the Destination classes. For example, you'd have a JsonSerializer that knows how to record key/value pairs as Json, and then Destination classes could be StdoutDestination or FileDestination. Since Json is likely to be a popular choice, we might want to have JsonSerializer defined upstream. We could have StdoutDestination and/or FileDestination upstream too (or more generically StreamDestination). By restricting the destination enabling to vendor-side code, there's no security risk here, because the vendor can simply not provide the hooks to configure these destinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to clarify the rationale of this a bit. Previously(see comments on Sept 3), you raised an issue with the design of Destination class where you said it tried to do two things ("decide how and where to send the data"). To address that, I've taken the "how" (ie., "how to serialize" )part out and put it into this TelemetryInfo class. Now in retrospect, maybe the documentation should have been written simply as "The Destination class is responsible for receiving telemetry data" (where the data eventually go AFTER that is its own implementation). Would that have addressed your concern?
Second point, w.r.t your concern about the explosion of (serialization-methods) x (final destinations), I think that should be an implementation detail on the Destination class(es). If the vendor wants to delegate the serialization-work to some common util to be shared amongst different Destinations(which, of course, they should), then they can do that - but it doesn't need to be explicitly specified here in the upstream interface.
Third point, your proposal for having the Serializer class does not work well for LLDB (and for our internal) use cases:
- If we want to write structured-data, then we'd end up replicating most of the
llvm::jsoninterface - If we want to serialize to protobuf, which is what we would use internally at Google, then it's not quite straightforward because the code here would have to tease out the right key string.
This was also why I initially wanted to "hide" the serialization step inside Destination. The idea is that the Destination class would take an Info object directly, pick out what fields it wants, and put them into whatever serialization form defined by the vendor. (Again, the idea of sharing the serialization code would still be possible, but it's an implementation detail and not part of the interface).
Does this sound reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we've got slightly different visions of things and it's driving the way we're looking at the design differently. Reading between the lines of some of the comments and responses you've made, I get the impression you expect either all concrete TelemetryInfo subclasses to be defined by vendors, or at least the all downstream (i.e. all) Destination classes to know how to convert them to whatever the Destination wants. In my view, I'd prefer it if a) there was the option to have some (but certainly not all) TelemetryInfo subclasses in the upstream LLVM repository, and b) a way to avoid repeating more than is fundamentally necessary to do the serialization. This is a strategy we use successfully in our internal telemetry project.
In my mind, unless we have concrete instances of at least TelemetryInfo in the upstream codebase, and therefore the ability for upstream tools to dispatch telemetry with no more than a vendor-specified Destination (specified in just one place), the whole upstream framework becomes pointless, because downstream vendors still end up with the same level of patches needed to actually make use of the framework.
Regarding the details, I'd expect the theoretical JsonSerializer class to make use of the llvm::json interface, so converting the input types passed by the TelemetryInfo subclass serialize method into the appropriate Json types (aside: this is probably generic enough that it could live in the core LLVM code). Similarly, a ProtobufSerializer would do something equivalent. The keys are intended to be string literals typically that the TelemetryInfo subclasses specify in their serialize method - I'm not quite sure what you mean by the "right key string" in this context, as I'm not familiar with protobuf; it would help if you could elaborate some more what the issue here is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we've got slightly different visions of things and it's driving the way we're looking at the design differently.
I think we have very similar ideas of how this ought to work, but there's some miscommunication probably. (ie., , your second paragraph describes pretty much how it is implemented, at least for LLDB's telemetry)
Reading between the lines of some of the comments and responses you've made, I get the impression you expect either all concrete
TelemetryInfosubclasses to be defined by vendors, or at least the all downstream (i.e. all)Destinationclasses to know how to convert them to whatever the Destination wants.
I want to point out that we should not conflate vendor-specific code with tool-specific code.
While it is true that I do not propose any concrete implementation here in llvm::telemetry, there is a concrete implementation in lldb::telemetry.
So - there are two levels of (potential) overrides here:
- One: by a tool that wants to use telemetry (eg., LLDB, clang, etc).
- Two: by different vendors (for the same tool).
I would expect, the bulk of code implementing subclasses of TelemetryInfo to be upstream in the individual tool's directory. That is to say, all of LLDB's subclasses of TelemetryInfo would live in lldb/telemetry/... (but not here in llvm/telemetry, because, as pointed out before, different tools care about different events and concepts).
Then, different vendors (for the same tool) will provide the concrete implementation of Destination. In this sense, I believe the Destination class does exactly what you envision the Serializer class would (ie., it takes a TelemetryInfo object and picks some or all of the data from that object and forward them elsewhere).
In other words, the only vendor-defined pieces are the concrete implementation of Destination
If you would like to have some share utils (eg., how to take a TelemetryInfo and turns it into a json object, you could still do that by putting the code in the tool's upstream directory, then your vendor-specific code can call into that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S: Concretely, it could look something like this:
// In upstream LLDB:
llvm::Error lldb::telemetry::Manager::atStartup( TelemetryInfo* info) {
// Collect additional data and/or fill in the missing fields that the caller might have
// left out. (Eg., figure out which debuger-object this is since there could be many)
// Forward the `info` to a bunch of pre-registered destinations:
for (Destination* dest : m_destinations)
dest->emitEntry(info);
}
llvm::Error lldb::telemetry::Manager::atMainExecutableLoad( MainExecutableInfo* info) {
// Collect additional data and/or fill in the missing fields that the caller might have
// left out
// Eg., read the binary for its BUILD_UUID
....
// Forward the `info` to a bunch of pre-registered destinations:
for (Destination* dest : m_destinations)
dest->emitEntry(info);
}
llvm::Error lldb::telemetry::Manager::atDebugCommandExecution( DebugCommandInfo* info) {
// Collect additional data and/or fill in the missing fields that the caller might have
// left out
// Eg.,figure out whether this command was attached to a particular target or a standalone
// ...
// forward the `info` to a bunch of pre-registered destinations:
for (Destination* dest : m_destinations)
dest->emitEntry(info);
}
// In some hypothetical vendor code
llvm::Error lldb::google::InternalStorageDestination::emitEntry(TelemetryInfo* info) override{
switch(info->getKind()) {
// ... dispatch to the corresponding type handler
}
}
llvm::Error lldb::google::InternalStorageDestination::emitMainExecEntry(MainExecutableInfo* info) {
google::MainExecutableProto p = make_proto<MainExecutableProto>();
// pick out the fields we want to record from the argument and copy them into the proto.
// send the proto to final storage:
sendToInternalStorage(std::move(p));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the details, I'd expect the theoretical
JsonSerializerclass to make use of thellvm::jsoninterface, so converting the input types passed by the TelemetryInfo subclassserializemethod into the appropriate Json types (aside: this is probably generic enough that it could live in the core LLVM code). Similarly, aProtobufSerializerwould do something equivalent. The keys are intended to be string literals typically that theTelemetryInfosubclasses specify in theirserializemethod - I'm not quite sure what you mean by the "right key string" in this context, as I'm not familiar with protobuf; it would help if you could elaborate some more what the issue here is.
Let me try to elaborate on this (I was discussing this with Vy last week). Protobufs are more strongly typed than json (you can basically think of them as structs which know how to serialize themselves to a byte stream). The natural way to work with them is to work with named accessors (my_proto.set_field1(value1); my_proto.set_field2(value2)). While they have a reflection API which allows accessing the fields using strings (which means a protobuf serializer could implement a virtual write(StringRef key, ...) function, it's a rather long-winded way of using them (you convert the field to a string in the TelemetryInfo object, and then immediately undo that in the reflection API. Since vendor "knows" about all of the telemetry types it wants to collect (*), it could do it more directly with my_proto.set_field1(info.field1).
The second source of confusion (now I'm moving on to the "right key string" part) is that this serializer.write API (at least in the way I/we understand it) is only suitable for implementing simple dictionary types (map<string, BasicType>). It's not very suitable for more complex value types like lists or (sub)dictionaries, as you'd have to add a new (virtual) function for each new data type. (This can be avoided by creating some sort of a polymorphic value type that can hold arbitrary structured data, but at that point, we're sort of reinventing json::Value.)
(*) we assume that to be the case since the vendor needs to decide (based on its use case, data collection policy, etc.) whether a particular field can be collected or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations - I now see what you're thinking.
I want to point out that we should not conflate vendor-specific code with tool-specific code.
This is a fair point. The key thing this means is that code in the top-level framework that is designed to be common needs to be able to work with subclasses defined in tool-level and vendor code. In particular, I think it's reasonable to expect there to be TelemetryInfo subclasses defined in common LLVM code (so for example a DurationEvent might be something desired by LLD, clang, llvm-objdump etc), in tools (for events specific to particular tools like LLDB) and in vendor code (for events that downstream vendors want which might not be applicable upstream).
I think it's also reasonable to assume that more event types will be written over time (and existing events will be extended). What I think we want to avoid though is for the need for every Destination (or some other class) to need to be extended every time a new event type is implemented. Indeed, it's going to be near impossible for downstream vendors to keep tabs on additions to upstream events in the current structure, I think, so they won't know whether there's a need to review new fields etc, with the current design. I guess the flipside of this is that if some vendors need to review fields for legal reasons, we need to make it clear to them that something has changed that needs review. Mind you, I think there'd still be cases where vendors need to review things without any changes in TelemetryInfo classes at all: imagine the case where someone changes the thing used to populate a field upstream, e.g. to use full paths instead of filenames - for some vendors, having the full path may not be acceptable.
Sorry, that last paragraph is a bit rambly, but I'm leaving it as-is, so that you can see some of the thoughts going through my head.
Ultimately, I think my point is that the advantage of having something like TelemetryInfo::serialize is that the Destination classes don't necessarily all need modifying every time a new event field or type is added. I'm open to other ways of how this might be achieved.
I'll acknowledge that something strongly typed like protobuf doesn't work well with the serialize method approach, because of the extra indirection. However, you could have ProtobufDestination not use the serialize approach at all and dispatch by switching based on type, as described. If we allowed both options, it would admittedly mean the serialize method wouldn't always be used, but it would allow some Destination types to make use of it.
The second source of confusion (now I'm moving on to the "right key string" part) is that this
serializer.writeAPI (at least in the way I/we understand it) is only suitable for implementing simple dictionary types (map<string, BasicType>). It's not very suitable for more complex value types like lists or (sub)dictionaries, as you'd have to add a new (virtual) function for each new data type. (This can be avoided by creating some sort of a polymorphic value type that can hold arbitrary structured data, but at that point, we're sort of reinventing json::Value.)
FWIW, in our downstream implementation, our Serializer base class declares startArray/endArray/startObject/endObject methods that our concrete serializers then implement for their different types. There's also a concrete method defined in the base class for serializing iterable types like arrays and vectors, and another for maps (the latter currently only supports std::unordered_map, but there's no reason it couldn't be extended to all map-like objects). Clearly for other complex object types, you'd need to define how to serialize that object type, though I've only rarely needed to resort to this in our downstream code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've thought about this for awhile, I think we can go with James's Serializer proposal.
(To handle the protobuf case, we can, as pointed, simply not use the Serializer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(NOTE: to be implemented ... this is not resolved yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you ever intend to create instances of the base class? Should the class be abstract (perhaps through making this method abstract (=0))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not immediately in LLDB, but some simpler use case might just need the base class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, though I find that unlikely, since this entry carries essentially no information. And one could easily define a concrete empty class, if that's what one really needed.
One advantage of making this abstract is that you then don't need the EntryKind::Base enum and can leave the choice of enum values completely in the hands of the user. The second advantage is that it prevents object slicing (since you can't construct actual instances of the base). I think that making all non-leaf classes abstract is a useful property, particularly as here you indent to make copies of these objects (which usually forbidden for class hierarchies precisely due to slicing).
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see null checks in the other classof implementations (I think that because we have dyn_cast and dyn_cast_or_null cast flavour versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for a base class, I think the proper implementation of this function is return true (as everything is an instance of the base type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe there shouldn't be a null check here. we have llvm::(dyn_)cast*_if_present* for that.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a Destination should be emitting anything - it should be receiving things. So I'd call this receiveEntry or receiveEvent or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll respond to this in the same comment w.r.t the serializer because these two are related and it's easier to group comments into one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done - renamed to receiveEntry.
oontvoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird name. Could you try to use English names only, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is English, AFAIK. https://www.oxfordlearnersdictionaries.com/us/definition/english/telemeter_2
P.S: Basically means a device for recording & sending telemetry, which is the appropriate name for this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shows what I know 😄
Not sure it's a well-known word though, so it doesn't really help convey meaning though, which is the key thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't resolve conversations I've been involved with (see https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178 for a lengthy discussion). I'll resolve these when I'm happy with the resolution. In particular, as noted, I don't think the name "Telemeter" is going to convey the meaning clearly enough. Furthermore, as noted earlier, having all the classes prefixed with Telemetry in their name seems unnecessary, given their presence in the telemetry namespace. Could they simply be called Destination, Config, Event etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed the Destiantion, Config, etc to get rid of the prefix.
What are your suggestions for alternatives for Telemeter?
(prefer something that does not have Log or Logger in it - as someone pointed out it might cause confusions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S: How about TelemetryCollector?
(It doesn't fully communicate the aspect of "also transmitting the collected data" which Telemeter does, but I suppose that's close enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I had a similar reaction as @jh7370 when I saw the class name. Without the Telemetry prefix, it seems reasonable to use that as the class name? That seems to convey everything the class is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with Session, personally, because the class seems to most closely resemble our same-named class in our downstream implementation. I'd love to avoid Collector, mostly because for us our "telemetry collector" is actually a separate executable that monitors the file system for telemetry data files created by the individual tools, before forwarding that data onto the servers (in a roundabout way it actually does what this class is trying to do, just at a different point in the process, but I digress...)! However, neither of these points are especially strong, given they're more to do with how our downstream implementation works.
I see this class as being the thing that a) receives the user-created configuration data, b) receives the telemetry data, c) does things related to setup and teardown of the telemetry system. It then forwards that data onto the Destinations and generally manages things to do with telemetry. I therefore could be persuaded by Session, Manager, Service or even System.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Session, assuming this class is responsible for generating/storing the unique session ID. I would imagine in LLDB a session is created during Initialization and cleaned up during Termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I believe this is resolved - we've agreed to name it telemetry::Manager)
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow why atStartup and atExit should receive a TelemetryInfo. I also don't follow why atExit would need a ToolPath argument.
Our telemetry Session class constructor takes an ApplicationInfo object, which, among other things, includes the tool name, the tool version and various properties about the tool that are more vendor specific. This data is attached to every one of our events, so that we can see in the database what the source of our different events are. Perhaps the StringRef ToolPath should be replaced with something similar?
Also, is atStartup really a separate function, or should it be part of the constructor? Should it then take a Config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to paraphase the 3 questions (I think) posed in this comment:
-
Why does
atStartupandatExithave to take a TelemetryInfo?
Answer: The intention is that the caller would put data into this TelemetryInfo and give to thetelemetry::Managervia theseatStartuporatExit. In otherwords, theTelemetryInfo(and its subclasses) acts as the "data courier" to move data from the tool (the caller) to the telemtry framework. -
Why would they need a
Toolpathargument?
Answer: No, in fact, it doesn't. This piece of data could be put into the TelemetryInfo object, -
does
atStartupneed to be separate function vs part of ctor (presumably of thetelemetry::Managerclass)
It could go either way. My preference for having a separate atStartup from the ctor is that it allows for flexibility on when the telemetry::Manager is created. But that doesn't mean down stream implementations can't change this one way or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the explanation. I'm happy with the 2 and 3 responses. For the 1 case, how might atStart and atExit actually differ internally, since all the data is stored in the TelemetryInfo input? Relatedly, how would you send events at other points in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the 1 case, how might
atStartandatExitactually differ internally, since all the data is stored in theTelemetryInfoinput?
Do you mean, how would the implementation of atStart and atExit be different?
At the base level, perhaps not much. But different tools could have different things it wants to do at startup VS at exit.
Relatedly, how would you send events at other points in time?
The subclass of telemetry::Manager can define additional functions for that (eg., for LLDB's , we define lldb::telemetry::LldbManager::logMainExecutableLoad(ModuleTelemetryInfo) to be called when the main executable is loaded, or ::logDebugCommand (DebugCommandTelemetryInfo) to be called when a command is executed.)
These functions would be specific to each tool, hence not shared in the llvm::telemetry::Manager interface.
Also, their arguments (ModuleTelemetryInfo and DebugCommandTelemetryInfo) would be subclasses of llvm::telemetry::TelemetryInfo, as these classes can have additional fields/attributes to describe additional concepts.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I expressed this in the past, but I still take issue with these two methods, I don't think they belong in the base implementation and should be specific instance of TelemtryInfo. This only makes sense in the context of a tool and not in the context of library like LLDB. When used from Python the calls to Initialize and Terminate do not have to correspond to startup or exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what you'd like to be in this class?
Right now, it only has atStartup and atExit because they are the only two common actions most tools have.
Without them, the class would be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think a generic dispatch (or alternatively named function) that takes a TelemetryInfo and forwards it to the configured Destination(s) would make more sense. Then in the tool/library code, it simply calls dispatch whenever required (which could include during startup/close down) and passes in the desired TelemetryInfo to do what it wants.
As well as the dispatch method, I think a std::vector<std::unique_ptr<Destination>> probably belongs in the Manager class (which dispatch uses). Otherwise, where are the Destination instances owned? If the downstream vendor has an implementation that needs to own stuff, then the Destination stored in Manager could just be a thing that references the downstream vendor piece.
dispatch would look something like:
void dispatch(TelemetryInfo *event) {
// Maybe have an `onDispatch` hook that is vendor-defined (default do
// nothing to allow the vendor to do extra stuff here).
for (Destination *dest : destinations)
dest->receiveEntry(event);
}
Then in the tool code, you might have something like:
int tool_main(...) {
...
manager.dispatch(StartupEvent());
...
manager.dispatch(OtherEvent());
...
manager.dispatch(ShutdownEvent());
}
Finally, I think it makes sense for the Manager constructor to take a Config, so that the downstream implementations of Manager can then hook into the stuff defined by Config in some manner. In particular, I could see an argument that the dispatch method should do nothing when telemetry is not enabled, so it would need access to that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done - changed this to dispatch()
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| add_llvm_component_library(LLVMTelemetry | ||
| Telemetry.cpp | ||
|
|
||
| ADDITIONAL_HEADER_DIRS | ||
| "${LLVM_MAIN_INCLUDE_DIR}/llvm/Telemetry" | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| #include "llvm/Telemetry/Telemetry.h" | ||
|
|
||
| namespace llvm { | ||
| namespace telemetry { | ||
|
|
||
| llvm::json::Object TelemetryInfo::serializeToJson() const { | ||
| return json::Object{ | ||
| {"SessionId", SessionId}, | ||
| }; | ||
| }; | ||
|
|
||
| } // namespace telemetry | ||
| } // namespace llvm |
Uh oh!
There was an error while loading. Please reload this page.