forked from cms-sw/cmssw
HeterogeneousCore/SonicTriton: add RetryActionDiffServer; expose connectToServer; update tests #21
Closed: trevin-lee wants to merge 12 commits into fastmachinelearning:master from trevin-lee:SonicRetry_CMSSW_15_0_0_pre3

The diff below shows changes from 4 of the 12 commits.

Commits:
- c1dce2b: RetryAction compiles
- 4dde1c6: Include RetryAction in SonicClientBase
- 5b093e4: Update PR comments
- c062112: PR comments, fix fillDescriptions
- ca570d7: Move RetrySameServerAction to plugins. SonicTriton test works.
- fe03eb4: Add update server function for client
- 7586344: Add test for Triton retry action in BuildFile.xml
- ab9b098: Implement retry logic in RetryActionDiffServer and add connectToServe…
- 5b2c3c3: Add RetryActionDiffServer class documentation, implement testing cons…
- 0212e70: SonicTriton: implement retry action against different server; update …
- 3e927af: Refactor RetryActionDiffServer to utilize TritonService for server se…
- 3c98e44: Refactor TritonClient to support unit testing; remove deprecated cons…
```diff
@@ -6,4 +6,4 @@
 <use name="FWCore/Utilities"/>
 <export>
   <lib name="1"/>
-</export>i
+</export>
```
33 changes: 33 additions & 0 deletions
HeterogeneousCore/SonicTriton/interface/RetryActionDiffServer.h

```cpp
#ifndef HeterogeneousCore_SonicTriton_RetryActionDiffServer_h
#define HeterogeneousCore_SonicTriton_RetryActionDiffServer_h

#include "HeterogeneousCore/SonicCore/interface/RetryActionBase.h"

/**
 * @class RetryActionDiffServer
 * @brief A concrete implementation of RetryActionBase that attempts to retry an inference
 * request on a different, user-specified Triton server.
 *
 * This class is designed to provide a fallback mechanism. If an initial inference
 * request fails (e.g., due to server unavailability or a model-specific error),
 * this action will be triggered. It reads an alternative server URL from the
 * ParameterSet and instructs the TritonClient to reconnect to this new server
 * for the retry attempt. This action is designed for one-time use per inference
 * call; after the retry attempt, it disables itself until the next `start()` call.
 */

class RetryActionDiffServer : public RetryActionBase {
public:
  RetryActionDiffServer(const edm::ParameterSet& conf, SonicClientBase* client);
  ~RetryActionDiffServer() override = default;

  void retry() override;
  void start() override;

private:
  std::string alt_server_url_;
  std::string alt_server_token_;
};

#endif
```
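The header comment describes a small state machine: `start()` arms the action for a new inference call, and `retry()` fires at most once before disabling itself. The standalone sketch below models only that lifecycle, assuming the behaviour shown in `RetryActionDiffServer.cc` in this PR. `MockClient`, `OneShotDiffServerRetry`, `runOneShotDemo` and the URL `backup.example:8001` are hypothetical; none of the CMSSW classes are used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the client: records reconnections and re-evaluations.
struct MockClient {
  std::vector<std::string> connected;
  int evaluations = 0;
  void connectToServer(const std::string& url) { connected.push_back(url); }
  void evaluate() { ++evaluations; }
};

// Hypothetical model of the one-shot retry lifecycle (not the CMSSW class).
class OneShotDiffServerRetry {
public:
  OneShotDiffServerRetry(std::string altUrl, MockClient& client)
      : altUrl_(std::move(altUrl)), client_(client), shouldRetry_(!altUrl_.empty()) {}

  // Re-arm for a new inference call (mirrors start()).
  void start() { shouldRetry_ = true; }

  // Try once against the alternative server; returns true if a retry ran.
  bool retry() {
    if (!shouldRetry_ || altUrl_.empty()) {
      shouldRetry_ = false;
      return false;
    }
    client_.connectToServer(altUrl_);
    client_.evaluate();
    shouldRetry_ = false;  // disabled until the next start()
    return true;
  }

  bool shouldRetry() const { return shouldRetry_; }

private:
  std::string altUrl_;  // declared first, so it is initialized before shouldRetry_
  MockClient& client_;
  bool shouldRetry_;
};

// Usage sketch: returns how often the client was switched to the backup server.
std::size_t runOneShotDemo() {
  MockClient client;
  OneShotDiffServerRetry action("backup.example:8001", client);
  action.start();
  action.retry();                 // fires: reconnect and re-evaluate
  assert(!action.shouldRetry());  // spent for this inference call
  action.retry();                 // no-op until re-armed
  action.start();
  action.retry();                 // fires again after re-arming
  return client.connected.size();
}
```

As in the PR's `start()`, re-arming does not check the URL; the empty-URL guard lives in `retry()`, so an action configured without `altServerUrl` never reconnects.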
220 changes: 117 additions & 103 deletions
HeterogeneousCore/SonicTriton/interface/TritonClient.h

```diff
@@ -1,103 +1,117 @@
-#ifndef HeterogeneousCore_SonicTriton_TritonClient
-#define HeterogeneousCore_SonicTriton_TritonClient
-
-#include "FWCore/ParameterSet/interface/ParameterSet.h"
-#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
-#include "FWCore/ServiceRegistry/interface/ServiceToken.h"
-#include "HeterogeneousCore/SonicCore/interface/SonicClient.h"
-#include "HeterogeneousCore/SonicTriton/interface/TritonData.h"
-#include "HeterogeneousCore/SonicTriton/interface/TritonService.h"
-
-#include <map>
-#include <vector>
-#include <string>
-#include <exception>
-#include <unordered_map>
-
-#include "grpc_client.h"
-#include "grpc_service.pb.h"
-
-enum class TritonBatchMode { Rectangular = 1, Ragged = 2 };
-
-class TritonClient : public SonicClient<TritonInputMap, TritonOutputMap> {
-public:
-  struct ServerSideStats {
-    uint64_t inference_count_;
-    uint64_t execution_count_;
-    uint64_t success_count_;
-    uint64_t cumm_time_ns_;
-    uint64_t queue_time_ns_;
-    uint64_t compute_input_time_ns_;
-    uint64_t compute_infer_time_ns_;
-    uint64_t compute_output_time_ns_;
-  };
-
-  //constructor
-  TritonClient(const edm::ParameterSet& params, const std::string& debugName);
-
-  //destructor
-  ~TritonClient() override;
-
-  //accessors
-  unsigned batchSize() const;
-  TritonBatchMode batchMode() const { return batchMode_; }
-  bool verbose() const { return verbose_; }
-  bool useSharedMemory() const { return useSharedMemory_; }
-  void setUseSharedMemory(bool useShm) { useSharedMemory_ = useShm; }
-  bool setBatchSize(unsigned bsize);
-  void setBatchMode(TritonBatchMode batchMode);
-  void resetBatchMode();
-  void reset() override;
-  TritonServerType serverType() const { return serverType_; }
-  bool isLocal() const { return isLocal_; }
-
-  //for fillDescriptions
-  static void fillPSetDescription(edm::ParameterSetDescription& iDesc);
-
-protected:
-  //helpers
-  bool noOuterDim() const { return noOuterDim_; }
-  unsigned outerDim() const { return outerDim_; }
-  unsigned nEntries() const;
-  void getResults(const std::vector<std::shared_ptr<triton::client::InferResult>>& results);
-  void evaluate() override;
-  template <typename F>
-  bool handle_exception(F&& call);
-
-  void reportServerSideStats(const ServerSideStats& stats) const;
-  void updateServer(std::string serverName);
-  ServerSideStats summarizeServerStats(const inference::ModelStatistics& start_status,
-                                       const inference::ModelStatistics& end_status) const;
-
-  inference::ModelStatistics getServerSideStatus() const;
-
-  //members
-  unsigned maxOuterDim_;
-  unsigned outerDim_;
-  bool noOuterDim_;
-  unsigned nEntries_;
-  TritonBatchMode batchMode_;
-  bool manualBatchMode_;
-  bool verbose_;
-  bool useSharedMemory_;
-  TritonServerType serverType_;
-  bool isLocal_;
-  grpc_compression_algorithm compressionAlgo_;
-  triton::client::Headers headers_;
-
-  std::unique_ptr<triton::client::InferenceServerGrpcClient> client_;
-  //stores timeout, model name and version
-  std::vector<triton::client::InferOptions> options_;
-  edm::ServiceToken token_;
-
-private:
-  friend TritonInputData;
-  friend TritonOutputData;
-
-  //private accessors only used by data
-  auto client() { return client_.get(); }
-  void addEntry(unsigned entry);
-  void resizeEntries(unsigned entry);
-};
-
-#endif
+#ifndef HeterogeneousCore_SonicTriton_TritonClient
+#define HeterogeneousCore_SonicTriton_TritonClient
+
+#include "FWCore/ParameterSet/interface/ParameterSet.h"
+#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
+#include "FWCore/ServiceRegistry/interface/ServiceToken.h"
+#include "HeterogeneousCore/SonicCore/interface/SonicClient.h"
+#include "HeterogeneousCore/SonicTriton/interface/TritonData.h"
+#include "HeterogeneousCore/SonicTriton/interface/TritonService.h"
+
+#include <map>
+#include <vector>
+#include <string>
+#include <exception>
+#include <unordered_map>
+
+#include "grpc_client.h"
+#include "grpc_service.pb.h"
+
+enum class TritonBatchMode { Rectangular = 1, Ragged = 2 };
+
+class TritonClient : public SonicClient<TritonInputMap, TritonOutputMap> {
+public:
+  struct ServerSideStats {
+    uint64_t inference_count_;
+    uint64_t execution_count_;
+    uint64_t success_count_;
+    uint64_t cumm_time_ns_;
+    uint64_t queue_time_ns_;
+    uint64_t compute_input_time_ns_;
+    uint64_t compute_infer_time_ns_;
+    uint64_t compute_output_time_ns_;
+  };
+
+  //constructor
+  TritonClient(const edm::ParameterSet& params, const std::string& debugName);
+
+  //destructor
+  ~TritonClient() override;
+
+  //accessors
+  unsigned batchSize() const;
+  TritonBatchMode batchMode() const { return batchMode_; }
+  bool verbose() const { return verbose_; }
+  bool useSharedMemory() const { return useSharedMemory_; }
+  void setUseSharedMemory(bool useShm) { useSharedMemory_ = useShm; }
+  bool setBatchSize(unsigned bsize);
+  void setBatchMode(TritonBatchMode batchMode);
+  void resetBatchMode();
+  void reset() override;
+  TritonServerType serverType() const { return serverType_; }
+  bool isLocal() const { return isLocal_; }
+  virtual void connectToServer(const std::string& url);
+
+  //for fillDescriptions
+  static void fillPSetDescription(edm::ParameterSetDescription& iDesc);
+
+protected:
+  /**
+   * @brief Constructor for unit testing purposes only.
+   *
+   * This constructor is provided to allow the creation of a TritonClient
+   * instance (or a mock derived from it) without needing the full CMSSW
+   * Service framework, which is required by the standard constructor.
+   * This is essential for writing isolated unit tests that do not depend
+   * on external services. It initializes the base SonicClient with dummy
+   * parameters.
+   * @param is_testing A boolean flag to select this constructor.
+   */
+  TritonClient(bool is_testing);
+
+  //helpers
+  bool noOuterDim() const { return noOuterDim_; }
+  unsigned outerDim() const { return outerDim_; }
+  unsigned nEntries() const;
+  void getResults(const std::vector<std::shared_ptr<triton::client::InferResult>>& results);
+  virtual void evaluate() override;
+  template <typename F>
+  bool handle_exception(F&& call);
+
+  void reportServerSideStats(const ServerSideStats& stats) const;
+  void updateServer(std::string serverName);
+  ServerSideStats summarizeServerStats(const inference::ModelStatistics& start_status,
+                                       const inference::ModelStatistics& end_status) const;
+
+  inference::ModelStatistics getServerSideStatus() const;
+
+  //members
+  unsigned maxOuterDim_;
+  unsigned outerDim_;
+  bool noOuterDim_;
+  unsigned nEntries_;
+  TritonBatchMode batchMode_;
+  bool manualBatchMode_;
+  bool verbose_;
+  bool useSharedMemory_;
+  TritonServerType serverType_;
+  bool isLocal_;
+  grpc_compression_algorithm compressionAlgo_;
+  triton::client::Headers headers_;
+
+  std::unique_ptr<triton::client::InferenceServerGrpcClient> client_;
+  //stores timeout, model name and version
+  std::vector<triton::client::InferOptions> options_;
+  edm::ServiceToken token_;
+
+private:
+  friend TritonInputData;
+  friend TritonOutputData;
+
+  //private accessors only used by data
+  auto client() { return client_.get(); }
+  void addEntry(unsigned entry);
+  void resizeEntries(unsigned entry);
+};
+
+#endif
```
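Together, the `virtual connectToServer` and the protected `TritonClient(bool is_testing)` constructor let a test double derive from the client without the Service framework. A minimal sketch of that pattern follows, assuming the only obstacle in the production constructor is a service lookup. `ServiceRegistry`, `ClientWithServices`, `RecordingClient`, `switchServer` and `runMockDemo` are hypothetical simplifications, not CMSSW types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework service the production constructor needs.
struct ServiceRegistry {
  bool available;
  std::string defaultUrl;
};

// Hypothetical, much-simplified stand-in for TritonClient.
class ClientWithServices {
public:
  // Production constructor: fails when the (simulated) service is missing.
  explicit ClientWithServices(const ServiceRegistry& services) {
    if (!services.available) {
      throw std::runtime_error("service framework not available");
    }
    currentUrl_ = services.defaultUrl;
  }
  virtual ~ClientWithServices() = default;

  // Virtual, so a test double can observe reconnection requests.
  virtual void connectToServer(const std::string& url) { currentUrl_ = url; }
  const std::string& currentUrl() const { return currentUrl_; }

protected:
  // Testing-only constructor: skips the service lookup, in the spirit of
  // TritonClient(bool is_testing). The flag only selects this overload.
  explicit ClientWithServices(bool /*is_testing*/) {}

private:
  std::string currentUrl_;
};

// Test double that records every URL it is asked to connect to.
class RecordingClient : public ClientWithServices {
public:
  RecordingClient() : ClientWithServices(true) {}
  void connectToServer(const std::string& url) override {
    requested.push_back(url);
    ClientWithServices::connectToServer(url);
  }
  std::vector<std::string> requested;
};

// Code under test sees only the base-class interface.
void switchServer(ClientWithServices& client, const std::string& url) { client.connectToServer(url); }

// Usage sketch: returns how many reconnections the double recorded.
std::size_t runMockDemo() {
  RecordingClient mock;  // no ServiceRegistry required
  switchServer(mock, "alt.example:8001");
  assert(mock.currentUrl() == "alt.example:8001");
  return mock.requested.size();
}
```

Keeping the testing constructor `protected` means only derived test doubles can bypass the service setup; ordinary callers still have to go through the production path.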
46 changes: 46 additions & 0 deletions
HeterogeneousCore/SonicTriton/src/RetryActionDiffServer.cc

```cpp
#include "HeterogeneousCore/SonicTriton/interface/RetryActionDiffServer.h"
#include "HeterogeneousCore/SonicTriton/interface/TritonClient.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"

RetryActionDiffServer::RetryActionDiffServer(
  const edm::ParameterSet& conf,
  SonicClientBase* client
): RetryActionBase(conf, client) {
  alt_server_url_ = conf.getUntrackedParameter<std::string>("altServerUrl", "");
  alt_server_token_ = conf.getUntrackedParameter<std::string>("altServerToken", "");

  if (this->alt_server_url_.empty()) {
    edm::LogWarning("RetryActionDiffServer")
      << "No alternative server URL provided. "
      << "This retry action will be disabled.";
    this->shouldRetry_ = false;
  }
}

void RetryActionDiffServer::start() {
  this->shouldRetry_ = true;
}

void RetryActionDiffServer::retry() {
  if (!this->shouldRetry_ || this->alt_server_url_.empty()) {
    this->shouldRetry_ = false;
    edm::LogInfo("RetryActionDiffServer") << "No alternative server available for retry.";
    return;
  }

  try {
    TritonClient* tritonClient = static_cast<TritonClient*>(client_);
    edm::LogInfo("RetryActionDiffServer")
      << "Attempting retry by switching to server: "
      << this->alt_server_url_;
    tritonClient->connectToServer(this->alt_server_url_);
    eval();
  } catch (const std::exception& e) {
    edm::LogError("RetryActionDiffServer")
      << "Failed to retry with alternative server: "
      << e.what();
  }
  this->shouldRetry_ = false;
}

DEFINE_RETRY_ACTION(RetryActionDiffServer);
```
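`retry()` downcasts `client_` with `static_cast<TritonClient*>`, which is only well-defined if this action is always attached to a `TritonClient`. If that invariant might not hold, one checked alternative is `dynamic_cast`, which yields `nullptr` for a null pointer or a mismatched dynamic type (it requires a polymorphic base class). The sketch below illustrates this with hypothetical stand-ins `ClientBase`, `RemoteClient` and `OtherClient`; it is a possible variant, not what this PR implements.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: a polymorphic base (playing SonicClientBase) and
// two concrete clients. Not the CMSSW classes.
class ClientBase {
public:
  virtual ~ClientBase() = default;
};

class RemoteClient : public ClientBase {
public:
  void connectToServer(const std::string& url) { url_ = url; }
  const std::string& url() const { return url_; }

private:
  std::string url_;
};

class OtherClient : public ClientBase {};

// Checked downcast: dynamic_cast yields nullptr for a null pointer or a
// different dynamic type, so the caller can bail out instead of invoking UB.
bool switchIfRemote(ClientBase* base, const std::string& altUrl) {
  auto* remote = dynamic_cast<RemoteClient*>(base);
  if (remote == nullptr) {
    return false;
  }
  remote->connectToServer(altUrl);
  return true;
}

// Usage sketch: only the RemoteClient is redirected.
bool runCheckedCastDemo() {
  RemoteClient remote;
  OtherClient other;
  const bool switched = switchIfRemote(&remote, "alt.example:8001");
  assert(remote.url() == "alt.example:8001");
  const bool skipped = !switchIfRemote(&other, "alt.example:8001");
  return switched && skipped;
}
```

The extra runtime type check costs little compared with a network round trip to a new server, and a `false` return gives `retry()` a place to log and disable itself rather than crash.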