Commit 94c905b

add control to force non-profiling command queues (#162)
This can be used for performance analysis, but may cause errors if an application requires event profiling. If this control is set together with another control that itself requires event profiling, such as DevicePerformanceTiming, the other control currently takes precedence. This precedence may change in the future and should not be relied upon.
1 parent 0f96dfd commit 94c905b

4 files changed: 15 additions and 4 deletions.

docs/controls.md

Lines changed: 4 additions & 0 deletions
@@ -584,6 +584,10 @@ If set to a nonzero value, the Intercept Layer for OpenCL Applications inserts a
 
 If set to a nonzero value, the Intercept Layer for OpenCL Applications will force all queues to be created in-order. This can be used for performance analysis, but may lead to deadlocks in some cases.
 
+##### `NoProfilingQueue` (bool)
+
+If set to a nonzero value, the Intercept Layer for OpenCL Applications will force all queues to be created without event profiling support. This can be used for performance analysis, but may lead to errors if the application requires event profiling.
+
 ##### `NullEnqueue` (bool)
 
 If set to a nonzero value, the Intercept Layer for OpenCL Applications will silently ignore any enqueue. This can be used for performance analysis, but will likely cause errors if the application relies on any sort of information from OpenCL events and should be used carefully.

intercept/src/controls.h

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ CLI_CONTROL( bool, FinishAfterEnqueue, false, "If s
 CLI_CONTROL( bool, FlushAfterEnqueue, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications inserts a call to clFlush() after every enqueue. The command queue that the command was just enqueued to is passed to clFlush(). This can also be used to debug possible timing or resource management issues and is slightly less obtrusive than FinishAfterEnqueue but still will likely impact performance. If both FinishAfterEnqueue and FlushAfterEnqueue are nonzero then the Intercept Layer for OpenCL Applications will only insert a call to clFinish() after every enqueue, because clFinish() implies clFlush()." )
 CLI_CONTROL( bool, FlushAfterEnqueueBarrier, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications inserts a call to clFlush() after every barrier enqueue. The command queue that the command was just enqueued to is passed to clFlush(). This has been useful to debug out-of-order queue issues." )
 CLI_CONTROL( bool, InOrderQueue, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications will force all queues to be created in-order. This can be used for performance analysis, but may lead to deadlocks in some cases." )
+CLI_CONTROL( bool, NoProfilingQueue, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications will force all queues to be created without event profiling support. This can be used for performance analysis, but may lead to errors if the application requires event profiling." )
 CLI_CONTROL( bool, NullEnqueue, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications will silently ignore any enqueue. This can be used for performance analysis, but will likely cause errors if the application relies on any sort of information from OpenCL events and should be used carefully." )
 CLI_CONTROL( bool, NullLocalWorkSize, false, "If set to a nonzero value, the Intercept Layer for OpenCL Applications will force the local work size argument to clEnqueueNDRangeKernel() to be NULL, which causes the OpenCL implementation to pick the local work size. Note that this control takes effect before NullLocalWorkSizeX / NullLocalWorkSizeY / NullLocalWorkSizeZ (see below), so enabling both controls will have the effect of forcing a specific local work size." )
 CLI_CONTROL( size_t, NullLocalWorkSizeX, 0, "If set to a nonzero value, the Intercept Layer for OpenCL Applications will set the local work size that will be used if an application passes NULL as the local work size to clEnqueueNDRangeKernel(). 1D dispatches will only look at NullLocalWorkSizeX, 2D dispatches will only look at NullLocalWorkSizeX and NullLocalWorkSizeY, while 3D dispatches will look at NullLocalWorkSizeX, NullLocalWorkSizeY, and NullLocalWorkSizeZ. If the specified values for NullLocalWorkSize do not evenly divide the global work size then the specified values of NullLocalWorkSize will not take effect." )

intercept/src/intercept.cpp

Lines changed: 8 additions & 4 deletions
@@ -4495,6 +4495,14 @@ void CLIntercept::updateHostTimingStats(
 void CLIntercept::modifyCommandQueueProperties(
     cl_command_queue_properties& props ) const
 {
+    if( config().InOrderQueue )
+    {
+        props &= ~(cl_command_queue_properties)CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE;
+    }
+    if( config().NoProfilingQueue )
+    {
+        props &= ~(cl_command_queue_properties)CL_QUEUE_PROFILING_ENABLE;
+    }
     if( config().DevicePerformanceTiming ||
         config().ITTPerformanceTiming ||
         config().ChromePerformanceTiming ||
@@ -4503,10 +4511,6 @@ void CLIntercept::modifyCommandQueueProperties(
     {
         props |= (cl_command_queue_properties)CL_QUEUE_PROFILING_ENABLE;
     }
-    if( config().InOrderQueue )
-    {
-        props &= ~(cl_command_queue_properties)CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE;
-    }
 }
 
 ///////////////////////////////////////////////////////////////////////////////
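
The ordering in this hunk is what produces the override behavior described in the commit message: the bit-clearing checks run first and the profiling-enable check runs last, so a control that needs profiling wins over `NoProfilingQueue`. The following is a minimal standalone sketch of that ordering, not the layer's actual code. The `Controls` struct, `modifyProps`, and the `k...` constants are hypothetical names; the constants stand in for the OpenCL bits so the sketch compiles without the CL headers, and only one of the profiling-dependent controls is modelled.

```cpp
#include <cstdint>

// Local stand-ins (assumptions for this sketch) for the OpenCL bitfield type and bits.
using queue_props_t = std::uint64_t;                 // stands in for cl_command_queue_properties
constexpr queue_props_t kOutOfOrder = 1u << 0;       // stands in for CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr queue_props_t kProfiling  = 1u << 1;       // stands in for CL_QUEUE_PROFILING_ENABLE

// Hypothetical subset of the controls touched by this commit.
struct Controls {
    bool InOrderQueue;
    bool NoProfilingQueue;
    bool DevicePerformanceTiming;
};

// Mirrors the order of checks in the hunk: clear bits first, set bits last.
queue_props_t modifyProps(const Controls& c, queue_props_t props) {
    if (c.InOrderQueue) {
        props &= ~kOutOfOrder;   // force in-order queues
    }
    if (c.NoProfilingQueue) {
        props &= ~kProfiling;    // strip event profiling support
    }
    if (c.DevicePerformanceTiming) {
        props |= kProfiling;     // runs last, so it re-enables profiling
    }
    return props;
}
```

With only `NoProfilingQueue` set, a queue requested with profiling comes back without the profiling bit. With both `NoProfilingQueue` and `DevicePerformanceTiming` set, the profiling bit ends up set, which matches the precedence the commit message warns may change.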

intercept/src/intercept.h

Lines changed: 2 additions & 0 deletions
@@ -2172,6 +2172,7 @@ inline bool CLIntercept::checkDevicePerformanceTimingEnqueueLimits() const
 pIntercept->config().SIMDSurvey || \
 pIntercept->config().DevicePerfCounterEventBasedSampling || \
 pIntercept->config().InOrderQueue || \
+pIntercept->config().NoProfilingQueue || \
 pIntercept->config().DefaultQueuePriorityHint || \
 pIntercept->config().DefaultQueueThrottleHint ) \
 { \
@@ -2188,6 +2189,7 @@ inline bool CLIntercept::checkDevicePerformanceTimingEnqueueLimits() const
 pIntercept->config().SIMDSurvey || \
 pIntercept->config().DevicePerfCounterEventBasedSampling || \
 pIntercept->config().InOrderQueue || \
+pIntercept->config().NoProfilingQueue || \
 pIntercept->config().DefaultQueuePriorityHint || \
 pIntercept->config().DefaultQueueThrottleHint ) \
 { \
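
Both hunks add `NoProfilingQueue` to a chain of `||` checks inside a macro. The rest of the macro is not visible in this diff, so the following is only a sketch of the apparent pattern, assuming the condition gates whether the requested queue properties are rewritten at all. `QueueControls` and `needsQueuePropertyOverride` are hypothetical names, and the hint controls are modelled as plain flags, which is also an assumption.

```cpp
// Hypothetical subset of the controls named in the hunks above.
struct QueueControls {
    bool InOrderQueue;
    bool NoProfilingQueue;
    bool DefaultQueuePriorityHint;   // modelled as a flag for this sketch
    bool DefaultQueueThrottleHint;   // modelled as a flag for this sketch
};

// True when at least one control asks for the queue properties to be changed.
// A control missing from this chain would never trigger the rewrite on its own.
inline bool needsQueuePropertyOverride(const QueueControls& c) {
    return c.InOrderQueue ||
           c.NoProfilingQueue ||
           c.DefaultQueuePriorityHint ||
           c.DefaultQueueThrottleHint;
}
```

Under that assumption, the `intercept.cpp` change alone would not be enough: without the new entry in the gate, setting only `NoProfilingQueue` would leave the application's original properties untouched.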