* Update recipes.mdx
* Update creating.mdx
Line 172: Is this change correct? If not, what is currently available?
* Update enabling.mdx
Lines 115, 116, and 118: Should "modifying ops" be "modifying operations"?
* Update onnx-export.mdx
* Update deepsparse-engine.mdx
Lines 27-29: This link goes to a 404 page.
* Update scheduler.mdx
Line 30: The link goes to a 404 page.
* Update benchmarking.mdx
Line 12: Consistency issue: CLI is spelled out here, but not in other articles.
Line 116: What is the meaning of the icon? How is it entered in Markdown?
Lines 120 and 126: Is the "engine" the DeepSparse Engine? If so, it should be spelled out as such or "engine" should have an initial cap (Engine).
* Update numactl-utility.mdx
* Update benchmarking.mdx
* Update src/content/user-guide/deepsparse-engine.mdx
Co-authored-by: Robert Shaw <[email protected]>
* Update src/content/user-guide/deepsparse-engine/benchmarking.mdx
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Jeannie Finks <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
- In most cases, good performance will be found in the default options so it can be as simple as running the command with a SparseZoo model stub or your local ONNX model.
+ In most cases, good performance will be found in the default options so usage can be as simple as running the command with a SparseZoo model stub or your local ONNX model.

However, if you prefer to customize benchmarking for your personal use case, you can run `deepsparse.benchmark -h` or with `--help` to view your usage options:

CLI Arguments:
@@ -91,23 +91,23 @@ $ deepsparse.benchmark --help

> -x EXPORT_PATH, --export_path EXPORT_PATH
> Store results into a JSON file.
```

- 💡**PRO TIP**💡: save your benchmark results in a convenient JSON file!
+ **PRO TIP:** Save your benchmark results in a convenient JSON file.

- Example CLI command for benchmarking an ONNX model from the SparseZoo and saving the results to a `benchmark.json` file:
+ The following is an example CLI command for benchmarking an ONNX model from the SparseZoo and saving the results to a `benchmark.json` file:
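The command itself is collapsed in this diff view. A minimal sketch of what it can look like, using the `-x`/`--export_path` flag shown in the help output above (the SparseZoo stub below is illustrative; any valid stub or a local ONNX model path works):

```bash
# Illustrative only: the SparseZoo stub is a placeholder example;
# substitute any valid stub or a path to a local ONNX model.
deepsparse.benchmark \
  zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate \
  -x benchmark.json
```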
`src/content/user-guide/deepsparse-engine/numactl-utility.mdx` (8 additions, 8 deletions)
@@ -35,20 +35,20 @@ For more fine-grained control, **numactl** can be used to bind the process runni

Similarly, for a multi-socket system with N sockets and C physical CPUs per socket, the CPUs located on a single socket will range from K*C to ((K+1)*C)-1 where 0<=K<N. For multi-socket, multi-thread systems, the logical threads are separated by N*C. For example, for a two socket, two thread per CPU system with 8 cores per CPU, the logical threads for socket 0 would be numbered 0-7 and 16-23, and the threads for socket 1 would be numbered 8-15 and 24-31.
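As a quick aside (not part of the diff), this numbering can be restated in a few lines of Python; the helper below is hypothetical and simply encodes the arithmetic above:

```python
# Hypothetical helper restating the Intel-style numbering above:
# N sockets, C physical CPUs per socket, T hardware threads per CPU.
def socket_threads(n_sockets, cores_per_socket, threads_per_core, socket):
    """Return the logical thread IDs that live on one socket."""
    ids = []
    for t in range(threads_per_core):
        # Each additional hardware thread is offset by N*C from the previous.
        start = socket * cores_per_socket + t * n_sockets * cores_per_socket
        ids.extend(range(start, start + cores_per_socket))
    return ids

print(socket_threads(2, 8, 2, 0))  # threads 0-7 and 16-23
print(socket_threads(2, 8, 2, 1))  # threads 8-15 and 24-31
```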
- Given the architecture above, to run the DeepSparse Engine on the first four CPUs on the second socket, you would use the following:
+ Given the architecture above, to run the DeepSparse Engine on the first four CPUs on the second socket, you would use:

Appending `--preferred 1` is needed here since the DeepSparse Engine is being bound to CPUs on the second socket.
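The command itself is collapsed in this diff view. A hypothetical sketch of the shape it would take for the layout above, with a placeholder script name:

```bash
# Hypothetical invocation: CPUs 8-11 are the first four cores of socket 1
# in the 8-cores-per-socket layout above; run_model.py is a placeholder.
numactl --physcpubind=8-11 --preferred=1 python run_model.py
```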
- Note: When running on multiple sockets using a batch size that is evenly divisible by the number of sockets will yield the best performance.
+ **Note:** When running on multiple sockets, using a batch size that is evenly divisible by the number of sockets will yield the best performance.
## DeepSparse Engine and Thread Pinning

- When using **numactl** to specify which CPUs/sockets the engine is allowed to run on, there is no restriction as to which CPU a particular computation thread is executed on. A single thread of computation may run on one or more CPUs during the course of execution. This is desirable if the system is being shared between multiple processes so that idle CPU threads are not prevented from doing other work.
+ When using **numactl** to specify the CPUs/sockets on which the engine is allowed to run, there is no restriction as to the CPU on which a particular computation thread is executed. A single thread of computation may run on one or more CPUs during the course of execution. This is desirable if the system is being shared between multiple processes so that idle CPU threads are not prevented from doing other work.

However, the engine works best when threads are pinned (i.e., not allowed to migrate from one CPU to another). Thread pinning can be enabled using the `NM_BIND_THREADS_TO_CORES` environment variable. For example:
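The example is collapsed in this diff view; an illustrative invocation (the script name is a placeholder) would look like:

```bash
# Illustrative: enable thread pinning for one run; run_model.py stands in
# for whatever command launches the DeepSparse Engine.
NM_BIND_THREADS_TO_CORES=1 python run_model.py
```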
@@ -58,20 +58,20 @@ However, the engine works best when threads are pinned (i.e., not allowed to mig

- `NM_BIND_THREADS_TO_CORES` should be used with care since it forces the DeepSparse Engine to run on only the threads it has been allocated at startup. If any other process ends up running on the same threads, it could result in a major degradation of performance.
+ Use `NM_BIND_THREADS_TO_CORES` with care since it forces the DeepSparse Engine to run on only the threads it has been allocated at startup. If any other process ends up running on the same threads, it could result in a major degradation of performance.
- **Note:** The threads-to-cores mappings described above are specific to Intel only. AMD has a different mapping. For AMD, all the threads for a single core are consecutive, i.e., if each core has two threads and there are N cores, the threads for a particular core K are 2*K and 2*K+1. The mapping of cores to sockets is also straightforward, for a N socket system with C cores per socket, the cores for a particular socket S are numbered S*C to ((S+1)*C)-1.
+ **Note:** The threads-to-cores mappings described above are specific to Intel only. AMD has a different mapping. For AMD, all the threads for a single core are consecutive; that is, if each core has two threads and there are N cores, the threads for a particular core K are 2*K and 2*K+1. The mapping of cores to sockets is also straightforward. For an N socket system with C cores per socket, the cores for a particular socket S are numbered S*C to ((S+1)*C)-1.
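For contrast with the Intel sketch earlier, the AMD-style numbering in this note reduces to the following (hypothetical helpers, not part of the diff):

```python
# Hypothetical helpers restating the AMD-style numbering in the note.
def amd_core_threads(core, threads_per_core=2):
    """Threads of one core are consecutive: 2*K and 2*K+1 for two threads."""
    return [threads_per_core * core + t for t in range(threads_per_core)]

def amd_socket_cores(socket, cores_per_socket):
    """Cores of socket S run from S*C to ((S+1)*C)-1."""
    start = socket * cores_per_socket
    return list(range(start, start + cores_per_socket))

print(amd_core_threads(3))     # core 3 -> threads [6, 7]
print(amd_socket_cores(1, 8))  # socket 1 -> cores [8, 9, ..., 15]
```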
## Additional Notes

+ This displays the inventory of available sockets/CPUs on a system:

`numactl --hardware`

- Displays the inventory of available sockets/CPUs on a system.
+ This displays the resources available to the current process:

`numactl --show`

- Displays the resources available to the current process.

For further details about these and other parameters, see the man page on **numactl**:
`src/content/user-guide/deepsparse-engine/scheduler.mdx` (14 additions, 14 deletions)
@@ -9,38 +9,38 @@ index: 2000

This page explains the various settings for DeepSparse, which enable you to tune the performance to your workload.

- Schedulers are special system software which handle the distribution of work across cores in parallel computation.
- The goal of a good scheduler is to ensure that while work is available, cores aren’t sitting idle.
+ Schedulers are special system software, which handle the distribution of work across cores in parallel computation.
+ The goal of a good scheduler is to ensure that, while work is available, cores are not sitting idle.

On the contrary, as long as parallel tasks are available, all cores should be kept busy.
## Single Stream (Default)

In most use cases, the default scheduler is the preferred choice when running inferences with the DeepSparse Engine.

- It's highly optimized for minimum per-request latency, using all of the system's resources provided to it on every request it gets.
+ The default scheduler is highly optimized for minimum per-request latency, using all of the system's resources provided to it on every request it gets.

Often, particularly when working with large batch sizes, the scheduler is able to distribute the workload of a single request across as many cores as it's provided.

*Single-stream scheduling; requests execute serially by default:*
- However, there are circumstances in which more cores does not imply better performance. If the computation can't be divided up to produce enough parallelism (while maximizing use of the CPU cache), then adding more cores simply adds more compute power with little to apply it to.
+ There are circumstances in which more cores does not imply better performance. If the computation can't be divided up to produce enough parallelism (while maximizing use of the CPU cache), then adding more cores simply adds more compute power with little to apply it to.

- An alternative, "multi-stream" scheduler is provided with the software. In cases where parallelism is low, sending multiple requests simultaneously can more adequately saturate the available cores. In other words, if speedup can't be achieved by adding more cores, then perhaps speedup can be achieved by adding more work.
+ An alternative, multi-stream scheduler is provided with the software. In cases where parallelism is low, sending multiple requests simultaneously can more adequately saturate the available cores. In other words, if speedup can't be achieved by adding more cores, then perhaps speedup can be achieved by adding more work.

- If increasing core count doesn't decrease latency, that's a strong indicator that parallelism is low in your particular model/batch-size combination. It may be that total throughput can be increased by making more requests simultaneously. Using the [deepsparse.engine.Scheduler API,](https://docs.neuralmagic.com/deepsparse/api/deepsparse.html) the multi-stream scheduler can be selected, and requests made by multiple Python threads will be handled concurrently.
+ If increasing core count does not decrease latency, that's a strong indicator that parallelism is low in your particular model/batch-size combination. It may be that total throughput can be increased by making more requests simultaneously. Using the [deepsparse.engine.Scheduler API,](https://docs.neuralmagic.com/deepsparse/api/deepsparse.html) the multi-stream scheduler can be selected, and requests made by multiple Python threads will be handled concurrently.

- *Multi-stream scheduling; requests execute in parallel and may utilize HW resources better:*
+ *Multi-stream scheduling; requests execute in parallel and may better utilize hardware resources:*
- Whereas the default scheduler will queue up requests made simultaneously and handle them serially, the multi-stream scheduler allows multiple requests to be run in parallel. The `num_streams` argument to the Engine/Context classes controls how the multi-streams scheduler partitions up the machine. Each stream maps to a contiguous set of hardware threads. By default, only one hyperthread per core is used. There is no sharing amongst the partitions and it is generally good practice make sure that the `num_streams` value evenly divides into your number of cores. By default `num_streams` is set to multiplex requests across L3 caches.
+ Whereas the default scheduler will queue up requests made simultaneously and handle them serially, the multi-stream scheduler allows multiple requests to be run in parallel. The `num_streams` argument to the Engine/Context classes controls how the multi-streams scheduler partitions up the machine. Each stream maps to a contiguous set of hardware threads. By default, only one hyperthread per core is used. There is no sharing amongst the partitions and it is generally good practice to make sure the `num_streams` value evenly divides into your number of cores. By default `num_streams` is set to multiplex requests across L3 caches.
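As an aside (not part of the diff), a minimal sketch of selecting the multi-stream scheduler from Python, assuming `Engine` and `Scheduler` are importable from the top-level `deepsparse` package and that `Engine` accepts the `scheduler` and `num_streams` arguments this paragraph describes; the model path is a placeholder:

```python
from deepsparse import Engine, Scheduler

# Sketch: partition the machine into two streams so two requests
# can run concurrently. "model.onnx" is a placeholder path.
engine = Engine(
    model="model.onnx",
    batch_size=1,
    scheduler=Scheduler.multi_stream,
    num_streams=2,  # good practice: evenly divides the core count
)
```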
- Here's an example: Consider a machine with 2 sockets, each with 8 cores. In this case the multi-stream scheduler will create two streams, one per socket by default. The first stream will contain cores 0-7 and the second stream will contain cores 8-15.
+ Here's an example. Consider a machine with 2 sockets, each with 8 cores. In this case, the multi-stream scheduler will create two streams, one per socket by default. The first stream will contain cores 0-7 and the second stream will contain cores 8-15.
- Manually increasing `num_streams` to 3 will result in the following stream breakdown: threads 0-5 in the first stream, 6-10 in the second, and 11-15 in the last. This is problematic for our two socket system. The second stream (threads 6-10) is straddling both sockets, meaning that each request being serviced by that stream is going to incur a performance penalty each time one of its threads makes a remote memory access. The impact of this penalty will depend on the workload, but it will likely be significant.
+ Manually increasing `num_streams` to 3 will result in the following stream breakdown: threads 0-5 in the first stream, 6-10 in the second, and 11-15 in the last. This is problematic for our 2-socket system. The second stream (threads 6-10) is straddling both sockets, meaning that each request being serviced by that stream is going to incur a performance penalty each time one of its threads makes a remote memory access. The impact of this penalty will depend on the workload, but it will likely be significant.
- Manually increasing `num_streams` to 4 is interesting. Here's the stream breakdown: threads 0-3 in the first stream, 4-7 in the second, 8-11 in the third, and 12-15 in the fourth. Each stream is only making memory accesses that are local to its socket which is good. However, the first two and last two streams are sharing the same L3 cache which can result in worse performance due to cache thrashing. Depending on the workload, the performance gain from the increased parallelism may negate this penalty, though.
+ Manually increasing `num_streams` to 4 is interesting. Here's the stream breakdown: threads 0-3 in the first stream, 4-7 in the second, 8-11 in the third, and 12-15 in the fourth. Each stream is only making memory accesses that are local to its socket, which is good. However, the first two and last two streams are sharing the same L3 cache, which can result in worse performance due to cache thrashing. Depending on the workload, though, the performance gain from the increased parallelism may negate this penalty.
The most common use cases for the multi-stream scheduler are where parallelism is low with respect to core count, and where requests need to be made asynchronously without time to batch them. Implementing a model server may fit such a scenario and be ideal for using multi-stream scheduling.
@@ -52,12 +52,12 @@ Depending on your engine execution strategy, enable one of these options by runn