Commit 8744f7d

Merge 4.2 release branch into amd-main
2 parents c0b8d2a + f542744 commit 8744f7d

183 files changed: +58796 -15716 lines changed

CMakeLists.txt

Lines changed: 407 additions & 170 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 112 additions & 21 deletions
@@ -8,6 +8,8 @@ compression and decompression methods which facilitate the applications to
 easily integrate and use them.
 AOCL-Compression supports lz4, zlib/deflate, lzma, zstd, bzip2, snappy, and lz4hc
 based compression and decompression methods along with their native APIs.
+The library offers openMP based multi-threaded implementation of lz4, zlib,
+zstd and snappy compression methods.
 It supports the dynamic dispatcher feature that executes the most optimal
 function variant implemented using Function Multi-versioning thereby offering
 a single optimized library portable across different x86 CPU architectures.
@@ -23,7 +25,7 @@ Installation
 ------------
 
 1. Download the latest stable release from the Github repository:<br>
-https://github.amd.com/AOCL/aocl-compression
+https://github.com/amd/aocl-compression
 2. Install CMake on the machine where the sources are to be compiled.
 3. Make any one of the compilers GCC or Clang available on the machine.
 4. Then, use the cmake based build system to compile and generate AOCL-Compression <br>
@@ -90,7 +92,11 @@ Building with Visual Studio IDE (GUI)
 Microsoft Visual Studio project is generated.
 6. Click __Open Project__.
 Microsoft Visual Studio project for the source package __is launched__.
-7. Build the entire solution or the required projects.
+7. For building multi-threaded library based on AOCL_ENABLE_THREADS, set the
+LLVM openMP library path in the Linker->General option and openMP library name
+in the Linker->Input under the project properties. Set /openmp as the additional
+compilation option.
+8. Build the entire solution or the required projects.
 
 Building with Visual Studio IDE (command line)
 ----------------------------------------------
@@ -112,20 +118,29 @@ AOCL_LZ4_OPT_PREFETCH_BACKWARDS | Enable LZ4 optimizations related to backw
 SNAPPY_MATCH_SKIP_OPT | Enable Snappy match skipping optimization (Disabled by default)
 LZ4_FRAME_FORMAT_SUPPORT | Enable building LZ4 with Frame format and API support (Enabled by default)
 AOCL_LZ4HC_DISABLE_PATTERN_ANALYSIS | Disable Pattern Analysis in LZ4HC for level 9 (Enabled by default)
-AOCL_ZSTD_4BYTE_LAZY2_MATCH_FINDER | Enable 4-byte comparison for finding a potential better match candidate with Lazy2 compressor (Disabled by default)
+AOCL_ZSTD_SEARCH_SKIP_OPT_DFAST_FAST| Enable ZSTD match skipping optimization, and reduce search strength/tolerance for levels 1-4 (Disabled by default)
+AOCL_ZSTD_WILDCOPY_LONG | Faster wildcopy when match lengths are long in ZSTD decompression (Disabled by default)
 AOCL_TEST_COVERAGE | Enable GTest and AOCL test bench based CTest suite (Disabled by default)
+AOCL_ENABLE_LOG_FEATURE | Enables logging through environment variable `AOCL_ENABLE_LOG` (Disabled by default)
+CODE_COVERAGE | Enable source code coverage. Only supported on Linux with the GCC compiler (Disabled by default)
+ASAN | Enable Address Sanitizer checks. Only supported on Linux/Debug build (Disabled by default)
+VALGRIND | Enable Valgrind checks. Only supported on Linux/Debug and incompatible with ASAN=ON (Disabled by default)
 BUILD_DOC | Build documentation for this library (Disabled by default)
-ZLIB_DEFLATE_FAST_MODE_2 | Enable optimization for deflate fast using Z_FIXED strategy. Do not combine with ZLIB_DEFLATE_FAST_MODE_3 (Disabled by default)
-ZLIB_DEFLATE_FAST_MODE_3 | Enable ZLIB deflate quick strategy. Do not combine with ZLIB_DEFLATE_FAST_MODE_2 (Disabled by default)
+ZLIB_DEFLATE_FAST_MODE | Enable ZLIB deflate quick strategy (Disabled by default)
 AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 | Enable LZ4 match skipping optimization strategy-1 based on a larger base step size applied for long distance search (Disabled by default)
 AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 | Enable LZ4 match skipping optimization strategy-2 by aggressively setting search distance on top of strategy-1. Preferred to be used with Silesia corpus (Disabled by default)
+AOCL_LZ4_NEW_PRIME_NUMBER | Enable the usage of a new prime number for LZ4 hashing function. Preferred to be used with Silesia corpus (Disabled by default)
+AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES | Enable storing of additional potential matches to improve compression ratio. Recommended for higher compressibility use cases (Disabled by default)
+AOCL_LZ4_HASH_BITS_USED | Control the number of bits used for LZ4 hashing, allowed values are LOW (low perf gain and less CR regression) and HIGH (high perf gain and high CR regression) (Disabled by default)
 AOCL_EXCLUDE_BZIP2 | Exclude BZIP2 compression method from the library build (Disabled by default)
 AOCL_EXCLUDE_LZ4 | Exclude LZ4 compression method from the library build. LZ4HC also gets excluded (Disabled by default)
 AOCL_EXCLUDE_LZ4HC | Exclude LZ4HC compression method from the library build (Disabled by default)
 AOCL_EXCLUDE_LZMA | Exclude LZMA compression method from the library build (Disabled by default)
 AOCL_EXCLUDE_SNAPPY | Exclude SNAPPY compression method from the library build (Disabled by default)
 AOCL_EXCLUDE_ZLIB | Exclude ZLIB compression method from the library build (Disabled by default)
 AOCL_EXCLUDE_ZSTD | Exclude ZSTD compression method from the library build (Disabled by default)
+AOCL_XZ_UTILS_LZMA_API_EXPERIMENTAL | Build with xz utils lzma APIs. Experimental feature with limited API support (Disabled by default)
+AOCL_ENABLE_THREADS | Enable multi-threaded compression and decompression using SMP based openMP threads (Disabled by default)
 
 Running AOCL-Compression Test Bench On Linux
 --------------------------------------------
@@ -165,18 +180,41 @@ Here, 5 is the level and 0 is the additional parameter passed to ZSTD method.
 Here, 5 is the level and 0 is the additional parameter passed to ZSTD method.
 
 
-* To run the test bench with error/debug/trace/info logs, use the command:<br>
-`aocl_compression_bench -a -t -v <input filename>`<br>
-Here, `-v` can be passed with a number such as v<n> that can take values:
-* 1 for Error (default)
-* 2 for Info
-* 3 for Debug
-* 4 for Trace.
-
+* To run the test bench with error/debug/trace/info logs, build the library by using `-DAOCL_ENABLE_LOG_FEATURE=ON` & set the environment variable `AOCL_ENABLE_LOG` to any of the following:<br>
+* `AOCL_ENABLE_LOG=ERR` for Error logs.
+* `AOCL_ENABLE_LOG=INFO` for Error, Info logs.
+* `AOCL_ENABLE_LOG=DEBUG` for Error, Info, Debug logs.
+* `AOCL_ENABLE_LOG=TRACE` for Error, Info, Debug, Trace logs.<br>
+Note: When building the library for highest performance, do not enable `DAOCL_ENABLE_LOG_FEATURE`.
+
+
+* To run the test bench but only compression or decompression <br>
+for a given input file, use the command:<br>
+`aocl_compression_bench -rcompress <input filename>` or <br>
+`aocl_compression_bench -rdecompress -ezstd <compressed input filename>` or <br>
+`aocl_compression_bench -rdecompress -ezstd -t -f<uncompressed file for validation> <compressed input filename>` <br>
+Note: In -rdecompress mode, compression method must be specified using -e option. <br>
+If validation of decompressed data is needed, specify -t and -f options additionally.
+
+* To run the test bench and dump output data generated <br>
+for a given input file, use the command:<br>
+`aocl_compression_bench -d<dump filename> -ezstd:1 <input filename>` or <br>
+`aocl_compression_bench -d<dump filename> -rcompress -ezstd:1 <input filename>` or <br>
+`aocl_compression_bench -d<dump filename> -rdecompress -ezstd <compressed input filename>` <br>
+Here, when -rcompress operation is selected, compressed file gets dumped <br>
+and when -rdecompress operation is selected, decompressed file gets dumped. <br>
+Method name and level must be specified using -e for default and -rcompress modes. <br>
+Method name must be specified using -e for -rdecompress mode. <br>
+
+* NOTE: <br>
+1. Compression and decompression of large files (>1GB) are supported in the test bench. <br>
+2. Decompression of compressed files (> 1GB) that are not generated by aocl-compression <br>
+is not guaranteed by the test bench. <br>
+
 ---
 
 To test and benchmark the performance of IPP's compression methods, use the
-test bench option `-c` along with other relevant options (as explained above).
+test bench option `-c<path to IPP library method>` along with other relevant options (as explained above).
 IPP's lz4, lz4hc, zlib and bzip2 methods are supported by the test bench.
 Check the following details for the exact steps:
 1. Set the library path environment variable (export LD_LIBRARY_PATH on <br>
@@ -190,15 +228,15 @@ Check the following details for the exact steps:
 4. Build the patched IPP lz4, zlib and bzip2 libraries per the steps <br>
 in the IPP readme files in the corresponding patch file <br>
 locations for these compression methods.
-5. Set the library path environment variable (export LD_LIBRARY_PATH on <br>
-Linux) to point to the patched IPP lz4, zlib and bzip2 libraries.
+5. Append the library path to `-c` option and pass it to executable as command line argument <br>
+(Linux is only supported) for running patched IPP lz4, zlib and bzip2 libraries.
 6. Run the test bench to benchmark the IPP library methods as follows:
 ```
-aocl_compression_bench -a -p -c <input filename>
-aocl_compression_bench -elz4 -p -c <input filename>
-aocl_compression_bench -elz4hc -p -c <input filename>
-aocl_compression_bench -ezlib -p -c <input filename>
-aocl_compression_bench -ebzip2 -p -c <input filename>
+aocl_compression_bench -a -p -c/path/to/ipp_patch <input filename>
+aocl_compression_bench -elz4 -p -c/path/to/ipp_patch <input filename>
+aocl_compression_bench -elz4hc -p -c/path/to/ipp_patch <input filename>
+aocl_compression_bench -ezlib -p -c/path/to/ipp_patch <input filename>
+aocl_compression_bench -ebzip2 -p -c/path/to/ipp_patch <input filename>
 ```
 
 Running AOCL-Compression Test Bench On Windows
@@ -229,6 +267,27 @@ Following are a few sample commands that can be executed in the build directory
 To run GTest test cases for a specific method<br>
 `ctest -R <METHOD_NAME_IN_CAPITALS>`
 
+Running source code coverage using GCOV
+---------------------------------------
+
+To measure source code coverage, use CODE_COVERAGE option while configuring the CMake build. Run CMake with the custom target option 'code-coverage' to execute tests and generate code coverage data. The code coverage reports are generated in the build directory under subdirectory called 'coverage/html_report'. Open the HTML files in browser to view the coverage information.
+
+Following is the sample command usage to run code coverage:
+`cmake --build <build directory> --target install code-coverage`
+
+Running Valgrind and ASAN memory checks using CTest
+---------------------------------------------------
+
+Use VALGRIND option for Valgrind memory check and ASAN option for ASAN memory check while configuring the CMake build. VALGRIND and ASAN options can not be enabled together.
+
+Following are the commands to execute in the 'build' directory to run memory checks.
+
+To run Valgrind memory check<br>
+`ctest -T memcheck`
+
+To run ASAN memory check<br>
+`ctest`
+
 Running Performance Benchmarking
 --------------------------------
 
@@ -253,6 +312,38 @@ Generating Documentation
 - Documents will be generated in HTML format in the folder __docs/html__ . Open the index.html file in any browser to view the documentation.
 - CMake will use the existing Doxygen if available. Else, it will prompt the user to install doxygen and try again.
 
+Enabling/disabling optimizations
+--------------------------------
+- AOCL optimizations can be disabled by setting the environment variable AOCL_DISABLE_OPT to ON.
+- Reference code paths are taken in such a scenario.
+- This needs to be set before launching the application for it to take effect.
+- If optimization is turned off via aocl_compression_desc::optOff (= 1) passed to aocl_llc_setup(), then reference code paths are taken.
+- If optimization is turned on via aocl_compression_desc::optOff (= 0) passed to aocl_llc_setup(), then AOCL_DISABLE_OPT is checked
+additionally to override aocl_compression_desc::optOff value.
+
+Enabling specific instructions (ISA)
+------------------------------------
+- AOCL optimizations can be restricted to certain ISAs by setting the environment variable
+AOCL_ENABLE_INSTRUCTIONS. Supported values are SSE2, AVX, AVX2 and AVX512.
+- This ensures optimized code paths with ISAs above the set value are not taken. E.g. If
+it is set to AVX, no AVX2 and AVX512 optimized code paths are taken.
+- This needs to be set before launching the application for it to take effect.
+- It takes precedence over aocl_compression_desc::optLevel setting passed to aocl_llc_setup().
+- Note: When calling aocl_llc_setup() API from multiple threads, changing aocl_compression_desc::optOff
+and aocl_compression_desc::optLevel values between threads can lead to undefined behaviour.
+
+Multi-threaded Compression and Decompression
+--------------------------------------------
+- Parallel compression and decompression of lz4, zlib, zstd and snappy is implemented using
+openMP multi-threading. A RAP (random access point) frame is introduced in AOCL-Compression
+to support parallel decompression of the compressed streams/files. Use AOCL_ENABLE_THREADS
+config option to enable the multi-threading.
+- A stream compressed with multi-threaded AOCL-Compression library can be decompressed using any
+single-threaded standard decompressor by simply skipping the initial block of bytes containing
+the RAP frame present at the start of the stream.
+- The multi-threaded compression support is optimally tuned for AMD CPUs on Linux® OS whereas
+this support is experimental for Windows® platforms.
+
 
 CONTACTS
 --------
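
Reading note on the README sections "Enabling/disabling optimizations" and "Enabling specific instructions (ISA)" in this diff: the precedence rules they describe can be sketched in C as follows. This is an illustration under stated assumptions, not the library's code. The helper names `isa_name_to_level`, `resolve_opt_off` and `resolve_opt_level` are hypothetical; the numeric levels follow the case labels in `algos/bzip2/blocksort.c` (1 = SSE, 2 = AVX, 3 = AVX2), with 4 assumed for AVX512; and the sketch assumes that a recognized `AOCL_ENABLE_INSTRUCTIONS` value simply replaces the requested `optLevel`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mapping from an AOCL_ENABLE_INSTRUCTIONS value to an optLevel.
 * Levels 1..3 mirror the case labels in blocksort.c; 4 for AVX512 is an assumption.
 * Returns -1 when the variable is unset or holds an unrecognized value. */
int isa_name_to_level(const char *name)
{
    if (name == NULL) return -1;
    if (strcmp(name, "SSE2") == 0) return 1;
    if (strcmp(name, "AVX") == 0) return 2;
    if (strcmp(name, "AVX2") == 0) return 3;
    if (strcmp(name, "AVX512") == 0) return 4;
    return -1;
}

/* optOff = 1 from the caller always selects the reference path.
 * optOff = 0 can still be overridden by AOCL_DISABLE_OPT=ON. */
int resolve_opt_off(int requested_opt_off, const char *disable_env)
{
    if (requested_opt_off) return 1;
    return disable_env != NULL && strcmp(disable_env, "ON") == 0;
}

/* A recognized environment value takes precedence over the requested level
 * (assumed here to replace it outright). */
int resolve_opt_level(int requested_level, const char *isa_env)
{
    int env_level = isa_name_to_level(isa_env);
    return env_level >= 0 ? env_level : requested_level;
}

/* Convenience wrapper that reads the two environment variables named in the README. */
void resolve_from_environment(int *opt_off, int *opt_level)
{
    *opt_off = resolve_opt_off(*opt_off, getenv("AOCL_DISABLE_OPT"));
    *opt_level = resolve_opt_level(*opt_level, getenv("AOCL_ENABLE_INSTRUCTIONS"));
}
```

The environment strings are passed in as parameters so the rules can be checked without touching the process environment; `resolve_from_environment` shows how the two pieces might be combined before a call such as `aocl_llc_setup()`.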

algos/bzip2/blocksort.c

Lines changed: 16 additions & 11 deletions
@@ -19,7 +19,7 @@
 in the file LICENSE.
 ------------------------------------------------------------------ */
 
-
+#include "utils/utils.h"
 #include "bzlib_private.h"
 
 /*---------------------------------------------*/
@@ -762,9 +762,7 @@ void AOCL_mainSimpleSort ( UInt32* ptr,
 
 #endif
 
-#ifdef AOCL_DYNAMIC_DISPATCHER
-
-void (*AOCL_mainSimpleSort_fp) ( UInt32* ptr,
+void (*AOCL_mainSimpleSort_fp) ( UInt32* ptr,
 UChar* block,
 UInt16* quadrant,
 Int32 nblock,
@@ -773,7 +771,7 @@ void (*AOCL_mainSimpleSort_fp) ( UInt32* ptr,
 Int32 d,
 Int32* budget ) = mainSimpleSort;
 
-void aocl_register_mainSimpleSort_fmv(int optOff, int optLevel, size_t insize, size_t level, size_t windowLog)
+void aocl_register_mainSimpleSort_fmv(int optOff, int optLevel)
 {
 if (optOff)
 {
@@ -783,19 +781,30 @@ void aocl_register_mainSimpleSort_fmv(int optOff, int optLevel, size_t insize, s
 {
 switch (optLevel)
 {
+case -1: // undecided. use defaults based on compiler flags
+#ifdef AOCL_BZIP2_OPT
+AOCL_mainSimpleSort_fp = AOCL_mainSimpleSort;
+#else
+AOCL_mainSimpleSort_fp = mainSimpleSort;
+#endif
+break;
+#ifdef AOCL_BZIP2_OPT
 case 0://C version
 case 1://SSE version
 case 2://AVX version
 case 3://AVX2 version
 default://AVX512 and other versions
 AOCL_mainSimpleSort_fp = AOCL_mainSimpleSort;
 break;
+#else
+default:
+AOCL_mainSimpleSort_fp = mainSimpleSort;
+break;
+#endif
 }
 }
 }
 
-#endif
-
 /*---------------------------------------------*/
 /*--
 The following is an implementation of
@@ -890,11 +899,7 @@ void mainQSort3 ( UInt32* ptr,
 if (hi - lo < MAIN_QSORT_SMALL_THRESH ||
 d > MAIN_QSORT_DEPTH_THRESH) {
 #ifdef AOCL_BZIP2_OPT
-#ifdef AOCL_DYNAMIC_DISPATCHER
 AOCL_mainSimpleSort_fp ( ptr, block, quadrant, nblock, lo, hi, d, budget );
-#else
-AOCL_mainSimpleSort ( ptr, block, quadrant, nblock, lo, hi, d, budget );
-#endif
 #else
 mainSimpleSort ( ptr, block, quadrant, nblock, lo, hi, d, budget );
 #endif
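
Reading note on the `blocksort.c` change: the function pointer is now always defined, and registration chooses between the reference and optimized variants from `optOff`, `optLevel` and a build flag. The pattern can be sketched in isolation as below. This is a simplified illustration, not the file's code: `reference_variant`, `optimized_variant`, `variant_fp`, `register_variant` and `DEMO_OPT` are hypothetical stand-ins for `mainSimpleSort`, `AOCL_mainSimpleSort`, `AOCL_mainSimpleSort_fp`, `aocl_register_mainSimpleSort_fmv` and `AOCL_BZIP2_OPT`. The body of the `optOff` branch is hidden in the diff; the sketch assumes it selects the reference variant, as the README states for `optOff = 1`.

```c
/* Stand-ins for the two sort implementations; each only reports which one ran. */
int reference_variant(void) { return 0; }
int optimized_variant(void) { return 1; }

/* Points at the reference variant until registration runs, as in the diff. */
int (*variant_fp)(void) = reference_variant;

/* Hypothetical stand-in for the AOCL_BZIP2_OPT build flag. */
#define DEMO_OPT 1

void register_variant(int optOff, int optLevel)
{
    if (optOff) {
        /* Assumed behavior: optimizations off means the reference path. */
        variant_fp = reference_variant;
        return;
    }
    switch (optLevel) {
    case -1: /* undecided: fall back to the build-time default */
#if DEMO_OPT
        variant_fp = optimized_variant;
#else
        variant_fp = reference_variant;
#endif
        break;
#if DEMO_OPT
    default: /* every explicit level maps to the single optimized variant */
        variant_fp = optimized_variant;
        break;
#else
    default: /* optimized variant not compiled in: always use the reference */
        variant_fp = reference_variant;
        break;
#endif
    }
}
```

Callers then invoke `variant_fp()` unconditionally, which matches the simplified call site in `mainQSort3` once the `AOCL_DYNAMIC_DISPATCHER` guard is removed.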
