Add FP32 Armpl Blas optimization [MOD-9011] #613
Closed
140 commits
All commits are by dor-forer.

- 1171d17 Add arm support
- 8102ad1 Changed the arm cpu info
- 0504e08 Add ip test
- ba931d0 Add to tests
- e0642c8 Added tests andbm
- 4b8c347 fix tests
- 3039eb8 Add github benchmakrs
- 9a67ee8 Check 1
- a9b87d4 only arm
- da3c880 change ami
- 1fdb6d5 Try ireland
- b4302e1 Try different image
- a83947a try image
- a698070 back to old image
- 730d8ac larger image
- 38371c5 Add option to change env
- 202a89d back to default region
- 185703d Created new image
- 90e885c Try to add the x86 to check
- d61c358 Try different machine
- 4a88b1f added include
- 3ceadaa Try without opti on arm
- e89762c Change to c6g
- ba1ea86 added matrix region
- 76b7132 change to west
- 55bb40f try the i8
- 1b84ced Try oregon
- 3f98c27 Change subnet id
- 66d96a1 Now subnet
- 0c5f16c Change subnet
- b2af693 add subnet
- 20e596c Try group id
- 0682472 Change to vpc id
- 9be3846 change subnet
- 125e30b Change ami
- 6758753 Try without subnet
- 2a37fb3 add security group again
- 7d97821 Change the subnets
- 97e7249 Change to ids
- 4545554 Change sg
- 3a443d3 psubnet
- a472150 Try different
- bee1c27 different
- 4a891da to a file
- 0341dd7 print
- f8f424a p
- ee0458a leave empty
- 26ff2cc empty
- d3eaeeb Try different account
- 55bc653 Run 2 arm machines
- 21de162 Move both to us-west-2
- 6f8e4d4 Try workflow
- eedc25c Change name
- 578b88d Changes
- 41e920f Change the secrets
- 6218a9c Add supprted arch
- 1533ba7 Add defaults
- a86d7ac Support all
- 7652c9e Change the jq
- c369125 Change machine to t4g
- 9d9a047 Change the name
- 14f8739 Change the machine
- 2f119ec fix the stop
- 96d63af only benchamrk
- 305aa0b add the secrets
- 4e45109 region secret
- 1b4649a benchmark region
- 797d1d6 Change timeout
- db9c63e Added support for arch name in benchamrks
- 106fc5e change th json
- a0d62fb changed to v9.0
- b8075b1 Change the check
- 2007e33 add v9
- 606cea7 Check alt version of armv9
- 12bead0 added check
- 976c366 add arc_arch
- 8e23a2f changed to CONCAT_WITH_UNDERSCORE_ARCH
- e81ce18 change the check
- f8f3d9e Add full check
- f408017 fix the instruct
- 0af63d8 Added the cmake
- 38d563a fix the support
- 87ac845 put it back to cmake
- 14bcd59 back
- b48d9c4 change the condition
- 47b9724 No armpl for now
- 1b35e30 cland format
- cafb30c remove the opt
- bde60e4 Changed to one machine
- 421715c Added BENCHMARK_ARCH
- 3c07da6 fix endif
- eabe27c Remove secrets call
- 7beb70b pr changes
- 66b37a6 Changes
- 768636d change to compile
- 01a4f60 add sve
- 287490f add #endif
- 570ab69 add armpl
- 9ad8c1e add to cmake
- 0334e43 remove armpl
- 15e7963 add install
- 3750241 Add ARCH=$(uname -m)
- 22596de change the path to armpl
- 69a2f24 suuport check for armv7
- f31c2a3 change the armpl
- fd6291e Change or OR
- 154b2a8 Merge branch 'dorer-add-arm-support' of https://github.com/RedisAI/Ve…
- 877a70e add neon supported for spaces
- c32ef14 add sve
- 655a474 add support
- 4cc47c3 Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity in…
- 6ae4deb align
- 9b09210 format
- d3cb7ae change error
- 405931d change
- ef9563f Removed the ifdef
- 220616b Add comments
- 52c5382 clang
- e31aa8a Change names
- 63ce083 format
- c6317bc PR changes
- 45e8fdd Change to 1
- f1487b8 fix the l2
- 8e097c8 fix format
- 4aeca15 add desciriopn for chunk == 1
- 4489bf3 remove template armpl
- 6fa2474 Back to armpl
- f2305dc back to armpl_neon
- d567ab2 include
- 192f8e6 armnpl
- 87feb67 Revert implemetion chooser
- 67aa3fc Revert remove error
- 5ec219d Remove comment
- 8882492 Remove empty line
- 0310d16 Add support macos
- f8fb4d2 add sudo
- b72986a Add absolute path
- eacebb2 find all libs
- ead05e9 Change folder
- e9d0d64 Now set for real

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,17 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include <armpl.h>` |
| | | |
| | | `float FP32_InnerProduct_ARMPL_NEON(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    auto *vec1 = (float *)pVect1v;` |
| | | `    auto *vec2 = (float *)pVect2v;` |
| | | |
| | | `    // Note: ArmPL may select a different implementation depending on CPU features.` |
| | | `    float res = cblas_sdot(static_cast<int>(dimension), vec1, 1, vec2, 1);` |
| | | `    return 1.0f - res;` |
| | | `}` |

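
All three inner-product variants in this PR return `1.0f - cblas_sdot(...)`, so their results can be checked against a plain scalar loop on any platform. The sketch below is not part of the PR: `FP32_InnerProduct_Reference` is a hypothetical name, and it accumulates in `double` on the assumption that a more precise baseline is useful when comparing against ArmPL, whose summation order may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference for the inner-product distance returned by the
// ARMPL variants: distance = 1 - sum_i a[i] * b[i].
// Accumulates in double so it can serve as a more precise baseline.
float FP32_InnerProduct_Reference(const void *pVect1v, const void *pVect2v, size_t dimension) {
    const float *a = static_cast<const float *>(pVect1v);
    const float *b = static_cast<const float *>(pVect2v);
    assert(dimension == 0 || (a != nullptr && b != nullptr));

    double acc = 0.0;
    for (size_t i = 0; i < dimension; i++) {
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return 1.0f - static_cast<float>(acc);
}
```

Because ArmPL may sum in a different order, a comparison against this reference would normally use a small tolerance rather than exact equality.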

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,17 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include <armpl.h>` |
| | | |
| | | `float FP32_InnerProduct_ARMPL_SVE2(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    auto *vec1 = (float *)pVect1v;` |
| | | `    auto *vec2 = (float *)pVect2v;` |
| | | |
| | | `    // Note: ArmPL may select a different implementation depending on CPU features.` |
| | | `    float res = cblas_sdot(static_cast<int>(dimension), vec1, 1, vec2, 1);` |
| | | `    return 1.0f - res;` |
| | | `}` |

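
The SVE2 file has the same body as the NEON one, and its comment notes that ArmPL itself may pick an implementation from the CPU features. If the library also had to choose between the `_NEON`, `_SVE` and `_SVE2` entry points at runtime, a chooser could look roughly like the sketch below. Everything here is hypothetical: `DistFn`, `ArmFeatures` and `ChooseArmImpl` are illustrative names, and filling in the feature flags (for example from `getauxval(AT_HWCAP)` on Linux) is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical alias for the signature shared by the distance functions in this PR.
using DistFn = float (*)(const void *, const void *, size_t);

// Hypothetical CPU feature flags. Filling them in (for example from
// getauxval(AT_HWCAP) / getauxval(AT_HWCAP2) on Linux) is not shown.
struct ArmFeatures {
    bool asimd = false;  // Advanced SIMD (NEON)
    bool sve = false;
    bool sve2 = false;
};

// Hypothetical chooser: checks SVE2, then SVE, then NEON, and otherwise
// returns `fallback`. A null candidate is skipped, e.g. when that variant
// was not compiled in.
DistFn ChooseArmImpl(const ArmFeatures &cpu, DistFn neon, DistFn sve, DistFn sve2,
                     DistFn fallback) {
    assert(fallback != nullptr);
    if (cpu.sve2 && sve2 != nullptr) return sve2;
    if (cpu.sve && sve != nullptr) return sve;
    if (cpu.asimd && neon != nullptr) return neon;
    return fallback;
}
```

Checking SVE2 before SVE reflects an assumed preference for the newest extension. Since every candidate shares one signature, the choice can be made once and stored as a function pointer.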

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,17 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include <armpl.h>` |
| | | |
| | | `float FP32_InnerProduct_ARMPL_SVE(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    auto *vec1 = (float *)pVect1v;` |
| | | `    auto *vec2 = (float *)pVect2v;` |
| | | |
| | | `    // Note: ArmPL may select a different implementation depending on CPU features.` |
| | | `    float res = cblas_sdot(static_cast<int>(dimension), vec1, 1, vec2, 1);` |
| | | `    return 1.0f - res;` |
| | | `}` |

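
Each inner-product variant narrows `dimension` with `static_cast<int>(dimension)` because the CBLAS `sdot` interface takes an `int` length. Realistic vector dimensions are far below `INT_MAX`, so this is unlikely to matter in practice. If a guard were wanted, one option is to split the call into pieces that each fit in an `int`, as in the hypothetical sketch below. `ChunkedDot` and `sdot_fn` are illustrative names, `maxChunk` exists only so the splitting can be tested with small inputs, and the kernel is passed in so that `cblas_sdot` (or any stand-in with the same parameter list) can be used.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Same parameter list as CBLAS sdot: (n, x, incx, y, incy).
using sdot_fn = float (*)(int, const float *, int, const float *, int);

// Hypothetical helper: computes dot(x, y) over `dimension` elements by calling
// `sdot` on consecutive pieces of at most `maxChunk` elements, so every length
// passed to the int parameter is guaranteed to fit.
float ChunkedDot(sdot_fn sdot, const float *x, const float *y, size_t dimension,
                 size_t maxChunk = static_cast<size_t>(INT_MAX)) {
    assert(sdot != nullptr);
    assert(maxChunk > 0 && maxChunk <= static_cast<size_t>(INT_MAX));

    float acc = 0.0f;
    for (size_t offset = 0; offset < dimension; offset += maxChunk) {
        const size_t remaining = dimension - offset;
        const size_t len = remaining < maxChunk ? remaining : maxChunk;
        acc += sdot(static_cast<int>(len), x + offset, 1, y + offset, 1);
    }
    return acc;
}
```

A distance function could then return `1.0f - ChunkedDot(cblas_sdot, vec1, vec2, dimension)`. The summation order only changes when `dimension` exceeds `maxChunk`; otherwise this is a single `sdot` call, as in the PR.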

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,49 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include <armpl.h>` |
| | | |
| | | `float FP32_L2Sqr_ARMPL_NEON(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    const float *vec1 = static_cast<const float *>(pVect1v);` |
| | | `    const float *vec2 = static_cast<const float *>(pVect2v);` |
| | | |
| | | `    float result = 0.0f;` |
| | | `    constexpr const size_t blockSize = 1024;` |
| | | `    float buffer[blockSize];` |
| | | |
| | | `    // Pre-calculate the number of full blocks and the size of the last partial block` |
| | | `    const size_t fullBlockCount = dimension / blockSize;` |
| | | `    const size_t lastBlockSize = dimension % blockSize;` |
| | | |
| | | `    // Process full blocks` |
| | | `    for (size_t i = 0; i < fullBlockCount; i++) {` |
| | | `        size_t offset = i * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the full block` |
| | | `        for (size_t j = 0; j < blockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(blockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    // Handle the remaining elements (if any)` |
| | | `    if (lastBlockSize > 0) {` |
| | | `        size_t offset = fullBlockCount * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the remaining elements` |
| | | `        for (size_t j = 0; j < lastBlockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(lastBlockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    return result;` |
| | | `}` |

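
The L2 variants compute the squared Euclidean distance as ||a - b||^2 = (a - b) . (a - b): they write the differences for one block into a 1024-float stack buffer and pass that buffer to `cblas_sdot` as both operands. The sketch below restates that scheme with the full-block loop and the tail folded into one loop, and with the dot-product kernel passed in as a function pointer so it can run without ArmPL. `BlockedL2Sqr` and `sdot_fn` are hypothetical names; with the usual 32-bit `int` CBLAS interface, `cblas_sdot` itself should convert to `sdot_fn`, since top-level `const` on its parameters does not change the function type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Same parameter list as CBLAS sdot: (n, x, incx, y, incy).
using sdot_fn = float (*)(int, const float *, int, const float *, int);

// Hypothetical restatement of the PR's blocked scheme:
// ||a - b||^2 = (a - b) . (a - b), computed one block at a time through a
// fixed-size stack buffer, with one sdot call per block.
template <size_t BlockSize = 1024>
float BlockedL2Sqr(sdot_fn sdot, const float *a, const float *b, size_t dimension) {
    static_assert(BlockSize > 0 && BlockSize <= static_cast<size_t>(INT_MAX),
                  "block length must fit in sdot's int parameter");
    assert(sdot != nullptr);

    float buffer[BlockSize];  // 4 KiB of scratch space for the default size
    float result = 0.0f;
    for (size_t offset = 0; offset < dimension; offset += BlockSize) {
        // The last block may be partial; this replaces the PR's separate tail branch.
        const size_t remaining = dimension - offset;
        const size_t len = remaining < BlockSize ? remaining : BlockSize;
        for (size_t j = 0; j < len; j++) {
            buffer[j] = a[offset + j] - b[offset + j];
        }
        result += sdot(static_cast<int>(len), buffer, 1, buffer, 1);
    }
    return result;
}
```

`BlockedL2Sqr(cblas_sdot, vec1, vec2, dimension)` would follow the same summation order as the PR's code: one `sdot` per block, accumulated into a `float`. The bounded buffer keeps stack use fixed regardless of `dimension`.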

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,49 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include "armpl.h"` |
| | | |
| | | `float FP32_L2Sqr_ARMPL_SVE2(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    const float *vec1 = static_cast<const float *>(pVect1v);` |
| | | `    const float *vec2 = static_cast<const float *>(pVect2v);` |
| | | |
| | | `    float result = 0.0f;` |
| | | `    constexpr const size_t blockSize = 1024;` |
| | | `    float buffer[blockSize];` |
| | | |
| | | `    // Pre-calculate the number of full blocks and the size of the last partial block` |
| | | `    const size_t fullBlockCount = dimension / blockSize;` |
| | | `    const size_t lastBlockSize = dimension % blockSize;` |
| | | |
| | | `    // Process full blocks` |
| | | `    for (size_t i = 0; i < fullBlockCount; i++) {` |
| | | `        size_t offset = i * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the full block` |
| | | `        for (size_t j = 0; j < blockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(blockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    // Handle the remaining elements (if any)` |
| | | `    if (lastBlockSize > 0) {` |
| | | `        size_t offset = fullBlockCount * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the remaining elements` |
| | | `        for (size_t j = 0; j < lastBlockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(lastBlockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    return result;` |
| | | `}` |

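
The SVE2 file repeats the NEON L2 code unchanged. For comparison, the squared distance can also be expanded as ||a||^2 + ||b||^2 - 2 a.b, which needs three `sdot` calls but no scratch buffer or subtraction loop. The sketch below is not what the PR does, and `L2SqrViaDots` and `sdot_fn` are hypothetical names. The expansion suffers catastrophic cancellation when `a` and `b` are close, which matters most for near neighbours, so the PR's buffered approach is likely the safer choice for accuracy.

```cpp
#include <cassert>
#include <cstddef>

// Same parameter list as CBLAS sdot: (n, x, incx, y, incy).
using sdot_fn = float (*)(int, const float *, int, const float *, int);

// Hypothetical buffer-free alternative (not the PR's method):
// ||a - b||^2 = a.a + b.b - 2 a.b, using three sdot calls.
// When a and b are close, the subtraction cancels most significant digits,
// so the result can lose precision or even come out slightly negative.
float L2SqrViaDots(sdot_fn sdot, const float *a, const float *b, int n) {
    assert(sdot != nullptr && n >= 0);
    const float aa = sdot(n, a, 1, a, 1);
    const float bb = sdot(n, b, 1, b, 1);
    const float ab = sdot(n, a, 1, b, 1);
    const float d = aa + bb - 2.0f * ab;
    return d > 0.0f ? d : 0.0f;  // clamp rounding noise below zero
}
```

The norms `a.a` and `b.b` could be cached per stored vector, which is the usual motivation for this form; the clamp only hides negative values and does not restore the precision lost to cancellation.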

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,49 @@` |
| | | `/*` |
| | | `*Copyright Redis Ltd. 2021 - present` |
| | | `*Licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or` |
| | | `*the Server Side Public License v1 (SSPLv1).` |
| | | `*/` |
| | | |
| | | `#include "VecSim/spaces/space_includes.h"` |
| | | `#include "armpl.h"` |
| | | |
| | | `float FP32_L2Sqr_ARMPL_SVE(const void *pVect1v, const void *pVect2v, size_t dimension) {` |
| | | `    const float *vec1 = static_cast<const float *>(pVect1v);` |
| | | `    const float *vec2 = static_cast<const float *>(pVect2v);` |
| | | |
| | | `    float result = 0.0f;` |
| | | `    constexpr const size_t blockSize = 1024;` |
| | | `    float buffer[blockSize];` |
| | | |
| | | `    // Pre-calculate the number of full blocks and the size of the last partial block` |
| | | `    const size_t fullBlockCount = dimension / blockSize;` |
| | | `    const size_t lastBlockSize = dimension % blockSize;` |
| | | |
| | | `    // Process full blocks` |
| | | `    for (size_t i = 0; i < fullBlockCount; i++) {` |
| | | `        size_t offset = i * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the full block` |
| | | `        for (size_t j = 0; j < blockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(blockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    // Handle the remaining elements (if any)` |
| | | `    if (lastBlockSize > 0) {` |
| | | `        size_t offset = fullBlockCount * blockSize;` |
| | | |
| | | `        // Calculate the difference vector for the remaining elements` |
| | | `        for (size_t j = 0; j < lastBlockSize; j++) {` |
| | | `            buffer[j] = vec1[offset + j] - vec2[offset + j];` |
| | | `        }` |
| | | |
| | | `        // Use ArmPL to compute the dot product` |
| | | `        result += cblas_sdot(lastBlockSize, buffer, 1, buffer, 1);` |
| | | `    }` |
| | | |
| | | `    return result;` |
| | | `}` |

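
Since the blocked L2 code has separate paths for full blocks and for the remainder, dimensions on either side of 1024 are the interesting ones to test. The helper below is a hypothetical sketch, not one of the PR's tests: `MatchesL2Reference` and `DistFn` are illustrative names, the helper compares any function with the `(const void *, const void *, size_t)` signature used here against a double-precision reference, and the default tolerance of `1e-4` is an assumption rather than a documented accuracy bound for ArmPL.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias for the signature shared by the distance functions in this PR.
using DistFn = float (*)(const void *, const void *, size_t);

// Hypothetical test helper: returns true if `l2` matches a double-precision
// squared-L2 reference within a relative tolerance, for one dimension.
bool MatchesL2Reference(DistFn l2, size_t dimension, double relTol = 1e-4) {
    assert(l2 != nullptr);
    std::vector<float> a(dimension), b(dimension);
    for (size_t i = 0; i < dimension; i++) {
        // Deterministic, non-trivial inputs in [-1, 1).
        a[i] = static_cast<float>((i * 37 % 101) / 50.5 - 1.0);
        b[i] = static_cast<float>((i * 53 % 97) / 48.5 - 1.0);
    }
    double expected = 0.0;
    for (size_t i = 0; i < dimension; i++) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        expected += d * d;
    }
    const double got = l2(a.data(), b.data(), dimension);
    return std::fabs(got - expected) <= relTol * std::fmax(1.0, expected);
}
```

On an Arm build, one might call it with `FP32_L2Sqr_ARMPL_SVE` for dimensions such as 1, 1023, 1024, 1025 and 2049, which cover a tail-only call, a single full block, and full blocks followed by a tail.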