Memset speed-up for oapv_mset_x64a #115

Closed
subhrajitm20 wants to merge 2 commits into main from memset_usage_optimizations

@subhrajitm20
Collaborator

  • Debugged oapv_memset_x128_avx() and added the missing iterator increment.
  • Pointed oapv_mset_x64a to oapv_memset_x128_avx() as a temporary speed-up measure.

* incremented ptr while storing value_vector

Signed-off-by: subhrajitm20 <2003subhrajit@gmail.com>
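
The missing-increment fix described above can be illustrated with a small portable sketch. This is not the OpenAPV source: `fill_blocks_buggy`, `fill_blocks_fixed`, and `BLOCK` are hypothetical names, and `memcpy` of a small array stands in for the vector store of `value_vector`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* bytes written per simulated vector store (assumption) */

/* Buggy variant: the destination pointer never advances,
 * so every iteration overwrites the first block only. */
static void fill_blocks_buggy(uint8_t *ptr, int v, size_t nblocks)
{
    uint8_t value_vector[BLOCK];
    memset(value_vector, v, BLOCK);
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(ptr, value_vector, BLOCK); /* ptr is not incremented */
    }
}

/* Fixed variant: ptr is incremented after each store,
 * so all nblocks * BLOCK bytes are written. */
static void fill_blocks_fixed(uint8_t *ptr, int v, size_t nblocks)
{
    uint8_t value_vector[BLOCK];
    memset(value_vector, v, BLOCK);
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(ptr, value_vector, BLOCK);
        ptr += BLOCK; /* advance to the next block */
    }
}
```

With the buggy variant, only the first `BLOCK` bytes of the buffer are set, which a test that checks only the start of the buffer would not detect.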
Collaborator

@kpchoi kpchoi left a comment

See the comments

#define oapv_mset_x64a(dst, v, size) memset((dst), (v), (size))
#if X86_SSE
#define oapv_mset_x128(dst, v, size) oapv_memset_x128_avx((dst), (v), (size))
#define oapv_mset_x64a(dst, v, size) oapv_memset_x128_avx((dst), (v), (size))
Collaborator

oapv_mset_x64a() doesn't guarantee a 128-byte-aligned size.
The AVX-optimized oapv_mset_x128 doesn't seem to handle the unaligned-size case.
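
One way to address this concern is to vectorize only the part of the buffer that is a whole multiple of 128 bytes and fall back to `memset` for the remainder. The sketch below is an illustration under that assumption, not the change that was merged to main; `mset_any_size` is a hypothetical name, and the AVX path is compiled only when the compiler defines `__AVX__`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: fills `size` bytes at `dst` with the byte value `v`,
 * for any size, not only multiples of 128. */
static void mset_any_size(void *dst, int v, size_t size)
{
    uint8_t *p = (uint8_t *)dst;
    /* Largest multiple of 128 that does not exceed size. */
    size_t bulk = size & ~(size_t)127;

#if defined(__AVX__)
    const __m256i value_vector = _mm256_set1_epi8((char)v);
    /* Four unaligned 32-byte stores cover one 128-byte block. */
    for (size_t off = 0; off < bulk; off += 128) {
        _mm256_storeu_si256((__m256i *)(p + off), value_vector);
        _mm256_storeu_si256((__m256i *)(p + off + 32), value_vector);
        _mm256_storeu_si256((__m256i *)(p + off + 64), value_vector);
        _mm256_storeu_si256((__m256i *)(p + off + 96), value_vector);
    }
#else
    /* Portable fallback when AVX is not enabled at compile time. */
    memset(p, v, bulk);
#endif

    /* Tail: the remaining 0..127 bytes that do not fill a whole block. */
    memset(p + bulk, v, size - bulk);
}
```

Because the stores are unaligned, this sketch makes no assumption about the alignment of `dst`; a macro such as oapv_mset_x64a could only be pointed at a 128-byte-block routine safely if something like this tail handling is in place.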

@subhrajitm20
Collaborator Author

The relevant changes from this PR have been added to main by @kpchoi, so I am closing this PR.
