Skip to content

Fix EC PKCS#11 object leak with OpenSSL < 3 and improve RSA reference counting#593

Merged
mtrojnar merged 2 commits intoOpenSC:masterfrom
hazefully:improve-object-ref-counting
May 7, 2025
Merged

Fix EC PKCS#11 object leak with OpenSSL < 3 and improve RSA reference counting#593
mtrojnar merged 2 commits intoOpenSC:masterfrom
hazefully:improve-object-ref-counting

Conversation

@hazefully
Copy link
Contributor

Problem

When using an ECDSA private key with OpenSSL versions < 3, the PKCS11_OBJECT_private object and attached PKCS11_SLOT_private end up leaking and not being cleaned up, even with all instances of the EVP_PKEY being freed. This can be seen when the ec-check-privkey.softhsm test is run with valgrind (copying relevant leaks only as there are a lot of leaks related to dlopen) :

Compiled with: OpenSSL 1.1.1  11 Sep 2018
...
==231837== 13 bytes in 1 blocks are indirectly lost in loss record 2 of 19
==231837==    at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==231837==    by 0x5162C11: pkcs11_getattr_alloc (p11_attr.c:68)
==231837==    by 0x5163FDC: pkcs11_object_from_handle (p11_key.c:147)
==231837==    by 0x5164417: pkcs11_init_key.constprop.0 (p11_key.c:790)
==231837==    by 0x51655B4: pkcs11_next_key (p11_key.c:759)
==231837==    by 0x51655B4: pkcs11_find_keys (p11_key.c:737)
==231837==    by 0x51655B4: pkcs11_enumerate_keys (p11_key.c:689)
==231837==    by 0x5169E59: PKCS11_enumerate_keys_ext (p11_front.c:202)
==231837==    by 0x5160CA2: match_key_int (util_uri.c:1468)
==231837==    by 0x516189F: util_ctx_load_object_with_login (util_uri.c:1067)
==231837==    by 0x516189F: util_ctx_load_object (util_uri.c:1183)
==231837==    by 0x5162893: UTIL_CTX_get_privkey_from_uri (util_uri.c:1510)
==231837==    by 0x515F049: ENGINE_CTX_load_privkey (eng_back.c:196)
==231837==    by 0x515EB2D: load_privkey (eng_front.c:207)
==231837==    by 0x49B8642: ENGINE_load_private_key (in /usr/local1.1/lib/libcrypto.so.1.1)
...
==231837== 136 bytes in 1 blocks are indirectly lost in loss record 8 of 19
==231837==    at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==231837==    by 0x51697AB: pkcs11_slot_new (p11_slot.c:450)
==231837==    by 0x51697AB: pkcs11_enumerate_slots (p11_slot.c:91)
==231837==    by 0x515FA34: util_ctx_enumerate_slots_unlocked (util_uri.c:131)
==231837==    by 0x515FB41: util_ctx_init_libp11 (util_uri.c:175)
==231837==    by 0x5161431: util_ctx_load_object (util_uri.c:1169)
==231837==    by 0x5162893: UTIL_CTX_get_privkey_from_uri (util_uri.c:1510)
==231837==    by 0x515F049: ENGINE_CTX_load_privkey (eng_back.c:196)
==231837==    by 0x515EB2D: load_privkey (eng_front.c:207)
==231837==    by 0x49B8642: ENGINE_load_private_key (in /usr/local1.1/lib/libcrypto.so.1.1)
==231837==    by 0x109509: main (check-privkey.c:161)
...
==231837== 216 bytes in 1 blocks are indirectly lost in loss record 11 of 19
==231837==    at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==231837==    by 0x516971D: pkcs11_slot_new (p11_slot.c:438)
==231837==    by 0x516971D: pkcs11_enumerate_slots (p11_slot.c:91)
==231837==    by 0x515FA34: util_ctx_enumerate_slots_unlocked (util_uri.c:131)
==231837==    by 0x515FB41: util_ctx_init_libp11 (util_uri.c:175)
==231837==    by 0x5161431: util_ctx_load_object (util_uri.c:1169)
==231837==    by 0x5162893: UTIL_CTX_get_privkey_from_uri (util_uri.c:1510)
==231837==    by 0x515F049: ENGINE_CTX_load_privkey (eng_back.c:196)
==231837==    by 0x515EB2D: load_privkey (eng_front.c:207)
==231837==    by 0x49B8642: ENGINE_load_private_key (in /usr/local1.1/lib/libcrypto.so.1.1)
==231837==    by 0x109509: main (check-privkey.c:161)
....
==231837== LEAK SUMMARY:
==231837==    definitely lost: 1,680 bytes in 4 blocks
==231837==    indirectly lost: 370 bytes in 4 blocks
==231837==      possibly lost: 0 bytes in 0 blocks
==231837==    still reachable: 88,429 bytes in 24 blocks
==231837==         suppressed: 0 bytes in 0 blocks

A side effect of this is that CloseAllSessions is not called during the engine cleanup as the related slot's reference count never reaches zero:

Compiled with: OpenSSL 1.1.1  11 Sep 2018
....
4: C_GetSlotList
P:231837; T:0x81095552 2025-04-25 12:47:21.926
[in] tokenPresent = 0x0
[out] pSlotList: 
Slot 1525197781
Slot 1
[out] *pulCount = 0x2
Returned:  0 CKR_OK
....
9: C_OpenSession
P:231837; T:0x81095552 2025-04-25 12:47:21.991
[in] slotID = 0x5ae8abd5 -> the slot with the EC key
[in] flags = 0x4
[in] pApplication = (nil)
[in] Notify = (nil)
[out] *phSession = 0x1
Returned:  0 CKR_OK
....
13: C_GetSessionInfo
P:231837; T:0x81095552 2025-04-25 12:47:22.051
[in] hSession = 0x1
[out] pInfo: 
      slotID:                  1525197781
      state:                   0 (           CKS_RO_PUBLIC_SESSION)
      flags:                   4
        CKF_SERIAL_SESSION               
      ulDeviceError:           0
Returned:  0 CKR_OK

14: C_Login
P:231837; T:0x81095552 2025-04-25 12:47:22.056
[in] hSession = 0x1
[in] userType = CKU_USER
[in] pPin[ulPinLen] 0000000004f15330 / 4
    00000000  31 32 33 34                                      1234            
Returned:  0 CKR_OK
....
43: C_CloseAllSessions
P:231837; T:0x81095552 2025-04-25 12:47:22.564
[in] slotID = 0x1.   -> not the correct SlotID (SlotID 1 is the empty SoftHSM Slot)
Returned:  0 CKR_OK

44: C_Finalize
P:231837; T:0x81095552 2025-04-25 12:47:22.571
Returned:  0 CKR_OK

The same problem doesn't happen with OpenSSL 3.0.17, as it follows a different code path:

Compiled with: OpenSSL 3.0.17-dev  (Library: OpenSSL 3.0.17-dev )
....
43: C_CloseAllSessions
P:245659; T:0x82766720 2025-04-25 12:55:06.272
[in] slotID = 0x363acd67
Returned:  0 CKR_OK

44: C_CloseAllSessions
P:245659; T:0x82766720 2025-04-25 12:55:06.282
[in] slotID = 0x1
Returned:  0 CKR_OK

45: C_Finalize
P:245659; T:0x82766720 2025-04-25 12:55:06.283
Returned:  0 CKR_OK
.....
==245659== LEAK SUMMARY:
==245659==    definitely lost: 1,312 bytes in 3 blocks
==245659==    indirectly lost: 0 bytes in 0 blocks
==245659==      possibly lost: 0 bytes in 0 blocks
==245659==    still reachable: 86,612 bytes in 19 blocks
==245659==         suppressed: 0 bytes in 0 blocks
==245659== 

This issue is very similar to what used to happen with RSA keys, which was fixed in #555.

Changes

7ed4afc fixes the code path in pkcs11_get_key for OpenSSL versions < 3 to skip incrementing the PKCS11_OBJECT_private object's reference counter when assigning the underlying EC_KEY object which already holds a reference to the same PKCS#11 object. This stops the memory leak and causes CloseAllSessions to get called as expected:

Compiled with: OpenSSL 1.1.1  11 Sep 2018
...
43: C_CloseAllSessions
P:235065; T:0x81095552 2025-04-25 12:48:39.324
[in] slotID = 0x63b0702c
Returned:  0 CKR_OK

44: C_CloseAllSessions
P:235065; T:0x81095552 2025-04-25 12:48:39.333
[in] slotID = 0x1
Returned:  0 CKR_OK

45: C_Finalize
P:235065; T:0x81095552 2025-04-25 12:48:39.335
Returned:  0 CKR_OK
...
==235065== LEAK SUMMARY:
==235065==    definitely lost: 1,312 bytes in 3 blocks
==235065==    indirectly lost: 0 bytes in 0 blocks
==235065==      possibly lost: 0 bytes in 0 blocks
==235065==    still reachable: 88,429 bytes in 24 blocks
==235065==         suppressed: 0 bytes in 0 blocks

In addition, ae5e3d7 improves the reference counting of RSA objects and attached PKCS#11 objects to remove the assumption made in pkcs11_rsa in p11_rsa.c about the OpenSSL's EVP_PKEY reference counter. See the commit message for more details about the changes. This commit doesn't re-introduce the leak fixed in #555 with either OpenSSL 1.1.1 or OpenSSL 3.0.17:

Compiled with: OpenSSL 1.1.1  11 Sep 2018
...
31: C_CloseAllSessions
P:235099; T:0x81095552 2025-04-25 12:48:49.670
[in] slotID = 0x1b5eb1e8
Returned:  0 CKR_OK

32: C_CloseAllSessions
P:235099; T:0x81095552 2025-04-25 12:48:49.678
[in] slotID = 0x1
Returned:  0 CKR_OK
....
==235099== LEAK SUMMARY:
==235099==    definitely lost: 1,312 bytes in 3 blocks
==235099==    indirectly lost: 0 bytes in 0 blocks
==235099==      possibly lost: 0 bytes in 0 blocks
==235099==    still reachable: 88,429 bytes in 24 blocks
==235099==         suppressed: 0 bytes in 0 blocks
Compiled with: OpenSSL 3.0.17-dev  (Library: OpenSSL 3.0.17-dev )
....
31: C_CloseAllSessions
P:242290; T:0x82766720 2025-04-25 12:54:04.423
[in] slotID = 0x1f670a08
Returned:  0 CKR_OK

32: C_CloseAllSessions
P:242290; T:0x82766720 2025-04-25 12:54:04.431
[in] slotID = 0x1
Returned:  0 CKR_OK
...
==242290== LEAK SUMMARY:
==242290==    definitely lost: 1,312 bytes in 3 blocks
==242290==    indirectly lost: 0 bytes in 0 blocks
==242290==      possibly lost: 0 bytes in 0 blocks
==242290==    still reachable: 86,612 bytes in 19 blocks
==242290==         suppressed: 0 bytes in 0 blocks
...

@mtrojnar
Copy link
Member

@olszomal Please make sure this fix for unsupported OpenSSL versions does not negatively impact the supported versions, which are far more important.

@hazefully
Copy link
Contributor Author

hazefully commented Apr 25, 2025

@olszomal btw I re-ran the ec-check-privkey.softhsm test with PKCS11_FORCE_CLEANUP=1 with the changes in #592 and I can see the same memory leak with OpenSSL 1.1.1 (probably because the slot list is lost after the first complete cleanup anyway)

@mtrojnar
Copy link
Member

So far I reviewed ae5e3d7. The commit description is fairly complicated, while the proposed change simply refactors pkcs11_rsa() to return a reference rather than a pointer, and adjusts the functions calling pkcs11_rsa() to free the returned reference. I like this idea, because it makes the code both simpler and safer. Well done. Could you also rename pkcs11_rsa() to pkcs11_get1_rsa() to reflect the functional change?

@olszomal
Copy link
Collaborator

I have verified that the proposed changes do not negatively affect the supported OpenSSL versions.
Currently, there are object leaks related to EC keys, and the change appears to suppress those.

There is still a memory leak in tests/ec-cert-store.softhsm, but it seems to be caused by a bug in the test itself, which I will fix in a separate PR.

@hazefully
Copy link
Contributor Author

hazefully commented Apr 30, 2025

So far I reviewed ae5e3d7. The commit description is fairly complicated, while the proposed change simply refactors pkcs11_rsa() to return a reference rather than a pointer, and adjusts the functions calling pkcs11_rsa() to free the returned reference. I like this idea, because it makes the code both simpler and safer. Well done. Could you also rename pkcs11_rsa() to pkcs11_get1_rsa() to reflect the functional change?

@mtrojnar Thank you! I renamed the function as you suggested, happy to attempt to reword the commit as well to make it more clear what the change is.

The other change in that commit is that a single RSA object will always hold a single reference to a PKCS#11 object, while before this commit each call to pkcs11_get_key for a PKCS#11 RSA private key would increment the references for that PKCS#11 object although no new OpenSSL objects that reference the PKCS#11 RSA private key are created (and that's why pkcs11_rsa needed to decrement the PKCS#11 object reference manually once).

@hazefully hazefully requested a review from mtrojnar April 30, 2025 14:24
Copy link
Member

@mtrojnar mtrojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Now git rebase to get rid of the Merge branch 'master' into improve-object-ref-counting commit, git rebase -i to squash the fixes/improvements into the commit with RSA fixes (you may use opportunity to improve commit description), and git push --force to push a clean pull request with only two commits.

@mtrojnar
Copy link
Member

mtrojnar commented May 7, 2025

@hazefully ping

hazefully added 2 commits May 7, 2025 18:00
When an ECDSA private key is retrieved via the pkcs11_get_key function
with OpenSSL versions < 3, a new reference to the underlying EC_KEY
object is created and assigned to a new EVP_PKEY which is returned from
the function. However, the reference counter of the
PKCS11_OBJECT_private object attached to the EC_KEY is also incremented,
which means that a single EC_KEY object ends up holding two references
to the attached PKCS11_OBJECT_private, which leads to the reference
count of the PKCS11_OBJECT_private object never reaching zero even when
all the EVP_PKEY objects that reference them are freed. This doesn't
happen with OpenSSL versions >= 3.0 as it creates a duplicate EC_KEY
object instead of assigning the same object.

This commit changes the logic in the pkcs11_get_key function for EC keys
to stop incrementing the PKCS11_OBJECT_private reference count when
assigning the EC_KEY object, to ensure that a single EC_KEY object
always holds a single reference to the attached PKCS11_OBJECT_private.
This leads to the correct clean up of the PKCS11_OBJECT_private objects
and PKCS#11 slot objects.
This commit refactors the pkcs11_rsa function, used in operations on RSA
EVP_PKEYs, to return a new reference to the underlying RSA object
instead of a pointer to an existing reference. This avoids having to
make assumptions about the reference count of the underlying RSA object
of an EVP_PKEY in pkcs11_rsa. To reflect the fact that the returned
reference must be freed after use, the function is renamed to
pkcs11_get1_rsa following the conventions of OpenSSL functions.

In addition, this commit ensures that the reference count of a
PKCS11_OBJECT_private object that is attached to an RSA object is
incremented only once for each RSA object. This assures that a
PKCS11_OBJECT_private object can be freed once all RSA objects that
reference it are freed.
@hazefully hazefully force-pushed the improve-object-ref-counting branch from bfe59c7 to cb7009a Compare May 7, 2025 17:06
@hazefully hazefully requested a review from mtrojnar May 7, 2025 17:07
@hazefully
Copy link
Contributor Author

@mtrojnar I did a rebase and attempted to reword the commit messages to make it more clear what they are trying to achieve, let me know if that looks good to you now

Copy link
Member

@mtrojnar mtrojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mtrojnar mtrojnar merged commit 7fb0dc0 into OpenSC:master May 7, 2025
10 checks passed
@mtrojnar
Copy link
Member

mtrojnar commented May 7, 2025

@hazefully Thank you for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants