[Bug] Messages seems broken when got SSL exception in the producer #21933

@pqab

Search before asking

  • I searched in the issues and found nothing similar.

Version

2.10.5

Minimal reproduce step

  1. Create topic
bin/pulsar-admin tenants create tenant1
bin/pulsar-admin namespaces create tenant1/namespace1
bin/pulsar-admin namespaces set-persistence --bookkeeper-ack-quorum 2 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3 --ml-mark-delete-max-rate 0 tenant1/namespace1
bin/pulsar-admin namespaces set-retention tenant1/namespace1 --size -1 --time 3d
bin/pulsar-admin namespaces set-message-ttl tenant1/namespace1 --messageTTL 604800
bin/pulsar-admin topics create-partitioned-topic tenant1/namespace1/topic1 -p 3
  2. Produce large payloads in large batches with pulsar-perf over TLS
bin/pulsar-perf produce persistent://tenant1/namespace1/topic1 -mk autoIncrement -bb 5242880 -r 5000 -s 5242 -bm 1000 -threads 30 --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/user.cer","tlsKeyFile":"conf/user.key.pem"}'
  3. Stop the producer once it has produced around 1 million messages
  4. Wait until all the messages have gone to the BookKeeper backlog
  5. Start a consumer over TLS to consume all the messages
bin/pulsar-perf consume persistent://tenant1/namespace1/topic1 --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/user.cer","tlsKeyFile":"conf/user.key.pem"}' -sp Earliest -ss sub1
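
As a rough sanity check on the producer flags in the steps above, the sketch below computes how close a full batch comes to the configured byte limit. It assumes that `-s` is the per-message payload size in bytes, `-bm` the maximum number of messages per batch, and `-bb` the maximum batch size in bytes; that reading of the flags is an assumption, not something this report confirms. Message keys and per-message metadata are ignored, and the function name is made up for illustration.

```python
# Values copied from the pulsar-perf produce command in the steps above.
MESSAGE_SIZE_BYTES = 5242      # -s  (assumed: payload bytes per message)
BATCH_MAX_MESSAGES = 1000      # -bm (assumed: max messages per batch)
BATCH_MAX_BYTES = 5242880      # -bb (assumed: max bytes per batch, i.e. 5 MiB)


def full_batch_payload_bytes(message_size: int, max_messages: int) -> int:
    """Payload bytes in a batch filled to its message-count limit (metadata ignored)."""
    return message_size * max_messages


payload = full_batch_payload_bytes(MESSAGE_SIZE_BYTES, BATCH_MAX_MESSAGES)
headroom = BATCH_MAX_BYTES - payload

print(f"full batch payload: {payload} bytes")
print(f"headroom under -bb: {headroom} bytes")
```

Under these assumptions a full batch carries 5,242,000 payload bytes, only 880 bytes below the 5,242,880-byte limit, so the reproduction is likely sending batches close to 5 MiB each.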

What did you expect to see?

The consumer should be able to consume all produced messages.

What did you see instead?

The consumer stopped receiving messages partway through, and the broker logs showed errors like the following:

2024-01-19T14:05:39,899+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,902+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932
2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L852 E35899-E35998, Sent to [100.87.157.209:3181, 100.111.147.236:3181, 100.96.184.253:3181], Heard from [100.87.157.209:3181, 100.111.147.236:3181, 100.96.184.253:3181] : bitset = {0, 1, 2}, Error = 'Entry digest does not match'. First unread entry is (35973, rc = 0)
2024-01-19T14:05:39,916+0000 [broker-topic-workers-OrderedExecutor-15-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer - [persistent://tenant1/namespace1/topic1-0 / sub1-Consumer{subscription=PersistentSubscription{topic=persistent://tenant1/namespace1/topic1-0, name=sub1}, consumerId=0, consumerName=383fd, address=/100.96.184.253:50090}] Error reading entries at 852:35899 : Entry digest does not match - Retrying to read in 15.0 seconds
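
To see how many distinct entries are affected in a larger broker log, the `Mac mismatch` lines can be grouped by ledger and entry id. This is a minimal sketch that assumes the log line format shown above; the function name `count_mac_mismatches` and the sample list are made up for illustration and are not part of Pulsar or BookKeeper.

```python
import re
from collections import Counter

# Matches the DigestManager error lines quoted above.
MAC_MISMATCH = re.compile(r"Mac mismatch for ledger-id: (\d+), entry-id: (\d+)")


def count_mac_mismatches(lines):
    """Map (ledger_id, entry_id) to the number of 'Mac mismatch' log lines seen."""
    counts = Counter()
    for line in lines:
        match = MAC_MISMATCH.search(line)
        if match:
            ledger_id, entry_id = int(match.group(1)), int(match.group(2))
            counts[(ledger_id, entry_id)] += 1
    return counts


# Sample input: the three DigestManager lines from the broker log above,
# plus one unrelated line that should be ignored.
sample = [
    "2024-01-19T14:05:39,899+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932",
    "2024-01-19T14:05:39,902+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932",
    "2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.proto.checksum.DigestManager - Mac mismatch for ledger-id: 852, entry-id: 35932",
    "2024-01-19T14:05:39,916+0000 [BookKeeperClientWorker-OrderedExecutor-4-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed",
]

for (ledger_id, entry_id), hits in count_mac_mismatches(sample).items():
    print(f"ledger {ledger_id} entry {entry_id}: {hits} mismatch line(s)")
```

For the excerpt above, all three mismatch lines point at the same entry (ledger 852, entry 35932), which matches one entry being rejected by each of the three bookies that answered the read.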

Anything else?

This seems to happen only when an SSL exception occurs on the producer side partway through producing, such as:

2024-01-19T13:39:13,450+0000 [pulsar-client-io-12-1] WARN  org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.newSSLExceptionForError(ReferenceCountedOpenSslEngine.java:1377)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1089)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1399)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1325)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1426)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1469)
	at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:223)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1353)
	at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1257)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1297)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
	... 15 more

or

2024-01-19T14:01:02,532+0000 [pulsar-client-io-6-1] WARN  org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.newSSLExceptionForError(ReferenceCountedOpenSslEngine.java:1377)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1089)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1399)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1325)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1426)
	at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1469)
	at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:223)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1353)
	at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1257)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1297)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
	... 15 more
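
To line up the producer-side SSL alerts with the broker-side read errors, the alert name and timestamp can be pulled out of the client log. The sketch below assumes the `ClientCnx` WARN line format shown in the two traces above; `extract_ssl_alerts` and the sample list are hypothetical names used only for illustration.

```python
import re
from datetime import datetime

# First line of each trace: timestamp, thread, WARN, then the OpenSSL alert name.
SSL_ALERT = re.compile(r"^(\S+) \[[^\]]+\] WARN .*OPENSSL_internal:(\w+)")


def extract_ssl_alerts(lines):
    """Return (timestamp, alert_name) for each WARN line that carries an OpenSSL alert."""
    alerts = []
    for line in lines:
        match = SSL_ALERT.match(line)
        if match:
            # Timestamps look like 2024-01-19T13:39:13,450+0000.
            when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f%z")
            alerts.append((when, match.group(2)))
    return alerts


# Sample input: the first line of each trace above, plus one "Caused by" line
# that repeats the alert and should not be counted twice.
sample = [
    "2024-01-19T13:39:13,450+0000 [pulsar-client-io-12-1] WARN  org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC",
    "Caused by: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC",
    "2024-01-19T14:01:02,532+0000 [pulsar-client-io-6-1] WARN  org.apache.pulsar.client.impl.ClientCnx - Got exception io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR",
]

for when, alert in extract_ssl_alerts(sample):
    print(when.isoformat(), alert)
```

The resulting timestamps can then be compared with the publish times of the entries that later fail with `Entry digest does not match`, to check whether the corrupted entries were written around the moments the alerts were raised.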

Are you willing to submit a PR?

  • I'm willing to submit a PR!
