Description
Greetings.
We're trying to upgrade confluent-kafka-python from v2.4.0 to v2.5.0 but ran into a problem.
- Confluent Community v6.1 brokers + consumer on confluent-kafka 2.4.0: works
- Confluent Community v6.1 brokers + consumer on confluent-kafka 2.5.0: works
- Confluent Community v7.5.3 brokers + consumer on confluent-kafka 2.5.0: fails
- Confluent Community v7.5.3 brokers + consumer on confluent-kafka 2.4.0: works
The only change between tests was the confluent-kafka version at the consumer; everything else remained the same. Producers work fine with both client versions against both broker versions (all four combinations tested). The problem arises only with the consumer.
With debug logging enabled at the consumer, one finds:
```
20240816 12:11:02.356 FETCHERR [canario-uno#consumer-1] [thrd:sasl_ssl://dfcdsrvv7526.srv.cd.metal:9092/bootstrap]: sasl_ssl://dfcdsrvv7526.srv.cd.metal:9092/3: 12706.files.incremental [4]: Fetch failed at offset 11548 (leader epoch -1): UNKNOWN_TOPIC_ID
```
On the broker side:
```
java.lang.NullPointerException
[KafkaApi-2] Unexpected error handling request RequestHeader(apiKey=FETCH, apiVersion=15, clientId=canario-uno, correlationId=52, headerVersion=2) -- FetchRequestData(clusterId=null, replicaId=-1, replicaState=ReplicaState(replicaId=-1, replicaEpoch=-1), maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=1, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAA, partitions=[FetchPartition(partition=1, currentLeaderEpoch=20, fetchOffset=17059, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576), FetchPartition(partition=5, currentLeaderEpoch=23, fetchOffset=13667, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576), FetchPartition(partition=0, currentLeaderEpoch=21, fetchOffset=27572, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576)])], forgottenTopicsData=[], rackId='') with context RequestContext(header=RequestHeader(apiKey=FETCH, apiVersion=15, clientId=canario-uno, correlationId=52, headerVersion=2), connectionId='10.139.66.190:9092-172.24.45.139:50232-24318', clientAddress=/172.24.45.139, principal=User:12706.usr2, listenerName=ListenerName(INTERNAL), securityProtocol=SASL_SSL, clientInformation=ClientInformation(softwareName=confluent-kafka-python, softwareVersion=2.5.0-rdkafka-2.5.0), fromPrivilegedListener=false, principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@1f09d5d6])
```
The topic is named `12706.files.incremental` and has six partitions. In the consumer logs (debug_240.txt and debug_250.txt) the result of `get_watermark_offsets` appears as the first entry. Consumer timestamps are in UTC-3 and broker timestamps in UTC.
Could you please investigate further?
How to reproduce
Broker based on Confluent Community v7.5.3
Consumer based on confluent-kafka v2.5.0
A simple producer and a simple consumer that just calls `poll()` are enough. The consumer on 2.5.0 never fetches anything; the consumer on 2.4.0 fetches messages as expected.
Checklist
Please provide the following information:
- confluent-kafka-python and librdkafka version (`confluent_kafka.version()` and `confluent_kafka.libversion()`):
  - `confluent_kafka.version()` → `('2.5.0', 33882112)`
  - `confluent_kafka.libversion()` → `('2.5.0', 33882367)`
- Apache Kafka broker version: Confluent Community 7.5.3
- Client configuration:
```python
{
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'SCRAM-SHA-512',
    'sasl.username': os.environ['KAFKA_SASL_USR_EXTRA'],
    'sasl.password': os.environ['KAFKA_SASL_PWD_EXTRA'],
    'ssl.ca.location': PEM,
    'metadata.broker.list': os.environ['KAFKA_BROKERS'],
    'group.id': os.environ['TOPIC_KAFKA_ID_GROUP_EXTRA'],
    'client.id': 'canario-uno',
    'enable.auto.commit': 'false',
    'enable.auto.offset.store': 'false',
    'log_level': 0,
    'debug': 'consumer,cgrp,topic,fetch',
}
```
- Operating system: linux x86_64
- Provide client logs (with `'debug': '..'` as necessary): debug_240.txt, debug_250.txt
- Provide broker log excerpts: broker_log_250.txt
- Critical issue: yes, it blocks our upgrade to 2.5.0.