I recently ran into a situation that took me a while to figure out, so I'm putting this together in case it helps others as well.

Let's say you have Bitnami's packaged Kafka cluster running on Kubernetes as StatefulSets (so not managed by an operator like Strimzi), and the pods start restarting...

Looking at the logs, you might see something like this:

-9ee616bdcf73, partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0) with 1 segments in 2ms (32564/32564 loaded in /bitnami/kafka/data) (kafka.log.LogManager)
[2022-07-06 20:46:05,895] INFO Loaded 32564 logs in 161618ms. (kafka.log.LogManager)
[2022-07-06 20:46:05,896] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2022-07-06 20:46:05,896] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2022-07-06 20:46:05,908] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2022-07-06 20:46:05,999] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
[2022-07-06 20:46:06,296] INFO [BrokerToControllerChannelManager broker=0 name=forwarding]: Starting (kafka.server.BrokerToControllerRequestThread)
[2022-07-06 20:46:06,419] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2022-07-06 20:46:06,428] INFO Awaiting socket connections on 0.0.0.0:9093. (kafka.network.Acceptor)
[2022-07-06 20:46:06,463] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Created data-plane acceptor and processors for endpoint : ListenerName(INTERNAL) (kafka.network.SocketServer)
[2022-07-06 20:46:06,464] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2022-07-06 20:46:06,464] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2022-07-06 20:46:06,473] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Created data-plane acceptor and processors for endpoint : ListenerName(CLIENT) (kafka.network.SocketServer)
[2022-07-06 20:46:06,481] INFO [BrokerToControllerChannelManager broker=0 name=alterIsr]: Starting (kafka.server.BrokerToControllerRequestThread)
[2022-07-06 20:46:06,508] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-07-06 20:46:06,509] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-07-06 20:46:06,511] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-07-06 20:46:06,512] INFO [ExpirationReaper-0-ElectLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-07-06 20:46:06,529] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2022-07-06 20:46:06,611] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2022-07-06 20:46:06,642] INFO Stat of the created znode at /brokers/ids/0 is: 511242113,511242113,1657140366624,1657140366624,1,0,0,72062637399277581,364,0,511242113
 (kafka.zk.KafkaZkClient)
[2022-07-06 20:46:06,643] INFO Registered broker 0 at path /brokers/ids/0 with addresses: INTERNAL://kafka-0.kafka-headless.default.svc.cluster.local:9093,CLIENT://kafka-0.kafka-headless.default.svc.cluster.local:9092, czxid (broker epoch): 511242113 (kafka.zk.KafkaZkClient)
[2022-07-06 20:46:06,758] INFO [ControllerEventThread controllerId=0] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2022-07-06 20:46:06,765] INFO [ExpirationReaper-0-topic]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
cannot allocate memory for thread-local data: ABORT

or

[395.297s][warning][os,thread] Attempt to protect stack guard pages failed (0x00007efd53bf8000-0x00007efd53bfc000).
[395.297s][warning][os,thread] Attempt to deallocate stack guard pages failed.
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16384 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007efd53dfa000, 16384, 0) failed; error='Not enough space' (errno=12)

Increasing the memory (Xmx, the StatefulSet/pod's resource limits) most probably won't help... In my case, the problem was the one described in this excellent blog post. Long story short:

"Kafka allocates a memory map for each log file, in each partition, in each topic. When you keep kafka messages indefinitely and also have a large number of partitions it means that the process hits the limit of the number of memory maps it can allocate"

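One way to double-check that this is what's hitting you (my own quick check, not from that post) is to count how many memory mappings the broker process actually holds and compare it against the limit. This assumes the Kafka JVM is PID 1 inside the container; if your image wraps it in a launcher script, look up the JVM's PID with ps first:

$ kubectl exec -it kafka-0 -- sh -c 'wc -l < /proc/1/maps'

Each line of /proc/<pid>/maps is one mapping, so a count creeping up towards vm.max_map_count (65530 by default, as we'll see in a second) means you've found the culprit.
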
But how do you fix this on Kubernetes?

First... let's check the current value. Run this on any pod running on the same node as the Kafka pod that keeps getting restarted:

$ kubectl exec -it <some_pod_running_on_the_same_node> -- sysctl -n vm.max_map_count
65530
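
If there's no convenient pod running on that node, a throwaway debug pod works too. With a reasonably recent kubectl, something like this should do it (it leaves a node-debugger pod behind, so delete it when you're done):

$ kubectl debug node/<node_running_the_kafka_pod> -it --image=busybox -- sysctl -n vm.max_map_count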

Most probably you'll see 65530, which is the kernel default. To change it, you'll have to add an init container to the StatefulSet that bumps the limit before the Kafka container starts.
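
Here's a rough sketch of what that init container can look like under the StatefulSet's spec.template.spec. This is my own minimal example, not the Bitnami chart's exact rendering; the container has to run privileged, otherwise sysctl -w will be refused, and if you deployed with the Helm chart you'd normally feed something like this through the chart's value for extra init containers (check its values.yaml) rather than editing the StatefulSet by hand:

spec:
  template:
    spec:
      initContainers:
        - name: sysctl
          image: busybox
          command:
            - sh
            - -c
            # vm.max_map_count is a node-wide (non-namespaced) sysctl,
            # so this raises it on whichever node the Kafka pod lands on
            - sysctl -w vm.max_map_count=262144
          securityContext:
            privileged: true

Once that's applied and the pods have rolled, the init container shows up in the Pod Template: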

$ kubectl describe statefulset kafka
Name:               kafka
(some output is omitted)
Pod Template:
  Labels:           app.kubernetes.io/component=kafka
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kafka
                    helm.sh/chart=kafka-16.1.3
  Service Account:  kafka
  Init Containers:
   sysctl:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      sysctl -w vm.max_map_count=262144

Now you should see the difference:

$ kubectl exec -it <same_pod_as_before> -- sysctl -n vm.max_map_count
262144

... and... hopefully your pods are stable and serving traffic.

