Hardware
Memory: Kafka relies heavily on the filesystem for caching and storing messages. It uses heap
space very carefully and does not require a heap larger than 5 GB; the filesystem cache takes up to
3 GB per machine. 64 GB machines are a safe bet; anything below 32 GB is not worth using.
CPUs: Enabling SSL increases CPU load. Overall, Kafka does not require powerful CPUs. More
cores are better than fewer, faster cores. 24+ cores is a safe bet.
Disks: Disks should be dedicated to Kafka and not serving the rest of the OS. For low
latency, one can either RAID the drives together into a single volume or format and mount each
drive as its own directory. Whichever option is chosen, data must be well balanced among
partitions. There are several trade-offs to weigh, so a sysadmin should review the pros and cons
and make the final decision on how these disks will be mounted.
Network: The 1-10 GbE we already have in place is sufficient. A Kafka cluster assumes all nodes
are equal and that the latency introduced by the network is low.
JVM
Kafka recommends the latest JDK version with the G1 garbage collector. Tuning recommendations
are available online depending on the traffic we expect. Kunal might want to take a close look at
them, as he is the most qualified Java person around.
Large messages can cause longer garbage collection (GC) pauses as brokers allocate large
chunks. Monitor the GC log and the server log. If long GC pauses cause Kafka to abandon the
ZooKeeper session, you may need to configure longer timeout values for
zookeeper.session.timeout.ms.
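If that becomes necessary, the override goes in the broker configuration; the value below is purely illustrative and should be sized to the GC pause lengths observed in the logs:
    zookeeper.session.timeout.ms=30000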
Commonly Tweaked Configuration Parameters
Kafka ships with very good defaults, especially when it comes to performance-related settings
and options. When in doubt, just leave the settings alone.
With that said, there are some logistical configurations that should be changed for production.
These changes are necessary either to make your life easier, or because there is no way to set
a good default (because it depends on your cluster layout).
zookeeper.connect
The list of ZooKeeper hosts that the broker registers with. It is recommended that you configure
this with all the hosts in your ZooKeeper cluster.
Type: string
Importance: high
broker.id
Integer id that identifies a broker. No two brokers in the same Kafka cluster can have the same
id.
Type: int
Importance: high
log.dirs
The directories in which the Kafka log data is located.
Type: string
Default: /tmp/kafka-logs
Importance: high
listeners
Comma-separated list of URIs (including protocol) that the broker will listen on. Specify
hostname as 0.0.0.0 to bind to all interfaces or leave it empty to bind to the default interface. An
example is PLAINTEXT://myhost:9092.
Type: string
Default: PLAINTEXT://host.name:port where the default for host.name is an
empty string and the default for port is 9092
Importance: high
advertised.listeners
Listeners to publish to ZooKeeper for clients to use. In IaaS environments, this may need to be
different from the interface to which the broker binds. If this is not set, the value for listeners will
be used.
Type: string
Default: listeners
Importance: high
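As a minimal sketch of how these settings fit together in a broker's server.properties (the broker id, host names, and path below are placeholders, not our actual values):
    broker.id=1
    zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    log.dirs=/data/kafka-logs
    listeners=PLAINTEXT://0.0.0.0:9092
    advertised.listeners=PLAINTEXT://kafka1.example.com:9092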
num.partitions
The default number of log partitions for auto-created topics. We recommend increasing this as it
is better to over partition a topic. Over partitioning a topic leads to better data balancing as well
as aids consumer parallelism. For keyed data, in particular, you want to avoid changing the
number of partitions in a topic.
Type: int
Default: 1
Importance: medium
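For instance, a broker-level override to over-partition auto-created topics (the value is arbitrary and depends on expected consumer parallelism):
    num.partitions=8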
Replication configs
default.replication.factor
The default replication factor that applies to auto-created topics. We recommend setting this to
at least 2.
Type: int
Default: 1
Importance: medium
min.insync.replicas
The minimum number of replicas in the ISR needed to commit a produce request when the
producer uses acks=-1 (or all).
Type: int
Default: 1
Importance: medium
unclean.leader.election.enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort,
even though doing so may result in data loss.
Type: boolean
Default: true
Importance: medium
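A sketch of the replication-related overrides from this section (the values are illustrative: the replication factor follows the "at least 2" recommendation above, and the other two trade availability for durability):
    default.replication.factor=3
    min.insync.replicas=2
    unclean.leader.election.enable=false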
Large Messages
If shared storage (such as NAS, HDFS, or S3) is available, consider placing large files on the
shared storage and using Kafka to send a message with the file location. In many cases, this can
be much faster than using Kafka to send the large file itself.
Alternatively, split large messages into 1 KB segments with the producing client, using
partition keys to ensure that all segments are sent to the same Kafka partition in the correct
order. The consuming client can then reconstruct the original large message.
If you still need to send large messages with Kafka, modify the following configuration
parameters to match your requirements:
Broker Configuration
message.max.bytes
Maximum message size the broker will accept. Must be smaller than the consumer
fetch.message.max.bytes, or the consumer cannot consume the message.
Default value: 1000000 (1 MB)
log.segment.bytes
Size of a Kafka data file. Must be larger than any single message.
Default value: 1073741824 (1 GiB)
replica.fetch.max.bytes
Maximum message size a broker can replicate. Must be larger than message.max.bytes, or a
broker can accept messages it cannot replicate, potentially resulting in data loss.
Default value: 1048576 (1 MiB)
Consumer Configuration
fetch.message.max.bytes
Maximum message size a consumer can read. Must be at least as large as
message.max.bytes.
Default value: 1048576 (1 MiB)
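As an illustrative sketch only (the 10 MB figure is an arbitrary example, not a recommendation), these settings must be kept consistent across broker and consumer:
    # broker (server.properties)
    message.max.bytes=10485760
    replica.fetch.max.bytes=10485760
    # log.segment.bytes (default 1 GiB) already exceeds this, so it needs no change
    # consumer configuration
    fetch.message.max.bytes=10485760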
Batch Size
batch.size measures batch size in total bytes instead of the number of messages. It controls
how many bytes of data to collect before sending messages to the Kafka broker. Set this as
high as possible, without exceeding available memory. The default value is 16384.
If you increase the size of your buffer, it might never get full. The producer sends the information
eventually, based on other triggers, such as linger time in milliseconds. Although setting the batch
size too high can waste memory, it does not impact latency.
If your producer is sending all the time, you are probably getting the best throughput possible. If
the producer is often idle, you might not be writing enough data to warrant the current allocation
of resources.
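For example, a producer-side override (the value is an illustrative increase over the 16384 default, not a tested recommendation):
    batch.size=65536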
Linger Time
linger.ms sets the maximum time to buffer data in asynchronous mode. For example, a setting of
100 means the producer batches up to 100 ms worth of messages before sending them at once.
This improves throughput, but the buffering adds message delivery latency.
By default, the producer does not wait. It sends the buffer any time data is available.
Instead of sending immediately, you can set linger.ms to 5 and send more messages in one
batch. This would reduce the number of requests sent, but would add up to 5 milliseconds of
latency to records sent, even if the load on the system does not warrant the delay.
The farther away the broker is from the producer, the more overhead required to send
messages. Increase linger.ms for higher latency and higher throughput in your producer.
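A producer-side sketch of the trade-off described above, typically applied together with the batch.size increase from the previous section (the value is illustrative):
    # wait up to 5 ms to fill a batch before sending
    linger.ms=5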