Data durability in Apache Kafka

Category: Apache Kafka | By Prasad Bonam | Last updated: 2023-08-05 11:04:58



Data durability in Apache Kafka refers to the ability of Kafka to ensure that once data is written to a topic, it will be safely stored and available for consumption, even in the face of failures or crashes. Kafka achieves data durability through various mechanisms, such as replication and configurable retention policies.

Data Replication: Kafka uses replication to achieve fault tolerance and data durability. When you create a topic, you specify a replication factor, which determines how many copies (replicas) of each partition are maintained. Each replica of a partition is hosted on a different broker, so the replication factor cannot exceed the number of brokers. One replica acts as the leader, which handles writes for the partition, and the others are followers that copy the leader's data. If the broker hosting the leader fails, one of the in-sync followers is promoted to leader, so the partition stays available as long as at least one in-sync replica survives.

Example: Suppose you create a topic "my_topic" with three partitions and a replication factor of 3 on a cluster of three brokers (B1, B2, and B3). Kafka places a replica of every partition on each broker and spreads the leaders across the brokers, for example:

  • Partition 0: Replica in B1 (Leader), Replica in B2, Replica in B3
  • Partition 1: Replica in B2 (Leader), Replica in B1, Replica in B3
  • Partition 2: Replica in B3 (Leader), Replica in B2, Replica in B1

If a broker fails, each partition it was leading gets a new leader from the replicas on the remaining brokers, so the data stays available. For example, if B1 fails, leadership of Partition 0 moves to the replica on B2 or B3.
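The placement and failover described above can be sketched in a few lines of Python. This is a simplified model, not Kafka's actual assignment or election logic: the function names `assign_replicas` and `elect_leaders` are hypothetical, replicas are placed round-robin, and the first live replica in each list is assumed to become leader. The follower order it produces may differ from the listing above, but every partition still has one replica per broker.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Round-robin placement: partition p starts at broker p and wraps around."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


def elect_leaders(assignment, live_brokers):
    """Pick the first live replica in each list as leader (None if none is live)."""
    leaders = {}
    for p, replicas in assignment.items():
        leaders[p] = next((b for b in replicas if b in live_brokers), None)
    return leaders


brokers = ["B1", "B2", "B3"]
assignment = assign_replicas(brokers, num_partitions=3, replication_factor=3)

# All brokers healthy: leaders are spread across the cluster.
leaders_before = elect_leaders(assignment, live_brokers={"B1", "B2", "B3"})
# B1 has failed: partition 0 needs a new leader from the surviving replicas.
leaders_after = elect_leaders(assignment, live_brokers={"B2", "B3"})

print(leaders_before)  # {0: 'B1', 1: 'B2', 2: 'B3'}
print(leaders_after)   # {0: 'B2', 1: 'B2', 2: 'B3'}
```

In this model only Partition 0 changes leader when B1 goes down, because B1 was a follower for the other two partitions.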

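Retention Policies: Kafka lets you configure a retention policy for each topic, which determines how long Kafka keeps messages in that topic. By default, Kafka retains messages for seven days. Messages are kept for the whole retention period whether or not they have been consumed, so consumers can read or re-read them at any time within that window. Retention does not replace replication: if every replica of a partition is unavailable (e.g., due to broker failures), its messages cannot be consumed until a broker holding a replica comes back, and they can be read then only if the retention period has not expired.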
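As a rough illustration of time-based retention, the sketch below decides which chunks of a partition's log are old enough to delete. It is a simplified model under stated assumptions: the `Segment` class, its field names, and the segment labels are hypothetical, and it assumes expiry is judged by the newest record in each segment rather than message by message.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000
RETENTION_MS = 7 * DAY_MS  # seven days, the default retention period


@dataclass
class Segment:
    name: str              # hypothetical label, for illustration only
    newest_record_ms: int  # timestamp (ms) of the newest record in the segment


def expired_segments(segments, now_ms, retention_ms=RETENTION_MS):
    """Return the segments whose newest record is older than the retention period."""
    return [s for s in segments if now_ms - s.newest_record_ms > retention_ms]


now_ms = 100 * DAY_MS  # arbitrary "current time" for the example
segments = [
    Segment("segment-a", now_ms - 10 * DAY_MS),
    Segment("segment-b", now_ms - 8 * DAY_MS),
    Segment("segment-c", now_ms - 2 * DAY_MS),
]

expired = [s.name for s in expired_segments(segments, now_ms)]
print(expired)  # ['segment-a', 'segment-b']
```

Whether a consumer has already read the records plays no part in the decision; only their age does.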

Example: To set a retention policy of 30 days for an existing topic "my_topic," you can use the kafka-configs tool. (With --bootstrap-server, kafka-topics accepts --config only when creating a topic, not with --alter.)

Using the command-line tool on Unix/Linux/Mac:

bash
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my_topic --alter --add-config retention.ms=2592000000

Using the command-line tool on Windows:

batch
bin\windows\kafka-configs.bat --bootstrap-server localhost:9092 --entity-type topics --entity-name my_topic --alter --add-config retention.ms=2592000000

In this example, the retention.ms property is set to 30 days (30 * 24 * 60 * 60 * 1000 milliseconds).
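The millisecond value is easy to get wrong by hand, so it can help to compute it. The snippet below is a small sketch using Python's standard library; the helper name `retention_ms` is hypothetical.

```python
from datetime import timedelta


def retention_ms(days):
    """Convert a retention period in days to whole milliseconds."""
    return timedelta(days=days) // timedelta(milliseconds=1)


thirty_days = retention_ms(30)
seven_days = retention_ms(7)

print(thirty_days)  # 2592000000
print(seven_days)   # 604800000
```

The first value matches the retention.ms setting used in the commands above, and the second corresponds to the seven-day default.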

Together, replication and retention make Kafka reliable enough for critical data pipelines and event streaming applications. Replication keeps data available when individual brokers fail, and retention keeps messages on disk for a configured period so that consumers can read them reliably within that window.

