How Kafka automatically handles the distribution of data and management of replicas.

Category : Apache Kafka | Sub Category : Apache Kafka | By Prasad Bonam Last updated: 2023-08-05 09:30:54 Viewed : 328


How Kafka automatically handles the distribution of data and management of replicas.

Kafka replication is a key feature that provides data redundancy and fault tolerance in Apache Kafka. Replication ensures that data is replicated across multiple Kafka brokers, allowing for high availability and data durability. Each topic partition can have multiple replicas, and Kafka automatically handles the distribution of data and management of replicas.

Lets go through an example to understand Kafka replication:

  1. Setting Up Kafka: For this example, assume you have set up a Kafka cluster with three brokers: Broker-1, Broker-2, and Broker-3.

  2. Topic Creation: Create a Kafka topic named "my_topic" with three partitions and a replication factor of 2. This means each partition will have two replicas across different brokers.

    Using the command-line tool on Unix/Linux/Mac:

    bash
    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my_topic --partitions 3 --replication-factor 2

    Using the command-line tool on Windows:

    batch
    binwindowskafka-topics.bat --bootstrap-server localhost:9092 --create --topic my_topic --partitions 3 --replication-factor 2
  3. Producing Messages: Start a producer to send messages to the "my_topic" topic.

    Using the command-line tool on Unix/Linux/Mac:

    bash
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

    Using the command-line tool on Windows:

    batch
    binwindowskafka-console-producer.bat --broker-list localhost:9092 --topic my_topic

    Now, you can enter messages in the console, and they will be published to the "my_topic" topic.

  4. Consuming Messages: Start two consumers to read messages from the "my_topic" topic.

    Using the command-line tool on Unix/Linux/Mac (Consumer 1):

    bash
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --group my_group

    Using the command-line tool on Unix/Linux/Mac (Consumer 2):

    bash
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --group my_group

    The --group option is used to specify a consumer group. In this example, both consumers belong to the same group named "my_group."

  5. Replication in Action: When you produce messages to the "my_topic" topic, Kafka will distribute the data across the three partitions according to the partitioning strategy (e.g., round-robin, custom partitioner). Each partition will have two replicas across different brokers, as specified in the topic creation.

    For example, if Broker-1, Broker-2, and Broker-3 are the three brokers in the cluster:

    • Partition 0: Replica in Broker-1, Replica in Broker-2
    • Partition 1: Replica in Broker-2, Replica in Broker-3
    • Partition 2: Replica in Broker-3, Replica in Broker-1

    If one broker fails (e.g., Broker-1), Kafka automatically promotes one of the replicas to be the new leader, ensuring that the topic remains available for read and write operations. This provides fault tolerance and data durability.

Kafka replication allows for data redundancy and high availability. Even if one or more brokers in the cluster go down, the replicas ensure that the data remains accessible, and the Kafka cluster continues to function without data loss.


Search
Sub-Categories
Related Articles

Leave a Comment: