Apache Kafka: Ins & Outs

VISHAL D BHAT
6 min read · Aug 2, 2021


Topics, Partitions and Offsets :

Topics : A topic is a particular stream of data.

Topics are analogous to tables in a database. We can create as many topics as we want. A topic is identified by its name.

Topics are split into partitions. Each partition is ordered, and each message within a partition gets an incremental id called an offset.
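The idea above can be sketched in a few lines: a topic is a set of partitions, each partition is an append-only log, and an offset is just a message's position within its own partition. (A minimal illustration only; the names and the dict-of-lists structure are mine, not Kafka's internals.)

```python
# Sketch: a topic with 3 partitions, each an append-only list.
# The offset of a message is simply its index within its partition.
topic = {0: [], 1: [], 2: []}

def append(partition_id, message):
    """Append a message to a partition and return its offset."""
    partition = topic[partition_id]
    offset = len(partition)  # next free slot = this message's offset
    partition.append(message)
    return offset

first = append(0, "hello")   # offset 0 in partition 0
second = append(0, "world")  # offset 1 in partition 0
other = append(1, "hi")      # offset 0 in partition 1
```

Note that offsets are only meaningful within a single partition: partition 0 and partition 1 each start counting from 0.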

Brokers : A Kafka cluster is composed of multiple brokers. Each broker is an independent server, identified by a unique id. Each broker holds certain topic partitions ( some of the data, but not all of it ). In Kafka, if you are connected to any one broker { called the bootstrap broker }, you are connected to the entire cluster. A good number of brokers to start a cluster with is 3.

Topic Replication Factor :

Since Kafka works in a distributed way, there is a need for a backup mechanism: replication. This way, when a broker goes down, another broker can serve the data. The replication factor is set per topic { usually 2–3 }. Golden Rules : 1) At any time, only one broker (server) can be the leader for a given partition. 2) Only that leader can serve & receive data for the partition. 3) The other brokers are involved only in synchronizing the data. {passive replicas}

The gist of replicas : Each partition -> 1 leader -> Multiple ISRs {In-Sync Replicas}

Question : What decides which broker becomes the leader for a partition ? Ans — Zookeeper (it performs the leader election).

Producers :

Producers write data to topics. A producer automatically knows which broker and partition to write to. If there is a broker failure, producers recover automatically; no explicit programming is required to implement this feature. When no key is provided, data is sent to the topic's partitions in a round-robin fashion { this is how load balancing is done in Kafka }.
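The round-robin distribution mentioned above can be sketched like this: without a key, each successive message simply goes to the next partition in order. (Illustrative only; the partition numbers and count are assumptions, and a real producer does this inside its partitioner.)

```python
from itertools import cycle

# Sketch: keyless messages cycle through a topic's partitions.
partitions = [0, 1, 2]          # assume a topic with 3 partitions
next_partition = cycle(partitions)

# Send 6 keyless messages -> partitions 0,1,2,0,1,2
assignments = [next(next_partition) for _ in range(6)]
```

Each partition ends up with roughly the same number of messages, which is exactly the load-balancing effect described above.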

Acknowledgement Mechanism :

  1. acks = 0 : Producers won't wait for an acknowledgment {possible data loss}.
  2. acks = 1 : Producers will wait for the leader's acknowledgment {limited data loss}.
  3. acks = all : Producers will wait for an acknowledgment from the leader and all in-sync replicas {no data loss}.

Message Keys : Producers can send a key with the message (string, number, …). Keys are useful for message ordering. ( Underlying DS : Key Hashing)

  1. If key = null, data is sent using the round-robin technique.
  2. If a key is sent, all the messages for that key will go to the same partition.
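The key-hashing idea can be sketched as hashing the key and taking it modulo the number of partitions. (A minimal sketch with assumed names: Kafka's default partitioner actually uses the murmur2 hash; md5 is used here only to keep the example self-contained and deterministic.)

```python
import hashlib

NUM_PARTITIONS = 3  # assumed partition count for illustration

def partition_for(key: str) -> int:
    """Map a message key to a partition via hashing (md5 stand-in for murmur2)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

# The same key always hashes to the same partition, which is what
# guarantees per-key ordering:
p1 = partition_for("user-42")
p2 = partition_for("user-42")
```

Because `partition_for` is deterministic, every message with key "user-42" lands in the same partition, and partitions preserve order, so all of that key's messages are read in the order they were written.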

Consumers :

Consumers read data from a topic ( by its name ). Consumers know which broker to read from. Again, in case of broker failures, consumers know how to recover automatically. Consumers read data in consumer groups; each consumer within a group reads from exclusive partitions. If there are more consumers than partitions, some consumers will be inactive.
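The group behaviour above can be sketched as a simple assignment function: each partition goes to exactly one consumer in the group, and any surplus consumers get nothing. (Illustrative only; the function and consumer names are mine, and Kafka's real assignors, e.g. range or round-robin, are more involved.)

```python
def assign(partitions, consumers):
    """Deal partitions out to consumers round-robin; each partition
    is owned by exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers -> one consumer stays idle
result = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

Here `c4` receives an empty list: with only 3 partitions, a 4th consumer in the same group has nothing exclusive to read and sits inactive, exactly as described above.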

Kafka Broker Discovery :

Every Kafka broker is also termed a "Bootstrap Server".

This means that you only need to connect to one broker, and you will be connected to the entire cluster. Each broker knows about all brokers, topics and partitions { the cluster metadata }. It does not necessarily hold all that data, but it knows where it lives.

Zookeeper :

Zookeeper manages brokers (keeps a record/list of them).

Zookeeper helps in performing the leader elections for the partitions.

Zookeeper sends notifications to Kafka in case of any changes (e.g., a new topic, a broker going down or coming up, a deleted topic).

Kafka cannot work without Zookeeper.

Zookeeper by design operates with an odd number of servers (3, 5, 7, 9, …).

Zookeeper has one leader (handling the writes); the rest of the servers are followers (handling the reads).

Kafka maintains its own metadata in Zookeeper.


Starting Up & Running Kafka on a Local Machine :

Windows Installation:

  1. Install JDK (version 8).
  2. Download the Apache Kafka zipped file from the official kafka.apache.org website.
  3. Extract the file in the root directory.
  4. Add the folder path to the environment variable settings to access the commands from any folder.
  5. Create a new folder within the Kafka folder named data.
  6. Create two subfolders within data: kafka and zookeeper.
  7. Open the zookeeper.properties file and edit dataDir to C:/kafka_2.12-2.8.0/data/zookeeper { note: use only forward slashes in the path :) }

To check if Zookeeper is alive and running, enter the command: zookeeper-server-start.bat config\zookeeper.properties

If everything works fine, Zookeeper must be up & running on port 2181 :)

8. Open the server.properties file and edit log.dirs to C:/kafka_2.12-2.8.0/data/kafka

9. Run a similar command to see if Kafka is up and running: kafka-server-start.bat config\server.properties (NOTE : run these commands only in the root directory where Kafka is installed)

If everything works fine, Kafka must be up and running.

If all the above steps work fine, great job! You can start with the Kafka CLI!

Kafka Command Line Interface :

Before using the CLI, make sure that Kafka and Zookeeper are up & running.

  1. kafka-topics

Prints the entire documentation for topic creation, edits and deletion.

2. kafka-topics --zookeeper 127.0.0.1:2181 --topic first_topic_Vishal --create --partitions 3 --replication-factor 1

Creates a Kafka topic.

List the existing topics : kafka-topics --zookeeper 127.0.0.1:2181 --list

Describe the topic created : kafka-topics --zookeeper 127.0.0.1:2181 --topic first_topic_Vishal --describe

kafka-console-producer --broker-list 127.0.0.1:9092 --topic first_topic_Vishal

What happens when a topic has not actually been created and a producer writes data to it?

Say for example : kafka-console-producer --broker-list 127.0.0.1:9092 --topic new_topic1, where new_topic1 does not exist.

In this case, a warning pops up stating that the leader is not available. Remember, it's not an error, just a warning!

Kafka then auto-creates new_topic1. The new topic will by default have a replication factor of 1 and 1 partition (which is not what we want!)

Kafka Consumer CLI :

  1. kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic first_topic_Vishal
  2. Nothing gets displayed! Even though you have added tons of data to the topic.
  3. The reason is that a Kafka consumer reads only the messages written from now on: ONLY THE FRESH MESSAGES!

Reading the data from the beginning of a topic :

kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic first_topic_Vishal --from-beginning

FUTURE UPDATES : How consumers in groups work with more than one partition.
