Kafka cheat sheet | by Amr Khalil | Jul 2022


Contents

  1. Important Notices
  2. Installing Kafka
  3. Start the Zookeeper and Kafka servers
  4. Kafka topics
  5. Kafka consumer and producer
  6. Kafka consumer groups
  7. Kafka GUI Tool

  • Apache Kafka is an open-source framework for distributed data streaming.
  • Kafka was developed at LinkedIn.
  • Kafka was originally intended to be a message queue, and its core is an abstraction of a distributed commit log.
  • Kafka has 4 APIs: Producer, Consumer, Streams, and Connect.
  • In this cheat sheet you will notice a common use of ZooKeeper. It ships with Kafka, so there is nothing extra to install.
  • ZooKeeper keeps track of the state of the Kafka cluster nodes, and it also keeps track of Kafka topics, partitions, etc.
  • If you are working on your local computer, you will only have one broker, so you can only use a replication factor of 1, but you can still have many partitions.
  • The creators of Kafka founded Confluent, which offers a commercial version.
  • Confluent is a data streaming platform based on Apache Kafka: a large-scale streaming platform capable of not only publish and subscribe, but also storage and processing of data streams at very large scale.
  • There are several versions of Kafka. In this cheat sheet we are working with version 2.0.0, as it is simple for beginners and easy to configure. However, with the exception of the installation, the cheat sheet commands are the same for all versions.
  • Before installing Kafka, make sure that Java is already installed with this command (note the single dash: Java 8 does not support --version).
java -version

If you have an error, perform these commands on Mac.

# Make sure you have an updated brew
> git -C /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core fetch --unshallow
> brew update
# Install Java 8 because it's compatible with Kafka 2.0.0
> brew install --cask adoptopenjdk8
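
A quick check that the JDK is now picked up (the exact build string will vary):

# Should print a 1.8.x version
> java -version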

Download Kafka 2.0.0

https://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz
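
Alternatively, a minimal sketch for downloading the archive straight from the terminal (assuming curl is available):

# Download the Kafka 2.0.0 archive into ~/Downloads
> curl -o ~/Downloads/kafka_2.11-2.0.0.tgz https://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz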


Install Kafka on Mac

  • You can perform these steps manually or with your terminal.
  • The .tgz file name may differ (for example, kafka_2.11-2.0.0.tgz vs. kafka_2.12-2.0.0.tgz; the first number is the Scala version).
# Move the Kafka file to your user directory
> mv Downloads/kafka_2.12-2.0.0.tgz .
> tar -xvf kafka_2.12-2.0.0.tgz
  • For simplicity: open your user directory and rename the folder “kafka_2.12-2.0.0” to “kafka”.
# List the contents of the kafka folder
> ls kafka
  • You should see folders such as bin, config, and libs inside the kafka folder

Check if Kafka works

# Restart your terminal
> kafka/bin/kafka-topics.sh
  • If you see the command’s usage documentation in your terminal, it means Kafka works.
  • If not, check the installation steps again.

Add Kafka to Mac Terminal

> nano ~/.bash_profile
  • Add these lines to your paths
# Kafka
export PATH="$PATH:$HOME/kafka/bin"
  • Reload the profile (or open a new terminal), then test the command without its full path
> source ~/.bash_profile
> kafka-topics.sh
  • This should now work from anywhere in your terminal
  • Open two new terminals
  • The first to run the ZooKeeper server. Run these commands from inside the kafka folder, since the config paths are relative.
# Start the ZooKeeper server
> zookeeper-server-start.sh config/zookeeper.properties
  • The second to run the Kafka server
# Start the Kafka server
> kafka-server-start.sh config/server.properties
  • Let them run, and open a new terminal for the following commands
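
Optionally, a quick sanity check that both servers are listening on their default ports (assuming nc/netcat is available):

# ZooKeeper listens on port 2181 and Kafka on port 9092 by default
> nc -z 127.0.0.1 2181 && echo "ZooKeeper is up"
> nc -z 127.0.0.1 9092 && echo "Kafka is up"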
  • Data in Kafka is organized and stored in topics. A topic is like a folder in a file system, and the data is the files in that folder.
  • Topics are partitioned, which means that a topic is spread over a number of “buckets” located on different Kafka brokers.
  • Each topic can be replicated to make your data fault tolerant and highly available.
This sample topic has four partitions, P1 through P4 (source: https://kafka.apache.org/intro).
  • Create a topic called “first_topic” with 3 partitions and 1 replication factor.
  • NOTE: If you are working on your local computer, you can only use a replication factor of 1, otherwise you will get an error.
# Create a topic called “first_topic”
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --create --partitions 3 --replication-factor 1
# Describe the first topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --describe
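
The describe output should look roughly like the following; this is an approximation, not verbatim output, and the leader and replica IDs depend on your broker ID:

Topic:first_topic   PartitionCount:3   ReplicationFactor:1   Configs:
    Topic: first_topic   Partition: 0   Leader: 0   Replicas: 0   Isr: 0
    Topic: first_topic   Partition: 1   Leader: 0   Replicas: 0   Isr: 0
    Topic: first_topic   Partition: 2   Leader: 0   Replicas: 0   Isr: 0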
  • Create a topic called “second_topic” with 6 partitions and 1 replication factor.
# Create a topic called “second_topic”
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --create --partitions 6 --replication-factor 1
# Describe the second topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --describe

List all topics

# List topics
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --list

Delete topic

# Delete a topic
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --delete
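
To confirm the deletion, list the topics again; “second_topic” should no longer appear. (Deletion assumes delete.topic.enable=true on the broker, which is the default in this version.)

# Verify that the topic is gone
> kafka-topics.sh --zookeeper 127.0.0.1:2181 --list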
  • Consumer: reads one or more topics and processes the stream of data produced to them.
  • Producer: writes a stream of events to one or more Kafka topics.
Figure: the Kafka message queue.
  • Create a Kafka consumer to read the data stream.
# Create a Kafka Consumer
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic
  • Nothing will happen, because we don’t have a producer yet.
  • Open a new terminal and create a Kafka Producer, in order to write the data streams.
# Create a Kafka Producer
> kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic first_topic
  • Write the data streams; for example, type a few event messages into the producer terminal.
  • Switch to the consumer terminal and you will see the messages appear as you write them.
  • The data is now stored in the Kafka topic “first_topic”.
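
As a side note, you can also attach a key to each message; messages with the same key always land in the same partition. A minimal sketch using the console tools’ key properties (the “:” separator is an arbitrary choice):

# Produce key:value messages
> kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic first_topic --property parse.key=true --property key.separator=:
# Consume and print the keys along with the values
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic --property print.key=true --from-beginning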

Retrieve data stored in topics

  • Read data stored in a Kafka topic from the beginning
# Retrieve the data stored in topics
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic --from-beginning
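
Without --from-beginning, the console consumer only shows messages produced after it starts. If you only want a fixed number of records, --max-messages (a standard console-consumer flag) exits after reading that many:

# Read the first 5 messages from the topic, then exit
> kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic first_topic --from-beginning --max-messages 5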