Apache Kafka Tutorial Interview Questions
25) What are Broker Configs?
A) The essential configurations are the following: broker.id, log.dirs and zookeeper.connect.
Kafka Interview Questions # 26) What is Replication in Kafka?
A) Kafka replicates the log for each topic's partitions across a configurable number of servers. This allows automatic failover to these replicas when a server in the cluster fails, so messages remain available in the presence of failures.
Kafka Interview Questions # 27) What is Log Compaction?
A) Log compaction is handled by the log cleaner, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log. Each compactor thread works as follows:
It chooses the log that has the highest ratio of log head to log tail
It creates a succinct summary of the last offset for each key in the head of the log
It recopies the log from beginning to end, removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately, so the additional disk space required is just one additional log segment.
The summary of the log head is essentially just a space-compact hash table. It uses exactly 24 bytes per entry. As a result, with 8GB of cleaner buffer, one cleaner iteration can clean around 366GB of log head (assuming 1k messages).
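The steps above can be sketched in a few lines of Python. This is a simplified model, not Kafka's cleaner code: records are plain (offset, key, value) tuples, the "summary" is an ordinary dict, and the 1 KiB average message size used for the buffer arithmetic is an assumption.

```python
# Simplified model of log compaction: keep only the latest record per key.
# Records are (offset, key, value) tuples; this is not Kafka's cleaner code.

def build_offset_map(log):
    """Summarise the log: map each key to the offset of its last occurrence."""
    last_offset = {}
    for offset, key, _value in log:
        last_offset[key] = offset
    return last_offset


def compact(log):
    """Recopy the log, dropping records that have a later record with the same key."""
    last_offset = build_offset_map(log)
    return [record for record in log if last_offset[record[1]] == record[0]]


def cleanable_bytes(buffer_bytes, avg_message_bytes, entry_bytes=24):
    """Bytes of log head one pass can cover at entry_bytes per summary entry."""
    entries = buffer_bytes // entry_bytes
    return entries * avg_message_bytes


log = [(0, "k1", "a"), (1, "k2", "b"), (2, "k1", "c"), (3, "k3", "d"), (4, "k2", "e")]
compacted = compact(log)
print(compacted)

# 8 GiB of cleaner buffer; the 1 KiB message size is an assumption.
covered = cleanable_bytes(8 * 1024**3, 1024)
print(covered // 10**9, "GB")
```

With 24 bytes per entry, the buffer holds roughly 358 million keys, which is where the figure of about 366GB comes from when each message is about 1 KiB.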
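Kafka Interview Questions # 28) How can you configure the Log Cleaner?
A) The log cleaner is enabled by default. To enable log cleaning on a particular topic, set the topic's cleanup policy to compact (cleanup.policy=compact). This can be done either at topic creation time or using the alter topic command.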
Kafka Interview Questions # 29) What are Quotas?
How Do You Set Up A Multi-Broker Kafka Cluster
To set up a multi-broker Kafka cluster you have to do the following 3 tasks.
1. Create a server.properties file for each of the servers you want in the cluster. At a minimum, you have to change the broker id, port and log location in each of the property files.
2. Start the zookeeper server.
3. Start the servers in the cluster one by one using the property file of each server.
/** create separate property file for each server */
/** config/server-1.properties */
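As a rough illustration of step 1, the Python sketch below renders one property file per broker, varying only the broker id, port and log directory. The port numbers, the /tmp paths and the helper name render_broker_properties are assumptions made for this example; the property keys (broker.id, listeners, log.dirs, zookeeper.connect) are standard broker settings.

```python
# Sketch: render one server.properties per broker, varying only id, port and log dir.
# The port numbers and /tmp paths are illustrative assumptions.

def render_broker_properties(broker_id, port, log_dir, zookeeper="localhost:2181"):
    """Return the text of a minimal server.properties for one broker."""
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://:{port}",
        f"log.dirs={log_dir}",
        f"zookeeper.connect={zookeeper}",
    ]
    return "\n".join(lines) + "\n"


configs = {
    f"config/server-{i}.properties": render_broker_properties(
        i, 9092 + i, f"/tmp/kafka-logs-{i}"
    )
    for i in (1, 2, 3)
}

for path, text in configs.items():
    print(f"# {path}")
    print(text)
```

Each broker would then be started with its own file, for example bin/kafka-server-start.sh config/server-1.properties.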
Who Should Go For This Kafka Online Course
This course is designed for professionals who want to learn Kafka techniques and wish to apply it on Big Data. It is highly recommended for:
- Developers, who want to gain acceleration in their career as a “Kafka Big Data Developer”
- Testing Professionals, who are currently involved in Queuing and Messaging Systems
- Big Data Architects, who like to include Kafka in their ecosystem
- Project Managers, who are working on projects related to Messaging Systems
- Admins, who want to gain acceleration in their careers as an “Apache Kafka Administrator”
Tell Me About Some Of The Use Cases Where Kafka Is Not Suitable
Following are some of the use cases where Kafka is not suitable:
- Kafka is designed to manage large amounts of data. Traditional messaging systems would be more appropriate if only a small number of messages need to be processed every day.
- Although Kafka includes a streaming API, it is insufficient for executing data transformations. For ETL jobs, Kafka should be avoided.
- There are superior options, such as RabbitMQ, for scenarios when a simple task queue is required.
- If long-term storage is necessary, Kafka is not a good choice. It simply allows you to save data for a specific retention period and no longer.
What Is Replication Factor
Replication factor specifies the number of copies that each partition of a topic will have.
A replication factor of 2 will create 2 copies of each partition. A replication factor of 3 will create 3 copies of each partition. One of the replicas will be the leader, the remaining will be the followers.
You define the replication factor when you create the topic.
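To make the leader and follower split concrete, here is a small Python sketch that places replicas round-robin across brokers. It is a simplified stand-in, not Kafka's actual assignment algorithm, and the function name assign_replicas is hypothetical.

```python
# Simplified replica placement: round-robin over brokers.
# This is an illustrative stand-in, not Kafka's real assignment algorithm.

def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Return {partition: {"leader": id, "followers": [ids]}}."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            broker_ids[(partition + r) % len(broker_ids)]
            for r in range(replication_factor)
        ]
        # The first replica acts as leader; the rest are followers.
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


layout = assign_replicas(num_partitions=3, replication_factor=2, broker_ids=[0, 1, 2])
for partition, roles in layout.items():
    print(partition, roles)
```

With a replication factor of 2, every partition ends up with exactly one leader and one follower, each on a different broker.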
How Can A Cluster Be Expanded In Kafka
In order to add a server to a Kafka cluster, it just has to be assigned a unique broker id, and Kafka has to be started on this new server. However, a new server will not automatically be assigned any of the data partitions until a new topic is created. Hence, when a new machine is added to the cluster, it becomes necessary to migrate some existing data to it. The partition reassignment tool can be used to move some partitions to the new broker. Kafka will add the new server as a follower of the partition that it is migrating and allow it to completely replicate the data on that particular partition. When this data is fully replicated, the new server can join the ISR, and one of the existing replicas will delete the data that it holds for that particular partition.
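The partition reassignment tool (bin/kafka-reassign-partitions.sh) takes its plan as a JSON file. The Python sketch below builds such a plan by swapping the last replica of selected partitions for the new broker. The helper name, the topic name, the broker ids and the "replace the last replica" policy are all assumptions chosen for illustration.

```python
import json

# Sketch: build a reassignment plan that moves chosen partitions onto a new broker.
# The swap-the-last-replica policy is an illustrative assumption.

def build_reassignment(topic, current, new_broker, partitions_to_move):
    """current maps partition -> list of replica broker ids."""
    entries = []
    for partition in sorted(current):
        replicas = list(current[partition])
        if partition in partitions_to_move and new_broker not in replicas:
            replicas[-1] = new_broker
        entries.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": entries}


plan = build_reassignment(
    "sample-topic",
    current={0: [1, 2], 1: [2, 1]},
    new_broker=3,
    partitions_to_move={1},
)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and passed to the tool with --reassignment-json-file and --execute.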
What Do You Understand By Mapping Data Flows
Such a technical question helps the interviewer to assess your domain knowledge. You can use this question to show your familiarity with the working concepts of Databricks. In your response, briefly explain what mapping data flow does and how it helps with the workflow.
Sample answer: ‘Microsoft offers mapping data flows which do not require coding for a simpler data integration experience, as opposed to data factory pipelines. It is a graphical method for designing data transformation pipelines. It helps transform the data flow into Azure data factory activities and execute as part of ADF pipelines.’
What Is The Role Of The Zookeeper In Kafka
Apache Kafka is a distributed system. Within the Kafka environment, the main role of ZooKeeper is to coordinate the different nodes in a cluster. In older Kafka versions, ZooKeeper also stores offset-related information for each topic and consumer group. Because offsets are committed periodically, a consumer can resume from the last committed offset if a node fails.
Assume Your Brokers Are Hosted On AWS EC2. If You’re A Producer Or Consumer Outside Of The Kafka Cluster Network, You Can Only Reach The Brokers Over Their Public DNS, Not Their Private DNS. The Broker Returns The Private DNS Of The Brokers Hosting The Leader Partitions, Not The Public DNS. Since Your Client Is Not On Your Kafka Cluster’s Network, It Cannot Resolve The Private DNS, Resulting In The Leader Not Available Error. How Will You Resolve This Network Error?
When you first start using Kafka brokers, you might have many listeners. Listeners are just a combination of hostname or IP, port, and protocol.
Each Kafka broker’s server.properties file contains the properties listed below. The important property that will enable you to resolve this network error is advertised.listeners.
listeners: a comma-separated list of hostnames and ports that the Kafka broker listens on.
advertised.listeners: a comma-separated list of hostnames and ports that will be returned to clients. Only include hostnames that clients can resolve, such as the public DNS.
inter.broker.listener.name: the name of the listener used for internal traffic between brokers. Its hostnames do not need to be resolvable on the client side, but all of the cluster’s brokers must be able to resolve them.
listener.security.protocol.map: maps each listener name to the security protocol it uses.
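To see how a client ends up with a resolvable address, the Python sketch below parses an advertised.listeners value and picks the endpoint for one listener name. The listener names INTERNAL and EXTERNAL and both hostnames are hypothetical; custom listener names like these would also need entries in listener.security.protocol.map.

```python
# Sketch: parse an advertised.listeners value into {listener name: (host, port)}.
# Listener names and hostnames below are hypothetical examples.

def parse_listeners(value):
    """Parse 'NAME://host:port,NAME://host:port' into a dict."""
    endpoints = {}
    for item in value.split(","):
        name, address = item.strip().split("://", 1)
        host, _, port = address.rpartition(":")
        endpoints[name] = (host, int(port))
    return endpoints


advertised = parse_listeners(
    "INTERNAL://ip-10-0-0-12.ec2.internal:9092,"
    "EXTERNAL://ec2-203-0-113-10.compute-1.amazonaws.com:9094"
)

# A client outside the cluster network connects on the external listener,
# so the broker hands back the public DNS name for that listener.
host, port = advertised["EXTERNAL"]
print(host, port)
```

Brokers keep talking to each other over the internal listener, while outside clients only ever see the public name.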
Kafka Admin Interview Questions
An individual working as a Kafka Admin is responsible for building and maintaining the entire Kafka messaging infrastructure and has expertise in concepts like Kafka brokers, Kafka server, etc.
Check out these important Apache Kafka interview questions that will help you crack any Kafka Admin job interview:
Q6 What Are The Types Of System Tools
The three types of system tools are:
- Kafka migration tool that helps to migrate a broker from one version to another.
- Mirror maker assists in offering to mirror one Kafka cluster to another.
- Consumer offset checker shows topic, partitions, and owner for the specified set of topics and consumer group.
What Is The Role Of Replicas In Apache Kafka
Replicas are the backups for partitions in Kafka. Follower replicas are not read from or written to by clients; rather, they are used to prevent data loss in case of failure. The partitions of a topic are published to several servers in a Kafka cluster. One Kafka server is considered to be the leader for each partition, and the leader handles all reads and writes for that partition. There can be zero or more followers in the cluster, to which the partition is replicated. In the event of a failure in the leader, the data is not lost because of the presence of replicas on other servers. In addition, one of the followers will take on the role of the new leader.
What Is Meant By Kafka Connect
Kafka Connect is a tool provided by Apache Kafka to allow scalable and reliable streaming data to move between Kafka and other systems. It makes it easier to define connectors that are responsible for moving large collections of data in and out of Kafka. Kafka Connect is able to process entire databases as input. It can also collect metrics from application servers into Kafka topics so that this data can be available for Kafka stream processing.
Explain Producer Batch In Apache Kafka
Producers write messages to Kafka, but the messages are not sent strictly one at a time. The producer collects messages bound for the same partition into a batch, and sends the batch when it becomes full or when the configured wait time (linger.ms) elapses, whichever comes first. The batch here is known as the producer batch. The default size of a producer batch (batch.size) is 16KB, but it can be modified. Larger batches generally improve the compression and throughput of producer requests, at the cost of more memory and possibly higher latency.
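The batching idea can be sketched in Python as follows. This is a toy model that only tracks record sizes in bytes; it ignores per-partition batching, compression and timing, and the function name batch_records is hypothetical.

```python
# Toy model of producer batching: group records so no batch exceeds batch_size bytes.
# Real producers batch per partition and also flush on a timer; this ignores both.

def batch_records(record_sizes, batch_size=16 * 1024):
    """Group record sizes (in bytes) into batches of at most batch_size bytes."""
    batches, current, current_bytes = [], [], 0
    for size in record_sizes:
        # Close the current batch if this record would overflow it.
        if current and current_bytes + size > batch_size:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        # The final, partly filled batch is still sent.
        batches.append(current)
    return batches


batches = batch_records([6000, 6000, 6000, 2000, 9000])
print(batches)
```

Five records go out as three requests instead of five, which is the throughput benefit batching is meant to provide.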
Q10 Can You Change The Number Of Partitions For A Topic In Kafka
Kafka does not permit you to reduce the number of partitions for a topic. However, you can expand the partitions. The alter command enables you to change the behavior and its associated configurations of a topic. You can use the following alter command and increase the partitions to five:
./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic sample-topic --partitions 5
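One thing to keep in mind when increasing partitions: for keyed messages, the default partitioner derives the partition from a hash of the key modulo the partition count, so changing the count changes where some keys land. The Python sketch below shows the effect using small integers as stand-ins for key hashes; the real partitioner hashes the key bytes, which is not reproduced here.

```python
# Sketch: which keys change partition when a topic grows from 3 to 5 partitions.
# Integers 0..9 stand in for key hashes; the real hash function is not modelled.

def partition_for(key_hash, num_partitions):
    """Pick a partition as hash modulo partition count."""
    return key_hash % num_partitions


key_hashes = range(10)
moved = [h for h in key_hashes if partition_for(h, 3) != partition_for(h, 5)]
print(moved)
```

Existing data is not moved when partitions are added, so messages for a "moved" key end up split between the old and the new partition.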
Q2 What Are The Main Features Of Kafka
You should be well-prepared for Kafka features interview questions. The interviewer can also ask about any one of the features in particular. The top features of Apache Kafka are as follows:
- Kafka is written in the Scala programming language and developed by Apache.
- It is a publish-subscribe messaging system with high throughput and fault tolerance.
- It is deployable in minutes.
- Kafka is fast because a single Kafka broker can handle megabytes of reads and writes per second and serve thousands of clients.
- You can partition data and spread it over a cluster of machines to handle larger data volumes.
- Confluent Cloud is the cloud Kafka service that provides enterprise-grade features, security, and zero ops burden.
- It offers enterprise-grade security.
- You can reduce the ops burden using Kafka.
- Kafka has a built-in partitioning system: each topic is split into partitions.
- It offers the replication feature.
- Kafka can handle large amounts of data and transfer messages from senders to receivers.
- It can also save the messages to storage and replicate them across the cluster.
- It collaborates with Zookeeper to synchronize with other services.
Explain How Topic Configurations Can Be Modified In Apache Kafka
To add a config:
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name topic_name --alter --add-config x=y
To remove a config:
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name topic_name --alter --delete-config x
Where topic_name is the topic to modify, x is the particular configuration key that has to be changed, and y is its new value.
What Are The Different Kinds Of Clusters In Azure Databricks And What Are The Functions Of Each
Asking such questions helps the interviewer test your theoretical knowledge by observing how well you understand the concepts. It is crucial that you mention all four main types and briefly describe each in your response to this question.
Sample answer: ‘Azure Databricks has four cluster types: interactive, job, low-priority and high-priority. Interactive clusters help with data exploration and ad hoc queries. These clusters provide low latency and high concurrency. We utilise job clusters for batch job execution. We can automatically scale job clusters to match the requirements. Low-priority clusters are less expensive than other cluster types but offer low performance. These clusters are suitable for tasks, such as development and testing, that may require lesser performance. High-priority clusters are more costly than other clusters, but they provide the best performance. These clusters are suitable for production-level workloads.’
What Are Top Disadvantages Of Kafka
- No complete set of monitoring tools: Apache Kafka does not ship with a complete set of monitoring and management tools. As a result, new startups or enterprises can be hesitant to work with Kafka.
- Message tweaking issues: The Kafka broker uses system calls to deliver messages to the consumer. If a message needs some tweaking, the performance of Kafka is significantly reduced. So, it works well when the message does not need to change.
- No wildcard topic selection: Apache Kafka does not support wildcard topic selection. Instead, it matches only the exact topic name, which makes it unable to address certain use cases.
- Reduced performance: Brokers and consumers reduce the performance of Kafka by compressing and decompressing the data flow. This affects not only its performance but also its throughput.
- Clumsy behaviour: Apache Kafka often behaves a bit clumsily when the number of queues in the Kafka cluster increases.
- Missing message paradigms: Certain message paradigms, such as point-to-point queues and request/reply, are missing in Kafka for some use cases.
What Are The Differences Between Redis And Kafka
Redis is short for Remote Dictionary Server. It is a key-value store that can serve read and write requests. Redis is a NoSQL database.
Redis supports push-based message delivery. This means that messages published to Redis will be automatically delivered to the consumers.
Apache Kafka supports the pull-based delivery of messages. The messages published to the Kafka broker are not delivered automatically to the consumers; rather, the consumers have to pull the messages when they are ready for them.
Redis does not support parallel processing.
Due to the partitioning system in Apache Kafka, one or more consumers of a specific consumer group can consume partitions of the topic in parallel.
Redis does not support message replicas. Once the messages are delivered to the consumers, they are deleted.
Apache Kafka supports creating message replicas in its log.
Redis is an in-memory store, which makes it faster than Kafka.
Apache Kafka uses disk space for storage, which makes it slower than Redis.
Since Redis is an in-memory store, it cannot handle large volumes of data.
Since Kafka uses disk space as its primary storage, it is capable of handling large volumes of data.
Explain The Scalability Of Apache Kafka
In software terms, the scalability of an application is its ability to maintain its performance when it is exposed to changes in application and processing demands. In Apache Kafka, the messages corresponding to a particular topic are divided into partitions. This allows the topic size to be scaled beyond the size that will fit on a single server. Allowing a topic to be divided into partitions ensures that Kafka can guarantee load balancing over multiple consumer processes. In addition, the concept of the consumer group in Kafka also contributes to making it more scalable. In a consumer group, a particular partition is consumed by only one consumer in the group. This aids in the parallelism of consuming multiple messages on a topic.
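The rule that each partition is consumed by only one consumer in a group can be sketched in Python as below. The round-robin policy and the function name assign_partitions are assumptions for illustration; Kafka offers several assignment strategies and this is not a copy of any of them.

```python
# Sketch: spread a topic's partitions over the consumers of one group.
# Round-robin is an illustrative policy, not a specific Kafka assignor.

def assign_partitions(partitions, consumers):
    """Return {consumer: [partitions]} with each partition owned by one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3, 4], ["c1", "c2"])
print(assignment)
```

Adding a third consumer to the list would spread the same five partitions more thinly, which is how a consumer group scales out.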
What Is The Purpose Of The Retention Period In The Kafka Cluster
Within the Kafka cluster, all published records are retained for the configured retention period, whether or not they have been consumed. Once the retention period set in the configuration has elapsed, the records are discarded. The main purpose of discarding records from the Kafka cluster is to free up space.
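A minimal Python sketch of time-based retention is shown below. It filters individual records for clarity, whereas Kafka removes whole log segments; the record layout and the seven-day retention value used here are assumptions for the example.

```python
# Sketch: time-based retention that ignores whether a record was consumed.
# Kafka deletes whole log segments; filtering single records is a simplification.

DAY_MS = 24 * 60 * 60 * 1000


def apply_retention(records, now_ms, retention_ms):
    """Keep records whose age is within retention_ms; the consumed flag plays no part."""
    return [r for r in records if now_ms - r["timestamp_ms"] <= retention_ms]


records = [
    {"timestamp_ms": 1 * DAY_MS, "value": "old, never consumed", "consumed": False},
    {"timestamp_ms": 5 * DAY_MS, "value": "recent, consumed", "consumed": True},
    {"timestamp_ms": 9 * DAY_MS, "value": "new, not consumed", "consumed": False},
]

kept = apply_retention(records, now_ms=10 * DAY_MS, retention_ms=7 * DAY_MS)
print([r["value"] for r in kept])
```

The oldest record is dropped even though nobody consumed it, while the already consumed record stays because it is still inside the retention window.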
What Is The Zookeeper’s Function In Kafka
Apache Kafka is a distributed system that was designed with ZooKeeper in mind. ZooKeeper’s primary function here is to provide coordination amongst the many nodes in the cluster. In addition, because offsets are committed to ZooKeeper at regular intervals, consumption can be restored from the previously committed offsets if any node fails.
Top 30+ Kafka Interview Questions And Answers
26/Jul/2021 | 5 minutes to read
Here is a list of essential Kafka interview questions and answers for freshers and mid-level experienced professionals. All answers to these Kafka questions are explained in a simple and easy way. These basic, advanced and latest Kafka questions will help you clear your next job interview.
What Is The Use Of Kafka In Azure Databricks
Apache Kafka is a decentralised streaming platform for constructing real-time streaming data pipelines and stream-adaptive applications. Such questions allow you to demonstrate your understanding of other tools and integrations with Databricks. In your response, mention how Kafka improves the workflow when integrating it with Azure Databricks.
Sample answer: ‘Azure Databricks uses Kafka for streaming data. It can help collect data from many sources, such as sensors, logs and financial transactions. Kafka is also capable of real-time processing and analysis of streaming data.’