Data Engineering

How Kafka Moves Data

April 2026

Most data systems answer a question: what is the current state of this record? Kafka answers a different one: what happened, and in what order? Everything that makes Kafka unusual follows from that distinction.


Producers, Topics, Consumers

A producer is any service that writes events. An event might be a user clicking a button, an order being placed, or a sensor reporting a value. The producer sends it to a topic — a named stream. A consumer subscribes to that topic and reads the events.
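In code, that whole flow is short. A minimal producer sketch using the Java client, assuming a broker at localhost:9092 and a topic named "orders" (both placeholders, not anything the article specifies):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class OrderProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder address; point this at a real broker.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One event, appended to the "orders" topic. send() is
                // asynchronous; closing the producer flushes what is in flight.
                producer.send(new ProducerRecord<>("orders", "{\"orderId\": 42}"));
            }
        }
    }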

A topic is split into partitions — parallel lanes that allow multiple consumers to read simultaneously. Events with the same key (say, the same user ID) always land in the same partition, which preserves their order relative to each other.
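Continuing the producer sketch above (inside the same try block): the key is the record's second constructor argument, and the default partitioner hashes it, so both keyed events below land in the same partition, in order. The user ID and payloads are made up:

    // Same key, same partition: Kafka preserves the order of these two.
    producer.send(new ProducerRecord<>("orders", "user-1001", "{\"event\": \"cart_add\"}"));
    producer.send(new ProducerRecord<>("orders", "user-1001", "{\"event\": \"checkout\"}"));

    // No key: records are spread across partitions, so no ordering
    // is guaranteed between them.
    producer.send(new ProducerRecord<>("orders", "{\"event\": \"page_view\"}"));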


The Log

This is the part that makes Kafka different from a message queue.

In a traditional queue, a message disappears once it is read. In Kafka, reading removes nothing. A partition is an append-only log — events are written sequentially, each assigned a permanent position called an offset. The consumer tracks its own offset and advances it as it reads.
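A consumer loop that makes the offset visible, reusing the placeholder broker and topic from the producer sketch (the group name is likewise made up). By default, the Java client periodically commits the consumer's position back to Kafka:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class OrderConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "order-readers"); // placeholder group name
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // The offset is the record's fixed position in its
                        // partition; reading it deletes nothing.
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }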

Because the log is retained (typically for days or weeks), a consumer can rewind its offset and re-process past events. This makes it straightforward to replay data through a new pipeline, backfill a database, or recover from a bug in downstream processing — without re-generating the original events.
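Rewinding is a small change to that loop. A sketch under the same assumptions: the first poll() triggers partition assignment, then seekToBeginning() moves this consumer back to the oldest retained event, and seek() jumps to an exact offset:

    // Inside OrderConsumer, after subscribe(). The first poll() makes the
    // group coordinator assign partitions to this consumer.
    consumer.poll(Duration.ofSeconds(1));

    // Replay everything still retained in the assigned partitions.
    consumer.seekToBeginning(consumer.assignment());

    // Or resume from a known point; partition 0, offset 1000 here are
    // illustrative. (TopicPartition is org.apache.kafka.common.TopicPartition.)
    consumer.seek(new org.apache.kafka.common.TopicPartition("orders", 0), 1000L);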


Consumer Groups

Multiple independent applications often need to read the same stream. Consumer groups make this work without coordination.

Each consumer group tracks its own offset independently. Group A might be a real-time fraud detection service running near the head of the log. Group B might be a batch analytics job processing yesterday's data. Neither knows the other exists.
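In the client, group identity is a single setting. Two applications that differ only in group.id each receive the full stream and commit offsets separately; the group names below are illustrative:

    // Fraud detection service, running near the head of the log.
    Properties fraudProps = new Properties();
    fraudProps.put("bootstrap.servers", "localhost:9092");
    fraudProps.put("group.id", "fraud-detection");

    // Batch analytics job, in its own process, far behind and unaffected.
    Properties analyticsProps = new Properties();
    analyticsProps.put("bootstrap.servers", "localhost:9092");
    analyticsProps.put("group.id", "batch-analytics");

    // Both consumers then subscribe to the same "orders" topic exactly
    // as in the earlier consumer sketch.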

Within a single group, Kafka partitions the work: each partition is assigned to exactly one consumer in the group at a time. Add more consumers to the group and Kafka rebalances — more partitions get processed in parallel. The upper bound on parallelism is the number of partitions.
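Because the partition count caps a group's parallelism, it is chosen when the topic is created. A sketch using the Java AdminClient, with an illustrative six partitions and a replication factor of three:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import java.util.List;
    import java.util.Properties;

    public class CreateOrdersTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Six partitions: at most six consumers in one group can
                // read in parallel; a seventh would sit idle.
                NewTopic orders = new NewTopic("orders", 6, (short) 3);
                admin.createTopics(List.of(orders)).all().get();
            }
        }
    }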


Why This Matters at Scale

The combination of durable logs, independent consumer groups, and partition-level parallelism is what makes Kafka a backbone for large data systems.

A single event — say, an order placed on an e-commerce platform — can simultaneously drive inventory updates, trigger a fraud check, update a user's recommendation model, and feed into a real-time analytics dashboard. Each of those is a separate consumer group, reading at its own pace, with no coupling between them.

The producer does not know who is consuming. The consumers do not coordinate with each other. Kafka is the contract between them: events are written once, retained durably, and available to any system that needs them.