I recently had to write some Spark jobs and deploy them on AWS/EMR. The jobs also had external dependencies, like Cassandra and Kafka. As the datasets were not that big, I decided to run this full stack locally during development using Docker and Docker-Compose.
When starting with Apache Kafka, you’re overwhelmed with a lot of new concepts: topics, partitions, groups, replicas, etc. Although Kafka documentation does a great job in explaining all of these concepts, sometimes it’s good to see them in practical scenarios. That’s what this post will try to achieve, by pointing a few findings/scenarios and solutions/explanations/links for each of them.