I recently had to write some Spark jobs and deploy them on AWS/EMR. The jobs also had external dependencies, like Cassandra and Kafka. As the datasets were not that big, I decided to run this full stack locally during development using Docker and Docker-Compose.
Cassandra is a fantastic database system that provides a lot of cool and important features for systems that need to handle large amounts of data, like horizontal scalability, elasticity, high availability, distributability, flexible consistency model among others. However, as you can probably guess, everything is a trade off and there’s no free lunch when it comes to distributed systems, so Cassandra does impose some restrictions in order to provide all of those nice features.
One of these restrictions, and the one we’ll be talking about today, is about its query/data model. One common requirement in databases is to be able to query records of a given table by different fields. When it comes to this type of operation, I consider it can be split into 2 different scenarios:
- Queries that you run to explore data, do some batch processing, generate some reports, etc. In this type of scenario, you don’t need the queries to be executed in real time and, thus, you can afford some extra latency. Spark might be a good fit to handle this scenario, and that’ll probably be subject of a different post.
- Queries that are part of your application and that you rely on to be able to process requests. For this type of scenario, the ideal is that you evolve your data model to support these queries, as you need to process them in an efficient way. This is the scenario that will be covered in this post.
Scala provides nice and clean ways of dealing with Future objects, like Functional Composition and For Comprehensions. Future objects are a great way of writing concurrent and parallel programs, as they allow you to execute asynchronous code and to extract the result of it at some point in the future.
However, this type of software requires a different mindset in terms of reasoning about it, be it while writing the main code or while thinking about testing it. This article will focus primarily in testing this type of code and, for this, we’ll be looking into ScalaTest, one of the best and most popular testing frameworks in the Scala ecosystem.
When starting with Apache Kafka, you’re overwhelmed with a lot of new concepts: topics, partitions, groups, replicas, etc. Although Kafka documentation does a great job in explaining all of these concepts, sometimes it’s good to see them in practical scenarios. That’s what this post will try to achieve, by pointing a few findings/scenarios and solutions/explanations/links for each of them.
In this second and last part, we’ll see how to use Akka HTTP to create RESTful Web Services. The idea is to expose the operations defined in the first part of this article, using HTTP and JSON.
Akka HTTP implements a full server/client-side HTTP based on top of Akka Actors. It describes itself as “a more general toolkit for providing and consuming HTTP-based services instead of a web-framework“. One interesting aspect is that it provides different abstraction levels for doing the same thing in most of the scenarios, i.e., provides high and low level APIs. Akka HTTP is pretty much the continuation of the Spray Project and Spray’s creators are part of Akka HTTP team. In our example, we’ll be using the high level Routing DSL to expose services, which allows routes to be defined and composed by directives. We’ll also use the Spray Json project to provide the infrastructure needed to use JSON with Akka HTTP.
This article is split in 2 parts and it aims to show how to create a small application (or a microservice if you prefer) using Akka HTTP (Scala) and Redis database.The application is very focused and provides means for a customer to be added, removed and retrieved.
In terms of technologies/tools/libs, the following ones will be used:
- Akka HTTP/Akka HTTP Json Support/Akka HTTP Test kit
- Redis database
In this first part, we’ll focus on the layer responsible for communicating with Redis, while the second part will focus on Akka HTTP.
In this article we will create a simple but comprehensive Scala application responsible for reading and processing a CSV file in order to extract information out of it.
Although simple, this app will touch in the following points:
- Creating an application from scratch using SBT
- Usage of traits, case class and a few collection methods
- Usage of scalatest framework to write unit tests