I recently had to write some Spark jobs and deploy them on AWS/EMR. The jobs also had external dependencies, like Cassandra and Kafka. As the datasets were not that big, I decided to run this full stack locally during development using Docker and Docker-Compose.
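For context, a local stack like this is typically described in a docker-compose.yml. The sketch below is illustrative only; the image tags, service names, and ports are assumptions, not the exact setup from this post:

```yaml
# Illustrative docker-compose.yml for a Cassandra + Kafka dev stack.
# Image tags, ports and environment values are assumptions.
version: "3"
services:
  cassandra:
    image: cassandra:3.11
    ports: ["9042:9042"]
  zookeeper:
    image: zookeeper:3.4
  kafka:
    image: wurstmeister/kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    depends_on: [zookeeper]
```

With a file like this in place, docker-compose up -d brings the whole stack up in the background, and the Spark jobs can then connect to the exposed ports.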
Everything was working as expected until, at some point, some of the containers would simply get killed: no error logs, no OutOfMemoryError, nothing.
I tried running docker stats while the jobs were executing, but it didn't tell me much: at some point some of the containers would just disappear from the list. After researching for a bit, I found out that Docker (at least Docker for Mac) has a hard limit on the amount of memory that all containers together can use. That value can be inspected with docker info:
Kernel Version: 4.9.13-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.952 GiB
...
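If you only care about the memory cap, you can filter the output down to the relevant line. A minimal sketch; the printf simulates the line shown above so the pipe is self-contained:

```shell
# In practice you would run:  docker info | grep 'Total Memory'
# The printf below simulates the relevant docker info line:
printf 'CPUs: 2\nTotal Memory: 1.952 GiB\n' | grep 'Total Memory'
```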
So that was my problem. The total memory available to all containers was capped at 2 GB, which appears to be the default. Under Docker -> Preferences -> Advanced you can change both the total memory and the number of CPUs (changing these settings requires a restart of the Docker service). In my case, I increased the total memory to 5 GB and could then run all of the containers and jobs locally.
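To avoid hitting this silently again, a pre-flight check can compare Docker's memory cap against what the stack needs before starting it. This is a sketch: the 5 GiB threshold and the awk parsing are assumptions, and the docker info line is simulated with printf so the snippet is self-contained.

```shell
# Hypothetical pre-flight check: warn if Docker's memory cap is below what
# the stack needs. In practice, replace the printf with:  docker info
required=5
total=$(printf 'Total Memory: 1.952 GiB\n' | awk '/Total Memory/ {print int($3)}')
if [ "$total" -lt "$required" ]; then
  echo "Increase Docker memory: only ${total} GiB of ${required} GiB available"
fi
```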