I started reading PaaS Under the Hood (henceforth referred to as ‘the book’) a few days ago. It’s a collection of posts by dotCloud (Docker, Inc.) engineering staff about which Linux kernel features underpin their PaaS offering. Platforms like dotCloud and Heroku have fascinated me for some time, and I was really impressed with the explanations given in this book. To further my understanding of these concepts, I decided to write a series of blog posts about them and document my explorations. You’ll find my logs below.
Step 0: Setup
I felt that the only way to fully understand what was being talked about in the book was to grab myself an instance and start messing around with it. Thus, I went and got the cheapest instance I could from DigitalOcean, selected Ubuntu 12.04, created a new non-root user using instructions located here, added my ssh keys using instructions located here, and considered that my starting point. After running that handful of setup commands, I’m ready to begin!
One thing to address before I jump off into the land of Linux Containers is why we need virtualization to begin with. In essence, the idea of platform-as-a-service is to run multiple applications together on shared machines, possibly from the same users/clients and possibly from different ones, without letting those applications interfere with one another.
Step 1: LinuX Containers (LXC)
In Episode One of the book, on almost the first line, the authors mention something called a Linux Container (LXC). I had no idea what that was, so I looked it up.
Linux Containers are, as described here, a lightweight virtualization technology. A natural follow-up question to this kind of explanation is what LXC is “lightweight” compared to, and the answer seems to be Virtual Machines. A nice explanation in video form that I found is located here, and I have documented a bit of background research and a summary below:
A bit of background information for the video:
In virtual-machine-land, there is a thing called a hypervisor, which is a piece of software, firmware, or hardware that creates and runs virtual machines. There are two main kinds of hypervisors, known as Type 1 and Type 2. The difference is that Type 1 hypervisors run directly on the box (‘bare-metal’), whereas Type 2 hypervisors run on top of an OS that itself runs on the box (‘hosted’). One would expect Type 2 hypervisors to perform worse because of that extra layer, but this article suggests that the trade-off is not that bad.
We are trying to solve the problem of process isolation, since we want to run apps from different people on the same machine. The apps should be unaware of each other’s existence. We want to consider two options for this: Virtual Machines and Containers.
Virtual Machines (VMs) each run their own OS, and within that OS they run apps. This provides full isolation from other apps, and thus provides a potential way to solve the problem of running multiple isolated apps from different users. The idea here is that you spawn one VM instance per app to get full separation.
Containers are based on Linux kernel features, and can isolate processes running on the same OS. Therefore, the idea is that you spawn one container per application, and all the containers share one operating system, whose kernel keeps the applications isolated from one another.
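The kernel features behind containers include namespaces and control groups. As a minimal sketch, assuming a Linux machine with `/proc` mounted, the Python below lists the namespaces the current process belongs to. Processes that report the same identifier for a namespace share that view of the system, and processes in different containers would generally report different ones. The function name `list_namespaces` is my own, not something from the book, and which namespaces show up depends on the kernel version.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process.

    Reads the symlinks under /proc/<pid>/ns. Returns an empty dict if the
    directory is missing or unreadable (old kernel, non-Linux host, bad pid).
    """
    ns_dir = os.path.join("/proc", str(pid), "ns")
    namespaces = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return namespaces
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace instance.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(name, ident)
```

Running this once on the host and once inside a container, then comparing the identifiers, is one way to see the isolation boundary for yourself.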
A major drawback of using the VM approach is the duplication of OS and software common to all applications, since a copy must be included in every VM.
A main drawback of using containers is that you can’t have apps using multiple types or versions of operating systems on one box, since they all share the same OS. Another drawback is that the application must be Linux-based, since it runs directly on the host’s Linux kernel.
To get a full sense of what the authors of the book are talking about, I decided to try to run LXC on the DigitalOcean box.
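As a rough sketch of what trying LXC out can look like, the Python below assembles the usual container lifecycle using the `lxc-*` command-line tools (on Ubuntu these come from the `lxc` package, installable with `sudo apt-get install lxc`). The helper names `lxc_commands` and `run_lifecycle` and the container name `demo` are my own placeholders, and the exact options accepted may vary between LXC versions, so the default is a dry run that only prints the commands.

```python
import shutil
import subprocess


def lxc_commands(name, template="ubuntu"):
    """Return the command lines for a basic container lifecycle."""
    return [
        ["lxc-create", "-t", template, "-n", name],  # build a container from a template
        ["lxc-start", "-n", name, "-d"],             # start it in the background
        ["lxc-info", "-n", name],                    # report its state
        ["lxc-stop", "-n", name],                    # shut it down
        ["lxc-destroy", "-n", name],                 # remove it from disk
    ]


def run_lifecycle(name, dry_run=True):
    """Print each command; run it with sudo only when dry_run is False."""
    for cmd in lxc_commands(name):
        print(" ".join(cmd))
        if dry_run:
            continue
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(cmd[0] + " not found; is the lxc package installed?")
        subprocess.check_call(["sudo"] + cmd)


if __name__ == "__main__":
    run_lifecycle("demo")  # dry run: prints the commands without executing them
```

Keeping the dry run as the default makes it easy to read the sequence before letting it touch the box.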
In the case of running containers (LXC) on an instance provided by DigitalOcean, the box will be using both virtual machines and containers. DigitalOcean runs on a hypervisor called KVM, which is generally classified as a Type 1 hypervisor. Since DigitalOcean provided a VM that had Ubuntu 12.04 loaded on it, and the Linux containers will run on top of this OS, we have the following stack, from bottom to top: the bare-metal box, the KVM hypervisor, our VM running Ubuntu 12.04, and the Linux containers with the apps inside them.
Naturally, since we are only one of the many clients that DigitalOcean has, there may be many VMs (from other people) that are similarly running on the same KVM hypervisor running on the same bare metal box. We cannot access or see the other VMs from our VM, which is a desirable quality that demonstrates the isolation that a VM provides.
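One way to confirm from inside the instance that it really is a VM is to look for the `hypervisor` CPU flag, which x86 Linux kernels list in `/proc/cpuinfo` when running as a guest. The sketch below assumes an x86 box; the function names `cpu_flags` and `looks_virtualized` are mine. The flag only says that some hypervisor is present, not which one, and it reveals nothing about the neighbouring VMs.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def looks_virtualized(cpuinfo_text):
    """True if the CPU flags include 'hypervisor', i.e. we are a guest."""
    return "hypervisor" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("running under a hypervisor:", looks_virtualized(text))
```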