Getting CockroachDB running with Kubernetes

Cloud Platform (intermediate level) posted on 1st Feb 2018


In Getting cockroachdb running on google cloud platform, I went through how to get a few Docker containers running as an insecure, 3-node CockroachDB cluster on a single VM. In real life, you'll want to run multiple secured nodes across multiple VMs with shared storage. This can be super complex and error-prone if you, like me, don't really know what you're doing. Luckily, Kubernetes can help with orchestration, and Cockroach Labs have put together a great guide and some preconfigured .yaml files to help with that. I'll walk through that in this post.

Initially we'll start by getting an insecure version running, just like we did in Getting cockroachdb running on google cloud platform, but this time using Kubernetes. The main practical difference between a secure and an insecure version of CockroachDB is that the secure version requires certificates before a client can query the database. A later post will highlight how to set up a secure version.

What is Kubernetes

Kubernetes helps you manage and deploy apps in containers, a process often called orchestration. It abstracts away the idea of hardware and VMs, and is open source and independent of cloud supplier - so your configurations can be easily transported from Google to Amazon and other providers. I'm just going to focus on Google Cloud Platform though. You can read about Kubernetes here, although I must admit I find the documentation very heavy going. Most Google Cloud Platform material can be tough, but the Kubernetes docs dive right into the weeds from the outset and, in my personal view, it's hard to know where to start, what all the terms mean and how everything relates to everything else. This article is a highly simplified version to remind myself. If it informs you in some way too, then so much the better.

kubectl and gcloud

I'll be using the CLIs for Cloud Platform and Kubernetes (gcloud and kubectl) to set all this up, so you first need to install them. I recommend you use the cloud shell (see Your own free linux VM) as it already has everything installed. If you want to use something else, then see these links for how to install gcloud and kubectl. It's best to set up your default project and zone before doing anything else. Substitute your own project and zone for the values in the two commands that follow.

gcloud config set project fid-sql
gcloud config set compute/zone us-central1-c
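If you switch between projects a lot, the two commands above can be wrapped in a small helper. This is only a sketch: the function name set_gcloud_defaults is made up for illustration, and the only gcloud calls it relies on are config set and config list.

```shell
# Hypothetical helper: set the default project and zone, then show the result.
set_gcloud_defaults() {
  project="$1"   # e.g. fid-sql
  zone="$2"      # e.g. us-central1-c
  gcloud config set project "$project"
  gcloud config set compute/zone "$zone"
  # Print the active configuration so you can confirm both values took effect.
  gcloud config list
}

# Usage (substitute your own values):
# set_gcloud_defaults fid-sql us-central1-c
```

Listing the configuration at the end is just a sanity check before creating anything that costs money.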

Cluster and nodes

A cluster is created and managed by Kubernetes Engine, and consists of a cluster master and nodes, where a node is a physical or virtual machine. A node could be a Google Compute Engine VM (as they will be in my example). gcloud creates these for you, and Kubernetes Engine takes care of allocating your workload across these nodes. This is the idea of abstracting away the machines from the containers.

You create a cluster like this

gcloud container clusters create cockroachdb

Credentials

Once the cluster is created, you'll want its credentials loaded to your cloud shell environment so you can access it easily from the CLI

gcloud container clusters get-credentials cockroachdb
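A quick way to confirm the credentials were loaded is to ask kubectl which cluster it is pointing at and to list the nodes. The sketch below assumes nothing beyond standard kubectl subcommands; the function name check_cluster_access is hypothetical.

```shell
# Hypothetical helper: confirm kubectl is pointed at the new cluster.
check_cluster_access() {
  # Show which context (cluster + credentials) kubectl will talk to.
  kubectl config current-context
  # List the cluster's nodes, which here are the Compute Engine VMs.
  kubectl get nodes
}

# Usage:
# check_cluster_access
```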

Yaml files

Now we are ready to use kubectl to configure the cluster. Kubernetes configuration can be a long, complex and tedious process and is usually done through the use of .yaml configuration files. Luckily, there is usually a prebaked one around for you to use. Cockroach provides this one for the next step.

Pods

A container is deployed as a member of a pod. Although a pod generally contains a single container, it can contain multiple containers which share the same storage, resources and lifetime. The master allocates pods across nodes and stops and starts them as required to optimize the workload.

Stateful sets

If an application is stateful (for example, if it uses persistent storage), Kubernetes needs to handle its pods in a special way. A database like CockroachDB, where the load is shared across multiple Cockroach nodes (not the same concept as Kubernetes nodes), is clearly a stateful application, so it is deployed as a stateful set. Using the yaml file provided by Cockroach, kubectl will create multiple pods for the CockroachDB containers in a statefulset and assign persistent storage to each.
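Creating the statefulset comes down to handing the yaml file to kubectl. The sketch below assumes you have saved Cockroach's file locally; the file name cockroachdb-statefulset.yaml and the function name create_statefulset are assumptions for illustration, so adjust them to whatever you actually downloaded.

```shell
# Hypothetical helper: create the statefulset from a manifest, then inspect it.
create_statefulset() {
  manifest="$1"   # local path or URL of the statefulset .yaml file
  # Create every resource described in the manifest.
  kubectl create -f "$manifest"
  # Show the statefulset and the pods it has started.
  kubectl get statefulsets
  kubectl get pods
}

# Usage, assuming the manifest was saved under this (assumed) name:
# create_statefulset cockroachdb-statefulset.yaml
```

The pods may take a little while to start, so it can be worth running kubectl get pods again after a minute or so.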


We end up with a cluster that looks like this

gcloud container clusters list 

Heading over to the cloud console, we can see that it has created 3 VM nodes in the cluster to handle the stateful set.


and we can check the pods in the statefulset, like so.

kubectl get statefulsets

Scaling

Now there are 3 replicas in this statefulset, and 3 Compute Engine nodes. It would be easy to assume that there's a one-to-one relationship. However, they are independent. Remember that Kubernetes Engine decides which pods to run on which nodes. You can scale a statefulset to have more pods without necessarily affecting the number of VM nodes they run on. Below I have now scaled up to 4 pods (which are load-sharing Cockroach nodes), still running across 3 VMs (cluster nodes).
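Scaling like this can be done with kubectl scale. The sketch below assumes the statefulset is called cockroachdb, which matches the pod names such as cockroachdb-3 used later in this post; the function name scale_cockroach is hypothetical.

```shell
# Hypothetical helper: change the number of pods in the statefulset.
scale_cockroach() {
  replicas="$1"   # desired number of pods, e.g. 4
  # Assumes the statefulset is named "cockroachdb".
  kubectl scale statefulset cockroachdb --replicas="$replicas"
  # -o wide adds a NODE column showing which VM each pod is running on.
  kubectl get pods -o wide
}

# Usage:
# scale_cockroach 4
```

With 4 pods and 3 VMs, the NODE column should show at least one VM hosting more than one pod, which is the point about pods and nodes being independent.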



Storage


A pod has its own storage allocated, so each of our 4 pods should have some working space attached, and the database is shared across the 3 nodes. Here it is via the cloud console. Many resources can be viewed either via the CLIs (gcloud and kubectl) or visually through the cloud console under Kubernetes Engine or Compute Engine.
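The same storage information is available from the CLI. This sketch uses only standard kubectl resource names; the function name show_storage is hypothetical, and I'm assuming the storage was set up as persistent volume claims by the statefulset.

```shell
# Hypothetical helper: list the storage attached to the pods.
show_storage() {
  # The claims, expected to be one per pod in the statefulset.
  kubectl get persistentvolumeclaims
  # The underlying volumes that back those claims.
  kubectl get persistentvolumes
}

# Usage:
# show_storage
```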

Job

Next we need to initialize CockroachDB with a Kubernetes job. This is defined by another yaml file provided by Cockroach.
In Kubernetes Engine/Workloads we can find the log for that run under the cluster-init job.
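The job can also be created and checked from the CLI. In this sketch the job name cluster-init comes from the console view mentioned above, while the file name cluster-init.yaml and the function name run_cluster_init are assumptions, as is the 120 second timeout.

```shell
# Hypothetical helper: run the init job and show its log.
run_cluster_init() {
  manifest="$1"   # local path or URL of the job's .yaml file
  kubectl create -f "$manifest"
  # Block until the job reports completion (the timeout is an arbitrary choice).
  kubectl wait --for=condition=complete --timeout=120s job/cluster-init
  # Show the job's status and then its log output.
  kubectl get job cluster-init
  kubectl logs job/cluster-init
}

# Usage, assuming the manifest was saved under this (assumed) name:
# run_cluster_init cluster-init.yaml
```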


Endpoints

To test CockroachDB now that it's all up and running, you can use the run command to start a temporary pod whose container runs the SQL client against a given host, accessible via an endpoint.

kubectl run cockroachdb -i --image=cockroachdb/cockroach --rm --restart=Never -- sql --insecure --host=cockroachdb-public < kutest.sql

Endpoints are used to communicate with the cluster.
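To see what sits behind a host name like cockroachdb-public, you can list the service and its endpoints. A minimal sketch, assuming the service carries the same name as the host used in the run command; the function name show_endpoints is hypothetical.

```shell
# Hypothetical helper: show a service and the pod addresses behind it.
show_endpoints() {
  service="$1"   # e.g. cockroachdb-public
  # The service gives clients a stable name to connect to.
  kubectl get service "$service"
  # The endpoints are the pod IP:port pairs the service routes to.
  kubectl get endpoints "$service"
}

# Usage:
# show_endpoints cockroachdb-public
```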



kutest.sql is just a test file that looks like this

DROP DATABASE IF EXISTS bank CASCADE;
CREATE DATABASE bank;
CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
INSERT INTO bank.accounts VALUES (1, 1000.50);
SELECT * FROM bank.accounts;

Once this has run, here's what the pods look like



You could use exec rather than run to execute in an existing pod.

kubectl exec -i cockroachdb-3 -- ./cockroach sql --insecure < kutest.sql

Accessing the db

To make interactive queries inside the db, just exec in a pod

kubectl exec -it cockroachdb-3 -- ./cockroach sql --insecure




Next you'll want to create an app to talk to the database and serve its contents externally. I'll be using a server app to provide GraphQL access to an SQL database in the next post.



