Terraform, Google Cloud And Kubernetes
I've been hacking about with automated infrastructure setup a lot lately. The two tools I've focused on most are NixOps and Terraform. This post covers using Terraform on Google Cloud Platform (GCP) to create and manage a Kubernetes Container Cluster.
Setup
Before we begin, if you want to run any of the code, you'll need an account on google cloud. If you do not know what GCP, Terraform or Kubernetes are, you should follow those links. Note that you will also need the google cloud sdk (as we will be using the gcloud cli) and the kubernetes cli, kubectl. Once you are ready, you will need to create an auth JSON file as described here, and that should be everything you need to proceed.
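The linked instructions walk through the web console; as an alternative sketch using the cli (the service account name terraform and the broad roles/editor role are illustrative choices of mine, not requirements):

$ gcloud iam service-accounts create terraform --display-name "terraform"
$ gcloud projects add-iam-policy-binding my-project \
    --member serviceAccount:terraform@my-project.iam.gserviceaccount.com \
    --role roles/editor
$ gcloud iam service-accounts keys create ~/.gcp_creds.json \
    --iam-account terraform@my-project.iam.gserviceaccount.com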
Terraform Google Cloud Provider
Terraform's Google Cloud provider covers a lot of the functionality of GCP. It also has a backend for storing state on Google Cloud Storage (GCS). By default terraform will store the state locally, so this backend is not required, but remote state is good practice (it gives teammates and CI a single shared view of the infrastructure).
Let's set up our backend to store terraform's state on GCS. In a file called backend.tf:
terraform {
  backend "gcs" {
    bucket  = "my-bucket"
    path    = "my-folder/cluster-infra/terraform.tfstate"
    project = "my-project"
  }
}
This tells terraform it should store and look up state in GCS in the project my-project, a bucket within that project, my-bucket, and the path my-folder/cluster-infra/terraform.tfstate within that bucket. You need to make sure the project and bucket exist (terraform will create the state object at that path). Just a few more lines, and we can build our cluster!
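If the bucket does not exist yet, the gsutil tool that ships with the google cloud sdk can create it; turning on object versioning is a sensible extra safeguard for state files:

$ gsutil mb -p my-project gs://my-bucket/
$ gsutil versioning set on gs://my-bucket/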
Cluster Definition
The definition for the Container Cluster itself is quite short. Let's create another file called init.tf.
provider "google" {
region = "${var.region}"
project = "${var.project}"
}
resource "google_container_cluster" "primary" {
name = "test-cluster"
zone = "${var.zone}"
initial_node_count = 3
master_auth {
username = "${var.kube_username}"
password = "${var.kube_password}"
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
]
}
}
That's it! This will build a three-node kubernetes container cluster. Breaking it down:

- provider "google": this says we want to use the Google Cloud Provider.
  - region: the region to spin up our resources in.
  - project: the project our resources should live within.
- resource "google_container_cluster" "primary": create a google_container_cluster that can be referenced from other terraform resources by the name primary (see the example just after this list).
  - name: the name of this cluster within google cloud.
  - zone: the zone, within the region we specified in our provider, to spin up these resources in.
  - initial_node_count: the number of nodes in this cluster.
  - master_auth: the credentials we can use to access the kubernetes cluster.
  - node_config: the machine type and image used for all nodes; here we just define oauth_scopes.
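Since the cluster is addressable as google_container_cluster.primary, its attributes can be consumed elsewhere in the config. A minimal sketch (the output name cluster_endpoint is my own, not part of the setup above), exposing the master's endpoint after an apply:

output "cluster_endpoint" {
  value = "${google_container_cluster.primary.endpoint}"
}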
Variables
In the above configs you probably noticed ${var.something} in a few places. Values for these variables can be loaded into the config when launching terraform in multiple ways (see the overview at variables in the terraform docs). For this post I'll go the variable files route. First create a variables.tf with the following definitions:
variable "project" {}
variable "region" {}
variable "zone" {}
variable "kube_username" {}
variable "kube_password" {}
Now, create a file terraform.tfvars with the following key/value pairs:
project = "my-project"
region = "europe-west1"
zone = "europe-west1-b"
kube_username = "testuser"
kube_password = "testpassword"
Launching Our Cluster
Let's put everything together and launch our cluster! You should have the following four files:
$ ls
backend.tf init.tf terraform.tfvars variables.tf
There is one more step before we can launch. If you try to run terraform apply or terraform plan you will get the following error:
$ terraform plan
Backend reinitialization required. Please run "terraform init".
Reason: Initial configuration of the requested backend "gcs"
The "backend" is the interface that Terraform uses to store state,
perform operations, etc. If this message is showing up, it means that the
Terraform configuration you're using is using a custom configuration for
the Terraform backend.
...
Trying to run terraform init will also give an error:
$ terraform init
Initializing the backend...
Error configuring the backend "gcs": Failed to configure remote backend "gcs": google: could not find default credentials.
See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Please update the configuration in your Terraform files to fix this error
then run this command again.
To initialize the backend correctly, we need to pass terraform the JSON credentials file created in the Setup section above. If you missed this, the instructions can be found here. Download the JSON file and place it under ~/.gcp_creds.json. Now we can finally start running things!
The environment variable GOOGLE_APPLICATION_CREDENTIALS tells terraform where to find the creds. To initialize the backend:
$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform init
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Terraform has been successfully initialized!
...
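An aside: rather than prefixing every terraform command, you could export the variable once per shell session (the rest of this post keeps the explicit prefix so each command stands alone):

$ export GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json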
Let's see what terraform will build with terraform plan:
$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.
Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.
+ google_container_cluster.primary
additional_zones.#: "<computed>"
cluster_ipv4_cidr: "<computed>"
endpoint: "<computed>"
initial_node_count: "3"
instance_group_urls.#: "<computed>"
logging_service: "<computed>"
master_auth.#: "1"
master_auth.0.client_certificate: "<computed>"
master_auth.0.client_key: "<sensitive>"
master_auth.0.cluster_ca_certificate: "<computed>"
master_auth.0.password: "<sensitive>"
master_auth.0.username: "testuser"
monitoring_service: "<computed>"
name: "test-cluster"
network: "default"
node_config.#: "1"
node_config.0.disk_size_gb: "<computed>"
node_config.0.image_type: "<computed>"
node_config.0.local_ssd_count: "<computed>"
node_config.0.machine_type: "<computed>"
node_config.0.oauth_scopes.#: "4"
node_config.0.oauth_scopes.0: "https://www.googleapis.com/auth/compute"
node_config.0.oauth_scopes.1: "https://www.googleapis.com/auth/devstorage.read_only"
node_config.0.oauth_scopes.2: "https://www.googleapis.com/auth/logging.write"
node_config.0.oauth_scopes.3: "https://www.googleapis.com/auth/monitoring"
node_config.0.service_account: "<computed>"
node_pool.#: "<computed>"
node_version: "<computed>"
zone: "europe-west1-b"
Plan: 1 to add, 0 to change, 0 to destroy.
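As the note in the plan output suggests, you can save a plan and later apply exactly that plan; a short sketch (the file name cluster.tfplan is arbitrary):

$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform plan -out=cluster.tfplan
$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform apply cluster.tfplan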
Finally, let's build our cluster:
$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform apply
google_container_cluster.primary: Creating...
additional_zones.#: "" => "<computed>"
cluster_ipv4_cidr: "" => "<computed>"
endpoint: "" => "<computed>"
initial_node_count: "" => "3"
instance_group_urls.#: "" => "<computed>"
logging_service: "" => "<computed>"
master_auth.#: "" => "1"
master_auth.0.client_certificate: "" => "<computed>"
master_auth.0.client_key: "<sensitive>" => "<sensitive>"
master_auth.0.cluster_ca_certificate: "" => "<computed>"
master_auth.0.password: "<sensitive>" => "<sensitive>"
master_auth.0.username: "" => "testuser"
monitoring_service: "" => "<computed>"
name: "" => "test-cluster"
network: "" => "default"
node_config.#: "" => "1"
node_config.0.disk_size_gb: "" => "<computed>"
node_config.0.image_type: "" => "<computed>"
node_config.0.local_ssd_count: "" => "<computed>"
node_config.0.machine_type: "" => "<computed>"
node_config.0.oauth_scopes.#: "" => "4"
node_config.0.oauth_scopes.0: "" => "https://www.googleapis.com/auth/compute"
node_config.0.oauth_scopes.1: "" => "https://www.googleapis.com/auth/devstorage.read_only"
node_config.0.oauth_scopes.2: "" => "https://www.googleapis.com/auth/logging.write"
node_config.0.oauth_scopes.3: "" => "https://www.googleapis.com/auth/monitoring"
node_config.0.service_account: "" => "<computed>"
node_pool.#: "" => "<computed>"
node_version: "" => "<computed>"
zone: "" => "europe-west1-b"
google_container_cluster.primary: Still creating... (10s elapsed)
...
google_container_cluster.primary: Still creating... (3m0s elapsed)
google_container_cluster.primary: Creation complete (ID: test-cluster)
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path:
...
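With the apply complete, the state is written through our gcs backend. You can ask terraform what it is tracking; assuming nothing else shares this state file, state list should show just our cluster:

$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform state list
google_container_cluster.primary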
List The Nodes
Did it actually work? Before we can test, we need to retrieve credentials for the cluster:
$ gcloud container clusters get-credentials test-cluster --zone europe-west1-b --project my-project
Fetching cluster endpoint and auth data.
kubeconfig entry generated for test-cluster.
Note that get-credentials also makes this cluster the current kubectl context. If you manage several clusters, you can switch to it explicitly; gcloud names GKE contexts gke_<project>_<zone>_<cluster>:
$ kubectl config use-context gke_my-project_europe-west1-b_test-cluster
Switched to context "gke_my-project_europe-west1-b_test-cluster".
Now we should be able to list the nodes:
$ kubectl get nodes
NAME STATUS AGE
gke-test-cluster-default-pool-a1844955-h5w0 Ready 3m
gke-test-cluster-default-pool-a1844955-vc3l Ready 3m
gke-test-cluster-default-pool-a1844955-wf4v Ready 3m
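For a quick smoke test beyond listing nodes, you could schedule a throwaway nginx pod (the name hello is arbitrary):

$ kubectl run hello --image=nginx --port=80
$ kubectl get pods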
Excellent, everything works! To destroy the cluster, simply run:
$ GOOGLE_APPLICATION_CREDENTIALS=~/.gcp_creds.json terraform destroy
Do you really want to destroy?
Terraform will delete all your managed infrastructure.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
google_container_cluster.primary: Refreshing state... (ID: test-cluster)
Destroy complete! Resources: 0 destroyed.
Conclusion
There's not much to codifying a cluster setup on Google Cloud. Note that there are some limitations, one of the bigger ones being updates: you cannot update the google_container_cluster resource without terraform destroying the existing cluster and creating a new one. Depending on how you plan to apply updates this may or may not be a problem - for example, you could create an entirely new cluster with the updates, migrate any existing workloads from the old cluster onto the new one, and finally destroy the old one.
Now that you have a kubernetes cluster, you can also manage it using the Kubernetes Provider; I'll leave that for another post.
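As a small taste (a sketch only, assuming basic auth as configured in master_auth above), the kubernetes provider can be pointed straight at the cluster terraform just built:

provider "kubernetes" {
  # reuse the endpoint and credentials from our cluster definition
  host     = "${google_container_cluster.primary.endpoint}"
  username = "${var.kube_username}"
  password = "${var.kube_password}"
}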
The code from this post was adapted from a project I'm toying around with. You can view the code up to this post here; note that some resource values are different.