Deploying Apps on Google Cloud – Kubernetes Engine

Stack used: Terraform, kubectl, YAML, a Redis cluster, Python, and Go

Initialize terraform:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.13.0...


* provider.google: version = "~> 3.13"

Terraform has been successfully initialized!
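
The plan further below assumes a Terraform configuration along these lines. This is only an illustrative sketch; the provider block, project ID, and node count are assumptions inferred from the plan output, not the exact file used:

cat > main.tf <<'EOF'
provider "google" {
  project = "myk8s-project"   # assumption: substitute your own project ID
  region  = "us-east1"
}

resource "google_container_cluster" "gke-cluster" {
  name               = "my-first-gke-cluster"
  location           = "us-east1-b"
  initial_node_count = 3
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"
}
EOF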

Before applying, do a dry run of Terraform with the “plan” option:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ terraform plan --out myplan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_cluster.gke-cluster will be created
  + resource "google_container_cluster" "gke-cluster" {
      + additional_zones            = (known after apply)
      + cluster_ipv4_cidr           = (known after apply)
      + default_max_pods_per_node   = (known after apply)
      + enable_binary_authorization = false
      + enable_intranode_visibility = (known after apply)
      + enable_kubernetes_alpha     = false
      + enable_legacy_abac          = false
      + enable_tpu                  = (known after apply)
      + endpoint                    = (known after apply)
      + id                          = (known after apply)
      + initial_node_count          = 3
      + instance_group_urls         = (known after apply)
      + label_fingerprint           = (known after apply)
      + location                    = "us-east1-b"
      + logging_service             = "logging.googleapis.com/kubernetes"
      + master_version              = (known after apply)
      + monitoring_service          = "monitoring.googleapis.com/kubernetes"
      + name                        = "my-first-gke-cluster"

.............
.............
............

Run terraform apply to create/modify the Kubernetes stack:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ terraform apply "myplan"
google_container_cluster.gke-cluster: Creating...
google_container_cluster.gke-cluster: Still creating... [10s elapsed]
google_container_cluster.gke-cluster: Still creating... [20s elapsed]
google_container_cluster.gke-cluster: Still creating... [30s elapsed]
google_container_cluster.gke-cluster: Still creating... [40s elapsed]
google_container_cluster.gke-cluster: Still creating... [50s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m0s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m10s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m20s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m30s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m40s elapsed]
google_container_cluster.gke-cluster: Still creating... [1m50s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m0s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m10s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m20s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m30s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m40s elapsed]
google_container_cluster.gke-cluster: Still creating... [2m50s elapsed]
google_container_cluster.gke-cluster: Still creating... [3m0s elapsed]
google_container_cluster.gke-cluster: Creation complete after 3m1s [id=projects/amplified-name-270419/locations/us-east1-b/clusters/my-first-gke-cluster]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Now authenticate to Google Cloud using browser-based authentication (gcloud auth login opens a browser and returns a token):

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ gcloud auth login
Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?................................................................................................................

Set the project and fetch the cluster credentials:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ gcloud config set project myk8s-project
Updated property [core/project].
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ gcloud container clusters get-credentials my-first-gke-cluster --zone us-east1-b --project myk8s-project
Fetching cluster endpoint and auth data.
kubeconfig entry generated for my-first-gke-cluster.
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ 
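
At this point kubectl is pointed at the new cluster. A quick sanity check (not part of the original session) would be:

kubectl config current-context
kubectl get nodes -o wide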

Add an extra node pool:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ cat nodepool.tf
resource "google_container_node_pool" "extra-pool" {
  name               = "extra-node-pool"
  location           = "us-east1-b"
  cluster            = google_container_cluster.gke-cluster.name
  initial_node_count = 3
}

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ terraform apply "myplan"
google_container_node_pool.extra-pool: Creating...
google_container_node_pool.extra-pool: Still creating... [10s elapsed]
google_container_node_pool.extra-pool: Still creating... [20s elapsed]
google_container_node_pool.extra-pool: Still creating... [30s elapsed]
google_container_node_pool.extra-pool: Still creating... [40s elapsed]
google_container_node_pool.extra-pool: Still creating... [50s elapsed]
google_container_node_pool.extra-pool: Creation complete after 58s [id=projects/myk8s-project/locations/us-east1-b/clusters/my-first-gke-cluster/nodePools/extra-node-pool]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
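
Note that a saved plan file only contains resources that were known when the plan was generated, so after adding nodepool.tf the plan would normally be regenerated before applying (the re-plan step is not shown in the transcript above):

terraform plan --out myplan
terraform apply "myplan"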

Deploy Redis

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ cat redis-master-deployment.yaml

apiVersion: apps/v1 #  for k8s versions before 1.9.0 use apps/v1beta2  and before 1.8.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: redis-master
spec:
  selector:
    matchLabels:
      app: redis
      role: master
      tier: backend
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
        role: master
        tier: backend
    spec:
      containers:
      - name: master
        image: k8s.gcr.io/redis:e2e  # or just image: redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl create -f \
>     redis-master-deployment.yaml
deployment.apps/redis-master created
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl get pods
NAME                           READY   STATUS              RESTARTS   AGE
redis-master-596696dd4-plxh9   0/1     ContainerCreating   0          10s
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ cat redis-master-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-master
  labels:
    app: redis
    role: master
    tier: backend
spec:
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis
    role: master
    tier: backend
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl create -f \
>     redis-master-service.yaml
service/redis-master created
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl get service
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes     ClusterIP   10.27.240.1     <none>        443/TCP    17m
redis-master   ClusterIP   10.27.249.158   <none>        6379/TCP   6s
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ cat redis-slave-deployment.yaml
apiVersion: apps/v1 #  for k8s versions before 1.9.0 use apps/v1beta2  and before 1.8.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: redis-slave
spec:
  selector:
    matchLabels:
      app: redis
      role: slave
      tier: backend
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        role: slave
        tier: backend
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
          # If your cluster config does not include a dns service, then to
          # instead access an environment variable to find the master
          # service's host, comment out the 'value: dns' line above, and
          # uncomment the line below:
          # value: env
        ports:
        - containerPort: 6379
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl create -f \
>     redis-slave-deployment.yaml
deployment.apps/redis-slave created
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl get pods
NAME                           READY   STATUS              RESTARTS   AGE
redis-master-596696dd4-plxh9   1/1     Running             0          84s
redis-slave-96685cfdb-bl8bs    0/1     ContainerCreating   0          7s
redis-slave-96685cfdb-nx8v5    0/1     ContainerCreating   0          7s
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ cat redis-slave-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
  labels:
    app: redis
    role: slave
    tier: backend
spec:
  ports:
  - port: 6379
  selector:
    app: redis
    role: slave
    tier: backend
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl create -f \
>     redis-slave-service.yaml
service/redis-slave created
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/examples/guestbook$ kubectl get service
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes     ClusterIP   10.27.240.1     <none>        443/TCP    19m
redis-master   ClusterIP   10.27.249.158   <none>        6379/TCP   83s
redis-slave    ClusterIP   10.27.251.180   <none>        6379/TCP   11s
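
Before moving on, the redis-master service can be checked from a throwaway pod inside the cluster (the pod name is arbitrary and this command is not part of the original session):

kubectl run redis-test --rm -it --restart=Never --image=redis -- redis-cli -h redis-master ping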

Deploy the Python Docker container:

Build the Python Docker container:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ docker build -t dbwebapi .

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ docker images|grep db
dbwebapi                                                                  latest                                           680b96d2e4f7        10 seconds ago      993MB
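
The ${PROJECT_ID} variable and push access to gcr.io are assumed to have been set up beforehand, for example:

export PROJECT_ID=$(gcloud config get-value project)
gcloud auth configure-docker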

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ docker tag dbwebapi gcr.io/${PROJECT_ID}/dbwebapi:v1

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ docker push gcr.io/${PROJECT_ID}/dbwebapi:v1
The push refers to repository [gcr.io/amplified-name-270419/dbwebapi]
69c2e602b3ad: Pushed 
a61e41f4f360: Pushed 
041654b625c6: Pushed 
e04910a132ed: Pushed 
485799b4fb7a: Pushed 
4be1e4b7b0b1: Pushed 
e90afb708b27: Pushed 
b32c3fe9bc49: Pushed 
132d53d2fcf6: Pushed 
74ef248fc7e3: Pushed 
4bb171da3c44: Pushed 
.........
v1: digest: sha256:58cbc88b3afffd15ca5365a890e1a61c9e8aaa5fd9fd60ee4f153f34456b7caf size: 3687

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl create deployment dbwebapi --image=gcr.io/amplified-name-270419/dbwebapi:v1
deployment.apps/dbwebapi created
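
kubectl create deployment starts a single replica by default; the three dbwebapi pods shown below suggest the deployment was scaled up afterwards, for example with:

kubectl scale deployment dbwebapi --replicas=3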

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
dbwebapi-7d8bbbb76b-74rdz      1/1     Running   0          20s
dbwebapi-7d8bbbb76b-8xc9l      1/1     Running   0          20s
dbwebapi-7d8bbbb76b-m9dtb      1/1     Running   0          20s
frontend-69859f6796-8qxh2      1/1     Running   0          75m
frontend-69859f6796-jb4w2      1/1     Running   0          75m
frontend-69859f6796-r5z6j      1/1     Running   0          75m
redis-master-596696dd4-plxh9   1/1     Running   0          77m
redis-slave-96685cfdb-bl8bs    1/1     Running   0          76m
redis-slave-96685cfdb-nx8v5    1/1     Running   0          76m
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl get services
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
frontend       LoadBalancer   10.27.247.228   34.74.109.101   80:30893/TCP   75m
kubernetes     ClusterIP      10.27.240.1     <none>          443/TCP        96m
redis-master   ClusterIP      10.27.249.158   <none>          6379/TCP       78m
redis-slave    ClusterIP      10.27.251.180   <none>          6379/TCP       76m

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ kubectl expose deployment dbwebapi --type=LoadBalancer --port 25000 --target-port 25443
service/dbwebapi exposed

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl get services
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)           AGE
dbwebapi           LoadBalancer   10.27.241.175   <pending>       25000:31014/TCP   12s
frontend           LoadBalancer   10.27.247.228   34.74.109.101   80:30893/TCP      77m
kubernetes         ClusterIP      10.27.240.1     <none>          443/TCP           97m
redis-master       ClusterIP      10.27.249.158   <none>          6379/TCP          80m
redis-slave        ClusterIP      10.27.251.180   <none>          6379/TCP          78m

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl get services
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)           AGE
dbwebapi           LoadBalancer   10.27.241.175   34.74.237.60     25000:31014/TCP   62s
frontend           LoadBalancer   10.27.247.228   34.74.109.101    80:30893/TCP      79m
kubernetes         ClusterIP      10.27.240.1     <none>           443/TCP           99m
redis-master       ClusterIP      10.27.249.158   <none>           6379/TCP          81m
redis-slave        ClusterIP      10.27.251.180   <none>           6379/TCP          80m

Build the hello-web Docker image:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/kubernetes-engine-samples/hello-app$ docker build -t gcr.io/${PROJECT_ID}/hello-app:v1 .

Successfully built 929d000392a8
Successfully tagged gcr.io/amplified-name-270419/hello-app:v1

Deploying hello-web

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl create deployment hello-web --image=gcr.io/${PROJECT_ID}/hello-app:v1
deployment.apps/hello-web created


(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ kubectl expose deployment hello-web --type=LoadBalancer --port 80 --target-port 8080
service/hello-web exposed
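
The load balancer IP can take a minute or two to be provisioned; you can watch for it with:

kubectl get service hello-web --watch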

Expose Redis with a LoadBalancer so you can query it on a public IP (not required; use the private/cluster IP for DB access). The exact command used is not shown in the transcript; one way to do it is sketched below.
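
A hedged example; the service name is arbitrary (redis-master already has a ClusterIP service, so the public-facing one needs a different name):

kubectl expose deployment redis-master --name=redis-master-lb --type=LoadBalancer --port=6379 --target-port=6379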

(base) skondla@skondla-mac:~/apps/redis/src$ ./redis-cli -h 34.73.182.12
34.73.182.12:6379> ping
PONG
34.73.182.12:6379> exit
(base) skondla@skondla-mac:~/apps/redis/src$ ./redis-cli -h 34.73.182.12 get
(error) ERR wrong number of arguments for 'get' command
(base) skondla@skondla-mac:~/apps/redis/src$ ./redis-cli -h 34.73.182.12 KEYS '*'
1) "messages"
(base) skondla@skondla-mac:~/apps/redis/src$ ./redis-cli -h 34.73.182.12 get messages
",Hello"
(base) skondla@skondla-mac:~/apps/redis/src$ ./redis-cli -h 34.73.182.12 get messages
",Hello,Hi, Sudheer - How are doing?"
(base) skondla@skondla-mac:~/apps/redis/src$ 

Test hello-web

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ curl -l http://35.237.106.75:80
Hello, world!
Version: 1.0.0
Hostname: hello-web-bf98759f7-92fgc
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine/webApp$ 

Testing the dbwebapi app (Python app)

https://34.74.237.60:25000/xxx

The Python DB web API app is now successfully deployed on the GKE cluster.
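
Going by the container logs further below, the API serves endpoints such as /backup and /backup/status. A quick command-line check might look like this (the -k flag skips certificate verification, assuming the Flask development server is using an ad-hoc/self-signed certificate):

curl -k https://34.74.237.60:25000/backup/status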

Redis container Logs

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
dbwebapi-676c645974-fpjr8      1/1     Running   0          4h17m
frontend-69859f6796-29hr8      1/1     Running   0          5h24m
frontend-69859f6796-4jlwh      1/1     Running   0          5h24m
frontend-69859f6796-88f4r      1/1     Running   0          5h24m
redis-master-596696dd4-zcrdb   1/1     Running   0          5h27m
redis-slave-96685cfdb-8l4hq    1/1     Running   0          5h25m
redis-slave-96685cfdb-c266k    1/1     Running   0          5h25m
(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ kubectl logs redis-master-596696dd4-zcrdb
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.19 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[1] 23 Mar 19:37:36.865 # Server started, Redis version 2.8.19
[1] 23 Mar 19:37:36.865 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 23 Mar 19:37:36.865 * The server is now ready to accept connections on port 6379
[1] 23 Mar 19:39:39.458 * Slave 10.24.3.3:6379 asks for synchronization
[1] 23 Mar 19:39:39.458 * Full resync requested by slave 10.24.3.3:6379
[1] 23 Mar 19:39:39.458 * Starting BGSAVE for SYNC with target: disk
[1] 23 Mar 19:39:39.458 * Background saving started by pid 8
[8] 23 Mar 19:39:39.473 * DB saved on disk
[8] 23 Mar 19:39:39.474 * RDB: 0 MB of memory used by copy-on-write
[1] 23 Mar 19:39:39.538 * Background saving terminated with success
[1] 23 Mar 19:39:39.539 * Synchronization with slave 10.24.3.3:6379 succeeded
[1] 23 Mar 19:39:40.023 * Slave 10.24.4.3:6379 asks for synchronization
[1] 23 Mar 19:39:40.023 * Full resync requested by slave 10.24.4.3:6379
[1] 23 Mar 19:39:40.023 * Starting BGSAVE for SYNC with target: disk
[1] 23 Mar 19:39:40.024 * Background saving started by pid 9
[9] 23 Mar 19:39:40.026 * DB saved on disk
[9] 23 Mar 19:39:40.027 * RDB: 0 MB of memory used by copy-on-write
[1] 23 Mar 19:39:40.038 * Background saving terminated with success
[1] 23 Mar 19:39:40.038 * Synchronization with slave 10.24.4.3:6379 succeeded
[1] 23 Mar 19:54:52.027 * 1 changes in 900 seconds. Saving...
[1] 23 Mar 19:54:52.027 * Background saving started by pid 10
[10] 23 Mar 19:54:52.031 * DB saved on disk
[10] 23 Mar 19:54:52.031 * RDB: 0 MB of memory used by copy-on-write
[1] 23 Mar 19:54:52.128 * Background saving terminated with success
[1] 23 Mar 20:09:53.028 * 1 changes in 900 seconds. Saving...
[1] 23 Mar 20:09:53.028 * Background saving started by pid 11
[11] 23 Mar 20:09:53.031 * DB saved on disk
[11] 23 Mar 20:09:53.032 * RDB: 0 MB of memory used by copy-on-write
[1] 23 Mar 20:09:53.128 * Background saving terminated with success
[1] 23 Mar 21:25:31.962 * DB saved on disk
[1] 23 Mar 21:25:32.173 * DB saved on disk

Frontend (guestbook) app container logs

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ kubectl logs frontend-69859f6796-29hr8
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.24.2.7. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.24.2.7. Set the 'ServerName' directive globally to suppress this message
[Mon Mar 23 19:41:21.539692 2020] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.20 configured -- resuming normal operations
[Mon Mar 23 19:41:21.539950 2020] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
10.142.0.14 - - [23/Mar/2020:19:54:44 +0000] "GET / HTTP/1.1" 200 826 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.14 - - [23/Mar/2020:19:54:44 +0000] "GET /controllers.js HTTP/1.1" 200 759 "http://35.229.81.94/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.14 - - [23/Mar/2020:19:54:45 +0000] "GET /guestbook.php?cmd=get&key=messages HTTP/1.1" 200 244 "http://35.229.81.94/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.16 - - [23/Mar/2020:19:54:45 +0000] "GET /favicon.ico HTTP/1.1" 404 437 "http://35.229.81.94/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.11 - - [23/Mar/2020:19:55:41 +0000] "GET / HTTP/1.1" 200 1184 "-" "Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com)"
10.142.0.16 - - [23/Mar/2020:20:08:24 +0000] "GET / HTTP/1.1" 200 770 "-" "Mozilla/5.0 zgrab/0.x"
10.142.0.11 - - [23/Mar/2020:20:16:10 +0000] "GET / HTTP/1.1" 200 1184 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
10.142.0.15 - - [23/Mar/2020:20:54:02 +0000] "GET / HTTP/1.1" 200 826 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.15 - - [23/Mar/2020:20:54:02 +0000] "GET /guestbook.php?cmd=get&key=messages HTTP/1.1" 200 279 "http://35.229.81.94/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
10.142.0.15 - - [23/Mar/2020:22:05:08 +0000] "GET /Telerik.Web.UI.WebResource.axd?type=rau HTTP/1.1" 404 400 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
10.142.0.13 - - [24/Mar/2020:00:16:53 +0000] "GET / HTTP/1.1" 200 1184 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

DB web API container logs

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ kubectl logs dbwebapi-676c645974-fpjr8
 * Serving Flask app "dbWebAPI" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on https://0.0.0.0:25443/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 892-740-420
10.24.5.1 - - [23/Mar/2020 20:57:47] "GET /backup HTTP/1.1" 200 -
10.142.0.15 - - [23/Mar/2020 20:57:47] "GET /favicon.ico HTTP/1.1" 404 -
10.142.0.15 - - [23/Mar/2020 20:58:14] "GET /backup/create HTTP/1.1" 200 -
10.142.0.15 - - [23/Mar/2020 20:58:14] "GET /favicon.ico HTTP/1.1" 404 -
10.142.0.15 - - [23/Mar/2020 20:58:34] "GET /backup/status HTTP/1.1" 200 -
10.142.0.12 - - [23/Mar/2020 20:58:34] "GET /favicon.ico HTTP/1.1" 404 -
10.142.0.16 - - [23/Mar/2020 20:58:49] "GET /backup/delete HTTP/1.1" 200 -


Guestbook App (Go)

Tear down the Kubernetes cluster:

(base) skondla@skondla-mac:~/myStage/k8s/gcp/k8sEgine$ terraform destroy
google_container_node_pool.extra-pool: Refreshing state... [id=projects/amplified-name-270419/locations/us-east1-b/clusters/my-first-gke-cluster/nodePools/extra-node-pool]
google_container_cluster.gke-cluster: Refreshing state... [id=projects/amplified-name-270419/locations/us-east1-b/clusters/my-first-gke-cluster]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # google_container_cluster.gke-cluster will be destroyed

(base) skondla@skondla-mac:~/apps/redis/src$ kubectl get pods
The connection to the server xx.xxx.xx.xxx was refused - did you specify the right host or port?
(base) skondla@skondla-mac:~/apps/redis/src$ 
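
The refused connection above is expected once the cluster has been destroyed. The stale kubeconfig entry can be cleaned up afterwards (the context name is whatever get-contexts shows for the deleted cluster):

kubectl config get-contexts
kubectl config delete-context <gke-context-name>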

Cassandra Architecture & Operational Best Practices

Before we begin, understand the CAP theorem

  1. CAP (Consistency, Availability, and Partition Tolerance) is the foundation behind the design of distributed database systems. The CAP theorem states that it is impossible for a distributed computing system to simultaneously guarantee Consistency, Availability, and Partition Tolerance, so every distributed system must make a trade-off and choose two of these three properties.
  2. Per the CAP theorem, a distributed system can guarantee Consistency and Availability (CA) while trading off Partition Tolerance, Consistency and Partition Tolerance (CP) while trading off Availability, or Availability and Partition Tolerance (AP) while trading off Consistency.
  3. Cassandra is a highly available and partition-tolerant (AP) distributed database system with tunable (eventual) consistency. In Cassandra, consistency can be tuned on a per-query basis.

What is Cassandra database?

  1. The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it a strong platform for mission-critical data. Cassandra's support for replicating across multiple data centers is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
    1. Proven: Cassandra is used at large corporations across the globe, for use cases ranging from streaming media and retail to eCommerce and IoT, all with large active data sets.
    2. Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
    3. Performant: Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
    4. Decentralized: There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
    5. Scalable: Some of the largest production deployments include Apple’s, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
    6. Durable: Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.
    7. Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Architecture:

  1. Cassandra Cluster Ring: At a very high level, Cassandra operates by dividing all data evenly around a cluster of nodes, which can be visualized as a ring. Nodes generally run on commodity hardware. Each C* node in the cluster is responsible for, and assigned, a token range (essentially a range of hashes defined by a partitioner, which defaults to Murmur3Partitioner in C* v1.2+). With the default Murmur3Partitioner this hash range spans -2^63 to 2^63-1 (the older RandomPartitioner used 0 to 2^127-1).
    1. In Cassandra, there is no master/slave concept; each node (server) in a Cassandra cluster functions equally. A client read or write request can be routed to any node (which acts as the coordinator for that particular client request) in the Cassandra cluster, regardless of whether that node owns the requested/written data. Each node in the cluster exchanges information (using a peer-to-peer protocol) across the cluster every second, which makes it possible for the nodes to know which node owns which data and the status of the other nodes in the cluster. If a node fails, any other available node will serve the client's request. Cassandra guarantees availability and partition tolerance by replicating data across the nodes in the cluster. We can control the number of replicas (copies of data) to be maintained in the cluster through the "Replication Factor". There is no primary or secondary replica; each replica is equal, and a client request can be served from any available replica.
  2. Cassandra Ring Token Distribution: One of Cassandra's many strong points is its approach to keeping the architecture simple. It is extremely simple to get a multi-node Cassandra cluster up and running. While this initial simplicity is great, a Cassandra cluster will still require some ongoing maintenance throughout its life.
    1. There are two main aspects of token management in Cassandra. The first, and somewhat more straightforward, aspect is the initial token selection for the nodes in your cluster. The second aspect is the maintenance of nodes and tokens in the production cluster in order to keep the cluster balanced.
    2. Consistent hashing: The core of Cassandra’s peer to peer architecture is built on the idea of consistent hashing. This is where the concept of tokens comes from. The basic concept from consistent hashing for our purposes is that each node in the cluster is assigned a token that determines what data in the cluster it is responsible for. The tokens assigned to your nodes need to be distributed throughout the entire possible range of tokens. As a simplistic example, if the range of possible tokens was 0-300 and you had six nodes, you would want the tokens for your nodes to be: 0, 50, 100, 150, 200, 250.
    3. Initial Token Selection: The easiest time to ensure your cluster is balanced is during the initial setup of that cluster. Changing tokens once a cluster is running is a heavy operation that involves replicating data between nodes in the cluster.
    4. Balancing a Live Cluster
      1. Determining the new tokens
        1. The first step in the process is to determine what the tokens in your cluster should be. At first glance this step may seem straightforward. Using the example above, if I have a token range of 300 and 6 nodes, my tokens should be: 0, 50, 100, 150, 200, 250. Actually though, this is just one set of possible tokens that are valid. It is really the distribution of the tokens that matters in this case. For example, the tokens 10, 60, 110, 160, 210, and 260 will also provide a balanced cluster.
      2. Optimizing for the smallest number of moves
        1.   In the case of balancing a live cluster, you want to find the optimal set of tokens taking into consideration the tokens that already exist in your cluster. This allows you to minimize the number of moves you need to do in order to balance the cluster. The process actually involves examining each token in the ring to see if it is ‘balanced’ with any other tokens in the ring.
      3. Optimizing for the smallest amount of data transferred
        1. After narrowing down the possible sets of new tokens for your cluster, it is possible that there may be multiple sets of tokens that can balance the cluster and require the same number of moves. In this case we want to further narrow down our options by picking the set of tokens that requires the least amount of data transfer in our cluster. That is, the set of tokens that requires the shortest total distance to move.
      4. Dealing with multiple racks
        1. In order to achieve the best fault tolerance, it can be a good idea to distribute the data across multiple racks in a single datacenter. To achieve this in Cassandra, alternate racks when assigning tokens to your nodes. So token 0 is assigned to rack A, token 1 to rack B, token 2 to rack A, token 3 to rack B, etc. It is necessary to take this into account in the above two steps if you are dealing with a rack aware cluster.
      5. Moving the nodes
        1. At this point, we’ve determined the optimal set of new tokens, and the nodes in our cluster that need to move to these tokens. The rest of the process is fairly simple, but involves some additional Cassandra operations besides just the command to move tokens. Each move is done synchronously in the cluster. The first step is to initiate the move using the JMX method exposed by Cassandra. If you were doing this manually you would use the nodetool utility provided by Cassandra, which has a ‘move’ command. The move operation will involve transferring data between nodes in the cluster, but it does not automatically clean up data that nodes are no longer responsible for. After each move, we also need to tell Cassandra to cleanup the old data nodes are no longer responsible for. We need to cleanup the node that has just moved, as well as any nodes in the cluster that have changed responsibility. Internally OpsCenter determines these nodes by comparing the ranges each node was responsible for before the move, with the ranges each node is responsible for after the move.
  3. Data distribution and Replication
    1. In Cassandra, data distribution and replication go together. Data is organized by table and identified by a primary key, which determines which node the data is stored on. Replicas are copies of rows. When data is first written, it is also referred to as a replica.
    2. Factors influencing replication:
      1. Virtual Nodes: assigns data ownership to physical machines
      2. Partitioner: partitions data across the cluster
      3. Replication Strategy: determines replicas for each row of data
      4. Snitch: defines the topology that the replication strategy uses to place replicas
    3. Data replication: Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. For example, nodetool getendpoints shows which nodes hold the replicas for a given key:
       admin@ip-10-94-249-80:~$ nodetool getendpoints findb_uat custprofile 15e0b717-3c7f-436a-93eb-07e13b79b1b4
      3.251.xxx.xx
      3.250.xxx.xxx
      3.249.xxx.xx
    4. In a Cassandra cluster, the nodes need to be distributed throughout the entire possible token range, from -2^63 to 2^63 - 1. Each node in the cluster must be assigned a token range; this range determines the position of the node in the token ring and its range of data. For instance, if we have a token ring ranging from 0 to 300 and 6 nodes in the cluster, the tokens for the nodes would be 0, 50, 100, 150, 200, and 250 respectively (a small token-calculation sketch for the real Murmur3 range follows this list). The fundamental idea behind the token range is to balance the data distribution across the nodes in the cluster.
  4. Cassandra Write Path
    1. Cassandra is masterless, so a client can connect to any node in a cluster. Clients can interface with a Cassandra node using either the Thrift protocol or CQL. The node that a client connects to is designated as the coordinator and is responsible for satisfying the client's request. The consistency level determines the number of nodes that the coordinator needs to hear from in order to notify the client of a successful mutation. All inter-node requests are sent through a messaging service in an asynchronous manner. Based on the partition key and the replication strategy used, the coordinator forwards the mutation to all applicable nodes. For example, if nodes 1, 2 and 3 are the applicable nodes, node 1 is the first replica and nodes 2 and 3 are subsequent replicas. The coordinator will wait for a response from the number of nodes required to satisfy the consistency level. QUORUM is a commonly used consistency level which refers to a majority of the replicas; it can be calculated as floor(n/2) + 1, where n is the replication factor (for example, with a replication factor of 3, QUORUM = floor(3/2) + 1 = 2).
  5. Cassandra Read Path
    1. To satisfy a read, Cassandra must combine results from the active memtable and potentially multiple SSTables.
    2. Cassandra processes data at several stages on the read path to discover where the data is stored, starting with the data in the memtable and finishing with SSTables:
      1. Check the memtable
      2. Check row cache, if enabled
      3. Check the Bloom filter
      4. Check the partition key cache, if enabled
      5. Go directly to the compression offset map if the partition key is found in the partition key cache; otherwise check the partition summary (in which case the partition index is then accessed)
      6. Locate the data on disk using the compression offset map
      7. Fetch the data from the SSTable on disk
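
As a concrete illustration of the even token spacing discussed in the ring and token items above, evenly spaced single tokens for an N-node cluster on the Murmur3 range can be computed as i * (2^64 / N) - 2^63. A quick sketch (a node count of 6 is just an example):

# prints 6 evenly spaced tokens across the Murmur3 range [-2^63, 2^63 - 1]
python3 -c 'n = 6; print("\n".join(str(i * (2**64 // n) - 2**63) for i in range(n)))'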

Key Terms and Concepts  (Cassandra 101):

  1. Cluster: is the largest unit of deployment in Cassandra. Each cluster consists of nodes from one or more distributed locations (Availability Zones). 
  2. Distributed Location: contains a collection of nodes that are part of a cluster. In general, while designing a Cassandra cluster on AWS, we recommend that you use multiple Availability Zones to store your data in the cluster. You can configure Cassandra to replicate data across multiple Availability Zones, which will allow your database cluster to be highly available even during the event of an Availability Zone failure. To ensure even distribution of data, the number of Availability Zones should be a multiple of the replication factor. The Availability Zones are also connected through low-latency links, which further helps avoid latency for replication.
  3. Node: is a part of a single distributed location in a Cassandra cluster that stores partitions of data according to the partitioning algorithm. 
  4. Commit Log: is a write-ahead log (WAL) on every node in the cluster. Every write operation made to Cassandra is first written sequentially to this append-only structure, which is then flushed from the write-back cache on the operating system (OS) to disk either periodically or in batches. In the event of a node recovery, the commit logs are replayed to perform recovery of data. This is similar to PostgreSQL “wal” files or Oracle redo logs.
  5. Memtable: is basically a write-back cache of data rows that can be looked up by key. It is an in-memory structure. A single memtable only stores data for a single table and is flushed to disk either when node global memory thresholds have been reached, the commit log is full, or after a table level interval is reached. 
  6. SStable: An SStable (sorted string table) is a logical structure made up of multiple physical files on disk. An SStable is created when a memtable is flushed to disk. An SStable is an immutable data structure. Memtables are sorted by key and then written out sequentially to create an SStable. Thus, write operations in Cassandra are extremely fast, costing only a commit log append and an amortized sequential write operation for the flush. Unlike typical RDBMS OLTP workloads, which are random in nature and cause disk seeks, Cassandra avoids such random disk seeks by pre-sorting data by key when it is stored, which further helps avoid seeks during reads.
  7. Bloom Filter: is a probabilistic data structure for testing set membership that never produces a false negative, but can be tuned for false positives. Bloom filters are off-heap structures. Thus, if a bloom filter responds that a key is not present in an SStable, then the key is not present, but if it responds that the key is present in the SStable, it might or might not be present. Bloom filters can help scale read requests in Cassandra. Bloom filters can also save additional disk read operations reading the SStable, by indicating if a key is not present in the SStable. 
  8. Index File: maintains the offset of keys into the main data file (SStable). Cassandra by default holds a sample of the index file in memory, which stores the offset for every 128th key in the main data file (this value is configurable). Index files can also help scale read operations better because they can provide you the random position in the SStable from which you can sequentially scan to get the data. Without the index files, you need to scan the whole SStable to retrieve data. 
  9. Keyspace: is a logical container in a cluster that contains one or more tables. Replication strategy is typically defined at the keyspace level (a keyspace-definition example follows this list).
  10. Table: also known as a column family, is a logical entity within a keyspace consisting of a collection of ordered columns fetched by row. Primary key definition is required while defining a table. 
  11. Data Partitioning: Cassandra is a distributed database system using a shared nothing architecture. A single logical database is spread across a cluster of nodes and thus the need to spread data evenly amongst all participating nodes. At a 10000 foot level Cassandra stores data by dividing data evenly around its cluster of nodes. Each node is responsible for part of the data. The act of distributing data across nodes is referred to as data partitioning.
  12. Consistent Hashing: Two main problems crop up when trying to distribute data efficiently. One, determining a node on which a specific piece of data should reside on. Two, minimizing data movement when adding or removing nodes. Consistent hashing enables us to achieve these goals. A consistent hashing algorithm enables us to map Cassandra row keys to physical nodes. The range of values from a consistent hashing algorithm is a fixed circular space which can be visualized as a ring. Consistent hashing also minimises the key movements when nodes join or leave the cluster. On average only k/n keys need to be remapped where k is the number of keys and n is the number of slots (nodes). This is in stark contrast to most hashing algorithms where a change in the number of slots results in the need to remap a large number of keys.
  13. Data Replication: Partitioning of data on a shared nothing system results in a single point of failure, i.e. if one of the nodes goes down, part of your data is unavailable. This limitation is overcome by creating copies of the data, known as replicas, thus avoiding a single point of failure. Storing copies of data on multiple nodes is referred to as replication. Replication of data ensures fault tolerance and reliability.
  14. Eventual Consistency: Because data is replicated across nodes, we need to ensure that the data is synchronized across replicas. This is referred to as data consistency. Eventual consistency is a consistency model used in distributed computing. It theoretically guarantees that, provided there are no new updates, all nodes/replicas will eventually return the last updated value. The Domain Name System (DNS) is a good example of an eventually consistent system.
  15. Tunable Consistency: enables users to configure the number of replicas in a cluster that must acknowledge a read or write operation before considering the operation successful. The consistency level is a required parameter in any read and write operation and determines the exact number of nodes that must successfully complete the operation before considering the operation successful.
  16. Data Center, Racks, Nodes: A Data Center (DC) is a centralized place to house computer and networking systems to help meet an organization's information technology needs. A rack is a unit that contains multiple servers all stacked one on top of another. A rack enables data centers to conserve floor space and consolidates networked resources. A node is a single server in a rack. Why do we care? Often Cassandra is deployed in a DC environment, and one must replicate data intelligently to ensure no single point of failure. Data must be replicated to servers in different racks to ensure continued availability in the case of rack failure. Cassandra can be easily configured to work in a multi-DC environment to facilitate failover and disaster recovery. At IBM Cloud Video, Cassandra clusters are set up as follows:
    1. Data Center – AWS region
    2. Rack – AWS Availability Zone (AZ)
    3. Node – EC2 in AZ
  17. Snitches and Replication Strategies: It is important to intelligently distribute data across DCs and racks. In Cassandra, the distribution of data across nodes is configurable. Cassandra uses snitches and replication strategies to determine how data is replicated across DCs, racks, and nodes. Snitches determine the proximity of nodes within a ring. Replication strategies use the proximity information provided by snitches to determine the locality of a particular copy. In the IBM Cloud Video Cassandra clusters, we use Ec2MultiRegionSnitch.
    For example:  endpoint_snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
    For more information about Cassandra snitches, see the reference section.
  18. Gossip Protocol: Cassandra uses a gossip protocol to discover node state for all nodes in a cluster.  Nodes discover information about other nodes by exchanging state information about themselves and other nodes they know about. This is done with a maximum of 3 other nodes. Nodes do not exchange information with every other node in the cluster in order to reduce network load. They just exchange information with a few nodes and over a period of time state information about every node propagates throughout the cluster. The gossip protocol facilitates failure detection.
  19. Merkle Tree: is a hash tree which provides an efficient way to find differences in data blocks. Leaves contain hashes of individual data blocks and parent nodes contain hashes of their respective children. This enables an efficient way of finding differences between nodes.
  20. Write Back Cache: A write back cache is where the write operation is only directed to the cache and completion is immediately confirmed. This is different from Write-through cache where the write operation is directed at the cache but is only confirmed once the data is written to both the cache and the underlying storage structure.
  21. Virtual Nodes: Virtual nodes, known as Vnodes, distribute data across nodes at a finer granularity than can be easily achieved if calculated tokens are used. Vnodes simplify many tasks in Cassandra:
    1. Tokens are automatically calculated and assigned to each node.
    2. Rebalancing a cluster is automatically accomplished when adding or removing nodes. When a node joins the cluster, it assumes responsibility for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across other nodes in the cluster.
    3. Rebuilding a dead node is faster because it involves every other node in the cluster.
    4. The proportion of vnodes assigned to each machine in a cluster can be assigned, so smaller and larger computers can be used in building a cluster.
  22.  Partitioner: A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a function for deriving a token representing a row from its partition key, typically by hashing. Each row of data is then distributed across the cluster by the value of the token.
    1. Murmur3Partitioner
    2. RandomPartitioner
    3. ByteOrderPartitioner
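
To make the replication-strategy, snitch, and keyspace terms above concrete, here is a hedged example of defining a keyspace that keeps three replicas in one data center and two in another. The keyspace and data-center names are illustrative only; with Ec2MultiRegionSnitch the data-center names are the EC2 region names:

cqlsh -e "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3, 'us-west': 2};"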

Best Practices and Considerations:

  1. Configuring data consistency:
    1. Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency. For any given read or write operation, the client application decides how consistent the requested data must be.
    2. Consistency levels in Cassandra can be configured to manage availability versus data accuracy. Consistency can be configured for a session or per individual read or write operation. Within cqlsh, use the CONSISTENCY command to set the consistency level for all queries in the current cqlsh session (see the example after this list). For client applications, set the consistency level using the appropriate driver; for example, using the Java driver, call QueryBuilder.insertInto with setConsistencyLevel to set a per-insert consistency level.
    3. For more information, refer to: https://lake.data.blog/2020/02/16/how-the-cassandra-consistency-level-configured/
  2. Cassandra Data Model
    1. Model by queries.
      1. Cassandra's data model is query-driven by design: know your queries beforehand and evolve the data model based on how you are going to query the tables.
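
A minimal cqlsh illustration of per-session consistency (with RF = 3, QUORUM or LOCAL_QUORUM means floor(3/2) + 1 = 2 replicas must respond); the SELECT against system.local is just a placeholder query that works on any cluster:

cqlsh> CONSISTENCY LOCAL_QUORUM;
cqlsh> SELECT cluster_name, release_version FROM system.local;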

Cassandra Ring in AWS (diagram not reproduced here)

Performance Tuning:

  1. Cassandra Garbage Collector tuning
    1. Refer: 
  2. Long GC pauses
    1. Refer: 
  3. Aggressive Compactions
  4. Cassandra Heap Pressure Scenarios
  5. Compaction
    1. The Cassandra write process stores data in files called SSTables. SSTables are immutable. Instead of overwriting existing rows with inserts or updates, Cassandra writes new timestamped versions of the inserted or updated data in new SSTables. Cassandra does not perform deletes by removing the deleted data: instead, Cassandra marks it with tombstones.
    2. Over time, Cassandra may write many versions of a row in different SSTables. Each version may have a unique set of columns stored with a different timestamp. As SSTables accumulate, the distribution of data can require accessing more and more SSTables to retrieve a complete row.
    3. To keep the database healthy, Cassandra periodically merges SSTables and discards old data. This process is called compaction.
    4. Compaction works on a collection of SSTables. From these SSTables, compaction collects all versions of each unique row and assembles one complete row, using the most up-to-date version (by timestamp) of each of the row's columns. The merge process is performant, because rows are sorted by partition key within each SSTable and the merge does not use random I/O. The new version of each row is written to a new SSTable. The old versions, along with any rows that are ready for deletion, are left in the old SSTables, which are deleted as soon as pending reads are completed.
    5. Compaction causes a temporary spike in disk space usage and disk I/O while old and new SSTables co-exist. As it completes, compaction frees up disk space occupied by old SSTables. It improves read performance by incrementally replacing old SSTables with compacted SSTables. Cassandra can read data directly from the new SSTable even before it finishes writing, instead of waiting for the entire compaction process to finish.
    6. As Cassandra processes writes and reads, it replaces the old SSTables with new SSTables in the page cache. The process of caching the new SSTable, while directing reads away from the old one, is incremental; it does not cause a dramatic cache miss. Cassandra provides predictable high performance even under heavy load.
    7. Compaction strategies (a configuration example follows this list):
      1. SizeTieredCompactionStrategy (STCS)
        1. Recommended for write-intensive workloads.
        2. Pros: Compacts write-intensive workloads very well.
        3. Cons: Can hold onto stale data too long. Amount of memory needed increases over time.
      2. LeveledCompactionStrategy (LCS)
        1. Recommended for read-intensive workloads.
        2. Pros: Disk requirements are easier to predict. Read operation latency is more predictable. Stale data is evicted more frequently.
        3. Cons: Much higher I/O utilization impacting operation latency
      3. TimeWindowCompactionStrategy(TWCS)
        1. Recommended for time series and expiring TTL workloads.
        2. Pros: Used for time series data, stored in tables that use the default TTL for all data. Simpler configuration than that of DTCS.
        3. Cons: Not appropriate if out-of-sequence time data is required, since SSTables will not compact as well. Also, not appropriate for data without a TTL, as storage will grow without bound. Less fine-tuned configuration is possible than with DTCS.
      4. DateTieredCompactionStrategy(DTCS)
        1. Deprecated in Cassandra 3.0.8/3.8.
        2. Pros: Specifically designed for time series data, stored in tables that use the default TTL. DTCS is a better choice when fine-tuning is required to meet space-related SLAs.
        3. Cons: Insertion of records out of time sequence (by repairs or hint replaying) can increase latency or cause errors. In some cases, it may be necessary to turn off read repair and carefully test and control the use of TIMESTAMP options in BATCH, DELETE, INSERT and UPDATE CQL commands.
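
A hedged sketch of switching a table to LeveledCompactionStrategy and then checking compaction activity; the keyspace/table names and the 160 MB target size are illustrative, not taken from the original document:

cqlsh -e "ALTER TABLE demo.events WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"
nodetool compactionstats
nodetool compactionhistory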

Tools & Maintenance:

  1. Cassandra Tools & Maintenance https://lake.data.blog/2020/02/16/cassandra-tools-maintenance/

Troubleshooting:

Cassandra Stress Test

The cassandra-stress tool is a Java-based stress-testing utility for basic benchmarking and load testing of a Cassandra cluster. Data modeling choices will affect application performance, and significant load testing over several trials is the best way to discover issues with a specific data model. cassandra-stress is an effective tool for populating a cluster and stress testing CQL tables and queries. Use cassandra-stress to (a couple of invocation examples follow the list):

  1. Quickly determine how a schema performs.
  2. Understand how your database scales.
  3. Optimize your data model and settings.
  4. Determine production capacity.
  5. Define specific schemas, with various compaction strategies, cache settings, and types, via a YAML-based profile.
  6. The YAML file supports user-defined keyspaces, tables, and schemas.
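
Illustrative invocations (the contact-point address, profile path, and operation names are placeholders, not taken from the original document):

# Built-in schema: pure writes, then reads, against one contact point
cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.10
cassandra-stress read n=200000 -rate threads=50 -node 10.0.0.10

# User-defined schema and queries from a YAML profile
cassandra-stress user profile=./stress-profile.yaml "ops(insert=3,read1=1)" n=500000 -rate threads=50 -node 10.0.0.10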

Cassandra nodetool

$ nodetool
usage: nodetool [(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
        [(-pw <password> | --password <password>)] [(-p <port> | --port <port>)]
        [(-u <username> | --username <username>)] [(-h <host> | --host <host>)]
        <command> [<args>]

The most commonly used nodetool commands are:
    assassinate                  Forcefully remove a dead node without re-replicating any data.  Use as a last resort if you cannot removenode
    bootstrap                    Monitor/manage node's bootstrap process
    cleanup                      Triggers the immediate cleanup of keys no longer belonging to a node. By default, clean all keyspaces
    clearsnapshot                Remove the snapshot with the given name from the given keyspaces. If no snapshotName is specified we will remove all snapshots
    compact                      Force a (major) compaction on one or more tables or user-defined compaction on given SSTables
    compactionhistory            Print history of compaction
    compactionstats              Print statistics on compactions
    decommission                 Decommission the *node I am connecting to*
    describecluster              Print the name, snitch, partitioner and schema version of a cluster
    describering                 Shows the token ranges info of a given keyspace
    disableautocompaction        Disable autocompaction for the given keyspace and table
    disablebackup                Disable incremental backup
    disablebinary                Disable native transport (binary protocol)
    disablegossip                Disable gossip (effectively marking the node down)
    disablehandoff               Disable storing hinted handoffs
    disablehintsfordc            Disable hints for a data center
    disablethrift                Disable thrift server
    drain                        Drain the node (stop accepting writes and flush all tables)
    enableautocompaction         Enable autocompaction for the given keyspace and table
    enablebackup                 Enable incremental backup
    enablebinary                 Reenable native transport (binary protocol)
    enablegossip                 Reenable gossip
    enablehandoff                Reenable future hints storing on the current node
    enablehintsfordc             Enable hints for a data center that was previsouly disabled
    enablethrift                 Reenable thrift server
    failuredetector              Shows the failure detector information for the cluster
    flush                        Flush one or more tables
    gcstats                      Print GC Statistics
    getcompactionthreshold       Print min and max compaction thresholds for a given table
    getcompactionthroughput      Print the MB/s throughput cap for compaction in the system
    getendpoints                 Print the end points that owns the key
    getinterdcstreamthroughput   Print the Mb/s throughput cap for inter-datacenter streaming in the system
    getlogginglevels             Get the runtime logging levels
    getsstables                  Print the sstable filenames that own the key
    getstreamthroughput          Print the Mb/s throughput cap for streaming in the system
    gettimeout                   Print the timeout of the given type in ms
    gettraceprobability          Print the current trace probability value
    gossipinfo                   Shows the gossip information for the cluster
    help                         Display help information
    info                         Print node information (uptime, load, ...)
    invalidatecountercache       Invalidate the counter cache
    invalidatekeycache           Invalidate the key cache
    invalidaterowcache           Invalidate the row cache
    join                         Join the ring
    listsnapshots                Lists all the snapshots along with the size on disk and true size.
    move                         Move node on the token ring to a new token
    netstats                     Print network information on provided host (connecting node by default)
    pausehandoff                 Pause hints delivery process
    proxyhistograms              Print statistic histograms for network operations
    rangekeysample               Shows the sampled keys held across all keyspaces
    rebuild                      Rebuild data by streaming from other nodes (similarly to bootstrap)
    rebuild_index                A full rebuild of native secondary indexes for a given table
    refresh                      Load newly placed SSTables to the system without restart
    refreshsizeestimates         Refresh system.size_estimates
    reloadtriggers               Reload trigger classes
    relocatesstables             Relocates sstables to the correct disk
    removenode                   Show status of current node removal, force completion of pending removal or remove provided ID
    repair                       Repair one or more tables
    replaybatchlog               Kick off batchlog replay and wait for finish
    resetlocalschema             Reset node's local schema and resync
    resumehandoff                Resume hints delivery process
    ring                         Print information about the token ring
    scrub                        Scrub (rebuild sstables for) one or more tables
    setcachecapacity             Set global key, row, and counter cache capacities (in MB units)
    setcachekeystosave           Set number of keys saved by each cache for faster post-restart warmup. 0 to disable
    setcompactionthreshold       Set min and max compaction thresholds for a given table
    setcompactionthroughput      Set the MB/s throughput cap for compaction in the system, or 0 to disable throttling
    sethintedhandoffthrottlekb   Set hinted handoff throttle in kb per second, per delivery thread.
    setinterdcstreamthroughput   Set the Mb/s throughput cap for inter-datacenter streaming in the system, or 0 to disable throttling
    setlogginglevel              Set the log level threshold for a given class. If both class and level are empty/null, it will reset to the initial configuration
    setstreamthroughput          Set the Mb/s throughput cap for streaming in the system, or 0 to disable throttling
    settimeout                   Set the specified timeout in ms, or 0 to disable timeout
    settraceprobability          Sets the probability for tracing any given request to value. 0 disables, 1 enables for all requests, 0 is the default
    snapshot                     Take a snapshot of specified keyspaces or a snapshot of the specified table
    status                       Print cluster information (state, load, IDs, ...)
    statusbackup                 Status of incremental backup
    statusbinary                 Status of native transport (binary protocol)
    statusgossip                 Status of gossip
    statushandoff                Status of storing future hints on the current node
    statusthrift                 Status of thrift server
    stop                         Stop compaction
    stopdaemon                   Stop cassandra daemon
    tablehistograms              Print statistic histograms for a given table
    tablestats                   Print statistics on tables
    toppartitions                Sample and print the most active partitions for a given column family
    tpstats                      Print usage statistics of thread pools
    truncatehints                Truncate all hints on the local node, or truncate hints for the endpoint(s) specified.
    upgradesstables              Rewrite sstables (for the requested tables) that are not on the current version (thus upgrading them to said current version)
    verify                       Verify (check data checksum for) one or more tables
    version                      Print cassandra version
    viewbuildstatus              Show progress of a materialized view build

See 'nodetool help <command>' for more information on a specific command.

Cassandra Tools & Maintenance

This section describes the tools needed to maintain a Cassandra cluster. You can use the following tools to maintain a Cassandra ring.

Tools:

  1. nodetool: The nodetool utility is a command-line interface for managing a cluster. Refer to “nodetool help” for the full set of options. Listed below are some of the nodetool command options you are most likely to use (sample commands follow this list).

     Options:
     Short   Long              Description
     -h      --host            Hostname or IP address.
     -p      --port            Port number.
     -pwf    --password-file   Password file path.
     -pw     --password        Password.
     -u      --username        Remote JMX agent username.
    1. nodetool compact:
      1. Force a major compaction on one or more tables.
    2. nodetool repair:
      1. When nodetool repair is run against a node it initiates a repair for some range of tokens. The range being repaired depends on what options are specified. The default options, just calling “nodetool repair”, initiate a repair of every token range owned by the node. The node you issued the call to becomes the coordinator for the repair operation, and it coordinates repairing those token ranges between all of the nodes that own them.
      2. When you use “nodetool repair -pr”, each node picks a subset of its token range to schedule for repair, such that if “-pr” is run on EVERY node in the cluster, every token range is repaired exactly once. This means that whenever you use -pr, you need to repair the entire ring (every node in every data center). If you use “-pr” on just one node, or just the nodes in one data center, you will only repair a subset of the data on those nodes.
      3. When running repair to fix a problem, such as a node being down for longer than the hint window, you need to repair the entire token range of that node. So you can’t just run “nodetool repair -pr” on it; you need to initiate a full “nodetool repair” on it, or do a full cluster repair with “nodetool repair -pr” on every node.
      4. If you have multiple data centers, by default when running repair all nodes in all data centers will sync with each other on the range being repaired.
      5. Repairs are important for every Cassandra cluster, especially when data is deleted frequently. Running the nodetool repair command initiates the repair process on a specific node, which in turn computes a Merkle tree for each range of data on that node. The Merkle tree is a binary tree of hashes used by Cassandra to calculate the differences in datasets between nodes in a cluster. Every time a repair is carried out the tree has to be recalculated: each node involved in the repair must construct its Merkle tree from all the SSTables it stores, which makes the calculation very expensive. In return, repairs are network efficient, because only the targeted rows identified by the Merkle trees as inconsistent are sent across the network.
      6. Scanning every SSTable to build the Merkle trees is an expensive operation. To avoid constant tree construction, incremental repairs were introduced in Cassandra 2.1. The idea is to persist already-repaired data and only calculate Merkle trees for SSTables that have not previously undergone repair, which keeps the repair process performant and lightweight even as datasets grow, so long as repairs are run frequently.
      7. nodetool repair should be run at an interval shorter than GC_GRACE_SECONDS.
      8. Common nodetool repair variants:
        •  nodetool repair
        •  nodetool repair -pr
        •  nodetool repair -inc
        •  nodetool repair -snapshot
    3. nodetool gcstats:
    4. nodetool flush:
    5. nodetool netstats:
    6. nodetool removenode
    7. nodetool rebuild
    8. nodetool snapshot
    9. nodetool tablestats
    10. nodetool tpstats
    11. nodetool status
    12. nodetool upgradesstables
    13. nodetool stop
    14. nodetool failuredetector
    15. nodetool info
      1. nodetool info
    16. nodetool tpstats
      1. nodetool tpstats
    17. nodetool status
      1. ssh admin@10.33.XXX.XX 'nodetool status'
    18. nodetool cfstats
      1. nodetool cfstats finance.custprofile
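
A few sample nodetool invocations for the maintenance commands listed above. These are a sketch: the host, JMX credentials, and port are illustrative placeholders; the finance.custprofile keyspace/table comes from the cfstats example above.

# Connect to a remote node with JMX credentials (host and credentials are illustrative)
nodetool -h 10.33.0.10 -p 7199 -u cassandraAdmin -pw 'secret' status

# Full repair of a single keyspace on this node
nodetool repair finance

# Primary-range repair; run this on every node in the cluster
nodetool repair -pr

# Flush memtables for a keyspace, then check table-level statistics
nodetool flush finance
nodetool tablestats finance.custprofile

# Thread pool and network streaming activity
nodetool tpstats
nodetool netstats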

Maintenance

  1. Repairing Nodes
  2. Backing up the Cassandra database (a snapshot command sketch follows this list)
  3. Restoring from a snapshot
  4. Restoring a snapshot into a new cluster
  5. Recovering from a single disk failure using JBOD.
    1. Steps for recovering from a single disk failure in a disk array using JBOD (just a bunch of disks).
    2. Cassandra might not fail from the loss of one disk in a JBOD array, but some reads and writes may fail when:
      1. The operation’s consistency level is ALL.
      2. The data being requested or written is stored on the defective disk.
      3. The data to be compacted is on the defective disk.
    3. It’s possible that you can simply replace the disk, restart Cassandra, and run nodetool repair. However, if the disk crash corrupted the Cassandra system table, you must remove the incomplete data from the other disks in the array. The procedure for doing this depends on whether the cluster uses vnodes or single-token architecture.
    4. These steps are supported for Cassandra versions 3.2 and later. If a disk fails on a node in a cluster using an earlier version of Cassandra, replace the node.
  6. Replacing a dead node or dead seed node.
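
A hedged sketch of the snapshot-based backup and restore flow on a single node. The keyspace, table, snapshot tag, and data directory path are illustrative and vary by installation:

# Backup: flush memtables, then take a named snapshot of the keyspace
nodetool flush finance
nodetool snapshot -t pre_change finance
nodetool listsnapshots

# Restore outline: copy the snapshot SSTables back into the table's data
# directory (path varies by install), then load them without restarting the node
# cp /var/lib/cassandra/data/finance/custprofile-*/snapshots/pre_change/* \
#    /var/lib/cassandra/data/finance/custprofile-*/
nodetool refresh finance custprofile

# Remove the snapshot once it is no longer needed
nodetool clearsnapshot -t pre_change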

How the Cassandra consistency level is configured

Consistency levels in Cassandra can be configured to manage availability versus data accuracy. Configure consistency for a session or per individual read or write operation. Within cqlsh, use CONSISTENCY to set the consistency level for all queries in the current cqlsh session. For programming client applications, set the consistency level using an appropriate driver. For example, using the Java driver, call QueryBuilder.insertInto with setConsistencyLevel to set a per-insert consistency level.

The consistency level defaults to ONE for all write and read operations.

Write Consistency Levels

Level: ALL
  Description: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
  Usage: Provides the highest consistency and the lowest availability of any other level.

Level: EACH_QUORUM
  Description: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter.
  Usage: Used in multiple datacenter clusters to strictly maintain consistency at the same level in each datacenter. For example, choose this level if you want a read to fail when a datacenter is down and the QUORUM cannot be reached on that datacenter.

Level: QUORUM
  Description: A write must be written to the commit log and memtable on a quorum of replica nodes across all datacenters.
  Usage: Used in either single or multiple datacenter clusters to maintain strong consistency across the cluster. Use if you can tolerate some level of failure.

Level: LOCAL_QUORUM
  Description: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter as the coordinator. Avoids latency of inter-datacenter communication.
  Usage: Used in multiple datacenter clusters with a rack-aware replica placement strategy, such as NetworkTopologyStrategy, and a properly configured snitch. Use to maintain consistency locally (within the single datacenter). Can be used with SimpleStrategy.

Level: ONE
  Description: A write must be written to the commit log and memtable of at least one replica node.
  Usage: Satisfies the needs of most users because consistency requirements are not stringent.

Level: TWO
  Description: A write must be written to the commit log and memtable of at least two replica nodes.
  Usage: Similar to ONE.

Level: THREE
  Description: A write must be written to the commit log and memtable of at least three replica nodes.
  Usage: Similar to TWO.

Level: LOCAL_ONE
  Description: A write must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter.
  Usage: In multiple datacenter clusters, a consistency level of ONE is often desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For security and quality reasons, you can use this consistency level in an offline datacenter to prevent automatic connection to online nodes in other datacenters if an offline node goes down.

Level: ANY
  Description: A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
  Usage: Provides low latency and a guarantee that a write never fails. Delivers the lowest consistency and highest availability.

 Read consistency levels

This table describes read consistency levels in strongest-to-weakest order.

Read Consistency Levels

Level: ALL
  Description: Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
  Usage: Provides the highest consistency of all levels and the lowest availability of all levels.

Level: EACH_QUORUM
  Description: Not supported for reads.

Level: QUORUM
  Description: Returns the record after a quorum of replicas from all datacenters has responded.
  Usage: Used in either single or multiple datacenter clusters to maintain strong consistency across the cluster. Ensures strong consistency if you can tolerate some level of failure.

Level: LOCAL_QUORUM
  Description: Returns the record after a quorum of replicas in the current datacenter as the coordinator has reported. Avoids latency of inter-datacenter communication.
  Usage: Used in multiple datacenter clusters with a rack-aware replica placement strategy (NetworkTopologyStrategy) and a properly configured snitch. Fails when using SimpleStrategy.

Level: ONE
  Description: Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
  Usage: Provides the highest availability of all the levels if you can tolerate a comparatively high probability of stale data being read. The replicas contacted for reads may not always have the most recent write.

Level: TWO
  Description: Returns the most recent data from two of the closest replicas.
  Usage: Similar to ONE.

Level: THREE
  Description: Returns the most recent data from three of the closest replicas.
  Usage: Similar to TWO.

Level: LOCAL_ONE
  Description: Returns a response from the closest replica in the local datacenter.
  Usage: Same usage as described in the table about write consistency levels.

Level: SERIAL
  Description: Allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update. If a SERIAL read finds an uncommitted transaction in progress, it will commit the transaction as part of the read. Similar to QUORUM.
  Usage: To read the latest value of a column after a user has invoked a lightweight transaction to write to the column, use SERIAL. Cassandra then checks the inflight lightweight transaction for updates and, if found, returns the latest data.

Level: LOCAL_SERIAL
  Description: Same as SERIAL, but confined to the datacenter. Similar to LOCAL_QUORUM.
  Usage: Used to achieve linearizable consistency for lightweight transactions.

How QUORUM is calculated

The QUORUM level writes to the number of nodes that make up a quorum. A quorum is calculated, and then rounded down to a whole number, as follows:

quorum = (sum_of_replication_factors / 2) + 1

The sum of all the replication_factor settings for each datacenter is the sum_of_replication_factors.

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + datacentern_RF

Examples:

  • Using a replication factor of 3, a quorum is 2 nodes. The cluster can tolerate 1 replica down.
  • Using a replication factor of 6, a quorum is 4. The cluster can tolerate 2 replicas down.
  • In a two datacenter cluster where each datacenter has a replication factor of 3, a quorum is 4 nodes. The cluster can tolerate 2 replica nodes down.
  • In a five datacenter cluster where two datacenters have a replication factor of 3 and three datacenters have a replication factor of 2, a quorum is 7 nodes.

The more datacenters there are, the more replica nodes need to respond for a successful operation.
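
As a quick illustration of the formula, here is a minimal shell sketch that computes the quorum size from per-datacenter replication factors (the script name and invocation are hypothetical):

#!/bin/bash
# Minimal sketch: compute the QUORUM size from per-datacenter replication
# factors passed as arguments, e.g. "bash quorum.sh 3 3" for two DCs with RF=3.
sum_rf=0
for rf in "$@"; do
  sum_rf=$(( sum_rf + rf ))
done
# quorum = (sum_of_replication_factors / 2) + 1, rounded down by integer division
echo "quorum = $(( sum_rf / 2 + 1 ))"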

Similar to QUORUM, the LOCAL_QUORUM level is calculated based on the replication factor of the same datacenter as the coordinator node. That is, even if the cluster has more than one datacenter, the quorum is calculated only with local replica nodes.

In EACH_QUORUM, every datacenter in the cluster must reach a quorum based on that datacenter’s replication factor in order for the read or write request to succeed. That is, for every datacenter in the cluster a quorum of replica nodes must respond to the coordinator node in order for the read or write request to succeed.

 Configuring client consistency levels

You can use a cqlsh command, CONSISTENCY, to set the consistency level for queries in the current cqlsh session. For programming client applications, set the consistency level using an appropriate driver. For example, call QueryBuilder.insertInto with a setConsistencyLevel argument using the Java driver.
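
For example, the session-level CONSISTENCY command can be issued interactively in cqlsh or scripted with cqlsh -e. The host below and the finance.custprofile table are illustrative:

# Interactively inside cqlsh:
#   CONSISTENCY;               -- show the current level
#   CONSISTENCY LOCAL_QUORUM;  -- set the level for this session
# Non-interactively, for a one-off query:
cqlsh 10.33.0.10 -e "CONSISTENCY LOCAL_QUORUM; SELECT * FROM finance.custprofile LIMIT 10;"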

Issues with SERIAL and LOCAL_SERIAL consistency levels

  1. SERIAL and LOCAL_SERIAL are not valid consistency levels for writes.
    SERIAL and LOCAL_SERIAL are valid consistency levels only for reads, not for regular writes; executing a write with SERIAL or LOCAL_SERIAL results in an error. However, SERIAL and LOCAL_SERIAL can be used as the serial consistency level for conditional (lightweight transaction) writes.

Replicating AWS Aurora DB Clusters Across AWS Regions

As of writing this article (around mid-March 2019):

  1. You can create an Amazon Aurora DB cluster as a Read Replica in a different AWS Region than the source DB cluster. 
  2. Taking this approach can improve your disaster recovery capabilities, let you scale read operations into a region that is closer to your users, and make it easier to migrate from one region to another.
  3. You can create Read Replicas of both encrypted and unencrypted DB clusters.
  4. The Read Replica must be encrypted if the source DB cluster is encrypted.
  5. When you create an Aurora DB cluster Read Replica in another region, you should be aware of the following:
    1. In a cross-region scenario, there is more lag time between the source DB cluster and the Read Replica due to the longer network channels between regions.
    2. Data transferred for cross-region replication incurs Amazon RDS data transfer charges. The following cross-region replication actions generate charges for the data transferred out of the source region:
      • When you create the Read Replica, Amazon RDS takes a snapshot of the source cluster and transfers the snapshot to the Read Replica region.
      • For each data modification made in the source databases, Amazon RDS transfers data from the source region to the Read Replica region.
    3. For each source DB cluster, you can only have one cross-region Read Replica DB cluster.
    4. Both your source DB cluster and your cross-region Read Replica DB cluster can have up to 15 Aurora Replicas along with the primary instance for the DB cluster. This functionality lets you scale read operations for both your source region and your replication target region.
    5. Before you can create an Aurora DB cluster that is a cross-region Read Replica, you must enable binary logging on your source Aurora DB cluster. Amazon Aurora cross-region replication uses MySQL binary replication to replay changes on the cross-region Read Replica DB cluster.
    6. To enable binary logging on an Aurora DB cluster, update the binlog_format parameter for your source DB cluster.
    7. The binlog_format parameter is a cluster-level parameter that is in the default.aurora5.6 cluster parameter group by default. 
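
A hedged AWS CLI sketch of the two steps above; the parameter group name, cluster identifiers, account ID, and regions are illustrative placeholders, not taken from a real environment:

# 1. Enable binary logging on the source cluster's parameter group
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora56-cluster-params \
    --parameters "ParameterName=binlog_format,ParameterValue=MIXED,ApplyMethod=pending-reboot"

# 2. Create the cross-region Read Replica cluster in the target region,
#    pointing at the source cluster's ARN (add an instance to it afterwards
#    with "aws rds create-db-instance")
aws rds create-db-cluster \
    --region us-west-2 \
    --db-cluster-identifier mydb1-replica \
    --engine aurora \
    --replication-source-identifier arn:aws:rds:us-east-1:111122223333:cluster:mydb1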

Step-by-step guide

References:

Cost Estimates: Approximate Cost estimates for Multi-Region / Global Database


AWS Aurora DB cluster Scaling

I come from the world of the Oracle technology stack, where I worked for a long time with Oracle RAC clusters on Linux/Exadata platforms, until I started looking at the open-source world and cloud infrastructure a few years ago. Like many, I wondered what it would take to build a scalable database platform on an open cloud platform that is always available, fault tolerant, and highly resilient to single points of failure.

Although AWS Aurora offers some of the features mentioned above and can scale up to 15 replicas, I often ask myself what the point of replicas is when they cannot be load balanced to use the maximum available compute. In my situation, for example, I have two replicas and a single master, yet the replicas are rarely used by the application to off-load reads, wasting valuable compute resources while the master instance is overwhelmed at peak load. I was excited when AWS announced the multi-master Aurora cluster back at re:Invent 2017, but so far I have not seen it arrive.

I asked the AWS team about another option, something similar to a query router that, when deployed, routes read requests appropriately to replicas or slave instances. Today, our development team struggles to get this working through the application code, and it is hard to manage routing traffic efficiently. I looked at a third-party solution called ScaleArc to do the job. Below are a few architectural diagrams I created showing how it works.

[Diagram: Current Setup]

[Diagram: DB query router]

[Diagram: DB query router in HA]

Automated snapshots using Python Flask Web API/Curl

I started to think about how I could make database service requests more agile and self-served. As a cloud data platform architect I have access to full resources; however, I am often overwhelmed with requests that people think I am responsible for, but that really belong to the DBAs. The point I am making is: how well can I integrate DB services into CI/CD pipelines and deployment workflows? During each release, DevOps requests that a database snapshot be taken prior to deployment.

I developed a web app using the Python Flask web framework that exposes RESTful API calls built on the AWS Python SDK. The Flask application defines its API calls as routes.

The diagram below depicts a simple architecture.

[Architecture diagram]

The web interface (HTML) or a curl command can be used to request:

  1. Taking a database backup
  2. Checking the status of an existing backup/snapshot
  3. Deleting a manual, on-demand backup taken previously

The API app is written using the Python Flask web framework, with API endpoints defined as routes. The tool can be used in two forms:

  1. HTTPS calls via a browser: a basic HTML interface for submitting requests, with pages for the main backup tool, check backup, create backup, and delete backup.
  2. curl calls over HTTPS.

Examples (webApp via HTTPS or curl):

#Main Page

#https://10.10.x.x:25443/backup

#create cluster db snapshot

curl -k https://10.10.x.x:25443/backup/create \
 --data "endpoint=mydB1-net1.us-east-1.rds.amazonaws.com"; \
 echo

#https://10.10.x.x:25443/backup/create

#check cluster db snapshot

curl -k https://10.10.x.x:25443/backup/status \
 --data "snapshotname=mydB1-2018-06-24-22-21-57" \
 --data "endpoint=mydB1.cluster-ro-net1.us-east-1.rds.amazonaws.com"; \
 echo

#https://10.10.x.x:25443/backup/status

#delete cluster db snapshot

curl -k https://10.10.x.x:25443/backup/delete \
 --data "snapshotname=mydB1-2018-06-24-22-21-57" \
 --data "endpoint=mydB1.cluster-ro-net1.us-east-1.rds.amazonaws.com"; \
 echo

#https://10.10.x.x:25443/backup/delete

Scheduling during or before deployment:

#!/bin/bash
#Author: Sudheer Kondla, 06/27/201
if [ $# -lt 1 ];then
echo "USAGE: bash $0 [require Fully Qualified db endpoint]"
exit 1
fi
dbEndPoint=$1 
/usr/bin/curl -k https://10.10.x.x:25443/backup/create \
--data "endpoint=${dbEndPoint}";  echo

Automated Restores with Flask web App API

The web interface (HTML) or a curl command can be used to request:

  1. Restoring a database instance, for both cluster and non-cluster database instances
  2. Checking the status of the restore from step 1
  3. Attaching an instance to an already existing DB cluster

The API app is written in Python using the Flask web framework and blueprints, with API endpoints defined as routes.

Request website credentials: provide your email for website sign-up. Sign-up is only available to admins through the Admin console. The Admin and User interface apps run on the DB API server(s) in Docker containers mapped to different ports.

  1. The website uses authentication (credentials are currently stored in a PostgreSQL DB) and can be used in two forms: via the browser or via curl.
  2. The website records user information for each action performed on the site (user email and client IP, real IP when available).
  3. In the future, geolocation and JWT authentication will be added.

HTTPS calls via a browser provide a basic interface for submitting requests through HTML.

Admin Console:

Login Page

Restore Database Page

Restore DB Status Page

Attach dB instance to the dB cluster Page

CURL interface

In order to reach the functional pages of the website, you must log in first; the session information is then passed via a SecureCookieSession.

How to use the Flask API via curl to restore a backup:

#!/bin/bash
#Author: skondla@me.com
#Purpose: Restore DB from a Snapshot
 
if [ $# -lt 2 ];
then
    echo "Provide snapshotname , db endpoint"
    echo "example: bash getDBRestoreStatus.sh myDB
    myDB.cluster-XXXYYYYDDDD.us-east-1.rds.amazonaws.com"
exit 1
fi
 
snapshotname=${1}
endpoint=${2}
 
 
EMAIL=`cat ~/.password/mySecrets2 | grep email | awk '{print $2}'`
PASSWORD=`cat ~/.password/mySecrets2 | grep password | awk '{print $2}'`
 
 
#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/login" \
/usr/bin/curl -k "https://192.168.2.15:50443/login" \
    --data-urlencode "email=${EMAIL}" \
    --data-urlencode "password=${PASSWORD}" \
    --cookie "cookies.txt" \
    --cookie-jar "cookies.txt" \
    --verbose \
    > "login_log.html"
 
#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/restore" \
/usr/bin/curl -k "https://192.168.2.15:50443/restore" \
    --data-urlencode "snapshotname=${snapshotname}" \
    --data-urlencode "endpoint=${endpoint}" \
    --cookie "cookies.txt" \
    --verbose \
    --cookie-jar "cookies.txt"; \
    echo
rm -f cookies.txt

Status of Restored backup:

#!/bin/bash
#Author: skondla@me.com
#Purpose: Status of Restore

if [ $# -lt 2 ];
then
    echo "Provide snapshotname , db endpoint"
    echo "example: bash getDBRestoreStatus.sh myDB
    myDB.cluster-XXXYYYYDDDD.us-east-1.rds.amazonaws.com"
exit 1
fi

snapshotname=${1}
endpoint=${2}


EMAIL=`cat ~/.password/mySecrets2 | grep email | awk '{print $2}'`
PASSWORD=`cat ~/.password/mySecrets2 | grep password | awk '{print $2}'`


#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/login" \
/usr/bin/curl -k "https://192.168.2.15:50443/login" \
    --data-urlencode "email=${EMAIL}" \
    --data-urlencode "password=${PASSWORD}" \
    --cookie "cookies.txt" \
    --cookie-jar "cookies.txt" \
    --verbose \
    > "login_log.html"

#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/status" \
/usr/bin/curl -k "https://192.168.2.15:50443/restore" \
    --data-urlencode "snapshotname=${snapshotname}" \
    --data-urlencode "endpoint=${endpoint}" \
    --cookie "cookies.txt" \
    --verbose \
    --cookie-jar "cookies.txt"; \
    echo
rm -f cookies.txt

Attach DB instance to the cluster:

#!/bin/bash
#Author: skondla@me.com
#Purpose: Attach dB instance to dB cluster
 
if [ $# -lt 2 ];
then
    echo "Provide db endpoint , instanceclass"
    echo "example: bash getDBRestoreStatus.sh
          myDB.cluster-XXXYYYYDDDD.us-east-1.rds.amazonaws.com
          db.t2.small"
exit 1
fi
 
 
endpoint=${1}
instanceclass=${2}
 
 
EMAIL=`cat ~/.password/mySecrets2 | grep email | awk '{print $2}'`
PASSWORD=`cat ~/.password/mySecrets2 | grep password | awk '{print $2}'`
 
#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/login" \
/usr/bin/curl -k "https://192.168.2.15:50443/login" \
    --data-urlencode "email=${EMAIL}" \
    --data-urlencode "password=${PASSWORD}" \
    --cookie "cookies.txt" \
    --cookie-jar "cookies.txt" \
    --verbose \
    > "login_log.html"
 
#/usr/bin/curl -k "https://ec2-54.94.x.x.compute-1.amazonaws.com:50443/attachdb" \
/usr/bin/curl -k "https://192.168.2.15:50443/attachdb" \
    --data-urlencode "endpoint=${endpoint}" \
    --data-urlencode "instanceclass=${instanceclass}" \
    --cookie "cookies.txt" \
    --verbose \
    --cookie-jar "cookies.txt"; \
    echo
 
rm -f cookies.txt

Source code:
email me: skondla@me.com




