Troubleshooting Errors: Pod is Unstable - Kubernetes

Kubernetes Appliance Software Installation Guide

prodname: Kubernetes
created_date: June 2018
category: Installation
featnum: B035-6103-068K

Each pod has one or more containers. A container failure may cause the pod to become unstable.

Example 1

One of the two containers in the elasticserch-logging-846bf5db8b-9zmmm pod has failed: the pod shows READY 1/2, STATUS Error, and 328 restarts.
backend8-123:~ # kubectl get po --all-namespaces
NAMESPACE     NAME                                         READY     STATUS    RESTARTS   AGE
kube-system   canal-2fsxc                                  3/3       Running   0          19h
kube-system   canal-8bqhm                                  3/3       Running   8          19h
kube-system   canal-svnsm                                  3/3       Running   2          19h
kube-system   default-http-backend-v1.0-876543fcc7-fedcz   1/1       Running   0          16h
kube-system   elasticserch-logging-846bf5db8b-9zmmm        1/2       Error     328        16h
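
On a cluster with many pods, the unstable ones can be easy to miss in this listing. The following is a minimal sketch, not part of the product, that filters the output for pods whose READY count is below the total. The function name unready_pods is hypothetical, and the sketch assumes the six-column layout shown above (as produced with --all-namespaces).

```shell
# Hypothetical helper: reads `kubectl get po --all-namespaces` output on stdin
# and prints namespace, name, READY, STATUS, and RESTARTS for every pod
# whose ready-container count differs from its total container count.
# Assumes READY is the third column, as in the listing above.
unready_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")                 # "1/2" -> ready[1]=1, ready[2]=2
    if (ready[1] + 0 != ready[2] + 0)
      print $1, $2, $3, $4, $5
  }'
}

# Usage (on a host with cluster access):
#   kubectl get po --all-namespaces | unready_pods
```

Run against the listing in Example 1, this would report only the elasticserch-logging pod, since every canal pod is 3/3.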

Example 2

The elasticsearch-logging container is in a Waiting state with reason CrashLoopBackOff, and its previous run terminated with exit code 137.
backend8-123:~ # kubectl describe po -n kube-system elasticserch-logging-846bf5db8b-9zmmm
Name:           elasticserch-logging-846bf5db8b-9zmmm
Namespace:      kube-system
Node:           192.192.123.45/192.192.123.45
Start Time:     Wed, 11 Apr 2018 20:27:37 -0400
Labels:         app=elasticsearch-logging
                kubernetes.io/cluster-service=true
                pod-template-hash=4026918646
                role=frontend
                version=v5.6.2
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             192.34.5.20
Controlled By:  ReplicaSet/elasticserch-logging-846bf5db8b
Containers:
  elasticsearch-logging:
    Container ID:   docker://3e2f822e4bba408cb98032746fe2d475afa8ca0fef74d3915f9e188b2f46ca27
    Image:          192.192.123.45:5000/tdc/gcr.io/google_containers/elasticsearch:v5.6.2
    Image ID:       docker-pullable://192.192.123.45:5000/tdc/gcr.io/google_containers/elasticsearch@sha256:1eef6bf0ea9c41b2255cb69edf0a8bfd421f32b9b48e8d6ac0d6c2a331db4c21
    Ports:          9200/TCP, 9300/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
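
Exit codes above 128 conventionally mean the process was ended by a signal, with the signal number equal to the exit code minus 128. Exit code 137 therefore corresponds to signal 9 (SIGKILL). A container killed this way has often exceeded its memory limit, but that is only one possible cause, and the output above reports the reason as Error rather than OOMKilled. The sketch below decodes an exit code; the function name describe_exit_code is hypothetical.

```shell
# Hypothetical helper: interprets a container exit code.
# Codes above 128 are treated as "killed by signal (code - 128)";
# anything else is reported as a plain exit status.
describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l <number> prints the signal name for that number
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with status $code"
  fi
}

# Usage:
#   describe_exit_code 137
```

For the container in Example 2, this reports signal 9.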

Example 3

To investigate why elasticsearch-logging is failing to start, check the Events section of the kubectl describe output, then look at the log files in /var/log/containers/ or /var/log/messages on the node where the pod is running.
backend8-123:~ # kubectl describe po -n kube-system elasticserch-logging-846bf5db8b-9zmmm
Name:           elasticserch-logging-846bf5db8b-9zmmm
Namespace:      kube-system
Node:           192.192.123.45/192.192.123.45
Start Time:     Wed, 11 Apr 2018 20:27:37 -0400
Labels:         app=elasticsearch-logging
                kubernetes.io/cluster-service=true
                pod-template-hash=4026918646
                role=frontend
                version=v5.6.2
...
...
Events:
  Type     Reason   Age                   From                     Message
  ----     ------   ----                  ----                     -------
  Normal   Pulling  20m (x144 over 16h)   kubelet, 192.192.123.45  pulling image "192.192.123.45:5000/tdc/docker.io/syncroswitch/logstash"
  Normal   Pulled   10m (x181 over 16h)   kubelet, 192.192.123.45  Container image "192.192.123.45:5000/tdc/gcr.io/google_containers/elasticsearch:v5.6.2" already present on machine
  Warning  BackOff  5m (x1188 over 16h)   kubelet, 192.192.123.45  Back-off restarting failed container
  Warning  BackOff  40s (x2024 over 16h)  kubelet, 192.192.123.45  Back-off restarting failed container
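
The repeat counts in the Events table (for example, x2024 over 16h) show how long the container has been crash-looping. As a rough sketch, the hypothetical function below pulls the reason and repeat count out of each Warning row. It assumes the event format shown above, which may differ in other Kubernetes versions.

```shell
# Hypothetical helper: reads `kubectl describe po` output on stdin and prints
# "<reason> <repeat count>" for each Warning row in the Events table.
# A row without an "(xN over ...)" suffix is counted as 1.
warning_events() {
  awk '$1 == "Warning" {
    count = 1
    if (match($0, /\(x[0-9]+ over/)) {
      # Skip the leading "(x" and drop the trailing " over"
      count = substr($0, RSTART + 2, RLENGTH - 7)
    }
    print $2, count
  }'
}

# Usage (on a host with cluster access):
#   kubectl describe po -n kube-system elasticserch-logging-846bf5db8b-9zmmm | warning_events
```

The output of the crashed container itself can usually also be retrieved with kubectl logs, using -c to select the container and --previous to read the last terminated instance.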