Case Study
This is a troubleshooting case study about kube-apiserver that covers multiple failure modes simultaneously.
Problem 1: The liveness probe is failing for kube-apiserver
- All four main control-plane pods keep restarting.
- k get no works sometimes, then fails again, so we can't reliably debug with kubectl. Whenever you get this error, you can wait 10-15 seconds for the kubectl command to work again, but the issue recurs after a few seconds.
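Since kubectl keeps dropping, the debugging below leans on the container runtime and the kubelet logs instead. A handy pattern while the cluster is flapping (the 2-second interval is an arbitrary choice):

watch -n2 "crictl ps -a | grep -E 'kube-apiserver|etcd'"   # watch restarts as they happen
journalctl -u kubelet -f | grep -i apiserver               # follow the kubelet's view of the pod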
cluster4-controlplane ~ ➜ k get po
The connection to the server cluster4-controlplane:6443 was refused - did you specify the right host or port?
cluster4-controlplane ~ ✖ crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
2f3b05dd6df6a c2e17b8d0f4a3 2 minutes ago Exited kube-apiserver 17 d75f87fb08e88 kube-apiserver-cluster4-controlplane kube-system
8bf6904ce04b1 6331715a2ae96 2 minutes ago Exited calico-kube-controllers 16 5cec129eb419d calico-kube-controllers-5745477d4d-bv27d kube-system
e4abe2a3f8443 a389e107f4ff1 4 minutes ago Exited kube-scheduler 11 5877f565237e8 kube-scheduler-cluster4-controlplane kube-system
bfbbc3e9b78fd 8cab3d2a8bd0f 5 minutes ago Exited kube-controller-manager 11 afa9e6be0280b kube-controller-manager-cluster4-controlplane kube-system
847466cf084ea ead0a4a53df89 About an hour ago Running coredns 0 db97c8694be46 coredns-7484cd47db-zc5vv kube-system
8e79b09792248 ead0a4a53df89 About an hour ago Running coredns 0 2c2b590d6e49b coredns-7484cd47db-8zx9c kube-system
72b3eaaab1f04 c9fe3bce8a6d8 About an hour ago Running kube-flannel 0 774a4b7a7c146 canal-m9t4m kube-system
f0d517661be4c feb26d4585d68 About an hour ago Running calico-node 0 774a4b7a7c146 canal-m9t4m kube-system
686d28561813d 7dd6ea186aba0 About an hour ago Exited install-cni 0 774a4b7a7c146 canal-m9t4m kube-system
7e4c8560a189b 040f9f8aac8cd About an hour ago Running kube-proxy 0 56aa52b375a6e kube-proxy-c7gxs kube-system
81ff4a9ffb073 a9e7e6b294baf About an hour ago Running etcd 0 79c723a82c168 etcd-cluster4-controlplane kube-system
cluster4-controlplane ~ ✖ journalctl -u kubelet -f | grep -i api-server
^C
cluster4-controlplane ~ ✖ journalctl -u kubelet -f | grep -i apiserver
Sep 14 16:52:53 cluster4-controlplane kubelet[23847]: I0914 16:52:53.296371 23847 status_manager.go:890] "Failed to get status for pod" podUID="bc07aa168cb55415fdfa9ff33bcf3228" pod="kube-system/kube-apiserver-cluster4-controlplane" err="Get \"https://cluster4-controlplane:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cluster4-controlplane\": dial tcp 192.168.67.177:6443: connect: connection refused"
^C
cluster4-controlplane ~ ➜ crictl logs 2f3b05dd6df6a # no clue
cluster4-controlplane ~ ➜ sudo netstat -tulnp | grep 6443
cluster4-controlplane ~ ✖ vi /etc/kubernetes/manifests/kube-apiserver.yaml # changed only httpGet.port; left httpGet.host as-is.
cluster4-controlplane ~ ➜ systemctl restart kubelet
cluster4-controlplane ~ ➜ k get po
No resources found in default namespace.
cluster4-controlplane ~ ➜ crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
2896357489b7c 6331715a2ae96 3 minutes ago Running calico-kube-controllers 20 5cec129eb419d calico-kube-controllers-5745477d4d-bv27d kube-system
6172aa3ed1f68 a389e107f4ff1 3 minutes ago Running kube-scheduler 13 5877f565237e8 kube-scheduler-cluster4-controlplane kube-system
13c11161c8072 c2e17b8d0f4a3 3 minutes ago Running kube-apiserver 0 27c282fe47817 kube-apiserver-cluster4-controlplane kube-system
f0ba3efa9047d 8cab3d2a8bd0f 4 minutes ago Running kube-controller-manager 13 afa9e6be0280b kube-controller-manager-cluster4-controlplane kube-system
cluster4-controlplane ~ ➜
Perfect, you pulled the right log. It tells us everything without even opening the YAML.
Let's decode it together:
dial tcp 192.168.67.177:6443: connect: connection refused
👉 That means clients (the kubelet's probe, kubectl, the status manager) are dialing 192.168.67.177:6443, but nothing is listening there.
Normally, kube-apiserver binds to 0.0.0.0:6443 (all interfaces), so it answers on both 127.0.0.1 and the node IP, and the livenessProbe hits /livez on port 6443 over HTTPS.
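On a healthy control plane you can confirm the listener before reading the probe config; a quick check (the expected output is illustrative, not copied from this node):

sudo ss -tlnp | grep 6443
# on a healthy node: a LISTEN socket owned by kube-apiserver on *:6443;
# here it returns nothing, matching the empty netstat check above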
🔎 What went wrong
From your earlier snippet:
livenessProbe:
  httpGet:
    host: 192.168.67.177 ❌
    port: 6444 ❌
At first glance, two issues:
- Wrong host → surely it should be 127.0.0.1?
- Wrong port → should be 6443, not 6444.
That's why the kubelet can't confirm liveness, and the pod keeps restarting. (Spoiler: only the port actually mattered, as unpacked below.)
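For reference, a corrected probe looks roughly like this (a sketch based on the usual kubeadm layout; thresholds and delays vary by version, and host could equally be the node IP or omitted entirely, as discussed next):

livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 192.168.67.177
    path: /livez
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 15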
Ahaa 💡 you’ve caught an interesting detail here. Let me unpack it:
1. The host field in the livenessProbe
- The httpGet.host field in a Kubernetes probe is optional.
- If you don't set it, Kubernetes defaults it to the pod IP.
- The kube-apiserver static pod runs with hostNetwork: true, so its pod IP is the node's IP; the probe therefore targets the node IP.
👉 So even if you leave out host, the probe still works, because it knows the pod’s IP. That’s why changing the port fixed it, but leaving the host as-is didn’t break anything.
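So a minimal probe with host omitted (a sketch, not the full kubeadm manifest) would work just as well:

livenessProbe:
  httpGet:
    path: /livez
    port: 6443
    scheme: HTTPS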
2. Why not 127.0.0.1?
If you set host: 127.0.0.1, the kubelet probes the loopback interface. That works only if the apiserver answers on 127.0.0.1 as well; since the apiserver normally binds to 0.0.0.0 (all interfaces), both 127.0.0.1 and the pod IP succeed.
Your bad case was just the wrong port (6444), so no process was listening there → probe failed → pod crash loop.
3. Why did 192.168.67.177 still work?
Because the apiserver was listening on 0.0.0.0:6443, so it accepts connections on any interface:
- localhost (127.0.0.1)
- the node IP (192.168.67.177)
- the pod IP (same as the node IP for hostNetwork static pods)
So the probe succeeded regardless of which host you gave, as long as the port was right.
✅ Bottom line:
- The real issue was the wrong port (6444), not the host.
- host can be safely omitted; the kubelet defaults to the pod's IP.
- The apiserver binding to 0.0.0.0 means it answers on all IPs anyway.
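Once the apiserver is back up, you can verify the any-interface behaviour yourself (illustrative commands; -k skips server-certificate verification, and /livez is served to unauthenticated clients under kubeadm defaults):

curl -k https://127.0.0.1:6443/livez        # ok
curl -k https://192.168.67.177:6443/livez   # ok (same listener, different interface)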
Problem 2
httpGet.port is fixed and the probe is passing, but k get po is still not working...
cluster4-controlplane ~ ➜ k get po
The connection to the server cluster4-controlplane:6443 was refused - did you specify the right host or port?
cluster4-controlplane ~ ✖ systemctl restart kubelet
cluster4-controlplane ~ ➜ crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
aff01bbf31541 c2e17b8d0f4a3 2 seconds ago Running kube-apiserver 8 cc3b894fd021e kube-apiserver-cluster4-controlplane kube-system
0db94fa24499f c2e17b8d0f4a3 25 seconds ago Exited kube-apiserver 7 cc3b894fd021e kube-apiserver-cluster4-controlplane kube-system
f04479036c6bb 6331715a2ae96 3 minutes ago Exited calico-kube-controllers 7 2e41c5d0d2882 calico-kube-controllers-5745477d4d-4c99c kube-system
982bc8db9c0fe 8cab3d2a8bd0f 13 minutes ago Running kube-controller-manager 1 a473c2bf99d70 kube-controller-manager-c
cluster4-controlplane ~ ➜ journalctl -u kubelet -f | grep -i apiserver
Sep 14 20:14:43 cluster4-controlplane kubelet[32108]: E0914 20:14:43.014347 32108 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 20s restarting failed container=kube-apiserver pod=kube-apiserver-cluster4-controlplane_kube-system(0cef5dcf77699a18ffc053fd41fda045)\"" pod="kube-system/kube-apiserver-cluster4-controlplane" podUID="0cef5dcf77699a18ffc053fd41fda045"
cluster4-controlplane ~ ✖ journalctl -u kubelet -f
Sep 14 20:17:29 cluster4-controlplane kubelet[32108]: E0914 20:17:29.382238 32108 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://cluster4-controlplane:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/cluster4-controlplane?timeout=10s\": dial tcp 192.168.81.145:6443: connect: connection refused" interval="7s"
cluster4-controlplane ~ ➜ crictl logs 1bef3fdddfd37
W0914 20:21:08.395161 1 registry.go:256] calling componentGlobalsRegistry.AddFlags more than once, the registry will be set by the latest flags
I0914 20:21:08.395620 1 options.go:238] external host was not specified, using 192.168.81.145
I0914 20:21:08.397951 1 server.go:143] Version: v1.32.0
I0914 20:21:08.398004 1 server.go:145] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
W0914 20:21:08.834540 1 logging.go:55] [core] [Channel #2 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "error reading server preface: read tcp 127.0.0.1:54758->127.0.0.1:2379: read: connection reset by peer"
W0914 20:21:08.834585 1 logging.go:55] [core] [Channel #1 SubChannel #3]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "error reading server preface: EOF"
I0914 20:21:08.835550 1 shared_informer.go:313] Waiting for caches to sync for node_authorizer
I0914 20:21:08.846535 1 shared_informer.go:313] Waiting for caches to sync for *generic.policySource[*k8s.io/api/admissionregistration/v1.ValidatingAdmissionPolicy,*k8s.io/api/admissionregistration/v1.ValidatingAdmissionPolicyBinding,k8s.io/apiserver/pkg/admission/plugin/policy/validating.Validator]
I0914 20:21:08.852894 1 plugins.go:157] Loaded 13 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionPolicy,MutatingAdmissionWebhook.
I0914 20:21:08.852922 1 plugins.go:160] Loaded 13 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
I0914 20:21:08.853174 1 instance.go:233] Using reconciler: lease
W0914 20:21:08.854787 1 logging.go:55] [core] [Channel #7 SubChannel #8]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "error reading server preface: read tcp 127.0.0.1:54782->127.0.0.1:2379: read: connection reset by peer"
F0914 20:21:28.854022 1 instance.go:226] Error creating leases: error creating storage factory: context deadline exceeded
cluster4-controlplane ~ ✖ journalctl -u kubelet -f | grep -i etcd
^C
^C
cluster4-controlplane ~ ✖ crictl ps -a | grep -i etcd
1d912d2a953bb a9e7e6b294baf 49 minutes ago Running etcd 0 d0a27c4dee6e2 etcd-cluster4-controlplane kube-system
cluster4-controlplane ~ ➜ vi /etc/kubernetes/manifests/etcd.yaml
cluster4-controlplane ~ ➜ crictl logs 1d912d2a953bb
{"level":"warn","ts":"2025-09-14T19:41:35.014029Z","caller":"embed/config.go:689","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"info","ts":"2025-09-14T19:41:35.014148Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://192.168.81.145:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--experimental-watch-progress-notify-interval=5s","--initial-advertise-peer-urls=https://192.168.81.145:2380","--initial-cluster=cluster4-controlplane=https://192.168.81.145:2380","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://192.168.81.145:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://192.168.81.145:2380","--name=cluster4-controlplane","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
{"level":"warn","ts":"2025-09-14T19:59:58.069864Z","caller":"embed/config_logging.go:170","msg":"rejected connection on client endpoint","remote-addr":"127.0.0.1:36582","server-name":"","error":"tls: first record does not look like a TLS handshake"}
cluster4-controlplane ~ ➜ cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -i etcd-server
- --etcd-servers=http://127.0.0.1:2379 # culprit: should be https, not http
cluster4-controlplane ~ ➜ vi /etc/kubernetes/manifests/kube-apiserver.yaml # change made
cluster4-controlplane ~ ➜ systemctl restart kubelet
Perfect, this gives us a much clearer picture.
Your etcd is running fine: it elected itself leader and is serving on 127.0.0.1:2379 and 192.168.81.145:2379.
The real problem shows up when you read the kube-apiserver and etcd logs side by side:
grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379"}   (kube-apiserver)
rejected connection ... tls: first record does not look like a TLS handshake   (etcd)
👉 Together these mean the API server is talking to etcd over plain HTTP instead of HTTPS (TLS), but etcd only accepts TLS connections (with certs).
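A quick way to confirm etcd itself is healthy and TLS-only is to query it directly with etcdctl, using the server certs kubeadm places on the node (standard kubeadm paths assumed), and to reproduce the rejected plain-HTTP attempt with curl:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
# https://127.0.0.1:2379 is healthy ...

curl http://127.0.0.1:2379/health
# triggers etcd's "first record does not look like a TLS handshake" warning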
Why this happens
- In kube-apiserver.yaml (the static pod manifest), the flag pointing at etcd uses the wrong scheme:
  --etcd-servers=http://127.0.0.1:2379
  instead of:
  --etcd-servers=https://127.0.0.1:2379
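For reference, the etcd-related flags in a kubeadm-generated kube-apiserver.yaml normally look like this (kubeadm default paths; yours may differ):

- --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
- --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
- --etcd-servers=https://127.0.0.1:2379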
Problem 3
k get po is still not working...
cluster4-controlplane ~ ➜ k get po
Get "https://cluster4-controlplane:6443/api/v1/namespaces/default/pods?limit=500": dial tcp 192.168.81.145:6443: connect: connection refused - error from a previous attempt: read tcp 192.168.81.145:55452->192.168.81.145:6443: read: connection reset by peer
cluster4-controlplane ~ ✖ crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
c47aa33b10950 c2e17b8d0f4a3 4 seconds ago Running kube-apiserver 4 aba098ebe3359 kube-apiserver-cluster4-controlplane kube-system
87ce0b18b25df c2e17b8d0f4a3 About a minute ago Exited kube-apiserver 3 aba098ebe3359 kube-apiserver-cluster4-controlplane kube-system
9a5ccba132327 c2e17b8d0f4a3 5 minutes ago Exited kube-apiserver 17 cc3b894fd021e kube-apiserver-cluster4-controlplane kube-system
cluster4-controlplane ~ ✖ crictl logs c47aa33b10950
W0914 20:48:04.662949 1 registry.go:256] calling componentGlobalsRegistry.AddFlags more than once, the registry will be set by the latest flags
I0914 20:48:04.663480 1 options.go:238] external host was not specified, using 192.168.81.145
I0914 20:48:04.666544 1 server.go:143] Version: v1.32.0
I0914 20:48:04.666583 1 server.go:145] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0914 20:48:05.153525 1 shared_informer.go:313] Waiting for caches to sync for node_authorizer
W0914 20:48:05.155285 1 logging.go:55] [core] [Channel #2 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority"
cluster4-controlplane ~ ➜ vi /etc/kubernetes/manifests/kube-apiserver.yaml
cluster4-controlplane ~ ➜ cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -i etcd-cafile
- --etcd-cafile=/etc/kubernetes/pki/ca.crt
cluster4-controlplane ~ ➜ ls /etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.crt
cluster4-controlplane ~ ➜ vi /etc/kubernetes/manifests/kube-apiserver.yaml # --etcd-cafile corrected to /etc/kubernetes/pki/etcd/ca.crt
cluster4-controlplane ~ ➜ systemctl restart kubelet
We've moved one step forward.
Now kube-apiserver is indeed talking over TLS, but the error shows:
tls: failed to verify certificate: x509: certificate signed by unknown authority
👉 This means kube-apiserver is either:
- Using the wrong CA file (--etcd-cafile), or
- The CA used to sign etcd's server cert is different from what kube-apiserver trusts, or
- The cert/key pair for apiserver-etcd-client doesn't match the CA chain.
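Here it was the first one: --etcd-cafile pointed at the cluster CA (/etc/kubernetes/pki/ca.crt), but kubeadm gives etcd its own CA. openssl makes the distinction visible (standard kubeadm PKI layout assumed):

openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/server.crt
# fails: unable to get local issuer certificate
openssl verify -CAfile /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt
# /etc/kubernetes/pki/etcd/server.crt: OK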
Summary
- Problem 1: wrong httpGet.port in the kube-apiserver livenessProbe (6444 instead of 6443).
- Problem 2: --etcd-servers=http://127.0.0.1:2379 instead of --etcd-servers=https://127.0.0.1:2379.
- Problem 3: wrong --etcd-cafile (/etc/kubernetes/pki/ca.crt instead of /etc/kubernetes/pki/etcd/ca.crt).
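With all three fixes in place, a final sanity pass (the same commands used throughout) should show a stable control plane:

k get po -n kube-system                      # all control-plane pods Running
crictl ps | grep -E 'kube-apiserver|etcd'    # no recent restarts
curl -k https://127.0.0.1:6443/livez         # expect: ok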