Troubleshooting Istio CNI Plugin

TOC

Log

The Istio CNI plugin log provides information about how the plugin configures application pod traffic redirection based on PodSpec.

The plugin runs in the container runtime process space, so you can see CNI log entries in the kubelet log. To make debugging easier, the CNI plugin also sends its log to the istio-cni-node DaemonSet.

The default log level for the CNI plugin is info. To get more detailed log output, you can change the level by editing the cni.values.logLevel in ServiceMesh component config for istioCNI.

The Istio CNI DaemonSet pod log also provides information about CNI plugin installation, and race condition repairing.

DaemonSet readiness

Readiness of the CNI DaemonSet indicates that the Istio CNI plugin is properly installed and configured. If Istio CNI DaemonSet is unready, it suggests something is wrong. Look at the istio-cni-node DaemonSet logs to diagnose.

Race condition repair

By default, the Istio CNI DaemonSet has race condition mitigation enabled, which will evict a pod that was started before the CNI plugin was ready. To understand which pods were evicted, look for log lines like the following:

2021-07-21T08:32:17.362512Z     info   Deleting broken pod: service-graph00/svc00-0v1-95b5885bf-zhbzm

Diagnose pod start-up failure

A common issue with the CNI plugin is that a pod fails to start due to container network set-up failure. Typically the failure reason is written to the pod events, and is visible via pod description:

$ kubectl describe pod POD_NAME -n POD_NAMESPACE

If a pod keeps getting init error, check the init container istio-validation log for “connection refused” errors like the following:

$ kubectl logs POD_NAME -n POD_NAMESPACE -c istio-validation
...
2021-07-20T05:30:17.111930Z     error   Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
2021-07-20T05:30:18.112503Z     error   Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
...
2021-07-20T05:30:22.111676Z     error   validation timeout

The istio-validation init container sets up a local dummy server which listens on traffic redirection target inbound/outbound ports, and checks whether test traffic can be redirected to the dummy server. When pod traffic redirection is not set up correctly by the CNI plugin, the istio-validation init container blocks pod startup, to prevent traffic bypass. To see if there were any errors or unexpected network setup behaviors, search the istio-cni-node for the pod ID.

Another symptom of a malfunctioned CNI plugin is that the application pod is continuously evicted at start-up time. This is typically because the plugin is not properly installed, thus pod traffic redirection cannot be set up. CNI race repair logic considers the pod is broken due to the race condition and evicts the pod continuously. When running into this issue, check the CNI DaemonSet log for information on why the plugin could not be properly installed.

When using Multus to chain multiple CNI providers, for example Kube-OVN, it may cause the Istio CNI plugin to install configuration multiple times (in both Multus and Kube-OVN), result in a circular CNI chain that unable to configure network for the pod. To mitigate this issue, you can specify cni.values.cniConfFileName explicitly to Multus configuration file name. And manually review all CNI configurations to make sure there is no duplicate configuration.