[TOC]

### **Symptom**

After installing a single-node Kubernetes cluster, one of the Pods failed to start:

```
$ kubectl get pod -o wide -n kube-system | grep calico-kube-controller
calico-kube-controllers-78b75d47c-m4m2c   0/1   CrashLoopBackOff   4   2m1s   172.26.214.172   10.224.0.17   <none>   <none>
```

The Pod's logs show that it cannot connect to kube-apiserver. Further checks show that pinging the Pod's IP from the host also fails. Next, enter the Pod's network namespace to try pinging outward:

```
$ docker ps | grep calico-kube-con
e5cf8a6300e3   10.224.0.17:5000/library/pause:3.1   "/pause"   3 minutes ago   Up 3 minutes   k8s_POD_calico-kube-controllers-78b75d47c-m4m2c_kube-system_d2a0f315-4137-4d52-ba9f-a97f94293760_0
$ docker inspect e5cf8a6300e3 | grep Pid
            "Pid": 197434,
            "PidMode": "",
            "PidsLimit": null,
$ nsenter -t 197434 -n
```

Once inside the Pod's network namespace, confirm its address and routes:

```
$ ip addr show eth0
4: eth0@if119: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8980 qdisc noqueue state UP group default
    link/ether 6e:bd:bb:bd:c7:39 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.214.172/32 scope global eth0
       valid_lft forever preferred_lft forever
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
```

Then ping an external IP from inside the Pod's network namespace and, on the host, capture the traffic on the Pod's host-side interface:

```
$ route -n | grep 172.26.214.172
172.26.214.172  0.0.0.0         255.255.255.255 UH    0      0        0 cali3019089d1df
$ sudo tcpdump -vvvnn -i cali3019089d1df
tcpdump: listening on cali3019089d1df, link-type EN10MB (Ethernet), capture size 262144 bytes
11:46:32.297030 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 172.26.214.172, length 28
11:46:33.299031 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 172.26.214.172, length 28
11:46:47.158155 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 172.26.214.172, length 28
11:46:48.159032 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 172.26.214.172, length 28
11:46:49.161030 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 172.26.214.172, length 28
```

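To make the "requests but no replies" pattern in a capture like this explicit, the tcpdump text can be tallied with a small helper. This is only a sketch: `count_arp` is a hypothetical name, not part of any tool, and it assumes tcpdump's text format for ARP, in which requests contain `Request who-has` and replies contain `Reply ... is-at`.

```shell
#!/bin/sh
# count_arp (hypothetical helper): tally ARP requests and replies
# in tcpdump text output read from stdin.
count_arp() {
  awk '
    /ARP,.*Request who-has/ { req++ }   # e.g. "Request who-has 169.254.1.1 tell ..."
    /ARP,.*Reply .* is-at/  { rep++ }   # e.g. "Reply 169.254.1.1 is-at <mac>"
    END { printf "requests=%d replies=%d\n", req, rep }
  '
}
```

It could be used as `sudo tcpdump -vvvnn -c 10 -i cali3019089d1df arp | count_arp`. Fed the five captured packet lines from this post, it prints `requests=5 replies=0`.
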
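The capture shows that the host never replies to the ARP requests, even though proxy_arp is enabled on the calixxxx interface:

```
$ cat /proc/sys/net/ipv4/conf/cali3019089d1df/proxy_arp
1
```

### **Solution**

According to [this article](https://www.cxyzjd.com/article/weixin_36431018/112662116), the host answers as an ARP proxy only when all of the following hold:

(1) proxy_arp is set to 1 on the proxying interface

(2) the host has a route that tells it how to reach the IP being queried in the ARP request (here 169.254.1.1)

(3) the outgoing interface of that route is not the proxying interface itself

In this environment the host has no route to 169.254.1.1, and no interface on the host owns that IP. The second condition is therefore not met, so the host does not reply to the ARP requests.

There are several ways to fix this:

(1) Add a default route, or a route for 169.254.1.1/32, on the host; the route's dev just must not be a cali interface

(2) Bind the IP 169.254.1.1/32 to the lo interface

### **References**

https://github.com/projectcalico/calico/issues/4186

https://github.com/projectcalico/calico/issues/3270

https://www.cxyzjd.com/article/weixin_36431018/112662116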
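
The two fixes listed under Solution can be sketched as shell functions. This is a hedged sketch rather than a tested procedure for any particular cluster: the function names and the `APPLY` switch are made up for illustration, and the `eth0` used in the usage note is an assumed name for a non-cali host interface. By default the helpers only print the `ip` command they would run.

```shell
#!/bin/sh
# Hypothetical helpers; they print the command unless APPLY=1 is set.
LINK_LOCAL_GW="169.254.1.1"   # the gateway address seen in the Pod's routing table

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"            # really execute (requires root)
  else
    echo "+ $*"     # dry run: show the command only
  fi
}

# Fix 1: add a host route to 169.254.1.1/32 through a non-cali interface.
fix_with_route() {
  case "$1" in
    ""|cali*) echo "need a non-cali interface name" >&2; return 1 ;;
  esac
  run ip route add "${LINK_LOCAL_GW}/32" dev "$1"
}

# Fix 2: bind 169.254.1.1/32 to the loopback interface.
fix_with_lo() {
  run ip addr add "${LINK_LOCAL_GW}/32" dev lo
}
```

For example, `fix_with_route eth0` prints `+ ip route add 169.254.1.1/32 dev eth0`, and `APPLY=1 fix_with_lo` run as root actually adds the address. Either change made this way does not survive a reboot unless it is also added to the host's network configuration.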