When provisioning an RKE2 cluster, the machines remain stuck with the status "Waiting for cluster agent". The rke2-server service is running and pods are being created, but a number of them are stuck in a Pending state due to scheduling errors.

Example: The vSphere CPI (Cloud Provider Interface) is unable to locate the virtual machine in vSphere, which leaves the node uninitialised. In the downstream cluster, the cloud controller manager pod logs show the following errors when locating the virtual machine:

search.go:186] Did not find node node1.example.com in vc=example.com and datacenter=datacentre1
nodemanager.go:160] WhichVCandDCByNodeID failed using VM name. Err: No VM found
nodemanager.go:205] shakeOutNodeIDLookup failed. Err=No VM found
node_controller.go:233] error syncing node1.example.com: failed to get instance metadata for node node1.example.com: failed to get instance ID from cloud provider: No VM found, requeuing
node_controller.go:244] "Unhandled Error" err="error syncing node1.example.com: failed to get instance metadata for node node1.example.com: failed to get instance ID from cloud provider: No VM found, requeuing"
node_controller.go:271] Update 1 nodes status took 57.912µs.

Resolution

To resolve this issue, validate and, where necessary, correct the Cloud Provider configuration for the affected cluster.

In the example above, with the vSphere Cloud Provider, check the Add-on: vSphere CPI configuration for the cluster to ensure that the correct vCenter and Data Center are configured. Also validate that VMware Tools is running successfully in the virtual machine and that its hostname is correctly configured.

Cause

The
node.cloudprovider.kubernetes.io/uninitialized taint is added to new nodes in clusters where a Cloud Provider is configured. The CPI removes this taint once it has successfully queried the cloud provider and set spec.providerID on the node. Until then, pods that do not tolerate the taint cannot be scheduled onto the node and stay Pending. If there is a problem with the CPI configuration and the query fails, the node keeps the taint and cannot complete provisioning. If this is the first node in the cluster, the cluster itself remains stuck in provisioning.

Additional Information

Environment

A Rancher-provisioned RKE2 cluster with a Cloud Provider configured
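
To confirm the condition described under Cause, it can help to list the nodes that still carry the uninitialized taint together with their spec.providerID. The following is a minimal sketch, not part of the original article. It assumes the node list has been exported as JSON (for example with kubectl get nodes -o json) and loaded into a Python dict. The function name find_uninitialized_nodes, the sample data, the node name node2.example.com and the providerID value are hypothetical and used only for illustration.

```python
# Taint key named in the Cause section of this article.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def find_uninitialized_nodes(node_list):
    """Return (node name, providerID or None) for each node that still has the taint.

    node_list is the parsed JSON of a Kubernetes NodeList object.
    """
    stuck = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        taints = spec.get("taints") or []  # the field is absent on untainted nodes
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            stuck.append((node["metadata"]["name"], spec.get("providerID")))
    return stuck


# Hypothetical sample data: node1 has not been initialised by the CPI, node2 has.
sample = {
    "items": [
        {
            "metadata": {"name": "node1.example.com"},
            "spec": {
                "taints": [
                    {"key": UNINITIALIZED_TAINT, "value": "true", "effect": "NoSchedule"}
                ]
            },
        },
        {
            "metadata": {"name": "node2.example.com"},
            "spec": {"providerID": "vsphere://example-uuid"},  # placeholder value
        },
    ]
}

for name, provider_id in find_uninitialized_nodes(sample):
    print(f"{name}: taint present, providerID={provider_id or '<unset>'}")
```

A node reported here with an unset providerID matches the situation in this article: the CPI has not yet been able to look the machine up in the cloud provider, so the Cloud Provider configuration is the next thing to check.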