# Kylin HA concepts and shell usage

Configuring a corosync & pacemaker cluster with the pcs shell.

# Pacemaker

Pacemaker is the Cluster Resource Manager (CRM). It manages the whole HA stack, and clients manage and monitor the cluster through pacemaker. The CRM supports two resource types, OCF and LSB: OCF-format start scripts live under /usr/lib/ocf/resource.d/, while LSB scripts generally live under /etc/rc.d/init.d/.

# Cluster property settings

~~~shell
#list the cluster properties that have been set
[root@node1 ~]# pcs property list
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: hacluster
 dc-version: 2.0.2-3.ky10.2.02.ky10-744a30d655
 have-watchdog: false
 maintenance-mode: False
 no-quorum-policy: ignore
 node-health-green: 0
 node-health-red: 0
 node-health-strategy: none
 node-health-yellow: 0
 start-failure-is-fatal: True
 stonith-enabled: False
 symmetric-cluster: True

#to enable fencing:
pcs property set stonith-enabled=true

#show all cluster properties, including defaults
[root@node1 ~]# pcs property show --all
Cluster Properties:
 batch-limit: 0
 cluster-delay: 60s
 cluster-infrastructure: corosync
 cluster-ipc-limit: 500
 cluster-name: hacluster
 cluster-recheck-interval: 15min
 concurrent-fencing: true
 dc-deadtime: 20s
 dc-version: 2.0.2-3.ky10.2.02.ky10-744a30d655
 election-timeout: 2min
 enable-acl: false
 enable-startup-probes: true
 have-watchdog: false
 join-finalization-timeout: 30min
 join-integration-timeout: 3min
 load-threshold: 80%
 maintenance-mode: False
 migration-limit: -1
 no-quorum-policy: ignore
 node-action-limit: 0
 node-health-base: 0
 node-health-green: 0
 node-health-red: 0
 node-health-strategy: none
 node-health-yellow: 0
 pe-error-series-max: -1
 pe-input-series-max: 4000
 pe-warn-series-max: 5000
 placement-strategy: default
 remove-after-stop: false
 shutdown-escalation: 20min
 start-failure-is-fatal: True
 startup-fencing: true
 stonith-action: reboot
 stonith-enabled: false
 stonith-max-attempts: 10
 stonith-timeout: 60s
 stonith-watchdog-timeout: (null)
 stop-all-resources: false
 stop-orphan-actions: true
 stop-orphan-resources: true
 symmetric-cluster: True
 transition-delay: 0s
~~~

# Shell examples

## Starting and stopping the cluster

~~~shell
#start the whole cluster
pcs cluster start --all
#stop the whole cluster
pcs cluster stop --all
#start specific cluster nodes
pcs cluster start node1 node2
#stop specific cluster nodes
pcs cluster stop node1 node2
~~~
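Note that `pcs cluster start` returns once the daemons have been asked to start, not once the cluster is actually quorate, so scripts that follow it often poll the cluster state. A minimal sketch of such a poller; the `wait_until` helper and the grep pattern are our own illustration, not part of pcs:

~~~shell
#retry helper: run a command until it succeeds, up to N attempts one second apart
wait_until() {
  local tries=$1; shift
  local i=0
  while ! "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 1
  done
}

#example (on a cluster node):
#pcs cluster start --all
#wait_until 30 sh -c 'pcs status | grep -q "partition with quorum"' && echo "cluster is up"
~~~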
## Creating a floating IP resource

~~~shell
#The IPaddr_6 resource manages a floating IP address; it must not be an address already bound to a physical node, and the floating IP must be on the same network as the statically assigned addresses.
#The network interface names must match across the cluster hosts.
#ClusterIP is the resource name.
[root@node1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr_6 ip=192.168.142.139 nic=ens33 cidr_netmask=32
[root@node1 ~]# pcs resource config ClusterIP
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr_6)
  Attributes: cidr_netmask=32 ip=192.168.142.139 nic=ens33
  Operations: monitor interval=5s start-delay=1s timeout=20s OCF_CHECK_LEVEL=10 (ClusterIP-monitor-interval-5s)
              start interval=0s timeout=90 (ClusterIP-start-interval-0s)
              stop interval=0s timeout=100 (ClusterIP-stop-interval-0s)
~~~

Page view:

![](https://img.kancloud.cn/ab/a9/aba941f8eaff60bf9b0ff5971b77021a_1607x196.png)

## Creating an nginx resource

~~~shell
#install the nginx service
yum install -y nginx
#set up a test page that identifies the node
[root@ha1 ~]# mv /usr/share/nginx/html/index.html /usr/share/nginx/html/index.html.bak
[root@ha1 ~]# echo "Hello,I'm node1.My address is 192.168.142.130" > /usr/share/nginx/html/index.html
[root@ha1 ~]# cat /usr/share/nginx/html/index.html
Hello,I'm node1.My address is 192.168.142.130
~~~

Create the resource:

~~~shell
[root@node1 ~]# pcs resource create WebSite ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf
~~~

Page view:

![](https://img.kancloud.cn/5b/c7/5bc70f13532f63811d74de4d6830db42_1458x210.png)

Clean up the monitor operations (the nginx resource carries more monitor operations than we need by default, so we remove the unnecessary ones):

![](https://img.kancloud.cn/c6/49/c6494e92c37217dd0adc8a1e44c61f40_880x550.png)

## Resource group configuration

~~~shell
#Putting resources in a group ensures they run on the same node in a working multi-node cluster.
#create a resource group and add the cluster IP and web resources to it
[root@node1 ~]# pcs resource group add cluster-nginx ClusterIP WebSite
#view the resource group configuration
[root@node1 ~]# pcs resource config cluster-nginx
 Group: cluster-nginx
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr_6)
   Attributes: cidr_netmask=32 ip=192.168.142.139 nic=ens33
   Operations: monitor interval=5s start-delay=1s timeout=20s OCF_CHECK_LEVEL=10 (ClusterIP-monitor-interval-5s)
               start interval=0s timeout=90 (ClusterIP-start-interval-0s)
               stop interval=0s timeout=100 (ClusterIP-stop-interval-0s)
  Resource: WebSite (class=ocf provider=heartbeat type=nginx)
   Attributes: configfile=/etc/nginx/nginx.conf
   Operations: reload interval=0s timeout=40s (WebSite-reload-interval-0s)
               start interval=0s timeout=40s (WebSite-start-interval-0s)
               stop interval=0s timeout=60s (WebSite-stop-interval-0s)
~~~
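A group keeps its members on the same node and starts them in order. The same behavior can also be expressed with explicit constraints, which allow finer control (scores below INFINITY, ordering without co-location). A sketch using the ClusterIP and WebSite resources created above; verify the exact syntax with `pcs constraint -h` on your pcs version:

~~~shell
#mandatory co-location: WebSite must run where ClusterIP runs
pcs constraint colocation add WebSite with ClusterIP INFINITY
#ordering: start ClusterIP before starting WebSite
pcs constraint order start ClusterIP then start WebSite
#review the configured constraints
pcs constraint
~~~

Unlike a group, a co-location score lower than INFINITY makes keeping the resources together a preference rather than a requirement.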
Page view:

![](https://img.kancloud.cn/b1/f7/b1f7307af46dc19dc3897d5112813cbe_1595x345.png)

## Failover test

~~~shell
#Put the node currently running the service into standby mode. Note that because fencing is disabled we cannot faithfully simulate a node-level failure (such as pulling the power cable); the cluster needs fencing to recover from that class of failure.
[root@node1 ~]# pcs node standby node1
#check the cluster status
[root@node1 ~]# pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 2.0.2-3.ky10.2.02.ky10-744a30d655) - partition with quorum
Last updated: Thu Aug 26 17:22:58 2021
Last change: Thu Aug 26 17:21:44 2021 by root via cibadmin on node1

2 nodes configured
2 resources configured

Node node1: standby
Online: [ node2 ]

Full list of resources:

 Resource Group: cluster-nginx
     ClusterIP  (ocf::heartbeat:IPaddr_6):      Started node2
     WebSite    (ocf::heartbeat:nginx): Started node2 (Monitoring)

#Visit the website. The service should still be available, and the page should show which node it is running on.
~~~

Page view:

![](https://img.kancloud.cn/8d/f9/8df9787f45300fc1bb196d6b9824644c_1507x355.png)

![](https://img.kancloud.cn/e5/a0/e5a045ea12a6a87e2737b6e4d12615bf_685x116.png)

Recovering the node:

~~~shell
#To allow cluster services back onto the first node, take it out of standby mode. This does not necessarily move the service back to that node.
[root@node2 ~]# pcs node unstandby node1
[root@node2 ~]# pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 2.0.2-3.ky10.2.02.ky10-744a30d655) - partition with quorum
Last updated: Thu Aug 26 17:24:47 2021
Last change: Thu Aug 26 17:24:41 2021 by root via cibadmin on node2

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Resource Group: cluster-nginx
     ClusterIP  (ocf::heartbeat:IPaddr_6):      Started node2
     WebSite    (ocf::heartbeat:nginx): Started node2 (Monitoring)
~~~

## PCS command summary

~~~shell
#show the parameters of the pcs resource command (only part of the output is shown)
pcs resource -h
#save the raw cluster configuration (the CIB XML) to a file named testfile
pcs cluster cib testfile
#show the status of the cluster and its resources
pcs status
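#additional detail (hedged addition: confirm flag support with pcs status -h on your pcs version)
pcs status --full #extended status, including node attributes and failed resource actions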
#The command argument of pcs status shows the status of a specific cluster component: resources, cluster, nodes or pcsd.
pcs status resources #show the status of cluster resources
pcs cluster status #show the cluster status without the cluster resources
pcs status nodes #show the status of the cluster nodes
pcs status pcsd #show pcsd status on the cluster nodes
pcs config #show the full cluster configuration
pcs cluster corosync #print the contents of corosync.conf in human-readable form
pcs resource cleanup #clean up resource failure records and alerts
~~~

# Other

Related resource files:

(1) /usr/lib/ocf/resource.d — location of the pacemaker resource agent scripts; install the resource-agents package for more OCF-format agents.

(2) /usr/sbin/fence\_\*\*\* — naming pattern of the fencing device scripts; install the fence-agents package for more fencing device agents.

Viewing the documentation:

~~~shell
[shell]# man ocf_heartbeat_***   ##OCF agent documentation, e.g. man ocf_heartbeat_apache
[shell]# man fence_***           ##fencing device documentation, e.g. man fence_vmware
~~~
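Following on from (1) above, a quick way to see which OCF agents are actually installed is to walk the provider directories under /usr/lib/ocf/resource.d. A small sketch; `list_ocf_agents` is our own helper name, not a pcs command:

~~~shell
#list installed OCF resource agents, grouped by provider
list_ocf_agents() {
  local root=${1:-/usr/lib/ocf/resource.d}
  [ -d "$root" ] || { echo "no OCF agents found under $root"; return 1; }
  local provider
  for provider in "$root"/*/; do
    [ -d "$provider" ] || continue
    echo "provider: $(basename "$provider")"
    ls "$provider" | sed 's/^/  /'
  done
}

list_ocf_agents || true   #on a cluster node this prints providers such as heartbeat
~~~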