<p>Topology diagram:</p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184518-3.jpg" align="" title="1465967173689520.jpg" /></p> <p>Each server has four NICs:</p> <p>Two 10GbE NICs are bonded and connected to a Netgear switch. The switch ports are access ports in VLAN 30, which maps to the 10.199.16.0/22 subnet; its gateway, 10.199.16.1, is configured on the Netgear.</p> <p>Two 1GbE NICs are bonded and connected to a Cisco 3750 switch. The switch ports are trunk ports carrying VLANs 30, 40 and 1001-1300, which map to 10.199.16.0/22, 10.176.4.0/22 and the KVM guests' internal networks; the gateway, 10.176.4.1, is configured on the Cisco 3750.</p> <p>Port-channels are configured on both the Netgear and the Cisco 3750.</p> <p>Server configuration:</p> <p>1. iSCSI multipath configuration</p> <p>defaults {</p> <p>udev_dir /dev</p> <p>polling_interval 10</p> <p>path_selector "round-robin 0"</p> <p># path_grouping_policy multibus</p> <p>path_grouping_policy failover</p> <p>getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"</p> <p>prio alua</p> <p>path_checker readsector0</p> <p>rr_min_io 100</p> <p>max_fds 8192</p> <p>rr_weight priorities</p> <p>failback immediate</p> <p>no_path_retry fail</p> <p>user_friendly_names yes</p> <p>}</p> <p>multipaths {</p> <p>multipath {</p> <p>wwid 36000d31003157200000000000000000a</p> <p>alias primary1</p> <p>}</p> <p>multipath {</p> <p>wwid 36000d310031572000000000000000003</p> <p>alias primary2</p> <p>}</p> <p>multipath {</p> <p>wwid 36000d31003157200000000000000000b</p> <p>alias primary3</p> <p>}</p> <p>multipath {</p> <p>wwid 36000d31003157200000000000000001b</p> <p>alias qdisk</p> <p>}</p> <p>}</p> <p>2. NIC configuration</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0</p> <p>DEVICE=eth0</p> <p>TYPE=Ethernet</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>MASTER=bond0</p> <p>SLAVE=yes</p> <p>USERCTL=no</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1</p> <p>DEVICE=eth1</p> <p>TYPE=Ethernet</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>MASTER=bond0</p> <p>SLAVE=yes</p> <p>USERCTL=no</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0</p> <p>DEVICE=bond0</p> <p>TYPE=Bond</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>BRIDGE=cloudbr0</p> <p>[root@hmkvm01 ~]# cat
/etc/sysconfig/network-scripts/ifcfg-eth4</p> <p>DEVICE=eth4</p> <p>TYPE=Ethernet</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>MASTER=bond1</p> <p>SLAVE=yes</p> <p>USERCTL=no</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth5</p> <p>DEVICE=eth5</p> <p>TYPE=Ethernet</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>MASTER=bond0</p> <p>SLAVE=yes</p> <p>USERCTL=no</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1</p> <p>DEVICE=bond1</p> <p>TYPE=Bond</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>NAME=bond1</p> <p>BRIDGE=cloudbr1</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr0</p> <p>DEVICE=cloudbr0</p> <p>TYPE=Bridge</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=none</p> <p>[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr1</p> <p>DEVICE=cloudbr1</p> <p>TYPE=Bridge</p> <p>ONBOOT=yes</p> <p>BOOTPROTO=static</p> <p>IPADDR=10.199.16.101</p> <p>NETMASK=255.255.252.0</p> <p>GATEWAY=10.199.16.1</p> <p>DNS1=114.114.114.114</p> <p>[root@hmkvm01 ~]# tail -f -n 5 /etc/modprobe.d/dist.conf</p> <p>alias char-major-89-* i2c-dev</p> <p>alias bond0 bonding</p> <p>options bond0 mode=0 miimon=100</p> <p>alias bond1 bonding</p> <p>options bond1 mode=0 miimon=100</p> <p>Symptoms:</p> <p>1. One server becomes sluggish. Pinging the KVM guests from the office network shows packet loss, while pinging the gateway shows none.</p> <p>2. After the RHCS cluster is created, its first node is fine, but additional machines cannot join the cluster: luci shows a red error, and cman and clvmd will not run.</p> <p>Moreover, as soon as the cman service is started manually, that node falls into an endless reboot loop.</p> <p>3. Changing Expected votes in luci has no effect. If I set it to 1 by editing the configuration file by hand, the failed node still cannot join the cluster, and Expected votes changes again.</p> <p>With the network mode set to UDP Multicast, the multicast address is a 239.x.x.x address; it can be pinged from hmkvm01 but not from the other nodes. Specifying Multicast addresses manually has no effect.</p> <p>[root@hmkvm01 ~]# cman_tool status</p> <p>Version: 6.2.0</p> <p>Config Version: 28</p> <p>Cluster Name: hmcloud</p> <p>Cluster Id: 50417</p> <p>Cluster Member: Yes</p> <p>Cluster Generation: 992</p> <p>Membership state: Cluster-Member</p> <p>Nodes: 3</p> <p>Expected votes: 7</p> <p>Quorum device votes: 3</p> <p>Total votes: 6</p> <p>Node votes: 1</p> <p>Quorum: 4 </p> <p>Active
subsystems: 11</p> <p>Flags: </p> <p>Ports Bound: 0 11 177 178 </p> <p>Node name: hmkvm01</p> <p>Node ID: 1</p> <p>Multicast addresses: 255.255.255.255 </p> <p>Node addresses: 10.199.16.101 </p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184518-33.png" align="" title="1465967318133441.png" /></p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184519-66.png" align="" title="1465967532725848.png" /></p> <p>4. Starting the cman service hangs at "Waiting for quorum…" and then fails with "Timed-out waiting for cluster". After I changed the mode under Network to UDP Broadcast (or added cman broadcast="yes" to the configuration file), set Post Join Delay to 600,</p> <p>manually set Expected votes to 1 in the configuration file, and rebooted all servers, all three servers reported a normal status. The configuration file then looked like this:</p> <p>[root@hmkvm01 ~]# cat /etc/cluster/cluster.conf </p> <p><?xml version="1.0"?></p> <p><cluster config_version="28" name="hmcloud"></p> <p><span class="Apple-tab-span"> </span><clusternodes></p> <p><span class="Apple-tab-span"> </span><clusternode name="hmkvm01" nodeid="1"></p> <p><span class="Apple-tab-span"> </span><fence></p> <p><span class="Apple-tab-span"> </span><method name="hmkvm01"></p> <p><span class="Apple-tab-span"> </span><device name="hmkvm01"/></p> <p><span class="Apple-tab-span"> </span></method></p> <p><span class="Apple-tab-span"> </span></fence></p> <p><span class="Apple-tab-span"> </span></clusternode></p> <p><span class="Apple-tab-span"> </span><clusternode name="hmkvm02" nodeid="2"></p> <p><span class="Apple-tab-span"> </span><fence></p> <p><span class="Apple-tab-span"> </span><method name="hmkvm02"></p> <p><span class="Apple-tab-span"> </span><device name="hmkvm02"/></p> <p><span class="Apple-tab-span"> </span></method></p> <p><span class="Apple-tab-span"> </span></fence></p> <p><span class="Apple-tab-span"> </span></clusternode></p> <p><span class="Apple-tab-span"> </span><clusternode name="hmkvm04" nodeid="3"></p> <p><span class="Apple-tab-span"> </span><fence></p> <p><span class="Apple-tab-span"> </span><method name="hmkvm04"/></p> <p><span class="Apple-tab-span"> </span></fence></p> <p><span
class="Apple-tab-span"> </span></clusternode></p> <p><span class="Apple-tab-span"> </span><clusternode name="pcs1" nodeid="4"/></p> <p><span class="Apple-tab-span"> </span></clusternodes></p> <p><span class="Apple-tab-span"> </span><cman broadcast="yes" expected_votes="7"/></p> <p><span class="Apple-tab-span"> </span><fence_daemon post_join_delay="600"/></p> <p><span class="Apple-tab-span"> </span><fencedevices></p> <p><span class="Apple-tab-span"> </span><fencedevice agent="fence_idrac" ipaddr="10.199.2.224" login="root" name="hmkvm01" passwd="HMIDC#88878978"/></p> <p><span class="Apple-tab-span"> </span><fencedevice agent="fence_idrac" ipaddr="10.199.2.225" login="root" name="hmkvm02" passwd="HMIDC#88878978"/></p> <p><span class="Apple-tab-span"> </span><fencedevice agent="fence_idrac" ipaddr="10.199.2.227" login="root" name="hmkvm04" passwd="HMIDC#88878978"/></p> <p><span class="Apple-tab-span"> </span></fencedevices></p> <p><span class="Apple-tab-span"> </span><quorumd label="qdisk" min_score="1"></p> <p><span class="Apple-tab-span"> </span><heuristic interval="10" program="ping -c3 -t2 10.199.16.1" tko="10"/></p> <p><span class="Apple-tab-span"> </span></quorumd></p> <p><span class="Apple-tab-span"> </span><logging debug="on"/></p> <p></cluster></p> <p>5. Once the cluster is healthy, if I run echo c>/proc/sysrq-trigger on one node, then after that node reboots I have to repeat everything in item 4 before it can rejoin the cluster.</p> <p>6. The quorum disk (qdisk) is visible on every machine. The qdisk configuration is as follows:</p> <p>[root@hmkvm01 ~]# mkqdisk -L</p> <p>mkqdisk v3.0.12.1</p> <p>/dev/block/253:5:</p> <p>/dev/disk/by-id/dm-name-qdisk:</p> <p>/dev/disk/by-id/dm-uuid-mpath-36000d31003157200000000000000001b:</p> <p>/dev/dm-5:</p> <p>/dev/mapper/qdisk:</p> <p>Magic: eb7a62c2</p> <p>Label: qdisk</p> <p>Created: Mon Jun 13 16:23:05 2016</p> <p>Host: hmkvm01</p> <p>Kernel Sector Size: 512</p> <p>Recorded Sector Size: 512</p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184519-58.png" align="" title="1465967519805093.png" /></p> <p>6. The fence devices work:</p> <p>[root@hmkvm01 ~]# fence_idrac -a 10.199.2.227 -l root
-p ****** -o status</p> <p>Status: ON</p> <p>7. Nothing unusual shows up in the logs.</p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184519-19.png" border="0" title="1465967502982541.png" /></p> <p>8. Restarting the network interfaces, or interrupting the network for a few seconds, makes the current node reboot.</p> <p>My questions:</p> <p>1. Does anything in my NIC bonding need to change?</p> <p>2. Is there a problem with the multipath configuration?</p> <p>3. Is my cluster misconfigured?</p> <p>4. Should every node be able to ping the Multicast addresses?</p> <p>5. How are the IP addresses in the red boxes under Network related to each other?</p> <p>6. A tcpdump capture shows no node communicating with the Multicast addresses. Is that normal?</p> <p>7. Where is the delay before the reboot in symptom 8 configured?</p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184519-47.png" align="" title="1465967342608207.png" /></p> <p><img src="//cto.wang/usr/uploads/2016/07/20160703184519-61.png" align="" title="1465967383475161.png" /></p> Last modified: December 10, 2021, 10:53 AM © Reposting permitted with proper attribution.
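<p>For reference, the vote figures in the cman_tool status output (Expected votes: 7, Quorum device votes: 3, Quorum: 4) can be reproduced from the cluster.conf in item 4 with a little arithmetic. The Python sketch below is only an illustration, not cman's implementation, and rests on three assumptions: a clusternode without a votes attribute carries 1 vote, qdiskd advertises (number of configured nodes - 1) votes when the quorumd element has no votes attribute, and quorum is a simple majority, expected_votes // 2 + 1. The helper names vote_summary and is_quorate are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from item 4. Fencing, passwords and the
# heuristic are omitted because they do not affect the vote count.
CLUSTER_CONF = """
<cluster config_version="28" name="hmcloud">
  <clusternodes>
    <clusternode name="hmkvm01" nodeid="1"/>
    <clusternode name="hmkvm02" nodeid="2"/>
    <clusternode name="hmkvm04" nodeid="3"/>
    <clusternode name="pcs1" nodeid="4"/>
  </clusternodes>
  <cman broadcast="yes" expected_votes="7"/>
  <quorumd label="qdisk" min_score="1"/>
</cluster>
"""


def vote_summary(conf_xml: str) -> dict:
    """Derive expected votes and quorum from a cluster.conf string (sketch)."""
    root = ET.fromstring(conf_xml.strip())
    nodes = root.findall("./clusternodes/clusternode")
    # Assumption: a node without a "votes" attribute contributes 1 vote.
    node_votes = sum(int(n.get("votes", "1")) for n in nodes)
    qdisk = root.find("quorumd")
    if qdisk is None:
        qdisk_votes = 0
    else:
        # Assumption: default qdisk votes = number of configured nodes - 1.
        qdisk_votes = int(qdisk.get("votes", str(len(nodes) - 1)))
    expected = node_votes + qdisk_votes
    # Simple majority of the expected votes.
    quorum = expected // 2 + 1
    return {
        "nodes": len(nodes),
        "node_votes": node_votes,
        "qdisk_votes": qdisk_votes,
        "expected_votes": expected,
        "quorum": quorum,
    }


def is_quorate(total_votes: int, quorum: int) -> bool:
    """A partition is quorate when its votes reach the quorum threshold."""
    return total_votes >= quorum


if __name__ == "__main__":
    summary = vote_summary(CLUSTER_CONF)
    print(summary)
    # Three running nodes (1 vote each) plus the qdisk votes.
    print(is_quorate(3 + summary["qdisk_votes"], summary["quorum"]))
```

<p>Under these assumptions the sketch reports expected_votes 7 and quorum 4, matching the status output: the four clusternode entries (pcs1 included, although cman_tool reports Nodes: 3) plus 3 qdisk votes give 7, and the three running nodes plus the qdisk (6 votes) clear the quorum of 4.</p>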
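<p>Question 1 asks whether the bonding needs changes. One quick consistency check is to group the slave interfaces by their MASTER= value, using the ifcfg contents exactly as posted in item 2. The Python sketch below does only that; parse_ifcfg and slaves_by_master are hypothetical helpers, and the parser is deliberately minimal (KEY=value lines, optional double quotes, # comments), not a full implementation of the initscripts file format.</p>

```python
from collections import defaultdict

# ifcfg fragments copied from item 2 (only the keys relevant to bonding).
IFCFG = {
    "eth0": "DEVICE=eth0\nMASTER=bond0\nSLAVE=yes",
    "eth1": "DEVICE=eth1\nMASTER=bond0\nSLAVE=yes",
    "eth4": "DEVICE=eth4\nMASTER=bond1\nSLAVE=yes",
    "eth5": "DEVICE=eth5\nMASTER=bond0\nSLAVE=yes",
}


def parse_ifcfg(text: str) -> dict:
    """Parse KEY=value lines; skips blanks and # comments, strips double quotes."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip().strip('"')
    return result


def slaves_by_master(files: dict) -> dict:
    """Map each bond (MASTER=) to the sorted list of its SLAVE=yes interfaces."""
    bonds = defaultdict(list)
    for name, text in files.items():
        cfg = parse_ifcfg(text)
        if cfg.get("SLAVE", "").lower() == "yes" and "MASTER" in cfg:
            bonds[cfg["MASTER"]].append(cfg.get("DEVICE", name))
    return {master: sorted(devs) for master, devs in bonds.items()}


if __name__ == "__main__":
    for master, devs in sorted(slaves_by_master(IFCFG).items()):
        print(master, devs)
```

<p>Against the files as posted, it lists eth0, eth1 and eth5 under bond0 and only eth4 under bond1, because ifcfg-eth5 contains MASTER=bond0 while ifcfg-eth4 contains MASTER=bond1.</p>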