This article is reposted with permission from the dbaplus community.

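I. Background

To accommodate growth in the business model and longer data-retention requirements, a provincial telecom operator recently planned to expand its existing GBase 8a cluster from 3 coordinator + 21 data nodes to 3 coordinator + 61 data nodes.

The cluster currently runs GBase8a_MPP_Cluster-NoLicense-8.6.2_build33-R12-redhat7.3-x86_64. The 40 new nodes serve as data nodes only. I took part in the project and was responsible for the expansion itself and the subsequent data redistribution.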

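II. Environment

1. Hardware configuration

CPU: 4 x 8C (4 physical CPUs, each with 8 logical CPUs)

Memory: MemTotal 512 GB

2. Software version

GBase 8a cluster version: GBase8a_MPP_Cluster-NoLicense-8.6.2_build33-R12-redhat7.3-x86_64

3. Node plan for the expansion

To keep the set of IP addresses used by application connections unchanged, the 3 coordinator (management) nodes stay as they are after the expansion, and all 40 new nodes are data nodes. The planned hostnames are gbase25 through gbase64.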

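III. Pre-implementation preparation

1. Network requirements

On site, the 24 servers of the original cluster and the 40 new servers all use a 10-gigabit internal network and a gigabit external network, with dual bonded NICs. Network tests met the expansion requirements.

2. Storage requirements

To keep the expansion safe, every server needs enough free space to hold the temporary data produced by redistribution. Each existing node had 13 TB free under /opt and 439 GB free on the root file system; each new node had 22 TB free under /opt and 149 GB free on the root file system. Both met the expansion requirements.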
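A minimal sketch of how such a free-space check could be scripted. The function names and the 10 TB threshold in the commented example are illustrative assumptions, not values from the original procedure; the parser relies only on the standard `df -Pk` output layout.

```shell
# Hypothetical helper: read `df -Pk <path>` output on stdin and print the
# available space in whole GB (1 GB = 1024 * 1024 KB here).
free_gb_from_df() {
    awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical helper: succeed when the free space (GB) meets the requirement.
# usage: has_enough_space <free-gb> <required-gb>
has_enough_space() {
    free_gb=$1
    required_gb=$2
    [ "$free_gb" -ge "$required_gb" ]
}

# Example: check /opt on the local machine against an assumed 10 TB requirement.
# free=$(df -Pk /opt | free_gb_from_df)
# has_enough_space "$free" 10240 && echo "/opt ok" || echo "/opt too small"
```

The same two functions could be run on every node through whatever remote-execution tool is in use; the threshold should come from the size of the largest tables to be redistributed.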

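The check found two servers (IP addresses 190 and 193) whose disk write speed was clearly abnormal. The host team traced this to failed RAID controller batteries; after the repair, disk read/write speed returned to normal.

3. Server requirements

All MPP cluster nodes must run the same operating system version. Before the expansion, the new nodes were reinstalled with the same operating system as the existing nodes, RHEL 7.3, which met the expansion requirements.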

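IV. Expansion

1. Set up SSH trust for the root and gbase users on the new nodes

-- as root
scp -r ~/.ssh 192.168.200.12:~/
-- as gbase
scp -r ~/.ssh 192.168.200.12:~/

2. Configure the C3 tools (used to run commands on all GBase nodes at once)

-- as root
vi /etc/c3.conf    # add: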
cluster new {
     192.168.200.11:redhat1
     192.168.200.12
}
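With 40 new nodes, typing the cluster block by hand is error-prone. The sketch below prints a block in the same shape as the one above. The function name is hypothetical, and it assumes the first entry is the head node, as in the original example.

```shell
# Hypothetical helper: print a c3.conf cluster block.
# usage: gen_c3_cluster <cluster-name> <head-node-entry> <node>...
gen_c3_cluster() {
    name=$1
    head=$2
    shift 2
    printf 'cluster %s {\n' "$name"
    printf '     %s\n' "$head"
    for node in "$@"; do
        printf '     %s\n' "$node"
    done
    printf '}\n'
}

# Example (review the output before appending it to /etc/c3.conf):
# gen_c3_cluster new 192.168.200.11:redhat1 192.168.200.12
```

Generating the block from a host list keeps the file consistent with /etc/hosts, which is edited by hand in a later step.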

3. Check and configure the environment of the new nodes with C3

-- operating system
cexec new_data: new: 'cat /etc/redhat-release'
-- kernel
cexec new_data: new: 'uname -a'
-- firewall
cexec new_data: new: 'service iptables status'
cexec new_data: new: 'service ip6tables status'
cexec new_data: new: 'chkconfig | grep iptables'
cexec new_data: new: 'chkconfig | grep ip6tables'
-- SELinux
cexec new_data: new: 'sestatus'
cexec new_data: new: 'grep ^SELINUX= /etc/selinux/config'
-- memory limits
cexec new_data: new: 'ulimit -H'
cexec new_data: new: 'ulimit -S'
cexec new_data: new: 'ulimit -m'
-- vi /etc/security/limits.conf and add:
-- * soft as unlimited
-- * hard as unlimited
-- * rss as unlimited
-- transparent huge pages
-- (on RHEL 7 the path is /sys/kernel/mm/transparent_hugepage/enabled)
cexec new_data: new: 'cat /sys/kernel/mm/redhat_transparent_hugepage/enabled'
-- echo "never" > /sys/kernel/mm/redhat_transparent_hugepage/enabled
-- hostname check
cexec new_data: new: 'hostname'
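Reading dozens of per-node outputs by eye makes it easy to miss the one node that differs. The sketch below flags such nodes. It assumes the results have first been collected as `<host> <value>` lines, which is an illustrative format and not the native output of `cexec`; the function name and host names are hypothetical.

```shell
# Hypothetical helper: read "<host> <value>" lines on stdin and print the
# hosts whose value differs from the expected one.
# usage: find_mismatches <expected-value>
find_mismatches() {
    awk -v want="$1" '{
        host = $1
        $1 = ""                 # drop the host column, keep the rest
        sub(/^ +/, "")          # trim the separator left behind
        # appending "" forces a string comparison, so "7.30" != "7.3"
        if (($0 "") != (want "")) print host
    }'
}

# Example: collect kernel releases over ssh and list the nodes that differ
# from the local one (host names are placeholders).
# for h in gbase25 gbase26; do
#     echo "$h $(ssh "$h" uname -r)"
# done | find_mismatches "$(uname -r)"
```

An empty result means every node reported the expected value.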

4. Set the cluster to read-only and back up cluster information

-- edit /etc/hosts
vi /etc/hosts    # add the new nodes, then sync the file to the new nodes
-- confirm the cluster is healthy
gcadmin
gcadmin showddlevent
gcadmin showdmlevent
gcadmin showdmlstorageevent
-- set the cluster to read-only
gcadmin switchmode readonly
-- back up scn and tableid
cexec 'python -c "import gcware;print gcware.getscn()"'
cexec 'python -c "import gcware;print gcware.gettableid()"'
-- check version information
cexec "/opt/gcluster/server/bin/gclusterd -V"
cexec "gcadmin -V"
cexec data: "/opt/gnode/server/bin/gbased -V"
cexec 'gccli -ugbase -pgbase20110531 -Nse "select @@version"'
-- back up database information
sh backup_database.sh
ls -l /home/gbase/gbase_expand/201811
-- back up nodedatamap
gccli -ugbase -pgbase20110531 -vvv -e"rmt:select * from gbase.nodedatamap into outfile '/home/gbase/gbase_expand/201811/nodedatamap.dat' fields terminated by '|'"
wc -l /home/gbase/gbase_expand/201811/nodedatamap.dat
-- back up cluster configuration files
cexec "mkdir -p /home/gbase/gbase_expand/201811/gcluster"
cexec "cp -r /opt/gcluster/config/ /home/gbase/gbase_expand/201811/gcluster/"
cexec "ls /home/gbase/gbase_expand/201811/gcluster/config"
cexec data: "mkdir -p /home/gbase/gbase_expand/201811/gnode"
cexec data: "cp -r /opt/gnode/config/ /home/gbase/gbase_expand/201811/gnode/"
cexec data: "ls /home/gbase/gbase_expand/201811/gnode/config"
-- back up corosync configuration files
cexec "cp -r /etc/corosync /home/gbase/gbase_expand/201811/"
cexec "ls /home/gbase/gbase_expand/201811/corosync | wc -l"
-- back up gcware configuration files
cexec "cp -r /var/lib/gcware /home/gbase/gbase_expand/201811/"
cexec 'ls /home/gbase/gbase_expand/201811/gcware | wc -l'

5. Run the expansion

-- stop the cluster
cexec "service gcware stop"

-- locate the directory of the original upgrade package
-- edit demo.options
cd gcinstall/
vi demo.options
installPrefix= /opt
coordinateHost = 
dataHost = 134.32.48.8,134.32.48.11,134.32.48.13,134.32.48.14,134.32.48.46,134.32.48.47,134.32.48.48,134.32.48.50
existCoordinateHost = 134.32.48.208,134.32.48.209,134.32.48.210,134.32.48.211,134.32.48.212,134.32.48.213,134.32.48.214,134.32.48.215,134.32.48.216,134.32.48.217,134.32.48.218,134.32.48.219,134.32.48.220,134.32.48.221,134.32.48.222,134.32.48.223,134.32.48.224,134.32.48.225,134.32.48.226,134.32.48.227
existDataHost =134.32.48.208,134.32.48.209,134.32.48.210,134.32.48.211,134.32.48.212,134.32.48.213,134.32.48.214,134.32.48.215,134.32.48.216,134.32.48.217,134.32.48.218,134.32.48.219,134.32.48.220,134.32.48.221,134.32.48.222,134.32.48.223,134.32.48.224,134.32.48.225,134.32.48.226,134.32.48.227
loginUser= root
loginUserPwd = ' Huawei#123'
#loginUserPwdFile = loginUserPwd.json
dbaUser = gbase
dbaGroup = gbase
dbaPwd = gbase
rootPwd = ' Huawei#123'
#rootPwdFile = rootPwd.json
dbRootPwd = 'Huawei@123'
#mcastAddr = 226.94.1.39
mcastPort = 5493

-- run the expansion
./gcinstall.py --silent=demo.options
-- compare the configuration files with the backups
diff /opt/gcluster/config/gbase_8a_gcluster.cnf /home/gbase/gbase_expand/201811/gcluster/config/gbase_8a_gcluster.cnf
diff /opt/gnode/config/gbase_8a_gbase.cnf /home/gbase/gbase_expand/201811/gnode/config/gbase_8a_gbase.cnf
cexec data: md5sum /opt/gnode/config/gbase_8a_gbase.cnf
-- generate the new distribution (backup mode)
gcadmin distribution gcChangeInfo.xml p 1 d 1
-- generate the new hashmap
gccli -ugbase -pgbase20110531 -vvv -e"initnodedatamap"
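The `md5sum` step above only helps if the checksums from all data nodes are actually compared. A small sketch of that comparison, assuming the per-node output has already been reduced to plain `md5sum` lines (`<hash>  <path>`); the function name and the input file name are hypothetical.

```shell
# Hypothetical helper: read md5sum-style lines ("<hash>  <path>") on stdin
# and print how many distinct hashes appear.
count_distinct_md5() {
    awk 'NF >= 2 && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Example: a result of 1 means every node has an identical gbase_8a_gbase.cnf.
# count_distinct_md5 < all_nodes_md5.txt
```

Any result greater than 1 points to a node whose configuration drifted and should be inspected before redistribution starts.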

6. Basic availability checks after the expansion

-- CRUD test
create database db_test;
create table db_test.t1(c1 int,c2 int) distributed by ('c1');
insert into db_test.t1 values (1,1),(2,2),(3,3);
update db_test.t1 set c2=10 where c1=1;
select * from db_test.t1;
delete from db_test.t1 where c1>=3;
select * from db_test.t1;
truncate table db_test.t1;
-- data load test
load data infile 'sftp://gbase:gbase@192.168.200.11/tmp/t1.txt' into table db_test.t1 fields terminated by ':';
select count(1) from db_test.t1;
drop table db_test.t1;
drop database db_test;

V. Data redistribution

In any MPP cluster the data is spread across many data nodes, so once the expansion is complete, all business tables must be redistributed across every data node (including the new ones) to avoid data skew.

-- redistribution
-- set the rebalancing concurrency to 0 so that queued tasks do not start yet
gccli -ugbase -pgbase20110531 -vvv -e"set global gcluster_rebalancing_concurrent_count=0"
gccli -ugbase -pgbase20110531 -Ns -e"select @@gcluster_rebalancing_concurrent_count"
-- rebalance the whole instance
gccli -ugbase -pgbase20110531 -vvv -e"rebalance instance"
gccli -ugbase -pgbase20110531 -Ns -e"select count(1) from gclusterdb.rebalancing_status"
-- adjust priorities
create table test.reb_tab(db_name varchar(64),table_name varchar(64),priority int) replicated;
-- insert the high-priority tables
insert into test.reb_tab values ('test','t1',1),('test','t2',2);
update gclusterdb.rebalancing_status a, test.reb_tab b set a.priority=b.priority where a.db_name=b.db_name and a.table_name=b.table_name;
select count(1) from gclusterdb.rebalancing_status where priority<5;
-- raise the rebalancing concurrency
gccli -ugbase -pgbase20110531 -vvv -e"set global gcluster_rebalancing_concurrent_count=1"
gccli -ugbase -pgbase20110531 -Ns -e"select @@gcluster_rebalancing_concurrent_count"
-- pause redistribution
gccli -ugbase -pgbase20110531 -vvv -e"pause rebalance instance"
gccli -ugbase -pgbase20110531 -Ns -e"select status,count(1) from gclusterdb.rebalancing_status group by 1"
-- resume redistribution
gccli -ugbase -pgbase20110531 -vvv -e"continue rebalance instance"
gccli -ugbase -pgbase20110531 -Ns -e"select status,count(1) from gclusterdb.rebalancing_status group by 1"
-- wait for redistribution to finish
-- resume business traffic
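While waiting, the `status, count(1)` query above can be condensed into a single progress figure. The sketch below does that on the text output. The function name is hypothetical, and the status value used in the example (`COMPLETED`) is an assumption; use whatever value the query actually returns for finished tables.

```shell
# Hypothetical helper: read "<status> <count>" lines on stdin and print
# "<finished>/<total>", where <finished> is the count for the given status.
# usage: rebalance_progress <finished-status>
rebalance_progress() {
    awk -v done_status="$1" '
        { total += $2 }
        $1 == done_status { finished += $2 }
        END { printf "%d/%d\n", finished, total }
    '
}

# Example (GBASE_PWD is a placeholder environment variable):
# gccli -ugbase -p"$GBASE_PWD" -Ns \
#     -e"select status,count(1) from gclusterdb.rebalancing_status group by 1" \
#     | rebalance_progress COMPLETED
```

Run periodically, this gives a quick view of how many of the tables are done without reading the full status table.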

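VI. Efficiency analysis

Time taken by each stage of the expansion:

Expansion: 18:30 to 20:20 on the 24th, about 2 hours.
Redistribution: 8,802 tables and 231 TB of data in total, from 20:25 on the 24th to 10:36 on the 26th, about 38 hours. The original plan was 91 hours (based on a rule-of-thumb rate of 35 MB/s).

Note: one table was extremely unevenly distributed, with all of its data on a single node: 70 columns, 7.5 billion rows, "13" compression, and a single shard of 350 GB. That table alone took 12 hours to redistribute. Excluding it, the other 8,801 tables took 27 hours (20:25 on the 24th to 23:25 on the 25th), reaching 118 MB/s, far faster than expected.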
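The 91-hour plan and the 118 MB/s figure can be reproduced with simple arithmetic. The sketch below is one reading that matches the article's numbers; it is an assumption, not something the article states, that the rate is per data node, that the 21 original data nodes work in parallel, and that sizes use binary units (1 T = 1024 * 1024 MB). The function names are hypothetical.

```shell
# Hypothetical helper: hours needed to move <tib> of data at <rate> MB/s
# per node across <nodes> nodes working in parallel.
# usage: estimate_hours <tib> <rate-mb-per-s> <nodes>
estimate_hours() {
    awk -v tib="$1" -v rate="$2" -v nodes="$3" 'BEGIN {
        printf "%.1f\n", tib * 1024 * 1024 / (rate * nodes) / 3600
    }'
}

# Hypothetical helper: per-node rate in MB/s achieved when <tib> of data
# is moved in <hours> across <nodes> nodes.
# usage: per_node_rate <tib> <hours> <nodes>
per_node_rate() {
    awk -v tib="$1" -v hours="$2" -v nodes="$3" 'BEGIN {
        printf "%.1f\n", tib * 1024 * 1024 / (hours * 3600) / nodes
    }'
}

# estimate_hours 231 35 21   # planned duration, about 91.5 hours
# per_node_rate 231 27 21    # achieved rate, about 118.7 MB/s per node
```

Under these assumptions the planned duration comes out at about 91.5 hours and the achieved rate at about 118.7 MB/s per node, in line with the 91 hours and 118 MB/s quoted above.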

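VII. Lessons learned

1. When redistributing data in an MPP cluster, the execution window of business scheduling must be taken into account, because redistribution can lock business tables and disrupt scheduled jobs. Our survey before the expansion showed that data synchronization ran from 02:00 to 15:00, a long scheduling window. We therefore raised the redistribution priority of every table the scheduled jobs needed, so that those tables finished redistributing before the jobs started, and lowered the redistribution concurrency while the jobs were running. This let redistribution run around the clock without affecting production scheduling. If the daily scheduling window is short, or there are too many tables to work out which ones the scheduled jobs need, redistributing outside the scheduling window is recommended instead.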
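The "lower the concurrency during the scheduling window" rule lends itself to a small script. The sketch below only decides which value to use for a given hour. The window (02:00 to 15:00) comes from the text, while the function name, the two concurrency values in the example, and the `GBASE_PWD` variable are illustrative assumptions.

```shell
# Hypothetical helper: choose a rebalancing concurrency for a given hour.
# Hours from 02:00 up to (not including) 15:00 count as the scheduling window.
# usage: concurrency_for_hour <hour 0-23> <busy-value> <idle-value>
concurrency_for_hour() {
    awk -v h="$1" -v busy="$2" -v idle="$3" 'BEGIN {
        h += 0                      # normalise "08" to 8
        if (h >= 2 && h < 15) print busy; else print idle
    }'
}

# Example, for instance from an hourly cron job:
# n=$(concurrency_for_hour "$(date +%H)" 1 4)
# gccli -ugbase -p"$GBASE_PWD" -e"set global gcluster_rebalancing_concurrent_count=$n"
```

Keeping the decision in a pure function makes the window easy to adjust if the scheduling hours change.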

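2. Besides connectivity with the cluster itself, new nodes also need network connectivity to the data-loading servers and to the Hadoop cluster (if loading data from Hadoop is configured).

3. Check table skew before expanding, and consider changing the distribution key of heavily skewed tables, to avoid a repeat of the case in this expansion where a single table with all of its data on one node (70 columns, 7.5 billion rows, "13" compression, a 350 GB shard) took 12 hours to redistribute.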
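One simple way to quantify skew is the ratio of the largest per-node row count to the average. The sketch below computes it from a list of counts. How the per-node counts are obtained is left open, and the function name and input format (one count per line, one line per data node) are assumptions for illustration.

```shell
# Hypothetical helper: read one row count per line (one line per data node)
# and print max/average; 1.00 means perfectly even distribution.
skew_ratio() {
    awk '
        { sum += $1; if ($1 > max) max = $1 }
        END {
            if (NR == 0 || sum == 0) { print "n/a"; exit }
            printf "%.2f\n", max / (sum / NR)
        }
    '
}

# Example: a table whose rows all sit on 1 of 21 nodes has a ratio of 21.00.
# printf '%s\n' 7500000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | skew_ratio
```

Tables with a ratio well above 1 are the candidates for a different distribution key before the expansion begins.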

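About the author:

Wang Hao (汪浩) is a core business system DBA at 新炬网络, working mainly on database administration for Oracle, Greenplum, GBase and other databases and on IT operations management, with extensive hands-on experience in database performance tuning across many business scenarios and a focus on database performance optimization and IT operations automation.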

Original link:

https://mp.weixin.qq.com/s?__biz=MzI4NTA1MDEwNg==&mid=2650787161&idx=2&sn=1e5957e9d70401ad26b06d40b26e5fc7&chksm=f3f978ccc48ef1dad1e354e53d35b3edac8b522c34dc3b354cbfa3852ebc435e188c99778a82&scene=27#wechat_redirect

GBase 8a MPP cluster expansion in practice