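We publish our company's technical documents, columns, and other articles here.
The content may contain errors or points that are inappropriate by common social standards. Please note that we accept no responsibility for any damage resulting from the use of information on this site.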

Studying how to use GlusterFS (mainly checking its scalability)

First published: 2019-06-13 00:00:00 / Updated: 2019-06-13 00:00:00

Category: IT / Tags: Storage

I tried various things with GlusterFS, focusing in particular on how far it can be expanded.

References

GlusterFS Documentation https://docs.gluster.org/en/latest/

Installation (on Debian)

Preparing the test machine

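1. Create a virtual machine on KVM to use as the test machine, with one system disk and twelve storage disks, each 100 GiB.
2. Install Debian 10 (Buster) on the system disk. Automatic partitioning is fine. Set the host name to "tst". In tasksel, select "SSH Server" and "Standard System Utilities".
3. Once the installation is complete, log in as root (everything from here on is done as root) and install the tools you like: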
apt update
apt install vim parted

4. Configure a static IP address and host name resolution. Here the host name is tst and the IP address is 192.168.0.201. Make sure the machine can still reach the Internet.

vim /etc/hostname
---
tst
---

vim /etc/network/interfaces
---
allow-hotplug ens3
iface ens3 inet static
address 192.168.0.201
netmask 255.255.255.0
gateway 192.168.0.1
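# dns-nameservers takes effect only with the resolvconf package; otherwise put the server in /etc/resolv.conf
dns-nameservers 192.168.22.1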
---

vim /etc/hosts
---
192.168.0.201  tst
---

# Reboot once to apply the settings above
reboot

Installing GlusterFS

Install the server (glusterfs-server). The GlusterFS client (glusterfs-client) is installed automatically along with it.
apt install glusterfs-server
systemctl enable glusterd
systemctl start glusterd

Creating and deleting volumes

Preparing space for GlusterFS bricks

Format the storage disks with XFS. There are twelve disks, so repeat the following steps twelve times.

# Check the device names of the storage disks
parted -l
# Partition and format the disk
parted /dev/vdb
(parted) mklabel gpt
(parted) mkpart xfs 0% 100%
(parted) quit
mkfs.xfs -i size=512 /dev/vdb1
# Mount
mkdir -p /gfs/v01
mount -t xfs /dev/vdb1 /gfs/v01
df -h
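# The filesystem is mounted (the output below appears to come from a run with a 500G disk; with the 100 GiB disks described above, Size shows 100G)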
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
/dev/vdb1       500G  543M  500G   1% /gfs/v01
===

# Since this is tedious, doing it all at once as below is probably fine (-s runs parted without prompts).
# Format
for disk in vdb vdc vdd vde vdf vdg vdh vdi vdj vdk vdl vdm; do parted -s /dev/$disk mklabel gpt; parted -s /dev/$disk mkpart xfs 0% 100%; sleep 1; mkfs.xfs -i size=512 /dev/${disk}1; done
# Mount
num=1; for disk in vdb vdc vdd vde vdf vdg vdh vdi vdj vdk vdl vdm; do vol=$(printf "v%02d" $num); mkdir -p /gfs/$vol; mount -t xfs /dev/${disk}1 /gfs/$vol; num=$((num + 1)); done
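Mounts made with plain `mount`, as in the loop above, do not survive a reboot. As a sketch (the function name `gfs_fstab_lines` is hypothetical), the following prints matching /etc/fstab entries for the same vdb..vdm to /gfs/v01..v12 mapping. It only prints, so you can review the lines before appending them. Device names such as /dev/vdb1 can change between boots, so `UUID=` entries taken from `blkid` are generally more robust.

```shell
# Hypothetical helper: print /etc/fstab lines for the 12 brick filesystems,
# assuming the same disk order as the loops above (vdb -> /gfs/v01 ... vdm -> /gfs/v12).
gfs_fstab_lines() {
    num=1
    for disk in vdb vdc vdd vde vdf vdg vdh vdi vdj vdk vdl vdm; do
        printf '/dev/%s1 /gfs/v%02d xfs defaults 0 0\n' "$disk" "$num"
        num=$((num + 1))
    done
}

# Review first, then append (as root):
#   gfs_fstab_lines
#   gfs_fstab_lines >> /etc/fstab
```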

Create a brick directory on each storage volume.

# Create a directory named brick inside each of /gfs/v01 to /gfs/v12
for vol in `seq -w 12`; do mkdir /gfs/v${vol}/brick; done

Creating a GlusterFS volume

To keep things simple, start with a volume made of a single brick.

gluster volume create test tst:/gfs/v01/brick
===
volume create: test: success: please start the volume to access data
===

gluster volume start test
===
volume start: test: success
===

gluster volume info test
===
Volume Name: test
Type: Distribute
Volume ID: a102d2c2-5e6b-4577-b2e9-a9a337d31a9b
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

Mounting the volume

mount -t glusterfs tst:/test /mnt
mount
===
tst:/test on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
===

df -h
# A 100G GlusterFS volume has been created
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       100G  1.2G   99G   2% /mnt
===

Deleting the volume

# Unmount first
umount /mnt

# Stop and delete the volume
gluster volume stop test
===
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test: success
===

gluster volume delete test
===
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: test: success
===

gluster volume info
===
No volumes present
===

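# Empty /gfs/v01/brick. Otherwise the next gluster volume create ... /gfs/v01/brick fails, because the directory still carries the old volume's metadata (the trusted.glusterfs.volume-id extended attribute and the .glusterfs directory).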
rm -rf /gfs/v01/brick; mkdir /gfs/v01/brick

Trying various configurations

In this test all bricks are on the same server, so trying to create a replica volume fails with the error "Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Bricks should be on different nodes to have best fault tolerant configuration. Use 'force' at the end of the command if you want to override this behavior." As the message suggests, we append "force" to the end of the command to override it.

Replicated configuration with 2 bricks

This configuration is not recommended because it is prone to split-brain.

gluster volume create test replica 2 tst:/gfs/v01/brick tst:/gfs/v02/brick force
===
volume create: test: success: please start the volume to access data
===

# Without force, a split-brain warning appears
gluster volume create test replica 2 tst:/gfs/v01/brick tst:/gfs/v02/brick
===
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
 (y/n)
===

gluster volume start test

gluster volume info
===
Volume Name: test
Type: Replicate
Volume ID: 02bae4c0-b94b-4b18-b68c-a93134583b9e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
===

mount -t glusterfs tst:/test /mnt
df -h
# (100GB disk * 2) / 2 (replica) = 100GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       100G  1.2G   99G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
# This recreates all 12 bricks, but in practice only the ones you used need to be recreated.
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
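The cleanup one-liner is repeated after every test. A minimal sketch that wraps it in a function could look like this; `reset_bricks` and the `GFS_ROOT` variable (defaulting to /gfs) are hypothetical names. Like the original one-liner, it deletes everything inside each brick directory.

```shell
# Hypothetical helper: recreate empty brick directories v01..vNN under GFS_ROOT.
# WARNING: deletes everything inside each brick directory, including .glusterfs.
reset_bricks() {   # $1 = number of bricks to reset (default 12)
    root=${GFS_ROOT:-/gfs}
    count=${1:-12}
    i=1
    while [ "$i" -le "$count" ]; do
        dir=$(printf '%s/v%02d/brick' "$root" "$i")
        # plain mkdir (no -p): fails if the /gfs/vNN mount point directory is missing
        rm -rf "$dir" && mkdir "$dir" || return 1
        i=$((i + 1))
    done
}

# Usage after "umount /mnt; gluster volume stop test; gluster volume delete test":
#   reset_bricks 2     # only v01 and v02 were used
```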

Striped configuration with 2 bricks

This one does not fail even without force.

The striped configuration is apparently deprecated.

gluster volume create test stripe 2 tst:/gfs/v01/brick tst:/gfs/v02/brick
===
volume create: test: success: please start the volume to access data
===

gluster volume start test
gluster volume info
===
Volume Name: test
Type: Stripe
Volume ID: f8150162-7143-4122-bb5f-31f5e331bb7c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
# 100GB * 2 = 200GB
===
Filesystem       Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Distributed configuration with 2 bricks

This one does not fail without force either.
Unlike replica and stripe, if no type is specified, the volume is created as distributed.

A striped volume splits the data of a single file across the bricks, whereas a distributed volume keeps each file whole and stores it on one of the bricks.
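To make the difference concrete, here is a toy model, not GlusterFS's actual algorithm. `distribute_brick` picks one brick for a whole file from a hash of its name (real GlusterFS DHT hashes file names into per-directory hash ranges; `cksum` merely stands in for that hash), and `stripe_bytes` shows how a striped file's fixed-size chunks would be spread round-robin over the bricks. Both function names and the chunk size are illustrative assumptions.

```shell
# Toy model only: NOT GlusterFS's real placement logic.
# Distribute: the whole file goes to one brick chosen from a hash of its name.
distribute_brick() {   # $1 = file name, $2 = number of bricks -> prints 1..N
    h=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((h % $2 + 1))
}

# Stripe: split a file of $1 bytes into $2-byte chunks, assign them round-robin
# to $3 bricks, and print how many bytes end up on each brick.
stripe_bytes() {
    size=$1 block=$2 n=$3
    b=1
    while [ "$b" -le "$n" ]; do
        bytes=0 off=$(( (b - 1) * block ))
        while [ "$off" -lt "$size" ]; do
            chunk=$((size - off))
            if [ "$chunk" -gt "$block" ]; then chunk=$block; fi
            bytes=$((bytes + chunk))
            off=$((off + n * block))   # skip the chunks owned by the other bricks
        done
        echo "brick$b $bytes"
        b=$((b + 1))
    done
}

# stripe_bytes 10 3 2   prints "brick1 6" and "brick2 4"
```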

gluster volume create test tst:/gfs/v01/brick tst:/gfs/v02/brick

gluster volume start test
gluster volume info
===
Volume Name: test
Type: Distribute
Volume ID: a33172b0-0d67-4a63-bd70-efe4ff2fa309
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
# 100GB * 2 = 200GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Replicated configuration with 3 bricks

With GlusterFS replication, it is normal to build a volume from three or more bricks (replica 3). This makes split-brain unlikely (or prevents it).

gluster volume create test replica 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick force
gluster volume start test
gluster volume info
===
Volume Name: test
Type: Replicate
Volume ID: 5a0463e8-b452-46f6-bab5-308ccaf5010e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
===

mount -t glusterfs tst:/test /mnt
df -h
# 100GB * 3 / 3 = 100GB
===
Filesystem       Size  Used Avail Use% Mounted on
(omitted)
tst:/test       100G  1.2G   99G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Distributed configuration with 3 bricks

gluster volume create test tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick
gluster volume start test
gluster volume info
===
Volume Name: test
Type: Distribute
Volume ID: 578826c0-512e-4214-bab3-51df86f529b6
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
# 100GB * 3 = 300GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       300G  3.4G  297G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Dispersed configuration with 3 bricks

This also fails when run as is, so add force.

gluster volume create test disperse 3 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick force
gluster volume start test
gluster volume info
===
Volume Name: test
Type: Disperse
Volume ID: d8844dfb-3af8-4905-82c8-887d83a9582b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
# 100GB * 3 * (2/3)= 200GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Dispersed configurations with 4 or more bricks

Some configurations produce a "not optimal" warning at creation time.

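A configuration is apparently considered "optimal" when the number of bricks minus the number of bricks allowed to fail (the redundancy) is a power of two. For example:
- 3 bricks - 1 redundancy = 2 → OK
- 4 bricks - 1 redundancy = 3 → NG (4 bricks with redundancy 2 is not allowed, because the redundancy must be less than half the total number of bricks)
- 6 bricks - 2 redundancy = 4 → OK
- 6 bricks - 1 redundancy = 5 → NG
- 7 bricks - 1 redundancy = 6 → NG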
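The rules above can be written down as a small check. This is a sketch of the rules as stated in this article (a set is valid only if redundancy is at least 1 and bricks > 2 × redundancy; it is "optimal" when bricks minus redundancy is a power of two). `disperse_check` is a hypothetical name, and the gluster CLI itself remains the authority.

```shell
# Hypothetical helper encoding the rules above for "disperse N redundancy R".
disperse_check() {   # $1 = bricks per disperse set, $2 = redundancy
    n=$1 r=$2
    if [ "$r" -lt 1 ] || [ $((2 * r)) -ge "$n" ]; then
        echo invalid
        return 1
    fi
    d=$((n - r))                          # number of data bricks
    if [ $((d & (d - 1))) -eq 0 ]; then   # true only for powers of two
        echo optimal
    else
        echo not-optimal
    fi
}

# disperse_check 6 2   prints optimal
# disperse_check 6 1   prints not-optimal
```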

Configurations where this value is not a power of two reportedly hurt performance.

gluster volume create test disperse 4 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick force
===
# A warning appears
This configuration is not optimal on most workloads. Do you want to use it ? (y/n)
===

gluster volume create test disperse 5 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick force
# No warning

gluster volume create test disperse 5 redundancy 2 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick force
# A warning appears

gluster volume create test disperse 6 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
# A warning appears

gluster volume create test disperse 6 redundancy 2 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
# No warning

gluster volume create test disperse 3 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
# No warning. This is a distributed dispersed configuration: two 2+1 dispersed brick sets are distributed.

gluster volume create test disperse 9 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick tst:/gfs/v07/brick  tst:/gfs/v08/brick  tst:/gfs/v09/brick force
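# 9-1=8. No warning. However, with redundancy 1 only one of the nine bricks may fail (a second failure makes the data unavailable), and I wonder whether that is generally acceptable.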

gluster volume create test disperse 9 redundancy 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick tst:/gfs/v07/brick  tst:/gfs/v08/brick  tst:/gfs/v09/brick force
# 9-3=6. A warning appears.

gluster volume create test disperse 9 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick tst:/gfs/v07/brick  tst:/gfs/v08/brick  tst:/gfs/v09/brick force
# No warning. If "redundancy" is not specified, an optimal value is chosen automatically. Here it becomes redundancy=1 (visible in gluster volume info).

Distributed replicated configuration

Several replicated brick sets combined in a distributed configuration.
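GlusterFS forms the sets from consecutive bricks in the order they are given on the command line, so brick order decides which bricks hold copies of each other (with bricks on several servers, you would order them so that no set sits entirely on one host). A hypothetical helper, `show_groups`, that prints the resulting grouping:

```shell
# Hypothetical helper: print how consecutive bricks are grouped into sets.
show_groups() {   # $1 = set size (replica/disperse count), then brick paths
    size=$1; shift
    if [ $(($# % size)) -ne 0 ]; then
        echo "brick count $# is not a multiple of $size" >&2
        return 1
    fi
    g=1 i=0 line=""
    for brick in "$@"; do
        line="$line $brick"
        i=$((i + 1))
        if [ "$i" -eq "$size" ]; then   # set complete: print it and start the next
            echo "set $g:$line"
            g=$((g + 1)) i=0 line=""
        fi
    done
}

# show_groups 2 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick
```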

# Configuration with 2 replicated brick sets
# "replica 2" groups the listed bricks into replicated brick sets of two bricks each. This yields two sets, which are then distributed.
# v01/brick + v02/brick - replicated brick set 1
# v03/brick + v04/brick - replicated brick set 2
gluster volume create test replica 2 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick force
gluster volume start test

gluster volume info test
===
Volume Name: test
Type: Distributed-Replicate
Volume ID: dac551d3-d9b0-4d76-add8-144437f39e0b
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Brick4: tst:/gfs/v04/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
===

mount -t glusterfs tst:/test /mnt
df -h
# ((100GB * 2) / 2) * 2 = 200GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done

Distributed dispersed configuration

Several dispersed brick sets combined in a distributed configuration.

# Configuration with 2 dispersed brick sets
# "disperse 3" groups the listed bricks into dispersed brick sets of three bricks each. This yields two sets, which are then distributed.
# v01/brick + v02/brick + v03/brick - dispersed brick set 1
# v04/brick + v05/brick + v06/brick - dispersed brick set 2
gluster volume create test disperse 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
gluster volume start test

gluster volume info test
===
Volume Name: test
Type: Distributed-Disperse
Volume ID: 58189ecb-e9c0-49c8-9c54-fa31b10f6391
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Brick4: tst:/gfs/v04/brick
Brick5: tst:/gfs/v05/brick
Brick6: tst:/gfs/v06/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
# ((100GB * 3) * (2/3)) * 2 = 400GB
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       400G  4.6G  396G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
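The df sizes in the sections above all follow from the brick size, the set size, and the redundancy. A sketch that reproduces that arithmetic for equal-sized bricks; `usable_gb` and its argument order are hypothetical, and filesystem overhead is ignored, so real df figures come out slightly smaller.

```shell
# Hypothetical helper: usable size (GB) of a volume built from equal-sized bricks.
# $1 = kind (distribute | replicate | disperse), $2 = total bricks,
# $3 = size of one brick in GB, $4 = bricks per set, $5 = redundancy (disperse only)
usable_gb() {
    kind=$1 bricks=$2 size=$3 grp=${4:-1} red=${5:-0}
    case $kind in
        distribute) echo $((bricks * size)) ;;
        replicate)  echo $((bricks / grp * size)) ;;                # one copy's worth per set
        disperse)   echo $((bricks / grp * (grp - red) * size)) ;;  # data bricks per set
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

# usable_gb replicate 4 100 2    prints 200 (2 x 2 distributed replicated)
# usable_gb disperse 6 100 3 1   prints 400 (2 x (2 + 1) distributed dispersed)
```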

Expanding a single dispersed volume into a distributed dispersed volume

# First, create and mount a test volume with a single 3-brick dispersed set
gluster volume create test disperse 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick force
gluster volume start test
gluster volume info test
===
Volume Name: test
Type: Disperse
Volume ID: e9abe51a-cec6-4f38-8992-64564cd12e8f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

mount -t glusterfs tst:/test /mnt
df -h
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Expand it
gluster volume add-brick test tst:/gfs/v04/brick
===
# Adding just one brick does not work
volume add-brick: failed: Incorrect number of bricks supplied 1 with count 3
===

gluster volume add-brick test tst:/gfs/v04/brick tst:/gfs/v05/brick
===
# Two bricks do not work either
volume add-brick: failed: Incorrect number of bricks supplied 2 with count 3
===

gluster volume add-brick test tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick
===
# This worked
volume add-brick: success
===
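The failures above suggest that add-brick only accepts a multiple of the set size (3 for this disperse 3 volume). A hypothetical pre-check, `can_add_bricks`, based on that observation:

```shell
# Hypothetical pre-check before "gluster volume add-brick":
# succeeds only if the number of new bricks is a positive multiple of the set size.
can_add_bricks() {   # $1 = set size (replica/disperse count), $2 = bricks to add
    [ "$2" -gt 0 ] && [ $(($2 % $1)) -eq 0 ]
}

# can_add_bricks 3 2 || echo "add bricks in multiples of 3"
```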

gluster volume info test
# It is now 2 x (2 + 1) = 6
===
Volume Name: test
Type: Distributed-Disperse
Volume ID: e9abe51a-cec6-4f38-8992-64564cd12e8f
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Brick4: tst:/gfs/v04/brick
Brick5: tst:/gfs/v05/brick
Brick6: tst:/gfs/v06/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
===

df -h
# Grew by 200G
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       400G  2.3G  198G   2% /mnt
===

# When I first tried this, the size stayed at 200G. That time, stopping and starting the volume brought it to 400G, using the steps below.
umount /mnt
gluster volume stop test
gluster volume start test
mount -t glusterfs tst:/test /mnt
df -h

# Now try shrinking it.
gluster volume remove-brick test tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick start
===
# Beforehand I checked cluster.force-migration and it was off (gluster volume get test cluster.force-migration)
Running remove-brick with cluster.force-migration enabled can result in data corruption. It is safer to disable this option so that files that receive writes during migration are not migrated.
Files that are not migrated can then be manually copied after the remove-brick commit operation.
Do you want to continue with your current cluster.force-migration settings? (y/n) y
volume remove-brick start: success
ID: 8cc38dee-cfd9-4a95-bc54-721e1b0f0d59
===

# Check the progress
gluster volume remove-brick test tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick status
===
# Finished once the status shows "completed".
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0            completed        0:00:00
===

# Don't relax yet after the step above: a commit operation is still required
gluster volume remove-brick test tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick commit
===
# Success. It seems wise to check the contents of the removed bricks just in case.
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.
===

gluster volume info test
===
# Now 1 x (2 + 1) = 3
Volume Name: test
Type: Disperse
Volume ID: e9abe51a-cec6-4f38-8992-64564cd12e8f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: tst:/gfs/v01/brick
Brick2: tst:/gfs/v02/brick
Brick3: tst:/gfs/v03/brick
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on
===

df -h
# Shrank by 200G.
===
Filesystem      Size  Used Avail Use% Mounted on
(omitted)
tst:/test       200G  2.3G  198G   2% /mnt
===

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
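The shrink steps above (remove-brick start, checking status until it shows completed, then commit) can be scripted. This is a sketch under assumptions: `remove_bricks_and_commit` is a hypothetical name, it matches the strings "in progress" and "failed" in the plain-text status output, and it uses the gluster CLI's `--mode=script` option to avoid any interactive prompt on commit. Run `... start` first as shown above, and still check the removed bricks for leftover files afterwards.

```shell
# Hypothetical sketch: wait for "gluster volume remove-brick ... start" to finish,
# then commit. Assumes "start" has already been run for the same bricks.
remove_bricks_and_commit() {   # $1 = volume name, remaining args = bricks
    vol=$1; shift
    # poll until no node reports "in progress"
    while gluster volume remove-brick "$vol" "$@" status | grep -q 'in progress'; do
        sleep 10
    done
    if gluster volume remove-brick "$vol" "$@" status | grep -q 'failed'; then
        echo "remove-brick reported a failure; not committing" >&2
        return 1
    fi
    gluster --mode=script volume remove-brick "$vol" "$@" commit
}

# remove_bricks_and_commit test tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick
```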

Other experiments

Reducing a brick's free space

Bricks live on XFS. Let's see what happens when data is written to the XFS filesystem that holds a brick, but outside the brick directory.

# Create a replica 3 GlusterFS volume
gluster volume create test replica 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick force
gluster volume start test
mount -t glusterfs tst:/test /mnt
df -h
===
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       100G  135M  100G   1% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       100G  1.2G   99G   2% /mnt
===
# The GlusterFS volume (/mnt) uses 1.2G; the XFS holding the brick (/gfs/v01) uses 135M.

# Create a 10G file in /gfs/v01/
dd if=/dev/zero of=/gfs/v01/test10g.dat bs=1M count=10240

df -h
===
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       100G   12G   89G  12% /mnt
===
# Usage of both grew by about 10G.

# Create a 10G file in /gfs/v02/
dd if=/dev/zero of=/gfs/v02/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       100G   12G   89G  12% /mnt
===
# v02 usage grew, but the GlusterFS volume did not.

# Add another 10G to v02
dd if=/dev/zero of=/gfs/v02/test10g2.dat bs=1M count=10240

df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   21G   80G  21% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       100G   22G   79G  22% /mnt
===
# The GlusterFS volume also grew by 10G.

# Create a 5G file on v03
dd if=/dev/zero of=/gfs/v03/test5g.dat bs=1M count=5120
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   21G   80G  21% /gfs/v02
/dev/vdd1       100G  5.2G   95G   6% /gfs/v03
tst:/test       100G   22G   79G  22% /mnt
===
# v03 grew by 5G, but GlusterFS still did not.

# Create a 3G file on GlusterFS
dd if=/dev/zero of=/mnt/test3g.dat bs=1M count=3072
df -h
===
/dev/vdb1       100G   14G   87G  14% /gfs/v01
/dev/vdc1       100G   24G   77G  24% /gfs/v02
/dev/vdd1       100G  8.2G   92G   9% /gfs/v03
tst:/test       100G   25G   76G  25% /mnt
===
# Every one grew by 3G.

# Delete the 10G file on v01
rm /gfs/v01/test10g.dat
df -h
===
/dev/vdb1       100G  3.2G   97G   4% /gfs/v01
/dev/vdc1       100G   24G   77G  24% /gfs/v02
/dev/vdd1       100G  8.2G   92G   9% /gfs/v03
tst:/test       100G   25G   76G  25% /mnt
===
# v01 shrank by 10G, but GlusterFS did not change

# Delete the 10G file on v02
rm /gfs/v02/test10g.dat
df -h
===
/dev/vdb1       100G  3.2G   97G   4% /gfs/v01
/dev/vdc1       100G   14G   87G  14% /gfs/v02
/dev/vdd1       100G  8.2G   92G   9% /gfs/v03
tst:/test       100G   15G   86G  15% /mnt
===
# v02 shrank by 10G, and so did GlusterFS.

# Clean up for now
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
rm -f /gfs/v01/*.dat /gfs/v02/*.dat /gfs/v03/*.dat


# Create a disperse 3 redundancy 1 GlusterFS volume
gluster volume create test disperse 3 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick force
gluster volume start test
mount -t glusterfs tst:/test /mnt
df -h
===
/dev/vdb1       100G  135M  100G   1% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       200G  2.3G  198G   2% /mnt
===

# Create a 10G file on v01
dd if=/dev/zero of=/gfs/v01/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       200G   23G  178G  12% /mnt
===
# v01 grew by 10G, whereas GlusterFS usage grew by 20G.

# Create a 10G file on v02
dd if=/dev/zero of=/gfs/v02/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       200G   23G  178G  12% /mnt
===
# v02 grew, but GlusterFS did not.

# Add another 10G file to v02
dd if=/dev/zero of=/gfs/v02/test10g2.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   21G   80G  21% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
tst:/test       200G   43G  158G  22% /mnt
===
# v02 grew by 10G, whereas GlusterFS grew by 20G.

# In other words, effective capacity = (smallest free space among the bricks in the set × number of bricks) × capacity efficiency...?

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
rm -f /gfs/v01/*.dat /gfs/v02/*.dat /gfs/v03/*.dat

# Distributed replicated case
# Create a 2 x 3 replica configuration
gluster volume create test replica 3 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
gluster volume start test
mount -t glusterfs tst:/test /mnt
df -h
===
/dev/vdb1       100G  135M  100G   1% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G  135M  100G   1% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       200G  2.3G  198G   2% /mnt
===

# Create a 10G file on v01
dd if=/dev/zero of=/gfs/v01/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G  135M  100G   1% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       200G   13G  188G   7% /mnt
===
# GlusterFS usage grew by 10G.

# Create a 10G file on v04 (in the second replica set)
dd if=/dev/zero of=/gfs/v04/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   11G   90G  11% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       200G   23G  178G  12% /mnt
===
# GlusterFS grew by 10G.

# Create a 10G file on each of v02 and v05
dd if=/dev/zero of=/gfs/v02/test10g.dat bs=1M count=10240
dd if=/dev/zero of=/gfs/v05/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   11G   90G  11% /gfs/v04
/dev/vdf1       100G   11G   90G  11% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       200G   23G  178G  12% /mnt
===
# GlusterFS usage did not grow.

# Add a 20G file to v04
dd if=/dev/zero of=/gfs/v04/test20g.dat bs=1M count=20480
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   31G   70G  31% /gfs/v04
/dev/vdf1       100G   11G   90G  11% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       200G   43G  158G  22% /mnt
===
# GlusterFS usage grew by 20G.

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
for vol in `seq -w 6`; do rm -f /gfs/v0${vol}/*.dat; done


# Distributed dispersed case
# Create a 2 x (2 + 1) dispersed configuration
gluster volume create test disperse 3 redundancy 1 tst:/gfs/v01/brick tst:/gfs/v02/brick tst:/gfs/v03/brick tst:/gfs/v04/brick tst:/gfs/v05/brick tst:/gfs/v06/brick force
gluster volume start test
mount -t glusterfs tst:/test /mnt
df -h
===
/dev/vdb1       100G  135M  100G   1% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G  135M  100G   1% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       400G  4.6G  396G   2% /mnt
===

# Create a 10G file on v01
dd if=/dev/zero of=/gfs/v01/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G  135M  100G   1% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       400G   25G  376G   7% /mnt
===
# GlusterFS usage grew by 20G.

# Create a 10G file on v04
dd if=/dev/zero of=/gfs/v04/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G  135M  100G   1% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   11G   90G  11% /gfs/v04
/dev/vdf1       100G  135M  100G   1% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       400G   45G  356G  12% /mnt
===
# GlusterFS usage grew by 20G.

# Create a 10G file on each of v02 and v05
dd if=/dev/zero of=/gfs/v02/test10g.dat bs=1M count=10240
dd if=/dev/zero of=/gfs/v05/test10g.dat bs=1M count=10240
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   11G   90G  11% /gfs/v04
/dev/vdf1       100G   11G   90G  11% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       400G   45G  356G  12% /mnt
===
# GlusterFS usage did not grow.

# Add a 20G file to v04
dd if=/dev/zero of=/gfs/v04/test20g.dat bs=1M count=20480
df -h
===
/dev/vdb1       100G   11G   90G  11% /gfs/v01
/dev/vdc1       100G   11G   90G  11% /gfs/v02
/dev/vdd1       100G  135M  100G   1% /gfs/v03
/dev/vde1       100G   31G   70G  31% /gfs/v04
/dev/vdf1       100G   11G   90G  11% /gfs/v05
/dev/vdg1       100G  135M  100G   1% /gfs/v06
tst:/test       400G   85G  316G  22% /mnt
===
# GlusterFS usage grew by 40G.

# Clean up
umount /mnt; gluster volume stop test; gluster volume delete test
for vol in `seq -w 12`; do rm -rf /gfs/v$vol/brick; mkdir /gfs/v$vol/brick; done
for vol in `seq -w 6`; do rm -f /gfs/v0${vol}/*.dat; done

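# The free space of a brick set is calculated from the brick in that set with the least free space.
# The free space of multiple brick sets is simply the sum of each set's free space.
# When one brick's free space shrinks, only the brick set it belongs to can be affected; the free space of the other sets is unchanged.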
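The rules above can be expressed as a small calculation. A sketch, assuming the free space of each brick is known in GB: each brick set is written as `data_bricks:free1,free2,...`, where data_bricks is 1 for a replica set and bricks minus redundancy for a disperse set. `volume_free_gb` is a hypothetical name; it ignores filesystem overhead, so it only approximates the df figures above.

```shell
# Hypothetical helper: free space (GB) of a volume, per the rules above.
# Each argument is one brick set: "data_bricks:free1,free2,..."
volume_free_gb() {
    total=0
    for grp in "$@"; do
        data=${grp%%:*}          # text before the colon
        min=""
        for f in $(echo "${grp#*:}" | tr ',' ' '); do
            if [ -z "$min" ] || [ "$f" -lt "$min" ]; then min=$f; fi
        done
        total=$((total + min * data))   # smallest brick limits the whole set
    done
    echo "$total"
}

# Last distributed dispersed step above (v01..v03 free 90,90,100; v04..v06 free 70,90,100):
#   volume_free_gb 2:90,90,100 2:70,90,100   prints 320 (df showed 316G)
```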

Impressions

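I found that one volume cannot mix brick sets of different types. For example, brick set 1 cannot be replicated while brick set 2 is dispersed.
- In ZFS, for example, you can expand a pool by appending RAID-Z1 or mirror vdevs after a RAID-Z2 vdev, with no restriction on the number of devices.
  - I mention ZFS only because I am used to it; I do not intend a serious comparison between GlusterFS and ZFS.
I learned that a dispersed configuration requires thinking about the number of bricks and the number of redundancies.
- This is a stripe-size issue, and ZFS has a similar story. With systems of this kind, it is probably something to keep in mind at all times.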
