Note
This document is for a development version of Ceph.
Zoned Storage Support
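Zoned Storage is a class of storage devices that lets the host and the storage device cooperate to achieve higher storage capacity, higher throughput, and lower latency. The zoned storage interface is available today on Shingled Magnetic Recording (SMR) hard disks through the SCSI Zoned Block Commands (ZBC) and Zoned Device ATA Command Set (ZAC) standards, and it is also being adopted for NVMe solid-state drives through the upcoming NVMe Zoned Namespaces (ZNS) standard.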
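This project aims to enable Ceph to work on zoned storage drives while also exploring research problems related to adopting this new interface. The first target is to enable non-overwrite workloads (for example, RGW) on host-managed SMR (HM-SMR) drives and to explore cleaning (garbage collection) policies. HM-SMR drives are high-capacity hard disks with a ZBC/ZAC interface. The longer-term goal is to support ZNS SSDs as they become available, as well as overwrite workloads.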
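The first patch in the series enables writing data to HM-SMR drives. It introduces ZonedFreelistManager, a FreelistManager implementation that tracks the write pointer and the number of dead bytes for each zone, and so passes ZonedAllocator enough information to initialize the state of zones correctly. A new FreelistManager implementation is required because, on zoned devices, a region of disk can be in one of three states (empty, used, or dead), whereas the current BitmapFreelistManager tracks only two (empty and used). Tracking only two states is not enough to initialize the state of zones in ZonedAllocator accurately. The planned third patch will introduce a basic cleaner to serve as a baseline for further research.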
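The three-state argument can be illustrated with a small model. The sketch below is not Ceph code: `ZoneState`, `append`, `mark_dead`, and `reset` are hypothetical names chosen for illustration, and the 256 MiB zone size is an assumption taken from the drive used in this document. It only shows why a per-zone write pointer plus a dead-byte count captures information that a two-state (empty/used) view loses.

```python
from dataclasses import dataclass

MiB = 1024 * 1024
ZONE_SIZE = 256 * MiB  # assumption: 256 MiB zones, as on the drive shown in this document


@dataclass
class ZoneState:
    """Hypothetical per-zone record: a write pointer and a dead-byte count."""
    write_pointer: int = 0   # bytes written sequentially since the zone start
    num_dead_bytes: int = 0  # written bytes that are no longer referenced

    def append(self, length: int) -> int:
        """Write `length` bytes at the write pointer and return their offset."""
        if length <= 0 or self.write_pointer + length > ZONE_SIZE:
            raise ValueError("write does not fit in the zone")
        offset = self.write_pointer
        self.write_pointer += length
        return offset

    def mark_dead(self, length: int) -> None:
        """Record that `length` already-written bytes are no longer needed."""
        if length <= 0 or self.num_dead_bytes + length > self.write_pointer:
            raise ValueError("cannot kill more bytes than were written")
        self.num_dead_bytes += length

    def reset(self) -> None:
        """Model a zone reset: the whole zone becomes empty again."""
        self.write_pointer = 0
        self.num_dead_bytes = 0

    @property
    def empty_bytes(self) -> int:
        # Only space past the write pointer can accept new writes.
        return ZONE_SIZE - self.write_pointer

    @property
    def used_bytes(self) -> int:
        # Written bytes that are still live.
        return self.write_pointer - self.num_dead_bytes


zone = ZoneState()
zone.append(4 * MiB)     # first object
zone.append(4 * MiB)     # second object
zone.mark_dead(4 * MiB)  # first object deleted

# A two-state view that folds "dead" into "empty" would overestimate
# the writable space, because dead bytes sit below the write pointer.
two_state_free = ZONE_SIZE - zone.used_bytes
writable = zone.empty_bytes
```

In this model, `two_state_free` is 252 MiB while only 248 MiB is writable: the 4 MiB of dead space cannot be reused until the zone is reset, which is the kind of distinction a cleaner would rely on.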
Currently, we can run basic RADOS benchmarks on an OSD backed by an HM-SMR drive, restart the OSD, read back the written data, and write new data, as shown below.
Please contact Abutalib Aghayev <agayev@psu.edu> with any questions.
$ sudo zbd report -i -n /dev/sdc
Device /dev/sdc:
Vendor ID: ATA HGST HSH721414AL T240
Zone model: host-managed
Capacity: 14000.520 GB (27344764928 512-bytes sectors)
Logical blocks: 3418095616 blocks of 4096 B
Physical blocks: 3418095616 blocks of 4096 B
Zones: 52156 zones of 256.0 MB
Maximum number of open zones: no limit
Maximum number of active zones: no limit
52156 / 52156 zones
$ MON=1 OSD=1 MDS=0 sudo ../src/vstart.sh --new --localhost --bluestore --bluestore-devs /dev/sdc --bluestore-zoned
<snipped verbose output>
$ sudo ./bin/ceph osd pool create bench 32 32
pool 'bench' created
$ sudo ./bin/rados bench -p bench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_h0.cc.journaling712.narwhal.p_29846
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        45        29   115.943       116    0.384175    0.407806
    2      16        86        70   139.949       164    0.259845    0.391488
    3      16       125       109   145.286       156     0.31727    0.404727
    4      16       162       146   145.953       148    0.826671    0.409003
    5      16       203       187   149.553       164     0.44815    0.404303
    6      16       242       226   150.621       156    0.227488    0.409872
    7      16       281       265   151.384       156    0.411896    0.408686
    8      16       320       304   151.956       156    0.435135    0.411473
    9      16       359       343   152.401       156    0.463699    0.408658
   10      15       396       381   152.356       152    0.409554    0.410851
Total time run: 10.3305
Total writes made: 396
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 153.333
Stddev Bandwidth: 13.6561
Max bandwidth (MB/sec): 164
Min bandwidth (MB/sec): 116
Average IOPS: 38
Stddev IOPS: 3.41402
Max IOPS: 41
Min IOPS: 29
Average Latency(s): 0.411226
Stddev Latency(s): 0.180238
Max latency(s): 1.00844
Min latency(s): 0.108616
$ sudo ../src/stop.sh
$ # Notice the lack of "--new" parameter to vstart.sh
$ MON=1 OSD=1 MDS=0 sudo ../src/vstart.sh --localhost --bluestore --bluestore-devs /dev/sdc --bluestore-zoned
<snipped verbose output>
$ sudo ./bin/rados bench -p bench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        61        45   179.903       180    0.117329    0.244067
    2      16       116       100   199.918       220    0.144162    0.292305
    3      16       174       158   210.589       232    0.170941    0.285481
    4      16       251       235   234.918       308    0.241175    0.256543
    5      16       316       300   239.914       260    0.206044    0.255882
    6      15       392       377   251.206       308    0.137972    0.247426
    7      15       458       443   252.984       264   0.0800146    0.245138
    8      16       529       513   256.346       280    0.103529    0.239888
    9      16       587       571   253.634       232    0.145535      0.2453
   10      15       646       631   252.254       240    0.837727    0.246019
Total time run: 10.272
Total reads made: 646
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 251.558
Average IOPS: 62
Stddev IOPS: 10.005
Max IOPS: 77
Min IOPS: 45
Average Latency(s): 0.249385
Max latency(s): 0.888654
Min latency(s): 0.0103208
$ sudo ./bin/rados bench -p bench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_h0.aa.journaling712.narwhal.p_64416
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        46        30   119.949       120     0.52627    0.396166
    2      16        82        66   131.955       144     0.48087    0.427311
    3      16       123       107   142.627       164      0.3287    0.420614
    4      16       158       142   141.964       140    0.405177    0.425993
    5      16       192       176   140.766       136    0.514565    0.425175
    6      16       224       208   138.635       128     0.69184    0.436672
    7      16       261       245   139.967       148    0.459929    0.439502
    8      16       301       285   142.468       160    0.250846    0.434799
    9      16       336       320   142.189       140    0.621686    0.435457
   10      16       374       358   143.166       152    0.460593    0.436384
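The figures in the transcript above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers that appear in the output of `zbd report` and in the two complete `rados bench` summaries. It assumes that the "256.0 MB" zone size means 256 MiB and that `rados bench` reports bandwidth in MiB per second; the helper name `bench_bandwidth` is made up for illustration.

```python
MiB = 1024 * 1024

# Values copied from the zbd report output.
sectors = 27344764928          # 512-byte sectors
logical_blocks = 3418095616    # 4096-byte blocks
zones = 52156
zone_size = 256 * MiB          # assumption: "256.0 MB" means 256 MiB

capacity_bytes = sectors * 512
capacity_gb = capacity_bytes / 1e9  # decimal gigabytes, as printed by zbd


def bench_bandwidth(ops: int, op_bytes: int, seconds: float) -> float:
    """Hypothetical helper: average bandwidth in MiB/s for `ops` operations."""
    return ops * op_bytes / seconds / MiB


# Values copied from the first write run and the random-read run.
write_bw = bench_bandwidth(396, 4194304, 10.3305)
read_bw = bench_bandwidth(646, 4194304, 10.272)

print(f"capacity: {capacity_gb:.3f} GB")
print(f"zones cover capacity exactly: {zones * zone_size == capacity_bytes}")
print(f"write bandwidth: {write_bw:.2f} MiB/s")
print(f"read bandwidth: {read_bw:.2f} MiB/s")
```

Under these assumptions the zones tile the drive exactly (52156 zones of 256 MiB equal the reported capacity), and the recomputed bandwidths agree with the reported 153.333 and 251.558 to within rounding of the printed run time.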