ceph-objectstore-tool Usage Notes

References:
https://github.com/ceph/ceph/blob/master/doc/man/8/ceph-objectstore-tool.rst
https://github.com/ceph/ceph/blob/master/src/tools/ceph_objectstore_tool.cc

ceph-objectstore-tool is a tool shipped with Ceph for operating on PGs and the objects inside them, working directly against an (offline) OSD's data store. It can manipulate object contents, remove objects, list omap entries, and get/set omap headers, omap keys, and object attributes.

1. Overview

[root@localhost build]# ./bin/ceph-objectstore-tool -h
Must provide --data-path
Allowed options:
  --help                      produce help message
  --type arg                  Arg is one of [bluestore (default), filestore, memstore]
  --data-path arg             path to object store, mandatory (typically /var/lib/ceph/osd/ceph-<id>)
  --journal-path arg          path to journal, use if tool can't find it (filestore only)
  --pgid arg                  PG id, mandatory for info, log, remove, export, export-remove, mark-complete, trim-pg-log, and mandatory for apply-layout-settings if --pool is not specified
  --pool arg                  Pool name, mandatory for apply-layout-settings if --pgid is not specified
  --op arg                    Arg is one of [info, log, remove, mkfs, fsck, repair, fuse, dup, export, export-remove, import, list, fix-lost, list-pgs, dump-journal, dump-super, meta-list, get-osdmap, set-osdmap, get-inc-osdmap, set-inc-osdmap, mark-complete, reset-last-complete, apply-layout-settings, update-mon-db, dump-export, trim-pg-log]
  --epoch arg                 epoch# for get-osdmap and get-inc-osdmap, the current epoch in use if not specified
  --file arg                  path of file to export, export-remove, import, get-osdmap, set-osdmap, get-inc-osdmap or set-inc-osdmap
  --mon-store-path arg        path of monstore to update-mon-db
  --fsid arg                  fsid for new store created by mkfs
  --target-data-path arg      path of target object store (for --op dup)
  --mountpoint arg            fuse mountpoint
  --format arg (=json-pretty) Output format which may be json, json-pretty, xml, xml-pretty
  --debug                     Enable diagnostic output to stderr
  --force                     Ignore some types of errors and proceed with operation - USE WITH CAUTION: CORRUPTION POSSIBLE NOW OR IN THE FUTURE
  --skip-journal-replay       Disable journal replay
  --skip-mount-omap           Disable mounting of omap
  --head                      Find head/snapdir when searching for objects by name
  --dry-run                   Don't modify the objectstore
  --namespace arg             Specify namespace when searching for objects
  --rmtype arg                Specify corrupting object removal 'snapmap' or 'nosnapmap' - TESTING USE ONLY

Positional syntax:
ceph-objectstore-tool ... <object> (get|set)-bytes [file]
ceph-objectstore-tool ... <object> set-(attr|omap) <key> [file]
ceph-objectstore-tool ... <object> (get|rm)-(attr|omap) <key>
ceph-objectstore-tool ... <object> get-omaphdr
ceph-objectstore-tool ... <object> set-omaphdr [file]
ceph-objectstore-tool ... <object> list-attrs
ceph-objectstore-tool ... <object> list-omap
ceph-objectstore-tool ... <object> remove|removeall
ceph-objectstore-tool ... <object> dump
ceph-objectstore-tool ... <object> set-size
ceph-objectstore-tool ... <object> clear-data-digest
ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>

<object> can be a JSON object description as displayed by --op list.
<object> can be an object name which will be looked up in all the OSD's PGs.
<object> can be the empty string ('') which with a provided pgid specifies the pgmeta object.
The optional [file] argument will read stdin or write stdout if not specified or if '-' specified.

General form

ceph-objectstore-tool --data-path $PATH_TO_OSD --op $OPERATION

Before using the tool, set noout and stop the target OSD:

ceph osd set noout
systemctl stop ceph-osd@$OSD_NUMBER
systemctl status ceph-osd@$OSD_NUMBER

When finished, restart the OSD and clear noout:

systemctl restart ceph-osd@$OSD_NUMBER
ceph osd unset noout

Resolving the "recently crashed" warning:
HEALTH_WARN 5 daemons have recently crashed
ceph crash ls-new
ceph crash archive-all
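The noout/stop/restart bracket above can be scripted. A minimal Python sketch (hypothetical helper; it only builds the same `ceph`/`systemctl` command lines shown above, without executing them):

```python
def osd_maintenance_commands(osd_number):
    """Return (pre, post) command lists for safely taking osd.<n> offline."""
    unit = f"ceph-osd@{osd_number}"
    pre = [
        ["ceph", "osd", "set", "noout"],   # keep CRUSH from rebalancing data away
        ["systemctl", "stop", unit],       # stop the daemon so the store is unmounted
        ["systemctl", "status", unit],     # confirm it is actually down
    ]
    post = [
        ["systemctl", "restart", unit],    # bring the OSD back
        ["ceph", "osd", "unset", "noout"], # allow normal recovery again
    ]
    return pre, post
```

Each entry can be passed to `subprocess.run` once you are ready to execute the sequence for real.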

2. Examples

List all objects on an OSD

Each object is printed as [pgid, {oid, object info}].

ceph-objectstore-tool --data-path $PATH_TO_OSD --op list

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list
...
["2.7",{"oid":"rbd_data.20e5ff0224ec0.00000000000000a0","key":"","snapid":-2,"hash":467323015,"max":0,"pool":2,"namespace":"","max":0}]
["2.3a",{"oid":"rbd_info","key":"","snapid":-2,"hash":2886620986,"max":0,"pool":2,"namespace":"","max":0}]
["2.33",{"oid":"rbd_data.20e5ff0224ec0.0000000000000000","key":"","snapid":-2,"hash":2764933619,"max":0,"pool":2,"namespace":"","max":0}]
["2.25",{"oid":"rbd_id.rbd-pool-image-1","key":"","snapid":-2,"hash":2198578149,"max":0,"pool":2,"namespace":"","max":0}]
["2.1c",{"oid":"rbd_directory","key":"","snapid":-2,"hash":816417820,"max":0,"pool":2,"namespace":"","max":0}]
["2.1d",{"oid":"rbd_header.20e5ff0224ec0","key":"","snapid":-2,"hash":1624672572,"max":0,"pool":2,"namespace":"","max":0}]
...
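Each line of the listing is itself a JSON array, so it can be post-processed directly. A minimal Python sketch (hypothetical helper, not part of the tool) that pairs each pgid with its oid:

```python
import json

def parse_list_output(lines):
    """Parse `--op list` output: each line is a JSON array [pgid, object-id dict]."""
    objects = []
    for line in lines:
        line = line.strip()
        if not line.startswith("["):
            continue  # skip noise such as log lines or '...'
        pgid, obj = json.loads(line)
        objects.append((pgid, obj["oid"]))
    return objects
```

This is handy for piping the tool's output into further scripting, e.g. finding which PG holds a given rbd object.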

List all objects in a PG

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op list

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.68 --op list
["1.68",{"oid":"benchmark_data_node-1_2694_object67","key":"","snapid":-2,"hash":1705301608,"max":0,"pool":1,"namespace":"","max":0}]
...

Look up an object

You can pass the object id directly, or grep through the full listing.

ceph-objectstore-tool --data-path $PATH_TO_OSD --op list $OBJECT_ID

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list benchmark_data_node-1_2694_object61
["1.26",{"oid":"benchmark_data_node-1_2694_object61","key":"","snapid":-2,"hash":1072655526,"max":0,"pool":1,"namespace":"","max":0}]

Dump detailed object info

$OBJECT can be a JSON object description or a plain oid, as the help text notes.

ceph-objectstore-tool --data-path $PATH_TO_OSD $OBJECT dump

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 dump
{"id": {"oid": "rbd_header.20e5ff0224ec0","key": "","snapid": -2,"hash": 1624672572,"max": 0,"pool": 2,"namespace": "","max": 0},"info": {"oid": {"oid": "rbd_header.20e5ff0224ec0","key": "","snapid": -2,"hash": 1624672572,"max": 0,"pool": 2,"namespace": ""},"version": "137'29","prior_version": "137'28","last_reqid": "osd.1.0:2","user_version": 27,"size": 0,"mtime": "2021-05-27 09:33:24.367195","local_mtime": "2021-05-27 09:33:24.422271","lost": 0,"flags": ["dirty","omap","data_digest","omap_digest"],"truncate_seq": 0,"truncate_size": 0,"data_digest": "0xffffffff","omap_digest": "0x4bbef111","expected_object_size": 0,"expected_write_size": 0,"alloc_hint_flags": 0,"manifest": {"type": 0},"watchers": {}},"stat": {"size": 0,"blksize": 4096,"blocks": 0,"nlink": 1},"SnapSet": {"snap_context": {"seq": 0,"snaps": []},"clones": []}
}

Repair lost objects (fixme)

Note: the fix-lost feature is still incomplete and not usable for now.

# all lost objects on the OSD
ceph-objectstore-tool --data-path $PATH_TO_OSD --op fix-lost
# only lost objects in one PG
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op fix-lost
# a single lost object
ceph-objectstore-tool --data-path $PATH_TO_OSD --op fix-lost $OBJECT_ID

Get object contents

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-bytes > $OBJECT_FILE_NAME

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-bytes > rbd_header
[root@node-1 ~]# ls -al rbd_header
-rw-r--r-- 1 root root 4194304 Jun  9 14:54 rbd_header

Set object contents

Combined with get-bytes, this can be used to replace a corrupted object.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-bytes < $OBJECT_FILE_NAME

[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-bytes < zone_info.default.working-copy

Remove an object

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT remove

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object67 remove
remove #1:166b25a6:::benchmark_data_node-1_2694_object67:head#

List object map keys

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT list-omap

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 list-omap
access_timestamp
create_timestamp
features
flags
modify_timestamp
object_prefix
order
size
snap_seq

Get an object map value

A key is required; use list-omap to see all keys first.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omap $KEY [> $OBJECT_MAP_FILE_NAME]

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 get-omap object_prefix
Base64:FgAAAHJiZF9kYXRhLjIwZTVmZjAyMjRlYzA=
[root@node-1 ~]# echo FgAAAHJiZF9kYXRhLjIwZTVmZjAyMjRlYzA= | base64 -d
rbd_data.20e5ff0224ec0
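The tool prints binary values as a `Base64:` line, as above. A small Python sketch (hypothetical helper) that decodes such a line; note that this particular omap value starts with a 4-byte little-endian length prefix (0x16 = 22, the length of the string) before the text, which `base64 -d` also emits but which is invisible in the terminal:

```python
import base64

def decode_tool_output(line):
    """Decode a `Base64:<payload>` line as printed by get-omap/get-attr/get-omaphdr."""
    prefix = "Base64:"
    if not line.startswith(prefix):
        raise ValueError("not a Base64 output line")
    return base64.b64decode(line[len(prefix):])
```

Decoding in a script rather than with `echo | base64 -d` makes the non-printable framing bytes visible and easy to strip.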

Set an object map value

This overwrites the existing value. To append to or modify the value instead, first fetch the current value with get-omap.

Note: the key and the value file path are both required.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omap $KEY $OBJECT_MAP_FILE_NAME

[root@node-1 ~]# vi my_omap_value
this is my omap value
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-omap my_omap_key my_omap_value
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omap my_omap_key
Base64:dGhpcyBpcyBteSBvbWFwIHZhbHVlCg==
[root@node-1 ~]# echo dGhpcyBpcyBteSBvbWFwIHZhbHVlCg== | base64 -d
this is my omap value

Remove an object map key

The key is required.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-omap $KEY

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 rm-omap my_omap_key
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omap my_omap_key
Key not found

Get the object map header

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omaphdr [> $OBJECT_MAP_FILE_NAME]

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omaphdr
Base64:dGhpcyBpcyBvbWFwIGhlYWRlcgo=
[root@node-1 ~]# echo dGhpcyBpcyBvbWFwIGhlYWRlcgo= | base64 -d
this is omap header

Set the object map header

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omaphdr [< $OBJECT_MAP_FILE_NAME]

[root@node-1 ~]# cat omaphdr
this is omap header
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-omaphdr < omaphdr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omaphdr
Base64:dGhpcyBpcyBvbWFwIGhlYWRlcgo=
[root@node-1 ~]# echo dGhpcyBpcyBvbWFwIGhlYWRlcgo= | base64 -d
this is omap header

List object attr keys

List all keys of the object's xattrs.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT list-attrs

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 list-attrs
_
snapset

Get an object attr

Specify the object and the xattr key.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-attr $KEY [> $OBJECT_ATTRS_FILE_NAME]

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr _
Base64:EQgcAQAABANEAAAAAAAAACMAAABiZW5jaG1hcmtfZGF0YV9ub2RlLTFfMjY5NF9vYmplY3Q2Mf7/pmzvPwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP8AAAAAAAAAAP//AAAAAAIAAAAAAAAAMgAAAAEAAAAAAAAAHQAAAAICFQAAAAQCAAAAAAAAAAcAAAAAAAAAAAAAAAAAQAAAAAAAv9eYYNTq8SUCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAADQAAAC/15hgasR6MwnBjuz/AABAAAAAAAAAAEAAAAAAADUAAAA=

Set an object attr

Note: the oid, the key, and the value file path are all required.

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-attr $KEY $OBJECT_ATTRS_FILE_NAME

[root@node-1 ~]# vi my_xattr
this is my xattr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-attr my_xattr_key my_xattr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr my_xattr_key
Base64:dGhpcyBpcyBteSB4YXR0cgo=
[root@node-1 ~]# echo dGhpcyBpcyBteSB4YXR0cgo= | base64 -d
this is my xattr

Remove an object attr

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-attr $KEY

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 rm-attr my_xattr_key
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr my_xattr_key
getattr: (61) No data available

fsck | repair

ceph-objectstore-tool --data-path $PATH_TO_OSD --op [fsck|repair]

mkfs + dup: copy out an entire OSD (block data, keyring, fsid, type, etc.)

This combination can clone a complete OSD.

First run mkfs. The data path is required; --type is optional (default bluestore) and --fsid is optional (randomly generated by default).

[root@node-1 ~]# ceph-objectstore-tool --data-path /root/osd.dir/ --op mkfs
failed to fetch mon config (--no-mon-config to skip)
[root@node-1 ~]# ceph-objectstore-tool --data-path /root/osd.dir/ --op mkfs --no-mon-config

Then use dup to copy the OSD, giving the source and target paths.

My dup attempt failed; the error output is below.

[root@node-1 osd.dir]#  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --target-data-path ~/osd.dir/ --op dup
dup from bluestore: /var/lib/ceph/osd/ceph-0/ to bluestore: /root/osd.dir/
src fsid 9912f587-6c2c-4098-8635-b97fd46f721e != dest 2c65c351-a968-4dee-b97a-82e9107ef749
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7fcfc160f780 time 2021-06-10 10:34:15.927047
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/os/bluestore/Allocator.cc: 43: FAILED ceph_assert(r == 0)
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fcfb75ba2d5]
 2: (()+0x25449d) [0x7fcfb75ba49d]
 3: (()+0x925a75) [0x561b5b38fa75]
 4: (BitmapAllocator::~BitmapAllocator()+0x12f) [0x561b5b3ded7f]
 5: (BlueFS::_stop_alloc()+0xb3) [0x561b5b39f853]
 6: (BlueFS::umount(bool)+0x13e) [0x561b5b3b9e6e]
 7: (BlueStore::_close_bluefs(bool)+0x11) [0x561b5b29a401]
 8: (BlueStore::_close_db_and_around(bool)+0x91) [0x561b5b31dac1]
 9: (BlueStore::umount()+0x299) [0x561b5b31e4c9]
 10: (dup(std::string, ObjectStore*, std::string, ObjectStore*)+0x39c) [0x561b5ae20e4c]
 11: (main()+0x3139) [0x561b5ade1789]
 12: (__libc_start_main()+0xf5) [0x7fcfb444d555]
 13: (()+0x3a52a0) [0x561b5ae0f2a0]
*** Caught signal (Aborted) **
 in thread 7fcfc160f780 thread_name:ceph-objectstor
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0xf630) [0x7fcfb5a8f630]
 2: (gsignal()+0x37) [0x7fcfb4461387]
 3: (abort()+0x148) [0x7fcfb4462a78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7fcfb75ba324]
 5: (()+0x25449d) [0x7fcfb75ba49d]
 6: (()+0x925a75) [0x561b5b38fa75]
 7: (BitmapAllocator::~BitmapAllocator()+0x12f) [0x561b5b3ded7f]
 8: (BlueFS::_stop_alloc()+0xb3) [0x561b5b39f853]
 9: (BlueFS::umount(bool)+0x13e) [0x561b5b3b9e6e]
 10: (BlueStore::_close_bluefs(bool)+0x11) [0x561b5b29a401]
 11: (BlueStore::_close_db_and_around(bool)+0x91) [0x561b5b31dac1]
 12: (BlueStore::umount()+0x299) [0x561b5b31e4c9]
 13: (dup(std::string, ObjectStore*, std::string, ObjectStore*)+0x39c) [0x561b5ae20e4c]
 14: (main()+0x3139) [0x561b5ade1789]
 15: (__libc_start_main()+0xf5) [0x7fcfb444d555]
 16: (()+0x3a52a0) [0x561b5ae0f2a0]
Aborted

fuse: inspect the store's objects through a FUSE mount

# terminal 1
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op fuse --mountpoint /mnt/ceph-osd@0/
mounting fuse at /mnt/ceph-osd@0/ ...

# terminal 2
[root@node-1 mnt]# df
Filesystem     1K-blocks    Used Available Use% Mounted on
foo             10481664 3820032   6661632  37% /mnt/ceph-osd@0
# remember to unmount when finished
[root@node-1 mnt]# umount /mnt/ceph-osd\@0/

Dump the superblock

ceph-objectstore-tool --data-path $PATH_TO_OSD --op dump-super

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op dump-super
{"cluster_fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","osd_fsid": "9912f587-6c2c-4098-8635-b97fd46f721e","whoami": 0,"current_epoch": 156,"oldest_map": 1,"newest_map": 156,"weight": 0,"compat": {"compat": {},"ro_compat": {},"incompat": {"feature_1": "initial feature set(~v.18)","feature_2": "pginfo object","feature_3": "object locator","feature_4": "last_epoch_clean","feature_5": "categories","feature_6": "hobjectpool","feature_7": "biginfo","feature_8": "leveldbinfo","feature_9": "leveldblog","feature_10": "snapmapper","feature_11": "sharded objects","feature_12": "transaction hints","feature_13": "pg meta object","feature_14": "explicit missing set","feature_15": "fastinfo pg attr","feature_16": "deletes in missing set"}},"clean_thru": 156,"last_epoch_mounted": 154
}

List the PGs on this OSD

ceph-objectstore-tool --data-path $PATH_TO_OSD --op list-pgs

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list-pgs
2.e
2.d
2.c
2.a
2.9
2.8
2.7
2.3f
2.f
2.3e
2.3a
...
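A listing like the one above can be grouped by pool id (the part of each pgid before the dot). A small Python sketch, assuming one pgid per line as shown:

```python
from collections import defaultdict

def pgs_by_pool(lines):
    """Group `--op list-pgs` output (one pgid per line, e.g. '2.e') by pool id."""
    pools = defaultdict(list)
    for line in lines:
        line = line.strip()
        # skip blanks and non-pgid noise such as '...'
        if not line or "." not in line or line.startswith("."):
            continue
        pool, _, _ = line.partition(".")
        pools[pool].append(line)
    return dict(pools)
```

This makes it easy to see how many PGs of each pool landed on the OSD.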

info | log: query PG info

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op [info|log]

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.2 --op log
{"pg_log_t": {"head": "50'2","tail": "0'0","log": [{"op": "modify","object": "1:416569a2:::benchmark_data_node-1_2694_object81:head","version": "29'1","prior_version": "0'0","reqid": "client.24213.0:82","extra_reqids": [],"mtime": "2021-05-10 14:50:40.083127","return_code": 0,"mod_desc": {"object_mod_desc": {"can_local_rollback": false,"rollback_info_completed": false,"ops": []}}},{"op": "modify","object": "1:416569a2:::benchmark_data_node-1_2694_object81:head","version": "50'2","prior_version": "29'1","reqid": "osd.1.0:9","extra_reqids": [],"mtime": "0.000000","return_code": 0,"mod_desc": {"object_mod_desc": {"can_local_rollback": false,"rollback_info_completed": false,"ops": []}}}],"dups": []},"pg_missing_t": {"missing": [],"may_include_deletes": true}
}
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.2 --op info
{"pgid": "1.2","last_update": "50'2","last_complete": "50'2","log_tail": "0'0","last_user_version": 1,"last_backfill": "MAX","last_backfill_bitwise": 0,"purged_snaps": [],"history": {"epoch_created": 18,"epoch_pool_created": 18,"last_epoch_started": 324,"last_interval_started": 323,"last_epoch_clean": 324,"last_interval_clean": 323,"last_epoch_split": 0,"last_epoch_marked_full": 0,"same_up_since": 323,"same_interval_since": 323,"same_primary_since": 322,"last_scrub": "50'2","last_scrub_stamp": "2021-07-08 16:27:39.579601","last_deep_scrub": "50'2","last_deep_scrub_stamp": "2021-07-08 16:27:39.579601","last_clean_scrub_stamp": "2021-07-08 16:27:39.579601"},"stats": {"version": "50'2","reported_seq": "189","reported_epoch": "321","state": "unknown","last_fresh": "2021-07-13 14:36:21.560457","last_change": "2021-07-13 14:36:21.560457","last_active": "2021-06-23 14:50:51.902398","last_peered": "2021-06-23 14:49:25.437981","last_clean": "2021-06-23 14:49:25.437981","last_became_active": "2021-06-23 14:44:21.614072","last_became_peered": "2021-06-23 14:44:21.614072","last_unstale": "2021-07-13 14:36:21.560457","last_undegraded": "2021-07-13 14:36:21.560457","last_fullsized": "2021-07-13 14:36:21.560457","mapping_epoch": 323,"log_start": "0'0","ondisk_log_start": "0'0","created": 18,"last_epoch_clean": 311,"parent": "0.0","parent_split_bits": 0,"last_scrub": "50'2","last_scrub_stamp": "2021-07-08 16:27:39.579601","last_deep_scrub": "50'2","last_deep_scrub_stamp": "2021-07-08 16:27:39.579601","last_clean_scrub_stamp": "2021-07-08 16:27:39.579601","log_size": 2,"ondisk_log_size": 2,"stats_invalid": false,"dirty_stats_invalid": false,"omap_stats_invalid": false,"hitset_stats_invalid": false,"hitset_bytes_stats_invalid": false,"pin_stats_invalid": false,"manifest_stats_invalid": false,"snaptrimq_len": 0,"stat_sum": {"num_bytes": 4194304,"num_objects": 1,"num_object_clones": 0,"num_object_copies": 3,"num_objects_missing_on_primary": 0,"num_objects_missing": 
0,"num_objects_degraded": 0,"num_objects_misplaced": 0,"num_objects_unfound": 0,"num_objects_dirty": 1,"num_whiteouts": 0,"num_read": 0,"num_read_kb": 0,"num_write": 1,"num_write_kb": 4096,"num_scrub_errors": 0,"num_shallow_scrub_errors": 0,"num_deep_scrub_errors": 0,"num_objects_recovered": 0,"num_bytes_recovered": 0,"num_keys_recovered": 0,"num_objects_omap": 0,"num_objects_hit_set_archive": 0,"num_bytes_hit_set_archive": 0,"num_flush": 0,"num_flush_kb": 0,"num_evict": 0,"num_evict_kb": 0,"num_promote": 0,"num_flush_mode_high": 0,"num_flush_mode_low": 0,"num_evict_mode_some": 0,"num_evict_mode_full": 0,"num_objects_pinned": 0,"num_legacy_snapsets": 0,"num_large_omap_objects": 0,"num_objects_manifest": 0,"num_omap_bytes": 0,"num_omap_keys": 0,"num_objects_repaired": 0},"up": [1,0,2],"acting": [1,0,2],"avail_no_missing": [],"object_location_counts": [],"blocked_by": [],"up_primary": 1,"acting_primary": 1,"purged_snaps": []},"empty": 0,"dne": 0,"incomplete": 0,"last_epoch_started": 324,"hit_set_history": {"current_last_update": "0'0","history": []}
}

export | export-remove: export a PG; dump-export: inspect an exported file

# export: export without removing; export-remove: export and then remove the PG from this OSD
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op [export|export-remove] --file export.file

[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.1 --op export --file pg2.1.file
Exporting 2.1 info 2.1( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/154)
Export successful
[root@node-1 ~]# ls
anaconda-ks.cfg  ceph-deploy  pg2.1.file

# Inspecting an export file requires exporting the PG first
ceph-objectstore-tool --file ./export.file --op dump-export

[root@node-1 ~]# ceph-objectstore-tool --file pg2.1.file --op dump-export
failed to fetch mon config (--no-mon-config to skip)
[root@node-1 ~]# ceph-objectstore-tool --file pg2.1.file  --op dump-export --no-mon-config
{"pgid": "2.1","cluster_fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","features": "compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}","metadata_section": {"pg_disk_version": 10,"map_epoch": 155,"OSDMap": {"epoch": 155,"fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","created": "2020-08-07 13:40:34.125175","modified": "2021-06-10 08:50:55.264485","last_up_change": "2021-06-10 08:50:54.258791","last_in_change": "2021-06-09 13:47:51.144852","flags": "sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit","flags_num": 5799936,"flags_set": ["pglog_hardlimit","purged_snapdirs","recovery_deletes","sortbitwise"],"crush_version": 7,"full_ratio": 0.94999998807907104,"backfillfull_ratio": 0......
}

Import a PG (use export-remove first, so the PG no longer exists on the OSD)

ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op import --file import.file

# the pgid must match
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op import --file pg2.1.file
specified pgid 2.0 does not match actual pgid 2.1
# the PG must not already exist
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.1 --op import --file pg2.1.file
get_pg_num_history pg_num_history pg_num_history(e156 pg_nums {1={18=128},2={64=64}} deleted_pools )
pgid 2.1 already exists
# export-remove, then import
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op export-remove --file pg2.1_repli.file
Exporting 2.0 info 2.0( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/152)
Export successful
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 2.0_head removing 2.0
Remove successful
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op import --file pg2.1_repli.file
get_pg_num_history pg_num_history pg_num_history(e156 pg_nums {1={18=128},2={64=64}} deleted_pools )
Importing pgid 2.0
write_pg epoch 155 info 2.0( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/152)
Import successful

3. Source analysis

The source lives at src/tools/ceph_objectstore_tool.cc.

main

int main(int argc, char **argv)
{
  // all options are kept in an options_description
  po::options_description desc("Allowed options");
  // option parsing
  desc.add_options()
    ("type", po::value<string>(&type),
     "Arg is one of [bluestore (default), filestore, memstore]")
    ...;
  vector<string> ceph_option_strings;
  po::variables_map vm;
  try {
    po::parsed_options parsed = po::command_line_parser(argc, argv)
      .options(all).allow_unregistered().positional(pd).run();
    po::store(parsed, vm);
    po::notify(vm);
    ceph_option_strings = po::collect_unrecognized(parsed.options,
                                                   po::include_positional);
  } catch (po::error &e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
  // argument validation
  ...
  // prepend to ceph_option_strings: -n osd.<whoami>, --osd-data <data-path>
  char fn[PATH_MAX];
  snprintf(fn, sizeof(fn), "%s/whoami", dpath.c_str());
  int fd = ::open(fn, O_RDONLY);
  if (fd >= 0) {
    bufferlist bl;
    bl.read_fd(fd, 64);
    string s(bl.c_str(), bl.length());
    int whoami = atoi(s.c_str());
    vector<string> tmp;
    // identify ourselves as this osd so we can auth and fetch our configs
    tmp.push_back("-n");
    tmp.push_back(string("osd.") + stringify(whoami));
    // populate osd_data so that the default keyring location works
    tmp.push_back("--osd-data");
    tmp.push_back(dpath);
    tmp.insert(tmp.end(), ceph_option_strings.begin(),
               ceph_option_strings.end());
    tmp.swap(ceph_option_strings);
  }
  // read the osd type
  snprintf(fn, sizeof(fn), "%s/type", dpath.c_str());
  ...
  // extra completeness checks for some special ops
  if (op == "fuse" && mountpoint.length() == 0) {
    cerr << "Missing fuse mountpoint" << std::endl;
    usage(desc);
    return 1;
  }
  ...
  // create the ObjectStoreTool
  ObjectStoreTool tool = ObjectStoreTool(file_fd, dry_run);
  ...
  // initialize the global context
  auto cct = global_init(NULL, ceph_options, CEPH_ENTITY_TYPE_OSD,
                         CODE_ENVIRONMENT_UTILITY_NODOUT, init_flags);
  common_init_finish(g_ceph_context);
  ...
  // create the object store handle (filestore | bluestore)
  ObjectStore *fs = ObjectStore::create(g_ceph_context, type, dpath, jpath, flags);
  int ret = fs->mount();
  // open the meta collection
  auto ch = fs->open_collection(coll_t::meta());
  ...
  // read the superblock
  std::unique_ptr<OSDSuperblock> superblock;
  if (!no_superblock) {
    superblock.reset(new OSDSuperblock);
    bufferlist::const_iterator p;
    ret = fs->read(ch, OSD_SUPERBLOCK_GOBJECT, 0, 0, bl);
    if (ret < 0) {
      cerr << "Failure to read OSD superblock: " << cpp_strerror(ret) << std::endl;
      goto out;
    }
    p = bl.cbegin();
    decode(*superblock, p);
  }
  // dispatch to the handler for the requested op
  ...
}

export

// Export file layout:
// |super-header|pg-begin|metadata|object|pg-end|
int ObjectStoreTool::do_export(
  ObjectStore *fs, coll_t coll, spg_t pgid,
  pg_info_t &info, epoch_t map_epoch, __u8 struct_ver,
  const OSDSuperblock &superblock,
  PastIntervals &past_intervals)
{
  PGLog::IndexedLog log;
  pg_missing_t missing;
  int ret = get_log(fs, struct_ver, pgid, info, log, missing);
  if (ret > 0)
    return ret;
  // write the super header to the export file
  write_super();
  // pg_begin carries the pgid and the superblock
  pg_begin pgb(pgid, superblock);
  // Special case: If replicated pg don't require the importing OSD to have shard feature
  if (pgid.is_no_shard()) {
    pgb.superblock.compat_features.incompat.remove(
      CEPH_OSD_FEATURE_INCOMPAT_SHARDS);
  }
  // write the pg_begin section
  ret = write_section(TYPE_PG_BEGIN, pgb, file_fd);
  if (ret)
    return ret;
  // The metadata_section is now before files, so import can detect
  // errors and abort without wasting time.
  metadata_section ms(struct_ver, map_epoch, info, log, past_intervals, missing);
  ret = add_osdmap(fs, ms);
  if (ret)
    return ret;
  // write the metadata_section
  ret = write_section(TYPE_PG_METADATA, ms, file_fd);
  if (ret)
    return ret;
  // export the contents of every object in the pg
  ret = export_files(fs, coll);
  if (ret) {
    cerr << "export_files error " << ret << std::endl;
    return ret;
  }
  // write pg_end
  ret = write_simple(TYPE_PG_END, file_fd);
  if (ret)
    return ret;
  return 0;
}
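The framing do_export writes (a typed section followed by its payload, terminated by TYPE_PG_END) can be illustrated with a toy Python sketch. The 1-byte type / 4-byte little-endian length layout below is an assumption for illustration only; the real on-disk encoding (super_header, the Ceph encodings of pg_begin and metadata_section) is more involved:

```python
import struct

# illustrative section type constants, mirroring the names in the C++ code
TYPE_PG_BEGIN, TYPE_PG_METADATA, TYPE_OBJECT_BEGIN, TYPE_PG_END = 1, 2, 3, 4

def write_section(buf, stype, payload=b""):
    """Append one <type:1><length:4 LE><payload> section to buf."""
    return buf + struct.pack("<BI", stype, len(payload)) + payload

def read_sections(buf):
    """Walk the buffer and return a list of (type, payload) tuples."""
    sections, off = [], 0
    while off < len(buf):
        stype, length = struct.unpack_from("<BI", buf, off)
        off += 5  # 1 byte type + 4 bytes length
        sections.append((stype, buf[off:off + length]))
        off += length
    return sections
```

Writing pg_begin, metadata, objects, and pg_end in order through write_section reproduces the |pg-begin|metadata|object|pg-end| shape of the export file.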

dump-export

int ObjectStoreTool::dump_export(Formatter *formatter)
{
  bufferlist ebl;
  pg_info_t info;
  PGLog::IndexedLog log;
  //bool skipped_objects = false;
  int ret = read_super();
  if (ret)
    return ret;
  if (sh.magic != super_header::super_magic) {
    cerr << "Invalid magic number" << std::endl;
    return -EFAULT;
  }
  if (sh.version > super_header::super_ver) {
    cerr << "Can't handle export format version=" << sh.version << std::endl;
    return -EINVAL;
  }
  formatter->open_object_section("Export");
  // First section must be TYPE_PG_BEGIN
  sectiontype_t type;
  ret = read_section(&type, &ebl);
  if (ret)
    return ret;
  if (type == TYPE_POOL_BEGIN) {
    cerr << "Dump of pool exports not supported" << std::endl;
    return -EINVAL;
  } else if (type != TYPE_PG_BEGIN) {
    cerr << "Invalid first section type " << std::to_string(type) << std::endl;
    return -EFAULT;
  }
  auto ebliter = ebl.cbegin();
  pg_begin pgb;
  pgb.decode(ebliter);
  spg_t pgid = pgb.pgid;
  formatter->dump_string("pgid", stringify(pgid));
  formatter->dump_string("cluster_fsid", stringify(pgb.superblock.cluster_fsid));
  formatter->dump_string("features", stringify(pgb.superblock.compat_features));
  bool done = false;
  bool found_metadata = false;
  metadata_section ms;
  bool objects_started = false;
  while (!done) {
    ret = read_section(&type, &ebl);
    if (ret)
      return ret;
    if (debug) {
      cerr << "dump_export: Section type " << std::to_string(type) << std::endl;
    }
    if (type >= END_OF_TYPES) {
      cerr << "Skipping unknown section type" << std::endl;
      continue;
    }
    switch (type) {
    case TYPE_OBJECT_BEGIN:
      if (!objects_started) {
        formatter->open_array_section("objects");
        objects_started = true;
      }
      ret = dump_object(formatter, ebl);
      if (ret) return ret;
      break;
    case TYPE_PG_METADATA:
      if (objects_started)
        cerr << "WARNING: metadata_section out of order" << std::endl;
      ret = dump_pg_metadata(formatter, ebl, ms);
      if (ret) return ret;
      found_metadata = true;
      break;
    case TYPE_PG_END:
      if (objects_started) {
        formatter->close_section();
      }
      done = true;
      break;
    default:
      cerr << "Unknown section type " << std::to_string(type) << std::endl;
      return -EFAULT;
    }
  }
  if (!found_metadata) {
    cerr << "Missing metadata section" << std::endl;
    return -EFAULT;
  }
  formatter->close_section();
  formatter->flush(cout);
  return 0;
}

fsck|fsck-deep|repair|repair-deep

// 调用  BlueStore::fsck() -> BlueStore::_fsck() 完成数据校验、修复int fsck(bool deep) override {return _fsck(deep ? FSCK_DEEP : FSCK_REGULAR, false);}int repair(bool deep) override {return _fsck(deep ? FSCK_DEEP : FSCK_REGULAR, true);}int quick_fix() override {return _fsck(FSCK_SHALLOW, true);}
/**
An overview for currently implemented repair logics
performed in fsck in two stages: detection(+preparation) and commit.
Detection stage (in processing order):(Issue -> Repair action to schedule)- Detect undecodable keys for Shared Blobs -> Remove- Detect undecodable records for Shared Blobs -> Remove (might trigger missed Shared Blob detection below)- Detect stray records for Shared Blobs -> Remove- Detect misreferenced pextents -> FixPrepare Bloom-like filter to track cid/oid -> pextent Prepare list of extents that are improperly referencedEnumerate Onode records that might use 'misreferenced' pextents(Bloom-like filter applied to reduce computation)Per each questinable Onode enumerate all blobs and identify broken ones (i.e. blobs having 'misreferences')Rewrite each broken blob data by allocating another extents and copying data thereIf blob is shared - unshare it and mark corresponding Shared Blob for removalRelease previously allocated spaceUpdate Extent Map- Detect missed Shared Blobs -> Recreate- Detect undecodable deferred transaction -> Remove- Detect Freelist Manager's 'false free' entries -> Mark as used- Detect Freelist Manager's leaked entries -> Mark as free- Detect statfs inconsistency - UpdateCommit stage (separate DB commit per each step):- Apply leaked FM entries fix- Apply 'false free' FM entries fix- Apply 'Remove' actions- Apply fix for misreference pextents- Apply Shared Blob recreate (can be merged with the step above if misreferences were dectected)- Apply StatFS update
*/
int BlueStore::_fsck(BlueStore::FSCKDepth depth, bool repair)
{
  dout(1) << __func__
          << (repair ? " repair" : " check")
          << (depth == FSCK_DEEP ? " (deep)" :
              depth == FSCK_SHALLOW ? " (shallow)" : " (regular)")
          << dendl;

  // in deep mode we need R/W access to be able to replay deferred ops
  bool read_only = !(repair || depth == FSCK_DEEP);

  int r = _open_db_and_around(read_only);
  if (r < 0)
    return r;

  if (!read_only) {
    r = _upgrade_super();
    if (r < 0) {
      goto out_db;
    }
  }

  r = _open_collections();
  if (r < 0)
    goto out_db;

  mempool_thread.init();

  // we need finisher and kv_{sync,finalize}_thread *just* for replay
  // enable in repair or deep mode only
  if (!read_only) {
    _kv_start();
    r = _deferred_replay();
    _kv_stop();
  }
  if (r < 0)
    goto out_scan;

  // check and repair metadata; see src/os/bluestore/BlueStore.cc for details
  r = _fsck_on_open(depth, repair);

out_scan:
  mempool_thread.shutdown();
  _shutdown_cache();
out_db:
  _close_db_and_around(false);
  return r;
}

mkfs

// fs->mkfs();
int BlueStore::mkfs()
{
  ...
  {
    // if mkfs already ran before, only do an fsck check
    r = read_meta("mkfs_done", &done);
    ...
    r = fsck(cct->_conf->bluestore_fsck_on_mkfs_deep);
    ...
    return r; // idempotent
  }
  // write the "type" metadata under /osd-data-path/, set to bluestore
  {
    ...
    r = read_meta("type", &type);
    if (r == 0) {
      if (type != "bluestore") {
        derr << __func__ << " expected bluestore, but type is " << type << dendl;
        return -EIO;
      }
    } else {
      r = write_meta("type", "bluestore");
      if (r < 0)
        return r;
    }
  }

  freelist_type = "bitmap";

  // open the device directory /osd-data-path/
  r = _open_path();
  if (r < 0)
    return r;
  // open/create /osd-data-path/fsid under the device directory
  r = _open_fsid(true);
  if (r < 0)
    goto out_path_fd;
  // lock the fsid
  r = _lock_fsid();
  if (r < 0)
    goto out_close_fsid;
  // read the fsid; if there is none, generate one
  r = _read_fsid(&old_fsid);
  if (r < 0 || old_fsid.is_zero()) {
    if (fsid.is_zero()) {
      fsid.generate_random(); // generate a random fsid
      dout(1) << __func__ << " generated fsid " << fsid << dendl;
    } else {
      dout(1) << __func__ << " using provided fsid " << fsid << dendl;
    }
    // we'll write it later.
  } else {
    if (!fsid.is_zero() && fsid != old_fsid) {
      derr << __func__ << " on-disk fsid " << old_fsid
           << " != provided " << fsid << dendl;
      r = -EINVAL;
      goto out_close_fsid;
    }
    fsid = old_fsid;
  }

  // create the block file under /osd-data-path/, link it to the real
  // bluestore_block_path, and try to preallocate bluestore_block_size bytes
  r = _setup_block_symlink_or_file("block", cct->_conf->bluestore_block_path,
                                   cct->_conf->bluestore_block_size,
                                   cct->_conf->bluestore_block_create);
  if (r < 0)
    goto out_close_fsid;
  // if separate disks serve as wal and db devices, also create the
  // block.wal and block.db links and preallocate their space
  if (cct->_conf->bluestore_bluefs) {
    r = _setup_block_symlink_or_file("block.wal", cct->_conf->bluestore_block_wal_path,
                                     cct->_conf->bluestore_block_wal_size,
                                     cct->_conf->bluestore_block_wal_create);
    if (r < 0)
      goto out_close_fsid;
    r = _setup_block_symlink_or_file("block.db", cct->_conf->bluestore_block_db_path,
                                     cct->_conf->bluestore_block_db_size,
                                     cct->_conf->bluestore_block_db_create);
    if (r < 0)
      goto out_close_fsid;
  }

  // create and open the BlockDevice; types include pmem, kernel, ust-nvme.
  // Ceph has its own block device access paths: kernel devices, for example,
  // are driven directly with libaio, bypassing the filesystem.
  r = _open_bdev(true);
  if (r < 0)
    goto out_close_fsid;

  // choose min_alloc_size
  if (cct->_conf->bluestore_min_alloc_size) {
    min_alloc_size = cct->_conf->bluestore_min_alloc_size;
  } else {
    ceph_assert(bdev);
    if (bdev->is_rotational()) {
      min_alloc_size = cct->_conf->bluestore_min_alloc_size_hdd;
    } else {
      min_alloc_size = cct->_conf->bluestore_min_alloc_size_ssd;
    }
  }
  // verify the block device is large enough to enable bluefs
  _validate_bdev();
  // make sure min_alloc_size is power of 2 aligned.
  if (!isp2(min_alloc_size)) {
    ...
    goto out_close_bdev;
  }

  // open bluefs and its db, used to store metadata; usually rocksdb
  r = _open_db(true);
  if (r < 0)
    goto out_close_bdev;
  ...
  // record the kv_backend database type
  r = write_meta("kv_backend", cct->_conf->bluestore_kvbackend);
  if (r < 0)
    goto out_close_fm;
  // record whether bluefs replaces the filesystem (it almost always does)
  r = write_meta("bluefs", stringify(bluefs ? 1 : 0));
  if (r < 0)
    goto out_close_fm;
  // update the fsid
  if (fsid != old_fsid) {
    r = _write_fsid();
    if (r < 0) {
      derr << __func__ << " error writing fsid: " << cpp_strerror(r) << dendl;
      goto out_close_fm;
    }
  }
  if (out_of_sync_fm.fetch_and(0)) {
    _sync_bluefs_and_fm();
  }

out_close_fm:
  _close_fm();
out_close_db:
  _close_db();
out_close_bdev:
  _close_bdev();
out_close_fsid:
  _close_fsid();
out_path_fd:
  _close_path();

  if (r == 0 &&
      cct->_conf->bluestore_fsck_on_mkfs) {
    int rc = fsck(cct->_conf->bluestore_fsck_on_mkfs_deep);
    if (rc < 0)
      return rc;
    if (rc > 0) {
      derr << __func__ << " fsck found " << rc << " errors" << dendl;
      r = -EIO;
    }
  }
  if (r == 0) {
    // indicate success by writing the 'mkfs_done' file
    r = write_meta("mkfs_done", "yes");
  }
  if (r < 0) {
    derr << __func__ << " failed, " << cpp_strerror(r) << dendl;
  } else {
    dout(0) << __func__ << " success" << dendl;
  }
  return r;
}

dup

if (op == "dup") {
  string target_type;
  char fn[PATH_MAX];
  snprintf(fn, sizeof(fn), "%s/type", target_data_path.c_str());
  // read the target-path/type file to learn the target store type
  int fd = ::open(fn, O_RDONLY);
  bufferlist bl;
  bl.read_fd(fd, 64);
  if (bl.length()) {
    target_type = string(bl.c_str(), bl.length() - 1);  // drop \n
  }
  ::close(fd);
  ObjectStore *targetfs = ObjectStore::create(g_ceph_context, target_type,
                                              target_data_path, "", 0);
  if (targetfs == NULL) {
    cerr << "Unable to open store of type " << target_type << std::endl;
    return 1;
  }
  int r = dup(dpath, fs, target_data_path, targetfs);
  if (r < 0) {
    cerr << "dup failed: " << cpp_strerror(r) << std::endl;
    return 1;
  }
  return 0;
}

fuse

if (op == "fuse") {
#ifdef HAVE_LIBFUSE
  FuseStore fuse(fs, mountpoint);
  cout << "mounting fuse at " << mountpoint << " ..." << std::endl;
  // mount the objectstore through userspace libfuse
  int r = fuse.main();
  if (r < 0) {
    cerr << "failed to mount fuse: " << cpp_strerror(r) << std::endl;
    return 1;
  }
#else
  cerr << "fuse support not enabled" << std::endl;
#endif
  return 0;
}

int FuseStore::main()
{
  const char *v[] = {
    "foo",
    mount_point.c_str(),
    "-f",
    "-d", // debug
  };
  int c = 3;
  auto fuse_debug = store->cct->_conf.get_val<bool>("fuse_debug");
  if (fuse_debug)
    ++c;
  // call libfuse's fuse_main to mount our homemade filesystem
  return fuse_main(c, (char**)v, &fs_oper, (void*)this);
}

apply-layout-settings

import

int ObjectStoreTool::do_import(ObjectStore *store, OSDSuperblock &sb,
                               bool force, std::string pgidstr)
{
  bufferlist ebl;
  pg_info_t info;
  PGLog::IndexedLog log;
  bool skipped_objects = false;

  if (!dry_run)
    // recursively remove pg contents flagged is_temp || _has_remove_flag
    // OSD::recursive_remove_collection(g_ceph_context, store, pgid, *it);
    finish_remove_pgs(store);

  // read the super_header from the file being imported;
  // export previously wrote super_header, pg_begin, objects, pg_end, etc.
  int ret = read_super();

  // First section must be TYPE_PG_BEGIN
  sectiontype_t type;
  // read pg_begin
  ret = read_section(&type, &ebl);
  auto ebliter = ebl.cbegin();
  pg_begin pgb;
  pgb.decode(ebliter);
  spg_t pgid = pgb.pgid;

  if (pgidstr.length()) {
    spg_t user_pgid;
    // verify the pg id given on the command line matches the one in the file
    bool ok = user_pgid.parse(pgidstr.c_str());
    // This succeeded in main() already
    ceph_assert(ok);
    if (pgid != user_pgid) {
      cerr << "specified pgid " << user_pgid
           << " does not match actual pgid " << pgid << std::endl;
      return -EINVAL;
    }
  }

  // verify the cluster fsid matches; the imported file must come
  // from the same cluster
  if (!pgb.superblock.cluster_fsid.is_zero()
      && pgb.superblock.cluster_fsid != sb.cluster_fsid) {
    cerr << "Export came from different cluster with fsid "
         << pgb.superblock.cluster_fsid << std::endl;
    return -EINVAL;
  }

  // Special case: Old export has SHARDS incompat feature on replicated pg, remove it
  if (pgid.is_no_shard())
    pgb.superblock.compat_features.incompat.remove(CEPH_OSD_FEATURE_INCOMPAT_SHARDS);
  if (sb.compat_features.compare(pgb.superblock.compat_features) == -1) {
    CompatSet unsupported = sb.compat_features.unsupported(pgb.superblock.compat_features);
    cerr << "Export has incompatible features set " << unsupported << std::endl;
    // Let them import if they specify the --force option
    if (!force)
      return 11;  // Positive return means exit status
  }

  // we need the latest OSDMap to check for collisions
  OSDMap curmap;
  bufferlist bl;
  // fetch the osdmap
  ret = get_osdmap(store, sb.current_epoch, curmap, bl);
  pool_pg_num_history_t pg_num_history;
  get_pg_num_history(store, &pg_num_history);

  ghobject_t pgmeta_oid = pgid.make_pgmeta_oid();

  // Check for PG already present.
  coll_t coll(pgid);
  if (store->collection_exists(coll)) {
    cerr << "pgid " << pgid << " already exists" << std::endl;
    return -EEXIST;
  }

  // create pg and osdriver handles
  ObjectStore::CollectionHandle ch;
  OSDriver driver(store, coll_t(), OSD::make_snapmapper_oid());
  SnapMapper mapper(g_ceph_context, &driver, 0, 0, 0, pgid.shard);

  bool done = false;
  bool found_metadata = false;
  metadata_section ms;
  while (!done) {
    // read the section_header at the current file offset;
    // returns the type and its payload
    ret = read_section(&type, &ebl);
    // skip sections we do not recognize
    if (type >= END_OF_TYPES) {
      cout << "Skipping unknown section type" << std::endl;
      continue;
    }
    // depending on type, consume the payload: object, metadata, pg-end
    switch (type) {
    case TYPE_OBJECT_BEGIN:
      ceph_assert(found_metadata);
      // import the object contents
      ret = get_object(store, driver, mapper, coll, ebl, ms.osdmap,
                       &skipped_objects);
      if (ret) return ret;
      break;
    case TYPE_PG_METADATA:
      ret = get_pg_metadata(store, ebl, ms, sb, pgid);
      if (ret) return ret;
      found_metadata = true;
      if (pgid != ms.info.pgid) {
        cerr << "specified pgid " << pgid << " does not match import file pgid "
             << ms.info.pgid << std::endl;
        return -EINVAL;
      }
      // make sure there are no conflicting splits or merges
      if (ms.osdmap.have_pg_pool(pgid.pgid.pool())) {
        auto p = pg_num_history.pg_nums.find(pgid.pgid.m_pool);
        if (p != pg_num_history.pg_nums.end() &&
            !p->second.empty()) {
          unsigned start_pg_num = ms.osdmap.get_pg_num(pgid.pgid.pool());
          unsigned pg_num = start_pg_num;
          for (auto q = p->second.lower_bound(ms.map_epoch);
               q != p->second.end();
               ++q) {
            unsigned new_pg_num = q->second;
            cout << "pool " << pgid.pgid.pool() << " pg_num " << pg_num
                 << " -> " << new_pg_num << std::endl;
            // check for merge targets
            pg_t target;
            if (pgid.is_merge_source(pg_num, new_pg_num, &target)) {
              // FIXME: this check assumes the OSD's PG is at the OSD's
              // map epoch; it could be, say, at *our* epoch, pre-merge.
              coll_t coll(target);
              if (store->collection_exists(coll)) {
                cerr << "pgid " << pgid << " merges to target " << target
                     << " which already exists" << std::endl;
                return 12;
              }
            }
            // check for split children
            set<spg_t> children;
            if (pgid.is_split(start_pg_num, new_pg_num, &children)) {
              cerr << " children are " << children << std::endl;
              for (auto child : children) {
                coll_t coll(child);
                if (store->collection_exists(coll)) {
                  cerr << "pgid " << pgid << " splits to " << children
                       << " and " << child << " exists" << std::endl;
                  return 12;
                }
              }
            }
            pg_num = new_pg_num;
          }
        }
      } else {
        cout << "pool " << pgid.pgid.pool() << " doesn't exist, not checking"
             << " for splits or merges" << std::endl;
      }
      if (!dry_run) {
        ObjectStore::Transaction t;
        ch = store->create_new_collection(coll);
        create_pg_collection(t, pgid,
                             pgid.get_split_bits(ms.osdmap.get_pg_pool(pgid.pool())->get_pg_num()));
        init_pg_ondisk(t, pgid, NULL);
        // mark this coll for removal until we're done
        map<string, bufferlist> values;
        encode((char)1, values["_remove"]);
        t.omap_setkeys(coll, pgid.make_pgmeta_oid(), values);
        store->queue_transaction(ch, std::move(t));
      }
      break;
    // pg-end marks that the whole pg file has been imported
    case TYPE_PG_END:
      ceph_assert(found_metadata);
      done = true;
      break;
    default:
      cerr << "Unknown section type " << std::to_string(type) << std::endl;
      return -EFAULT;
    }
  }

  ObjectStore::Transaction t;
  if (!dry_run) {
    pg_log_t newlog, reject;
    pg_log_t::filter_log(pgid, ms.osdmap, g_ceph_context->_conf->osd_hit_set_namespace,
                         ms.log, newlog, reject);

    divergent_priors_t newdp, rejectdp;
    filter_divergent_priors(pgid, ms.osdmap, g_ceph_context->_conf->osd_hit_set_namespace,
                            ms.divergent_priors, newdp, rejectdp);
    ms.divergent_priors = newdp;

    ms.missing.filter_objects([&](const hobject_t &obj) {
      if (obj.nspace == g_ceph_context->_conf->osd_hit_set_namespace)
        return false;
      ceph_assert(!obj.is_temp());
      object_t oid = obj.oid;
      object_locator_t loc(obj);
      pg_t raw_pgid = ms.osdmap.object_locator_to_pg(oid, loc);
      pg_t _pgid = ms.osdmap.raw_pg_to_pg(raw_pgid);
      return pgid.pgid != _pgid;
    });

    // Just like a split, invalidate stats since the object count changed
    if (skipped_objects)
      ms.info.stats.stats_invalid = true;

    // import the metadata contents
    ret = write_pg(t, ms.map_epoch, ms.info, newlog, ms.past_intervals,
                   ms.divergent_priors, ms.missing);
    if (ret) return ret;
  }

  if (!dry_run) {
    t.omap_rmkey(coll, pgid.make_pgmeta_oid(), "_remove");
    wait_until_done(&t, [&] {
      store->queue_transaction(ch, std::move(t));
      // make sure we flush onreadable items before mapper/driver are destroyed.
      ch->flush();
    });
  }
  return 0;
}

get-osdmap | get-inc-osdmap

int get_osdmap(ObjectStore *store, epoch_t e, OSDMap &osdmap, bufferlist &bl)
{
  // get the collection handle
  ObjectStore::CollectionHandle ch = store->open_collection(coll_t::meta());
  // read the osdmap; reads do not go through a transaction.
  // OSD::get_inc_osdmap_pobject_name(e) yields the incremental osdmap object
  bool found = store->read(ch, OSD::get_osdmap_pobject_name(e), 0, 0, bl) >= 0;
  osdmap.decode(bl);
  return 0;
}

set-osdmap | set-inc-osdmap

if (op == "set-osdmap") {
  bufferlist bl;
  // read the osdmap file to be written
  ret = get_fd_data(file_fd, bl);
  if (ret < 0) {
    cerr << "Failed to read osdmap " << cpp_strerror(ret) << std::endl;
  } else {
    // install the osdmap
    ret = set_osdmap(fs, epoch, bl, force);
  }
  goto out;
}

int set_osdmap(ObjectStore *store, epoch_t e, bufferlist &bl, bool force)
{
  OSDMap osdmap;
  osdmap.decode(bl);
  // get the collection handle
  auto ch = store->open_collection(coll_t::meta());
  // object name of the osdmap for this epoch
  // const ghobject_t inc_oid = OSD::get_inc_osdmap_pobject_name(e);
  const ghobject_t full_oid = OSD::get_osdmap_pobject_name(e);
  // write the osdmap
  ObjectStore::Transaction t;
  t.write(coll_t::meta(), full_oid, 0, bl.length(), bl);
  t.truncate(coll_t::meta(), full_oid, bl.length());
  store->queue_transaction(ch, std::move(t));
  return 0;
}

update-mon-db

int update_mon_db(ObjectStore& fs, OSDSuperblock& sb,
                  const string& keyring, const string& store_path)
{
  MonitorDBStore ms(store_path);
  // open the mon db
  int r = ms.create_and_open(cerr);
  if (r < 0) {
    cerr << "unable to open mon store: " << store_path << std::endl;
    return r;
  }
  // update the keyring
  if ((r = update_auth(keyring, sb, ms)) < 0) {
    goto out;
  }
  // update the osdmap
  if ((r = update_osdmap(fs, sb, ms)) < 0) {
    goto out;
  }
  // update the monitor
  if ((r = update_monitor(sb, ms)) < 0) {
    goto out;
  }
out:
  ms.close();
  return r;
}

remove (export-remove is recommended)

// Please use export-remove, or you must use the --force option
int initiate_new_remove_pg(ObjectStore *store, spg_t r_pgid)
{
  if (!dry_run)
    finish_remove_pgs(store);
  if (!store->collection_exists(coll_t(r_pgid)))
    return -ENOENT;
  // dry-run mode: return 0 directly
  if (dry_run)
    return 0;
  ObjectStore::Transaction rmt;
  int r = mark_pg_for_removal(store, r_pgid, &rmt);
  if (r < 0) {
    return r;
  }
  ObjectStore::CollectionHandle ch = store->open_collection(coll_t(r_pgid));
  store->queue_transaction(ch, std::move(rmt));
  finish_remove_pgs(store);
  return r;
}

fix-lost (still to be done)

/* fixme: using full features */

list

int do_list(ObjectStore *store, string pgidstr, string object,
            boost::optional<std::string> nspace,
            Formatter *formatter, bool debug, bool human_readable, bool head)
{
  int r;
  lookup_ghobject lookup(object, nspace, head);
  if (pgidstr.length() > 0) {
    /**
     * auto ch = store->open_collection(coll);
     * ghobject_t next;
     * vector<ghobject_t> list;
     * int r = store->collection_list(ch, next, ghobject_t::get_max(),
     *                                LIST_AT_A_TIME, &list, &next);
     * retrieves all objects of the pg along with their information
     */
    r = action_on_all_objects_in_pg(store, pgidstr, lookup, debug);
  } else {
    r = action_on_all_objects(store, lookup, debug);
  }
  if (r)
    return r;
  lookup.dump(formatter, human_readable);
  formatter->flush(cout);
  return 0;
}

meta-list

// Same mechanism as list,
// but with the pg fixed to coll_t::meta(), the metadata pg

list-pgs

  ret = fs->list_collections(ls);
  // Find the pg(s)
  for (it = ls.begin(); it != ls.end(); ++it) {
    spg_t tmppgid;
    if (pgidstr == "meta") {
      if (it->to_str() == "meta")
        break;
      else
        continue;
    }
    if (!it->is_pg(&tmppgid)) {
      continue;
    }
    if (it->is_temp(&tmppgid)) {
      continue;
    }
    if (op != "list-pgs" && tmppgid != pgid) {
      continue;
    }
    if (op != "list-pgs") {
      // Found!
      break;
    }
    cout << tmppgid << std::endl;
  }
