ceph-objectstore-tool
ceph-objectstore-tool usage notes
References:
https://github.com/ceph/ceph/blob/master/doc/man/8/ceph-objectstore-tool.rst
https://github.com/ceph/ceph/blob/master/src/tools/ceph_objectstore_tool.cc
ceph-objectstore-tool is a tool Ceph provides for operating on PGs and the objects inside them.
It works directly on an OSD's data store: it can manipulate object contents, remove objects, list omap entries, manipulate the omap header and omap keys, and list and manipulate object attributes.
1. Introduction
[root@localhost build]# ./bin/ceph-objectstore-tool -h
Must provide --data-path
Allowed options:
  --help                       produce help message
  --type arg                   Arg is one of [bluestore (default), filestore, memstore]
  --data-path arg              path to object store, mandatory (usually /var/lib/ceph/osd/ceph-N)
  --journal-path arg           path to journal, use if tool can't find it (filestore only)
  --pgid arg                   PG id, mandatory for info, log, remove, export, export-remove, mark-complete, trim-pg-log, and mandatory for apply-layout-settings if --pool is not specified
  --pool arg                   Pool name, mandatory for apply-layout-settings if --pgid is not specified
  --op arg                     Arg is one of [info, log, remove, mkfs, fsck, repair, fuse, dup, export, export-remove, import, list, fix-lost, list-pgs, dump-journal, dump-super, meta-list, get-osdmap, set-osdmap, get-inc-osdmap, set-inc-osdmap, mark-complete, reset-last-complete, apply-layout-settings, update-mon-db, dump-export, trim-pg-log]
  --epoch arg                  epoch# for get-osdmap and get-inc-osdmap, the current epoch in use if not specified
  --file arg                   path of file to export, export-remove, import, get-osdmap, set-osdmap, get-inc-osdmap or set-inc-osdmap
  --mon-store-path arg         path of monstore to update-mon-db
  --fsid arg                   fsid for new store created by mkfs
  --target-data-path arg       path of target object store (for --op dup)
  --mountpoint arg             fuse mountpoint
  --format arg (=json-pretty)  Output format which may be json, json-pretty, xml, xml-pretty
  --debug                      Enable diagnostic output to stderr
  --force                      Ignore some types of errors and proceed with operation - USE WITH CAUTION: CORRUPTION POSSIBLE NOW OR IN THE FUTURE
  --skip-journal-replay        Disable journal replay
  --skip-mount-omap            Disable mounting of omap
  --head                       Find head/snapdir when searching for objects by name
  --dry-run                    Don't modify the objectstore
  --namespace arg              Specify namespace when searching for objects
  --rmtype arg                 Specify corrupting object removal 'snapmap' or 'nosnapmap' - TESTING USE ONLY

Positional syntax:

ceph-objectstore-tool ... <object> (get|set)-bytes [file]
ceph-objectstore-tool ... <object> set-(attr|omap) <key> [file]
ceph-objectstore-tool ... <object> (get|rm)-(attr|omap) <key>
ceph-objectstore-tool ... <object> get-omaphdr
ceph-objectstore-tool ... <object> set-omaphdr [file]
ceph-objectstore-tool ... <object> list-attrs
ceph-objectstore-tool ... <object> list-omap
ceph-objectstore-tool ... <object> remove|removeall
ceph-objectstore-tool ... <object> dump
ceph-objectstore-tool ... <object> set-size
ceph-objectstore-tool ... <object> clear-data-digest
ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>

<object> can be a JSON object description as displayed by --op list.
<object> can be an object name which will be looked up in all the OSD's PGs.
<object> can be the empty string ('') which with a provided pgid specifies the pgmeta object.

The optional [file] argument will read stdin or write stdout if not specified or if '-' specified.
General form
ceph-objectstore-tool --data-path <path-to-osd> --op <operation>
Before using the tool, set noout and stop the target OSD service:
ceph osd set noout
systemctl stop ceph-osd@$OSD_NUMBER
systemctl status ceph-osd@$OSD_NUMBER
When finished, restart the OSD service and clear noout:
systemctl restart ceph-osd@$OSD_NUMBER
ceph osd unset noout
Resolving "recently crashed" warnings
HEALTH_WARN 5 daemons have recently crashed
ceph crash ls-new
ceph crash archive-all
2. Examples
List all objects on an OSD
Each object is printed in the form [pgid, {oid, object info}].
ceph-objectstore-tool --data-path $PATH_TO_OSD --op list
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list
...
["2.7",{"oid":"rbd_data.20e5ff0224ec0.00000000000000a0","key":"","snapid":-2,"hash":467323015,"max":0,"pool":2,"namespace":"","max":0}]
["2.3a",{"oid":"rbd_info","key":"","snapid":-2,"hash":2886620986,"max":0,"pool":2,"namespace":"","max":0}]
["2.33",{"oid":"rbd_data.20e5ff0224ec0.0000000000000000","key":"","snapid":-2,"hash":2764933619,"max":0,"pool":2,"namespace":"","max":0}]
["2.25",{"oid":"rbd_id.rbd-pool-image-1","key":"","snapid":-2,"hash":2198578149,"max":0,"pool":2,"namespace":"","max":0}]
["2.1c",{"oid":"rbd_directory","key":"","snapid":-2,"hash":816417820,"max":0,"pool":2,"namespace":"","max":0}]
["2.1d",{"oid":"rbd_header.20e5ff0224ec0","key":"","snapid":-2,"hash":1624672572,"max":0,"pool":2,"namespace":"","max":0}]
...
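The pgid column can be cross-checked against the hash field: Ceph maps an object to a PG with stable_mod over the object-name hash, which for a power-of-two pg_num reduces to a simple bitmask. A small sketch, assuming pool 2 has pg_num = 64 (an assumption consistent with the listing above):

```python
# Sketch: recover the PG id from the "hash" field shown by --op list.
# Assumes pg_num is a power of two, where Ceph's stable_mod reduces to
# hash & (pg_num - 1); pool 2 is assumed to have pg_num = 64 here.
def pg_of(pool, obj_hash, pg_num):
    return "{}.{:x}".format(pool, obj_hash & (pg_num - 1))

print(pg_of(2, 467323015, 64))   # rbd_data.20e5ff0224ec0.00000000000000a0 -> 2.7
print(pg_of(2, 2886620986, 64))  # rbd_info -> 2.3a
```

Both results match the pgids printed in the listing above.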
List all objects in a PG
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op list
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.68 --op list
["1.68",{"oid":"benchmark_data_node-1_2694_object67","key":"","snapid":-2,"hash":1705301608,"max":0,"pool":1,"namespace":"","max":0}]
...
Look up an object
You can pass the object id directly, or grep through the full listing.
ceph-objectstore-tool --data-path $PATH_TO_OSD --op list $OBJECT_ID
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list benchmark_data_node-1_2694_object61
["1.26",{"oid":"benchmark_data_node-1_2694_object61","key":"","snapid":-2,"hash":1072655526,"max":0,"pool":1,"namespace":"","max":0}]
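Since each line printed by --op list is itself a JSON array, the listing is easy to post-process with a script instead of grep. A sketch using the sample line above:

```python
import json

# One line of `ceph-objectstore-tool --op list` output, as captured above.
line = '["1.26",{"oid":"benchmark_data_node-1_2694_object61","key":"","snapid":-2,"hash":1072655526,"max":0,"pool":1,"namespace":"","max":0}]'

# The duplicate "max" key in the tool's output is harmless: json.loads keeps the last one.
pgid, obj = json.loads(line)
print(pgid, obj["oid"])  # prints: 1.26 benchmark_data_node-1_2694_object61
```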
Dump detailed object info
$OBJECT can be a JSON object description or simply an oid, as the help text notes.
ceph-objectstore-tool --data-path $PATH_TO_OSD $OBJECT dump
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 dump
{"id": {"oid": "rbd_header.20e5ff0224ec0","key": "","snapid": -2,"hash": 1624672572,"max": 0,"pool": 2,"namespace": "","max": 0},"info": {"oid": {"oid": "rbd_header.20e5ff0224ec0","key": "","snapid": -2,"hash": 1624672572,"max": 0,"pool": 2,"namespace": ""},"version": "137'29","prior_version": "137'28","last_reqid": "osd.1.0:2","user_version": 27,"size": 0,"mtime": "2021-05-27 09:33:24.367195","local_mtime": "2021-05-27 09:33:24.422271","lost": 0,"flags": ["dirty","omap","data_digest","omap_digest"],"truncate_seq": 0,"truncate_size": 0,"data_digest": "0xffffffff","omap_digest": "0x4bbef111","expected_object_size": 0,"expected_write_size": 0,"alloc_hint_flags": 0,"manifest": {"type": 0},"watchers": {}},"stat": {"size": 0,"blksize": 4096,"blocks": 0,"nlink": 1},"SnapSet": {"snap_context": {"seq": 0,"snaps": []},"clones": []}
}
Fix all lost objects (fixme)
Note: the fix-lost feature is still incomplete and not currently usable.
ceph-objectstore-tool --data-path $PATH_TO_OSD --op fix-lost
Fix lost objects in a PG (fixme)
Note: the fix-lost feature is still incomplete and not currently usable.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID --op fix-lost
Fix a lost object (fixme)
Note: the fix-lost feature is still incomplete and not currently usable.
ceph-objectstore-tool --data-path $PATH_TO_OSD --op fix-lost $OBJECT_ID
Get object contents
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-bytes > $OBJECT_FILE_NAME
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-bytes > rbd_header
[root@node-1 ~]# ls -al rbd_header
-rw-r--r-- 1 root root 4194304 Jun 9 14:54 rbd_header
Set object contents
Combined with get-bytes, this can be used to replace a corrupted object.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-bytes < $OBJECT_FILE_NAME
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-bytes < zone_info.default.working-copy
Remove an object
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT remove
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object67 remove
remove #1:166b25a6:::benchmark_data_node-1_2694_object67:head#
List object map keys
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT list-omap
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 list-omap
access_timestamp
create_timestamp
features
flags
modify_timestamp
object_prefix
order
size
snap_seq
Get an object map value
A key is required; run list-omap first to see all keys.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omap <key> [> $OBJECT_MAP_FILE_NAME]
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ rbd_header.20e5ff0224ec0 get-omap object_prefix
Base64:FgAAAHJiZF9kYXRhLjIwZTVmZjAyMjRlYzA=
[root@node-1 ~]# echo FgAAAHJiZF9kYXRhLjIwZTVmZjAyMjRlYzA= | base64 -d
rbd_data.20e5ff0224ec0
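The tool prints values Base64-encoded. Note that this omap value is not the bare string: it appears to carry a 4-byte little-endian length prefix before the bytes (the leading binary bytes are invisible in the base64 -d output above). A sketch of decoding it fully; the length-prefix interpretation is an assumption based on the decoded bytes:

```python
import base64
import struct

raw = base64.b64decode("FgAAAHJiZF9kYXRhLjIwZTVmZjAyMjRlYzA=")
# Assumed layout: 4-byte little-endian length, then that many bytes of string.
(length,) = struct.unpack_from("<I", raw, 0)
value = raw[4:4 + length].decode()
print(length, value)  # prints: 22 rbd_data.20e5ff0224ec0
```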
Set an object map value
This overwrites the existing value; to append to or modify the value, first retrieve it with get-omap.
Note: both the key and a value file path are required.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omap <$KEY> <$OBJECT_MAP_FILE_NAME>
[root@node-1 ~]# vi my_omap_value
this is my omap value
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-omap my_omap_key my_omap_value
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omap my_omap_key
Base64:dGhpcyBpcyBteSBvbWFwIHZhbHVlCg==
[root@node-1 ~]# echo dGhpcyBpcyBteSBvbWFwIHZhbHVlCg== | base64 -d
this is my omap value
Remove an object map key
The key must be specified.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-omap $KEY
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 rm-omap my_omap_key
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omap my_omap_key
Key not found
Get the object map header
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omaphdr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omaphdr
Base64:dGhpcyBpcyBvbWFwIGhlYWRlcgo=
[root@node-1 ~]# echo dGhpcyBpcyBvbWFwIGhlYWRlcgo= | base64 -d
this is omap header
Set the object map header
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omaphdr [< $OBJECT_MAP_FILE_NAME]
[root@node-1 ~]# cat omaphdr
this is omap header
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-omaphdr < omaphdr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-omaphdr
Base64:dGhpcyBpcyBvbWFwIGhlYWRlcgo=
[root@node-1 ~]# echo dGhpcyBpcyBvbWFwIGhlYWRlcgo= | base64 -d
this is omap header
List object attr keys
Lists all keys of the object's xattrs.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT list-attrs
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 list-attrs
_
snapset
Get an object attr
Specify the object and its xattr key.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-attr $KEY [> $OBJECT_ATTRS_FILE_NAME]
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr _
Base64:EQgcAQAABANEAAAAAAAAACMAAABiZW5jaG1hcmtfZGF0YV9ub2RlLTFfMjY5NF9vYmplY3Q2Mf7/pmzvPwAAAAAAAQAAAAAAAAAGAxwAAAABAAAAAAAAAP8AAAAAAAAAAP//AAAAAAIAAAAAAAAAMgAAAAEAAAAAAAAAHQAAAAICFQAAAAQCAAAAAAAAAAcAAAAAAAAAAAAAAAAAQAAAAAAAv9eYYNTq8SUCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAADQAAAC/15hgasR6MwnBjuz/AABAAAAAAAAAAEAAAAAAADUAAAA=
Set an object attr
Note: the oid, key, and value file path must all be specified.
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-attr $KEY < $OBJECT_ATTRS_FILE_NAME
[root@node-1 ~]# vi my_xattr
this is my xattr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 set-attr my_xattr_key my_xattr
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr my_xattr_key
Base64:dGhpcyBpcyBteSB4YXR0cgo=
[root@node-1 ~]# echo dGhpcyBpcyBteSB4YXR0cgo= | base64 -d
this is my xattr
Remove an object attr
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-attr $KEY
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 rm-attr my_xattr_key
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ benchmark_data_node-1_2694_object61 get-attr my_xattr_key
getattr: (61) No data available
fsck | repair
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op [fsck|repair]
mkfs + dup: copy out a whole OSD (block data, keyring, fsid, type, etc.)
This combination can be used to clone a complete OSD.
First run mkfs. The data path is required; --type is optional (default bluestore) and --fsid is optional (randomly generated by default).
[root@node-1 ~]# ceph-objectstore-tool --data-path /root/osd.dir/ --op mkfs
failed to fetch mon config (--no-mon-config to skip)
[root@node-1 ~]# ceph-objectstore-tool --data-path /root/osd.dir/ --op mkfs --no-mon-config
Then copy the OSD with dup, giving the source and target paths.
My dup attempt failed; the error output is shown below.
[root@node-1 osd.dir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --target-data-path ~/osd.dir/ --op dup
dup from bluestore: /var/lib/ceph/osd/ceph-0/
to bluestore: /root/osd.dir/
src fsid 9912f587-6c2c-4098-8635-b97fd46f721e != dest 2c65c351-a968-4dee-b97a-82e9107ef749
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7fcfc160f780 time 2021-06-10 10:34:15.927047
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/os/bluestore/Allocator.cc: 43: FAILED ceph_assert(r == 0)
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fcfb75ba2d5]
 2: (()+0x25449d) [0x7fcfb75ba49d]
 3: (()+0x925a75) [0x561b5b38fa75]
 4: (BitmapAllocator::~BitmapAllocator()+0x12f) [0x561b5b3ded7f]
 5: (BlueFS::_stop_alloc()+0xb3) [0x561b5b39f853]
 6: (BlueFS::umount(bool)+0x13e) [0x561b5b3b9e6e]
 7: (BlueStore::_close_bluefs(bool)+0x11) [0x561b5b29a401]
 8: (BlueStore::_close_db_and_around(bool)+0x91) [0x561b5b31dac1]
 9: (BlueStore::umount()+0x299) [0x561b5b31e4c9]
 10: (dup(std::string, ObjectStore*, std::string, ObjectStore*)+0x39c) [0x561b5ae20e4c]
 11: (main()+0x3139) [0x561b5ade1789]
 12: (__libc_start_main()+0xf5) [0x7fcfb444d555]
 13: (()+0x3a52a0) [0x561b5ae0f2a0]
*** Caught signal (Aborted) **
 in thread 7fcfc160f780 thread_name:ceph-objectstor
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0xf630) [0x7fcfb5a8f630]
 2: (gsignal()+0x37) [0x7fcfb4461387]
 3: (abort()+0x148) [0x7fcfb4462a78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7fcfb75ba324]
 5: (()+0x25449d) [0x7fcfb75ba49d]
 6: (()+0x925a75) [0x561b5b38fa75]
 7: (BitmapAllocator::~BitmapAllocator()+0x12f) [0x561b5b3ded7f]
 8: (BlueFS::_stop_alloc()+0xb3) [0x561b5b39f853]
 9: (BlueFS::umount(bool)+0x13e) [0x561b5b3b9e6e]
 10: (BlueStore::_close_bluefs(bool)+0x11) [0x561b5b29a401]
 11: (BlueStore::_close_db_and_around(bool)+0x91) [0x561b5b31dac1]
 12: (BlueStore::umount()+0x299) [0x561b5b31e4c9]
 13: (dup(std::string, ObjectStore*, std::string, ObjectStore*)+0x39c) [0x561b5ae20e4c]
 14: (main()+0x3139) [0x561b5ade1789]
 15: (__libc_start_main()+0xf5) [0x7fcfb444d555]
 16: (()+0x3a52a0) [0x561b5ae0f2a0]
Aborted
fuse: browse the store's objects through a FUSE mount
# terminal 1
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op fuse --mountpoint /mnt/ceph-osd@0/
mounting fuse at /mnt/ceph-osd@0/ ...
# terminal 2
[root@node-1 mnt]# df
Filesystem 1K-blocks Used Available Use% Mounted on
foo 10481664 3820032 6661632 37% /mnt/ceph-osd@0
# unmount when done
[root@node-1 mnt]# umount /mnt/ceph-osd\@0/
Dump the superblock
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op dump-super
{"cluster_fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","osd_fsid": "9912f587-6c2c-4098-8635-b97fd46f721e","whoami": 0,"current_epoch": 156,"oldest_map": 1,"newest_map": 156,"weight": 0,"compat": {"compat": {},"ro_compat": {},"incompat": {"feature_1": "initial feature set(~v.18)","feature_2": "pginfo object","feature_3": "object locator","feature_4": "last_epoch_clean","feature_5": "categories","feature_6": "hobjectpool","feature_7": "biginfo","feature_8": "leveldbinfo","feature_9": "leveldblog","feature_10": "snapmapper","feature_11": "sharded objects","feature_12": "transaction hints","feature_13": "pg meta object","feature_14": "explicit missing set","feature_15": "fastinfo pg attr","feature_16": "deletes in missing set"}},"clean_thru": 156,"last_epoch_mounted": 154
}
List the PGs on this OSD
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op list-pgs
2.e
2.d
2.c
2.a
2.9
2.8
2.7
2.3f
2.f
2.3e
2.3a
...
info | log: query PG info
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.0 --op [info|log]
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.2 --op log
{"pg_log_t": {"head": "50'2","tail": "0'0","log": [{"op": "modify","object": "1:416569a2:::benchmark_data_node-1_2694_object81:head","version": "29'1","prior_version": "0'0","reqid": "client.24213.0:82","extra_reqids": [],"mtime": "2021-05-10 14:50:40.083127","return_code": 0,"mod_desc": {"object_mod_desc": {"can_local_rollback": false,"rollback_info_completed": false,"ops": []}}},{"op": "modify","object": "1:416569a2:::benchmark_data_node-1_2694_object81:head","version": "50'2","prior_version": "29'1","reqid": "osd.1.0:9","extra_reqids": [],"mtime": "0.000000","return_code": 0,"mod_desc": {"object_mod_desc": {"can_local_rollback": false,"rollback_info_completed": false,"ops": []}}}],"dups": []},"pg_missing_t": {"missing": [],"may_include_deletes": true}
}
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.2 --op info
{"pgid": "1.2","last_update": "50'2","last_complete": "50'2","log_tail": "0'0","last_user_version": 1,"last_backfill": "MAX","last_backfill_bitwise": 0,"purged_snaps": [],"history": {"epoch_created": 18,"epoch_pool_created": 18,"last_epoch_started": 324,"last_interval_started": 323,"last_epoch_clean": 324,"last_interval_clean": 323,"last_epoch_split": 0,"last_epoch_marked_full": 0,"same_up_since": 323,"same_interval_since": 323,"same_primary_since": 322,"last_scrub": "50'2","last_scrub_stamp": "2021-07-08 16:27:39.579601","last_deep_scrub": "50'2","last_deep_scrub_stamp": "2021-07-08 16:27:39.579601","last_clean_scrub_stamp": "2021-07-08 16:27:39.579601"},"stats": {"version": "50'2","reported_seq": "189","reported_epoch": "321","state": "unknown","last_fresh": "2021-07-13 14:36:21.560457","last_change": "2021-07-13 14:36:21.560457","last_active": "2021-06-23 14:50:51.902398","last_peered": "2021-06-23 14:49:25.437981","last_clean": "2021-06-23 14:49:25.437981","last_became_active": "2021-06-23 14:44:21.614072","last_became_peered": "2021-06-23 14:44:21.614072","last_unstale": "2021-07-13 14:36:21.560457","last_undegraded": "2021-07-13 14:36:21.560457","last_fullsized": "2021-07-13 14:36:21.560457","mapping_epoch": 323,"log_start": "0'0","ondisk_log_start": "0'0","created": 18,"last_epoch_clean": 311,"parent": "0.0","parent_split_bits": 0,"last_scrub": "50'2","last_scrub_stamp": "2021-07-08 16:27:39.579601","last_deep_scrub": "50'2","last_deep_scrub_stamp": "2021-07-08 16:27:39.579601","last_clean_scrub_stamp": "2021-07-08 16:27:39.579601","log_size": 2,"ondisk_log_size": 2,"stats_invalid": false,"dirty_stats_invalid": false,"omap_stats_invalid": false,"hitset_stats_invalid": false,"hitset_bytes_stats_invalid": false,"pin_stats_invalid": false,"manifest_stats_invalid": false,"snaptrimq_len": 0,"stat_sum": {"num_bytes": 4194304,"num_objects": 1,"num_object_clones": 0,"num_object_copies": 3,"num_objects_missing_on_primary": 0,"num_objects_missing": 
0,"num_objects_degraded": 0,"num_objects_misplaced": 0,"num_objects_unfound": 0,"num_objects_dirty": 1,"num_whiteouts": 0,"num_read": 0,"num_read_kb": 0,"num_write": 1,"num_write_kb": 4096,"num_scrub_errors": 0,"num_shallow_scrub_errors": 0,"num_deep_scrub_errors": 0,"num_objects_recovered": 0,"num_bytes_recovered": 0,"num_keys_recovered": 0,"num_objects_omap": 0,"num_objects_hit_set_archive": 0,"num_bytes_hit_set_archive": 0,"num_flush": 0,"num_flush_kb": 0,"num_evict": 0,"num_evict_kb": 0,"num_promote": 0,"num_flush_mode_high": 0,"num_flush_mode_low": 0,"num_evict_mode_some": 0,"num_evict_mode_full": 0,"num_objects_pinned": 0,"num_legacy_snapsets": 0,"num_large_omap_objects": 0,"num_objects_manifest": 0,"num_omap_bytes": 0,"num_omap_keys": 0,"num_objects_repaired": 0},"up": [1,0,2],"acting": [1,0,2],"avail_no_missing": [],"object_location_counts": [],"blocked_by": [],"up_primary": 1,"acting_primary": 1,"purged_snaps": []},"empty": 0,"dne": 0,"incomplete": 0,"last_epoch_started": 324,"hit_set_history": {"current_last_update": "0'0","history": []}
}
export | export-remove: export a PG; dump-export: inspect an export file
# export: export without removing; export-remove: export and then remove the PG from the OSD
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.0 --op [export|export-remove] --file export.file
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.1 --op export --file pg2.1.file
Exporting 2.1 info 2.1( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/154)
Export successful
[root@node-1 ~]# ls
anaconda-ks.cfg ceph-deploy pg2.1.file
# To inspect, first produce an export file with export
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --file ./export.file --op dump-export
[root@node-1 ~]# ceph-objectstore-tool --file pg2.1.file --op dump-export
failed to fetch mon config (--no-mon-config to skip)
[root@node-1 ~]# ceph-objectstore-tool --file pg2.1.file --op dump-export --no-mon-config
{"pgid": "2.1","cluster_fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","features": "compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}","metadata_section": {"pg_disk_version": 10,"map_epoch": 155,"OSDMap": {"epoch": 155,"fsid": "60e065f1-d992-4d1a-8f4e-f74419674f7e","created": "2020-08-07 13:40:34.125175","modified": "2021-06-10 08:50:55.264485","last_up_change": "2021-06-10 08:50:54.258791","last_in_change": "2021-06-09 13:47:51.144852","flags": "sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit","flags_num": 5799936,"flags_set": ["pglog_hardlimit","purged_snapdirs","recovery_deletes","sortbitwise"],"crush_version": 7,"full_ratio": 0.94999998807907104,"backfillfull_ratio": 0......
}
Import a PG (the PG must not already exist on the OSD; use export-remove first)
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 1.0 --op import --file import.file
# the pgid must match
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op import --file pg2.1.file
specified pgid 2.0 does not match actual pgid 2.1
# the PG must not already exist
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.1 --op import --file pg2.1.file
get_pg_num_history pg_num_history pg_num_history(e156 pg_nums {1={18=128},2={64=64}} deleted_pools )
pgid 2.1 already exists
# export-remove, then import
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op export-remove --file pg2.1_repli.file
Exporting 2.0 info 2.0( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/152)
Export successful
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 2.0_head removing 2.0
Remove successful
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 2.0 --op import --file pg2.1_repli.file
get_pg_num_history pg_num_history pg_num_history(e156 pg_nums {1={18=128},2={64=64}} deleted_pools )
Importing pgid 2.0
write_pg epoch 155 info 2.0( v 119'6 (0'0,119'6] local-lis/les=154/155 n=0 ec=64/64 lis/c 154/154 les/c/f 155/155/0 154/154/152)
Import successful
3. Source code analysis
The source lives in src/tools/ceph_objectstore_tool.cc.
main
int main(int argc, char **argv)
{
  // All options are collected in an options_description
  po::options_description desc("Allowed options");
  // Option parsing
  desc.add_options()
    ("type", po::value<string>(&type),
     "Arg is one of [bluestore (default), filestore, memstore]")
    ...;
  vector<string> ceph_option_strings;
  po::variables_map vm;
  try {
    po::parsed_options parsed = po::command_line_parser(argc, argv)
      .options(all).allow_unregistered().positional(pd).run();
    po::store(parsed, vm);
    po::notify(vm);
    ceph_option_strings = po::collect_unrecognized(parsed.options,
                                                   po::include_positional);
  } catch(po::error &e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }

  // Option validation
  ...

  // Prepend to ceph_option_strings: -n osd.<whoami>, --osd-data <data-path>
  char fn[PATH_MAX];
  snprintf(fn, sizeof(fn), "%s/whoami", dpath.c_str());
  int fd = ::open(fn, O_RDONLY);
  if (fd >= 0) {
    bufferlist bl;
    bl.read_fd(fd, 64);
    string s(bl.c_str(), bl.length());
    int whoami = atoi(s.c_str());
    vector<string> tmp;
    // identify ourselves as this osd so we can auth and fetch our configs
    tmp.push_back("-n");
    tmp.push_back(string("osd.") + stringify(whoami));
    // populate osd_data so that the default keyring location works
    tmp.push_back("--osd-data");
    tmp.push_back(dpath);
    tmp.insert(tmp.end(), ceph_option_strings.begin(),
               ceph_option_strings.end());
    tmp.swap(ceph_option_strings);
  }

  // Read the osd type
  snprintf(fn, sizeof(fn), "%s/type", dpath.c_str());
  ...

  // Extra argument checks for some special ops
  if (op == "fuse" && mountpoint.length() == 0) {
    cerr << "Missing fuse mountpoint" << std::endl;
    usage(desc);
    return 1;
  }
  ...

  // Create the ObjectStoreTool
  ObjectStoreTool tool = ObjectStoreTool(file_fd, dry_run);
  ...

  // Initialize the global context
  auto cct = global_init(NULL, ceph_options,
                         CEPH_ENTITY_TYPE_OSD,
                         CODE_ENVIRONMENT_UTILITY_NODOUT,
                         init_flags);
  common_init_finish(g_ceph_context);
  ...

  // Create the object store handle (filestore | bluestore)
  ObjectStore *fs = ObjectStore::create(g_ceph_context, type, dpath, jpath, flags);
  int ret = fs->mount();

  // Open the meta collection
  auto ch = fs->open_collection(coll_t::meta());
  ...

  // Read the superblock
  std::unique_ptr<OSDSuperblock> superblock;
  if (!no_superblock) {
    superblock.reset(new OSDSuperblock);
    bufferlist::const_iterator p;
    ret = fs->read(ch, OSD_SUPERBLOCK_GOBJECT, 0, 0, bl);
    if (ret < 0) {
      cerr << "Failure to read OSD superblock: " << cpp_strerror(ret) << std::endl;
      goto out;
    }
    p = bl.cbegin();
    decode(*superblock, p);
  }

  // Dispatch to the handler for the requested op
  ...
}
export
// Export file layout:
// |super-header|pg-begin|metadata|object|pg-end|
int ObjectStoreTool::do_export(ObjectStore *fs, coll_t coll, spg_t pgid,
                               pg_info_t &info, epoch_t map_epoch, __u8 struct_ver,
                               const OSDSuperblock &superblock,
                               PastIntervals &past_intervals)
{
  PGLog::IndexedLog log;
  pg_missing_t missing;

  int ret = get_log(fs, struct_ver, pgid, info, log, missing);
  if (ret > 0)
    return ret;

  // Write the super header to the export file
  write_super();

  // pg_begin carries the pgid and the superblock
  pg_begin pgb(pgid, superblock);
  // Special case: If replicated pg don't require the importing OSD to have shard feature
  if (pgid.is_no_shard()) {
    pgb.superblock.compat_features.incompat.remove(CEPH_OSD_FEATURE_INCOMPAT_SHARDS);
  }
  // Write the pg_begin section to the export file
  ret = write_section(TYPE_PG_BEGIN, pgb, file_fd);
  if (ret)
    return ret;

  // The metadata_section is now before files, so import can detect
  // errors and abort without wasting time.
  metadata_section ms(struct_ver, map_epoch, info, log, past_intervals, missing);
  ret = add_osdmap(fs, ms);
  if (ret)
    return ret;
  // Write the metadata_section to the export file
  ret = write_section(TYPE_PG_METADATA, ms, file_fd);
  if (ret)
    return ret;

  // Export the contents of every object in the PG
  ret = export_files(fs, coll);
  if (ret) {
    cerr << "export_files error " << ret << std::endl;
    return ret;
  }

  // Write pg_end
  ret = write_simple(TYPE_PG_END, file_fd);
  if (ret)
    return ret;

  return 0;
}
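The section framing described above can be illustrated with a toy reader/writer. This is a simplified stand-in, not the real on-disk format: the actual header layout is defined by write_section/read_section and the super_header in ceph_objectstore_tool.cc. Here each section is assumed, for illustration only, to be a 1-byte type plus a 4-byte little-endian payload length:

```python
import struct

# Toy framing, for illustration only: 1-byte section type + u32 payload length.
TYPE_PG_BEGIN, TYPE_PG_METADATA, TYPE_OBJECT_BEGIN, TYPE_PG_END = 1, 2, 3, 4

def write_section(buf, stype, payload=b""):
    # Appends a framed section to the bytearray in place.
    buf += struct.pack("<BI", stype, len(payload)) + payload

def read_sections(data):
    # Walks the stream and returns (type, payload) pairs.
    off, out = 0, []
    while off < len(data):
        stype, length = struct.unpack_from("<BI", data, off)
        off += 5
        out.append((stype, data[off:off + length]))
        off += length
    return out

buf = bytearray()
write_section(buf, TYPE_PG_BEGIN, b"pgid+superblock")
write_section(buf, TYPE_PG_METADATA, b"maps+log")
write_section(buf, TYPE_OBJECT_BEGIN, b"object bytes")
write_section(buf, TYPE_PG_END)
print([t for t, _ in read_sections(bytes(buf))])  # prints [1, 2, 3, 4]
```

This is the same shape dump_export relies on below: read sections in order, require pg_begin first, and stop at pg_end.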
dump-export
int ObjectStoreTool::dump_export(Formatter *formatter)
{
  bufferlist ebl;
  pg_info_t info;
  PGLog::IndexedLog log;
  //bool skipped_objects = false;

  int ret = read_super();
  if (ret)
    return ret;

  if (sh.magic != super_header::super_magic) {
    cerr << "Invalid magic number" << std::endl;
    return -EFAULT;
  }

  if (sh.version > super_header::super_ver) {
    cerr << "Can't handle export format version=" << sh.version << std::endl;
    return -EINVAL;
  }

  formatter->open_object_section("Export");

  //First section must be TYPE_PG_BEGIN
  sectiontype_t type;
  ret = read_section(&type, &ebl);
  if (ret)
    return ret;
  if (type == TYPE_POOL_BEGIN) {
    cerr << "Dump of pool exports not supported" << std::endl;
    return -EINVAL;
  } else if (type != TYPE_PG_BEGIN) {
    cerr << "Invalid first section type " << std::to_string(type) << std::endl;
    return -EFAULT;
  }

  auto ebliter = ebl.cbegin();
  pg_begin pgb;
  pgb.decode(ebliter);
  spg_t pgid = pgb.pgid;

  formatter->dump_string("pgid", stringify(pgid));
  formatter->dump_string("cluster_fsid", stringify(pgb.superblock.cluster_fsid));
  formatter->dump_string("features", stringify(pgb.superblock.compat_features));

  bool done = false;
  bool found_metadata = false;
  metadata_section ms;
  bool objects_started = false;
  while(!done) {
    ret = read_section(&type, &ebl);
    if (ret)
      return ret;

    if (debug) {
      cerr << "dump_export: Section type " << std::to_string(type) << std::endl;
    }
    if (type >= END_OF_TYPES) {
      cerr << "Skipping unknown section type" << std::endl;
      continue;
    }
    switch(type) {
    case TYPE_OBJECT_BEGIN:
      if (!objects_started) {
        formatter->open_array_section("objects");
        objects_started = true;
      }
      ret = dump_object(formatter, ebl);
      if (ret) return ret;
      break;
    case TYPE_PG_METADATA:
      if (objects_started)
        cerr << "WARNING: metadata_section out of order" << std::endl;
      ret = dump_pg_metadata(formatter, ebl, ms);
      if (ret) return ret;
      found_metadata = true;
      break;
    case TYPE_PG_END:
      if (objects_started) {
        formatter->close_section();
      }
      done = true;
      break;
    default:
      cerr << "Unknown section type " << std::to_string(type) << std::endl;
      return -EFAULT;
    }
  }
  if (!found_metadata) {
    cerr << "Missing metadata section" << std::endl;
    return -EFAULT;
  }
  formatter->close_section();
  formatter->flush(cout);
  return 0;
}
fsck|fsck-deep|repair|repair-deep
// fsck/repair call BlueStore::fsck() -> BlueStore::_fsck() to verify and repair the data
int fsck(bool deep) override {
  return _fsck(deep ? FSCK_DEEP : FSCK_REGULAR, false);
}
int repair(bool deep) override {
  return _fsck(deep ? FSCK_DEEP : FSCK_REGULAR, true);
}
int quick_fix() override {
  return _fsck(FSCK_SHALLOW, true);
}
/**
An overview for currently implemented repair logics
performed in fsck in two stages: detection(+preparation) and commit.
Detection stage (in processing order):
  (Issue -> Repair action to schedule)
  - Detect undecodable keys for Shared Blobs -> Remove
  - Detect undecodable records for Shared Blobs -> Remove
    (might trigger missed Shared Blob detection below)
  - Detect stray records for Shared Blobs -> Remove
  - Detect misreferenced pextents -> Fix
    Prepare Bloom-like filter to track cid/oid -> pextent
    Prepare list of extents that are improperly referenced
    Enumerate Onode records that might use 'misreferenced' pextents
    (Bloom-like filter applied to reduce computation)
    Per each questionable Onode enumerate all blobs and identify broken ones
    (i.e. blobs having 'misreferences')
    Rewrite each broken blob data by allocating another extents and copying data there
    If blob is shared - unshare it and mark corresponding Shared Blob for removal
    Release previously allocated space
    Update Extent Map
  - Detect missed Shared Blobs -> Recreate
  - Detect undecodable deferred transaction -> Remove
  - Detect Freelist Manager's 'false free' entries -> Mark as used
  - Detect Freelist Manager's leaked entries -> Mark as free
  - Detect statfs inconsistency - Update
Commit stage (separate DB commit per each step):
  - Apply leaked FM entries fix
  - Apply 'false free' FM entries fix
  - Apply 'Remove' actions
  - Apply fix for misreference pextents
  - Apply Shared Blob recreate
    (can be merged with the step above if misreferences were detected)
  - Apply StatFS update
*/
int BlueStore::_fsck(BlueStore::FSCKDepth depth, bool repair)
{
  dout(1) << __func__
          << (repair ? " repair" : " check")
          << (depth == FSCK_DEEP ? " (deep)" :
              depth == FSCK_SHALLOW ? " (shallow)" : " (regular)")
          << dendl;

  // in deep mode we need R/W write access to be able to replay deferred ops
  bool read_only = !(repair || depth == FSCK_DEEP);

  int r = _open_db_and_around(read_only);
  if (r < 0)
    return r;

  if (!read_only) {
    r = _upgrade_super();
    if (r < 0) {
      goto out_db;
    }
  }

  r = _open_collections();
  if (r < 0)
    goto out_db;

  mempool_thread.init();

  // we need finisher and kv_{sync,finalize}_thread *just* for replay
  // enable in repair or deep mode modes only
  if (!read_only) {
    _kv_start();
    r = _deferred_replay();
    _kv_stop();
  }
  if (r < 0)
    goto out_scan;

  // Verify and repair the metadata; see src/os/bluestore/BlueStore.cc for details
  r = _fsck_on_open(depth, repair);

out_scan:
  mempool_thread.shutdown();
  _shutdown_cache();
out_db:
  _close_db_and_around(false);
  return r;
}
mkfs
// fs->mkfs();
int BlueStore::mkfs() {
  ...
  {
    // If mkfs has already run, only do an fsck check and return
    r = read_meta("mkfs_done", &done);
    ...
    r = fsck(cct->_conf->bluestore_fsck_on_mkfs_deep);
    ...
    return r; // idempotent
  }

  // Record the metadata key "type" = bluestore under /osd-data-path/
  {
    ...
    r = read_meta("type", &type);
    if (r == 0) {
      if (type != "bluestore") {
        derr << __func__ << " expected bluestore, but type is " << type << dendl;
        return -EIO;
      }
    } else {
      r = write_meta("type", "bluestore");
      if (r < 0)
        return r;
    }
  }

  freelist_type = "bitmap";

  // Open the device directory /osd-data-path/
  r = _open_path();
  if (r < 0)
    return r;
  // Open/create /osd-data-path/fsid
  r = _open_fsid(true);
  if (r < 0)
    goto out_path_fd;
  // Lock the fsid file
  r = _lock_fsid();
  if (r < 0)
    goto out_close_fsid;
  // Read the fsid; generate one if absent
  r = _read_fsid(&old_fsid);
  if (r < 0 || old_fsid.is_zero()) {
    if (fsid.is_zero()) {
      fsid.generate_random(); // generate a random fsid
      dout(1) << __func__ << " generated fsid " << fsid << dendl;
    } else {
      dout(1) << __func__ << " using provided fsid " << fsid << dendl;
    }
    // we'll write it later.
  } else {
    if (!fsid.is_zero() && fsid != old_fsid) {
      derr << __func__ << " on-disk fsid " << old_fsid
           << " != provided " << fsid << dendl;
      r = -EINVAL;
      goto out_close_fsid;
    }
    fsid = old_fsid;
  }

  // Create the "block" file under /osd-data-path/, link it to the real
  // bluestore_block_path, and try to preallocate bluestore_block_size bytes.
  r = _setup_block_symlink_or_file("block", cct->_conf->bluestore_block_path,
                                   cct->_conf->bluestore_block_size,
                                   cct->_conf->bluestore_block_create);
  if (r < 0)
    goto out_close_fsid;
  // If separate disks serve as WAL and DB devices, also create the
  // block.wal and block.db links and preallocate their space.
  if (cct->_conf->bluestore_bluefs) {
    r = _setup_block_symlink_or_file("block.wal", cct->_conf->bluestore_block_wal_path,
                                     cct->_conf->bluestore_block_wal_size,
                                     cct->_conf->bluestore_block_wal_create);
    if (r < 0)
      goto out_close_fsid;
    r = _setup_block_symlink_or_file("block.db", cct->_conf->bluestore_block_db_path,
                                     cct->_conf->bluestore_block_db_size,
                                     cct->_conf->bluestore_block_db_create);
    if (r < 0)
      goto out_close_fsid;
  }

  // Create and open the BlockDevice (pmem, kernel, or ust-nvme). Ceph has its
  // own block-device access layer; e.g. kernel devices are driven directly
  // with libaio, bypassing the filesystem.
  r = _open_bdev(true);
  if (r < 0)
    goto out_close_fsid;

  // choose min_alloc_size
  if (cct->_conf->bluestore_min_alloc_size) {
    min_alloc_size = cct->_conf->bluestore_min_alloc_size;
  } else {
    ceph_assert(bdev);
    if (bdev->is_rotational()) {
      min_alloc_size = cct->_conf->bluestore_min_alloc_size_hdd;
    } else {
      min_alloc_size = cct->_conf->bluestore_min_alloc_size_ssd;
    }
  }

  // Verify the block device is large enough to enable bluefs
  _validate_bdev();

  // make sure min_alloc_size is power of 2 aligned.
  if (!isp2(min_alloc_size)) {
    ...
    goto out_close_bdev;
  }

  // Open bluefs and its DB (normally RocksDB), used to store metadata
  r = _open_db(true);
  if (r < 0)
    goto out_close_bdev;
  ...
  // Record the kv_backend database type
  r = write_meta("kv_backend", cct->_conf->bluestore_kvbackend);
  if (r < 0)
    goto out_close_fm;
  // Record whether bluefs replaces the filesystem (almost always yes)
  r = write_meta("bluefs", stringify(bluefs ? 1 : 0));
  if (r < 0)
    goto out_close_fm;
  // Update the fsid
  if (fsid != old_fsid) {
    r = _write_fsid();
    if (r < 0) {
      derr << __func__ << " error writing fsid: " << cpp_strerror(r) << dendl;
      goto out_close_fm;
    }
  }
  if (out_of_sync_fm.fetch_and(0)) {
    _sync_bluefs_and_fm();
  }

out_close_fm:
  _close_fm();
out_close_db:
  _close_db();
out_close_bdev:
  _close_bdev();
out_close_fsid:
  _close_fsid();
out_path_fd:
  _close_path();

  if (r == 0 && cct->_conf->bluestore_fsck_on_mkfs) {
    int rc = fsck(cct->_conf->bluestore_fsck_on_mkfs_deep);
    if (rc < 0)
      return rc;
    if (rc > 0) {
      derr << __func__ << " fsck found " << rc << " errors" << dendl;
      r = -EIO;
    }
  }
  if (r == 0) {
    // indicate success by writing the 'mkfs_done' file
    r = write_meta("mkfs_done", "yes");
  }
  if (r < 0) {
    derr << __func__ << " failed, " << cpp_strerror(r) << dendl;
  } else {
    dout(0) << __func__ << " success" << dendl;
  }
  return r;
}
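The min_alloc_size selection above reduces to a small decision function. The default values below are illustrative only — the real defaults come from bluestore_min_alloc_size_hdd/ssd and have changed across Ceph releases:

```cpp
#include <cstdint>

// True iff v is a nonzero power of two (what isp2() checks in BlueStore).
bool is_p2(uint64_t v) { return v && !(v & (v - 1)); }

// An explicit bluestore_min_alloc_size wins; otherwise pick the HDD or SSD
// default based on the device's rotational flag. A non-power-of-two result
// is rejected (here signalled by returning 0, mirroring mkfs's error path).
uint64_t choose_min_alloc(uint64_t conf, bool rotational,
                          uint64_t hdd_def = 65536,   // illustrative
                          uint64_t ssd_def = 4096) {  // illustrative
  uint64_t v = conf ? conf : (rotational ? hdd_def : ssd_def);
  return is_p2(v) ? v : 0;
}
```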
dup
if (op == "dup") {
  string target_type;
  char fn[PATH_MAX];
  snprintf(fn, sizeof(fn), "%s/type", target_data_path.c_str());
  // Read the target-path/type file to learn the target store type
  int fd = ::open(fn, O_RDONLY);
  bufferlist bl;
  bl.read_fd(fd, 64);
  if (bl.length()) {
    target_type = string(bl.c_str(), bl.length() - 1); // drop \n
  }
  ::close(fd);
  ObjectStore *targetfs = ObjectStore::create(g_ceph_context, target_type,
                                              target_data_path, "", 0);
  if (targetfs == NULL) {
    cerr << "Unable to open store of type " << target_type << std::endl;
    return 1;
  }
  int r = dup(dpath, fs, target_data_path, targetfs);
  if (r < 0) {
    cerr << "dup failed: " << cpp_strerror(r) << std::endl;
    return 1;
  }
  return 0;
}
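Note the `bl.length() - 1` above: the dup path unconditionally drops the last byte as the trailing `\n` that was written with the type. A sketch of the safer, conditional form (the function name is ours, not Ceph's):

```cpp
#include <string>

// Strip one trailing newline from the contents of a "type" meta file,
// as the dup path does — but only when the newline is actually present.
std::string parse_store_type(const std::string& raw) {
  if (!raw.empty() && raw.back() == '\n')
    return raw.substr(0, raw.size() - 1); // drop \n
  return raw;
}
```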
fuse
if (op == "fuse") {
#ifdef HAVE_LIBFUSE
  // FuseStore fuse(fs, mountpoint);
  cout << "mounting fuse at " << mountpoint << " ..." << std::endl;
  // Mount the objectstore through userspace libfuse
  int r = fuse.main();
  if (r < 0) {
    cerr << "failed to mount fuse: " << cpp_strerror(r) << std::endl;
    return 1;
  }
#else
  cerr << "fuse support not enabled" << std::endl;
#endif
  return 0;
}

int FuseStore::main()
{
  const char *v[] = {
    "foo",
    mount_point.c_str(),
    "-f",
    "-d", // debug
  };
  int c = 3;
  auto fuse_debug = store->cct->_conf.get_val<bool>("fuse_debug");
  if (fuse_debug)
    ++c;
  // Call libfuse's fuse_main to mount the custom filesystem
  return fuse_main(c, (char**)v, &fs_oper, (void*)this);
}
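The argv trick above is easy to miss: all four entries are always in the array, but argc is 3 unless fuse_debug is set, so `-d` is only passed to libfuse when debugging is on. A sketch of the same construction:

```cpp
#include <string>
#include <vector>

// Build the argv FuseStore::main hands to fuse_main: program name,
// mountpoint, "-f" (stay in foreground), and "-d" only when debugging.
std::vector<std::string> fuse_argv(const std::string& mountpoint, bool fuse_debug) {
  std::vector<std::string> v = {"foo", mountpoint, "-f"};
  if (fuse_debug)
    v.push_back("-d");
  return v;
}
```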
apply-layout-settings
import
int ObjectStoreTool::do_import(ObjectStore *store, OSDSuperblock &sb,
                               bool force, std::string pgidstr)
{
  bufferlist ebl;
  pg_info_t info;
  PGLog::IndexedLog log;
  bool skipped_objects = false;

  if (!dry_run)
    // Recursively remove PGs flagged is_temp || _has_remove_flag
    // OSD::recursive_remove_collection(g_ceph_context, store, pgid, *it);
    finish_remove_pgs(store);

  // Read the super_header from the file being imported; export previously
  // wrote super_header, pg_begin, objects, pg_end, etc. into the file
  int ret = read_super();

  // First section must be TYPE_PG_BEGIN
  sectiontype_t type;
  // Read pg_begin
  ret = read_section(&type, &ebl);
  auto ebliter = ebl.cbegin();
  pg_begin pgb;
  pgb.decode(ebliter);
  spg_t pgid = pgb.pgid;

  if (pgidstr.length()) {
    spg_t user_pgid;
    // Verify the pgid given on the command line matches the one read from the file
    bool ok = user_pgid.parse(pgidstr.c_str());
    // This succeeded in main() already
    ceph_assert(ok);
    if (pgid != user_pgid) {
      cerr << "specified pgid " << user_pgid
           << " does not match actual pgid " << pgid << std::endl;
      return -EINVAL;
    }
  }

  // Verify the cluster fsid matches; the imported file must come from the same cluster
  if (!pgb.superblock.cluster_fsid.is_zero()
      && pgb.superblock.cluster_fsid != sb.cluster_fsid) {
    cerr << "Export came from different cluster with fsid "
         << pgb.superblock.cluster_fsid << std::endl;
    return -EINVAL;
  }

  // Special case: Old export has SHARDS incompat feature on replicated pg, remove it
  if (pgid.is_no_shard())
    pgb.superblock.compat_features.incompat.remove(CEPH_OSD_FEATURE_INCOMPAT_SHARDS);
  if (sb.compat_features.compare(pgb.superblock.compat_features) == -1) {
    CompatSet unsupported = sb.compat_features.unsupported(pgb.superblock.compat_features);
    cerr << "Export has incompatible features set " << unsupported << std::endl;
    // Let them import if they specify the --force option
    if (!force)
      return 11; // Positive return means exit status
  }

  // we need the latest OSDMap to check for collisions
  OSDMap curmap;
  bufferlist bl;
  // Fetch the osdmap
  ret = get_osdmap(store, sb.current_epoch, curmap, bl);

  pool_pg_num_history_t pg_num_history;
  get_pg_num_history(store, &pg_num_history);

  ghobject_t pgmeta_oid = pgid.make_pgmeta_oid();

  // Check for PG already present.
  coll_t coll(pgid);
  if (store->collection_exists(coll)) {
    cerr << "pgid " << pgid << " already exists" << std::endl;
    return -EEXIST;
  }

  // Create the PG and osdriver handles
  ObjectStore::CollectionHandle ch;
  OSDriver driver(store, coll_t(), OSD::make_snapmapper_oid());
  //SnapMapper mapper(g_ceph_context, &driver, 0, 0, 0, pgid.shard);

  bool done = false;
  bool found_metadata = false;
  metadata_section ms;
  while (!done) {
    // Read the section_header at the current file offset; returns its type and payload
    ret = read_section(&type, &ebl);
    // Skip unrecognized sections
    if (type >= END_OF_TYPES) {
      cout << "Skipping unknown section type" << std::endl;
      continue;
    }
    // Dispatch on type: object, metadata, pg-end
    switch (type) {
    case TYPE_OBJECT_BEGIN:
      ceph_assert(found_metadata);
      // Import the object contents
      ret = get_object(store, driver, mapper, coll, ebl, ms.osdmap,
                       &skipped_objects);
      if (ret) return ret;
      break;
    case TYPE_PG_METADATA:
      ret = get_pg_metadata(store, ebl, ms, sb, pgid);
      if (ret) return ret;
      found_metadata = true;
      if (pgid != ms.info.pgid) {
        cerr << "specified pgid " << pgid << " does not match import file pgid "
             << ms.info.pgid << std::endl;
        return -EINVAL;
      }
      // make sure there are no conflicting splits or merges
      if (ms.osdmap.have_pg_pool(pgid.pgid.pool())) {
        auto p = pg_num_history.pg_nums.find(pgid.pgid.m_pool);
        if (p != pg_num_history.pg_nums.end() && !p->second.empty()) {
          unsigned start_pg_num = ms.osdmap.get_pg_num(pgid.pgid.pool());
          unsigned pg_num = start_pg_num;
          for (auto q = p->second.lower_bound(ms.map_epoch);
               q != p->second.end();
               ++q) {
            unsigned new_pg_num = q->second;
            cout << "pool " << pgid.pgid.pool() << " pg_num " << pg_num
                 << " -> " << new_pg_num << std::endl;
            // check for merge targets
            pg_t target;
            if (pgid.is_merge_source(pg_num, new_pg_num, &target)) {
              // FIXME: this checks assumes the OSD's PG is at the OSD's
              // map epoch; it could be, say, at *our* epoch, pre-merge.
              coll_t coll(target);
              if (store->collection_exists(coll)) {
                cerr << "pgid " << pgid << " merges to target " << target
                     << " which already exists" << std::endl;
                return 12;
              }
            }
            // check for split children
            set<spg_t> children;
            if (pgid.is_split(start_pg_num, new_pg_num, &children)) {
              cerr << " children are " << children << std::endl;
              for (auto child : children) {
                coll_t coll(child);
                if (store->collection_exists(coll)) {
                  cerr << "pgid " << pgid << " splits to " << children
                       << " and " << child << " exists" << std::endl;
                  return 12;
                }
              }
            }
            pg_num = new_pg_num;
          }
        }
      } else {
        cout << "pool " << pgid.pgid.pool() << " doesn't exist, not checking"
             << " for splits or mergers" << std::endl;
      }
      if (!dry_run) {
        ObjectStore::Transaction t;
        ch = store->create_new_collection(coll);
        create_pg_collection(t, pgid,
            pgid.get_split_bits(ms.osdmap.get_pg_pool(pgid.pool())->get_pg_num()));
        init_pg_ondisk(t, pgid, NULL);
        // mark this coll for removal until we're done
        map<string, bufferlist> values;
        encode((char)1, values["_remove"]);
        t.omap_setkeys(coll, pgid.make_pgmeta_oid(), values);
        store->queue_transaction(ch, std::move(t));
      }
      break;
    // pg-end marks that the whole PG file has been imported
    case TYPE_PG_END:
      ceph_assert(found_metadata);
      done = true;
      break;
    default:
      cerr << "Unknown section type " << std::to_string(type) << std::endl;
      return -EFAULT;
    }
  }

  ObjectStore::Transaction t;
  if (!dry_run) {
    pg_log_t newlog, reject;
    pg_log_t::filter_log(pgid, ms.osdmap, g_ceph_context->_conf->osd_hit_set_namespace,
                         ms.log, newlog, reject);
    divergent_priors_t newdp, rejectdp;
    filter_divergent_priors(pgid, ms.osdmap, g_ceph_context->_conf->osd_hit_set_namespace,
                            ms.divergent_priors, newdp, rejectdp);
    ms.divergent_priors = newdp;
    ms.missing.filter_objects([&](const hobject_t &obj) {
      if (obj.nspace == g_ceph_context->_conf->osd_hit_set_namespace)
        return false;
      ceph_assert(!obj.is_temp());
      object_t oid = obj.oid;
      object_locator_t loc(obj);
      pg_t raw_pgid = ms.osdmap.object_locator_to_pg(oid, loc);
      pg_t _pgid = ms.osdmap.raw_pg_to_pg(raw_pgid);
      return pgid.pgid != _pgid;
    });
    // Just like a split invalidate stats since the object count is changed
    if (skipped_objects)
      ms.info.stats.stats_invalid = true;
    // Import the metadata
    ret = write_pg(t, ms.map_epoch, ms.info, newlog, ms.past_intervals,
                   ms.divergent_priors, ms.missing);
    if (ret) return ret;
  }
  if (!dry_run) {
    t.omap_rmkey(coll, pgid.make_pgmeta_oid(), "_remove");
    wait_until_done(&t, [&] {
      store->queue_transaction(ch, std::move(t));
      // make sure we flush onreadable items before mapper/driver are destroyed.
      ch->flush();
    });
  }
  return 0;
}
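The split-children check relies on pg_t::is_split(). For the common case where pg_num grows by a power-of-two factor, its result can be approximated as follows; this is a simplification (the real is_split also handles non-power-of-two transitions via the CRUSH placement-seed math):

```cpp
#include <set>

// Approximate split children of pg `ps` when a pool's pg_num grows from
// old_n to new_n (power-of-two case): the children are ps + old_n,
// ps + 2*old_n, ... below new_n.
std::set<unsigned> split_children(unsigned ps, unsigned old_n, unsigned new_n) {
  std::set<unsigned> kids;
  for (unsigned c = ps + old_n; c < new_n; c += old_n)
    kids.insert(c);
  return kids;
}
```

This is why do_import refuses to proceed when any of these child collections already exists: the imported PG would overlap data the OSD now holds under a split descendant.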
get-osdmap | get-inc-osdmap
int get_osdmap(ObjectStore *store, epoch_t e, OSDMap &osdmap, bufferlist &bl)
{
  // Open a handle on the meta collection
  ObjectStore::CollectionHandle ch = store->open_collection(coll_t::meta());
  // Read the osdmap; reads do not go through a transaction.
  // OSD::get_inc_osdmap_pobject_name(e) gives the incremental osdmap's name.
  bool found = store->read(ch, OSD::get_osdmap_pobject_name(e), 0, 0, bl) >= 0;
  osdmap.decode(bl);
  return 0;
}
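The per-epoch object names used here come from OSD::get_osdmap_pobject_name / get_inc_osdmap_pobject_name, which (to the best of our reading of OSD.h) format the epoch into "osdmap.&lt;e&gt;" and "inc_osdmap.&lt;e&gt;". A sketch of that naming:

```cpp
#include <string>

// Build the meta-collection object name for a given osdmap epoch;
// full maps are stored as "osdmap.<e>", incrementals as "inc_osdmap.<e>".
std::string osdmap_object_name(unsigned epoch, bool incremental) {
  return (incremental ? "inc_osdmap." : "osdmap.") + std::to_string(epoch);
}
```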
set-osdmap | set-inc-osdmap
if (op == "set-osdmap") {
  bufferlist bl;
  // Read the osdmap file to be written
  ret = get_fd_data(file_fd, bl);
  if (ret < 0) {
    cerr << "Failed to read osdmap " << cpp_strerror(ret) << std::endl;
  } else {
    // Install the osdmap
    ret = set_osdmap(fs, epoch, bl, force);
  }
  goto out;
}

int set_osdmap(ObjectStore *store, epoch_t e, bufferlist &bl, bool force)
{
  OSDMap osdmap;
  osdmap.decode(bl);
  // Open a handle on the meta collection
  auto ch = store->open_collection(coll_t::meta());
  // Build the osdmap object name for this epoch
  // const ghobject_t inc_oid = OSD::get_inc_osdmap_pobject_name(e);
  const ghobject_t full_oid = OSD::get_osdmap_pobject_name(e);
  // Write the osdmap
  ObjectStore::Transaction t;
  t.write(coll_t::meta(), full_oid, 0, bl.length(), bl);
  t.truncate(coll_t::meta(), full_oid, bl.length());
  store->queue_transaction(ch, std::move(t));
  return 0;
}
update-mon-db
int update_mon_db(ObjectStore& fs, OSDSuperblock& sb,
                  const string& keyring, const string& store_path)
{
  MonitorDBStore ms(store_path);
  // Open the mon store
  int r = ms.create_and_open(cerr);
  if (r < 0) {
    cerr << "unable to open mon store: " << store_path << std::endl;
    return r;
  }
  // Update the keyring
  if ((r = update_auth(keyring, sb, ms)) < 0) {
    goto out;
  }
  // Update the osdmap
  if ((r = update_osdmap(fs, sb, ms)) < 0) {
    goto out;
  }
  // Update the monitor data
  if ((r = update_monitor(sb, ms)) < 0) {
    goto out;
  }
out:
  ms.close();
  return r;
}
remove (export-remove is recommended)
// Please use export-remove or you must use --force option
int initiate_new_remove_pg(ObjectStore *store, spg_t r_pgid) {
  if (!dry_run)
    finish_remove_pgs(store);
  if (!store->collection_exists(coll_t(r_pgid)))
    return -ENOENT;
  // In dry-run mode, return 0 without modifying anything
  if (dry_run)
    return 0;
  ObjectStore::Transaction rmt;
  int r = mark_pg_for_removal(store, r_pgid, &rmt);
  if (r < 0) {
    return r;
  }
  ObjectStore::CollectionHandle ch = store->open_collection(coll_t(r_pgid));
  store->queue_transaction(ch, std::move(rmt));
  finish_remove_pgs(store);
  return r;
}
fix-lost (work still to be done)
/* fixme: using full features */
list
int do_list(ObjectStore *store, string pgidstr, string object,
            boost::optional<std::string> nspace,
            Formatter *formatter, bool debug, bool human_readable, bool head) {
  int r;
  lookup_ghobject lookup(object, nspace, head);
  if (pgidstr.length() > 0) {
    /*
     * auto ch = store->open_collection(coll);
     * ghobject_t next;
     * vector<ghobject_t> list;
     * int r = store->collection_list(ch, next, ghobject_t::get_max(),
     *                                LIST_AT_A_TIME, &list, &next);
     * Enumerates every object in the PG along with its metadata.
     */
    r = action_on_all_objects_in_pg(store, pgidstr, lookup, debug);
  } else {
    r = action_on_all_objects(store, lookup, debug);
  }
  if (r)
    return r;
  lookup.dump(formatter, human_readable);
  formatter->flush(cout);
  return 0;
}
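The collection_list pattern in the comment is a paged enumeration: fetch at most a page of objects per call starting from a cursor, then continue from the returned cursor until everything has been seen. A self-contained sketch over a plain vector (the real API pages over ghobject_t and returns a `next` sentinel):

```cpp
#include <algorithm>
#include <vector>

// Enumerate `objs` in batches of at most `page` entries, advancing a
// cursor after each batch — the shape of a collection_list() loop.
std::vector<int> list_all(const std::vector<int>& objs, size_t page) {
  std::vector<int> out;
  size_t next = 0; // cursor, like the `next` ghobject_t
  while (next < objs.size()) {
    size_t end = std::min(next + page, objs.size());
    out.insert(out.end(), objs.begin() + next, objs.begin() + end); // one batch
    next = end;
  }
  return out;
}
```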
meta-list
// Works the same way as list,
// but with the collection fixed to coll_t::meta(), the metadata PG
list-pgs
ret = fs->list_collections(ls);
// Find the PG(s)
for (it = ls.begin(); it != ls.end(); ++it) {
  spg_t tmppgid;
  if (pgidstr == "meta") {
    if (it->to_str() == "meta")
      break;
    else
      continue;
  }
  if (!it->is_pg(&tmppgid)) {
    continue;
  }
  if (it->is_temp(&tmppgid)) {
    continue;
  }
  if (op != "list-pgs" && tmppgid != pgid) {
    continue;
  }
  if (op != "list-pgs") {
    // Found!
    break;
  }
  cout << tmppgid << std::endl;
}
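The filtering above can be sketched over illustrative string collection names; the real code uses coll_t::is_pg()/is_temp() rather than string matching, so the name patterns here are only for demonstration:

```cpp
#include <string>
#include <vector>

// Keep collections that look like PGs ("<pool>.<seed>_head" here),
// dropping the metadata collection and temporary PG collections —
// the shape of the list-pgs loop.
std::vector<std::string> list_pgs(const std::vector<std::string>& colls) {
  std::vector<std::string> out;
  for (const auto& c : colls) {
    if (c == "meta")
      continue;                                         // metadata collection
    if (c.size() > 5 && c.compare(c.size() - 5, 5, "_TEMP") == 0)
      continue;                                         // temporary PG
    if (c.find('.') == std::string::npos)
      continue;                                         // not a PG name
    out.push_back(c);
  }
  return out;
}
```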