Block Storage: Annotated AIO Direct-Write Path

For annotations of the functions that run before aio_write in the I/O submission path, see "Block Storage: Annotated AIO Direct-Read Path".

For annotations of the functions that set up the iter iterator, see "Block Storage: Annotated AIO Direct-Read Path" as well.

The blkdev_write_iter function calls __generic_file_write_iter to start the direct write. Its annotations:

For a direct write, __generic_file_write_iter first calls generic_file_direct_write, annotated as follows:

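For annotations of the block device direct-write function blkdev_direct_IO and the call chain beneath it, see "Block Storage: Annotated AIO Direct-Read Path".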
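Seen from user space, the direct path described above is selected by opening the file or block device with O_DIRECT. The following is a minimal sketch under stated assumptions, not the kernel code itself: the function names (`demo_direct_write`, `write_one_block`), the 4096-byte alignment, and the fallback to a buffered write when the filesystem rejects O_DIRECT are all illustrative choices made for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT               /* no O_DIRECT visible: degrade to a buffered write */
#define O_DIRECT 0
#endif

#define DEMO_BLOCK_SIZE 4096   /* assumed alignment; real devices may differ */

/* Open path with the given extra flags and write one block at offset 0.
 * Returns the number of bytes written, or -1 with errno set. */
static ssize_t write_one_block(const char *path, int extra_flags, const void *buf)
{
    int fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, buf, DEMO_BLOCK_SIZE, 0);
    int saved_errno = errno;   /* close() must not hide the pwrite() error */
    close(fd);
    errno = saved_errno;
    return n;
}

/* Write one block filled with `fill`, preferring O_DIRECT.
 * Direct I/O needs the buffer, length and offset to be suitably aligned,
 * hence posix_memalign() instead of malloc(). */
ssize_t demo_direct_write(const char *path, int fill)
{
    void *buf = NULL;
    if (posix_memalign(&buf, DEMO_BLOCK_SIZE, DEMO_BLOCK_SIZE) != 0)
        return -1;
    memset(buf, fill, DEMO_BLOCK_SIZE);

    ssize_t n = write_one_block(path, O_DIRECT, buf);
    if (n < 0 && errno == EINVAL)
        n = write_one_block(path, 0, buf);  /* O_DIRECT rejected: buffered retry */

    free(buf);
    return n;
}
```

Calling `demo_direct_write("/tmp/demo.bin", 'A')` on a regular file exercises the same open flag; on a real block device the path would be a device node, which requires appropriate privileges and overwrites its first block.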

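Finally, blkdev_write_iter calls generic_write_sync to make sure the data safely reaches the disk. In practice this invokes the block device's fsync function, blkdev_fsync, which sends a FLUSH command to the device so that the device's own cache is written to the media: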

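For a regular filesystem, how fsync() is implemented depends on that filesystem. In most cases it also uses the REQ_PREFLUSH interface to flush data to the disk's storage media.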

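blkdev_fsync first calls file_write_and_wait_range to write the data held in the page cache out to the disk. The OS, however, does not know whether the disk has a write cache. If it does, the writeback triggered by file_write_and_wait_range may reach only the disk's cache and not the non-volatile media, which is why the FLUSH command described below has to be issued as well.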
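From an application's point of view, both steps (page cache writeback and the device cache flush) are requested with a single fsync() call on the file descriptor. The sketch below is only an illustration of that usage; the helper name `demo_durable_write` and the file path used with it are assumptions introduced for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to path, then ask the kernel to make them durable.
 * Returns 0 on success, -1 on error. */
int demo_durable_write(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            close(fd);
            return -1;
        }
        p += n;                    /* handle short writes */
        left -= (size_t)n;
    }

    /* write() returning only means the kernel accepted the data.
     * fsync() is the call that requests writeback plus a device flush. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call, a successful write() gives no guarantee that the data has left the page cache, let alone the disk's own cache.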

Illustration of what the FLUSH command does:

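About REQ_PREFLUSH
REQ_PREFLUSH is a request flag on a bio. It means that before this I/O starts, every I/O that completed before it must first be written to non-volatile storage.
REQ_PREFLUSH can also be set on an empty bio, which requests a flush of the data held in the disk's volatile write cache (not the OS page cache).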

Explicit cache flushes (Documentation/block/writeback_cache_control.txt)
The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure the volatile cache of the storage device has been flushed before the actual I/O operation is started. This explicitly
guarantees that previously completed write requests are on non-volatile storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O. It is recommend to use the blkdev_issue_flush() helper for a pure cache flush.

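REQ_FLUSH (the earlier name of REQ_PREFLUSH): flushes the data in the disk cache to the disk media so that it is not lost on power failure. REQ_FUA (Force Unit Access): bypasses the disk cache and writes the data directly to the disk media.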
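The difference between the two flags can be illustrated with a toy model of a disk that has a volatile write-back cache. This is a sketch for intuition only, not kernel code: `struct toy_disk`, the `toy_*` functions, and the `TOY_PREFLUSH` / `TOY_FUA` constants are hypothetical stand-ins invented here, and their values are unrelated to the kernel's REQ_PREFLUSH and REQ_FUA.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_SECTORS  8
#define TOY_PREFLUSH 0x1u   /* hypothetical stand-in for REQ_PREFLUSH */
#define TOY_FUA      0x2u   /* hypothetical stand-in for REQ_FUA */

struct toy_disk {
    int  media[TOY_SECTORS];   /* non-volatile media */
    int  cache[TOY_SECTORS];   /* volatile write-back cache */
    bool dirty[TOY_SECTORS];   /* cache entry not yet on media */
};

void toy_init(struct toy_disk *d)
{
    memset(d, 0, sizeof(*d));
}

/* FLUSH: move every dirty cache entry to the media. */
void toy_flush(struct toy_disk *d)
{
    for (int i = 0; i < TOY_SECTORS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Submit one write. A negative sector models an empty bio (no payload). */
void toy_submit(struct toy_disk *d, unsigned flags, int sector, int value)
{
    if (flags & TOY_PREFLUSH)
        toy_flush(d);              /* earlier writes become durable first */

    if (sector < 0 || sector >= TOY_SECTORS)
        return;                    /* pure flush: nothing else to do */

    if (flags & TOY_FUA) {
        d->media[sector] = value;  /* this write goes straight to the media */
        d->dirty[sector] = false;
    } else {
        d->cache[sector] = value;  /* completes from the volatile cache */
        d->dirty[sector] = true;
    }
}

/* Power loss: everything still in the volatile cache is gone. */
void toy_power_loss(struct toy_disk *d)
{
    memset(d->cache, 0, sizeof(d->cache));
    memset(d->dirty, 0, sizeof(d->dirty));
}
```

In this model a plain write is lost on `toy_power_loss()` unless a later request carries `TOY_PREFLUSH`, whereas a `TOY_FUA` write is durable on its own but does nothing for earlier cached writes. Setting both flags on one request covers the earlier writes and the current one.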

Documentation/block/writeback_cache_control.txt:

==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before data actually has hit the non-volatile storage.  This
behavior obviously speeds up various workloads, but it means the operating
system needs to force data out to the non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.

Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O.  It is recommend to use
the blkdev_issue_flush() helper for a pure cache flush.

Forced Unit Access
------------------

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.

Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.

Implementation details for make_request_fn based block drivers
--------------------------------------------------------------

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.

Implementation details for request_fn based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::

    blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer.  For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::

    blk_queue_write_cache(sdkp->disk->queue, true, true);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
