
  • Preface
  • Cluster directory structure
  • Table file storage format
  • Conclusion


This article analyzes the PostgreSQL 14 source code; the demonstrations were run on CentOS 8.


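  • Cluster directory structure

PostgreSQL uses initdb to initialize a database cluster directory. All data belonging to the cluster is stored under this directory, organized on disk as directories and files. Let's look at the structure of the cluster directory.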

./zptest/
├── base
│   ├── 1
│   ├── 4
│   └── 5
├── global
│   ├── 1213
│   ├── 1213_fsm
│   ├── 1213_vm
│   ├── 1214
│   ├── 1232
│   ├── 1233
│   ├── 1260
│   ├── 1260_fsm
│   ├── 1260_vm
│   ├── 1261
│   ├── 1261_fsm
│   ├── 1261_vm
│   ├── 1262
│   ├── 1262_fsm
│   ├── 1262_vm
│   ├── 2396
│   ├── 2396_fsm
│   ├── 2396_vm
│   ├── 2397
│   ├── 2671
│   ├── 2672
│   ├── 2676
│   ├── 2677
│   ├── 2694
│   ├── 2695
│   ├── 2697
│   ├── 2698
│   ├── 2846
│   ├── 2847
│   ├── 2964
│   ├── 2965
│   ├── 2966
│   ├── 2967
│   ├── 3592
│   ├── 3593
│   ├── 4060
│   ├── 4061
│   ├── 4175
│   ├── 4176
│   ├── 4177
│   ├── 4178
│   ├── 4181
│   ├── 4182
│   ├── 4183
│   ├── 4184
│   ├── 4185
│   ├── 4186
│   ├── 6000
│   ├── 6001
│   ├── 6002
│   ├── 6100
│   ├── 6114
│   ├── 6115
│   ├── pg_control
│   └── pg_filenode.map
├── pg_commit_ts
├── pg_dynshmem
├── pg_hba.conf
├── pg_ident.conf
├── pg_logical
│   ├── mappings
│   ├── replorigin_checkpoint
│   └── snapshots
├── pg_multixact
│   ├── members
│   └── offsets
├── pg_notify
├── pg_replslot
├── pg_serial
├── pg_snapshots
├── pg_stat
├── pg_stat_tmp
├── pg_subtrans
│   └── 0000
├── pg_tblspc
├── pg_twophase
├── PG_VERSION
├── pg_wal
│   ├── 000000010000000000000001
│   └── archive_status
├── pg_xact
│   └── 0000
├── postgresql.auto.conf
└── postgresql.conf



/*
 * Object ID is a fundamental type in Postgres.
 */
typedef unsigned int Oid;

postgres=# select oid, datname  from pg_database order by oid;
 oid |  datname
-----+-----------
   1 | template1
   4 | template0
   5 | postgres
(3 rows)


postgres=# select pg_relation_filepath('pg_class');
 pg_relation_filepath
----------------------
 base/5/1259
(1 row)
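The path returned by the query above follows a fixed pattern: `base/<database OID>/<file name>` for a relation stored in a database's default location, while files shared by the whole cluster sit directly under `global/`. The following is a minimal Python sketch of that pattern. The function name `relation_path` is hypothetical, and the sketch deliberately ignores user-defined tablespaces (under `pg_tblspc`) and temporary relations.

```python
from typing import Optional


def relation_path(filenode: int, db_oid: Optional[int] = None) -> str:
    """Build a relation file path relative to the cluster directory.

    Hypothetical helper: it covers only base/ and global/, not
    pg_tblspc or temporary relations.
    """
    if db_oid is None:
        # Cluster-wide (shared) files live directly under global/.
        return f"global/{filenode}"
    # Per-database files live under base/<database OID>/.
    return f"base/{db_oid}/{filenode}"


# pg_class in the "postgres" database (OID 5), as in the query above.
print(relation_path(1259, db_oid=5))
# A file that appears in the global/ listing of the directory tree.
print(relation_path(1262))
```

On a live server, `pg_relation_filepath` remains the reliable way to obtain the path; the sketch only illustrates how the directory names relate to OIDs.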


  • Table file storage format

(1) Relationship between a table file and the table's OID:


postgres=# create table test(id integer);
CREATE TABLE
postgres=# select oid from pg_class where relname='test';
  oid
-------
 16384
(1 row)

postgres=# select pg_relation_filepath('test');
 pg_relation_filepath
----------------------
 base/5/16384
(1 row)

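However, this correspondence is not permanent: operations such as VACUUM FULL rewrite the table into a new file with a different name. To find the file that currently backs a table, query pg_relation_filepath instead of assuming that the file name equals the OID.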


/*
 * The map file is critical data: we have no automatic method for recovering
 * from loss or corruption of it.  We use a CRC so that we can detect
 * corruption.  To minimize the risk of failed updates, the map file should
 * be kept to no more than one standard-size disk sector (ie 512 bytes),
 * and we use overwrite-in-place rather than playing renaming games.
 * The struct layout below is designed to occupy exactly 512 bytes, which
 * might make filesystem updates a bit more efficient.
 *
 * Entries in the mappings[] array are in no particular order.  We could
 * speed searching by insisting on OID order, but it really shouldn't be
 * worth the trouble given the intended size of the mapping sets.
 */
#define RELMAPPER_FILENAME                "pg_filenode.map"
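To make the 512-byte layout described in that comment more concrete, here is a rough Python sketch that decodes a buffer of that size into an OID-to-filenode dictionary. The field order used here (a 4-byte magic number, a 4-byte entry count, 62 pairs of 4-byte OID and filenode, a 4-byte CRC, and 4 bytes of padding) is an assumption based on my reading of relmapper.c in PostgreSQL 14, as is little-endian byte order. The name `parse_relmap` is hypothetical, and neither the magic number nor the CRC is verified.

```python
import struct

MAX_MAPPINGS = 62  # assumed: 62 entries * 8 bytes + 16 bytes = 512 bytes
# Assumed layout: magic, num_mappings, (oid, filenode) * 62, crc, pad.
MAP_FORMAT = "<ii" + "II" * MAX_MAPPINGS + "Ii"


def parse_relmap(data: bytes) -> dict:
    """Decode a map-file image into {oid: filenode} (hypothetical helper)."""
    if len(data) != struct.calcsize(MAP_FORMAT):
        raise ValueError("unexpected map file size")
    fields = struct.unpack(MAP_FORMAT, data)
    count = fields[1]
    if not 0 <= count <= MAX_MAPPINGS:
        raise ValueError("invalid number of mappings")
    pairs = fields[2:2 + 2 * MAX_MAPPINGS]
    # Only the first `count` entries are valid; they are in no particular order.
    return {pairs[2 * i]: pairs[2 * i + 1] for i in range(count)}


# Synthetic image with two made-up entries; magic and CRC are placeholders.
unused = [0] * (2 * MAX_MAPPINGS - 4)
sample = struct.pack(MAP_FORMAT, 0, 2, 1259, 1259, 1249, 1249, *unused, 0, 0)
print(len(sample))
print(parse_relmap(sample))
```

To try it against a real cluster, one could pass the bytes of a copy of `global/pg_filenode.map` to the function, keeping in mind that the layout may differ in other PostgreSQL versions.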

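(2) Table file size:


When a table's data exceeds 1 GB, additional table files are created, named oid.1, oid.2, …, so that the table is split across multiple files.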

/* RELSEG_SIZE is the maximum number of blocks allowed in one disk file. Thus,
   the maximum size of a single file is RELSEG_SIZE * BLCKSZ; relations bigger
   than that are divided into multiple files. RELSEG_SIZE * BLCKSZ must be
   less than your OS' limit on file size. This is often 2 GB or 4GB in a
   32-bit operating system, unless you have large file support enabled. By
   default, we make the limit 1 GB to avoid any possible integer-overflow
   problems within the OS. A limit smaller than necessary only means we divide
   a large relation into more chunks than necessary, so it seems best to err
   in the direction of a small limit. A power-of-2 value is recommended to
   save a few cycles in md.c, but is not absolutely required. Changing
   RELSEG_SIZE requires an initdb. */
#define RELSEG_SIZE 131072

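RELSEG_SIZE limits the number of blocks in each table file, so the maximum size of a single file is BLCKSZ * RELSEG_SIZE. BLCKSZ defaults to 8 KB, which gives 8 KB * 131072 = 1 GB per file.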
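The arithmetic can be checked with a short Python sketch that also shows how a block number maps to a segment file and a byte offset within it. The constants come from the defaults quoted above. The function name `segment_for_block` is hypothetical, and the sketch only mirrors the naming scheme described in this section, not the actual code in md.c.

```python
from typing import Tuple

BLCKSZ = 8192         # default block size in bytes (8 KB)
RELSEG_SIZE = 131072  # blocks per segment file, from the #define above


def segment_for_block(filenode: int, block_no: int) -> Tuple[str, int]:
    """Return (segment file name, byte offset inside that file)."""
    seg_no, block_in_seg = divmod(block_no, RELSEG_SIZE)
    # The first segment has no suffix; later ones are named <name>.1, <name>.2, ...
    name = str(filenode) if seg_no == 0 else f"{filenode}.{seg_no}"
    return name, block_in_seg * BLCKSZ


print(BLCKSZ * RELSEG_SIZE)              # 1073741824 bytes, i.e. 1 GB
print(segment_for_block(16384, 0))       # ('16384', 0)
print(segment_for_block(16384, 131072))  # ('16384.1', 0)
print(segment_for_block(16384, 131073))  # ('16384.1', 8192)
```

Block 131072 is the first block that no longer fits in the first file, which is why it lands at offset 0 of the `.1` segment.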




