Postgresql BRIN索引

与我们熟悉的索引不同，BRIN的思路是避免扫描不合适的行，而不是快速找到匹配的行。它不是一个精确的索引：不包含表行的TID。

简而言之，对于值与表中物理位置相关的列，BRIN效果会比较好。换句话说，如果没有ORDER BY子句的查询实际上以升序或降序返回列值（并且该列上没有索引）。

BRIN原理如下，该表按范围被分为几个大的块，因此得名：Block Range Index，BRIN。索引存储有关每个范围中数据的摘要信息。规则上是最小值和最大值，但如进一步所示，也有不同的情况。假设执行一个包含有筛选条件的查询；如果所搜索的值未进入该区间，则跳过这个区间；但如果可以在该区间匹配到，则会仔细检查所有块中的所有行。

BRIN结构

第一页包含元数据。
从元数据中特定的偏移量可以定位到带有概要信息的页面。这些页面上的每个索引行都包含一个范围的摘要信息。

在元数据库页面和摘要数据之间，找到具有反向范围图（简称为“ revmap”）的页面。实际上，这是指向相应索引行的指针（TID）的数组。

对于某些范围，revmap中的指针可能不会指向索引行（图中的灰色标记）。在这种情况下，该范围被认为还没有摘要信息。

扫描索引

如果索引不包含涉及的行，该如何使用？这种访问方法当然不能通过TID返回TID行，但可以构建bitmap。bitmap页面可以有两种：精确到位的页面和精确到页的页面。使用精确到页的位图。

该算法很简单。顺序扫描按范围切分的图。指针用于确定具有范围摘要信息的索引行。如果范围内不包含所寻求的值，则跳过该范围，如果范围包含该值，则该范围内所有页面都添加到bitmap中。然后，照常使用生成的位图。

更新索引

当表变更的时候，索引更新更加有趣。
当添加新版本的行到表页面的时候,会确定该行包含在哪个范围中，并使用范围图查找包含摘要信息的索引行，这些都是简单的算术运算。例如，假设范围的尺寸为4，在第13页上，行版本的行值为42。范围的编号（从零开始）是13/4 = 3，因此，在 revmap中，我们采用偏移量为3的指针（其顺序号为4）。
此范围的最小值为31，最大值为40。由于新值42不在间隔内，因此我们更新最大值（见图）。但是，如果新值仍在存储的范围之内，则无需更新索引。

当页面的新版本出现在可用摘要信息的范围内的时候，创建索引时，会为所有可用范围计算摘要信息，但是在进一步扩展表时，可能会有出现超出范围的新页面。这里有两个选项：

索引不会立即更新。如前所述，在扫描索引时，将扫描整个范围。实际上更新是在“vacuum”期间完成的，或者可以手工调用“ brin_summarize_new_values”函数来完成。
如果我们使用autosummarize参数创建索引，更新将立即完成。但是，当使用新值填充范围页面时，更新可能会经常发生，因此，默认情况下此参数处于关闭状态。

当出现新范围时，«revmap»的大小可能会增加。位于元页面和摘要数据之间的map需要由另一页面扩展时，现有行版本就会移至其他页面。因此，范围图始终位于元页面和摘要数据之间。

当删除一行时，什么也不会发生。有时最小值或最大值将被删除，这种情况下可以减小间隔。但是要检测到这一点，我们将必须读取该范围内的所有值，读取所有值的代价比较高。

为了索引的正确性不受影响，搜索可能需要查看比实际需要更多的范围。通常，可以手动重新计算此类区域的摘要信息（通过调用«brin_desummarize_range»和«brin_summarize_new_values»函数。更新一行只是删除老的版本，增加新版本。

例子：

--建表
create table flights_bi(airport_code char(3),airport_coord point,         -- geo coordinates of airportairport_utc_offset interval, -- time zoneflight_no char(6),           -- flight numberflight_type text,           -- flight type: departure / arrival scheduled_time timestamptz,  -- scheduled departure/arrival time of flightactual_time timestamptz,     -- actual time of flightaircraft_code char(3),seat_no varchar(4),          -- seat numberfare_conditions varchar(10), -- travel classpassenger_id varchar(20),passenger_name text
);--填充数据
DO $$
<<local>>
DECLAREcurdate date := (SELECT min(scheduled_departure) FROM flights);utc_offset interval;
BEGINWHILE (curdate <= bookings.now()::date) LOOPutc_offset := interval '12 hours';WHILE (utc_offset >= interval '2 hours') LOOPINSERT INTO flights_biWITH flight (airport_code,airport_coord,flight_id,flight_no,scheduled_time,actual_time,aircraft_code,flight_type) AS (-- прибытияSELECT a.airport_code,a.coordinates,f.flight_id,f.flight_no,f.scheduled_departure,f.actual_departure,f.aircraft_code,'departure'FROM   airports a,flights f,pg_timezone_names tznWHERE  a.airport_code = f.departure_airportAND    f.actual_departure IS NOT NULLAND    tzn.name = a.timezoneAND    tzn.utc_offset = local.utc_offsetAND    timezone(a.timezone, f.actual_departure)::date = curdateUNION ALL-- вылетыSELECT a.airport_code,a.coordinates,f.flight_id,f.flight_no,f.scheduled_arrival,f.actual_arrival,f.aircraft_code,'arrival'FROM   airports a,flights f,pg_timezone_names tznWHERE  a.airport_code = f.arrival_airportAND    f.actual_arrival IS NOT NULLAND    tzn.name = a.timezoneAND    tzn.utc_offset = local.utc_offsetAND    timezone(a.timezone, f.actual_arrival)::date = curdate)SELECT f.airport_code,f.airport_coord,local.utc_offset,f.flight_no,f.flight_type,f.scheduled_time,f.actual_time,f.aircraft_code,s.seat_no,s.fare_conditions,t.passenger_id,t.passenger_nameFROM   flight fJOIN seats sON s.aircraft_code = f.aircraft_codeLEFT JOIN boarding_passes bpON bp.flight_id = f.flight_idAND bp.seat_no = s.seat_noLEFT JOIN ticket_flights tfON tf.ticket_no = bp.ticket_noAND tf.flight_id = bp.flight_idLEFT JOIN tickets tON t.ticket_no = tf.ticket_no;RAISE NOTICE '%, %', curdate, utc_offset;utc_offset := utc_offset - interval '1 hour';END LOOP;curdate := curdate + 1;END LOOP;
END;
$$;
demo=# select count(*) from flights_bi;count
----------30517076
(1 row)
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi'));pg_size_pretty
----------------4127 MB
(1 row)

在什么列上建立BRIN索引

之前提过数据必须和物理的位置想关联，而且该表更新不能太频繁，于是我们查询pg_stats可以看到，越是接近1，说明物理顺序和逻辑顺序越一致，所以该列适合建立BRIN索引。

demo=# analyze flights_bi;demo=# select attname, correlation from pg_stats where tablename='flights_bi'
order by correlation desc nulls last;attname       | correlation
--------------------+-------------scheduled_time     |    0.999994actual_time        |    0.999994fare_conditions    |    0.796719flight_type        |    0.495937airport_utc_offset |    0.438443aircraft_code      |    0.172262airport_code       |   0.0543143flight_no          |   0.0121366seat_no            |  0.00568042passenger_name     |   0.0046387passenger_id       | -0.00281272airport_coord      |
(12 rows)

通过以上查询，我们可以看到scheduled_time比较适合建立BRIN索引。

demo=# create index on flights_bi using brin(scheduled_time);
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_scheduled_time_idx'));pg_size_pretty
----------------184 kB
(1 row)

查看使用索引的执行计划

demo=# \set d 'bookings.now()::date - interval \'7 days\''demo=# explain (costs off,analyze)select *from flights_biwhere scheduled_time >= :d and scheduled_time < :d + interval '1 day';QUERY PLAN
--------------------------------------------------------------------------------Bitmap Heap Scan on flights_bi (actual time=10.282..94.328 rows=83954 loops=1)Recheck Cond: ...Rows Removed by Index Recheck: 12045Heap Blocks: lossy=1664->  Bitmap Index Scan on flights_bi_scheduled_time_idx(actual time=3.013..3.013 rows=16640 loops=1)Index Cond: ...Planning time: 0.375 msExecution time: 97.805 ms

如上，使用了 flights_bi_scheduled_time_idx索引，满足查询条件的行数（Bitmap Heap Scan 的行）与使用索引返回的总行数（bitmap Heap Scan的行加上索引过滤删除的行）之比告诉我们。83954 / (83954 + 12045),大约为87%。

demo=# select airport_utc_offset, count(distinct (ctid::text::point)[0])/365 numblk
from flights_bi
group by airport_utc_offset
order by 2;airport_utc_offset | numblk
--------------------+--------12:00:00           |      606:00:00           |      802:00:00           |     1011:00:00           |     1308:00:00           |     2809:00:00           |     2910:00:00           |     4004:00:00           |     4707:00:00           |    11005:00:00           |    23103:00:00           |    932
(11 rows)

可以看到根据时区的分布的数据不均匀，以下创建一个pages_per_range为4的BRIN索引，这里表示每个range包含4个page，这个值越低，说明精度越高，而且索引也会越大，那么扫描索引的成本也会更高。边界越小，说明数据被切分的越细，精度自然高。

demo=# create index on flights_bi using brin(airport_utc_offset) with (pages_per_range=4);demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_airport_utc_offset_idx'));pg_size_pretty
----------------6528 kB
(1 row)demo=# explain (costs off,analyze)select *from flights_biwhere airport_utc_offset = interval '8 hours';QUERY PLAN
----------------------------------------------------------------------------------Bitmap Heap Scan on flights_bi (actual time=75.151..192.210 rows=587353 loops=1)Recheck Cond: (airport_utc_offset = '08:00:00'::interval)Rows Removed by Index Recheck: 191318Heap Blocks: lossy=13380->  Bitmap Index Scan on flights_bi_airport_utc_offset_idx(actual time=74.999..74.999 rows=133800 loops=1)Index Cond: (airport_utc_offset = '08:00:00'::interval)Planning time: 0.168 msExecution time: 212.278 ms

以上执行计划可以看到又一次使用了BRIN索引，但是精确度比较低，大概在75%，因为这一列的线性相关性比价低。

当然也可以使用多个BRIN索引，如下：

demo=# \set d 'bookings.now()::date - interval \'60 days\''demo=# explain (costs off,analyze)select *from flights_biwhere scheduled_time >= :d and scheduled_time < :d + interval '30 days'and airport_utc_offset = interval '8 hours';QUERY PLAN
---------------------------------------------------------------------------------Bitmap Heap Scan on flights_bi (actual time=62.046..113.849 rows=48154 loops=1)Recheck Cond: ...Rows Removed by Index Recheck: 18856Heap Blocks: lossy=1152->  BitmapAnd (actual time=61.777..61.777 rows=0 loops=1)->  Bitmap Index Scan on flights_bi_scheduled_time_idx(actual time=5.490..5.490 rows=435200 loops=1)Index Cond: ...->  Bitmap Index Scan on flights_bi_airport_utc_offset_idx(actual time=55.068..55.068 rows=133800 loops=1)Index Cond: ...Planning time: 0.408 msExecution time: 115.475 ms

创建B-Tree索引比较以下

demo=# create index flights_bi_scheduled_time_btree on flights_bi(scheduled_time);demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_scheduled_time_btree'));pg_size_pretty
----------------654 MB
(1 row)

B-Tree比BRIN大几百倍，但是，查询的速度更快：执行计划使用统计信息来确定数据是按物理顺序排列的，不需要构建位图，不需要重新检查索引条件。

demo=# explain (costs off,analyze)select *from flights_biwhere scheduled_time >= :d and scheduled_time < :d + interval '1 day';QUERY PLAN
----------------------------------------------------------------Index Scan using flights_bi_scheduled_time_btree on flights_bi(actual time=0.099..79.416 rows=83954 loops=1)Index Cond: ...Planning time: 0.500 msExecution time: 85.044 ms

所以这里牺牲了效率，获得了更多的空间。

操作类型

比较运算符并不是会为所有的数据类型定义。例如，这里没有为机场地理坐标的点（“点”类型）定义。顺便说一下，正是由于这个原因，统计信息没有显示此列的相关性。

demo=# select attname, correlation
from pg_stats
where tablename='flights_bi' and attname = 'airport_coord';attname    | correlation
---------------+-------------airport_coord |
(1 row)

但是许多类似的类型使我们能够引入“边界区域”的概念，例如，几何形状的边界矩形。另外文章也详细讨论了GiST索引如何使用此功能。同样，BRIN还可以收集以下数据类型的列的摘要信息：范围内所有值的边界区域仅仅是摘要值。

与GiST不同，BRIN的摘要值必须与所索引的值具有相同的类型。因此，尽管很明显坐标可以在BRIN中工作，但我们无法建立点的索引：经度与时区紧密相关。幸运的是，在将点转换为退化的矩形后，没有任何事情会妨碍在表达式上创建索引。同时，我们将范围的大小设置为一页，以显示极限情况：

demo=# create index on flights_bi using brin (box(airport_coord)) with (pages_per_range=1);
--即使在这种情况下，索引也只有30M
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_box_idx'));pg_size_pretty
----------------30 MB
(1 row)

然后我们执行一个查询，用坐标当做条件来查询机场

demo=# select airport_code, airport_name
from airports
where box(coordinates) <@ box '120,40,140,50';airport_code |  airport_name
--------------+-----------------KHV          | Khabarovsk-NovyiVVO          | Vladivostok
(2 rows)

但是执行计划没有使用我们的索引

demo=# analyze flights_bi;demo=# explain select * from flights_bi
where box(airport_coord) <@ box '120,40,140,50';QUERY PLAN
---------------------------------------------------------------------Seq Scan on flights_bi  (cost=0.00..985928.14 rows=30517 width=111)Filter: (box(airport_coord) <@ '(140,50),(120,40)'::box)

为什么呢？让我们禁用全表扫描看看

demo=# set enable_seqscan = off;demo=# explain select * from flights_bi
where box(airport_coord) <@ box '120,40,140,50';QUERY PLAN
--------------------------------------------------------------------------------Bitmap Heap Scan on flights_bi  (cost=14079.67..1000007.81 rows=30517 width=111)Recheck Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)->  Bitmap Index Scan on flights_bi_box_idx(cost=0.00..14072.04 rows=30517076 width=0)Index Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)

可以看到能使用索引，但是执行计划认为位图必须建立在整个表上（请看“位图索引扫描”节点的“行”），也就不足为奇了，计划者选择了顺序扫描。这个案例。这里的问题是，对于几何类型，PostgreSQL不收集任何统计信息：

demo=# select * from pg_stats where tablename = 'flights_bi_box_idx' \gx
-[ RECORD 1 ]----------+-------------------
schemaname             | bookings
tablename              | flights_bi_box_idx
attname                | box
inherited              | f
null_frac              | 0
avg_width              | 32
n_distinct             | 0
most_common_vals       |
most_common_freqs      |
histogram_bounds       |
correlation            |
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

但是没有人对该索引有任何抱怨，因为它确实可以正常工作：

demo=# explain (costs off,analyze)
select * from flights_bi where box(airport_coord) <@ box '120,40,140,50';QUERY PLAN
----------------------------------------------------------------------------------Bitmap Heap Scan on flights_bi (actual time=158.142..315.445 rows=781790 loops=1)Recheck Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)Rows Removed by Index Recheck: 70726Heap Blocks: lossy=14772->  Bitmap Index Scan on flights_bi_box_idx(actual time=158.083..158.083 rows=147720 loops=1)Index Cond: (box(airport_coord) <@ '(140,50),(120,40)'::box)Planning time: 0.137 msExecution time: 340.593 ms

如果几何图形有任何非一般的要求，则可以使用PostGIS。因为它可以收集统计信息。

内部构造

可以使用pageinspect插件查看内部信息

通过查看原信息可以知道分配了多少range,每个range有多少page.

demo=# select *
from brin_metapage_info(get_raw_page(‘flights_bi_scheduled_time_idx’,0));
magic | version | pagesperrange | lastrevmappage
------------±--------±--------------±---------------
0xA8109CFA | 1 | 128 | 3
(1 row)

这里第1至3页分配给revmap，其余的包含摘要数据。从“ revmap”中，我们可以获取每个范围的摘要数据的引用。如下，第一个range的信息（包含前128页）位于以下位置：

demo=# select *
from brin_revmap_data(get_raw_page('flights_bi_scheduled_time_idx',1))
limit 1;pages
---------(6,197)
(1 row)

摘要数据如下：

demo=# select allnulls, hasnulls, value
from brin_page_items(get_raw_page('flights_bi_scheduled_time_idx',6),'flights_bi_scheduled_time_idx'
)
where itemoffset = 197;allnulls | hasnulls |                       value
----------+----------+----------------------------------------------------f        | f        | {2016-08-15 02:45:00+03 .. 2016-08-15 17:15:00+03}
(1 row)

下一个range:

demo=# select *
from brin_revmap_data(get_raw_page('flights_bi_scheduled_time_idx',1))
offset 1 limit 1;pages
---------(6,198)
(1 row)
demo=# select allnulls, hasnulls, value
from brin_page_items(get_raw_page('flights_bi_scheduled_time_idx',6),'flights_bi_scheduled_time_idx'
)
where itemoffset = 198;allnulls | hasnulls |                       value
----------+----------+----------------------------------------------------f        | f        | {2016-08-15 06:00:00+03 .. 2016-08-15 18:55:00+03}
(1 row)

对于“包含”类，“value”字段将显示类似的值：

 {(94.4005966186523,69.3110961914062),(77.6600036621,51.6693992614746) .. f .. f}

BRIN的一些属性如下：

--可以在多列创建索引，amname |     name      | pg_indexam_has_property
--------+---------------+-------------------------brin   | can_order     | fbrin   | can_unique    | fbrin   | can_multi_col | tbrin   | can_exclude   | f
--只支持bitmap scanname      | pg_index_has_property
---------------+-----------------------clusterable   | findex_scan    | fbitmap_scan   | tbackward_scan | f可以操作NULLname        | pg_index_column_has_property
--------------------+------------------------------asc                | fdesc               | fnulls_first        | fnulls_last         | forderable          | fdistance_orderable | freturnable         | fsearch_array       | fsearch_nulls       | t

BRIN索引的特性，可以支持大数据量的查询，但是精度没有B-tree高，索引占用空间小，扫描速度快，写入数据的性能不怎么受索引影响。