PostgreSQL数据库统计信息—

在执行do_analyze_rel函数之前需要确定采样函数，如下：

如果是普通表或者物化视图，则采样函数采用acquire_sample_rows；relpages由RelationGetNumberOfBlocks(onerel)函数提供
如果是外部表，需要使用外部表提供的hook，如postgres_fdw的postgresAnalyzeForeignTable；relpages由同样由fdwroutine->AnalyzeForeignTable外部表hook提供
如果是继承表，采样函数采用acquire_inherited_sample_rows

do_analyze_rel函数主要分为准备工作、采样工作、更新统计信息流程三大流程。

do_analyze_rel函数准备工作流程如下：

创建anl_context内存上下文，并切换到该内存上下文中
切换到表所有者的userid，以便任何索引函数都作为该用户运行。
需要进一步确定要分析一个表中的哪些列：用户可能指定只分析表中的某几个列，被频繁访问的列才更有被分析的价值，然后还要打开待分析表的所有索引，看看是否有可以被分析的列。

static void do_analyze_rel(Relation onerel, VacuumParams *params, List *va_cols, AcquireSampleRowsFunc acquirefunc, BlockNumber relpages, bool inh, bool in_outer_xact, int elevel) {int         attr_cnt, tcnt, i, ind, nindexes, targrows, numrows, save_sec_context, save_nestlevel;double        totalrows, totaldeadrows;Relation   *Irel;bool      hasindex;VacAttrStats **vacattrstats;AnlIndexData *indexdata;   HeapTuple  *rows;PGRUsage   ru0;TimestampTz starttime = 0;MemoryContext caller_context;Oid         save_userid;anl_context = AllocSetContextCreate(CurrentMemoryContext,"Analyze",ALLOCSET_DEFAULT_SIZES); /* Set up a working context so that we can easily free whatever junk gets created. */caller_context = MemoryContextSwitchTo(anl_context);   GetUserIdAndSecContext(&save_userid, &save_sec_context); /* Switch to the table owner's userid, so that any index functions are run as that user.  Also lock down security-restricted operations and arrange to make GUC variable changes local to this command. */SetUserIdAndSecContext(onerel->rd_rel->relowner, save_sec_context | SECURITY_RESTRICTED_OPERATION);save_nestlevel = NewGUCNestLevel();   if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) { /* measure elapsed time iff autovacuum logging requires it */pg_rusage_init(&ru0);if (params->log_min_duration > 0) starttime = GetCurrentTimestamp();}/* Determine which columns to analyze Note that system attributes are never analyzed, so we just reject them at the lookup stage.  We also reject duplicate column mentions.  (We could alternatively ignore duplicates, but analyzing a column twice won't work; we'd end up making a conflicting update in pg_statistic.) */ // 确定要分析的列。请注意，系统属性从未被分析，因此我们只是在查找阶段拒绝它们。我们还拒绝重复提到的列。（我们也可以忽略重复项，但两次分析一列是行不通的；我们最终会在pg_statistic中进行冲突更新。）if (va_cols != NIL) {Bitmapset  *unique_cols = NULL;ListCell   *le; tcnt = 0;vacattrstats = (VacAttrStats **) palloc(list_length(va_cols) * sizeof(VacAttrStats *)); // 创建指针数组用于存放VacAttrStats指针       foreach(le, va_cols) {char     *col = strVal(lfirst(le));i = attnameAttNum(onerel, col, false);if (i == InvalidAttrNumber) ereport(ERROR,(errcode(ERRCODE_UNDEFINED_COLUMN), errmsg("column \"%s\" of relation \"%s\" does not exist", col, RelationGetRelationName(onerel)))); // 表中不存在这样的列if (bms_is_member(i, unique_cols)) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_COLUMN), errmsg("column \"%s\" of relation \"%s\" appears more than once", col, RelationGetRelationName(onerel)))); // bitmap set中已经存在这样的列unique_cols = bms_add_member(unique_cols, i); // 将该列号存放到bitmap set中vacattrstats[tcnt] = examine_attribute(onerel, i, NULL); // examine_attribute函数的作用// 检测该列是否能够支持分析（dropped的列【已删除的字段还存在于系统表中，只是作了标记】、用户指定不需要分析的列等不支持分析）// 获取数据类型，并决定针对该类型的采样数据量和统计函数不同的类型，其分析函数也不同。如果从pg_type系统表中取出该类型的typanalyze字段不为空，则使用该类型指定的分析函数typanalyze（比如对于数组使用array_typanalyze）；否则使用标准分析函数std_typanalyzeif (vacattrstats[tcnt] != NULL) tcnt++;}attr_cnt = tcnt;} else { // 全部所有的列attr_cnt = onerel->rd_att->natts; tcnt = 0;vacattrstats = (VacAttrStats **)palloc(attr_cnt * sizeof(VacAttrStats *));  for (i = 1; i <= attr_cnt; i++) {vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);if (vacattrstats[tcnt] != NULL) tcnt++;}attr_cnt = tcnt;}/* Open all indexes of the relation, and see if there are any analyzable columns in the indexes.  We do not analyze index columns if there was an explicit column list in the ANALYZE command, however.  If we are doing a recursive scan, we don't want to touch the parent's indexes at all. */if (!inh) vac_open_indexes(onerel, AccessShareLock, &nindexes, &Irel);else{ // 如果我们做递归，不要去打开父表的索引Irel = NULL;nindexes = 0;}hasindex = (nindexes > 0); indexdata = NULL;if (hasindex) { // 有索引indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));for (ind = 0; ind < nindexes; ind++){ // 遍历每个索引AnlIndexData *thisdata = &indexdata[ind];IndexInfo  *indexInfo;thisdata->indexInfo = indexInfo = BuildIndexInfo(Irel[ind]);thisdata->tupleFract = 1.0; /* fix later if partial */if (indexInfo->ii_Expressions != NIL && va_cols == NIL){ListCell   *indexpr_item = list_head(indexInfo->ii_Expressions);thisdata->vacattrstats = (VacAttrStats **)palloc(indexInfo->ii_NumIndexAttrs * sizeof(VacAttrStats *));tcnt = 0;for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++){int         keycol = indexInfo->ii_IndexAttrNumbers[i];if (keycol == 0){                  Node       *indexkey; /* Found an index expression */if (indexpr_item == NULL)    /* shouldn't happen */ elog(ERROR, "too few entries in indexprs list");indexkey = (Node *) lfirst(indexpr_item);indexpr_item = lnext(indexpr_item);thisdata->vacattrstats[tcnt] = examine_attribute(Irel[ind], i + 1, indexkey);if (thisdata->vacattrstats[tcnt] != NULL) tcnt++;}}thisdata->attr_cnt = tcnt;}}}

do_analyze_rel函数采样工作流程如下：使用所有可分析列中的最坏情况，确定需要采样的行数；使用采样函数获取采样行；计算统计数据。

 /* Determine how many rows we need to sample, using the worst case from all analyzable columns.  We use a lower bound of 100 rows to avoid possible overflow in Vitter's algorithm.  (Note: that will also be the target in the corner case where there are no analyzable columns.) */ // 使用所有可分析列中的最坏情况，确定需要采样的行数。我们使用100行的下限来避免Vitter算法中可能的溢出。targrows = 100;for (i = 0; i < attr_cnt; i++) { // 表列if (targrows < vacattrstats[i]->minrows) targrows = vacattrstats[i]->minrows; // 100行的下限}for (ind = 0; ind < nindexes; ind++) { // 索引列AnlIndexData *thisdata = &indexdata[ind];for (i = 0; i < thisdata->attr_cnt; i++){if (targrows < thisdata->vacattrstats[i]->minrows) targrows = thisdata->vacattrstats[i]->minrows; // 100行的下限}}/* Acquire the sample rows */rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple)); // 分配目标行*sizeof(HeapTuple)内存空间if (inh) numrows = acquire_inherited_sample_rows(onerel, elevel, rows, targrows, &totalrows, &totaldeadrows); // 继承表，进入acquire_inherited_sample_rows函数会递归整个分析流程else numrows = (*acquirefunc) (onerel, elevel, rows, targrows,  &totalrows, &totaldeadrows);/* Compute the statistics.  Temporary results during the calculations for each column are stored in a child context.  The calc routines are responsible to make sure that whatever they store into the VacAttrStats structure is allocated in anl_context. */ // 计算统计数据。每列计算期间的临时结果存储在子上下文中。calc例程负责确保它们存储到VacationrStats结构中的任何内容都在anl_context中分配。if (numrows > 0){MemoryContext col_context,old_context;col_context = AllocSetContextCreate(anl_context,"Analyze Column",ALLOCSET_DEFAULT_SIZES);old_context = MemoryContextSwitchTo(col_context);for (i = 0; i < attr_cnt; i++){ // 对表的列VacAttrStats *stats = vacattrstats[i];AttributeOpts *aopt;stats->rows = rows;stats->tupDesc = onerel->rd_att;stats->compute_stats(stats, std_fetch_func, numrows, totalrows);/* If the appropriate flavor of the n_distinct option is specified, override with the corresponding value. */aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);if (aopt != NULL){float8     n_distinct;n_distinct = inh ? aopt->n_distinct_inherited : aopt->n_distinct;if (n_distinct != 0.0) stats->stadistinct = n_distinct;}MemoryContextResetAndDeleteChildren(col_context);}if (hasindex) compute_index_stats(onerel, totalrows,indexdata, nindexes,rows, numrows,col_context);MemoryContextSwitchTo(old_context);MemoryContextDelete(col_context);

do_analyze_rel函数更新统计信息流程如下：将完成的统计数据行发送到pg_statistic，替换目标列的任何以前的统计数据；构建扩展统计数据；更新pg_class中的页面/元组统计；向统计收集器报告分析

     /* Emit the completed stats rows into pg_statistic, replacing any previous statistics for the target columns.  (If there are stats in pg_statistic for columns we didn't process, we leave them alone.) */ // 将完成的统计数据行发送到pg_statistic，替换目标列的任何以前的统计数据。（如果pg_statistic中有我们没有处理的列的统计信息，我们就不处理它们。）update_attstats(RelationGetRelid(onerel), inh, attr_cnt, vacattrstats);for (ind = 0; ind < nindexes; ind++){AnlIndexData *thisdata = &indexdata[ind];update_attstats(RelationGetRelid(Irel[ind]), false, thisdata->attr_cnt, thisdata->vacattrstats);}/* Build extended statistics (if there are any). For now we only build extended statistics on individual relations, not for relations representing inheritance trees. */ // 构建扩展统计数据（如果有）。目前，我们只对单个关系构建扩展统计数据，而不是表示继承树的关系if (!inh) BuildRelationExtStatistics(onerel, totalrows, numrows, rows,  attr_cnt, vacattrstats);}/* Update pages/tuples stats in pg_class ... but not if we're doing inherited stats. */ // 更新pg_class中的页面/元组统计if (!inh) {BlockNumber relallvisible;visibilitymap_count(onerel, &relallvisible, NULL);vac_update_relstats(onerel,relpages,totalrows,relallvisible,hasindex,InvalidTransactionId,InvalidMultiXactId,in_outer_xact);}/* Same for indexes. Vacuum always scans all indexes, so if we're part of VACUUM ANALYZE, don't overwrite the accurate count already inserted by VACUUM. */if (!inh && !(params->options & VACOPT_VACUUM)){for (ind = 0; ind < nindexes; ind++){AnlIndexData *thisdata = &indexdata[ind];double        totalindexrows;totalindexrows = ceil(thisdata->tupleFract * totalrows);vac_update_relstats(Irel[ind],RelationGetNumberOfBlocks(Irel[ind]),totalindexrows,0,false,InvalidTransactionId,InvalidMultiXactId,in_outer_xact);}}/* Report ANALYZE to the stats collector, too.  However, if doing inherited stats we shouldn't report, because the stats collector only tracks per-table stats.  Reset the changes_since_analyze counter only if we analyzed all columns; otherwise, there is still work for auto-analyze to do. */ // 也向统计收集器报告分析。但是，如果执行继承的统计，我们不应该报告，因为统计收集器只跟踪每个表的统计。仅当我们分析了所有列时，重置changes_since_analyze计数器；否则，自动分析仍有工作要做if (!inh) pgstat_report_analyze(onerel, totalrows, totaldeadrows, (va_cols == NIL));

分析一下std_typanalyze和array_typanalyze
参考：https://mp.weixin.qq.com/s/AZ98JIq75f0ei1YKiX6i8w

PostgreSQL数据库统计信息——analyze执行函数相关推荐

PostgreSQL数据库统计信息——compute_scalar_stats计算统计数据
如果列类型指定的是std_typanalyze函数决定compute_stats的取值: 如果列数据类型支持默认的等于(eqopr equals operator)和小于(ltopr less tha ...
PostgreSQL数据库统计信息——examine_attribute单列预分析
如下代码所示,如果指定分析的列,这里会对每个列进行预分析.遍历va_cols中的每一列,通过attnameAttNum函数返回该列在该表中的列号,并通过unique_cols集合排除重复的列名,最后通 ...
oracle更新统计信息执行计划,为准确生成执行计划更新统计信息-analyze与dbms_stats...
如果我们想让CBO利用合理利用数据的统计信息,正确判断执行任何SQL查询时的最快途径,需要及时的使用analyze命令或者dbms_stats重新统计数据的统计信息. 例如索引跳跃式扫描(INDEX ...
oracle 数据统计收集,Oracle 10g收集数据库统计信息
1.需求概述某数据库由于整体统计信息不准确,多次出现部分业务SQL选错执行计划,从而导致性能下降影响到最终用户体验,目前通过SQL_PROFILE绑定执行计划临时解决,但此方法不够灵活,后续维护工作 ...
达梦数据库统计信息的导出导入
一.统计信息对象统计信息描述了对象数据的分布特征.统计信息是优化器的代价计算的依据,可以帮助优化器较精确地估算成本,对执行计划的选择起着至关重要的作用.统计信息的收集频率是一把双刃剑,频率太低导致统 ...
数据库-统计信息相关资料
前言:最近数据库突然变得卡顿,查看了相关日志后,推测可能是索引碎片以及统计信息的问题,索引碎片很久之前已经写了个job定时处理,应该是跟统计信息的有关,所以写个随笔记录下相关的资料. 什么是统计信息 ...
.net访问PostgreSQL数据库发生“找不到函数名”的问题追踪
PostgreSQL是一个使用广泛的免费开源的数据库,与MySQL比较,它更适合复杂的企业计算任务,而MySQL在互联网领域应用更为广泛,究其原因,可能是PostgreSQL拥有支持最多的数据类型,甚 ...
达梦数据库-统计信息
千里之行始于足下,梦想不付之行动,终究是纸上谈兵经过一段时间的达梦DCP培训让学习了好多支持点.让我对达梦的理解更加深刻. 今天为大家说一说达梦数据的统计信息在介绍索引之前介绍一下统计信息 :因为 ...
SQL Server 数据库统计信息更新对性能优化的切身体会
年底前在开发一个SQL Server BI项目的过程中,我们测试SSIS包的时候,发现有好几天晚上取数的时间都超出了以前的取数时间,刚开始我们以为是换了服务器,服务器硬件的性能有差异导致,后来我们仔 ...

PostgreSQL数据库统计信息——analyze执行函数

PostgreSQL数据库统计信息——analyze执行函数相关推荐

最新文章

热门文章