SQL backup file size

This article covers how to analyze and forecast the size of SQL backups as a way to better manage backup retention.

One of the main tenets of database management is “Do not lose your data”. A database administrator therefore carries a large responsibility for protecting data, and taking database backups and archiving SQL backup files is one of a DBA's key tasks. In any data protection strategy, taking and archiving backups plays the leading role. Backup planning is especially important for disaster recovery, because the backup files are what you restore from after a failure or data corruption. For this reason, every DBA should design recovery strategies for likely disaster scenarios and make sure those scenarios can actually be recovered from. Backup files must also be tested for data integrity: test restores let you evaluate both the integrity of the backups and the time a recovery takes. The Backup and Restore (or Recovery) strategies for SQL Server database article covers backup and restore strategies in detail.

After this brief introduction to why backup and restore operations matter, we need to introduce one more term from the backup file life cycle: backup retention.

What is SQL backup retention?

Organizational requirements dictate that backup files be retained and remain ready for use until their expiration date. This data protection task is called backup (or data) retention, and the period for which backups are kept is the backup retention time. Both this period and the number of backup files kept are set by the organization's retention policies. Several factors affect retention periods, but the main one is usually the organization's legal requirements. If your SQL Server databases are hosted on-premises, you have to plan disk and media capacity for the required retention times. If your databases run in Azure SQL Database, however, you can take advantage of the long-term backup retention feature, which allows Azure SQL Database backups to be retained for up to 10 years. Of course, this retention is added as a cost to your Azure bill.

How to estimate the next backup size?

To estimate the size of the next backup, we can use the sp_spaceused system stored procedure, which reports how much space the database uses. The reserved column gives an approximate idea of the size of an uncompressed backup.

EXEC sp_spaceused @updateusage = 'true'

In the sp_spaceused output, reserved is the total space allocated to objects in the database, and unused is the part of that reserved space that is not yet used by data or indexes. Subtracting unused from reserved gives the space actually used by data and indexes, which approximates the size of a full backup. We can therefore estimate the backup size with the following formula:

Backup Size (MB) ≈ (Reserved (KB) - Unused (KB)) / 1024

In our scenario, the estimated backup size is approximately (543328 KB - 21512 KB) / 1024 ≈ 509 MB. This method is useful for estimating the size of the next backup. Its drawback is that it tells you nothing about how the backup files grow over a given period, or how fast that growth is accelerating.
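This arithmetic is easy to script, for example when you collect sp_spaceused results from many databases. The following is a minimal Python sketch. It assumes the reserved and unused values are the strings sp_spaceused returns, such as '543328 KB'. The helper names are hypothetical.

```python
def parse_kb(value: str) -> int:
    """Convert an sp_spaceused size string such as '543328 KB' to an integer number of KB."""
    number, unit = value.split()
    if unit.upper() != "KB":
        raise ValueError(f"unexpected unit: {unit}")
    return int(number)


def estimate_backup_mb(reserved: str, unused: str) -> float:
    """Approximate full backup size in MB: (reserved - unused) KB / 1024."""
    return (parse_kb(reserved) - parse_kb(unused)) / 1024


# Values from the scenario above
print(round(estimate_backup_mb("543328 KB", "21512 KB"), 1))
```

This is only the same point estimate as the formula. It still says nothing about growth over time.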

Forecast database backup size

If your concern is storage or disk requirements over a longer term (a week, month, quarter, etc.), you need a more sophisticated calculation method, and statistics offers several. Simple linear regression is a natural first candidate. It is a statistical method that models the relationship between two quantitative variables:

- the independent (explanatory) variable, shown on the x-axis;
- the dependent variable, whose value is predicted, shown on the y-axis.

The model is the equation y = a*x + b, in which y is the dependent variable, x is the independent variable, a is the slope, and b is a constant (the intercept). This method helps us find out whether the backup size is correlated with the time period. The resulting regression formula then lets us forecast the backup sizes of the next periods.

SQL Server stores backup history in system tables in the msdb database, mainly dbo.backupset and dbo.backupmediafamily. The following query returns the average full backup size for each month of the current year:

SELECT DATEPART(MONTH, bs.backup_finish_date) AS [BackupMonth],
       AVG(bs.backup_size) / 1048576 AS [BackupSize (MB)],
       DATEPART(YEAR, bs.backup_finish_date) AS [BackupYear],
       bs.database_name
FROM msdb.dbo.backupmediafamily AS bmf
INNER JOIN msdb.dbo.backupset AS bs
    ON bmf.media_set_id = bs.media_set_id
WHERE bs.type = 'D'  -- full database backups only
  AND bs.database_name = 'AdventureWorks'
  AND DATEPART(YEAR, bs.backup_finish_date) = DATEPART(YEAR, GETDATE())
GROUP BY bs.database_name,
         DATEPART(MONTH, bs.backup_finish_date),
         DATEPART(YEAR, bs.backup_finish_date)
ORDER BY DATEPART(MONTH, bs.backup_finish_date) ASC;
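The following small Python sketch shows what the query computes. It keeps one year's full backups, groups them by month, and averages the size in MB. The input format (a list of finish-date / size-in-bytes pairs), the function name, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date

BYTES_PER_MB = 1048576  # same divisor as the T-SQL query (1024 * 1024)


def monthly_average_mb(history, year):
    """Average full backup size in MB for each month of `year`.

    `history` is an iterable of (finish_date, backup_size_bytes) pairs for one
    database's full backups (a hypothetical export of msdb.dbo.backupset).
    """
    sizes_by_month = defaultdict(list)
    for finish_date, size_bytes in history:
        if finish_date.year == year:  # WHERE DATEPART(YEAR, ...) = current year
            sizes_by_month[finish_date.month].append(size_bytes)  # GROUP BY month
    # AVG(backup_size) / 1048576, ordered by month like the ORDER BY clause
    return {
        month: sum(sizes) / len(sizes) / BYTES_PER_MB
        for month, sizes in sorted(sizes_by_month.items())
    }


# Hypothetical sample history
history = [
    (date(2024, 1, 5), 5000 * BYTES_PER_MB),
    (date(2024, 1, 20), 5200 * BYTES_PER_MB),
    (date(2024, 2, 3), 5300 * BYTES_PER_MB),
    (date(2023, 12, 30), 4900 * BYTES_PER_MB),  # different year: filtered out
]
print(monthly_average_mb(history, 2024))
```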

With this data set, we will apply simple linear regression and then forecast the backup sizes for the following months (11 and 12). First, we need to check whether there is any association between the backup month and the backup size. The easiest way to do this is to calculate the correlation coefficient, a value that represents the relationship between two variables. It ranges from -1 to 1: negative values indicate a negative relationship and positive values a positive one. The closer the value is to 1 or -1, the stronger the relationship between the two variables. The following calculation finds the correlation coefficient between the month and the backup size. For this calculation, we take advantage of SQL Server's support for R, a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. Running R scripts through sp_execute_external_script requires SQL Server Machine Learning Services with R installed and the 'external scripts enabled' configuration option turned on.

EXECUTE sp_execute_external_script
    @language = N'R',
    @script = N'
        mybackupdata <- SQLIn;
        SQLOut <- data.frame(cor(mybackupdata));',
    @input_data_1 = N'
        SELECT DATEPART(MONTH, bs.backup_finish_date) AS [X],
               ROUND(AVG(bs.backup_size) / 1048576, 0) AS [Y]
        FROM msdb.dbo.backupmediafamily AS bmf
        INNER JOIN msdb.dbo.backupset AS bs
            ON bmf.media_set_id = bs.media_set_id
        WHERE bs.type = ''D''
          AND bs.database_name = ''AdventureWorks''
          AND DATEPART(YEAR, bs.backup_finish_date) = DATEPART(YEAR, GETDATE())
        GROUP BY bs.database_name,
                 DATEPART(MONTH, bs.backup_finish_date),
                 DATEPART(YEAR, bs.backup_finish_date)',
    @input_data_1_name = N'SQLIn',
    @output_data_1_name = N'SQLOut'
-- Correlation coefficients are fractions, so the result columns must be FLOAT, not INT
WITH RESULT SETS ((XCof FLOAT, YCof FLOAT));
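R's cor() returns a 2x2 correlation matrix here, so the coefficient between X and Y is the off-diagonal value. If R is not available, you can compute the same Pearson coefficient outside SQL Server. The following is a minimal Python sketch. It assumes the (month, size) rows have already been exported from the query, and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    # The 1/n normalisation cancels between numerator and denominator
    return cov / math.sqrt(var_x * var_y)


# Hypothetical export of the query's (X, Y) rows: month, average size in MB
months = [1, 2, 3, 4, 5]
sizes_mb = [5450, 5560, 5640, 5740, 5850]
print(round(pearson(months, sizes_mb), 3))
```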

The correlation coefficient is 0.98. Because this is very close to 1, there is a strong positive correlation between the month and the backup size. With that established, we can calculate the simple linear regression. The chart below illustrates the main idea of simple linear regression: the red dotted line shows the values calculated by the regression, and the blue line shows the historical backup sizes. The red dotted line values are calculated with the formula y = 96.594*x + 5355.5, where y is the backup size (MB) and x is the month.
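For reference, the slope a and constant b of such a line are normally obtained by ordinary least squares. With n monthly data points (x_i, y_i), the standard closed-form estimates for y = a*x + b are shown below. These are the same sums (count, Σx, Σy, Σx², Σxy) that the scalar-valued function later in this article accumulates. The coefficients in the chart come from the author's own backup history.

$$
a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2},
\qquad
b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n}
$$

The forecast for a future month x is then ŷ = a*x + b.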

Applying the formula to months 11 and 12 gives the results in the table below.

| Month | Formula | Estimated backup size (MB) |
|-------|---------|----------------------------|
| 11 | 96.594*11 + 5355.5 | 6418.034 |
| 12 | 96.594*12 + 5355.5 | 6514.628 |
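You can reproduce the table values by evaluating the fitted line directly. The following is a minimal Python sketch that hard-codes the coefficients quoted in the article. The function name is hypothetical.

```python
# Coefficients of the fitted line quoted in the article: y = 96.594 * x + 5355.5
SLOPE = 96.594
INTERCEPT = 5355.5


def forecast_mb(month: int) -> float:
    """Evaluate the fitted regression line for a month number (1-12)."""
    return SLOPE * month + INTERCEPT


for month in (11, 12):
    print(month, round(forecast_mb(month), 3))
```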

With the linear regression concepts covered, we will create a scalar-valued function that calculates the simple linear regression in SQL Server and returns the value of the regression equation for a given month.

CREATE FUNCTION dbo.CalculateEstimatedBackupSize
(
    @Month  INT,
    @DbName VARCHAR(100)
)
RETURNS FLOAT
AS
BEGIN
    DECLARE @RowCount FLOAT, @slope FLOAT, @intercept FLOAT,
            @sumx FLOAT, @sumy FLOAT, @sumxx FLOAT, @sumxy FLOAT;

    -- X = month of the current year, Y = average full backup size (MB) in that month
    WITH BackupLinearReg AS (
        SELECT DATEPART(MONTH, bs.backup_finish_date) AS [X],
               ROUND(AVG(bs.backup_size) / 1048576, 0) AS [Y]
        FROM msdb.dbo.backupmediafamily AS bmf
        INNER JOIN msdb.dbo.backupset AS bs
            ON bmf.media_set_id = bs.media_set_id
        WHERE bs.type = 'D'
          AND bs.database_name = @DbName
          AND DATEPART(YEAR, bs.backup_finish_date) = DATEPART(YEAR, GETDATE())
        GROUP BY bs.database_name,
                 DATEPART(MONTH, bs.backup_finish_date),
                 DATEPART(YEAR, bs.backup_finish_date)
    )
    SELECT @RowCount = COUNT(*), @sumx = SUM(X), @sumy = SUM(Y),
           @sumxx = SUM(X * X), @sumxy = SUM(X * Y)
    FROM BackupLinearReg;

    -- Least-squares slope; the division sits inside IIF so that a single month
    -- of history (zero denominator) returns a flat line instead of a divide-by-zero error
    SET @slope = IIF(@RowCount = 1, 0,
                     (@RowCount * @sumxy - @sumx * @sumy)
                     / (@RowCount * @sumxx - POWER(@sumx, 2)));
    SET @intercept = (@sumy - @slope * @sumx) / @RowCount;
    RETURN @slope * @Month + @intercept;
END;
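To check the T-SQL against exported backup history, you can mirror the same least-squares computation in Python. This is a minimal sketch. The helper names and the synthetic history are hypothetical, and, like the corrected function, it falls back to a flat line when only one month of data is available.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the same sums as the T-SQL function."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_xx - sum_x ** 2
    # One month of history (or identical x values) gives no slope: use a flat line
    slope = 0.0 if n == 1 or denominator == 0 else (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def estimate_backup_size(month, xs, ys):
    """Forecast the backup size (MB) for `month` from historical (month, size) data."""
    slope, intercept = fit_line(xs, ys)
    return slope * month + intercept


# Hypothetical history: months 1-10 lying exactly on y = 96.594x + 5355.5
months = list(range(1, 11))
sizes = [96.594 * m + 5355.5 for m in months]
print(estimate_backup_size(11, months, sizes))
```

Real backup history will not lie exactly on a line. The synthetic data only makes the expected coefficients easy to verify.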

After creating the function, we execute it for months 11 and 12. If you use compressed backups, you can replace the backup_size column with the compressed_backup_size column in this function.

SELECT dbo.CalculateEstimatedBackupSize(11, 'AdventureWorks') AS [EstimatedBackupSize_11];
SELECT dbo.CalculateEstimatedBackupSize(12, 'AdventureWorks') AS [EstimatedBackupSize_12];

As the output shows, the function returns the same values as the manual formula calculation. This confirms that the scalar-valued function implements the calculation correctly.

Conclusions

In this article, we reviewed some tips and advice about SQL backup and restore strategies and then discussed how to estimate the backup size of a database. In particular, we discussed linear regression and how this approach can be used to forecast SQL backup sizes. The advantages of this method are its wide use and its simplicity. Using linear regression, we created a scalar-valued function that helps forecast the following months' backup sizes. We also verified that there was a strong correlation between month and backup size. You can further develop and adapt this sample scalar-valued function to your own needs.

Translated from: https://www.sqlshack.com/forecast-sql-backup-size/
