SLA-aware SQL Server Maintenance Framework (Index rebuild, stats update, partial integrity checks)
The system was created to solve the problem of SLA violations during SQL Server database maintenance, which has often been the reason some of our customers gave up on database maintenance entirely. The system has the capabilities needed to solve this problem:
- Executes only the maintenance tasks that are actually needed, based on an evaluation of database state
- Prioritizes tasks so that the most important ones are executed first
- Predicts the execution time of each task
- Keeps track of time during execution and does not start the next task if it probably won't fit into the remaining timeframe
The current version of the system can perform various types of maintenance tasks (illustrative T-SQL for the statistics variants is shown after this list):
- Offline index rebuild
- Online index rebuild (when possible, given the column data types, SQL Server version, edition, and system settings)
- Index reorganization
- Statistics update with FULLSCAN
- Statistics update with a limited SAMPLE
- Statistics update with the previously used sample rate (RESAMPLE)
- Metadata checks (CHECKCATALOG)
- Allocation checks (CHECKALLOC)
- Single-table integrity checks (CHECKTABLE)
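For reference, the statistics update variants correspond to standard T-SQL like the following sketch (the object name is hypothetical; the exact commands the system generates may differ):
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;           -- read every row
UPDATE STATISTICS Sales.SalesOrderDetail WITH SAMPLE 10 PERCENT;  -- limited sample
UPDATE STATISTICS Sales.SalesOrderDetail WITH RESAMPLE;           -- reuse the previously used sample rate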
- Fully modular execution model that can be adapted to almost any constraints
- Scalability – the most resource-intensive phases can be configured to run in several threads: fragmentation evaluation and execution of maintenance tasks (index maintenance, statistics updates, integrity checks)
- Prediction – the system uses an ML algorithm to predict how long each task will take, based on previous executions
- SLA awareness – the system skips tasks that probably won't fit into the remaining maintenance window
- Time budget – each phase (task generation, index maintenance, statistics update, and integrity checks) has its own time budget; if some time remains, the system redistributes it and uses it to retry failed or skipped tasks
- Locking control – uses the WAIT_AT_LOW_PRIORITY feature or LOCK_TIMEOUT to avoid long blocking (see the example after this list)
- Soft stop – execution can be stopped at any time, either softly (finish the current task and stop) or hard (stop immediately)
- Granular integrity checks – instead of running a full DBCC CHECKDB, more granular checks can be run (CHECKCATALOG, CHECKALLOC, CHECKTABLE), all with time control
- Monitoring of AG synchronization – if the redo queue exceeds the given threshold or the AG is unhealthy, index maintenance tasks for that database are skipped
- Monitoring of running backups – the system can be configured to skip index maintenance tasks for a database while a backup is running, to avoid overflowing the transaction log
- Log space control – the system can track the amount of space left for transaction logging (both in the ldf file and on the volume it resides on)
- Blacklisting of operations or objects (exclusion from maintenance)
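For reference, the underlying locking options are standard T-SQL; the values below are illustrative, not necessarily the ones the system generates:
-- Online rebuild that waits at low priority for 5 minutes and then gives up itself
-- instead of blocking the workload (index and table names are hypothetical):
ALTER INDEX [IX_Example] ON [dbo].[SomeTable]
REBUILD WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES, ABORT_AFTER_WAIT = SELF)));
-- For commands without WAIT_AT_LOW_PRIORITY support, a session-level lock timeout (in milliseconds):
SET LOCK_TIMEOUT 30000;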
The system consists of a SQL Server database and a set of SQL Server Agent jobs. The database name is VBMS (Value Based Maintenance System – the system evaluates every task and executes the most valuable ones first). The database name must not be changed, as it is used in code. The Agent jobs fall into three groups:
- Tasks generators (FillQueueAll)
- Workers (StartWorker)
- Killers (KillWorkers)
Task generation can be executed before the maintenance window, since it usually should not interrupt normal work. Test this on each deployment to confirm that the implied performance impact is acceptable.
Stores global parameters (the dbo.Parameters table). Values of different data types are stored in three separate columns (string_value, int_value, float_value). Each parameter has a brief description.
Stores the types of operations that system can perform.
The main table in the system (dbo.Tasks). Tasks are added here and then updated during execution. Each task is a T-SQL command plus a set of metadata, such as the operation type, the target object, and the dates when it was added, started, and completed.
Field | Data type | Description |
---|---|---|
[entry_id] | bigint | Primary key, increment is 1. |
[batch_id] | uniqueidentifier | Unique identifier of a group of tasks, generated during one execution of FillQueueAll |
[subsystem_id] | int | Id of type of work (references dbo.OperationTypes) |
[action_type_id] | int | Id of type of action (references dbo.OperationsTypes) |
[command] | nvarchar(4000) | T-SQL code to be executed |
[date_added] | datetime | Datetime when task was generated by FillQueueAll |
[date_started] | datetime | Datetime when task was taken by worker |
[date_completed] | datetime | Datetime when task was completed by worker |
[duration_s] | int | Time in seconds spent for execution |
[database_id] | int | Id of database affected by task |
[table_id] | int | Id of a table affected by task |
[index_id] | int | Id of an index or statistics affected by task |
[partition_n] | int | Number of partition affected by task |
[maxdop] | int | Max degree of parallelism used for index maintenance task execution |
[user_scans] | int | Number of scans performed on the index; taken from sys.dm_db_index_usage_stats |
[user_seeks] | int | No longer used. |
[user_updates] | int | No longer used. |
[rowcnt] | bigint | Number of rows stored in the partition or table. |
[size_mb] | float | Size of a table, partition, or index. |
[ix_frag] | float | Percent of fragmentation |
[time_factor] | float | Ratio of size_mb to duration_s; used to predict the duration of the next execution of this type of task on this object. |
[rowmod_factor] | float | Factor used to measure how outdated the statistic is. |
[checked_daysago] | int | Days since the table was last checked |
[result] | nvarchar(max) | Outcome of the operation execution: OK or a detailed error message. |
[time_prognosis_s] | int | Approximate time this task may take, in seconds. Calculated based on previous executions. |
[exit_code] | int | 1 if execution was successful, -1 if not enough time was left, -2 if not enough transaction log space was left, or the exact error number if an error occurred. |
[worker_name] | nvarchar(50) | Text identifier of the worker that executed this task. |
[execution_id] | uniqueidentifier | Unique identifier of a group of tasks executed by the same worker during one launch. |
[priority] | int | Set automatically to lower the priority of tasks that failed last time. 1 – normal priority, 2 – low priority |
[volume_mount_point] | nvarchar(255) | Volume or mount point where the object physically resides. |
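As a usage sketch (not part of the system), the columns above can be queried to review how a run went; the batch_id value is a placeholder:
-- Tasks from one generation batch that did not finish successfully, with the reason (see exit_code above)
SELECT entry_id, command, exit_code, result, duration_s, time_prognosis_s
FROM dbo.Tasks
WHERE batch_id = '00000000-0000-0000-0000-000000000000'  -- replace with your batch_id
  AND (exit_code IS NULL OR exit_code <> 1)
ORDER BY date_added;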
Task execution profiles are stored here (the dbo.Workers table). A profile is a set of parameters that control what the worker will execute and how.
Field | Data type | Description | Default value |
---|---|---|---|
worker_name | nvarchar(255) | Unique profile name | |
date_added | datetime | Date when the profile was added. Filled automatically. | getdate()
owner | nvarchar(150) | Who to ask about this profile | John Doe |
comment | nvarchar(500) | Description of profile. What is this profile for? ex: AW_Nightly_full_maint | Owner did not add any comment |
use_user_db | bit | Perform tasks for user DBs. | 1 |
use_system_db | bit | Perform tasks for system DBs (and VBMS too). | 0 |
dbname_list | varchar(500) | List of databases. No spaces allowed. Ex: 'DB1,DB2,DB3' | '' |
except_list | bit | Invert dbname_list. | 0 |
indexes | bit | Perform index maintenance tasks | 1 |
frag_eval_time_limit_s | bigint | Time limit (seconds) for index fragmentation analysis | 
stats | bit | Perform stats maintenance tasks | 1 |
stats_sample | nvarchar(50) | Sample to use when generating statistics maintenance tasks after index maintenance. Supported values: SAMPLE # PERCENT, SAMPLE # ROWS, FULLSCAN | FULLSCAN
checkall | bit | Perform any kind of checks | 1 |
checktable | bit | Perform CHECKTABLE | 1 |
checkalloc | bit | Perform CHECKALLOC | 1 |
checkcatalog | bit | Perform CHECKCATALOG | 1 |
online_only | bit | Do not pick the Offline index rebuild tasks | 1 |
check_backup_state | bit | Check whether a backup is currently running on this database | 
check_ag_state | bit | Check whether the AG is unhealthy | 
ag_max_queue | bit | Maximum redo queue size | 
afterparty | bit | At the end, if more than 5 minutes remain, the worker can retry skipped tasks of any kind. Likewise, if the index maintenance time limit was hit while index tasks remained and the later phases (statistics maintenance and integrity checks) finished early, the saved time can be spent on those index tasks. Enabled by default; do not turn it off unless you are sure. | 1
add_stats_runtime | bit | Add statistics maintenance tasks AFTER index operations. This can save time, because some statistics are already updated during index rebuilds; without this option that time would be wasted on redundant updates. | 1
totaltimemin | int | Total limit of worker execution time. Overrides TotalMaintenanceWindowMn global parameter. | |
indextime | int | Percent of totaltimemin that can be spent for index maintenance. Overrides IndexMaintenanceWindowPercent global parameter | |
stattime | int | Percent of totaltimemin that can be spent for stats maintenance. Overrides StatMaintenanceWindowPercent global parameter | |
checktime | int | Percent of totaltimemin that can be spent for integrity checks. Overrides CheckWindowPercent global parameter. | |
latched_spid | int | Flag used to prevent FillQueueAll from being started for this worker several times. When FillQueueAll starts for a particular worker, it places the current session_id here. | 0
stoplight | bit | Flag used to prevent the worker from taking the next task; a signal to stop work after the current task is completed (see the example after this table). | 0
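A minimal sketch of using the soft-stop flag described above, assuming it is toggled directly in the table (the worker name is hypothetical):
-- Ask the worker to finish its current task and then stop (soft stop):
UPDATE dbo.Workers SET stoplight = 1 WHERE worker_name = 'AWFullMaint';
-- Clear the signal again before the next run:
UPDATE dbo.Workers SET stoplight = 0 WHERE worker_name = 'AWFullMaint';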
This table (dbo.WorkerSession) tracks current worker activity. Each FillQueueAll run or worker adds a record here when it starts and updates it after each executed task. It is used mainly for killing workers when needed (see the query after this table).
Field | Data type | Description | Default value |
---|---|---|---|
record_date | datetime | Timestamp of last update to this record | getdate() |
worker_name | nvarchar(255) | Name of a worker, executing tasks | |
session_id | int | Session_id of a worker | |
program_name | nvarchar(255) | Program name, taken from sys.dm_exec_sessions | 
subsystem_id | int | subsystem_id of the current task. If 1 (index maintenance), killing may cause a rollback. | 
entry_id | bigint | entry_id from dbo.Tasks that is currently executing | 
is_afterparty | bit | Is worker in the afterparty phase. | 0 |
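As a monitoring sketch (not part of the system), current worker activity can be inspected by joining dbo.WorkerSession to dbo.Tasks:
-- What each worker is doing right now and for how long
SELECT ws.worker_name, ws.session_id, ws.subsystem_id, ws.is_afterparty,
       t.command, t.date_started,
       DATEDIFF(SECOND, t.date_started, GETDATE()) AS running_s
FROM dbo.WorkerSession AS ws
LEFT JOIN dbo.Tasks AS t ON t.entry_id = ws.entry_id;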
This table stores the output of DBCC checks. When a DBCC command finds errors, it writes them here. Most of the fields are inherited from DBCC CHECK[TABLE | ALLOC | CATALOG] WITH TABLERESULTS (see the example after this table).
Field | Data type | Description | Default value |
---|---|---|---|
rec_id | int | Primary key | |
execution_guid | uniqueidentifier | Unique identifier. Same for records generated during one DBCC launch. Not related to any fields in dbo.Tasks | |
log_date | datetime | Date and time when check has been executed | |
All other fields come from the standard DBCC output.
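For reference, the granular checks that feed this table are ordinary DBCC commands; a hand-run equivalent looks like this (the table name is illustrative):
-- TABLERESULTS makes DBCC return rows instead of messages, which is what allows the output to be captured:
DBCC CHECKTABLE ('Sales.SalesOrderDetail') WITH TABLERESULTS, ALL_ERRORMSGS, NO_INFOMSGS;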
This table (dbo.Blacklist) stores records of all actions that should not be taken. You can blacklist any kind of maintenance task for any kind of object. Each record represents a rule described by subsystem_id, actiontype_id, database_id, table_id, and index_id; NULL acts as a wildcard (*). To add blacklist items, use the stored procedure dbo.AddBlacklistItem. Rules can be enabled or disabled.
Field | Data type | Description | Default value |
---|---|---|---|
item_id | bigint | Primary key | |
date_added | datetime | Date when rule was added | getdate() |
database_id | int | Id of a database | |
table_id | int | Object_id of a table | |
index_id | int | Id of an index or a statistic | |
partition_n | int | Number of a partition | |
subsystem_id | int | Id of subsystem to blacklist (refers to dbo.OperationsTypes) | |
actiontype_id | int | Id of action type to blacklist (refers to dbo.OperationsTypes) | |
enabled | bit | Flag to enable/disable rule | 1 |
worker_name | nvarchar(255) | Used to map rules to specific workers. NULL = applies to every worker |
SP for maintenance task generation. It evaluates the chosen DBs and adds suitable maintenance tasks to the dbo.Tasks table. This SP is in fact a wrapper around a set of dbo.FillQueueXXX procedures.
Parameter | Data type | Description | Default value |
---|---|---|---|
worker_name | nvarchar(50) | Generate tasks only for a specific worker profile | NULL |
indexes | bit | Generate tasks for index maintenance | 1 |
async_frag_collection | bit | Used when fragmentation collection needs to be handed over to separate thread(s) for scaling. Advanced scenario. | 1
collect_frag_now | int | 0 – do not collect frag data, only prepare the index list; 1 – prepare the index list, collect frag data, and generate tasks in this session; 2 – the index list with frag data is already filled, only the tasks need to be prepared. Advanced scenario. | 1
stats | bit | Generate tasks for statistics maintenance | 0 |
stats_sample | nvarchar(50) | Sample size used, when generating statistics maintenance tasks | FULLSCAN |
checkall | bit | Generate all sorts of DBCC check tasks. | 1 |
checktable | bit | Generate DBCC CHECKTABLE tasks | 1 |
table_max_size_mb | int | Maximum size (Mb) of a table that can be checked by DBCC CHECKTABLE | 1000000000000 |
checkalloc | bit | Generate DBCC CHECKALLOC tasks | 1 |
checkcatalog | bit | Generate DBCC CHECKCATALOG tasks | 1 |
use_user_db | bit | Generate tasks for user databases | 1 |
use_system_db | bit | Generate tasks for system databases | 0 |
dbname_list | nvarchar(500) | Generate tasks for the listed databases | ''
except_list | bit | Turns dbname_list into exception list | 0 |
sortintempdb | bit | Use tempdb for sorting during index maintenance. Overrides values, stored in dbo.Parameters | NULL |
ix_maxdop | bit | Maximum number of cores used for index maintenance. Overrides the value stored in dbo.Parameters | NULL
The checkall flag combines the three flags checktable, checkalloc, and checkcatalog. Each of these three flags can override the checkall flag.
FillQueueAll can be used in several ways. The most common scenario is to configure a worker in the dbo.Workers table and then generate tasks for that particular worker:
Exec dbo.FillQueueAll @worker_name = 'AdventureWorks_FullMaint'
This generates tasks that can be executed by the previously configured AdventureWorks_FullMaint worker.
The other option is to manually select which types of tasks should be generated, regardless of whether any workers are configured to handle them:
exec dbo.FillQueueAll @indexes = 1, @stats = 1, @dbname_list = 'AdventureWorks,DB1', @checkall = 0, @checktable = 1
This generates index maintenance, statistics maintenance, and table integrity check tasks for the AdventureWorks and DB1 databases.
The procedure removes all incomplete tasks for the corresponding objects before generating new ones, to avoid duplicates and execution of outdated tasks.
The procedure returns a batch_id, which is the same for all tasks generated in the run.
SP for generation of CHECKALLOC tasks. Used by FillQueueAll. No direct usage is advised.
SP for generation of CHECKCATALOG tasks. Used by FillQueueAll. No direct usage is advised.
SP for generation of CHECKTABLE tasks. Used by FillQueueAll. No direct usage is advised.
SP for generation of index maintenance tasks. Used by FillQueueAll. No direct usage is advised.
Considers many factors to propose the best possible types of operations.
SP for generation of statistics maintenance tasks. Used by FillQueueAll. No direct usage is advised.
Uses a dynamic threshold similar to trace flag 2371 (for large tables the threshold shrinks as the table grows, instead of the fixed 20% + 500 rows).
This procedure is used inside the StartWorker SP to execute previously generated tasks.
Most of its parameters are taken from dbo.Workers.
The SP loops through dbo.Tasks and picks tasks one after another, filtered by the parameters provided.
There are four phases:
- Index maintenance
- Stats maintenance
- Integrity checks
- AfterParty, if enough time is left.
During each phase, tasks are sorted in a specific way. For example, during the index maintenance phase the first chosen task is for the index that is most heavily scanned, most fragmented, and largest.
For statistics, the rowmod_factor is used. It shows how outdated a statistic is relative to its dynamic threshold; for example, 200 means the statistic is twice as outdated as its threshold allows.
Integrity check tasks are sorted by days since the last known check ("last known" means the last one found in the dbo.Tasks table).
Before executing each task, the time left is evaluated. Previous executions of this kind of task against this object are used to predict how much time it will take this time. If the prognosis shows that the task may need more time than remains, it is skipped (see the sketch after this description).
Free space for the transaction log is also checked; see dbo.Parameters for details.
If a task was deadlocked, it is retried during the afterparty, along with any skipped tasks, provided enough time is left in that phase. That can happen when the statistics and integrity check tasks did not use all of their time.
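Conceptually, the time check works roughly like the following sketch (an illustration of the logic, not the actual procedure code; variable values are hypothetical):
DECLARE @phase_deadline datetime = DATEADD(MINUTE, 30, GETDATE());  -- example: 30 minutes left in this phase
DECLARE @entry_id bigint = 123456;                                  -- hypothetical task
DECLARE @time_prognosis_s int = (SELECT time_prognosis_s FROM dbo.Tasks WHERE entry_id = @entry_id);
-- Skip the task when its prognosis does not fit into what is left of the phase
IF @time_prognosis_s > DATEDIFF(SECOND, GETDATE(), @phase_deadline)
    UPDATE dbo.Tasks
    SET exit_code = -1, result = 'Skipped: not enough time left'  -- -1 means "not enough time left"
    WHERE entry_id = @entry_id;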
This SP starts the execution of the tasks in the queue. The @worker_name parameter is the name of a saved profile from the dbo.Workers table.
This SP stops the execution of the selected workers one by one. It is mainly an emergency mechanism.
It uses information from the dbo.WorkerSession table. If a worker is executing an index maintenance task (subsystem = 1), the SP skips to the next worker, since killing an index maintenance task may cause a long rollback. This behavior can be overridden with the @force parameter.
Parameters:
Parameter | Data type | Description | Default value |
---|---|---|---|
worker_names | nvarchar(500) | Worker names to be killed without spaces. Ex: 'Worker1,Worker2,Worker3' | '' |
except_list | bit | Reverse worker_names selection. | 0 |
force | bit | Kill workers even if there are index operations in progress | 0 |
This procedure adds new rules to the dbo.Blacklist table. A rule can be added either by specifying all the parameters that identify the blacklisted tasks, or by specifying just an entry_id from dbo.Tasks, which blacklists that kind of task. Blacklisted tasks are no longer generated. Rules can be enabled or disabled using the [enabled] column.
Parameter | Data type | Description | Default value |
---|---|---|---|
entryid | bigint | entry_id of an existing task in dbo.Tasks; blacklists that kind of task | NULL
database_id | int | Id of a database | NULL
table_id | int | Object_id of a table | NULL
index_id | int | Id of an index or a statistic | NULL
partition_n | int | Number of a partition | NULL
subsystem_id | int | Id of the subsystem to blacklist | NULL
action_type_id | int | Id of the action type to blacklist | NULL
worker_name | nvarchar(255) | Apply the rule only to this worker; NULL = every worker | NULL
Let's say that for some reason we do not want to perform any maintenance operations on the table Sales.SalesOrderDetail in the AdventureWorks2014 database. Then we run a query like this:
USE [AdventureWorks2014]
DECLARE @db int = (SELECT DB_ID()),
        @tbl int = (SELECT OBJECT_ID('Sales.SalesOrderDetail'))
EXEC VBMS.dbo.AddBlacklistItem
    @database_id = @db,
    @table_id = @tbl,
    @index_id = NULL,       -- all indexes
    @partition_n = NULL,    -- all partitions
    @subsystem_id = NULL,   -- all subsystems
    @action_type_id = NULL  -- all types of actions
Or, if we only need to block statistics updates on this table, and such a task was previously generated in dbo.Tasks under entry_id 123456, then we can simply use a query like this:
EXEC VBMS.dbo.AddBlacklistItem @entry_id = 123456
You will not see this kind of task anymore; dbo.AddBlacklistItem simply takes all the parameters from the specified record of dbo.Tasks.
This function produces the execution-time prognosis for tasks. It is not used directly.
Run the VBMS_create.sql script in SQLCMD mode (an example command line is shown below)
Deploy the data-tier application
Open the dbo.Parameters table and set all the parameters. The defaults may not work for you.
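One way to run the creation script is with the sqlcmd utility; the server name and file path below are placeholders:
sqlcmd -S YourServer -E -i "C:\path\to\VBMS_create.sql"
Here -S names the server, -E uses Windows authentication, and -i points at the input script.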
Maintenance itself consists of two phases – task generation and task execution – which can be combined or separated. Each of them can be run in multiple threads. For example, you can generate tasks for all DBs at once and execute them in 5 different jobs (workers).
Task generation is done by the dbo.FillQueueAll procedure. It evaluates the state of the DBs passed to it and generates tasks for further execution based on the parameters. There are two main ways to use it:
- Manually fill all the parameters, except @worker_name.
- Fill only @worker_name
The first scenario is used when one FillQueueAll run should generate tasks for several different workers.
The second one is much simpler and is used to create tasks for a specific worker.
Generated tasks are saved in the dbo.Tasks table. After execution, those records are simply updated with the execution stats, duration, etc.
Workers need to be configured to handle the generated tasks. Insert a record into the dbo.Workers table, filling in the parameters. Some of them have default values, so double-check that the result suits you.
Some of these parameters override the corresponding global values in dbo.Parameters where they overlap.
Then configure task generation: set up a SQL Server Agent job that executes dbo.FillQueueAll with the proper parameters.
Suppose we have one DB to maintain – AdventureWorks. Maintenance can be performed only from 01:00 to 03:00, and all sorts of operations are allowed in that window.
We configure FillQueueAll to run every day at 00:00 so that analysis does not consume maintenance time. Filling the queue and processing it are two different activities and are not tightly coupled: we can collect all sorts of tasks and then execute only those we want, for example with one FillQueueAll job and 5–6 StartWorker jobs, each performing different tasks from the queue.
The job step should look like:
EXEC [dbo].[FillQueueAll] @worker_name = 'AWFullMaint'
OR
EXEC dbo.FillQueueAll @indexes = 1, @stats = 1, @stats_sample = 'FULLSCAN', @checkall = 1, @use_user_db = 1, @dbname_list = 'AdventureWorks2016'
FillQueueAll returns a batch_id, which identifies all the tasks generated during this run. You can find them in dbo.Tasks with a query:
SELECT * FROM dbo.Tasks WHERE batch_id = 'YOUR BATCH_ID'
To configure a worker, add a record to the dbo.Workers table. This is where we choose which types of tasks the worker will execute and how. All the bit parameters have defaults.
In our case, we set the following (a sketch of the corresponding INSERT follows this list):
[worker_name] = 'AWFullMaint'
[owner] = 'John Doe'
[comment] = 'AW full maintenance at night only'
[use_user_db] = 1
[dbname_list] = 'AdventureWorks2016'
[online_only] = 0
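A minimal sketch of the corresponding INSERT, setting only the columns listed above and assuming the remaining columns are covered by their defaults:
INSERT INTO dbo.Workers (worker_name, owner, comment, use_user_db, dbname_list, online_only)
VALUES ('AWFullMaint', 'John Doe', 'AW full maintenance at night only', 1, 'AdventureWorks2016', 0);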
Now that the profile is created, we can refer to it by worker_name.
Set up a job that executes
EXEC dbo.StartWorker 'AWFullMaint'
every night at 01:00.
Since we have only 2 hours for maintenance tasks (01:00 – 03:00), we must limit the execution time to 2 hours. There are two options (a parameter-update sketch follows this list):
- Set the global parameters in dbo.Parameters (TotalMaintenanceWindowMn = 120, IndexMaintenanceWindowPercent = 30, StatMaintenanceWindowPercent = 30, CheckWindowPercent = 30),
- Set these parameters in the worker profile in dbo.Workers (similar columns with the same effect).
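A sketch for the first option; the name of the key column in dbo.Parameters is an assumption here (called parameter_name for illustration), so adjust it to the actual column name in your installation:
-- Assumes dbo.Parameters has a parameter_name column alongside the documented int_value column
UPDATE dbo.Parameters SET int_value = 120 WHERE parameter_name = 'TotalMaintenanceWindowMn';
UPDATE dbo.Parameters SET int_value = 30
WHERE parameter_name IN ('IndexMaintenanceWindowPercent', 'StatMaintenanceWindowPercent', 'CheckWindowPercent');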
Just to be sure we will not violate the SLA, we can configure a KillWorkers job.
Create a job that executes EXEC dbo.KillWorkers @worker_names = 'AWFullMaint'
The @force flag can be used to allow killing even when it may cause an index operation rollback; however, it is not recommended.
It is needed only as a safety net: all our tests show that after the first execution (once the system has gained some history) it is quite accurate in its predictions (unless someone starts a huge workload that makes the server twice as slow as usual). So, normally killing workers will not be needed.
That's it. We're good to go.