NULL Values in Histogram
May 26, 2017
During my cardinality estimator presentation in DBA virtual group, there was a question asked about NULL values and histograms. I am quoting the question as is “SQL Server include null values in the index. How are statistics handle null values?”
I will answer the question in this blog post.
For demo purposes, I will be using WideWorldImporters database which you can download and restore from here. In table [Sales].[Orders] column ‘PickingCompletedWhen’ does allow NULL values.
How many rows are NULL and how many are NOT NULL?
USE [WideWorldImporters]; GO SELECT 'NULL' AS [Value], COUNT(*) AS [NumOfRecord] FROM [Sales].[Orders] WHERE PickingCompletedWhen IS NULL UNION ALL SELECT 'NOT NULL' AS [Value], COUNT(*) AS [NumOfRecord] FROM [Sales].[Orders] WHERE PickingCompletedWhen IS NOT NULL; GO
There is an auto-created statistics on column ‘PickingCompletedWhen’ named ‘_WA_Sys_0000000E_44CA3770’.
If we look at the header section of DBCC SHOW_STATISTICS for this statistic we can see all rows were used to create the statistics. Meaning rows with NULL is also taken into consideration.
|_WA_Sys_0000000E_44CA3770||Jun 2 2016 10:40AM||69809||69809|
In the density_vector section ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882 which was calculated from 1/(Number of distinct values of column ‘PickingCompletedWhen’).
In this case which is 1/2125. All NULL values were considered as one. If you do a count of distinct values you will get a result of 2124. The reason is explained here. If you do a select of all distinct values NULL will show along with other 2124 values.
Let’s look at the histogram now.
First RANGE_HI_KEY is NULL and EQ_ROWS value is 3085 which matches with a count above for ‘PickingCompletedWhen IS NULL’. As this is the first RANGE_HI_KEY for the histogram steps, RANGE_ROWS and DISTINCT_RANGE_ROWS will be zero.
The first NON NULL value in RANGE_HI_KEY is ‘2013-01-01 11:00:00.0000000’ which is 2nd step in the histogram. For which ‘AVG_RANGE_ROWS’ value is 1. If there is a query for ‘PickingCompletedWhen’ < ‘2013-01-01 11:00:00.0000000’, for the estimated number of rows that value will be used which is 1 in this case.