NULL Values in Histogram

May 26, 2017

During my cardinality estimator presentation in DBA virtual group, there was a question asked about NULL values and histograms. I am quoting the question as is “SQL Server include null values in the index. How are statistics handle null values?”

I will answer the question in this blog post.

For demo purposes, I will be using WideWorldImporters database which you can download and restore from here. In table [Sales].[Orders] column ‘PickingCompletedWhen’ does allow NULL values.

How many rows are NULL and how many are NOT NULL?

USE [WideWorldImporters]; 
 'NULL' AS [Value], 
 COUNT(*) AS [NumOfRecord] 
 FROM [Sales].[Orders] 
 WHERE PickingCompletedWhen IS NULL 
 'NOT NULL' AS [Value], 
 COUNT(*) AS [NumOfRecord] 
 FROM [Sales].[Orders] 
 WHERE PickingCompletedWhen IS NOT NULL;
Value NumOfRecord

There is an auto-created statistics on column ‘PickingCompletedWhen’ named ‘_WA_Sys_0000000E_44CA3770’.
If we look at the header section of DBCC SHOW_STATISTICS for this statistic we can see all rows were used to create the statistics. Meaning rows with NULL is also taken into consideration.

NameUpdatedRowsRows Sampled
_WA_Sys_0000000E_44CA3770Jun 2 2016 10:40AM6980969809

In the density_vector section ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882 which was calculated from 1/(Number of distinct values of column ‘PickingCompletedWhen’).
In this case which is 1/2125. All NULL values were considered as one. If you do a count of distinct values you will get a result of 2124. The reason is explained here. If you do a select of all distinct values NULL will show along with other 2124 values.

Let’s look at the histogram now.

2013-01-01 11:00:00.00000000601
2013-01-10 11:00:00.0000000504761533.6

First RANGE_HI_KEY is NULL and EQ_ROWS value is 3085 which matches with a count above for ‘PickingCompletedWhen IS NULL’. As this is the first RANGE_HI_KEY for the histogram steps, RANGE_ROWS and DISTINCT_RANGE_ROWS will be zero.

The first NON NULL value in RANGE_HI_KEY is ‘2013-01-01 11:00:00.0000000’ which is 2nd step in the histogram. For which ‘AVG_RANGE_ROWS’ value is 1. If there is a query for ‘PickingCompletedWhen’ < ‘2013-01-01 11:00:00.0000000’, for the estimated number of rows that value will be used which is 1 in this case.

1 reply on “NULL Values in Histogram”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.