NULL Values in Histogram

During my cardinality estimator presentation in DBA virtual group there was a question asked about NULL values and histogram. I am quoting the question as is “SQL Server include null values in the index. How are statistics handle for null values?”

I will answer the question in this blog post.

For demo purpose I will be using WideWorldImporters database which you can download and restore from here. In table [Sales].[Orders] column ‘PickingCompletedWhen’ does allow NULL values.

How many rows are NULL and how many are NOT NULL?

USE [WideWorldImporters];
GO
SELECT 'NULL' AS [Value], COUNT(*) AS [NumOfRecord]
FROM [Sales].[Orders]
WHERE PickingCompletedWhen IS NULL
UNION ALL
SELECT 'NOT NULL' AS [Value], COUNT(*) AS [NumOfRecord]
FROM [Sales].[Orders]
WHERE PickingCompletedWhen IS NOT NULL;

ValueNumOfRecord
NULL3085
NOT NULL70510

There is an auto created statistics on column ‘PickingCompletedWhen’ named ‘_WA_Sys_0000000E_44CA3770′.
If we look at the header section of DBCC SHOW_STATISTICS for this statistics we can see all rows were used to create the statistics. Meaning rows with NULL is also taken into consideration.

Name UpdatedRowsRows Sampled
_WA_Sys_0000000E_44CA3770May 5 2017 6:56PM7359573595

In the density_vector section ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882 which was calculated from: 1/(Number of distinct values of column ‘PickingCompletedWhen’).
In this case which is 1/2125. All NULL values were considered as one. If you do a count of distinct values you will get a result of 2124. Reason is explained here. If you do a select of all distinct values NULL will show along with other 2124 values.

All density Average LengthColumns
0.00047058827.664651PickingCompletedWhen

Let’s look at the histogram now.

RANGE _HI_KEYRANGE _ROWSEQ_ROWSDISTINCT _RANGE _ROWS AVG _RANGE _ROWS
NULL 0308501
2013-01-01 11:00:00.00000000601
2013-01-10 11:00:00.0000000504761533.6
2013-01-18 11:00:00.0000000360721230
2013-01-25 11:00:00.0000000224771022.4

First RANGE_HI_KEY is NULL and EQ_ROWS value is 3085 which matches with count above for ‘PickingCompletedWhen IS NULL’. As this is the first RANGE_HI_KEY for the histogram steps, RANGE_ROWS and DISTINCT_RANGE_ROWS will be zero.

First NON NULL value in RANGE_HI_KEY is ‘2013-01-01 11:00:00.0000000′ which is 2nd step in the histogram. For which ‘AVG_RANGE_ROWS’ value is 1. If there is a query for ‘PickingCompletedWhen’ < ‘2013-01-01 11:00:00.0000000′, for estimated number of rows that value will be used which is 1 in this case.

FacebookTwitterGoogle+Share