NULL Values in Histogram
May 26, 2017
During my cardinality estimator presentation in DBA virtual group, there was a question asked about NULL values and histograms. I am quoting the question as is “SQL Server include null values in the index. How are statistics handle null values?”
I will answer the question in this blog post.
For demo purposes, I will be using WideWorldImporters database which you can download and restore from here. In table [Sales].[Orders] column ‘PickingCompletedWhen’ does allow NULL values.
How many rows are NULL and how many are NOT NULL?
USE [WideWorldImporters];
GO
SELECT
'NULL' AS [Value],
COUNT(*) AS [NumOfRecord]
FROM [Sales].[Orders]
WHERE PickingCompletedWhen IS NULL
UNION ALL
SELECT
'NOT NULL' AS [Value],
COUNT(*) AS [NumOfRecord]
FROM [Sales].[Orders]
WHERE PickingCompletedWhen IS NOT NULL;
GO
Value | NumOfRecord |
---|---|
NULL | 3085 |
NOT NULL | 70510 |
There is an auto-created statistics on column ‘PickingCompletedWhen’ named ‘_WA_Sys_0000000E_44CA3770’.
If we look at the header section of DBCC SHOW_STATISTICS for this statistic we can see all rows were used to create the statistics. Meaning rows with NULL is also taken into consideration.
Name | Updated | Rows | Rows Sampled |
---|---|---|---|
_WA_Sys_0000000E_44CA3770 | Jun 2 2016 10:40AM | 69809 | 69809 |
In the density_vector section ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882 which was calculated from 1/(Number of distinct values of column ‘PickingCompletedWhen’).
In this case which is 1/2125. All NULL values were considered as one. If you do a count of distinct values you will get a result of 2124. The reason is explained here. If you do a select of all distinct values NULL will show along with other 2124 values.
Let’s look at the histogram now.
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
NULL | 0 | 2893 | 0 | 1 |
2013-01-01 11:00:00.0000000 | 0 | 6 | 0 | 1 |
2013-01-10 11:00:00.0000000 | 504 | 76 | 15 | 33.6 |
First RANGE_HI_KEY is NULL and EQ_ROWS value is 3085 which matches with a count above for ‘PickingCompletedWhen IS NULL’. As this is the first RANGE_HI_KEY for the histogram steps, RANGE_ROWS and DISTINCT_RANGE_ROWS will be zero.
The first NON NULL value in RANGE_HI_KEY is ‘2013-01-01 11:00:00.0000000’ which is 2nd step in the histogram. For which ‘AVG_RANGE_ROWS’ value is 1. If there is a query for ‘PickingCompletedWhen’ < ‘2013-01-01 11:00:00.0000000’, for the estimated number of rows that value will be used which is 1 in this case.
1 reply on “NULL Values in Histogram”
[…] Taiob Ali explains how NULL values show up in the SQL Server histogram when you create statistics: […]