Friday, August 14, 2015

Covering Index

What happens when a Query execute against a table with a covering index and another query with the same index structure but without the included columns?
“The covering index gives you the knack to avoid the trip back to the table for satisfying the request, as all of the columns requested already included in non-clustered index. It means you are able to escape the logical or physical reads to trip back to read the table for that information.”
But we can’t ensure this is not enough avoidance of logical or physical reads. There is also the consideration of the necessary task between the columns in the non-clustered index and the columns that need to be looked-up in the table, you can say putting data together.
To explain this let's create two identical tables; same schema, same data:
Use TSQL2012
CREATE TABLE TSQL2012.dbo.EmplyeeDetails1
(
  pkid INT IDENTITY(1,1) NOT NULL,
  EmployeeId VARCHAR(20) NOT NULL,
  firstname VARCHAR(50) NOT NULL,
  lastname VARCHAR(50) NOT NULL,
  address1 VARCHAR(100) NULL,
  address2 VARCHAR(100) NULL,
  city VARCHAR(100) NULL,
  stateregion VARCHAR(100) NULL,
  Areapostal_code VARCHAR(10) NULL,
  Empcompanyid VARCHAR(20) NULL,
  notes VARCHAR(200) NULL,
  registration_notes VARCHAR(200) NULL
  );
    GO
ALTER TABLE dbo.EmplyeeDetails1 ADD CONSTRAINT
 PK_EmplyeeDetails1 PRIMARY KEY CLUSTERED
  (
   pkid
  )
 WITH
  (
   PAD_INDEX = OFF
   , FILLFACTOR = 100
   , STATISTICS_NORECOMPUTE = OFF
   , IGNORE_DUP_KEY = OFF
   , ALLOW_ROW_LOCKS = ON
   , ALLOW_PAGE_LOCKS = ON
  ) ON [PRIMARY];
GO
CREATE TABLE TSQL2012.dbo.EmplyeeDetails2
 (
  pkid INT IDENTITY(1,1) NOT NULL,
  EmployeeId VARCHAR(20) NOT NULL,
  firstname VARCHAR(50) NOT NULL,
  lastname VARCHAR(50) NOT NULL,
  address1 VARCHAR(100) NULL,
  address2 VARCHAR(100) NULL,
  city VARCHAR(100) NULL,
  stateregion VARCHAR(100) NULL,
  Areapostal_code VARCHAR(10) NULL,
  Empcompanyid VARCHAR(20) NULL,
  notes VARCHAR(200) NULL,
  registration_notes VARCHAR(200) NULL
  );
 
  ALTER TABLE dbo.EmplyeeDetails2 ADD CONSTRAINT
 PK_EmplyeeDetails2 PRIMARY KEY CLUSTERED
  (
   pkid
  )
 WITH
  (
   PAD_INDEX = OFF
   , FILLFACTOR = 100
   , STATISTICS_NORECOMPUTE = OFF
   , IGNORE_DUP_KEY = OFF
   , ALLOW_ROW_LOCKS = ON
   , ALLOW_PAGE_LOCKS = ON
  ) ON [PRIMARY];
 
 
  GO
Data in both the table is identical, also added identical clustered index. Due to auto incremented identity column added fill factor =100%.
Now making some differences between tables.
EmplyeeDetails1
Adding non clustered index on firstname and lastname with an included column (EmployeeId)
CREATE NONCLUSTERED INDEX [IX_EmplyeeDetails1_lastname] ON [dbo].EmplyeeDetails1
(
 [lastname] ASC,
 [firstname] ASC
)
INCLUDE
 (
  EmployeeId
 )
WITH
 (
  PAD_INDEX = OFF
   , STATISTICS_NORECOMPUTE = OFF
   , SORT_IN_TEMPDB = OFF
   , DROP_EXISTING = OFF
   , ONLINE = OFF
   , ALLOW_ROW_LOCKS = ON
   , ALLOW_PAGE_LOCKS = ON
   , FILLFACTOR = 80
  )
GO

EmplyeeDetails2
Adding non clustered index on firstname and lastname but without the inclusion of column.

CREATE NONCLUSTERED INDEX [IX_EmplyeeDetails2_lastname] ON [dbo].EmplyeeDetails2
(
 [lastname] ASC,
 [firstname] ASC
)
WITH
 (
  PAD_INDEX = OFF
   , STATISTICS_NORECOMPUTE = OFF
   , SORT_IN_TEMPDB = OFF
   , DROP_EXISTING = OFF
   , ONLINE = OFF
   , ALLOW_ROW_LOCKS = ON
   , ALLOW_PAGE_LOCKS = ON
   , FILLFACTOR = 80
  )
GO
 We have two identical tables, each having 5000 records.  The only difference is in their single non-clustered indexes; one has an included column, the other does not.  They have identical primary keys and clustered indexes. 
If we execute the following query against the table with the included column in the clustered index we see the following execution plan:

SELECT employeeid, lastname, firstname
FROM EmplyeeDetails1
WHERE lastname = 'samsamada43032' AND firstname = 'samada43032';
go
SELECT employeeid, lastname, firstname
FROM EmplyeeDetails2
WHERE lastname = 'samsamada43032' AND firstname = 'samada43032';
The whole action occurs within the non-clustered index itself. We never have to travel back to the table for more data. If we execute the identical query against the table with the same index structure, minus the included column in the non-clustered index, we get different behavior.

Since the non-clustered index doesn't contain all the information we need to satisfy the query, it's necessary to hit the index for the base information, and then using the row pointer in the non-clustered index, pull in the employeeid from the table and run a Nested Loop Join in order to marry the two working sets of data together before returning them to the end user. We can see the difference in expense for reads when looking at the I/O statistics from the two queries:
Note that these are all logical reads since the pages were already in the buffer. If this was a situation where it was a much larger data set you were contending with on an active server with the pages still on disk, this could be a very expensive operation. 


No comments:

Post a Comment