Problem
How to use CTEs (Common Table Expressions) to break up complex
queries into several distinct steps.
Solution
Let's start with a sample library database.
It consists of just two
tables:
Each author can have one or more books in the tblBook table,
with the AuthorId column being used as a foreign key. If
you want to reproduce this yourself, run this script:
-- create a database to
hold OUR books
CREATE DATABASE Table
Library
GO
USE Library
GO
-- create a table of
authors
CREATE TABLE tblAuthor(
AuthorId int PRIMARY KEY,
FirstName varchar(50),
LastName varchar(50)
)
GO
-- add some authors
INSERT tblAuthor (AuthorId,
FirstName, LastName) VALUES (1, 'John', 'Wyndham')
INSERT tblAuthor
(AuthorId, FirstName, LastName) VALUES (2, 'Barbara', 'Kingsolver')
INSERT tblAuthor
(AuthorId, FirstName, LastName) VALUES (3, 'Jane', 'Austen')
-- create a table of
books
CREATE TABLE tblBook(
BookId int PRIMARY KEY,
BookName varchar(100),
AuthorId int,
Rating int
)
-- add some books
INSERT tblBook (BookId,
BookName, AuthorId, Rating) VALUES (1, 'The Day of the Triffids', 1, 10)
INSERT tblBook (BookId,
BookName, AuthorId, Rating) VALUES (2, 'The Chrysalids', 1, 8)
INSERT tblBook (BookId,
BookName, AuthorId, Rating) VALUES (3, 'The Lacuna', 2, 10)
INSERT tblBook (BookId,
BookName, AuthorId, Rating) VALUES (4, 'The Poisonwood Bible', 2, 8)
INSERT tblBook (BookId,
BookName, AuthorId, Rating) VALUES (5, 'Pride and Prejudice', 3, 9)
You should now have 3 authors and 5 books.
A single-query solution
Suppose that you want to list out in alphabetical order the
authors who have written more than 1 book. You could do this with a single
query:
-- list authors who have
written more than one book
SELECT
a.FirstName + ' ' + a.LastName AS Author,
COUNT(*) AS 'Number of books'
FROM
tblAuthor AS a
INNER JOIN tblBook AS b ON
a.AuthorId=b.AuthorId
GROUP BY
a.FirstName + ' ' + a.LastName
HAVING
COUNT(*) > 1
ORDER BY
Author
There's nothing wrong with this approach. Indeed for relatively
simple queries like this one it's probably the best one, but as your queries
get more complicated, it'll become ever harder to keep the whole picture in
your head. The reason I like CTEs so much is that they allow you to break
down a problem into different parts (always the holy grail in
computing). To see how this works, we'll solve the above problem again -
but more slowly.
A CTE solution
The following solution uses a common table expression, giving the
following advantages:
- It avoids the
need to repeat expressions in HAVING or GROUP BY clauses;
- It reads more
logically and more intuitively.
To get things started, we need to get a list showing each author's
id, together with the number of books they've written:
USE Library;
-- list authors with the
number of books they've written
WITH cteBooksByAuthor AS
(
SELECT AuthorId,
COUNT(*) AS CountBooks
FROM tblBook
GROUP BY AuthorId
)
-- use this CTE
SELECT * FROM
BooksByAuthor
Notice that the statement immediately before the CTE has to end
with a semi-colon where I have "USE Library;", but this would be for
any statement that is prior to the CTE.
What this query does is to create a temporary table-like object
called cteBooksByAuthor, then immediately refers to it in the
following statement. Note that CTEs are like cheese souffles: they can
only be used as soon as they've been created. If we issued another query
after this the CTE can no longer be referenced.
Now that you've created a CTE, we can join it - as for any other
table - to another table to get at the authors' names. So our final query
is:
USE Library;
-- list authors with the
number of books they've written
WITH cteBooksByAuthor AS
(
SELECT AuthorId, COUNT(*) AS CountBooks
FROM tblBook
GROUP BY AuthorId
)
-- use this CTE to show
authors who have written
-- more than 1 book
SELECT
a.FirstName + ' ' + a.LastName AS Author,
cte.CountBooks AS 'Number of books'
FROM
cteBooksByAuthor AS cte
INNER JOIN tblAuthor AS a ON
cte.AuthorId=a.AuthorId
WHERE cte.CountBooks
> 1
Different example:-
Lets see another example, consider this slightly more complicated
query. Suppose that you want to show for each author their book with the
highest rating. You could easily accomplish this with a single correlated
subquery, but many SQL programmers will find it easy to break the task into two
parts. First get a list for each author of their highest rating:
You can now link this to the books table, to get the books shown
highlighted below:
Here's our CTE to return the right results:
USE Library
GO
-- get a "temporary
table" (a CTE) of highest score for each author
-- (don't need a
semi-colon as this is now first statement in batch)
WITH HighestRatings AS (
SELECT
author.AuthorId,
MAX(book.Rating) AS HighestRating
FROM
tblBook AS book
INNER JOIN tblAuthor AS author ON
book.AuthorId=author.AuthorId
GROUP BY
author.AuthorId
)
-- get the name of book
and name of author
SELECT
author.FirstName + ' ' + author.LastName AS
Author,
book.BookName AS Title,
hr.HighestRating AS Rating
FROM
HighestRatings AS hr
INNER JOIN tblBook AS book ON
hr.HighestRating=book.Rating and hr.AuthorId=book.AuthorId
INNER JOIN tblAuthor AS author ON
book.AuthorId=author.AuthorId
This would return the following set of rows:
No comments:
Post a Comment