SQL Order of Execution

When thinking about SQL performance tips and where to start, I always think back to the very basics.  I’ve actually see some simple things trip up performance.  Recently.  My own.

I’m sure that SQL gurus know that SQL performs it’s operations in a standard order, and the optimizer, while incredibly impressive, is not magical at figuring out the intent of someone’s SQL, as opposed to what they actually wrote.  I think developers, and newbies, think that the optimizer is more powerful and will overlook some simpler problems.

I’m going to include a contrived example to indicate how your SQL can be effected by basic problems, and hopefully if you stumble across this it will help.

The order of execution for SQL are a lot like the BODMAS order of operations for math.

For a particular query you may have the following components and they act in this order:

1 FROM / JOIN
2 WHERE 
3 GROUP BY 
4 HAVING 
5 SELECT
6 ORDER BY 
7 TOP / OFFSET-FETCH
  1. FROM / JOIN – This tells the engine which tables you’re getting your data from.  It is very quickly worked with the WHERE clause to narrow down specific data, but one of the important parts of making good performing SQL is getting the JOIN right, and filtering correctly.
  2. WHERE – Filters which information is brought back.  If you are only filtering on one side of a JOIN then it will rely on that JOIN to bring back rows from the other side that match the JOIN.
  3. GROUP BY – If you are going to aggregate your data into a SUM, or MIN/MAX or other aggregation, it will happen at this point.
  4. HAVING – And then you can filter again by the results of the aggregation.
  5. SELECT – You are choosing specific fields to return from the sets of data you have queried.  Optimizing your SELECT to the specific data you are after can be important to memory usage and bandwidth when returning your results to the client.
  6. ORDER BY – It will then sort the data if you have asked it to.  This can be small or large depending on the size of the result-set.  If you are going to further process the data on the client, then you can get a quick performance saving by not ordering it on the server.
  7. TOP / OFFSET-FETCH – And then, if you only want a subset of the data, you are going to reduce the result-set here.  After you have done all the previous processing.

So, it becomes clear when reading this order that when you do some things can be important.  For example, if you’re going to filter your result-set, if you can do it in the WHERE clause then you are saving a lot of CPU and time doing so there, rather than aggregating the entire result-set and filtering in the HAVING, if possible.

To the contrived example!

So, here is a query that I’ve made against the StackOverflow database that Brent Ozar helpfully keeps a copy of on bittorrent.

SELECT u.Id, u.DisplayName, COUNT(*) AS PostsCount
FROM dbo.Posts p
 JOIN dbo.Users u ON p.OwnerUserId = u.Id
WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
GROUP BY u.Id, u.DisplayName
HAVING COUNT(*) >= 293
ORDER BY PostsCount DESC;

StackOverflowTop10

It all looks pretty straightforward.  I’m going into the database and getting the users that have more than 293 posts in the month of January, 2017.  This equates to the TOP 10.

So, we’re getting the data, filtering it by the date, getting the top 10 users – so we’d expect to be reading 10 rows from the Users table, right?

Table 'Users'. Scan count 1, logical reads 39913, physical reads 0, read-ahead reads 1423, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 1257, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Except we’re not.  We’ve got 39913 page reads.

If we take a look at the query plan, we can see that it’s actually performed some optimization for us and brought the aggregation early – however, it hasn’t decided to select the top rows from the HAVING clause before performing the JOIN.

Perf1QP1

This means that if we hover over the fat arrow depicting the INNER JOIN back to the Users table we see we’re actually reading in 7.6 million users before filtering for the HAVING clause.  Well, at least it optimized the aggregation, otherwise it could be many more!

Perf1QP1b

There are many ways to optimize this JOIN to perform better.

The query actually performs better if it can optimize knowing you want the TOP 10.

SELECT u.Id, u.DisplayName, COUNT(*) AS PostsCountSELECT u.Id, u.DisplayName, COUNT(*) AS PostsCount
FROM dbo.Posts p
 JOIN dbo.Users u ON p.OwnerUserId = u.Id
WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
GROUP BY u.Id, u.DisplayName
ORDER BY PostsCount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
Table 'Users'. Scan count 0, logical reads 155, physical reads 1, read-ahead reads 57, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 1257, physical reads 4, read-ahead reads 1253, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Perf1vTop

You can separate the aggregation out to a subquery to get the results prior to the join.

SELECT u.Id, u.DisplayName, p.PostsCount
FROM (SELECT p.OwnerUserId, COUNT(*) PostsCount
   FROM dbo.Posts p
   WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
   GROUP BY p.OwnerUserId
   HAVING COUNT(*) >= 293
 ) p JOIN dbo.Users u ON p.OwnerUserId = u.Id
ORDER BY p.PostsCount DESC;

You can separate the subquery out into a CTE, which acts like a VIEW.

WITH PostsCte AS (WITH PostsCte AS (
 SELECT p.OwnerUserId, COUNT(*) PostsCount
 FROM dbo.Posts p
 WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
 GROUP BY p.OwnerUserId
 HAVING COUNT(*) >= 293
)

SELECT u.Id, u.DisplayName, p.PostsCount
FROM PostsCte p JOIN dbo.Users u ON p.OwnerUserId = u.Id
ORDER BY p.PostsCount DESC;

You can make the initial subquery as a separate query into a temporary table.

CREATE TABLE #posts (Id INT, PostsCount INT)

INSERT INTO #posts (Id, PostsCount)
SELECT p.OwnerUserId, COUNT(*) PostsCount
FROM dbo.Posts p
WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
GROUP BY p.OwnerUserId
HAVING COUNT(*) >= 293

SELECT u.Id, u.DisplayName, p.PostsCount
FROM #posts p JOIN dbo.Users u ON p.Id = u.Id
ORDER BY p.PostsCount DESC

DROP TABLE #posts;

Similarly, you can use a temporary table variable.

DECLARE @posts TABLE (Id INT, PostsCount INT)

INSERT INTO @posts (Id, PostsCount)
SELECT p.OwnerUserId, COUNT(*) PostsCount
FROM dbo.Posts p
WHERE p.CreationDate >= '2017-01-01' AND p.CreationDate < '2017-02-01'
GROUP BY p.OwnerUserId
HAVING COUNT(*) >= 293

SELECT u.Id, u.DisplayName, p.PostsCount
FROM @posts p JOIN dbo.Users u ON p.Id = u.Id
ORDER BY p.PostsCount DESC;

The point here is there are several different ways to approach improving your queries, and you will find a pattern that suits over time, that is readable, and its worth paying attention to your tools to improve your performance.

The optimizer can do some work at improving performance, but having an idea of the order that things are generally done helps you know where the bottleneck of your query may be.

The difference in basic execution times for the six different queries, in CPU use is:

 SQL Server Execution Times:
 CPU time = 2359 ms, elapsed time = 2569 ms.
 CPU time = 250 ms, elapsed time = 362 ms.
 CPU time = 188 ms, elapsed time = 233 ms.
 CPU time = 203 ms, elapsed time = 260 ms.
 CPU time = 172 ms, elapsed time = 213 ms.
 CPU time = 203 ms, elapsed time = 204 ms.

Improving your performance to 1/5th the time, resources, reduces the WAITS, the locking contention and allows you to do a lot more with the hardware you’ve got.

Obviously, as mentioned earlier, this is a contrived example, and you’d do other things to improve performance such as caching of results, but it’s a reminder to keep an eye on your server, and pay attention to the small things, as they can have a big impact on your server performance.

Some further reading with better examples:

SQLBolt – SQL Lesson 12: Order of execution of a Query

Itzik Ben-Gan – Logical Query Processing: What It Is And What It Means to You

1 thought on “SQL Order of Execution”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s