T-SQL Tuesday #18-CTEs: What a CTE is Not

A common beginner’s misconception about Common Table Expressions (CTEs) is that they produce a real, stored result set, like a temporary table or table variable. They don’t: a CTE is really just a way to simplify and encapsulate your code. For this month’s T-SQL Tuesday, focused on CTEs, I want to illustrate the difference with an example of how a (non-recursive) CTE can be either more or less efficient than a temporary table at accessing data.

Let’s start with a simple query that compares each product’s average price in a given year with its average price in the following year, expressed as both a CTE and a derived table:

 
/* CTE Version */
WITH SalesData as
(
SELECT sd.ProductId
     , SalesYr  = YEAR(sh.OrderDate)
     , AvgPrice = Avg(UnitPrice)
  FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
  JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    ON sh.SalesOrderID = sd.SalesOrderID
GROUP BY sd.ProductId,YEAR(sh.OrderDate)
)
 
SELECT s1.ProductId
    , s1.SalesYr  as year1
    , s2.SalesYr  as year2
    , s1.AvgPrice as Year1AvgPrice
    , s2.AvgPrice as Year2AvgPrice
    , s2.AvgPrice/s1.AvgPrice as Year2Change
  FROM SalesData s1
INNER JOIN SalesData s2
    ON s1.Productid = s2.ProductId
   AND s1.SalesYr = s2.SalesYr -1
  
/* Derived Table Version */
SELECT s1.ProductId
    , s1.SalesYr  as year1
    , s2.SalesYr  as year2
    , s1.AvgPrice as Year1AvgPrice
    , s2.AvgPrice as Year2AvgPrice
    , s2.AvgPrice/s1.AvgPrice as Year2Change
  FROM (
          SELECT sd.ProductId
             , SalesYr  = YEAR(sh.OrderDate)
             , AvgPrice = Avg(UnitPrice)
          FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
          JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
            ON sh.SalesOrderID = sd.SalesOrderID
        GROUP BY sd.ProductId,YEAR(sh.OrderDate)
       ) s1
  INNER JOIN
    (
        SELECT sd.ProductId
             , SalesYr  = YEAR(sh.OrderDate)
             , AvgPrice = Avg(UnitPrice)
          FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
          JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
            ON sh.SalesOrderID = sd.SalesOrderID
        GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    )             s2
    ON s1.Productid = s2.ProductId
  AND s1.SalesYr = s2.SalesYr -1

Both the CTE and the derived table generate the same execution plan, and Statistics IO for both queries is also identical:

 
(347 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 2, logical reads 2480, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 2, logical reads 1372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
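
Output like this comes from turning on SET STATISTICS IO for the session before running a query. A minimal sketch, assuming the AdventureWorks2008R2 sample database is installed (the COUNT(*) query is only a stand-in for whichever query you want to measure):

```sql
/* Report per-table scan counts and page reads for each statement */
SET STATISTICS IO ON;

/* Stand-in query: replace with the CTE or derived table version */
SELECT COUNT(*)
  FROM AdventureWorks2008R2.Sales.SalesOrderHeader;

/* Turn the reporting back off when finished */
SET STATISTICS IO OFF;
```

In Management Studio the figures appear on the Messages tab rather than in the result grid.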

So, from an execution standpoint, the derived table and CTE are essentially the same. Now, consider the same query using a temporary table:

 
  /* Aggregate Data in Temp Table */
SELECT sd.ProductId
     , SalesYr  = YEAR(sh.OrderDate)
     , AvgPrice = Avg(UnitPrice)
  INTO #TmpSales
  FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
  JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    ON sh.SalesOrderID = sd.SalesOrderID
GROUP BY sd.ProductId,YEAR(sh.OrderDate)
 
/*Return Sales Info */
SELECT s1.ProductId
    , s1.SalesYr  as year1
    , s2.SalesYr  as year2
    , s1.AvgPrice as Year1AvgPrice
    , s2.AvgPrice as Year2AvgPrice
    , s2.AvgPrice/s1.AvgPrice as Year2Change
  FROM #TmpSales s1
INNER JOIN #TmpSales s2
    ON s1.Productid = s2.ProductId
   AND s1.SalesYr = s2.SalesYr -1

Along with its query plan and Statistics IO:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(613 row(s) affected)
 
(347 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TmpSales'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Notice that in both the CTE and derived table versions, SalesOrderHeader and SalesOrderDetail are each accessed twice and the aggregation is calculated twice. The temporary table version accesses each table once, aggregates the results, and then uses the much smaller temporary table to produce the final results. As a result, the CTE version costs roughly twice as much IO as the temporary table version (3,852 logical reads versus 1,932). If a CTE were truly a “result set” (as some authors and speakers have presented it), then we would see SalesOrderHeader and SalesOrderDetail accessed only once, just as with the temporary table, and the IO cost would be similar. Neither is the case. Conclusion: a CTE is not a temp table or a stored result set.
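
A quick way to see the same thing without AdventureWorks is to reference a CTE that calls NEWID() twice. This is only a sketch: the OneGuid name and the column aliases are made up for illustration, and the CTE behavior is what is commonly observed rather than something the documentation promises. Because each reference to the CTE is expanded and evaluated separately, the two columns typically come back with different GUIDs, whereas the temporary table stores one value and returns it in both columns:

```sql
/* CTE referenced twice: each reference is expanded and evaluated on its own */
WITH OneGuid AS
(
    SELECT Id = NEWID()
)
SELECT FirstReference  = a.Id
     , SecondReference = b.Id
  FROM OneGuid a
 CROSS JOIN OneGuid b;

/* Temp table: the GUID is generated once and stored */
SELECT Id = NEWID()
  INTO #OneGuid;

SELECT FirstReference  = a.Id
     , SecondReference = b.Id
  FROM #OneGuid a
 CROSS JOIN #OneGuid b;

/* Clean up the illustration table */
DROP TABLE #OneGuid;
```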

In this particular case, I’ve structured my queries so that the CTE is the less efficient way to access the data. There are times, however, when the query optimizer can take advantage of the CTE structure and find a more efficient way to access the data.

Here are two (oversimplified) queries to illustrate this point:

 
 
  /* CTE Version */
WITH SalesData as
(
    SELECT sd.ProductId
         , SalesYr  = YEAR(sh.OrderDate)
         , AvgPrice = Avg(UnitPrice)
         , AvgOrderQty =AVG(OrderQty)
      FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
      JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
        ON sh.SalesOrderID = sd.SalesOrderID
    GROUP BY sd.ProductId,YEAR(sh.OrderDate)
)
 
SELECT ProductId
    , AvgPrice
    , AvgOrderQty
  FROM SalesData
WHERE SalesYr = '2006';

  /* Aggregate Data in Temp Table */
SELECT sd.ProductId
     , SalesYr  = YEAR(sh.OrderDate)
     , AvgPrice = Avg(UnitPrice)
     , AvgOrderQty =AVG(OrderQty)
  INTO #TmpSales
  FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
  JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    ON sh.SalesOrderID = sd.SalesOrderID
GROUP BY sd.ProductId,YEAR(sh.OrderDate)
 
 
/*Return Sales Info */
 
SELECT ProductId
    , AvgPrice
    , AvgOrderQty
  FROM #TmpSales
WHERE SalesYr = '2006'

Along with their Statistics IO:

***CTE***
(132 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 285, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
***Temp Table***
  
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(613 row(s) affected)
 
(132 row(s) affected)
Table '#TmpSales'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
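
One practical caveat if you run both temporary table examples in the same session: SELECT ... INTO raises an error when #TmpSales already exists, and the table from the first example doesn't have the AvgOrderQty column anyway. A minimal sketch of a cleanup step to run before the second example, using an existence check that works on SQL Server 2008 R2:

```sql
/* Drop the temp table left over from an earlier example, if there is one */
IF OBJECT_ID('tempdb..#TmpSales') IS NOT NULL
    DROP TABLE #TmpSales;
```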

The temporary table approach still pays the cost of aggregating all the data first, so building it costs the same IO as building the temporary table in the first example. With the CTE, however, the query optimizer is able to apply the WHERE clause while it’s expanding the query, before the aggregation. As a result, it reads and aggregates less data and uses about half the IO that’s involved in creating and reading from the unfiltered temporary table (971 logical reads versus 1,930). The text showplan reveals this application of the WHERE clause:

StmtText
--------------------------------------------------------------------------------------------------------------------------------------
  |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019]/CONVERT_IMPLICIT(money,[Expr1018],0) END, [Expr1006]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1020]/CONVERT_IMPLICIT(int,[Expr1018],0) END))
       |--Hash Match(Aggregate, HASH:([sd].[ProductID]) DEFINE:([Expr1018]=COUNT(*), [Expr1019]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[UnitPrice] as [sd].[UnitPrice]), [Expr1020]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[OrderQ
            |--Merge Join(Inner Join, MERGE:([sh].[SalesOrderID])=([sd].[SalesOrderID]), RESIDUAL:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[SalesOrderID] as [sd].[SalesOrderID]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [
                 |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [sh]),  WHERE:(datepart(year,[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[OrderDate] as [sh].[OrderDate])=(2006))
                 |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [sd]), ORDERED FORWARD)
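
Reading the plan, the filter on OrderDate is applied during the scan of SalesOrderHeader, before the join and the aggregation, and the aggregate groups on ProductID alone. A hand-written query of roughly the same shape would look like the sketch below. It is an illustration of what the expansion amounts to, not something SQL Server outputs, and it assumes the same AdventureWorks2008R2 tables used above:

```sql
/* Filter first, then join and aggregate: the shape the optimizer chose for the CTE */
SELECT sd.ProductID
     , AvgPrice    = AVG(sd.UnitPrice)
     , AvgOrderQty = AVG(sd.OrderQty)
  FROM AdventureWorks2008R2.Sales.SalesOrderHeader sh
  JOIN AdventureWorks2008R2.Sales.SalesOrderDetail sd
    ON sh.SalesOrderID = sd.SalesOrderID
 WHERE YEAR(sh.OrderDate) = 2006
GROUP BY sd.ProductID;
```

The temporary table version could get the same saving by putting this filter in its SELECT ... INTO, but that has to be done by hand; with the CTE, the optimizer does it for you.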

So which approach is better, the CTE or the temporary table? As always, it depends on your data and how you use it. For a little help deciding, why not check out the rest of the posts in this month’s T-SQL Tuesday?
