Menace of Trim function usage in Entity Framework


Problem Statement 

During the last few weeks, we have been working towards improving the performance of one of our applications. We had multiple refinement meetings to break down the cumulative effort into multiple user stories. One of the prioritized user story was to minimize the service call timings from the application. After doing some further analysis, we narrowed down the performance issue to one particular service call which was taking about 1 sec to bring back the results. Using MiniProfiler and Timers, we further narrowed down the exact piece of code which was the major culprit.

Surprisingly enough, it was a simple Entity Framework query trying to pull in a filtered list of titles based on the matching ListCodes. Looks simple right!

_context.Titles.Where (t => t.ListCode.Trim() == buildTitle.ListCode)

The Trim function was used to remove the leading and trailing white spaces – since there was a high chance of spaces in the strings since the source data was in Mainframe.

I used SQL Profiler to capture the T-SQL query being fired and as expected the usage of Trim function popped out. The LINQ syntax .Trim() was converted to  LTRIM(RTRIM()) in T-SQL side.

LTRIM is used to remove leading spaces from a string.

RTRIM is used to remove trailing spaces from a string.

Starting with SQL Server 2017, we now have the much awaited TRIM function which will remove both the leading and trailing spaces from the string. However that is out of scope of the current article.


// Entity Framework Query
_context.Titles.Where (t => t.ListCode.Trim() == buildTitle.ListCode)
// Corresponding Query fired against SQL Server
DECLARE @p0 VarChar(3) = 'BO'
SELECT [t0].[Titlecode], [t0].[ZLineNumber]
FROM [Titles] AS [t0]
WHERE LTRIM(RTRIM([t0].[ListCode])) = @p0

https://gist.github.com/samirbehara/acf513bee53d448f896195b6601577c8#file-trim-in-entity-framework

LTRIM and RTRIM

Let us now execute the same query in SQL Server without the TRIM operation –


SELECT [t0].[Titlecode], [t0].[ZLineNumber]
FROM [Titles] AS [t0]
WHERE [t0].[ListCode] = @p0

If you want to recreate this issue locally, use the below table/index creation script –


CREATE TABLE [dbo].[Titles](
[Titlecode] [decimal](13, 0) NOT NULL,
[ListCode] [char](3) NULL,
[ZLineNumber] [int] NOT NULL,
[ZLineRateCode] [char](3) NULL,
[ZLineComment] [varchar](30) NULL,
CONSTRAINT [pk_Titles] PRIMARY KEY CLUSTERED
(
[Titlecode] ASC,
[ZLineNumber] ASC
))
CREATE NONCLUSTERED INDEX [nc_ListCode] ON [dbo].[Titles]
(
[ListCode] ASC
)

Generally when I start doing a performance analysis, apart from looking at the Execution Plan – I also look at statistics, which shows me the amount of disk activity and time taken by the SQL Server Optimizer to execute a query. We can turn on statistical information in SQL Server, by executing the below statements –

SET STATISTICS IO ON

SET STATISTICS TIME ON

After running the above 2 queries and looking at the stats, it shows that the logical reads are 27299 vs 17. No surprises there as to which is more optimal from a performance standpoint.

(4949 rows affected)

Table ‘Titles’. Scan count 1, logical reads 27299, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(4949 rows affected)

Table ‘Titles’. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Menace of TRIMs

Conclusion

When a column is wrapped around in a function call in the WHERE clause, the query processing speed can be extremely low because of the potential resultant scan operation. We should avoid using functions on columns in predicate. Especially on large tables, it can create a noticeable performance overhead by preventing the Query Optimizer to use available indexes.

It is the responsibility of the Query Optimizer to figure out how to implement the request in the most optimal way and choosing the lowest cost plan. In this scenario, the Query Optimizer will view the LTRIM(RTIM(coulmn_name)) as a new column which is not indexed. Even if you have proper Indexes on selected columns, using functions on the columns in predicate will prompt the optimizer to not benefit from the available index. Hence you are not able to leverage the real power of indexes on your tables.

Best Practice – 

While saving the data in your SQL Server tables, it is better to store the trimmed values at that point – so that you do not need to use the expensive TRIM operation throughout your application. Also you should not allow the user to enter strings with spaces from the user interface. We went ahead and ran an ad-hoc one time query to trim spaces from the corresponding column, so that there is no impact of removing the Trim function usage in the Entity Framework code. 

By just removing the Trim function usage in the EF query, we were able to reduce the overall service call timing from 1sec to 200ms.

Points to remember

This article focused on the TRIM function but generally any function usage in the WHERE clause can slow down your query performance. Use it with caution. SUBSTRING/ CONVERT/ COLLATE can cause similar performance bottlenecks.

Also the WHERE clause is not the only culprit here – you will be able to replicate the performance issue with JOIN/ ORDER BY/GROUP BY/HAVING operators. You are good to use functions in the SELECT list, and is irrelevant to the index selection stats.

Reference

Applying Functions on WHERE Clause Columns might Cause Serious Performance Problems



Categories: C#, Entity Framework, SQL Server

Tags: , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: