Do varchar(max), nvarchar(max) and varbinary(max) columns affect select queries?


Consider this table:



create table Books
(
Id bigint not null primary key identity(1, 1),
UniqueToken varchar(100) not null,
[Text] nvarchar(max) not null
)


Let's imagine that we have over 100,000 books in this table.



Now we're given data for 10,000 books to insert into this table, some of which are duplicates. So we need to filter out the duplicates first, and then insert the new books.



One way to check for duplicates is:



select UniqueToken
from Books
where UniqueToken in
(
'first unique token',
'second unique token'
-- 10,000 items here
)


Does the existence of the Text column affect this query's performance? If so, how can we optimize it?



P.S.
I have the same structure for some other data, and it's not performing well. A friend told me that I should break my table into two tables as follows:



create table BookUniqueTokens 
(
Id bigint not null primary key identity(1, 1),
UniqueToken varchar(100)
)

create table Books
(
Id bigint not null primary key,
[Text] nvarchar(max)
)


I would then run my duplicate-finding algorithm on the first table only, and insert data into both of them. He claimed that performance gets much better this way, because the tables are physically separate, and that the [Text] column affects any select query on the UniqueToken column.










  • Is there a nonclustered index on UniqueToken? Also, I would not advise an IN with 10k items; I would store them in a temp table and filter the UniqueTokens with this temporary table. More on that here
    – Randi Vertongen, yesterday

  • 1) If you are checking for duplicates, why would you include the Text column in the query? 2) Can you please update the question to include a few examples of values stored in the UniqueToken column? If you don't want to share actual company data, modify it, but keep the format the same.
    – Solomon Rutzky, yesterday

  • @RandiVertongen, yes, there is a nonclustered index on UniqueToken.
    – Saeed Neamati, yesterday

  • @SolomonRutzky, I'm retrieving existing values from the database, to be excluded inside the application code.
    – Saeed Neamati, yesterday

  • @SaeedNeamati I added an edit based on the NC index existing. If the query in the question is the one that needs to be optimized, and the NC index does not have the Text column included, then I would look at the IN for query optimization. There are better ways to find duplicate data.
    – Randi Vertongen, yesterday















sql-server performance






asked yesterday by Saeed Neamati

1 Answer
Examples



Consider your query with 8 values in the IN clause, run against a test table of 10K records.



select UniqueToken
from Books
where UniqueToken in
(
'Unique token 1',
'Unique token 2',
'Unique token 3',
'Unique token 4',
'Unique token 5',
'Unique token 6',
'Unique token 9999',
'Unique token 5000'
-- 10,000 items here
);


A clustered index scan is used, because there are no other indexes present on this test table. The data size is 216 bytes.



You should also note how, even with only 8 values, the OR filters are stacking up.



The query did 99 logical reads on this table (read statistics formatted with statisticsparser).
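
The read counts quoted in this answer come from the I/O statistics that SQL Server prints per statement. As a minimal sketch of how such numbers can be collected (the two-value IN list is shortened for illustration, and the counts you see will depend on your own data):

```sql
-- Print logical, physical and LOB reads for each statement in this session.
SET STATISTICS IO ON;

SELECT UniqueToken
FROM Books
WHERE UniqueToken IN ('Unique token 1', 'Unique token 2');

-- Turn the extra output off again when done.
SET STATISTICS IO OFF;
```

The output appears on the Messages tab in SSMS, as lines of the form "Table 'Books'. Scan count ..., logical reads ..., lob logical reads ...".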



When you include the Text column in the select part of your query, the actual data size changes drastically:



select UniqueToken, [Text]
from Books
where UniqueToken in
(
'Unique token 1',
'Unique token 2',
'Unique token 3',
'Unique token 4',
'Unique token 5',
'Unique token 6',
'Unique token 9999',
'Unique token 5000'
-- 10,000 items here
);


Again, a clustered index scan with a residual predicate is used, but now with a data size of 32 KB, as there are almost 1,000 lob logical reads.



Now we create the two tables from the question, fill them with the same 10K records, and execute the same select without Text. Remember that we had 99 logical reads when using the Books table.



select UniqueToken
from BookUniqueTokens
where UniqueToken in
(
'Unique token 1',
'Unique token 2',
'Unique token 3',
'Unique token 4',
'Unique token 5',
'Unique token 6',
'Unique token 9999',
'Unique token 5000'
-- 10,000 items here
)


The reads on BookUniqueTokens are lower: 67 instead of 99.



We can trace that back to the number of pages in the original Books table and in the new table without the Text column.



(Screenshots of the page counts for the original Books table and the new BookUniqueTokens table.)



So, all the pages + (2 overhead pages?) are read from the clustered index.



Why is there a difference, and why is the difference not bigger? After all, the difference in data size is huge (LOB data versus no LOB data).



(Screenshots of the data space used by Books and by BooksWithText.)



The reason is off-row storage. When a value in the nvarchar(max) column is larger than 8,000 bytes, it is not stored on the clustered index data pages but on separate LOB_DATA pages. (ROW_OVERFLOW_DATA is the similar mechanism for variable-length columns that push a row over the 8,060-byte limit.)



OK, if the LOB data is stored on different pages, why are the page counts of these two clustered indexes not the same?



Because of the 24-byte pointer added to each clustered index row to track these pages.
After all, SQL Server needs to know where it can find the LOB data.



Source
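
To check where the data of a table actually lives, the page counts per allocation unit can be read from the catalog views. This is a minimal sketch, assuming the table sits in the dbo schema (the question does not say):

```sql
-- Pages per index and allocation unit for dbo.Books (schema name is an assumption).
SELECT
    i.name       AS index_name,
    au.type_desc AS allocation_unit,  -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
    au.total_pages,
    au.used_pages,
    au.data_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    -- In-row and row-overflow units (types 1 and 3) map to hobt_id,
    -- LOB units (type 2) map to partition_id.
    ON au.container_id = CASE WHEN au.type IN (1, 3) THEN p.hobt_id
                              ELSE p.partition_id END
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID(N'dbo.Books')
ORDER BY i.index_id, au.type;
```

A scan that does not select the Text column only has to read the IN_ROW_DATA pages of the index it uses, which is why large off-row values add so little to the logical reads.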




To answer your questions




He claimed that [Text] column affects any select query on the
UniqueToken column.




And




Does the existence of the Text column affect this query's performance? If
so, how can we optimize it?




If the data stored is actually LOB data, and the query provided in the question is used, it does bring some overhead due to the 24-byte pointers.



Unless the number of executions per minute is extremely high, I would say that this is negligible, even with 100K records.



Remember that this overhead only happens if an index that includes Text is used, such as the clustered index.



But what if the clustered index scan is used, and the LOB data does not exceed 8 KB?



If the data does not exceed 8 KB, and you have no index on UniqueToken, the overhead could be bigger, even when not selecting the Text column.



Logical reads on 10k records when Text is only 137 characters long (all records):




Table 'Books2'. Scan count 1, logical reads 419




This is due to all the extra data being stored on the clustered index pages.



Again, an index on UniqueToken (without including the Text column) will resolve this issue.



As pointed out by @David Browne - Microsoft, you could also store the data off-row, so that this overhead is not added to the clustered index when the Text column is not selected.




Also, if you do want the text stored off-row, you can force that
without using a separate table. Just set the 'large value types out of
row' option with sp_tableoption.
docs.microsoft.com/en-us/sql/relational-databases




TL;DR



Based on the query given, indexing UniqueToken without including Text should resolve your troubles.
Additionally, I would use a temporary table or table type to do the filtering instead of the IN list.
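
As a rough sketch of the temporary table approach (the name #IncomingTokens is hypothetical, and how the 10,000 tokens get loaded depends on your application):

```sql
-- Hypothetical staging table for the incoming batch of tokens.
-- COLLATE DATABASE_DEFAULT avoids collation conflicts when tempdb
-- uses a different collation than the user database.
CREATE TABLE #IncomingTokens
(
    UniqueToken varchar(100) COLLATE DATABASE_DEFAULT NOT NULL
);

-- Load the candidate tokens. A single VALUES list is limited to 1,000 rows,
-- so a real batch would use a bulk load or a table-valued parameter.
INSERT INTO #IncomingTokens (UniqueToken)
VALUES ('first unique token'),
       ('second unique token');

-- Tokens that already exist in Books; the nonclustered index on
-- UniqueToken can satisfy this without touching the Text column.
SELECT b.UniqueToken
FROM Books AS b
JOIN #IncomingTokens AS i
    ON i.UniqueToken = b.UniqueToken;

-- Tokens that are new and still need to be inserted.
SELECT i.UniqueToken
FROM #IncomingTokens AS i
WHERE NOT EXISTS
(
    SELECT 1
    FROM Books AS b
    WHERE b.UniqueToken = i.UniqueToken
);

DROP TABLE #IncomingTokens;
```

The staging table has no unique key on purpose, in case the incoming batch itself contains repeated tokens; add DISTINCT or a key if your data allows it.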



EDIT:




yes there is a nonclustered index on UniqueToken




Your example query is not touching the Text column, and based on the query this should be a covering index.



If we test this on the three tables we previously used (UniqueToken + LOB data, solely UniqueToken, and UniqueToken + 137 characters of data in an nvarchar(max) column):



CREATE INDEX [IX_Books_UniqueToken] ON Books(UniqueToken);
CREATE INDEX [IX_BookUniqueTokens_UniqueToken] ON BookUniqueTokens(UniqueToken);
CREATE INDEX [IX_Books2_UniqueToken] ON Books2(UniqueToken);


The reads remain the same for these three tables, because the nonclustered index is used.



Table 'Books'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'BookUniqueTokens'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Books2'. Scan count 8, logical reads 16, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Additional details



by @David Browne - Microsoft




Also, if you do want the text stored off-row, you can force that
without using a separate table. Just set the 'large value types out of
row' option with sp_tableoption.
docs.microsoft.com/en-us/sql/relational-databases/




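Remember that changing this option does not immediately convert data that is already in the table: according to the sp_tableoption documentation, existing values keep their current storage until they are next updated, so rebuilding the indexes alone may not be enough.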
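
A minimal sketch of setting the option, assuming the table is dbo.Books (the schema is an assumption). The sp_tableoption documentation states that existing values are converted as they are subsequently updated, so the sketch also rewrites the column in place; on a large table that update is heavily logged and should be batched:

```sql
-- Store MAX values off-row, leaving only a pointer in the row.
EXEC sp_tableoption N'dbo.Books', 'large value types out of row', 1;

-- Confirm the setting.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.Books');

-- Commonly suggested way to move values that already exist:
-- rewrite each value so it is stored under the new setting.
UPDATE dbo.Books
SET [Text] = [Text];
```

With the option on, even short Text values no longer take up space on the clustered index pages, at the cost of an extra lookup whenever the Text column is actually selected.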



By @Erik Darling



  • MAX Data Types Do WHAT? Filtering on LOB data performs poorly.

  • Memory Grants and Data Size: your memory grants might go through the roof when using bigger data types, impacting performance.






share|improve this answer
























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "182"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232941%2fdo-varcharmax-nvarcharmax-and-varbinarymax-columns-affect-select-queries%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    6














    Examples



    Consider your query with 8 filter predicates in your IN clause on a dataset of 10K records.



    select UniqueToken
    from Books
    where UniqueToken in
    (
    'Unique token 1',
    'Unique token 2',
    'Unique token 3',
    'Unique token 4',
    'Unique token 5',
    'Unique token 6',
    'Unique token 9999',
    'Unique token 5000'
    -- 10,000 items here
    );


    A clustered index scan is used, there are no other indexes present on this test table



    enter image description here



    With a data size of 216 Bytes.



    You should also note how even with 8 records, the OR filters are stacking up.



    The reads that happened on this table:



    enter image description here



    Credits to statisticsparser.



    When you include the Text column in the select part of your query, the actual data size changes drastically:



    select UniqueToken,Text
    from Books
    where UniqueToken in
    (
    'Unique token 1',
    'Unique token 2',
    'Unique token 3',
    'Unique token 4',
    'Unique token 5',
    'Unique token 6',
    'Unique token 9999',
    'Unique token 5000'
    -- 10,000 items here
    );


    Again, the Clustered index Scan with a residual predicate:



    enter image description here



    But with a dataset of 32KB.



    As there are almost 1000 lob logical reads:



    enter image description here



    Now, when we create the two tables in question, and fill them up with the same 10k records



    Executing the same select without Text. Remember that we had 99 Logical reads when using the Books Table.



    select UniqueToken
    from BookUniqueTokens
    where UniqueToken in
    (
    'Unique token 1',
    'Unique token 2',
    'Unique token 3',
    'Unique token 4',
    'Unique token 5',
    'Unique token 6',
    'Unique token 9999',
    'Unique token 5000'
    -- 10,000 items here
    )


    The reads on BookUniqueTokens are lower, 67 instead of 99.



    enter image description here



    We can track that back to the pages in the original Books table and the pages in the new table without the Text.



    Original Books table:



    enter image description here



    New BookUniqueTokens table



    enter image description here



    So, all the pages + (2 overhead pages?) are read from the clustered index.



    Why is there a difference, and why is the difference not bigger? After all the datasize difference is huge (Lob data <> No Lob data)



    Books Data space



    enter image description here



    BooksWithText Data space



    enter image description here



    The reason for this is ROW_OVERFLOW_DATA.



    When data gets bigger than 8kb the data is stored as ROW_OVERFLOW_DATA on different pages.



    Ok, if lob data is stored on different pages, why are the page sizes of these two clustered indexes not the same?



    Due to the 24 byte pointer added to the Clustered index to track each of these pages.
    After all, sql server needs to know where it can find the lob data.



    Source




    To answer your questions




    He claimed that [Text] column affects any select query on the
    UniqueToken column.




    And




    Does the existence of Text column affect this query's performance? If
    so, how can we optimized it?




    If the data stored is actually Lob Data, and the Query provided in the answer is used



    It does bring some overhead due to the 24 byte pointers.



    Depending on the executions / min not being crazy high, I would say that this is negligible, even with 100K records.



    Remember that this overhead only happens if an index that includes Text is used, such as the clustered index.



    But, what if the clustered index scan is used, and the lob data does not exceed 8kb?



    If the data does not exceed 8kb, and you have no index on UniqueToken,the overhead could be bigger . even when not selecting the Text column.



    Logical reads on 10k records when Text is only 137 characters long (all records):




    Table 'Books2'. Scan count 1, logical reads 419




    Due to all this extra data being on the clustered index pages.



    Again, an index on UniqueToken (Without including the Text column) will resolve this issue.



    As pointed out by @David Browne - Microsoft, you could also store the data off row, as to not add this overhead on the Clustered index when not selecting this Text Column.




    Also, if you do want the text stored off-row, you can force that
    without using a separate table. Just set the 'large value types out of
    row' option with sp_tableoption.
    docs.microsoft.com/en-us/sql/relational-databases




    TL;DR



    Based on the query given, indexing UniqueToken without including TEXT should resolve your troubles.
    Additionally, I would use a temporary table or table type to do the filtering instead of the IN statement.



    EDIT:




    yes there is a nonclustered index on UniqueToken




    Your example query is not touching the Text column, and based on the query this should be a covering index.



    If we test this on the three tables we previously used (UniqueToken + Lob data, Solely UniqueToken, UniqueToken + 137 Char data in nvarchar(max) column)



    CREATE INDEX [IX_Books_UniqueToken] ON Books(UniqueToken);
    CREATE INDEX [IX_BookUniqueTokens_UniqueToken] ON BookUniqueTokens(UniqueToken);
    CREATE INDEX [IX_Books2_UniqueToken] ON Books2(UniqueToken);


    The reads remain the same for these three tables, because the nonclustered index is used.



    Table 'Books'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BookUniqueTokens'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Books2'. Scan count 8, logical reads 16, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


    Additional details



    by @David Browne - Microsoft




    Also, if you do want the text stored off-row, you can force that
    without using a separate table. Just set the 'large value types out of
    row' option with sp_tableoption.
    docs.microsoft.com/en-us/sql/relational-databases/




    Remember that you have to rebuild your indexes for this to take effect on already populated data.



    By @Erik Darling



    On



    • MAX Data Types Do WHAT?

    Filtering on Lob data sucks.



    • Memory Grants and Data Size

    Your memory grants might go through the roof when using bigger datatypes, impacting performance.






    share|improve this answer





























      6














      Examples



      Consider your query with 8 filter predicates in your IN clause on a dataset of 10K records.



      select UniqueToken
      from Books
      where UniqueToken in
      (
      'Unique token 1',
      'Unique token 2',
      'Unique token 3',
      'Unique token 4',
      'Unique token 5',
      'Unique token 6',
      'Unique token 9999',
      'Unique token 5000'
      -- 10,000 items here
      );


      A clustered index scan is used, there are no other indexes present on this test table



      enter image description here



      With a data size of 216 Bytes.



      You should also note how even with 8 records, the OR filters are stacking up.



      The reads that happened on this table:



      enter image description here



      Credits to statisticsparser.



      When you include the Text column in the select part of your query, the actual data size changes drastically:



      select UniqueToken,Text
      from Books
      where UniqueToken in
      (
      'Unique token 1',
      'Unique token 2',
      'Unique token 3',
      'Unique token 4',
      'Unique token 5',
      'Unique token 6',
      'Unique token 9999',
      'Unique token 5000'
      -- 10,000 items here
      );


      Again, the Clustered index Scan with a residual predicate:



      enter image description here



      But with a dataset of 32KB.



      As there are almost 1000 lob logical reads:



      enter image description here



      Now we create the two tables in question, fill them with the same 10K records, and execute the same select without Text. Remember that we had 99 logical reads when using the Books table.



      select UniqueToken
      from BookUniqueTokens
      where UniqueToken in
      (
      'Unique token 1',
      'Unique token 2',
      'Unique token 3',
      'Unique token 4',
      'Unique token 5',
      'Unique token 6',
      'Unique token 9999',
      'Unique token 5000'
      -- 10,000 items here
      )


      The reads on BookUniqueTokens are lower, 67 instead of 99.






      We can trace that back to the number of pages in the original Books table and in the new table without the Text column.



      [Image: pages in the original Books table]



      [Image: pages in the new BookUniqueTokens table]



      So all the pages, plus what appear to be 2 overhead pages, are read from the clustered index.
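
To check page counts like these on your own tables, one option is the sys.dm_db_partition_stats DMV. This is only a sketch and not necessarily how the figures in this answer were produced; it assumes both tables are in the dbo schema of the current database and that each has a clustered index (index_id 1):

```sql
SELECT  OBJECT_NAME(ps.object_id)        AS table_name,
        ps.in_row_data_page_count,       -- leaf pages holding in-row data
        ps.lob_used_page_count,          -- pages used for off-row LOB data
        ps.row_overflow_used_page_count, -- pages used for row-overflow data
        ps.row_count
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id IN (OBJECT_ID(N'dbo.Books'), OBJECT_ID(N'dbo.BookUniqueTokens'))
  AND   ps.index_id = 1;                 -- 1 = clustered index
```

Comparing in_row_data_page_count between the two tables should line up with the logical reads of a full clustered index scan, give or take a few pages.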



      Why is there a difference, and why is it not bigger? After all, the difference in data size is huge (LOB data versus no LOB data).



      [Image: Books data space]



      [Image: BooksWithText data space]



      The reason for this is ROW_OVERFLOW_DATA.



      When the data gets bigger than 8 KB, it is stored off-row on different pages. (For MAX types, SQL Server accounts for those pages as LOB data, which is why they appear as lob logical reads in the statistics output.)



      OK, so if the LOB data is stored on different pages, why are the sizes of these two clustered indexes not the same?



      Because of the 24-byte pointer added to the clustered index to track each of these pages.
      After all, SQL Server needs to know where it can find the LOB data.



      Source




      To answer your questions




      He claimed that [Text] column affects any select query on the
      UniqueToken column.




      And




      Does the existence of Text column affect this query's performance? If
      so, how can we optimized it?




      If the data stored is actually LOB data, and the query provided is used, it does bring some overhead due to the 24-byte pointers.



      Unless the number of executions per minute is extremely high, I would say that this is negligible, even with 100K records.



      Remember that this overhead only occurs when an index that includes Text is used, such as the clustered index.



      But what if the clustered index scan is used, and the LOB data does not exceed 8 KB?



      If the data does not exceed 8 KB and you have no index on UniqueToken, the overhead could be bigger, even when you are not selecting the Text column.



      Logical reads on 10K records when Text is only 137 characters long (all records):




      Table 'Books2'. Scan count 1, logical reads 419




      This is due to all the extra data being stored in-row on the clustered index pages.



      Again, an index on UniqueToken (without including the Text column) will resolve this issue.



      As pointed out by @David Browne - Microsoft, you could also store the data off-row, so as not to add this overhead to the clustered index when you are not selecting the Text column.




      Also, if you do want the text stored off-row, you can force that
      without using a separate table. Just set the 'large value types out of
      row' option with sp_tableoption.
      docs.microsoft.com/en-us/sql/relational-databases




      TL;DR



      Based on the query given, indexing UniqueToken without including Text should resolve your troubles.
      Additionally, I would use a temporary table or table type to do the filtering instead of the IN list.
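
A minimal sketch of the temporary-table approach. The temp table name #TokensToFind and the nvarchar(100) column type are assumptions for illustration; match the type to the real UniqueToken column to avoid implicit conversions:

```sql
-- Hypothetical temp table holding the tokens to look up.
CREATE TABLE #TokensToFind
(
    UniqueToken nvarchar(100) NOT NULL PRIMARY KEY
);

-- Load the values that would otherwise go in the IN list.
-- A single VALUES list is limited to 1,000 rows, so larger sets
-- need several inserts or a bulk load.
INSERT INTO #TokensToFind (UniqueToken)
VALUES (N'Unique token 1'),
       (N'Unique token 2'),
       (N'Unique token 3');

-- Join instead of filtering with a long IN list.
SELECT b.UniqueToken
FROM Books AS b
INNER JOIN #TokensToFind AS t
    ON t.UniqueToken = b.UniqueToken;

DROP TABLE #TokensToFind;
```

With an index on Books(UniqueToken), the optimizer has the option of seeking once per token rather than evaluating a long chain of OR predicates.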



      EDIT:




      yes there is a nonclustered index on UniqueToken




      Your example query is not touching the Text column, and based on the query this should be a covering index.



      Testing this on the three tables we previously used (UniqueToken + LOB data, solely UniqueToken, and UniqueToken + 137 characters of data in an nvarchar(max) column):



      CREATE INDEX [IX_Books_UniqueToken] ON Books(UniqueToken);
      CREATE INDEX [IX_BookUniqueTokens_UniqueToken] ON BookUniqueTokens(UniqueToken);
      CREATE INDEX [IX_Books2_UniqueToken] ON Books2(UniqueToken);


      The reads are the same for all three tables, because the nonclustered index is used.



      Table 'Books'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

      Table 'BookUniqueTokens'. Scan count 8, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

      Table 'Books2'. Scan count 8, logical reads 16, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


      Additional details



      by @David Browne - Microsoft




      Also, if you do want the text stored off-row, you can force that
      without using a separate table. Just set the 'large value types out of
      row' option with sp_tableoption.
      docs.microsoft.com/en-us/sql/relational-databases/




      Remember that you have to rebuild your indexes for this to take effect on already populated data.
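
A sketch of what that could look like for the Books table. The dbo schema is an assumption, and whether a rebuild alone relocates values that are already stored in-row is worth verifying on a copy of your data:

```sql
-- Store MAX-type values off-row for dbo.Books (schema name is an assumption).
EXEC sp_tableoption N'dbo.Books', 'large value types out of row', 1;

-- Check the setting: 1 means large value types are stored out of row.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = N'Books';

-- Rebuild the indexes, following the note above about already populated data.
ALTER INDEX ALL ON dbo.Books REBUILD;
```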



      By @Erik Darling, on:



      • MAX Data Types Do WHAT?
        Filtering on LOB data sucks.



      • Memory Grants and Data Size
        Your memory grants might go through the roof when using bigger data types, impacting performance.





























      answered yesterday (edited yesterday) by Randi Vertongen


























