Row size limits in SQL Server 2005

In SQL Server 2005 there is no longer a 8060 byte row limit. However pages within the database are still 8060 bytes so if a row goes over this limit the data goes into an overflow page. This has a performance impact when querying against and updating rows that use the overflow area since it does certain bits of synchronously as opposed to asynchronously.

The main issue is that you get no warning when the table is created that your table definition goes over the limit like you did on SQL 2000. When you restore a SQL 2000 database or set compatibility mode to SQL 2000 on a database then it still works in this new way so old SQL 2000 suddenly get this new feature

Whilst this might have the advantage that your insert statements will work the downside is because there is no warning when your table schema reaches 8060 you have to take extra steps if you want to ensure you stay within the 8060 byte limit.

See some examples of projects that OCC have worked on using MySQL and SQL Server.

Select variable number of rows with SQL

In SQL Server 2005, you can now use a variable when getting the first n number of rows in a SELECT query:

DECLARE @NumberOfRowsIWant INT
 SET @NumberOfRowsIWant = 23

SELECT TOP (@NumberOfRowsIWant) *
 FROM T_Client

The OUTPUT clause added in SQL 2005 gives you access to the inserted and deleted tables, normally only useable in a trigger, in your stored procedures (SP). Combine the clause with an UPDATE or a DELETE and you can, for example, select fields from the rows inserted into a temporary table for later use. A slight drawback is that triggers on the tables inserted / updated / deleted could cause values in these tables in your SP not to be available.

See some examples of projects that OCC have worked on using MySQL and SQL Server.

Tally Tables in SQL

A tally table is a table with a set of numbers in, such as all of the integers from 1 to 1,000,000  in order. It may also known as a number table and could contain every day between two dates, or every Monday between two dates, etc.

Tally tables can be used for a large and varied number of ways, to make code simpler and/or faster.

Converting a CSV string into a dataset

A common use of a tally table is in converting a CSV string into a dataset. The obvious way would be to start at the beginning of the string, go forward one character at a time and output a row whenever a comma is found. This is all very well, but not quite set-based (which is what we should be trying to do in SQL). However, you can do this in a tally table like this:

DECLARE @CSV VARCHAR (8000)
 SET @CSV = 'Testing,testing,one,two,three,,also works with empty ones'
 SET @CSV = ',' + @CSV + ','

SELECT SUBSTRING (@CSV, N + 1, CHARINDEX (',', @CSV, N + 1) - N - 1)
 FROM TSys_Tally_N
 WHERE (N < LEN (@CSV))
 AND (SUBSTRING (@CSV, N, 1) = ',')

This is using (SUBSTRING (@CSV, N, 1) = ',') to find every row in the tally table which corresponds to a comma in the string. For each of these rows, it then uses the value of N to extract the appropriate section from the string and return it to the output dataset.

Outsmarting SQL Joins

If you work with SQL you will know about the various types of join and what they do:

  • [Inner] Join
  • Left [Outer] Join
  • Right [Outer] Join
  • Full [Outer] Join
  • Cross Join

(Where the Inner and Outer keywords are optional).

But how does SQL actually do the joining? Usually, it picks the most appropriate method (ie the fastest) and everything’s fine. Occasionally, however, it drops the performance ball and everything grinds to a halt. In these situations, knowing what’s going on behind the scenes can help to overrule SQL and get a better performance.

Here’s how SQL does joins:

Nested Loop
Good for joining from a small table to a large indexed table. Each row in the small table is taken in turn and matches found in the large indexed table use an index lookup.

Merge
Good for joining between two large tables. These are both sorted on the columns being used for the join. The first row in one table is then compared with the first row in the second. Depending on which has the bigger value for the join column, the next row is then read from one or other of the tables. This process, which is, essentially, stepping down through the tables one row at a time, matching on the join column, is repeated until the end of one is reached.

Hash
Used when all else fails. Neither table needs to be indexed nor sorted. Each row in the smaller table is hashed and put into a hash bucket. Each row in the larger table is then hashed and checked against the corresponding hash bucket for a match.

Sometimes, we know that one particular type of join will be better when SQL decides to pick another. SQL can be forced to use one option quite simply with the syntax: Left Merge Join, etc. This can result in significant performance improvements so give it a try!