Replacing trigger-based validation with foreign keys

Several techniques are often deployed in well-designed databases to minimise the amount of invalid data. Common ones include:

  • Check Constraints– checks that can be made on a single row in a single table, for example:
    • StartDate <= EndDate
    • Height > 0
    • Age < 199
  • Unique Indexes– to make sure we don’t have duplicate entries in a table, for example:
    • National Insurance Number
    • Car Registration Number
  • Triggers – a last resort for cases where “it’s complicated” or we have to check data from more than one table.

Foreign keys are also often used to check data from one table against that in another to ensure its referential integrity. For example:

  • If a Product has a Category ID of 27, then there is a Category with an ID of 27.
  • If an Employee relates to a Department, the Department exists.

If we want to make additional checks, for example that the Category is of a type appropriate for the Product, or that the Department hasn’t been soft-deleted, these generally have had to be implemented via code in triggers. (One alternative, of implementing cross-table validation using check constraints, can have serious performance implications.)

Check constraints, unique indexes and foreign keys are good both from a performance viewpoint and also because they provide essentially code-free validation. Doing validation in triggers is far from ideal for several, well-documented reasons (such as performance and the need to replicate the same validation in more than one trigger).

So wouldn’t it be good if we could somehow replace trigger-based validation code with foreign keys, which are, after all, all about checking the data in two tables? The following two examples outline an approach in which the use of foreign keys is extended beyond the normal “does that ID exist?” check.

Example 1

Table InvoiceItem has two columns, ExpenditureExpenseCodeID and IncomeExpenseCodeID, both of which reference the Primary Key ExpenseCodeID column of the ExpenseCode table:

Entitly relationship diagram showing the InvoiceItem and ExpenseCode tables

However, each ExpenseCode can be marked as being suitable for income only, expenditure only or both via a nullable IsIncome column (where a value of Null means both).

One option would be to put each type in to a separate table (e.g. ExpenseCodeIncome, ExpenseCodeExpenditure or ExpenseCodeBoth) but that would be a bit unwieldy (not only because ‘both’-type rows would have to be stored in all three tables).

Instead, validation code is typically added to InvoiceItem’s trigger to ensure that only appropriate ExpenseCodes are chosen for IncomeExpenseCodeID and ExpenditureExpenseCodeID. (The same validation has to be carried out in ExpenseCode’s trigger in case an attempt is made to change its IsIncome column.)

In the new approach, we have two additional columns in the ExpenseCode table, computed as follows:

ExpenseCodeIDForIncome =
	Case When IsNull (IsIncome, 1) = 1 Then ExpenseCodeID
	Else -ExpenseCodeID End
ExpenseCodeIDForExpenditure =
	Case When IsNull (IsIncome, 0) = 0 Then ExpenseCodeID
	Else -ExpenseCodeID End

So we might have:

ExpenseCodeID Description IsIncome ExpenseCodeID ForIncome ExpenseCodeID ForExpenditure
1 I-3875 1 1 -1
2 E-9872 0 -2 2
3 4431 Null 3 3

We could then change the foreign keys from the InvoiceItem table; instead of referencing ExpenseCodeID, the one from IncomeExpenseCodeID would reference ExpenseCodeIDForIncome and the one from ExpenditureExpenseCodeID would reference ExpenseCodeIDForExpenditure. In both cases, if the record referenced in ExpenseCode was not appropriate, the calculated column would have a negative value and an error would result.

Example 2

This one is slightly more complicated. We have two tables, Product and Category. Each Product has a Category.

Entity relationship diagram showing the Product and Category tables

Both can be soft-deleted but we do not allow Products to be associated with a deleted Category unless the Product is also deleted. Normally, this requirement would be implemented by code in triggers, which would have to be applied to both tables. However, consider a computed column on each table:

  • In Category: CategoryIDForValidation = Case When Deleted = 0 Then CategoryID Else –CategoryID End
  • In Product: CategoryIDForValidation = Case When Deleted = 0 Then CategoryID End

Then, we can set up a foreign key between these two columns which will only be violated if the record in Category is soft-deleted while the record in Product isn’t. Here’s a table listing the possible combinations:

CategoryID Deleted CategoryID ForValidation ProductID CategoryID Deleted CategoryID ForValidation Allowed?
10 No 10 20 10 No 10 Yes
11 No 11 21 11 Yes NULL Yes
12 Yes -12 22 12 No 12 No
13 Yes -13 23 13 Yes NULL Yes

The foreign key won’t attempt to link any records where CategoryIDForValidation is NULL in Product and will succeed where both records are undeleted. However, if the Category has been soft-deleted by the Product hasn’t (third row), an error will be thrown.

Summary

This technique provides a way of replacing trigger-based validation with foreign key validation in some, but not all, situations and has the following benefits:

  1. Improved performance
  2. Elimination of duplicated code

Debugging inside an SQL Trigger

Debugging inside an SQL Server Trigger can be incredibly difficult, since the data exists only in the scope of that trigger. However, there are a couple of techniques that can help.

Raise an error

You can output debug data from the Inserted and Deleted tables by converting it to a string via XML as follows:

Declare @ErrorInfo varchar(8000)

Set @ErrorInfo = Char(13) + 'Inserted: ' + Char(13) + IsNull ((Select * From Inserted FOR XML PATH('')), '')
Set @ErrorInfo = @ErrorInfo + Char(13) + 'Deleted: ' + Char(13) + IsNull ((Select * From Deleted FOR XML PATH('')), '')
Set @ErrorInfo = Replace (@ErrorInfo, '><', '>' + Char(13) + '<')

RaisError ('Error details: %s.', 15, 1, @ErrorInfo)

Whilst this technique is simple, it’s not perfect, and may be subject to string truncation if the message gets too long.

Debug using Visual Studio

You may know that you can debug stored procedured using Visual Studio but this will also work for triggers. In SQL Server 2005 you will need to set up a server connection via Visual Studio, but in SQL Server 2008 and above you can launch into debugging directly from SQL Server Management Studio. Either way, you’ll then be able to step through your code as usual and see the values of all your variables, etc. as you go.

You can take a look at an example of this on MSDN: Walkthrough: Debugging a Transact-SQL Trigger.

Rank your SQL SELECT results

The third of a series of posts introducing handy functions built into SQL Server (2005 and above) to help you include interesting metadata about your SELECT statement’s results.

  1. Include row numbers in your SQL SELECT results
  2. Partition your SQL SELECT results into groups
  3. Rank your SQL SELECT results

Rank() and Dense_Rank()

If you’d like to rank your SQL SELECT result set by one or more factors then you can use Rank() or Dense_Rank(), which are best demonstrated with an example:

Select SortName,
       TestScore,
       Row_Number() Over (Order By TestScore Desc) As Row_Number,
       Rank() Over (Order By TestScore Desc) As Rank,
       Dense_Rank() Over (Order By TestScore Desc) As Dense_Rank
From   T_Tests
Where  Surname = 'Lesseps'
Order By TestScore Desc

Which gives us:

SortName            TestScore  Row_Number   Rank    Dense_Rank
------------------  ---------  ----------   ------  ----------
Lesseps, Abby       100        1            1       1
Lesseps, Aaliyah    98         2            2       2
Lesseps, Abigail    98         3            2       2
Lesseps, Aaliyah    95         4            4       3
Lesseps, Abby       90         5            5       4
Lesseps, Aaralyn    89         6            6       5
Lesseps, Abrianna   88         7            7       6
Lesseps, Addison    88         8            7       6
Lesseps, Aaron      87         9            9       7
Lesseps, Addison    87         10           9       7
Lesseps, Aaliyah    85         11           11      8
Lesseps, Abby       85         12           11      8
Lesseps, Abby       84         13           13      9
Lesseps, Abigail    83         14           14      10
Lesseps, Aaliyah    80         15           15      11
Lesseps, Addison    76         16           16      12
Lesseps, Adam       75         17           17      13
Lesseps, Aaralyn    73         18           18      14
Lesseps, Abrianna   70         19           19      15
Lesseps, Aaliyah    68         20           20      16
Lesseps, Addison    68         21           20      16

As you can see, Rank gives us a straight ranking and where two results are tied, they receive the same rank and then the next result picks up the next appropriate rank. In our example results 2 and 3 have the same test score and so both are given the rank of 2; result 4 then picks up from there with a rank of 4. So using Rank will not guarantee consecutive ranking numbers.

Dense_Rank is slightly different, it does guarantee consecutive ranking numbers, so in this case results 2 and 3 have a rank of 2, and then result 4 has a rank of 3.

Ranking within partitions

We looked at how to partition your result set back in the second article in this series – you can also perform rankings within partitions. Take this example:

Select SortName,
       TestScore
       Rank() Over (Partition By Forename Order By TestScore Desc) As Rank
From   T_Tests
Where  Surname = 'Lesseps'
Order By SortName Asc, TestScore Desc
SortName            TestScore  Rank
------------------  ---------  ----
Lesseps, Aaliyah    98         1
Lesseps, Aaliyah    95         2
Lesseps, Aaliyah    85         3
Lesseps, Aaliyah    80         4
Lesseps, Aaliyah    68         5
Lesseps, Aaralyn    89         1
Lesseps, Aaralyn    73         2
Lesseps, Brian      100        1
Lesseps, Brian      93         2
Lesseps, Brian      93         2
Lesseps, Brian      85         4

So here each person’s score is ranked partitioned by their name, so the ranking begins again for each person.

Partition your SQL SELECT results into groups

The second of a series of posts introducing handy functions built into SQL Server (2005 and above) to help you include interesting metadata about your SELECT statement’s results.

  1. Include row numbers in your SQL SELECT results
  2. Partition your SQL SELECT results into groups

Partition By

If you’d like to partition your SELECT statement’s result set into groups, which can be treated separately for things like row-counts or aggregate functions, then the Partition By syntax is perfect. Let’s look at an example, say you run the following query:

Select SortName,
       TestScore,
       Row_Number() Over (Order By Forename) As Row_Number
From   T_Tests
Where  Surname = 'Lesseps'

Which uses the Row_Number() function we looked at in the previous article. This gives us the following result set:

SortName             TestScore Row_Number
-----------------    --------- ----------
Lesseps, Aaliyah     98        1
Lesseps, Aaliyah     95        2
Lesseps, Aaliyah     85        3
Lesseps, Aaliyah     80        4
Lesseps, Aaliyah     68        5
Lesseps, Aaralyn     73        6
Lesseps, Aaralyn     89        7
Lesseps, Aaron       87        8
Lesseps, Abby        100       9
Lesseps, Abby        85        10
Lesseps, Abby        90        11
Lesseps, Abby        84        12
Lesseps, Abigail     98        13
Lesseps, Abigail     83        14
Lesseps, Abrianna    70        15
Lesseps, Abrianna    88        16
Lesseps, Adam        75        17
Lesseps, Addison     88        18
Lesseps, Addison     87        19
Lesseps, Addison     68        20
Lesseps, Addison     76        21

We can simply drop in the Partition By syntax in combination with our Row_Number function to give us, for example, row numbers grouped by Forename:

Select SortName,
       TestScore,
       Row_Number() Over (Order By Forename) As Row_Number,
       Row_Number() Over (Partition By Forename Order By Forename) As P_Row_Number
From   T_Tests
Where  Surname = 'Lesseps'
SortName             TestScore Row_Number   P_Row_Number
------------------   --------- -----------  ------------
Lesseps, Aaliyah     82        1            1
Lesseps, Aaliyah     62        2            2
Lesseps, Aaliyah     77        3            3
Lesseps, Aaliyah     76        4            4
Lesseps, Aaliyah     84        5            5
Lesseps, Aaralyn     61        6            1
Lesseps, Aaralyn     71        7            2
Lesseps, Aaron       76        8            1
Lesseps, Abby        64        9            1
Lesseps, Abby        76        10           2
Lesseps, Abby        78        11           3
Lesseps, Abby        64        12           4
Lesseps, Abigail     92        13           1
Lesseps, Abigail     75        14           2
Lesseps, Abrianna    74        15           1
Lesseps, Abrianna    65        16           2
Lesseps, Adam        65        17           1
Lesseps, Addison     94        18           1
Lesseps, Addison     97        19           2
Lesseps, Addison     64        20           3
Lesseps, Addison     72        21           4

Partitioning with aggregate functions

A great time-saving use of the Partition By syntax is to use it with aggregate functions. Extending our example above, we could do the following:

Select SortName,
       TestScore,
       Row_Number() Over (Order By Forename) As Row_Number,
       Row_Number() Over (Partition By Forename Order By Forename) As P_Row_Number,
       Count(TestScore) Over (Partition By Forename Order By Forename) As Count,
       Avg(TestScore) Over (Partition By Forename Order By Forename) As Avg,
       Min(TestScore) Over (Partition By Forename Order By Forename) As Min,
       Max(TestScore) Over (Partition By Forename Order By Forename) As Max
From   T_Tests
Where  Surname = 'Lesseps'

Which would give us:

SortName             TestScore Row_Number   P_Row_Number   Count  Avg    Min   Max
------------------   --------- ----------   ------------   -----  -----  ----  ---
Lesseps, Aaliyah     67        1            1              5      79.20  65    97
Lesseps, Aaliyah     90        2            2              5      79.20  65    97
Lesseps, Aaliyah     97        3            3              5      79.20  65    97
Lesseps, Aaliyah     65        4            4              5      79.20  65    97
Lesseps, Aaliyah     77        5            5              5      79.20  65    97
Lesseps, Aaralyn     76        6            1              2      87.50  76    99
Lesseps, Aaralyn     99        7            2              2      87.50  76    99
Lesseps, Aaron       98        8            1              1      98.00  98    98
Lesseps, Abby        79        9            1              4      83.50  71    93
Lesseps, Abby        91        10           2              4      83.50  71    93
Lesseps, Abby        71        11           3              4      83.50  71    93
Lesseps, Abby        93        12           4              4      83.50  71    93
Lesseps, Abigail     66        13           1              2      72.00  66    78
Lesseps, Abigail     78        14           2              2      72.00  66    78
Lesseps, Abrianna    81        15           1              2      80.00  79    81
Lesseps, Abrianna    79        16           2              2      80.00  79    81
Lesseps, Adam        91        17           1              1      91.00  91    91
Lesseps, Addison     91        18           1              4      76.25  60    91
Lesseps, Addison     60        19           2              4      76.25  60    91
Lesseps, Addison     91        20           3              4      76.25  60    91
Lesseps, Addison     63        21           4              4      76.25  60    91

Much easier than writing sub-queries or incorporating awkward group-by clauses to achieve the same thing.

Include row numbers in your SQL SELECT results

The first of a series of posts introducing handy functions built into SQL Server (2005 and above) to help you include interesting metadata about your SELECT statement’s results.

Row_Number()

If you’d like to include the row number of the items as they appear in your SELECT result set then the Row_Number function does exactly that. For example:

Select      SortName,
            Row_Number() Over (Order By Forename) As Row
From        T_Person
Where       Surname = 'Lawson'

Would result in:

SortName                  Row
------------------------- ------
Lawson, Aaliyah          1
Lawson, Aaron            2
Lawson, Adam             3
Lawson, Adrian           4
Lawson, Alexa            5
Lawson, Alexander        6
Lawson, Ashley           7
Lawson, Braden           8

So regardless of a record’s position in the source table, the Row_Number is its position in the result set, after all ordering and grouping has been applied.

We can go further than this and use the row number to solve a couple of other problems:

Returning a subset of rows

If you need to select rows 10-20 (perhaps for pagination functionality) then you could do that using the row_number:

Select      SortName,
            Row_Number() Over (Order By Forename) As Row
From        T_Person
Where       Row Between 10 And 20

Returning the Nth row

If you wanted to return say the 5th row from our result set, you can do that using row_number too:

Select      SortName,
            Row_Number() Over (Order By Forename) As Row
From        T_Person
Where       Row = 5

Foreign keys and deletion performance

Relational databases tend to be furnished with many foreign key constraints. These prevent some types of invalid data from being added to tables – you can think of them as an ASSERT if you’re more familiar with C# and the like. Normally, the performance impact is minimal, based on the following arrangement:

  1. The column being referenced by the foreign key will almost always be the primary key of its table.
  2. As such it will be indexed so that, when adding new records to the referencing table, it will be possible to check that the new values being added are valid relatively quickly, by looking for them in the referenced table via the index.

However, there is a potential performance pitfall when deleting from tables. We came across a situation where a very innocuous-looking piece of code was deleting a relatively small number of rows from a table (which contained over a million rows). This was taking hours to complete. The cause was a foreign key from one column in the table to the ID column (the primary key) of the same table.

When adding rows, this foreign key wouldn’t be a problem because the check would be made on values in the ID column, which is indexed. However, when deleting, the check would be made against values in the referencing column, which didn’t have an index. For example, if a row with an ID of 10 was being deleted, a check would have to be made that there were no other rows which had a value of 10 for referencing column.

Adding an index to the referencing column enabled the deletions to complete in seconds rather than hours.

Display SQL messages instantly, mid-execution

Sometimes when you are running a long running script or stored procedure it’s nice to know whereabouts it has got to, often we put in print statements for this purpose. The problem is that half the time you won’t see the printed statement until SQL Server decides it can’t buffer anymore and shoves it back to the client (e.g. Management Studio) which may not be until it has finished in some cases.

Instead of using print in these situations the following syntax can be used:

raiserror ('This is my message, show it now!', 0, 1) with nowait

This will have the same effect as if you had said:

print 'This is my message, show it now!'

Except it will send it to the client immediately.

Also note that sys.dm_exec_requests may also be used to find out exactly what statement is being executed at any particular instant and from within which stored procedure, UDF, etc.

Row size limits in SQL Server 2005

In SQL Server 2005 there is no longer a 8060 byte row limit. However pages within the database are still 8060 bytes so if a row goes over this limit the data goes into an overflow page. This has a performance impact when querying against and updating rows that use the overflow area since it does certain bits of synchronously as opposed to asynchronously.

The main issue is that you get no warning when the table is created that your table definition goes over the limit like you did on SQL 2000. When you restore a SQL 2000 database or set compatibility mode to SQL 2000 on a database then it still works in this new way so old SQL 2000 suddenly get this new feature

Whilst this might have the advantage that your insert statements will work the downside is because there is no warning when your table schema reaches 8060 you have to take extra steps if you want to ensure you stay within the 8060 byte limit.

See some examples of projects that OCC have worked on using MySQL and SQL Server.

Select variable number of rows with SQL

In SQL Server 2005, you can now use a variable when getting the first n number of rows in a SELECT query:

DECLARE @NumberOfRowsIWant INT
 SET @NumberOfRowsIWant = 23

SELECT TOP (@NumberOfRowsIWant) *
 FROM T_Client

The OUTPUT clause added in SQL 2005 gives you access to the inserted and deleted tables, normally only useable in a trigger, in your stored procedures (SP). Combine the clause with an UPDATE or a DELETE and you can, for example, select fields from the rows inserted into a temporary table for later use. A slight drawback is that triggers on the tables inserted / updated / deleted could cause values in these tables in your SP not to be available.

See some examples of projects that OCC have worked on using MySQL and SQL Server.

Tally Tables in SQL

A tally table is a table with a set of numbers in, such as all of the integers from 1 to 1,000,000  in order. It may also known as a number table and could contain every day between two dates, or every Monday between two dates, etc.

Tally tables can be used for a large and varied number of ways, to make code simpler and/or faster.

Converting a CSV string into a dataset

A common use of a tally table is in converting a CSV string into a dataset. The obvious way would be to start at the beginning of the string, go forward one character at a time and output a row whenever a comma is found. This is all very well, but not quite set-based (which is what we should be trying to do in SQL). However, you can do this in a tally table like this:

DECLARE @CSV VARCHAR (8000)
 SET @CSV = 'Testing,testing,one,two,three,,also works with empty ones'
 SET @CSV = ',' + @CSV + ','

SELECT SUBSTRING (@CSV, N + 1, CHARINDEX (',', @CSV, N + 1) - N - 1)
 FROM TSys_Tally_N
 WHERE (N < LEN (@CSV))
 AND (SUBSTRING (@CSV, N, 1) = ',')

This is using (SUBSTRING (@CSV, N, 1) = ',') to find every row in the tally table which corresponds to a comma in the string. For each of these rows, it then uses the value of N to extract the appropriate section from the string and return it to the output dataset.