Friday, September 24, 2010

SQL Server Programming Guidelines

SQL Server Programming Guidelines

1.0 Introduction
This section provides guidelines and best practices for SQL Server programming. Guidelines and best practices should be followed as a general rule, but it is understood that exception situations may exist. Developers must be prepared to provide a justification for any exceptions.
2.0 Code Readability and Format
• Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious. This helps other programmers understand your code. Don't worry about the length of the comments, as it won't impact the performance, unlike interpreted languages (e.g., ASP 2.0).
• Always use case consistently in your code. On a case insensitive server, your code might work fine, but it will fail on a case sensitive SQL Server if the code is not consistent in case. For example, if you create a table in SQL Server or a database that has a case-sensitive or binary sort order, all references to the
table must use the same case that was specified in the CREATE TABLE statement. If you name the table "MyTable" in the CREATE TABLE statement and use "mytable" in the SELECT statement, you get an "object not found" error.
• Do not use column numbers in the ORDER BY clause. In the following examples, note that the second query is more readable than the first. Example 1:
SELECT OrderID, OrderDate FROM Orders ORDER BY 2
Example 2:
SELECT OrderID, OrderDate FROM Orders ORDER BY OrderDate
• Use the more readable ANSI-Standard Join clauses instead of the old style joins. With ANSI joins, the WHERE clause is used only for filtering data. With older style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two examples shows the old style join syntax, while the second one shows the new ANSI join syntax. Example 1:
SELECT a.au_id, t.title FROM titles t, authors a, titleauthor ta WHERE a.au_id = ta.au_id AND ta.title_id = t.title_id AND t.title LIKE '%Computer%'
Example 2:
SELECT a.au_id, t.title FROM authors a INNER JOIN titleauthor ta ON a.au_id = ta.au_id INNER JOIN titles t ON ta.title_id = t.title_id WHERE t.title LIKE '%Computer%'
• To make SQL statements more readable, start each clause on a new line and indent when needed. E.g.:
SELECT title_id, title FROM titles WHERE title LIKE '%Computer%' AND title LIKE '%cook%'
• As is true with any other programming language, do not use GOTO, or use it sparingly. Excessive usage of GOTO can lead to hard-to-read-and-understand code.
3.0 Datatypes
• Use User Defined Datatypes if a particular column repeats in multiple tables so that the datatype of that column is consistent across all your tables.
• Use the CHAR data type for a column only when the column is non-nullable. If a CHAR column is nullable, it is treated as a fixed length column in SQL Server 7.0+. So, a CHAR(100), when NULL, will eat up 100 bytes, resulting in space wastage. Use VARCHAR(100) in this situation. Of course, variable length columns do have a very little processing overhead over fixed length columns. Carefully choose between CHAR and VARCHAR depending upon the length of the data you are going to store.
• Use Unicode datatypes, like NCHAR, NVARCHAR, or NTEXT, if your database is going to store not just plain English characters but a variety of characters used all over the world. Use these datatypes only when they are absolutely needed as they use twice as much space as non-Unicode datatypes.
• Try not to use TEXT or NTEXT datatypes for storing large blocks of textual data. The TEXT datatype has some inherent problems associated with it. For example, you cannot directly write or update text data using the INSERT or UPDATE statements. Instead, you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT. There are also a lot of bugs associated with replicating tables containing text columns. So, if you don't have to store more than 8KB of text, use CHAR(8000) or VARCHAR(8000) datatypes instead.
4.0 Stored Procedures
• Always add an @Debug parameter to your stored procedures. This can be a BIT data type. When a '1' is passed for this parameter, print all the intermediate results, variable contents using SELECT or PRINT statements. When '0' is passed do not print anything. This helps in quickly debugging stored procedures as you don't have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
• Do not call functions repeatedly within your stored procedures, triggers, functions and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed. Instead, call the LEN function once, and store the result in a variable for later use.
• Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
• If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
• Do not prefix your stored procedure names with 'sp_'. The prefix 'sp_' is reserved for system stored procedure that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with 'sp_', it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, then it tries dbo as the owner. You can save time in locating the stored procedure by avoiding the 'sp_' prefix.
• Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures and let your applications access these stored procedures. This keeps the data access clean and consistent across all the modules of your application, while at the same time centralizing the business logic within the database.
5.0 Performance Considerations
• While designing your database, keep performance in mind. You can't really tune performance later when your database is in production as it involves rebuilding tables and indexes, re-writing queries, etc. Use the graphical execution plan in Query Analyzer or SHOWPLAN_TEXT or SHOWPLAN_ALL commands to analyze your queries. Make sure your queries do an "Index seek" instead of an "Index scan" or a "Table scan." A table scan or an index scan should be avoided where possible. Choose the right indexes on the right columns.
• Initially, your data should be normalized at least to the third normal form. If you then need to denormalize some of the data to improve performance, you may do so. There should be a documented rationale for all denormalization activities.
• Do not use 'SELECT *' in your queries. Always write the required column names after the SELECT statement, as in the following example:
SELECT CustomerID, CustomerFirstName, City
This technique results in reduced disk I/O and better performance.
• Avoid the creation of temporary tables while processing data as much as possible, as creating a temporary table means more disk I/O. Consider using advanced SQL, views, SQL Server 2000 table variable, or derived tables instead of temporary tables.
• Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword as that results in a full table scan, which defeats the purpose of an index. The first example below results in an index scan, while the second example results in an index seek.
Example 1:
SELECT LocationID FROM Locations WHERE Specialties LIKE '%pples'
Example 2:
SELECT LocationID FROM Locations WHERE Specialties LIKE 'A%s'

The use of functions in SELECT statements will not take advantage of indexing.
• Also avoid searching using not equals operators (<> and NOT) as they result in table and index scans.
• Use derived tables wherever possible, as they perform better. Consider the following query to find the second highest salary from the Employees table:

SELECT MIN(Salary) FROM Employees WHERE EmpID IN (SELECT TOP 2 EmpID FROM Employees ORDER BY Salary Desc)

The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:

SELECT MIN(Salary) FROM (SELECT TOP 2 Salary FROM Employees ORDER BY Salary Desc) AS A

This is just an example and your results might differ in different scenarios depending on the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one.
• Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like '(1 row(s) affected)' after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.

Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as they are faster. Use triggers only for auditing, custom tasks and validations that cannot be performed using constraints. Constraints save you time as well, as you don't have to write code for these validations, allowing the RDBMS to do all the work for you.
6.0 Miscellaneous Topics
• Try to avoid server side cursors as much as possible. Always stick to a "set-based approach" instead of a "procedural approach" for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead. If a cursor is unavoidable, use a WHILE loop instead. A WHILE loop is always faster than a cursor. For a WHILE loop to replace a cursor you need a column (primary key or unique key) to identify each row uniquely. Every table must have a primary or unique key in any case.
• Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on views. Yet another significant use of views is that they simplify your queries. Incorporate your frequently required, complicated joins and calculations into a view so that you don't have to repeat those joins/calculations in all your queries. Instead, just select from the view.
• If you have a choice, do not store binary or image files (Binary Large Objects or BLOBs) inside the database. Instead, store the path to the binary or image file in the database and use that as a pointer to the actual binary file stored elsewhere on a server. Retrieving and manipulating these large binary files is better performed outside the database. Keep in mind that a database is not meant for storing files.
• Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan every time at runtime. IF and CASE statements come in handy to avoid dynamic SQL. Another major disadvantage of using dynamic SQL is that it requires users to have direct access permissions on all accessed objects, like tables and views. Generally, users are given access to the stored procedures which reference the tables, but not directly on the tables. In this case, dynamic SQL will not work.
• Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems porting your database application to some other RDBMS. IDENTITY columns have other inherent problems. For example, IDENTITY columns can run out of numbers at some point, depending on the data type selected; numbers can't be reused automatically, after deleting rows; and problems may arise if you are using replication. So, come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There still could be issues with generating your own primary keys too, like concurrency while generating the
key, or running out of values. So, consider both options and choose the one that is most appropriate for your circumstances.
• Minimize the use of NULLs, as they often confuse the front-end applications, unless the applications are coded intelligently to eliminate NULLs or convert the NULLs into some other form. Any expression that deals with NULL results in a NULL output. ISNULL and COALESCE functions are helpful in dealing with NULL values.
• Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding or dropping a column).
• Always access tables in the same order in all your stored procedures and triggers consistently. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are:
• Keep your transactions as short as possible.
• Touch the least amount of data possible during a transaction.
• Do not wait for user input in the middle of a transaction.
• Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed.
• Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction in case the previous transaction fails with error 1205.
• In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.
• Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications if these operations are going to consume more CPU cycles on the database server. Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.
• Always check the global variable @@ERROR immediately after executing a data manipulation statement (like INSERT/UPDATE/DELETE), so that you can rollback the transaction in case of an error (@@ERROR will be greater than 0 in case of an error). This is important because, by default, SQL Server will not rollback all the previous changes within a transaction if a particular statement fails. This behavior can be changed by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important role in determining how many rows were affected by a previous data manipulation (also, retrieval) statement, and based on that you could choose to commit or rollback a particular transaction.
• Always store 4 digit years instead of 2 digit years in dates (especially when using CHAR or INT datatype columns) to avoid any confusion and problems. This is not a problem with DATETIME columns, as the century is stored even if you specify a 2 digit year. But it's always a good practice to specify 4 digit years even with DATETIME datatype columns
  • Do not forget to enforce unique constraints on your alternate keys.

0 comments:

Post a Comment