| Craig S. Mullins
Getting Integrity in SYBASE SQL Server 10|
by Craig S. Mullins
User-defined integrity is a component of the relational model that has been neglected for too long. Fortunately, several DBMS vendors have begun to implement more robust user-defined integrity into their products. Foremost among these vendors is Sybase, Inc. (Emeryville, CA) with their SQL Server 10 product offering.
Defining User-Defined Integrity
Most data professionals are aware that the relational model provides basic integrity features to support both referential integrity and entity integrity. However, the concept of user-defined integrity is also inherent to the relational model. When user-defined integrity is supported, the RDBMS can automatically manage the particular values that are stored within the database.
User-defined integrity constraints go far beyond simple data type checking and referential value checking. Values can be excluded from a specific column or columns based upon business requirements. In the absence of user-defined integrity support, this type of functionality is typically performed by an application program. A systematic and non-bypassable method of integrity checking without the need to write code provides an obvious benefit in terms of reduced development time.
Types of User-Defined Integrity
SYBASE SQL Server 10 provides user-defined integrity in several different ways, each of which will be introduced in this article. The following features support user-defined integrity:
A check constraint is a mechanism for allowing predicates to be defined on a column. The predicate is attached to the column as DDL and performs automatic edit checking of supplied values. Each check constraints is performed whenever data is inserted or updated. Check constraints can be coded at the column or table level.
Let’s examine column-level constraints first. Column-level constraints consist of a name and the actual predicate. Refer to Figure 1.
Figure 1. Column-Level Check Constraint
Every constraint must have a name. Failure to explicitly specify a name causes SQL Server to automatically generate a unique name for the constraint. It is wise to always explicitly assign names to each check constraint because the constraint name that SQL Server generates can be difficult to administer later. The name of the constraint in the depicted example in salary_cons. The predicate portion defines the actual conditions of the edit check and is coded as a typical SQL where clause (without the actual "where" keyword, of course).
Unfortunately, check constraints can not be defined as a select from another table. This limits their overall benefit. However, check constraints, even in this limited form, are superior to coding the condition into each and every application program that updates the column or columns in question.
In addition to column-level check constraints, it is possible to specify check constraints at the table level. Instead of being attached to a single column, the constraint is attached to the entire table.
It is usually sufficient to code a check constraint at the column level. However, there are situations were table-level check constraints are required. Any time two columns of the same table need to be specified in the constraint, a table-level check constraint is required.
The sample shown in Figure 2 depicts a table-level check constraint to ensure that an employee’s bonus is less than or equal to his commission. It would have been impossible to code this particular constraint at the column level because it accesses two columns instead of one column and one constant as the previous example showed.
Figure 2. Table-Level Check Constraint
User-defined messages can be attached to both column-level and table-level constraints. Consider the check constraint depicted in Figure 2 for the salary column. The message "salary too high" can be assigned to the constraint by adding a message and binding it to the constraint as follows:
Messages are always assigned a number greater than 20,000 because the first 19,999 message numbers are reserved for SQL Server use. The message text can be up to 255 characters long.
Finally, information on check constraints can be retrieved from the system using the sp_helpconstraint system procedure. By passing a table name as a parameter to sp_helpconstraint, a list of all constraints defined for the given table is displayed. This procedure is new as of System 10.
Rules are similar to check constraints, but rules are "free-standing" database objects. They are created using the create rule DDL statement and exist independently of any table or column. Like check constraints, rules can be used to define data validation. Once created, a rule can be bound to a table column. Thereafter, whenever data is inserted or updated, the rule is checked to ensure that the data modification complies with the rule.
The advantage of creating "free-standing" rules instead of using check constraints is to enhance reusability. Rules are reusable, check constraints are not! A free-standing rule can be created and applied as follows:
The first statement creates the rule; the second binds it to a specific column in a specific table. The same rule can be bound to as many different columns in as many different tables as is desired. Keep these rules of thumb in mind before binding a rule to a column:
Check Constraints or Rules?
The second bullet in the previous list introduces an interesting problem. When should check constraints be used? When should rules be used? And what if both are used on the same column?
Check constraints were added to SQL Server 10 to support the ANSI SQL standard. There is no concept of a rule currently in the ANSI standard. SQL Server has featured rules for many releases. Both integrity features implement the same basic function: they place restrictions on the data values that can be stored in a column.
Of the two methods, rules are more flexible. Rules are created as free-standing database objects and can be bound to columns and user-defined data types. Check constraints, on the other hand, are specified in the table DDL. They are useful when a constraint exists between two columns of the same table.
In general, the following rules of thumb should be followed:
SQL Server 10 also supports unique constraints. This type of constraint is applied to a column or columns to ensure that duplicate value can not be stored.
SQL Server enforces uniqueness by automatically creating a unique index on the specified column(s). Therefore, unique constraints can also specify the type of index to be generated: clustered or nonclustered. Nonclustered is the default. A unique constraint will allow one null to be stored in the column(s).
Although the relational model forbids duplicate rows, most relational DBMS products allow duplicates to be stored by default. Unique constraints enable the database designer to force the DBMS to follow this relational tenet.
User-Defined Data Types
All RDBMS products provide basic system data types such as integer, character, and decimal. SQL Server, however, enables users to define additional data types called user-defined data types. User-defined data types are based upon system-defined data types but can provide additional constraints on the data content. For example, the user-defined data type can provide a precision, scale, and/or length attribute as well as a column property (i.e., null or not null). User-defined data types, once created in the database, become fundamental data types; usable by any table in that database just like a system-defined data type. Rules and defaults can be bound to user-defined data types.
An example of a user-defined data type definition follows:
User-defined data types are quite useful for ensuring consistency throughout a database design. Consider, for example, a system in which it is necessary to store a social security number in multiple tables. Confusion may arise as to whether the number should be stored in character or numeric format. Furthermore, if it stored in character format, should it contain embedded hyphens? Consult Figure 3 for a list of valid options for storing social security numbers.
However, a user-defined data type can be defined, say SSN. The SSN data type will be standard and can be used for all columns that store social security number data. This ensures consistency from table to table and column to column. Whether the user-defined data type is character or numeric is not important. The point is that the data definition is consistent.
Figure 3. Social Security Number Storage Options
An additional benefit that is accrued by establishing user-defined data types is a higher level of abstraction in a database design. It is much easier to discuss the salary data type (with all its implied definitions, properties, and constraints) than it is to talk about a decimal(12,2) or smallmoney data type (with no implied characteristics other than its inherent type).
User-defined data types can also decrease maintenance. When a default or rule is bound to a user-defined data type, every column that is assigned that user-defined data type "inherits" the attached default and/or rule. Likewise, when a rule or default bound to a user-defined data type is changed, the change is automatically reflected in all columns that are assigned that user-defined data type.
Supporting Domains Using SQL Server
Domains have been a part of the relational model since its inception in 1969. However, no current RDBMS explicitly supports domains. SQL Server 10 supports domains only implicitly and incompletely.
What is a domain? According to Chris Date: "A domain is the set of all possible data values of some particular type." SQL Server’s domain support is only partial because it does not support the following domain characteristics:
Domains can be partially implemented in SQL Server using a combination of user-defined data types, rules, and defaults. Suppose that you wish to define a domain for product codes to be stored in a SQL Server database. All product codes conform to the following standards:
To implement a domain for the product code take the following steps:
Remember, however, that this provides rudimentary domain support only. It does not implement full domain support as described in Codd and Date's relational writings.
It is also possible to support user-defined integrity in SQL Server through the use of triggers. Triggers are event-driven specialized procedures that are stored in the RDBMS. Each trigger is attached to a single, specified table. Triggers can be thought of as an advanced form of "rule" or "constraint" written using procedural logic. A trigger can not be directly called or executed; it is automatically executed (or "fired") by the RDBMS as the result of an action—usually a data modification to the associated table.
Although triggers are often used for implementing referential integrity, there are many practical reasons for using triggers to implement user-defined integrity. Quite often it is impossible to code business rules into the database using only DDL. For example, the business rule may be too complex to support using a rule or check constraint. Triggers offer a flexible vehicle for the specification of user-defined integrity. Complex strings of instructions can be coded and stored within the DBMS as a trigger. Whenever data is added, removed, or modified, the logic in the trigger will be executed to ensure the required integrity constraints are maintained.
SQL Server 10 provides a wealth of mechanisms for supporting user-defined relational integrity. Developers would be wise to utilize these features to develop robust applications that provide system-manage data integrity, promote reusability, and provide consistent data structures.
© 1994, 2000 Craig S. Mullins. All rights reserved.