Craig S. Mullins

Return to Home Page

August 2005

 

 

 

                                         


As published in:

 

Assuring Data Integrity in DB2 - Part 1

by Craig S. Mullins

 

DB2 provides mechanisms to automatically enforce and maintain the integrity of data as it is added to, and modified within DB2 tables. This two-part article will offer an overview of DB2's data integrity features.

The simplest form of data integrity enforcement available to DB2 is with data typing. By choosing the appropriate data types DB2 will force columns to contain only the proper form of data (e.g. character, numeric, date, etc.). Of course, DB2 offers more sophisticated forms of ensuring data integrity, too. Features such as referential integrity, check constraints, triggers, validation routines, and edit procedures all can be used to ensure the integrity of DB2 data.

Automatically enforcing DB2 data integrity is usually a wise choice of action because it offloads such work from application programs. Additionally, DB2-enforced data integrity will be applied for both planned and ad hoc modifications.

Referential Integrity

When translating a logical model to a physical DB2 database the relationships are implemented as referential constraints. To define a referential constraint you must create a primary key in the parent table and a foreign key in the dependent table. The referential constraint ties the primary key to the foreign key. The table with the primary key is called the parent table and the table with the foreign key is called the dependent table (or child table).

Referential integrity (RI), therefore, can be defined as a method of ensuring data integrity between tables related by primary and foreign keys. When RI is implemented between tables DB2 will guarantee that an acceptable value is always in each foreign key column based on the data values of the primary key columns.

RI defines the integrity and usability of a relationship by establishing rules that govern that relationship. The combination of the relationship and the rules attached to that relationship is referred to as a referential constraint. The rules that accompany the RI definition are just as important as the relationship to ensure correct and useful DB2 databases.

The RI rules defined for each referential constraint are specified to determine how DB2 will handle dependent rows when a primary key row is deleted or updated. For example, when a primary key is deleted that refers to existing foreign key values, the rule specifies whether DB2 should void the primary key deletion, delete the foreign key values too, or set the foreign key values to null.

The concept of RI can be summarized by the following "quick and dirty" definition: RI is a guarantee that an acceptable value is always in each foreign key column. Acceptable is defined in terms of an appropriate value as recorded in the corresponding primary key, or a null.

Two other important RI terms are parent and child tables.  For any given referential constraint, the parent table is the table that contains the primary key and the child table is the table that contains the foreign key.  Refer to Figure 1 below.  The parent table in the employs relationship is the DEPT table.  The child table is the EMP table.  So the primary key (say DEPTNO) resides in the DEPT table and a corresponding foreign key of the same data type and length, but not necessarily the with same column name (say WORKDEPT), exists in the EMP table.

 

Figure 1. A relationship between two tables.

As a general rule of thumb it is a good physical design practice to implement referential integrity using database constraints instead of trying to program integrity into application programs. Using database RI will ensure that integrity is maintained whether data is changed in a planned manner through an application program or in an ad hoc manner through SQL statements or query tools.

Additionally, it is almost always a good idea to define a primary (or unique) key to prohibit duplicate table rows. This should be done to ensure entity integrity regardless of whether dependent tables are related to the table being defined. Entity integrity ensures that each row in a table represents a single, real-world entity.

Of course, there are exceptions to every rule.

Defining DB2 Referential Constraints

Referential constraints are defined using the FOREIGN KEY clause. A referential constraint consists of three components: a constraint name, the columns comprising the foreign key and a references clause. The same constraint name cannot be specified more than once for the same table. If a constraint name is not explicitly coded, DB2 will automatically create a unique name for the constraint derived from the name of the first column in the foreign key.

For example, consider the relationship between the DSN8810.DEPT and DSN8810.EMP tables:

CREATE TABLE DSN8810.EMP

     (EMPNO             CHAR(6)       NOT NULL,

      FIRSTNME          VARCHAR(12)   NOT NULL,

      MIDINIT           CHAR(1)       NOT NULL,

      LASTNAME          VARCHAR(15)   NOT NULL,

      WORKDEPT          CHAR(3),

      PHONENO           CHAR(4) CONSTRAINT NUMBER CHECK

                        (PHONENO >= '0000' AND

                         PHONENO <= '9999'),

      HIREDATE          DATE,

      JOB               CHAR(8),

      EDLEVEL           SMALLINT,

      SEX               CHAR(1),

      BIRTHDATE         DATE,

      SALARY            DECIMAL(9,2),

      BONUS             DECIMAL(9,2),

      COMM              DECIMAL(9,2),

      PRIMARY KEY (EMPNO)

      FOREIGN KEY RED (WORKDEPT)

        REFERENCES DSN8810.DEPT ON DELETE SET NULL

     )

    EDITPROC DSN8EAE1

    IN DSN8D81A.DSN8S81E;

 

CREATE TABLE DSN8810.DEPT

     (DEPTNO            CHAR(3)        NOT NULL,

      DEPTNAME          VARCHAR(36)    NOT NULL,

      MGRNO             CHAR(6),

      ADMRDEPT          CHAR(3)        NOT NULL,

      LOCATION          CHAR(16),

      PRIMARY KEY (DEPTNO)

     )

    IN DSN8D81A.DSN8S81D;

    ALTER TABLE DSN8810.DEPT

      FOREIGN KEY RDD (ADMRDEPT)

        REFERENCES DSN8810.DEPT ON DELETE CASCADE;

    ALTER TABLE DSN8810.DEPT

      FOREIGN KEY RDE (MGRNO)

        REFERENCES DSN8810.EMP ON DELETE SET NULL;

 

The primary key of EMP is EMPNO; the primary key of DEPT is DEPTNO. Several foreign keys exist, but let's examine the foreign key that relates EMP to DEPT. The foreign key, named RDE, in the DEPT table relates the MGRNO column to a specific EMPNO in the EMP table. This referential constraint ensures that no MGRNO can exist in the DEPT table before the employee exists in the EMP table. The MGRNO must take on a value of EMPNO. Additionally, the foreign key value in DEPT cannot subsequently be updated to a value that is not a valid employee value in EMP, and the primary key of EMP cannot be deleted without the appropriate check for corresponding values in the DEPT foreign key column or columns.

To ensure that this integrity remains intact, DB2 has a series of rules for inserting, deleting, and updating:

  •       When inserting a row with a foreign key, DB2 checks the values of the foreign key columns against the values of the primary key columns in the parent table. If no matching primary key columns are found, the insert is disallowed.

  •       A new primary key row can be inserted as long as the primary key is unique for the table.

  •       When updating foreign key values, DB2 performs the same checks as when it is inserting a row with a foreign key.

  •       If a primary key value is updated, DB2 does not allow there to be any existing foreign keys that refer back to the primary key that is changing. All foreign key rows first must be either deleted or be set to NULL before the value of the primary key can be changed.

  •       Deleting a row with a foreign key is always permitted.

  •       When deleting a row with a primary key, DB2 takes action as indicated in the DDL; it either restricts deletion, cascades deletes to foreign key rows, or sets all referenced foreign keys to null.

Each referential constraint must define the action that will be taken on foreign key rows when a primary key is deleted. There are four options that can be specified:

  •       RESTRICT: disallows the deletion of the primary key row if any foreign keys relate to the row.

  •       CASCADE: allows the deletion of the primary key row and also deletes the foreign key rows that relate to it.

  •       SET NULL: allows the deletion of the primary key row and, instead of deleting all related foreign key rows, sets the foreign key columns to NULL.

  •       NO ACTION: the behavior of NO ACTION is similar to RESTRICT. The only difference between RESTRICT and NO ACTION is when the referential constraint is enforced. RESTRICT enforces the delete rule immediately; NO ACTION enforces the delete rule at the end of the statement.

The processing needs of the application dictate which delete option should be specified in the table create statements. All of these options are valid depending on the business rules that apply to the data.

If efficiency is your primary goal, the RESTRICT option usually uses fewer resources because data modification of dependent tables is not performed. If data modification is necessary, however, allowing DB2 to perform it is usually preferable to writing cascade or set null logic in a high-level language (e.g. COBOL, C, Java, etc.).

Referential Sets

A referential set is a group of tables that are connected together by referential constraints. It is a wise course of action to avoid very large referential sets. Try not to tie together all tables in a large system; otherwise, recovery, quiesce, and other utility processing will be difficult to develop and administer.

You should follow some general rules when deciding how to limit the scope of DB2-defined referential integrity:

  •       Consider removing code and reference tables from your referential structures. These tables are usually static and easy to control within your programs. Adding them to your referential sets can complicate administrative tasks.

  •       Reduce the size of very large referential sets by breaking them apart into smaller structures. Referential sets of more than a dozen (or so) tables can become unwieldy to manage. Consider breaking up referential sets into groups having a dozen or so related tables. Doing so makes it easier to keep track of the RI defined to DB2 and the rules that are in effect. However, it also opens the door to data integrity problems caused by updates outside the scope of the application programs that enforce the integrity. Weigh the performance impact against the possible loss of integrity before deciding to bypass DB2-enforced RI.

  •       Try to control the number of cycles in a referential set. A cycle is a referential path that connects a table to itself. In the cycle shown in Figure 2, Table A is connected to itself.

    Furthermore, a table cannot be delete-connected to itself in a cycle. A table is delete-connected to another table if it is a dependent of a table specified with a CASCADE delete rule.

Figure 2. A cycle.

  •       Whether RI is checked by DB2 or by an application program, overhead is incurred. Efficiency cannot be increased simply by moving RI from DB2 to the program. Be sure that the application program can achieve better performance than DB2 (by taking advantage of innate knowledge of the data that DB2 does not have) before eliminating DB2-enforced RI.

  •       If updates to tables are permitted in an uncontrolled environment (for example, QMF, SPUFI, or third-party table editors like File-Aid for DB2, implement DB2-enforced RI if data integrity is important. Otherwise, you cannot ensure that data is correct from a referential integrity standpoint.

 

Summary

And so we conclude part 1 of this article. Part 2 follows on to this discussion by providing offers some guidelines to follow when implementing constraints in Db2 for z/OS.

 

 

 

 

From DBAzine, August 2005.

© 2008 Craig S. Mullins,  All rights reserved.

Home.