Craig S. Mullins 

Return to Home Page

September 1999

 

Using Nulls in SQL Server 

by Craig S. Mullins

A null is the way in which a relational database records missing or unknown information. When you assign a null to a column instance, it means that a value currently does not exist for that column. It is important to understand that a column assigned to null logically means one of two things:

1.      the column does not apply to this row, or;

2.      the column applies to this row, but the information is not known at present

Nulls are not the same as the value 0 or the value blank. Null means that no entry has been made. Quite literally, a null is the lack of a value. Nulls sometimes are inappropriately referred to as null values. Using the term value to describe a null column is incorrect because the term null implies the lack of a value. Even the SQL Server documentation uses the term “null value,” which just serves to confuse users.

By using null, instead of a default value, you can distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) or any actual, valid value from an unknown or inapplicable entry.

For example, suppose that a table contains information on the hair color of employees. The HAIR_COLOR column is defined in the table as being capable of accepting nulls. Three new employees are added today: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned a null HAIR_COLOR, but for different reasons. The hair column color for the woman would be null because she has hair but the color presently is unknown. The hair color column for the bald man would be null also, but this is because he has no hair and so hair color does not apply.

SQL Server does not differentiate between nulls that signify unknown data and those that signify inapplicable data. This distinction must be made by the program logic of each application. The relational model, as defined by its creator, E.F. Codd, provides for the notion of i-marks and a-marks to differentiate between inapplicable (i-mark) and applicable but unknown (a-mark). Codd introduced “marks” instead of nulls late in the life cycle of the relational model in what he refers to as RMV2 (Relational Model Version 2). However, no current relational DBMS implementation supports these marks.

Null or Not Null?

For each column defined in each table you create you must decide whether nulls should be allowed. This is done in the CREATE TABLE statement by specifying either NULL or NOT NULL after the data type specification for the column. You can use the sp_help system procedure to find out which columns are nullable, and which are not.

So, defining a column’s value to be null provides a place holder for data that is not (yet) known. For example, in the titles table (of the sample pubs database), price, advance, royalty, and ytd_sales are set up to allow nulls. However, title_id and title do not allow nulls because the title_id is the key which defines the row and hence can not be null and storing data about a book where the title in unknown would not be a sound business practice. A price without a title makes no sense, but a title without a price could indicate that the book is just about to be published but the price has yet to be determined.

In general, when you create tables, define columns to be NOT NULL when the value for the column is required to provide meaning to the other columns in the row. Nulls should only be allowed where column values can be legitimately unknown and yet, still cause the row to make sense. You must have specific knowledge of how the absence of a value will impact the applications and queries using the data. In practice, you may want to create nullable columns only in rare circumstances because of the confusing behavior of nulls in relational queries.

When inserting data, the DEFAULT keyword can be used with nullable columns. This is so even if there is no default defined. By default, the default for a nullable column is NULL.

Rules and check constraints treat nulls as special conditions. A column definition that allows nulls can have nulls inserted even if the rule or constraint does not explicitly allow nulls. For example, consider a columns that allows nulls and has a rule defined for it as follows:

             (@variable > 10)

This would seem to indicate that only legitimate values greater than the number 10 can be stored in the column. However, a null can be stored as the default if no value is specified for the column in an insert statement.

The Behavior of Nulls

Avoid nulls in columns that must participate in arithmetic logic (for example, money values). The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. For example, the price column in the titles table is nullable, so the result of the following query:

            select avg(price
     from titles

is not the same as for this query if any price is actually set to NULL:

     select sum(price)/count(*)
     from titles

even though you may logically assume it to be equivalent. For this reason, avoid nulls in columns involved in math functions.

One other note: the aggregate functions MIN, MAX, SUM, and AVG return NULL if no rows are returned. So, even if none of your columns are nullable, you may need to prepare for the possibility of a null result if you use these functions and no rows satisfy the WHERE clause.

Another “apparent” quirk of nulls is that two nulls are not equivalent to one another. This confuses many developers who test columns for equivalency in a WHERE clause and expect two columns set to null to evaluate to true. But, just to be difficult, there is an exception: nulls are considered to be duplicates in a GROUP BY clause or an ORDER BY clause. Also, nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column).

However, any expression using a comparison operator evaluates to false if any of the operands is null. This means that a null never matches another value (not even another null). Instead, the syntax IS [NOT] NULL is used to find nulls in queries.  So, for example, you can specify:

    where price is null

to check for instances where the price columns is set to null. This formulation must be used.  The following syntax is not valid:

    where price = null

Nulls and Time

When date and time columns can be unknown, it is usually best to assign the columns as nullable. SQL Server checks to ensure that only valid dates and times are placed in columns defined as such. If the column can be unknown, it must be defined to be nullable because the only other option is to define a default for these columns from the valid domain of dates and times. And this can be confusing at best, or at worst cause problems similar to what the industry is experiencing with the Year 2000 crisis.

Avoid Special Sequencing for Nullable Columns

Treat nullable columns the same as you would any other column in terms of their “positioning” in your SQL Server tables. Some DBAs place nullable columns after non-nullable columns in the table. This is supposed to assist in administering the null columns, but in my opinion it does not. Sequencing nullable columns in this manner provides no clear benefit and should be avoided.

Synopsis

Nulls are one of the most potentially confusing concepts in relational technology. Confusing or not, as a developer or DBA you need to understand how nulls work so that your databases will be designed properly and your applications will function correctly. As this article has demonstrated, nulls may need to be dealt with even if none of your columns are defined as nullable, so there is no clear and certain path to avoiding nulls.

This column has touched upon the basics of nulls: what they are, their functionality and quirks, and guidelines for their implementation. If you are using nulls extensively, it is wise to understand how they differ from traditional database values. And please, don’t call a null a “null value,” if anything it is a “null lack of a value.”

© 1999, Craig S, Mullins 

From SQL Server Update (Xephon) September 1999.

© 1999 Craig S. Mullins, All rights reserved.
Home.