by Brenda Honeycutt and Craig S. Mullins
October / November 2008
Every DBA strives to maintain a healthy DB2 environment every day. However, there are numerous lingering misconceptions about database health to which many organizations fall prey. These misconceptions persist because DBAs believe in, or apply, concepts that are objectively false.
Like old wives' tales, some misconceptions exist simply because they have been passed down from older generations. In terms of DB2, this means that even best practices true for older DB2 versions may no longer be best. This article examines a number of the most common misconceptions about DB2 maintenance and recovery.
"Lies, Damned Lies, and Statistics"
With that said, it is relevant to distinguish between the two types of statistics available to us: catalog statistics and real-time statistics.
For the skeptics, it’s time to dispel the delusion that real-time statistics cannot be trusted.
All Backups Are There!
So you've got your statistics straight. They reflect your database activity accurately and you can rely on them as a basis for backups. The real-time statistics make this possible.
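As an illustration, a statistics-driven backup decision can be reduced to a changed-pages ratio. This is a minimal Python sketch, not DB2 syntax; it assumes the counters have already been fetched from the real-time statistics tables (for example, COPYUPDATEDPAGES and NACTIVE in SYSIBM.SYSTABLESPACESTATS), and the 20% threshold is purely illustrative:

```python
def needs_image_copy(copy_updated_pages, nactive_pages, changed_ratio=0.20):
    """Backup-due check driven by real-time statistics counters.

    copy_updated_pages: pages changed since the last image copy
                        (cf. COPYUPDATEDPAGES in SYSIBM.SYSTABLESPACESTATS)
    nactive_pages:      active pages in the object (cf. NACTIVE)
    changed_ratio:      illustrative threshold (20% of pages changed)
    """
    if nactive_pages == 0:
        return False  # empty object: nothing worth copying
    return copy_updated_pages / nactive_pages >= changed_ratio
```

The point is not the particular threshold but that the trigger is driven by actual update activity rather than by the calendar.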
Even after progressing to this point, there are other common fallacies regarding database backups outside the realm of determining when to do them.
Migrating too early to tape. Some organizations migrate current backups to tape immediately in order to free up disk space. Well, this approach will free up some DASD, but the time needed to recall that tape will have to be added to the recovery time. So which is more important: disk space, or time to recover? As a general rule of thumb, you should consider keeping the most current backups on DASD and migrate them to tape only when a new backup is taken. Of course, your tape versus disk strategy for backups will be heavily dependent on the nature of the data being backed up and the recovery time objectives required for that data.
Some shops use dual backups. Rather than taking dual copies of everything, you might consider limiting dual backups to objects that require a critical copy. A critical copy is one taken after a REORG or a LOAD LOG NO, as well as after a REORG that rebuilds the compression dictionary for compressed table spaces. In all other cases, DB2 can fall back to a prior copy in the unlikely event of an unusable copy.
Virtual tape storage is as good as DASD. Not. A VTS fronts its tape library with a disk cache, but once a backup has been migrated to the back-end tape, the recall delay applies just as it does for native tape.
Oops, the archive logs are on tape! If your archive logs are on tape or VTS, a parallel recovery is not possible. Keeping archive logs on DASD isn't an option for some shops. When backing up archive logs to tape, do not follow the rule of thumb pertaining to the 28672 block size normally optimal for tape. Instead, use a block size of 24576. To enable parallel processing for recovery, the archive logs on tape must be copied to DASD, and 24576 is the preferred size; it’s been the ZPARM default since DB2 V8.
Think about the trade-offs carefully before deciding where your backups and archive logs should reside.
I can use mirroring instead of dual logging. Some organizations mistakenly believe that single active logging is fine if mirroring is used. But what happens if a technical error with the channel, the device, or the firmware of the device occurs? The data will be mirrored as inconsistently as the source. You should always do dual logging, without exception. The DB2 log is the most important resource to assure data consistency.
Fast, Faster, Flash It. IBM introduced BACKUP SYSTEM and RESTORE SYSTEM in V8 to take full advantage of the immense speed gains in FlashCopy I and II technology. With FlashCopy, the I/O subsystem does the copy, freeing the CPU to do other, more important work. FlashCopies enable ultra-fast, volume-level copies.
Some people have hailed FlashCopy as "the end of the image copy," but that isn’t the case for all installations. Indeed, it’s absolutely the best for select, large, enterprise-scale environments, but not necessarily for everyone.
FlashCopies are not registered anywhere, unless they are taken with the BACKUP SYSTEM utility. You must remember that you did a flash, and remember what was flashed. Flashes do not reset the COPY PENDING status. Do not, however, allow these facts to discourage you from using FlashCopy outside of DB2. Some third-party tools can help track and manage such copies.
Copying Indexes? Some folks blindly continue on, administering their DB2 systems with little or no changes even as new versions roll in with new functionality. We’ve been able to use COPY to make image copy backups of DB2 indexes for some time now, but many shops ignore this capability completely. If the index was created (or altered) with the COPY YES parameter, then it can be recovered using either the REBUILD or RECOVER utilities. An index defined as COPY NO can be recovered only by using the REBUILD utility.
It can make sense to back up indexes for recovery because the REBUILD utility can take a long time to complete for very large amounts of data or when a large number of indexes exist. For example, in one simple test, recovering a 50 million row index substantially outperformed rebuilding that same index. Of course, your mileage may vary.
Another reason for backing up indexes is in the case of a recovery of both the data object and the index. COPY YES indexes can be restored at the same time as the data is being restored, thus reducing the overall outage time.
Keep in mind, however, that all index copies must be full copies as incremental image copies are not permitted for indexes.
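The recovery options above can be summarized in a small decision helper. This is an illustrative Python sketch, not DB2 syntax, and the function name is hypothetical:

```python
def index_recovery_utilities(copy_yes, image_copy_available):
    """Which utilities can restore an index.

    COPY YES indexes can be restored with RECOVER from a full image
    copy (incremental copies are not permitted for indexes) or rebuilt
    from the data; COPY NO indexes can only be rebuilt.
    """
    if copy_yes and image_copy_available:
        return ["RECOVER", "REBUILD"]
    return ["REBUILD"]
```

Having RECOVER as an option is what allows the index to be restored in parallel with the data, shrinking the overall outage.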
Recovery Ready, Set, Go
If you prepare well, there should be no recovery problem since all backups are available and current. You are carefully aligning your backup frequency to update rates using automation, thresholds, and monitoring. You are likely convinced that your backup strategy is fine-tuned.
What is wrong with this picture? You probably haven't the slightest inkling of whether the recovery duration supports the business needs. Threshold-based backup scheduling tells you when to take a copy, but nothing about how long a recovery will take.
There is a misguided notion that recovering small objects, especially those with minimal updates, will always be fast. But this does not take into consideration situations where single updates to a page cover several archive logs. Recovering a large object with high update rates, all of which are active, can be much faster.
On the other hand, what about those large objects with high update rates that contain your most critical enterprise data? Although a well-planned backup strategy helps, you still need to verify that these objects can be recovered within the time the business allows.
In terms of recovery-readiness, your system parameters deserve a closer look.
A, B, Zs of ZPARMs: Don't depend on ZPARM defaults … or any defaults, for that matter. Defaults might seem to make sense for "one size fits all" shops, but even if your shop is an average fit, the recommended values change with new versions of DB2. Migrating to a new version does not apply the new defaults.
Fast Log Apply (LOGAPSTG) represents the log apply storage, and it should be set to the maximum of 100MB. Although most tuning experts agree with the old maxim "never say never," this is an exception: never define a value of less than 100MB for LOGAPSTG. 100MB is the default value, but only since V8. If DB2 needs less, it simply uses less. Limiting this value critically restricts the available log apply storage and may slow down DB2 restart or a recovery, so assigning anything less than 100MB just doesn’t make any sense at all.
The check frequency parameter (CHKFREQ) determines the system checkpoint frequency in minutes or in number of log records. The default is 500,000 log records. Consider changing this to a time interval in order to get predictable checkpointing: a log-record interval can miss important checkpoints when your logging rates vary significantly. A reasonable starting point is to set CHKFREQ at 2 to 5 minutes.
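Simple arithmetic shows why a record-based CHKFREQ behaves unevenly; the workload rates below are hypothetical. At a peak of 5,000 log records per second the 500,000-record default checkpoints every 100 seconds, while at a quiet-hour rate of 200 records per second the interval stretches to roughly 42 minutes:

```python
def checkpoint_interval_seconds(chkfreq_records, log_records_per_second):
    """Seconds between system checkpoints when CHKFREQ counts log records."""
    return chkfreq_records / log_records_per_second

# Peak workload: a checkpoint every 100 seconds.
peak = checkpoint_interval_seconds(500_000, 5_000)

# Quiet hours: the interval balloons to 2,500 seconds (~42 minutes),
# which can lengthen restart and recovery times.
quiet = checkpoint_interval_seconds(500_000, 200)
```

A time-based CHKFREQ keeps the interval constant regardless of how the logging rate swings.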
If data sharing is in use, Retained Lock Wait (RETLWAIT) represents a multiplier on how long a member of a data-sharing group will wait for a retained lock held by a failed member before the request times out.
The OUTBUFF parameter is the size of the output buffer used for writing active log data sets. Believe it or not, many shops are still using the V7 default of 400K, which was changed to 4000K for V8. In order to decrease the number of forced I/O operations that occur when no more buffers are available, you should select as large a size as your system can tolerate. Some popular health checks recommend a minimum value of 40000K in order to increase the speed at which DB2 can roll back: reading the output buffer is always much faster than reading the active log.
These are only a few examples of relevant ZPARMs, specifically the ones that impact your recovery times. The bottom line is to know your ZPARMs well and, because no installation is static, to recheck periodically the values you choose. The best resource for DSNZPARM information is the IBM DB2 Installation Guide for your version.
The coupling facility performs great. Don't be fooled! Trying to draw a correlation between how the coupling facility (CF) performs during normal operations and how it performs during a recovery is more likely to cause blindness than a certain activity in an old wives' tale, because keeping the CF in tiptop shape can be like grappling in the dark.
When a member and/or a coupling facility comes crashing down, all of the affected data sets are put into LPL or GRECP. To get the objects out of LPL or GRECP, you will have to build and submit lots of jobs. Each job should contain up to 99 -STA commands, and up to 10 jobs can run in parallel. Why all these odd numbers? It boils down to using LOGAPSTG as much and as efficiently as possible. Once you have built and submitted all these jobs, the coupling facility and the SCA, which holds the LPL, start thrashing for their very lives. The following guidelines are indicative of a CF that will stand the test of a speedy recovery:
"CF transfer time" is less than 2,000 microseconds.
"Number of messages rejected due to lack of message buffer" is zero.
"Sub channel busy" percentage is less than 10%.
"False lock contention" percentage is less than 3%.
"All path busy termination count" is zero.
GBP "cross invalidations due to directory reclaims" is zero.
GBP "asynchronous failed writes due to lack of storage" is zero.
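The job-building rules mentioned earlier (up to 99 -STA commands per job, up to 10 jobs in parallel) amount to a simple partitioning exercise. A minimal Python sketch, with a hypothetical function name:

```python
def build_start_jobs(objects, cmds_per_job=99, max_parallel=10):
    """Partition LPL/GRECP objects into -STA recovery jobs.

    Each job carries at most 99 -STA DATABASE commands; jobs are then
    grouped into waves of at most 10 to run in parallel, keeping the
    fast log apply storage (LOGAPSTG) busy without oversubscribing it.
    """
    jobs = [objects[i:i + cmds_per_job]
            for i in range(0, len(objects), cmds_per_job)]
    waves = [jobs[i:i + max_parallel]
             for i in range(0, len(jobs), max_parallel)]
    return jobs, waves
```

For 1,500 objects this yields 16 jobs run in 2 waves, which is the kind of plan worth scripting before the outage, not during it.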
The Group Buffer Pools are easy to check with a DISPLAY command. For GBP problems, the use of "Auto Alter" in z/OS is a very good starting point. In addition, several z/OS macro services return useful information about the CF:
IXCQUERY requests information about resources the coupling facility manages.
IXLMG requests measurement data related to the use of a coupling facility.
IXLZSTR retrieves control information and structure data from a dump.
If the coupling facility metrics are murky, or the thresholds above are exceeded, you may well be heading for real trouble when you attempt a recovery.
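The guideline thresholds listed above can be folded into a single health predicate. This Python sketch assumes the metrics have already been collected (for example, from RMF reports or the IXLMG macro) into a dictionary; the key names are illustrative:

```python
def cf_recovery_ready(m):
    """True only if the CF metrics satisfy all rule-of-thumb thresholds."""
    return (m["transfer_time_us"] < 2_000          # CF transfer time
            and m["msgs_rejected_no_buffer"] == 0  # no message-buffer shortages
            and m["subchannel_busy_pct"] < 10      # subchannel busy
            and m["false_lock_contention_pct"] < 3 # false lock contention
            and m["all_path_busy_count"] == 0      # all-path-busy terminations
            and m["gbp_xi_directory_reclaims"] == 0    # GBP cross invalidations
            and m["gbp_failed_writes_no_storage"] == 0)  # GBP failed writes
```

Running such a check regularly, rather than only after a crash, is what turns the guidelines into an early-warning system.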
Quite regularly, application problems turn out to be database problems, and database problems turn out to be space problems.
Yet space problems are space problems. If there is a permanent shortage of DASD space, even the most sophisticated settings of SMS DATA CLASS parameters are not going to help. Do pay close attention when somebody starts mumbling terms like Volume Count, Space Constraint Relief, Dynamic Volume Count, or Extent Constraint Removal. The settings for those parameters determine whether or not a data set or cluster will be multi-volume, and how its extents will be allocated.
If you set DATACLAS to enable multi-volume allocation, a data set can extend to a new volume when the current one fills, but treat this as a safety net rather than a substitute for adequate free space.
VSAM Extent Constraint Removal was introduced with z/OS 1.7. It removes the 255-extent limit (255 extents per component, or per stripe, respectively), whereas the limit of 123 extents per volume remains. The maximum number of extents is now 7,257: 59 volumes with 123 extents each. You don't want that, do you?
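The extent arithmetic is easy to verify. A small Python sketch using the limits stated above:

```python
EXTENTS_PER_VOLUME = 123  # VSAM limit per volume, unchanged by z/OS 1.7
MAX_VOLUMES = 59          # maximum volumes for the data set

def max_extents_per_component(extent_constraint_removal):
    """Extent ceiling for a VSAM component (or stripe)."""
    if extent_constraint_removal:
        # z/OS 1.7+: bounded only by volumes times extents per volume
        return MAX_VOLUMES * EXTENTS_PER_VOLUME  # 7,257
    return 255  # classic limit before Extent Constraint Removal
```

A data set sprawled across thousands of extents is a performance and management headache, which is exactly the point of the rhetorical question above.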
On Logging and Auditing
Probably one of the biggest misconceptions is that logging supports auditing requirements. Transaction logs are an integral component for ensuring database integrity: DB2 logs every modification that is successfully executed and committed (with a few exceptions, such as LOAD LOG NO). Log records are written to the active logs, and there are usually two active log data sets to safeguard against physical device errors. The active logs must reside on disk (they cannot reside on tape) and are managed by DB2 using the BSDS.
As the active logs are filled, a process called log offloading is invoked by DB2 to move the log information offline to archive log data sets. This process prevents the active logs from filling up during DB2 processing, which would stifle the DB2 environment.
The logged data can be used for rolling back transactions and recovering table spaces and indexes. Indeed, this is its primary and most important purpose – to ensure the integrity of the data in your DB2 databases. However, some organizations use the logs for database auditing.
Database auditing is the process of monitoring access to, and modification of, selected database objects and resources within operational databases, and of retaining a detailed record of that access, a record that can be used to proactively trigger actions and can be retrieved and analyzed as needed. But it is a bad idea to rely on the transaction logs for your auditing.
The biggest problem when relying on logs for auditing is that reads are not recorded. Sometimes regulations and policies require auditing reads. But if you need to know who accessed a particular table and when, the log cannot answer that question.
Another problem with using logs for auditing is a lack of separation of duties. Understanding what is on the logs and which logs are needed requires DBA expertise. But DBA activity must itself be audited. This is the type of conflict that auditors like to avoid.
Perhaps the biggest problem with log-based auditing is its impact on the log itself: processes that read and analyze the log for audit purposes compete with backup and recovery for the same critical resource, and tampering with logging behavior to satisfy auditors can put recovery objectives at risk.
To do database auditing justice requires a software solution that captures database requests from the server without impacting database logs or recovery objectives.
Changing misconceptions requires, as a first step, recognizing them. After that, it is a matter of finding better ways to perform your processes. Today's business availability requirements are higher than ever, and unless you've already been forced into a corner, you may have been lucky enough not to even know that some practices might not be the best ones. Lots of resources are at your disposal, and tools are available, too. Although the notion has been passed down for generations, eating carrots probably won't improve your eyesight; but hopefully the contents of this article have improved your insight!
From zJournal, October / November 2008.
© 2012 Craig S. Mullins, SEG