
Apps are Temporary, but Data is Forever!

An interview with Tony Shaw and Craig S. Mullins

February 2003

Intro


Craig Mullins is Director, Technology Planning, for BMC Software. And though he currently has ties to a specific software vendor, he’s managed to maintain his reputation in the industry for authoritative and unbiased advice on “Modern Database Administration.” He wrote a book on that topic titled “Database Administration: The Complete Guide to Practices and Procedures,” published in late 2002 (with a 5-star Amazon rating, by the way, congratulations!), and is presenting a tutorial at the DAMA International Symposium and Wilshire Meta-Data Conference in Orlando on April 28. He also writes prolifically in the field and is a contributor to various web publications (DBAzine.com, TDAN.com, Database Trends and Applications, and others).

 

Craig, you’ve written that you see database administration as a misunderstood function, and that it needs to be approached more rigorously as a management discipline. Would you explain your perspective, please?


You’re right, Tony, database administration needs to be practiced in a more rigorous manner. Too often the DBA is viewed as a fireman. This is not meant to disparage firemen, of course, but the fireman is called after the fire has started. Database administration practiced as a management discipline would be focused on prevention first, cure second. But I’m getting ahead of myself a bit here. Let me back up.

There are really several layers of misunderstanding and poor DBA practice that need to be addressed. The most pervasive these days, I think, is the situation facing DBAs working for organizations where the Internet rules. As companies move from traditional development to e-business development there is the inevitable change in mindset that creates mad rushes. This is the "get it done NOW!" mentality -- industry pundits have even coined the phrase "Internet time" to describe this phenomenon. Basically, when a business starts operating on "Internet time," things move faster. One "Web month" is said to be equivalent to about three standard months. There is a nugget of truth in this load of crap, but it is mostly just a load of crap. The nugget is this: Web projects move very fast because business executives want to conduct more and more business over the Web to save costs and to connect better with their clients. But just as frequently, projects are forced to move fast because someone read an article in an airline magazine saying that Web projects should move fast and, well, everyone else is moving fast so you'd better move fast, too, or risk losing business.

This is bad for the DBA because this approach is bad for database design. Why? Applications are temporary, but the data is permanent. Organizations are forever coding and re-coding their applications - sometimes the next incarnation of an application is being developed before the last one has even been moved to production. But when did you ever throw away data? Oh, sure, you may redesign a database or move from one DBMS to another. But what did you do? Chances are, you saved the data and migrated it from the old database to the new one. Some changes had to be made, maybe some external data was purchased to combine with the existing data, and most likely some parts of the database were not completely populated. But data lives forever.

So, the Internet imposes rapid delivery schedules, which turn database administration into an afterthought. This means many organizations are trying to implement database applications with no DBAs – instead, the application developer acts as a pseudo-DBA and performs just the basics to get the application delivered. As a result, database design, performance, availability, and maintenance all suffer.


That all sounds bad, but you mentioned “several layers of misunderstanding”… what else is missing?


So far I’ve just ranted on about those organizations doing database administration on a “wing and a prayer,” but things are worse than that. Many so-called mature organizations approach database administration only as a “reactive” chore. This gets back to my fireman metaphor. Oh, yes, everyone says they are “proactive” but that is usually a great big lie. Many DBAs are up to their ears in paperwork, design tasks, and performance tuning – with a line of folks out the door of their cubicle looking for help. Now, how many of these DBAs do you think are being proactive and looking for more “potential” problems so they can fix them before they occur? None! They are all trying to put out the fires that are on their desk.


So what can be done?


A big step in the right direction would be to implement service level management (SLM) and service level agreements (SLAs). Service level management is the disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at acceptable cost. So, in order to effectively manage service levels, the business needs to prioritize applications and identify the amount of time, effort, and capital that can be expended delivering service for those applications. When database administration is practiced as a management discipline, every database application will have an SLA attached to it.


A service level is a measure of operational behavior. SLM ensures applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level may be defined as "99.95% up time, during the hours of 9:00 AM to 10:00 PM on weekdays." Of course, a service level can be more specific, stating "average response time for transactions will be two seconds or less for workloads of 500 or fewer users."
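To make that concrete, here is a minimal sketch, in Python, of how a service level like those two examples might be represented and checked. The class and field names are hypothetical, not drawn from any particular SLM product.

    from dataclasses import dataclass

    @dataclass
    class ServiceLevel:
        """A hypothetical service level for one application."""
        app_name: str
        min_uptime_pct: float          # e.g. 99.95 during covered hours
        max_avg_response_secs: float   # e.g. 2.0 seconds
        max_users_covered: int         # e.g. workloads of 500 or fewer users

    def meets_sla(sl: ServiceLevel, uptime_pct: float,
                  avg_response_secs: float, concurrent_users: int) -> bool:
        """Check measured behavior against the agreed objectives.

        The response-time objective only applies to workloads within
        the user count the SLA covers, as in the example above.
        """
        if uptime_pct < sl.min_uptime_pct:
            return False
        if (concurrent_users <= sl.max_users_covered
                and avg_response_secs > sl.max_avg_response_secs):
            return False
        return True

    # The two example service levels from the text, combined:
    order_entry = ServiceLevel("order-entry", 99.95, 2.0, 500)
    print(meets_sla(order_entry, uptime_pct=99.97,
                    avg_response_secs=1.4, concurrent_users=350))  # True

The point is not the code itself but that the objectives become explicit, measurable, and agreed upon in advance.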


For a service level agreement (SLA) to be successful all of the parties involved must agree upon stated objectives for availability and performance. The end users must be satisfied with the performance of their applications and the DBAs and technicians must be content with their ability to manage the system to the objectives. Compromise is essential to reach a useful SLA.


A proper SLM discipline makes performance management predictable. SLM manages the expectations of all involved. Without a pre-defined and agreed-upon SLA, how will the DBA and the end users know whether or not an application is performing adequately? Not every application can, or needs to, deliver sub-second response time. Without SLAs, business users and DBAs may have different expectations, resulting in unsatisfied business executives and frustrated DBAs. Not a good situation.


With SLM in place, DBAs can adjust resources by applying them to the most mission-critical applications as defined in the SLA. Costs will be controlled and capital will be expended on the portions of the business that matter most.



What do you see as the role of the DBA in the application development process?


This is a loaded question, Tony. Application folks generally don’t want to turn over too much control of their applications to the DBA. And many DBAs don’t want to get too involved in writing code. But the two functions need to be joined at the hip in order to produce efficient database applications. And this can be done without stepping on the toes of either the DBA or the application developer.

The long and the short of it is that DBAs must understand and participate in application design, though not necessarily the actual coding. Application design includes database concerns such as interfacing SQL with traditional programming languages and the type of SQL to use. But every aspect of program coding will affect the usability and effectiveness of the application. Furthermore, each application program must be designed to ensure the integrity of the data it modifies.

Designing a proper database application system is a complex and time-consuming task and the choices made during that process will impact the usefulness of the final delivered application. An improperly designed and coded application may need to be redesigned and re-coded from scratch if it is inefficient, ineffective, or not easy to use. To properly design an application that relies on databases for persistent data storage, the system designer must match the application development languages and tools to the physical database design and the functionality of the DBMS being used. The first thing to be mastered, though, must be a sound understanding of SQL. And, unfortunately, many application folks just do not have enough SQL knowledge to formulate the most efficient query in every situation. SQL is changing and growing, and the DBA will usually know what is possible and what SQL formulations perform best when. The application developer and the DBA need to work as a team to make sure that this knowledge gets built into each and every database application.
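As a small illustration of what "knowing which SQL formulation performs best" can mean in practice, the sketch below compares the optimizer's plans for two equivalent queries using Python's built-in sqlite3. The tables and the specific queries are invented for the example, and the relative costs differ by DBMS; the point is that someone on the team needs to know how to make, and read, this kind of comparison.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employee (id INTEGER PRIMARY KEY,
                               dept_id INTEGER, salary REAL);
    """)

    # Two equivalent formulations of "departments that have employees".
    exists_form = """
        SELECT name FROM dept d
        WHERE EXISTS (SELECT 1 FROM employee e WHERE e.dept_id = d.id)
    """
    join_form = """
        SELECT DISTINCT d.name FROM dept d
        JOIN employee e ON e.dept_id = d.id
    """

    # Comparing optimizer plans is one way a DBA evaluates formulations.
    for label, sql in (("EXISTS", exists_form), ("JOIN", join_form)):
        print(label)
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
            print("  ", row)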

Of course, there are other issues, such as the interface to the programming language, drivers, procedures for compilation, and so on. But let’s just summarize by saying the DBA needs to be involved in application design to ensure efficient database usage – and therefore efficient applications.

One final note here: DBAs need to participate in the design reviews for the code, too, at least when the SQL is being reviewed.


It seems to me that data administration and database administration are gradually coming closer together.  This is partly the motivation for us launching a DBA track at the upcoming DAMA Symposium and Wilshire Meta-Data Conference in Orlando (by the way, thank you for leading the tutorial on “Modern Database Administration”).  Would you agree with this observation about job function convergence, and if yes, do you have any thoughts on why this is happening more? And is it a desirable trend?


I agree that it does seem to be the trend, but I think this is an unfortunate trend because it is happening for all the wrong reasons. First let me state that I think the DA and DBA functions need to cross-train so that the DBA understands more about the business and the DA understands more about the technology. This enables both to do a better job because the DBA will know why he is creating and maintaining the database – and the DA will know what is possible given current technology and why some things may need to change in order to be physically implemented.


So why is this a bad trend? It seems to me that it is happening because, when companies cut back due to the economy (or for whatever reason), one place that seems to get cut more often than not is the DA. Therefore, the DBA is tasked with picking up the DA’s role now that the DA has been let go. So the DBA does the bare minimum. Heck, he already has a more than full-time job and can’t possibly pick up another full-time job. So the DBA models as best he can, but other more important DA functions drop by the wayside - repository maintenance, proper data usage, metadata definition, stewardship programs, etc. all just kind of grind to a halt – and over time, the practices die. Sure, the CEO or CIO still says “We treat data as a corporate asset,” but they don’t really.


So in a large IT shop, what do you see as the ideal demarcation between the roles of the DBA, DA and other data-related job functions?


Data administration (DA) separates the business aspects of data resource management from the technology used to manage data. When the DA function exists in an organization, it is more closely aligned with the actual business users of data. The DA group is responsible for understanding the business lexicon and translating it into a logical data model. The DA is involved more in the requirements gathering, analysis, and design phases; the DBA in the design, development, testing, and operational phases.


Another differentiating factor between DA and DBA is the focus of their effort. The DA is responsible for issues such as identifying and cataloging the data required by business users, production of conceptual and logical data models to accurately depict the relationship among data elements for business processes, production of an enterprise data model that incorporates all of the data used by all of the organization’s business processes, setting data policies for the organization, identifying data owners and stewards, and setting standards for control and usage of data.  In short, the DA can be thought of as the Chief Data Officer of the corporation. The DA position has nothing specifically to do with technology, though.


The larger the organization, the more likely that a DA function exists. However, when the DA role is undefined in the organization, the DBA must assume the mantle of data planner and modeler. The DBA usually will not be able to assume all of the functions and responsibilities of the DA, for several reasons: the DBA has many other technical duties to perform that will consume most of his time; the manager of the DBA group typically does not hold an executive position enabling him to dictate policy; the DBA generally does not have the skills to communicate effectively with business users and build consensus; and, frankly, most DBAs are happier dealing with technical issues and technicians than with business issues and non-technicians.

When both DA and DBA functions exist within the organization, the two groups must work very closely with one another. It is not necessary that both have the same manager, though a common manager can make cooperation between the two groups easier. At any rate, it is imperative that there be some degree of cross-pollination of skills between the two groups. The DA will never understand the physical database like a DBA, and the DBA will never understand the business issues of data like a DA, but each job function would be more effective with some knowledge of the other.

For those readers interested in more details, I tackle this question in much more depth in my book, “Database Administration: The Complete Guide to Practices and Procedures” (Addison-Wesley, 2002, ISBN: 0201741296).


To change tack a bit, let me ask you about increasing data stores. Organizations are accumulating so much data nowadays, from so many different sources. And I don’t just mean from a knowledge management (“we don’t know what we know”) standpoint, but they really have little idea what data they have or where it’s stored. I talk to CTOs who say this is their number one priority in the next two years – coming up with a strategy to cope with their rapidly expanding data and figuring out how to make use of it. Do you have any advice for them?


Interestingly enough, this problem could have been avoided to a great degree by implementing and properly staffing a data administration function. Oh, they might still have a lot of data, but at least they’d know what it was, where it came from, and why it is there. Part of the charter of a DA group is metadata management. If no data came into the organization without first being categorized, classified, and having metadata assigned to it, then at least we’d know what the data was. Let’s face it, you can’t have it both ways. You can’t say out of one corner of your mouth “We don’t have time or budget to define our data” and then out of the other corner say “But we want to know what data we have, where it resides, why we have it, and when we can get rid of it.”


Wait a minute, I take that back. You can say it, but that’d make you a horse’s ass. (Can I say that?)


Given that we are where we are, here is some advice: implement tools that track data usage and keep historical statistics. You might be able to convert application performance monitors that track SQL to help out. For example, if the monitor records the SQL statements, then you’d be able to track which columns in which tables were accessed, and when. With some additional work you might even be able to determine which rows.
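As a rough illustration of the idea, the sketch below tallies table access from a list of captured SQL statements. Real performance monitors expose this differently, and the regular expressions here are deliberately naive (they will miss subqueries, quoted identifiers, and vendor-specific syntax); treat it as a starting point, not a parser.

    import re
    from collections import Counter

    # Naive patterns for the tables a statement touches; a real SQL
    # parser would be needed for production use.
    TABLE_PATTERNS = [
        r"\bFROM\s+(\w+)",
        r"\bJOIN\s+(\w+)",
        r"\bUPDATE\s+(\w+)",
        r"\bINSERT\s+INTO\s+(\w+)",
        r"\bDELETE\s+FROM\s+(\w+)",
    ]

    def tally_table_usage(captured_sql: list[str]) -> Counter:
        """Count how often each table appears in captured statements."""
        usage = Counter()
        for stmt in captured_sql:
            for pattern in TABLE_PATTERNS:
                for table in re.findall(pattern, stmt, re.IGNORECASE):
                    usage[table.lower()] += 1
        return usage

    # Example: statements as a monitor might have recorded them.
    log = [
        "SELECT name, salary FROM employee WHERE dept_id = 10",
        "UPDATE employee SET salary = salary * 1.03 WHERE id = 7",
        "SELECT d.name FROM dept d JOIN employee e ON e.dept_id = d.id",
    ]
    print(tally_table_usage(log))  # Counter({'employee': 3, 'dept': 1})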


Of course, this works for relational data only. What about all that non-relational data out there in spreadsheets, word processing documents, data files, and so on? One radical approach would be to change the security on those files and then see who complains. As the complaints roll in you know who is using the data. Of course, this can have a severe impact on availability and I’m not recommending it as a general approach.


Another helpful tactic would be for organizations to stop copying data willy-nilly and do the hard work that is necessary for their applications and users to utilize shared databases. It is a sad fact of life today that most data is copied three, four, five, or more times – and there it sits. The same piece of data used in the online systems, the batch systems, for the data warehouse, on PCs in Excel, and, oh yes, it is in the legacy IMS database, then we moved it to DB2, and then sent it off to the Oracle database on Unix, too. Then we all sit back and complain about how much data storage we need.


One final thought on this topic: if your organization is serious about tackling the large amount of data being stored, then you’ll need to get serious about data archival. For each database (or file) you need to assign a rule for when the data can be archived to a cheaper storage type, or when the data can be purged.
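An archival rule can be as simple as a retention window per table. Here is a minimal sketch using Python’s built-in sqlite3: it copies rows older than the window into an archive table and then purges them. The table and column names are invented for illustration, and identifier interpolation like this assumes trusted configuration, never user input.

    import sqlite3

    def archive_old_rows(conn: sqlite3.Connection, table: str,
                         date_column: str, keep_days: int) -> int:
        """Copy rows older than keep_days into <table>_archive, then purge."""
        cutoff = f"date('now', '-{keep_days} days')"
        with conn:  # one transaction: archive and purge together
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table}_archive "
                f"AS SELECT * FROM {table} WHERE 0"  # copy schema only
            )
            conn.execute(
                f"INSERT INTO {table}_archive "
                f"SELECT * FROM {table} WHERE {date_column} < {cutoff}"
            )
            cur = conn.execute(
                f"DELETE FROM {table} WHERE {date_column} < {cutoff}"
            )
        return cur.rowcount

    # Example rule: order rows are archivable after roughly three years.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
    conn.execute(
        "INSERT INTO orders VALUES (1, '1999-06-01'), (2, date('now'))"
    )
    print(archive_old_rows(conn, "orders", "order_date",
                           keep_days=3 * 365))  # 1 row archived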


Let me ask you a general open-ended question – what do you see as the future of database and data management technology?  What are the major directions for the next 5-10 years?  


At a high level, things will be getting more automated, more connected, more wireless, and more complex.


The next wave of database administration is intelligent automation. I refer to this as implementing software scrubbing bubbles that “work hard, so you don’t have to.” (Remember that commercial!) DBMS software and third party database solutions are providing built-in intelligence to begin performing tasks that, in the past, required human intervention; that is, required a DBA. As these features become more robust and functional, the DBA will be able to turn over proactive management of certain maintenance and administrative tasks to the tools – leaving the DBA more time to be proactive himself.


Regarding the wireless aspect of things, pervasive devices (PDAs, handhelds, cell phones, etc.) will increasingly have a database system installed. Managing that system will be less of an issue than managing the synchronization of the remote device with a centralized database server. DBAs will need to get involved there to ensure successful data synchronization. Another aspect of the wireless trend will be the wireless DBA – where the DBA uses a PDA to check on the health of the databases. This will be another tool in the DBA’s bag of management tricks – but the DBA will not be able to perform all of his tasks using the PDA, just those that require round-the-clock vigilance, such as checking to see if the database instance is available or checking the status of system parameters.


Let me borrow a phrase from Ross Perot to describe another big database trend – “the giant sucking sound.” This is the sound of the DBMS as it sucks up technologies and functions that previously required you to purchase separate software. Remember when the DBMS had no ETL or OLAP functionality? Those days are gone. This will continue as the DBMS adds capabilities to tackle more and more IT tasks.

Another trend impacting DBAs will be a change in some of their roles as some of the recent DBMS features actually start being used in more production systems. One such trend I see is an off-shoot of the DBA role I call the Procedural DBA.


What exactly is a Procedural DBA? Can you elaborate on that a bit please?  


I coined the term Procedural DBA back in 1995 to describe a new type of database administration required by modern database management. The concept is a simple one: a new type of DBA is required to manage the procedural logic that is increasingly being stored in relational database management systems.


Initially, the sole purpose of a DBMS was to store, manage, and access data.  Over time, database management systems have evolved by integrating procedural logic in the form of complex triggers, stored procedures, and user-defined functions. This code is tightly coupled to the DBMS. As these features are exploited, management tasks such as administration, design, and tuning typically are assigned to the current DBA staff by default. But this is not always the best approach. What is required is an expansion of the role of database administration.

The Procedural DBA takes on the responsibility of ensuring that triggers, user-defined functions, and stored procedures are developed appropriately and will run efficiently and effectively once implemented. This requires both application programming and database administration skills. You see, these procedural objects have properties of both a program and a database object. So the Procedural DBA must be able to span both disciplines to function. Oftentimes, DBAs are not adequately trained in both areas. It is a distinctly different skill to program than it is to create well-designed relational databases.
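To give a feel for why these objects straddle both skill sets, here is a minimal trigger, shown in SQLite syntax via Python’s sqlite3 module (trigger dialects vary by DBMS, and the audit table here is invented). The logic is stored in, and executed by, the database itself, yet it reads like program code.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL);
        CREATE TABLE salary_audit (
            emp_id INTEGER, old_salary REAL, new_salary REAL, changed_at TEXT
        );

        -- Procedural logic stored in, and run by, the DBMS itself:
        -- every salary update is recorded without any application code.
        CREATE TRIGGER audit_salary
        AFTER UPDATE OF salary ON employee
        BEGIN
            INSERT INTO salary_audit
            VALUES (OLD.id, OLD.salary, NEW.salary, datetime('now'));
        END;
    """)

    conn.execute("INSERT INTO employee VALUES (1, 50000)")
    conn.execute("UPDATE employee SET salary = 55000 WHERE id = 1")
    print(conn.execute("SELECT * FROM salary_audit").fetchall())
    # [(1, 50000.0, 55000.0, '<timestamp>')]

Designing, reviewing, and tuning an object like this requires knowing both how to program it and how it affects the database it is attached to.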


Whether the procedural objects are actually programmed by the Procedural DBA will differ from shop to shop depending on the size of the shop, the number of DBAs available, and the scope of implementation. Minimally, the Procedural DBA should lead procedural object code reviews, assure reusability, and manage the database environment to assure successful procedural object implementation and use. Additionally, the Procedural DBA must be on call to help out when a stored procedure, trigger, or user-defined function fails.


And the notion of a DBA scrubbing bubble is interesting, too. Can you shed some more light on that topic?


Glad to, Tony. The trend I see is a continuing build-up of intelligent automation. This differs from traditional automation. With traditional automation, human interaction is necessary – an individual uses a tool to create a script or set up an alert so that when X happens, the tool automatically does Y.

With intelligent automation the DBMS or tool is smart enough to figure it out without human intervention. So, instead of the DBA needing to constantly monitor database statistics to determine when to reorganize a table space, the tool will keep track of the statistics and have built-in knowledge of what those statistics mean. And we will see more knowledge being applied than is currently the case. For example, in DB2 one of the indicators for reorganization is cluster ratio. If the ratio falls below, say, 90%, then think about reorganizing. But software with intelligent automation can take advantage of performance management software that examines access patterns. Say that for that same table space, 98% of the access is random (as opposed to sequential). This means that cluster ratio is not as important, so the “scrubbing bubble” can factor that into the mix and schedule the reorganization based not just on statistics, but on actual usage patterns. This will be revolutionary.
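A toy version of that decision logic might look like the following. The 90% threshold comes from the example above; the random-access cutoff and the function name are illustrative assumptions, not DB2’s actual rules.

    def should_reorg(cluster_ratio: float, random_access_pct: float) -> bool:
        """Decide whether a table space is worth reorganizing.

        Classic rule: reorganize when the cluster ratio drops below 90%.
        Intelligent twist: if nearly all access is random, clustering
        matters little, so a low ratio alone should not trigger a REORG.
        """
        RATIO_THRESHOLD = 90.0   # classic DB2 guideline from the text
        RANDOM_CUTOFF = 95.0     # illustrative assumption

        if cluster_ratio >= RATIO_THRESHOLD:
            return False         # still well clustered
        if random_access_pct >= RANDOM_CUTOFF:
            return False         # mostly random reads; a REORG buys little
        return True

    print(should_reorg(cluster_ratio=85.0, random_access_pct=98.0))  # False
    print(should_reorg(cluster_ratio=85.0, random_access_pct=40.0))  # True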


Furthermore, the software will understand the mitigating circumstances, such as “Can’t schedule a REORG in the middle of the day when database activity is high,” “Have to work the REORG into the batch window,” and “These particular jobs will benefit greatly from a REORG, so is there room to squeeze the REORG in before then?” This will be done by tighter integration between database tools and systems management tools, and by intelligent analysis of job execution trends.


Going back to the issue of growing data again for a moment, I’m particularly curious to ask you about unstructured data.  Not just documents, but the whole range of unstructured “stuff”, like web pages, maps, pictures, audio files, video images, etc.  There are all sorts of great reasons to try and bring more structure to this data in order to make it more manageable, but are we raising our expectations too high?  I mean, making this variety of file types and sources available and searchable is really a huge challenge.  Can we meet user expectations and integrate all this data effectively?


Not today we can’t. Seriously, Tony, most organizations are barely managing to handle their traditional data properly, so how can we expect them to take on this huge task? And the DBMS products are not up to the task yet either. Oh, they’ll allow you to store the unstructured data as some type of LOB, and even to access it if you learn some different techniques. But management is another topic altogether. Usually traditional database utilities (LOAD, UNLOAD, REORG, etc.) will not operate on the LOBs. So once they are there, you are sorta stuck with them.


Federation is an alternate approach. One method is to just allow the DBMS to access the unstructured data where it lies. You don’t pull it into the DBMS. This has certain advantages because it extends the usefulness of the unstructured data without putting additional management burden on the databases.


I have to ask, since it’s the topic du jour (and seems to have been for the past couple of years). XML – where are you seeing it used to manage data? Is it working as expected? Are the available tools satisfying customer needs? What are the high-payoff data management implementations? Any general observations on its efficacy for mainstream corporate data management?


The hype around XML is just astounding. It started off as a tag-based mechanism to make data transfer more understandable. Now, if you believe everything you read, XML is going to slay dragons, leap from tall buildings, and enable your Grandma to write an order entry application. What nonsense! Sure, XML can be used as a format for exchanging data. The tags help to make the data somewhat self-documenting. But I don’t really see XML as anything all that groundbreaking.


And XML databases? Somebody, please, stop it before it happens! XML is hierarchical – do you really want to go back to the days of IMS? I know, I know, some of us never left those days – and that is fine. But we don’t need another hierarchical monster on our hands. Sure, we’ll probably be storing some XML data in our relational databases – either as CLOBs or by breaking the document apart and storing its elements as columns of tables. But let’s not go backward and champion yet another type of DBMS. I wish the XML DBMS movement all the success that the Object DBMS movement has achieved (grin).
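Shredding, by the way, is straightforward for flat documents. Here is a minimal sketch using only Python’s standard library (the element and column names are invented): parse the XML, pull out the known elements, and insert them as a relational row. Deeply nested or mixed-content documents are where this approach starts to hurt.

    import sqlite3
    import xml.etree.ElementTree as ET

    DOC = """<order>
      <id>1042</id>
      <customer>Acme Corp</customer>
      <total>199.95</total>
    </order>"""

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")

    # Shred: break the document apart and store its elements as columns.
    root = ET.fromstring(DOC)
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (int(root.findtext("id")),
         root.findtext("customer"),
         float(root.findtext("total"))),
    )
    print(conn.execute("SELECT * FROM orders").fetchall())
    # [(1042, 'Acme Corp', 199.95)]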


For those who are not in one camp or another yet, just take a look at XQuery – compare that sucker to SQL for ease of use. SQL isn’t perfect, but after trying to use XQuery you’ll likely run screaming back to SQL.


And as companies grapple with the web services paradigm, how do you see web services impacting the DBA role (if at all)?


Web services are being hyped even more so than XML these days. I like the concept but almost no one is really doing true web services. Most “web services” are being implemented across a corporate network and not across the vast Internet. Bandwidth, security, and sharing issues make true Internet web services difficult to implement.


If and when true web services become more pervasive, it will impact the DBA. It will mean more skills the DBA has to learn. Tracking down problems becomes more difficult – and the DBA is probably “Problem Sniffer Number One” in most organizations.


Craig, I’m envious that you manage to write so prodigiously.  When do you find the time to do so much writing?  What’s your secret?  Perhaps I should ask the question the other way around -- when do you do your “real” job?  


I’ve always loved to write and I make the time to write a little bit every day – no matter what. I suppose I am blessed because I don’t find it to be a job, it is just something I enjoy doing. The only time it can become burdensome is when I am working on a book and a deadline is looming. Books are time-consuming; it is kind of like working a second job. But it is so rewarding to finally hold the finished product in your hand – and even more rewarding to hear from people who read the book and find it useful!


As for the second question, writing is part of my “real” job, so I am lucky there, too. But my “real” job also feeds my writing. I present at user groups and conferences, and many times the attendees bring up great questions, giving me ideas to write about. And our products at BMC address all of the most troubling DBA problems, so my “real” job helps to seed my writing with ideas.


Thanks for taking the time to talk, Craig. I will see you in Orlando at the DAMA+Meta-Data show at the end of April.


Looking forward to it, Tony!


Interview conducted by Tony Shaw of Wilshire Conferences, February 2003.

© 2012 Craig S. Mullins