Craig S. Mullins


December 2000

 

 

 

                                         


The eDBA Series, as published in DBAzine
 

New Technologies of the eDBA: XML

by Craig S. Mullins

 

Welcome to the third installment of my regular eDBA column, where we explore the skills DBAs need to support the data management needs of an e-business. Many new technologies will be introduced to organizations as they transform from a traditional business model to an e-business model. Some of these technologies are obvious, such as connectivity, networking, and basic web skills. But some are brand new and will change the way an eDBA performs her job.

In the last eDBA column I discussed one new technology, Java. In this edition we will examine another: XML. The intent here is not to deliver an in-depth tutorial, but to introduce the subject and to describe why an eDBA needs to know XML and how it will affect her job.

What is XML?

XML is getting a lot of publicity these days. If you believe everything you read, then XML is going to solve all of our interoperability problems, completely replace SQL, and possibly even deliver world peace. Okay, that last one is an exaggeration, but you get the point. In reality, all of the previous assertions about XML are untrue.

XML stands for eXtensible Markup Language. Like HTML, XML is based upon SGML (Standard Generalized Markup Language). HTML uses tags to describe how data appears on a web page. But XML uses tags to describe the data itself. XML retains the key SGML advantage of self-description, while avoiding the complexity of full-blown SGML. XML allows tags to be defined by users that describe the data in the document. This capability provides users a means to describe the structure and nature of the data in the document. In essence, the document becomes self-describing.

The simple syntax of XML makes it easy to process by machine while remaining understandable to humans. Once again, let’s use HTML as a point of comparison. HTML uses tags to describe the appearance of data on a page. For example, the tag pair “<b>text</b>” specifies that “text” should appear in boldface. XML uses tags to describe the data itself instead of its appearance. For example, consider the following XML describing a customer address:

 

<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>BMC Software, Inc.</company_name>
<street_address>2101 CityWest Blvd.</street_address>
<city>Houston</city>
<state>TX</state>
<zip_code>77042</zip_code>
<country>U.S.A.</country>
</CUSTOMER>
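
To see how a program can take advantage of self-describing tags, here is a minimal sketch in Python. It assumes the standard xml.etree.ElementTree module; the function name customer_fields is hypothetical and chosen only for illustration. Because each value is wrapped in a tag that names it, the program can label the data without any outside knowledge of the record layout.

```python
import xml.etree.ElementTree as ET

# The customer document from the column, held as a string.
CUSTOMER_XML = """<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>BMC Software, Inc.</company_name>
<street_address>2101 CityWest Blvd.</street_address>
<city>Houston</city>
<state>TX</state>
<zip_code>77042</zip_code>
<country>U.S.A.</country>
</CUSTOMER>"""


def customer_fields(xml_text):
    """Return a dict mapping each child tag name to its text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = customer_fields(CUSTOMER_XML)
for tag, value in fields.items():
    # The tag name itself tells us what each value means.
    print(f"{tag}: {value}")
```
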

XML is actually a meta-language for defining other markup languages. The tags of such a language are defined in a Document Type Definition (DTD), which acts as a dictionary of tags for a specific industry or field of knowledge. So, the tags used in a document, and the way they may be ordered and nested, must be declared in a DTD. The DTD is attached to the document by a document type declaration (the DOCTYPE construct), such as:

<!DOCTYPE CUSTOMER [
<!ELEMENT CUSTOMER (first_name, middle_initial, last_name,
                    company_name, street_address, city, state,
                    zip_code, country*)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_initial (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT company_name (#PCDATA)>
<!ELEMENT street_address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
]>

 

The DTD for an XML document can be either part of the document or stored in an external file. The XML code samples shown are meant to be examples only. By examining them you can quickly see how the document itself describes its contents. For data management professionals, this is beneficial because it removes the trouble of trying to track down the meaning of data elements. One of the biggest problems associated with database management and processing is tracking down and maintaining the meaning of stored data. If the data can be stored in documents using XML, the documents themselves will describe their data content.
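
The kind of rule a DTD expresses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's standard xml.etree.ElementTree module does not validate documents against a DTD, so the hand-written check below stands in for a validating parser. The names REQUIRED_IN_ORDER, REPEATABLE_TAIL, and matches_customer_model are hypothetical. The check mirrors the CUSTOMER content model shown earlier: eight elements in a fixed order, followed by zero or more country elements.

```python
import xml.etree.ElementTree as ET

# Hand-coded copy of the CUSTOMER content model from the DTD above.
REQUIRED_IN_ORDER = [
    "first_name", "middle_initial", "last_name", "company_name",
    "street_address", "city", "state", "zip_code",
]
REPEATABLE_TAIL = "country"  # corresponds to "country*" (zero or more)


def matches_customer_model(xml_text):
    """Return True if the document's children follow the CUSTOMER model."""
    root = ET.fromstring(xml_text)
    if root.tag != "CUSTOMER":
        return False
    tags = [child.tag for child in root]
    head = tags[:len(REQUIRED_IN_ORDER)]
    tail = tags[len(REQUIRED_IN_ORDER):]
    return head == REQUIRED_IN_ORDER and all(t == REPEATABLE_TAIL for t in tail)


good_doc = (
    "<CUSTOMER><first_name>Craig</first_name><middle_initial>S.</middle_initial>"
    "<last_name>Mullins</last_name><company_name>BMC Software, Inc.</company_name>"
    "<street_address>2101 CityWest Blvd.</street_address><city>Houston</city>"
    "<state>TX</state><zip_code>77042</zip_code><country>U.S.A.</country></CUSTOMER>"
)
# Same data, but city and state are missing, so the model is violated.
bad_doc = (
    "<CUSTOMER><first_name>Craig</first_name><middle_initial>S.</middle_initial>"
    "<last_name>Mullins</last_name><company_name>BMC Software, Inc.</company_name>"
    "<street_address>2101 CityWest Blvd.</street_address>"
    "<zip_code>77042</zip_code></CUSTOMER>"
)

good_result = matches_customer_model(good_doc)
bad_result = matches_customer_model(bad_doc)
print(good_result, bad_result)
```

A real validating parser would read these rules from the DTD itself rather than from a hard-coded list.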

Of course, the DTD is a rudimentary vehicle for defining data semantics. Standards committees are working on the definition of the XML Schema to replace the DTD for defining XML tags. The XML Schema will allow for more precise definition of data such as data types, lengths, and

The important thing to remember about XML is that it solves a different problem than HTML. HTML is a markup language, but XML is a meta-language. In other words, XML is a language for defining other languages. The idea is to use XML to define a language specifically tailored to each requirement you encounter. You need to understand this paradigm shift in order to understand the power of XML. (Note: XSL, eXtensible Stylesheet Language, can be used with XML to format XML data for display.)

In short, XML allows designers to create their own customized tags, thereby enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. So the most important reason to learn XML is that it is quickly becoming the de facto standard for application interfaces.

Some Skepticism

However, there are some problems with XML. For example, support for XML is less than complete in the most popular web browsers. This problem will be alleviated over time as more XML capabilities are implemented and come to market.

Another problem with XML is not really the fault of XML, but of market hype. There is a lot of confusion surrounding XML in the industry. Some folks believe that XML will provide metadata where none currently exists or that XML will replace SQL as a data access method for relational data. Neither of these assertions is true.

There is no way that any technology, XML included, can conjure up information that does not exist. Humans must create the metadata tags in XML for the data to be described. XML enables self-describing documents. It does not describe your data for you.

And XML does not do what SQL does. Hence, XML cannot replace SQL. SQL is the standard access method for relational data. It is used to “tell” a relational DBMS what data is to be retrieved. XML is a document description language. It describes the contents of data. XML may be useful for defining databases, but not for accessing them.

Integrating XML With the DBMS

More and more of the popular DBMS products are providing support for XML. One example is the XML Extender provided with DB2 UDB Version 7. The XML Extender enables XML documents to be integrated with DB2 databases. By integrating XML into DB2 you can more directly and quickly access the XML documents. You can search and store entire XML documents using SQL. You also have the option of combining XML documents with traditional data stored in relational tables. 

When you store or compose a document, you can invoke DBMS functions to trigger an event that automates the interchange of data between applications. An XML document can be stored whole in a single text column. Or XML documents can be broken into component pieces and stored as multiple columns across multiple tables.
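
The two storage options just described can be sketched as follows. This is an illustration only: it assumes Python's built-in sqlite3 module as a stand-in for the DBMS, not DB2 or the XML Extender. The table names customer_doc and customer are hypothetical, and the decomposed form uses a single table for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = (
    "<CUSTOMER><first_name>Craig</first_name>"
    "<last_name>Mullins</last_name><city>Houston</city></CUSTOMER>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_doc (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "CREATE TABLE customer "
    "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, city TEXT)"
)

# Option 1: store the entire XML document in a single text column.
conn.execute("INSERT INTO customer_doc (id, doc) VALUES (?, ?)", (1, DOC))

# Option 2: break the document into pieces and store each piece in a column.
root = ET.fromstring(DOC)
row = (
    1,
    root.findtext("first_name"),
    root.findtext("last_name"),
    root.findtext("city"),
)
conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", row)

# The whole document comes back intact; the pieces can be queried with SQL.
stored_doc = conn.execute(
    "SELECT doc FROM customer_doc WHERE id = 1"
).fetchone()[0]
city = conn.execute(
    "SELECT city FROM customer WHERE last_name = ?", ("Mullins",)
).fetchone()[0]
print(stored_doc == DOC, city)
```

Storing the document whole preserves it exactly as received, while decomposing it lets ordinary SQL predicates reach individual values.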

The XML Extender provides user-defined data types (UDTs) and user-defined functions (UDFs) to store and manipulate XML in the DB2 database. UDTs are defined by the XML Extender for XMLVARCHAR, XMLCLOB, and XMLFILE.  Once the XML is stored in the database, the UDFs can be used to search and retrieve the XML data as a complete document or in pieces. The UDFs supplied by the XML Extender include:

  • storage functions to insert XML documents into a DB2 database
  • retrieval functions to access XML documents from XML columns
  • extraction functions to extract and convert the element content or attribute values from an XML document to the data type that is specified by the function name
  • update functions to modify element contents or attribute values (and to return a copy of an XML document with an updated value)
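
The extraction and update categories in the list above can be sketched in Python. The function names extract_integer and update_element are hypothetical; they are not the XML Extender's actual UDF names or signatures, which you should take from the product documentation. The sketch only shows the idea: extraction converts element content to a specific data type, and update returns a copy of the document with a changed value.

```python
import xml.etree.ElementTree as ET

DOC = "<CUSTOMER><city>Houston</city><zip_code>77042</zip_code></CUSTOMER>"


def extract_integer(xml_text, path):
    """Extraction: return the content of the element at path as an int."""
    root = ET.fromstring(xml_text)
    text = root.findtext(path)
    if text is None:
        raise KeyError(path)
    return int(text)


def update_element(xml_text, path, new_value):
    """Update: return a copy of the document with one element's content replaced."""
    root = ET.fromstring(xml_text)  # parsing builds a new tree; the input string is untouched
    target = root.find(path)
    if target is None:
        raise KeyError(path)
    target.text = new_value
    return ET.tostring(root, encoding="unicode")


zip_as_int = extract_integer(DOC, "zip_code")
updated_doc = update_element(DOC, "city", "Austin")
print(zip_as_int)
print(updated_doc)
```
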

More and more DBMS products are providing capabilities to store and generate XML. The basic functionality enables XML to be passed back and forth between databases in the DBMS. Refer to Figure 1.

 

Figure 1. XML and Database Integration
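
The generation side of this integration, composing XML from relational data, can be sketched as follows. Again this is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in for the DBMS, and the names rows_to_xml, CUSTOMERS, and the customer table are hypothetical. Column names from the query become the tag names, so the output describes itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Craig', 'Mullins', 'Houston')")


def rows_to_xml(connection):
    """Compose one XML document from all rows of the customer table."""
    cursor = connection.execute("SELECT first_name, last_name, city FROM customer")
    column_names = [description[0] for description in cursor.description]
    root = ET.Element("CUSTOMERS")
    for row in cursor:
        customer = ET.SubElement(root, "CUSTOMER")
        for name, value in zip(column_names, row):
            # Each column name becomes the tag that describes its value.
            ET.SubElement(customer, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_out = rows_to_xml(conn)
print(xml_out)
```
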

 

Conclusion

Putting all skepticism and hype aside, XML is definitely the wave of the immediate future. The future of the web will be defined using XML. The benefits of self-describing documents are just too many for XML to be ignored. Furthermore, being able to use XML to generate an application-specific language is powerful. This capability will drive XML to the forefront of computing.

XML is being used by more and more organizations to transfer data. And more capabilities are being added to DBMS products to support XML. Clearly DBAs will need to understand XML as their companies become e-businesses. Learning XML today will go a long way toward preparing eDBAs to integrate XML into their data management and application development infrastructure. For more details and specifics regarding XML, refer to the following web site: http://www.xmlecontent.com/XML/Default.htm.

And remember, this column is your column, too! Please feel free to e-mail me with any burning e-business issues you are experiencing in your shop and I’ll try to discuss them in a future column. And please share your successes and failures along the way to becoming an eDBA. By sharing our knowledge we make our jobs easier and our lives simpler.

 

 

 

 

 

 

 

 

From DBAzine, December 2000.

© 2008 Craig S. Mullins,  All rights reserved.
