Elliotte Rusty Harold on native XML data servers

16 August 2007 » DB2, IBM, MySQL, PHP, Web development

Soon after a New York PHP mailing list exchange debating the merits of storing information in hierarchical XML format versus traditional relational tables, XML guru Elliotte Rusty Harold posted a summary of the State of Native XML Databases to his blog.

Like the thread that inspired it, the post has generated a lot of comments showing that it’s an emerging technology whose potential is not well understood and that the products which implement the technology aren’t well known.

Why use an XML database?
Before considering an investment in a data server which offers native XML storage (one which doesn’t decompose it, nor store it as an unstructured chunk; and which allows the user to query its arbitrary individual elements), it’s necessary to take a step back and see what XML as a storage method offers the Web developer.

  • What sort of information should be stored as XML?
    The examples cited by Elliotte include large documents where the document itself is composed of related data, yet which it would be inefficient to break down into related tables and columns. A book can be broken down logically into a title and an abstract but what about the individual paragraphs in each chapter? What of the table of contents and index which are derived dynamically from data which exists elsewhere in the document?
  • Why can’t this data be stored in another format?
    It can be stored, but how do you make use of it? You might shred it, but this requires time to decompose and then recompose, assuming you can get back the data in the form you require. For example, what if you needed the first paragraph and figure of every chapter to compose a detailed table of contents? How would you write that query? What would you do if you needed to add, remove, or reorder a paragraph in an encyclopedia?
  • Why is data stored in XML format increasingly becoming valuable?
    According to Elliotte Rusty Harold and Anant Jhingran, most existing data isn’t traditional relational data at all. There is a ton of information that can not be queried currently with traditional SQL nor stored efficiently in relational tables. Think about the Web itself, it’s a collection of documents that have individual (ideally semantic) structure.

Sound bites
Here are a few of the insightful nuggets which sum up Elliotte’s point of view. For some of his thoughts on the future of XML in general, have a look at his Ten predictions for XML in 2007.

From http://lists.nyphp.org/pipermail/talk/2007-August/022724.html

Roughly 80% of the world’s data cannot plausibly be stored in a
relational database. The 20% that does fit there is important enough
that we’ve spent the last 20 years stuffing it into relational databases
and doing interesting things with it. I’m still doing a lot of that.

But there’s a lot more data out there that doesn’t look like tables than
does. Much of this data fits very nicely in a native XML database like
Mark Logic or eXist. There’s also data that has some tabular parts and
some non-tabular parts. This may work well in a hybrid XML-relational
database like DB2 9.

If your only place to put pegs is a table with square holes, then you’re
going to try pound every peg you find into a square hole. However, some
of us have noticed that a lot of the pegs we encounter aren’t shaped
like squares, and sometimes we need to buy a different table with
different shaped holes. :-)

Relational databases didn’t take the world by storm overnight. XML
databases won’t either. But they will be adopted because they do let
people solve problems they have today that they cannot solve with any
other tools.

From http://lists.nyphp.org/pipermail/talk/2007-August/022788.html

XML is not a file format. We’ve been down this road before. A native XML
database is no more based on a file format than MySQL is based on tab
delimited text.

From http://lists.nyphp.org/pipermail/talk/2007-August/022789.html

Storing books, web pages, and the like in a relational database has only
two basic approaches: make it a blob or cut it into tiny little pieces.
The first eliminates search capabilities; the second performs like a dog.

Also from http://lists.nyphp.org/pipermail/talk/2007-August/022788.html

>> I’m glad we have multiple tools to bring to bear on this kind of
>> problem, because I worry about the performance implications of
>> querying an XML database for the average price of those books, or
>> performing an operation that adds another field (tag?) to each book’s
>> “record”.

Average prices, or adding a field, can be done pretty fast. I don’t know
if it’s as fast as oracle or MySQL. I don’t much care. Sales systems are
exactly the sort of apps that relational databases fit well. But
actually publishing the books? That’s a very different story.

>> If it’s not too much trouble, could you give us some other use cases
>> for an XML database? Because title and first paragraph, if that’s
>> something a system “routinely does” could easily be stored as
>> relational data at the time of import.

Just surf around Safari sometime. Think about what it’s doing. Then try
to imagine doing that on top of a relational database.

Think about combining individual chapters, sections, and even smaller
divisions to make new on-off books like Safari U does. Consider the
generation of tables of contents and indexes for these books.

Closer to home, think about a blogging system or a content management
system. Now imagine what you could do if the page structure were
actually queryable, and not just an opaque blob in MySQL somewhere.

And the takeaway from the State of Native XML Databases:

If you’re working in publishing, including web publishing, you owe it to yourself to take a serious look at the available XML databases. If they already meet your needs, use them. If not, check back again again in a year or two when there’ll be more and better choices.

The relational revolution didn’t happen overnight, and the XQuery revolution isn’t going to happen overnight either. However it will happen because for many applications the benefits are just too compelling to ignore.

Conclusion
This is interesting stuff, and I’m glad Elliotte was able to put forward some of the reasons one might use an XML database and describe the maturity level of the data server products out there now.
FLWOR
We’ve asked Elliotte to present at one of the upcoming New York PHP meetings in October or November. If he can’t make it, it would be interesting to hear from other folks doing PHP work with XML databases, such as the XML Content Store / Zend_Db_Xml in the Zend Framework.

Introduction to databases for programmers

21 May 2007 » DB2, MySQL, PHP, Web development

Kenneth Downs from Secure Data Software will take us back to basics with his “Introduction to Databases for Programmers” at Tuesday night’s NYPHP meeting:

This May, New York PHP takes a look at a critical, yet often overlooked, aspect of web development. The database is the heart of any system, but it often gets-no-respect by application developers. Join Kenneth Downs as he leads us through the important database concepts for a successful system.

Databases work on principles that are different from programs. Best programming practices, when applied to databases, will produce bad results, and vice versa. In this talk we will look at how databases differ from code (the so-called “impedance mismatch” between code and data), see a brief introduction to normalization, find out what “ACID compliant” means, and see some triggers and stored procedures.

Thanks to IBM for providing a great presentation space with seating for plenty.

There’s still time to submit your RSVP until noon EDT Tuesday, so hop to it.

Thoughts on DB2 as a MySQL storage engine

26 April 2007 » DB2, IBM, MySQL, PHP

IBM and MySQL have announced plans to port MySQL to IBM’s mid-range i5/OS servers.

Taking it one step further, DB2 will also be available as a storage engine behind a MySQL front end, just as the popular MyISAM and InnoDB table types are today.

The technology is still off in the future, and even then has only been announced for System i, but it opens the door to some interesting possibilities.

I don’t work for the group at IBM responsible for this technology, nor do I know what their detailed plans are, but here’s how I see the potential behind this collaboration.

You’ll be able to…

  • Run MySQL natively on i5/OS
    Can’t say I’ve ever built an application consisting wholly of the MySQL Server, but this is the foundation that makes everything else possible.

  • Run third-party applications built for MySQL and PHP on System i without modification
    Paired with strong PHP support, the entire ecosystem of third-party apps would now be available on this platform and none the wiser to their environment.

  • Enable existing data stored in DB2 to be accessed through MySQL
    Imagine you’re using a scripting language or application framework that has a MySQL driver, but a driver for DB2 is incomplete, unstable, or simply doesn’t exist. MySQL could be the glue.

  • Use MySQL itself as a de facto database abstraction layer
    MySQL is a relatively small download. What if instead of writing database-agnostic SQL or working through a PHP-based or C-based abstraction layer you could use the database server itself to translate your calls, and simply program to the mysqli_* API?

  • Manage access levels and tweak performance based on user type
    If you store your data in a DB2 storage engine, and provide access to it to two separate user communities; read-only Web visitors through the MySQL Server and read/write internal content editors through the standard DB2 clients, you’ve given yourself more flexibility to manage privileges, caching, and performance settings.

It seems there will be a lot of intriguing options available to developers as this cooperation between IBM and MySQL evolves. These are just some of my first thoughts on what might be possible.

I’d love to find out more from the folks working to make this happen, and hear what else might be possible from other developers in the community at large.

Chris Shiflett at NYPHP April 24th

23 April 2007 » Ajax, Apache, DB2, JavaScript, MySQL, PHP

New York PHP has officially lined up its next four monthly presenters:

The RSVP system for the April meeting tomorrow night is still open till 6pm EDT tonight. Hope you can make it.

Update: The RSVP deadline has been moved to midnight tonight.

Writing SQL for both MySQL and DB2

05 May 2006 » DB2, MySQL, PHP, Web design

Migration guides and abstraction layers abound, but you may still run into some interesting behavior when porting or trying to run your queries from PHP for both MySQL and DB2 databases. To address some of these quirks, I’ve put together some tips for writing database independent PHP applications.

TAMPonS?

I’ve put together a simplified guide for installing Apache, MySQL, PHP, and Tomcat on Linux and Solaris. It’s not pretty, but it will get the job done.

PHP/MySQL sites

23 January 2002 » MySQL, PHP, Web design, Web development

The krook.net redesign has been forestalled yet again. I am currently working on four PHP/MySQL projects: StrongBrain.org, RantRave.com, StrayFromTheHeart.org, and Kalax.com. If they go well, I may abandon my present XML backend project and upgrade to a MySQL or PostgreSQL backend for this site.

« Previous page