XML is a flexible medium
Most comparisons of EDI and XML begin by emphasizing that the former is highly standardized, and the latter flexible. When two partners first transact EDI, there's little risk of misunderstanding: published standards clearly specify the meaning of each datum transmitted and received.
XML, in contrast, has a very general and abstract definition. "Human-readability" is one of the features of XML, and this carries with it the implication that XML can capture a wide range of human communications. Here's an example of well-formed XML ("well-formed" is a technical term whose precise definition needn't concern us for now):
<invoice_update> <note>Pay now!</note> </invoice_update>
This example illustrates that XML has the capacity to communicate even emotion in a way EDI can't. The obvious price of that flexibility is ambiguity: this particular invoice_update doesn't provide such basic information as who should be paid, how much, and for what purpose. You can trust EDI to get those basics right.
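The gap between "well-formed" and "useful" can be made concrete. The following is a minimal sketch in Python, using only the standard library; the field names payee, amount, and purpose are hypothetical, chosen just to mirror the three questions above. It shows that the message parses without complaint and yet carries none of the basics.

```python
import xml.etree.ElementTree as ET

MESSAGE = "<invoice_update> <note>Pay now!</note> </invoice_update>"

# Hypothetical field names, invented for this illustration only.
REQUIRED_FIELDS = ("payee", "amount", "purpose")


def missing_fields(xml_text, required=REQUIRED_FIELDS):
    """Return the required child elements that the root element lacks."""
    # fromstring raises ParseError if the text is not well-formed XML.
    root = ET.fromstring(xml_text)
    return [name for name in required if root.find(name) is None]


print(missing_fields(MESSAGE))  # prints ['payee', 'amount', 'purpose']
```

Parsing succeeds, so the message is well-formed; the list of missing fields is what a trading partner would still have to ask about.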
XML specialists have created a number of techniques for managing XML's flexibility. Many e-commerce architects and implementers don't seem to be aware of the advantages of these techniques.
DTDs and more
When standardization of XML is discussed in e-commerce contexts, DTD--document type definition--is often assumed. DTD is a special language which defines sets of messages. For e-commerce, a standard DTD might tell us that an invoice includes a timestamp, a vendor code, a total dollar amount, payment instructions, and so on, and perhaps even such details as the order in which different items appear.
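To give a feel for the kind of rule a DTD states, here is a rough sketch in Python. It is not a DTD validator (Python's standard-library parsers don't validate against DTDs); it hand-checks the one rule that a single DTD element declaration would express, namely which children an invoice has and in what order. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names. The DTD declaration this check mimics would be:
#   <!ELEMENT invoice (timestamp, vendor_code, total, payment_instructions)>
EXPECTED_CHILDREN = ["timestamp", "vendor_code", "total", "payment_instructions"]


def matches_content_model(xml_text, expected=EXPECTED_CHILDREN):
    """True if the root is <invoice> with exactly the expected children, in order."""
    root = ET.fromstring(xml_text)
    return root.tag == "invoice" and [child.tag for child in root] == expected


GOOD = """<invoice>
  <timestamp>2024-01-15T09:30:00Z</timestamp>
  <vendor_code>V-1001</vendor_code>
  <total>125.00</total>
  <payment_instructions>Net 30</payment_instructions>
</invoice>"""

# Same data with two elements swapped, which breaks the ordering rule.
REORDERED = """<invoice>
  <timestamp>2024-01-15T09:30:00Z</timestamp>
  <total>125.00</total>
  <vendor_code>V-1001</vendor_code>
  <payment_instructions>Net 30</payment_instructions>
</invoice>"""

print(matches_content_model(GOOD))       # prints True
print(matches_content_model(REORDERED))  # prints False
```

A real project would hand this job to a validating parser and a DTD file rather than to custom code; the sketch only shows what is being checked.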
We've already seen that XML is known for its flexibility and human-readability. XML can also be used effectively for the fully-automated, computer-to-computer operations that EDI enables. XML's advantage is that, unlike EDI, modestly-trained humans can read it well enough to check and correct it by sight. Information Technology (IT) has abundant tools to transmit, store, transform, and verify XML.
These facts mean that any XML project needs a clear explanation of exactly what kind of XML it is transacting. EDI doesn't require this, because EDI is intrinsically about commercial transactions. When vendors and customers agree on XML, though, they've only begun a negotiation which has several more steps.
This is where confusion and even inefficiency arise. Because DTD has often been used in the past to define e-commerce projects, many practitioners assume that it's the only choice for specifying an e-commerce XML standard. This simply isn't so.
It's unfortunate, too, because DTDs have a reputation for being difficult and expensive to construct. The result is that XML projects look as though they require more expertise and cost than they actually do.
Here are alternatives for standardization of an XML project:
For sufficiently small, light-weight, or casual projects, it can be enough to define XML in a statement of project requirements, along with such other technical details as operating system, performance constraints, and so on. As long as all the parties involved understand each other, the formality of a DTD or the other methods below is unnecessary.
Formal methods have their own advantages, of course. With a DTD in place, automatic tools quickly detect a wide range of common errors. DTD and the methods below are generally less ambiguous than English; a requirements document might identify a mailing address as a "text string" without proper allowance for the length of that string, whether the characters are case-sensitive, whether accented characters are allowed, and so on. A formal definition forces those questions into the open: a DTD pins down the structure of a message, and the newer methods described later can also constrain details such as the length of a string.
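The "text string" problem is easy to show. Below is a minimal sketch in Python, assuming two hypothetical rules that a requirements document might leave unstated: an address must be present, and it may not exceed 60 characters. Accented characters are allowed. Both the limit and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

MAX_ADDRESS_LENGTH = 60  # assumed limit, purely illustrative


def address_problems(xml_text):
    """Return a list of problems found with the <address> child of the root."""
    root = ET.fromstring(xml_text)
    # findtext returns None when <address> is absent; treat that as empty.
    text = (root.findtext("address") or "").strip()
    problems = []
    if not text:
        problems.append("address is missing or empty")
    if len(text) > MAX_ADDRESS_LENGTH:
        problems.append(f"address exceeds {MAX_ADDRESS_LENGTH} characters")
    return problems


print(address_problems("<party><address>12 Rue de l'Église</address></party>"))
# prints []
print(address_problems("<party/>"))
# prints ['address is missing or empty']
```

Whatever notation records them, the point is that rules like these get written down once and checked automatically, rather than discovered when a partner's system truncates an address.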
As mentioned above, much of the existing literature emphasizes the difficulty and cost of DTD preparation. This is skewed in a couple of ways:
- it neglects the benefit of a DTD, or, equivalently, the cost of reliance on informal methods; and
- it confuses social processes involved in standardization with technical aspects of DTD preparation.
DTD construction is hard when it demands agreement on an industry-wide definition and careful authorship to satisfy all contending parties. For an XML project aimed at a single set of suppliers and customers, however, a DTD can be written rather quickly. In fact, it's formally possible to outline a skeletal DTD, begin to use it with the XML of an early implementation, and refine the DTD in an "agile" and iterative way. In my experience, this dramatically reduces the costs and overhead of DTD management.
In this perspective, DTD is just a convenient formalization for expression of the kinds of requirements an XML project needs anyway. DTD provides a common language that all participants can understand. That's not all, though: the XML tool marketplace has an abundance of products for preparation and use of DTDs.
What more can there be to say? The smallest XML projects can get away with informal methods, but DTD will benefit anything larger or long-lived.
Those aren't the only choices, though. While DTD has a long history, the last decade of XML practice has yielded new and better methods of XML definition. Any single one of these methods is called an "XML schema" (note the capitalization). Some authors make a point of including DTD among the XML schemata; for others, DTD is not an XML schema.
What advantages do the newer XML schemata have over DTD? They're largely technical, the kinds of details that inspire passion in programmers the way Java vs. C# or Windows vs. Linux debates do. In broad terms, the newer XML schemata are object-oriented, more concise for real-world XML projects, and far more expressive. DTDs are hard to re-use; they modularize poorly. DTD is even subject to a kind of security exploit that "inlining" of XML schemata prevents.
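One well-known class of DTD-related attack abuses entity declarations carried inside a document's DOCTYPE. The following is a minimal sketch of one defensive posture, assuming trading partners never need to send a DOCTYPE at all: refuse any message that contains one. It uses Python's standard expat bindings; the function and exception names are my own.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when a message carries a DOCTYPE declaration."""


def reject_doctype(xml_text):
    """Parse xml_text; raise DoctypeNotAllowed if it declares a DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop at the declaration itself, before any entity is expanded.
        raise DoctypeNotAllowed(f"DOCTYPE declaration found for <{name}>")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return True


print(reject_doctype("<invoice_update><note>Pay now!</note></invoice_update>"))
# prints True
```

This is a blunt policy, and it only suits projects whose validation rules live outside the messages themselves, which is the usual arrangement with the newer schema languages.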
Among the many XML schemata developed since DTD, by far the most popular is one called, rather confusingly, "XML Schema", or, among developers, "XSD", for "XML Schema Definition". Tools to handle XSD are roughly as common as those for DTD, and many are more powerful, with better abilities to handle partial results.
XML Schema isn't my favorite XML schema; when possible, I like to use RNC. RNC derives from "RNG compact", while RNG itself comes from "RELAX NG", or REgular LAnguage for XML Next Generation. Far fewer tools are compatible with RNC or RNG than are available for XSD and DTD. As author Michael Fitzgerald described ten years ago, however, programming with RNC "... is like driving a sports car." A capable developer can achieve great results with RNC economically.
Advanced XML Schemata
This doesn't exhaust the list. There are, in principle, other tools for making the most of an XML project, including Schematron, PSVI, and so on. I do not generally recommend these; tools haven't caught up with them, and, for e-commerce practitioners, they are rarely more than research-level experiments. One of them might achieve "critical mass", though, during the coming years; don't be surprised to hear about them as time goes on.
Whenever you're involved with an XML project, make sure you understand what its plan is for validation of data, and why that choice was made. Will the XML be formatted in terms of a human-language specification? With a DTD? XSD? RNC? Or one of the advanced XML Schemata? This decision is a crucial one for an XML architecture that boosts the efficiency of your business as much as possible.