Klixxx Magazine Archive - Web Mastery
What is XML? - The Basics & Beyond
By Gary B. Smith
Extensible Markup Language, or XML, is quite literally changing the face of the Web as it gains popularity among Webmasters and other online specialists who require greater functionality in their Web pages. While most documents on the Web are stored in HTML (Hypertext Markup Language), this is changing as more developers realize the numerous advantages of XML.
According to the authors of The XML Files (www.aicpa.org/pubs/jofa/may1999/), XML's beauty lies in that it can perform even the most sophisticated data-management tasks: "Imagine that you could give each electronic record or each unit of information in your office a tag that explains what the data means, whether to a person or a computer programmer. For example, Jane Doe would no longer be just the name of a person but rather identified as a corporate client in Kansas; similarly, $322.28 would be labeled as an accounts payable item to Acme Office Supplies. Even if the tags were in plain English, your computer system would understand them." XML makes this possible. Another expert has stated that one should think of XML as HTML 'without the training wheels.'
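As a rough illustration of the tagging idea in the quotation above, the sketch below marks up the two example records and reads them back with Python's standard xml.etree.ElementTree module. The element and attribute names (records, client, payable, type, state, payee, currency) are invented for this illustration and do not come from The XML Files.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the two records described in the quotation.
RECORDS = """
<records>
  <client type="corporate" state="Kansas">Jane Doe</client>
  <payable payee="Acme Office Supplies" currency="USD">322.28</payable>
</records>
"""

root = ET.fromstring(RECORDS)
client = root.find("client")
payable = root.find("payable")

# The tags and attributes tell the program what each value means.
client_summary = (
    f"{client.text} is a {client.get('type')} client in {client.get('state')}"
)
payable_amount = float(payable.text)

print(client_summary)
print(f"Owed to {payable.get('payee')}: {payable_amount:.2f}")
```

The name and the dollar amount are no longer bare strings: a program can ask what kind of thing each one is before deciding what to do with it.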
XML is newer than HTML, but its design is based on SGML, the Standard Generalized Markup Language. According to Connolly, Khare and Rifkin in their article The Evolution of Web Documents (www.xml.com/pub/a/w3j), SGML predates both HTML and the Web, and was designed to give information managers the flexibility to say exactly what they mean - no more and no less. XML brings this flexibility to the Web because it allows you to develop your own custom tag sets. In other words, XML is used to define tags and the structural relationships between them. Because there are no predefined tag sets, the semantics of an XML document are defined by the applications or style sheets that process it.
Simply put, XML is about structuring data in a way that makes logical sense. In technical terms it can be defined as "a simple, very flexible text format derived from SGML (ISO 8879). It was originally designed to meet the challenges of large-scale electronic publishing. XML is also playing an increasingly important role in the exchange of a wide variety of data, on the Web and elsewhere". See www.w3.org/xml/
The essence of XML is that it is much more flexible than HTML and can be used in a myriad of ways to convey and share information. XML is intended to make it easy to define documents and to transmit and share information across the Internet. There has been a surge in the number of programs and applications, in development or already on the market, that are based on or related to the XML format. XML also has applications beyond text documents: it is used for other data formats including vector graphics, e-commerce transactions, server APIs and dozens of other kinds of structured data.
What makes it extensible?
XML is called extensible because it's not a fixed format like HTML, which is a single, predefined mark-up language. According to Peter Flynn, editor of The XML FAQ (www.ucc.ie:8080/cocoon/xmlfaq), "XML is a 'meta-language' (a language for describing other languages), that lets you design your own customized mark-up languages for limitless types of documents." XML is able to do this because it's written in SGML, which is the international standard meta-language for text mark-up systems.
The XML Specification (www.ucc.ie:8080/cocoon) states that "XML is intended to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web." However, XML is not just for Web pages; it can be used to store any kind of structured information, and to enclose or encapsulate information in order to pass it between different computing systems, which would otherwise be unable to communicate.
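The idea of enclosing information so that it can pass between systems can be sketched as a simple round trip. The example below is only an illustration using Python's standard library: the record element, the field names, and the to_xml and from_xml helpers are hypothetical, and a real exchange would also agree on a DTD or similar contract.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Sending side: wrap a flat dict of strings in a <record> element."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Receiving side: rebuild the dict from the XML text alone."""
    return {child.tag: child.text for child in ET.fromstring(text)}


original = {"name": "Jane Doe", "state": "Kansas"}
wire_text = to_xml(original)   # plain text that any system can read
restored = from_xml(wire_text)

print(wire_text)
print(restored == original)
```

The two sides share nothing except the text itself, which is why the sender and the receiver could be written in different languages or run on different platforms.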
What's all the XML fuss about?
Whereas HTML was designed to display data and determine its layout and appearance, XML describes the data, with the focus on what that data is. In The XML FAQ, Flynn states that Web development has recently been held back by two chief constraints:
1. The dependence on a single, inflexible document type (HTML), which was being overexploited for tasks it was never designed for; and
2. The complexity of full SGML, whose syntax allows many powerful, but hard-to-program options.
"In a nutshell", he says, "XML allows the flexible development of user-defined document-types. It provides a robust, non-proprietary, persistent and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for."
Many people have asked: why not just keep extending HTML? As the Web has evolved, developers have found it necessary to extend the HTML tag set for different tasks and functions. The problem is that HTML is not 'unilaterally extensible', and any new tag risks having ambiguous semantics. According to Flynn, "HTML is already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information." By contrast, the advantage of XML is that it allows people to create customized mark-up applications for exchanging information in their domain. In summary, he outlines the following advantages of XML over HTML:
- XML allows you to design your own document mark-up instead of being stuck with HTML. You can tailor your document to an application, and ensure that your mark-up always says what it means. For example: <date yymmdd="2002-12-31">next Monday</date>.
- Because XML's descriptive and hypertext linking abilities are so much greater than HTML's, you can make your information content richer and easier to use.
- XML allows you to provide more and better facilities for browser presentation and performance, using CSS and XSLT style sheets.
- XML removes many of SGML's underlying complexities in favor of a more flexible model. This means writing programs to handle XML is much easier than doing the same for full SGML.
- Using XML makes information more accessible and re-usable. That's because XML's more flexible mark-up can be used by any XML software, instead of being restricted to certain manufacturers, as with HTML.
- Valid XML files can be used outside the Web as well, in existing document-processing environments, because they're still SGML.
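The date example in the first point of the list above can be read back by a program in a few lines. This is a minimal sketch using Python's standard library, assuming the attribute value follows the year-month-day pattern shown in that example:

```python
import xml.etree.ElementTree as ET
from datetime import date

element = ET.fromstring('<date yymmdd="2002-12-31">next Monday</date>')

# The element text is written for people...
display_text = element.text
# ...while the attribute carries an unambiguous value for programs.
machine_date = date.fromisoformat(element.get("yymmdd"))

print(display_text)
print(machine_date.year, machine_date.month, machine_date.day)
```

A reader sees the informal phrase, while software sorting or filtering by date works from the attribute.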
The Future of HTML
Many people feel that HTML has reached the pinnacle of its ability to describe information, and many new applications need a more flexible format. But, according to online experts like Dan Connolly and Adam Rifkin, "It will not be an either-or choice between HTML and XML; you do not have to plan for a Flag Day when your shop stops using HTML and starts using XML." (www.xml.com/pub/a/w3j/)
The reality is that as HTML tools evolve to support the whole range of XML, choices will expand with them. And just as the value to information providers is becoming evident, the cost of generic mark-up is going down because XML is considerably simpler than SGML.
How does XML work?
An XML application typically involves three basic document types: a main XML data file containing the marked-up data, and two support files. The raw data is contained in the main file, structural information in the second, and presentation instructions in the third. Here's a brief description of each document type:
The XML data file stores the marked-up information. For example, a purchase order document would contain information such as the customer number, name, and the purchase order line items that include the quantity ordered, item name and price.
An XML file is a simple text file - just like HTML - and you can usually make sense of it just by viewing it as a text file in WordPad. In XML - From Bytes to Characters, author Bert Bos says, "The structure of an XML file is so simple that you can write one 'by hand'--that is, with any editor or word processor that can write text files."
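To make the purchase order example concrete, here is one possible data file, embedded as a string and read with Python's standard library. The element and attribute names (purchaseOrder, customer, item, number, quantity, price) and the figures are assumptions made for this sketch, not part of any standard.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A hypothetical purchase order: customer number and name, plus line items.
PURCHASE_ORDER = """
<purchaseOrder>
  <customer number="1001">Jane Doe</customer>
  <item quantity="2" price="4.50">Stapler</item>
  <item quantity="10" price="1.25">Notepad</item>
</purchaseOrder>
"""

order = ET.fromstring(PURCHASE_ORDER)
customer = order.find("customer")

# Each line item becomes (name, quantity, unit price).
line_items = [
    (item.text, int(item.get("quantity")), Decimal(item.get("price")))
    for item in order.findall("item")
]
order_total = sum(quantity * price for _, quantity, price in line_items)

print(f"Customer {customer.get('number')}: {customer.text}")
print(f"Order total: {order_total}")
```

Decimal is used for the prices so that the currency arithmetic stays exact.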
The DTD is a document type definition that describes the structure of the data in the XML file. You can do without the DTD, but it is very useful in helping developers properly format the data and in allowing the recipient to understand the information. If, for example, everyone in the world could agree to a single DTD for universal items such as business cards or addresses, everyone could easily swap them.
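A DTD can travel inside the XML file itself. The sketch below uses a made-up business card DTD (the card, name and email elements are assumptions) and Python's standard xml.parsers.expat module to list the element types the DTD declares and compare them with the elements the document uses. Note that expat is a non-validating parser, so this is only a name check, not full DTD validation; that would need a validating parser.

```python
import xml.parsers.expat

# A hypothetical business card with its DTD in the internal subset.
CARD = """<!DOCTYPE card [
  <!ELEMENT card (name, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<card>
  <name>Jane Doe</name>
  <email>jane@example.com</email>
</card>
"""

declared = set()  # element types announced by the DTD
used = set()      # element types that occur in the document


def on_element_decl(name, model):
    declared.add(name)


def on_start_element(name, attrs):
    used.add(name)


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.StartElementHandler = on_start_element
parser.Parse(CARD, True)

undeclared = used - declared
print(sorted(declared))
print("undeclared elements:", sorted(undeclared))
```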
The XSL document, or style sheet, is optional and contains the formatting and presentation guides for the data contained in the XML data file. You can use different style sheets for displaying information in a printed format, for display on a website and for storage on a CD-ROM. You won't need a style sheet, for example, when sharing information between computers, which don't care what the data looks like. (Example: www.aicpa.org/pubs/)
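The separation of data from presentation can be sketched without any XSL at all. In the example below, two plain Python functions stand in for two style sheets applied to the same data. This is not XSL or XSLT, only an illustration of the principle, and the card markup and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One data file...
CARD = ET.fromstring("<card><name>Jane Doe</name><state>Kansas</state></card>")


# ...and two presentations of it, standing in for two style sheets.
def as_plain_text(card):
    """A print-style view of the card."""
    return f"{card.findtext('name')} ({card.findtext('state')})"


def as_html(card):
    """A Web-style view of the same card."""
    name = escape(card.findtext("name"))
    state = escape(card.findtext("state"))
    return f"<p><b>{name}</b>, {state}</p>"


print_view = as_plain_text(CARD)
web_view = as_html(CARD)

print(print_view)
print(web_view)
```

The data file never changes. Only the presentation layer differs, which is why a style sheet can be dropped entirely when the recipient is another program.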
How does an XML file function?
An XML file is like HTML in that the code does not "execute." The code is parsed (by a program that can parse XML code), and the document can then be rendered much as an HTML browser renders a page. Several free parsers are available on the Web: take a look at xml.apache.org, www.xml.com and www.xmlhack.com.
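What a parser does can be seen with the one that ships in Python's standard library, which is simply one parser among many. The sketch below, using a made-up note document, walks the elements of a well-formed file and shows that a file with a missing closing tag is rejected rather than guessed at:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Jane</to></note>"
bad = "<note><to>Jane</note>"  # the <to> element is never closed

# Parsing yields a tree of elements; nothing in the file is "run".
tags = [element.tag for element in ET.fromstring(good).iter()]

print(tags)
print(is_well_formed(good), is_well_formed(bad))
```

Unlike the forgiving behaviour of HTML browsers, an XML parser stops at the first well-formedness error.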
There has been much debate about the significance of XML for the future of Web development. In XML, Java and the future of the Web, author Jon Bosak says, "The extraordinary growth of the World Wide Web has been fuelled by the ability it gives authors to easily and cheaply distribute electronic documents." He also writes that as documents have grown in size and complexity, the limitations of a medium that does not provide the extensibility, structure, and data-checking needed for large-scale commercial publishing have become plain.
According to Bosak, with the growing need for applications that require functionality beyond HTML, the World Wide Web Consortium has developed XML to address the requirements of commercial Web publishing and to enable the further expansion of Web technology into new domains of distributed document processing.
Bert Bos in XML, from Bytes to Characters, states the World Wide Web Consortium "sees its mission as leading the evolution of the Web." And since nobody owns or controls XML, he says it's feeding that evolution as "the Web itself is becoming a kind of cyborg intelligence: human and machine, harnessed together to generate and manipulate information."
He continues this interesting viewpoint:
"If automatability is to be a human right, then machine assistance must eliminate the drudge work; and the shift from structural HTML mark-up to semantic XML mark-up has become, as Douglas Adams (Hitchhiker's Guide to the Galaxy) says, a critical phase in the struggle from information space into a universal knowledge network."
Bosak says the applications that are currently driving XML's acceptance are those that go beyond HTML's limitations. He says that these fall into four categories of applications that:
1. require the Web client to mediate between heterogeneous databases;
2. attempt to distribute a significant proportion of the processing load from the Web server to the Web client;
3. require the Web client to present different views of the same data to different users; and
4. require intelligent Web agents to tailor information discovery to the needs of individual users.
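The fourth category, an agent that tailors information discovery to an individual user, can be sketched as a filter over tagged data. In the example below, the program guide, its attribute names (subject, language, length) and the preferences dictionary are all hypothetical. The sketch only shows why descriptive tags make this kind of selection straightforward.

```python
import xml.etree.ElementTree as ET

# A hypothetical listing in which every item is described by attributes.
GUIDE = """
<guide>
  <program subject="science" language="en" length="60">Deep Sky</program>
  <program subject="sports" language="en" length="120">Cup Final</program>
  <program subject="science" language="fr" length="45">Le Cosmos</program>
</guide>
"""

# The user's preferences, stated in the same vocabulary as the tags.
preferences = {"subject": "science", "language": "en"}


def matches(program, prefs):
    """True if the program satisfies every stated preference."""
    return all(program.get(key) == value for key, value in prefs.items())


selected = [
    program.text
    for program in ET.fromstring(GUIDE).iter("program")
    if matches(program, preferences)
]

print(selected)
```

Because the preferences and the listings use the same attribute names, the agent needs no knowledge of any particular provider's page layout.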
Bosak predicts a future domain for XML applications when intelligent Web agents begin to make larger demands for structured data than can easily be conveyed by HTML. "Perhaps the earliest applications in this category will be those in which user preferences must be represented in a standard way to mass media providers."
Bosak says this need would arise with the advent of customized newspapers or even personalized TV guides. Such a guide working across the spectrum of providers would require that the user's characteristics be specified in a standard, vendor-independent manner, and also that the programs be described in a way that allows agents to intelligently select the ones most likely to be of interest to the user.
"This second requirement can be met only by a standardized system that uses many specialized tags to convey specific attributes of a particular program offering (subject category, audience category, leading actors, length, date made, critical rating, specialized content, language, etc.)." Such applications aren't with us just yet, but it's plain to see that they will play an increasingly important role in our lives and, according to Bosak, that "their implementation will require XML-like data in order to function inter-operably, thereby allowing intelligent Web agents to compete effectively in an open market."