XML 1.0 to 1.1 and back? [updated 03/04/2008]

November 24th, 2007 by Matthew Ross

XML 1.1 became a W3C Recommendation in early 2004. However, XML 1.1 was controversial, and very few XML 1.1 documents exist in the wild. The W3C is now considering back-porting the naming philosophy from XML 1.1 as an update to XML 1.0 and, if that update passes, expects to deprecate XML 1.1.

[updated 03/04/08] - Since this article was written in November 2007, the W3C has released a Proposed Edited Recommendation to XML 1.0 in which erratum E09 relaxes the restrictions on element names, as explained below.

XML 1.0

The XML 1.0 Recommendation of 1998 was based on Unicode 2.0, the version of that standard current at the time. XML 1.0 restricts element and attribute names (the markup) to a fixed list of Unicode 2.0 characters.

However, the Unicode standard has evolved and is now at version 5, with several thousand new characters. These new characters cover scripts and languages including Ethiopic, Cherokee, Canadian Syllabics, Khmer, Mongolian, Yi, Philippine, New Tai Lue, Buginese, Syloti Nagri, N’Ko, Amharic, ancient Cypriot, Burmese, Cambodian and other minority languages.

These new Unicode characters are already valid in the content of an XML 1.0 document, but not in element and attribute names.
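The difference between content and names can be probed with a short Python sketch. It uses the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample documents are illustrative assumptions, not part of any specification. The first document puts an Ethiopic word in the content, which any XML 1.0 parser should accept. The second uses the same word as an element name, and whether that is accepted depends on which edition of XML 1.0 the underlying parser implements, so no result is claimed for it here.

```python
import xml.etree.ElementTree as ET

# An Ethiopic word, written with escapes so the source file stays ASCII.
ETHIOPIC_WORD = "\u1230\u120b\u121d"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Post-Unicode-2.0 characters in the content: fine under XML 1.0.
content_doc = f"<greeting>{ETHIOPIC_WORD}</greeting>"

# The same characters in an element name: parser-dependent.
name_doc = f"<{ETHIOPIC_WORD}>hello</{ETHIOPIC_WORD}>"

print("content:", is_well_formed(content_doc))
print("name:   ", is_well_formed(name_doc))
```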

XML 1.1

XML 1.1 changes the naming philosophy to permit any character not expressly forbidden. To quote the XML 1.1 Recommendation:

Whereas XML 1.0 provided a rigid definition of names, wherein
everything that was not permitted was forbidden, XML 1.1 names are
designed so that everything that is not forbidden (for a specific
reason) is permitted. Since Unicode will continue to grow past
version 4.0, further changes to XML can be avoided by allowing
almost any character, including those not yet assigned, in names.
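As a rough illustration of this "permitted unless forbidden" approach, the sketch below checks a candidate name against the NameStartChar and NameChar ranges of the XML 1.1 Recommendation. The function and constant names are hypothetical, and the code ignores further constraints such as namespace rules and reserved names beginning with "xml".

```python
# Code point ranges from the NameStartChar production of XML 1.1.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters allowed after the first one (NameChar):
# "-", ".", digits, middle dot, combining marks, undertie characters.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml11_name(name: str) -> bool:
    """Return True if name matches the XML 1.1 Name production."""
    if not name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])


# An Ethiopic name passes because its code points fall inside a broad
# permitted range, not because each character was listed individually.
print(is_xml11_name("\u1230\u120b\u121d"))
print(is_xml11_name("1abc"))
```

The ranges are deliberately broad blocks of code points rather than a list of individual characters, which is what allows characters added in later Unicode versions to be used in names without revising the rules.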

XML 1.1 also introduced a number of other small but disruptive changes, which proved controversial, including:

  • adding the “NEL” end-of-line character used only on IBM mainframes
  • requiring that the C1 control characters be escaped.
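The two changes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a conforming implementation: the function names are hypothetical, the line-end patterns follow the end-of-line handling described in the XML 1.1 Recommendation (which also treats U+2028 as a line end), and the escaping helper simply rewrites raw C1 controls other than NEL as numeric character references.

```python
import re

# Line-end sequences that XML 1.1 normalizes to a single line feed.
# Two-character sequences come first so they are matched as a unit.
_XML11_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

# C1 control characters (U+0080 to U+009F) except NEL (U+0085).
_C1_EXCEPT_NEL = re.compile("[\x80-\x84\x86-\x9f]")


def normalize_line_ends_xml11(text: str) -> str:
    """Translate XML 1.1 line-end sequences, including NEL, to '\\n'."""
    return _XML11_LINE_END.sub("\n", text)


def escape_c1_controls(text: str) -> str:
    """Rewrite raw C1 controls (other than NEL) as character references."""
    return _C1_EXCEPT_NEL.sub(lambda m: f"&#x{ord(m.group()):X};", text)


sample = "line one\x85line two\r\nline three"
print(repr(normalize_line_ends_xml11(sample)))
print(escape_c1_controls("before\x80after"))
```

An XML 1.0 processor applies neither rule, so the same bytes can be read differently depending on the version declared, which is part of why the change was disruptive.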

The sum effect of the changes in XML 1.1 means that it is neither forwards nor backwards compatible. The intricacies of these issues are well covered elsewhere, including Chapter 3 of Elliotte Rusty Harold’s Effective XML.

Because the use cases that justify sending an XML 1.1 document are relatively few, a cycle has ensued where the widely distributed XML parsers do not support 1.1 because the parser authors believe that few document authors will use it, and authors therefore avoid writing XML 1.1 because of concerns over parser support.

New Zealand context

The Maori macronised characters were contained in Unicode 2.0 and therefore in the original XML 1.0 specification. XML 1.1 offers no advantages in the use of Maori macronised characters.

Back to 1.0?

The W3C XML Core Working Group appears to be considering what is effectively a back-port: taking the new element and attribute naming philosophy from XML 1.1 and introducing it as a revision to XML 1.0.

To paraphrase Paul Grosso, this would provide the major end user benefit currently achievable only by using XML 1.1.

Further resources:

http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=f2cab0fb-fd13-4914-949f-580e8d9ed170

