An XML document is WellFormed if it parses correctly. Mainly this means that all tags must be closed, that they must be nested correctly, and that all attribute values must be quoted. This fragment is not well-formed:
<p>A <em>paragraph <strong>here</em></strong>. <p>And another one there.
The paragraph tags are not closed, and the nesting of the emphasis and strong tags is wrong. To be WellFormed, this piece of the document has to be written like so:
<p>A <em>paragraph <strong>here</strong></em>.</p> <p>And another one there.</p>
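
One way to try this out is to hand a fragment to a parser and see whether it complains. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module with simplified versions of the fragments. The helper name `is_well_formed` and the `<body>` wrapper element are assumptions made for the illustration (an XML document needs exactly one root element, so the fragment is wrapped before parsing).

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in one root element."""
    try:
        # Hypothetical <body> wrapper: a document must have a single root.
        ET.fromstring("<body>" + fragment + "</body>")
        return True
    except ET.ParseError:
        return False

# Paragraph tags left open: not well-formed.
broken = "<p>A paragraph <strong>here</strong>. <p>And another one there."
# Every tag closed and properly nested: well-formed.
fixed = "<p>A paragraph here.</p> <p>And another one there.</p>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
```

The parser only reports whether the text is well-formed. It says nothing about whether the elements make sense for any particular document type.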

In the second fragment, neither the <b> nor the <c> tag is closed. Unlike SGML, XML does not allow tags to be closed automatically when the enclosing tag is closed. This is why the <p> tag gives people grief when they move from HTML to XHTML: in HTML you only need to put in the opening tag, while in XHTML you need both the opening and the closing tag.

If you want overlapping, non-nesting ranges, you cannot use something like

<a> ... <b> ... </a> ... </b>

but should use something like

<a id="1"/> ... <b id="2"/> ... <a id="1"/> ... <b id="2"/>

instead, and then you can reconstruct either of the tags as necessary.
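
As a rough sketch of that reconstruction, assume each range is delimited by two empty elements with the same name and `id`, all direct children of one parent. The function name `milestone_ranges` and the `<doc>` wrapper are made up for this example. It is written in Python with the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

def milestone_ranges(xml_text):
    """Return {(tag, id): text} for ranges delimited by paired empty elements."""
    root = ET.fromstring(xml_text)
    open_ranges = {}  # (tag, id) -> text pieces collected so far
    finished = {}     # (tag, id) -> complete text of the range
    for marker in root:
        key = (marker.tag, marker.get("id"))
        if key in open_ranges:
            # Second occurrence of the marker closes the range.
            finished[key] = "".join(open_ranges.pop(key))
        else:
            # First occurrence opens it.
            open_ranges[key] = []
        # marker.tail is the text between this marker and the next one;
        # it belongs to every range that is currently open.
        for pieces in open_ranges.values():
            pieces.append(marker.tail or "")
    return finished

doc = '<doc><a id="1"/>one <b id="2"/>two <a id="1"/>three <b id="2"/>four</doc>'
ranges = milestone_ranges(doc)
print(ranges[("a", "1")])
print(ranges[("b", "2")])
```

Here the `a` range covers "one two " and the `b` range covers "two three ", so the overlap survives even though the document itself contains no overlapping elements.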

I disagree strongly with that practice. You are undercutting the purpose of XML by flattening the markup. Instead, you should use attributes to your advantage.

<foo type="a" partof="1"> ... </foo> <foo type="a" partof="1 2"> ... </foo> <foo type="b" partof="1"> ... </foo>

Yes, it's bulkier. Try processing markup written with both approaches using XSLT and you'll notice the difference.
--AristotlePagaltzis
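
For comparison, here is a minimal sketch of reading the attribute-based markup back into ranges. It assumes that a segment belonging to several ranges lists their ids in one space-separated `partof` value, since XML does not allow an attribute name to be repeated on an element. The function name `ranges_from_attributes`, the `<doc>` wrapper, and the sample text are hypothetical, and the sketch is in Python rather than XSLT.

```python
import xml.etree.ElementTree as ET

def ranges_from_attributes(xml_text):
    """Return {range_id: text} by concatenating the segments that name each range."""
    root = ET.fromstring(xml_text)
    ranges = {}
    for segment in root.iter("foo"):
        # Assumed convention: partof holds a space-separated list of range ids.
        for range_id in segment.get("partof", "").split():
            ranges[range_id] = ranges.get(range_id, "") + (segment.text or "")
    return ranges

doc = ('<doc><foo partof="1">one </foo>'
       '<foo partof="1 2">two </foo>'
       '<foo partof="2">three </foo></doc>')
result = ranges_from_attributes(doc)
print(result)
```

Each segment carries its own content, so no bookkeeping of open and closed markers is needed.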

WellFormed XML differs from Valid XML in that Valid XML has been (or could be) checked against a Schema or DTD.
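
The difference shows up with a non-validating parser such as the one behind Python's `xml.etree.ElementTree`: it checks well-formedness only. In the sketch below, the `note`, `to` and `from` element names are invented for the example. The document contradicts its own DTD, which declares that `note` contains exactly one `to`, and yet it still parses, because it is well-formed.

```python
import xml.etree.ElementTree as ET

# Well-formed but not valid: the DTD allows only <to> inside <note>,
# while the document uses an undeclared <from> element instead.
doc = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>someone</from></note>"""

root = ET.fromstring(doc)  # no error: the parser does not validate
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```

Catching the mismatch would take a validating parser, which would check the document against the DTD or schema in addition to parsing it.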