
An XML document is WellFormed if

  • all tags are closed
  • they nest correctly
  • all attribute values are quoted
  • no invalid characters appear
  • and a few more minor criteria are met.
This is not WellFormed:
<p>A paragraph <em><strong>here</em></strong>. <p>And another one there.
The paragraph tags are not closed, and the nesting of the emphasis and strong tags is wrong. To be WellFormed, this piece of the document has to be written like so:
<p>A paragraph <em><strong>here</strong></em>.</p> <p>And another one there.</p>
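
A conforming XML parser must reject input that is not WellFormed, so a parser can serve as a quick check. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The is_well_formed helper and the wrapping <div> root are made up for illustration; the wrapper is there because a document needs exactly one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text`, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Unclosed <p> tags and wrongly nested <em>/<strong> tags.
broken = (
    "<div><p>A paragraph <em><strong>here</em></strong>. "
    "<p>And another one there.</div>"
)

# Same content with every tag closed and properly nested.
fixed = (
    "<div><p>A paragraph <em><strong>here</strong></em>.</p> "
    "<p>And another one there.</p></div>"
)

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```

Note that this only checks WellFormedness. It says nothing about whether the document matches a Schema or DTD.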

In the second fragment, neither the <b> nor the <c> tag is closed. Unlike SGML, XML does not allow tags to be closed automatically when the enclosing tag is closed. This is why the <p> tag in HTML/XHTML gives people grief -- in HTML you only need to write the opening tag, while in XHTML you need to write both the opening and the closing tag.
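
The difference can be seen by feeding the same fragment to a lenient HTML tokenizer and to a strict XML parser. This is only a sketch, assuming Python's standard html.parser and xml.etree.ElementTree modules; the TagCounter class and the sample fragment are hypothetical. html.parser merely reports tags as it sees them and does not build a tree, so it stands in for HTML's tolerance rather than reproducing a browser's behaviour.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCounter(HTMLParser):
    """Count opening and closing tags without complaining about either."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        self.opened += 1

    def handle_endtag(self, tag):
        self.closed += 1


# Two <p> tags are opened, neither is closed.
fragment = "<div><p>First paragraph.<p>Second paragraph.</div>"

counter = TagCounter()
counter.feed(fragment)
counter.close()
print(counter.opened, counter.closed)  # 3 1

# The XML parser rejects the same input outright.
try:
    ET.fromstring(fragment)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print(xml_ok)  # False
```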

There is no way to directly express non-nesting overlapping ranges in XML. While SGML's CONCUR feature is a solution, it appears never to have been implemented, and has not been included in XML.

If you need overlapping ranges, one way to express them might be something like

<foo type="a" partof="1"> ... </foo> <foo type="a" partof="1 2"> ... </foo> <foo type="b" partof="1"> ... </foo>

If you are certain that this is too bulky (especially when you know you have a very large number of overlapping structures), a flat alternative might look like

<a id="1"/> ... <b id="2"/> ... <a id="1"/> ... <b id="2"/>

You can reconstruct the tags as necessary from either form. The flattened form allows some kinds of diff-like reasoning that are much more difficult on trees, but the structured form is generally easier to process by machine, e.g. using XSLT.
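
As a rough illustration of reconstructing ranges from the flattened form, the sketch below treats the first occurrence of an empty marker as the start of a range and the second occurrence with the same tag and id as its end. That convention, the ranges_from_milestones function, the <doc> wrapper, and the sample text are all assumptions made for this example. It uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def ranges_from_milestones(xml_text):
    """Map each (tag, id) marker pair to the text found between the pair."""
    root = ET.fromstring(xml_text)
    open_ranges = {}  # (tag, id) -> text pieces collected so far
    finished = {}     # (tag, id) -> full text of the closed range
    for marker in root:
        key = (marker.tag, marker.get("id"))
        if key in open_ranges:
            # Second occurrence: the range ends here.
            finished[key] = "".join(open_ranges.pop(key))
        else:
            # First occurrence: the range starts here.
            open_ranges[key] = []
        # Text following this marker belongs to every range still open.
        for pieces in open_ranges.values():
            pieces.append(marker.tail or "")
    return finished


doc = (
    '<doc>one <a id="1"/>two <b id="2"/>three '
    '<a id="1"/>four <b id="2"/>five</doc>'
)
result = ranges_from_milestones(doc)
print(result)
# {('a', '1'): 'two three ', ('b', '2'): 'three four '}
```

The two recovered ranges overlap on "three ", which is exactly what nested elements cannot express directly.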

WellFormed XML differs from Valid XML in that Valid XML is not only WellFormed, but also has been (or could be) checked against a Schema or DTD.