Annotated edit history of WellFormed version 5, including all changes.
Rev Author # Line
5 AristotlePagaltzis 1 An [XML] document is WellFormed if
2 * all tags are closed
3 * they nest correctly
4 * all attribute values are quoted
5 * no invalid characters appear
6 * and a few more minor criteria are met.
1 StuartYeates 7
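As a rough illustration of the criteria above, the following sketch asks a non-validating XML parser whether a string parses at all. It assumes Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical samples, one per criterion in the list above.
samples = {
    "unclosed tag": "<doc><p>text</doc>",
    "bad nesting": "<doc><em><strong>text</em></strong></doc>",
    "unquoted attribute": "<doc lang=en>text</doc>",
    "invalid character": "<doc>\x01</doc>",
    "well-formed": '<doc lang="en"><p>text</p></doc>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the last sample should be reported as WellFormed. The parser checks nothing beyond well-formedness here, since no [DTD] or [Schema] is involved.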
5 AristotlePagaltzis 8 This is not WellFormed:
9
10 __<p>__A paragraph __<em><strong>__here__</em></strong>__.
11 __<p>__And another one there.
1 StuartYeates 12
2 AristotlePagaltzis 13 The paragraph tags are not closed, and the nesting for the emphasis and strong tags is wrong. To be WellFormed, this piece of the document has to be written like so:
1 StuartYeates 14
5 AristotlePagaltzis 15 __<p>__A paragraph __<strong><em>__here__</em></strong>__.__</p>__
16 __<p>__And another one there.__</p>__
1 StuartYeates 17
5 AristotlePagaltzis 18 In the first fragment, neither __<p>__ tag is closed. Unlike [SGML], [XML] does not allow tags to be closed automatically when the enclosing tag is closed. This is why the __<p>__ tag in [HTML]/[XHTML] gives people grief: in [HTML] you only need to put in the opening tag, while in [XHTML] you need to put in both the opening and the closing tag.
1 StuartYeates 19
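The difference in strictness can be seen by feeding the same unclosed paragraphs to a lenient [HTML] tokenizer and to an [XML] parser. This is only a sketch using Python's standard `html.parser` and `xml.etree.ElementTree` modules; the `TagCounter` class and the `<body>` wrapper are assumptions made for the example, and `html.parser` merely reports tags without building a document tree.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class TagCounter(HTMLParser):
    """Count opening and closing tags without complaining about either."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        self.opened += 1

    def handle_endtag(self, tag):
        self.closed += 1

fragment = "<p>A paragraph here.\n<p>And another one there."

# The HTML tokenizer accepts the unclosed paragraphs silently.
counter = TagCounter()
counter.feed(fragment)
counter.close()
print("HTML tags opened/closed:", counter.opened, counter.closed)

# The XML parser rejects the same text (wrapped in a root element).
try:
    ET.fromstring(f"<body>{fragment}</body>")
    xml_ok = True
except ET.ParseError as err:
    xml_ok = False
    print("XML parser refused it:", err)
```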
5 AristotlePagaltzis 20 There is no way to directly express non-nesting overlapping ranges in [XML]. While [SGML]'s [CONCUR] feature is a solution, it appears never to have been implemented, and has not been included in [XML].
1 StuartYeates 21
5 AristotlePagaltzis 22 If you need to express such ranges, one way might be something like
1 StuartYeates 23
5 AristotlePagaltzis 24 __<foo type="a" partof="1">__ ... __</foo>__
25 __<foo type="a" partof="1 2">__ ... __</foo>__
26 __<foo type="b" partof="1">__ ... __</foo>__
2 AristotlePagaltzis 27
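A minimal sketch of how the overlapping ranges could be rebuilt from such fragments follows. It assumes that `partof` holds a space-separated list of range ids (an element cannot repeat the same attribute name and stay WellFormed); the `rebuild_ranges` helper, the `<doc>` wrapper and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def rebuild_ranges(xml_text: str) -> dict:
    """Map each range id to the concatenated text of its fragments."""
    root = ET.fromstring(xml_text)
    ranges = defaultdict(list)
    for piece in root:
        # Assumption: partof is a space-separated list of range ids.
        for range_id in piece.get("partof", "").split():
            ranges[range_id].append(piece.text or "")
    return {rid: "".join(parts) for rid, parts in ranges.items()}

# Hypothetical document: range 1 and range 2 overlap on "two ".
doc = (
    "<doc>"
    '<foo partof="1">one </foo>'
    '<foo partof="1 2">two </foo>'
    '<foo partof="2">three</foo>'
    "</doc>"
)

print(rebuild_ranges(doc))
```

The middle fragment belongs to both ranges, which is what lets two overlapping spans live inside a strictly nested tree.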
5 AristotlePagaltzis 28 If you are certain that this is too bulky (especially when you know you have a very large number of overlapping structures), a flat alternative might look like
2 AristotlePagaltzis 29
5 AristotlePagaltzis 30 __<a id="1"/>__ ... __<b id="2"/>__ ... __<a id="1"/>__ ... __<b id="2"/>__
2 AristotlePagaltzis 31
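The flat form can be turned back into ranges by pairing the empty marker elements. The sketch below assumes that the first occurrence of a given tag and `id` opens a range and the second closes it, and that all markers are direct children of one root element; the `milestone_ranges` helper and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def milestone_ranges(xml_text: str):
    """Return the plain text and a {(tag, id): (start, end)} offset map."""
    root = ET.fromstring(xml_text)
    pieces = [root.text or ""]
    offset = len(pieces[0])
    open_at = {}   # markers seen once, waiting for their partner
    ranges = {}
    for marker in root:
        key = (marker.tag, marker.get("id"))
        if key in open_at:
            # Second occurrence: the range ends at the current offset.
            ranges[key] = (open_at.pop(key), offset)
        else:
            # First occurrence: the range starts at the current offset.
            open_at[key] = offset
        tail = marker.tail or ""   # text following the empty element
        pieces.append(tail)
        offset += len(tail)
    return "".join(pieces), ranges

# Hypothetical document with two overlapping ranges.
doc = '<doc>one <a id="1"/>two <b id="2"/>three<a id="1"/> four<b id="2"/> five</doc>'

text, ranges = milestone_ranges(doc)
for key, (start, end) in ranges.items():
    print(key, repr(text[start:end]))
```

Because the markers carry only positions, the text itself stays a single flat string, which is what makes diff-like comparisons simpler than on a tree.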
5 AristotlePagaltzis 32 You can reconstruct the tags as necessary from either form. The flattened form allows you to apply some kinds of diff-like reasoning that are much more difficult on trees, but the structured form is generally easier to process by machine, e.g. using [XSLT].
1 StuartYeates 33
5 AristotlePagaltzis 34 WellFormed [XML] differs from [Valid] [XML] in that [Valid] [XML] is not only WellFormed, but also has been (or could be) checked against a [Schema] or [DTD].