!!!XMLWF

* NAME
* SYNOPSIS
* DESCRIPTION
* WELL-FORMED DOCUMENTS
* OPTIONS
* OUTPUT
* BUGS
* ALTERNATIVES
* SEE ALSO
* AUTHOR
----

!!NAME

xmlwf -- Determines if an XML document is well-formed

!!SYNOPSIS

__xmlwf__ [[__-s__] [[__-n__] [[__-p__] [[__-x__] [[__-e__ ''encoding''] [[__-w__] [[__-d__ ''output-dir''] [[__-c__] [[__-m__] [[__-r__] [[__-t__] [[''file'' ...]
!!DESCRIPTION

__xmlwf__ uses the Expat library to determine if an XML document is well-formed. It is non-validating.

If you do not specify any files on the command line, and you have a recent version of xmlwf, the input file will be read from STDIN.
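
The check xmlwf performs can be sketched with Python's xml.parsers.expat module, which wraps the same Expat C library. This is not xmlwf's own code. The function name check_well_formed and the "name:line:column: message" layout are illustrative assumptions; the sketch only mirrors the behavior described above: print nothing for a well-formed document, print one line otherwise, and read STDIN when no files are given.

```python
import sys
import xml.parsers.expat
from typing import List, Optional


def check_well_formed(data: bytes, name: str = "STDIN") -> Optional[str]:
    """Return None if `data` is well-formed XML, else a one-line error message."""
    # A fresh, non-validating Expat parser with no handlers. Only success or
    # failure of the parse matters here, not the document's contents.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final (and only) chunk
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is a 0-based column.
        return f"{name}:{err.lineno}:{err.offset}: {xml.parsers.expat.ErrorString(err.code)}"
    return None


def main(argv: List[str]) -> None:
    # With no file arguments, read the document from standard input.
    if not argv:
        msg = check_well_formed(sys.stdin.buffer.read())
        if msg:
            print(msg)
        return
    for path in argv:
        with open(path, "rb") as f:
            msg = check_well_formed(f.read(), path)
        if msg:
            print(msg)
```

A script entry point would call main(sys.argv[1:]).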

!!WELL-FORMED DOCUMENTS

A well-formed document must adhere to the following rules:

* The file begins with an XML declaration. For instance, __<?xml version="1.0" standalone="yes"?>__. ''NOTE:'' xmlwf does not currently check for a valid XML declaration.
* Every start tag is either empty (__<tag/>__) or has a corresponding end tag.
* There is exactly one root element. This element must contain all other elements in the document. Only comments, white space, and processing instructions may come after the close of the root element.
* All elements nest properly.
* All attribute values are enclosed in quotes (either single or double).
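
The sketch below runs Expat, through Python's xml.parsers.expat binding, on one small document for each rule. It shows which documents Expat rejects. The sample documents are made up for illustration. Note the first case: a document with no XML declaration is accepted, which matches the NOTE above.

```python
import xml.parsers.expat


def is_well_formed(data: bytes) -> bool:
    """True if Expat parses `data` without error (no validation is done)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# (description, document, expected result), one case per rule above.
CASES = [
    ("no XML declaration", b"<r/>", True),  # accepted, see the NOTE
    ("start tag without end tag", b"<r><p></r>", False),
    ("two root elements", b"<r/><s/>", False),
    ("comment, PI and space after root", b"<r/><!-- ok --><?pi ok?>\n", True),
    ("improper nesting", b"<r><a><b></a></b></r>", False),
    ("unquoted attribute value", b"<r a=1/>", False),
]

for label, doc, expected in CASES:
    result = is_well_formed(doc)
    print(f"{label}: {'well-formed' if result else 'NOT well-formed'}")
```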

If the document has a DTD, and it strictly complies with that DTD, then the document is also considered ''valid''. xmlwf is a non-validating parser: it does not check the DTD. However, it does support external entities (see the -x option).
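
The difference between well-formed and valid can be seen with a document that breaks its own DTD. In the sketch below, the document declares `r` as EMPTY and then gives it content. Expat, used here through Python's xml.parsers.expat binding, parses it without complaint because it never checks the DTD. The sample document is made up for illustration.

```python
import xml.parsers.expat

# The internal subset declares <r> as EMPTY and never declares <extra>, yet
# the instance puts an undeclared element with text inside <r>. A validating
# parser would reject this; Expat only checks well-formedness.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE r [
  <!ELEMENT r EMPTY>
]>
<r><extra>text</extra></r>
"""

parser = xml.parsers.expat.ParserCreate()
parser.Parse(DOC, True)  # would raise ExpatError only for well-formedness errors
accepted = True
print("accepted: well-formed, validity not checked")
```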

!!OPTIONS

When an option includes an argument, you may specify the argument either separately ("-d output") or concatenated with the option ("-doutput"). xmlwf supports both.
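
The two argument styles follow the usual POSIX short-option convention. Python's standard getopt module implements that convention, so it can illustrate the idea. This is not xmlwf's own argument parser. The option string is a hypothetical reconstruction from the SYNOPSIS, with -e and -d marked as taking arguments.

```python
import getopt

# Hypothetical option string built from the SYNOPSIS: plain flags, plus
# -e and -d, which take an argument (marked by the trailing colon).
OPTSTRING = "snpxwcmrte:d:"

separate, _ = getopt.getopt(["-d", "output", "doc.xml"], OPTSTRING)
joined, files = getopt.getopt(["-doutput", "doc.xml"], OPTSTRING)
print(separate)  # [('-d', 'output')]
print(joined)    # [('-d', 'output')]
print(files)     # ['doc.xml']
```

One difference: getopt treats -- as the end of the options, whereas this page reports that xmlwf ignores it.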

__-c__ If the input file is well-formed and xmlwf doesn't encounter any errors, the input file is simply copied to the output directory unchanged. This implies no namespaces (turns off -n) and requires -d to specify an output directory.
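
The -c behavior amounts to "parse, and copy only on success." Below is a minimal sketch of that idea in Python, assuming the copy keeps the input's file name (as -d describes). The function name copy_if_well_formed is hypothetical, and this is not xmlwf's implementation.

```python
import os
import shutil
import tempfile
import xml.parsers.expat


def copy_if_well_formed(path: str, output_dir: str) -> bool:
    """Copy `path` unchanged into `output_dir` only if it parses without error."""
    # No namespace_separator is passed, so namespace processing stays off.
    parser = xml.parsers.expat.ParserCreate()
    with open(path, "rb") as f:
        try:
            parser.ParseFile(f)
        except xml.parsers.expat.ExpatError:
            return False
    # The copy keeps the input's file name; shutil.copyfile raises
    # SameFileError if output_dir is the input's own directory.
    shutil.copyfile(path, os.path.join(output_dir, os.path.basename(path)))
    return True


# Usage: one good file and one truncated file, copied into a separate directory.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, body in [("good.xml", b"<r/>"), ("bad.xml", b"<r>")]:
        with open(os.path.join(src, name), "wb") as f:
            f.write(body)
    results = {name: copy_if_well_formed(os.path.join(src, name), dst)
               for name in ("good.xml", "bad.xml")}
    copied = sorted(os.listdir(dst))

print(results, copied)  # {'good.xml': True, 'bad.xml': False} ['good.xml']
```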

__-d output-dir__

Specifies a directory to contain transformed representations of the input files. By default, -d outputs a canonical representation (described below). You can select different output formats using -c and -m.

The output filenames will be exactly the same as the input filenames or "STDIN" if the input is coming from STDIN. Therefore, you must be careful that the output file does not go into the same directory as the input file. Otherwise, xmlwf will delete the input file before it generates the output file (just like running "cat < file > file" in most shells).

Two structurally equivalent XML documents have a byte-for-byte identical canonical XML representation. Note that ignorable white space is considered significant and is treated equivalently to data. See http://www.jclark.com/xml/canonxml.html for more on canonical XML.
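
Here is a rough sketch of canonical output, written against Python's xml.parsers.expat binding. It is not xmlwf's code. It follows the canonical XML rules as recalled from the page linked above: attributes sorted by name, empty elements written as start and end tags, comments, the XML declaration and the DTD dropped, and &, <, >, ", tab, LF and CR escaped. Treat the exact rule set as an assumption and check it against that page. The sketch shows two differently written but equivalent documents producing the same bytes.

```python
import xml.parsers.expat
from typing import Dict, List


def _escape(text: str) -> str:
    # Escapes for character data and attribute values (assumed rule set).
    return (text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
            .replace('"', "&quot;").replace("\t", "&#9;")
            .replace("\n", "&#10;").replace("\r", "&#13;"))


def canonicalize(data: bytes) -> str:
    out: List[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name: str, attrs: Dict[str, str]) -> None:
        # Attributes in sorted order, one space before each, double-quoted.
        attr_text = "".join(f' {k}="{_escape(v)}"' for k, v in sorted(attrs.items()))
        out.append(f"<{name}{attr_text}>")

    def end(name: str) -> None:
        # Empty elements come out as a start tag followed by an end tag.
        out.append(f"</{name}>")

    def pi(target: str, pi_data: str) -> None:
        out.append(f"<?{target} {pi_data}?>")

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = lambda text: out.append(_escape(text))
    parser.ProcessingInstructionHandler = pi
    # No comment, XML-declaration or doctype handlers are set, so those
    # constructs are simply dropped from the output.
    parser.Parse(data, True)
    return "".join(out)


a = canonicalize(b"<?xml version='1.0'?>\n<doc z='1' a='2'><e/>x &amp; y<!-- c --></doc>")
b = canonicalize(b'<doc a="2"   z="1"><e></e>x &#38; y</doc>')
print(a == b, a)  # True <doc a="2" z="1"><e></e>x &amp; y</doc>
```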

__-e encoding__

Specifies the character encoding for the document, overriding any document encoding declaration. xmlwf has four built-in encodings: __US-ASCII__, __UTF-8__, __UTF-16__, and __ISO-8859-1__. Also see the -w option.
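
Expat lets the caller force an encoding when the parser is created, and that choice overrides the document's declaration. Presumably -e is built on this feature, though that is an assumption about xmlwf's internals. The Python sketch below uses a made-up document that declares UTF-8 but contains a Latin-1 byte. With the declared encoding, Expat rejects the byte. When ISO-8859-1 is forced, the document parses.

```python
import xml.parsers.expat

# The declaration claims UTF-8, but byte 0xE9 on its own is not valid UTF-8.
DOC = b'<?xml version="1.0" encoding="UTF-8"?><a>\xe9</a>'


def text_of(data: bytes, encoding=None) -> str:
    # ParserCreate's encoding argument overrides the document's declaration.
    parser = xml.parsers.expat.ParserCreate(encoding)
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


try:
    text_of(DOC)
    rejected = False
except xml.parsers.expat.ExpatError as err:
    rejected = True
    print("declared UTF-8:", err)  # the stray 0xE9 byte is an invalid token

print("forced ISO-8859-1:", text_of(DOC, "ISO-8859-1"))  # é
```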

__-m__ Outputs some strange sort of XML file that completely describes the input file, including character positions. Requires -d to specify an output directory.
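
The character positions that -m reports come from Expat, which tracks where each parse event begins. The Python sketch below records the position of every start tag. It does not reproduce -m's output format, which this page does not describe. Inside a handler, CurrentLineNumber is 1-based, CurrentColumnNumber is 0-based, and CurrentByteIndex is a 0-based byte offset.

```python
import xml.parsers.expat

DOC = b"<a>\n <b/></a>"

parser = xml.parsers.expat.ParserCreate()
positions = []


def start(name, attrs):
    # Inside a handler, the Current* attributes point at the start of the
    # markup that triggered this event.
    positions.append((name, parser.CurrentLineNumber,
                      parser.CurrentColumnNumber, parser.CurrentByteIndex))


parser.StartElementHandler = start
parser.Parse(DOC, True)
print(positions)  # [('a', 1, 0, 0), ('b', 2, 1, 5)]
```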

__-n__ Turns on namespace processing. (describe namespaces) -c disables namespaces.
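
The effect of namespace processing is easiest to see in the element names Expat reports. The Python sketch below parses the same made-up document twice. Without namespace processing, names arrive exactly as written, prefixes included. With a namespace separator set, Expat reports each name as the namespace URI, the separator, and the local name.

```python
import xml.parsers.expat

DOC = b'<r xmlns="urn:example:a"><p:c xmlns:p="urn:example:b"/></r>'


def element_names(data: bytes, namespaces: bool):
    # A namespace_separator turns on namespace processing; None leaves it off.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" " if namespaces else None)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


print(element_names(DOC, namespaces=False))  # ['r', 'p:c']
print(element_names(DOC, namespaces=True))   # ['urn:example:a r', 'urn:example:b c']
```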

__-p__ Tells xmlwf to process external DTDs and parameter entities.

Normally xmlwf never parses parameter entities. -p tells it to always parse them. -p implies -x.
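
Expat exposes the "never" and "always" parameter-entity modes directly, and the Python binding shows the difference. In this sketch, an entity is declared only in an external DTD. That DTD is served from a hypothetical in-memory FILES dictionary that stands in for the file system. In NEVER mode the DTD is not read, so the reference is skipped without error, because the document has an external subset. In ALWAYS mode the DTD is loaded through the external-entity handler and the entity expands.

```python
import xml.parsers.expat

# Hypothetical in-memory stand-in for files on disk.
FILES = {"doc.dtd": b'<!ENTITY greeting "hello">'}
DOC = b'<!DOCTYPE doc SYSTEM "doc.dtd"><doc>&greeting;</doc>'


def parse(data: bytes, param_entity_parsing: int) -> str:
    parser = xml.parsers.expat.ParserCreate()
    parser.SetParamEntityParsing(param_entity_parsing)
    text = []
    parser.CharacterDataHandler = text.append

    def load_external(context, base, system_id, public_id):
        # Parse the referenced entity (here, the external DTD subset) with a
        # child parser that shares the parent's DTD. Returning 1 means success.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FILES[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(data, True)
    return "".join(text)


print(repr(parse(DOC, xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)))   # ''
print(repr(parse(DOC, xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)))  # 'hello'
```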

__-r__ Normally xmlwf memory-maps the XML file before parsing. -r turns off memory-mapping and uses normal file IO calls instead. Of course, memory-mapping is automatically turned off when reading from STDIN.
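
The two input strategies feed Expat differently, but they must give the same result. In the Python sketch below, the read_chunks function follows the -r style, using ordinary read() calls into a fixed-size buffer. The mapped function maps the whole file with mmap. Two assumptions apply. First, slicing the mapping copies it into a bytes object, which Python needs here; xmlwf's C code presumably hands the mapped pages to Expat directly. Second, the chunk size is an arbitrary choice.

```python
import mmap
import os
import tempfile
import xml.parsers.expat


def count_elements(chunks) -> int:
    # Feed byte chunks to one Expat parser, then signal the end of input.
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser.StartElementHandler = start
    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # isfinal=True lets Expat report truncated input
    return count


def read_chunks(path: str, size: int = 4096):
    # -r style: ordinary read() calls into a fixed-size buffer.
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


def mapped(path: str):
    # Default style: map the whole file. mmap rejects empty files, so a real
    # tool would need to special-case those.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        yield mm[:]  # slicing copies the mapping into a bytes object


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "doc.xml")
    with open(path, "wb") as f:
        f.write(b"<r>" + b"<i/>" * 5000 + b"</r>")
    via_read = count_elements(read_chunks(path, 1024))
    via_mmap = count_elements(mapped(path))

print(via_read, via_mmap)  # 5001 5001
```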

__-s__ Prints an error if the document is not standalone. A document is standalone if it has no external subset and no references to parameter entities.
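
Expat has a hook for exactly this check. When a document turns out not to be standalone, Expat calls a "not standalone" handler, and a handler that returns 0 turns the condition into a parse error. Below is a Python sketch with made-up sample documents. The function name passes_standalone_check is hypothetical, and it also returns False for documents that fail for any other reason.

```python
import xml.parsers.expat


def passes_standalone_check(data: bytes) -> bool:
    parser = xml.parsers.expat.ParserCreate()
    # Called when the document is not standalone; returning 0 makes Expat
    # raise a "document is not standalone" error.
    parser.NotStandaloneHandler = lambda: 0
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False  # not standalone, or not well-formed at all
    return True


print(passes_standalone_check(b"<r/>"))  # True
# A parameter entity reference in the internal subset makes it non-standalone.
print(passes_standalone_check(b'<!DOCTYPE r [<!ENTITY % p ""> %p;]><r/>'))  # False
```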

__-t__ Turns on timings. This tells Expat to parse the entire file, but not perform any processing. This gives a fairly accurate idea of the raw speed of Expat itself without client overhead. -t turns off most of the output options (-d, -m, -c, ...).
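
The idea behind -t is to time the parse with no client callbacks installed. In the Python sketch below, the same made-up document is timed twice, once with no handlers and once with Python callbacks that stand in for client work. The numbers vary by machine and include the Python binding's own overhead, so they only approximate what -t measures in C.

```python
import time
import xml.parsers.expat

DOC = b"<r>" + b'<item id="x">text</item>' * 20000 + b"</r>"


def time_parse(data: bytes, with_handlers: bool) -> float:
    parser = xml.parsers.expat.ParserCreate()
    if with_handlers:
        # Simulated client work: a Python callback for every event.
        events = []
        parser.StartElementHandler = lambda name, attrs: events.append(name)
        parser.CharacterDataHandler = events.append
    start = time.perf_counter()
    parser.Parse(data, True)
    return time.perf_counter() - start


bare = time_parse(DOC, with_handlers=False)
busy = time_parse(DOC, with_handlers=True)
print(f"no handlers: {bare:.4f}s, with handlers: {busy:.4f}s")
```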

__-w__ Enables Windows code pages. Normally, xmlwf will throw an error if it runs across an encoding that it is not equipped to handle itself. With -w, xmlwf will try to use a Windows code page. See also -e.
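
Expat handles encodings beyond its built-in set through an "unknown encoding" callback that supplies a byte-to-character table. Python's binding installs such a callback by default. It builds the table from Python's own codecs and supports single-byte encodings only. That makes it an analogue of -w, not the same mechanism. The sketch parses a made-up document declared as windows-1252.

```python
import xml.parsers.expat

# windows-1252 is not one of Expat's built-in encodings; Python's binding
# resolves it through its own codec table (single-byte encodings only).
DOC = b'<?xml version="1.0" encoding="windows-1252"?><price>\x80 5</price>'

parser = xml.parsers.expat.ParserCreate()
text = []
parser.CharacterDataHandler = text.append
parser.Parse(DOC, True)
print("".join(text))  # € 5  (0x80 is the euro sign in windows-1252)
```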

__-x__ Turns on parsing external entities.

Non-validating parsers are not required to resolve external entities, or even expand entities at all. Expat always expands internal entities (?), but external entity parsing must be enabled explicitly.

External entities are simply entities that obtain their data from outside the XML file currently being parsed.

This is an example of an internal entity:

__<!ENTITY vers '1.0.2'>__

And here are some examples of external entities:

__<!ENTITY header SYSTEM "header-&vers;.xml">__ (parsed)

__<!ENTITY logo SYSTEM "logo.png" NDATA PNG>__ (unparsed)
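
The difference between internal and external entities shows up clearly with Expat's external-entity hook. The Python sketch below declares one entity of each kind, then serves the external file from a hypothetical in-memory FILES dictionary. The internal entity always expands. The external one contributes text only when a handler is installed to load it, which is roughly what -x switches on.

```python
import xml.parsers.expat

# Hypothetical in-memory stand-in for the file the external entity names.
FILES = {"header.xml": b"<h>Header text</h>"}
DOC = b"""<!DOCTYPE doc [
  <!ENTITY vers '1.0.2'>
  <!ENTITY header SYSTEM "header.xml">
]>
<doc>v&vers;: &header;</doc>"""


def parse(data: bytes, load_external: bool) -> str:
    parser = xml.parsers.expat.ParserCreate()
    text = []
    parser.CharacterDataHandler = text.append
    if load_external:
        def resolve(context, base, system_id, public_id):
            # Parse the entity's contents with a child parser, which inherits
            # the parent's handlers. Returning 1 reports success.
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(FILES[system_id], True)
            return 1
        parser.ExternalEntityRefHandler = resolve
    parser.Parse(data, True)
    return "".join(text)


print(repr(parse(DOC, load_external=False)))  # 'v1.0.2: '
print(repr(parse(DOC, load_external=True)))   # 'v1.0.2: Header text'
```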

__--__ For some reason, xmlwf specifically ignores __--__ anywhere it appears on the command line.

Older versions of xmlwf do not support reading from STDIN.

!!OUTPUT

If an input file is not well-formed, xmlwf outputs a single line describing the problem to STDOUT. If a file is well-formed, xmlwf outputs nothing. Note that the result code is ''not'' set.
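
Because the result code is not set, a calling program has to treat any output as failure. Here is a Python sketch of such a wrapper. It assumes an xmlwf binary on the PATH that behaves as this page describes. The function names are hypothetical, and the error line used in the usage example is made up.

```python
import subprocess


def output_means_well_formed(stdout: str) -> bool:
    # Per this page, xmlwf prints nothing for a well-formed file and a line
    # describing the problem otherwise, so any non-blank output is an error.
    return stdout.strip() == ""


def xmlwf_ok(path: str) -> bool:
    # The exit status is ignored on purpose: this page says it is not set.
    result = subprocess.run(["xmlwf", path], capture_output=True, text=True)
    return output_means_well_formed(result.stdout)
```

For example, output_means_well_formed("") is True, while output_means_well_formed("doc.xml:1:5: mismatched tag\n") is False.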

!!BUGS

According to the W3C standard, an XML file without a declaration at the beginning is not considered well-formed. However, xmlwf allows this to pass.

xmlwf returns a 0 (noerr) result, even if the file is not well-formed. There is no good way for a program to use xmlwf to quickly check a file: it must parse xmlwf's STDOUT.

The errors should go to STDERR, not STDOUT.

There should be a way to get -d to send its output to STDOUT rather than forcing the user to send it to a file.

I have no idea why anyone would want to use the -d, -c and -m options. If someone could explain it to me, I'd like to add this information to this manpage.

!!ALTERNATIVES

Here are some XML validators on the web:

* http://www.hcrc.ed.ac.uk/~richard/xml-check.html
* http://www.stg.brown.edu/service/xmlvalid/
* http://www.scripting.com/frontier5/xml/code/xmlValidator.html
* http://www.xml.com/pub/a/tools/ruwf/check.html (on a page with no fewer than 15 ads! Shame!)

!!SEE ALSO

* The Expat home page: http://expat.sourceforge.net/
* The W3C XML specification: http://www.w3.org/TR/REC-xml

!!AUTHOR

This manual page was written by Scott Bronson <bronson@rinspin.com> for the __Debian GNU/Linux__ system (but may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1.
----