XMLWF(F)                                                 XMLWF(F)



NAME
       xmlwf -- Determines if an XML document is well-formed

SYNOPSIS
       xmlwf  [-s]   [-n]   [-p]   [-x]  [-e encoding]  [-w]  [-d
       output-dir]  [-c]  [-m]  [-r]  [-t]  [file ...]

DESCRIPTION
       xmlwf uses the Expat library to determine if an XML  docu-
       ment is well-formed.  It is non-validating.


       If  you  do not specify any files on the command-line, and
       you have a recent version of xmlwf, the input file will be
       read from stdin.
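
       As a rough illustration of this kind of check, the sketch
       below uses Python's standard xml.parsers.expat module, which
       is a binding to the same Expat library.  It is not xmlwf's
       own code, and the name is_well_formed is hypothetical.

```python
import xml.parsers.expat as expat


def is_well_formed(data: bytes) -> bool:
    """Return True if Expat parses data without reporting an error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError:
        return False
    return True


print(is_well_formed(b"<doc><item/></doc>"))
print(is_well_formed(b"<doc><item></doc>"))
```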


WELL-FORMED DOCUMENTS
       A well-formed document must adhere to the following rules:


          o  The file should begin with an XML declaration, for
             instance <?xml version="1.0" standalone="yes"?>.
             XML 1.0 recommends the declaration but does not
             require it.  NOTE: xmlwf does not currently check
             for a valid XML declaration.


          o  Every  start  tag  is either empty (<tag/>) or has a
             corresponding end tag.


          o  There is exactly one  root  element.   This  element
             must  contain  all  other  elements in the document.
             Only comments, white space, and processing  instruc-
             tions  may come after the close of the root element.


          o  All elements nest properly.


          o  All attribute values are enclosed in quotes  (either
             single or double).
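
       The rules above can be tried out with a short Python sketch
       built on the standard xml.parsers.expat module.  The sample
       documents and the helper name parse_error are made up for
       illustration; the messages printed are Expat's, which may
       differ from what xmlwf itself prints.

```python
import xml.parsers.expat as expat


def parse_error(data: bytes):
    """Return Expat's error message for data, or None if it parses."""
    try:
        expat.ParserCreate().Parse(data, True)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None


# Hypothetical sample documents, roughly one per rule in the list.
samples = {
    "missing end tag": b"<a><b></a>",
    "two root elements": b"<a/><b/>",
    "improper nesting": b"<a><b><c></b></c></a>",
    "unquoted attribute": b"<a x=1/>",
    "comment after root (allowed)": b"<a/><!-- trailing comment -->",
}

for label, doc in samples.items():
    print(f"{label}: {parse_error(doc) or 'well-formed'}")
```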


       If  the  document has a DTD, and it strictly complies with
       that DTD, then the  document  is  also  considered  valid.
       xmlwf  is a non-validating parser -- it does not check the
       DTD.  However, it does support external entities (see  the
       -x option).
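
       To make the distinction concrete, the following Python
       sketch (using the standard xml.parsers.expat binding, not
       xmlwf itself) parses a document that is well-formed but does
       not comply with its own DTD.  The document and the name
       parses_cleanly are invented for this example.

```python
import xml.parsers.expat as expat

# The internal DTD says <a> must contain exactly one <b>, but the
# body contains <c> instead, so the document is not valid.
INVALID_BUT_WELL_FORMED = b"""<!DOCTYPE a [
  <!ELEMENT a (b)>
  <!ELEMENT b EMPTY>
]>
<a><c/></a>"""


def parses_cleanly(data: bytes) -> bool:
    """Return True if Expat raises no error while parsing data."""
    try:
        expat.ParserCreate().Parse(data, True)
    except expat.ExpatError:
        return False
    return True


print(parses_cleanly(INVALID_BUT_WELL_FORMED))
```

       A non-validating parser does not compare the content of <a>
       against the declared content model, so no error is raised.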


OPTIONS
       When an option includes an argument, you may specify the
       argument either separately ("-d output") or joined to the
       option ("-doutput").  xmlwf supports both.
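
       The two argument forms follow the usual getopt convention.
       The Python sketch below shows how both spellings yield the
       same result; the option string is taken from the SYNOPSIS,
       and nothing here is claimed to match how xmlwf parses its
       arguments internally.

```python
import getopt

# Option letters from the SYNOPSIS; a trailing ":" marks an option
# that takes an argument (-e encoding, -d output-dir).
OPTSTRING = "snpxe:wd:cmrt"


def parse_args(argv):
    """Split argv into (options dict, remaining file names)."""
    opts, files = getopt.getopt(argv, OPTSTRING)
    return dict(opts), files


print(parse_args(["-d", "output", "doc.xml"]))
print(parse_args(["-doutput", "doc.xml"]))
```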

       -c        If the input file is well-formed and xmlwf
                 doesn't encounter any errors, the input file is
                 simply copied to the output directory unchanged.
                 This implies no namespaces (turns off -n) and
                 requires -d to specify an output directory.


       -d output-dir
                 Specifies  a  directory  to  contain transformed
                 representations of the input files.  By default,
                 -d outputs a canonical representation (described
                 below).  You can select different output formats
                 using -c and -m.


                 The output filenames will be exactly the same as
                 the input filenames or "STDIN" if the  input  is
                 coming from STDIN.  Therefore, you must be care-
                 ful that the output file does not  go  into  the
                 same  directory  as  the input file.  Otherwise,
                 xmlwf will delete the input file before it  gen-
                 erates  the output file (just like running cat <
                 file > file in most shells).
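
                 A script that drives xmlwf could guard against
                 this before invoking it.  The Python sketch
                 below is one possible check; the function name
                 is hypothetical, and it assumes the output name
                 is the input file's base name placed inside the
                 output directory.

```python
import os


def output_path(input_path: str, output_dir: str) -> str:
    """Return where the output would go, refusing to clobber the input.

    Assumption: the output name is the input file's base name placed
    inside output_dir.
    """
    target = os.path.join(output_dir, os.path.basename(input_path))
    if os.path.realpath(target) == os.path.realpath(input_path):
        raise ValueError(f"output would overwrite input: {input_path}")
    return target


print(output_path("docs/a.xml", "out"))
```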


                 Two structurally equivalent XML documents have
                 a byte-for-byte identical canonical XML
                 representation.  Note that ignorable white space
                 is considered significant and is treated
                 equivalently to data.  More on canonical XML can
                 be found at
                 http://www.jclark.com/xml/canonxml.html.
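
                 The idea can be demonstrated in Python with
                 xml.etree.ElementTree.canonicalize (Python 3.8
                 or later).  Note that this function implements
                 W3C Canonical XML 2.0, which is a different
                 specification from the canonical XML format
                 that xmlwf writes, so it only illustrates the
                 principle.  The sample documents are invented.

```python
from xml.etree.ElementTree import canonicalize

# Same structure, different surface syntax: quote style, attribute
# order, and empty-element form all differ.
doc_a = "<doc b='2' a='1'><item/></doc>"
doc_b = '<doc a="1" b="2"><item></item></doc>'

# Same as doc_b except for white space inside <doc>.
doc_c = '<doc a="1" b="2"> <item></item> </doc>'

# Equivalent documents canonicalize to the same string.
print(canonicalize(doc_a) == canonicalize(doc_b))
# White space is kept as data, so doc_c canonicalizes differently.
print(canonicalize(doc_a) == canonicalize(doc_c))
```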


       -e encoding
                 Specifies the character encoding for  the  docu-
                 ment,  overriding any document encoding declara-
                 tion.  xmlwf has four  built-in  encodings:  US-
                 ASCII,  UTF-8, UTF-16, and ISO-8859-1.  Also see
                 the -w option.
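
                 Expat exposes the same kind of override to
                 programs.  The Python sketch below passes an
                 encoding to xml.parsers.expat.ParserCreate; the
                 sample bytes and the name text_of are
                 hypothetical.

```python
import xml.parsers.expat as expat


def text_of(data: bytes, encoding=None) -> str:
    """Parse data and return its character data as one string."""
    chunks = []
    parser = expat.ParserCreate(encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# 0xE9 is e-acute in ISO-8859-1 but is not valid UTF-8 here, and the
# document carries no encoding declaration of its own.
latin1_doc = b"<a>caf\xe9</a>"

print(ascii(text_of(latin1_doc, encoding="ISO-8859-1")))
```

                 Without the override, the parser assumes UTF-8
                 for this input and rejects the 0xE9 byte.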


       -m        Outputs some strange sort of XML file that
                 completely describes the input file, including
                 character positions.  Requires -d to specify an
                 output directory.


       -n        Turns on namespace processing, so that names are
                 checked and interpreted according to the XML
                 Namespaces rules.  -c disables namespaces.
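
                 The effect of namespace processing in Expat can
                 be seen from Python, where passing a
                 namespace_separator to ParserCreate turns it
                 on.  This is a sketch of Expat's behaviour, not
                 of xmlwf's output; the document and the name
                 element_names are made up.

```python
import xml.parsers.expat as expat


def element_names(data: bytes, namespaces: bool):
    """Return the element names Expat reports for data."""
    names = []
    # A namespace separator switches on namespace processing; names
    # are then reported as "<namespace URI><separator><local name>".
    separator = " " if namespaces else None
    parser = expat.ParserCreate(namespace_separator=separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


doc = b'<p:doc xmlns:p="urn:example"><p:item/></p:doc>'
print(element_names(doc, namespaces=False))
print(element_names(doc, namespaces=True))
```

                 With namespace processing on, a prefix that has
                 no xmlns declaration is reported as an error;
                 with it off, "p:doc" is just an ordinary name.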


       -p        Tells xmlwf to process external DTDs and parame-
                 ter entities.


                 Normally  xmlwf never parses parameter entities.
                 -p tells it to always parse  them.   -p  implies
                 -x.
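
                 In Expat's API, the corresponding switch is
                 parameter entity parsing.  The Python sketch
                 below records which external identifiers the
                 parser asks the application to load; it assumes
                 an Expat build with DTD support, and the handler
                 deliberately loads nothing.  All names are
                 hypothetical.

```python
import xml.parsers.expat as expat


def external_subset_requests(data: bytes, parse_param_entities: bool):
    """Return the system IDs Expat asks the application to load."""
    requested = []

    def on_external_entity(context, base, system_id, public_id):
        requested.append(system_id)
        return 1  # report success without actually loading anything

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_entity
    if parse_param_entities:
        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.Parse(data, True)
    return requested


doc = b'<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'
print(external_subset_requests(doc, parse_param_entities=False))
print(external_subset_requests(doc, parse_param_entities=True))
```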


       -r        Normally  xmlwf  memory-maps the XML file before
                 parsing.  -r turns off memory-mapping  and  uses
                 normal  file  IO calls instead.  Of course, mem-
                 ory-mapping is  automatically  turned  off  when
                 reading from STDIN.
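
                 The two input strategies can be compared with a
                 small Python sketch.  It uses the standard mmap
                 and xml.parsers.expat modules and is only an
                 analogy for what xmlwf does; both function
                 names are hypothetical.

```python
import mmap
import xml.parsers.expat as expat


def parse_with_mmap(path: str) -> None:
    """Map the whole file into memory and parse it in one call."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            expat.ParserCreate().Parse(mapped, True)


def parse_with_file_io(path: str) -> None:
    """Let the parser pull data through ordinary read() calls."""
    with open(path, "rb") as f:
        expat.ParserCreate().ParseFile(f)
```

                 Either function raises xml.parsers.expat.ExpatError
                 if the file is not well-formed.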


       -s        Prints  an  error  if  the document is not stan-
                 dalone.  A document is standalone if it  has  no
                 external  subset  and no references to parameter
                 entities.
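
                 Expat reports this condition through a
                 "not standalone" callback.  The Python sketch
                 below turns that callback into an error, which
                 is similar in spirit to -s; the function name
                 is hypothetical and the code is not taken from
                 xmlwf.

```python
import xml.parsers.expat as expat

NOT_STANDALONE = expat.errors.codes[expat.errors.XML_ERROR_NOT_STANDALONE]


def is_standalone(data: bytes) -> bool:
    """Return False if Expat reports the document as not standalone."""
    parser = expat.ParserCreate()
    # Returning 0 from this handler makes Expat stop with a
    # "document is not standalone" error.
    parser.NotStandaloneHandler = lambda: 0
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        if err.code == NOT_STANDALONE:
            return False
        raise  # some other well-formedness error
    return True


print(is_standalone(b"<doc/>"))
print(is_standalone(b'<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'))
```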


       -t        Turns on timings.  This tells Expat to parse the
                 entire  file,  but  not  perform any processing.
                 This gives a fairly accurate  idea  of  the  raw
                 speed  of  Expat itself without client overhead.
                 -t turns off most of the output options (-d, -m,
                 -c, ...).
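
                 A comparable measurement can be sketched in
                 Python by parsing with no handlers installed.
                 The numbers include the overhead of the Python
                 binding, so they are not directly comparable to
                 xmlwf -t; the document size and the name
                 time_parse are arbitrary choices.

```python
import time
import xml.parsers.expat as expat


def time_parse(data: bytes) -> float:
    """Return the seconds Expat takes to parse data with no handlers."""
    parser = expat.ParserCreate()
    start = time.perf_counter()
    parser.Parse(data, True)
    return time.perf_counter() - start


# A synthetic document; the size is arbitrary, chosen for illustration.
doc = b"<root>" + b"<item>text</item>" * 100_000 + b"</root>"
print(f"{time_parse(doc):.4f} s")
```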


       -w        Enables  Windows  code  pages.   Normally, xmlwf
                 will throw an error if it runs across an  encod-
                 ing  that  it  is not equipped to handle itself.
                 With -w, xmlwf will try to use  a  Windows  code
                 page.  See also -e.


       -x        Turns on parsing external entities.


                 Non-validating   parsers  are  not  required  to
                 resolve external entities, or even expand  enti-
                 ties  at  all.   Expat  always  expands internal
                 entities (?), but external entity  parsing  must
                 be enabled explicitly.


                 External   entities  are  simply  entities  that
                 obtain their data from outside the XML file cur-
                 rently being parsed.


                 This is an example of an internal entity:

       <!ENTITY vers '1.0.2'>


                 And here are some examples of external entities:


       <!ENTITY header SYSTEM "header.xml">         (parsed)
       <!ENTITY logo SYSTEM "logo.png" NDATA PNG>   (unparsed)
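
                 How a program enables external entity parsing
                 with Expat can be sketched in Python.  The
                 FILES dictionary is a made-up stand-in for
                 files on disk, and elements_seen is a
                 hypothetical helper; xmlwf's own resolver is
                 not shown here.

```python
import xml.parsers.expat as expat

# Hypothetical in-memory stand-in for files on disk.
FILES = {"header.xml": b"<title>Hello</title>"}


def elements_seen(data: bytes, resolve_external: bool):
    """Return element names seen, optionally following external entities."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # The child parser inherits the handlers of its parent.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FILES[system_id], True)
        return 1  # non-zero tells Expat the entity was handled

    if resolve_external:
        parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(data, True)
    return names


doc = b"""<!DOCTYPE doc [
  <!ENTITY header SYSTEM "header.xml">
]>
<doc>&header;</doc>"""

print(elements_seen(doc, resolve_external=False))
print(elements_seen(doc, resolve_external=True))
```

                 Without a handler, the reference to the external
                 entity is skipped rather than treated as an
                 error, so only the elements of the main file are
                 reported.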



       --        For some reason, xmlwf specifically ignores "--"
                 anywhere it appears on the command line.


       Older versions of xmlwf do not support reading from STDIN.


OUTPUT
       If an input file is not well-formed, xmlwf outputs a single
       line describing the problem to STDOUT.  If a file is
       well-formed, xmlwf outputs nothing.  Note that the exit
       status is not set to reflect the result.
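
       A one-line report of this kind can be produced from Expat's
       error information, as in the Python sketch below.  The
       "name:line:column: message" layout is an assumption made for
       the example and is not necessarily the format xmlwf prints;
       describe_problem is a hypothetical name.

```python
import xml.parsers.expat as expat


def describe_problem(name: str, data: bytes):
    """Return a one-line description of the first error, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        message = expat.ErrorString(err.code)
        # Assumed layout: name:line:column: message
        return f"{name}:{err.lineno}:{err.offset}: {message}"
    return None


print(describe_problem("bad.xml", b"<a>"))
print(describe_problem("ok.xml", b"<a/>"))
```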


BUGS
       XML 1.0 recommends, but does not require, an XML
       declaration at the beginning of a document; XML 1.1
       requires one.  xmlwf allows a file without a declaration
       to pass.


       xmlwf exits with status 0 (no error) even if the file is
       not well-formed.  There is no good way for a program to use
       xmlwf to quickly check a file -- it must parse xmlwf's
       STDOUT.
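
       Until that changes, a caller can wrap xmlwf and derive an
       exit status from its output.  The Python sketch below
       assumes xmlwf is on the PATH and relies only on the
       behaviour described under OUTPUT (no output means
       well-formed); both function names are hypothetical.

```python
import subprocess
import sys


def output_means_well_formed(stdout_text: str) -> bool:
    """Per the OUTPUT section, no output means the file is well-formed."""
    return stdout_text.strip() == ""


def check_with_xmlwf(path: str) -> int:
    """Run xmlwf on path; return 0 if well-formed, 1 otherwise."""
    # Assumes an xmlwf executable can be found on the PATH.
    result = subprocess.run(["xmlwf", path], capture_output=True, text=True)
    if output_means_well_formed(result.stdout):
        return 0
    sys.stderr.write(result.stdout)  # forward the message to stderr
    return 1
```

       A script could then call sys.exit(check_with_xmlwf(path)) to
       give its own callers a meaningful status.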


       The errors should go to STDERR, not STDOUT.


       There should be a way to get -d to send its output to STD-
       OUT rather than forcing the user to send it to a file.


       I have no idea why anyone would want to use the -d, -c and
       -m options.  If someone could explain it to me,  I'd  like
       to add this information to this manpage.


ALTERNATIVES
       Here are some XML validators on the web:


       http://www.hcrc.ed.ac.uk/~richard/xml-check.html
       http://www.stg.brown.edu/service/xmlvalid/
       http://www.scripting.com/frontier5/xml/code/xmlValidator.html
       http://www.xml.com/pub/a/tools/ruwf/check.html
            (on a page with no less than 15 ads!  Shame!)



SEE ALSO
       The Expat home page:        http://expat.sourceforge.net/
       The W3 XML specification:   http://www.w3.org/TR/REC-xml



AUTHOR
       This manual page was written by Scott Bronson
       <bronson@rinspin.com> for the Debian GNU/Linux system (but
       may be used by others).  Permission is granted to copy,
       distribute and/or modify this document under the terms of
       the GNU Free Documentation License, Version 1.1.



