!!!html2text
NAME
SYNOPSIS
DESCRIPTION
OPTIONS
FILES
CONFORMING TO
NOTES
RESTRICTIONS
AUTHOR
SEE ALSO
----
!!NAME

html2text - an advanced HTML-to-text converter
!!SYNOPSIS

__html2text -help__

__html2text -version__

__html2text__ [[ __-unparse__ | __-check__ ] [[ __-debug-scanner__ ] [[ __-debug-parser__ ] [[ __-rcfile__ ''path'' ] [[ __-style__ ( __compact__ | __pretty__ ) ] [[ __-width__ ''width'' ] [[ __-o__ ''output-file'' ] [[ __-nobs__ ] [[ ''input-uri'' ... ]
!!DESCRIPTION

__html2text__ reads HTML 3.2 documents from the ''input-uri''s, formats each into a stream of plain text characters (__ISO 8859-1__) and writes the result to standard output (or into ''output-file'', if the __-o__ command line option is used).

Documents that are specified by a URI (__RFC 1738__) that begins with [[...] are retrieved with the Hypertext Transfer Protocol (__RFC 1945__). URIs that begin with [[...]

If no ''input-uri''s are specified on the command line, __html2text__ reads from standard input. A dash as the ''input-uri'' is an alternate way to specify standard input.

__html2text__ understands all HTML 3.2 constructs, but can render only part of them due to the limitations of the text output format. However, the program attempts to provide good substitutes for the elements it cannot render. It also accepts syntactically incorrect input and attempts to interpret it.

The way in which __html2text__ formats the HTML documents is controlled by formatting properties read from an RC file. __html2text__ attempts to read ''$HOME/.html2textrc'' (or the file specified by the __-rcfile__ command line option); if that file cannot be read, __html2text__ attempts to read ''/etc/html2textrc''. If no RC file can be read (or if the RC file does not override all formatting properties), then default values are assumed for the remaining properties. The formatting properties are documented in the __html2textrc__(5) manual page.
!!OPTIONS

__-help__
Print command line summary and exit.

__-version__
Print program version and exit.

__-unparse__
This option is for diagnostic purposes: instead of formatting the parsed document, generate HTML code that is guaranteed to be syntactically correct. If __html2text__ has problems parsing a syntactically incorrect HTML document, this option may help you understand what __html2text__ thinks the original HTML code means.

__-check__
This option is for diagnostic purposes: the HTML document is only parsed and not processed otherwise. In this mode of operation, __html2text__ reports parse errors and scan errors, which it does not do in other modes of operation. Note that parse and scan errors are not fatal for __html2text__, but may cause misinterpretation of the HTML code and/or portions of the document being swallowed.

__-debug-scanner__
While scanning the HTML document, __html2text__ reports each lexical token scanned. This option is for diagnostic purposes.

__-debug-parser__
While parsing the HTML document, __html2text__ reports the tokens being shifted, rules being applied, etc. This option is for diagnostic purposes.

__-rcfile__ ''path''
Attempt to read the file specified in ''path'' as the RC file.

__-style__ ( __compact__ | __pretty__ )
Style __pretty__ changes some of the default values of the formatting parameters documented in html2textrc(5). To find out which formatting parameter defaults are changed, and how, check the file [[...]. Style __compact__ is assumed as default.

__-width__ ''width''
By default, __html2text__ formats the HTML documents for a screen width of 79 characters. If you redirect the output into a file, if your terminal has a width other than 80 characters, or if you just want to see how __html2text__ deals with large tables and different terminal widths, you may want to specify a different ''width''.

__-o__ ''output-file''
Write the output to ''output-file'' instead of standard output. A dash as the ''output-file'' is an alternate way to specify standard output.

__-nobs__
By default, __html2text__ renders underlined letters with backspace sequences intended for pagers such as more(1), less(1), or similar. For other applications, or when redirecting the output into a file, it may be desirable not to render character attributes with such backspace sequences; this command line option turns them off.
!!FILES

''/etc/html2textrc''
System-wide parser configuration file.

''$HOME/.html2textrc''
Personal parser configuration file; overrides the system-wide values.
!!CONFORMING TO

__HTML 3.2__ (HTML 3.2 Reference Specification - http://www.w3.org/TR/REC-html32), __RFC 1945__ (Hypertext Transfer Protocol - HTTP).
!!NOTES

__html2text__ makes a considerable effort to parse syntactically incorrect input, but is not always as successful as other HTML processors. If you are able to correct the HTML source code, you may want to use the __-unparse__ or __-check__ options to find out exactly what __html2text__'s problem is.
!!RESTRICTIONS

__html2text__ provides only a basic implementation of the Hypertext Transfer Protocol (HTTP). It requires the complete and exactly matching URI to be given as argument and will not follow redirections (HTTP 301/307).
!!AUTHOR

__html2text__ was written up to version 1.2.2 by Arno Unkrig.

Current maintainer and primary download location is: Martin Bayer http://userpage.fu-berlin.de/~mbayer/tools/html2text.html
!!SEE ALSO

html2textrc(5), less(1), more(1)
----
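Output that was captured without __-nobs__ still contains the backspace sequences described under that option. The following is a minimal Python sketch for removing them after the fact. It assumes the conventional pager overstrike encoding (underscore, backspace, character for underline; character, backspace, same character for bold), which this page does not spell out, and the function name strip_overstrikes is hypothetical, not part of html2text.

```python
import re

# Assumption (not stated on this page): character attributes are encoded
# with the usual pager overstrike convention:
#   underline: "_" + BACKSPACE + character
#   bold:      character + BACKSPACE + same character
# In both cases the visible character is the one AFTER the backspace.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrikes(text: str) -> str:
    """Drop every 'character + backspace' pair, keeping the character that follows."""
    return _OVERSTRIKE.sub("", text)


if __name__ == "__main__":
    underlined = "_\x08H_\x08i"   # "Hi" rendered as underlined
    bold = "B\x08Bo\x08o"         # "Bo" rendered as bold
    print(strip_overstrikes(underlined))
    print(strip_overstrikes(bold))
```

When the output is produced directly for a file or another program, passing __-nobs__ to __html2text__ avoids the need for this kind of post-processing.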
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.