html2text
!!!html2text
NAME
SYNOPSIS
DESCRIPTION
OPTIONS
FILES
CONFORMING TO
NOTES
RESTRICTIONS
AUTHOR
SEE ALSO
----
!!NAME

html2text - an advanced HTML-to-text converter
!!SYNOPSIS

__html2text -help__
__html2text -version__
__html2text__ [[ __-unparse__ | __-check__ ] [[
__-debug-scanner__ ] [[ __-debug-parser__ ] [[
__-rcfile__ ''path'' ] [[ __-style__ (
__compact__ | __pretty__ ) ] [[ __-width__
''width'' ] [[ __-o__ ''output-file'' ] [[
__-nobs__ ] [[ ''input-uri'' ... ]
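
When html2text is driven from another program, the synopsis can be followed mechanically. The Python sketch below uses a hypothetical helper, build_html2text_argv(), which is not part of html2text. It assembles an argument vector in the order shown in the synopsis and rejects combinations the synopsis does not allow, such as __-unparse__ together with __-check__. The check that ''width'' is a positive integer is an extra assumption of this sketch.

```python
from typing import Optional, Sequence


def build_html2text_argv(
    inputs: Sequence[str] = (),
    *,
    unparse: bool = False,
    check: bool = False,
    debug_scanner: bool = False,
    debug_parser: bool = False,
    rcfile: Optional[str] = None,
    style: Optional[str] = None,
    width: Optional[int] = None,
    output_file: Optional[str] = None,
    nobs: bool = False,
) -> list[str]:
    """Build an html2text command line following the SYNOPSIS (hypothetical helper)."""
    # The synopsis lists -unparse and -check as alternatives: at most one may be given.
    if unparse and check:
        raise ValueError("-unparse and -check are mutually exclusive")
    # -style accepts exactly one of two keywords.
    if style is not None and style not in ("compact", "pretty"):
        raise ValueError("style must be 'compact' or 'pretty'")
    # Sanity check added by this sketch; the manual does not state a range.
    if width is not None and width <= 0:
        raise ValueError("width must be a positive integer")

    argv = ["html2text"]
    if unparse:
        argv.append("-unparse")
    if check:
        argv.append("-check")
    if debug_scanner:
        argv.append("-debug-scanner")
    if debug_parser:
        argv.append("-debug-parser")
    if rcfile is not None:
        argv += ["-rcfile", rcfile]
    if style is not None:
        argv += ["-style", style]
    if width is not None:
        argv += ["-width", str(width)]
    if output_file is not None:
        argv += ["-o", output_file]
    if nobs:
        argv.append("-nobs")
    # Input URIs come last; an empty list (or "-") means standard input.
    argv.extend(inputs)
    return argv
```

Assuming html2text is on the PATH, the result can be passed straight to Python's subprocess.run(), for example subprocess.run(build_html2text_argv(["page.html"], width=100), check=True).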
!!DESCRIPTION

__html2text__ reads HTML 3.2 documents from the
''input-uri''s, formats each into a stream of plain text
characters (__ISO 8859-1__), and writes the result to
standard output (or into ''output-file'', if the
__-o__ command line option is used).

Documents that are specified by a URI that begins with
"http:" (__RFC 1738__) are retrieved with the
Hypertext Transfer Protocol (__RFC 1945__). URIs that
begin with ...

If no ''input-uri''s are specified on the command line,
__html2text__ reads from standard input. A dash as the
''input-uri'' is an alternate way to specify standard
input.
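
The input rules above can be summarized as a small classification step. This Python sketch uses a hypothetical helper, classify_inputs(). It assumes that URIs beginning with "http:" are fetched over HTTP, and it treats every other argument as a local file name. That second rule is an assumption of the sketch and is not documented on this page.

```python
from typing import Sequence


def classify_inputs(input_uris: Sequence[str]) -> list[tuple[str, str]]:
    """Map each input-uri to the kind of source it is read from (hypothetical helper)."""
    if not input_uris:
        # No input-uri on the command line: read standard input.
        return [("stdin", "-")]
    result = []
    for uri in input_uris:
        if uri == "-":
            # A dash is an alternate way to name standard input.
            result.append(("stdin", uri))
        elif uri.startswith("http:"):
            # Retrieved with the Hypertext Transfer Protocol (RFC 1945).
            result.append(("http", uri))
        else:
            # Assumption of this sketch: anything else is a local file name.
            result.append(("file", uri))
    return result
```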

__html2text__ understands all HTML 3.2 constructs, but
can render only part of them due to the limitations of the
text output format. However, the program attempts to provide
good substitutes for the elements it cannot render. It also
accepts syntactically incorrect input and attempts to
interpret it.

The way in which __html2text__ formats HTML documents
is controlled by formatting properties read from an RC file.
__html2text__ attempts to read ''$HOME/.html2textrc''
(or the file specified by the __-rcfile__ command line
option); if that file cannot be read, __html2text__
attempts to read ''/etc/html2textrc''. If no RC file can
be read (or if the RC file does not override all formatting
properties), built-in defaults are used. The RC file format
is described in the __html2textrc__(5) manual page.
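
The RC file lookup order described above can be sketched as follows. resolve_rc_file() is a hypothetical Python helper, not html2text's own code. The ''readable'' parameter is injectable so that the order can be checked without touching the real file system, and a result of None stands for "use the built-in defaults".

```python
import os
from typing import Callable, Optional


def _is_readable_file(path: str) -> bool:
    # A regular file that the current user may read.
    return os.path.isfile(path) and os.access(path, os.R_OK)


def resolve_rc_file(
    rcfile: Optional[str] = None,
    home: Optional[str] = None,
    readable: Callable[[str], bool] = _is_readable_file,
) -> Optional[str]:
    """Return the RC file that would be used, or None for built-in defaults."""
    if home is None:
        home = os.path.expanduser("~")
    # First choice: the -rcfile argument if given, otherwise $HOME/.html2textrc.
    first = rcfile if rcfile is not None else os.path.join(home, ".html2textrc")
    # Fallback: the system-wide file.
    for candidate in (first, "/etc/html2textrc"):
        if readable(candidate):
            return candidate
    # No RC file could be read: every formatting property keeps its default.
    return None
```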
!!OPTIONS

__-help__

Print command line summary and exit.

__-version__

Print program version and exit.

__-unparse__

This option is for diagnostic purposes: instead of
formatting the parsed document, generate HTML code that is
guaranteed to be syntactically correct. If __html2text__
has problems parsing a syntactically incorrect HTML
document, this option may help you understand what
__html2text__ thinks the original HTML code
means.

__-check__

This option is for diagnostic purposes: the HTML document is
only parsed and not processed otherwise. In this mode of
operation, __html2text__ reports parse errors and
scan errors, which it does not do in other modes of operation.
Note that parse and scan errors are not fatal for
__html2text__, but may cause misinterpretation of the
HTML code and/or portions of the document being
silently dropped.

__-debug-scanner__

While scanning the HTML document, __html2text__ reports
on each lexical token scanned. This option is for diagnostic
purposes.

__-debug-parser__

While parsing the HTML document, __html2text__ reports
on the tokens being shifted, rules being applied, etc. This
option is for diagnostic purposes.

__-rcfile__ ''path''

Attempt to read the file specified in ''path'' as the RC
file, instead of ''$HOME/.html2textrc''.

__-style__ ( __compact__ | __pretty__ )

Style __pretty__ changes some of the default values of
the formatting parameters documented in
html2textrc(5); style __compact__ is the default. To find
out which parameter defaults are changed, and how, check
the file ...

__-width__ ''width''

By default, __html2text__ formats HTML documents for
a screen width of 79 characters. If you redirect the output
into a file, if your terminal is not 80 characters wide,
or if you want to see how __html2text__ handles large
tables and different terminal widths, you may want to
specify a different ''width''.
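
The default of 79 characters for an 80-column terminal suggests leaving one column free. Assuming that rule of thumb, this Python sketch (the helper width_for_terminal() is hypothetical) derives a ''width'' value from the current terminal size with shutil.get_terminal_size().

```python
import shutil
from typing import Optional


def width_for_terminal(columns: Optional[int] = None) -> int:
    """Suggest a -width value one column narrower than the terminal.

    The one-column margin mirrors the 79-for-80 default; it is an
    assumption of this sketch, not a documented html2text rule.
    """
    if columns is None:
        # Uses $COLUMNS or the terminal size; falls back to 80 columns.
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    return max(1, columns - 1)
```

A caller might then run html2text with ["html2text", "-width", str(width_for_terminal()), "page.html"].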

__-o__ ''output-file''

Write the output to ''output-file'' instead of standard
output. A dash as the ''output-file'' is an alternate way
to specify the standard output.

__-nobs__

By default, __html2text__ renders underlined letters with
backspace sequences, which are understood by pagers such as
more(1), less(1), or similar. For other applications, or when
redirecting the output into a file, it may be desirable not
to render character attributes with such backspace
sequences; this command line option turns them off.
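
Backspace sequences follow the old overstrike convention: a character, a backspace, and a second character printed in the same column. The Python sketch below assumes the common nroff-style form for underlining (underscore, backspace, letter). The exact sequences html2text emits are not spelled out on this page. The sketch also shows how such sequences can be removed afterwards, for output produced without __-nobs__. It keeps the last character written to each column, which is what col(1) does with its -b option.

```python
BS = "\b"  # the backspace character


def underline(text: str) -> str:
    """Underline each non-space character as underscore, backspace, letter."""
    return "".join(c if c.isspace() else "_" + BS + c for c in text)


def strip_overstrike(text: str) -> str:
    """Remove overstrike sequences, keeping the last character in each column."""
    out: list[str] = []
    for ch in text:
        if ch == BS:
            # A backspace cancels the character written just before it.
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)
```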
!!FILES

''/etc/html2textrc''

System-wide formatting configuration (RC) file, read only
if the personal RC file cannot be read.

''$HOME/.html2textrc''

Personal formatting configuration (RC) file; if it can be
read, it is used instead of the system-wide file.
!!CONFORMING TO

__HTML 3.2__ (HTML 3.2 Reference Specification -
http://www.w3.org/TR/REC-html32),
__RFC 1945__ (Hypertext Transfer Protocol -
HTTP).
!!NOTES

__html2text__ makes considerable efforts to parse
syntactically incorrect input, but is not always as
successful as other HTML processors. If you are able to
correct the HTML source code, you may want to
use the __-unparse__ or __-check__ options to find out
what exactly __html2text__'s problem is.
!!RESTRICTIONS

__html2text__ provides only a basic implementation of the
Hypertext Transfer Protocol (HTTP). It requires the complete
and exactly matching URI to be given as an argument and does
not follow redirections (HTTP 301/307).
!!AUTHOR

__html2text__ was written up to version 1.2.2 by Arno
Unkrig.

The current maintainer is Martin Bayer; the primary
download location is
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html
!!SEE ALSO

html2textrc(5), less(1),
more(1)
----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.