Annotated edit history of perlfaq9(1) version 2, including all changes.
Rev Author # Line
1 perry 1 PERLFAQ9
2 !!!PERLFAQ9
3 NAME
4 DESCRIPTION
5 AUTHOR AND COPYRIGHT
6 ----
7 !!NAME
8
9
10 perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)
11 !!DESCRIPTION
12
13
14 This section deals with questions related to networking, the
15 internet, and a few on the web.
16
17
18 __My CGI script runs from the command line
19 but not the browser. (500 Server Error)__
20
21
22 If you can demonstrate that you've read the following FAQs
23 and that your problem isn't something simple that can be
24 easily answered, you'll probably receive a courteous and
25 useful reply to your question if you post it on
26 comp.infosystems.www.authoring.cgi (if it's something to do
27 with HTTP, HTML, or the
28 CGI protocols). Questions that appear to be
29 Perl questions but are really CGI ones that
30 are posted to comp.lang.perl.misc may not be so well
31 received.
32
33
34 The useful FAQs and related documents are:
35
36
37 CGI FAQ
38 http://www.webthing.com/tutorials/cgifaq.html
39 Web FAQ
40 http://www.boutell.com/faq/
41 WWW Security FAQ
42 http://www.w3.org/Security/Faq/
43 HTTP Spec
44 http://www.w3.org/pub/WWW/Protocols/HTTP/
45 HTML Spec
46 http://www.w3.org/TR/REC-html40/
2 perry 47 http://www.w3.org/pub/WWW/!MarkUp/
1 perry 48 CGI Spec
49 http://www.w3.org/CGI/
50 CGI Security FAQ
51 http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
52
53
54 __How can I get better error messages from a
55 CGI program?__
56
57
58 Use the CGI::Carp module. It replaces warn and
59 die, plus the normal Carp module's carp,
60 croak, and confess functions with more
61 verbose and safer versions. It still sends them to the
62 normal server error log.
63
64
65 use CGI::Carp;
66 warn
67 The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to catch compile-time warnings as well:
68
69
70 BEGIN {
71 use CGI::Carp qw(carpout);
72 open(LOG,
73 You can even arrange for fatal errors to go back to the client browser, which is nice for your own debugging, but might confuse the end user.
74
75
76 use CGI::Carp qw(fatalsToBrowser);
77 die
78 Even if the error happens before you get the HTTP header out, the module will try to take care of this to avoid the dreaded server 500 errors. Normal warnings still go out to the server error log (or wherever you've sent them with carpout) with the application name and date stamp prepended.
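
For readers who want to see the idea without the module, here is a minimal core-Perl sketch of the behaviour described above: warnings get a date stamp and the program name prepended and are appended to a file of your choice. It is not how CGI::Carp is implemented; the names stamp_message and log_warnings_to are made up for this illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);
use File::Basename qw(basename);

# Prefix a message with a date stamp and the program name.
sub stamp_message {
    my ($msg) = @_;
    my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
    return sprintf '[%s] %s: %s', $stamp, basename($0), $msg;
}

# Append every later warning to $path instead of sending it to STDERR.
sub log_warnings_to {
    my ($path) = @_;
    open(my $log, '>>', $path) or die "Unable to append to $path: $!\n";
    $log->autoflush(1);
    $SIG{__WARN__} = sub { print {$log} stamp_message($_[0]) };
    return $log;
}
```

To catch compile-time warnings as well, the call would have to happen in a BEGIN block, which in turn means these subs must already be compiled (for example, loaded from a small module).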
79
80
81 __How do I remove HTML from a
82 string?__
83
84
85 The most correct way (albeit not the fastest) is to use
86 HTML::Parser from CPAN. Another mostly
2 perry 87 correct way is to use HTML::!FormatText which not only
1 perry 88 removes HTML but also attempts to do a little
89 simple formatting of the resulting plain text.
90
91
92 Many folks attempt a simple-minded regular expression
93 approach, like s/<.*?>//g, but that fails in
94 many cases because the tags may continue over line breaks,
95 they may contain quoted angle-brackets, or
96 HTML comments may be present. Plus, folks
97 forget to convert entities--like &lt; for
98 example.
99
100
101 Here's one ``simple-minded'' approach, that works for most
102 files:
103
104
105 #!/usr/bin/perl -p0777
106 s/
107 If you want a more complete solution, see the 3-stage striphtml program in http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .
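
To make the failure mode concrete, the sketch below compares a naive tag-stripping substitution with a quote-aware one on a tag whose attribute value contains an angle bracket. Both patterns are illustrations written for this comparison (the sub names are invented), and neither handles HTML comments or entities.

```perl
use strict;
use warnings;

# Naive approach: treats the first '>' as the end of the tag.
sub strip_naive {
    my ($html) = @_;
    $html =~ s/<.*?>//gs;
    return $html;
}

# Quote-aware approach: a tag is a run of non-quote, non-'>' characters
# and quoted strings, so a '>' inside an attribute value is skipped.
sub strip_quote_aware {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

my $sample = qq{<IMG SRC="foo.gif" ALT="A > B">caption};
print strip_naive($sample), "\n";        # leaves part of the tag behind
print strip_quote_aware($sample), "\n";  # prints only the text
```

The /s modifier lets both patterns cross line breaks, which covers tags that continue over several lines.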
108
109
110 Here are some tricky cases that you should think about when
111 picking a solution:
112
113
114
115
116
117
118
119
120 If HTML comments include other tags, those solutions would also break on text like this:
121
122
123
124
125
126 __How do I extract URLs?__
127
128
129 A quick but imperfect approach is
130
131
132 #!/usr/bin/perl -n00
133 # qxurl - tchrist@perl.com
134 print
135 This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, understand extra qualifiers like TARGET, or accept URLs themselves as arguments. It also runs about 100x faster than a more ``complete'' solution using the LWP suite of modules, such as the http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
136
137
138 __How do I download a file from the user's machine? How do
139 I open a file on another machine?__
140
141
142 In the context of an HTML form, you can use
143 what's known as __multipart/form-data__ encoding. The
144 CGI.pm module (available from
145 CPAN) supports this in the
146 ''start_multipart_form()'' method, which isn't the same
147 as the ''startform()'' method.
148
149
150 __How do I make a pop-up menu in
151 HTML?__
152
153
154 Use the __<SELECT>__ and
155 __<OPTION>__ tags. The CGI.pm
156 module (available from CPAN) supports
157 this widget, as well as many others, including some that it
158 cleverly synthesizes on its own.
159
160
161 __How do I fetch an HTML
162 file?__
163
164
165 One approach, if you have the lynx text-based
166 HTML browser installed on your system, is
167 this:
168
169
170 $html_code = `lynx -source $url`;
171 $text_data = `lynx -dump $url`;
172 The libwww-perl (LWP) modules from CPAN provide a more powerful way to do this. They don't require lynx, but like lynx, can still work through proxies:
173
174
175 # simplest version
176 use LWP::Simple;
177 $content = get($URL);
178 # or print HTML from a URL
179 use LWP::Simple;
180 getprint
181 # or print ASCII from HTML from a URL
182 # also need HTML-Tree package from CPAN
183 use LWP::Simple;
184 use HTML::Parser;
2 perry 185 use HTML::!FormatText;
1 perry 186 my ($html, $ascii);
187 $html = get(
188
189
190 __How do I automate an HTML form
191 submission?__
192
193
194 If you're submitting values using the GET
195 method, create a URL and encode the form
196 using the query_form method:
197
198
199 use LWP::Simple;
200 use URI::URL;
201 my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
202 $url-
203 If you're using the POST method, create your own user agent and encode the content appropriately.
204
205
206 use HTTP::Request::Common qw(POST);
2 perry 207 use LWP::!UserAgent;
208 $ua = LWP::!UserAgent-
1 perry 209
210
211 __How do I decode or create those %-encodings on the
212 web?__
213
214
215 If you are writing a CGI script, you should
216 be using the CGI.pm module that comes with
217 perl, or some other equivalent module. The
218 CGI module automatically decodes queries for
219 you, and provides an ''escape()'' function to handle
220 encoding.
221
222
223 The best source of detailed information on
224 URI encoding is RFC 2396.
225 Basically, the following substitutions do it:
226
227
228 s/([[^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode
229 s/%([[A-Fa-f\d]{2})/chr hex $1/eg; # decode
230 However, you should only apply them to individual URI components, not the entire URI; otherwise you'll lose information and generally mess things up. If that didn't explain it, don't worry. Just go read section 2 of the RFC; it's probably the best explanation there is.
231
232
233 RFC 2396 also contains a lot of other useful
234 information, including a regexp for breaking any arbitrary
235 URI into components (Appendix
236 B).
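
As a sketch of how these pieces fit together, the subs below wrap the two substitutions so they are applied per component, and split a URI using the Appendix B expression (rewritten here with non-capturing groups so that only the five components are captured). The sub names are invented for illustration, and the code assumes the input is a string of bytes rather than wide characters.

```perl
use strict;
use warnings;

# Percent-encode one URI component (never a whole URI).
sub uri_escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_()'*~!.-])/sprintf '%%%02X', ord $1/eg;
    return $text;
}

# Reverse the encoding for one component.
sub uri_unescape_component {
    my ($text) = @_;
    $text =~ s/%([A-Fa-f0-9]{2})/chr hex $1/eg;
    return $text;
}

# Split a URI into scheme, authority, path, query and fragment.
# Components that are absent come back as undef.
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?};
    return ($scheme, $authority, $path, $query, $fragment);
}

# Encode each name and value separately, then join them.
my $query = join '&',
    map { uri_escape_component($_->[0]) . '=' . uri_escape_component($_->[1]) }
    ( [ 'q', 'perl & cgi' ], [ 'lang', 'en' ] );
print "$query\n";
```

Encoding the joined query string in one go would also escape the & and = separators, which is the loss of information the text warns about.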
237
238
239 __How do I redirect to another page?__
240
241
242 According to RFC 2616, ``Hypertext Transfer
243 Protocol -- HTTP/1.1'', the preferred method
244 is to send a Location: header instead of a
245 Content-Type: header:
246
247
248 Location: http://www.domain.com/newpage
249 Note that relative URLs in these headers can cause strange effects because of ``optimizations'' that servers do.
250
251
252 $url =
253 To target a particular frame in a frameset, include the ``Window-target:'' in the header.
254
255
256 print
257 EOF
258 To be correct to the spec, each of those virtual newlines should really be physical "\015\012" sequences by the time your message is received by the client browser. Except for NPH scripts, though, that local newline should get translated by your server into standard form, so you shouldn't have a problem here, even if you are stuck on MacOS. Everybody else probably won't even notice.
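
For the cases where the server will not translate line endings for you, a small sketch like the following writes the carriage-return/line-feed pairs explicitly. The sub name redirect_headers is invented here, and the check against embedded line breaks is an extra precaution not discussed in the text.

```perl
use strict;
use warnings;

# Build redirect headers with explicit CRLF line endings.
sub redirect_headers {
    my ($url, $target) = @_;
    my $crlf = "\015\012";
    for my $value (grep { defined } $url, $target) {
        die "Line break in header value\n" if $value =~ /[\015\012]/;
    }
    my $headers = "Location: $url$crlf";
    $headers .= "Window-target: $target$crlf" if defined $target;
    return $headers . $crlf;    # the empty line ends the header block
}

print redirect_headers('http://www.domain.com/newpage');
```

An absolute URL is used on purpose, given the note above about relative URLs in these headers.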
259
260
261 __How do I put a password on my web pages?__
262
263
264 That depends. You'll need to read the documentation for your
265 web server, or perhaps check some of the other FAQs
266 referenced above.
267
268
269 __How do I edit my .htpasswd and .htgroup files with
270 Perl?__
271
272
2 perry 273 The HTTPD::!UserAdmin and HTTPD::!GroupAdmin modules provide a
1 perry 274 consistent OO interface to these files,
275 regardless of how they're stored. Databases may be text,
276 dbm, Berkeley DB or any database with a
2 perry 277 DBI compatible driver. HTTPD::!UserAdmin
1 perry 278 supports files used by the `Basic' and `Digest'
279 authentication schemes. Here's an example:
280
281
2 perry 282 use HTTPD::!UserAdmin ();
283 HTTPD::!UserAdmin
1 perry 284 -
285
286
287 __How do I make sure users can't enter values into a form
288 that cause my CGI script to do bad
289 things?__
290
291
292 Read the CGI security FAQ, at
293 http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
294 , and the Perl/CGI FAQ at
295 http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html
296 .
297
298
299 In brief: use tainting (see perlsec), which makes sure that
300 data from outside your script (e.g., CGI
301 parameters) are never used in eval or
302 system calls. In addition to tainting, never use
303 the single-argument form of ''system()'' or
304 ''exec()''. Instead, supply the command and arguments as
305 a list, which bypasses the shell and so prevents shell globbing.
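
A minimal sketch of both habits follows: validate outside data with a pattern match (capturing is also how a value gets untainted under -T), then run the command in list form so that no shell ever sees it. The sub names and the rule for what counts as an acceptable file name are assumptions made for this example.

```perl
use strict;
use warnings;

# Accept only plain file names that start with a word character;
# returns the captured (untainted) name, or undef if the input is refused.
sub clean_filename {
    my ($input) = @_;
    return $input =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

# Run a command without a shell by passing program and arguments as a list.
# Returns the command's exit code, or -1 if it could not be started.
sub run_without_shell {
    my (@command) = @_;
    my $rc = system { $command[0] } @command;
    return $rc == -1 ? -1 : $rc >> 8;
}

my $name = clean_filename('report.txt');
run_without_shell($^X, '-e', 'print "would process $ARGV[0]\n"', $name)
    if defined $name;
```

Because the arguments are handed straight to the program, characters such as ; and * arrive as literal text instead of being interpreted.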
306
307
308 __How do I parse a mail header?__
309
310
311 For a quick-and-dirty solution, try this solution derived
312 from ``split'' in perlfunc:
313
314
315 $/ = '';
316 $header =
2 perry 317 That solution doesn't do well if, for example, you're trying to maintain all the Received lines. A more complete approach is to use the Mail::Header module from CPAN (part of the !MailTools package).
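
If installing a module is not an option, one possible middle ground is sketched below: it unfolds continuation lines and stores every field in an array, so repeated fields such as Received survive. The sub name parse_header and the choice to lower-case field names are assumptions of this example, not part of Mail::Header.

```perl
use strict;
use warnings;

# Parse the header block (everything before the first blank line)
# into a hash of arrays keyed by lower-cased field name.
sub parse_header {
    my ($message) = @_;
    my ($header) = split /\n\n/, $message, 2;
    $header = '' unless defined $header;
    $header =~ s/\n[ \t]+/ /g;              # unfold continuation lines
    my %field;
    for my $line (split /\n/, $header) {
        next unless $line =~ /^([-\w]+):\s*(.*)/;
        my ($name, $value) = (lc $1, $2);
        push @{ $field{$name} }, $value;    # keep repeated fields
    }
    return \%field;
}
```

A caller would then read $fields->{subject}[0] for a single-valued field, or loop over @{ $fields->{received} } for the whole delivery path.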
1 perry 318
319
320 __How do I decode a CGI
321 form?__
322
323
324 You use a standard module, probably CGI.pm.
325 Under no circumstances should you attempt to do so by
326 hand!
327
328
329 You'll see a lot of CGI programs that blindly
330 read from STDIN the number of bytes equal to
331 CONTENT_LENGTH for POSTs, or grab
332 QUERY_STRING for decoding GETs. These
333 programs are very poorly written. They only work sometimes.
334 They typically forget to check the return value of the
335 ''read()'' system call, which is a cardinal sin. They
336 don't handle HEAD requests. They don't handle
337 multipart forms used for file uploads. They don't deal with
338 GET/POST combinations where query fields are
339 in more than one place. They don't deal with keywords in the
340 query string.
341
342
343 In short, they're bad hacks. Resist them at all costs.
344 Please do not be tempted to reinvent the wheel. Instead, use
345 CGI.pm or CGI_Lite.pm (available from
346 CPAN), or if you're trapped in the
347 module-free land of perl1 .. perl4, you might look into
348 cgi-lib.pl (available from
349 http://cgi-lib.stanford.edu/cgi-lib/ ).
350
351
352 Make sure you know whether to use a GET or a
353 POST in your form. GETs should only be used
354 for something that doesn't update the server. Otherwise you
355 can get mangled databases and repeated feedback mail
356 messages. The fancy word for this is ``idempotency''. This
357 simply means that there should be no difference between
358 making a GET request for a particular
359 URL once or multiple times. This is because
360 the HTTP protocol definition says that a
361 GET request may be cached by the browser, or
362 server, or an intervening proxy. POST
363 requests cannot be cached, because each request is
364 independent and matters. Typically, POST
365 requests change or depend on state on the server (query or
366 update a database, send mail, or purchase a
367 computer).
368
369
370 __How do I check a valid mail address?__
371
372
373 You can't, at least, not in real time. Bummer,
374 eh?
375
376
377 Without sending mail to the address and seeing whether
378 there's a human on the other end to answer you, you cannot
379 determine whether a mail address is valid. Even if you apply
380 the mail header standard, you can have problems, because
381 there are deliverable addresses that aren't
382 RFC-822 (the mail header standard) compliant,
383 and addresses that aren't deliverable which are
384 compliant.
385
386
387 Many are tempted to try to eliminate many frequently-invalid
388 mail addresses with a simple regex, such as
389 /^[[\w.-]+\@(?:[[\w-]+\.)+\w+$/. It's a very bad idea:
390 it also throws out many valid ones, and says
391 nothing about potential deliverability, so it is not
392 suggested. Instead, see
393 http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz,
394 which actually checks against the full RFC
395 spec (except for nested comments), looks for addresses you
396 may not wish to accept mail to (say, Bill Clinton or your
397 postmaster), and then makes sure that the hostname given can
398 be looked up in the DNS MX records. It's not
399 fast, but it works for what it tries to do.
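
To see both failure modes of the simple pattern in one place, the sketch below runs it against three addresses. The sub name looks_plausible and the sample addresses are invented for this illustration.

```perl
use strict;
use warnings;

# The simple pattern discussed above, wrapped in a sub for experimentation.
sub looks_plausible {
    my ($address) = @_;
    return $address =~ /^[\w.-]+\@(?:[\w-]+\.)+\w+$/ ? 1 : 0;
}

for my $address (
    'someuser@host.com',              # ordinary address
    'user+tag@example.com',           # valid syntax, but '+' is refused
    'nobody@no-such-host.invalid',    # accepted, yet can never be delivered
) {
    printf "%-30s %s\n", $address,
        looks_plausible($address) ? 'accepted' : 'rejected';
}
```

The second address is syntactically legal and the third uses a reserved top-level domain, so the pattern is wrong in both directions.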
400
401
402 Our best advice for verifying a person's mail address is to
403 have them enter their address twice, just as you normally do
404 to change a password. This usually weeds out typos. If both
405 versions match, send mail to that address with a personal
406 message that looks somewhat like:
407
408
409 Dear someuser@host.com,
410 Please confirm the mail address you gave us Wed May 6 09:38:41
411 MDT 1998 by replying to this message. Include the string
412 If you get the message back and they've followed your directions, you can be reasonably assured that it's real.
413
414
415 A related strategy that's less open to forgery is to give
416 them a PIN (personal ID
417 number). Record the address and PIN (best
418 that it be a random one) for later processing. In the mail
419 you send, ask them to include the PIN in
420 their reply. But if it bounces, or the message is included
421 via a ``vacation'' script, it'll be there anyway. So it's
422 best to ask them to mail back a slight alteration of the
423 PIN, such as with the characters reversed,
424 one added or subtracted to each digit, etc.
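
One way the altered-PIN check might look, using the character-reversal variant, is sketched here. The sub names are hypothetical, and rand() is used only for brevity; it is not a cryptographically strong source of randomness.

```perl
use strict;
use warnings;

# Make a random numeric PIN of the requested length.
sub make_pin {
    my ($digits) = @_;
    return join '', map { int rand 10 } 1 .. $digits;
}

# The reply must contain the PIN with its characters reversed.
sub expected_reply {
    my ($pin) = @_;
    return scalar reverse $pin;
}

# True only if the altered PIN appears in the reply body; a bounce or
# vacation message that merely echoes the original PIN does not count.
sub reply_confirms {
    my ($pin, $body) = @_;
    my $wanted = expected_reply($pin);
    return 0 if $wanted eq $pin;    # a palindrome cannot tell the two apart
    return $body =~ /\b\Q$wanted\E\b/ ? 1 : 0;
}
```

Since a palindromic PIN defeats the scheme, a real generator would want to draw again whenever expected_reply returns the PIN unchanged.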
425
426
427 __How do I decode a MIME/BASE64
428 string?__
429
430
431 The MIME-Base64 package (available from CPAN)
432 handles this as well as the MIME/QP
433 encoding. Decoding BASE64 becomes as simple
434 as:
435
436
437 use MIME::Base64;
438 $decoded = decode_base64($encoded);
439 The MIME-Tools package (available from CPAN) supports extraction with decoding of BASE64 encoded attachments and content directly from email messages.
440
441
442 If the string to decode is short (less than 84 bytes long) a
443 more direct approach is to use the ''unpack()''
444 function's ``u'' format after minor
445 transliterations:
446
447
448 tr#A-Za-z0-9+/##cd; # remove non-base64 chars
449 tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
450 $len = pack(
451
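
The two routes can be put side by side as in the sketch below. The sub decode_short_base64 is a name invented here; it follows the transliterate-then-uudecode idea described above, with the length byte rounded down so that strings whose padding was stripped still decode.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Decode a short base64 string by rewriting it as one uuencoded line.
sub decode_short_base64 {
    my ($text) = @_;
    $text =~ tr#A-Za-z0-9+/##cd;      # remove non-base64 chars, including '='
    $text =~ tr#A-Za-z0-9+/# -_#;     # convert to the uuencode alphabet
    my $len = pack('c', 32 + int(0.75 * length $text));   # length byte
    return unpack('u', $len . $text); # uudecode
}

print decode_base64('aGVsbG8h'), "\n";
print decode_short_base64('aGVsbG8h'), "\n";
```

The length restriction exists because a single uuencoded line can only describe a limited number of bytes; longer input is better left to MIME::Base64.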
452
453 __How do I return the user's mail address?__
454
455
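456 On systems that support getpwuid, the $< variable, and the Sys::Hostname module (which is part of the standard perl distribution), you can probably try using something like this: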
457
458
459 use Sys::Hostname;
460 $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
461 Company policies on mail address can mean that this generates addresses that the company's mail system will not accept, so you should ask for users' mail addresses when this matters. Furthermore, not all systems on which Perl runs are so forthcoming with this information as is Unix.
462
463
464 The Mail::Util module from CPAN (part of the
2 perry 465 !MailTools package) provides a ''mailaddress()'' function
1 perry 466 that tries to guess the mail address of the user. It makes a
467 more intelligent guess than the code above, using
468 information given when the module was installed, but it
469 could still be incorrect. Again, the best way is often just
470 to ask the user.
471
472
473 __How do I send mail?__
474
475
476 Use the sendmail program directly:
477
478
479 open(SENDMAIL,
480 Body of the message goes here after the blank line
481 in as many lines as you like.
482 EOF
483 close(SENDMAIL) or warn
484 The __-oi__ option prevents sendmail from interpreting a line consisting of a single dot as ``end of message''. The __-t__ option says to use the headers to decide who to send the message to, and __-odq__ says to put the message into the queue. This last option means your message won't be immediately delivered, so leave it out if you want immediate delivery.
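
A sketch of this approach with the message assembly separated from the delivery step is shown below. The sub names are invented, the sendmail path is an assumption that varies between systems, and the line-break check on header values is an added precaution.

```perl
use strict;
use warnings;

# Assemble headers, a blank line, and the body into one message string.
sub build_message {
    my (%arg) = @_;
    for my $field (qw(from to subject)) {
        die "Line break in $field header\n" if $arg{$field} =~ /[\r\n]/;
    }
    return "From: $arg{from}\n"
         . "To: $arg{to}\n"
         . "Subject: $arg{subject}\n"
         . "\n"
         . $arg{body};
}

# Pipe a message to sendmail using the list form of open, so no shell runs.
# -oi and -t have the meanings described above; -odq is left out here.
sub send_via_sendmail {
    my ($message, $sendmail) = @_;
    $sendmail = '/usr/lib/sendmail' unless defined $sendmail;   # assumed path
    open(my $pipe, '|-', $sendmail, '-oi', '-t')
        or die "Can't fork for sendmail: $!\n";
    print {$pipe} $message;
    close($pipe) or warn "sendmail didn't close nicely\n";
    return;
}
```

Keeping build_message separate makes it possible to inspect or log the exact text before anything is handed to the mail system.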
485
486
487 Alternate, less convenient approaches include calling mail
488 (sometimes called mailx) directly or simply opening up port
489 25 and having an intimate conversation between just you and
490 the remote SMTP daemon, probably
491 sendmail.
492
493
494 Or you might be able to use the CPAN module
495 Mail::Mailer:
496
497
498 use Mail::Mailer;
499 $mailer = Mail::Mailer-
500 The Mail::Internet module uses Net::SMTP which is less Unix-centric than Mail::Mailer, but less reliable. Avoid raw SMTP commands. There are many reasons to use a mail transport agent like sendmail. These include queueing, MX records, and security.
501
502
503 __How do I use MIME to make an attachment to
504 a mail message?__
505
506
507 This answer is extracted directly from the MIME::Lite
508 documentation. Create a multipart message (i.e., one with
509 attachments).
510
511
512 use MIME::Lite;
513 ### Create a new multipart message:
514 $msg = MIME::Lite-
515 ### Add parts (each
516 $text = $msg-
517 MIME::Lite also includes a method for sending these things.
518
519
520 $msg-
521 This defaults to using sendmail(1) but can be customized to use SMTP via Net::SMTP.
522
523
524 __How do I read mail?__
525
526
527 While you could use the Mail::Folder module from
2 perry 528 CPAN (part of the !MailFolder package) or the
1 perry 529 Mail::Internet module from CPAN (also part of
2 perry 530 the !MailTools package), often a module is overkill. Here's a
1 perry 531 mail sorter.
532
533
534 #!/usr/bin/perl
535 # bysub1 - simple sort by subject
536 my(@msgs, @sub);
537 my $msgno = -1;
538 $/ = ''; # paragraph reads
539 while (
540 Or more succinctly,
541
542
543 #!/usr/bin/perl -n00
544 # bysub2 - awkish sort-by-subject
545 BEGIN { $msgno = -1 }
546 $sub[[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[[0] if /^From/m;
547 $msg[[$msgno] .= $_;
548 END { print @msg[[ sort { $sub[[$a] cmp $sub[[$b] || $a <=> $b } (0 .. $#msg) ] }
549
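
The same idea can be written as a sub that works on mailbox text already held in memory, which makes it easier to try out. This is a sketch under simplifying assumptions: the name sort_by_subject is invented, messages are taken to start at lines beginning with "From ", and the first Subject line found anywhere in a message is used.

```perl
use strict;
use warnings;

# Split mbox text into messages and return them sorted by subject,
# ignoring leading "Re:" prefixes; equal subjects keep mailbox order.
sub sort_by_subject {
    my ($mbox) = @_;
    my @msgs = grep { length } split /^(?=From )/m, $mbox;
    my @subs = map { /^Subject:\s*(?:Re:\s*)*(.*)/mi ? $1 : '' } @msgs;
    return @msgs[ sort { $subs[$a] cmp $subs[$b] || $a <=> $b } 0 .. $#msgs ];
}
```

Sorting a list of indexes instead of the messages themselves is what lets the subject array and the message array stay in step.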
550
551 __How do I find out my hostname/domainname/IP
552 address?__
553
554
555 The normal way to find your own hostname is to call the
556 `hostname` program. While sometimes expedient, this
557 has some problems, such as not knowing whether you've got
558 the canonical name or not. It's one of those tradeoffs of
559 convenience versus portability.
560
561
562 The Sys::Hostname module (part of the standard perl
563 distribution) will give you the hostname after which you can
564 find out the IP address (assuming you have
565 working DNS ) with a ''gethostbyname()''
566 call.
567
568
569 use Socket;
570 use Sys::Hostname;
571 my $host = hostname();
572 my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));
573 Probably the simplest way to learn your DNS domain name is to grok it out of /etc/resolv.conf, at least under Unix. Of course, this assumes several things about your resolv.conf configuration, including that it exists.
574
575
576 (We still need a good DNS domain
577 name-learning method for non-Unix systems.)
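
For the Unix case, the resolv.conf approach might be sketched as follows. The sub names are invented, and preferring a "domain" line over the first "search" entry is a simplification chosen for this example.

```perl
use strict;
use warnings;

# Pull a domain name out of resolv.conf-style text: prefer a "domain"
# line, else fall back to the first entry of a "search" line.
sub domain_from_resolv_conf {
    my ($text) = @_;
    return $1 if $text =~ /^\s*domain\s+(\S+)/m;
    return $1 if $text =~ /^\s*search\s+(\S+)/m;
    return;
}

# Read the real file if it exists; returns undef otherwise.
sub local_domain {
    my ($path) = @_;
    $path = '/etc/resolv.conf' unless defined $path;
    open(my $fh, '<', $path) or return;
    local $/;                        # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return domain_from_resolv_conf(defined $text ? $text : '');
}
```

Returning undef when the file is missing or has neither line keeps the caller responsible for deciding on a fallback.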
578
579
580 __How do I fetch a news article or the active
581 newsgroups?__
582
583
584 Use the Net::NNTP or News::NNTPClient modules, both
585 available from CPAN. This can make tasks
586 like fetching the newsgroup list as simple as
587
588
589 perl -MNews::NNTPClient
590 -e 'print News::NNTPClient-
591
592
593 __How do I fetch/put an FTP
594 file?__
595
596
597 LWP::Simple (available from CPAN) can fetch
598 but not put. Net::FTP (also available from
599 CPAN) is more complex but can put as well as
600 fetch.
601
602
603 __How can I do RPC in Perl?__
604
605
606 A DCE::RPC module is being developed (but is
607 not yet available) and will be released as part of the
608 DCE-Perl package (available from CPAN). The
609 rpcgen suite, available from CPAN/authors/id/JAKE/, is an
610 RPC stub generator and includes an
611 RPC::ONC module.
612 !!AUTHOR AND COPYRIGHT
613
614
615 Copyright (c) 1997-1999 Tom Christiansen and Nathan
616 Torkington. All rights reserved.
617
618
619 When included as part of the Standard Version of Perl, or as
620 part of its complete documentation whether printed or
621 otherwise, this work may be distributed only under the terms
622 of Perl's Artistic License. Any distribution of this file or
623 derivatives thereof ''outside'' of that package require
624 that special arrangements be made with copyright
625 holder.
626
627
628 Irrespective of its distribution, all code examples in this
629 file are hereby placed into the public domain. You are
630 permitted and encouraged to use this code in your own
631 programs for fun or for profit as you see fit. A simple
632 comment in the code giving credit would be courteous but is
633 not required.
634 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.