Blame: perlfaq9(1) - Waikato Linux Users Group

Annotated edit history of perlfaq9(1) version 2, including all changes.

Rev	Author	#	Line
1	perry	1	`PERLFAQ9`
		2	`!!!PERLFAQ9`
		3	`NAME`
		4	`DESCRIPTION`
		5	`AUTHOR AND COPYRIGHT`
		6	`----`
		7	`!!NAME`
		8
		9
		10	`perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)`
		11	`!!DESCRIPTION`
		12
		13
		14	`This section deals with questions related to networking, the`
		15	`internet, and a few on the web.`
		16
		17
		18	`__My CGI script runs from the command line`
		19	`but not the browser. (500 Server Error)__`
		20
		21
		22	`If you can demonstrate that you've read the following FAQs`
		23	`and that your problem isn't something simple that can be`
		24	`easily answered, you'll probably receive a courteous and`
		25	`useful reply to your question if you post it on`
		26	`comp.infosystems.www.authoring.cgi (if it's something to do`
		27	`with HTTP, HTML, or the`
		28	`CGI protocols). Questions that appear to be`
		29	`Perl questions but are really CGI ones that`
		30	`are posted to comp.lang.perl.misc may not be so well`
		31	`received.`
		32
		33
		34	`The useful FAQs and related documents are:`
		35
		36
		37	`CGI FAQ`
		38	`http://www.webthing.com/tutorials/cgifaq.html`
		39	`Web FAQ`
		40	`http://www.boutell.com/faq/`
		41	`WWW Security FAQ`
		42	`http://www.w3.org/Security/Faq/`
		43	`HTTP Spec`
		44	`http://www.w3.org/pub/WWW/Protocols/HTTP/`
		45	`HTML Spec`
		46	`http://www.w3.org/TR/REC-html40/`
2	perry	47	`http://www.w3.org/pub/WWW/MarkUp/`
1	perry	48	`CGI Spec`
		49	`http://www.w3.org/CGI/`
		50	`CGI Security FAQ`
		51	`http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt`
		52
		53
		54	`__How can I get better error messages from a`
		55	`CGI program?__`
		56
		57
		58	`Use the CGI::Carp module. It replaces warn and`
		59	`die, plus the normal Carp module's carp,`
		60	`croak, and confess functions with more`
		61	`verbose and safer versions. It still sends them to the`
		62	`normal server error log.`
		63
		64
		65	`use CGI::Carp;`
		66	`warn "This is a complaint";`
			`die "But this one is serious";`
		67	`The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to catch compile-time warnings as well:`
		68
		69
		70	`BEGIN {`
		71	`use CGI::Carp qw(carpout);`
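		72	`open(LOG, ">>/var/local/cgi-logs/mycgi-log")`
			`or die "Unable to append to mycgi-log: $!\n";`
			`carpout(*LOG);`
			`}`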
		73	`You can even arrange for fatal errors to go back to the client browser, which is nice for your own debugging, but might confuse the end user.`
		74
		75
		76	`use CGI::Carp qw(fatalsToBrowser);`
		77	`die "Bad error here";`
		78	`Even if the error happens before you get the HTTP header out, the module will try to take care of this to avoid the dreaded server 500 errors. Normal warnings still go out to the server error log (or wherever you've sent them with carpout) with the application name and date stamp prepended.`
		79
		80
		81	`__How do I remove HTML from a`
		82	`string?__`
		83
		84
		85	`The most correct way (albeit not the fastest) is to use`
		86	`HTML::Parser from CPAN. Another mostly`
2	perry	87	`correct way is to use HTML::FormatText, which not only`
1	perry	88	`removes HTML but also attempts to do a little`
		89	`simple formatting of the resulting plain text.`
		90
		91
		92	`Many folks attempt a simple-minded regular expression`
		93	`approach, like s/<.*?>//g, but that fails in`
		94	`many cases because the tags may continue over line breaks,`
		95	`they may contain quoted angle-brackets, or`
		96	`HTML comments may be present. Plus, folks`
		97	`forget to convert entities, like &lt; for`
		98	`example.`
		99
		100
		101	`Here's one "simple-minded" approach that works for most`
		102	`files:`
		103
		104
		105	`#!/usr/bin/perl -p0777`
		106	`s/<(?:[^>'"]*|(['"]).*?\1)*>//gs`
		107	`If you want a more complete solution, see the 3-stage striphtml program at http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz.`
		108
		109
		110	`Here are some tricky cases that you should think about when`
		111	`picking a solution:`
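		112
		113
		114	`<IMG SRC = "foo.gif" ALT = "A > B">`
		115	`<IMG SRC = "foo.gif"`
			`ALT = "A > B">`
		116	`<!-- <A comment> -->`
		117	`<script>if (a<b && a>c)</script>`
		118	`<# Just data #>`
		119	`<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>`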
		120	`If HTML comments include other tags, those solutions would also break on text like this:`
		121
		122	`<!-- This section commented out.`
			`<B>You can't see me!</B>`
		123	`-->`
		124
		125
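A minimal sketch of the regex approach wrapped in a reusable subroutine. The name `strip_html` and the four-entry entity table are illustrative assumptions, not part of any module. It copes with a quoted `>` inside an attribute, but like the one-liner it still mishandles comments that contain tags and any entity missing from its table.

```perl
use strict;
use warnings;

# Hypothetical entity table: only these four entities are decoded.
my %entity = (lt => '<', gt => '>', amp => '&', quot => '"');

sub strip_html {
    my ($html) = @_;
    # Same pattern as the one-liner: '<', then any mix of unquoted
    # non-'>' text and quoted strings, then '>'.
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    # Decode entities only after tags are gone, so "&lt;" never becomes a tag.
    $html =~ s/&(lt|gt|amp|quot);/$entity{$1}/g;
    return $html;
}

my $plain = strip_html(qq{<IMG SRC = "foo.gif" ALT = "A > B">Fish &amp; chips});
print "$plain\n";
```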
		126	`__How do I extract URLs?__`
		127
		128
		129	`A quick but imperfect approach is`
		130
		131
		132	`#!/usr/bin/perl -n00`
		133	`# qxurl - tchrist@perl.com`
		134	`print "$2\n" while m{`
			`< \s* A \s+ HREF \s* = \s* (["']) (.*?) \1 \s* >`
			`}gsix;`
		135	`This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, understand extra qualifiers like TARGET, or accept URLs themselves as arguments. It also runs about 100x faster than a more "complete" solution using the LWP suite of modules, such as the http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.`
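The same pattern can collect URLs into a list instead of printing them, which is handier inside a larger program. `extract_hrefs` is a hypothetical name, and the sketch inherits every limitation listed above (no relative-URL resolution, no comment handling, and so on).

```perl
use strict;
use warnings;

# Hypothetical helper built around the qxurl pattern: returns the quoted
# value of every <A HREF=...> tag in the string.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s* A \s+ HREF \s* = \s* (["']) (.*?) \1 \s* >
    }gsix) {
        push @urls, $2;    # $1 is the quote character, $2 the URL itself
    }
    return @urls;
}

my @found = extract_hrefs(q{<a href="http://www.perl.com/">Perl</a> <A HREF='/faq.html'>FAQ</A>});
print "$_\n" for @found;
```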
		136
		137
		138	`__How do I download a file from the user's machine? How do`
		139	`I open a file on another machine?__`
		140
		141
		142	`In the context of an HTML form, you can use`
		143	`what's known as __multipart/form-data__ encoding. The`
		144	`CGI.pm module (available from`
		145	`CPAN) supports this in the`
		146	`''start_multipart_form()'' method, which isn't the same`
		147	`as the ''startform()'' method.`
		148
		149
		150	`__How do I make a pop-up menu in`
		151	`HTML?__`
		152
		153
		154	`Use the __<SELECT>__ and`
		155	`__<OPTION>__ tags. The CGI.pm`
		156	`module (available from CPAN) supports`
		157	`this widget, as well as many others, including some that it`
		158	`cleverly synthesizes on its own.`
		159
		160
		161	`__How do I fetch an HTML`
		162	`file?__`
		163
		164
		165	`One approach, if you have the lynx text-based`
		166	`HTML browser installed on your system, is`
		167	`this:`
		168
		169
		170	$html_code = `lynx -source $url`;
		171	$text_data = `lynx -dump $url`;
		172	`The libwww-perl (LWP) modules from CPAN provide a more powerful way to do this. They don't require lynx, but like lynx, can still work through proxies:`
		173
		174
		175	`# simplest version`
		176	`use LWP::Simple;`
		177	`$content = get($URL);`
		178	`# or print HTML from a URL`
		179	`use LWP::Simple;`
		180	`getprint "http://www.linpro.no/lwp/";`
		181	`# or print ASCII from HTML from a URL`
		182	`# also need HTML-Tree package from CPAN`
		183	`use LWP::Simple;`
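		184	`use HTML::TreeBuilder;`
2	perry	185	`use HTML::FormatText;`
1	perry	186	`my ($html, $ascii);`
		187	`$html = get("http://www.perl.com/");`
			`defined $html or die "Can't fetch HTML from http://www.perl.com/";`
			`$ascii = HTML::FormatText->new->format(HTML::TreeBuilder->new_from_content($html));`
			`print $ascii;`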
		188
		189
		190	`__How do I automate an HTML form`
		191	`submission?__`
		192
		193
		194	`If you're submitting values using the GET`
		195	`method, create a URL and encode the form`
		196	`using the query_form method:`
		197
		198
		199	`use LWP::Simple;`
		200	`use URI::URL;`
		201	`my $url = url('http://www.perl.com/cgi-bin/cpan_mod');`
		202	`$url->query_form(module => 'DB_File', readme => 1);`
			`$content = get($url);`
		203	`If you're using the POST method, create your own user agent and encode the content appropriately.`
		204
		205
		206	`use HTTP::Request::Common qw(POST);`
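2	perry	207	`use LWP::UserAgent;`
		208	`$ua = LWP::UserAgent->new();`
			`my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',`
			`[ module => 'DB_File', readme => 1 ];`
			`$content = $ua->request($req)->as_string;`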
1	perry	209
		210
		211	`__How do I decode or create those %-encodings on the`
		212	`web?__`
		213
		214
		215	`If you are writing a CGI script, you should`
		216	`be using the CGI.pm module that comes with`
		217	`perl, or some other equivalent module. The`
		218	`CGI module automatically decodes queries for`
		219	`you, and provides an ''escape()'' function to handle`
		220	`encoding.`
		221
		222
		223	`The best source of detailed information on`
		224	`URI encoding is RFC 2396.`
		225	`Basically, the following substitutions do it:`
		226
		227
		228	`s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode`
		229	`s/%([A-Fa-f\d]{2})/chr hex $1/eg; # decode`
		230	`However, you should apply them only to individual URI components, not to the entire URI; otherwise you'll lose information and generally mess things up. If that didn't explain it, don't worry. Just go read section 2 of the RFC; it's probably the best explanation there is.`
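A sketch of applying the substitutions one component at a time, here to build a query string from a made-up form. The helper names are hypothetical, and the sketch assumes byte strings (characters above 255 would need UTF-8 encoding first). The encoder passes `ord $1` to `sprintf`, because `%02X` needs the character's numeric code rather than the character itself.

```perl
use strict;
use warnings;

# Hypothetical helpers. Keep only RFC 2396 "unreserved" characters
# (alphanumerics and - _ . ! ~ * ' ( )); escape everything else.
sub escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf '%%%02X', ord $1/eg;
    return $s;
}

sub unescape_component {
    my ($s) = @_;
    $s =~ s/%([A-Fa-f0-9]{2})/chr hex $1/eg;
    return $s;
}

# Escape each name and value separately, then join them with the
# reserved characters '=' and '&' that give the query its structure.
my %form  = (module => 'DB_File', q => 'a&b=c d');
my $query = join '&',
    map { escape_component($_) . '=' . escape_component($form{$_}) }
    sort keys %form;
print "$query\n";
```

Because the `&` and `=` inside the value are escaped, they cannot be confused with the separators that `join` adds. Escaping the assembled string instead would destroy exactly that distinction.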
		231
		232
		233	`RFC 2396 also contains a lot of other useful`
		234	`information, including a regexp for breaking any arbitrary`
		235	`URI into components (Appendix`
		236	`B).`
		237
		238
		239	`__How do I redirect to another page?__`
		240
		241
		242	`According to RFC 2616, "Hypertext Transfer`
		243	`Protocol -- HTTP/1.1", the preferred method`
		244	`is to send a Location: header instead of a`
		245	`Content-Type: header:`
		246
		247
		248	`Location: http://www.domain.com/newpage`
		249	`Note that relative URLs in these headers can cause strange effects because of "optimizations" that servers do.`
		250
		251
		252	`$url = "http://www.perl.com/CPAN/";`
			`print "Location: $url\n\n";`
			`exit;`
		253	`To target a particular frame in a frameset, include a "Window-target:" line in the header.`
		254
		255
		256	`print <<EOF;`
			`Location: http://www.domain.com/newpage`
			`Window-target: <FrameName>`
			
		257	`EOF`
		258	`To be correct to the spec, each of those virtual newlines should really be a physical "\015\012" (CR LF) sequence by the time your message is received by the client browser. Except for NPH scripts, though, that local newline should get translated by your server into standard form, so you shouldn't have a problem here, even if you are stuck on MacOS. Everybody else probably won't even notice.`
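A sketch of a redirect header built with explicit CR LF pairs, so the output is already in standard form however the server treats newlines. `redirect_header` is a hypothetical helper; the `binmode` call assumes you want to switch off any local newline translation on STDOUT, which would otherwise turn each "\012" into "\015\012" on some systems.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the header block for a CGI redirect,
# optionally targeting a frame, with explicit "\015\012" line endings.
sub redirect_header {
    my ($url, $frame) = @_;
    my $crlf   = "\015\012";
    my $header = "Location: $url$crlf";
    $header .= "Window-target: $frame$crlf" if defined $frame;
    return $header . $crlf;    # an empty line ends the header block
}

binmode STDOUT;    # emit the bytes exactly as built
print redirect_header('http://www.perl.com/CPAN/');
```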
		259
		260
		261	`__How do I put a password on my web pages?__`
		262
		263
		264	`That depends. You'll need to read the documentation for your`
		265	`web server, or perhaps check some of the other FAQs`
		266	`referenced above.`
		267
		268
		269	`__How do I edit my .htpasswd and .htgroup files with`
		270	`Perl?__`
		271
		272
2	perry	273	`The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a`
1	perry	274	`consistent OO interface to these files,`
		275	`regardless of how they're stored. Databases may be text,`
		276	`dbm, Berkeley DB, or any database with a`
2	perry	277	`DBI-compatible driver. HTTPD::UserAdmin`
1	perry	278	`supports files used by the "Basic" and "Digest"`
		279	`authentication schemes. Here's an example:`
		280
		281
2	perry	282	`use HTTPD::UserAdmin ();`
		283	`HTTPD::UserAdmin`
1	perry	284	`->new(DB => "/foo/.htpasswd")`
			`->add($username => $password);`
		285
		286
		287	`__How do I make sure users can't enter values into a form`
		288	`that cause my CGI script to do bad`
		289	`things?__`
		290
		291
		292	`Read the CGI security FAQ at`
		293	`http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html`
		294	`and the Perl/CGI FAQ at`
		295	`http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.`
		297
		298
		299	`In brief: use tainting (see perlsec), which makes Perl refuse`
		300	`to use data from outside your script (e.g., CGI`
		301	`parameters) in eval or system calls until you have`
		302	`explicitly untainted it. In addition to tainting, never use`
		303	`the single-argument form of ''system()'' or`
		304	`''exec()''. Instead, supply the command and arguments as`
		305	`a list, which bypasses the shell, so globbing and other shell metacharacters in the arguments are never interpreted.`
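A sketch of both pieces of advice, assuming a Unix-like system. It untaints with a whitelist capture, the idiom perlsec describes, and then uses the list form of a piped open. That form follows the same no-shell rule as list-form `system()` but lets us capture the child's output. `$^X` (the running perl binary) stands in for a real external command, and the hostile parameter value is made up.

```perl
use strict;
use warnings;

# Stand-in for a CGI parameter containing shell metacharacters.
my $param = 'a; echo injected';

# Untaint by accepting only a whitelist and keeping the capture.
# Under -T, $1 from a successful match is untainted; this value fails.
my ($safe) = $param =~ /^([\w.-]+)\z/;
print defined $safe ? "accepted\n" : "rejected\n";

# List form: no shell is started, so the semicolon reaches the child
# as part of a single argument instead of starting a second command.
open(my $kid, '-|', $^X, '-e', 'print $ARGV[0]', $param)
    or die "Can't fork: $!";
my $out = do { local $/; <$kid> };
close($kid) or die "child failed: $?";
print "child saw: $out\n";
```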
		306
		307
		308	`__How do I parse a mail header?__`
		309
		310
		311	`For a quick-and-dirty solution, try this solution derived`
		312	`from "split" in perlfunc:`
		313
		314
		315	`$/ = '';`
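		316	`$header = <MSG>;`
			`$header =~ s/\n\s+/ /g; # merge continuation lines`
			`%head = ( 'UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header );`
2	perry	317	`That solution doesn't do well if, for example, you're trying to maintain all the Received lines. A more complete approach is to use the Mail::Header module from CPAN (part of the MailTools package).`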
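A self-contained sketch of the quick-and-dirty parser that reads from an in-memory filehandle instead of MSG. The sample message is made up. Its first line is an mbox-style `From ` separator, which ends up under the `UNIX_FROM_LINE` key. Because the result is a plain hash, a repeated header such as Received keeps only its last value, which is the weakness described above.

```perl
use strict;
use warnings;

# Made-up message: mbox separator, three headers (one continued), body.
my $message = "From me\@host Wed May  6 09:38:41 1998\n"
            . "From: me\@host\n"
            . "Subject: Hello\n"
            . "  world\n"
            . "To: you\@otherhost\n"
            . "\n"
            . "Body text\n";

open(my $msg, '<', \$message) or die "Can't open in-memory message: $!";
my %head;
{
    local $/ = '';                  # paragraph mode: read up to the blank line
    my $header = <$msg>;
    $header =~ s/\n\s+/ /g;         # merge continuation lines
    %head = ('UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header);
}
s/\s+\z// for values %head;         # trim trailing newlines and spaces

print "$_: $head{$_}\n" for sort keys %head;
```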
1	perry	318
		319
		320	`__How do I decode a CGI`
		321	`form?__`
		322
		323
		324	`You use a standard module, probably CGI.pm.`
		325	`Under no circumstances should you attempt to do so by`
		326	`hand!`
		327
		328
		329	`You'll see a lot of CGI programs that blindly`
		330	`read from STDIN the number of bytes equal to`
		331	`CONTENT_LENGTH for POSTs, or grab`
		332	`QUERY_STRING for decoding GETs. These`
		333	`programs are very poorly written. They only work sometimes.`
		334	`They typically forget to check the return value of the`
		335	`''read()'' system call, which is a cardinal sin. They`
		336	`don't handle HEAD requests. They don't handle`
		337	`multipart forms used for file uploads. They don't deal with`
		338	`GET/POST combinations where query fields are`
		339	`in more than one place. They don't deal with keywords in the`
		340	`query string.`
		341
		342
		343	`In short, they're bad hacks. Resist them at all costs.`
		344	`Please do not be tempted to reinvent the wheel. Instead, use`
		345	`the CGI.pm or CGI_Lite.pm modules (available from`
		346	`CPAN), or if you're trapped in the`
		347	`module-free land of perl1 .. perl4, you might look into`
		348	`cgi-lib.pl (available from`
		349	`http://cgi-lib.stanford.edu/cgi-lib/ ).`
		350
		351
		352	`Make sure you know whether to use a GET or a`
		353	`POST in your form. GETs should only be used`
		354	`for something that doesn't update the server. Otherwise you`
		355	`can get mangled databases and repeated feedback mail`
		356	`messages. The fancy word for this is "idempotency". This`
		357	`simply means that there should be no difference between`
		358	`making a GET request for a particular`
		359	`URL once or multiple times. This is because`
		360	`the HTTP protocol definition says that a`
		361	`GET request may be cached by the browser, or`
		362	`server, or an intervening proxy. POST`
		363	`requests cannot be cached, because each request is`
		364	`independent and matters. Typically, POST`
		365	`requests change or depend on state on the server (query or`
		366	`update a database, send mail, or purchase a`
		367	`computer).`
		368
		369
		370	`__How do I check a valid mail address?__`
		371
		372
		373	`You can't, at least, not in real time. Bummer,`
		374	`eh?`
		375
		376
		377	`Without sending mail to the address and seeing whether`
		378	`there's a human on the other end to answer you, you cannot`
		379	`determine whether a mail address is valid. Even if you apply`
		380	`the mail header standard, you can have problems, because`
		381	`there are deliverable addresses that aren't`
		382	`RFC-822 (the mail header standard) compliant,`
		383	`and addresses that aren't deliverable which are`
		384	`compliant.`
		385
		386
		387	`Many are tempted to try to eliminate many frequently-invalid`
		388	`mail addresses with a simple regex, such as`
		389	`/^[\w.-]+\@(?:[\w-]+\.)+\w+$/. It's a very bad idea:`
		390	`it throws out many valid addresses and says`
		391	`nothing about potential deliverability, so it is not`
		392	`suggested. Instead, see`
		393	`http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz,`
		394	`which actually checks against the full RFC`
		395	`spec (except for nested comments), looks for addresses you`
		396	`may not wish to accept mail to (say, Bill Clinton or your`
		397	`postmaster), and then makes sure that the hostname given can`
		398	`be looked up in the DNS MX records. It's not`
		399	`fast, but it works for what it tries to do.`
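To see why the simple regex is discouraged, here it is run against a few addresses. The sample addresses are made up. `&` and `+` are legal in an RFC-822 local part, and `.invalid` is a top-level domain reserved precisely so that it never resolves.

```perl
use strict;
use warnings;

# The "simple regex" from the text, shown only to illustrate its failures.
my $naive = qr/^[\w.-]+\@(?:[\w-]+\.)+\w+$/;

my %cases = (
    'someuser@host.com'           => 'ordinary address',
    'fred&barney@example.com'     => 'valid: & is allowed in the local part',
    'user+tag@example.org'        => 'valid: + is allowed in the local part',
    'nobody@no-such-host.invalid' => 'well-formed but never deliverable',
);

for my $addr (sort keys %cases) {
    printf "%-30s %-9s (%s)\n", $addr,
        ($addr =~ $naive ? 'accepted' : 'rejected'), $cases{$addr};
}
```

Two valid addresses are rejected and an undeliverable one is accepted, which are exactly the two failure modes described above.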
		400
		401
		402	`Our best advice for verifying a person's mail address is to`
		403	`have them enter their address twice, just as you normally do`
		404	`to change a password. This usually weeds out typos. If both`
		405	`versions match, send mail to that address with a personal`
		406	`message that looks somewhat like:`
		407
		408
		409	`Dear someuser@host.com,`
		410	`Please confirm the mail address you gave us Wed May 6 09:38:41`
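		411	`MDT 1998 by replying to this message. Include the string`
			`"Rumpelstiltskin" in that reply, but spelled in reverse; that is,`
			`start with "Nik...". Once this is done, your confirmed address will`
			`be entered into our records.`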
		412	`If you get the message back and they've followed your directions, you can be reasonably assured that it's real.`
		413
		414
		415	`A related strategy that's less open to forgery is to give`
		416	`them a PIN (personal ID`
		417	`number). Record the address and PIN (best`
		418	`that it be a random one) for later processing. In the mail`
		419	`you send, ask them to include the PIN in`
		420	`their reply. But if it bounces, or the message is included`
		421	`via a "vacation" script, it'll be there anyway. So it's`
		422	`best to ask them to mail back a slight alteration of the`
		423	`PIN , such as with the characters reversed,`
		424	`one added or subtracted to each digit, etc.`
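A minimal sketch of the PIN scheme, using reversed digits as the alteration. The helper names and the six-digit length are assumptions, and Perl's `rand` is not a cryptographic generator. Palindromic PINs are rejected, because reversing them would leave them unchanged and a quoted bounce would then pass.

```perl
use strict;
use warnings;

# Hypothetical helper: a random six-digit PIN that is not a palindrome.
sub make_pin {
    my $pin;
    do {
        $pin = join '', map { int rand 10 } 1 .. 6;
    } while ($pin eq reverse $pin);
    return $pin;
}

# The reply must contain the PIN with its digits reversed, so a bounce or
# an auto-reply that merely quotes the original message does not match.
sub reply_confirms {
    my ($pin, $reply) = @_;
    my $expected = reverse $pin;
    return $reply =~ /\b\Q$expected\E\b/ ? 1 : 0;
}

my $pin = make_pin();
print "PIN $pin, expected reply contains ", scalar reverse($pin), "\n";
```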
		425
		426
		427	`__How do I decode a MIME/BASE64`
		428	`string?__`
		429
		430
		431	`The MIME-Base64 package (available from CPAN)`
		432	`handles this as well as the MIME/QP`
		433	`encoding. Decoding BASE64 becomes as simple`
		434	`as:`
		435
		436
		437	`use MIME::Base64;`
		438	`$decoded = decode_base64($encoded);`
		439	`The MIME-Tools package (available from CPAN) supports extraction with decoding of BASE64-encoded attachments and content directly from email messages.`
		440
		441
		442	`If the string to decode is short (less than 84 bytes long) a`
		443	`more direct approach is to use the ''unpack()''`
		444	`function's "u" format after minor`
		445	`transliterations:`
		446
		447
		448	`tr#A-Za-z0-9+/##cd; # remove non-base64 chars`
		449	`tr#A-Za-z0-9+/# -_#; # convert to uuencoded format`
		450	`$len = pack("c", 32 + 0.75*length); # compute length byte`
			`print unpack("u", $len . $_); # uudecode and print`
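A sketch that wraps the `unpack` trick in a subroutine and checks it against `MIME::Base64`. The helper name is hypothetical. The trick works because the uuencode alphabet (space through underscore) lists its 64 symbols in the same order as base64's `A-Za-z0-9+/`, so only the leading length byte has to be computed. A single uuencoded line holds at most 63 decoded bytes (84 base64 characters), which is where the length limit comes from.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: the unpack("u") trick applied to a copy of its
# argument instead of to $_.
sub short_b64_decode {
    my ($str) = @_;
    $str =~ tr#A-Za-z0-9+/##cd;                   # drop non-base64 chars, including '='
    $str =~ tr#A-Za-z0-9+/# -_#;                  # map onto the uuencode alphabet
    my $len = pack("c", 32 + 0.75 * length($str)); # uuencode length byte
    return unpack("u", $len . $str);
}

my $encoded = encode_base64("Hello, world!", '');  # '' means no trailing newline
print "$encoded\n";
print short_b64_decode($encoded), "\n";
print decode_base64($encoded), "\n";
```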
		451
		452
		453	`__How do I return the user's mail address?__`
		454
		455
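		456	`On systems that support getpwuid, the $< variable, and the Sys::Hostname module (part of the standard perl distribution), you can probably try something like this:`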
		457
		458
		459	`use Sys::Hostname;`
		460	`$address = sprintf('%s@%s', scalar getpwuid($<), hostname);`
		461	`Company policies on mail addresses can mean that this generates addresses that the company's mail system will not accept, so you should ask for users' mail addresses when this matters. Furthermore, not all systems on which Perl runs are as forthcoming with this information as Unix is.`
		462
		463
		464	`The Mail::Util module from CPAN (part of the`
2	perry	465	`MailTools package) provides a ''mailaddress()'' function`
1	perry	466	`that tries to guess the mail address of the user. It makes a`
		467	`more intelligent guess than the code above, using`
		468	`information given when the module was installed, but it`
		469	`could still be incorrect. Again, the best way is often just`
		470	`to ask the user.`
		471
		472
		473	`__How do I send mail?__`
		474
		475
		476	`Use the sendmail program directly:`
		477
		478
		479	`open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq") or die "Can't fork for sendmail: $!\n";`
			`print SENDMAIL <<"EOF";`
			`From: User Originating Mail <me\@host>`
			`To: Final Destination <you\@otherhost>`
			`Subject: A relevant subject line`
			
		480	`Body of the message goes here after the blank line`
		481	`in as many lines as you like.`
		482	`EOF`
		483	`close(SENDMAIL) or warn "sendmail didn't close nicely";`
		484	`The __-oi__ option prevents sendmail from interpreting a line consisting of a single dot as "end of message". The __-t__ option says to use the headers to decide who to send the message to, and __-odq__ says to put the message into the queue. This last option means your message won't be immediately delivered, so leave it out if you want immediate delivery.`
		485
		486
		487	`Alternate, less convenient approaches include calling mail`
		488	`(sometimes called mailx) directly or simply opening up port`
		489	`25 and having an intimate conversation between just you and`
		490	`the remote SMTP daemon, probably`
		491	`sendmail.`
		492
		493
		494	`Or you might be able to use the CPAN module`
		495	`Mail::Mailer:`
		496
		497
		498	`use Mail::Mailer;`
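		499	`$mailer = Mail::Mailer->new();`
			`$mailer->open({ From => $from_address, To => $to_address, Subject => $subject })`
			`or die "Can't open: $!\n";`
			`print $mailer $body;`
			`$mailer->close();`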
		500	`The Mail::Internet module uses Net::SMTP which is less Unix-centric than Mail::Mailer, but less reliable. Avoid raw SMTP commands. There are many reasons to use a mail transport agent like sendmail. These include queueing, MX records, and security.`
		501
		502
		503	`__How do I use MIME to make an attachment to`
		504	`a mail message?__`
		505
		506
		507	`This answer is extracted directly from the MIME::Lite`
		508	`documentation. Create a multipart message (i.e., one with`
		509	`attachments).`
		510
		511
		512	`use MIME::Lite;`
		513	`### Create a new multipart message:`
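		514	`$msg = MIME::Lite->new(From => 'me@myhost.com', To => 'you@yourhost.com',`
			`Subject => 'A message with 2 parts...', Type => 'multipart/mixed');`
		515	`### Add parts (each "attach" has the same arguments as "new"):`
			`$msg->attach(Type => 'TEXT', Data => "Here's the GIF file you wanted");`
			`$msg->attach(Type => 'image/gif', Path => 'aaa000123.gif', Filename => 'logo.gif');`
		516	`$text = $msg->as_string;`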
		517	`MIME::Lite also includes a method for sending these things.`
		518
		519
		520	`$msg->send;`
		521	`This defaults to using sendmail(1) but can be customized to use SMTP via Net::SMTP.`
		522
		523
		524	`__How do I read mail?__`
		525
		526
		527	`While you could use the Mail::Folder module from`
2	perry	528	`CPAN (part of the MailFolder package) or the`
1	perry	529	`Mail::Internet module from CPAN (also part of`
2	perry	530	`the MailTools package), often a module is overkill. Here's a`
1	perry	531	`mail sorter.`
		532
		533
		534	`#!/usr/bin/perl`
		535	`# bysub1 - simple sort by subject`
		536	`my(@msgs, @sub);`
		537	`my $msgno = -1;`
		538	`$/ = ''; # paragraph reads`
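		539	`while (<>) {`
			`if (/^From/m) { $sub[++$msgno] = /^Subject:\s*(?:Re:\s*)*(.*)/mi ? lc $1 : '' }`
			`$msgs[$msgno] .= $_;`
			`}`
			`print @msgs[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } 0 .. $#msgs ];`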
		540	`Or more succinctly,`
		541
		542
		543	`#!/usr/bin/perl -n00`
		544	`# bysub2 - awkish sort-by-subject`
		545	`BEGIN { $msgno = -1 }`
		546	`$sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;`
		547	`$msg[$msgno] .= $_;`
		548	`END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }`
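A variant of bysub1 as a subroutine that takes a whole mailbox as a string, which makes it easy to try on sample data. `sort_by_subject` is a hypothetical name. Splitting on lines that start with `From ` assumes the mailbox quotes such lines in message bodies (as `>From `), as mbox files normally do.

```perl
use strict;
use warnings;

# Hypothetical helper: split an mbox string into messages and return them
# sorted by subject, case-insensitively and ignoring leading "Re:".
sub sort_by_subject {
    my ($mbox) = @_;
    my @msgs = split /^(?=From )/m, $mbox;   # zero-width split keeps each "From " line
    my @sub = map {
        my ($s) = /^Subject:\s*(?:Re:\s*)*(.*)/mi;
        defined $s ? lc $s : '';
    } @msgs;
    # Ties keep mailbox order thanks to the "|| $a <=> $b" fallback.
    return @msgs[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } 0 .. $#msgs ];
}

my $mbox = join '',
    "From a\@x Mon Jan  1 00:00:00 1999\nSubject: Zebra\n\nz\n",
    "From b\@x Mon Jan  1 00:00:01 1999\nSubject: Re: apple\n\na\n",
    "From c\@x Mon Jan  1 00:00:02 1999\nSubject: Mango\n\nm\n";
my @sorted = sort_by_subject($mbox);
for my $m (@sorted) { print "$1\n" if $m =~ /^Subject: (.*)/m }
```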
		549
		550
		551	`__How do I find out my hostname/domainname/IP`
		552	`address?__`
		553
		554
		555	`The normal way to find your own hostname is to call the`
		556	`hostname` program. While sometimes expedient, this
		557	`has some problems, such as not knowing whether you've got`
		558	`the canonical name or not. It's one of those tradeoffs of`
		559	`convenience versus portability.`
		560
		561
		562	`The Sys::Hostname module (part of the standard perl`
		563	`distribution) will give you the hostname after which you can`
		564	`find out the IP address (assuming you have`
		565	`working DNS) with a ''gethostbyname()''`
		566	`call.`
		567
		568
		569	`use Socket;`
		570	`use Sys::Hostname;`
		571	`my $host = hostname();`
		572	`my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));`
		573	`Probably the simplest way to learn your DNS domain name is to grok it out of /etc/resolv.conf, at least under Unix. Of course, this assumes several things about your resolv.conf configuration, including that it exists.`
		574
		575
		576	`(We still need a good DNS domain`
		577	`name-learning method for non-Unix systems.)`
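A sketch of pulling the domain out of resolv.conf, assuming the Unix format described in resolv.conf(5), where `domain` and `search` are mutually exclusive and the last one wins. The subroutine name is hypothetical and it takes the file's text, so it can be tried on a sample. The commented lines show how it might read /etc/resolv.conf itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the local domain named in resolv.conf text,
# or undef. For "search", the first domain in the list is taken.
sub resolv_conf_domain {
    my ($text) = @_;
    my $domain;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*[#;]/;    # comment lines
        $domain = $1 if $line =~ /^\s*(?:domain|search)\s+(\S+)/;
    }
    return $domain;
}

# On a real Unix system you might read the file first, e.g.:
#   open(my $fh, '<', '/etc/resolv.conf') or die "no resolv.conf: $!";
#   my $domain = resolv_conf_domain(do { local $/; <$fh> });
my $sample = "# generated\nnameserver 192.0.2.53\nsearch example.com example.org\n";
print resolv_conf_domain($sample), "\n";
```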
		578
		579
		580	`__How do I fetch a news article or the active`
		581	`newsgroups?__`
		582
		583
		584	`Use the Net::NNTP or News::NNTPClient modules, both`
		585	`available from CPAN. This can make tasks`
		586	`like fetching the newsgroup list as simple as:`
		587
		588
		589	`perl -MNews::NNTPClient`
		590	`-e 'print News::NNTPClient->new->list("newsgroups")'`
		591
		592
		593	`__How do I fetch/put an FTP`
		594	`file?__`
		595
		596
		597	`LWP::Simple (available from CPAN) can fetch`
		598	`but not put. Net::FTP (also available from`
		599	`CPAN) is more complex but can put as well as`
		600	`fetch.`
		601
		602
		603	`__How can I do RPC in Perl?__`
		604
		605
		606	`A DCE::RPC module is being developed (but is`
		607	`not yet available) and will be released as part of the`
		608	`DCE-Perl package (available from CPAN). The`
		609	`rpcgen suite, available from CPAN/authors/id/JAKE/, is an`
		610	`RPC stub generator and includes an`
		611	`RPC::ONC module.`
		612	`!!AUTHOR AND COPYRIGHT`
		613
		614
		615	`Copyright (c) 1997-1999 Tom Christiansen and Nathan`
		616	`Torkington. All rights reserved.`
		617
		618
		619	`When included as part of the Standard Version of Perl, or as`
		620	`part of its complete documentation whether printed or`
		621	`otherwise, this work may be distributed only under the terms`
		622	`of Perl's Artistic License. Any distribution of this file or`
		623	`derivatives thereof ''outside'' of that package requires`
		624	`that special arrangements be made with the copyright`
		625	`holder.`
		626
		627
		628	`Irrespective of its distribution, all code examples in this`
		629	`file are hereby placed into the public domain. You are`
		630	`permitted and encouraged to use this code in your own`
		631	`programs for fun or for profit as you see fit. A simple`
		632	`comment in the code giving credit would be courteous but is`
		633	`not required.`
		634	`----`

This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.

Last edited on Tuesday, June 4, 2002 12:22:34 am by "perry"
