!!!PERLFAQ4
NAME
DESCRIPTION
Data: Numbers
Data: Dates
Data: Strings
Data: Arrays
Data: Hashes (Associative Arrays)
Data: Misc
AUTHOR AND COPYRIGHT
----
!!NAME

perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $)
!!DESCRIPTION

This section of the FAQ answers questions related to the manipulation of data as numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
!!Data: Numbers

__Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?__

The infinite set that a mathematician thinks of as the real numbers can only be approximated on a computer, since the computer only has a finite number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating-point numbers in binary. Floating-point numbers read in from a file or appearing as literals in your program are converted from their decimal floating-point representation (eg, 19.95) to an internal binary representation.

However, 19.95 can't be precisely represented as a binary floating-point number, just like 1/3 can't be exactly represented as a decimal floating-point number. The computer's binary representation of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point representation is converted back to decimal. These decimal numbers are displayed in either the format you specify with ''printf()'', or the current output format for numbers. (See ``$#'' in perlvar if you use print. $# has a different default value in Perl5 than it did in Perl4. Changing $# yourself is deprecated.)

This affects __all__ computer languages that represent decimal floating-point numbers in binary, not just Perl. Perl provides arbitrary-precision decimal numbers with the Math::!BigFloat module (part of the standard Perl distribution), but mathematical operations are consequently slower.

To get rid of the superfluous digits, just use a format (eg, printf("%.2f", 19.95)) to get the required precision. See ``Floating-point Arithmetic'' in perlop.

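The following short illustration assumes a platform where Perl's numbers are IEEE 754 doubles, which is true of most modern builds. It shows the stored approximation of 19.95, the same value formatted to two places, and the way this makes exact comparisons unreliable:

```perl
use strict;
use warnings;

my $price = 19.95;

# Asking for 17 significant digits exposes the binary approximation
# that is actually stored for the literal 19.95.
my $raw = sprintf("%.17g", $price);

# Asking only for the precision you need hides it again: the stored
# value rounds to 19.95 at two decimal places.
my $display = sprintf("%.2f", $price);

print "stored:    $raw\n";
print "formatted: $display\n";

# The same effect makes exact equality tests on computed values unreliable.
my $sum = 0.1 + 0.2;
print "0.1 + 0.2 == 0.3 ? ", ($sum == 0.3 ? "yes" : "no"), "\n";
```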
__Why isn't my octal data interpreted correctly?__

Perl only understands octal and hex numbers as such when they occur as literals in your program. If they are read in from somewhere and assigned, no automatic conversion takes place. You must explicitly use ''oct()'' or ''hex()'' if you want the values converted. ''oct()'' interprets both hex (``0x350'') numbers and octal ones (``0350'' or even without the leading ``0'', like ``377''), while ''hex()'' only converts hexadecimal ones, with or without a leading ``0x'', like ``0x255'', ``3A'', ``ff'', or ``deadbeef''.

This problem shows up most often when people try using ''chmod()'', ''mkdir()'', ''umask()'', or ''sysopen()'', which all want permissions in octal.

    chmod(644,  $file);    # WRONG -- perl -w catches this
    chmod(0644, $file);    # right

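Here is a small sketch of the conversion. It assumes the mode arrives as the string "644", for example read from a configuration file:

```perl
use strict;
use warnings;

my $mode_string = "644";            # digits read from input: no octal conversion happens

my $as_decimal = $mode_string + 0;  # 644, which is NOT the permission you meant
my $as_octal   = oct($mode_string); # 420 decimal, the same value as the literal 0644

printf "decimal %d, octal-converted %d (%o in octal)\n",
    $as_decimal, $as_octal, $as_octal;

# oct() also accepts a leading 0x; hex() takes hex digits with or without it.
my $from_oct_hex = oct("0x1f");     # 31
my $from_hex     = hex("ff");       # 255

# chmod($as_octal, $file) would now set rw-r--r--.
```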
__Does Perl have a__ ''round()'' __function? What about__ ''ceil()'' __and__ ''floor()''__? Trig functions?__

Remember that ''int()'' merely truncates toward 0. For rounding to a certain number of digits, ''sprintf()'' or ''printf()'' is usually the easiest route.

    printf("%.3f", 3.1415926535);    # prints 3.142

The POSIX module (part of the standard Perl distribution) implements ''ceil()'', ''floor()'', and a number of other mathematical and trigonometric functions.

    use POSIX;
    $ceil  = ceil(3.5);     # 4
    $floor = floor(3.5);    # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig module (part of the standard Perl distribution) implements the trigonometric functions. Internally it uses the Math::Complex module, and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2.

Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.

To see why, notice how you'll still have an issue on half-way-point alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ", $i }

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this. Perl numbers whose absolute values are integers under 2**31 (on 32-bit machines) will work pretty much like mathematical integers. Other numbers are not guaranteed.

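If you decide to implement rounding yourself, the sketch below shows one common rule, round half away from zero. The function name round_half_away is hypothetical. The scaling step is itself floating-point arithmetic, so an input such as 1.005, which is stored slightly below the half-way point, can still round down. For money, keeping amounts in integer cents avoids this problem:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Round $x to $places decimal places, with ties going away from zero.
# Sketch only: $x * $factor is a floating-point product, so values that
# are not exactly representable may already sit just off the tie.
sub round_half_away {
    my ($x, $places) = @_;
    $places = 0 unless defined $places;
    my $factor  = 10 ** $places;
    my $scaled  = $x * $factor;
    my $rounded = $scaled < 0 ? ceil($scaled - 0.5) : floor($scaled + 0.5);
    return $rounded / $factor;
}

# Ties that are exactly representable behave as the rule says.
printf "%s %s %s\n",
    round_half_away(2.5),         # 3
    round_half_away(-2.5),        # -3
    round_half_away(0.125, 2);    # 0.13

# Working in integer cents sidesteps binary fractions entirely.
my $cents = 1999 + 1;             # $19.99 + $0.01
printf "%d.%02d\n", int($cents / 100), $cents % 100;   # 20.00
```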
__How do I convert bits into ints?__

To turn a string of 1s and 0s like 10110110 into a scalar containing its binary value, use the ''pack()'' and ''unpack()'' functions (documented in ``pack'' in perlfunc and ``unpack'' in perlfunc):

    $decimal = unpack('C', pack('B8', '10110110'));

This packs the string 10110110 into an eight-bit binary structure. This is then unpacked as an unsigned character, which returns its ordinal value (182). Unpacking with a lowercase 'c' would instead treat the byte as signed and return -74.

This does the same thing:

    $decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

    $binary_string = unpack('B*', "\x29");

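The conversions above can be checked directly. This check covers the signed/unsigned difference and also a 32-bit variant. The 'N' approach for wider bit strings is an addition made for this example; it does not come from the answer above:

```perl
use strict;
use warnings;

my $bits = '10110110';

# 'B8' packs eight bits (most significant first) into one byte;
# 'C' unpacks that byte as an unsigned 8-bit integer.
my $unsigned = unpack('C', pack('B8', $bits));   # 182
my $signed   = unpack('c', pack('B8', $bits));   # -74: 'c' is a signed char
my $via_ord  = ord(pack('B8', $bits));           # 182

# Going back: pack the number as one byte, then unpack its bits.
my $round_trip = unpack('B8', pack('C', $unsigned));   # '10110110'

# Longer bit strings: left-pad to 32 bits and unpack as an unsigned
# 32-bit big-endian integer ('N').
my $wide_bits = '1000000001';
my $wide = unpack('N', pack('B32', substr('0' x 32 . $wide_bits, -32)));   # 513

print "$unsigned $signed $via_ord $round_trip $wide\n";
```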
__Why doesn't & work the way I want it to?__

The behavior of the bitwise operators & and | depends on whether they're used on numbers or strings. With strings, they treat each string as a series of bits and work with that (the string "3" is the bit pattern 00110011). With numbers, they work with the binary form of the number (the number 3 is treated as the bit pattern 00000011).

So, saying 11 & 3 performs the ``and'' operation on numbers (yielding 3). Saying "11" & "3" performs the ``and'' operation on strings (yielding "1").

Most problems with & and | arise because the programmer thinks they have a number but really it's a string. The rest arise because the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of "\020\020" & "\101\101") is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[[^\000]/) {
        # ...
    }

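The cases above can be verified directly. The + 0 coercion shown here is one common way to make sure & sees numbers; it is an addition made for this example, not part of the answer above:

```perl
use strict;
use warnings;

my $numeric = 11 & 3;          # both operands are numbers: 1011 & 0011 = 0011
my $stringy = "11" & "3";      # both are strings: combined byte by byte, cut to the shorter

# Forcing numeric context (+ 0) gives the numeric result even for strings.
my $forced = ("11" + 0) & ("3" + 0);

# String & of "\020\020" and "\101\101" is two NUL bytes, which Perl treats as TRUE.
my $nuls        = "\020\020" & "\101\101";
my $looks_true  = $nuls ? 1 : 0;                  # 1, surprisingly
my $no_bits_set = ($nuls !~ /[^\000]/) ? 1 : 0;   # 1: every byte is NUL

print "$numeric [$stringy] $forced $looks_true $no_bits_set\n";
```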
__How do I multiply matrices?__

Use the Math::Matrix or Math::!MatrixReal modules (available from CPAN) or the PDL extension (also available from CPAN).

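For small matrices, or when installing a module isn't an option, a plain-Perl triple loop also works. The mat_mult helper below is hypothetical and does not come from any module. It is far slower than PDL on large data:

```perl
use strict;
use warnings;

# Multiply an m x n matrix by an n x p matrix, both stored as
# references to arrays of row array references.
sub mat_mult {
    my ($A, $B) = @_;
    my $n = @{ $A->[0] };
    die "dimension mismatch\n" unless $n == @$B;
    my $p = @{ $B->[0] };
    my @C;
    for my $i (0 .. $#$A) {
        for my $j (0 .. $p - 1) {
            my $sum = 0;
            $sum += $A->[$i][$_] * $B->[$_][$j] for 0 .. $n - 1;
            $C[$i][$j] = $sum;
        }
    }
    return \@C;
}

my $product = mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]);
print join(" ", @$_), "\n" for @$product;   # 19 22, then 43 50
```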
__How do I perform an operation on a series of integers?__

To call a function on each element in an array, and collect the results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you __can__ use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the .. operator creates a list of all integers in the range. This can take a lot of memory for large ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl5.005. Use of .. in a for loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

__How can I output Roman numerals?__

Get the http://www.perl.com/CPAN/modules/by-module/Roman module.

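If you only need to produce numerals and would rather not install a module, a short greedy conversion is enough. The to_roman function below is a hypothetical sketch. It is not the Roman module's interface:

```perl
use strict;
use warnings;

# Convert an integer in 1..3999 to an uppercase Roman numeral using
# greedy subtraction over a value table that includes the subtractive
# pairs (CM, CD, XC, XL, IX, IV).
sub to_roman {
    my ($n) = @_;
    die "out of range: $n\n" unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;
    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
        [1,    'I'],
    );
    my $roman = '';
    for my $pair (@table) {
        my ($value, $letters) = @$pair;
        while ($n >= $value) {
            $roman .= $letters;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman($_), "\n" for 4, 1999, 2024;   # IV, MCMXCIX, MMXXIV
```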
__Why aren't my random numbers random?__

If you're using a version of Perl before 5.004, you must call srand once at the start of your program to seed the random number generator. 5.004 and later automatically call srand at the beginning. Don't call srand more than once: you make your numbers less random, rather than more.

Computers are good at being predictable and bad at being random (despite appearances caused by bugs in your programs :-). http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom Phoenix, talks more about this. John von Neumann said, ``Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.''

If you want numbers that are more random than what rand with srand provides, you should also check out the Math::!TrulyRandom module from CPAN. It uses the imperfections in your system's timer to generate random numbers, but this takes quite a while. If you want a better pseudorandom generator than comes with your operating system, look at ``Numerical Recipes in C'' at http://www.nr.com/.
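The effect of re-seeding is easy to demonstrate. This example uses a fixed seed of 42 purely so that two runs can be compared. Real programs normally shouldn't pick a constant seed:

```perl
use strict;
use warnings;

# Seed once, at startup. (Perl 5.004 and later do this for you
# automatically the first time rand() is called.)
srand(42);   # a fixed seed, used here only to make the run reproducible

my @first = map { int(rand(100)) } 1 .. 5;

# Re-seeding with the same value replays exactly the same sequence,
# which is why calling srand repeatedly (for example with time())
# inside a loop makes numbers less random, not more.
srand(42);
my @second = map { int(rand(100)) } 1 .. 5;

print "@first\n@second\n";
```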
!!Data: Dates

__How do I find the week-of-the-year/day-of-the-year?__

The day of the year is in the array returned by ''localtime()'' (see ``localtime'' in perlfunc). It is numbered from 0, so January 1st is day 0:

    $day_of_year = (localtime(time()))[[7];

or more legibly (in 5.004 or higher):

    use Time::localtime;
    $day_of_year = localtime(time())->yday;

You can find the week of the year by dividing this by 7:

    $week_of_year = int($day_of_year / 7);

Of course, this believes that weeks start at zero. The Date::Calc module from CPAN has a lot of date calculation functions, including day of the year, week of the year, and so on. Note that not all businesses consider ``week 1'' to be the same; for example, American businesses often consider the first week with a Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a Thursday in it.

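The following sketch makes the off-by-one conventions concrete. It uses gmtime rather than localtime, and a fixed date, so that the output is reproducible:

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# A fixed date so the result doesn't depend on today: 2024-01-15 12:00 UTC.
# Note that timegm's month argument is 0-based (0 = January).
my $t = timegm(0, 0, 12, 15, 0, 2024);

my $yday = (gmtime($t))[7];          # 14: day of the year, counting from 0
my $week = int($yday / 7);           # 2: zero-based week, as in the answer above

# strftime's %j is the 1-based day of the year, zero-padded to 3 digits.
my $j = strftime("%j", gmtime($t));  # "015"

print "$yday $week $j\n";
```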
__How do I find the current century or millennium?__

Use the following simple functions. Each takes an optional time in epoch seconds (defaulting to the current time) and counts strictly, so the 21st century and the third millennium are both reported as starting in 2001.

    sub get_century {
        return int((((localtime(shift || time))[[5] + 1999))/100);
    }
    sub get_millennium {
        return 1+int((((localtime(shift || time))[[5] + 1899))/1000);
    }

On some systems, you'll find that the POSIX module's ''strftime()'' function has been extended in a non-standard way to use a %C format, which they sometimes claim is the ``century''. It isn't, because on most such systems, this is only the first two digits of the four-digit year, and thus cannot be used to reliably determine the current century or millennium.

__How can I compare two dates and find the difference?__

If you're storing your dates as epoch seconds then simply subtract one from the other. If you've got a structured date (distinct year, day, month, hour, minute, seconds values), then for reasons of accessibility, simplicity, and efficiency, merely use either timelocal or timegm (from the Time::Local module in the standard distribution) to reduce structured dates to epoch seconds. However, if you don't know the precise format of your dates, then you should probably use either of the Date::Manip and Date::Calc modules from CPAN before you go hacking up your own parsing routine to handle arbitrary date formats.

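Here is a minimal example of the structured-date case. The two dates are chosen arbitrarily:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm(sec, min, hour, mday, mon, year): mon is 0-based.
# Using UTC (timegm) keeps daylight-saving jumps out of the difference.
my $start = timegm(0, 0, 0, 1, 0, 2000);   # 2000-01-01 00:00:00 UTC
my $end   = timegm(0, 0, 0, 1, 2, 2000);   # 2000-03-01 00:00:00 UTC

my $seconds = $end - $start;
my $days    = $seconds / (24 * 60 * 60);   # 60: 2000 was a leap year

print "$seconds seconds = $days days\n";
```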
__How can I take a string and turn it into epoch seconds?__

If it's a regular enough string that it always has the same format, you can split it up and pass the parts to timelocal in the standard Time::Local module. Otherwise, you should look into the Date::Calc and Date::Manip modules from CPAN.

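Here is a sketch for the regular-format case. Both the "YYYY-MM-DD HH:MM:SS" format and the helper name iso_to_epoch are assumptions made for illustration:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumes a fixed "YYYY-MM-DD HH:MM:SS" format in UTC; use timelocal
# instead of timegm if the string is in local time.
sub iso_to_epoch {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognized date: $string\n";
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);   # month is 0-based
}

my $epoch = iso_to_epoch("2001-11-13 01:00:00");
print "$epoch\n";                    # 1005613200
print scalar(gmtime $epoch), "\n";   # Tue Nov 13 01:00:00 2001
```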
__How can I find the Julian Day?__

Use the Time::!JulianDay module (part of the Time-modules bundle available from CPAN).

Before you immerse yourself too deeply in this, be sure to verify that it is the ''Julian'' Day you really want. Are you really just interested in a way of getting serial days so that you can do date arithmetic? If you are interested in performing date arithmetic, this can be done using either Date::Manip or Date::Calc, without converting to Julian Day first.

There is too much confusion on this issue to cover in this FAQ, but the term is applied (correctly) to a calendar now supplanted by the Gregorian Calendar, with the Julian Calendar failing to adjust properly for leap years on centennial years (among other annoyances). The term is also used (incorrectly) to mean: [[1] days in the Gregorian Calendar; and [[2] days since a particular starting time or `epoch', usually 1970 in the Unix world and 1980 in the MS-DOS/Windows world. If you find that it is not the first meaning that you really want, then check out the Date::Manip and Date::Calc modules. (Thanks to David Cassell for most of this text.)

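If what you need is the astronomers' Julian Day (a continuous count of days, not a calendar), the conversion from Unix epoch seconds is a single linear formula. The sketch below ignores leap seconds, as epoch time itself does. epoch_to_jd is a hypothetical name and is not Time::!JulianDay's interface:

```perl
use strict;
use warnings;

# The astronomers' Julian Day counts days (and fractions) from noon UTC
# on 1 January 4713 BC (proleptic Julian calendar). The Unix epoch,
# 1970-01-01 00:00:00 UTC, falls at Julian Day 2440587.5.
use constant UNIX_EPOCH_JD => 2440587.5;

sub epoch_to_jd {
    my ($epoch) = @_;
    return UNIX_EPOCH_JD + $epoch / 86_400;
}

printf "%.1f\n", epoch_to_jd(0);             # 2440587.5
printf "%.1f\n", epoch_to_jd(946_728_000);   # 2451545.0 (2000-01-01 12:00 UTC)
```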
__How do I find yesterday's date?__

The time() function returns the current time in seconds since the epoch. Take twenty-four hours off that:

    $yesterday = time() - ( 24 * 60 * 60 );

Then you can pass this to localtime() and get the individual year, month, day, hour, minute, seconds values.

Note very carefully that the code above assumes that your days are twenty-four hours each. For most people, there are two days a year when they aren't: the switch to and from summer time throws this off. A solution to this issue is offered by Russ Allbery.

    sub yesterday {
        my $now  = defined $_[[0] ? $_[[0] : time;
        my $then = $now - 60 * 60 * 24;
        my $ndst = (localtime $now)[[8] > 0;
        my $tdst = (localtime $then)[[8] > 0;
        $then - ($tdst - $ndst) * 60 * 60;
    }

Here $ndst and $tdst record whether daylight saving time is in effect now and 24 hours earlier. When they differ, the result is moved by an hour to account for the 23- or 25-hour day in between, so the function returns ``this time yesterday'' in epoch seconds, relative to its argument or to the current time.

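You may want yesterday's calendar date rather than the moment exactly 24 hours ago. In that case, another common trick (different from the one above) is to start from local noon. This sidesteps the short and long days when the DST shift is the usual one hour. yesterday_date is a hypothetical helper:

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Yesterday's calendar date as a YYYY-MM-DD string.
# Anchoring at local noon means that subtracting 24 hours lands between
# 11:00 and 13:00 on the previous day even when a one-hour
# daylight-saving change makes that day 23 or 25 hours long.
sub yesterday_date {
    my ($year, $mon, $mday) = @_;               # $mon is 1-based here
    my $noon = timelocal(0, 0, 12, $mday, $mon - 1, $year);
    return strftime("%Y-%m-%d", localtime($noon - 24 * 60 * 60));
}

print yesterday_date(2000, 3, 1), "\n";   # 2000-02-29

# For "today", feed in the current local date:
my @now = localtime;
print yesterday_date($now[5] + 1900, $now[4] + 1, $now[3]), "\n";
```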
__Does Perl have a Year 2000 problem? Is Perl Y2K compliant?__

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is Y2K compliant (whatever that means). The programmers you've hired to use it, however, probably are not.

Long answer: The question belies a true understanding of the issue. Perl is just as Y2K compliant as your pencil: no more, and no less. Can you use your pencil to write a non-Y2K-compliant memo? Of course you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime) supply adequate information to determine the year well beyond 2000 (2038 is when trouble strikes on systems that keep time in a signed 32-bit integer). The year returned by these functions when used in a list context is the year minus 1900. For years between 1910 and 1999 this ''happens'' to be a 2-digit decimal number. To avoid the year 2000 problem simply do not treat the year as a 2-digit number. It isn't.

When ''gmtime()'' and ''localtime()'' are used in scalar context they return a timestamp string that contains a fully-expanded year. For example, $timestamp = gmtime(1005613200) sets $timestamp to ``Tue Nov 13 01:00:00 2001''. There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant programs. It can. But so can your pencil. It's the fault of the user, not the language. At the risk of inflaming the NRA: ``Perl doesn't break Y2K, people do.'' See http://language.perl.com/news/y2k.html for a longer exposition.
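The following example shows the list-context pitfall alongside the scalar-context string. It uses the timestamp from the example above:

```perl
use strict;
use warnings;

my $when = 1005613200;

# List context: element 5 is the year minus 1900 (101 here), so it
# must be added to 1900, never glued onto the string "19".
my @parts = gmtime($when);
my $year  = $parts[5] + 1900;          # 2001
my $wrong = "19" . $parts[5];          # "19101": the classic Y2K bug

# Scalar context: a string with the full four-digit year.
my $stamp = gmtime($when);             # "Tue Nov 13 01:00:00 2001"

print "$year $wrong $stamp\n";
```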
!!Data: Strings

__How do I validate input?__

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more specific questions (numbers, mail addresses, etc.) for details.

__How do I unescape a string?__

It depends just what you mean by ``escape''. URL escapes are dealt with in perlfaq9. Shell escapes with the backslash (\) character are removed with

    s/\\(.)/$1/g;

This won't expand "\n" or "\t" or any other special escapes.

__How do I remove consecutive pairs of characters?__

To turn "abbcccd" into "abccd":

    s/(.)\1/$1/g;    # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;    # y == tr, but shorter :-)

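The difference between the two approaches is pairs versus runs. A run of four identical characters makes this obvious:

```perl
use strict;
use warnings;

my $pairs = "abbcccd";

(my $halved   = $pairs) =~ s/(.)\1/$1/g;   # each adjacent PAIR collapses once: "abccd"
(my $squeezed = $pairs) =~ tr///cs;        # every RUN collapses to one char: "abcd"

(my $run = "aaaa") =~ s/(.)\1/$1/g;        # two pairs, so two characters remain: "aa"

print "$halved $squeezed $run\n";
```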
__How do I expand function calls in a string?__

This is documented in perlref. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate a subroutine call (in list context) into a string:

    print "My sub returned @{[[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the expression in ${...}, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this section of the FAQ.

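Here are both idioms side by side. mysub is a stand-in defined only for this example:

```perl
use strict;
use warnings;

sub mysub { return map { $_ * 2 } @_ }   # hypothetical sub returning a list

my $n = 2;

# @{[ ... ]}: build an anonymous array from the call, then interpolate
# it like any array (elements joined with $", a space by default).
my $list_version = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# ${\( ... )}: take a reference to the expression's value, then
# dereference it inside the string.
my $scalar_version = "That yields ${\($n + 5)} widgets";

print "$list_version\n$scalar_version\n";
```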
__How do I find matching/nesting anything?__

This isn't something that can be done in one regular expression, no matter how complicated. To find something between two single characters, a pattern like /x([[^x]*)x/ will get the intervening bits in $1. For multiple ones, then something more like /alpha(.*?)omega/ would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser.

If you are serious about writing a parser, there are a number of modules or oddities that will make your life a lot easier. There are the CPAN modules Parse::!RecDescent, Parse::Yapp, and Text::Balanced; and the byacc program.

One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular expression engine do it for you. This is courtesy of Dean Inada, and rather has the nature of an Obfuscated Perl Contest entry, but it really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[[!$3]\Q$1\E$([[!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/);
    print join("\n",@$[[0..$#$]) if( $$[[-1] );

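Of the modules named above, Text::Balanced has shipped with Perl itself since 5.8, so it is often the quickest to try. The sketch below pulls one balanced, nested parenthesized group off the front of a string:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(outer (inner) more) trailing text";

# In list context extract_bracketed returns the balanced text found at
# the start of the string, then the remainder (then any skipped prefix).
my ($extracted, $remainder) = extract_bracketed($text, "()");

print "extracted: $extracted\n";   # (outer (inner) more)
print "remainder: $remainder\n";   #  trailing text
```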
__How do I reverse a string?__

Use ''reverse()'' in scalar context, as documented in ``reverse'' in perlfunc.

    $reversed = reverse $string;

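The ``scalar context'' part matters. The same call in list context, such as inside print, appears to do nothing:

```perl
use strict;
use warnings;

my $string = "hello";

my $reversed = reverse $string;        # scalar assignment gives scalar context: "olleh"

# In LIST context (as inside print), reverse reverses its list of
# arguments: a one-element list here, so the string is unchanged.
my @as_list = reverse $string;         # ("hello")

# Force scalar context explicitly when there is no scalar assignment.
my $forced = scalar reverse $string;   # "olleh"

print "$reversed $as_list[0] $forced\n";
```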
__How do I expand tabs in a string?__

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

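The following quick check confirms that the hand-rolled substitution and Text::Tabs agree. It assumes the default tab stop of 8 columns:

```perl
use strict;
use warnings;
use Text::Tabs;   # exports expand(), unexpand() and $tabstop (default 8)

my $line = "ab\tc\t\td";

# Hand-rolled version from the answer above: each run of tabs becomes
# enough spaces to reach the next multiple of 8 columns.
my $by_hand = $line;
1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

my ($by_module) = expand($line);

print "[$by_hand]\n[$by_module]\n";
```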
__How do I reformat a paragraph?__

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The first argument is the indentation for the first line and the second is the indentation for all following lines. The paragraphs you give to Text::Wrap should not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).

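The small example below shows the two indent arguments and the $columns setting. The sample sentence and the width are arbitrary:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 20;   # lines come out no longer than $columns - 1 characters

my $paragraph = "The quick brown fox jumps over the lazy dog and keeps on running.";
my $wrapped   = wrap("", "  ", $paragraph);   # no first-line indent, two-space hanging indent

print "$wrapped\n";
```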
__How can I access/change the first N letters of a string?__

There are many ways. If you just want to grab a copy, use ''substr()'':

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to use ''substr()'' as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will likely prefer

    $a =~ s/^.../Tom/;

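Here are the three approaches side by side. The example also shows the four-argument form of substr, which does the replacement and returns the old text in one call. It uses a lexical rather than $a, since $a is the special variable used by sort():

```perl
use strict;
use warnings;

my $name = "Timothy";

my $first_three = substr($name, 0, 3);          # "Tim"

my $lvalue = $name;
substr($lvalue, 0, 3) = "Tom";                  # "Tomothy"

# Four-argument substr replaces in place and returns what was removed.
my $four_arg = $name;
my $removed  = substr($four_arg, 0, 3, "Tom");  # $four_arg "Tomothy", $removed "Tim"

(my $regex = $name) =~ s/^.../Tom/;             # "Tomothy"

print "$first_three $lvalue $four_arg $removed $regex\n";
```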
__How do I change the Nth occurrence of something?__

You have to keep track of N yourself. For example, let's say you want to change the fifth occurrence of "whoever" or "whomever" into "whosoever" or "whomsoever", case insensitively. These all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5          # is it the 5th?
            ? "${2}soever"     # yes, swap
            : $1               # renege and leave it there
    }igex;

In the more general case, you can use the /g modifier in a while loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: "The third fish is a red one." You can also use a repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

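The counter technique above generalizes into a small helper. replace_nth is hypothetical, and in this sketch the replacement text is inserted literally:

```perl
use strict;
use warnings;

# Replace only the $n-th match of $pattern in $string with $replacement.
sub replace_nth {
    my ($string, $pattern, $n, $replacement) = @_;
    my $count = 0;
    $string =~ s/($pattern)/++$count == $n ? $replacement : $1/ge;
    return $string;
}

my $fish = "One fish two fish red fish blue fish";
print replace_nth($fish, qr/fish/, 3, "trout"), "\n";
# One fish two fish red trout blue fish
```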
__How can I count the number of occurrences of a substring within a string?__

There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the tr/// operator like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a ''while()'' loop around a global pattern match. For example, let's count negative integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

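Here are two refinements of the pattern-match approach. The first counts without an explicit loop. The second counts overlapping matches, which the while loop above does not do:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The "countof" idiom: a list assignment in scalar context yields the
# number of elements on its right-hand side, i.e. the number of matches.
my $negatives = () = $string =~ /-\d+/g;     # 4

# Overlapping occurrences need a zero-width lookahead, since a plain
# /aa/g resumes searching after the end of each match.
my $plain       = () = "aaaa" =~ /aa/g;      # 2
my $overlapping = () = "aaaa" =~ /(?=aa)/g;  # 3

print "$negatives $plain $overlapping\n";
```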
__How do I capitalize all the words on one line?__

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning ``don't do it'' into ``Don'T Do It''. Sometimes you might want one of these instead:

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([[\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those characters by placing a use locale pragma in your program. See perllocale for endless details on locales.

This is sometimes referred to as putting something into ``title case'', but that's not quite accurate. Consider the proper capitalization of the movie ''Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb'', for example.

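The example below checks the difference the apostrophe makes, plus the upper-case and lower-then-capitalize variants:

```perl
use strict;
use warnings;

my $phrase = "don't do it";

(my $naive    = $phrase) =~ s/\b(\w)/\U$1/g;       # "Don'T Do It": ' ends a word
(my $with_apo = $phrase) =~ s/([\w']+)/\u\L$1/g;   # "Don't Do It": ' kept inside the word

my $shout = uc $phrase;                            # "DON'T DO IT"

(my $title = "hELLO wORLD") =~ s/(\w+)/\u\L$1/g;   # "Hello World"

print "$naive | $with_apo | $shout | $title\n";
```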
__How can I split a [[character] delimited string except when inside [[character]? (Comma-separated files)__

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

    @new = ();
    push(@new, $+) while $text =~ m{
        "([[^\"\\]*(?:\\.[[^\"\\]*)*)",?  # groups the phrase inside the quotes
      | ([[^,]+),?
      | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, "like \"this\""). Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::!ParseWords module (part of the standard Perl distribution) lets you say:

    use Text::!ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV (Comma-Separated Values) module on CPAN.

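Running the sample line above through Text::!ParseWords shows that the empty field and the embedded commas survive. parse_line is the single-line function that quotewords uses for each line:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

# parse_line(DELIMITER, KEEP, LINE): with KEEP false, surrounding quotes
# are removed and backslash escapes inside fields are processed.
my @fields = parse_line(',', 0, $text);

print scalar(@fields), " fields\n";     # 11 fields
print "$_: [$fields[$_]]\n" for 0 .. $#fields;
```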
__How do I strip blank space from the beginning/end of a string?__

Although the simplest approach would seem to be

    $string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with embedded newlines. It is much faster to do this operation in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }

This idiom takes advantage of the foreach loop's aliasing behavior to factor out common code. You can do this on several strings at once, or arrays, or even the values of a hash if you use a slice:

    # trim whitespace in the scalar, the array,
    # and all the values in the hash
    foreach ($scalar, @array, @hash{keys %hash}) {
        s/^\s+//;
        s/\s+$//;
    }

__How do I pad a string with blanks or pad a number with zeroes?__

(This answer contributed by Uri Guttman, with kibitzing from Bart Lateur.)

In the following examples, $pad_len is the length to which you wish to pad the string, $text or $num contains the string to be padded, and $pad_char contains the padding character. You can use a single character string constant instead of the $pad_char variable if you know what it is in advance. And in the same way you can use an integer in place of $pad_len if you know the pad length in advance.

The simplest method uses the sprintf function. It can pad on the left or right with blanks and on the left with zeroes and it will not truncate the result. The pack function can only pad strings on the right with blanks and it will truncate the result to a maximum length of $pad_len.

    # Left padding a string with blanks (no truncation):
    $padded = sprintf("%${pad_len}s", $text);

    # Right padding a string with blanks (no truncation):
    $padded = sprintf("%-${pad_len}s", $text);

    # Left padding a number with 0 (no truncation):
    $padded = sprintf("%0${pad_len}d", $num);

    # Right padding a string with blanks using pack (will truncate):
    $padded = pack("A$pad_len", $text);

If you need to pad with a character other than blank or zero you can use one of the following methods. They all generate a pad string with the x operator and combine that with $text. These methods do not truncate $text.

Left and right padding with any character, creating a new string:

    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying $text directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );

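The following example runs the recipes above with concrete (arbitrary) values. It includes the case where the string is already longer than the pad length:

```perl
use strict;
use warnings;

my $pad_len  = 6;
my $text     = "abc";
my $num      = 42;
my $pad_char = '*';

my $left_blank  = sprintf("%${pad_len}s", $text);    # "   abc"
my $right_blank = sprintf("%-${pad_len}s", $text);   # "abc   "
my $zero_num    = sprintf("%0${pad_len}d", $num);    # "000042"
my $packed      = pack("A$pad_len", "abcdefghij");   # "abcdef": pack truncates
my $left_star   = $pad_char x ($pad_len - length($text)) . $text;   # "***abc"

# When the text is already longer than $pad_len, x gets a negative
# count and produces an empty string, so nothing is truncated.
my $long     = "abcdefgh";
my $no_trunc = do {
    no warnings;   # newer Perls warn about a negative repeat count
    $pad_char x ($pad_len - length($long)) . $long;   # "abcdefgh"
};
```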
__How do I extract selected columns from a string?__

Use ''substr()'' or ''unpack()'', both documented in perlfunc. If you prefer thinking in terms of columns instead of widths, you can use this kind of thing. The column numbers are 1-based, as with cut(1), and each one marks the column where a new field starts:

    # determine the unpack format needed to split Linux ps output
    # arguments are cut columns
    my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);

    sub cut2fmt {
        my(@positions) = @_;
        my $template  = '';
        my $lastpos   = 1;
        for my $place (@positions) {
            $template .= "A" . ($place - $lastpos) . " ";
            $lastpos   = $place;
        }
        $template .= "A*";
        return $template;
    }

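The following self-contained run applies the helper to a small fixed-width record. Both the record and its columns are invented for this example:

```perl
use strict;
use warnings;

# Same helper as above: turn 1-based starting columns into an unpack template.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

my $fmt    = cut2fmt(6, 12);                                    # "A5 A6 A*"
my $record = sprintf("%-5s%-6s%s", "alice", "42", "London");   # fields start at 1, 6, 12
my @fields = unpack($fmt, $record);                             # "A" strips trailing blanks

print "$fmt\n", join("|", @fields), "\n";   # alice|42|London
```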
__How do I find the soundex value of a string?__

Use the standard Text::Soundex module distributed with Perl. Before you do so, you may want to determine whether `soundex' is in fact what you think it is. Knuth's soundex algorithm compresses words into a small space, and so it does not necessarily distinguish between two words which you might want to appear separately. For example, the last names `Knuth' and `Kant' are both mapped to the soundex code K530. If Text::Soundex does not do what you are looking for, you might want to consider the String::Approx module available at CPAN.

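To see why Knuth and Kant collide, here is a hand-rolled sketch of the usual American Soundex rules: keep the first letter, code the consonants, drop vowels, collapse adjacent equal codes, and pad to four characters. soundex_sketch is hypothetical. It is not Text::Soundex's code, which may differ in edge cases such as the handling of H and W:

```perl
use strict;
use warnings;

sub soundex_sketch {
    my $word = uc shift;
    $word =~ s/[^A-Z]//g;                  # letters only
    return undef unless length $word;

    my %code;
    @code{qw(B F P V)}         = (1) x 4;
    @code{qw(C G J K Q S X Z)} = (2) x 8;
    @code{qw(D T)}             = (3) x 2;
    $code{L}                   = 4;
    @code{qw(M N)}             = (5) x 2;
    $code{R}                   = 6;

    my $first  = substr($word, 0, 1);
    my $result = $first;
    my $prev   = exists $code{$first} ? $code{$first} : '';

    for my $ch (split //, substr($word, 1)) {
        next if $ch eq 'H' or $ch eq 'W';  # H and W don't separate equal codes
        my $c = exists $code{$ch} ? $code{$ch} : '';   # vowels and Y have no code
        $result .= $c if $c ne '' and $c ne $prev;
        $prev = $c;                        # a vowel resets $prev
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);
}

print join(" ", map { soundex_sketch($_) } qw(Knuth Kant Robert Pfister)), "\n";
# K530 K530 R163 P236
```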
__How can I expand variables in text strings?__

Let's assume that you have a string like:

    $text = 'this has a $foo in it and a $bar';

If those were both global variables, then this would suffice (it uses a symbolic reference, so it won't run under use strict 'refs'):

    $text =~ s/\$(\w+)/${$1}/g;  # no /e needed

But since they are probably lexicals, or at least, they could be, you'd have to do this:

    $text =~ s/(\$\w+)/$1/eeg;
    die if $@;                  # needed /ee, not /e

It's probably better in the general case to treat those variables as entries in some special hash. For example:

    %user_defs = (
        foo  => 23,
        bar  => 19,
    );
    $text =~ s/\$(\w+)/$user_defs{$1}/g;

See also ``How do I expand function calls in a string?'' in this section of the FAQ.

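The hash-based approach works under use strict, which the symbolic ${$1} form does not. This sketch adds one design choice of its own: names missing from the hash are left as they were, rather than expanding to an empty string:

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar, but no $baz';

# Look each name up in the hash; leave unknown $names untouched.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";   # this has a 23 in it and a 19, but no $baz
```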
__What's wrong with always quoting ``$vars''?__

The problem is that those double-quotes force stringification (coercing numbers and references into strings) even when you don't want them to be strings. Think of it this way: double-quote expansion is used to produce new strings. If you already have a string, why do you need more?

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the ''syscall()'' function.

Stringification also destroys arrays.

    @lines = `command`;
    print "@lines";     # WRONG - extra blanks
    print @lines;       # right

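Both failure modes are easy to observe. In this example the arrays stand in for real data and for the output of backticks:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $oref  = "$aref";             # a string like "ARRAY(0x...)", no longer a reference

my $is_ref_before = ref($aref);  # "ARRAY"
my $is_ref_after  = ref($oref);  # "": the quotes turned it into a plain string

# Interpolating an array joins its elements with $" (a space by default),
# which is where the "extra blanks" come from.
my @lines  = ("first\n", "second\n");
my $quoted = "@lines";           # "first\n second\n"
my $joined = join '', @lines;    # "first\nsecond\n", what print @lines outputs
```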
__Why don't my <<HERE documents work?__

Check for these three things:

1. There must be no space after the << part.

2. There (probably) should be a semicolon at the end.

3. You can't (easily) have any space in front of the tag.

If you want to indent the text in the here document, you can do this:

    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET

But the HERE_TARGET must still be flush against the margin. If you want that indented also, you'll have to quote in the indentation.

    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts.  --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s*--/\n--/;

A nice general-purpose fixer-upper function for indented here documents follows. It expects to be called with a here document as its argument. It looks to see whether each line begins with a common substring, and if so, strips that substring off. Otherwise, it takes the amount of leading whitespace found on the first line and removes that much off each subsequent line.

    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common whitespace and common leading string
        if (/^\s*(?:([[^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }

This works with leading special strings, dynamically determined:

    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() ) ;
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading whitespace, with remaining indentation correctly preserved:

    $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON
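The following example makes two quick checks. The first is the s///gm trick above, which removes all leading whitespace, so relative indentation is lost. The second, which needs Perl 5.26 or later, is the indented here-doc syntax <<~. It strips the terminator's indentation from every line and keeps the rest:

```perl
use strict;
use warnings;
use v5.26;   # needed only for the <<~ form below

# Pre-5.26 idiom from the answer above: every line loses ALL leading blanks.
(my $flat = <<HERE_TARGET) =~ s/^\s+//gm;
    your text
      goes here
HERE_TARGET

# Perl 5.26+: remove exactly the terminator's indentation (4 spaces here).
my $indented = <<~EOT;
    your text
      goes here
    EOT

print $flat, $indented;
```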
!!Data: Arrays

__What is the difference between a list and an array?__

An array has a changeable length. A list does not. An array is something you can push or pop, while a list is a set of values. Some people make the distinction that a list is a value while an array is a variable. Subroutines are passed and return lists, you put things into list context, you initialize arrays with lists, and you ''foreach()'' across a list. @ variables are arrays, anonymous arrays are arrays, arrays in scalar context behave like the number of elements in them, subroutines access their arguments through the array @_, and push/pop/shift only work on arrays.

As a side note, there's no such thing as a list in scalar context. When you say

    $scalar = (2, 5, 7, 9);

you're using the comma operator in scalar context, so it uses the scalar comma operator. There never was a list there at all! This causes the last value to be returned: 9.

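The example below sets the scalar comma operator against the two things people usually mean instead: the length of an array, or the last element of a list:

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $count = @array;             # an ARRAY in scalar context: its length, 4
my $last  = (2, 5, 7, 9)[-1];   # a list slice: the last element, 9
my $n     = () = (2, 5, 7, 9);  # list assignment in scalar context: 4 elements

print "$count $last $n\n";
```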
978 __What is the difference between__ $array__[[1]
979 and__ @array__[[1]?__
980
981
982 The former is a scalar value; the latter an array slice,
983 making it a list with one (scalar) value. You should use $
984 when you want a scalar value (most of the time) and @ when
985 you want a list with one scalar value in it (very, very
986 rarely; nearly never, in fact).
987
988
989 Sometimes it doesn't make a difference, but sometimes it
990 does. For example, compare:
991
992
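$good[0] = `some program that outputs several lines`;
with


@bad[0]  = `same program that outputs several lines`;
The first evaluates the backticks in scalar context, so all of the output ends up in $good[0]. The second is a list assignment. There the backticks return one element per line, and only the first line is stored in $bad[0]; the rest is silently discarded. The use warnings pragma and the __-w__ flag will warn you about these matters.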
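This sketch shows the effect without running an external program. It uses a hypothetical `program_output` subroutine that behaves like backticks: a list of lines in list context, one string in scalar context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for `some program`: like backticks, it returns
# one element per line in list context and a single string otherwise.
sub program_output {
    my @lines = ("one\n", "two\n", "three\n");
    return wantarray ? @lines : join('', @lines);
}

my (@good, @bad);
$good[0] = program_output();   # scalar context: the whole output
@bad[0]  = program_output();   # list context: only "one\n" is kept
                               # (-w warns about this line at compile time)
print $good[0], $bad[0];
```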
999
1000
1001 __How can I remove duplicate elements from a list or
1002 array?__
1003
1004
1005 There are several possible ways, depending on whether the
1006 array is ordered and whether you wish to preserve the
1007 ordering.
1008
1009
1010 a)
1011
1012
If @in is sorted, and you want @out to be
sorted:


$prev = "not equal to $in[0]";
@out = grep($_ ne $prev && ($prev = $_, 1), @in);
This is nice in that it doesn't use much extra memory, simulating uniq(1)'s behavior of removing only adjacent duplicates. The ``, 1'' guarantees that the expression is true (so that grep picks it up) even if $_ is 0, "", or undef.
1020
1021
1022 b)
1023
1024
1025 If you don't know whether @in is
1026 sorted:
1027
1028
1029 undef %saw;
1030 @out = grep(!$saw{$_}++, @in);
1031
1032
1033 c)
1034
1035
1036 Like (b), but @in contains only small
1037 integers:
1038
1039
@out = grep(!$saw[$_]++, @in);
1041
1042
1043 d)
1044
1045
1046 A way to do (b) without any loops or greps:
1047
1048
1049 undef %saw;
1050 @saw{@in} = ();
1051 @out = sort keys %saw; # remove sort if undesired
1052
1053
1054 e)
1055
1056
1057 Like (d), but @in contains only small positive
1058 integers:
1059
1060
1061 undef @ary;
@ary[@in] = @in;
1063 @out = grep {defined} @ary;
1064
1065
1066 But perhaps you should have been using a hash all along,
1067 eh?
1068
1069
1070 __How can I tell whether a list or array contains a certain
1071 element?__
1072
1073
1074 Hearing the word ``in'' is an ''in''dication that you
1075 probably should have used a hash, not a list or array, to
1076 store your data. Hashes are designed to answer this question
1077 quickly and efficiently. Arrays aren't.
1078
1079
1080 That being said, there are several ways to approach this. If
1081 you are going to make this query many times over arbitrary
1082 string values, the fastest way is probably to invert the
1083 original array and keep an associative array lying about
1084 whose keys are the first array's values.
1085
1086
1087 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1088 undef %is_blue;
1089 for (@blues) { $is_blue{$_} = 1 }
1090 Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
1091
1092
1093 If the values are all small integers, you could use a simple
1094 indexed array. This kind of an array will take up less
1095 space:
1096
1097
1098 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1099 undef @is_tiny_prime;
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply  @is_tiny_prime[@primes] = (1) x @primes;
Now you can check whether $is_tiny_prime[$some_number] is true.
1103
1104
1105 If the values in question are integers instead of strings,
1106 you can save quite a lot of space by using bit strings
1107 instead:
1108
1109
1110 @articles = ( 1..10, 150..2000, 2017 );
1111 undef $read;
1112 for (@articles) { vec($read,$_,1) = 1 }
1113 Now check whether vec($read,$n,1) is true for some $n.
1114
1115
1116 Please do not use
1117
1118
1119 ($is_there) = grep $_ eq $whatever, @array;
1120 or worse yet
1121
1122
1123 ($is_there) = grep /$whatever/, @array;
1124 These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are regex characters in $whatever?). If you're only testing once, then use:
1125
1126
1127 $is_there = 0;
1128 foreach $elt (@array) {
1129 if ($elt eq $elt_to_find) {
1130 $is_there = 1;
1131 last;
1132 }
1133 }
1134 if ($is_there) { ... }
1135
1136
1137 __How do I compute the difference of two arrays? How do I
1138 compute the intersection of two arrays?__
1139
1140
1141 Use a hash. Here's code to do both and more. It assumes that
1142 each element is unique in a given array:
1143
1144
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
1151 Note that this is the ''symmetric difference'', that is, all elements in either A or in B but not in both. Think of it as an xor operation.
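Here is the same counting approach wrapped in a subroutine, as a sketch.
- `set_ops` is our own name.
- The keys are sorted only to make the output repeatable.
- Each input array is still assumed to hold unique elements.

```perl
use strict;
use warnings;

# Returns refs to the union, intersection, and symmetric difference.
sub set_ops {
    my ($array1, $array2) = @_;
    my (@union, @intersection, @difference, %count);
    $count{$_}++ for @$array1, @$array2;
    for my $element (sort keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
    return (\@union, \@intersection, \@difference);
}

my ($union, $intersection, $difference) = set_ops([qw(a b c d)], [qw(c d e)]);
print "union: @$union\n";                 # a b c d e
print "intersection: @$intersection\n";   # c d
print "difference: @$difference\n";       # a b e
```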
1152
1153
1154 __How do I test whether two arrays or hashes are
1155 equal?__
1156
1157
1158 The following code works for single-level arrays. It uses a
1159 stringwise comparison, and does not distinguish defined
1160 versus undefined empty strings. Modify if you have other
1161 needs.
1162
1163
$are_equal = compare_arrays(\@frogs, \@toads);
sub compare_arrays {
    my ($first, $second) = @_;
    no warnings;  # silence spurious -w undef complaints
    return 0 unless @$first == @$second;
    for (my $i = 0; $i < @$first; $i++) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}
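The question also asks about hashes. Here is a comparable sketch for single-level hashes; `compare_hashes` is our own name. It checks two things:
- both hashes have the same keys;
- each pair of values is equal as strings, with undef treated like the empty string, as `compare_arrays` does.

```perl
use strict;
use warnings;

sub compare_hashes {
    my ($first, $second) = @_;
    return 0 unless scalar(keys %$first) == scalar(keys %$second);
    for my $key (keys %$first) {
        return 0 unless exists $second->{$key};
        no warnings 'uninitialized';    # let undef compare like ""
        return 0 if $first->{$key} ne $second->{$key};
    }
    return 1;
}

my %pond1 = (name => 'frog', legs => 4);
my %pond2 = (legs => 4, name => 'frog');
my %pond3 = (name => 'toad', legs => 4);
print compare_hashes(\%pond1, \%pond2) ? "same\n" : "different\n";   # same
print compare_hashes(\%pond1, \%pond3) ? "same\n" : "different\n";   # different
```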
For multilevel structures, you may wish to use an approach more like this one. It uses the CPAN module FreezeThaw:


use FreezeThaw qw(cmpStr);
@a = @b = ( "this", "that", [ "more", "stuff" ] );
printf "a and b contain %s arrays\n",
    cmpStr(\@a, \@b) == 0 ? "the same" : "different";
1176 This approach also works for comparing hashes. Here we'll demonstrate two different answers:
1177
1178
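use FreezeThaw qw(cmpStr cmpStrHard);
%a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
$a{EXTRA} = \%b;
$b{EXTRA} = \%a;
printf "a and b contain %s hashes\n",
    cmpStr(\%a, \%b) == 0 ? "the same" : "different";
printf "a and b contain %s hashes\n",
    cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
The first reports that both hashes contain the same data, while the second reports that they do not. Which you prefer is left as an exercise to the reader.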
1184
1185
1186 __How do I find the first array element for which a
1187 condition is true?__
1188
1189
1190 You can use this if you care about the index:
1191
1192
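for ($i = 0; $i < @array; $i++) {        # the "Waldo" test is just an example
    if ($array[$i] eq "Waldo") { $found_index = $i; last } }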
1194 Now $found_index has what you want.
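If you'd rather pass the condition in than hard-code it, a small helper that takes a code block works. `first_index` below is a hypothetical helper of our own, not a built-in. When you want the element itself rather than its index, the `first` function from List::Util (bundled with Perl 5.8 and later) already does the job.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: index of the first element for which the block
# returns true (the element is in $_), or -1 if none does.
sub first_index(&@) {
    my ($test, @list) = @_;
    for my $i (0 .. $#list) {
        local $_ = $list[$i];
        return $i if $test->();
    }
    return -1;
}

my @array = qw(Alice Bob Waldo Carol);
my $found_index = first_index { $_ eq 'Waldo' } @array;
my $found       = first { /^C/ } @array;
print "$found_index $found\n";    # 2 Carol
```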
1195
1196
1197 __How do I handle linked lists?__
1198
1199
You usually don't need a linked list in Perl, since with
regular arrays you can push and pop or shift and unshift at
either end, or you can use splice to add and/or remove an
arbitrary number of elements at arbitrary points. Both pop
and shift are O(1) operations on Perl's dynamic arrays. In
the absence of shifts and pops, push needs to reallocate only
on the order of log(N) times as the array grows to N
elements, while unshift will need to copy pointers each
time.
1209
1210
1211 If you really, really wanted, you could use structures as
1212 described in perldsc or perltoot and do just what the
1213 algorithm book tells you to do. For example, imagine a list
1214 node like this:
1215
1216
$node = {
    VALUE => $value,
    LINK  => undef,
};
1219 You could walk the list this way:
1220
1221
for ($node = $head; $node; $node = $node->{LINK}) { print $node->{VALUE}, " " }
print "\n";
1223 You could add to the list this way:
1224
1225
my ($head, $tail);
$tail = append($head, 1);       # grow a new head
for $value ( 2 .. 10 ) {
    $tail = append($tail, $value);
}
sub append {
    my($list, $value) = @_;
    my $node = { VALUE => $value };
    if ($list) {
        $node->{LINK} = $list->{LINK};
        $list->{LINK} = $node;
    } else {
        $_[0] = $node;          # replace caller's version
    }
    return $node;
}
But again, Perl's built-in arrays are virtually always good enough.
1235
1236
1237 __How do I handle circular lists?__
1238
1239
1240 Circular lists could be handled in the traditional fashion
1241 with linked lists, or you could just do something like this
1242 with an array:
1243
1244
1245 unshift(@array, pop(@array)); # the last shall be first
1246 push(@array, shift(@array)); # and vice versa
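If you only need to walk around the ring rather than rotate it, you can leave the array alone and wrap the index with the modulus operator. A sketch, with made-up data:

```perl
use strict;
use warnings;

# Treat an ordinary array as a ring: the index wraps around, and no
# elements are moved.
my @ring  = qw(north east south west);
my $start = 3;
my @walk  = map { $ring[ ($start + $_) % scalar(@ring) ] } 0 .. 5;
print "@walk\n";    # west north east south west north
```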
1247
1248
1249 __How do I shuffle an array randomly?__
1250
1251
1252 Use this:
1253
1254
# fisher_yates_shuffle( \@array ) :
# generate a random permutation of @array in place
sub fisher_yates_shuffle {
    my $array = shift;
    return unless @$array;    # must not be empty: --$i would start at -1
    my $i;
    for ($i = @$array; --$i; ) {
        my $j = int rand ($i+1);
        @$array[$i,$j] = @$array[$j,$i];
    }
}
fisher_yates_shuffle( \@array );    # permutes @array in place
You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with:
1267
1268
1269 srand;
1270 @new = ();
1271 @old = 1 .. 10; # just a demo
1272 while (@old) {
1273 push(@new, splice(@old, rand @old, 1));
1274 }
1275 This is bad because splice is already O(N), and since you do it N times, you just invented a quadratic algorithm; that is, O(N**2). This does not scale, although Perl is so efficient that you probably won't notice this until you have rather largish arrays.
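On Perl 5.8 and later, the bundled List::Util module also exports a `shuffle` function, which returns a shuffled copy of its arguments. The sketch below also checks that the result is still a permutation of the input:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck     = 1 .. 10;
my @shuffled = shuffle(@deck);    # a new list; @deck itself is untouched

# Sorting the shuffled list numerically should give back the original.
my $same = join(',', sort { $a <=> $b } @shuffled) eq join(',', @deck);
print $same ? "permutation ok\n" : "something went wrong\n";
```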
1276
1277
1278 __How do I process/modify each element of an
1279 array?__
1280
1281
1282 Use for/foreach:
1283
1284
1285 for (@lines) {
1286 s/foo/bar/; # change that word
1287 y/XZ/ZX/; # swap those letters
1288 }
1289 Here's another; let's compute spherical volumes:
1290
1291
1292 for (@volumes = @radii) { # @volumes has changed parts
1293 $_ **= 3;
1294 $_ *= (4/3) * 3.14159; # this will be constant folded
1295 }
1296 If you want to do the same thing to modify the values of the hash, you may not use the values function, oddly enough. You need a slice:
1297
1298
1299 for $orbit ( @orbits{keys %orbits} ) {
1300 ($orbit **= 3) *= (4/3) * 3.14159;
1301 }
1302
1303
1304 __How do I select a random element from an
1305 array?__
1306
1307
1308 Use the ''rand()'' function (see ``rand'' in
1309 perlfunc):
1310
1311
1312 # at the top of the program:
1313 srand; # not needed for 5.004 and later
1314 # then later on
1315 $index = rand @array;
$element = $array[$index];
1317 Make sure you ''only call srand once per program, if then''. If you are calling it more than once (such as before each call to rand), you're almost certainly doing something wrong.
1318
1319
1320 __How do I permute N elements of a list?__
1321
1322
1323 Here's a little program that generates all permutations of
1324 all the words on each line of input. The algorithm embodied
1325 in the ''permute()'' function should work on any
1326 list:
1327
1328
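#!/usr/bin/perl -n
# tsc-permute: permute each word of input
permute([split], []);
sub permute {
    my @items = @{ $_[0] };
    my @perms = @{ $_[1] };
    unless (@items) { print "@perms\n"; return }
    foreach my $i (0 .. $#items) {
        my @newitems = @items;
        my @newperms = @perms;
        unshift(@newperms, splice(@newitems, $i, 1));   # move item $i over
        permute([@newitems], [@newperms]);
    }
}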
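Inside a larger program, a variant that collects the permutations instead of printing them can be handier. This sketch uses the same recursive idea; `permutations` is our own name, and each result is an array reference.

```perl
use strict;
use warnings;

sub permutations {
    my @items = @_;
    return ([]) unless @items;                  # one way to order nothing
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice(@rest, $i, 1);    # take item $i out
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

my @perms = permutations(qw(a b c));
print join(' ', @$_), "\n" for @perms;
```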
1337
1338
1339 __How do I sort an array by (anything)?__
1340
1341
1342 Supply a comparison function to ''sort()'' (described in
1343 ``sort'' in perlfunc):
1344
1345
@list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2). <=>, used above, is the numerical comparison operator.
1348
1349
1350 If you have a complicated function needed to pull out the
1351 part you want to sort on, then don't do it inside the sort
1352 function. Pull it out first, because the sort
1353 BLOCK can be called many times for the same
1354 element. Here's an example of how to pull out the first word
1355 after the first number on each item, and then sort those
1356 words case-insensitively.
1357
1358
@idx = ();
for (@data) {
    ($item) = /\d+\s*(\S+)/;
    push @idx, uc($item);
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
which could also be written this way, using a trick that's come to be known as the Schwartzian Transform:


@sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] }
          map { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;
If you need to sort on several fields, the following paradigm is useful.


@sorted = sort { field1($a) <=> field1($b) || field2($a) cmp field2($b)
                 || field3($a) cmp field3($b) } @data;
1373 This can be conveniently combined with precalculation of keys as given above.
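Here is a sketch combining the two ideas above. Each record's keys are computed once in a Schwartzian Transform and then compared field by field. The `name:age` record format is made up for illustration.

```perl
use strict;
use warnings;

my @data = ("bob:25", "alice:30", "carol:25", "Dave:30");

my @sorted =
    map  { $_->[0] }                           # 3. recover the original strings
    sort { $b->[2] <=> $a->[2]                 # 2. age, descending, then
             || $a->[1] cmp $b->[1] }          #    name, case-insensitively
    map  { my ($name, $age) = split /:/;       # 1. compute each key once
           [ $_, lc $name, $age ] } @data;

print "@sorted\n";    # alice:30 Dave:30 bob:25 carol:25
```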
1374
1375
1376 See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for
1377 more about this approach.
1378
1379
1380 See also the question below on sorting hashes.
1381
1382
1383 __How do I manipulate arrays of bits?__
1384
1385
1386 Use ''pack()'' and ''unpack()'', or else ''vec()''
1387 and the bitwise operations.
1388
1389
For example, this sets $vec to have bit N set for each
number N that appears in @ints:
1392
1393
1394 $vec = '';
1395 foreach(@ints) { vec($vec,$_,1) = 1 }
1396 And here's how, given a vector in $vec, you can get those bits into your @ints array:
1397
1398
1399 sub bitvec_to_list {
1400 my $vec = shift;
1401 my @ints;
1402 # Find null-byte density then select best algorithm
1403 if ($vec =~ tr/0// / length $vec
1404 This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
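For comparison, here is a sketch of just the general-purpose approach: unpack the vector into a string of 0s and 1s, then note where each 1 falls. `bits_set` is our own name. Unlike the routine above, it makes no special effort for sparse vectors.

```perl
use strict;
use warnings;

# Return the numbers of the set bits in a vec()-style bit vector.
# unpack "b*" lists bits in the same low-to-high order vec() uses.
sub bits_set {
    my $vec  = shift;
    my $bits = unpack("b*", $vec);
    my @ints;
    while ($bits =~ /1/g) {
        push @ints, pos($bits) - 1;    # pos() is just past the matched "1"
    }
    return \@ints;
}

my $vec = '';
vec($vec, $_, 1) = 1 for 3, 10, 17;
print "@{ bits_set($vec) }\n";    # 3 10 17
```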
1405
1406
1407 Here's a demo on how to use ''vec()'':
1408
1409
# vec demo
$vector = "\xff\x0f\xef\xfe";    # a sample 4-byte vector
pvec($vector);
set_vec(1,1,1);
set_vec(3,1,1);
set_vec(23,1,1);
set_vec(3,1,3);
set_vec(3,2,3);
set_vec(3,4,3);
set_vec(3,4,7);
set_vec(3,8,3);
set_vec(3,8,7);
set_vec(0,32,17);
set_vec(1,32,17);
sub set_vec {
    my ($offset, $width, $value) = @_;
    my $vector = '';
    vec($vector, $offset, $width) = $value;
    print "offset=$offset width=$width value=$value\n";
    pvec($vector);
}
sub pvec {
    my $vector = shift;
    my $bits = unpack("b*", $vector);
    print "vector length in bytes: ", length($vector), "\n";
    my @bytes = unpack("A8" x length($vector), $bits);
    print "bits are: @bytes\n\n";
}
1432
1433
1434 __Why does__ ''defined()'' __return true on empty
1435 arrays and hashes?__
1436
1437
1438 The short story is that you should probably only use defined
1439 on scalars or functions, not on aggregates (arrays and
1440 hashes). See ``defined'' in perlfunc in the 5.004 release or
1441 later of Perl for more detail.
1442 !!Data: Hashes (Associative Arrays)
1443
1444
1445 __How do I process an entire hash?__
1446
1447
1448 Use the ''each()'' function (see ``each'' in perlfunc) if
1449 you don't care whether it's sorted:
1450
1451
while ( ($key, $value) = each %hash) {
    print "$key = $value\n";
}
1454 If you want it sorted, you'll have to use ''foreach()'' on the result of sorting the keys as shown in an earlier question.
1455
1456
1457 __What happens if I add or remove keys from a hash while
1458 iterating over it?__
1459
1460
1461 Don't do that. :-)
1462
1463
[lwall] In Perl 4, you were not allowed to modify a hash at
1465 all while iterating over it. In Perl 5 you can delete from
1466 it, but you still can't add to it, because that might cause
1467 a doubling of the hash table, in which half the entries get
1468 copied up to the new top half of the table, at which point
1469 you've totally bamboozled the iterator code. Even if the
1470 table doesn't double, there's no telling whether your new
1471 entry will be inserted before or after the current iterator
1472 position.
1473
1474
1475 Either treasure up your changes and make them after the
1476 iterator finishes or use keys to fetch all the old keys at
1477 once, and iterate over the list of keys.
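Here is a sketch of the second suggestion, with made-up data. keys() builds its whole list before the loop starts, so the loop below can delete old entries and add new ones without confusing anything.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0, plums => 5);

for my $fruit (keys %stock) {              # list of keys fetched up front
    if ($stock{$fruit} == 0) {
        delete $stock{$fruit};
    } else {
        $stock{"${fruit}_reserved"} = 1;   # new key; not visited by this loop
    }
}

print join(", ", map { "$_=$stock{$_}" } sort keys %stock), "\n";
# apples=3, apples_reserved=1, plums=5, plums_reserved=1
```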
1478
1479
1480 __How do I look up a hash element by
1481 value?__
1482
1483
1484 Create a reverse hash:
1485
1486
1487 %by_value = reverse %by_key;
1488 $key = $by_value{$value};
1489 That's not particularly efficient. It would be more space-efficient to use:
1490
1491
1492 while (($key, $value) = each %by_key) {
1493 $by_value{$value} = $key;
1494 }
1495 If your hash could have repeated values, the methods above will only find one of the associated keys. This may or may not worry you. If it does worry you, you can always reverse the hash into a hash of arrays instead:
1496
1497
1498 while (($key, $value) = each %by_key) {
1499 push @{$key_list_by_value{$value}}, $key;
1500 }
1501
1502
1503 __How can I know how many entries are in a
1504 hash?__
1505
1506
1507 If you mean how many keys, then all you have to do is take
1508 the scalar sense of the ''keys()'' function:
1509
1510
1511 $num_keys = scalar keys %hash;
1512 The ''keys()'' function also resets the iterator, which in void context is faster for tied hashes than would be iterating through the whole hash, one key-value pair at a time.
1513
1514
1515 __How do I sort a hash (optionally by value instead of
1516 key)?__
1517
1518
1519 Internally, hashes are stored in a way that prevents you
1520 from imposing an order on key-value pairs. Instead, you have
1521 to sort a list of the keys or values:
1522
1523
1524 @keys = sort keys %hash; # sorted by key
1525 @keys = sort {
1526 $hash{$a} cmp $hash{$b}
1527 } keys %hash; # and by value
Here we'll do a reverse numeric sort by value. If two values are identical, we sort by length of key, and if that fails, by straight ASCII comparison of the keys (well, possibly modified by your locale; see perllocale).


@keys = sort {
    $hash{$b} <=> $hash{$a} || length($b) <=> length($a) || $a cmp $b
} keys %hash;
1533
1534
1535 __How can I always keep my hash sorted?__
1536
1537
1538 You can look into using the DB_File module and ''tie()''
1539 using the $DB_BTREE hash bindings as documented in
``In Memory Databases'' in DB_File. The Tie::IxHash module
from CPAN might also be
1542 instructive.
1543
1544
1545 __What's the difference between ``delete'' and ``undef''
1546 with hashes?__
1547
1548
Hashes are pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key $key is present in
the hash %ary, exists($ary{$key}) will return true. The value
for a given key can be undef, in which case
$ary{$key} will be undef while
exists($ary{$key}) will still return true. This corresponds to
($key, undef) being in the
hash.
1559
1560
1561 Pictures help... here's the %ary
1562 table:
1563
1564
  keys  values
+------+------+
|  a   |  3   |
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
1572 And these conditions hold
1573
1574
1575 $ary{'a'} is true
1576 $ary{'d'} is false
1577 defined $ary{'d'} is true
1578 defined $ary{'a'} is true
1579 exists $ary{'a'} is true (Perl5 only)
1580 grep ($_ eq 'a', keys %ary) is true
1581 If you now say
1582
1583
1584 undef $ary{'a'}
1585 your table now reads:
1586
1587
  keys  values
+------+------+
|  a   | undef|
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
1595 and these conditions now hold; changes in caps:
1596
1597
1598 $ary{'a'} is FALSE
1599 $ary{'d'} is false
1600 defined $ary{'d'} is true
1601 defined $ary{'a'} is FALSE
1602 exists $ary{'a'} is true (Perl5 only)
1603 grep ($_ eq 'a', keys %ary) is true
1604 Notice the last two: you have an undef value, but a defined key!
1605
1606
1607 Now, consider this:
1608
1609
1610 delete $ary{'a'}
1611 your table now reads:
1612
1613
  keys  values
+------+------+
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
1620 and these conditions now hold; changes in caps:
1621
1622
1623 $ary{'a'} is false
1624 $ary{'d'} is false
1625 defined $ary{'d'} is true
1626 defined $ary{'a'} is false
1627 exists $ary{'a'} is FALSE (Perl5 only)
1628 grep ($_ eq 'a', keys %ary) is FALSE
1629 See, the whole entry is gone!
1630
1631
1632 __Why don't my tied hashes make the defined/exists
1633 distinction?__
1634
1635
They may or may not make exists and defined behave
differently. The tied class's ''EXISTS()'' method decides
what exists reports, while defined just looks at whatever
''FETCH()'' returns. For example, there isn't the concept of undef
with hashes that are tied to DBM* files. This means the
true/false tables above will give different results when
used on such a hash. It also means that exists and defined
do the same thing with a DBM* file, and what they end up
doing is not what they do with ordinary hashes.
1644
1645
1646 __How do I reset an__ ''each()'' __operation part-way
1647 through?__
1648
1649
1650 Using keys %hash in scalar context returns the
1651 number of keys in the hash ''and'' resets the iterator
1652 associated with the hash. You may need to do this if you use
1653 last to exit a loop early so that when you re-enter
1654 it, the hash iterator has been reset.
1655
1656
1657 __How can I get the unique keys from two
1658 hashes?__
1659
1660
1661 First you extract the keys from the hashes into lists, then
1662 solve the ``removing duplicates'' problem described above.
1663 For example:
1664
1665
1666 %seen = ();
1667 for $element (keys(%foo), keys(%bar)) {
1668 $seen{$element}++;
1669 }
1670 @uniq = keys %seen;
1671 Or more succinctly:
1672
1673
1674 @uniq = keys %{{%foo,%bar}};
1675 Or if you really want to save space:
1676
1677
1678 %seen = ();
1679 while (defined ($key = each %foo)) {
1680 $seen{$key}++;
1681 }
1682 while (defined ($key = each %bar)) {
1683 $seen{$key}++;
1684 }
1685 @uniq = keys %seen;
1686
1687
1688 __How can I store a multidimensional array in a
1689 DBM file?__
1690
1691
1692 Either stringify the structure yourself (no fun), or else
1693 get the MLDBM (which uses Data::Dumper)
1694 module from CPAN and layer it on top of
1695 either DB_File or GDBM_File.
1696
1697
1698 __How can I make my hash remember the order I put elements
1699 into it?__
1700
1701
Use the Tie::IxHash module from CPAN.


use Tie::IxHash;
tie(%myhash, 'Tie::IxHash');
for ($i = 0; $i < 20; $i++) { $myhash{$i} = 2 * $i }
@keys = keys %myhash;
# @keys = (0,1,2,3,...)
1708
1709
1710 __Why does passing a subroutine an undefined element in a
1711 hash create it?__
1712
1713
1714 If you say something like:
1715
1716
somefunc($hash{"nonesuch key here"});
Then that element ``autovivifies''; that is, it springs into existence whether you store something there or not. That's because functions get scalars passed in by reference. If ''somefunc()'' modifies $_[0], it has to be ready to write it back into the caller's version.
1719
1720
1721 This has been fixed as of Perl5.004.
1722
1723
1724 Normally, merely accessing a key's value for a nonexistent
1725 key does ''not'' cause that key to be forever there. This
1726 is different than awk's behavior.
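Here is a small sketch of the behavior since Perl 5.004, using two hypothetical subroutines: `look` only reads its argument, while `touch` assigns to `$_[0]`. Only the element that is actually written to gets created.

```perl
use strict;
use warnings;

my %hash;

sub look  { my $copy = $_[0]; return }   # only reads its argument
sub touch { $_[0] = "set"; return }      # writes through the alias

my $value = $hash{missing};    # a plain read creates nothing
look($hash{passed});           # read-only use in a sub: still nothing
touch($hash{written});         # assigning to $_[0] creates the key

print join(",", sort keys %hash), "\n";    # written
```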
1727
1728
1729 __How can I make the Perl equivalent of a C structure/C
1730 ++ class/hash or array of hashes or
1731 arrays?__
1732
1733
1734 Usually a hash ref, perhaps like this:
1735
1736
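$record = {
    NAME   => "Jason",  EMPNO => 132,  TITLE => "deputy peon",
    AGE    => 23,  SALARY => 37_000,  PALS  => [ "Norbert", "Rhys", "Phineas" ],
};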
1739 References are documented in perlref and the upcoming perlreftut. Examples of complex data structures are given in perldsc and perllol. Examples of structures and object-oriented classes are in perltoot.
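Once you have a record like that, you reach its fields with the arrow operator and dereference nested arrays with `@{ ... }`. Here is a sketch with illustrative data, including a small array of such hashes:

```perl
use strict;
use warnings;

# An illustrative record: a hash ref whose PALS field holds an array ref.
my $record = {
    NAME => "Jason",
    AGE  => 23,
    PALS => [ "Norbert", "Rhys", "Phineas" ],
};

$record->{AGE}++;                        # update a scalar field
push @{ $record->{PALS} }, "Ursula";     # grow the nested array

# An array of hashes: a list of such records.
my @people = ($record, { NAME => "Rhys", AGE => 31, PALS => [] });

for my $person (@people) {
    printf "%s (%d) has %d pals\n",
        $person->{NAME}, $person->{AGE}, scalar @{ $person->{PALS} };
}
```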
1740
1741
1742 __How can I use a reference as a hash key?__
1743
1744
You can't do this directly, but you could use the standard
Tie::RefHash module distributed with Perl.
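The reason is that ordinary hash keys are always strings. A reference used as a key is turned into something like `ARRAY(0x...)` and can't be dereferenced afterwards. Here is a minimal sketch with Tie::RefHash, where the keys come back as real references:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %seen, 'Tie::RefHash';    # keys stay real references

my $list = [ 1, 2, 3 ];
$seen{$list} = "first list";

my ($key) = keys %seen;
print ref($key), " ", $key->[1], " ", $seen{$list}, "\n";   # ARRAY 2 first list
```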
1747 !!Data: Misc
1748
1749
1750 __How do I handle binary data correctly?__
1751
1752
1753 Perl is binary clean, so this shouldn't be a problem. For
1754 example, this works fine (assuming the files are
1755 found):
1756
1757
if (`cat /vmunix` =~ /gzip/) {
    print "Your kernel is GNU-zip enabled!\n";
}
1760 On less elegant (read: Byzantine) systems, however, you have to play tedious games with ``text'' versus ``binary'' files. See ``binmode'' in perlfunc or perlopentut. Most of these ancient-thinking systems are curses out of Microsoft, who seem to be committed to putting the backward into backward compatibility.
1761
1762
1763 If you're concerned about 8-bit ASCII data,
1764 then see perllocale.
1765
1766
1767 If you want to deal with multibyte characters, however,
1768 there are some gotchas. See the section on Regular
1769 Expressions.
1770
1771
1772 __How do I determine whether a scalar is a
1773 number/whole/integer/float?__
1774
1775
1776 Assuming that you don't care about IEEE
1777 notations like ``NaN'' or ``Infinity'', you probably just
1778 want to use a regular expression.
1779
1780
if (/\D/)    { print "has nondigits\n" }
if (/^\d+$/) { print "is a whole number\n" }
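These kinds of tests can be gathered into one classifier. In this sketch, `classify` is our own name and the patterns are one reasonable choice, not the only one. The checks run from most to least specific. Like the tests above, they ignore NaN, Infinity, and surrounding whitespace.

```perl
use strict;
use warnings;

# Return the first (most specific) description that fits the string.
sub classify {
    my $s = shift;
    return "whole number"   if $s =~ /^\d+$/;
    return "integer"        if $s =~ /^[+-]?\d+$/;
    return "decimal number" if $s =~ /^[+-]?(?:\d+(?:\.\d*)?|\.\d+)$/;
    return "C float"        if $s =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
    return "not a number";
}

print "$_: ", classify($_), "\n" for qw(42 -7 3.14 .5 6.02e23 12abc);
```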
If you're on a POSIX system, Perl supports the POSIX::strtod function. Its semantics are somewhat cumbersome, so here's a getnum wrapper function for more convenient access. This function takes a string and returns the number it found, or undef for input that isn't a C float. The is_numeric function is a front end to getnum if you just want to say, ``Is this a float?''
1783
1784
sub getnum {
    use POSIX qw(strtod);
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    $! = 0;
    my($num, $unparsed) = strtod($str);
    if (($str eq '') || ($unparsed != 0) || $!) {
        return undef;
    } else {
        return $num;
    }
}
sub is_numeric { defined getnum($_[0]) }
Or you could check out the String::Scanf module on CPAN instead. The POSIX module (part of the standard Perl distribution) provides strtod and strtol for converting strings to doubles and longs, respectively.
1800
1801
1802 __How do I keep persistent data across program
1803 calls?__
1804
1805
1806 For some specific applications, you can use one of the
1807 DBM modules. See AnyDBM_File. More
generically, you should consult the FreezeThaw, Storable, or
Class::Eroot modules from CPAN. Here's one
1810 example using Storable's store and
1811 retrieve functions:
1812
1813
use Storable;
store(\%hash, "filename");
# later on...
$href = retrieve("filename");        # by ref
%hash = %{ retrieve("filename") };   # direct to hash
1818
1819
1820 __How do I print out or copy a recursive data
1821 structure?__
1822
1823
1824 The Data::Dumper module on CPAN (or the 5.005
1825 release of Perl) is great for printing out data structures.
The Storable module, found on CPAN, provides
1827 a function called dclone that recursively copies
1828 its argument.
1829
1830
1831 use Storable qw(dclone);
1832 $r2 = dclone($r1);
1833 Where $r1 can be a reference to any kind of data structure you'd like. It will be deeply copied. Because dclone takes and returns references, you'd have to add extra punctuation if you had a hash of arrays that you wanted to copy.
1834
1835
%newhash = %{ dclone(\%oldhash) };
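Here is a sketch putting both modules together, with made-up data. dclone makes a copy whose nested array is separate from the original's, and Data::Dumper shows the difference. Setting `$Data::Dumper::Sortkeys` just makes the key order predictable.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

my %oldhash = (fruits => [ "apple", "pear" ], count => 2);
my %newhash = %{ dclone(\%oldhash) };   # deep copy: a new array ref too

push @{ $newhash{fruits} }, "plum";     # changes only the copy

$Data::Dumper::Sortkeys = 1;            # stable key order in the output
print Dumper(\%oldhash, \%newhash);
```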
1837
1838
1839 __How do I define methods for every
1840 class/object?__
1841
1842
Use the UNIVERSAL class (see UNIVERSAL).
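Here is a minimal sketch. `describe` is a hypothetical method name and `Frog` a throwaway class. Every class inherits from UNIVERSAL, so use a method defined there sparingly.

```perl
use strict;
use warnings;

# Callable on any object or class name, because all classes inherit
# from UNIVERSAL.
sub UNIVERSAL::describe {
    my $self = shift;
    return ref($self) ? "an object of class " . ref($self)
                      : "the class $self";
}

package Frog;
sub new { my $class = shift; return bless {}, $class }

package main;

my $frog      = Frog->new;
my $about_obj = $frog->describe;    # "an object of class Frog"
my $about_cls = Frog->describe;     # "the class Frog"
print "$about_obj\n$about_cls\n";
```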
1845
1846
1847 __How do I verify a credit card checksum?__
1848
1849
Get the Business::CreditCard module from CPAN.
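If you only need the checksum itself, card numbers use the Luhn (mod 10) check digit. Here is a sketch; `luhn_ok` is our own name. It verifies the check digit only, not the card type, length, or issuer.

```perl
use strict;
use warnings;

sub luhn_ok {
    my $number = shift;
    $number =~ s/[\s-]//g;                 # allow spaces and dashes
    return 0 unless $number =~ /^\d+$/;
    my ($sum, $double) = (0, 0);
    for my $digit (reverse split //, $number) {
        my $value = $double ? $digit * 2 : $digit;
        $value -= 9 if $value > 9;         # same as adding its two digits
        $sum += $value;
        $double = !$double;                # every second digit from the right
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok("4111 1111 1111 1111") ? "valid\n" : "invalid\n";   # valid
```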
1852
1853
1854 __How do I pack arrays of doubles or floats for
1855 XS code?__
1856
1857
1858 The kgbpack.c code in the PGPLOT module on
1859 CPAN does just this. If you're doing a lot of
1860 float or double processing, consider using the
1861 PDL module from CPAN
1862 instead--it makes number-crunching easy.
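On the Perl side, pack with the `d*` template (or `f*` for C floats) turns a list of numbers into one string of native doubles. XS code can then read that string's buffer as a C array. A round-trip sketch with made-up values:

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 1e10);
my $packed = pack("d*", @values);                       # native C doubles

my $count = length($packed) / length(pack("d", 0));     # bytes per double
my @back  = unpack("d*", $packed);

print "$count doubles: @back\n";
```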
1863 !!AUTHOR AND COPYRIGHT
1864
1865
1866 Copyright (c) 1997-1999 Tom Christiansen and Nathan
1867 Torkington. All rights reserved.
1868
1869
1870 When included as part of the Standard Version of Perl, or as
1871 part of its complete documentation whether printed or
1872 otherwise, this work may be distributed only under the terms
1873 of Perl's Artistic License. Any distribution of this file or
1874 derivatives thereof ''outside'' of that package require
1875 that special arrangements be made with copyright
1876 holder.
1877
1878
1879 Irrespective of its distribution, all code examples in this
1880 file are hereby placed into the public domain. You are
1881 permitted and encouraged to use this code in your own
1882 programs for fun or for profit as you see fit. A simple
1883 comment in the code giving credit would be courteous but is
1884 not required.
1885 ----