!!!PERLOP
NAME
SYNOPSIS
DESCRIPTION
----
!!NAME

perlop - Perl operators and precedence
!!SYNOPSIS

Perl operators have the following associativity and
precedence, listed from highest precedence to lowest.
Operators borrowed from C keep the same precedence
relationship with each other, even where C's precedence is
slightly screwy. (This makes learning Perl easier for C
folks.) With very few exceptions, these all operate on
scalar values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See
overload.
!!DESCRIPTION

__Terms and List Operators (Leftward)__

A TERM has the highest precedence in Perl.
They include variables, quote and quote-like operators, any
expression in parentheses, and any function whose arguments
are parenthesized. Actually, there aren't really functions
in this sense, just list operators and unary operators
behaving as functions because you put parentheses around the
arguments. These are all documented in
perlfunc.

If any list operator (''print()'', etc.) or any unary
operator (''chdir()'', etc.) is followed by a left
parenthesis as the next token, the operator and arguments
within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list
operators such as print, sort, or
chmod is either very high or very low depending on
whether you are looking at the left side or the right side
of the operator. For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort, but the commas on the left are evaluated after. In other words, list operators tend to gobble up all arguments that follow, and then act like a simple TERM with regard to the preceding expression. Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. See ``Named Unary Operators'' for more discussion of this.

Also parsed as terms are the do {} and eval
{} constructs, as well as subroutine and method calls,
and the anonymous constructors [[] and
{}.

See also ``Quote and Quote-like Operators'' toward the end
of this section, as well as ``I/O Operators''.
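A short script makes the leftward/rightward asymmetry easy to check. This is a minimal sketch that assumes Perl 5 with strict and warnings enabled. The variable names are illustrative only.

```perl
use strict;
use warnings;

# sort is a list operator: it gobbles up everything to its right (4, 2)
# but acts as a single TERM for the comma on its left.
my @ary = (1, 3, sort 4, 2);
print @ary, "\n";                  # 1324

# A parenthesis right after the operator name makes it a plain function
# call, so only (4) is sorted and the 2 is appended afterwards.
my @call = (1, 3, sort(4), 2);
print "@call\n";                   # 1 3 4 2

# A unary plus stops ("-") from being taken as join's complete argument
# list, so the separator and both strings all reach join.
my $joined = join +("-"), 'a', 'b';
print "$joined\n";                 # a-b
```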

__The Arrow Operator__

``->'' is an infix dereference operator, just as it is in C
and C++. If the right
side is either a [[...], {...}, or a
(...) subscript, then the left side must be either
a hard or symbolic reference to an array, a hash, or a
subroutine respectively. (Or technically speaking, a
location capable of holding a hard reference, if it's an
array or hash reference being used for assignment.) See
perlreftut and perlref.

Otherwise, the right side is a method name or a simple
scalar variable containing either the method name or a
subroutine reference, and the left side must be either an
object (a blessed reference) or a class name (that is, a
package name). See perlobj.
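The sketch below exercises each form of ``->'' described above. The Counter class and its new and bump methods are hypothetical; they exist only so that the method-call forms have something to call.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { lang => 'perl' };
my $cref = sub { return $_[0] * 2 };

my $second  = $aref->[1];       # array subscript: 20
my $lang    = $href->{lang};    # hash subscript: 'perl'
my $doubled = $cref->(21);      # subroutine call: 42

# Hypothetical class, defined only for this illustration.
package Counter;
sub new  { my ($class) = @_; return bless { n => 0 }, $class }
sub bump { my ($self) = @_; return ++$self->{n} }

package main;

my $counter = Counter->new;     # class name on the left
$counter->bump;                 # blessed reference on the left
my $method = 'bump';
my $count  = $counter->$method; # method name held in a scalar: 2
```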

__Auto-increment and Auto-decrement__

``++'' and ``--'' work as in C. That is, if placed before a
variable, they increment or decrement the variable before
returning the value, and if placed after, increment or
decrement the variable after returning the
value.

The auto-increment operator has a little extra builtin magic
to it. If you increment a variable that is numeric, or that
has ever been used in a numeric context, you get a normal
increment. If, however, the variable has been used in only
string contexts since it was set, and has a value that is
not the empty string and matches the pattern
/^[[a-zA-Z]*[[0-9]*\z/, the increment is done as a
string, preserving each character within its range, with
carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

The auto-decrement operator is not magical.
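The sketch below applies the string-increment rule to the sample values above, plus one extra value ('a9'). It also shows that decrement has no such magic. Each value is copied into a fresh variable first, so that it has only ever been used as a string.

```perl
use strict;
use warnings;

my %next;
for my $start ('99', 'a0', 'Az', 'zz', 'a9') {
    my $s = $start;     # fresh copy, only ever used as a string
    $s++;
    $next{$start} = $s;
}
# 99 -> 100, a0 -> a1, Az -> Ba, zz -> aaa, a9 -> b0

# Decrement is purely numeric: 'aa' counts as 0, so the result is -1.
my $d = 'aa';
{
    no warnings 'numeric';   # silence "isn't numeric" for this demo only
    $d--;
}
```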

__Exponentiation__

Binary ``**'' is the exponentiation operator. It binds even
more tightly than unary minus, so -2**4 is -(2**4), not
(-2)**4. (This is implemented using C's pow(3)
function, which actually works on doubles
internally.)
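The following expressions check the precedence and right-associativity of ``**''. This is a sketch that uses plain numeric literals.

```perl
use strict;
use warnings;

my $neg   = -2**4;     # -(2**4): -16
my $paren = (-2)**4;   # 16
my $right = 2**3**2;   # right-associative: 2**(3**2) = 512
my $recip = 2**-1;     # 0.5
```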

__Symbolic Unary Operators__

Unary ``!'' performs logical negation, i.e., ``not''. See
also not for a lower precedence version of
this.

Unary ``-'' performs arithmetic negation if the operand is
numeric. If the operand is an identifier, a string
consisting of a minus sign concatenated with the identifier
is returned. Otherwise, if the string starts with a plus or
minus, a string starting with the opposite sign is returned.
One effect of these rules is that -bareword is
equivalent to "-bareword".

Unary ``~'' performs bitwise negation, i.e., 1's complement.
For example, 0666 & ~027 is 0640. (See also
``Integer Arithmetic'' and ``Bitwise String Operators''.)
Note that the width of the result is platform-dependent: ~0
is 32 bits wide on a 32-bit platform, but 64 bits wide on a
64-bit platform, so if you are expecting a certain bit
width, remember to use the ``&'' operator to mask off the
excess bits.

Unary ``+'' has no effect whatsoever, even on strings. It is
useful syntactically for separating a function name from a
parenthesized expression that would otherwise be interpreted
as the complete list of function arguments. (See examples
above under ``Terms and List Operators
(Leftward)''.)

Unary ``\'' creates a reference to whatever follows it. See
perlreftut and perlref. Do not confuse this behavior with
the behavior of backslash within a string, although both
forms do convey the notion of protecting the next thing from
interpolation.
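The sketch below demonstrates each symbolic unary operator discussed above. The 0xFF mask is an arbitrary example width, chosen to show how the platform-dependent width of ~0 can be pinned down.

```perl
use strict;
use warnings;

my $not_true  = !1;           # empty string (false)
my $not_false = !0;           # 1
my $minus_id  = -"foo";       # identifier: "-foo"
my $flipped   = -"-bar";      # leading sign reversed: "+bar"
my $mask      = 0666 & ~027;  # 0640 (416 decimal)
my $low8      = ~0 & 0xFF;    # mask ~0 down to a known width: 255
my $value     = 42;
my $ref       = \$value;      # reference to $value
$$ref++;                      # $value is now 43
```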

__Binding Operators__

Binary ``=~'' binds a scalar expression to a pattern match.
Certain operations search or modify the string $_
by default. This operator makes that kind of operation work
on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument
is what is supposed to be searched, substituted, or
transliterated instead of the default $_. When used
in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends
on the particular operator. See ``Regexp Quote-Like
Operators'' for details.

If the right argument is an expression rather than a search
pattern, substitution, or transliteration, it is interpreted
as a search pattern at run time. This can be less efficient
than an explicit search, because the pattern must be
compiled every time the expression is
evaluated.

Binary ``!~'' is just like ``=~'' except the return value is
negated in the logical sense.
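The sketch below uses ``=~'' and ``!~'' with a literal pattern, applies a substitution to a copy, and matches against a pattern supplied as a run-time expression. The strings are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'hello world';
my $has_world = ($text =~ /world/) ? 1 : 0;   # 1
my $no_mars   = ($text !~ /mars/)  ? 1 : 0;   # 1: negated result

# s/// through =~ on a copy; $text itself is left unchanged.
(my $copy = $text) =~ s/world/perl/;

# A plain expression on the right is compiled as a pattern at run time.
my $pattern = 'w.rld';
my $dynamic = ($text =~ $pattern) ? 1 : 0;    # 1
```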

__Multiplicative Operators__

Binary ``*'' multiplies two numbers.

Binary ``/'' divides two numbers.

Binary ``%'' computes the modulus of two numbers. Given
integer operands $a and $b: If $b
is positive, then $a % $b is $a minus the
largest multiple of $b that is not greater than
$a. If $b is negative, then $a %
$b is $a minus the smallest multiple of
$b that is not less than $a (i.e. the
result will be less than or equal to zero). Note that when
use integer is in scope, ``%'' gives you direct
access to the modulus operator as implemented by your C
compiler. This operator is not as well defined for negative
operands, but it will execute faster.

Binary ``x'' is the repetition operator. In scalar context
or if the left operand is not enclosed in parentheses, it
returns a string consisting of the left operand repeated the
number of times specified by the right operand. In list
context, if the left operand is enclosed in parentheses, it
repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
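The sketch below checks the modulus sign rules stated above for all four sign combinations, then shows both the string and list forms of ``x''. It does not enable use integer, so the results follow Perl's own rule rather than the C compiler's.

```perl
use strict;
use warnings;

# Unary minus binds tighter than %, so -7 % 3 is (-7) % 3.
my @mods = (7 % 3, -7 % 3, 7 % -3, -7 % -3);   # (1, 2, -2, -1)

my $dashes = '-' x 5;          # '-----'
my @ones   = (1) x 3;          # (1, 1, 1)
my @fives  = (5) x @ones;      # @ones is 3 in scalar context: (5, 5, 5)
```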
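
__Additive Operators__

Binary ``+'' returns the sum of two numbers.

Binary ``-'' returns the difference of two
numbers.

Binary ``.'' concatenates two strings.

__Shift Operators__

Binary ``<<'' returns the value of its left argument shifted
left by the number of bits specified by the right argument.
Arguments should be integers. (See also ``Integer
Arithmetic''.)

Binary ``>>'' returns the value of its left argument shifted
right by the number of bits specified by the right argument.
Arguments should be integers. (See also ``Integer
Arithmetic''.)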
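A quick sketch of the shift operators on small non-negative integers, including the common idiom of building a bit mask from shifted ones.

```perl
use strict;
use warnings;

my $left  = 1 << 4;                # 16
my $right = 256 >> 2;              # 64
my $flags = (1 << 0) | (1 << 3);   # bits 0 and 3 set: 9
```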

__Named Unary Operators__

The various named unary operators are treated as functions
with one argument, with optional parentheses. These include
the filetest operators, like -f, -M, etc.
See perlfunc.

If any list operator (''print()'', etc.) or any unary
operator (''chdir()'', etc.) is followed by a left
parenthesis as the next token, the operator and arguments
within parentheses are taken to be of highest precedence,
just like a normal function call. For example, because named
unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

See also ``Terms and List Operators (Leftward)''.
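The same rules can be shown with lc and length, which are also named unary operators. Using them means the results can be checked without touching the filesystem or the random-number generator. This is a sketch with illustrative values.

```perl
use strict;
use warnings;

# "." binds tighter than a named unary operator; "<" binds looser.
my $a1 = lc 'A' . 'B';       # lc('A' . 'B'): 'ab'
my $a2 = lc('A') . 'B';      # (lc 'A') . 'B': 'aB'
my $a3 = lc +('A') . 'B';    # unary plus: lc(('A') . 'B') = 'ab'
my $short = (length 'abc' < 5) ? 1 : 0;   # (length 'abc') < 5: 1
```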

__Relational Operators__

Binary ``<'' returns true if the left argument is
numerically less than the right argument.

Binary ``>'' returns true if the left argument is
numerically greater than the right argument.

Binary ``<='' returns true if the left argument is
numerically less than or equal to the right argument.

Binary ``>='' returns true if the left argument is
numerically greater than or equal to the right argument.

Binary ``lt'' returns true if the left argument is
stringwise less than the right argument.

Binary ``gt'' returns true if the left argument is
stringwise greater than the right argument.

Binary ``le'' returns true if the left argument is
stringwise less than or equal to the right
argument.

Binary ``ge'' returns true if the left argument is
stringwise greater than or equal to the right
argument.

__Equality Operators__

Binary ``=='' returns true if the left argument is
numerically equal to the right argument.

Binary ``!='' returns true if the left argument is
numerically not equal to the right argument.

Binary ``<=>'' returns -1, 0, or 1 depending on whether the
left argument is numerically less than, equal to, or greater
than the right argument. If your platform supports NaNs
(not-a-numbers) as numeric values, using them with ``<=>''
returns undef. NaN is not ``<'', ``=='', ``>'', ``<='' or
``>='' anything (even NaN), so those 5 return false. NaN !=
NaN returns true, as does NaN != anything else. If your
platform doesn't support NaNs then NaN is just a string with
numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary ``eq'' returns true if the left argument is stringwise equal to the right argument.

Binary ``ne'' returns true if the left argument is
stringwise not equal to the right argument.

Binary ``cmp'' returns -1, 0, or 1 depending on whether the
left argument is stringwise less than, equal to, or greater
than the right argument.

``lt'', ``le'', ``ge'', ``gt'' and ``cmp'' use the collation
(sort) order specified by the current locale if use
locale is in effect. See perllocale.
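The sketch below contrasts the numeric and string comparison operators. Sorting the same three values with ``<=>'' and with ``cmp'' gives different orders. No locale is in effect here, so cmp uses plain character order.

```perl
use strict;
use warnings;

my @nums = sort { $a <=> $b } (10, 9, 100);   # (9, 10, 100)
my @strs = sort { $a cmp $b } (10, 9, 100);   # ('10', '100', '9')

my $num_eq    = (10 == 10.0)     ? 1 : 0;     # 1: same number
my $str_eq    = ('10' eq '10.0') ? 1 : 0;     # 0: different strings
my $str_less  = ('abc' lt 'abd') ? 1 : 0;     # 1
my $spaceship = (2 <=> 3);                    # -1
```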

__Bitwise And__

Binary ``&'' returns its operands ANDed together bit by
bit. (See also ``Integer Arithmetic'' and ``Bitwise String
Operators''.)

__Bitwise Or and Exclusive Or__

Binary ``|'' returns its operands ORed together bit by bit.
(See also ``Integer Arithmetic'' and ``Bitwise String
Operators''.)

Binary ``^'' returns its operands XORed together bit by
bit. (See also ``Integer Arithmetic'' and ``Bitwise String
Operators''.)
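A sketch of the numeric bitwise operators on two 4-bit values. Binary literals and the %b printf format assume Perl 5.6 or later.

```perl
use strict;
use warnings;

my ($x, $y) = (0b1100, 0b1010);
my $and = $x & $y;    # 0b1000 =  8
my $or  = $x | $y;    # 0b1110 = 14
my $xor = $x ^ $y;    # 0b0110 =  6
printf "%04b %04b %04b\n", $and, $or, $xor;   # 1000 1110 0110
```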

__C-style Logical And__

Binary ``&&'' performs a short-circuit logical
AND operation. That is, if the left operand
is false, the right operand is not even evaluated. Scalar or
list context propagates down to the right operand if it is
evaluated.

__C-style Logical Or__

Binary ``||'' performs a short-circuit logical
OR operation. That is, if the left operand is
true, the right operand is not even evaluated. Scalar or
list context propagates down to the right operand if it is
evaluated.

The || and && operators differ from C's in
that, rather than returning 0 or 1, they return the last
value evaluated. Thus, a reasonably portable way to find out
the home directory (assuming it's not ``0'') might
be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[[7] || die "You're homeless!\n";

In particular, this means that you shouldn't use this for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to && and || when used for control flow, Perl provides and and or operators (see below). The short-circuit behavior is identical. The precedence of ``and'' and ``or'' is much lower, however, so that you can safely use them after a list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using ``or'' for assignment is unlikely to do what you want; see below.
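The sketch below shows that ``&&'' and ``||'' return the last value evaluated rather than 0 or 1. It also reproduces the ``@a = @b || @c'' pitfall from above with concrete arrays.

```perl
use strict;
use warnings;

my $default = 0   || 'fallback';           # 'fallback'
my $second  = 'a' && 'b';                  # 'b'
my $short   = ''  && die "never evaluated"; # '': right side skipped

my @b = (1, 2, 3);
my @c = (9);
my @wrong = @b || @c;     # @b is evaluated in scalar context: (3)
my @right = @b ? @b : @c; # (1, 2, 3)
```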

__Range Operators__

Binary ``..'' is the range operator, which is really two
different operators depending on the context. In list
context, it returns an array of values counting (up by ones)
from the left value to the right value. If the left value is
greater than the right value then it returns the empty
array. The range operator is useful for writing foreach
(1..10) loops and for doing slice operations on arrays.
In the current implementation, no temporary array is created
when the range operator is used as the expression in
foreach loops, but older versions of Perl might
burn a lot of memory when you write something like
this:

    for (1 .. 1_000_000) {
        # code
    }

In scalar context, ``..'' returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of __sed__, __awk__, and various editors. Each ``..'' operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, ''AFTER'' which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in __awk__), but it still returns true once. If you don't want it to test the right operand till the next evaluation, as in __sed__, just use three dots (``...'') instead of two. In all other regards, ``...'' behaves just like ``..'' does.

The right operand is not evaluated while the operator is in
the ``false'' state, and the left operand is not evaluated
while the operator is in the ``true'' state. The precedence
is a little lower than || and &&. The value returned is
either the empty string for false, or a sequence number
(beginning with 1) for true. The sequence number is reset
for each range encountered. The final sequence number in a
range has the string ``E0'' appended to it, which doesn't
affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be
greater than 1. If either operand of scalar ``..'' is a
constant expression, that operand is implicitly compared to
the $. variable, the current line number.
Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines
    next line if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof();
        # do something based on those
    } continue {
        close ARGV if eof;      # reset $. each file
    }

As a list operator:

    for (101 .. 200) { print; }         # print $_ 100 times
    @foo = @foo[[0 .. $#foo];           # an expensive no-op
    @foo = @foo[[$#foo-4 .. $#foo];     # slice last 5 items

The range operator (in list context) makes use of the magical auto-increment algorithm if the operands are strings. You can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[[$mday];

to get dates with leading zeros. If the final value specified is not in the sequence that the magical increment would produce, the sequence goes until the next value would be longer than the final value specified.
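The sketch below shows both faces of the range operator. The scalar (flip-flop) examples run inside ''grep()'', so each range operator is evaluated once per element. That makes the ``..'' versus ``...'' difference visible. The sample data is made up.

```perl
use strict;
use warnings;

# List context.
my @letters = ('A' .. 'E');                     # A B C D E
my @days    = ('01' .. '03');                   # 01 02 03
my $num     = 27;
my $hex     = (0 .. 9, 'a' .. 'f')[$num & 15];  # 27 & 15 == 11: 'b'
my @none    = (5 .. 1);                         # left > right: empty

# Scalar context: flip-flops, one evaluation per element.
my @lines = ('junk', 'X', 'b', 'X', 'tail');
# "..": starts and stops on the same line, so each 'X' is a range by itself.
my @two   = grep { /X/ .. /X/ }  @lines;        # ('X', 'X')
# "...": the right side is not tested on the line that starts the range.
my @three = grep { /X/ ... /X/ } @lines;        # ('X', 'b', 'X')
```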

__Conditional Operator__

Ternary ``?:'' is the conditional operator, just as in C. It
works much like an if-then-else. If the argument before the
? is true, the argument before the : is returned, otherwise
the argument after the : is returned. For
example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;
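The sketch below shows context propagation and assignment through ``?:'' using made-up values. It finishes with the parenthesized form recommended above.

```perl
use strict;
use warnings;

my $ok = 1;
my @b  = (1, 2, 3);
my @c  = (4, 5);
my @list  = $ok ? @b : @c;     # list context: (1, 2, 3)
my $count = $ok ? @b : @c;     # scalar context: 3, the element count

# Assigning to the conditional works because both branches are lvalues.
my ($x, $y) = (0, 0);
($ok ? $x : $y) = 42;          # sets $x only

my $n = 5;
$n += ($n % 2) ? 10 : 2;       # 5 is odd, so $n becomes 15
```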

__Assignment Operators__

``='' is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue might trigger, such as from ''tie()''. Other assignment operators work similarly. The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence of assignment.

Unlike in C, the scalar assignment operator produces a valid
lvalue. Modifying an assignment is equivalent to doing the
assignment and then modifying the variable that was assigned
to. This is useful for modifying a copy of something, like
this:

    ($tmp = $global) =~ tr [[A-Z] [[a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of lvalues assigned to, and a list assignment in scalar context returns the number of elements produced by the expression on the right hand side of the assignment.
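The sketch below shows the lvalue behaviour of assignment and what a list assignment returns in scalar context. The ``() = ...'' line is a common idiom built on the rule in the last paragraph, not a separate operator.

```perl
use strict;
use warnings;

my $global = 'Hello World';
(my $tmp = $global) =~ tr/A-Z/a-z/;   # lowercase copy; $global untouched

my $x = 1;
($x += 2) *= 3;                       # $x = 3, then $x *= 3: 9

# List assignment in scalar context counts the right-hand elements.
my $rhs_count = (my ($p, $q) = (7, 8, 9));   # 3, though only 2 are kept
my $commas    = () = 'a,b,c' =~ /,/g;        # 2 matches
```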

__Comma Operator__

Binary ``,'' is the comma operator. In scalar context it
evaluates its left argument, throws that value away, then
evaluates its right argument and returns that value. This is
just like C's comma operator.

In list context, it's just the list argument separator, and
inserts both its arguments into the list.

The => digraph is mostly just a synonym for the comma
operator. It's useful for documenting arguments that come in
pairs. As of release 5.001, it also forces any word to the
left of it to be interpreted as a string.
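The sketch below shows the comma operator in scalar and list context, and ``=>'' quoting the word on its left. The names are illustrative.

```perl
use strict;
use warnings;

my @log;
# Scalar context: the left operand runs for its side effect and is discarded.
my $result = (push(@log, 'left ran'), 'right value');

# List context: the comma just separates list elements.
my @list = (1, 2, 3);

# => quotes the identifier on its left, so no quotes are needed.
my %opts = (verbose => 1, depth => 2);
my @pair = (name => 'value');          # ('name', 'value')
```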

__List Operators (Rightward)__

On the right side of a list operator, it has very low
precedence, such that it controls all comma-separated
expressions found there. The only operators with lower
precedence are the logical operators ``and'', ``or'', and
``not'', which may be used to evaluate calls to list
operators without the need for extra
parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in ``Terms and List Operators (Leftward)''.

__Logical Not__

Unary ``not'' returns the logical negation of the expression
to its right. It's the equivalent of ``!'' except for the
very low precedence.

__Logical And__

Binary ``and'' returns the logical conjunction of the two
surrounding expressions. It's equivalent to && except for
the very low precedence. This means that it short-circuits:
i.e., the right expression is evaluated only if the left
expression is true.

__Logical or and Exclusive Or__

Binary ``or'' returns the logical disjunction of the two
surrounding expressions. It's equivalent to || except for the
very low precedence. This makes it useful for control
flow:

    print FH $data              or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated only if the left expression is false. Due to its precedence, you should probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use ``||'' for control flow, you probably need ``or'' so that the assignment takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary ``xor'' returns the exclusive-OR of the two
surrounding expressions. It cannot short circuit, of
course.
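The sketch below shows the ``||'' versus ``or'' difference for list assignment. The items() sub is a hypothetical helper that returns its arguments, and therefore their count in scalar context. It stands in for something like ''stat()'' so the example does not depend on any file.

```perl
use strict;
use warnings;

my @log;
sub items { return @_ }   # hypothetical helper: a list, or its count in scalar context

# "||" evaluates items() in scalar context, so @bad gets the count.
my @bad  = items('x', 'y') || die "unreachable";   # (2)

# "or" binds more loosely than "=", so the list assignment happens first.
my @good = items('x', 'y') or die "unreachable";   # ('x', 'y')

# An empty list assignment is false, so the right-hand side runs.
my @none = items() or push @log, 'no items';

# xor always evaluates both sides.
my $exactly_one = (1 xor 0) ? 1 : 0;               # 1
```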

__C Operators Missing From Perl__

Here is what C has that Perl doesn't:

unary &

Address-of operator. (But see the ``\'' operator for taking a
reference.)

unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and
&.)

( TYPE )

Type-casting operator.

__Quote and Quote-like Operators__

While we usually think of quotes as literal values, in Perl
they function as operators, providing various kinds of
interpolating and pattern matching capabilities. Perl
provides customary quote characters for these behaviors, but
also provides a way for you to choose your quote character
for any of them. In the following table, a {}
represents any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes (unless '' is delimiter)
                qw{}         Word list            no
        //       m{}       Pattern match          yes (unless '' is delimiter)
                qr{}          Pattern             yes (unless '' is delimiter)
                 s{}{}      Substitution          yes (unless '' is delimiter)
                tr{}{}    Transliteration         no (but see below)

Non-bracketing delimiters use the same character fore and aft, but the four sorts of brackets (round, angle, square, curly) will all nest, which means that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The Text::Balanced module on CPAN is able to do this properly.

There can be whitespace between the operator and the quoting
characters, except when # is being used as the
quoting character. q#foo# is parsed as the string
foo, while q #foo# is the operator
q followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

For constructs that do interpolate, variables beginning with ``$'' or ``@'' are interpolated, as are the following escape sequences. Within a transliteration, the first eleven of these sequences may be used.

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (ESC)
    \x1b        hex char        (ESC)
    \x{263a}    wide hex char   (SMILEY)
    \c[[         control char    (ESC)
    \N{name}    named char

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E

If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See perllocale. For documentation of \N{name}, see charnames.

All systems use the virtual "\n" to
represent a line terminator, called a ``newline''. There is
no such thing as an unvarying, physical newline character.
It is only an illusion that the operating system, device
drivers, C libraries, and Perl all conspire to preserve. Not
all systems read "\r" as ASCII
CR and "\n" as ASCII
LF. For example, on a Mac, these are reversed, and
on systems without line terminator, printing "\n"
may emit no actual data. In general,
use "\n" when you mean a ``newline'' for
your system, but use the literal ASCII when
you need an exact character. For example, most networking
protocols expect and prefer a CR+LF
("\015\012" or "\cM\cJ")
for line terminators, and although they often accept just
"\012", they seldom tolerate just
"\015". If you get in the habit of using "\n"
for networking, you may be burned
some day.

You cannot include a literal $ or @ within
a \Q sequence. An unescaped $ or @
interpolates the corresponding variable, while escaping will
cause the literal string \$ to be inserted. You'll
need to write something like
m/\Quser\E\@\Qhost/.

Patterns are subject to an additional level of
interpretation as a regular expression. This is done as a
second pass, after variables are interpolated, so that
regular expressions may be incorporated into the pattern
from the variables. If this is not what you want, use
\Q to interpolate a variable literally.

Apart from the behavior described above, Perl does not
expand multiple levels of interpolation. In particular,
contrary to the expectations of shell programmers,
back-quotes do ''NOT'' interpolate within
double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.
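The sketch below covers quoting, nested delimiters, and a few escape sequences from the table above. The strings are arbitrary. The CR+LF check uses ord() rather than "\r\n", because the mapping of "\n" is platform-dependent, as noted above.

```perl
use strict;
use warnings;

my $name   = 'world';
my $nested = q{foo{bar}baz};     # brackets nest: 'foo{bar}baz'
my $single = q(Hi, $name\n);     # no interpolation: literal $name and \n
my $double = qq{Hi, $name!};     # 'Hi, world!'
my $tabbed = "a\tb";             # 3 characters: a, TAB, b
my $upper  = "\Uabc\E-def";      # 'ABC-def'
my $meta   = "\Qa.b\E";          # 'a\.b': non-word characters escaped
my $crlf   = "\015\012";         # exact CR+LF, e.g. for a network protocol
```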

__Regexp Quote-Like Operators__

Here are the quote-like operators that apply to pattern
matching and related activities.

?PATTERN?

This is just like the /pattern/ search, except that
it matches only once between calls to the ''reset()''
operator. This is a useful optimization when you want to see
only the first occurrence of something in each file of a set
of files, for instance. Only ?? patterns local to
the current package are reset.

    while (<>) {
        if (?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear ?? status for next file
    }

This usage is vaguely deprecated, which means it just might possibly be removed in some distant future version of Perl, perhaps somewhere around the year 2168.

m/PATTERN/cgimosx

/PATTERN/cgimosx

Searches a string for a pattern match, and in scalar context
returns true if it succeeds, false if it fails. If no string
is specified via the =~ or !~ operator,
the $_ string is searched. (The string specified
with =~ need not be an lvalue--it may be the result
of an expression evaluation, but remember the =~
binds rather tightly.) See also perlre. See perllocale for
discussion of additional considerations that apply when
use locale is in effect.

Options are:

    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If ``/'' is the delimiter then the initial m is optional. With the m you can use any pair of non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for matching path names that contain ``/'', to avoid LTS (leaning toothpick syndrome). If ``?'' is the delimiter, then the match-only-once rule of ?PATTERN? applies. If a single quote is the delimiter, no interpolation is performed on the PATTERN.

PATTERN may contain variables, which will be
interpolated (and the pattern recompiled) every time the
pattern search is evaluated, except for when the delimiter
is a single quote. (Note that $(, $), and
$| are not interpolated because they look like
end-of-string tests.) If you want such a pattern to be
compiled only once, add a /o after the trailing
delimiter. This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't
change over the life of the script. However, mentioning
/o constitutes a promise that you won't change the
variables in the pattern. If you change them, Perl won't
even notice. See also ``qr/STRING/imosx''.

If the PATTERN evaluates to the empty string,
the last ''successfully'' matched regular expression is
used instead.

If the /g option is not used, m// in list
context returns a list consisting of the subexpressions
matched by the parentheses in the pattern, i.e.,
($1, $2, $3...). (Note that here
$1 etc. are also set, and that this differs from
Perl 4's behavior.) When there are no parentheses in the
pattern, the return value is the list (1) for
success. With or without parentheses, an empty list is
returned upon failure.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([[0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned, i.e., if the pattern matched.
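The sketch below uses m// in list context with and without capturing parentheses, mirroring the examples above on made-up strings. A failed match yields an empty list.

```perl
use strict;
use warnings;

my $line = 'alpha beta gamma delta';

# Without /g, list context returns the captures.
my ($first, $second, $rest) = $line =~ /^(\S+)\s+(\S+)\s*(.*)/;

# No capturing parentheses: the list (1) on success.
my @flag = ('version 5.6' =~ /version/);

# '#' as the delimiter avoids escaping the slashes in a path.
my $is_spool = ('/usr/spool/uucp/x' =~ m#^/usr/spool/uucp#) ? 1 : 0;

# Failure: an empty list.
my @failed = ('abc' =~ /(x)/);
```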

The /g modifier specifies global pattern
matching--that is, matching as many times as possible within
the string. How it behaves depends on the context. In list
context, it returns a list of the substrings matched by any
capturing parentheses in the regular expression. If there
are no parentheses, it returns a list of all the matched
strings, as if there were parentheses around the whole
pattern.

In scalar context, each execution of m//g finds the
next match, returning true if it matches, and false if there
is no further match. The position after the last match can
be read or set using the ''pos()'' function; see ``pos''
in perlfunc. A failed match normally resets the search
position to the beginning of the string, but you can avoid
that by adding the /c modifier (e.g.
m//gc). Modifying the target string also resets the
search position.

You can intermix m//g matches with
m/\G.../g, where \G is a zero-width
assertion that matches the exact position where the previous
m//g, if any, left off. Without the /g
modifier, the \G assertion still anchors at
''pos()'', but the match is of course only attempted
once. Using \G without /g on a target
string that has not previously had a /g match
applied to it is the same as using the \A assertion
to match the beginning of the string.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";            # paragraph mode
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[[a-z][['")]*[[.!?]+[['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched q instead of p, which a match without the \G anchor would have done. Also note that the final match did not update pos -- pos is only updated on a /g match. If the final match did indeed match p, it's a good bet that you're running an older (pre-5.6.0) Perl.

A useful idiom for lex-like scanners is
/\G.../gc. You can combine several regexps like this
to process a string part-by-part, doing different actions
depending on which regexp matched. Each regexp tries to
match where the previous one leaves off.

    $_ = <<'EOL';
         $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
    {
         print(" digits"),       redo LOOP if /\G\d+\b[[,.;]?\s*/gc;
         print(" lowercase"),    redo LOOP if /\G[[a-z]+\b[[,.;]?\s*/gc;
         print(" UPPERCASE"),    redo LOOP if /\G[[A-Z]+\b[[,.;]?\s*/gc;
         print(" Capitalized"),  redo LOOP if /\G[[A-Z][[a-z]+\b[[,.;]?\s*/gc;
         print(" MiXeD"),        redo LOOP if /\G[[A-Za-z]+\b[[,.;]?\s*/gc;
         print(" alphanumeric"), redo LOOP if /\G[[A-Za-z0-9]+\b[[,.;]?\s*/gc;
         print(" line-noise"),   redo LOOP if /\G[[^A-Za-z0-9]+/gc;
         print ". That's all!\n";
    }

Here is the output (split into several lines):

    line-noise lowercase line-noise lowercase UPPERCASE line-noise
    UPPERCASE line-noise lowercase line-noise lowercase line-noise
    lowercase lowercase line-noise lowercase lowercase line-noise
    MiXeD line-noise. That's all!
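The compact sketch below covers m//g in list and scalar context and ''pos()''. It also resumes with \G after a failed /gc match. The input strings are arbitrary.

```perl
use strict;
use warnings;

my $text = 'a1b22c333';

# List context: every captured substring at once.
my @numbers = $text =~ /(\d+)/g;           # ('1', '22', '333')

# Scalar context: one match per evaluation; pos() records where it stopped.
my @positions;
while ($text =~ /\d+/g) {
    push @positions, pos($text);           # 2, 5, 9
}

# With /c a failed match keeps pos(), so \G can resume from there.
my $s = 'aaabbb';
my $first_run = ($s =~ /a+/gc)    ? 1 : 0; # pos($s) is now 3
my $failed    = ($s =~ /x/gc)     ? 1 : 0; # fails, pos($s) stays 3
my $resumed   = ($s =~ /\Gb+/gc)  ? 1 : 0; # matches 'bbb' from position 3
```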
958
959
960 q/STRING/
961
962
963 'STRING'
964
965
966 A single-quoted, literal string. A backslash represents a
967 backslash unless followed by the delimiter or another
968 backslash, in which case the delimiter or backslash is
969 interpolated.
970
971
972 $foo = q!I said, "You said, 'She said it.'"!;
$bar = q('This is it.');
$baz = '\n'; # a two-character string
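A quick check of the backslash rule above, using braces as the delimiter. The strings themselves are arbitrary. Only \\ and an escaped delimiter lose their backslash; every other backslash stays.

```perl
use strict;
use warnings;

my $double = q{one \\ backslash};   # \\ collapses to a single backslash
my $delim  = q{brace: \}};          # \} yields a literal closing brace
my $other  = q{\n};                 # not a newline: a backslash followed by 'n'

print "$double\n";                  # prints: one \ backslash
print "$delim\n";                   # prints: brace: }
print length($other), "\n";         # prints: 2
```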
973
974
975 qq/STRING/
976
977
978 "STRING"
979
980
981 A double-quoted, interpolated string.
982
983
984 $_ .= qq
985 (*** The previous line contains the naughty word "$1".\n)
        if /\b(tcl|java|python)\b/i; # :-)
$baz = "\n"; # a one-character string
986
987
988 qr/STRING/imosx
989
990
991 This operator quotes (and possibly compiles) its
992 ''STRING'' as a regular expression.
993 ''STRING'' is interpolated the same way as
994 ''PATTERN'' in m/PATTERN/. If
995 a single quote (') is used as the delimiter, no interpolation is done.
996 Returns a Perl value which may be used instead of the
997 corresponding /STRING/imosx
998 expression.
999
1000
1001 For example,
1002
1003
1004 $rex = qr/my.STRING/is;
1005 s/$rex/foo/;
1006 is equivalent to
1007
1008
1009 s/my.STRING/foo/is;
1010 The result may be used as a subpattern in a match:
1011
1012
1013 $re = qr/$pattern/;
1014 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1015 $string =~ $re; # or used standalone
1016 $string =~ /$re/; # or this way
1017 Since Perl may compile the pattern at the moment of execution of the ''qr()'' operator, using ''qr()'' may have speed advantages in some situations, notably if the result of ''qr()'' is used standalone:
1018
1019
1020 sub match {
1021 my $patterns = shift;
1022 my @compiled = map qr/$_/i, @$patterns;
1023 grep {
1024 my $success = 0;
1025 foreach my $pat (@compiled) {
1026 $success = 1, last if /$pat/;
1027 }
1028 $success;
1029 } @_;
1030 }
1031 Precompilation of the pattern into an internal representation at the moment of ''qr()'' avoids a need to recompile the pattern every time a match /$pat/ is attempted. (Perl has many other internal optimizations, but none would be triggered in the above example if we did not use the ''qr()'' operator.)
1032
1033
1034 Options are:
1035
1036
1037 i Do case-insensitive pattern matching.
1038 m Treat string as multiple lines.
1039 o Compile pattern only once.
1040 s Treat string as single line.
1041 x Use extended regular expressions.
1042 See perlre for additional information on valid syntax for STRING, and for a detailed look at the semantics of regular expressions.
1043
1044
1045 qx/STRING/
1046
1047
1048 `STRING`
1049
1050
1051 A string which is (possibly) interpolated and then executed
1052 as a system command with /bin/sh or its equivalent.
1053 Shell wildcards, pipes, and redirections will be honored.
1054 The collected standard output of the command is returned;
1055 standard error is unaffected. In scalar context, it comes
1056 back as a single (potentially multi-line) string, or undef
1057 if the command failed. In list context, returns a list of
1058 lines (however you've defined lines with $/ or
1059 $INPUT_RECORD_SEPARATOR), or an empty list if the
1060 command failed.
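A minimal sketch of the two contexts. It assumes only that the shell can run the current Perl binary ($^X) with a double-quoted -e argument. The child prints two lines, which come back either as one string or as a two-element list.

```perl
use strict;
use warnings;

# The child is the running perl; -l makes each print end with "\n".
my $cmd = qq{"$^X" -le "print for qw(a b)"};

my $all    = `$cmd`;    # scalar context: one (multi-line) string
my @lines  = `$cmd`;    # list context: one element per line, as split by $/
my $status = $? >> 8;   # exit code of the most recent command

print length($all), " chars, ", scalar(@lines), " lines, exit $status\n";
# prints: 4 chars, 2 lines, exit 0
```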
1061
1062
1063 Because backticks do not affect standard error, use shell
1064 file descriptor syntax (assuming the shell supports this) if
1065 you care to address this. To capture a command's
1066 STDERR and STDOUT
1067 together:
1068
1069
1070 $output = `cmd 2>&1`;
1071 To capture a command's STDOUT but discard its STDERR:
1072
1073
1074 $output = `cmd 2>/dev/null`;
1075 To capture a command's STDERR but discard its STDOUT (ordering is important here):
1076
1077
1078 $output = `cmd 2>&1 1>/dev/null`;
1079 To exchange a command's STDOUT and STDERR in order to capture the STDERR but leave its STDOUT to come out the old STDERR:
1080
1081
1082 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1083 To read both a command's STDOUT and its STDERR separately, it's easiest and safest to redirect them separately to files, and then read from those files when the program is done:
1084
1085
1086 system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
1087 Using single-quote as a delimiter protects the command from Perl's double-quote interpolation, passing it on to the shell instead:
1088
1089
1090 $perl_info = qx(ps $$); # that's Perl's $$
1091 $shell_info = qx'ps $$'; # that's the new shell's $$
1092 How that string gets evaluated is entirely subject to the command interpreter on your system. On most platforms, you will have to protect shell metacharacters if you want them treated literally. This is in practice difficult to do, as it's unclear how to escape which characters. See perlsec for a clean and safe example of a manual ''fork()'' and ''exec()'' to emulate backticks safely.
1093
1094
1095 On some platforms (notably DOS-like ones), the shell may not
1096 be capable of dealing with multiline commands, so putting
1097 newlines in the string may not get you what you want. You
1098 may be able to evaluate multiple commands in a single line
1099 by separating them with the command separator character, if
1100 your shell supports that (e.g. ; on many Unix
1101 shells; & on the Windows NT
1102 cmd shell).
1103
1104
1105 Beginning with v5.6.0, Perl will attempt to flush all files
1106 opened for output before starting the child process, but
1107 this may not be supported on some platforms (see perlport).
1108 To be safe, you may need to set $| ($AUTOFLUSH in
1109 English) or call the autoflush() method of
1110 IO::Handle on any open handles.
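A minimal sketch of both approaches, assuming a Unix-like shell for the backticks. The temporary log file is purely illustrative. IO::Handle is the core module that supplies autoflush(). The child process reports how many bytes of the file it can see.

```perl
use strict;
use warnings;
use IO::Handle;                 # core module that provides autoflush()
use File::Temp qw(tempfile);

# A scratch file standing in for some already-open output handle.
my ($log, $logname) = tempfile(UNLINK => 1);

$| = 1;                         # autoflush whatever handle is selected (STDOUT here)
$log->autoflush(1);             # the same setting for another open handle

print {$log} "written before the child runs\n";

# The child process prints the file's size as it sees it on disk.
my $size = `"$^X" -e "print((stat shift)[7])" "$logname"`;
print "child saw $size bytes\n";
```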
1111
1112
1113 Beware that some command shells may place restrictions on
1114 the length of the command line. You must ensure your strings
1115 don't exceed this limit after any necessary interpolations.
1116 See the platform-specific release notes for more details
1117 about your particular environment.
1118
1119
1120 Using this operator can lead to programs that are difficult
1121 to port, because the shell commands called vary between
1122 systems, and may in fact not be present at all. As one
1123 example, the type command under the
1124 POSIX shell is very different from the
1125 type command under DOS. That
1126 doesn't mean you should go out of your way to avoid
1127 backticks when they're the right way to get something done.
1128 Perl was made to be a glue language, and one of the things
1129 it glues together is commands. Just understand what you're
1130 getting yourself into.
1131
1132
1133 See ``I/O Operators'' for more discussion.
1134
1135
1136 qw/STRING/
1137
1138
1139 Evaluates to a list of the words extracted out of
1140 STRING, using embedded whitespace as the
1141 word delimiters. It can be understood as being roughly
1142 equivalent to:
1143
1144
1145 split(' ', q/STRING/);
1146 the difference being that it generates a real list at compile time. So this expression:
1147
1148
1149 qw(foo bar baz)
1150 is semantically equivalent to the list:
1151
1152
1153 'foo', 'bar', 'baz'
1154 Some frequently seen examples:
1155
1156
1157 use POSIX qw( setlocale localeconv );
1158 @EXPORT = qw( foo bar baz );
1159 A common mistake is to try to separate the words with commas or to put comments into a multi-line qw-string. For this reason, the use warnings pragma and the __-w__ switch (that is, the $^W variable) produce warnings if the STRING contains the ``,'' or the ``#'' character.
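A small sketch confirming both points above. First, qw() and the split() form give the same list. Second, a comma inside qw() becomes part of a word and triggers the warning. The eval string is used only so that the compile-time warning can be captured at run time.

```perl
use strict;
use warnings;

# qw() produces the same list as the split() form it is compared to above.
my @via_qw    = qw(foo bar baz);
my @via_split = split(' ', q/foo bar baz/);
print "same list\n" if "@via_qw" eq "@via_split";

# A comma stays attached to its word; the warning fires when the code is
# compiled, which for an eval string happens at run time.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };
my @oops = eval q{ use warnings; qw(a, b) };
print scalar(@oops), " words: @oops\n";    # prints: 2 words: a, b
print "warning: $warnings[0]" if @warnings;
```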
1160
1161
1162 s/PATTERN/REPLACEMENT/egimosx
1163
1164
1165 Searches a string for a pattern, and if found, replaces that
1166 pattern with the replacement text and returns the number of
1167 substitutions made. Otherwise it returns false
1168 (specifically, the empty string).
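A minimal example of the two return values, using an arbitrary string. On success the result is the count of substitutions; when nothing matched, it is the empty string.

```perl
use strict;
use warnings;

my $text = "a-b-c";
my $made = ($text =~ s/-/+/g);    # number of substitutions made: 2
my $none = ($text =~ s/x/y/);     # no match: false, specifically ''

print "$text $made '$none'\n";    # prints: a+b+c 2 ''
```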
1169
1170
1171 If no string is specified via the =~ or !~
1172 operator, the $_ variable is searched and modified.
1173 (The string specified with =~ must be a scalar
1174 variable, an array element, a hash element, or an assignment
1175 to one of those, i.e., an lvalue.)
1176
1177
1178 If the delimiter chosen is a single quote, no interpolation
1179 is done on either the PATTERN or the
1180 REPLACEMENT. Otherwise, if the
1181 PATTERN contains a $ that looks like a
1182 variable rather than an end-of-string test, the variable
1183 will be interpolated into the pattern at run-time. If you
1184 want the pattern compiled only once the first time the
1185 variable is interpolated, use the /o option. If the
1186 pattern evaluates to the empty string, the last successfully
1187 executed regular expression is used instead. See perlre for
1188 further explanation on these. See perllocale for discussion
1189 of additional considerations that apply when use
1190 locale is in effect.
1191
1192
1193 Options are:
1194
1195
1196 e Evaluate the right side as an expression.
1197 g Replace globally, i.e., all occurrences.
1198 i Do case-insensitive pattern matching.
1199 m Treat string as multiple lines.
1200 o Compile pattern only once.
1201 s Treat string as single line.
1202 x Use extended regular expressions.
1203 Any non-alphanumeric, non-whitespace delimiter may replace the slashes. If single quotes are used, no interpretation is done on the replacement string (the /e modifier overrides this, however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not evaluated as a command. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g., s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be treated as a full-fledged Perl expression and evaluated right then and there. It is, however, syntax checked at compile-time. A second e modifier will cause the replacement portion to be evaled before being run as a Perl expression.
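A sketch contrasting no /e, one /e, and /ee. The variable $price and the template string are made up for the example. With /ee, the result of the first evaluation (the text '$price') is evaluated again as Perl code, which is why it can see the lexical variable.

```perl
use strict;
use warnings;

my $price    = 10;
my $template = 'cost: $price';

# No /e: the replacement is an ordinary interpolated string.
(my $plain = "x=21") =~ s/(\d+)/$1 * 2/;     # gives 'x=21 * 2'

# One /e: the replacement is Perl code, evaluated for each match.
(my $once = "x=21") =~ s/(\d+)/$1 * 2/e;     # gives 'x=42'

# /ee: the code's result, the string '$price', is itself evaluated.
(my $twice = $template) =~ s/(\$\w+)/$1/ee;  # gives 'cost: 10'

print "$plain | $once | $twice\n";
```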
1204
1205
1206 Examples:
1207
1208
1209 s/\bgreen\b/mauve/g; # don't change wintergreen
1210 $path =~ s|/usr/bin|/usr/local/bin|;
1211 s/Login: $foo/Login: $bar/; # run-time pattern
1212 ($foo = $bar) =~ s/this/that/; # copy first, then change
1213 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
1214 $_ = 'abc123xyz';
1215 s/\d+/$&*2/e; # yields 'abc246xyz'
s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
1216 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1217 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/&pod($1)/ge; # use function call
1218 # expand variables in $_, but dynamics only, using
1219 # symbolic dereferencing
1220 s/\$(\w+)/${$1}/g;
1221 # Add one to the value of any numbers in the string
1222 s/(\d+)/1 + $1/eg;
1223 # This will expand any embedded scalar variable
1224 # (including lexicals) in $_ : First $1 is interpolated
1225 # to the variable name, and then evaluated
1226 s/(\$\w+)/$1/eeg;
1227 # Delete (most) C comments.
1228 $program =~ s {
1229 /\* # Match the opening delimiter.
1230 .*? # Match a minimal number of characters.
1231 \*/ # Match the closing delimiter.
1232 } [[]gsx;
1233 s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively
1234 for ($variable) { # trim white space in $variable, cheap
1235 s/^\s+//;
1236 s/\s+$//;
1237 }
1238 s/([[^ ]*) *([[^ ]*)/$2 $1/; # reverse 1st two fields
1239 Note the use of $ instead of \ in the last example. Unlike __sed__, we use the \''digit'' form in only the left hand side. Anywhere else it's $''digit''.
1240
1241
1242 Occasionally, you can't use just a /g to get all
1243 the changes to occur that you might want. Here are two
1244 common cases:
1245
1246
1247 # put commas in the right places in an integer
1248 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1249 # expand tabs to 8-column spacing
1250 1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
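The two ''1 while'' idioms can be wrapped in small subroutines so their results are easy to check. The names commify and expand_tabs are hypothetical helpers, not built-ins.

```perl
use strict;
use warnings;

# Each /g pass can add at most one comma per number, because a match needs
# a non-digit after its last three digits; "1 while" repeats until done.
sub commify {
    my ($n) = @_;
    1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $n;
}

# Replace the leftmost run of tabs with enough spaces to reach the next
# multiple-of-8 column: $` is the text before the match, $& the tabs.
sub expand_tabs {
    my ($line) = @_;
    1 while $line =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
    return $line;
}

print commify("1234567"), "\n";        # prints: 1,234,567
print expand_tabs("ab\tc"), "|\n";     # 'ab', six spaces, then 'c'
```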
1251
1252
1253 tr/SEARCHLIST/REPLACEMENTLIST/cds
1254
1255
1256 y/SEARCHLIST/REPLACEMENTLIST/cds
1257
1258
1259 Transliterates all occurrences of the characters found in
1260 the search list with the corresponding character in the
1261 replacement list. It returns the number of characters
1262 replaced or deleted. If no string is specified via the =~ or
1263 !~ operator, the $_ string is transliterated. (The
1264 string specified with =~ must be a scalar variable, an array
1265 element, a hash element, or an assignment to one of those,
1266 i.e., an lvalue.)
1267
1268
1269 A character range may be specified with a hyphen, so
1270 tr/A-J/0-9/ does the same replacement as
1271 tr/ACEGIBDFHJ/0246813579/. For __sed__ devotees,
1272 y is provided as a synonym for tr. If the
1273 SEARCHLIST is delimited by bracketing quotes,
1274 the REPLACEMENTLIST has its own pair of
1275 quotes, which may or may not be bracketing quotes, e.g.,
1276 tr[[A-Z][[a-z] or
1277 tr(+-*/)/ABCD/.
1278
1279
1280 Note that tr does __not__ do regular expression
1281 character classes such as \d or [[:lower:].
1282 The tr operator is not equivalent to the
1283 ''tr''(1) utility. If you want to map strings between
1284 lower/upper cases, see ``lc'' in perlfunc and ``uc'' in
1285 perlfunc, and in general consider using the s
1286 operator if you need regular expressions.
1287
1288
1289 Note also that the whole range idea is rather unportable
1290 between character sets--and even within character sets they
1291 may cause results you probably didn't expect. A sound
1292 principle is to use only ranges that begin from and end at
1293 either alphabets of equal case (a-e, A-E), or digits (0-4).
1294 Anything else is unsafe. If in doubt, spell out the
1295 character sets in full.
1296
1297
1298 Options:
1299
1300
1301 c Complement the SEARCHLIST.
1302 d Delete found but unreplaced characters.
1303 s Squash duplicate replaced characters.
1304 If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some __tr__ programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were transliterated to the same character are squashed down to a single instance of the character.
1305
1306
1307 If the /d modifier is used, the
1308 REPLACEMENTLIST is always interpreted exactly
1309 as specified. Otherwise, if the
1310 REPLACEMENTLIST is shorter than the
1311 SEARCHLIST, the final character is
1312 replicated till it is long enough. If the
1313 REPLACEMENTLIST is empty, the
1314 SEARCHLIST is replicated. This latter is
1315 useful for counting characters in a class or for squashing
1316 character sequences in a class.
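A small example of the three REPLACEMENTLIST cases just described, using an arbitrary string. The first pads a short list with its last character, the second uses the same list with /d, and the third uses an empty list only to count.

```perl
use strict;
use warnings;

my $s = "abcdef";

(my $padded  = $s) =~ tr/a-d/AB/;    # list too short: 'B' is reused for c and d
(my $deleted = $s) =~ tr/a-d/AB/d;   # /d: c and d have no partner, so they go
my $count = ($s =~ tr/a-c//);        # empty list: just counts, $s is unchanged

print "$padded $deleted $count $s\n";   # prints: ABBBef ABef 3 abcdef
```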
1317
1318
1319 Examples:
1320
1321
1322 $ARGV[[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1323 $cnt = tr/*/*/; # count the stars in $_
1324 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1325 $cnt = tr/0-9//; # count the digits in $_
1326 tr/a-zA-Z//s; # bookkeeper -> bokeper
1327 ($HOST = $host) =~ tr/a-z/A-Z/;
1328 tr/a-zA-Z/ /cs; # change non-alphas to single space
1329 tr [[\200-\377]
1330 [[\000-\177]; # delete 8th bit
1331 If multiple transliterations are given for a character, only the first one is used:
1332
1333
1334 tr/AAA/XYZ/
1335 will transliterate any A to X.
1336
1337
1338 Because the transliteration table is built at compile time,
1339 neither the SEARCHLIST nor the
1340 REPLACEMENTLIST are subjected to double quote
1341 interpolation. That means that if you want to use variables,
1342 you must use an ''eval()'':
1343
1344
1345 eval "tr/$oldlist/$newlist/";
die $@ if $@;
1346 eval "tr/$oldlist/$newlist/, 1" or die $@;
1347
1348
1349 __Gory details of parsing quoted
1350 constructs__
1351
1352
1353 When presented with something that might have several
1354 different interpretations, Perl uses the
1355 __DWIM__ (that's ``Do What I Mean'')
1356 principle to pick the most probable interpretation. This
1357 strategy is so successful that Perl programmers often do not
1358 suspect the ambiguity of what they write. But from time to
1359 time, Perl's notions differ substantially from what the
1360 author honestly meant.
1361
1362
1363 This section hopes to clarify how Perl handles quoted
1364 constructs. Although the most common reason to learn this is
1365 to unravel labyrinthine regular expressions, because the
1366 initial steps of parsing are the same for all quoting
1367 operators, they are all discussed together.
1368
1369
1370 The most important Perl parsing rule is the first one
1371 discussed below: when processing a quoted construct, Perl
1372 first finds the end of that construct, then interprets its
1373 contents. If you understand this rule, you may skip the rest
1374 of this section on the first reading. The other rules are
1375 likely to contradict the user's expectations much less
1376 frequently than this first one.
1377
1378
1379 Some passes discussed below are performed concurrently, but
1380 because their results are the same, we consider them
1381 individually. For different quoting constructs, Perl
1382 performs different numbers of passes, from one to five, but
1383 these passes are always performed in the same
1384 order.
1385
1386
1387 Finding the end
1388
1389
1390 The first pass is finding the end of the quoted construct,
1391 whether it be a multicharacter delimiter
1392 "\nEOF\n" in the
1393 <<EOF construct, a / that terminates a qq//
1394 construct, a ] which terminates qq[[]
1395 construct, or a > which terminates a fileglob
1396 started with <.
1397
1398
1399 When searching for single-character non-pairing delimiters,
1400 such as /, combinations of \\ and
1401 \/ are skipped. However, when searching for
1402 single-character pairing delimiters like [[,
1403 combinations of \\, \], and \[[ are
1404 all skipped, and nested [[, ] are skipped
1405 as well. When searching for multicharacter delimiters,
1406 nothing is skipped.
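A few one-liners that exercise these skipping rules, using arbitrary strings. Neither a nested brace pair nor an escaped delimiter ends the construct early.

```perl
use strict;
use warnings;

my $nested = q{a{b}c};   # the nested {} pair is skipped while finding the end
my $slash  = q/a\/b/;    # \/ is skipped, and later its backslash is removed
my $pair   = q{x\}y};    # an escaped closing delimiter inside a pairing quote

print "$nested $slash $pair\n";   # prints: a{b}c a/b x}y
```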
1407
1408
1409 For constructs with three-part delimiters (s///,
1410 y///, and tr///), the search is repeated
1411 once more.
1412
1413
1414 During this search no attention is paid to the semantics of
1415 the construct. Thus:
1416
1417
1418 "$hash{"$foo/$bar"}"
1419 or:
1420
1421
1422 m/
1423 bar # NOT a comment, this slash / terminated m//!
1424 /x
1425 do not form legal quoted expressions. The quoted part ends on the first " and /, and the rest happens to be a syntax error. Because the slash that terminated m// was followed by a SPACE, the example above is not m//x, but rather m// with no /x modifier. So the embedded # is interpreted as a literal #.
1426
1427
1428 Removal of backslashes before delimiters
1429
1430
1431 During the second pass, text between the starting and ending
1432 delimiters is copied to a safe location, and the \
1433 is removed from combinations consisting of \ and
1434 delimiter--or delimiters, meaning both the starting and ending
1435 delimiters, should these differ. This removal does not
1436 happen for multi-character delimiters. Note that the
1437 combination \\ is left intact, just as it
1438 was.
1439
1440
1441 Starting from this step no information about the delimiters
1442 is used in parsing.
1443
1444
1445 Interpolation
1446
1447
1448 The next step is interpolation in the text obtained, which
1449 is now delimiter-independent. There are four different
1450 cases.
1451
1452
1453 <<'EOF', m'', s''',
1454 tr///, y///
1455
1456
1457 No interpolation is performed.
1458
1459
1460 '', q//
1461
1462
1463 The only interpolation is removal of \ from pairs
1464 \\.
1465
1466
1467 "", ``, qq//, qx//, <file*glob>
1468
1469
1470 \Q, \U, \u, \L, \l
1471 (possibly paired with \E) are converted to
1472 corresponding Perl constructs. Thus, "$foo\Qbaz$bar"
1473 is converted to $foo .
1474 (quotemeta("baz" . $bar)) internally. The
1475 other combinations are replaced with appropriate
1476 expansions.
1477
1478
1479 Let it be stressed that ''whatever falls between \Q and
1480 \E'' is interpolated in the usual way. Something like
1481 "\Q\\E" has no \E inside. Instead,
1482 it has \Q, \\, and E, so the result
1483 is the same as for "\\\\E". As a general
1484 rule, backslashes between \Q and \E may lead
1485 to counterintuitive results. So, "\Q\t\E" is
1486 converted to quotemeta("\t"), which is the
1487 same as "\\\t" (since TAB is
1488 not alphanumeric). Note also that:
1489
1490
1491 $str = '\t';
1492 return "\Q$str";
1493 may be closer to the conjectural ''intention'' of the writer of "\Q\t\E".
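A minimal check of the \Q conversion, using a made-up value for $bar. The interpolated form and the explicit quotemeta() form come out equal. The string "\Q\t\E" turns out to be two characters: a backslash and a TAB.

```perl
use strict;
use warnings;

my $bar = 'a.b*';
my $via_escape   = "foo\Q$bar\E.";                # \Q...\E applied while interpolating
my $via_function = "foo" . quotemeta($bar) . ".";  # the form it is converted to
print "$via_escape\n";                            # prints: fooa\.b\*.

# The TAB itself gets quoted, because it is not alphanumeric.
print length("\Q\t\E"), "\n";                     # prints: 2
```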
1494
1495
1496 Interpolated scalars and arrays are converted internally to
1497 the join and . catenation operations.
1498 Thus, "$foo XXX '@arr'"
1499 becomes:
1500
1501
1502 $foo . " XXX '" . (join $", @arr) . "'";
1503 All operations above are performed simultaneously, left to right.
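A quick check of this rule, using made-up values for $foo and @arr. The interpolated string equals the explicit catenation-and-join form. The separator placed between array elements is $".

```perl
use strict;
use warnings;

my $foo = 'x';
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $expanded     = $foo . " XXX '" . (join $", @arr) . "'";
print "$interpolated\n";    # prints: x XXX '1 2 3'

{
    local $" = ',';         # list separator used when arrays interpolate
    print "@arr\n";         # prints: 1,2,3
}
```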
1504
1505
1506 Because the result of "\Q STRING \E" has
1507 all metacharacters quoted, there is no way to insert a
1508 literal $ or @ inside a \Q\E pair.
1509 If protected by \, $ will be quoted to
1510 become "\\\$"; if not, it is interpreted as
1511 the start of an interpolated scalar.
1512
1513
1514 Note also that the interpolation code needs to make a
1515 decision on where the interpolated scalar ends. For
1516 instance, whether "a $b -> {c}" really
1517 means:
1518
1519
1520 "a " . $b . " -> {c}";
1521 or:
1522
1523
1524 "a " . $b -> {c};
1525 Most of the time, the interpolated scalar is taken to be the longest possible text that does not include spaces between components and which contains matching braces or brackets. Because the outcome may be determined by voting based on heuristic estimators, the result is not strictly predictable. Fortunately, it's usually correct for ambiguous cases.
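One way to see where the interpolated scalar ends is to use a throwaway hash reference named $href in place of $b. With no spaces, the arrow and subscript are part of the interpolation. The spaced form stops after the variable and stringifies the reference.

```perl
use strict;
use warnings;

my $href = { c => 42 };

my $tight  = "a $href->{c}";     # the element lookup is part of the interpolation
my $spaced = "a $href -> {c}";   # interpolation stops right after $href

print "$tight\n";                # prints: a 42
print "$spaced\n";               # something like: a HASH(0x...) -> {c}
```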
1526
1527
1528 ?RE?, /RE/, m/RE/,
1529 s/RE/foo/,
1530
1531
1532 Processing of \Q, \U, \u,
1533 \L, \l, and interpolation happens (almost)
1534 as with qq// constructs, but the substitution of
1535 \ followed by RE-special chars (including
1536 \) is not performed. Moreover, inside
1537 (?{BLOCK}), (?# comment ), and a
1538 #-comment in a //x-regular expression, no
1539 processing is performed whatsoever. This is the first step
1540 at which the presence of the //x modifier is
1541 relevant.
1542
1543
1544 Interpolation has several quirks: $|, $(,
1545 and $) are not interpolated, and constructs
1546 $var[[SOMETHING] are voted (by several different
1547 estimators) to be either an array element or $var
1548 followed by an RE alternative. This is where
1549 the notation ${arr[[$bar]} comes in handy:
1550 /${arr[[0-9]}/ is interpreted as array element
1551 -9, not as a regular expression from the variable
1552 $arr followed by a digit, which would be the
1553 interpretation of /$arr[[0-9]/. Since voting among
1554 different estimators may occur, the result is not
1555 predictable.
1556
1557
1558 It is at this step that \1 is begrudgingly converted
1559 to $1 in the replacement text of s/// to
1560 correct the incorrigible ''sed'' hackers who haven't
1561 picked up the saner idiom yet. A warning is emitted if the
1562 use warnings pragma or the __-w__ command-line
1563 flag (that is, the $^W variable) was
1564 set.
1565
1566
1567 The lack of processing of \\ creates specific
1568 restrictions on the post-processed text. If the delimiter is
1569 /, one cannot get the combination \/ into
1570 the result of this step. / will finish the regular
1571 expression, \/ will be stripped to / on the
1572 previous step, and \\/ will be left as is. Because
1573 / is equivalent to \/ inside a regular
1574 expression, this does not matter unless the delimiter
1575 happens to be a character special to the RE
1576 engine, such as in s*foo*bar*, m[[foo], or
1577 ?foo?; or an alphanumeric char, as in:
1578
1579
1580 m m ^ a \s* b mmx;
1581 In the RE above, which is intentionally obfuscated for illustration, the delimiter is m, the modifier is mx, and after backslash-removal the RE is the same as for m/ ^ a \s* b /mx. There's more than one reason you're encouraged to restrict your delimiters to non-alphanumeric, non-whitespace choices.
1582
1583
1584 This step is the last one for all constructs except regular
1585 expressions, which are processed further.
1586
1587
1588 Interpolation of regular expressions
1589
1590
1591 Previous steps were performed during the compilation of Perl
1592 code, but this one happens at run time--although it may be
1593 optimized to be calculated at compile time if appropriate.
1594 After preprocessing described above, and possibly after
1595 evaluation if catenation, joining, casing translation, or
1596 metaquoting are involved, the resulting ''string'' is
1597 passed to the RE engine for
1598 compilation.
1599
1600
1601 Whatever happens in the RE engine might be
1602 better discussed in perlre, but for the sake of continuity,
1603 we shall do so here.
1604
1605
1606 This is another step where the presence of the //x
1607 modifier is relevant. The RE engine scans the
1608 string from left to right and converts it to a finite
1609 automaton.
1610
1611
1612 Backslashed characters are either replaced with
1613 corresponding literal strings (as with \{), or else
1614 they generate special nodes in the finite automaton (as with
1615 \b). Characters special to the RE
1616 engine (such as |) generate corresponding nodes or groups of
1617 nodes. (?#...) comments are ignored. All the rest
1618 is either converted to literal strings to match, or else is
1619 ignored (as is whitespace and #-style comments if
1620 //x is present).
1621
1622
1623 Parsing of the bracketed character class construct,
1624 [[...], is rather different than the rule used for
1625 the rest of the pattern. The terminator of this construct is
1626 found using the same rules as for finding the terminator of
1627 a {}-delimited construct, the only exception being
1628 that ] immediately following [[ is treated
1629 as though preceded by a backslash. Similarly, the terminator
1630 of (?{...}) is found using the same rules as for
1631 finding the terminator of a {}-delimited
1632 construct.
1633
1634
1635 It is possible to inspect both the string given to
1636 RE engine and the resulting finite automaton.
1637 See the arguments debug/debugcolor in the
1638 use re pragma, as well as Perl's __-Dr__
1639 command-line switch documented in ``Command Switches'' in
1640 perlrun.
1641
1642
1643 Optimization of regular expressions
1644
1645
1646 This step is listed for completeness only. Since it does not
1647 change semantics, details of this step are not documented
1648 and are subject to change without notice. This step is
1649 performed over the finite automaton that was generated
1650 during the previous pass.
1651
1652
1653 It is at this stage that split() silently optimizes
1654 /^/ to mean /^/m.
1655
1656
1657 __I/O Operators__
1658
1659
1660 There are several I/O operators you should know
1661 about.
1662
1663
1664 A string enclosed by backticks (grave accents) first
1665 undergoes double-quote interpolation. It is then interpreted
1666 as an external command, and the output of that command is
1667 the value of the backtick string, like in a shell. In scalar
1668 context, a single string consisting of all output is
1669 returned. In list context, a list of values is returned, one
1670 per line of output. (You can set $/ to use a
1671 different line terminator.) The command is executed each
1672 time the pseudo-literal is evaluated. The status value of
1673 the command is returned in $? (see perlvar for the
1674 interpretation of $?). Unlike in __csh__, no
1675 translation is done on the return data--newlines remain
1676 newlines. Unlike in any of the shells, single quotes do not
1677 hide variable names in the command from interpretation. To
1678 pass a literal dollar-sign through to the shell you need to
1679 hide it with a backslash. The generalized form of backticks
1680 is qx//. (Because backticks always undergo shell
1681 expansion as well, see perlsec for security
1682 concerns.)
1683
1684
1685 In scalar context, evaluating a filehandle in angle brackets
1686 yields the next line from that file (the newline, if any,
1687 included), or undef at end-of-file or on error.
1688 When $/ is set to undef (sometimes known
1689 as file-slurp mode) and the file is empty, it returns
1690 '' the first time, followed by undef
1691 subsequently.
1692
1693
1694 Ordinarily you must assign the returned value to a variable,
1695 but there is one situation where an automatic assignment
1696 happens. If and only if the input symbol is the only thing
1697 inside the conditional of a while statement (even
1698 if disguised as a for(;;) loop), the value is
1699 automatically assigned to the global variable $_,
1700 destroying whatever was there previously. (This may seem
1701 like an odd thing to you, but you'll use the construct in
1702 almost every Perl script you write.) The $_
1703 variable is not implicitly localized. You'll have to put a
1704 local $_; before the loop if you want that to
1705 happen.
1706
1707
1708 The following lines are equivalent:
1709
1710
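1711 while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
1712 This also behaves similarly, but avoids $_:
1713
1714
1715 while (my $line = <STDIN>) { print $line }
1716 In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where a line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
1717
1718
1719 while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
1720 In other boolean contexts, <''filehandle''> without an explicit defined test or comparison elicits a warning if the use warnings pragma or the __-w__ command-line switch (the $^W variable) is in effect.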
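A minimal sketch of why the implicit defined test matters. It uses an in-memory filehandle (available since Perl 5.8) whose last line is a bare "0" with no trailing newline.

```perl
use strict;
use warnings;

# In-memory file: "first\n" followed by a final, newline-less "0".
open(my $fh, '<', \"first\n0") or die "cannot open in-memory file: $!";

my @lines;
while (<$fh>) {              # really: while (defined($_ = <$fh>))
    push @lines, $_;
}
close $fh;

print scalar(@lines), "\n";  # prints: 2, so the false-looking "0" was kept
```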
1721
1722
1723 The filehandles STDIN, STDOUT,
1724 and STDERR are predefined. (The filehandles
1725 stdin, stdout, and stderr will
1726 also work except in packages, where they would be
1727 interpreted as local identifiers rather than global.)
1728 Additional filehandles may be created with the ''open()''
1729 function, amongst others. See perlopentut and ``open'' in
1730 perlfunc for details on this.
1731
1732
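1733 If a <FILEHANDLE> is used in a context that is looking for a list, a list comprising all input lines is returned, one line per list element. It's easy to grow to a rather large data space this way, so use with care.
1734
1735
1736 <FILEHANDLE> may also be spelled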
1737 readline(*FILEHANDLE). See ``readline'' in
1738 perlfunc.
1739
1740
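1741 The null filehandle <> is special: it can be used to emulate the behavior of
1742 __sed__ and __awk__. Input
1743 from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the
1744 @ARGV array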
1745 is checked, and if it is empty, $ARGV[[0] is set to
1746 ``-'', which when opened gives you standard input. The
1747 @ARGV array is then processed as a list of
1748 filenames. The loop
1749
1750
1751 while (<>) {
    ... # code for each line
}
1752 is equivalent to the following Perl-like pseudo code:
1753
1754
1755 unshift(@ARGV, '-') unless @ARGV;
1756 while ($ARGV = shift) {
1757 open(ARGV, $ARGV);
1758 while (<ARGV>) {
        ... # code for each line
    }
}
1759 except that it isn't so cumbersome to say, and will actually work. It really does shift the @ARGV array and put the current filename into the $ARGV variable. It also uses filehandle ''ARGV'' internally--<> is just a synonym for <ARGV>, which is magical. (The pseudo code above doesn't work because it treats <ARGV> as non-magical.)
1760
1761
1762 You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Line numbers
1763 ($.) continue as
1764 though the input were one big happy file. See the example in
1765 ``eof'' in perlfunc for how to reset line numbers on each
1766 file.
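A sketch of the one-big-file behaviour. It uses two temporary files from the core File::Temp module in place of real command-line arguments. $. keeps counting across the file boundary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files standing in for command-line arguments.
my @files;
for my $content ("one\n", "two\nthree\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close: $!";
    push @files, $name;
}

@ARGV = @files;               # set @ARGV before the first <>
my @numbered;
while (<>) {
    chomp;
    push @numbered, "$.:$_";  # $. is not reset between the two files
}
print "@numbered\n";          # prints: 1:one 2:two 3:three
```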
1767
1768
1769 If you want to set @ARGV to your own list of files,
1770 go right ahead. This sets @ARGV to all plain text
1771 files if no @ARGV was given:
1772
1773
1774 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
1775 You can even set them to pipe commands. For example, this automatically filters compressed arguments through __gzip__:
1776
1777
1778 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
1779 If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this:
1780
1781
1782 while ($_ = $ARGV[[0], /^-/) {
1783 shift;
1784 last if /^--$/;
1785 if (/^-D(.*)/) { $debug = $1 }
1786 if (/^-v/) { $verbose++ }
1787 # ... # other switches
1788 }
1789 while (<>) {
    # ... # code for each line
}
1790 The <> symbol will return undef for end-of-file only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
1791
1792
1793 If what the angle brackets contain is a simple scalar variable (e.g., <$foo>), then that variable contains the name of the filehandle to input from, or its typeglob, or a reference to the same. For example:
1794
1795
1796 $fh = \*STDIN;
1797 $line = <$fh>;
1798 If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a ''readline()'' from an indirect handle, but <$hash{key}> is always a ''glob()''. That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element.
1799
1800
1801 One level of double-quote interpretation is done first, but
1802 you can't say <$foo> because that's an
1803 indirect filehandle as explained in the previous paragraph.
1804 (In older versions of Perl, programmers would insert curly
1805 brackets to force interpretation as a filename glob:
1806 <${foo}>. These days, it's considered cleaner
1807 to call the internal function directly as
1808 glob($foo), which is probably the right way to have
1809 done it in the first place.) For example:
1810
1811
1812 while (<*.c>) {
    chmod 0644, $_;
}
1813 is roughly equivalent to:
1814
1815
1816 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
    chop;
    chmod 0644, $_;
}
1817 except that the globbing is actually done internally using the standard File::Glob extension. Of course, the shortest way to do the above is:
1818
1819
1820 chmod 0644, <*.c>;
A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn't important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out. As with filehandle reads, an automatic defined is generated when the glob occurs in the test part of a while, because legal glob returns (e.g., a file called ''0'') would otherwise terminate the loop. Again, undef is returned only once. So if you're expecting a single value from a glob, it is much better to say


($file) = <blurch*>;


than


$file = <blurch*>;


because the latter will alternate between returning a filename and returning false.


If you're trying to do variable interpolation, it's
definitely better to use the ''glob()'' function, because
the older notation can cause people to become confused with
the indirect filehandle notation.


@files = glob("$dir/*.[[ch]");
@files = glob($files[[$i]);


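This sketch uses ''glob()'' with an interpolated pattern. It builds a scratch directory with the core File::Temp module, so it does not depend on existing files. The helper name glob_demo is hypothetical. The sketch assumes a Unix-like system whose temporary directory path contains no whitespace, because the default ''glob()'' splits its pattern on whitespace.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical demo: create a scratch directory holding a.c, b.c and
# notes.txt, then glob it with an interpolated pattern.
# Assumes the temporary directory path contains no whitespace, since
# glob() splits its pattern on whitespace.
sub glob_demo {
    my $dir = tempdir(CLEANUP => 1);
    for my $name (qw(a.c b.c notes.txt)) {
        my $path = File::Spec->catfile($dir, $name);
        open(my $fh, '>', $path) or die "cannot create $path: $!";
        close $fh;
    }
    my @all     = glob("$dir/*.c");  # list context: every match
    my ($first) = glob("$dir/*.c");  # list assignment: just the first match
    return ($first, @all);
}

my ($first, @all) = glob_demo();
print "first: $first\n";
print "all:   @all\n";
```

The list assignment in ($first) = glob(...) is the safe way to take a single result, as described above.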


__Constant Folding__


Like C, Perl does a certain amount of expression evaluation
at compile time whenever it determines that all arguments to
an operator are static and have no side effects. In
particular, string concatenation happens at compile time
between literals that don't do variable substitution.
Backslash interpolation also happens at compile time. You
can say


'Now is the time for all' . "\n" .
    'good men to come to.'


and this all reduces to one string internally. Likewise, if you say


foreach $file (@filenames) {
    if (-s $file > 5 + 100 * 2**16) { }
}


the compiler will precompute the number which that expression represents so that the interpreter won't have to.
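One way to observe constant folding is to have the core B::Deparse module regenerate source from a compiled sub. In this sketch the helper name folded_source is hypothetical. Its output should contain the literal 6553605 (5 + 100 * 65536) rather than the original arithmetic.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical helper: return the source B::Deparse reconstructs for a
# sub whose body is a constant expression. The compiler has already
# replaced 5 + 100 * 2**16 with a single constant.
sub folded_source {
    my $code = sub { return 5 + 100 * 2**16 };
    return B::Deparse->new->coderef2text($code);
}

print folded_source(), "\n";
```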


__Bitwise String Operators__


Bitstrings of any size may be manipulated by the bitwise
operators (~ | & ^).


If the operands to a binary bitwise op are strings of
different sizes, __|__ and __^__ ops act as though the shorter
operand had additional zero bits on the right, while the
__&__ op acts as though the longer operand were
truncated to the length of the shorter. The granularity for
such extension or truncation is one or more
bytes.


# ASCII-based examples
print "j p \n" ^ " a h";        # prints "JAPH\n"
print "JA" | "  ph\n";          # prints "japh\n"
print "japh\nJunk" & '_____';   # prints "JAPH\n"
print 'p N$' ^ " E<H\n";        # prints "Perl\n"


If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: if an operand is a number, that will imply a __numeric__ bitwise operation. You may explicitly show which type of operation you intend by using "" or 0+, as in the examples below.


$foo = 150 | 105 ;     # yields 255 (0x96 | 0x69 is 0xFF)
$foo = '150' | 105 ;   # yields 255
$foo = 150 | '105';    # yields 255
$foo = '150' | '105';  # yields string '155' (under ASCII)


$baz = 0+$foo & 0+$bar;  # both ops explicitly numeric
$biz = "$foo" ^ "$bar";  # both ops explicitly stringy


See ``vec'' in perlfunc for information on how to manipulate individual bits in a bit vector.
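String | pads the shorter operand and string & truncates the longer one, so bit vectors built with ''vec()'' combine naturally as sets. The helpers make_set and members below are hypothetical, not standard functions. The sketch relies only on ''vec()'' with a width of 1 bit and on the string forms of the two operators.

```perl
use strict;
use warnings;

# Hypothetical helpers: a set of small non-negative integers stored as a
# bit string, one bit per possible member.
sub make_set {
    my $bits = '';
    vec($bits, $_, 1) = 1 for @_;   # vec() grows the string as needed
    return $bits;
}

sub members {
    my ($bits) = @_;
    return grep { vec($bits, $_, 1) } 0 .. 8 * length($bits) - 1;
}

my $small = make_set(1, 3);     # 1 byte long
my $large = make_set(3, 10);    # 2 bytes long

# Both operands are strings, so these are string bitwise ops:
# | pads the shorter operand with zero bits, & truncates to the shorter.
my @union        = members($small | $large);
my @intersection = members($small & $large);
print "union: @union; intersection: @intersection\n";
```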


__Integer Arithmetic__


By default, Perl assumes that it must do most of its
arithmetic in floating point. But by saying


use integer;


you may tell the compiler that it's okay to use integer operations (if it feels like it) from here to the end of the enclosing BLOCK. An inner BLOCK may countermand this by saying


no integer;


which lasts until the end of that BLOCK. Note that this doesn't mean everything is only an integer, merely that Perl may use integer operations if it is so inclined. For example, even under use integer, if you take the sqrt(2), you'll still get 1.4142135623731 or so.


Used on numbers, the bitwise operators (``&'', ``|'', ``^'', ``~'', ``<<'', and ``>>'') always produce integral results. (But see also __Bitwise String Operators__.) However, use integer still has
meaning for them. By default, their results are interpreted
as unsigned integers, but if use integer is in
effect, their results are interpreted as signed integers.
For example, ~0 usually evaluates to a large
integral value. However, use integer; ~0 is
-1 on two's-complement machines.
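The contrast can be checked with a few small subs, each of which scopes ''use integer'' to its own block. All the sub names are hypothetical. The -1 result for ~0 assumes a two's-complement machine, as the text notes.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the same expressions with and without
# the lexically scoped "use integer" pragma.
sub half_float { my ($n) = @_; return $n / 2 }

sub half_integer {
    use integer;            # in effect only to the end of this block
    my ($n) = @_;
    return $n / 2;          # integer division
}

sub all_ones_unsigned { return ~0 }                   # large unsigned value
sub all_ones_signed   { use integer; return ~0 }      # -1 on two's complement
sub root_two_integer  { use integer; return sqrt(2) } # sqrt() is unaffected

print half_float(7), ' ', half_integer(7), "\n";
print all_ones_unsigned(), ' ', all_ones_signed(), "\n";
```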


__Floating-point Arithmetic__


While use integer provides integer-only arithmetic,
there is no analogous mechanism to provide automatic
rounding or truncation to a certain number of decimal
places. For rounding to a certain number of digits,
''sprintf()'' or ''printf()'' is usually the easiest
route. See perlfaq4.
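A minimal sketch of the ''sprintf()'' route is shown below. The helper name round_to is hypothetical. Note that the rounding applies to the binary value actually stored, which can differ slightly from the decimal literal in the source.

```perl
use strict;
use warnings;

# Hypothetical helper: round $x to $places decimal places with sprintf(),
# returning a string. "%.*f" takes the precision from the argument list.
sub round_to {
    my ($x, $places) = @_;
    return sprintf("%.*f", $places, $x);
}

print round_to(3.14159, 2), "\n";
print round_to(2.675, 2), "\n";   # 2.675 is stored as 2.67499999..., so "2.67"
```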


Floating-point numbers are only approximations to what a
mathematician would call real numbers. There are infinitely
more reals than floats, so some corners must be cut. For
example:


printf "%.20g\n", 123456789123456789;
# produces 123456789123456784


Testing for exact floating-point equality or inequality is not a good idea. Here's a (relatively expensive) work-around to compare whether two floating-point numbers are equal to a particular number of significant digits. See Knuth, volume II, for a more robust treatment of this topic.


sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    my ($tX, $tY);
    $tX = sprintf("%.${POINTS}g", $X);
    $tY = sprintf("%.${POINTS}g", $Y);
    return $tX eq $tY;
}


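A commonly used alternative, swapped in here rather than taken from the text above, compares numbers against a relative tolerance (plus an optional absolute one for values near zero) instead of formatting them. The name approx_equal and its default relative tolerance of 1e-9 are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $x and $y differ by at most $rel_tol times
# the larger magnitude, or by at most $abs_tol (useful near zero).
sub approx_equal {
    my ($x, $y, $rel_tol, $abs_tol) = @_;
    $rel_tol = 1e-9 unless defined $rel_tol;
    $abs_tol = 0    unless defined $abs_tol;
    my $diff  = abs($x - $y);
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return $diff <= $rel_tol * $scale || $diff <= $abs_tol;
}

my $sum = 0.1 + 0.2;
print $sum == 0.3 ? "exactly equal\n" : "not exactly equal\n";
print approx_equal($sum, 0.3) ? "approximately equal\n" : "different\n";
```

A purely relative test never accepts anything as equal to exactly zero, which is why the absolute tolerance exists.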
The POSIX module (part of the standard perl distribution) implements ''ceil()'', ''floor()'', and other mathematical and trigonometric functions. The Math::Complex module (part of the standard perl distribution) defines mathematical functions that work on both the reals and the imaginary numbers. Math::Complex is not as efficient as POSIX, but POSIX can't work with complex numbers.
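A short sketch of both modules follows. It uses only ''floor()'' and ''ceil()'' from POSIX, plus the ''sqrt()'', ''Re()'' and ''Im()'' functions that Math::Complex exports by default. Loading Math::Complex replaces ''sqrt()'' in the current package, which is why sqrt(-4) succeeds here instead of dying.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);
use Math::Complex;   # exports Re(), Im() and a sqrt() that accepts negatives

# floor() rounds toward negative infinity, ceil() toward positive
# infinity, and the built-in int() truncates toward zero.
my @rounded = (floor(-2.5), ceil(-2.5), int(-2.5));
print "@rounded\n";

# With Math::Complex loaded, sqrt(-4) yields the complex number 2i
# instead of failing with "Can't take sqrt of -4".
my $z = sqrt(-4);
print "sqrt(-4) = $z (real part ", Re($z), ", imaginary part ", Im($z), ")\n";
```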


Rounding in financial applications can have serious
implications, and the rounding method used should be
specified precisely. In these cases, it probably pays not to
trust whichever system rounding is being used by Perl, but
to instead implement the rounding function you need
yourself.
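As one illustration of writing the rounding function yourself, the sketch below rounds an exact ratio of integers to the nearest integer, sending exact ties to the even neighbour ("banker's rounding"). Amounts are held in integer cents, so no binary floating-point value is ever rounded. The rule, the function name div_round_half_even, and the cents representation are all assumptions for illustration; use whatever rule your application actually specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: round the exact ratio $num / $den (integers,
# $den > 0) to the nearest integer, sending exact ties to the even
# neighbour. Only integer-valued arithmetic is used, so results stay
# exact while the values remain well below 2**53.
sub div_round_half_even {
    my ($num, $den) = @_;
    die "denominator must be positive\n" unless $den > 0;
    my $rem   = $num % $den;            # 0 <= $rem < $den when $den > 0
    my $quo   = ($num - $rem) / $den;   # floor($num / $den), exact
    my $twice = 2 * $rem;
    return $quo     if $twice < $den;   # below the halfway point
    return $quo + 1 if $twice > $den;   # above the halfway point
    return $quo % 2 == 0 ? $quo : $quo + 1;   # exact tie: pick the even one
}

# 15% tax on amounts held in integer cents.
for my $cents (1005, 1010, 1030) {
    my $tax = div_round_half_even($cents * 15, 100);
    print "$cents cents -> $tax cents tax\n";
}
```

The sketch relies on Perl's % returning a non-negative result when the right operand is positive. That property would not hold under use integer.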


__Bigger Numbers__


The standard Math::!BigInt and Math::!BigFloat modules provide
variable-precision arithmetic and overloaded operators,
although they're currently pretty slow. At the cost of some
space and considerable speed, they avoid the normal pitfalls
associated with limited-precision
representations.


use Math::!BigInt;
$x = Math::!BigInt->new('123456789123456789');
print $x * $x;
# prints 15241578780673678515622620750190521 (older versions prefix a +)


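The same modules also handle exact powers and decimal fractions. In this sketch, whose helper names are hypothetical, Math::!BigInt computes 2**100 exactly. Math::!BigFloat then adds 0.1 and 0.2 without the binary rounding error that ordinary floating point shows. Both rely on the modules' overloaded operators and the ''bpow()'' and ''bcmp()'' methods.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Hypothetical helper: 2 raised to $n, computed exactly.
sub big_power_of_two {
    my ($n) = @_;
    return Math::BigInt->new(2)->bpow($n);   # bpow() modifies and returns the object
}

# Hypothetical helper: add two decimal strings without binary rounding.
sub exact_decimal_sum {
    my ($x, $y) = @_;
    return Math::BigFloat->new($x) + Math::BigFloat->new($y);   # overloaded +
}

print big_power_of_two(100), "\n";
print exact_decimal_sum('0.1', '0.2'), "\n";
```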
There are several modules that let you calculate with unlimited precision (bounded only by memory and CPU time) or with fixed precision. There are also some non-standard modules that provide faster implementations via external C libraries.


Here is a short but incomplete summary:


Math::Fraction          big, unlimited fractions like 9973 / 12967
Math::String            treat string sequences like numbers
Math::!FixedPrecision   calculate with a fixed precision
Math::Currency          for currency calculations
Bit::Vector             manipulate bit vectors fast (uses C)
Math::!BigIntFast       Bit::Vector wrapper for big numbers
Math::Pari              provides access to the Pari C library
Math::!BigInteger       uses an external C library
Math::Cephes            uses external Cephes C library (no big numbers)
Math::Cephes::Fraction  fractions via the Cephes library
Math::GMP               another one using an external C library


Choose wisely.