Annotated edit history of perlhack(1) version 2, including all changes.
Rev Author # Line
1 perry 1 PERLHACK
2 !!!PERLHACK
3 NAME
4 DESCRIPTION
5 EXTERNAL TOOLS FOR DEBUGGING PERL
6 AUTHOR
7 ----
8 !!NAME
9
10
11 perlhack - How to hack at the Perl internals
12 !!DESCRIPTION
13
14
15 This document attempts to explain how Perl development takes
16 place, and ends with some suggestions for people wanting to
17 become bona fide porters.
18
19
20 The perl5-porters mailing list is where the Perl standard
21 distribution is maintained and developed. The list can get
22 anywhere from 10 to 150 messages a day, depending on the
23 heatedness of the debate. Most days there are two or three
24 patches, extensions, features, or bugs being discussed at a
25 time.
26
27
28 A searchable archive of the list is at:
29
30
31 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
32 The list is also archived under the usenet group name perl.porters-gw at:
33
34
35 http://www.deja.com/
36 List subscribers (the porters themselves) come in several flavours. Some are quiet curious lurkers, who rarely pitch in and instead watch the ongoing development to ensure they're forewarned of new changes or features in Perl. Some are representatives of vendors, who are there to make sure that Perl continues to compile and work on their platforms. Some patch any reported bug that they know how to fix, some are actively patching their pet area (threads, Win32, the regexp engine), while others seem to do nothing but complain. In other words, it's your usual mix of technical people.
37
38
39 Over this group of porters presides Larry Wall. He has the
40 final word in what does and does not change in the Perl
41 language. Various releases of Perl are shepherded by a
42 ``pumpking'', a porter responsible for gathering patches,
43 deciding on a patch-by-patch feature-by-feature basis what
44 will and will not go into the release. For instance,
45 Gurusamy Sarathy is the pumpking for the 5.6 release of
46 Perl.
47
48
49 In addition, various people are pumpkings for different
50 things. For instance, Andy Dougherty and Jarkko Hietaniemi
51 share the ''Configure'' pumpkin, and Tom Christiansen is
52 the documentation pumpking.
53
54
55 Larry sees Perl development along the lines of the
56 US government: there's the Legislature (the
57 porters), the Executive branch (the pumpkings), and the
58 Supreme Court (Larry). The legislature can discuss and
59 submit patches to the executive branch all they like, but
60 the executive branch is free to veto them. Rarely, the
61 Supreme Court will side with the executive branch over the
62 legislature, or the legislature over the executive branch.
63 Mostly, however, the legislature and the executive branch
64 are supposed to get along and work out their differences
65 without impeachment or court cases.
66
67
68 You might sometimes see reference to Rule 1 and Rule 2.
69 Larry's power as Supreme Court is expressed in The
70 Rules:
71
72
73 1
74
75
76 Larry is always by definition right about how Perl should
77 behave. This means he has final veto power on the core
78 functionality.
79
80
81 2
82
83
84 Larry is allowed to change his mind about any matter at a
85 later date, regardless of whether he previously invoked Rule
86 1.
87
88
89 Got that? Larry is always right, even when he was wrong.
90 It's rare to see either Rule exercised, but they are often
91 alluded to.
92
93
94 New features and extensions to the language are contentious,
95 because the criteria used by the pumpkings, Larry, and other
96 porters to decide which features should be implemented and
97 incorporated are not codified in a few small design goals as
98 with some other languages. Instead, the heuristics are
99 flexible and often difficult to fathom. Here is one person's
100 list, roughly in decreasing order of importance, of
101 heuristics that new features have to be weighed
102 against:
103
104
105 Does concept match the general goals of Perl?
106
107
108 These haven't been written anywhere in stone, but one
109 approximation is:
110
111
112 1. Keep it fast, simple, and useful.
113 2. Keep features/concepts as orthogonal as possible.
114 3. No arbitrary limits (platforms, data sizes, cultures).
115 4. Keep it open and exciting to use/patch/advocate Perl everywhere.
116 5. Either assimilate new technologies, or build bridges to them.
117
118
119 Where is the implementation?
120
121
122 All the talk in the world is useless without an
123 implementation. In almost every case, the person or people
124 who argue for a new feature will be expected to be the ones
125 who implement it. Porters capable of coding new features
126 have their own agendas, and are not available to implement
127 your (possibly good) idea.
128
129
130 Backwards compatibility
131
132
133 It's a cardinal sin to break existing Perl programs. New
134 warnings are contentious--some say that a program that emits
135 warnings is not broken, while others say it is. Adding
136 keywords has the potential to break programs, and changing the
137 meaning of existing token sequences or functions might break
138 programs too.
139
140
141 Could it be a module instead?
142
143
144 Perl 5 has extension mechanisms, modules and
145 XS, specifically to avoid the need to keep
146 changing the Perl interpreter. You can write modules that
147 export functions, you can give those functions prototypes so
148 they can be called like built-in functions, you can even
149 write XS code to mess with the runtime data
150 structures of the Perl interpreter if you want to implement
151 really complicated things. If it can be done in a module
152 instead of in the core, it's highly unlikely to be
153 added.
154
155
156 Is the feature generic enough?
157
158
159 Is this something that only the submitter wants added to the
160 language, or would it be broadly useful? Sometimes, instead
161 of adding a feature with a tight focus, the porters might
162 decide to wait until someone implements the more generalized
163 feature. For instance, instead of implementing a ``delayed
164 evaluation'' feature, the porters are waiting for a macro
165 system that would permit delayed evaluation and much
166 more.
167
168
169 Does it potentially introduce new bugs?
170
171
172 Radical rewrites of large chunks of the Perl interpreter
173 have the potential to introduce new bugs. The smaller and
174 more localized the change, the better.
175
176
177 Does it preclude other desirable features?
178
179
180 A patch is likely to be rejected if it closes off future
181 avenues of development. For instance, a patch that placed a
182 true and final interpretation on prototypes is likely to be
183 rejected because there are still options for the future of
184 prototypes that haven't been addressed.
185
186
187 Is the implementation robust?
188
189
190 Good patches (tight code, complete, correct) stand more
191 chance of going in. Sloppy or incorrect patches might be
192 placed on the back burner until the pumpking has time to
193 fix, or might be discarded altogether without further
194 notice.
195
196
197 Is the implementation generic enough to be
198 portable?
199
200
201 The worst patches make use of system-specific features.
202 It's highly unlikely that nonportable additions to the Perl
203 language will be accepted.
204
205
206 Is there enough documentation?
207
208
209 Patches without documentation are probably ill-thought out
210 or incomplete. Nothing can be added without documentation,
211 so submitting a patch for the appropriate manpages as well
212 as the source code is always a good idea. If appropriate,
213 patches should add to the test suite as well.
214
215
216 Is there another way to do it?
217
218
219 Larry said ``Although the Perl Slogan is ''There's More
220 Than One Way to Do It'', I hesitate to make 10 ways to do
221 something''. This is a tricky heuristic to navigate,
222 though--one man's essential addition is another man's
223 pointless cruft.
224
225
226 Does it create too much work?
227
228
229 Work for the pumpking, work for Perl programmers, work for
230 module authors, ... Perl is supposed to be
231 easy.
232
233
234 Patches speak louder than words
235
236
237 Working code is always preferred to pie-in-the-sky ideas. A
238 patch to add a feature stands a much higher chance of making
239 it to the language than does a random feature request, no
240 matter how fervently argued the request might be. This ties
241 into ``Will it be useful?'', as the fact that someone took
242 the time to make the patch demonstrates a strong desire for
243 the feature.
244
245
246 If you're on the list, you might hear the word ``core''
247 bandied around. It refers to the standard distribution.
248 ``Hacking on the core'' means you're changing the C source
249 code to the Perl interpreter. ``A core module'' is one that
250 ships with Perl.
251
252
253 __Keeping in sync__
254
255
256 The source code to the Perl interpreter, in its different
257 versions, is kept in a repository managed by a revision
258 control system (which is currently the Perforce program, see
259 http://perforce.com/). The pumpkings and a few others have
260 access to the repository to check in changes. Periodically
261 the pumpking for the development version of Perl will
262 release a new version, so the rest of the porters can see
263 what's changed. The current state of the main trunk of the
264 repository, and patches that describe the individual changes
265 that have happened since the last public release are
266 available at this location:
267
268
269 ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/
270 If you are a member of the perl5-porters mailing list, it is a good thing to keep in touch with the most recent changes, if only to verify that what you would have posted as a bug report isn't already solved in the most recent available perl development branch, also known as perl-current, bleading edge perl, bleedperl or bleadperl.
271
272
273 Needless to say, the source code in perl-current is usually
274 in a perpetual state of evolution. You should expect it to
275 be very buggy. Do __not__ use it for any purpose other
276 than testing and development.
277
278
279 Keeping in sync with the most recent branch can be done in
280 several ways, but the most convenient and reliable way is
281 using __rsync__, available at
282 ftp://rsync.samba.org/pub/rsync/ . (You can also get the
283 most recent branch by FTP.)
284
285
286 If you choose to keep in sync using rsync, there are two
287 approaches to doing so:
288
289
290 rsync'ing the source tree
291
292
293 Presuming you are in the directory where your perl source
294 resides and you have rsync installed and available, you can
295 `upgrade' to the bleadperl using:
296
297
298 # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .
299 This takes care of updating every single item in the source tree to the latest applied patch level, creating files that are new (to your distribution) and setting date/time stamps of existing files to reflect the bleadperl status.
300
301
302 You can then check what patch was the latest that was
303 applied by looking in the file __.patch__, which will
304 show the number of the latest patch.
305
306
307 If you have more than one machine to keep in sync, and not
308 all of them have access to the WAN (so you
309 are not able to rsync all the source trees to the real
310 source), there are some ways to get around this
311 problem.
312
313
314 Using rsync over the LAN
315
316
317 Set up a local rsync server which makes the rsynced source
318 tree available to the LAN and sync the other
319 machines against this directory.
320
321
322 From http://rsync.samba.org/README.html:
323
324
325
326
327
328 Using pushing over the NFS
329
330
331 Having the other systems mounted over NFS,
332 you can take an active pushing approach by checking the
333 just updated tree against the other not-yet synced trees. An
334 example would be
335
336
337 #!/usr/bin/perl -w
338 use strict;
339 use File::Copy;
340 my %MF = map {
341 m/(\S+)/;
342 $1 =
343 my %remote = map { $_ =
344 foreach my $host (keys %remote) {
345 unless (-d $remote{$host}) {
346 print STDERR
347 though this is not perfect. It could be improved by checking file checksums before updating. Not all NFS systems support utime reliably (when used over NFS).
348
349
350 rsync'ing the patches
351
352
353 The source tree is maintained by the pumpking who applies
354 patches to the files in the tree. These patches are either
355 created by the pumpking himself using diff -c after
356 updating the file manually or by applying patches sent in by
357 posters on the perl5-porters list. These patches are also
358 saved and rsync'able, so you can apply them yourself to the
359 source files.
360
361
362 Presuming you are in a directory where your patches reside,
363 you can get them in sync with
364
365
366 # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
367 This makes sure the latest available patch is downloaded to your patch directory.
368
369
370 It's then up to you to apply these patches, using something
371 like
372
373
374 # last=`ls -rt1 *.gz | tail -1`
375 # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
376 # find . -name '*.gz' -newer $last -exec gzcat {} \; | patch -p1
377 or, since this is only a hint towards how it works, use CPAN-patchaperl from Andreas König.
378
379
380 __Why rsync the source tree__
381
382
383 It's easier
384
385
386 Since you don't have to apply the patches yourself, you are
387 sure all files in the source tree are in the right
388 state.
389
390
391 It's more recent
392
393
394 According to Gurusamy Sarathy:
395
396
397
398
399
400
401 It's more reliable
402
403
404 Well, since the patches are updated by hand, I don't have to
405 say any more ... (see Sarathy's remark).
406
407
408 __Why rsync the patches__
409
410
411 It's easier
412
413
414 If you have more than one machine that you want to keep in
415 track with bleadperl, it's easier to rsync the patches only
416 once and then apply them to all the source trees on the
417 different machines.
418
419
420 If you try to keep pace on 5 different machines, of
421 which only one has access to the WAN,
422 rsync'ing all the source trees would then have to be done 5 times
423 over NFS. Having rsync'ed the patches
424 only once, I can apply them to all the source trees
425 automatically. Need you say more ;-)
426
427
428 It's a good reference
429
430
431 If you do not only like to have the most recent development
432 branch, but also like to __fix__ bugs, or extend
433 features, you want to dive into the sources. If you are a
434 seasoned perl core diver, you don't need no manuals, tips,
435 roadmaps, perlguts.pod or other aids to find your way
436 around. But if you are a starter, the patches may help you
437 in finding where you should start and how to change the bits
438 that bug you.
439
440
441 The file __Changes__ is updated on occasions the pumpking
442 sees as his own little sync points. On those occasions, he
443 releases a tar-ball of the current source tree (i.e.
444 perl@7582.tar.gz), which will be an excellent point to start
445 with when choosing to use the 'rsync the patches' scheme.
446 Starting with perl@7582, which means a set of source files
447 on which the latest applied patch is number 7582, you apply
448 all succeeding patches available from then on (7583, 7584,
449 ...).
450
451
452 You can use the patches later as a kind of search
453 archive.
454
455
456 Finding a start point
457
458
459 If you want to fix/change the behaviour of function/feature
460 Foo, just scan the patches for patches that mention Foo
461 either in the subject, the comments, or the body of the fix.
462 There is a good chance that the patch shows you the files that are
463 affected by that patch, which are very likely to be the
464 starting point of your journey into the guts of
465 perl.
466
467
468 Finding how to fix a bug
469
470
471 If you've found ''where'' the function/feature Foo
472 misbehaves, but you don't know how to fix it (but you do
473 know the change you want to make), you can, again, peruse
474 the patches for similar changes and look how others apply
475 the fix.
476
477
478 Finding the source of misbehaviour
479
480
481 When you keep in sync with bleadperl, the pumpking would
482 love to ''see'' that the community efforts really work. So
483 after each of his sync points, you are to 'make test' to
484 check if everything is still in working order. If it is, you
485 do 'make ok', which will send an OK report to
486 perlbug@perl.org. (If you do not have access to a mailer
487 from the system on which you just successfully finished 'make test',
488 you can do 'make okfile', which creates the file
489 perl.ok, which you can then take to your favourite
490 mailer and mail yourself.)
491
492
493 But of course, as always, things will not always lead to a
494 success path, and one or more tests do not pass the 'make
495 test'. Before sending in a bug report (using 'make nok' or
496 'make nokfile'), check the mailing list if someone else has
497 reported the bug already and if so, confirm it by replying
498 to that message. If not, you might want to trace the source
499 of that misbehaviour __before__ sending in the bug, which
500 will help all the other porters in finding the
501 solution.
502
503
504 Here the saved patches come in very handy. You can check the
505 list of patches to see which patch changed what file and
506 what change caused the misbehaviour. If you note that in the
507 bug report, it saves the person trying to solve it from having to look for
508 that point.
509
510
511 If searching the patches is too bothersome, you might
512 consider using perl's bugtron to find more information about
513 discussions and ramblings on posted bugs.
514
515
516 If you want to get the best of both worlds, rsync both the
517 source tree for convenience, reliability and ease and rsync
518 the patches for reference.
519
520
521 __Submitting patches__
522
523
524 Always submit patches to ''perl5-porters@perl.org''. This
525 lets other porters review your patch, which catches a
526 surprising number of errors in patches. Either use the diff
527 program (available in source code form from
528 ''ftp://ftp.gnu.org/pub/gnu/''), or use Johan Vromans'
529 ''makepatch'' (available from
530 ''CPAN/authors/id/JV/''). Unified diffs are preferred,
531 but context diffs are accepted. Do not send RCS-style diffs
532 or diffs without context lines. More information is given in
533 the ''Porting/patching.pod'' file in the Perl source
534 distribution. Please patch against the latest
535 __development__ version (e.g., if you're fixing a bug in
536 the 5.005 track, patch against the latest 5.005_5x version).
537 Only patches that survive the heat of the development branch
538 get applied to maintenance versions.
539
540
541 Your patch should update the documentation and test
542 suite.
543
544
545 To report a bug in Perl, use the program ''perlbug''
546 which comes with Perl (if you can't get Perl to work, send
547 mail to the address ''perlbug@perl.org'' or
548 ''perlbug@perl.com''). Reporting bugs through
549 ''perlbug'' feeds into the automated bug-tracking system,
550 access to which is provided through the web at
551 ''http://bugs.perl.org/''. It often pays to check the
552 archives of the perl5-porters mailing list to see whether
553 the bug you're reporting has been reported before, and if so
554 whether it was considered a bug. See above for the location
555 of the searchable archives.
556
557
558 The CPAN testers
559 (''http://testers.cpan.org/'') are a group of volunteers
560 who test CPAN modules on a variety of
561 platforms. Perl Labs (''http://labs.perl.org/'')
562 automatically tests Perl source releases on platforms and
563 gives feedback to the CPAN testers mailing
564 list. Both efforts welcome volunteers.
565
566
567 It's a good idea to read and lurk for a while before
568 chipping in. That way you'll get to see the dynamic of the
569 conversations, learn the personalities of the players, and
570 hopefully be better prepared to make a useful contribution
571 when you do speak up.
572
573
574 If after all this you still think you want to join the
575 perl5-porters mailing list, send mail to
576 ''perl5-porters-subscribe@perl.org''. To unsubscribe,
577 send mail to
578 ''perl5-porters-unsubscribe@perl.org''.
579
580
581 To hack on the Perl guts, you'll need to read the following
582 things:
583
584
585 perlguts
586
587
588 This is of paramount importance, since it's the
589 documentation of what goes where in the Perl source. Read it
590 over a couple of times and it might start to make sense -
591 don't worry if it doesn't yet, because the best way to study
592 it is to read it in conjunction with poking at Perl source,
593 and we'll do that later on.
594
595
596 You might also want to look at Gisle Aas's illustrated
597 perlguts - there's no guarantee that this will be absolutely
598 up-to-date with the latest documentation in the Perl core,
599 but the fundamentals will be right.
600 (http://gisle.aas.no/perl/illguts/)
601
602
603 perlxstut and perlxs
604
605
606 A working knowledge of XSUB programming is
607 incredibly useful for core hacking; XSUBs use techniques
608 drawn from the PP code, the portion of the
609 guts that actually executes a Perl program. It's a lot
610 gentler to learn those techniques from simple examples and
611 explanation than from the core itself.
612
613
614 perlapi
615
616
617 The documentation for the Perl API explains
618 what some of the internal functions do, as well as the many
619 macros used in the source.
620
621
622 ''Porting/pumpkin.pod''
623
624
625 This is a collection of words of wisdom for a Perl porter;
626 some of it is only useful to the pumpkin holder, but most of
627 it applies to anyone wanting to go about Perl
628 development.
629
630
631 The perl5-porters FAQ
632
633
634 This is posted to perl5-porters at the beginning of every
635 month, and should be available from
636 http://perlhacker.org/p5p-faq; alternatively, you can get
637 the FAQ emailed to you by sending mail to
638 perl5-porters-faq@perl.org. It contains hints on
639 reading perl5-porters, information on how perl5-porters
640 works and how Perl development in general
641 works.
642
643
644 __Finding Your Way Around__
645
646
647 Perl maintenance can be split into a number of areas, and
648 certain people (pumpkins) will have responsibility for each
649 area. These areas sometimes correspond to files or
650 directories in the source kit. Among the areas
651 are:
652
653
654 Core modules
655
656
657 Modules shipped as part of the Perl core live in the
658 ''lib/'' and ''ext/'' subdirectories: ''lib/'' is
659 for the pure-Perl modules, and ''ext/'' contains the core
660 XS modules.
661
662
663 Documentation
664
665
666 Documentation maintenance includes looking after everything
667 in the ''pod/'' directory, (as well as contributing new
668 documentation) and the documentation to the modules in
669 core.
670
671
672 Configure
673
674
675 The configure process is the way we make Perl portable
676 across the myriad of operating systems it supports.
677 Responsibility for the configure, build and installation
678 process, as well as the overall portability of the core code
679 rests with the configure pumpkin - others help out with
680 individual operating systems.
681
682
683 The files involved are the operating system directories,
684 (''win32/'', ''os2/'', ''vms/'' and so on) the
685 shell scripts which generate ''config.h'' and
686 ''Makefile'', as well as the metaconfig files which
687 generate ''Configure''. (metaconfig isn't included in the
688 core distribution.)
689
690
691 Interpreter
692
693
694 And of course, there's the core of the Perl interpreter
695 itself. Let's have a look at that in a little more
696 detail.
697
698
699 Before we leave looking at the layout, though, don't forget
700 that ''MANIFEST'' contains not only the
701 file names in the Perl distribution, but short descriptions
702 of what's in them, too. For an overview of the important
703 files, try this:
704
705
706 perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
707
708
709 __Elements of the interpreter__
710
711
712 The work of the interpreter has two main stages: compiling
713 the code into the internal representation, or bytecode, and
714 then executing it. ``Compiled code'' in perlguts explains
715 exactly how the compilation stage happens.
716
717
718 Here is a short breakdown of perl's operation:
719
720
721 Startup
722
723
724 The action begins in ''perlmain.c'' (or
725 ''miniperlmain.c'' for miniperl). This is very high-level
726 code, enough to fit on a single screen, and it resembles the
727 code found in perlembed; most of the real action takes place
728 in ''perl.c''.
729
730
731 First, ''perlmain.c'' allocates some memory and
732 constructs a Perl interpreter:
733
734
735 1 PERL_SYS_INIT3(
736 Line 1 is a macro, and its definition is dependent on your operating system. Line 3 references PL_do_undump, a global variable - all global variables in Perl start with PL_. This tells you whether the current running program was created with the -u flag to perl and then ''undump'', which means it's going to be false in any sane context.
737
738
739 Line 4 calls a function in ''perl.c'' to allocate memory
740 for a Perl interpreter. It's quite a simple function, and
741 the guts of it looks like this:
742
743
2 perry 744 my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
745 Here you see an example of Perl's system abstraction, which we'll see later: PerlMem_malloc is either your system's malloc, or Perl's own malloc as defined in ''malloc.c'' if you selected that option at configure time.
1 perry 746
747
748 Next, in line 7, we construct the interpreter; this sets up
749 all the special variables that Perl needs, the stacks, and
750 so on.
751
752
753 Now we pass Perl the command line options, and tell it to
754 go:
755
756
757 exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
758 if (!exitstatus) {
759 exitstatus = perl_run(my_perl);
760 }
761 perl_parse is actually a wrapper around S_parse_body, as defined in ''perl.c'', which processes the command line options, sets up any statically linked XS modules, opens the program and calls yyparse to parse it.
762
763
764 Parsing
765
766
767 The aim of this stage is to take the Perl source, and turn
768 it into an op tree. We'll see what one of those looks like
769 later. Strictly speaking, there are three things going on
770 here.
771
772
773 yyparse, the parser, lives in ''perly.c'',
774 although you're better off reading the original
775 YACC input in ''perly.y''. (Yes, Virginia,
776 there __is__ a YACC grammar for Perl!) The
777 job of the parser is to take your code and `understand' it,
778 splitting it into sentences, deciding which operands go with
779 which operators and so on.
780
781
782 The parser is nobly assisted by the lexer, which chunks up
783 your input into tokens, and decides what type of thing each
784 token is: a variable name, an operator, a bareword, a
785 subroutine, a core function, and so on. The main point of
786 entry to the lexer is yylex, and that and its
787 associated routines can be found in ''toke.c''. Perl
788 isn't much like other computer languages; it's highly
789 context sensitive at times, and it can be tricky to work out
790 what sort of token something is, or where a token ends. As
791 such, there's a lot of interplay between the tokeniser and
792 the parser, which can get pretty frightening if you're not
793 used to it.
794
795
796 As the parser understands a Perl program, it builds up a
797 tree of operations for the interpreter to perform during
798 execution. The routines which construct and link together
799 the various operations are to be found in ''op.c'', and
800 will be examined later.
801
802
803 Optimization
804
805
806 Now the parsing stage is complete, and the finished tree
807 represents the operations that the Perl interpreter needs to
808 perform to execute our program. Next, Perl does a dry run
809 over the tree looking for optimisations: constant
810 expressions such as 3 + 4 will be computed now, and
811 the optimizer will also see if any multiple operations can
812 be replaced with a single one. For instance, to fetch the
813 variable $foo, instead of grabbing the glob
814 *foo and looking at the scalar component, the
815 optimizer fiddles the op tree to use a function which
816 directly looks up the scalar in question. The main optimizer
817 is peep in ''op.c'', and many ops have their own
818 optimizing functions.
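
To make the constant-expression step concrete, here is a minimal sketch of folding on a toy expression tree. It is not the code from ''op.c'': the Node type, its fields and the functions new_const, new_add and fold are all hypothetical names invented for this illustration, and the tree only knows about constants and addition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy node, unrelated to Perl's real op structures:
 * either a constant or an addition of two subtrees. */
typedef enum { NODE_CONST, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;            /* meaningful only for NODE_CONST */
    struct Node *left;     /* meaningful only for NODE_ADD */
    struct Node *right;
} Node;

Node *new_const(long value) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = NODE_CONST;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

Node *new_add(Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = NODE_ADD;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Fold bottom-up: once both children of an addition are constants,
 * turn the addition node itself into a single constant node. */
Node *fold(Node *n) {
    assert(n != NULL);
    if (n->kind != NODE_ADD)
        return n;
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == NODE_CONST && n->right->kind == NODE_CONST) {
        long sum = n->left->value + n->right->value;
        free(n->left);
        free(n->right);
        n->kind = NODE_CONST;
        n->value = sum;
        n->left = n->right = NULL;
    }
    return n;
}
```

Calling fold(new_add(new_const(3), new_const(4))) leaves a single constant node holding 7, so nothing has to be added at run time. The real optimizer works on a far richer tree, but the idea of doing the work once, before execution, is the same.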
819
820
821 Running
822
823
824 Now we're finally ready to go: we have compiled Perl byte
825 code, and all that's left to do is run it. The actual
826 execution is done by the runops_standard function
827 in ''run.c''; more specifically, it's done by these three
828 innocent looking lines:
829
830
831 while ((PL_op = CALL_FPTR(PL_op-
832 You may be more comfortable with the Perl version of that:
833
834
835 PERL_ASYNC_CHECK() while $Perl::op =
836 Well, maybe not. Anyway, each op contains a function pointer, which stipulates the function which will actually carry out the operation. This function will return the next op in the sequence - this allows for things like if which choose the next op dynamically at run time. The PERL_ASYNC_CHECK makes sure that things like signals interrupt execution if required.
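
If it helps, the idea that each op runs and then hands back the next op can be modelled in a few lines of standalone C. This is only a sketch under our own assumptions, not the code from ''run.c'': ToyOp, ToyState, op_add, op_branch_if_less, toy_runops and count_up_to are hypothetical names, and nothing here models PERL_ASYNC_CHECK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, invented for this illustration only. */
typedef struct ToyOp ToyOp;
typedef struct { long acc; } ToyState;

struct ToyOp {
    ToyOp *(*run)(ToyOp *self, ToyState *st); /* does the work, returns the next op */
    ToyOp *next;   /* default successor */
    ToyOp *other;  /* alternative successor, used by branching ops */
    long arg;
};

/* A straight-line op: add arg to the accumulator, continue with next. */
ToyOp *op_add(ToyOp *self, ToyState *st) {
    st->acc += self->arg;
    return self->next;
}

/* A branching op: pick the successor at run time, the way an if would. */
ToyOp *op_branch_if_less(ToyOp *self, ToyState *st) {
    return st->acc < self->arg ? self->other : self->next;
}

/* The run loop: call the current op, follow whatever it returns,
 * stop when an op returns NULL. Returns the number of ops executed. */
long toy_runops(ToyOp *start, ToyState *st) {
    long executed = 0;
    for (ToyOp *op = start; op != NULL; op = op->run(op, st))
        executed++;
    return executed;
}

/* Builds a two-op loop "add 1; go back while acc < limit" and runs it. */
long count_up_to(long limit) {
    ToyOp add = { op_add, NULL, NULL, 1 };
    ToyOp branch = { op_branch_if_less, NULL, &add, limit };
    ToyState st = { 0 };
    add.next = &branch;
    toy_runops(&add, &st);
    return st.acc;
}
```

Note that toy_runops itself knows nothing about loops or conditions; the control flow lives entirely in what each op chooses to return, which is the point the paragraph above is making.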
837
838
839 The actual functions called are known as PP
840 code, and they're spread between four files: ''pp_hot.c''
841 contains the `hot' code, which is most often used and highly
842 optimized, ''pp_sys.c'' contains all the system-specific
843 functions, ''pp_ctl.c'' contains the functions which
844 implement control structures (if, while
845 and the like) and ''pp.c'' contains everything else.
846 These are, if you like, the C code for Perl's built-in
847 functions and operators.
848
849
850 __Internal Variable Types__
851
852
853 You should by now have had a look at perlguts, which tells
854 you about Perl's internal variable types: SVs, HVs, AVs and
855 the rest. If not, do that now.
856
857
858 These variables are used not only to represent Perl-space
859 variables, but also any constants in the code, as well as
860 some structures completely internal to Perl. The symbol
861 table, for instance, is an ordinary Perl hash. Your code is
862 represented by an SV as it's read into the
863 parser; any program files you call are opened via ordinary
864 Perl filehandles, and so on.
865
866
867 The core Devel::Peek module lets us examine SVs from a Perl
868 program. Let's see, for instance, how Perl treats the
869 constant .
870
871
872 % perl -MDevel::Peek -e 'Dump(
873 Reading Devel::Peek output takes a bit of practise, so let's go through it line by line.
874
875
876 Line 1 tells us we're looking at an SV which
877 lives at 0xa04ecbc in memory. SVs themselves are
878 very simple structures, but they contain a pointer to a more
879 complex structure. In this case, it's a PV,
880 a structure which holds a string value, at location
881 0xa041450. Line 2 is the reference count; there are
882 no other references to this data, so it's 1.
883
884
885 Line 3 shows the flags for this SV - it's
886 OK to use it as a PV, it's a
887 read-only SV (because it's a constant) and
888 the data is a PV internally. Next we've got
889 the contents of the string, starting at location
890 0xa0484e0.
891
892
893 Line 5 gives us the current length of the string - note that
894 this does __not__ include the null terminator. Line 6 is
895 not the length of the string, but the length of the
896 currently allocated buffer; as the string grows, Perl
897 automatically extends the available storage via a routine
898 called SvGROW.
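
The difference between the string length and the allocated length is easy to see in a toy model. The sketch below is not Perl's implementation: ToyStr, toy_grow and toy_new are hypothetical names, the fields merely echo CUR and LEN, and we assume the simplest possible policy of allocating exactly what is asked for.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string buffer; cur and len echo the CUR and LEN fields. */
typedef struct {
    char  *pv;   /* the buffer itself */
    size_t cur;  /* length of the string, not counting the trailing NUL */
    size_t len;  /* number of bytes currently allocated */
} ToyStr;

/* Make sure at least `need` bytes are allocated. Never shrinks the buffer. */
char *toy_grow(ToyStr *s, size_t need) {
    assert(s != NULL);
    if (need > s->len) {
        char *p = realloc(s->pv, need);
        if (!p) abort();
        s->pv = p;
        s->len = need;
    }
    return s->pv;
}

/* Build a ToyStr holding a copy of a C string. */
ToyStr toy_new(const char *text) {
    ToyStr s = { NULL, 0, 0 };
    size_t n = strlen(text);
    toy_grow(&s, n + 1);          /* room for the text plus the NUL */
    memcpy(s.pv, text, n + 1);
    s.cur = n;
    return s;
}
```

For a five-character string this toy ends up with cur at 5 and len at 6, the extra byte being the terminator. Growing the buffer changes len but leaves cur and the contents alone.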
899
900
901 You can get at any of these quantities from C very easily;
902 just add Sv to the name of the field shown in the
903 snippet, and you've got a macro which will return the value:
904 SvCUR(sv) returns the current length of the string,
905 SvREFCNT(sv) returns the reference count,
906 SvPV(sv, len) returns the string itself with its
907 length, and so on. More macros to manipulate these
908 properties can be found in perlguts.
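To see the shape of this naming convention outside the Perl source, here's a small self-contained sketch in plain C. It's a toy model under loud assumptions: ToySV, toy_sv_new and the ToySv* macros are names invented for illustration, and the real SV keeps these fields in a separate body structure rather than in one flat struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string-holding SV. NOT Perl's real layout. */
typedef struct {
    unsigned refcnt;  /* REFCNT in the dump */
    char    *pv;      /* PV: the string buffer */
    size_t   cur;     /* CUR: string length, excluding the trailing NUL */
    size_t   len;     /* LEN: bytes allocated for the buffer */
} ToySV;

/* One accessor macro per dump field, in the spirit of SvCUR and friends. */
#define ToySvREFCNT(sv) ((sv)->refcnt)
#define ToySvPVX(sv)    ((sv)->pv)
#define ToySvCUR(sv)    ((sv)->cur)
#define ToySvLEN(sv)    ((sv)->len)

/* Build a toy SV holding a copy of the NUL-terminated string s. */
static ToySV *toy_sv_new(const char *s)
{
    ToySV *sv;

    assert(s != NULL);
    sv = malloc(sizeof *sv);
    if (sv == NULL)
        return NULL;
    sv->cur = strlen(s);
    sv->len = sv->cur + 1;              /* one extra byte for the NUL */
    sv->pv  = malloc(sv->len);
    if (sv->pv == NULL) {
        free(sv);
        return NULL;
    }
    memcpy(sv->pv, s, sv->len);         /* copies the NUL too */
    sv->refcnt = 1;
    return sv;
}
```

For a five-character string the toy reports a CUR of 5 and a LEN of 6; the extra byte is the null terminator that CUR doesn't count.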
909
910
911 Let's take an example of manipulating a PV,
912 from sv_catpvn, in ''sv.c'':
913
914
915 1 void
916 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
917 3 {
918 4 STRLEN tlen;
919 5 char *junk;
920 6 junk = SvPV_force(sv, tlen);
921 7 SvGROW(sv, tlen + len + 1);
922 8 if (ptr == junk)
923 9 ptr = SvPVX(sv);
924 10 Move(ptr,SvPVX(sv)+tlen,len,char);
925 11 SvCUR(sv) += len;
926 12 *SvEND(sv) = '\0';
927 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */
928 14 SvTAINT(sv);
929 15 }
930 This is a function which adds a string, ptr, of length len onto the end of the PV stored in sv. The first thing we do in line 6 is make sure that the SV __has__ a valid PV, by calling the SvPV_force macro to force a PV. As a side effect, tlen gets set to the current length of the PV, and the PV itself is returned to junk.
931
932
933 In line 7, we make sure that the SV will have
934 enough room to accommodate the old string, the new string
935 and the null terminator. If LEN isn't big enough,
936 SvGROW will reallocate space for us.
937
938
939 Now, if junk is the same as the string we're trying
940 to add, SvGROW may have moved the buffer, so we grab the string afresh from the
941 SV; SvPVX is the address of the
942 PV in the SV.
943
944
945 Line 10 does the actual catenation: the Move macro
946 moves a chunk of memory around: we move the string
947 ptr to the end of the PV - that's
948 the start of the PV plus its current length.
949 We're moving len bytes of type char. After
950 doing so, we need to tell Perl we've extended the string, by
951 altering CUR to reflect the new length.
952 SvEND is a macro which gives us the end of the
953 string, so that needs to be a
954 "\0".
955
956
957 Line 13 manipulates the flags; since we've changed the
958 PV, any IV or
959 NV values will no longer be valid: if we have
960 $a=10 and then append to it with $a.=, we don't want to use the
961 old IV of 10. SvPOK_only_UTF8 is a
962 special UTF8-aware version of SvPOK_only, a macro
963 which turns off the IOK and
964 NOK flags and turns on POK.
965 The final SvTAINT is a macro which marks the SV as tainted
966 if taint mode is turned on and tainted data was involved.
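The walk-through above can be mirrored in a self-contained sketch. This is a toy under stated assumptions, not the code in ''sv.c'': ToySV, toy_grow, toy_catpvn and the TOY_* flags are hypothetical names, realloc stands in for Perl's allocator, and the self-append check is done before growing so that we never compare against a freed pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_NOK = 2, TOY_POK = 4 };   /* toy validity flags */

typedef struct {
    char    *pv;     /* string buffer, or NULL if none yet */
    size_t   cur;    /* current length, excluding the NUL */
    size_t   len;    /* allocated size of pv */
    unsigned flags;  /* which cached representations are valid */
    long     iv;     /* cached integer value, trusted only if TOY_IOK */
} ToySV;

/* In the spirit of SvGROW: ensure at least `want` bytes are allocated. */
static void toy_grow(ToySV *sv, size_t want)
{
    if (sv->len < want) {
        char *p = realloc(sv->pv, want);
        assert(p != NULL);              /* sketch only: abort on OOM */
        sv->pv  = p;
        sv->len = want;
    }
}

/* Append len bytes starting at ptr to the string held in sv. */
static void toy_catpvn(ToySV *sv, const char *ptr, size_t len)
{
    size_t tlen = sv->cur;
    int    self = (sv->pv != NULL && ptr == sv->pv);  /* appending to itself? */

    toy_grow(sv, tlen + len + 1);       /* old string + new string + NUL */
    if (self)
        ptr = sv->pv;                   /* the buffer may have moved */
    memmove(sv->pv + tlen, ptr, len);   /* the actual catenation */
    sv->cur += len;
    sv->pv[sv->cur] = '\0';             /* keep the string terminated */
    sv->flags = TOY_POK;                /* any cached IV or NV is now stale */
}
```

Appending to an SV that started life with only a valid integer leaves it with just the string flag set, which is the toy equivalent of what line 13 does.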
967
968
969 AVs and HVs are more complicated, but SVs are by far the
970 most common variable type being thrown around. Having seen
971 something of how we manipulate these, let's go on and look
972 at how the op tree is constructed.
973
974
975 __Op Trees__
976
977
978 First, what is the op tree, anyway? The op tree is the
979 parsed representation of your program, as we saw in our
980 section on parsing, and it's the sequence of operations that
981 Perl goes through to execute your program, as we saw in
982 ``Running''.
983
984
985 An op is a fundamental operation that Perl can perform: all
986 the built-in functions and operators are ops, and there are
987 a series of ops which deal with concepts the interpreter
988 needs internally - entering and leaving a block, ending a
989 statement, fetching a variable, and so on.
990
991
992 The op tree is connected in two ways: you can imagine that
993 there are two ``routes'' through it, two orders in which you
994 can traverse the tree. First, parse order reflects how the
995 parser understood the code, and secondly, execution order
996 tells perl what order to perform the operations
997 in.
998
999
1000 The easiest way to examine the op tree is to stop Perl after
1001 it has finished parsing, and get it to dump out the tree.
1002 This is exactly what the compiler backends B::Terse and
1003 B::Debug do.
1004
1005
1006 Let's have a look at how Perl sees $a = $b +
1007 $c:
1008
1009
1010 % perl -MO=Terse -e '$a=$b+$c'
1011 1 LISTOP (0x8179888) leave
1012 2 OP (0x81798b0) enter
1013 3 COP (0x8179850) nextstate
1014 4 BINOP (0x8179828) sassign
1015 5 BINOP (0x8179800) add [[1]
1016 6 UNOP (0x81796e0) null [[15]
1017 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b
1018 8 UNOP (0x81797e0) null [[15]
1019 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c
1020 10 UNOP (0x816b4f0) null [[15]
1021 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a
1022 Let's start in the middle, at line 4. This is a BINOP , a binary operator, which is at location 0x8179828. The specific operator in question is sassign - scalar assignment - and you can find the code which implements it in the function pp_sassign in ''pp_hot.c''. As a binary operator, it has two children: the add operator, providing the result of $b+$c, is uppermost on line 5, and the left hand side is on line 10.
1023
1024
1025 Line 10 is the null op: this does exactly nothing. What is
1026 that doing there? If you see the null op, it's a sign that
1027 something has been optimized away after parsing. As we
1028 mentioned in ``Optimization'', the optimization stage
1029 sometimes converts two operations into one, for example when
1030 fetching a scalar variable. When this happens, instead of
1031 rewriting the op tree and cleaning up the dangling pointers,
1032 it's easier just to replace the redundant operation with the
1033 null op. Originally, the tree would have looked like
1034 this:
1035
1036
1037 10 SVOP (0x816b4f0) rv2sv [[15]
1038 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a
1039 That is, fetch the a entry from the main symbol table, and then look at the scalar component of it: gvsv (pp_gvsv in ''pp_hot.c'') happens to do both these things.
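Here's a minimal sketch of that nulling trick in plain C. Everything in it is hypothetical (ToyOp, the TOY_* op types, toy_fuse_rv2sv); the real optimizer works on Perl's own OP structures and handles many more cases. The point is only that the parent is blanked in place, so no pointers into the tree need rewriting.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NULL, TOY_GV, TOY_RV2SV, TOY_GVSV } ToyOpType;

typedef struct ToyOp {
    ToyOpType     type;
    ToyOpType     was;     /* original type, kept so a dump can show it */
    struct ToyOp *first;   /* first (here: only) child */
} ToyOp;

/* If o is rv2sv with a gv child, fuse the pair: the child becomes gvsv
 * and the parent is turned into a null op rather than being unlinked.
 * Returns 1 if the rewrite was applied, 0 otherwise. */
static int toy_fuse_rv2sv(ToyOp *o)
{
    assert(o != NULL);
    if (o->type != TOY_RV2SV || o->first == NULL || o->first->type != TOY_GV)
        return 0;
    o->first->type = TOY_GVSV;   /* one op now does both jobs */
    o->was  = o->type;           /* remember what used to be here */
    o->type = TOY_NULL;          /* the parent stays in the tree, doing nothing */
    return 1;
}
```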
1040
1041
1042 The right hand side, starting at line 5 is similar to what
1043 we've just seen: we have the add op
1044 (pp_add also in ''pp_hot.c'') add together two
1045 gvsvs.
1046
1047
1048 Now, what's this about?
1049
1050
1051 1 LISTOP (0x8179888) leave
1052 2 OP (0x81798b0) enter
1053 3 COP (0x8179850) nextstate
1054 enter and leave are scoping ops, and their job is to perform any housekeeping every time you enter and leave a block: lexical variables are tidied up, unreferenced variables are destroyed, and so on. Every program will have those first three lines: leave is a list, and its children are all the statements in the block. Statements are delimited by nextstate, so a block is a collection of nextstate ops, with the ops to be performed for each statement being the children of nextstate. enter is a single op which functions as a marker.
1055
1056
1057 That's how Perl parsed the program, from top to
1058 bottom:
1059
1060
1061 Program
1062 Statement
1063 =
1064 / \
1065 / \
1066 $a +
1067 / \
1068 $b $c
1069 However, it's impossible to __perform__ the operations in this order: you have to find the values of $b and $c before you add them together, for instance. So, the other thread that runs through the op tree is the execution order: each op has a field op_next which points to the next op to be run, so following these pointers tells us how perl executes the code. We can traverse the tree in this order using the exec option to B::Terse:
1070
1071
1072 % perl -MO=Terse,exec -e '$a=$b+$c'
1073 1 OP (0x8179928) enter
1074 2 COP (0x81798c8) nextstate
1075 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
1076 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c
1077 5 BINOP (0x8179878) add [[1]
1078 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a
1079 7 BINOP (0x81798a0) sassign
1080 8 LISTOP (0x8179900) leave
1081 This probably makes more sense for a human: enter a block, start a statement. Get the values of $b and $c, and add them together. Find $a, and assign one to the other. Then leave.
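The two routes through the tree can be modelled in a few lines of C. This is a sketch under assumptions: ToyOp, toy_link and toy_exec_order are invented names, and threading the ops in plain post-order (children before parent) is a simplification that happens to reproduce the order shown above for this expression; perl itself links ops during compilation with rules specific to each kind of op.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first;    /* parse order: first child */
    struct ToyOp *sibling;  /* parse order: next child of the same parent */
    struct ToyOp *next;     /* execution order: op to run after this one */
} ToyOp;

/* Thread the subtree rooted at o in post-order (children before parent).
 * *last is the most recently threaded op, or NULL before the first one. */
static void toy_link(ToyOp *o, ToyOp **last)
{
    ToyOp *kid;

    assert(o != NULL && last != NULL);
    for (kid = o->first; kid != NULL; kid = kid->sibling)
        toy_link(kid, last);
    if (*last != NULL)
        (*last)->next = o;
    *last = o;
}

/* Follow the next pointers from start, storing op names in out.
 * Returns how many ops were visited (at most max). */
static size_t toy_exec_order(const ToyOp *start, const char **out, size_t max)
{
    size_t n = 0;
    const ToyOp *o;

    for (o = start; o != NULL && n < max; o = o->next)
        out[n++] = o->name;
    return n;
}
```

Building the sassign/add subtree by hand and threading it gives gvsv b, gvsv c, add, gvsv a, sassign, matching the middle of the exec listing.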
1082
1083
1084 The way Perl builds up these op trees in the parsing process
1085 can be unravelled by examining ''perly.y'', the
1086 YACC grammar. Let's take the piece we need to
1087 construct the tree for $a = $b + $c
1088
1089
1090 1 term : term ASSIGNOP term
1091 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
1092 3 | term ADDOP term
1093 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1094 If you're not used to reading BNF grammars, this is how it works: you're fed certain things by the tokeniser, which generally end up in upper case. Here, ADDOP is provided when the tokeniser sees + in your code. ASSIGNOP is provided when = is used for assigning. These are `terminal symbols', because you can't get any simpler than them.
1095
1096
1097 The grammar, lines one and three of the snippet above, tells
1098 you how to build up more complex forms. These complex forms,
1099 `non-terminal symbols', are generally placed in lower case.
1100 term here is a non-terminal symbol, representing a
1101 single expression.
1102
1103
1104 The grammar gives you the following rule: you can make the
1105 thing on the left of the colon if you see all the things on
1106 the right in sequence. This is called a ``reduction'', and
1107 the aim of parsing is to completely reduce the input. There
1108 are several different ways you can perform a reduction,
1109 separated by vertical bars: so, term followed by
1110 = followed by term makes a term,
1111 and term followed by + followed by
1112 term can also make a term.
1113
1114
1115 So, if you see two terms with an = or +
1116 between them, you can turn them into a single expression.
1117 When you do this, you execute the code in the block on the
1118 next line: if you see =, you'll do the code in line
1119 2. If you see +, you'll do the code in line 4. It's
1120 this code which contributes to the op tree.
1121
1122
1123 term ADDOP term
1124 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1125 This creates a new binary op, and feeds it a number of variables. The variables refer to the tokens: $1 is the first token in the input, $2 the second, and so on - think regular expression backreferences. $$ is the op returned from this reduction. So, we call newBINOP to create a new binary operator. The first parameter to newBINOP, a function in ''op.c'', is the op type. It's an addition operator, so we want the type to be ADDOP. We could specify this directly, but it's right there as the second token in the input, so we use $2. The second parameter is the op's flags: 0 means `nothing special'. Then the things to add: the left and right hand side of our expression, in scalar context.
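To make the shape of that call concrete, here's a toy constructor in plain C that takes its arguments in the same order: type, flags, left, right. It's a sketch only; ToyOp, toy_new_op, toy_new_binop, toy_free_op and the TOY_OP_* constants are hypothetical, and the real newBINOP in ''op.c'' does a good deal more than link two children.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_OP_GVSV, TOY_OP_ADD };      /* hypothetical op types */

typedef struct ToyOp {
    int           type;
    int           flags;
    struct ToyOp *first;    /* left operand */
    struct ToyOp *last;     /* right operand */
} ToyOp;

/* Allocate a leaf op (no children). */
static ToyOp *toy_new_op(int type, int flags)
{
    ToyOp *o = calloc(1, sizeof *o);
    assert(o != NULL);                 /* sketch only: abort on OOM */
    o->type  = type;
    o->flags = flags;
    return o;
}

/* Same argument order as the newBINOP call in the grammar action:
 * op type, flags, left operand, right operand. */
static ToyOp *toy_new_binop(int type, int flags, ToyOp *left, ToyOp *right)
{
    ToyOp *o = toy_new_op(type, flags);
    o->first = left;
    o->last  = right;
    return o;
}

/* Free an op and everything below it. */
static void toy_free_op(ToyOp *o)
{
    if (o == NULL)
        return;
    toy_free_op(o->first);
    toy_free_op(o->last);
    free(o);
}
```

In grammar terms, the value returned by toy_new_binop plays the part of $$, and the two operand pointers play the parts of $1 and $3.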
1126
1127
1128 __Stacks__
1129
1130
1131 When perl executes something like addop, how does
1132 it pass on its results to the next op? The answer is,
1133 through the use of stacks. Perl has a number of stacks to
1134 store things it's currently working on, and we'll look at
1135 the three most important ones here.
1136
1137
1138 Argument stack
1139
1140
1141 Arguments are passed to PP code and returned
1142 from PP code using the argument stack,
1143 ST. The typical way to handle arguments is to pop
1144 them off the stack, deal with them how you wish, and then
1145 push the result back onto the stack. This is how, for
1146 instance, the cosine operator works:
1147
1148
1149 NV value;
1150 value = POPn;
1151 value = Perl_cos(value);
1152 XPUSHn(value);
1153 We'll see a more tricky example of this when we consider Perl's macros below. POPn gives you the NV (floating point value) of the top SV on the stack: the $x in cos($x). Then we compute the cosine, and push the result back as an NV . The X in XPUSHn means that the stack should be extended if necessary - it can't be necessary here, because we know there's room for one more item on the stack, since we've just removed one! The XPUSH* macros at least guarantee safety.
1154
1155
1156 Alternatively, you can fiddle with the stack directly:
1157 SP gives you the first element in your portion of
1158 the stack, and TOP* gives you the top SV/IV/NV/etc.
1159 on the stack. So, for instance, to do unary negation of an
1160 integer:
1161
1162
1163 SETi(-TOPi);
1164 Just set the integer value of the top stack entry to its negation.
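Both styles of stack handling can be tried out on a toy stack in plain C. The names here (ToyStack, toy_extend, toy_pop, toy_xpush, toy_pp_square, toy_pp_negate) are all made up for this sketch, the stack holds bare integers rather than SVs, and squaring stands in for cosine so the example needs nothing beyond the standard C library.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long  *base;  /* bottom of the stack */
    size_t top;   /* number of items currently on the stack */
    size_t cap;   /* allocated slots */
} ToyStack;

/* In the spirit of EXTEND: make room for n more items. */
static void toy_extend(ToyStack *st, size_t n)
{
    if (st->top + n > st->cap) {
        size_t cap = (st->top + n) * 2;
        long  *p   = realloc(st->base, cap * sizeof *p);
        assert(p != NULL);              /* sketch only: abort on OOM */
        st->base = p;
        st->cap  = cap;
    }
}

/* Remove and return the top item. */
static long toy_pop(ToyStack *st)
{
    assert(st->top > 0);
    return st->base[--st->top];
}

/* In the spirit of the XPUSH* macros: extend if needed, then push. */
static void toy_xpush(ToyStack *st, long v)
{
    toy_extend(st, 1);
    st->base[st->top++] = v;
}

/* Style 1: pop the argument, compute, push the result. */
static void toy_pp_square(ToyStack *st)
{
    long v = toy_pop(st);
    toy_xpush(st, v * v);
}

/* Style 2: rewrite the top entry in place, as SETi(-TOPi) does. */
static void toy_pp_negate(ToyStack *st)
{
    assert(st->top > 0);
    st->base[st->top - 1] = -st->base[st->top - 1];
}
```

Either way the stack depth is the same before and after: one argument in, one result out.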
1165
1166
1167 Argument stack manipulation in the core is exactly the same
1168 as it is in XSUBs - see perlxstut, perlxs and perlguts for a
1169 longer description of the macros used in stack
1170 manipulation.
1171
1172
1173 Mark stack
1174
1175
1176 I say `your portion of the stack' above because
1177 PP code doesn't necessarily get the whole
1178 stack to itself: if your function calls another function,
1179 you'll only want to expose the arguments aimed for the
1180 called function, and not (necessarily) let it get at your
1181 own data. The way we do this is to have a `virtual'
1182 bottom-of-stack, exposed to each function. The mark stack
1183 keeps bookmarks to locations in the argument stack usable by
1184 each function. For instance, when dealing with a tied
1185 variable, (internally, something with `P' magic) Perl has to
1186 call methods for accesses to the tied variables. However, we
1187 need to separate the arguments exposed to the method from the
1188 arguments exposed to the original function - the store or
1189 fetch or whatever it may be. Here's how the tied
1190 push is implemented; see av_push in
1191 ''av.c'':
1192
1193
1194 1 PUSHMARK(SP);
1195 2 EXTEND(SP,2);
1196 3 PUSHs(SvTIED_obj((SV*)av, mg));
1197 4 PUSHs(val);
1198 5 PUTBACK;
1199 6 ENTER;
1200 7 call_method("PUSH", G_SCALAR|G_DISCARD);
1201 The lines which concern the mark stack are the first, fifth and last lines: they save away, restore and remove the current position of the argument stack.
1202
1203
1204 Let's examine the whole implementation, for
1205 practice:
1206
1207
1208 1 PUSHMARK(SP);
1209 Push the current state of the stack pointer onto the mark stack. This is so that when we've finished adding items to the argument stack, Perl knows how many things we've added recently.
1210
1211
1212 2 EXTEND(SP,2);
1213 3 PUSHs(SvTIED_obj((SV*)av, mg));
1214 4 PUSHs(val);
1215 We're going to add two more items onto the argument stack: when you have a tied array, the PUSH subroutine receives the object and the value to be pushed, and that's exactly what we have here - the tied object, retrieved with SvTIED_obj, and the value, the SV val.
1216
1217
1218 5 PUTBACK;
1219 Next we tell Perl to make the change to the global stack pointer: dSP only gave us a local copy, not a reference to the global.
1220
1221
1222 6 ENTER;
1223 7 call_method("PUSH", G_SCALAR|G_DISCARD);
1224 ENTER and LEAVE localise a block of code - they make sure that all variables are tidied up, everything that has been localised gets its previous value returned, and so on. Think of them as the { and } of a Perl block.
1225
1226
1227 To actually do the magic method call, we have to call a
1228 subroutine in Perl space: call_method takes care of
1229 that, and it's described in perlcall. We call the
1230 PUSH method in scalar context, and we're going to
1231 discard its return value.
1232
1233
1234 9 POPSTACK;
1235 Finally, we remove the value we placed on the mark stack, since we don't need it any more.
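The bookmark idea is easy to model on its own. In the sketch below, which assumes fixed-size arrays and bare integers instead of SVs, the caller keeps its own data at the bottom of the argument stack and the callee only ever looks above the most recent mark. ToyInterp, toy_push, toy_pushmark and toy_call_sum are hypothetical names, not Perl API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 32

typedef struct {
    int    args[TOY_MAX];   /* argument stack */
    size_t sp;              /* number of items on the argument stack */
    size_t marks[TOY_MAX];  /* mark stack: saved argument stack positions */
    size_t nmarks;          /* number of marks */
} ToyInterp;

/* Push one value onto the argument stack. */
static void toy_push(ToyInterp *in, int v)
{
    assert(in->sp < TOY_MAX);
    in->args[in->sp++] = v;
}

/* In the spirit of PUSHMARK: remember where the callee's arguments start. */
static void toy_pushmark(ToyInterp *in)
{
    assert(in->nmarks < TOY_MAX);
    in->marks[in->nmarks++] = in->sp;
}

/* A callee: sums only the items above the latest mark, then removes the
 * mark and its arguments and leaves the single result in their place. */
static void toy_call_sum(ToyInterp *in)
{
    size_t mark, i;
    int    sum = 0;

    assert(in->nmarks > 0);
    mark = in->marks[--in->nmarks];
    for (i = mark; i < in->sp; i++)
        sum += in->args[i];
    in->sp = mark;          /* drop the arguments ... */
    toy_push(in, sum);      /* ... and return the result */
}
```

Whatever the caller had on the stack below the mark is untouched when the call returns.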
1236
1237
1238 Save stack
1239
1240
1241 C has no equivalent of Perl's dynamically scoped local, so perl provides
1242 one. We've seen that ENTER and LEAVE are
1243 used as scoping braces; the save stack implements the C
1244 equivalent of, for example:
1245
1246
1247 {
1248 local $foo = 42;
1249 ...
1250 }
1251 See ``Localising Changes'' in perlguts for how to use the save stack.
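As a rough picture of what a save stack does, here's a toy version in plain C that can localise an int. It's a sketch under assumptions: toy_enter, toy_save_int and toy_leave are invented stand-ins, the scope is tracked by a returned index rather than a separate scope stack, and only one data type is supported.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

typedef struct {
    int *addr;   /* which variable was saved */
    int  old;    /* the value to put back */
} ToySave;

static ToySave toy_savestack[TOY_SAVE_MAX];
static size_t  toy_save_ix;            /* number of saved entries */

/* Open a scope: remember how deep the save stack is right now. */
static size_t toy_enter(void)
{
    return toy_save_ix;
}

/* Record the current value of *p so that leaving the scope restores it. */
static void toy_save_int(int *p)
{
    assert(p != NULL && toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Close a scope: undo every save made since the matching toy_enter. */
static void toy_leave(size_t floor)
{
    while (toy_save_ix > floor) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}
```

Saving a variable, assigning 42 to it and then leaving the scope puts the old value back, which is the behaviour of the local $foo = 42 block above.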
1252
1253
1254 __Millions of Macros__
1255
1256
1257 One thing you'll notice about the Perl source is that it's
1258 full of macros. Some have called the pervasive use of macros
1259 the hardest thing to understand, others find it adds to
1260 clarity. Let's take an example, the code which implements
1261 the addition operator:
1262
1263
1264 1 PP(pp_add)
1265 2 {
1266 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1267 4 {
1268 5 dPOPTOPnnrl_ul;
1269 6 SETn( left + right );
1270 7 RETURN;
1271 8 }
1272 9 }
1273 Every line here (apart from the braces, of course) contains a macro. The first line sets up the function declaration as Perl expects for PP code; line 3 sets up variable declarations for the argument stack and the target, the return value of the operation. Finally, it tries to see if the addition operation is overloaded; if so, the appropriate subroutine is called.
1274
1275
1276 Line 5 is another variable declaration - all variable
1277 declarations start with d - which pops from the top
1278 of the argument stack two NVs (hence nn) and puts
1279 them into the variables right and left,
1280 hence the rl. These are the two operands to the
1281 addition operator. Next, we call SETn to set the
1282 NV of the return value to the result of
1283 adding the two values. This done, we return - the
1284 RETURN macro makes sure that our return value is
1285 properly handled, and we pass the next operator to run back
1286 to the main run loop.
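Stripped of the macros, the pattern is: each PP function works on the stack and hands back the next op, and the run loop keeps calling until there is none. Here's a self-contained sketch of that pattern; ToyOp, ToyInterp, toy_pp_const, toy_pp_add and toy_runops are hypothetical names, the stack holds plain doubles, and there is no overloading or target handling.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp ToyOp;

typedef struct {
    double stack[16];   /* argument stack of plain numbers */
    size_t sp;          /* number of items on the stack */
} ToyInterp;

struct ToyOp {
    ToyOp *(*ppaddr)(ToyInterp *, ToyOp *);  /* function implementing the op */
    ToyOp  *next;                            /* op to run afterwards */
    double  value;                           /* used by constant ops only */
};

/* Push this op's constant and move on. */
static ToyOp *toy_pp_const(ToyInterp *in, ToyOp *op)
{
    assert(in->sp < 16);
    in->stack[in->sp++] = op->value;
    return op->next;
}

/* Pop the right operand, peek at the left one, and overwrite it with
 * the sum: the same pop-one, set-the-top shape as pp_add. */
static ToyOp *toy_pp_add(ToyInterp *in, ToyOp *op)
{
    double right, left;

    assert(in->sp >= 2);
    right = in->stack[--in->sp];
    left  = in->stack[in->sp - 1];
    in->stack[in->sp - 1] = left + right;
    return op->next;                 /* tell the run loop what comes next */
}

/* The run loop: call each op's function until one returns NULL. */
static void toy_runops(ToyInterp *in, ToyOp *op)
{
    while (op != NULL)
        op = op->ppaddr(in, op);
}
```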
1287
1288
1289 Most of these macros are explained in perlapi, and some of
1290 the more important ones are explained in perlxs as well. Pay
1291 special attention to ``Background and
1292 PERL_IMPLICIT_CONTEXT '' in perlguts for
1293 information on the [[pad]THX_? macros.
1294
1295
1296 __Poking at Perl__
1297
1298
1299 To really poke around with Perl, you'll probably want to
1300 build Perl for debugging, like this:
1301
1302
1303 ./Configure -d -D optimize=-g
1304 make
1305 -g is a flag to the C compiler to have it produce debugging information which will allow us to step through a running program. ''Configure'' will also turn on the DEBUGGING compilation symbol which enables all the internal debugging code in Perl. There are a whole bunch of things you can debug with this: perlrun lists them all, and the best way to find out about them is to play about with them. The most useful options are probably
1306
1307
1308 l Context (loop) stack processing
1309 t Trace execution
1310 o Method and overloading resolution
1311 c String/numeric conversions
1312 Some of the functionality of the debugging code can be achieved using XS modules.
1313
1314
1315 -Dr =
1316
1317
1318 __Using a source-level debugger__
1319
1320
1321 If the debugging output of -D doesn't help you,
1322 it's time to step through perl's execution with a
1323 source-level debugger.
1324
1325
1326 We'll use gdb for our examples here; the principles
1327 will apply to any debugger, but check the manual of the one
1328 you're using.
1329
1330
1331 To fire up the debugger, type
1332
1333
1334 gdb ./perl
1335 You'll want to do that in your Perl source tree so the debugger can read the source code. You should see the copyright message, followed by the prompt.
1336
1337
1338 (gdb)
1339 help will get you into the documentation, but here are the most useful commands:
1340
1341
1342 run [[args]
1343
1344
1345 Run the program with the given arguments.
1346
1347
1348 break function_name
1349
1350
1351 break source.c:xxx
1352
1353
1354 Tells the debugger that we'll want to pause execution when
1355 we reach either the named function (but see ``Internal
1356 Functions'' in perlguts!) or the given line in the named
1357 source file.
1358
1359
1360 step
1361
1362
1363 Steps through the program a line at a time.
1364
1365
1366 next
1367
1368
1369 Steps through the program a line at a time, without
1370 descending into functions.
1371
1372
1373 continue
1374
1375
1376 Run until the next breakpoint.
1377
1378
1379 finish
1380
1381
1382 Run until the end of the current function, then stop
1383 again.
1384
1385
1386 'enter'
1387
1388
1389 Just pressing Enter will do the most recent operation again
1390 - it's a blessing when stepping through miles of source
1391 code.
1392
1393
1394 print
1395
1396
1397 Execute the given C code and print its results.
1398 __WARNING__: Perl makes heavy use of
1399 macros, and ''gdb'' is not aware of macros. You'll have
1400 to substitute them yourself. So, for instance, you can't
1401 say
1402
1403
1404 print SvPV_nolen(sv)
1405 but you have to say
1406
1407
1408 print Perl_sv_2pv_nolen(sv)
1409 You may find it helpful to have a ``macro dictionary'', which you can produce by saying cpp -dM perl.c | sort. Even then, ''cpp'' won't recursively apply the macros for you.
1410
1411
1412 __Dumping Perl Data Structures__
1413
1414
1415 One way to get around this macro hell is to use the dumping
1416 functions in ''dump.c''; these work a little like an
1417 internal Devel::Peek, but they also cover OPs and other
1418 structures that you can't get at from Perl. Let's take an
1419 example. We'll use the $a = $b + $c we used before,
1420 but give it a bit of context by first storing a
1421 string in $b. Where's a good place to stop and poke
1422 around?
1423
1424
1425 What about pp_add, the function we examined earlier
1426 to implement the + operator:
1427
1428
1429 (gdb) break Perl_pp_add
1430 Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
1431 Notice we use Perl_pp_add and not pp_add - see ``Internal Functions'' in perlguts. With the breakpoint in place, we can run our program:
1432
1433
1434 (gdb) run -e '$b =
1435 Lots of junk will go past as gdb reads in the relevant source files and libraries, and then:
1436
1437
1438 Breakpoint 1, Perl_pp_add () at pp_hot.c:309
1439 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1440 (gdb) step
1441 311 dPOPTOPnnrl_ul;
1442 (gdb)
1443 We looked at this bit of code before, and we said that dPOPTOPnnrl_ul arranges for two NVs to be placed into left and right - let's slightly expand it:
1444
1445
1446 #define dPOPTOPnnrl_ul NV right = POPn; \
1447 SV *leftsv = TOPs; \
1448 NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
1449 POPn takes the SV from the top of the stack and obtains its NV either directly (if SvNOK is set) or by calling the sv_2nv function. TOPs takes the next SV from the top of the stack - yes, POPn uses TOPs - but doesn't remove it. We then use SvNV to get the NV from leftsv in the same way as before - yes, POPn uses SvNV.
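The "use the cached number if the flag says it's valid, otherwise convert" logic looks like this in miniature. This is a toy, not Perl's numification: ToySV, toy_SvNV, toy_sv_2nv and the TOY_* flags are hypothetical, strtod is used as a stand-in for Perl's own string-to-number rules (which differ in places), and the toy caches the result unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_POK = 1, TOY_NOK = 2 };   /* toy "string valid" / "number valid" */

typedef struct {
    unsigned    flags;
    const char *pv;   /* string value, meaningful if TOY_POK */
    double      nv;   /* numeric value, meaningful if TOY_NOK */
} ToySV;

static int toy_conversions;          /* how often the slow path ran */

/* Slow path: derive a number from the string and cache it. */
static double toy_sv_2nv(ToySV *sv)
{
    assert(sv != NULL);
    toy_conversions++;
    sv->nv = (sv->flags & TOY_POK) ? strtod(sv->pv, NULL) : 0.0;
    sv->flags |= TOY_NOK;
    return sv->nv;
}

/* Fast path first: only convert when no valid number is cached. */
static double toy_SvNV(ToySV *sv)
{
    return (sv->flags & TOY_NOK) ? sv->nv : toy_sv_2nv(sv);
}
```

A string with a leading 6 followed by junk numifies to 6 in this toy, and asking a second time doesn't convert again.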
1450
1451
1452 Since we don't have an NV for $b,
1453 we'll have to use sv_2nv to convert it. If we step
1454 again, we'll find ourselves there:
1455
1456
1457 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
1458 1669 if (!sv)
1459 (gdb)
1460 We can now use Perl_sv_dump to investigate the SV:
1461
1462
1463 SV = PV(0xa057cc0) at 0xa0675d0
1464 REFCNT = 1
1465 FLAGS = (POK,pPOK)
1466 PV = 0xa06a510
1467 We know we're going to get 6 from this, so let's finish the subroutine:
1468
1469
1470 (gdb) finish
1471 Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
1472 0x462669 in Perl_pp_add () at pp_hot.c:311
1473 311 dPOPTOPnnrl_ul;
1474 We can also dump out this op: the current op is always stored in PL_op, and we can dump it with Perl_op_dump. This'll give us similar output to B::Debug.
1475
1476
1477 {
1478 13 TYPE = add ===
1479
1480
1481 __Patching__
1482
1483
1484 All right, we've now had a look at how to navigate the Perl
1485 sources and some things you'll need to know when fiddling
1486 with them. Let's now get on and create a simple patch.
1487 Here's something Larry suggested: if a U is the
1488 first active format during a pack (for example,
1489 when the template starts with U), then the resulting
1490 string should be treated as UTF8
1491 encoded.
1492
1493
1494 How do we prepare to fix this up? First we locate the code
1495 in question - the pack happens at runtime, so it's
1496 going to be in one of the ''pp'' files. Sure enough,
1497 pp_pack is in ''pp.c''. Since we're going to be
1498 altering this file, let's copy it to
1499 ''pp.c~''.
1500
1501
1502 Now let's look over pp_pack: we take a pattern into
1503 pat, and then loop over the pattern, taking each
1504 format character in turn into datum_type. Then for
1505 each possible format character, we swallow up the other
1506 arguments in the pattern (a field width, an asterisk, and so
1507 on) and convert the next chunk of input into the specified
1508 format, adding it onto the output SV
1509 cat.
1510
1511
1512 How do we know if the U is the first format in the
1513 pat? Well, if we have a pointer to the start of
1514 pat then, if we see a U we can test
1515 whether we're still at the start of the string. So, here's
1516 where pat is set up:
1517
1518
1519 STRLEN fromlen;
1520 register char *pat = SvPVx(*++MARK, fromlen);
1521 register char *patend = pat + fromlen;
1522 register I32 len;
1523 I32 datumtype;
1524 SV *fromstr;
1525 We'll have another string pointer in there:
1526
1527
1528 STRLEN fromlen;
1529 register char *pat = SvPVx(*++MARK, fromlen);
1530 register char *patend = pat + fromlen;
1531 + char *patcopy;
1532 register I32 len;
1533 I32 datumtype;
1534 SV *fromstr;
1535 And just before we start the loop, we'll set patcopy to be the start of pat:
1536
1537
1538 items = SP - MARK;
1539 MARK++;
1540 sv_setpvn(cat,
1541 Now if we see a U which was at the start of the string, we turn on the UTF8 flag for the output SV , cat:
1542
1543
1544 + if (datumtype == 'U' && pat == patcopy+1) SvUTF8_on(cat);
1545 Remember that it has to be patcopy+1 because the first character of the string is the U which has been swallowed into datumtype!
1546
1547
1548 Oops, we forgot one thing: what if there are spaces at the
1549 start of the pattern? A template with spaces before the U
1550 will have U as the first active character, even
1551 though it's not the first thing in the pattern. In this
1552 case, we have to advance patcopy along with
1553 pat when we see spaces:
1554
1555
1556 if (isSPACE(datumtype))
1557 continue;
1558 needs to become
1559
1560
1561 if (isSPACE(datumtype)) {
1562 patcopy++;
1563 continue;
1564 }
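The pointer bookkeeping is easier to check in isolation. The function below is a hypothetical stand-alone model of just the first-active-format test, not the code in pp_pack: toy_first_format_is_U is an invented name, isspace stands in for isSPACE, and the loop ignores counts and arguments entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if 'U' is the first non-space character of the template
 * pat (len bytes long), 0 otherwise. */
static int toy_first_format_is_U(const char *pat, size_t len)
{
    const char *patend  = pat + len;
    const char *patcopy = pat;   /* start of the pattern, moved past spaces */
    int         utf8    = 0;

    assert(pat != NULL);
    while (pat < patend) {
        int datumtype = (unsigned char)*pat++;

        if (isspace(datumtype)) {
            patcopy++;           /* keep patcopy in step over whitespace */
            continue;
        }
        /* pat has already moved past the format character, hence the +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;
        /* a real pack would now consume counts and convert arguments */
    }
    return utf8;
}
```

Because patcopy only advances on spaces while pat advances on every character, the two can only be one apart when nothing but whitespace came before the U.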
1565 OK. That's the C part done. Now we must do two additional things before this patch is ready to go: we've changed the behaviour of Perl, and so we must document that change. We must also provide some more regression tests to make sure our patch works and doesn't create a bug somewhere else along the line.
1566
1567
1568 The regression tests for each operator live in ''t/op/'',
1569 and so we make a copy of ''t/op/pack.t'' to
1570 ''t/op/pack.t~''. Now we can add our tests to the end.
1571 First, we'll test that the U does indeed create
1572 Unicode strings:
1573
1574
1575 print 'not ' unless
1576 Now we'll test that we got that space-at-the-beginning business right:
1577
1578
1579 print 'not ' unless
1580 And finally we'll test that we don't make Unicode strings if U is __not__ the first active format:
1581
1582
1583 print 'not ' unless v1.20.300.4000 ne
1584 sprintf
1585 Mustn't forget to change the number of tests which appears at the top, or else the automated tester will get confused:
1586
1587
1588 -print
1589 We now compile up Perl, and run it through the test suite. Our new tests pass, hooray!
1590
1591
1592 Finally, the documentation. The job is never done until the
1593 paperwork is over, so let's describe the change we've just
1594 made. The relevant place is ''pod/perlfunc.pod''; again,
1595 we make a copy, and then we'll insert this text in the
1596 description of pack:
1597
1598
1599 =item *
1600 If the pattern begins with a C
1601 All done. Now let's create the patch. ''Porting/patching.pod'' tells us that if we're making major changes, we should copy the entire directory to somewhere safe before we begin fiddling, and then do
1602
1603
1604 diff -ruN old new
1605 However, we know which files we've changed, and we can simply do this:
1606
1607
1608 diff -u pp.c~ pp.c
1609 We end up with a patch looking a little like this:
1610
1611
1612 --- pp.c~ Fri Jun 02 04:34:10 2000
1613 +++ pp.c Fri Jun 16 11:37:25 2000
1614 @@ -4375,6 +4375,7 @@
1615 register I32 items;
1616 STRLEN fromlen;
1617 register char *pat = SvPVx(*++MARK, fromlen);
1618 + char *patcopy;
1619 register char *patend = pat + fromlen;
1620 register I32 len;
1621 I32 datumtype;
1622 @@ -4405,6 +4406,7 @@
1623 ...
1624 And finally, we submit it, with our rationale, to perl5-porters. Job done!
1625 !!EXTERNAL TOOLS FOR DEBUGGING PERL
1626
1627
1628 Sometimes it helps to use external tools while debugging and
1629 testing Perl. This section tries to guide you through using
1630 some common testing and debugging tools with Perl. This is
1631 meant as a guide to interfacing these tools with Perl, not
1632 as any kind of guide to the use of the tools
1633 themselves.
1634
1635
1636 __Rational Software's Purify__
1637
1638
1639 Purify is a commercial tool that is helpful in identifying
1640 memory overruns, wild pointers, memory leaks and other such
1641 badness. Perl must be compiled in a specific way for optimal
1642 testing with Purify. Purify is available under Windows
1643 NT, Solaris, HP-UX,
1644 SGI, and Siemens Unix.
1645
1646
1647 The only currently known leaks happen when there are
1648 compile-time errors within eval or require. (Fixing these is
1649 non-trivial, unfortunately, but they must be fixed
1650 eventually.)
1651
1652
1653 __Purify on Unix__
1654
1655
1656 On Unix, Purify creates a new Perl binary. To get the most
1657 benefit out of Purify, you should create the perl to Purify
1658 using:
1659
1660
1661 sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
1662 -Uusemymalloc -Dusemultiplicity
1663 where these arguments mean:
1664
1665
1666 -Accflags=-DPURIFY
1667
1668
1669 Disables Perl's arena memory allocation functions, as well
1670 as forcing use of memory allocation functions derived from
1671 the system malloc.
1672
1673
1674 -Doptimize='-g'
1675
1676
1677 Adds debugging information so that you see the exact source
1678 statements where the problem occurs. Without this flag, all
1679 you will see is the source filename of where the error
1680 occurred.
1681
1682
1683 -Uusemymalloc
1684
1685
1686 Disable Perl's malloc so that Purify can more closely
1687 monitor allocations and leaks. Using Perl's malloc will make
1688 Purify report most leaks in the ``potential'' leaks
1689 category.
1690
1691
1692 -Dusemultiplicity
1693
1694
1695 Enabling the multiplicity option allows perl to clean up
1696 thoroughly when the interpreter shuts down, which reduces
1697 the number of bogus leak reports from Purify.
1698
1699
1700 Once you've compiled a perl suitable for Purify'ing, then
1701 you can just:
1702
1703
1704 make pureperl
1705 which creates a binary named 'pureperl' that has been Purify'ed. This binary is used in place of the standard 'perl' binary when you want to debug Perl memory problems.
1706
1707
1708 As an example, to show any memory leaks produced during the
1709 standard Perl testset you would create and run the Purify'ed
1710 perl as:
1711
1712
1713 make pureperl
1714 cd t
1715 ../pureperl -I../lib harness
1716 which would run Perl on test.pl and report any memory problems.
1717
1718
1719 Purify outputs messages in ``Viewer'' windows by default. If
1720 you don't have a windowing environment or if you simply want
1721 the Purify output to unobtrusively go to a log file instead
1722 of to the interactive window, use the following options to
1723 output to the log file ``perl.log'':
1724
1725
1726 setenv PURIFYOPTIONS
1727 If you plan to use the ``Viewer'' windows, then you only need this option:
1728
1729
1730 setenv PURIFYOPTIONS
1731
1732
1733 __Purify on NT__
1734
1735
1736 Purify on Windows NT instruments the Perl
1737 binary 'perl.exe' on the fly. There are several options in
1738 the makefile you should change to get the most use out of
1739 Purify:
1740
1741
1742 DEFINES
1743
1744
1745 You should add -DPURIFY to the DEFINES line
1746 so the DEFINES line looks something
1747 like:
1748
1749
1750 DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
1751 to disable Perl's arena memory allocation functions, as well as to force use of memory allocation functions derived from the system malloc.
1752
1753
1754 USE_MULTI = define
1755
1756
1757 Enabling the multiplicity option allows perl to clean up
1758 thoroughly when the interpreter shuts down, which reduces
1759 the number of bogus leak reports from Purify.
1760
1761
1762 #PERL_MALLOC = define
1763
1764
1765 Disable Perl's malloc so that Purify can more closely
1766 monitor allocations and leaks. Using Perl's malloc will make
1767 Purify report most leaks in the ``potential'' leaks
1768 category.
1769
1770
1771 CFG = Debug
1772
1773
1774 Adds debugging information so that you see the exact source
1775 statements where the problem occurs. Without this flag, all
1776 you will see is the source filename of where the error
1777 occurred.
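Taken together, the four settings described above would leave the relevant makefile lines looking roughly like this. It is a fragment only, assuming the variables appear in your makefile under exactly these names; everything else in the file stays as it is.

```makefile
# Add -DPURIFY so Perl's arena allocation is disabled and
# system-malloc-derived allocation functions are used
DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1

# Multiplicity lets perl clean up thoroughly at interpreter shutdown
USE_MULTI = define

# Perl's malloc left commented out so Purify can track allocations
#PERL_MALLOC = define

# Debug build, so reports point at exact source statements
CFG = Debug
```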
1778
1779
1780 As an example, to show any memory leaks produced during the
1781 standard Perl testset you would create and run Purify
1782 as:
1783
1784
1785 cd win32
1786 make
1787 cd ../t
1788 purify ../perl -I../lib harness
1789 which would instrument Perl in memory, run Perl on test.pl, then finally report any memory problems.
1790
1791
1792 __CONCLUSION__
1793
1794
1795 We've had a brief look around the Perl source, an overview
1796 of the stages ''perl'' goes through when it's running
1797 your code, and how to use a debugger to poke at the Perl
1798 guts. We took a very simple problem and demonstrated how to
1799 solve it fully - with documentation, regression tests, and
1800 finally a patch for submission to p5p. We also talked
1801 about how to use external tools to debug and test
1802 Perl.
1803
1804
1805 I'd now suggest you read over those references again, and
1806 then, as soon as possible, get your hands dirty. The best
1807 way to learn is by doing, so:
1808
1809
1810 Subscribe to perl5-porters, follow the patches and try and
1811 understand them; don't be afraid to ask if there's a portion
1812 you're not clear on - who knows, you may unearth a bug in
1813 the patch...
1814
1815
1816 Keep up to date with the bleeding edge Perl distributions
1817 and get familiar with the changes. Try and get an idea of
1818 what areas people are working on and the changes they're
1819 making.
1820
1821
1822 Do read the README associated with your
1823 operating system, e.g. README.aix on the
1824 IBM AIX OS. Don't hesitate to supply patches
1825 to that README if you find anything missing
1826 or changed over a new OS
1827 release.
1828
1829
1830 Find an area of Perl that seems interesting to you, and see
1831 if you can work out how it works. Scan through the source,
1832 and step over it in the debugger. Play, poke, investigate,
1833 fiddle! You'll probably get to understand not just your
1834 chosen area but a much wider range of ''perl'''s activity
1835 as well, and probably sooner than you'd think.
1836
1837
1838 ''The Road goes ever on and on, down from the door where it
1839 began.''
1840
1841
1842 If you can do these things, you've started on the long road
1843 to Perl porting. Thanks for wanting to help make Perl better
1844 - and happy hacking!
1845 !!AUTHOR
1846
1847
1848 This document was written by Nathan Torkington, and is
1849 maintained by the perl5-porters mailing list.
1850 ----