Annotated edit history of perlguts(1) version 2, including all changes.
Rev Author # Line
1 perry 1 PERLGUTS
2 !!!PERLGUTS
3 NAME
4 DESCRIPTION
5 Variables
6 Subroutines
7 Compiled code
8 Examining internal data structures with the dump functions
9 How multiple interpreters and concurrency are supported
10 Internal Functions
11 Unicode Support
12 AUTHORS
13 SEE ALSO
14 ----
15 !!NAME
16
17
18 perlguts - Introduction to the Perl API
19 !!DESCRIPTION
20
21
22 This document attempts to describe how to use the Perl
23 API, and also contains some information on the
24 basic workings of the Perl core. It is far from complete and
25 probably contains many errors. Please refer any questions or
26 comments to the author below.
27 !!Variables
28
29
30 __Datatypes__
31
32
33 Perl has three typedefs that handle Perl's three main data
34 types:
35
36
37 SV Scalar Value
38 AV Array Value
39 HV Hash Value
40 Each typedef has specific routines that manipulate the various data types.
41
42
43 __What is an `` IV ''?__
44
45
46 Perl uses a special typedef IV which is a
47 simple signed integer type that is guaranteed to be large
48 enough to hold a pointer (as well as an integer).
49 Additionally, there is the UV , which is
50 simply an unsigned IV .
51
52
53 Perl also uses two special typedefs, I32 and I16, which will
54 always be at least 32 bits and 16 bits long, respectively.
55 (Again, there are U32 and U16, as well.)
56
57
58 __Working with SVs__
59
60
61 An SV can be created and loaded with one
62 command. There are four types of values that can be loaded:
63 an integer value ( IV ), a double (
64 NV ), a string ( PV ), and
65 another scalar ( SV ).
66
67
68 The six routines are:
69
70
71 SV* newSViv(IV);
72 SV* newSVnv(double);
73 SV* newSVpv(const char*, int);
74 SV* newSVpvn(const char*, int);
75 SV* newSVpvf(const char*, ...);
76 SV* newSVsv(SV*);
77 To change the value of an *already-existing* SV, there are eight routines:
78
79
80 void sv_setiv(SV*, IV);
81 void sv_setuv(SV*, UV);
82 void sv_setnv(SV*, double);
83 void sv_setpv(SV*, const char*);
84 void sv_setpvn(SV*, const char*, int);
85 void sv_setpvf(SV*, const char*, ...);
86 void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
87 void sv_setsv(SV*, SV*);
88 Notice that you can choose to specify the length of the string to be assigned by using sv_setpvn, newSVpvn, or newSVpv, or you may allow Perl to calculate the length by using sv_setpv or by specifying 0 as the second argument to newSVpv. Be warned, though, that Perl will determine the string's length by using strlen, which depends on the string terminating with a NUL character.
89
90
91 The arguments of sv_setpvf are processed like
92 sprintf, and the formatted output becomes the
93 value.
94
95
96 sv_setpvfn is an analogue of vsprintf, but
97 it allows you to specify either a pointer to a variable
98 argument list or the address and length of an array of SVs.
99 The last argument points to a boolean; on return, if that
100 boolean is true, then locale-specific information has been
101 used to format the string, and the string's contents are
102 therefore untrustworthy (see perlsec). This pointer may be
103 NULL if that information is not important.
104 Note that this function requires you to specify the length
105 of the format.
106
107
108 STRLEN is an integer type (Size_t, usually
109 defined as size_t in config.h) guaranteed to be large enough
110 to represent the size of any string that perl can
111 handle.
112
113
114 The sv_set*() functions are not generic enough to
115 operate on values that have ``magic''. See ``Magic Virtual
116 Tables'' later in this document.
117
118
119 All SVs that contain strings should be terminated with a
120 NUL character. If a string is not NUL-terminated,
121 there is a risk of core dumps and corruption from code
122 that passes the string to C functions or system calls that
123 expect a NUL-terminated string. Perl's own functions
124 typically add a trailing NUL for this reason.
125 Nevertheless, you should be very careful when you pass a
126 string stored in an SV to a C function or
127 system call.
128
129
130 To access the actual value that an SV points
131 to, you can use the macros:
132
133
134 SvIV(SV*)
135 SvUV(SV*)
136 SvNV(SV*)
137 SvPV(SV*, STRLEN len)
138 SvPV_nolen(SV*)
139 which will automatically coerce the actual scalar type into an IV, UV, double, or string.
140
141
142 In the SvPV macro, the length of the string
143 returned is placed into the variable len (this is a
144 macro, so you do ''not'' use ). If you
145 do not care what the length of the data is, use the
146 SvPV_nolen macro. Historically the SvPV
147 macro with the global variable PL_na has been used
148 in this case. But that can be quite inefficient because
149 PL_na must be accessed in thread-local storage in
150 threaded Perl. In any case, remember that Perl allows
151 arbitrary strings of data that may both contain NULs and
152 might not be terminated by a NUL.
154
155
156 Also remember that C doesn't allow you to safely say
157 foo(SvPV(s, len), len);. It might work with your
158 compiler, but it won't work for everyone. Break this sort of
159 statement up into separate assignments:
160
161
162 SV *s;
163 STRLEN len;
164 char * ptr;
165 ptr = SvPV(s, len);
166 foo(ptr, len);
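The underlying reason is that C does not specify the order in which function arguments are evaluated, so len could be read before the macro has stored into it. The following is a standalone sketch of the same pattern; toy_sv, GET_PV and count_bytes are hypothetical stand-ins invented for illustration, not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV that holds a string. */
typedef struct {
    const char *ptr;   /* string data */
    size_t      cur;   /* current length in bytes */
} toy_sv;

/* Like SvPV in spirit: yields the pointer and, as a side effect, stores
   the length in lenvar. It is a macro, so lenvar is named directly. */
#define GET_PV(sv, lenvar) ((lenvar) = (sv)->cur, (sv)->ptr)

/* Hypothetical consumer that wants both the pointer and the length. */
size_t count_bytes(const char *p, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == c)
            n++;
    return n;
}

/* Safe form: the macro finishes in its own statement, so len is
   known to be set before it is passed on as an argument. */
size_t count_in_sv(const toy_sv *sv, char c)
{
    size_t len;
    const char *ptr = GET_PV(sv, len);
    assert(ptr != NULL);
    return count_bytes(ptr, len, c);
}
```

Writing count_bytes(GET_PV(sv, len), len, c) instead would leave it up to the compiler whether the second argument sees the old or the new value of len.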
167 If you want to know if the scalar value is TRUE , you can use:
168
169
170 SvTRUE(SV*)
171 Although Perl will automatically grow strings for you, if you need to force Perl to allocate more memory for your SV , you can use the macro
172
173
174 SvGROW(SV*, STRLEN newlen)
175 which will determine if more memory needs to be allocated. If so, it will call the function sv_grow. Note that SvGROW can only increase, not decrease, the allocated memory of an SV and that it does not automatically add a byte for a trailing NUL (perl's own string functions typically do SvGROW(sv, len + 1)).
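The grow-only behaviour and the extra byte for the NUL can be illustrated with a small standalone model. This is only a sketch: toy_str, toy_grow and toy_catpvn are hypothetical names, and the real sv_grow may choose its allocation sizes differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string body: pointer, current length, allocated bytes. */
typedef struct {
    char  *pvx;
    size_t cur;
    size_t len;
} toy_str;

/* Like SvGROW in spirit: reallocate only if newlen exceeds the
   current allocation; never shrink. Returns NULL on failure. */
char *toy_grow(toy_str *s, size_t newlen)
{
    if (newlen > s->len) {
        char *p = realloc(s->pvx, newlen);
        if (p == NULL)
            return NULL;
        s->pvx = p;
        s->len = newlen;
    }
    return s->pvx;
}

/* Append n bytes, asking for cur + n + 1 so the trailing NUL fits. */
int toy_catpvn(toy_str *s, const char *src, size_t n)
{
    assert(src != NULL || n == 0);
    if (toy_grow(s, s->cur + n + 1) == NULL)
        return -1;
    if (n > 0)
        memcpy(s->pvx + s->cur, src, n);
    s->cur += n;
    s->pvx[s->cur] = '\0';
    return 0;
}
```

The caller, not toy_grow, adds the one byte for the terminator, mirroring the SvGROW(sv, len + 1) idiom mentioned above.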
176
177
178 If you have an SV and want to know what kind
179 of data Perl thinks is stored in it, you can use the
180 following macros to check the type of SV you
181 have.
182
183
184 SvIOK(SV*)
185 SvNOK(SV*)
186 SvPOK(SV*)
187 You can get and set the current length of the string stored in an SV with the following macros:
188
189
190 SvCUR(SV*)
191 SvCUR_set(SV*, I32 val)
192 You can also get a pointer to the end of the string stored in the SV with the macro:
193
194
195 SvEND(SV*)
196 But note that these last three macros are valid only if SvPOK() is true.
197
198
199 If you want to append something to the end of the string stored
200 in an SV*, you can use the following
201 functions:
202
203
204 void sv_catpv(SV*, const char*);
205 void sv_catpvn(SV*, const char*, STRLEN);
206 void sv_catpvf(SV*, const char*, ...);
207 void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
208 void sv_catsv(SV*, SV*);
209 The first function calculates the length of the string to be appended by using strlen. In the second, you specify the length of the string yourself. The third function processes its arguments like sprintf and appends the formatted output. The fourth function works like vsprintf. You can specify the address and length of an array of SVs instead of the va_list argument. The fifth function extends the string stored in the first SV with the string stored in the second SV . It also forces the second SV to be interpreted as a string.
210
211
212 The sv_cat*() functions are not generic enough to
213 operate on values that have ``magic''. See ``Magic Virtual
214 Tables'' later in this document.
215
216
217 If you know the name of a scalar variable, you can get a
218 pointer to its SV by using the
219 following:
220
221
222 SV* get_sv(
223 This returns NULL if the variable does not exist.
224
225
226 If you want to know if this variable (or any other
227 SV ) is actually defined, you can
228 call:
229
230
231 SvOK(SV*)
232 The scalar undef value is stored in an SV instance called PL_sv_undef. Its address can be used whenever an SV* is needed.
233
234
235 There are also the two values PL_sv_yes and
236 PL_sv_no, which contain Boolean TRUE
237 and FALSE values, respectively. Like
238 PL_sv_undef, their addresses can be used whenever
239 an SV* is needed.
240
241
242 Do not be fooled into thinking that (SV *) 0 is the
243 same as . Take this
244 code:
245
246
247 SV* sv = (SV*) 0;
248 if (I-am-to-return-a-real-value) {
249 sv = sv_2mortal(newSViv(42));
250 }
251 sv_setsv(ST(0), sv);
252 This code tries to return a new SV (which contains the value 42) if it should return a real value, or undef otherwise. Instead it has returned a NULL pointer which, somewhere down the line, will cause a segmentation violation, bus error, or just weird results. Change the zero to in the first line and all will be well.
253
254
255 To free an SV that you've created, call
256 SvREFCNT_dec(SV*). Normally this call is not
257 necessary (see ``Reference Counts and
258 Mortality'').
259
260
261 __Offsets__
262
263
264 Perl provides the function sv_chop to efficiently
265 remove characters from the beginning of a string; you give
266 it an SV and a pointer to somewhere inside
267 the PV, and it discards everything
268 before the pointer. The efficiency comes by means of a
269 little hack: instead of actually removing the characters,
270 sv_chop sets the flag OOK (offset
271 OK ) to signal to other functions that the
272 offset hack is in effect, and it puts the number of bytes
273 chopped off into the IV field of the
274 SV . It then moves the PV
275 pointer (called SvPVX) forward that many bytes, and
276 adjusts SvCUR and SvLEN.
277
278
279 Hence, at this point, the start of the buffer that we
280 allocated lives at SvPVX(sv) - SvIV(sv) in memory
281 and the PV pointer is pointing into the
282 middle of this allocated storage.
283
284
285 This is best demonstrated by example:
286
287
288 % ./perl -Ilib -MDevel::Peek -le '$a=
289 Here the number of bytes chopped off (1) is put into IV , and Devel::Peek::Dump helpfully reminds us that this is an offset. The portion of the string between the ``real'' and the ``fake'' beginnings is shown in parentheses, and the values of SvCUR and SvLEN reflect the fake beginning, not the real one.
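The bookkeeping behind the offset hack can be modelled in a few lines of plain C. The sketch below is not Perl's implementation: toy_str and its functions are hypothetical, with field names chosen to echo SvPVX, SvCUR, SvLEN and the IV slot that holds the offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string body, not Perl's actual layout. */
typedef struct {
    char  *pvx;     /* visible start of the string */
    size_t cur;     /* visible length, excluding the trailing NUL */
    size_t len;     /* bytes allocated from pvx onward */
    size_t offset;  /* bytes chopped so far; 0 means no offset in effect */
} toy_str;

int toy_init(toy_str *s, const char *text)
{
    size_t n = strlen(text);
    s->pvx = malloc(n + 1);
    if (s->pvx == NULL)
        return -1;
    memcpy(s->pvx, text, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->offset = 0;
    return 0;
}

/* Discard everything before ptr without moving any bytes. */
void toy_chop(toy_str *s, const char *ptr)
{
    assert(ptr >= s->pvx && ptr <= s->pvx + s->cur);
    size_t delta = (size_t)(ptr - s->pvx);
    s->offset += delta;
    s->pvx    += delta;
    s->cur    -= delta;
    s->len    -= delta;
}

/* The real start of the allocation, needed when freeing. */
char *toy_real_start(const toy_str *s)
{
    return s->pvx - s->offset;
}

void toy_free(toy_str *s)
{
    free(toy_real_start(s));
    s->pvx = NULL;
    s->cur = s->len = s->offset = 0;
}
```

Chopping one byte from "12345" leaves pvx pointing at "2345" with cur 4, len 5 and offset 1, while toy_real_start still yields the pointer that malloc returned.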
290
291
292 Something similar to the offset hack is performed on AVs to
293 enable efficient shifting and splicing off the beginning of
294 the array; while AvARRAY points to the first
295 element in the array that is visible from Perl,
296 AvALLOC points to the real start of the C array.
297 These are usually the same, but a shift operation
298 can be carried out by increasing AvARRAY by one and
299 decreasing AvFILL and AvLEN. Again, the
300 location of the real start of the C array only comes into
301 play when freeing the array. See av_shift in
302 ''av.c''.
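The array variant of the trick can be sketched the same way. This is a simplified model with int elements rather than SV pointers; toy_av and its functions are hypothetical, and the field names merely echo AvALLOC, AvARRAY, AvFILL and AvLEN as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array header. */
typedef struct {
    int      *alloc;  /* real start of the C array */
    int      *array;  /* first element visible to the user */
    ptrdiff_t fill;   /* highest visible index, -1 when empty */
    ptrdiff_t len;    /* usable slots counted from array */
} toy_av;

void toy_av_init(toy_av *av, int *storage, ptrdiff_t count, ptrdiff_t capacity)
{
    assert(count <= capacity);
    av->alloc = storage;
    av->array = storage;
    av->fill  = count - 1;
    av->len   = capacity;
}

/* Remove and return the first element in constant time:
   no element is copied, only the visible window moves. */
int toy_av_shift(toy_av *av)
{
    assert(av->fill >= 0);
    int first = av->array[0];
    av->array++;
    av->fill--;
    av->len--;
    return first;
}
```

After one shift, array is one element past alloc; only alloc would be handed back to the allocator.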
303
304
305 __What's Really Stored in an SV?__
307
308
309 Recall that the usual method of determining the type of
310 scalar you have is to use the Sv*OK macros. Because a
311 scalar can be both a number and a string, these
312 macros will usually return TRUE, and calling
313 the Sv*V macros will do the appropriate conversion
314 of string to integer/double or integer/double to
315 string.
316
317
318 If you ''really'' need to know if you have an integer,
319 double, or string pointer in an SV , you can
320 use the following three macros instead:
321
322
323 SvIOKp(SV*)
324 SvNOKp(SV*)
325 SvPOKp(SV*)
326 These will tell you if you truly have an integer, double, or string pointer stored in your SV . The ``p'' stands for private.
327
328
329 In general, though, it's best to use the Sv*V
330 macros.
331
332
333 __Working with AVs__
334
335
336 There are two ways to create and load an AV .
337 The first method creates an empty
338 AV:
339
340
341 AV* newAV();
342 The second method both creates the AV and initially populates it with SVs:
343
344
345 AV* av_make(I32 num, SV **ptr);
346 The second argument points to an array containing num SV*'s. Once the AV has been created, the SVs can be destroyed, if so desired.
347
348
349 Once the AV has been created, the following
350 operations are possible on AVs:
351
352
353 void av_push(AV*, SV*);
354 SV* av_pop(AV*);
355 SV* av_shift(AV*);
356 void av_unshift(AV*, I32 num);
357 These should be familiar operations, with the exception of av_unshift. This routine adds num elements at the front of the array with the undef value. You must then use av_store (described below) to assign values to these new elements.
358
359
360 Here are some other functions:
361
362
363 I32 av_len(AV*);
364 SV** av_fetch(AV*, I32 key, I32 lval);
365 SV** av_store(AV*, I32 key, SV* val);
366 The av_len function returns the highest index value in the array (just like $#array in Perl). If the array is empty, -1 is returned. The av_fetch function returns the value at index key, but if lval is non-zero, then av_fetch will store an undef value at that index. The av_store function stores the value val at index key, and does not increment the reference count of val. Thus the caller is responsible for taking care of that, and if av_store returns NULL, the caller will have to decrement the reference count to avoid a memory leak. Note that av_fetch and av_store both return SV**'s, not SV*'s.
367
368
369 void av_clear(AV*);
370 void av_undef(AV*);
371 void av_extend(AV*, I32 key);
372 The av_clear function deletes all the elements in the AV* array, but does not actually delete the array itself. The av_undef function will delete all the elements in the array plus the array itself. The av_extend function extends the array so that it contains at least key+1 elements. If key+1 is less than the currently allocated length of the array, then nothing is done.
373
374
375 If you know the name of an array variable, you can get a
376 pointer to its AV by using the
377 following:
378
379
380 AV* get_av(
381 This returns NULL if the variable does not exist.
382
383
384 See ``Understanding the Magic of Tied Hashes and Arrays''
385 for more information on how to use the array access
386 functions on tied arrays.
387
388
389 __Working with HVs__
390
391
392 To create an HV , you use the following
393 routine:
394
395
396 HV* newHV();
397 Once the HV has been created, the following operations are possible on HVs:
398
399
400 SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
401 SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
402 The klen parameter is the length of the key being passed in (Note that you cannot pass 0 in as a value of klen to tell Perl to measure the length of the key). The val argument contains the SV pointer to the scalar being stored, and hash is the precomputed hash value (zero if you want hv_store to calculate it for you). The lval parameter indicates whether this fetch is actually a part of a store operation, in which case a new undefined value will be added to the HV with the supplied key and hv_fetch will return as if the value had already existed.
403
404
405 Remember that hv_store and hv_fetch return
406 SV**'s and not just SV*. To access the
407 scalar value, you must first dereference the return value.
408 However, you should check to make sure that the return value
409 is not NULL before dereferencing
410 it.
411
412
413 These two functions check whether a hash table entry exists and
414 delete it, respectively.
415
416
417 bool hv_exists(HV*, const char* key, U32 klen);
418 SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
419 If flags does not include the G_DISCARD flag then hv_delete will create and return a mortal copy of the deleted value.
420
421
422 And more miscellaneous functions:
423
424
425 void hv_clear(HV*);
426 void hv_undef(HV*);
427 Like their AV counterparts, hv_clear deletes all the entries in the hash table but does not actually delete the hash table. The hv_undef deletes both the entries and the hash table itself.
428
429
430 Perl keeps the actual data in a linked list of structures with
431 a typedef of HE . These contain the actual
432 key and value pointers (plus extra administrative overhead).
433 The key is a string pointer; the value is an SV*.
434 However, once you have an HE*, to get the actual
435 key and value, use the routines specified
436 below.
437
438
439 I32 hv_iterinit(HV*);
440 /* Prepares starting point to traverse hash table */
441 HE* hv_iternext(HV*);
442 /* Get the next entry, and return a pointer to a
443 structure that has both the key and value */
444 char* hv_iterkey(HE* entry, I32* retlen);
445 /* Get the key from an HE structure and also return
446 the length of the key string */
447 SV* hv_iterval(HV*, HE* entry);
448 /* Return a SV pointer to the value of the HE
449 structure */
450 SV* hv_iternextsv(HV*, char** key, I32* retlen);
451 /* This convenience routine combines hv_iternext,
452 hv_iterkey, and hv_iterval. The key and retlen
453 arguments are return values for the key and its
454 length. The value is returned as the SV* result */
455 If you know the name of a hash variable, you can get a pointer to its HV by using the following:
456
457
458 HV* get_hv(
459 This returns NULL if the variable does not exist.
460
461
462 The hash algorithm is defined in the PERL_HASH(hash,
463 key, klen) macro:
464
465
466 hash = 0;
467 while (klen--)
468 hash = (hash * 33) + *key++;
469 hash = hash + (hash
470 The last step was added in version 5.6 to improve distribution of lower bits in the resulting hash value.
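As a standalone illustration, the algorithm can be written as an ordinary C function. The listing above is cut off in its last line, so the final mixing step used here, hash + (hash >> 5), is an assumption; toy_perl_hash is a hypothetical name and not the PERL_HASH macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-33 string hash following the listing above.
   ASSUMPTION: the truncated final step is hash + (hash >> 5).
   For bytes above 127 the result depends on whether plain char
   is signed on the platform. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    assert(key != NULL || klen == 0);
    uint32_t hash = 0;
    while (klen--)
        hash = (hash * 33) + (uint32_t)*key++;
    hash = hash + (hash >> 5);   /* fold high bits into the low bits */
    return hash;
}
```

Under that assumption, the one-byte key "a" (byte value 97) hashes to 97 + (97 >> 5) = 100.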
471
472
473 See ``Understanding the Magic of Tied Hashes and Arrays''
474 for more information on how to use the hash access functions
475 on tied hashes.
476
477
478 __Hash API Extensions__
479
480
481 Beginning with version 5.004, the following functions are
482 also supported:
483
484
485 HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
486 HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
487 bool hv_exists_ent (HV* tb, SV* key, U32 hash);
488 SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
489 SV* hv_iterkeysv (HE* entry);
490 Note that these functions take SV* keys, which simplifies writing of extension code that deals with hash structures. These functions also allow passing of SV* keys to tie functions without forcing you to stringify the keys (unlike the previous set of functions).
491
492
493 They also return and accept whole hash entries
494 (HE*), making their use more efficient (since the
495 hash number for a particular string doesn't have to be
496 recomputed every time). See perlapi for detailed
497 descriptions.
498
499
500 The following macros must always be used to access the
501 contents of hash entries. Note that the arguments to these
502 macros must be simple variables, since they may get
503 evaluated more than once. See perlapi for detailed
504 descriptions of these macros.
505
506
507 HePV(HE* he, STRLEN len)
508 HeVAL(HE* he)
509 HeHASH(HE* he)
510 HeSVKEY(HE* he)
511 HeSVKEY_force(HE* he)
512 HeSVKEY_set(HE* he, SV* sv)
513 These two lower level macros are defined, but must only be used when dealing with keys that are not SV*s:
514
515
516 HeKEY(HE* he)
517 HeKLEN(HE* he)
518 Note that both hv_store and hv_store_ent do not increment the reference count of the stored val, which is the caller's responsibility. If these functions return a NULL value, the caller will usually have to decrement the reference count of val to avoid a memory leak.
519
520
521 __References__
522
523
524 References are a special type of scalar that point to other
525 data types (including references).
526
527
528 To create a reference, use either of the following
529 functions:
530
531
532 SV* newRV_inc((SV*) thing);
533 SV* newRV_noinc((SV*) thing);
534 The thing argument can be any of an SV*, AV*, or HV*. The functions are identical except that newRV_inc increments the reference count of the thing, while newRV_noinc does not. For historical reasons, newRV is a synonym for newRV_inc.
535
536
537 Once you have a reference, you can use the following macro
538 to dereference the reference:
539
540
541 SvRV(SV*)
542 then call the appropriate routines, casting the returned SV* to either an AV* or HV*, if required.
543
544
545 To determine if an SV is a reference, you can
546 use the following macro:
547
548
549 SvROK(SV*)
550 To discover what type of value the reference refers to, use the following macro and then check the return value.
551
552
553 SvTYPE(SvRV(SV*))
554 The most useful types that will be returned are:
555
556
557 SVt_IV Scalar
558 SVt_NV Scalar
559 SVt_PV Scalar
560 SVt_RV Scalar
561 SVt_PVAV Array
562 SVt_PVHV Hash
563 SVt_PVCV Code
564 SVt_PVGV Glob (possibly a file handle)
565 SVt_PVMG Blessed or Magical Scalar
566 See the sv.h header file for more details.
567
568
569 __Blessed References and Class Objects__
570
571
572 References are also used to support object-oriented
573 programming. In the OO lexicon, an object is
574 simply a reference that has been blessed into a package (or
575 class). Once blessed, the programmer may now use the
576 reference to access the various methods in the
577 class.
578
579
580 A reference can be blessed into a package with the following
581 function:
582
583
584 SV* sv_bless(SV* sv, HV* stash);
585 The sv argument must be a reference. The stash argument specifies which class the reference will belong to. See ``Stashes and Globs'' for information on converting class names into stashes.
586
587
588 /* Still under construction */
589
590
591 Upgrades rv to reference if not already one. Creates new
592 SV for rv to point to. If classname
593 is non-null, the SV is blessed into the
594 specified class. SV is returned.
595
596
597 SV* newSVrv(SV* rv, const char* classname);
598 Copies integer or double into an SV whose reference is rv. SV is blessed if classname is non-null.
599
600
601 SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
602 SV* sv_setref_nv(SV* rv, const char* classname, NV iv);
603 Copies the pointer value (''the address, not the string!'') into an SV whose reference is rv. SV is blessed if classname is non-null.
604
605
606 SV* sv_setref_pv(SV* rv, const char* classname, PV iv);
607 Copies string into an SV whose reference is rv. Set length to 0 to let Perl calculate the string length. SV is blessed if classname is non-null.
608
609
610 SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length);
611 Tests whether the SV is blessed into the specified class. It does not check inheritance relationships.
612
613
614 int sv_isa(SV* sv, const char* name);
615 Tests whether the SV is a reference to a blessed object.
616
617
618 int sv_isobject(SV* sv);
619 Tests whether the SV is derived from the specified class. SV can be either a reference to a blessed object or a string containing a class name. This is the function implementing the UNIVERSAL::isa functionality.
620
621
622 bool sv_derived_from(SV* sv, const char* name);
623 To check if you've got an object derived from a specific class you have to write:
624
625
626 if (sv_isobject(sv)
627
628
629 __Creating New Variables__
630
631
632 To create a new Perl variable with an undef value which can
633 be accessed from your Perl script, use the following
634 routines, depending on the variable type.
635
636
637 SV* get_sv(
638 Notice the use of TRUE as the second parameter. The new variable can now be set, using the routines appropriate to the data type.
639
640
641 There are additional macros whose values may be bitwise
642 OR 'ed with the TRUE argument to
643 enable certain extra features. Those bits are:
644
645
646 GV_ADDMULTI Marks the variable as multiply defined, thus preventing the
647 If you do not specify a package name, the variable is created in the current package.
648
649
650 __Reference Counts and Mortality__
651
652
653 Perl uses a reference count-driven garbage collection
654 mechanism. SVs, AVs, or HVs (xV for short in the following)
655 start their life with a reference count of 1. If the
656 reference count of an xV ever drops to 0, then it will be
657 destroyed and its memory made available for
658 reuse.
659
660
661 This normally doesn't happen at the Perl level unless a
662 variable is undef'ed or the last variable holding a
663 reference to it is changed or overwritten. At the internal
664 level, however, reference counts can be manipulated with the
665 following macros:
666
667
668 int SvREFCNT(SV* sv);
669 SV* SvREFCNT_inc(SV* sv);
670 void SvREFCNT_dec(SV* sv);
671 However, there is one other function which manipulates the reference count of its argument. The newRV_inc function, you will recall, creates a reference to the specified argument. As a side effect, it increments the argument's reference count. If this is not what you want, use newRV_noinc instead.
672
673
674 For example, imagine you want to return a reference from an
675 XSUB function. Inside the XSUB
676 routine, you create an SV which initially has
677 a reference count of one. Then you call newRV_inc,
678 passing it the just-created SV . This returns
679 the reference as a new SV , but the reference
680 count of the SV you passed to
681 newRV_inc has been incremented to two. Now you
682 return the reference from the XSUB routine
683 and forget about the SV . But Perl hasn't!
684 Whenever the returned reference is destroyed, the reference
685 count of the original SV is decreased to one
686 and nothing happens. The SV will hang around
687 without any way to access it until Perl itself terminates.
688 This is a memory leak.
689
690
691 The correct procedure, then, is to use newRV_noinc
692 instead of newRV_inc. Then, if and when the last
693 reference is destroyed, the reference count of the
694 SV will go to zero and it will be destroyed,
695 stopping any memory leak.
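The difference between the two constructors can be traced with a toy reference counter. The sketch below is a simplified model, not Perl's code: toy_sv, toy_live and the toy_ functions are hypothetical names that only imitate the counting behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_sv {
    int            refcnt;
    struct toy_sv *rv;      /* referent if this is a reference, else NULL */
} toy_sv;

int toy_live = 0;           /* number of toy_sv bodies currently allocated */

/* Every new value starts life with a reference count of 1. */
toy_sv *toy_new(void)
{
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

/* Like SvREFCNT_dec in spirit: destroy the value when the count hits 0. */
void toy_dec(toy_sv *sv)
{
    if (sv == NULL || --sv->refcnt > 0)
        return;
    toy_dec(sv->rv);        /* a dying reference releases its referent */
    free(sv);
    toy_live--;
}

/* Takes over the caller's count on thing. */
toy_sv *toy_newrv_noinc(toy_sv *thing)
{
    toy_sv *ref = toy_new();
    ref->rv = thing;
    return ref;
}

/* Adds a count of its own; the caller still owns the original one. */
toy_sv *toy_newrv_inc(toy_sv *thing)
{
    thing->refcnt++;
    return toy_newrv_noinc(thing);
}
```

With toy_newrv_inc, destroying the reference leaves the referent alive with a count of one until its creator also calls toy_dec; with toy_newrv_noinc, destroying the reference frees both.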
696
697
698 There are some convenience functions available that can help
699 with the destruction of xVs. These functions introduce the
700 concept of ``mortality''. An xV that is mortal has had its
701 reference count marked to be decremented, but not actually
702 decremented, until ``a short time later''. Generally the
703 term ``short time later'' means a single Perl statement,
704 such as a call to an XSUB function. The
705 actual determinant for when mortal xVs have their reference
706 count decremented depends on two macros,
707 SAVETMPS and FREETMPS . See
708 perlcall and perlxs for more details on these
709 macros.
710
711
712 ``Mortalization'' then is at its simplest a deferred
713 SvREFCNT_dec. However, if you mortalize a variable
714 twice, the reference count will later be decremented
715 twice.
716
717
718 You should be careful about creating mortal variables.
719 Strange things can happen if you make the same value mortal
720 within multiple contexts, or if you make a variable mortal
721 multiple times.
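A deferred decrement can be pictured as a stack of pending work. The following standalone sketch models that idea only; toy_tmps, toy_2mortal, toy_savetmps and toy_freetmps are hypothetical names, and unlike Perl the model does not free a value whose count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

toy_sv *toy_tmps[TOY_TMPS_MAX];  /* pending deferred decrements */
size_t  toy_tmps_ix = 0;         /* number of entries in use */

/* Like sv_2mortal in spirit: schedule one decrement, change nothing now. */
toy_sv *toy_2mortal(toy_sv *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Like SAVETMPS in spirit: remember where this scope's temporaries begin. */
size_t toy_savetmps(void)
{
    return toy_tmps_ix;
}

/* Like FREETMPS in spirit: apply every decrement scheduled since base. */
void toy_freetmps(size_t base)
{
    assert(base <= toy_tmps_ix);
    while (toy_tmps_ix > base)
        toy_tmps[--toy_tmps_ix]->refcnt--;
}
```

Because each call to toy_2mortal pushes a separate entry, a value mortalized twice is decremented twice when the scope is cleaned up, which is the hazard described above.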
722
723
724 To create a mortal variable, use the functions:
725
726
727 SV* sv_newmortal()
728 SV* sv_2mortal(SV*)
729 SV* sv_mortalcopy(SV*)
730 The first call creates a mortal SV , the second converts an existing SV to a mortal SV (and thus defers a call to SvREFCNT_dec), and the third creates a mortal copy of an existing SV .
731
732
733 The mortal routines are not just for SVs -- AVs and HVs can
734 be made mortal by passing their address (type-casted to
735 SV*) to the sv_2mortal or
736 sv_mortalcopy routines.
737
738
739 __Stashes and Globs__
740
741
742 A ``stash'' is a hash that contains all of the different
743 objects that are contained within a package. Each key of the
744 stash is a symbol name (shared by all the different types of
745 objects that have the same name), and each value in the hash
746 table is a GV (Glob Value). This
747 GV in turn contains references to the various
748 objects of that name, including (but not limited to) the
749 following:
750
751
752 Scalar Value
753 Array Value
754 Hash Value
755 I/O Handle
756 Format
757 Subroutine
758 There is a single stash called ``PL_defstash'' that holds the items that exist in the ``main'' package. To get at the items in other packages, append the string ``::'' to the package name. The items in the ``Foo'' package are in the stash ``Foo::'' in PL_defstash. The items in the ``Bar::Baz'' package are in the stash ``Baz::'' in ``Bar::'''s stash.
759
760
761 To get the stash pointer for a particular package, use the
762 function:
763
764
765 HV* gv_stashpv(const char* name, I32 create)
766 HV* gv_stashsv(SV*, I32 create)
767 The first function takes a literal string, the second uses the string stored in the SV . Remember that a stash is just a hash table, so you get back an HV*. The create flag will create a new package if it is set.
768
769
770 The name that gv_stash*v wants is the name of the
771 package whose symbol table you want. The default package is
772 called main. If you have multiply nested packages,
773 pass their names to gv_stash*v, separated by
774 :: as in the Perl language itself.
775
776
777 Alternately, if you have an SV that is a
778 blessed reference, you can find out the stash pointer by
779 using:
780
781
782 HV* SvSTASH(SvRV(SV*));
783 then use the following to get the package name itself:
784
785
786 char* HvNAME(HV* stash);
787 If you need to bless or re-bless an object you can use the following function:
788
789
790 SV* sv_bless(SV*, HV* stash)
791 where the first argument, an SV*, must be a reference, and the second argument is a stash. The returned SV* can now be used in the same way as any other SV .
792
793
794 For more information on references and blessings, consult
795 perlref.
796
797
798 __Double-Typed SVs__
799
800
801 Scalar variables normally contain only one type of value, an
802 integer, double, pointer, or reference. Perl will
803 automatically convert the actual scalar data from the stored
804 type into the requested type.
805
806
807 Some scalar variables contain more than one type of scalar
808 data. For example, the variable $! contains either
809 the numeric value of errno or its string equivalent
810 from either strerror or
811 sys_errlist[[].
812
813
814 To force multiple data values into an SV ,
815 you must do two things: use the sv_set*v routines
816 to add the additional scalar type, then set a flag so that
817 Perl will believe it contains more than one type of data.
818 The four macros to set the flags are:
819
820
821 SvIOK_on
822 SvNOK_on
823 SvPOK_on
824 SvROK_on
825 The particular macro you must use depends on which sv_set*v routine you called first. This is because every sv_set*v routine turns on only the bit for the particular type of data being set, and turns off all the rest.
826
827
828 For example, to create a new Perl variable called
829 ``dberror'' that contains both the numeric and descriptive
830 string error values, you could use the following
831 code:
832
833
834 extern int dberror;
835 extern char *dberror_list;
836 SV* sv = get_sv(
837 If the order of sv_setiv and sv_setpv had been reversed, then the macro SvPOK_on would need to be called instead of SvIOK_on.
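Why the order matters can be seen in a small model of the flag handling. This is a sketch under the behaviour stated above (each setter turns on only its own flag and clears the rest); toy_sv, TOY_IOK, TOY_POK and the toy_ functions are hypothetical and do not reflect Perl's real flag layout.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_IOK = 1, TOY_POK = 2 };   /* "integer valid", "string valid" */

typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} toy_sv;

/* Each setter stores its value and leaves only its own flag on. */
void toy_setiv(toy_sv *sv, long iv)        { sv->iv = iv; sv->flags = TOY_IOK; }
void toy_setpv(toy_sv *sv, const char *pv) { sv->pv = pv; sv->flags = TOY_POK; }

/* Store a number and a message so that both are reported as valid. */
void toy_set_dual(toy_sv *sv, long code, const char *msg)
{
    assert(msg != NULL);
    toy_setiv(sv, code);     /* flags: IOK only */
    toy_setpv(sv, msg);      /* flags: POK only; iv still holds code */
    sv->flags |= TOY_IOK;    /* counterpart of the SvIOK_on step */
}
```

The flag that must be switched back on is always the one belonging to the setter that ran first, because the second setter cleared it.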
838
839
840 __Magic Variables__
841
842
843 [[This section still under construction. Ignore everything
844 here. Post no bills. Everything not permitted is
845 forbidden.]
846
847
848 Any SV may be magical, that is, it has
849 special features that a normal SV does not
850 have. These features are stored in the SV
851 structure in a linked list of struct magic's,
852 typedef'ed to MAGIC.
853
854
855 struct magic {
856 MAGIC* mg_moremagic;
857 MGVTBL* mg_virtual;
858 U16 mg_private;
859 char mg_type;
860 U8 mg_flags;
861 SV* mg_obj;
862 char* mg_ptr;
863 I32 mg_len;
864 };
865 Note this is current as of patchlevel 0, and could change at any time.
866
867
868 __Assigning Magic__
869
870
871 Perl adds magic to an SV using the sv_magic
872 function:
873
874
875 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
876 The sv argument is a pointer to the SV that is to acquire a new magical feature.
877
878
If sv is not already magical, Perl uses the
SvUPGRADE macro to upgrade sv to type SVt_PVMG.
Perl then continues by adding the new magic to the
beginning of the linked list of magical features. Any prior
entry of the same type of magic is deleted. Note that this
can be overridden, and multiple instances of the same type
of magic can be associated with an SV.
887
888
The name and namlen arguments are used to
associate a string with the magic, typically the name of a
variable. namlen is stored in the mg_len
field and if name is non-null and namlen >= 0,
a malloc'd copy of the name is stored in the mg_ptr field.
894
895
896 The sv_magic function uses how to determine which,
897 if any, predefined ``Magic Virtual Table'' should be
898 assigned to the mg_virtual field. See the ``Magic
899 Virtual Table'' section below. The how argument is
900 also stored in the mg_type field.
901
902
903 The obj argument is stored in the mg_obj
904 field of the MAGIC structure. If it is not the same
905 as the sv argument, the reference count of the
906 obj object is incremented. If it is the same, or if
907 the how argument is ``#'', or if it is a
908 NULL pointer, then obj is merely
909 stored, without the reference count being
910 incremented.
911
912
913 There is also a function to add magic to an
914 HV:
915
916
917 void hv_magic(HV *hv, GV *gv, int how);
918 This simply calls sv_magic and coerces the gv argument into an SV.
919
920
921 To remove the magic from an SV , call the
922 function sv_unmagic:
923
924
925 void sv_unmagic(SV *sv, int type);
926 The type argument should be equal to the how value when the SV was initially made magical.
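
The list handling described in this section (new magic goes to the front, a prior entry of the same type is dropped, and removal is keyed on the type) can be sketched with a plain linked list. This is a simplified model under stated assumptions: ToyMagic and the toy_* functions are hypothetical names, and the real struct magic carries the additional fields shown earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for struct magic. */
typedef struct ToyMagic {
    struct ToyMagic *moremagic;  /* next entry in the chain */
    char             type;       /* plays the role of mg_type */
} ToyMagic;

/* Remove every entry of the given type (compare sv_unmagic). */
static void toy_unmagic(ToyMagic **head, char type)
{
    while (*head) {
        if ((*head)->type == type) {
            ToyMagic *dead = *head;
            *head = dead->moremagic;
            free(dead);
        } else {
            head = &(*head)->moremagic;
        }
    }
}

/* Drop any prior entry of this type, then link the new one at the front. */
static void toy_magic(ToyMagic **head, char type)
{
    ToyMagic *mg;
    toy_unmagic(head, type);
    mg = malloc(sizeof *mg);
    assert(mg != NULL);
    mg->type = type;
    mg->moremagic = *head;
    *head = mg;
}

/* Count the entries in the chain. */
static int toy_magic_count(const ToyMagic *mg)
{
    int n = 0;
    for (; mg; mg = mg->moremagic) n++;
    return n;
}
```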
927
928
929 __Magic Virtual Tables__
930
931
The mg_virtual field in the MAGIC
structure is a pointer to an MGVTBL (``Magic
Virtual Table''), a structure of function pointers that
handle the various operations that might
be applied to that variable.
937
938
939 The MGVTBL has five pointers to the following
940 routine types:
941
942
943 int (*svt_get)(SV* sv, MAGIC* mg);
944 int (*svt_set)(SV* sv, MAGIC* mg);
945 U32 (*svt_len)(SV* sv, MAGIC* mg);
946 int (*svt_clear)(SV* sv, MAGIC* mg);
947 int (*svt_free)(SV* sv, MAGIC* mg);
948 This MGVTBL structure is set at compile-time in perl.h and there are currently 19 types (or 21 with overloading turned on). These different structures contain pointers to various routines that perform additional actions depending on which function is being called.
949
950
951 Function pointer Action taken
952 ---------------- ------------
953 svt_get Do something after the value of the SV is retrieved.
954 svt_set Do something after the SV is assigned a value.
955 svt_len Report on the SV's length.
956 svt_clear Clear something the SV represents.
957 svt_free Free any extra storage associated with the SV.
For instance, the MGVTBL structure called vtbl_sv (which corresponds to an mg_type of '\0') contains:
959
960
961 { magic_get, magic_set, magic_len, 0, 0 }
Thus, when an SV is determined to be magical and of type '\0', if a get operation is being performed, the routine magic_get is called. All the various routines for the various magical types begin with magic_. NOTE: the magic routines are not considered part of the Perl API, and may not be exported by the Perl library.
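
The dispatch through a table of function pointers can be illustrated with a reduced model. The sketch below assumes a two-slot table instead of the five slots listed above; ToyVtbl, ToyScalar and the toy_* functions are hypothetical names, and the exact points at which real Perl runs its hooks are more subtle than shown here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyScalar ToyScalar;

/* Hypothetical two-slot table; the real MGVTBL has five slots. */
typedef struct {
    int (*get)(ToyScalar *sv);   /* plays the role of svt_get */
    int (*set)(ToyScalar *sv);   /* plays the role of svt_set */
} ToyVtbl;

struct ToyScalar {
    int            value;
    int            gets, sets;   /* how often each hook has run */
    const ToyVtbl *vtbl;         /* NULL means "not magical" */
};

static int count_get(ToyScalar *sv) { sv->gets++; return 0; }
static int count_set(ToyScalar *sv) { sv->sets++; return 0; }

/* A table whose hooks merely count how often they are invoked. */
static const ToyVtbl toy_vtbl_counting = { count_get, count_set };

/* Read: run the get hook if the table provides one, then return the value. */
static int toy_fetch(ToyScalar *sv)
{
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->get) sv->vtbl->get(sv);
    return sv->value;
}

/* Write: store the value, then run the set hook if the table provides one. */
static void toy_store(ToyScalar *sv, int v)
{
    assert(sv != NULL);
    sv->value = v;
    if (sv->vtbl && sv->vtbl->set) sv->vtbl->set(sv);
}
```

A zero entry in a table, as in the last two slots of vtbl_sv, corresponds to a NULL pointer here and is simply skipped.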
963
964
965 The current kinds of Magic Virtual Tables are:
966
967
    mg_type  MGVTBL              Type of magic
    -------  ------              ----------------------------
    \0       vtbl_sv             Special scalar variable
    A        vtbl_amagic         %OVERLOAD hash
    a        vtbl_amagicelem     %OVERLOAD hash element
    c        (none)              Holds overload table (AMT) on stash
    B        vtbl_bm             Boyer-Moore (fast string search)
    D        vtbl_regdata        Regex match position data (@+ and @- vars)
    d        vtbl_regdatum       Regex match position data element
    E        vtbl_env            %ENV hash
    e        vtbl_envelem        %ENV hash element
    f        vtbl_fm             Formline ('compiled' format)
    g        vtbl_mglob          m//g target / study()ed string
    I        vtbl_isa            @ISA array
    i        vtbl_isaelem        @ISA array element
    k        vtbl_nkeys          scalar(keys()) lvalue
    L        (none)              Debugger %_
985 When an uppercase and lowercase letter both exist in the table, then the uppercase letter is used to represent some kind of composite type (a list or a hash), and the lowercase letter is used to represent an element of that composite type.
986
987
988 The '~' and 'U' magic types are defined specifically for use
989 by extensions and will not be used by perl itself.
990 Extensions can use '~' magic to 'attach' private information
991 to variables (typically objects). This is especially useful
992 because there is no way for normal perl code to corrupt this
993 private information (unlike using extra elements of a hash
994 object).
995
996
997 Similarly, 'U' magic can be used much like ''tie()'' to
998 call a C function any time a scalar's value is used or
999 changed. The MAGIC's mg_ptr field points
1000 to a ufuncs structure:
1001
1002
1003 struct ufuncs {
1004 I32 (*uf_val)(IV, SV*);
1005 I32 (*uf_set)(IV, SV*);
1006 IV uf_index;
1007 };
1008 When the SV is read from or written to, the uf_val or uf_set function will be called with uf_index as the first arg and a pointer to the SV as the second. A simple example of how to add 'U' magic is shown below. Note that the ufuncs structure is copied by sv_magic, so you can safely allocate it on the stack.
1009
1010
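    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        /* my_get_fn and my_set_fn are placeholder names for your own handlers */
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));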
1018 Note that because multiple extensions may be using '~' or 'U' magic, it is important for extensions to take extra care to avoid conflict. Typically only using the magic on objects blessed into the same class as the extension is sufficient. For '~' magic, it may also be appropriate to add an I32 'signature' at the top of the private data area and check that.
1019
1020
1021 Also note that the sv_set*() and sv_cat*()
1022 functions described earlier do __not__ invoke 'set' magic
1023 on their targets. This must be done by the user either by
1024 calling the SvSETMAGIC() macro after calling these
1025 functions, or by using one of the sv_set*_mg() or
1026 sv_cat*_mg() functions. Similarly, generic C code
1027 must call the SvGETMAGIC() macro to invoke any
1028 'get' magic if they use an SV obtained from
1029 external sources in functions that don't handle magic. See
1030 perlapi for a description of these functions. For example,
1031 calls to the sv_cat*() functions typically need to
1032 be followed by SvSETMAGIC(), but they don't need a
1033 prior SvGETMAGIC() since their implementation
1034 handles 'get' magic.
1035
1036
1037 __Finding Magic__
1038
1039
1040 MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
1041 This routine returns a pointer to the MAGIC structure stored in the SV . If the SV does not have that magical feature, NULL is returned. Also, if the SV is not of type SVt_PVMG, Perl may core dump.
1042
1043
1044 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1045 This routine checks to see what types of magic sv has. If the mg_type field is an uppercase letter, then the mg_obj is copied to nsv, but the mg_type field is changed to be the lowercase letter.
1046
1047
1048 __Understanding the Magic of Tied Hashes and
1049 Arrays__
1050
1051
1052 Tied hashes and arrays are magical beasts of the 'P' magic
1053 type.
1054
1055
1056 WARNING: As of the 5.004 release, proper
1057 usage of the array and hash access functions requires
1058 understanding a few caveats. Some of these caveats are
1059 actually considered bugs in the API , to be
1060 fixed in later releases, and are bracketed with [[
1061 MAYCHANGE ] below. If you find yourself
1062 actually applying such information in this section, be aware
1063 that the behavior may change in the future, umm, without
1064 warning.
1065
1066
1067 The perl tie function associates a variable with an object
1068 that implements the various GET ,
1069 SET etc methods. To perform the equivalent of
1070 the perl tie function from an XSUB , you must
1071 mimic this behaviour. The code below carries out the
1072 necessary steps - firstly it creates a new hash, and then
1073 creates a second hash which it blesses into the class which
1074 will implement the tie methods. Lastly it ties the two
1075 hashes together, and returns a reference to the new tied
1076 hash. Note that the code below does NOT call
the TIEHASH method in the !MyTie class - see
``Calling Perl Routines from within C Programs'' for details
1079 on how to do this.
1080
1081
    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, 'P');
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL
1092 The av_store function, when given a tied array argument, merely copies the magic of the array onto the value to be ``stored'', using mg_copy. It may also return NULL , indicating that the value did not actually need to be stored in the array. [[ MAYCHANGE ] After a call to av_store on a tied array, the caller will usually need to call mg_set(val) to actually invoke the perl level `` STORE '' method on the TIEARRAY object. If av_store did return NULL , a call to SvREFCNT_dec(val) will also be usually necessary to avoid a memory leak. [[/MAYCHANGE]
1093
1094
1095 The previous paragraph is applicable verbatim to tied hash
1096 access using the hv_store and hv_store_ent
1097 functions as well.
1098
1099
1100 av_fetch and the corresponding hash functions
1101 hv_fetch and hv_fetch_ent actually return
1102 an undefined mortal value whose magic has been initialized
1103 using mg_copy. Note the value so returned does not
1104 need to be deallocated, as it is already mortal. [[
1105 MAYCHANGE ] But you will need to call
1106 mg_get() on the returned value in order to actually
1107 invoke the perl level `` FETCH '' method on
1108 the underlying TIE object. Similarly, you may
1109 also call mg_set() on the return value after
1110 possibly assigning a suitable value to it using
1111 sv_setsv, which will invoke the ``
1112 STORE '' method on the TIE
1113 object. [[/MAYCHANGE]
1114
1115
1116 [[ MAYCHANGE ] In other words, the array or
1117 hash fetch/store functions don't really fetch and store
1118 actual values in the case of tied arrays and hashes. They
1119 merely call mg_copy to attach magic to the values
1120 that were meant to be ``stored'' or ``fetched''. Later calls
1121 to mg_get and mg_set actually do the job
1122 of invoking the TIE methods on the underlying
1123 objects. Thus the magic mechanism currently implements a
1124 kind of lazy access to arrays and hashes.
1125
1126
1127 Currently (as of perl version 5.004), use of the hash and
1128 array access functions requires the user to be aware of
1129 whether they are operating on ``normal'' hashes and arrays,
1130 or on their tied variants. The API may be
1131 changed to provide more transparent access to both tied and
1132 normal data types in future versions.
1133 [[/MAYCHANGE]
1134
1135
1136 You would do well to understand that the
1137 TIEARRAY and TIEHASH
1138 interfaces are mere sugar to invoke some perl method calls
1139 while using the uniform hash and array syntax. The use of
1140 this sugar imposes some overhead (typically about two to
1141 four extra opcodes per FETCH/STORE operation,
1142 in addition to the creation of all the mortal variables
1143 required to invoke the methods). This overhead will be
1144 comparatively small if the TIE methods are
1145 themselves substantial, but if they are only a few
1146 statements long, the overhead will not be
1147 insignificant.
1148
1149
1150 __Localizing changes__
1151
1152
1153 Perl has a very handy construction
1154
1155
1156 {
1157 local $var = 2;
1158 ...
1159 }
1160 This construction is ''approximately'' equivalent to
1161
1162
1163 {
1164 my $oldvar = $var;
1165 $var = 2;
1166 ...
1167 $var = $oldvar;
1168 }
1169 The biggest difference is that the first construction would reinstate the initial value of $var, irrespective of how control exits the block: goto, return, die/eval etc. It is a little bit more efficient as well.
1170
1171
There is a way to achieve a similar task from C via the Perl
API: create a ''pseudo-block'', and
arrange for some changes to be automatically undone at the
end of it, either explicitly or via a non-local exit (via
''die()''). A ''block''-like construct is created by a
1177 pair of ENTER/LEAVE macros (see
1178 ``Returning a Scalar'' in perlcall). Such a construct may be
1179 created specially for some important localized task, or an
1180 existing one (like boundaries of enclosing Perl
1181 subroutine/block, or an existing pair for freeing TMPs) may
1182 be used. (In the second case the overhead of additional
1183 localization must be almost negligible.) Note that any
1184 XSUB is automatically enclosed in an
1185 ENTER/LEAVE pair.
1186
1187
1188 Inside such a ''pseudo-block'' the following service is
1189 available:
1190
1191
1192 SAVEINT(int i)
1193
1194
1195 SAVEIV(IV i)
1196
1197
1198 SAVEI32(I32 i)
1199
1200
1201 SAVELONG(long i)
1202
1203
1204 These macros arrange things to restore the value of integer
1205 variable i at the end of enclosing
1206 ''pseudo-block''.
1207
1208
1209 SAVESPTR(s)
1210
1211
1212 SAVEPPTR(p)
1213
1214
1215 These macros arrange things to restore the value of pointers
1216 s and p. s must be a pointer of a
1217 type which survives conversion to SV* and back,
1218 p should be able to survive conversion to
1219 char* and back.
1220
1221
1222 SAVEFREESV(SV *sv)
1223
1224
The refcount of sv will be decremented at the end
of the ''pseudo-block''. This is similar to
1227 sv_2mortal in that it is also a mechanism for doing
1228 a delayed SvREFCNT_dec. However, while
1229 sv_2mortal extends the lifetime of sv
1230 until the beginning of the next statement,
1231 SAVEFREESV extends it until the end of the
1232 enclosing scope. These lifetimes can be wildly
1233 different.
1234
1235
1236 Also compare SAVEMORTALIZESV.
1237
1238
1239 SAVEMORTALIZESV(SV *sv)
1240
1241
1242 Just like SAVEFREESV, but mortalizes sv at
1243 the end of the current scope instead of decrementing its
1244 reference count. This usually has the effect of keeping
1245 sv alive until the statement that called the
1246 currently live scope has finished executing.
1247
1248
1249 SAVEFREEOP(OP *op)
1250
1251
1252 The OP * is ''op_free()''ed at the end of
1253 ''pseudo-block''.
1254
1255
1256 SAVEFREEPV(p)
1257
1258
1259 The chunk of memory which is pointed to by p is
1260 ''Safefree()''ed at the end of
1261 ''pseudo-block''.
1262
1263
1264 SAVECLEARSV(SV *sv)
1265
1266
1267 Clears a slot in the current scratchpad which corresponds to
1268 sv at the end of ''pseudo-block''.
1269
1270
SAVEDELETE(HV *hv, char *key, I32 length)
1273
1274
1275 The key key of hv is deleted at the end of
1276 ''pseudo-block''. The string pointed to by key
1277 is ''Safefree()''ed. If one has a ''key'' in
1278 short-lived storage, the corresponding string may be
1279 reallocated like this:
1280
1281
1282 SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1283
1284
SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)
1287
1288
1289 At the end of ''pseudo-block'' the function f is
1290 called with the only argument p.
1291
1292
SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)
1295
1296
1297 At the end of ''pseudo-block'' the function f is
1298 called with the implicit context argument (if any), and
1299 p.
1300
1301
1302 SAVESTACK_POS()
1303
1304
1305 The current offset on the Perl internal stack (cf.
1306 SP) is restored at the end of
1307 ''pseudo-block''.
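
The save-and-restore idea behind the macros above can be modelled with a small undo stack. This is a sketch under simplifying assumptions, not Perl's savestack: it handles only int variables, toy_enter returns a mark instead of pushing onto a separate scope stack, and all toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_SAVE_MAX 64

/* One saved integer: where it lives and what it held. */
typedef struct { int *addr; int old; } ToySave;

static ToySave toy_savestack[TOY_SAVE_MAX];
static int     toy_save_ix = 0;          /* next free slot */

/* Plays the role of ENTER: remember where this pseudo-block's entries start. */
static int toy_enter(void) { return toy_save_ix; }

/* Plays the role of SAVEINT: record the current value for later restoration. */
static void toy_saveint(int *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Plays the role of LEAVE: undo everything recorded since the matching
 * toy_enter, newest entry first. */
static void toy_leave(int mark)
{
    while (toy_save_ix > mark) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}
```

Because restoration is driven by the stack rather than by the code that made the change, a non-local exit only has to unwind to the right mark for every saved variable to regain its old value.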
1308
1309
1310 The following API list contains functions,
1311 thus one needs to provide pointers to the modifiable data
1312 explicitly (either C pointers, or Perlish GV *s).
1313 Where the above macros take int, a similar function
1314 takes int *.
1315
1316
1317 SV* save_scalar(GV *gv)
1318
1319
1320 Equivalent to Perl code local $gv.
1321
1322
1323 AV* save_ary(GV *gv)
1324
1325
1326 HV* save_hash(GV *gv)
1327
1328
1329 Similar to save_scalar, but localize @gv
1330 and %gv.
1331
1332
1333 void save_item(SV *item)
1334
1335
Duplicates the current value of SV. On exit
from the current ENTER/LEAVE
''pseudo-block'', the value of SV is restored
from the stored value.
1340
1341
1342 void save_list(SV **sarg, I32 maxsarg)
1343
1344
1345 A variant of save_item which takes multiple
1346 arguments via an array sarg of SV* of
1347 length maxsarg.
1348
1349
1350 SV* save_svref(SV **sptr)
1351
1352
Similar to save_scalar, but will reinstate an SV *.
1355
1356
1357 void save_aptr(AV **aptr)
1358
1359
1360 void save_hptr(HV **hptr)
1361
1362
1363 Similar to save_svref, but localize AV *
1364 and HV *.
1365
1366
1367 The Alias module implements localization of the
1368 basic types within the ''caller's scope''. People who are
1369 interested in how to localize things in the containing scope
1370 should take a look there too.
1371 !!Subroutines
1372
1373
1374 __XSUBs and the Argument Stack__
1375
1376
1377 The XSUB mechanism is a simple way for Perl
1378 programs to access C subroutines. An XSUB
1379 routine will have a stack that contains the arguments from
1380 the Perl program, and a way to map from the Perl data
1381 structures to a C equivalent.
1382
1383
1384 The stack arguments are accessible through the
1385 ST(n) macro, which returns the n'th stack
1386 argument. Argument 0 is the first argument passed in the
1387 Perl subroutine call. These arguments are SV*, and
1388 can be used anywhere an SV* is used.
1389
1390
1391 Most of the time, output from the C routine can be handled
1392 through use of the RETVAL and
1393 OUTPUT directives. However, there are some
1394 cases where the argument stack is not already long enough to
1395 handle all the return values. An example is the
1396 POSIX ''tzname()'' call, which takes no
1397 arguments, but returns two, the local time zone's standard
1398 and summer time abbreviations.
1399
1400
1401 To handle this situation, the PPCODE
1402 directive is used and the stack is extended using the
1403 macro:
1404
1405
1406 EXTEND(SP, num);
1407 where SP is the macro that represents the local copy of the stack pointer, and num is the number of elements the stack should be extended by.
1408
1409
1410 Now that there is room on the stack, values can be pushed on
1411 it using the macros to push IVs, doubles, strings, and
1412 SV pointers respectively:
1413
1414
1415 PUSHi(IV)
1416 PUSHn(double)
1417 PUSHp(char*, I32)
1418 PUSHs(SV*)
Now, in the Perl program calling tzname, the two values will be assigned as in:
1420
1421
1422 ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1423 An alternate (and possibly simpler) method to pushing values on the stack is to use the macros:
1424
1425
1426 XPUSHi(IV)
1427 XPUSHn(double)
1428 XPUSHp(char*, I32)
1429 XPUSHs(SV*)
These macros automatically adjust the stack for you, if needed. Thus, you do not need to call EXTEND to extend the stack. However, see ``Putting a C value on Perl stack'' for a caveat that applies when pushing more than one value.
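
The difference between reserving room once and checking on every push can be pictured with a toy growable stack. This is a simplified sketch only: ToyStack and the toy_* functions are hypothetical, the stack holds plain longs instead of SV pointers, and the target mechanism used by the real macros is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable stack of longs standing in for the argument stack. */
typedef struct { long *base; size_t len, cap; } ToyStack;

/* Plays the role of EXTEND: make sure at least n more values fit.
 * Returns 0 if memory could not be obtained, 1 otherwise. */
static int toy_extend(ToyStack *st, size_t n)
{
    if (st->len + n > st->cap) {
        size_t cap   = st->len + n;
        long  *grown = realloc(st->base, cap * sizeof *grown);
        if (!grown) return 0;
        st->base = grown;
        st->cap  = cap;
    }
    return 1;
}

/* Plays the role of the PUSH* macros: the caller has already made room. */
static void toy_push(ToyStack *st, long v)
{
    assert(st->len < st->cap);
    st->base[st->len++] = v;
}

/* Plays the role of the XPUSH* macros: extend by one if needed, then push. */
static int toy_xpush(ToyStack *st, long v)
{
    if (!toy_extend(st, 1)) return 0;
    toy_push(st, v);
    return 1;
}
```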
1431
1432
1433 For more information, consult perlxs and
1434 perlxstut.
1435
1436
1437 __Calling Perl Routines from within C
1438 Programs__
1439
1440
1441 There are four routines that can be used to call a Perl
1442 subroutine from within a C program. These four
1443 are:
1444
1445
1446 I32 call_sv(SV*, I32);
1447 I32 call_pv(const char*, I32);
1448 I32 call_method(const char*, I32);
1449 I32 call_argv(const char*, I32, register char**);
1450 The routine most often used is call_sv. The SV* argument contains either the name of the Perl subroutine to be called, or a reference to the subroutine. The second argument consists of flags that control the context in which the subroutine is called, whether or not the subroutine is being passed arguments, how errors should be trapped, and how to treat return values.
1451
1452
1453 All four routines return the number of arguments that the
1454 subroutine returned on the Perl stack.
1455
1456
1457 These routines used to be called perl_call_sv etc.,
1458 before Perl v5.6.0, but those names are now deprecated;
1459 macros of the same name are provided for
1460 compatibility.
1461
1462
When using any of these routines (except
call_argv), the programmer must manipulate the Perl
stack. The macros and functions used for this include
the following:
1467
1468
1469 dSP
1470 SP
1471 PUSHMARK()
1472 PUTBACK
1473 SPAGAIN
1474 ENTER
1475 SAVETMPS
1476 FREETMPS
1477 LEAVE
1478 XPUSH*()
1479 POP*()
1480 For a detailed description of calling conventions from C to Perl, consult perlcall.
1481
1482
1483 __Memory Allocation__
1484
1485
1486 All memory meant to be used with the Perl API
1487 functions should be manipulated using the macros described
1488 in this section. The macros provide the necessary
1489 transparency between differences in the actual malloc
1490 implementation that is used within perl.
1491
1492
1493 It is suggested that you enable the version of malloc that
1494 is distributed with Perl. It keeps pools of various sizes of
1495 unallocated memory in order to satisfy allocation requests
1496 more quickly. However, on some platforms, it may cause
1497 spurious malloc or free errors.
1498
1499
1500 New(x, pointer, number, type);
1501 Newc(x, pointer, number, type, cast);
1502 Newz(x, pointer, number, type);
1503 These three macros are used to initially allocate memory.
1504
1505
1506 The first argument x was a ``magic cookie'' that
1507 was used to keep track of who called the macro, to help when
1508 debugging memory problems. However, the current code makes
1509 no use of this feature (most Perl developers now use
1510 run-time memory checkers), so this argument can be any
1511 number.
1512
1513
1514 The second argument pointer should be the name of a
1515 variable that will point to the newly allocated
1516 memory.
1517
1518
1519 The third and fourth arguments number and
1520 type specify how many of the specified type of data
1521 structure should be allocated. The argument type is
1522 passed to sizeof. The final argument to
1523 Newc, cast, should be used if the
1524 pointer argument is different from the
1525 type argument.
1526
1527
1528 Unlike the New and Newc macros, the
1529 Newz macro calls memzero to zero out all
1530 the newly allocated memory.
1531
1532
1533 Renew(pointer, number, type);
1534 Renewc(pointer, number, type, cast);
1535 Safefree(pointer)
1536 These three macros are used to change a memory buffer size or to free a piece of memory no longer needed. The arguments to Renew and Renewc match those of New and Newc with the exception of not needing the ``magic cookie'' argument.
1537
1538
1539 Move(source, dest, number, type);
1540 Copy(source, dest, number, type);
1541 Zero(dest, number, type);
1542 These three macros are used to move, copy, or zero out previously allocated memory. The source and dest arguments point to the source and destination starting points. Perl will move, copy, or zero out number instances of the size of the type data structure (using the sizeof function).
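
To make the argument conventions concrete, here is a sketch built on simplified stand-ins for the macros. The TOY_* definitions are assumptions written directly over the C library and are NOT the definitions from Perl's headers; toy_mem_demo is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins, NOT the definitions from Perl's headers. */
#define TOY_New(x, p, n, type)   ((p) = (type *)malloc((n) * sizeof(type)))
#define TOY_Newz(x, p, n, type)  ((p) = (type *)calloc((n), sizeof(type)))
#define TOY_Renew(p, n, type)    ((p) = (type *)realloc((p), (n) * sizeof(type)))
#define TOY_Safefree(p)          free(p)
#define TOY_Move(s, d, n, type)  memmove((d), (s), (n) * sizeof(type))
#define TOY_Copy(s, d, n, type)  memcpy((d), (s), (n) * sizeof(type))
#define TOY_Zero(d, n, type)     memset((d), 0, (n) * sizeof(type))

/* Allocate four zeroed longs, grow the buffer to eight, shift the
 * contents up by one slot, and return the sum of all eight slots.
 * Returns -1 if an allocation fails. */
static long toy_mem_demo(void)
{
    long *buf, *bigger, sum = 0;
    int i;

    TOY_Newz(0, buf, 4, long);               /* first argument is ignored */
    if (!buf) return -1;
    for (i = 0; i < 4; i++) buf[i] = i + 1;  /* 1 2 3 4 */

    bigger = buf;
    TOY_Renew(bigger, 8, long);
    if (!bigger) { TOY_Safefree(buf); return -1; }
    buf = bigger;
    TOY_Zero(buf + 4, 4, long);              /* the new tail starts uninitialised */

    TOY_Move(buf, buf + 1, 4, long);         /* overlapping ranges: 1 1 2 3 4 0 0 0 */
    for (i = 0; i < 8; i++) sum += buf[i];
    TOY_Safefree(buf);
    return sum;
}
```

Note that source comes before dest in Move and Copy, the reverse of the memmove and memcpy argument order, and that counts are given in elements of type rather than in bytes.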
1543
1544
1545 __PerlIO__
1546
1547
The most recent development releases of Perl have been
experimenting with removing Perl's dependency on the
1550 ``normal'' standard I/O suite and allowing other stdio
1551 implementations to be used. This involves creating a new
1552 abstraction layer that then calls whichever implementation
1553 of stdio Perl was compiled with. All XSUBs should now use
1554 the functions in the PerlIO abstraction layer and not make
1555 any assumptions about what kind of stdio is being
1556 used.
1557
1558
1559 For a complete description of the PerlIO abstraction,
1560 consult perlapio.
1561
1562
1563 __Putting a C value on Perl stack__
1564
1565
A lot of opcodes (an opcode is an elementary operation in the
internal perl stack machine) put an SV* on the stack.
1568 However, as an optimization the corresponding
1569 SV is (usually) not recreated each time. The
1570 opcodes reuse specially assigned SVs (''target''s) which
1571 are (as a corollary) not constantly
1572 freed/created.
1573
1574
1575 Each of the targets is created only once (but see
1576 ``Scratchpads and recursion'' below), and when an opcode
1577 needs to put an integer, a double, or a string on stack, it
1578 just sets the corresponding parts of its ''target'' and
1579 puts the ''target'' on stack.
1580
1581
1582 The macro to put this target on stack is PUSHTARG,
1583 and it is directly used in some opcodes, as well as
1584 indirectly in zillions of others, which use it via
1585 (X)PUSH[[pni].
1586
1587
1588 Because the target is reused, you must be careful when
1589 pushing multiple values on the stack. The following code
1590 will not do what you think:
1591
1592
1593 XPUSHi(10);
1594 XPUSHi(20);
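This translates as ``set TARG to 10, push a pointer to TARG onto the stack; set TARG to 20, push a pointer to TARG onto the stack''. At the end of the operation, the stack does not contain the values 10 and 20, but actually contains two pointers to TARG, which we have set to 20. If you need to push multiple different values, use XPUSHs, which bypasses TARG.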
1596
1597
1598 On a related note, if you do use (X)PUSH[[npi], then
1599 you're going to need a dTARG in your variable
1600 declarations so that the *PUSH* macros can make use
1601 of the local variable TARG.
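
The pitfall of pushing the same target twice can be reproduced in a toy model where the stack holds pointers to ints. Everything here is hypothetical (toy_targ, toy_pool, the toy_* functions); the fixed-size pool merely stands in for freshly created scalars.

```c
#include <assert.h>

/* Hypothetical miniature stack of pointers to integer "scalars". */
static int  toy_targ;          /* the single reusable target */
static int  toy_pool[8];       /* fresh slots, standing in for new scalars */
static int  toy_pool_ix = 0;
static int *toy_stack[8];
static int  toy_sp = 0;

/* In the spirit of XPUSHi: overwrite the shared target, push a pointer to it. */
static void toy_xpushi(int v)
{
    assert(toy_sp < 8);
    toy_targ = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* In the spirit of XPUSHs with a newly created scalar: push a pointer
 * to a slot that nothing else will overwrite. */
static void toy_xpushs_new(int v)
{
    assert(toy_sp < 8 && toy_pool_ix < 8);
    toy_pool[toy_pool_ix] = v;
    toy_stack[toy_sp++] = &toy_pool[toy_pool_ix++];
}
```

After toy_xpushi(10) and toy_xpushi(20), both stack entries point at the same int, so both read as 20; two calls to toy_xpushs_new keep the values apart.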
1602
1603
1604 __Scratchpads__
1605
1606
The question remains as to when the SVs which are
''target''s for opcodes are created. The answer is that
1609 they are created when the current unit -- a subroutine or a
1610 file (for opcodes for statements outside of subroutines) --
1611 is compiled. During this time a special anonymous Perl array
1612 is created, which is called a scratchpad for the current
1613 unit.
1614
1615
1616 A scratchpad keeps SVs which are lexicals for the current
unit and are targets for opcodes. One can deduce that an
SV lives on a scratchpad by looking at its
flags: lexicals have SVs_PADMY set, and
1620 ''target''s have SVs_PADTMP set.
1621
1622
1623 The correspondence between OPs and ''target''s is not
1624 1-to-1. Different OPs in the compile tree of the unit can
1625 use the same target, if this would not conflict with the
1626 expected life of the temporary.
1627
1628
1629 __Scratchpads and recursion__
1630
1631
It is not 100% true that a compiled unit contains a
pointer to the scratchpad AV. Rather, it
contains a pointer to an AV of (initially)
one element, and this element is the scratchpad
AV. Why do we need an extra level of
indirection?
1638
1639
1640 The answer is __recursion__, and maybe (sometime soon)
1641 __threads__. Both these can create several execution
pointers going into the same subroutine. For the
subroutine-child not to write over the temporaries for the
subroutine-parent (the lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (''And'' the lexicals should be separate
anyway!)
1648
1649
1650 So each subroutine is born with an array of scratchpads (of
1651 length 1). On each entry to the subroutine it is checked
that the current depth of the recursion is not more than the
length of this array, and if it is, a new scratchpad is
created and pushed into the array.
1655
1656
1657 The ''target''s on this scratchpad are undefs,
1658 but they are already marked with correct flags.
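
The depth check described above can be sketched as follows. This is a toy model, not Perl's pad code: ToySub, TOY_PAD_SLOTS and the toy_* functions are hypothetical, a scratchpad is reduced to a small array of ints, and for brevity even the first scratchpad is created on first entry rather than at compile time.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4   /* lexicals plus targets per scratchpad (arbitrary) */

/* Hypothetical stand-in for a subroutine's array of scratchpads. */
typedef struct {
    int **pads;    /* pads[d-1] is the scratchpad used at recursion depth d */
    int   npads;   /* how many scratchpads exist so far */
    int   depth;   /* current recursion depth (0 = not running) */
} ToySub;

/* Enter the sub: if the depth now exceeds the number of scratchpads,
 * create one more and append it. Returns the scratchpad for this depth,
 * or NULL if memory could not be obtained. */
static int *toy_sub_enter(ToySub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        int **grown = realloc(sub->pads, (size_t)sub->depth * sizeof *grown);
        if (!grown) { sub->depth--; return NULL; }
        sub->pads = grown;
        grown[sub->npads] = calloc(TOY_PAD_SLOTS, sizeof **grown);  /* blank slots */
        if (!grown[sub->npads]) { sub->depth--; return NULL; }
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* Leave the sub: the scratchpad is kept for reuse by later calls. */
static void toy_sub_leave(ToySub *sub)
{
    assert(sub->depth > 0);
    sub->depth--;
}
```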
1659 !!Compiled code
1660
1661
1662 __Code tree__
1663
1664
1665 Here we describe the internal form your code is converted to
1666 by Perl. Start with a simple example:
1667
1668
1669 $a = $b + $c;
1670 This is converted to a tree similar to this one:
1671
1672
             assign-to
           /           \
          +             $a
        /   \
      $b     $c
1678 (but slightly more complicated). This tree reflects the way Perl parsed your code, but has nothing to do with the execution order. There is an additional ``thread'' going through the nodes of the tree which shows the order of execution of the nodes. In our simplified example above it looks like:
1679
1680
    $b ---> $c ---> + ---> $a ---> assign-to
But with the actual compile tree for $a = $b + $c it is different: some nodes are ''optimized away''. As a corollary, though the actual tree contains more nodes than our simplified example, the execution order is the same as in our example.
1683
1684
1685 __Examining the tree__
1686
1687
1688 If you have your perl compiled for debugging (usually done
1689 with -D optimize=-g on Configure command
1690 line), you may examine the compiled tree by specifying
1691 -Dx on the Perl command line. The output takes
1692 several lines per node, and for $b+$c it looks like
1693 this:
1694
1695
1696 5 TYPE = add ===
This tree has 5 nodes (one per TYPE specifier), of which only 3 are not optimized away (one per number in the left column). The immediate children of a given node correspond to {} pairs on the same level of indentation, thus this listing corresponds to the tree:
1698
1699
                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv
The execution order is indicated by === marks, thus it is 3 4 5 6 (node 6 is not included in the above listing), i.e., gvsv gvsv add whatever.
1705
1706
1707 Each of these nodes represents an op, a fundamental
1708 operation inside the Perl core. The code which implements
1709 each operation can be found in the ''pp*.c'' files; the
1710 function which implements the op with type gvsv is
1711 pp_gvsv, and so on. As the tree above shows,
1712 different ops have different numbers of children:
1713 add is a binary operator, as one would expect, and
1714 so has two children. To accommodate the various different
1715 numbers of children, there are various types of op data
1716 structure, and they link together in different
1717 ways.
1718
1719
1720 The simplest type of op structure is OP: this has
1721 no children. Unary operators, UNOPs, have one
1722 child, and this is pointed to by the op_first
1723 field. Binary operators (BINOPs) have not only an
1724 op_first field but also an op_last field.
1725 The most complex type of op is a LISTOP, which has
1726 any number of children. In this case, the first child is
1727 pointed to by op_first and the last child by
1728 op_last. The children in between can be found by
1729 iteratively following the op_sibling pointer from
1730 the first child to the last.
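
The way children are linked, and the separate execution-order thread mentioned earlier, can be sketched with a single flattened node type. ToyOp and the toy_* functions are hypothetical; real ops use distinct structures per op type, whereas this toy simply leaves unused pointers NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened stand-in for the OP/UNOP/BINOP/LISTOP family. */
typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first;     /* plays the role of op_first   */
    struct ToyOp *last;      /* plays the role of op_last    */
    struct ToyOp *sibling;   /* plays the role of op_sibling */
    struct ToyOp *next;      /* execution-order thread       */
} ToyOp;

/* Count the children: start at the first child, follow the sibling chain. */
static int toy_count_kids(const ToyOp *o)
{
    int n = 0;
    const ToyOp *kid;
    for (kid = o->first; kid; kid = kid->sibling) n++;
    return n;
}

/* Follow the execution thread from start, writing the op names into buf
 * separated by spaces. Stops early if buf would overflow. */
static void toy_run_order(const ToyOp *start, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (; start; start = start->next) {
        if (strlen(buf) + strlen(start->name) + 2 > len) return;
        if (buf[0]) strcat(buf, " ");
        strcat(buf, start->name);
    }
}
```

Building the simplified tree for $a = $b + $c by hand and threading it as $b, $c, +, $a, assign-to shows that the tree shape (children) and the run order (thread) are stored independently.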
1731
1732
1733 There are also two other op types: a PMOP holds a
1734 regular expression, and has no children, and a LOOP
1735 may or may not have children. If the op_children
1736 field is non-zero, it behaves like a LISTOP. To
1737 complicate matters, if a UNOP is actually a
1738 null op after optimization (see ``Compile pass 2:
1739 context propagation'') it will still have children in
1740 accordance with its former type.
1741
1742
1743 __Compile pass 1: check routines__
1744
1745
1746 The tree is created by the compiler while ''yacc'' code
1747 feeds it the constructions it recognizes. Since ''yacc''
1748 works bottom-up, so does the first pass of perl
1749 compilation.
1750
1751
1752 What makes this pass interesting for perl developers is that
1753 some optimization may be performed on this pass. This is
1754 optimization by so-called ``check routines''. The
1755 correspondence between node names and corresponding check
1756 routines is described in ''opcode.pl'' (do not forget to
1757 run make regen_headers if you modify this
1758 file).
1759
1760
1761 A check routine is called when the node is fully constructed
1762 except for the execution-order thread. Since at this time
1763 there are no back-links to the currently constructed node,
one can do almost any operation to the top-level node,
1765 including freeing it and/or creating new nodes above/below
1766 it.
1767
1768
1769 The check routine returns the node which should be inserted
1770 into the tree (if the top-level node was not modified, check
1771 routine returns its argument).
1772
1773
1774 By convention, check routines have names ck_*. They
1775 are usually called from new*OP subroutines (or
1776 convert) (which in turn are called from
1777 ''perly.y'').
1778
1779
1780 __Compile pass 1a: constant folding__
1781
1782
1783 Immediately after the check routine is called the returned
1784 node is checked for being compile-time executable. If it is
1785 (the value is judged to be constant) it is immediately
1786 executed, and a ''constant'' node with the ``return
1787 value'' of the corresponding subtree is substituted instead.
1788 The subtree is deleted.
1789
1790
1791 If constant folding was not performed, the execution-order
1792 thread is created.
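The folding step can be pictured with a toy expression tree. This is only a sketch under simplifying assumptions: demo_node, demo_kind and demo_fold are invented names, only the addition of two constants is handled, and the node is rewritten in place, whereas perl executes the subtree and substitutes a new constant node.

```c
#include <stddef.h>

typedef enum { DEMO_CONST, DEMO_ADD, DEMO_VAR } demo_kind;

/* Hypothetical expression node, far simpler than a real op. */
typedef struct demo_node {
    demo_kind kind;
    long value;                      /* meaningful when kind == DEMO_CONST */
    struct demo_node *left, *right;  /* operands of DEMO_ADD, else NULL */
} demo_node;

/* Fold `node` if it is an addition of two constants.
 * Returns 1 if the node became a constant, 0 if it was left alone. */
static int demo_fold(demo_node *node)
{
    if (node->kind != DEMO_ADD)
        return 0;
    if (node->left->kind != DEMO_CONST || node->right->kind != DEMO_CONST)
        return 0;

    node->value = node->left->value + node->right->value;
    node->kind  = DEMO_CONST;
    node->left  = node->right = NULL;   /* the subtree is dropped */
    return 1;
}
```

Because the tree is built bottom-up, the operands have already had their own chance to fold by the time the parent is examined.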
1793
1794
1795 __Compile pass 2: context propagation__
1796
1797
1798 When a context for a part of compile tree is known, it is
1799 propagated down through the tree. At this time the context
1800 can have 5 values (instead of 2 for runtime context): void,
1801 boolean, scalar, list, and lvalue. In contrast with pass
1802 1, this pass is processed from top to bottom: a node's
1803 context determines the context for its
1804 children.
1805
1806
1807 Additional context-dependent optimizations are performed at
1808 this time. Since at this moment the compile tree contains
1809 back-references (via ``thread'' pointers), nodes cannot be
1810 ''free()''d now. To allow optimized-away nodes at this
1811 stage, such nodes are ''null()''ified instead of
1812 ''free()''ing (i.e. their type is changed to
1813 OP_NULL).
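A minimal sketch of why nullifying is safe where freeing is not, using a hypothetical thr_op node that has only a type and a thread pointer (none of these names come from the Perl source): the optimized-away node stays in memory, so any thread pointer that still refers to it remains valid, and a walker simply skips it.

```c
#include <stddef.h>

enum { DEMO_OP_NULL = 0, DEMO_OP_CONST = 1, DEMO_OP_NEGATE = 2 };

/* Hypothetical op with just a type and an execution-order pointer. */
typedef struct thr_op {
    int type;
    struct thr_op *op_next;
} thr_op;

/* "Optimize away" an op without freeing it: only its type changes,
 * so pointers to it from other ops stay usable. */
static void demo_nullify(thr_op *o)
{
    o->type = DEMO_OP_NULL;
}

/* Walk the thread and count the ops that would still do work. */
static size_t demo_count_live(const thr_op *start)
{
    size_t n = 0;

    for (; start != NULL; start = start->op_next)
        if (start->type != DEMO_OP_NULL)
            n++;
    return n;
}
```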
1814
1815
1816 __Compile pass 3: peephole optimization__
1817
1818
1819 After the compile tree for a subroutine (or for an
1820 eval or a file) is created, an additional pass over
1821 the code is performed. This pass is neither top-down nor
1822 bottom-up, but in the execution order (with additional
1823 complications for conditionals). These optimizations are
1824 done in the subroutine ''peep()''. Optimizations
1825 performed at this stage are subject to the same restrictions
1826 as in pass 2.
1827 !!Examining internal data structures with the dump functions
1828
1829
1830 To aid debugging, the source file ''dump.c'' contains a
1831 number of functions which produce formatted output of
1832 internal data structures.
1833
1834
1835 The most commonly used of these functions is
1836 Perl_sv_dump; it's used for dumping SVs, AVs, HVs,
1837 and CVs. The Devel::Peek module calls
1838 sv_dump to produce debugging output from
1839 Perl-space, so users of that module should already be
1840 familiar with its format.
1841
1842
1843 Perl_op_dump can be used to dump an OP
1844 structure or any of its derivatives, and produces output
1845 similar to perl -Dx; in fact,
1846 Perl_dump_eval will dump the main root of the code
1847 being evaluated, exactly like -Dx.
1848
1849
1850 Other useful functions are Perl_dump_sub, which
1851 turns a GV into an op tree,
1852 Perl_dump_packsubs which calls
1853 Perl_dump_sub on all the subroutines in a package
1854 like so: (Thankfully, these are all xsubs, so there is no op
1855 tree)
1856
1857
1858 (gdb) print Perl_dump_packsubs(PL_defstash)
1859 SUB attributes::bootstrap = (xsub 0x811fedc 0)
1860 SUB UNIVERSAL::can = (xsub 0x811f50c 0)
1861 SUB UNIVERSAL::isa = (xsub 0x811f304 0)
1862 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
2 perry 1863 SUB !DynaLoader::boot_!DynaLoader = (xsub 0x805b188 0)
1 perry 1864 and Perl_dump_all, which dumps all the subroutines in the stash and the op tree of the main root.
1865 !!How multiple interpreters and concurrency are supported
1866
1867
1868 __Background and
1869 PERL_IMPLICIT_CONTEXT__
1870
1871
1872 The Perl interpreter can be regarded as a closed box: it has
1873 an API for feeding it code or otherwise
1874 making it do things, but it also has functions for its own
1875 use. This smells a lot like an object, and there are ways
1876 for you to build Perl so that you can have multiple
1877 interpreters, with one interpreter represented either as a
1878 C++ object, a C structure, or inside a thread.
1879 The thread, the C structure, or the C++
1880 object will contain all the context, the state of that
1881 interpreter.
1882
1883
1884 Three macros control the major Perl build flavors:
1885 MULTIPLICITY, USE_THREADS and
1886 PERL_OBJECT. The MULTIPLICITY
1887 build has a C structure that packages all the interpreter
1888 state; there is a similar thread-specific data structure
1889 under USE_THREADS, and the (now deprecated)
1890 PERL_OBJECT build has a C++
1891 class to maintain interpreter state. In all three cases,
1892 PERL_IMPLICIT_CONTEXT is also normally
1893 defined, and enables the support for passing in a ``hidden''
1894 first argument that represents all three data
1895 structures.
1896
1897
1898 All this obviously requires a way for the Perl internal
1899 functions to be C++ methods, subroutines
1900 taking some kind of structure as the first argument, or
1901 subroutines taking nothing as the first argument. To enable
1902 these three very different ways of building the interpreter,
1903 the Perl source (as it does in so many other situations)
1904 makes heavy use of macros and subroutine naming
1905 conventions.
1906
1907
1908 First problem: deciding which functions will be public
1909 API functions and which will be private. All
1910 functions whose names begin S_ are private (think
1911 ``S'' for ``secret'' or ``static''). All other functions
1912 begin with ``Perl_'', but just because a function begins
1913 with ``Perl_'' does not mean it is part of the
1914 API. (See ``Internal Functions''.) The
1915 easiest way to be __sure__ a function is part of the
1916 API is to find its entry in perlapi. If it
1917 exists in perlapi, it's part of the API. If
1918 it doesn't, and you think it should be (i.e., you need it
1919 for your extension), send mail via perlbug explaining why
1920 you think it should be.
1921
1922
1923 Second problem: there must be a syntax so that the same
1924 subroutine declarations and calls can pass a structure as
1925 their first argument, or pass nothing. To solve this, the
1926 subroutines are named and declared in a particular way.
1927 Here's a typical start of a static function used within the
1928 Perl guts:
1929
1930
1931 STATIC void
1932 S_incline(pTHX_ char *s)
1933 STATIC becomes ``static'' in C, and is #define'd to nothing in C++.
1934
1935
1936 A public function (i.e. part of the internal
1937 API, but not necessarily sanctioned for use
1938 in extensions) begins like this:
1939
1940
1941 void
1942 Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)
1943 pTHX_ is one of a number of macros (in perl.h) that hide the details of the interpreter's context. THX stands for ``thread'', ``this'', or ``thingy'', as the case may be. (And no, George Lucas is not involved. :-) The first character could be 'p' for a __p__rototype, 'a' for __a__rgument, or 'd' for __d__eclaration, so we have pTHX, aTHX and dTHX, and their variants.
1944
1945
1946 When Perl is built without options that set
1947 PERL_IMPLICIT_CONTEXT, there is no first
1948 argument containing the interpreter's context. The trailing
1949 underscore in the pTHX_ macro indicates that the macro
1950 expansion needs a comma after the context argument because
1951 other arguments follow it. If
1952 PERL_IMPLICIT_CONTEXT is not defined, pTHX_
1953 will be ignored, and the subroutine is not prototyped to
1954 take the extra argument. The form of the macro without the
1955 trailing underscore is used when there are no additional
1956 explicit arguments.
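The effect of the underscore and non-underscore forms can be tried out in isolation. The sketch below is loosely modeled on the scheme just described, but every name in it (demo_interp, demo_pTHX_, demo_aTHX_, Demo_add and so on) is invented for the example and the expansions are assumptions, not the definitions found in perl.h.

```c
#include <stddef.h>

/* Comment this out to build the flavor with no context argument. */
#define DEMO_IMPLICIT_CONTEXT 1

typedef struct demo_interp { long counter; } demo_interp;

#ifdef DEMO_IMPLICIT_CONTEXT
#  define demo_pTHX   demo_interp *my_perl   /* prototype, no other args */
#  define demo_pTHX_  demo_pTHX,             /* prototype, more args follow */
#  define demo_aTHX   my_perl                /* argument, no other args */
#  define demo_aTHX_  demo_aTHX,             /* argument, more args follow */
#  define DEMO_STATE  (my_perl)
#else
static demo_interp demo_global_interp;       /* single global interpreter */
#  define demo_pTHX   void
#  define demo_pTHX_
#  define demo_aTHX
#  define demo_aTHX_
#  define DEMO_STATE  (&demo_global_interp)
#endif

static void Demo_add(demo_pTHX_ long n) { DEMO_STATE->counter += n; }
static long Demo_get(demo_pTHX)         { return DEMO_STATE->counter; }

/* Short names hide the context argument from the caller. */
#ifdef DEMO_IMPLICIT_CONTEXT
#  define demo_add(n)  Demo_add(demo_aTHX_ n)
#  define demo_get()   Demo_get(demo_aTHX)
#else
#  define demo_add     Demo_add
#  define demo_get     Demo_get
#endif

/* The body is identical in both build flavors. */
static long demo_caller(demo_pTHX)
{
    demo_add(2);
    demo_add(40);
    return demo_get();
}
```

With DEMO_IMPLICIT_CONTEXT defined, demo_caller takes a demo_interp pointer and each interpreter keeps its own counter; without it, the same source compiles to plain calls on one global structure.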
1957
1958
1959 When a core function calls another, it must pass the
1960 context. This is normally hidden via macros. Consider
1961 sv_setsv. It expands into something like
1962 this:
1963
1964
1965 #ifdef PERL_IMPLICIT_CONTEXT
1966 #  define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b)
1967 /* can't do this for vararg functions, see below */
1968 #else
1969 #  define sv_setsv Perl_sv_setsv
1970 #endif
1971 This works well, and means that XS authors can gleefully write:
1972
1973
1974 sv_setsv(foo, bar);
1975 and still have it work under all the modes Perl could have been compiled with.
1976
1977
1978 Under PERL_OBJECT in the core, that will
1979 translate to either:
1980
1981
1982 CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions,
1983 # C++ takes care of 'this'
1984 or
1985 pPerl->Perl_sv_setsv(foo,bar);
1986 Under PERL_OBJECT in extensions (aka PERL_CAPI), or under MULTIPLICITY/USE_THREADS with PERL_IMPLICIT_CONTEXT in both core and extensions, it will become:
1987
1988
1989 Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl
1990 This doesn't work so cleanly for varargs functions, though, as macros imply that the number of arguments is known in advance. Instead we either need to spell them out fully, passing aTHX_ as the first argument (the Perl core tends to do this with functions like Perl_warner), or use a context-free version.
1991
1992
1993 The context-free version of Perl_warner is called
1994 Perl_warner_nocontext, and does not take the extra argument.
1995 Instead it does dTHX; to get the context from thread-local
1996 storage. We #define warner Perl_warner_nocontext so
1997 that extensions get source compatibility at the expense of
1998 performance. (Passing an arg is cheaper than grabbing it
1999 from thread-local storage.)
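The trade-off between the two forms can be sketched without the Perl headers. Everything here is hypothetical: demo_interp, demo_current (standing in for the thread-local slot), Demo_warner and Demo_warner_nocontext only mimic the shape of the real functions.

```c
#include <stdarg.h>
#include <stdio.h>

typedef struct demo_interp {
    int  warnings;    /* how many warnings were issued */
    char last[64];    /* text of the most recent warning */
} demo_interp;

/* Stand-in for the slot a real build would keep in thread-local storage. */
static demo_interp *demo_current;

/* Shared worker: formats the message into the given interpreter. */
static void demo_vwarn(demo_interp *my_perl, const char *fmt, va_list ap)
{
    vsnprintf(my_perl->last, sizeof my_perl->last, fmt, ap);
    my_perl->warnings++;
}

/* Context-taking form: the caller spells out the context argument. */
static void Demo_warner(demo_interp *my_perl, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    demo_vwarn(my_perl, fmt, ap);
    va_end(ap);
}

/* Context-free form: looks the context up itself on every call. */
static void Demo_warner_nocontext(const char *fmt, ...)
{
    demo_interp *my_perl = demo_current;   /* plays the role of dTHX */
    va_list ap;
    va_start(ap, fmt);
    demo_vwarn(my_perl, fmt, ap);
    va_end(ap);
}

/* A plain rename works for varargs because no argument list is rewritten. */
#define demo_warner Demo_warner_nocontext
```

A caller that already holds the context can use Demo_warner directly and skip the lookup; a caller that writes demo_warner("bad value %d", 7) pays for the lookup but needs no source changes.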
2000
2001
2002 You can ignore [[pad]THX[[xo] when browsing the Perl
2003 headers/sources. Those are strictly for use within the core.
2004 Extensions and embedders need only be aware of
2005 [[pad]THX.
2006
2007
2008 __So what happened to dTHR?__
2009
2010
2011 dTHR was introduced in perl 5.005 to support the
2012 older thread model. The older thread model now uses the
2013 THX mechanism to pass context pointers around, so
2014 dTHR is not useful any more. Perl 5.6.0 and later
2015 still have it for backward source compatibility, but it is
2016 defined to be a no-op.
2017
2018
2019 __How do I use all this in extensions?__
2020
2021
2022 When Perl is built with PERL_IMPLICIT_CONTEXT,
2023 extensions that call any functions in the Perl
2024 API will need to pass the initial context
2025 argument somehow. The kicker is that you will need to write
2026 it in such a way that the extension still compiles when Perl
2027 hasn't been built with PERL_IMPLICIT_CONTEXT
2028 enabled.
2029
2030
2031 There are three ways to do this. First, the easy but
2032 inefficient way, which is also the default, in order to
2033 maintain source compatibility with extensions: whenever
2034 XSUB.h is #included, it redefines the aTHX
2035 and aTHX_ macros to call a function that will return the
2036 context. Thus, something like:
2037
2038
2039 sv_setsv(asv, bsv);
2040 in your extension will translate to this when PERL_IMPLICIT_CONTEXT is in effect:
2041
2042
2043 Perl_sv_setsv(Perl_get_context(), asv, bsv);
2044 or to this otherwise:
2045
2046
2047 Perl_sv_setsv(asv, bsv);
2048 You have to do nothing new in your extension to get this; since the Perl library provides ''Perl_get_context()'', it will all just work.
2049
2050
2051 The second, more efficient way is to use the following
2052 template for your Foo.xs:
2053
2054
2055 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2056 #include
2057 static SV *my_private_function(int arg1, int arg2);
2058 static SV *
2059 my_private_function(int arg1, int arg2)
2060 {
2061 dTHX; /* fetch context */
2062 ... call many Perl API functions ...
2063 }
2064 [[... etc ...]
2065 MODULE = Foo PACKAGE = Foo
2066 /* typical XSUB */
2067 void
2068 my_xsub(arg)
2069 int arg
2070 CODE:
2071 my_private_function(arg, 10);
2072 Note that the only two changes from the normal way of writing an extension are the addition of a #define PERL_NO_GET_CONTEXT before including the Perl headers, followed by a dTHX; declaration at the start of every function that will call the Perl API. (You'll know which functions need this, because the C compiler will complain that there's an undeclared identifier in those functions.) No changes are needed for the XSUBs themselves, because the ''XS()'' macro is correctly defined to pass in the implicit context if needed.
2073
2074
2075 The third, even more efficient way is to ape how it is done
2076 within the Perl guts:
2077
2078
2079 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2080 #include
2081 /* pTHX_ only needed for functions that call Perl API */
2082 static SV *my_private_function(pTHX_ int arg1, int arg2);
2083 static SV *
2084 my_private_function(pTHX_ int arg1, int arg2)
2085 {
2086 /* dTHX; not needed here, because THX is an argument */
2087 ... call Perl API functions ...
2088 }
2089 [[... etc ...]
2090 MODULE = Foo PACKAGE = Foo
2091 /* typical XSUB */
2092 void
2093 my_xsub(arg)
2094 int arg
2095 CODE:
2096 my_private_function(aTHX_ arg, 10);
2097 This implementation never has to fetch the context using a function call, since it is always passed as an extra argument. Depending on your needs for simplicity or efficiency, you may mix the previous two approaches freely.
2098
2099
2100 Never add a comma after pTHX yourself--always use
2101 the form of the macro with the underscore for functions that
2102 take explicit arguments, or the form without the underscore
2103 for functions with no explicit arguments.
2104
2105
2106 __Should I do anything special if I call perl from multiple
2107 threads?__
2108
2109
2110 If you create interpreters in one thread and then proceed to
2111 call them in another, you need to make sure perl's own
2112 Thread Local Storage (TLS) slot is
2113 initialized correctly in each of those threads.
2114
2115
2116 The perl_alloc and perl_clone
2117 API functions will automatically set the
2118 TLS slot to the interpreter they created, so
2119 that there is no need to do anything special if the
2120 interpreter is always accessed in the same thread that
2121 created it, and that thread did not create or call any other
2122 interpreters afterwards. If that is not the case, you have
2123 to set the TLS slot of the thread before
2124 calling any functions in the Perl API on that
2125 particular interpreter. This is done by calling the
2126 PERL_SET_CONTEXT macro in that thread as the first
2127 thing you do:
2128
2129
2130 /* do this before doing anything else with some_perl */
2131 PERL_SET_CONTEXT(some_perl);
2132 ... other Perl API calls on some_perl go here ...
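The idea of a per-thread "current interpreter" slot can be sketched as follows. This is an assumption-laden illustration, not how perl implements it: it relies on the C11 _Thread_local storage class, and demo_interp, DEMO_SET_CONTEXT, DEMO_GET_CONTEXT and demo_current_id are made-up names.

```c
#include <stddef.h>

typedef struct demo_interp { int id; } demo_interp;

/* One slot per thread; each thread sees its own copy of this pointer. */
static _Thread_local demo_interp *demo_tls_slot;

#define DEMO_SET_CONTEXT(p)  (demo_tls_slot = (p))
#define DEMO_GET_CONTEXT()   (demo_tls_slot)

/* Any function that needs the interpreter fetches it from the slot.
 * Returns -1 when the calling thread has not set a context yet. */
static int demo_current_id(void)
{
    demo_interp *cur = DEMO_GET_CONTEXT();
    return cur != NULL ? cur->id : -1;
}
```

A thread that has never called DEMO_SET_CONTEXT sees a NULL slot, which is the situation the paragraph above warns about when an interpreter is handed to a different thread.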
2133
2134
2135 __Future Plans and
2136 PERL_IMPLICIT_SYS__
2137
2138
2139 Just as PERL_IMPLICIT_CONTEXT provides a way
2140 to bundle up everything that the interpreter knows about
2141 itself and pass it around, so too are there plans to allow
2142 the interpreter to bundle up everything it knows about the
2143 environment it's running on. This is enabled with the
2144 PERL_IMPLICIT_SYS macro. Currently it only
2145 works with PERL_OBJECT and
2146 USE_THREADS on Windows (see inside
2147 iperlsys.h).
2148
2149
2150 This makes it possible to provide an extra pointer (called
2151 the ``host'' environment) for all the system calls, so
2152 that all the system-level code can maintain its
2153 own state, broken down into seven C structures. These are
2154 thin wrappers around the usual system calls (see
2155 win32/perllib.c) for the default perl executable, but for a
2156 more ambitious host (like the one that would do
2157 ''fork()'' emulation) all the extra work needed to
2158 pretend that different interpreters are actually different
2159 ``processes'', would be done here.
2160
2161
2162 The Perl engine/interpreter and the host are orthogonal
2163 entities. There could be one or more interpreters in a
2164 process, and one or more ``hosts'', with free association
2165 between them.
2166 !!Internal Functions
2167
2168
2169 All of Perl's internal functions which will be exposed to
2170 the outside world are prefixed by Perl_ so that
2171 they will not conflict with XS functions or
2172 functions used in a program in which Perl is embedded.
2173 Similarly, all global variables begin with PL_. (By
2174 convention, static functions start with
2175 S_)
2176
2177
2178 Inside the Perl core, you can get at the functions either
2179 with or without the Perl_ prefix, thanks to a bunch
2180 of defines that live in ''embed.h''. This header file is
2181 generated automatically from ''embed.pl''.
2182 ''embed.pl'' also creates the prototyping header files
2183 for the internal functions, generates the documentation and
2184 a lot of other bits and pieces. It's important that when you
2185 add a new function to the core or change an existing one,
2186 you change the data in the table at the end of
2187 ''embed.pl'' as well. Here's a sample entry from that
2188 table:
2189
2190
2191 Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
2192 The second column is the return type, the third column the name. Columns after that are the arguments. The first column is a set of flags:
2193
2194
2195 A
2196
2197
2198 This function is a part of the public
2199 API.
2200
2201
2202 p
2203
2204
2205 This function has a Perl_ prefix; i.e., it is defined
2206 as Perl_av_fetch.
2207
2208
2209 d
2210
2211
2212 This function has documentation using the apidoc
2213 feature which we'll look at in a second.
2214
2215
2216 Other available flags are:
2217
2218
2219 s
2220
2221
2222 This is a static function and is defined as
2223 S_whatever, and usually called within the sources
2224 as whatever(...).
2225
2226
2227 n
2228
2229
2230 This does not use aTHX_ and pTHX to pass
2231 interpreter context. (See ``Background and
2232 PERL_IMPLICIT_CONTEXT'' in
2233 perlguts.)
2234
2235
2236 r
2237
2238
2239 This function never returns; croak, exit
2240 and friends.
2241
2242
2243 f
2244
2245
2246 This function takes a variable number of arguments,
2247 printf style. The argument list should end with
2248 ..., like this:
2249
2250
2251 Afprd |void |croak |const char* pat|...
2252
2253
2254 M
2255
2256
2257 This function is part of the experimental development
2258 API, and may change or disappear without
2259 notice.
2260
2261
2262 o
2263
2264
2265 This function should not have a compatibility macro to
2266 define, say, Perl_parse to parse. It must
2267 be called as Perl_parse.
2268
2269
2270 j
2271
2272
2273 This function is not a member of CPerlObj. If you
2274 don't know what this means, don't use it.
2275
2276
2277 x
2278
2279
2280 This function isn't exported out of the Perl
2281 core.
2282
2283
2284 If you edit ''embed.pl'', you will need to run make
2285 regen_headers to force a rebuild of ''embed.h'' and
2286 other auto-generated files.
2287
2288
2289 __Formatted Printing of IVs, UVs, and NVs__
2290
2291
2292 If you are printing IVs, UVs, or NVs, then instead
2293 of the stdio(3) style formatting codes like
2294 %d, %ld, %f, you should use the
2295 following macros for portability:
2296
2297
2298 IVdf IV in decimal
2299 UVuf UV in decimal
2300 UVof UV in octal
2301 UVxf UV in hexadecimal
2302 NVef NV %e-like
2303 NVff NV %f-like
2304 NVgf NV %g-like
2305 These will take care of 64-bit integers and long doubles. For example:
2306
2307
2308 printf("IV is %" IVdf "\n", iv);
2309 The IVdf will expand to whatever is the correct format for the IVs.
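The same string-pasting technique exists in standard C as the inttypes.h format macros, which makes it easy to experiment without the Perl headers. In this sketch demo_IV, DEMO_IVdf and demo_format_iv are hypothetical names, and the choice of a 64-bit integer for demo_IV is an assumption made for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical integer type and its matching format macro. */
typedef int64_t demo_IV;
#define DEMO_IVdf PRId64

/* Format `iv` into `buf`; the format macro is pasted into the
 * string literal, so the conversion always matches the type. */
static int demo_format_iv(char *buf, size_t size, demo_IV iv)
{
    return snprintf(buf, size, "IV is %" DEMO_IVdf, iv);
}
```

If the width of demo_IV changes, only the typedef and the macro need updating; every format string that uses the macro follows along.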
2310
2311
2312 If you are printing addresses of pointers, use UVxf combined
2313 with ''PTR2UV()''; do not use
2314 %lx or %p.
2315
2316
2317 __Pointer-To-Integer and
2318 Integer-To-Pointer__
2319
2320
2321 Because pointer size does not necessarily equal integer
2322 size, use the following macros to do it right.
2323
2324
2325 PTR2UV(pointer)
2326 PTR2IV(pointer)
2327 PTR2NV(pointer)
2328 INT2PTR(pointertotype, integer)
2329 For example:
2330
2331
2332 IV iv = ...;
2333 SV *sv = INT2PTR(SV*, iv);
2334 and
2335
2336
2337 AV *av = ...;
2338 UV uv = PTR2UV(av);
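For experimenting outside the Perl source tree, standard C offers uintptr_t for the same purpose. The macros below are hypothetical look-alikes written for this sketch, not the definitions from perl.h.

```c
#include <stdint.h>

/* Hypothetical look-alikes built on uintptr_t, an unsigned integer
 * type wide enough to hold a pointer where the platform provides it. */
#define DEMO_PTR2UV(p)        ((uintptr_t)(p))
#define DEMO_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Convert a pointer to an integer and back; returns 1 if the
 * pointer survives the round trip unchanged. */
static int demo_roundtrip(void)
{
    int x = 42;
    uintptr_t uv = DEMO_PTR2UV(&x);
    int *p = DEMO_INT2PTR(int *, uv);

    return p == &x && *p == 42;
}
```

Casting through a plain int or long instead can silently drop the upper bits on platforms where pointers are wider than those types.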
2339
2340
2341 __Source Documentation__
2342
2343
2344 There's an effort going on to document the internal
2345 functions and automatically produce reference manuals from
2346 them - perlapi is one such manual which details all the
2347 functions which are available to XS writers.
2348 perlintern is the autogenerated manual for the functions
2349 which are not part of the API and are
2350 supposedly for internal use only.
2351
2352
2353 Source documentation is created by putting
2354 POD comments into the C source, like
2355 this:
2356
2357
2358 /*
2359 =for apidoc sv_setiv
2360 Copies an integer into the given SV. Does not handle 'set' magic. See
2361 C<sv_setiv_mg>.
2362 =cut
2363 */
2364 Please try and supply some documentation if you add functions to the Perl core.
2365 !!Unicode Support
2366
2367
2368 Perl 5.6.0 introduced Unicode support. It's important for
2369 porters and XS writers to understand this
2370 support and make sure that the code they write does not
2371 corrupt Unicode data.
2372
2373
2374 __What is Unicode, anyway?__
2375
2376
2377 In the olden, less enlightened times, we all used to use
2378 ASCII. Most of us did, anyway. The big
2379 problem with ASCII is that it's American.
2380 Well, no, that's not actually the problem; the problem is
2381 that it's not particularly useful for people who don't use
2382 the Roman alphabet. What used to happen was that particular
2383 languages would stick their own alphabet in the upper range
2384 of the sequence, between 128 and 255. Of course, we then
2385 ended up with plenty of variants that weren't quite
2386 ASCII, and the whole point of it being a
2387 standard was lost.
2388
2389
2390 Worse still, if you've got a language like Chinese or
2391 Japanese that has hundreds or thousands of characters, then
2392 you really can't fit them into a mere 256, so they had to
2393 forget about ASCII altogether, and build
2394 their own systems using pairs of numbers to refer to one
2395 character.
2396
2397
2398 To fix this, some people formed Unicode, Inc. and produced a
2399 new character set containing all the characters you can
2400 possibly think of and more. There are several ways of
2401 representing these characters, and the one Perl uses is
2402 called UTF8. UTF8 uses a
2403 variable number of bytes to represent a character, instead
2404 of just one. You can learn more about Unicode at
2405 http://www.unicode.org/
2406
2407
2408 __How can I recognise a UTF8
2409 string?__
2410
2411
2412 You can't. This is because UTF8 data is
2413 stored in bytes just like non-UTF8 data. The Unicode
2414 character 200 (0xC8 for you hex types), capital E
2415 with a grave accent, is represented by the two bytes
2416 v195.136. Unfortunately, the non-Unicode string
2417 chr(195).chr(136) has that byte sequence as well.
2418 So you can't tell just by looking - this is what makes
2419 Unicode input an interesting problem.
2420
2421
2422 The API function is_utf8_string can
2423 help; it'll tell you if a string contains only valid
2424 UTF8 characters. However, it can't do the
2425 work for you. On a character-by-character basis,
2426 is_utf8_char will tell you whether the current
2427 character in a string is valid
2428 UTF8.
2429
2430
2431 __How does UTF8 represent Unicode
2432 characters?__
2433
2434
2435 As mentioned above, UTF8 uses a variable
2436 number of bytes to store a character. Characters with values
2437 0...127 are stored in one byte, just like good ol'
2438 ASCII. Character 128 is stored as
2439 v194.128; this continues up to character 191, which
2440 is v194.191. Now we've run out of bits (191 is
2441 binary 10111111) so we move on; 192 is
2442 v195.128. And so it goes on, moving to three bytes
2443 at character 2048.
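The byte values can be checked with a small encoder. This is a stand-alone sketch, not Perl's own routine: demo_utf8_encode is a made-up name, it stops at three-byte sequences (code points up to 0xFFFF), and it does not reject surrogates.

```c
#include <stddef.h>

/* Encode code point `cp` as UTF-8 into `out`.
 * Returns the number of bytes written (1 to 3), or 0 if `cp`
 * is too large for this sketch. */
static size_t demo_utf8_encode(unsigned long cp, unsigned char out[3])
{
    if (cp < 0x80) {                       /* 0..127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}
```

Feeding it 191 gives the bytes 194, 191; 192 gives 195, 128; and 2048 is the first value that needs three bytes (224, 160, 128).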
2444
2445
2446 Assuming you know you're dealing with a UTF8
2447 string, you can find out how long the first character in it
2448 is with the UTF8SKIP macro:
2449
2450
2451 char *utf = "\305\233\340\240\201";
2452 len = UTF8SKIP(utf); /* len is 2 here */
2453 utf += len;
2454 len = UTF8SKIP(utf); /* len is 3 here */
2455 Another way to skip over characters in a UTF8 string is to use utf8_hop, which takes a string and a number of characters to skip over. You're on your own about bounds checking, though, so don't use it lightly.
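Since bounds checking is left to the caller, it may help to see what a checked hop could look like. The sketch below is written from scratch for illustration: demo_utf8_skip and demo_utf8_hop are hypothetical, handle only standard sequences of one to four bytes, and take an explicit end pointer, which is an assumption and not the signature of the Perl function.

```c
#include <stddef.h>

/* Length of the sequence introduced by lead byte `lead`.
 * Continuation or invalid bytes are treated as length 1. */
static size_t demo_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Advance over at most `n` characters, never moving past `end`.
 * A sequence cut off by `end` makes the function stop at `end`. */
static const unsigned char *
demo_utf8_hop(const unsigned char *s, const unsigned char *end, size_t n)
{
    while (n-- > 0 && s < end) {
        size_t len = demo_utf8_skip(*s);
        if (len > (size_t)(end - s))
            return end;
        s += len;
    }
    return s;
}
```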
2456
2457
2458 All bytes in a multi-byte UTF8 character will
2459 have the high bit set, so you can test if you need to do
2460 something special with this character like
2461 this:
2462
2463
2464 UV uv;
2465 if (utf
2466 You can also see in that example that we use utf8_to_uv to get the value of the character; the inverse function uv_to_utf8 is available for putting a UV into UTF8:
2467
2468
2469 if (uv
2470 You __must__ convert characters to UVs using the above functions if you're ever in a situation where you have to match UTF8 and non-UTF8 characters. You may not skip over UTF8 characters in this case. If you do this, you'll lose the ability to match hi-bit non-UTF8 characters; for instance, if your UTF8 string contains v195.136, and you skip that character, you can never match a chr(200) in a non-UTF8 string. So don't do that!
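The high-bit test and the conversion to a numeric value can be sketched in plain C. This is not the Perl API: demo_utf8_decode is a hypothetical helper that assumes well-formed input of at most three bytes per character and performs no validation.

```c
#include <stddef.h>

/* Decode the character at `s`, storing its byte length in `*len`.
 * Assumes well-formed UTF-8 of at most three bytes; anything else
 * yields U+FFFD with a length of 1. */
static unsigned long demo_utf8_decode(const unsigned char *s, size_t *len)
{
    if (!(s[0] & 0x80)) {                 /* high bit clear: plain byte */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {          /* two-byte sequence */
        *len = 2;
        return ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0) {          /* three-byte sequence */
        *len = 3;
        return ((unsigned long)(s[0] & 0x0F) << 12)
             | ((unsigned long)(s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
    }
    *len = 1;
    return 0xFFFD;
}
```

Decoding the two bytes 195, 136 yields 200, which can then be compared directly with byte 200 from a non-UTF8 string; comparing the raw bytes would never match.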
2471
2472
2473 __How does Perl store UTF8
2474 strings?__
2475
2476
2477 Currently, Perl deals with Unicode strings and non-Unicode
2478 strings slightly differently. If a string has been
2479 identified as being UTF-8 encoded, Perl will
2480 set a flag in the SV, SVf_UTF8. You
2481 can check and manipulate this flag with the following
2482 macros:
2483
2484
2485 SvUTF8(sv)
2486 SvUTF8_on(sv)
2487 SvUTF8_off(sv)
2488 This flag has an important effect on Perl's treatment of the string: if Unicode data is not properly distinguished, regular expressions, length, substr and other string handling operations will have undesirable results.
2489
2490
2491 The problem comes when you have, for instance, a string that
2492 isn't flagged as UTF8, and contains a byte
2493 sequence that could be UTF8 - especially when
2494 combining non-UTF8 and UTF8
2495 strings.
2496
2497
2498 Never forget that the SVf_UTF8 flag is separate from
2499 the PV value; you need to be sure you don't
2500 accidentally knock it off while you're manipulating SVs.
2501 More specifically, you cannot expect to do
2502 this:
2503
2504
2505 SV *sv;
2506 SV *nsv;
2507 STRLEN len;
2508 char *p;
2509 p = SvPV(sv, len);
2510 frobnicate(p);
2511 nsv = newSVpvn(p, len);
2512 The char* string does not tell you the whole story, and you can't copy or reconstruct an SV just by copying the string value. Check if the old SV has the UTF8 flag set, and act accordingly:
2513
2514
2515 p = SvPV(sv, len);
2516 frobnicate(p);
2517 nsv = newSVpvn(p, len);
2518 if (SvUTF8(sv))
2519 SvUTF8_on(nsv);
2520 In fact, your frobnicate function should be made aware of whether or not it's dealing with UTF8 data, so that it can handle the string appropriately.
2521
2522
2523 __How do I convert a string to
2524 UTF8?__
2525
2526
2527 If you're mixing UTF8 and non-UTF8 strings,
2528 you might find it necessary to upgrade one of the strings to
2529 UTF8. If you've got an SV,
2530 the easiest way to do this is:
2531
2532
2533 sv_utf8_upgrade(sv);
2534 However, you must not do this, for example:
2535
2536
2537 if (!SvUTF8(left))
2538 sv_utf8_upgrade(left);
2539 If you do this in a binary operator, you will actually change one of the strings that came into the operator, and, while it shouldn't be noticeable by the end user, it can cause problems.
2540
2541
2542 Instead, bytes_to_utf8 will give you a UTF8-encoded
2543 __copy__ of its string argument. This is useful for
2544 having the data available for comparisons and so on, without
2545 harming the original SV. There's also
2546 utf8_to_bytes to go the other way, but naturally,
2547 this will fail if the string contains any characters above
2548 255 that can't be represented in a single byte.
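The copy-instead-of-modify approach can be illustrated without the Perl headers. The function below is a hypothetical stand-in written for this sketch (its name, its signature and its use of malloc are all assumptions); it treats the input as bytes with values 0 to 255 and leaves the original buffer untouched.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return a newly allocated UTF-8 copy of `len` bytes from `src`,
 * storing the copy's length in `*out_len`. The caller frees the
 * result. Returns NULL if allocation fails. */
static unsigned char *
demo_bytes_to_utf8(const unsigned char *src, size_t len, size_t *out_len)
{
    /* Worst case: every byte expands to two, plus a trailing NUL. */
    unsigned char *dst = (unsigned char *)malloc(2 * len + 1);
    size_t i, j = 0;

    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[j++] = src[i];                       /* unchanged */
        } else {
            dst[j++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[j++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    dst[j] = '\0';
    *out_len = j;
    return dst;
}
```

Because the source is only read, an operand of a binary operator can be compared in its upgraded form while the value the caller passed in keeps its original bytes.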
2549
2550
2551 __Is there anything else I need to know?__
2552
2553
2554 Not really. Just remember these things:
2555
2556
2557 There's no way to tell if a string is UTF8 or
2558 not. You can tell if an SV is
2559 UTF8 by looking at its SvUTF8 flag.
2560 Don't forget to set the flag if something should be
2561 UTF8. Treat the flag as part of the
2562 PV, even though it's not - if you pass on
2563 the PV to somewhere, pass on the flag
2564 too.
2565
2566
2567 If a string is UTF8, __always__ use
2568 utf8_to_uv to get at the value, unless !(*s & 0x80),
2569 in which case you can use
2570 *s.
2571
2572
2573 When writing to a UTF8 string, __always__
2574 use uv_to_utf8, unless uv < 0x80, in
2575 which case you can use *s = uv.
2576
2577
2578 Mixing UTF8 and non-UTF8 strings is tricky.
2579 Use bytes_to_utf8 to get a new string which is
2580 UTF8 encoded. There are tricks you can use to
2581 delay deciding whether you need to use a UTF8
2582 string until you get to a high character -
2583 HALF_UPGRADE is one of those.
2584 !!AUTHORS
2585
2586
2587 Until May 1997, this document was maintained by Jeff Okamoto
2588
2589
2590 With lots of help and suggestions from Dean Roehrich,
2591 Malcolm Beattie, Andreas Koenig, Paul Hudson, Ilya
2592 Zakharevich, Paul Marquess, Neil Bowers, Matthew Green, Tim
2 perry 2593 Bunce, Spider Boardman, Ulrich Pfeifer, Stephen !McCamant,
1 perry 2594 and Gurusamy Sarathy.
2595
2596
2597 API Listing originally by Dean Roehrich
2598
2599
2600 Modifications to autogenerate the API listing
2601 (perlapi) by Benjamin Stuhl.
2602 !!SEE ALSO
2603
2604
2605 perlapi(1), perlintern(1), perlxs(1),
2606 perlembed(1)
2607 ----