PERLDSC
!!!PERLDSC
NAME
DESCRIPTION
REFERENCES
COMMON MISTAKES
CAVEAT ON PRECEDENCE
WHY YOU SHOULD ALWAYS use strict
DEBUGGING
CODE EXAMPLES
ARRAYS OF ARRAYS
HASHES OF ARRAYS
ARRAYS OF HASHES
HASHES OF HASHES
MORE ELABORATE RECORDS
Database Ties
SEE ALSO
AUTHOR
----
!!NAME

perldsc - Perl Data Structures Cookbook
!!DESCRIPTION

The single feature most sorely lacking in the Perl
programming language prior to its 5.0 release was complex
data structures. Even without direct language support, some
valiant programmers did manage to emulate them, but it was
hard work and not for the faint of heart. You could
occasionally get away with the $m{$AoA,$b} notation
borrowed from __awk__, in which the keys are actually more
like a single concatenated string, but traversal and sorting
were difficult. More desperate programmers even hacked Perl's
internal symbol table directly, a strategy that proved hard
to develop and maintain--to put it mildly.

The 5.0 release of Perl let us have complex data structures.
You may now write something like this and, all of a sudden,
you have an array with three dimensions!

 for $x (1 .. 10) {
     for $y (1 .. 10) {
         for $z (1 .. 10) {
             $AoA[[$x][[$y][[$z] =
                 $x ** $y + $z;
         }
     }
 }
Alas, however simple this may appear, underneath it's a much more elaborate construct than meets the eye!

How do you print it out? Why can't you say just print
@AoA? How do you sort it? How can you pass it to a
function or get one of these back from a function? Is it an
object? Can you save it to disk to read back later? How do
you access whole rows or columns of that matrix? Do all the
values have to be numeric?

As you see, it's quite easy to become confused. While some
small portion of the blame for this can be attributed to the
reference-based implementation, it's really more due to a
lack of existing documentation with examples designed for
the beginner.

This document is meant to be a detailed but understandable
treatment of the many different sorts of data structures you
might want to develop. It should also serve as a cookbook of
examples. That way, when you need to create one of these
complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.

Let's look at each of these possible constructs in detail.
There are separate sections on each of the
following:

arrays of arrays

hashes of arrays

arrays of hashes

hashes of hashes

more elaborate constructs

But for now, let's look at general issues common to all
these types of data structures.
!!REFERENCES

The most important thing to understand about all data
structures in Perl--including multidimensional arrays--is
that even though they might appear otherwise, Perl
@ARRAYs and %HASHes are all internally
one-dimensional. They can hold only scalar values (meaning a
string, number, or a reference). They cannot directly
contain other arrays or hashes, but instead contain
''references'' to other arrays or hashes.

You can't use a reference to an array or hash in quite the
same way that you would a real array or hash. For C or C++
programmers unused to distinguishing
between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between
a structure and a pointer to a structure.

You can (and should) read more about references in the
perlref(1) man page. Briefly, references are rather
like pointers that know what they point to. (Objects are
also a kind of reference, but we won't be needing them right
away--if ever.) This means that when you have something
which looks to you like an access to a
two-or-more-dimensional array and/or hash, what's really
going on is that the base type is merely a one-dimensional
entity that contains references to the next level. It's just
that you can ''use'' it as though it were a
two-dimensional one. This is actually the way almost all C
multidimensional arrays work as well.

 $array[[7][[12]                       # array of arrays
 $array[[7]{string}                   # array of hashes
 $hash{string}[[7]                    # hash of arrays
 $hash{string}{'another string'}     # hash of hashes
Now, because the top level contains only references, if you try to print out your array with a simple ''print()'' function, you'll get something that doesn't look very nice, like this:

 @AoA = ( [[2, 3], [[4, 5, 7], [[0] );
 print $AoA[[1][[2];
   7
 print @AoA;
   ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like ${$blah}, @{$blah}, @{$blah[[$i]}, or else postfix pointer arrows, like $a->[[3], $h->{fred}, or even $ob->method()->[[3].
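
As a small illustration of the two dereferencing styles just described, the following sketch prints the same array of arrays row by row. The variable names @rows, $via_arrow, and $via_sugar are made up for this example and are not part of the original text.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Prefix form: @{ ... } dereferences the array reference held in each slot.
my @rows = map { join(" ", @{$_}) } @AoA;

# Postfix arrow form; the arrow between adjacent subscripts is optional.
my $via_arrow = $AoA[1]->[2];
my $via_sugar = $AoA[1][2];

print "$_\n" for @rows;
print "$via_arrow $via_sugar\n";
```

Each row is joined into a string only after an explicit dereference, which is why this prints the stored values rather than ARRAY(0x...) addresses.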
!!COMMON MISTAKES

The two most common mistakes made in constructing something
like an array of arrays are either accidentally counting the
number of elements or else taking a reference to the same
memory location repeatedly. Here's the case where you just
get the count instead of a nested array:

 for $i (1..10) {
     @array = somefunc($i);
     $AoA[[$i] = @array;      # WRONG!
 }
That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this:

 for $i (1..10) {
     @array = somefunc($i);
     $counts[[$i] = scalar @array;
 }
Here's the case of taking a reference to the same memory location again and again:

 for $i (1..10) {
     @array = somefunc($i);
     $AoA[[$i] = \@array;     # WRONG!
 }
So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references, so by golly, you've made me one!

Unfortunately, while this is true, it's still broken. All
the references in @AoA refer to the ''very same
place'', and they will therefore all hold whatever was
last in @array! It's similar to the problem
demonstrated in the following C program:

 #include <pwd.h>
 #include <stdio.h>
 int main(void) {
     struct passwd *rp, *dp;
     rp = getpwnam("root");
     dp = getpwnam("daemon");
     printf("daemon name is %s\nroot name is %s\n",
             dp->pw_name, rp->pw_name);
     return 0;
 }
Which will print

 daemon name is daemon
 root name is daemon
The problem is that both rp and dp are pointers to the same location in memory! In C, you'd have to remember to ''malloc()'' yourself some new memory. In Perl, you'll want to use the array constructor [[] or the hash constructor {} instead. Here's the right way to do the preceding broken code fragments:

 for $i (1..10) {
     @array = somefunc($i);
     $AoA[[$i] = [[ @array ];
 }
The square brackets make a reference to a new array with a ''copy'' of what's in @array at the time of the assignment. This is what you want.
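
To see the three behaviors side by side, here is a minimal sketch. The somefunc() below is a hypothetical stand-in (it just returns 1..$n), and the array names @counts, @shared, and @copied are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the somefunc() used in the text.
sub somefunc { my ($n) = @_; return (1 .. $n) }

my (@counts, @shared, @copied);
my @array;                       # one array, reused on every pass
for my $i (1 .. 3) {
    @array = somefunc($i);
    $counts[$i] = @array;        # scalar context: stores the element count
    $shared[$i] = \@array;       # every slot refers to the same @array
    $copied[$i] = [ @array ];    # fresh anonymous array holding a copy
}

print "counts: @counts[1..3]\n";
print "shared row 1: @{ $shared[1] }\n";
print "copied row 1: @{ $copied[1] }\n";
```

Only @copied keeps what each pass produced. Every slot of @shared shows whatever was last left in @array.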

Note that this will produce something similar, but it's much
harder to read:

 for $i (1..10) {
     @array = 0 .. $i;
     @{$AoA[[$i]} = @array;
 }
Is it the same? Well, maybe so--and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new ''copy'' of the data. Something else could be going on in this new case with the @{$AoA[[$i]} dereference on the left-hand-side of the assignment. It all depends on whether $AoA[[$i] had been undefined to start with, or whether it already contained a reference. If you had already populated @AoA with references, as in

 $AoA[[3] = \@another_array;
Then the assignment with the indirection on the left-hand-side would use the existing reference that was already there:

 @{$AoA[[3]} = @array;
Of course, this ''would'' have the ``interesting'' effect of clobbering @another_array. (Have you ever noticed how when a programmer says something is ``interesting'', that rather than meaning ``intriguing'', they're disturbingly more apt to mean that it's ``annoying'', ``difficult'', or both? :-)
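
The clobbering effect can be checked directly. This sketch uses made-up contents for @another_array and @array, and adds an undefined slot (index 4, an assumption for illustration) to show the other branch of the "it all depends" case.

```perl
use strict;
use warnings;

my @another_array = qw(keep me safe);
my @array         = (1, 2, 3);
my @AoA;

$AoA[3] = \@another_array;   # slot 3 already holds a reference
@{ $AoA[3] } = @array;       # assigns THROUGH that existing reference

$AoA[4] = undef;
@{ $AoA[4] } = @array;       # undefined slot: a new array springs into being

print "another_array is now: @another_array\n";
print "AoA[4] holds: @{ $AoA[4] }\n";
```

The first assignment overwrites @another_array itself, while the second creates an independent array, even though the two statements look identical.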

So just remember always to use the array or hash
constructors with [[] or {}, and you'll be
fine, although it's not always optimally
efficient.

Surprisingly, the following dangerous-looking construct will
actually work out fine:

 for $i (1..10) {
     my @array = somefunc($i);
     $AoA[[$i] = \@array;
 }
That's because ''my()'' is more of a run-time statement than it is a compile-time declaration ''per se''. This means that the ''my()'' variable is remade afresh each time through the loop. So even though it ''looks'' as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers. So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I seldom like to see the gimme-a-reference operator (backslash) used much at all in code. Instead, I advise beginners that they (and most of the rest of us) should try to use the much more easily understood constructors [[] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference-counting to do the right thing behind the scenes.
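
A quick way to convince yourself that the ''my()'' variable really is remade on each pass is to compare the stored references. As before, somefunc() here is a hypothetical helper returning 1..$n.

```perl
use strict;
use warnings;

sub somefunc { my ($n) = @_; return (1 .. $n) }   # hypothetical helper

my @AoA;
for my $i (1 .. 3) {
    my @array = somefunc($i);   # a new lexical @array on each pass
    $AoA[$i] = \@array;         # so each reference points somewhere different
}

print "row $_: @{ $AoA[$_] }\n" for 1 .. 3;
```

Each row keeps its own contents, and the references stored in different slots compare as unequal.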

In summary:

 $AoA[[$i] = [[ @array ];     # usually best
 $AoA[[$i] = \@array;        # perilous; just how my() was that array?
 @{ $AoA[[$i] } = @array;    # way too tricky for most programmers
!!CAVEAT ON PRECEDENCE

Speaking of things like @{$AoA[[$i]}, the following
are actually the same thing:

 $aref->[[2][[2]       # clear
 $$aref[[2][[2]        # confusing
That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: $ @ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[[i] to mean what's pointed to by the ''i'th'' element of a. That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl,
$$aref[[$i] first does the deref of $aref,
making it take $aref as a reference to an array,
and then dereference that, and finally tell you the
''i'th'' value of the array pointed to by $AoA.
If you wanted the C notion, you'd have to write
${$AoA[[$i]} to force the $AoA[[$i] to get
evaluated first before the leading $
dereferencer.
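
The binding order can be verified with a short sketch. The data and the names $row_a, $row_b, $row_c, and $elem are illustrative only.

```perl
use strict;
use warnings;

my @AoA  = ( [qw(a b c)], [qw(d e f)] );
my $aref = \@AoA;

# $$aref[1] means ${$aref}[1]: dereference $aref first, then subscript.
my $row_a = $$aref[1];
my $row_b = ${$aref}[1];
my $row_c = $aref->[1];

# ${ $AoA[1] }[2] subscripts @AoA first, then dereferences that element.
my $elem = ${ $AoA[1] }[2];

print "same row\n" if $row_a == $row_b && $row_b == $row_c;
print "element: $elem\n";
```

All three spellings of the row access yield the same reference, and the braces in the last expression are what force the subscript to happen before the dereference.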
!!WHY YOU SHOULD ALWAYS use strict

If this is starting to sound scarier than it's worth, relax.
Perl has some features to help you avoid its most common
pitfalls. The best way to avoid getting confused is to start
every program like this:

 #!/usr/bin/perl -w
 use strict;
This way, you'll be forced to declare all your variables with ''my()'', and accidental ``symbolic dereferencing'' will be disallowed. Therefore if you'd done this:

 my $aref = [[
     [[ "fred", "barney", "pebbles", "bambam", "dino", ],
     [[ "homer", "bart", "marge", "maggie", ],
     [[ "george", "jane", "elroy", "judy", ],
 ];
 print $aref[[2][[2];
The compiler would immediately flag that as an error ''at compile time'', because you were accidentally accessing @aref, an undeclared variable, and it would thereby remind you to write instead:

 print $aref->[[2][[2]
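
One way to watch strict catch this mistake without aborting the whole program is to compile the faulty expression inside a string eval. This is only a demonstration device, assumed here for illustration; the numeric data and the names $wrong, $error, and $right are invented.

```perl
use strict;
use warnings;

my $aref = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];

# Compile the mistaken expression in a string eval so the failure is
# catchable; under strict, $aref[2][2] refers to an undeclared @aref.
my $wrong = eval q{ use strict; $aref[2][2] };
my $error = $@;

# The intended access goes through the reference.
my $right = $aref->[2][2];

print "strict said: $error";
print "correct value: $right\n";
```

The eval returns nothing useful and leaves the compiler's complaint about @aref in $@, whereas the arrow form fetches the element.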
!!DEBUGGING

Before version 5.002, the standard Perl debugger didn't do a
very nice job of printing out complex data structures. With
5.002 or above, the debugger includes several new features,
including command line editing as well as the x
command to dump out complex data structures. For example,
given the assignment to $AoA above, here's the
debugger command:

 DB<1> x $AoA
!!CODE EXAMPLES

Presented with little comment (these will get their own
manpages someday) here are short code examples illustrating
access of various types of data structures.
!!ARRAYS OF ARRAYS

__Declaration of an ARRAY OF ARRAYS__

 @AoA = (
     [[ "fred", "barney" ],
     [[ "george", "jane", "elroy" ],
     [[ "homer", "marge", "bart" ],
 );

__Generation of an ARRAY OF ARRAYS__

 # reading from file
 while ( <> ) {
     push @AoA, [[ split ];
 }
 # calling a function
 for $i ( 1 .. 10 ) {
     $AoA[[$i] = [[ somefunc($i) ];
 }
 # using temp vars
 for $i ( 1 .. 10 ) {
     @tmp = somefunc($i);
     $AoA[[$i] = [[ @tmp ];
 }
 # add to an existing row
 push @{ $AoA[[0] }, "wilma", "betty";

__Access and Printing of an ARRAY OF ARRAYS__

 # one element
 $AoA[[0][[0] = "Fred";
 # another element
 $AoA[[1][[1] =~ s/(\w)/\u$1/;
 # print the whole thing with refs
 for $aref ( @AoA ) {
     print "\t [[ @$aref ],\n";
 }
 # print the whole thing with indices
 for $i ( 0 .. $#AoA ) {
     print "\t [[ @{$AoA[[$i]} ],\n";
 }
 # print the whole thing one at a time
 for $i ( 0 .. $#AoA ) {
     for $j ( 0 .. $#{ $AoA[[$i] } ) {
         print "elt $i $j is $AoA[[$i][[$j]\n";
     }
 }
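
The introduction asked how to access whole rows or columns. As a hedged sketch on illustrative sample data (the names @row, $col, and @column are assumptions), a row is a single dereference, while a column has to be collected from each row in turn:

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer",  "marge", "bart" ],
);

# A whole row is one dereference away.
my @row = @{ $AoA[1] };

# A column has to be gathered row by row; rows too short are skipped here.
my $col    = 2;
my @column = map { $_->[$col] } grep { $#{$_} >= $col } @AoA;

print "row 1: @row\n";
print "column $col: @column\n";
```

Skipping short rows is a design choice for this sketch. Depending on the data, you might prefer to keep an undef placeholder so that positions stay aligned.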
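!!HASHES OF ARRAYS

__Declaration of a HASH OF ARRAYS__

 %HoA = (
     flintstones => [[ "fred", "barney" ],
     jetsons     => [[ "george", "jane", "elroy" ],
     simpsons    => [[ "homer", "marge", "bart" ],
 );

__Generation of a HASH OF ARRAYS__

 # reading from file
 # flintstones: fred barney wilma dino
 while ( <> ) {
     next unless s/^(.*?):\s*//;
     $HoA{$1} = [[ split ];
 }
 # reading from file; more temps
 # flintstones: fred barney wilma dino
 while ( $line = <> ) {
     ($who, $rest) = split /:\s*/, $line, 2;
     @fields = split ' ', $rest;
     $HoA{$who} = [[ @fields ];
 }
 # calling a function that returns a list
 for $group ( "simpsons", "jetsons", "flintstones" ) {
     $HoA{$group} = [[ get_family($group) ];
 }
 # likewise, but using temps
 for $group ( "simpsons", "jetsons", "flintstones" ) {
     @members = get_family($group);
     $HoA{$group} = [[ @members ];
 }
 # append new members to an existing family
 push @{ $HoA{"flintstones"} }, "wilma", "betty";

__Access and Printing of a HASH OF ARRAYS__

 # one element
 $HoA{flintstones}[[0] = "Fred";
 # another element
 $HoA{simpsons}[[1] =~ s/(\w)/\u$1/;
 # print the whole thing
 foreach $family ( keys %HoA ) {
     print "$family: @{ $HoA{$family} }\n";
 }
 # print the whole thing with indices
 foreach $family ( keys %HoA ) {
     print "$family: ";
     foreach $i ( 0 .. $#{ $HoA{$family} } ) {
         print " $i = $HoA{$family}[[$i]";
     }
     print "\n";
 }
 # print the whole thing sorted by number of members
 foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
     print "$family: @{ $HoA{$family} }\n";
 }
 # print the whole thing sorted by number of members and name
 foreach $family ( sort {
         @{$HoA{$b}} <=> @{$HoA{$a}}
                     ||
                 $a cmp $b
     } keys %HoA )
 {
     print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
 }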
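
For a self-contained run of the hash-of-arrays patterns, the sketch below replaces the file input with an in-memory list of lines in the same "family: member member" format. The list @lines, its contents, and the name @by_size are assumptions made for the example.

```perl
use strict;
use warnings;

# Illustrative input lines in the "family: member member ..." format.
my @lines = (
    "flintstones: fred barney wilma dino",
    "jetsons: george jane elroy",
    "simpsons: homer marge bart",
);

my %HoA;
for my $line (@lines) {
    my ($who, $rest) = split /:\s*/, $line, 2;
    next unless defined $rest;
    $HoA{$who} = [ split ' ', $rest ];      # anonymous array per family
}
push @{ $HoA{flintstones} }, "betty";       # append to an existing family

# Largest family first; ties are broken alphabetically by family name.
my @by_size = sort {
    @{ $HoA{$b} } <=> @{ $HoA{$a} } || $a cmp $b
} keys %HoA;

print "$_: @{ $HoA{$_} }\n" for @by_size;
```

Inside the sort block, each dereferenced array is evaluated in numeric context, so the comparison is between element counts.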
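!!ARRAYS OF HASHES

__Declaration of an ARRAY OF HASHES__

 @AoH = (
     {
         Lead   => "fred",
         Friend => "barney",
     },
     {
         Lead   => "george",
         Wife   => "jane",
         Son    => "elroy",
     },
     {
         Lead   => "homer",
         Wife   => "marge",
         Son    => "bart",
     },
 );

__Generation of an ARRAY OF HASHES__

 # reading from file
 # format: LEAD=fred FRIEND=barney
 while ( <> ) {
     $rec = {};
     for $field ( split ) {
         ($key, $value) = split /=/, $field;
         $rec->{$key} = $value;
     }
     push @AoH, $rec;
 }
 # reading from file
 # format: LEAD=fred FRIEND=barney
 # no temp
 while ( <> ) {
     push @AoH, { split /[[\s+=]/ };
 }
 # calling a function that returns a key/value pair list, like
 # "lead","fred","daughter","pebbles"
 while ( %fields = getnextpairset() ) {
     push @AoH, { %fields };
 }
 # likewise, but using no temp vars
 while ( <> ) {
     push @AoH, { parsepairs($_) };
 }
 # add key/value to an element
 $AoH[[0]{pet} = "dino";

__Access and Printing of an ARRAY OF HASHES__

 # one element
 $AoH[[0]{lead} = "fred";
 # another element
 $AoH[[1]{lead} =~ s/(\w)/\u$1/;
 # print the whole thing with refs
 for $href ( @AoH ) {
     print "{ ";
     for $role ( keys %$href ) {
         print "$role=$href->{$role} ";
     }
     print "}\n";
 }
 # print the whole thing with indices
 for $i ( 0 .. $#AoH ) {
     print "$i is { ";
     for $role ( keys %{ $AoH[[$i] } ) {
         print "$role=$AoH[[$i]{$role} ";
     }
     print "}\n";
 }
 # print the whole thing one at a time
 for $i ( 0 .. $#AoH ) {
     for $role ( keys %{ $AoH[[$i] } ) {
         print "elt $i $role is $AoH[[$i]{$role}\n";
     }
 }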
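!!HASHES OF HASHES

__Declaration of a HASH OF HASHES__

 %HoH = (
     flintstones => {
         lead      => "fred",
         pal       => "barney",
     },
     jetsons     => {
         lead      => "george",
         wife      => "jane",
         "his boy" => "elroy",
     },
     simpsons    => {
         lead      => "homer",
         wife      => "marge",
         kid       => "bart",
     },
 );

__Generation of a HASH OF HASHES__

 # reading from file
 # flintstones: lead=fred pal=barney wife=wilma pet=dino
 while ( <> ) {
     next unless s/^(.*?):\s*//;
     $who = $1;
     for $field ( split ) {
         ($key, $value) = split /=/, $field;
         $HoH{$who}{$key} = $value;
     }
 }
 # reading from file; more temps
 while ( <> ) {
     next unless s/^(.*?):\s*//;
     $who = $1;
     $rec = {};
     $HoH{$who} = $rec;
     for $field ( split ) {
         ($key, $value) = split /=/, $field;
         $rec->{$key} = $value;
     }
 }
 # calling a function that returns a key,value hash
 for $group ( "simpsons", "jetsons", "flintstones" ) {
     $HoH{$group} = { get_family($group) };
 }
 # likewise, but using temps
 for $group ( "simpsons", "jetsons", "flintstones" ) {
     %members = get_family($group);
     $HoH{$group} = { %members };
 }
 # append new members to an existing family
 %new_folks = (
     wife => "wilma",
     pet  => "dino",
 );
 for $what (keys %new_folks) {
     $HoH{flintstones}{$what} = $new_folks{$what};
 }

__Access and Printing of a HASH OF HASHES__

 # one element
 $HoH{flintstones}{wife} = "wilma";
 # another element
 $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
 # print the whole thing
 foreach $family ( keys %HoH ) {
     print "$family: { ";
     for $role ( keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
     }
     print "}\n";
 }
 # print the whole thing somewhat sorted
 foreach $family ( sort keys %HoH ) {
     print "$family: { ";
     for $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
     }
     print "}\n";
 }
 # print the whole thing sorted by number of members
 foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
     print "$family: { ";
     for $role ( sort keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
     }
     print "}\n";
 }
 # establish a sort order (rank) for each role
 $i = 0;
 for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
 # now print the whole thing sorted by number of members
 foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
     print "$family: { ";
     # and print these according to rank order
     for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
         print "$role=$HoH{$family}{$role} ";
     }
     print "}\n";
 }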
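
The rank-ordered printing can be tried in isolation with the sketch below. The two families and their members are illustrative data, and @families and %roles_of are names invented here so the result can be inspected.

```perl
use strict;
use warnings;

my %HoH = (
    flintstones => { lead => "fred",  pal  => "barney", pet => "dino" },
    simpsons    => { lead => "homer", wife => "marge" },
);

# Establish a sort order (rank) for each role.
my $i = 0;
my %rank;
$rank{$_} = ++$i for qw(lead wife son daughter pal pet);

# Families with the most members first.
my @families = sort {
    scalar(keys %{ $HoH{$b} }) <=> scalar(keys %{ $HoH{$a} })
} keys %HoH;

my %roles_of;
for my $family (@families) {
    # Roles within a family come out in rank order, not hash order.
    my @roles = sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} };
    $roles_of{$family} = [ @roles ];
    print "$family: ", join(" ", map { $_ . "=" . $HoH{$family}{$_} } @roles), "\n";
}
```

The explicit scalar() calls are a stylistic choice to make the counting intent obvious.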
!!MORE ELABORATE RECORDS

__Declaration of MORE ELABORATE RECORDS__

Here's a sample showing how to create and use a record whose
fields are of many different sorts:

 $rec = {
     TEXT      => $string,
     SEQUENCE  => [[ @old_values ],
     LOOKUP    => { %some_table },
     THATCODE  => \&some_function,
     THISCODE  => sub { $_[[0] ** $_[[1] },
     HANDLE    => \*STDOUT,
 };
 print $rec->{TEXT};
 print $rec->{SEQUENCE}[[0];
 print $rec->{LOOKUP}{"key"};
 $answer = $rec->{THATCODE}->($arg);
 # careful of extra block braces on fh ref
 print { $rec->{HANDLE} } "a string\n";
 use !FileHandle;
 $rec->{HANDLE}->autoflush(1);
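
__Declaration of a HASH OF COMPLEX RECORDS__

 %TV = (
     flintstones => {
         series  => "flintstones",
         nights  => [[ qw(monday thursday friday) ],
         members => [[
             { name => "fred",    role => "lead", age => 36, },
             { name => "wilma",   role => "wife", age => 31, },
             { name => "pebbles", role => "kid",  age =>  4, },
         ],
     },
     jetsons => {
         series  => "jetsons",
         nights  => [[ qw(wednesday saturday) ],
         members => [[
             { name => "george", role => "lead", age => 41, },
             { name => "jane",   role => "wife", age => 39, },
             { name => "elroy",  role => "kid",  age =>  9, },
         ],
     },
     simpsons => {
         series  => "simpsons",
         nights  => [[ qw(monday) ],
         members => [[
             { name => "homer", role => "lead", age => 34, },
             { name => "marge", role => "wife", age => 37, },
             { name => "bart",  role => "kid",  age => 11, },
         ],
     },
 );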

__Generation of a HASH OF COMPLEX RECORDS__

 # reading from file
 # this is most easily done by having the file itself be
 # in the raw data format as shown above. perl is happy
 # to parse complex data structures if declared as data, so
 # sometimes it's easiest to do that
 # here's a piece by piece build up
 $rec = {};
 $rec->{series} = "flintstones";
 $rec->{nights} = [[ find_days() ];
 @members = ();
 # assume this file in field=value syntax
 while ( <> ) {
     %fields = split /[[\s=]+/;
     push @members, { %fields };
 }
 $rec->{members} = [[ @members ];
 # now remember the whole thing
 $TV{ $rec->{series} } = $rec;
 ###########################################################
 # now, you might want to make interesting extra fields that
 # include pointers back into the same data structure so if
 # you change one piece, it changes everywhere, like for example
 # if you wanted a {kids} field that was a reference
 # to an array of the kids' records without having duplicate
 # records and thus update problems.
 ###########################################################
 foreach $family (keys %TV) {
     $rec = $TV{$family}; # temp pointer
     @kids = ();
     for $person ( @{ $rec->{members} } ) {
         if ($person->{role} =~ /kid|son|daughter/) {
             push @kids, $person;
         }
     }
     # REMEMBER: $rec and $TV{$family} point to same data!!
     $rec->{kids} = [[ @kids ];
 }
 # you copied the array, but the array itself contains pointers
 # to uncopied objects. this means that if you make bart get
 # older via
 $TV{simpsons}{kids}[[0]{age}++;
 # then this would also change in
 print $TV{simpsons}{members}[[2]{age};
 # because $TV{simpsons}{kids}[[0] and $TV{simpsons}{members}[[2]
 # both point to the same underlying anonymous hash table
 # print the whole thing
 foreach $family ( keys %TV ) {
     print "the $family";
     print " is on during @{ $TV{$family}{nights} }\n";
     print "its members are:\n";
     for $who ( @{ $TV{$family}{members} } ) {
         print " $who->{name} ($who->{role}), age $who->{age}\n";
     }
     print "it has ", scalar( @{ $TV{$family}{kids} } ), " kids named ";
     print join(", ", map { $_->{name} } @{ $TV{$family}{kids} } );
     print "\n";
 }
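
The shared-record behavior described in the comments above can be demonstrated on a cut-down structure. This sketch keeps only one family, with illustrative names and ages, so that it runs on its own.

```perl
use strict;
use warnings;

my %TV = (
    simpsons => {
        members => [
            { name => "homer", role => "lead", age => 34 },
            { name => "marge", role => "wife", age => 37 },
            { name => "bart",  role => "kid",  age => 11 },
        ],
    },
);

for my $family (keys %TV) {
    my $rec  = $TV{$family};                      # same hash, not a copy
    my @kids = grep { $_->{role} =~ /kid|son|daughter/ } @{ $rec->{members} };
    $rec->{kids} = [ @kids ];                     # new array, same member hashes
}

$TV{simpsons}{kids}[0]{age}++;                    # bart gets older...
print "bart is now $TV{simpsons}{members}[2]{age}\n";   # ...everywhere
```

Copying @kids into a new anonymous array duplicates only the list of references, so both {kids} and {members} still lead to the same per-person hash.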
!!Database Ties

You cannot easily tie a multilevel data structure (such as a
hash of hashes) to a dbm file. The first problem is that all
but GDBM and Berkeley DB have
size limitations, but beyond that, you also have problems
with how references are to be represented on disk. One
experimental module that does partially attempt to address
this need is the MLDBM module. Check your
nearest CPAN site as described in perlmodlib
for source code to MLDBM.
!!SEE ALSO

perlref(1), perllol(1), perldata(1),
perlobj(1)
!!AUTHOR

Tom Christiansen
tchrist@perl.com

Last update: Wed Oct 23 04:57:50 MET DST
1996
----