PDL::Indexing
INDEXING(G)    User Contributed Perl Documentation    INDEXING(G)



NAME
       PDL::Indexing - how to index piddles.

DESCRIPTION
       This manpage should serve as a first tutorial on the
       indexing and threading features of PDL.

       This manpage is still in alpha development and not yet
       complete. "Meta" comments that point out deficien-
       cies/omissions of this document will be surrounded by
       square brackets ([]), e.g. [ Hopefully I will be able to
       remove this paragraph at some time in the future ]. Fur-
       thermore, it is possible that there are errors in the code
       examples. Please report any errors to Christian Soeller
       (c.soeller@auckland.ac.nz).

       Still to be done are (please bear with us and/or ask on
       the mailing list, see PDL::FAQ):

       o    document perl level threading

       o    threadids

       o    update and correct description of slice

       o    new functions in slice.pd (affine, lag, splitdim)

       o    reworking of paragraph on explicit threading

Indexing and threading with PDL
       A lot of the flexibility and power of PDL relies on the
       indexing and looping features of the perl extension.
       Indexing allows access to the data of a pdl object in a
       very flexible way. Threading provides efficient implicit
       looping functionality (since the loops are implemented as
       optimized C code).

       Pdl objects (later often called "pdls") are perl objects
       that represent multidimensional arrays and operations on
       those. In contrast to simple perl @x style lists the array
       data is compactly stored in a single block of memory thus
       taking up a lot less memory and enabling use of fast C
       code to implement operations (e.g. addition, etc) on pdls.

       pdls can have children

       Central to many of the indexing capabilities of PDL is the
       relation of "parent" and "child" between pdls. Many of the
       indexing commands create a new pdl from an existing pdl.
       The new pdl is the "child" and the old one is the
       "parent". The data of the new pdl is defined by a
       transformation that specifies how to generate (compute)
       its data from the parent's data. The relation between the
       child pdl and its parent is often bidirectional, meaning
       that changes in the child's data are propagated back to
       the parent. (Note: our terminology is already aimed at the
       new dataflow features. The kind of dataflow that is used
       by the indexing commands (about which you will learn in a
       minute) is always in operation, not only when you have
       explicitly switched on dataflow in your pdl by saying
       "$a->doflow". For further information about dataflow check
       the dataflow manpage.)

       Another way to interpret the pdls created by our indexing
       commands is to view them as a kind of intelligent pointer
       that points back to some portion or all of its parent's
       data. Therefore, it is not surprising that the parent's
       data (or a portion of it) changes when manipulated through
       this "pointer". After these introductory remarks that
       hopefully prepared you for what is coming (rather than
       confuse you too much) we are going to dive right in and
       start with a description of the indexing commands and some
       typical examples how they might be used in PDL programs.
       We will further illustrate the pointer/dataflow analogies
       in the context of some of the examples later on.

       There are two different implementations of this ``smart
       pointer'' relationship: the first one, which is a little
       slower but works for any transformation is simply to do
       the transformation forwards and backwards as necessary.
       The other is to consider the child piddle a ``virtual''
       piddle, which only stores a pointer to the parent and
       access information so that routines which use the child
       piddle actually directly access the data in the parent.
       If the virtual piddle is given to a routine which cannot
       use it, PDL transparently physicalizes the virtual piddle
       before letting the routine use it.

       Currently (1.94_01) all transformations which are
       ``affine'', i.e. where the indices of the data item in the
       parent piddle are determined by a linear transformation
       (plus a constant) of the indices of the child piddle,
       result in virtual piddles. All other indexing routines
       (e.g. "->index(...)") result in physical piddles.  All
       routines compiled by PP can accept affine piddles (except
       those routines that pass pointers to external library
       functions).

       Note that whether something is affine or not does not
       affect the semantics of what you do in any way: both

        $a->index(...) .= 5;
        $a->slice(...) .= 5;

       change the data in $a. The affinity does, however, have a
       significant impact on memory usage and performance.
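
       As a minimal sketch of this point (the names $data, $head
       and $picked are made up for illustration, and the script
       assumes an installed PDL whose "use PDL" exports sequence
       and pdl), both an affine child made by slice and a
       non-affine child made by index write through to the parent:

```perl
use strict;
use warnings;
use PDL;

my $data = sequence(6);                 # [0 1 2 3 4 5]

# Affine child: a slice covering the first two elements.
my $head = $data->slice('0:1');
$head .= 10;

# Non-affine child: elements selected by an index piddle.
my $picked = $data->index(pdl(3, 5));
$picked .= 20;

print $data, "\n";                      # [10 10 2 20 4 20]
```

       The temporaries $head and $picked are used so that the
       sketch does not rely on subroutines returning lvalues.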

       Slicing pdls

       Probably the most important application of the concept of
       parent/child pdls is the representation of rectangular
       slices of a physical pdl by a virtual pdl. Having talked
       long enough about concepts, let's get more specific.
       Suppose we are working with a 2D pdl representing a 5x5
       image (it's unusually small so that we can print it
       without filling several screens full of digits ;).

        perldl> $im = sequence(5,5)
        perldl> p $im

        [
         [ 0  1  2  3  4]
         [ 5  6  7  8  9]
         [10 11 12 13 14]
         [15 16 17 18 19]
         [20 21 22 23 24]
        ]

        perldl> help vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $im          Double D [5,5]                P            0.20Kb

       [ here it might be appropriate to quickly talk about the
       "help vars" command that provides information about pdls
       in the interactive "perldl" shell that comes with pdl.  ]

       Now suppose we want to create a 1-D pdl that just refer-
       ences one line of the image, say line 2; or a pdl that
       represents all even lines of the image (imagine we have to
       deal with even and odd frames of an interlaced image due
       to some peculiar behaviour of our frame grabber). As
       another frequent application of slices we might want to
       create a pdl that represents a rectangular region of the
       image with top and bottom reversed. All these effects (and
       many more) can be easily achieved with the powerful slice
       function:

        perldl> $line = $im->slice(':,(2)')
        perldl> $even = $im->slice(':,1:-1:2')
        perldl> $area = $im->slice('3:4,3:1')
        perldl> help vars  # or just PDL->vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $even        Double D [5,2]                -C           0.00Kb
        $im          Double D [5,5]                P            0.20Kb
        $line        Double D [5]                  -C           0.00Kb
        $area        Double D [2,3]                -C           0.00Kb

       All three "child" pdls are children of $im or in the other
       (largely equivalent) interpretation pointers to data of
       $im.  Operations on those virtual pdls access only those
       portions of the data as specified by the argument to
       slice. So we can just print line 2:

        perldl> p $line
        [10 11 12 13 14]

       Also note the difference in the "Flow State" of $area
       above and below:

        perldl> p $area
        perldl> help $area
        This variable is Double D [2,3]                VC           0.00Kb

       The following demonstrates that $im and $line really
       behave as you would expect from a pointer-like object (or
       in the dataflow picture: the changes in $line's data are
       propagated back to $im):

        perldl> $im++
        perldl> p $line
        [11 12 13 14 15]
        perldl> $line += 2
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

       Note how assignment operations on the child virtual pdls
       change the parent physical pdl and vice versa (however,
       the basic "=" assignment doesn't; use ".=" to obtain that
       effect. See below for the reasons).  The virtual child
       pdls are something like "live links" to the "original"
       parent pdl. As previously said, they can be thought of as
       working similarly to a C pointer. But in contrast to a
       C pointer they carry a lot more information. Firstly, they
       specify the structure of the data they represent (the
       dimensionality of the new pdl) and secondly, they specify
       how to create this structure from the parent's data. The
       way this works is buried in the internals of PDL and is
       not important for you to know, unless you want to hack the
       core in the future or would like to become a PDL guru in
       general (for a definition of this strange creature see
       PDL::Internals).

       The previous examples have demonstrated typical usage of
       the slice function. Since the slicing functionality is so
       important here is an explanation of the syntax for the
       string argument to slice:

        $vpdl = $a->slice('ind0,ind1...')

       where "ind0" specifies what to do with index No 0 of the
       pdl $a, etc. Each element of the comma separated list can
       have one of the following forms:

       ':'   Use the whole dimension

       'n'   Use only index "n". The dimension of this index in
             the resulting virtual pdl is 1. An example involving
             those first two index formats:

              perldl> $column = $im->slice('2,:')
              perldl> $row = $im->slice(':,0')
              perldl> p $column

              [
               [ 3]
               [ 8]
               [15]
               [18]
               [23]
              ]

              perldl> p $row

              [
               [1 2 3 4 5]
              ]

              perldl> help $column
              This variable is Double D [1,5]                VC           0.00Kb

              perldl> help $row
              This variable is Double D [5,1]                VC           0.00Kb


       '(n)' Use only index "n". This dimension is removed from
             the resulting pdl (relying on the fact that a dimen-
             sion of size 1 can always be removed). The distinc-
             tion between this case and the previous one becomes
             important in assignments where left and right hand
             side have to have appropriate dimensions.

              perldl> $line = $im->slice(':,(0)')
              perldl> help $line
              This variable is Double D [5]                  -C           0.00Kb

              perldl> p $line
              [1 2 3 4 5]

             Spot the difference to the previous example?

       'n1:n2' or 'n1:n2:n3'
             Take the range of indices from "n1" to "n2" or (sec-
             ond form) take the range of indices from "n1" to
             "n2" with step "n3". An example for the use of this
             format is the previous definition of the subimage
             composed of even lines.

              perldl> $even = $im->slice(':,1:-1:2')

             This example also demonstrates that negative indices
             work like they do for normal perl style arrays by
             counting backwards from the end of the dimension (in
             the example -1 is equivalent to index 4). If "n2" is
             smaller than "n1", the elements in the virtual pdl
             are effectively reversed with respect to its parent.

       '*[n]'
             Add a dummy dimension. The size of this dimension
             will be 1 by default or equal to "n" if the optional
             numerical argument is given.

             Now, this is really something a bit strange at first
             sight. What is a dummy dimension? A dummy dimension
             inserts a dimension where there wasn't one before.
             How is that done? Well, in the case of the new
             dimension having size 1 it can be easily explained
             by the way in which you can identify a vector (with
             "m" elements) with a "(1,m)" or "(m,1)" matrix. The
             same obviously holds for higher dimensional objects.
             More interesting is the case of a dummy dimension
             of size greater than one (e.g. "slice('*5,:')").
             This works in the same way as a call to the dummy
             function creates a new dummy dimension.  So read on
             and check its explanation below.

       '([n1:n2[:n3]]=i)'
             [Not yet implemented ??????]  With an argument like
             this you make generalised diagonals. The diagonal
             will be dimension no. "i" of the new output pdl and
             (if optional part in brackets specified) will extend
             along the range of indices specified of the respec-
             tive parent pdl's dimension. In general an argument
             like this only makes sense if there are other argu-
             ments like this in the same call to slice. The part
             in brackets is optional for this type of argument.
             All arguments of this type that specify the same
             target dimension "i" have to relate to the same num-
             ber of indices in their parent dimension. The best
             way to explain it is probably to give an example,
             here we make a pdl that refers to the elements along
             the space diagonal of its parent pdl (a cube):

              $cube = zeroes(5,5,5);
              $sdiag = $cube->slice('(=0),(=0),(=0)');

             The above command creates a virtual pdl that
             represents the diagonal along the parent's
             dimensions no. 0, 1 and 2 and makes it dimension 0
             (the only dimension) of the child. You use the
             extended syntax if the parent dimensions you want to
             build the diagonal from have different sizes or you
             want to reverse the sequence of elements in the
             diagonal, e.g.

              $rect = zeroes(12,3,5,6,2);
              $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');

             So the elements of $vpdl will then be related to
             those of its parent in a way we can express as:

               vpdl(i,j) = rect(i+2,j,4,5-j,j)       0<=i<6, 0<=j<2
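
       The forms listed above can be compared side by side in a
       short script. This is only a sketch: the variable names are
       made up for illustration, it assumes an installed PDL, and
       the generalised diagonal form is left out because the text
       marks it as not yet implemented.

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5, 5);

my $kept    = $im->slice(':,2');        # 'n'   keeps a dimension of size 1
my $dropped = $im->slice(':,(2)');      # '(n)' removes that dimension
my $rows    = $im->slice(':,1:-1:2');   # rows 1 and 3 (-1 means index 4)
my $flipped = $im->slice(':,-1:0');     # all rows, last row first

my $vec = sequence(3);                  # [0 1 2]
my $rep = $vec->slice('*5,:');          # dummy dimension of size 5 in front

print join('x', $kept->dims),    "\n";  # 5x1
print join('x', $dropped->dims), "\n";  # 5
print join('x', $rows->dims),    "\n";  # 5x2
print join('x', $rep->dims),     "\n";  # 5x3
print $flipped->slice(':,(0)'),  "\n";  # [20 21 22 23 24]
```

       Every index along the dummy dimension of $rep maps to the
       same three elements of $vec, so no data is duplicated.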


       [ work in the new index function: "$b = $a->index($c);"
       ???? ]

       There are different kinds of assignments in PDL

       The previous examples have already shown that virtual pdls
       can be used to operate on or access portions of data of a
       parent pdl. They can also be used as lvalues in assign-
       ments (as the use of "++" in some of the examples above
       has already demonstrated). For explicit assignments to the
       data represented by a virtual pdl you have to use the
       overloaded ".=" operator (which in this context we call
       propagated assignment). Why can't you use the normal
       assignment operator "="?

       Well, you definitely still can use the '=' operator but it
       wouldn't do what you want. This is due to the fact that
       the '=' operator cannot be overloaded in the same way as
       other assignment operators. If we tried to use '=' to
       assign data to a portion of a physical pdl through a
       virtual pdl we wouldn't achieve the desired effect.
       Instead, the variable representing the virtual pdl (a
       reference to a blessed thingy) would after the assignment
       just contain the reference to another blessed thingy which
       would behave to future assignments as a "physical" copy of
       the original rvalue [this is actually not yet clear and
       subject of discussions in the PDL developers mailing
       list]. In that sense it would break the connection of the
       pdl to the parent [ isn't this behaviour in a sense the
       opposite of what happens in dataflow, where ".=" breaks
       the connection to the parent? ].

       E.g.

        perldl> $line = $im->slice(':,(2)')
        perldl> $line = zeroes(5);
        perldl> $line++;
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> p $line
        [1 1 1 1 1]

       But using ".="

        perldl> $line = $im->slice(':,(2)')
        perldl> $line .= zeroes(5)
        perldl> $line++
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [ 1  1  1  1  1]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> print $line
        [1 1 1 1 1]

       Also, you can substitute

        perldl> $line .= 0;

       for the assignment above (the zero is converted to a
       scalar piddle, with no dimensions so it can be assigned to
       any piddle).

       Related to the assignment feature is a little trap for the
       unwary: since perl currently does not allow subroutines to
       return lvalues the following shortcut of the above is
       flagged as a compile time error:

        perldl> $im->slice(':,(2)') .= zeroes(5)->xvals->float

       instead you have to say something like

        perldl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float

       We hope that future versions of perl will allow the sim-
       pler syntax (i.e. allow subroutines to return lvalues).
       [Note: perl v5.6.0 does allow this, but it is an experi-
       mental feature. However, early reports suggest it works in
       simple situations]

       Note that there can be a problem with assignments like
       this when lvalue and rvalue pdls refer to overlapping por-
       tions of data in the parent pdl:

        # reverse the elements of line 1 of $a
        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)');

       Currently, the parent data on the right side of the
       assignment is not copied before the (internal) assignment
       loop proceeds. Therefore, the outcome of this assignment
       will depend on the sequence in which elements are assigned
       and will almost certainly not be what you wanted.  So the
       semantics are currently undefined and liable to change at
       any time. To obtain the desired behaviour, use

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->copy;

       which makes a physical copy of the slice or

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->sever;

       which returns the same slice but severs the connection of
       the slice to its parent.
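
       The copy variant can be tried on a small pdl. The following
       is a sketch with made-up names ($m, $row) that assumes an
       installed PDL; the reversed slice is copied first, so the
       assignment loop never reads elements it has already
       overwritten:

```perl
use strict;
use warnings;
use PDL;

my $m   = sequence(4, 2);               # rows [0 1 2 3] and [4 5 6 7]
my $row = $m->slice(':,(1)');           # child pointing at row 1

# Copy the reversed row before assigning it back into the same row.
$row .= $m->slice('-1:0,(1)')->copy;

print $m;                               # row 1 is now [7 6 5 4]
```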

       Other functions that manipulate dimensions

       Having talked extensively about the slice function it
       should be noted that this is not the only PDL indexing
       function. There are additional indexing functions which
       are also useful (especially in the context of threading
       which we will talk about later). Here is a list of them
       with some examples of how to use them.

       "dummy"
           inserts a dummy dimension of the size you specify
           (default 1) at the chosen location. You can't wait to
           hear how that is achieved?  Well, all elements with
           index "(X,x,Y)" ("0<=x<size_of_dummy_dim") just map to
           the element with index "(X,Y)" of the parent pdl
           (where "X" and "Y" refer to the group of indices
           before and after the location where the dummy dimen-
           sion was inserted.)

           This example calculates the x coordinate of the cen-
           troid of an image (later we will learn that we didn't
           actually need the dummy dimension thanks to the magic
           of implicit threading; but using dummy dimensions the
           code would also work in a threadless world; though
           once you have worked with PDL threads you wouldn't
           want to live without them again).

            # centroid
            ($xd,$yd) = $im->dims;
            $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);

           Let's explain how that works in a little more detail.
           First, the product:

            $xvs = xvals(zeroes($xd));
            print $xvs->dummy(1,$yd);       # repeat the line $yd times
            $prod = $im*$xvs->dummy(1,$yd); # form the pixelwise product with
                                            # the repeated line of x-values

           The rest is then summing the results of the pixelwise
           product together and normalising with the sum of all
           pixel values in the original image thereby calculating
           the x-coordinate of the "center of mass" of the image
           (interpreting pixel values as local mass) which is
           known as the centroid of an image.

           Next is a (from the point of view of memory consump-
           tion) very cheap conversion from greyscale to RGB,
           i.e. every pixel holds now a triple of values instead
           of a scalar. The three values in the triple are, for-
           tunately, all the same for a grey image, so that our
           trick works well in that it maps all the three members
           of the triple to the same source element:

            # a cheap greyscale to RGB conversion
            $rgb = $grey->dummy(0,3)

           Unfortunately this trick cannot be used to convert
           your old B/W photos to color ones in the way you'd
           like. :(

           Note that the memory usage of piddles with dummy
           dimensions is especially sensitive to the internal
           representation. If the piddle can be represented as a
           virtual affine (``vaffine'') piddle, only the control
           structures are stored. But if $b in

            $a = zeroes(10000);
            $b = $a->dummy(1,10000);

           is made physical by some routine, you will find that
           the memory usage of your program has suddenly grown by
           100Mb.

       "diagonal"
           replaces two dimensions (which have to be of equal
           size) by one dimension that references all the ele-
           ments along the "diagonal" along those two dimensions.
           Here, we have two examples which should appear famil-
           iar to anyone who has ever done some linear algebra.
           Firstly, make a unity matrix:

            # unity matrix
            $e = zeroes(float, 3, 3); # make everything zero
            ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
            print $e;

           Or the other diagonal:

            ($tmp = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
            print $e;

           (Did you notice how we used the slice function to
           reverse the sequence of lines before setting the
           diagonal of the new child, thereby setting the cross
           diagonal of the parent?)  Or a mapping from the space
           of square matrices to the field over which the
           matrices are defined, the trace of a matrix:

            # trace of a matrix
            $trace = sum($mat->diagonal(0,1));  # sum all the diagonal elements


       "xchg" and "mv"
           xchg exchanges or "transposes" the two specified
           dimensions.  A straightforward example:

            # transpose a matrix (without explicitly reshuffling data and
            # making a copy)
            $prod = $a x $a->xchg(0,1);

           $prod should now be pretty close to the unity matrix
           if $a is an orthogonal matrix. Often "xchg" will be
           used in the context of threading but more about that
           later.

           mv works in a similar fashion. It moves a dimension
           (specified by its number in the parent) to a new posi-
           tion in the new child pdl:

            $b = $a->mv(4,0);  # make the 5th dimension of $a the first in the
                               # new child $b

           The difference between "xchg" and "mv" is that "xchg"
           only swaps the positions of two dimensions with each
           other, whereas "mv" inserts the first dimension at the
           position given by the second argument, shifting the
           other dimensions around accordingly.

       "clump"
           collapses several dimensions into one. Its only argu-
           ment specifies how many dimensions of the source pdl
           should be collapsed (starting from the first). An
           (admittedly unrealistic) example is a 3D pdl which
           holds data from a stack of image files that you have
           just read in. However, the data from each image really
           represents a 1D time series and has only been arranged
           that way because it was digitized with a frame grab-
           ber. So to have it again as an array of time sequences
           you say

            perldl> $seqs = $stack->clump(2)
            perldl> help vars
            PDL variables in package main::

            Name         Type   Dimension       Flow  State          Mem
            ----------------------------------------------------------------
            $seqs        Double D [8000,50]            -C           0.00Kb
            $stack       Double D [100,80,50]          P            3.05Mb

           Unrealistic as it may seem, our confocal microscope
           software writes data (sometimes) this way. But more
           often you use clump to achieve a certain effect when
           using implicit or explicit threading.
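
       The functions described in the list above can be checked on
       small pdls. The script below is a sketch under stated
       assumptions: it needs an installed PDL, and the tiny image
       and all variable names are invented for illustration.

```perl
use strict;
use warnings;
use PDL;

# dummy: x coordinate of the centroid of a tiny 4x3 "image"
my $im = pdl([[0, 0, 0, 0],
              [0, 1, 0, 3],
              [0, 0, 0, 0]]);
my ($xd, $yd) = $im->dims;                          # 4, 3
my $xvs = xvals(zeroes($xd));                       # [0 1 2 3]
my $xc  = sum($im * $xvs->dummy(1, $yd)) / sum($im);
print "centroid x: $xc\n";                          # (1*1 + 3*3)/4 = 2.5

# diagonal: main diagonal, cross diagonal and trace
my $e    = zeroes(3, 3);
my $diag = $e->diagonal(0, 1);
$diag .= 1;
my $cross = $e->slice(':,-1:0')->diagonal(0, 1);
$cross .= 2;                                        # also overwrites the centre
print $e;
print "trace: ", sum($e->diagonal(0, 1)), "\n";     # 1 + 2 + 1 = 4

# xchg swaps two dimensions, mv moves one and shifts the rest
my $p = zeroes(2, 3, 4, 5);
print join('x', $p->xchg(0, 3)->dims), "\n";        # 5x3x4x2
print join('x', $p->mv(0, 3)->dims),   "\n";        # 3x4x5x2

# clump: collapse the first two dimensions into one
my $stack = sequence(4, 3, 2);
my $seqs  = $stack->clump(2);
print join('x', $seqs->dims), "\n";                 # 12x2
```

       None of these calls copies data: each result is a child
       that refers back to its parent.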

       Calls to indexing functions can be chained

       As you might have noticed in some of the examples above
       calls to the indexing functions can be nicely chained
       since all of these functions return a newly created child
       object. However, when doing extensive index manipulations
       in a chain be sure to keep track of what you are doing,
       e.g.

        $a->xchg(0,1)->mv(0,4)

       moves the dimension 1 of $a to position 4 since when the
       second command is executed the original dimension 1 has
       been moved to position 0 of the new child that calls the
       "mv" function. I think you get the idea (in spite of my
       convoluted explanations).
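
       One way to keep track of a chain is to print the dimension
       sizes after each step. The sketch below assumes an installed
       PDL and uses a made-up pdl $p whose dimension sizes are all
       different, so each dimension can be recognised by its size:

```perl
use strict;
use warnings;
use PDL;

my $p     = zeroes(2, 3, 4, 5, 6);
my $step1 = $p->xchg(0, 1);       # dimension 1 (size 3) is now at position 0
my $step2 = $step1->mv(0, 4);     # and is then moved to position 4

print join('x', $p->dims),     "\n";   # 2x3x4x5x6
print join('x', $step1->dims), "\n";   # 3x2x4x5x6
print join('x', $step2->dims), "\n";   # 2x4x5x6x3
```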

       Propagated assignments ('.=') and dummy dimensions

       A subtlety related to indexing is the assignment to pdls
       containing dummy dimensions of size greater than 1. These
       assignments (using ".=") are forbidden since several
       elements of the lvalue pdl point to the same element of
       the parent. As a consequence the values of those parent
       elements are potentially ambiguous and would depend on the
       sequence in which the implementation makes the assignments
       to elements. Therefore, an assignment like this:

        $a = pdl [1,2,3];
        $b = $a->dummy(1,4);
        $b .= yvals(zeroes(3,4));

       can produce unexpected results and the results are explic-
       itly undefined by PDL because when PDL gets parallel com-
       puting features, the current result may well change.

       From the point of view of dataflow the introduction of
       greater-size-than-one dummy dimensions is regarded as an
       irreversible transformation (similar to the terminology in
       thermodynamics) which precludes backward propagation of
       assignment to a parent (which you had explicitly requested
       using the ".=" assignment). A similar problem to watch out
       for occurs in the context of threading where sometimes
       dummy dimensions are created implicitly during the thread
       loop (see below).

       Reasons for the parent/child (or "pointer") concept

       [ this will have to wait a bit ]

        XXXXX being memory efficient
        XXXXX in the context of threading
        XXXXX very flexible and powerful way of accessing portions of pdl data
              (in much more general way than sec, etc allow)
        XXXXX efficient implementation
        XXXXX difference to section/at, etc.


       How to make things physical again

       [ XXXXX fill in later when everything has settled a bit
       more ]

        ** When needed (xsub routine interfacing C lib function)
        ** How achieved (->physical)
        ** How to test (isphysical (explain how it works currently))
        ** ->copy and ->sever


Threading
       In the previous paragraph on indexing we have already
       mentioned the term occasionally but now it's really time
       to talk explicitly about "threading" with pdls. The term
       threading has many different meanings in different fields
       of computing. Within the framework of PDL it could
       probably be loosely defined as an implicit looping
       facility. It
       is implicit because you don't specify anything like
       enclosing for-loops but rather the loops are automatically
       (or 'magically') generated by PDL based on the dimensions
       of the pdls involved. This should give you a first idea
       why the index/dimension manipulating functions you have
       met in the previous paragraphs are especially important
       and useful in the context of threading.  The other ingre-
       dient for threading (apart from the pdls involved) is a
       function that is threading aware (generally, these are
       PDL::PP compiled functions) and that the pdls are
       "threaded" over.  So much about the terminology and now
       let's try to shed some light on what it all means.

       Implicit threading - a first example

       There are two slightly different variants of threading. We
       start with what we call "implicit threading". Let's pick a
       practical example that involves looping of a function over
       many elements of a pdl. Suppose we have an RGB image that
       we want to convert to greyscale. The RGB image is repre-
       sented by a 3-dim pdl "im(3,x,y)" where the first
       dimension contains the three color components of each
       pixel and "x" and "y" are width and height of the image,
       respectively. Next we need to specify how to convert a
       color-triple at a given pixel into a greyvalue (to be a
       realistic example it should represent the relative inten-
       sity with which our color insensitive eye cells would
       detect that color to achieve what we would call a natural
       conversion from color to greyscale). An approximation that
       works quite well is to compute the grey intensity from
       each RGB triplet (r,g,b) as a weighted sum

        greyvalue = 77/256*r + 150/256*g + 29/256*b =
            inner([77,150,29]/256, [r,g,b])

       where the last form indicates that we can write this as an
       inner product of the 3-vector comprising the weights for
       red, green and blue components with the 3-vector contain-
       ing the color components. Traditionally, we might have
       written a function like the following to process the whole
       image:

        my @dims=$im->dims;
        # here normally check that first dim has correct size (3), etc
        $grey=zeroes(@dims[1,2]);    # make the pdl for the resulting grey image
        $w = pdl([77,150,29]) / 256; # the vector of weights
        for ($j=0;$j<$dims[2];$j++) {
           for ($i=0;$i<$dims[1];$i++) {
               # compute the pixel value; the double quotes let perl
               # interpolate $i and $j into the slice string
               $tmp = inner($w,$im->slice(":,($i),($j)"));
               set($grey,$i,$j,$tmp); # and set it in the greyscale image
           }
        }

       Now we write the same using threading (noting that "inner"
       is a threading aware function defined in the PDL::Primi-
       tive package)

        $grey = inner($im, pdl([77,150,29])/256);

       We have ended up with a one-liner that automatically cre-
       ates the pdl $grey with the right number and size of
       dimensions and performs the loops automatically (these
       loops are implemented as fast C code in the internals of
       PDL).  Well, we still owe you an explanation how this
       'magic' is achieved.
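
       To see that the one-liner really computes the same thing as
       the explicit loops, here is a minimal sketch that runs both
       versions on a small test image. The image contents
       ("sequence(3,4,2)") and the variable names are assumptions
       made for illustration only; "sclr", "approx" and "all" are
       used just to compare the two results.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical test image: 3 colour components, width 4, height 2.
my $im = sequence(3, 4, 2);
my $w  = pdl([77, 150, 29]) / 256;     # weight vector, dims (3)

# Explicit perl loops over every pixel.
my (undef, $nx, $ny) = $im->dims;
my $grey_loop = zeroes($nx, $ny);
for my $j (0 .. $ny - 1) {
    for my $i (0 .. $nx - 1) {
        my $pixel = $im->slice(":,($i),($j)");      # 1D pdl (r,g,b)
        set($grey_loop, $i, $j, inner($w, $pixel)->sclr);
    }
}

# The same computation with implicit threading.
my $grey = inner($im, $w);

print "dims: ", join(',', $grey->dims), "\n";
print all(approx($grey, $grey_loop)) ? "results agree\n" : "results differ\n";
```

       Both versions yield a 2D pdl of dims (4,2); only the second
       one avoids the per-pixel overhead of the perl loop.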

       How does the example work ?

       The first thing to note is that every function that is
       threading aware (these are without exception functions
       compiled from concise descriptions by PDL::PP, later just
       called PP-functions) expects a defined (minimum) number of
       dimensions (we call them core dimensions) from each of its
       pdl arguments. The inner function expects two one-dimen-
       sional (input) parameters from which it calculates a zero-
       dimensional (output) parameter. We write that symbolically
       as "inner((n),(n),[o]())" and call it "inner"'s signature,
       where n represents the size of that dimension. n being
       equal in the first and second parameter means that those
       dimensions have to be of equal size in any call. As a dif-
       ferent example take the outer product which takes two 1D
       vectors to generate a 2D matrix, symbolically written as
       "outer((n),(m),[o](n,m))". The "[o]" in both examples
       indicates that this (here third) argument is an output
       argument. In the latter example the dimensions of first
       and second argument don't have to agree but you see how
       they determine the size of the two dimensions of the out-
       put pdl.
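
       The two signatures can be tried out directly. The following
       is a small sketch, assuming the "inner" and "outer" functions
       exported by PDL; the vectors are made-up values.

```perl
use strict;
use warnings;
use PDL;

my $u = pdl(1, 2, 3);        # dims (3)
my $v = pdl(4, 5, 6);        # dims (3), must match $u for inner
my $s = pdl(10, 20);         # dims (2), may differ from $u for outer

my $dot = inner($u, $v);     # (n),(n) -> ()    : 0D result
my $mat = outer($u, $s);     # (n),(m) -> (n,m) : 2D result

print "inner: ndims=", $dot->getndims, " value=", $dot->sclr, "\n";
print "outer: dims=(", join(',', $mat->dims), ")\n";
```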

       Here is the point when threading finally enters the game.
       If you call PP-functions with pdls that have more than the
       required core dimensions the first dimensions of the pdl
       arguments are used as the core dimensions and the addi-
       tional extra dimensions are threaded over. Let us demon-
       strate this first with our example above

        $grey = inner($im,$w); # w is the weight vector from above

       In this case $w is 1D and so supplies just the core
       dimension; $im is 3D, more specifically "(3,x,y)". The first
       dimension (of size 3) is the required core dimension that
       matches (as required by inner) the first (and only)
       dimension of $w. The second dimension is the first thread
       dimension (of size "x") and the third is here the second
       thread dimension (of size "y"). The output pdl is
       automatically created (the functional calling style used
       here requests autocreation, just as setting $grey to "null"
       and passing it as the last argument would). The output
       dimensions are obtained by appending the loop dimensions
       (here "(x,y)") to the core output dimensions (here 0D) to
       yield the final dimensions of the autocreated pdl (here
       "0D+2D=2D" to yield a 2D output of size "(x,y)").

       So the above command calls the core functionality that
       computes the inner product of two 1D vectors "x*y" times
       with $w and all 1D slices of the form "(':,(i),(j)')" of
       $im and sets the respective elements of the output pdl
       "$grey(i,j)" to the result of each computation. Writing "f"
       for that core functionality, we could express this
       symbolically as

        $grey(0,0) = f($w,$im(:,(0),(0)))
        $grey(1,0) = f($w,$im(:,(1),(0)))
            .
            .
            .
        $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
        $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))

       But this is done automatically by PDL without writing any
       explicit perl loops.  We see that the command really cre-
       ates an output pdl with the right dimensions and sets the
       elements indeed to the result of the computation for each
       pixel of the input image.

       When even more pdls and extra dimensions are involved
       things get a bit more complicated. We will first give the
       general rules how the thread dimensions depend on the
       dimensions of input pdls enabling you to figure out the
       dimensionality of an autocreated output pdl (for any given
       set of input pdls and core dimensions of the PP-function
       in question). The general rules will most likely appear a
       bit confusing at first sight, so we'll set out to
       illustrate the usage with a set of further examples (which
       will hopefully also demonstrate that there are indeed many
       practical situations where threading comes in extremely
       handy).

       A call for coding discipline

       Before we point out the other technical details of thread-
       ing, please note this call for programming discipline when
       using threading:

       In order to preserve human readability, PLEASE comment any
       nontrivial expression in your code involving threading.
       Most importantly, for any subroutine, include information
       at the beginning about what you expect the dimensions to
       represent (or ranges of dimensions).

       As a warning, look at this undocumented function and try
       to guess what might be going on:

        sub lookup {
          my ($im,$palette) = @_;
          my $res;
          index($palette->xchg(0,1),
                     $im->long->dummy(0,($palette->dims)[0]),
                     ($res=null));
          return $res;
        }

       Would you agree that it might be difficult to figure out
       expected dimensions, purpose of the routine, etc ?  (If
       you want to find out what this piece of code does, see
       below)

       How to figure out the loop dimensions

       There are a couple of rules that allow you to figure out
       the number and size of loop dimensions (and whether the
       sizes of your input pdls comply with the threading rules).
       Dimensions of any pdl argument are broken down into two
       groups in the following: core dimensions (as defined by the
       PP-function, see Appendix B for a list of PDL primitives)
       and extra dimensions, which comprise all remaining
       dimensions of that pdl. For example, calling a function
       "func" with the signature "func((n,m),[o](n))" with a pdl
       "a(2,4,7,1,3)" as "func($a,($o = null))" results in the
       semantic splitting of a's dimensions into: core dimensions
       "(2,4)" and extra dimensions "(7,1,3)".

       R0    Core dimensions are identified with the first N
             dimensions of the respective pdl argument (and are
             required). Any further dimensions are extra dimen-
             sions and used to determine the loop dimensions.

       R1    The number of (implicit) loop dimensions is equal to
             the maximal number of extra dimensions taken over
             the set of pdl arguments.

       R2    The size of each of the loop dimensions is derived
             from the size of the respective dimensions of the
             pdl arguments. The size of a loop dimension is given
             by the maximal size found in any of the pdls having
             this extra dimension.

       R3    For all pdls that have a given extra dimension the
             size must be equal to the size of the loop dimension
             (as determined by the previous rule) or 1; otherwise
             you raise a runtime exception. If the size of the
             extra dimension in a pdl is one it is implicitly
             treated as a dummy dimension of size equal to that
             loop dim size when performing the thread loop.

       R4    If a pdl doesn't have a loop dimension, in the
             thread loop this pdl is treated as if having a dummy
             dimension of size equal to the size of that loop
             dimension.

       R5    If output autocreation is used (by setting the rele-
             vant pdl to "PDL->null" before invocation) the num-
             ber of dimensions of the created pdl is equal to the
             sum of the number of core output dimensions + number
             of loop dimensions. The size of the core output
             dimensions is derived from the relevant dimension of
             input pdls (as specified in the function definition)
             and the sizes of the other dimensions are equal to
             the size of the loop dimension it is derived from.
             The automatically created pdl will be physical
             (unless dataflow is in operation).
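
       Rules R3 and R4 are easy to check with a plain addition,
       whose core is 0D so that every dimension is an extra
       dimension. This is a small sketch; the pdls are made-up test
       data and "eval" is used only to catch the exception.

```perl
use strict;
use warnings;
use PDL;

my $m   = sequence(3, 4);    # extra dims (3,4)
my $row = pdl(10, 20, 30);   # dims (3): lacks the second loop dim (R4)
my $col = sequence(1, 4);    # dims (1,4): the size 1 is stretched to 3 (R3)
my $bad = sequence(2, 4);    # dims (2,4): 2 is neither 3 nor 1

print join(',', ($m + $row)->dims), "\n";   # loop dims (3,4)
print join(',', ($m + $col)->dims), "\n";   # loop dims (3,4)

# A size that is neither equal to the loop dim nor 1 raises an exception.
my $ok = eval { my $sum = $m + $bad; 1 };
print "mismatch rejected\n" unless $ok;
```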

       In this context, note that you can run into the problem
       with assignment to pdls containing greater-than-one dummy
       dimensions (see above).  Although your output pdl(s)
       didn't contain any dummy dimensions in the first place
       they may end up with implicitly created dummy dimensions
       according to R4.

       As an example, suppose we have a (here unspecified) PP-
       function with the signature:

        func((m,n),(m,n,o),(m),[o](m,o))

       and you call it with 3 pdls "a(5,3,10,11)",
       "b(5,3,2,10,1,12)", and "c(5,1,11,12)" as

        func($a,$b,$c,($d=null))

       then the number of loop dimensions is 3 (by "R0+R1" from
       $b and $c) with sizes "(10,11,12)" (by R2); the two output
       core dimensions are "(5,2)" (from the signature of func)
       resulting in a 5-dimensional output pdl $d of size
       "(5,2,10,11,12)" (see R5) and (the automatically created)
       $d is derived from "($a,$b,$c)" in a way that can be
       expressed in pdl pseudo-code as

        $d(:,:,i,j,k) .= func($a(:,:,i,j),$b(:,:,:,i,0,k),$c(:,0,j,k))
           with 0<=i<10, 0<=j<11, 0<=k<12
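
       The bookkeeping of this example can be done mechanically.
       The following plain perl sketch (no PDL needed) models rules
       R0-R5 on lists of dimension sizes; the helper names
       "extra_dims" and "loop_dims" are hypothetical and not part
       of PDL.

```perl
use strict;
use warnings;

# R0: strip the leading core dims, leaving the extra dims.
sub extra_dims {
    my ($dims, $ncore) = @_;
    return [ @{$dims}[ $ncore .. $#$dims ] ];
}

# R1-R4: derive the loop dims from the extra dims of all arguments.
sub loop_dims {
    my @loop;
    for my $dims (@_) {
        for my $k (0 .. $#$dims) {
            my $size = $dims->[$k];
            if (!defined $loop[$k] || $loop[$k] == 1) {
                $loop[$k] = $size;       # R2: a larger size replaces 1
            } elsif ($size != 1 && $size != $loop[$k]) {
                die "dim $k: size $size does not match $loop[$k]\n";  # R3
            }
        }
    }
    return @loop;
}

my @loop = loop_dims(
    extra_dims([5, 3, 10, 11],       2),   # a: core (m,n)
    extra_dims([5, 3, 2, 10, 1, 12], 3),   # b: core (m,n,o)
    extra_dims([5, 1, 11, 12],       1),   # c: core (m)
);
my @out = (5, 2, @loop);    # R5: core output dims (m,o) followed by loop dims

print "loop dims:   (@loop)\n";
print "output dims: (@out)\n";
```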

       If we analyze the color to greyscale conversion again with
       these rules in mind we note another great advantage of
       implicit threading.  We can call the conversion with a pdl
       representing a pixel ("im(3)"), a line of rgb pixels
       ("im(3,x)"), a proper color image ("im(3,x,y)") or a whole
       stack of RGB images ("im(3,x,y,z)"). As long as $im is of
       the form "(3,...)" the automatically created output pdl
       will contain the right number of dimensions and contain
       the intensity data as we expect it since the loops have
       been implicitly performed thanks to implicit threading.
       You can easily convince yourself that calling with a color
       pixel $grey is 0D, with a line it turns out 1D "grey(x)",
       with an image we get "grey(x,y)" and finally we get a
       converted image stack "grey(x,y,z)".
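
       A quick way to convince yourself is to print the dimensions
       of the result for each kind of input. In this sketch the
       sizes (4, 5 and 6 for x, y and z) are arbitrary assumptions.

```perl
use strict;
use warnings;
use PDL;

my $w = pdl([77, 150, 29]) / 256;     # weight vector, dims (3)

# pixel, line, image and image stack
for my $dims ([3], [3, 4], [3, 4, 5], [3, 4, 5, 6]) {
    my $im   = sequence(@$dims);
    my $grey = inner($im, $w);
    printf "im(%s) -> grey(%s)\n",
        join(',', @$dims), join(',', $grey->dims);
}
```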

       Let's fill these general rules with some more life by
       going through a couple of further examples. The reader may
       try to figure out equivalent formulations with explicit
       for-looping and compare the flexibility of those routines
       using implicit threading to the explicit formulation. Fur-
       thermore, especially when using several thread dimensions
       it is a useful exercise to check the relative speed by
       doing some benchmark tests (which we still have to do).

       First in the row is a slightly reworked centroid example,
       now coded with threading in mind.

        # threaded mult to calculate centroid coords, works for stacks as well
        $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
              sumover($im->clump(2));

       Let's analyse what's going on step by step. First the
       product:

        $prod = $im*xvals(zeroes(($im->dims)[0]))

       This will actually work for $im being one, two, three, and
       higher dimensional. If $im is one-dimensional it's just an
       ordinary product (in the sense that every element of $im
       is multiplied with the respective element of
       "xvals(...)"), if $im has more dimensions further thread-
       ing is done by adding appropriate dummy dimensions to
       "xvals(...)"  according to R4.  More importantly, the two
       sumover operations show a first example of how to make use
       of the dimension manipulating commands. A quick look at
       sumover's signature will remind you that it will only
       "gobble up" the first dimension of a given input pdl. But
       what if we want to really compute the sum over all ele-
       ments of the first two dimensions? Well, nothing keeps us
       from passing a virtual pdl into sumover which in this case
       is formed by clumping the first two dimensions of the
       "parent pdl" into one. From the point of view of the par-
       ent pdl the sum is now computed over the first two dimen-
       sions, just as we wanted, though sumover has just done the
       job as specified by its signature. Got it ?

       Another little finesse of writing the code like that: we
       intentionally used "sumover($pdl->clump(2))" instead of
       "sum($pdl)" so that we can either pass just an image
       "(x,y)" or a stack of images "(x,y,t)" into this routine
       and get either just one x-coordinate or a vector of
       x-coordinates (of size t) in return.
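
       Here is a minimal sketch of the centroid computation applied
       to a single image and to a stack. The wrapper name
       "xcentroid" and the test data (one bright pixel per image)
       are assumptions for illustration.

```perl
use strict;
use warnings;
use PDL;

# x-centroid of an image (x,y) or of every image in a stack (x,y,t)
sub xcentroid {
    my ($im) = @_;
    my $x = xvals($im->dim(0));     # 0,1,...,x-1 along the first dim
    return sumover(($im * $x)->clump(2)) / sumover($im->clump(2));
}

my $img = zeroes(4, 3);
set($img, 2, 1, 5);                 # a single bright pixel at x=2, y=1

my $stack = zeroes(4, 3, 2);
set($stack, 1, 0, 0, 1);            # image 0: bright pixel at x=1
set($stack, 3, 2, 1, 1);            # image 1: bright pixel at x=3

print xcentroid($img), "\n";        # 0D pdl
print xcentroid($stack), "\n";      # 1D pdl, one entry per image
```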

       Another set of common operations are what one could call
       "projection operations". These operations take an N-D pdl
       as input and return a (N-1)-D "projected" pdl. These oper-
       ations are often performed with functions like sumover,
       prodover, minimum and maximum.  Using again images as
       examples we might want to calculate the maximum pixel
       value for each line of an image or image stack. We know
       how to do that

        # maxima of lines (as function of line number and time)
        maximum($stack,($ret=null));

       But what if you want to calculate maxima per column when
       implicit threading always applies the core functionality
       to the first dimension and threads over all others? How
       can we arrange for the core functionality to be applied to
       the second dimension instead, with threading done over the
       others? Can you guess it? Yes, we make a virtual pdl
       that has the second dimension of the "parent pdl" as its
       first dimension using the "mv" command.

        # maxima of columns (as function of column number and time)
        maximum($stack->mv(0,1),($ret=null));

       and calculating all the sums of sub-slices over the third
       dimension is now almost too easy

        # sums of pixels in time (assuming time is the third dim)
        sumover($stack->mv(2,0),($ret=null));
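
       The three projections can be compared side by side on a
       small stack. This is a sketch with made-up data
       ("sequence(2,3,2)", i.e. x=2, y=3, t=2); note that "mv(2,0)"
       moves the third dimension to the front.

```perl
use strict;
use warnings;
use PDL;

my $stack = sequence(2, 3, 2);    # hypothetical stack: x=2, y=3, t=2

my $line_max = maximum($stack);              # core dim x -> dims (y,t)
my $col_max  = maximum($stack->mv(0, 1));    # core dim y -> dims (x,t)
my $time_sum = sumover($stack->mv(2, 0));    # core dim t -> dims (x,y)

print "line maxima:   dims (", join(',', $line_max->dims), ")\n";
print "column maxima: dims (", join(',', $col_max->dims),  ")\n";
print "sums in time:  dims (", join(',', $time_sum->dims), ")\n";
```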

       Finally, if you want to apply the operation to all ele-
       ments (like max over all elements or sum over all ele-
       ments) regardless of the dimensions of the pdl in question
       "clump" comes in handy. As an example look at the defini-
       tion of "sum" (as defined in "Basic.pm"):

        sub sum {
          my ($pdl) = @_;
          PDL::Primitive::sumover($pdl->clump(-1),(my $tmp=null));
          return $tmp->at(); # return a perl number, not a 0D pdl
        }
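
       The same idea as a self-contained sketch. The helper name
       "total" is hypothetical (chosen so as not to clash with the
       real "sum"), and "sclr" is used to obtain the perl number.

```perl
use strict;
use warnings;
use PDL;

# Sum over all elements regardless of the number of dimensions.
sub total {
    my ($pdl) = @_;
    return sumover($pdl->clump(-1))->sclr;   # clump(-1) flattens to 1D
}

print total(sequence(4)), "\n";          # 1D input
print total(sequence(2, 3)), "\n";       # 2D input
print total(sequence(2, 3, 2)), "\n";    # 3D input
```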

       We have already mentioned that all basic operations sup-
       port threading and assignment is no exception. So here are
       a couple of threaded assignments

        perldl> $im = zeroes(byte, 10,20)
        perldl> $line = exp(-rvals(10)**2/9)
        # threaded assignment
        perldl> $im .= $line      # set every line of $im to $line
        perldl> $im2 .= 5         # set every element of $im2 to 5

       By now you probably see how it works and what it does,
       don't you?
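
       In case you don't, here is a small sketch with sizes chosen
       so that the result is easy to read; the pdls are made-up
       test data.

```perl
use strict;
use warnings;
use PDL;

my $im   = zeroes(double, 4, 3);     # 3 lines of 4 pixels each
my $line = pdl(1, 2, 3, 4);          # one line, dims (4)

$im .= $line;                        # $line is reused for every line (R4)
print $im;

my $im2 = zeroes(4, 3);
$im2 .= 5;                           # a single value is stretched over all dims
print $im2;
```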

       To finish the examples in this paragraph here is a
       function to create an RGB image from what is called a
       palette image. The palette image consists of two parts: an
       image of indices into a color lookup table and the color
       lookup table itself. [ describe how it works ] We are going
       to use a PP-function we haven't encountered yet in the
       previous examples. It is the aptly named index function,
       signature "((n),(),[o]())" (see Appendix B) with the core
       functionality that "index(pdl (0,2,4,5),2,($ret=null))" will
       return the element with index 2 of the first input pdl. In
       this case, $ret will contain the value 4.  So here is the
       example:

        # a threaded index lookup to generate an RGB, or RGBA or YMCK image
        # from a palette image (represented by a lookup table $palette and
        # a color-index image $im)
        # you can say just dummy(0) since the rules of threading make it fit
        perldl> index($palette->xchg(0,1),
                      $im->long->dummy(0,($palette->dims)[0]),
                      ($res=null));

       Let's go through it and explain the steps involved. Assum-
       ing we are dealing with an RGB lookup-table $palette is of
       size "(3,x)". First we exchange the dimensions of the
       palette so that looping is done over the first dimension
       of $palette (of size 3 that represent r, g, and b compo-
       nents). Now looking at $im, we add a dummy dimension of
       size equal to the length of the number of components (in
       the case we are discussing here we could have just used
       the number 3 since we have 3 color components). We can use
       a dummy dimension since for red, green and blue color com-
       ponents we use the same index from the original image,
       e.g.  assuming a certain pixel of $im had the value 4 then
       the lookup should produce the triple

        [palette(0,4),palette(1,4),palette(2,4)]

       for the new red, green and blue components of the output
       image. Hopefully by now you have some sort of idea what
       the above piece of code is supposed to do (it is often
       actually quite complicated to describe in detail how a
       piece of threading code works; just go ahead and experi-
       ment a bit to get a better feeling for it).

       If you have read the threading rules carefully, then you
       might have noticed that we didn't have to explicitly
       state the size of the dummy dimension that we created for
       $im; when we create it with size 1 (the default) the rules
       of threading make it automatically fit to the desired size
       (by rule R3, in our example the size would be 3 assuming a
       palette of size "(3,x)"). Since situations like this do
       occur often in practice this is actually why rule R3 has
       been introduced (the part that makes dimensions of size 1
       fit to the thread loop dim size). So we can just say

        perldl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));

       Again, you can convince yourself that this routine will
       create the right output if called with a pixel ($im is
       0D), a line ($im is 1D), an image ($im is 2D), ..., an RGB
       lookup table (palette is "(3,x)") or an RGBA lookup table
       (palette is "(4,x)", see e.g. OpenGL). This flexibility is
       achieved by the rules of threading which are made to do
       the right thing in most situations.
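
       The lookup can be tried on a tiny palette image. This is a
       sketch under stated assumptions: the palette and index image
       are made-up data, and the method form "->index(...)" is used
       instead of the function form because a plain "index(...)" in
       a perl script may resolve to perl's built-in string function.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical palette with 4 entries of (r,g,b): dims (3,4)
my $palette = pdl([ [0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255] ]);
# Index image of width 2 and height 2: dims (2,2)
my $im = long([ [0, 1], [3, 2] ]);

# $palette->xchg(0,1) has dims (4,3):  core dim 4, extra dim 3.
# $im->dummy(0)       has dims (1,2,2): 0D core, extra dims (1,2,2).
# The loop dims are therefore (3,2,2), which are also the output dims.
my $rgb = $palette->xchg(0, 1)->index($im->dummy(0))->sever;

print join(',', $rgb->dims), "\n";
print $rgb->slice(':,(1),(0)'), "\n";   # colour of the pixel at x=1, y=0
```

       The call to "sever" merely cuts the dataflow connection to
       $palette so that $rgb is an independent pdl.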

       To wrap it all up once again, the general idea is as
       follows. If you want to achieve looping over certain
       dimensions and have the core functionality applied to
       another specified set of dimensions you use the dimension
       manipulating commands to create a (or several) virtual
       pdl(s) so that from the point of view of the parent pdl(s)
       you get what you want (always having the signature of the
       function in question and R0-R5 in mind!). Easy, isn't it ?

       Output autocreation and PP-function calling conventions

       At this point we have to divert to some technical detail
       that has to do with the general calling conventions of PP-
       functions and the automatic creation of output arguments.
       Basically, there are two ways of invoking pdl routines,
       namely

        $result = func($a,$b);

       and

        func($a,$b,$result);

       If you are only using implicit threading then the output
       variable can be automatically created by PDL. You flag
       that to the PP-function by setting the output argument to
       a special kind of pdl that is returned from a call to the
       function "PDL->null" that returns an essentially "empty"
       pdl (for those interested in details there is a flag in
       the C pdl structure for this). The dimensions of the cre-
       ated pdl are determined by the rules of implicit thread-
       ing: the first dimensions are the core output dimensions
       to which the threading dimensions are appended (which are
       in turn determined by the dimensions of the input pdls as
       described above).  So you can say

        func($a,$b,($result=PDL->null));

       or

        $result = func($a,$b)

       which are exactly equivalent.
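
       A short sketch of the two calling styles, using "inner" as
       the PP-function; the input values are made up for
       illustration.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(3, 2);              # two vectors of length 3
my $y = pdl(1, 10, 100);

my $r1 = inner($x, $y);              # functional style, result returned

my $r2 = PDL->null;                  # "empty" pdl requests autocreation
inner($x, $y, $r2);                  # output passed as the last argument

print "$r1\n$r2\n";
```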

       Be warned that you can not use output autocreation when
       using explicit threading (for reasons explained in the
       following section on explicit threading, the second vari-
       ant of threading).

       In "tight" loops you probably want to avoid the implicit
       creation of a temporary pdl in each step of the loop that
       comes along with the "functional" style but rather say

        # create output pdl of appropriate size only at first invocation
        $result = null;
        for (0..$n) {
             func($a,$b,$result); # in all but the first invocation $result
             func2($b);           # is defined and has the right size to
                                  # take the output provided $b's dims don't change
             twiddle($result,$a); # do something from $result to $a for iteration
        }

       The take-home message of this section once more: be aware
       of the limitation on output creation when using explicit
       threading.

       Explicit threading

       Having so far only talked about the first flavour of
       threading it is now about time to introduce the second
       variant. Instead of shuffling around dimensions all the
       time and relying on the rules of implicit threading to get
       it all right you sometimes might want to specify in a more
       explicit way how to perform the thread loop. It is proba-
       bly not too surprising that this variant of the game is
       called explicit threading.  Now, before we create the
       wrong impression: it is not either implicit or explicit;
       the two flavours do mix. But more about that later.

       The two most used functions with explicit threading are
       thread and unthread.  We start with an example that illus-
       trates typical usage of the former:

        [ # ** this is the worst possible example to start with ]
        #  but can be used to show that $mat += $line is different from
        #                               $mat->thread(0) += $line
        # explicit threading to add a vector to each column of a matrix
        perldl> $mat  = zeroes(4,3)
        perldl> $line = pdl (3.1416,2,-2)
        perldl> ($tmp = $mat->thread(0)) += $line

       In this example, "$mat->thread(0)" tells PDL that you want
       the first dimension of this pdl (of size 4) to be threaded
       over explicitly. The remaining dimension (of size 3) is
       matched against $line, leading to a thread loop that can be
       expressed as

        for (j=0; j<3; j++) {
           for (i=0; i<4; i++) {
               mat(i,j) += line(j);
           }
        }
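
       The loop nest above, written out in plain perl on nested
       arrays (no PDL involved). This is only an emulation of what
       the thread loop computes, with "$mat[$j][$i]" standing in for
       "mat(i,j)".

```perl
use strict;
use warnings;

my @line = (3.1416, 2, -2);               # 3 elements, indexed by j
my @mat  = map { [ (0) x 4 ] } 0 .. 2;    # 3 rows (j) of 4 zeros (i)

for my $j (0 .. 2) {                      # runs over the vector elements
    for my $i (0 .. 3) {                  # runs over the dimension of size 4
        $mat[$j][$i] += $line[$j];
    }
}

print join(' ', @$_), "\n" for @mat;
```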

       "thread" takes a list of numbers as arguments which
       explicitly specify which dimensions to thread over first.
       With the introduction of explicit threading the dimensions
       of a pdl are conceptually split into three different
       groups the latter two of which we have already encoun-
       tered: thread dimensions, core dimensions and extra dimen-
       sions.

       Conceptually, it is best to think of those dimensions of a
       pdl that have been specified in a call to "thread" as
       being taken away from the set of normal dimensions and put
       on a separate stack. So assuming we have a pdl
       "a(4,7,2,8)" saying

        $b = $a->thread(2,1)

       creates a new virtual pdl of dimension "b(4,8)" (which we
       call the remaining dims) that also has 2 thread dimensions
       of size "(2,7)". For the purposes of this document we
       write that symbolically as "b(4,8){2,7}". An important
       difference to the previous examples where only implicit
       threading was used is the fact that the core dimensions
       are matched against the remaining dimensions which are not
       necessarily the first dimensions of the pdl. We will now
       specify how the presence of thread dimensions changes the
       rules R1-R5 for threadloops (which apply to the special
       case where none of the pdl arguments has any thread dimen-
       sions).
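
       The effect of "thread" on the dimension list can be modelled
       in a few lines of plain perl. The helper name
       "split_thread_dims" is hypothetical, and the sketch handles
       non-negative dimension indices only.

```perl
use strict;
use warnings;

# Model of what thread(@idx) does to a list of dimension sizes: the
# named dimensions move to a separate list of thread dims (in the
# order given), all others stay behind as the remaining dims.
sub split_thread_dims {
    my ($dims, @idx) = @_;
    my %taken  = map { $_ => 1 } @idx;
    my @thread = @{$dims}[@idx];
    my @remain = map { $dims->[$_] } grep { !$taken{$_} } 0 .. $#$dims;
    return (\@remain, \@thread);
}

# a(4,7,2,8)->thread(2,1)
my ($remain, $thread) = split_thread_dims([4, 7, 2, 8], 2, 1);
printf "b(%s){%s}\n", join(',', @$remain), join(',', @$thread);
```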

       T0  Core dimensions are matched against the first n
           remaining dimensions of the pdl argument (note the
           difference to R0). Any further remaining dimensions
           are extra dimensions and are used to determine the
           implicit loop dimensions.

       T1a The number of implicit loop dimensions is equal to the
           maximal number of extra dimensions taken over the set
           of pdl arguments.

       T1b The number of explicit loop dimensions is equal to the
           maximal number of thread dimensions taken over the set
           of pdl arguments.

       T1c The total number of loop dimensions is equal to the
           sum of explicit loop dimensions and implicit loop
           dimensions. In the thread loop, explicit loop dimen-
           sions are threaded over first followed by implicit
           loop dimensions.

       T2  The size of each of the loop dimensions is derived
           from the size of the respective dimensions of the pdl
           arguments. It is given by the maximal size found in
           any pdls having this thread dimension (for explicit
           loop dimensions) or extra dimension (for implicit loop
           dimensions).

       T3  This rule applies to any explicit loop dimension as
           well as any implicit loop dimension. For all pdls that
           have a given thread/extra dimension the size must be
           equal to the size of the respective explicit/implicit
           loop dimension or 1; otherwise you raise a runtime
           exception. If the size of a thread/extra dimension of
           a pdl is one it is implicitly treated as a dummy
           dimension of size equal to the explicit/implicit loop
           dimension.

       T4  If a pdl doesn't have a thread/extra dimension that
           corresponds to an explicit/implicit loop dimension, in
           the thread loop this pdl is treated as if having a
           dummy dimension of size equal to the size of that loop
           dimension.

       T4a All pdls that do have thread dimensions must have the
           same number of thread dimensions.

       T5  Output autocreation cannot be used if any of the pdl
           arguments has any thread dimensions. Otherwise R5
           applies.

       The same restrictions apply with regard to implicit dummy
       dimensions (created by application of T4) as already men-
       tioned in the section on implicit threading: if any of the
       output pdls has an (explicit or implicitly created)
       greater-than-one dummy dimension a runtime exception will
       be raised.

       Let us demonstrate these rules at work in a generic case.
       Suppose we have a (here unspecified) PP-function with the
       signature:

        func((m,n),(m),(),[o](m))

       and you call it with 3 pdls "a(5,3,10,11)",
       "b(3,5,10,1,12)", "c(10)" and an output pdl
       "d(3,11,5,10,12)" (which can here not be automatically
       created) as

        func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))

       From the signature of func and the above call the pdls
       split into the following groups of core, extra and thread
       dimensions (written in the form "pdl(core dims){thread
       dims}[extra dims]"):

        a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]

       With this to help us along (it is in general helpful to
       write the arguments down like this when you start playing
       with threading and want to keep track of what is going on)
       we further deduce that the number of explicit loop
       dimensions is 2 (by T1b from $a and $b) with sizes "(3,11)"
       (by T2); 2 implicit loop dimensions (by T1a from $b and $d)
       of size "(10,12)" (by T2) and the elements of $d are
       computed from the input pdls in a way that can be expressed
       in pdl pseudo-code as

        for (l=0;l<12;l++)
         for (k=0;k<10;k++)
          for (j=0;j<11;j++)
           for (i=0;i<3;i++)
              # the 0 in b(...) is the effect of treating b's size-1
              # thread dim as a dummy dim (index j)
              d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
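
       The loop bounds used in this pseudo-code follow mechanically
       from the thread and extra dimensions of the four arguments.
       Here is a plain perl sketch of that deduction; the helper
       name "merge_dims" is hypothetical and simply applies T2 and
       T3 to lists of sizes.

```perl
use strict;
use warnings;

# T2/T3: corresponding sizes must be equal or 1; the loop dim takes
# the larger value.
sub merge_dims {
    my @loop;
    for my $dims (@_) {
        for my $k (0 .. $#$dims) {
            my $size = $dims->[$k];
            if (!defined $loop[$k] || $loop[$k] == 1) {
                $loop[$k] = $size;
            } elsif ($size != 1 && $size != $loop[$k]) {
                die "mismatch in dim $k\n";
            }
        }
    }
    return @loop;
}

# thread dims and extra dims of each argument, as written in the text
my %args = (
    a => { thread => [3, 11], extra => [] },
    b => { thread => [3, 1],  extra => [10, 12] },
    c => { thread => [],      extra => [10] },
    d => { thread => [3, 11], extra => [10, 12] },
);

my @explicit = merge_dims(map { $_->{thread} } values %args);   # T1b, T2
my @implicit = merge_dims(map { $_->{extra}  } values %args);   # T1a, T2

print "explicit loop dims: (@explicit)\n";
print "implicit loop dims: (@implicit)\n";
```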

       Uhhmpf, this example was really not easy in terms of
       bookkeeping. It serves mostly as an example how to figure
       out what's going on when you encounter a complicated
       looking expression. But now it is really time to show that
       threading is useful by giving some more of our so called
       "practical" examples.

       [ The following examples will need some additional expla-
       nations in the future. For the moment please try to live
       with the comments in the code fragments. ]

       Example 1:

        *** inverse of matrix represented by eigvecs and eigvals
        ** given a symmetrical matrix M = A^T x diag(lambda_i) x A
        **    =>  inverse M^-1 = A^T x diag(1/lambda_i) x A
        ** first $tmp = diag(1/lambda_i)*A
        ** then  A^T * $tmp by threaded inner product
        # index handling so that matrices print correct under pdl
        $inv .= $evecs*0;  # just copy to get appropriately sized output
        $tmp .= $evecs;    # initialise, no backpropagation
        ($tmp2 = $tmp->thread(d)) /= $evals;    #  threaded division
        # and now a matrix multiplication in disguise
        PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
                              $tmp->thread(0,-1),
                              $inv->thread(0,1));
        # alternative for matrix mult using implicit threading,
        # first xchg only for transpose
        PDL::Primitive::inner($evecs->xchg(0,1)->dummy(y),
                              $tmp->xchg(0,1)->dummy(y),
                              ($inv=null));

       Example 2:

        # outer product by threaded multiplication
        # stress that we need to do it with explicit call to my_biop1
        # when using explicit threading
        $res=zeroes(($a->dims)[0],($b->dims)[0]);
        my_biop1($a->thread(0,-1),$b->thread(-1,0),$res->thread(0,1),"*");
        # similar thing by implicit threading with autocreated pdl
        $res = $a->dummy(1) * $b->dummy(0);
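
       The implicit variant is easy to verify against the "outer"
       function of PDL. In this sketch the vectors are made-up
       values; a dummy dimension is appended to the first vector
       and prepended to the second so that their dims become (3,1)
       and (1,2).

```perl
use strict;
use warnings;
use PDL;

my $u = pdl(1, 2, 3);        # dims (3)
my $v = pdl(10, 20);         # dims (2)

# (3,1) * (1,2): both size-1 dims are stretched, giving dims (3,2)
my $res = $u->dummy(1) * $v->dummy(0);

print join(',', $res->dims), "\n";
print all($res == outer($u, $v)) ? "same as outer\n" : "differs from outer\n";
```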

       Example 3:

        # different use of thread and unthread to shuffle a number of
        # dimensions in one go without lots of calls to ->xchg and ->mv

        # use thread/unthread to shuffle dimensions around
        # just try it out and compare the child pdl with its parent
        $trans = $a->thread(4,1,0,3,2)->unthread;

       Example 4:

        # calculate a couple of bounding boxes
        # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
        # we use again thread and unthread to shuffle dimensions around
        perldl> $bb = zeroes(double, 2,3 );
        # (assuming $vertices has dims (3,...), coordinate index first)
        perldl> minimum($vertices->thread(0)->clump->unthread(1),
                        $bb->slice('(0),:'));
        perldl> maximum($vertices->thread(0)->clump->unthread(1),
                        $bb->slice('(1),:'));

       Example 5:

        # calculate a self-ratioed (i.e. self-normalized) sequence of images
        # uses explicit threading and an implicitly threaded division
        $stack = read_image_stack();
        # calculate the per-pixel average of the first $n+1 images
        $aver = zeroes(($stack->dims)[0,1]);  # make the output pdl
        sumover($stack->slice(":,:,0:$n")->thread(0,1),$aver);
        $aver /= ($n+1);
        $stack /= $aver;  # normalize the stack by doing a threaded division
        # implicit versus explicit
        # alternatively calculate $aver with implicit threading and autocreation
        sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
        $aver /= ($n+1);
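
       The implicit variant of this example can be tried out
       with a tiny synthetic stack. This is only a sketch:
       sequence() stands in for the undefined
       read_image_stack(), and the sizes and $n are arbitrary.

```perl
use strict;
use warnings;
use PDL;

my $n = 1;
# Hypothetical stand-in for read_image_stack(): three 2x2 "images"
my $stack = sequence(2, 2, 3) + 1;

# mv(2,0) moves the image index to the front so that sumover
# consumes it and threads over the two pixel dimensions
my $aver = $stack->slice(":,:,0:$n")->mv(2, 0)->sumover / ($n + 1);

# $aver has dims (2,2); the division threads over dimension 2
$stack /= $aver;

print join(',', $aver->dims), "\n";   # prints 2,2
print $stack, "\n";
```

       After the division, the first $n+1 images average to 1
       at every pixel.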



       Implicit versus explicit threading

       In this paragraph we are going to illustrate when explicit
       threading is preferable to implicit threading and vice
       versa. Then again, this is probably not the best way of
       putting it, since as you already know the two flavours do
       mix. So it is more about how to get the best of both
       worlds and, in the best of perl traditions: TIMTOWTDI!

       [ Sorry, this still has to be filled in in a later
       release; either refer to above examples or choose some new
       ones ]

       Finally, this may be a good place to justify all the tech-
       nical detail we have been going on about for a couple of
       pages: why threading ?

       Well, code that uses threading should be (considerably)
       faster than code that uses explicit for-loops (or similar
       perl constructs) to achieve the same functionality, since
       the loops are implemented as optimized C code. Especially
       on supercomputers (with vector computing facilities or
       parallel processing) PDL threading will be implemented in
       a way that takes advantage of the additional facilities
       of these machines. Furthermore, it is a conceptually
       simple construct (though technical details might get
       involved at times) and can greatly reduce the syntactical
       complexity of PDL code (but keep the admonition for
       documentation in mind). Once you are comfortable with the
       threading way of thinking (and coding) it shouldn't be too
       difficult to understand code that somebody else has
       written (provided they gave you an idea of what the
       expected input dimensions are, etc.). As a general tip to
       increase the performance of your code: if you have to
       introduce a loop into your code, try to reformulate the
       problem so that you can use threading to perform the loop
       (as with anything there are exceptions to this rule of
       thumb; but the authors of this document tend to think
       that these are rare cases ;).
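
       As a minimal sketch of this rule of thumb, the following
       computes row sums twice, once with an explicit perl loop
       and once by letting sumover thread over the rows. The
       pdl $img and its size are made up for illustration.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical 4x3 "image": 4 columns (dim 0), 3 rows (dim 1)
my $img = sequence(4, 3);

# Explicit perl loop: slice out each row and sum it
my @loop_sums;
for my $row (0 .. $img->dim(1) - 1) {
    push @loop_sums, $img->slice(":,($row)")->sum;
}

# Threaded: sumover consumes dim 0 and threads over dim 1
my $thr_sums = $img->sumover;

print "loop:     @loop_sums\n";
print "threaded: $thr_sums\n";
```

       Both variants give the same numbers; in the threaded one
       the loop over the rows runs inside the compiled code.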

PDL::PP
       An easy way to define functions that are aware of indexing
       and threading (and the universe and everything)

       PDL::PP is part of the PDL distribution. It is used to
       generate functions that are aware of indexing and
       threading rules from very concise descriptions. It can be
       useful for you if you want to write your own functions or
       if you want to interface functions from an external
       library so that they support indexing and threading (and
       maybe dataflow as well, see PDL::Dataflow). For further
       details check PDL::PP.

Appendix A
       Affine transformations - a special class of simple and
       powerful transformations

       [ This is also something to be added in future releases.
       Do we already have the general make_affine routine in PDL
       ? It is possible that we will reference another appropri-
       ate manpage from here ]

Appendix B



       Signatures of standard PDL::PP compiled functions

       A selection of signatures of PDL primitives to show how
       many dimensions PP compiled functions gobble up (so that
       you can figure out what will be threaded over). Most of
       these functions are the basic ones defined in
       "primitive.pd".

        # functions in primitive.pd
        #
        sumover        ((n),[o]())
        prodover       ((n),[o]())
        axisvalues     ((n))                                   inplace
        inner          ((n),(n),[o]())
        outer          ((n),(m),[o](n,m))
        innerwt        ((n),(n),(n),[o]())
        inner2         ((m),(m,n),(n),[o]())
        inner2t        ((j,n),(n,m),(m,k),[o]())
        index          (1D,0D,[o])
        minimum        (1D,[o])
        maximum        (1D,[o])
        wstat          ((n),(n),(),[o],())
        assgn          ((),())

        # basic operations
        binary operations ((),(),[o]())
        unary operations  ((),[o]())
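
       The following sketch shows how a few of these signatures
       translate into output dimensions. The pdls $m, $v and $w
       are made up for illustration; the dimensions left over
       after the signature has gobbled up its share are the ones
       threaded over.

```perl
use strict;
use warnings;
use PDL;

my $m = sequence(3, 4);   # dims (3,4)
my $v = pdl(1, 1, 1);     # dims (3)
my $w = pdl(1, 2);        # dims (2)

# sumover ((n),[o]()): consumes dim 0, threads over the 4 rows
my $s = sumover($m);

# inner ((n),(n),[o]()): consumes dim 0 of both, threads over rows
my $i = inner($m, $v);

# outer ((n),(m),[o](n,m)): creates an (n,m) output
my $o = outer($v, $w);

# binary operations ((),(),[o]()): thread over every dimension
my $p = $m + 1;

print join(',', $s->dims), "\n";   # prints 4
print join(',', $i->dims), "\n";   # prints 4
print join(',', $o->dims), "\n";   # prints 3,2
print join(',', $p->dims), "\n";   # prints 3,4
```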


AUTHOR & COPYRIGHT
       Copyright (C) 1997 Christian Soeller (c.soeller@auck-
       land.ac.nz) & Tuomas J. Lukka (lukka@fas.harvard.edu). All
       rights reserved. Although destined for release as a man
       page with the standard PDL distribution, it is not public
       domain. Permission is granted to freely distribute verba-
       tim copies of this document provided that no modifications
       outside of formatting be made, and that this notice remain
       intact.  You are permitted and encouraged to use its code
       and derivatives thereof in your own source code for fun or
       for profit as you see fit.



perl v5.6.1                 2000-05-24                INDEXING(G)