utf8(8)      Perl Programmers Reference Guide     utf8(8)



NAME
       utf8 - Perl pragma to enable/disable UTF-8 in source code

SYNOPSIS
           use utf8;
           no utf8;


DESCRIPTION
       WARNING: The implementation of Unicode support in Perl is
       incomplete.  See perlunicode for the exact details.

       The "use utf8" pragma tells the Perl parser to allow UTF-8
       in the program text in the current lexical scope.  The "no
       utf8" pragma tells Perl to switch back to treating the
       source text as literal bytes in the current lexical scope.
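
       The lexical scoping of the two pragmas can be sketched as
       follows.  This is a minimal illustration, assuming the file
       is saved in UTF-8 encoding and run under a reasonably recent
       Perl 5 (it was not checked against 5.6.1); the variable names
       are hypothetical.  The same two source bytes (0xC3 0xA9, the
       UTF-8 encoding of U+00E9) become one character under
       "use utf8" and two bytes under "no utf8".

```perl
use strict;
use warnings;
use utf8;                     # from here on, source text is read as UTF-8

my $chars = "é";              # decoded: one character, U+00E9

my $bytes;
{
    no utf8;                  # back to literal bytes until this block ends
    $bytes = "é";             # the two raw bytes 0xC3 0xA9
}

# Only lengths are printed, so no output encoding layer is needed.
printf "%d %d\n", length($chars), length($bytes);   # prints "1 2"
```

       Because "no utf8" is lexically scoped, the effect ends at the
       closing brace and the rest of the file is again read as UTF-8.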

       This pragma is primarily a compatibility device.  Perl
       versions earlier than 5.6 allowed arbitrary bytes in
       source code, whereas in the future we would like to
       standardize on the UTF-8 encoding for source text.  Until
       UTF-8 becomes the default format for source text, this
       pragma should be used to recognize UTF-8 in the source.
       When UTF-8 becomes the standard source format, this pragma
       will effectively become a no-op.  This pragma is already a
       no-op on EBCDIC platforms (where it is all right to write
       Perl in EBCDIC rather than UTF-8).

       Enabling the "utf8" pragma has the following effects:

       o   Bytes in the source text that have their high bit set
           will be treated as being part of a literal UTF-8
           character.  This applies to most literal source text,
           such as identifiers, string constants, constant regular
           expression patterns and package names.

       o   In the absence of inputs marked as UTF-8, regular
           expressions within the scope of this pragma will
           default to using character semantics instead of byte
           semantics.

               @bytes_or_chars = split //, $data;  # may split to bytes if
                                                   # $data isn't UTF-8
               {
                   use utf8;                       # force char semantics
                   @chars = split //, $data;       # splits characters
               }
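
       The first effect listed above, UTF-8 in identifiers, string
       constants, constant patterns and package names, can be
       sketched as below.  This is a minimal example under stated
       assumptions: the file is saved as UTF-8, a reasonably recent
       Perl 5 is used (non-ASCII package names were not checked
       against 5.6.1), and the names Café, prix and $naïve are
       hypothetical.  Without "use utf8" the non-ASCII identifiers
       would not parse as intended.

```perl
use strict;
use warnings;
use utf8;                           # needed: names below contain non-ASCII letters

package Café;                       # non-ASCII package name (hypothetical)
sub prix { return 42 }              # hypothetical sub, returns a fixed value

package main;

my $naïve   = "naïve";              # non-ASCII identifier and string constant
my $matched = $naïve =~ /ï/ ? 1 : 0;   # constant pattern containing U+00EF

# "naïve" is 5 characters here, not 6 bytes, because the literal was decoded.
printf "%d %d %d\n", Café::prix(), length($naïve), $matched;   # prints "42 5 1"
```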


SEE ALSO
       perlunicode, bytes



perl v5.6.1                 2001-03-03                utf8(8)