PEP(1)                                                                  PEP(1)



NAME
       pep - a file detergent

SYNOPSIS
       pep [ -a ] [ -b ] [ -c [ size ]] [ -d + | - ]
            [ -e [ 0 | 1 | 2 ]] [ -g [ file ]] [ -h ] [ -i + | - ]
            [ -k + | - ] [ -l [n][ size ]] [ -m + | - ] [ -o [ b ]]
            [ -p ] [ -s [ size ]] [ -t [ size ]] [ -u terminator ]
            [ -v ] [ -w + | - ] [ -x ] [ -z ] [ filename ...  ]

DESCRIPTION
       Pep  is a filter program to "clean" files.  It is named after a popular
       Norwegian detergent.

       Pep may be used to remove control characters, strip parity bits, inter-
       pret  ANSI  escape  sequences, compress tabulation, extract strings and
       convert character sets.

       Pep is a filter.  Its default operation is to read from standard input
       (the keyboard) and write to standard output (the terminal).

       You may also specify the names of one or more files as the last
       arguments on the command line.  Most versions of pep (but not the
       version compiled for the DEC VMS operating system) allow redirection and
       ambiguous (wildcard) filename arguments.

       Instead of using pep as a filter, you may instruct pep to write the
       result back onto the original input file with the -o option.  If you
       use this option, the original file will be lost (unless you ask for a
       backup copy with -ob).  This is the default behaviour on operating
       systems that do not support redirection (e.g. DEC VMS).

       To get a brief summary of the command line syntax and all the options,
       use the -h option.  Just type the command:
              pep -h

       followed by the RETURN key.  Note that typing just pep will not give
       you this summary.  The command:
              pep

       will start pep as a filter, and it will just echo back whatever you
       type until you type the end-of-file character (usually CTRL-D on UNIX,
       or CTRL-Z on MS-DOS and VMS).

       When pep is running as a filter, it reads from the standard input and
       writes to the standard output.  In this state, pep is much less verbose
       than usual: it still prints error messages, but very little else.  For
       example, the following two commands:
              pep < foobar.in > foobar.out
              pep -ob foobar.txt

       do more or less the same job, but the first does it quietly, in the
       tradition of Unix filters, while the second prints the copyright
       notice, a detailed list of the things it will do, and finally a list
       and line count of all the files it processes as it plods along.

       Pep will remove some "noise" from files, even if no options are
       specified.  The following is the default behaviour:

       o  remove trailing spaces;

       o  terminate each line with the canonical line terminator (LF, CR, or
          CR/LF, depending on the target system);

       o  remove underlining intended for backspacing printers;

       o  remove control characters (character codes below 32), except the
          canonical line terminator, FF and TAB;

       o  break the line before the FF if a line contains an FF anywhere
          except in the first column.
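
       The default cleaning of a single line can be sketched in Python as
       below.  This is an illustration under stated assumptions, not pep's
       actual source: the function name default_clean is hypothetical, the
       line terminator is assumed to have been stripped before the call, and
       only the "_<BS>X" and "X<BS>_" forms of backspace underlining are
       handled.

```python
import re

def default_clean(line):
    """Hypothetical sketch of pep's default cleaning of one line.

    `line` is a str without its line terminator.  Returns a list of
    output lines, because a line may be broken before an embedded FF.
    """
    # Remove backspace underlining: "_\bX" and "X\b_" both become "X".
    line = re.sub(r"_\x08(.)", r"\1", line, flags=re.DOTALL)
    line = re.sub(r"(.)\x08_", r"\1", line, flags=re.DOTALL)

    # Drop control characters (codes below 32) except TAB and FF.
    line = "".join(c for c in line if ord(c) >= 32 or c in "\t\f")

    # Break the line before any FF that is not in the first column.
    pieces, start = [], 0
    for i, c in enumerate(line):
        if c == "\f" and i > start:
            pieces.append(line[start:i])
            start = i
    pieces.append(line[start:])

    # Remove trailing spaces from every resulting line.
    return [p.rstrip(" ") for p in pieces]
```

       The caller would then write each returned line followed by the
       canonical line terminator.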

       If you want to check what pep actually intends to do to your file
       before it does it, you may make it pause with the -p option.  For
       example:
              pep -p foobar.txt

       will make pep stop after displaying a list of the conversions it will
       apply to the file.  The user is then prompted, and may either proceed
       (by pressing the RETURN key) or abort the program without doing
       anything (by pressing CTRL-C).

       You may want conversions other than the default actions described
       above.  Further conversion functions are selected by specifying one or
       more options on the command line.

       Some options must be followed by a "+" or a "-" switch; others take a
       number or a filename as argument.  Most options may be combined with
       other options, but a few are mutually exclusive.  If you specify
       invalid options or option arguments, pep aborts with an error message
       and, on operating systems that support exit codes, returns an error
       exit code.

OPTIONS
       -a     Write out information about pep.

       -b     Remove all characters not in the original 7-bit character set
              (ISO 646), i.e. remove the characters encoded from 128 to 255.
              (If this option is combined with the -x option, pep will print
              the codes for these characters in hexadecimal instead of
              removing them.)  The -b option is powerful, and may remove a lot
              of bytes if you use it on the wrong file.  Only use it if you
              know exactly how the eighth bit is used in the file you intend to
              filter.  Also note that the options -i, -d, -k, -g, -m, -w or -z
              are in most cases better suited to processing files where the
              eighth bit is set.
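
              The effect of -b, with and without -x, can be sketched in Python
              as follows.  The function name is hypothetical, and the exact
              hexadecimal format (two upper case digits between angle
              brackets) is an assumption based on the description of -x.

```python
def strip_eight_bit(data, expand=False):
    """Hypothetical sketch of -b (and of -b combined with -x).

    Bytes 0..127 (7-bit ISO 646) are copied unchanged.  Bytes 128..255
    are removed, or, if `expand` is true, replaced by their code in
    hexadecimal between angle brackets (format assumed, e.g. "<E6>").
    """
    out = bytearray()
    for b in data:
        if b < 128:
            out.append(b)
        elif expand:
            out += b"<%02X>" % b
    return bytes(out)
```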

       -c [ size ]
              Compress spaces into tabulation, i.e. insert TAB characters
              wherever replacing a run of two or more SPACE characters with a
              TAB would produce a smaller output file.  This function is the
              opposite of the function invoked with the -t option.

              The default tab size is 8, but you may specify any other tab
              size with the optional numeric argument.
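
              The rule above can be sketched in Python as follows (the
              function name is hypothetical).  A run of spaces is replaced by
              a TAB only when it reaches a tab stop and is at least two spaces
              long, since a single space and a TAB are the same size.  The
              sketch also assumes that an existing TAB absorbs any spaces
              immediately before it.

```python
def compress_tabs(line, size=8):
    """Hypothetical sketch of -c: replace runs of spaces by TABs."""
    out = []
    col = 0       # current output column (0-based)
    pending = 0   # spaces seen since the last tab stop, not yet written
    for ch in line:
        if ch == " ":
            pending += 1
            col += 1
            if col % size == 0:
                # The run reaches a tab stop: a TAB only pays off
                # if it replaces two or more spaces.
                out.append("\t" if pending > 1 else " ")
                pending = 0
        elif ch == "\t":
            # An existing TAB absorbs the pending spaces before it.
            pending = 0
            col = (col // size + 1) * size
            out.append("\t")
        else:
            out.append(" " * pending)
            pending = 0
            out.append(ch)
            col += 1
    out.append(" " * pending)
    return "".join(out)
```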

       -d + | -
              Convert between the ISO 8859/1 8-bit character set and the
              Norwegian version of the ISO 646 7-bit character set.  If the
              argument is "+", the file is converted to ISO 8859/1.  If the
              argument is "-", the file is converted from ISO 8859/1.  The ISO
              8859/1 character set is closely related to the "DEC
              Multinational Character Set", although the two are not
              identical.
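
              As an illustration, the six Norwegian letters that ISO 646 puts
              in place of [ \ ] { | } can be mapped to their ISO 8859/1 codes
              with a translation table, as in the Python sketch below.  This
              is a sketch under assumptions, not pep's built-in table (which
              may cover more characters); all names are hypothetical.

```python
# Hypothetical sketch of "-d +" and "-d -" for the six national letters.
# Norwegian ISO 646 reuses the ASCII codes of [ \ ] { | } for these.
ISO646_NO = bytes([0x5B, 0x5C, 0x5D, 0x7B, 0x7C, 0x7D])
LATIN1 = bytes([0xC6, 0xD8, 0xC5, 0xE6, 0xF8, 0xE5])  # AE, O/, AA, ae, o/, aa

TO_8859 = bytes.maketrans(ISO646_NO, LATIN1)    # "-d +"
FROM_8859 = bytes.maketrans(LATIN1, ISO646_NO)  # "-d -"

def fold_dec(data, plus):
    """Convert to ISO 8859/1 if `plus` is true, else back to ISO 646."""
    return data.translate(TO_8859 if plus else FROM_8859)
```

              Note that a table of this kind cannot tell a genuine "[" from
              an "AE" in a 7-bit file; every code 0x5B is converted.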

       -e [ 0 | 1 | 2 ]
              Interpret ANSI screen control sequences (also known as ANSI
              escape sequences).  This function makes pep emulate cursor
              positioning and other functions of an ANSI terminal.

              Pep will complain about "strange" (i.e. implementation-dependent)
              use of ANSI escape sequences.

              Pep will normally save a screen image to the output file when
              one of two events occurs: 1) when the screen is full and scrolls
              up; or 2) just before a screen image is erased with the "erase
              screen" ANSI screen control sequence.  In some cases important
              fields on the screen will be overwritten or erased.  There is no
              good solution to this problem, but pep gives the user some means
              to guard against overwriting and erasure: an additional numeric
              argument to the -e option.  This number indicates the level of
              protection, and is interpreted as follows:

                     0: no protection -- fields may be erased  and  overwritten
                        (this is the default);

                     1: sequences that erase fields are ignored;

                     2: sequences  that erase or overwrite fields are ignored.

       -g [ file ]
              Read the conversion table from a file.  The name of the file is
              given as the argument to this option.

              The file itself is a standard ASCII text file where each line
              should contain two decimal numbers.  The first number is the
              character code to convert from, and the second number is the
              character code to convert to.  A "#" character and all the
              following characters up to a NEWLINE are considered a comment,
              and are ignored.  Comments are, however, echoed on the screen
              along with the other messages pep prints, unless the comment line
              starts with "##".

              Below is an example of how such a conversion file may look:

                     # Convert from Macintosh to IBM-PC
                     ##This line is not echoed on the screen.
                     # MAC IBM
                     174 146
                     175 157
                     129 143
                     190 145
                     191 155
                     140 134
                     # EOF

              If the name of the file is omitted, pep will write out a list of
              the directories it searches for these files.
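
              A reader for this file format can be sketched in Python as
              follows.  The function names are hypothetical, and the exact form
              in which pep echoes comments is an assumption: the sketch simply
              collects the text of each comment that does not start with "##".

```python
def read_table(text):
    """Parse a -g conversion table; return (table, echoed_comments)."""
    table, echoed = {}, []
    for lineno, raw in enumerate(text.splitlines(), 1):
        code, sep, comment = raw.partition("#")
        if sep and not comment.startswith("#"):
            echoed.append(comment.strip())   # "##" comments stay silent
        fields = code.split()
        if not fields:
            continue                         # blank or comment-only line
        if len(fields) != 2:
            raise ValueError("line %d: expected two numbers" % lineno)
        src, dst = int(fields[0]), int(fields[1])
        if not (0 <= src <= 255 and 0 <= dst <= 255):
            raise ValueError("line %d: code out of range" % lineno)
        table[src] = dst
    return table, echoed

def apply_table(data, table):
    """Replace every byte listed in the table; copy the rest unchanged."""
    return bytes(table.get(b, b) for b in data)
```

              Fed the Macintosh-to-IBM example above, read_table returns six
              entries (e.g. 174 maps to 146) and echoes three comments.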

       -h     Write a brief summary of pep options, and exit.

       -i + | -
              Convert between the IBM 8-bit character set (Code Page 850
              Multilingual) and the Norwegian version of the ISO 646 7-bit
              character set.  If the argument is "+", the file is converted to
              CP 850.  If the argument is "-", the file is converted from CP
              850.  The CP 850 character set (or a subset of it) is what is
              used in the IBM PC, AT, and PS/2 series of computers and their
              clones.  Note that some machines with American PROMs have the yen
              and cent characters in the positions rightfully belonging to the
              upper and lower case versions of the Norwegian character written
              as an "o" with a slash across it (often referred to as oslash).

       -k + | -
              Convert between an 8-bit character set and the ISO 646 7-bit
              character set.  This is a modified version of the -i function,
              hacked to preserve both the backslash character and the upper
              case oslash character, as required by, among others, the
              "KnowledgeMan" package.  These characters share the same code (92
              decimal) in 7-bit ISO 646, but use different codes (92 is
              backslash, 157 is oslash) in 8-bit CP 850.  To get around this,
              two backslashes in ISO 646 are converted to the upper case oslash
              character in CP 850, while a single backslash is preserved, and
              vice versa.

              If this option is combined with the -d or -m option, the DEC/ISO
              or the Macintosh character set is used as the base instead of CP
              850.
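
              The backslash rule can be sketched in Python as follows.  The
              sketch handles only codes 92 and 157; the real -k option also
              folds the other national letters as -i does, which is omitted
              here.  The function names are hypothetical.

```python
BACKSLASH = 92      # "\" in ASCII; upper case oslash in Norwegian ISO 646
OSLASH_CP850 = 157  # upper case oslash in CP 850

def k_to_8bit(data):
    """Sketch of "-k +": a doubled backslash becomes oslash (157)."""
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == BACKSLASH and data[i + 1:i + 2] == bytes([BACKSLASH]):
            out.append(OSLASH_CP850)   # "\\" -> oslash
            i += 2
        else:
            out.append(data[i])        # single "\" and all else kept
            i += 1
    return bytes(out)

def k_from_8bit(data):
    """Sketch of "-k -": oslash (157) becomes a doubled backslash."""
    out = bytearray()
    for b in data:
        if b == OSLASH_CP850:
            out += bytes([BACKSLASH, BACKSLASH])
        else:
            out.append(b)
    return bytes(out)
```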

       -l [ [ n ] size ]
              Split  long lines into lines of maximum length given by the size
              argument.

              This option will also make sure that there will be at least  one
              blank  line between each paragraph, unless the optional argument
              n is specified.

              If size is not specified, a default value of 72 characters is
              used.
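
              A simple way to split a long line is sketched below in Python.
              The sketch assumes that a break is made at the last space that
              still fits, falling back to a hard split when there is none;
              pep's actual break rule may differ, the blank-line handling
              between paragraphs is omitted, and the function name is
              hypothetical.

```python
def split_long(line, size=72):
    """Split `line` into pieces of at most `size` characters."""
    pieces = []
    while len(line) > size:
        cut = line.rfind(" ", 0, size + 1)   # last space that still fits
        if cut > 0:
            pieces.append(line[:cut])
            line = line[cut + 1:]            # drop the space at the break
        else:
            pieces.append(line[:size])       # no usable space: hard split
            line = line[size:]
    pieces.append(line)
    return pieces
```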

       -m + | -
              Convert  to  or from the Apple Macintosh 8 bit character set and
              the Norwegian version of the ISO 646 7 bit  character  set.   If
              the  argument  is  "+",  the  file is converted to the Macintosh
              character set; if the argument is "-",  the  file  is  converted
              from the Macintosh character set.  See the description of the -v
              option below, and the note in the BUGS section about the
              treatment of "end-of-line" and "end-of-paragraph".

       -o [ b ]
              Pep will usually write the result of conversions to the standard
              output (stdout).  This option instead instructs pep to replace
              each named input file with a file containing the result of
              filtering the file through pep.  If the option is augmented with
              the argument b (i.e. -ob), pep will save a backup copy of the
              original input file in a file with the extension .BAK.  If you
              just specify -o, the original file is deleted.

              The  VMS  version  of  pep will always run as if this option was
              specified.  This is because VMS does not support useful redirec-
              tion  or pipes.  Therefore, it is never necessary to specify the
              -o option under VMS, but users should still specify -ob if  they
              want a backup copy of the original input file.
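
              Replacing a file in place, with an optional backup, can be
              sketched in Python as follows.  This is an illustration only:
              the temporary file name, the way the .BAK name is formed
              (replacing the existing extension), and the function name are
              assumptions.

```python
import os

def filter_in_place(path, filter_func, backup=False):
    """Replace `path` with filter_func(contents); optionally keep a .BAK."""
    with open(path, "rb") as f:
        result = filter_func(f.read())
    tmp = path + ".tmp"                      # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(result)
    if backup:                               # -ob: keep the original
        bak = os.path.splitext(path)[0] + ".BAK"
        os.replace(path, bak)
    os.replace(tmp, path)                    # the result replaces the input
```

              Writing the result to a temporary file first means the original
              is left untouched if filtering fails halfway.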

       -p     Write out a brief description of the conversion functions that
              will be activated by the current set of options, and pause.  The
              user may review the list of conversion functions and abort (by
              pressing CTRL-C) if they do not have the intended effect.

       -s [ size ]
              Find strings in extremely "noisy" files.

              Pep's concept of a string is a sequence of "printable" characters
              of at least a certain length.  The default minimum length of this
              sequence is 4, but the user may change this by supplying an
              optional numeric argument that becomes the minimum length of the
              sequence.

              The default definition of a "printable" character  is  a  symbol
              with  encoding  above  31  decimal (i.e. 32 to 255) plus certain
              common control characters (TAB, CR and LF).  This definition  is
              almost  always too liberal, and will include a lot of "noise" in
              the output.  One or more of the options -b, -d,  -i,  -m  or  -z
              should  be  specified  in  addition to -s in order to narrow the
              definition and the search  space.   In  my  experience,  the  -b
              option is a particularly useful additional filter when searching
              for strings.
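
              The string search can be sketched in Python as follows.  The
              function and parameter names are hypothetical, and narrowing the
              search with -b is modelled here, as an assumption, by treating
              codes 128 to 255 as non-printable.

```python
def find_strings(data, min_len=4, seven_bit=False):
    """Return runs of at least `min_len` printable bytes from `data`."""
    top = 128 if seven_bit else 256                # -b narrows to 7-bit codes
    printable = set(range(32, top)) | {9, 10, 13}  # plus TAB, LF and CR
    found, run = [], bytearray()
    for b in data + b"\x00":                       # sentinel flushes last run
        if b in printable:
            run.append(b)
        else:
            if len(run) >= min_len:
                found.append(bytes(run))
            run.clear()
    return found
```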

       -t [ size ]
              Expand tabulation, replacing the TAB character with  a  suitable
              number  of  spaces.   The  default tabulation size is 8, but the
              optional numeric argument size may be used to set tabulation  to
              any desired size.
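
              The column arithmetic behind tab expansion is sketched below in
              Python (the function name is hypothetical).  For a single line it
              gives the same result as Python's built-in str.expandtabs().

```python
def expand_tabs(line, size=8):
    """Sketch of -t: replace each TAB by spaces up to the next tab stop."""
    out, col = [], 0
    for ch in line:
        if ch == "\t":
            n = size - col % size      # spaces needed to reach the stop
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```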

       -u r | n | s | - | # | number
              Pep's default behaviour is to terminate lines with whatever is
              the canonical line terminator (the standard way to terminate a
              text line) on the assumed target system for the output file.
              This means CR/LF on a microcomputer system, LF on a UNIX system,
              and CR if the target is a Macintosh.  The assumed target system
              is usually the system pep is running on, unless you request
              folding to the character set of another computer system.  Then,
              that computer system becomes the assumed target.

              The -u option allows you to override this assumption.  You do
              this by explicitly specifying (in decimal) the numeric ASCII
              value of the end-of-line character you want in your output file.
              For example, to make sure lines are terminated by LF (the
              standard for UNIX text files), you may use -u10, because 10 is
              the ASCII value of the newline (LF) control character.  Instead
              of a numeric argument, you may specify r for carriage return
              (CR), n for newline (LF), s for record separator (RS), the symbol
              - for no line terminator, or the symbol # to get a carriage
              return followed by a newline (CR/LF).
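
              The mapping from a -u argument to the bytes written at the end
              of each line can be sketched in Python as follows (the function
              name is hypothetical).

```python
def terminator(arg):
    """Hypothetical sketch: map a -u argument to line terminator bytes."""
    named = {
        "r": b"\r",     # carriage return (CR, 13)
        "n": b"\n",     # newline (LF, 10)
        "s": b"\x1e",   # record separator (RS, 30)
        "-": b"",       # no line terminator
        "#": b"\r\n",   # CR followed by LF
    }
    if arg in named:
        return named[arg]
    code = int(arg)     # decimal ASCII value, e.g. "10" for LF
    if not 0 <= code <= 255:
        raise ValueError("terminator code out of range: %s" % arg)
    return bytes([code])
```

              For example, -u10 and -un both yield LF.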

       -v     Normally,  pep  will terminate each line with the canonical line
              terminator.  Some typesetting programs and word processors, how-
              ever,  require  that no hard line terminator is present within a
              paragraph, and that only paragraphs are hard terminated.  If you
              want to export a file to such a typesetting program or word pro-
              cessor, you may instruct pep to terminate paragraphs  only  with
              this option.

              See the note in the BUGS section below about the treatment of
              "end-of-line" and "end-of-paragraph".

       -w + | -
              This slightly obsolete option converts files  to  and  from  the
              WordStar  version  3.2 "document" mode.  If the argument is "+",
              the file is converted to WordStar document mode; if the argument
              is  "-",  the file is converted from WordStar document mode into
              plain ASCII text.

       -x     Expand unprintable characters.  This option will make pep expand
              the characters it would otherwise remove from the file by print-
              ing the character encoding of these  characters  in  hexadecimal
              between angle brackets.

       -z     Zero the eighth bit (a.k.a. the parity bit) of every character in
              the file.
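
              Zeroing the eighth bit amounts to masking every byte with 0x7F,
              as in this Python sketch (the function name is hypothetical).
              Note that the operation is lossy: for example, the ISO 8859/1
              code 0xE5 (aa) becomes 0x65, a plain "e".

```python
def zero_eighth_bit(data):
    """Sketch of -z: clear bit 7 (the parity bit) of every byte."""
    return bytes(b & 0x7F for b in data)
```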

ENVIRONMENT
       Pep knows a single environment variable, PEP, which may be used to
       indicate a directory to search for files with conversion tables.  Below
       are some examples of how to set it on different operating systems:
              set PEP=C:\MISC\LIB       (MS-DOS)
              setenv PEP /home/george/lib      (UNIX)
              define PEP "DISK_USR:<GEORGE.LIB>"    (VMS)

       The command to set this environment variable should usually be part of
       the command file that is read at startup or login (this may be named
       AUTOEXEC.BAT, LOGIN.COM, .profile or .login, depending upon your choice
       of operating system).

DIAGNOSTICS
       If  you  specify  an  option that pep does not recognize, then pep will
       write a summary of usage and abort.  Other errors on the  command  line
       will result in pep writing an error message before aborting.

       On  operating  systems that support exit codes, pep will return an exit
       code upon termination.

       If pep is interpreting ANSI escape sequences and notices syntactic or
       semantic errors in the way they are used, it prints a warning on the
       screen, prefixed with the string "ansi:".  This makes it possible to
       use pep to check whether programs use ANSI sequences in a portable way.

FILES
       The directory /ifi/bifrost/a03/gisle/lib/pep should contain  a  set  of
       standard filters for use with the -g option.

AUTHOR
       Copyright (c) 1987-1995 Gisle Hannemyr.

       This  program  is free software;  you can redistribute it and/or modify
       it under the terms of the GNU General Public License, as  published  by
       the Free Software Foundation. See the file "copying.txt" for details.

       Bug reports, comments and suggestions to:
         gisle@hannemyr.no

ACKNOWLEDGMENTS
       Thanks to Robert Andersson, for the SYS-V rename function; and to Knut
       Borge, Bjorn Larsen, Knut Omang and Geir-Harald Strand, for elucidation
       of the unspeakable horrors of VMS.

       Several  people  have  contributed  character  tables, ideas and/or bug
       reports.  In addition to those mentioned  above:  Inge  Arnesen,  Nils-
       Eivind  Naas,  Ola  Garstad,  Ottar  Grimstad, Tor Sjowall, Jens-Henrik
       Sorensen and Bjorn Asle Valde, should be mentioned.   My  apologies  if
       anyone is forgotten.

SEE ALSO
       dd(1),  convert(VMS),  expand(1),  od(1V),  sed(1),  strings(1), tr(1),
       unexpand(1).

       Those marked VMS are standard VMS utilities.  The others  are  standard
       UNIX utilities.

BUGS
       There is a very strong Norwegian bias in pep.  In particular, there
       exist several national versions of the ISO 646 7-bit character set,
       but all built-in functions to convert between this and various 8-bit
       character sets (i.e. -d, -i, -k and -m) bluntly assume the standard
       Norwegian version of ISO 646.  For pep to work with other national
       7-bit character sets, the compiled-in conversion tables (type
       FOLDMATRIX, for those who read the source code) need to be extended.

       The VMS version of pep runs with the -o option permanently enabled.
       This is because VMS does not support a useful i/o redirection or pipe
       mechanism.

       The VMS Record Management Services (RMS) know of several record
       formats.  You can see what record format a file has by using the VMS
       DCL command DIRECTORY/FULL and examining the field "Record format".  On
       VMS systems, pep will always generate output files with the record
       format set to "Stream_LF", but some programs may require the output
       file to be in another format.  To fix this, it might be necessary to
       run the output of pep through the VMS CONVERT utility.  Please see the
       DEC VMS manuals for details.

       The Macintosh "text only" format uses the carriage return (CR)
       character (ASCII 13) as terminator.  Most text processors (e.g.
       MacWrite) seem capable of handling two conventions: one is to use CR to
       terminate each line (and two or more consecutive CRs between
       paragraphs); the other is to use CR between paragraphs only.  Pep is
       also capable of handling both conventions.  The default behaviour is to
       terminate each line, but the -v option may be used to terminate
       paragraphs only.  Please note that pep uses a rather simplistic
       heuristic to identify the end of a paragraph: it bluntly assumes that
       paragraphs are separated by blank lines.
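
       The blank-line heuristic used by -v can be sketched in Python as
       follows.  The names are hypothetical, and the sketch makes assumptions
       pep may not share: lines within a paragraph are joined with a single
       space, leading and trailing whitespace is dropped, and CR is used as
       the terminator.

```python
def paragraphs_only(lines, terminator="\r"):
    """Join each blank-line separated paragraph into one terminated line."""
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:                 # a blank line ends a paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "".join(p + terminator for p in paragraphs)
```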

       If you use the -o option, then the original input file will be
       overwritten.  Until you are familiar with pep, you may find that it
       sometimes removes more material from a file than you expect.  It may be
       a good idea to always make a copy of the original file before you start
       experimenting with pep, or to add the "b" argument to the -o option
       (-ob).

       The built-in IBM-PC, DEC and Macintosh conversion tables convert to and
       from the Norwegian version of 7-bit "ASCII" characters.  You should use
       the -g option and "general" conversion tables for all other purposes.

       Pep  only  knows  the ANSI sequences implemented in the standard MS-DOS
       console driver ANSI.SYS.

       There cannot be a space character between an option  and  the  option's
       argument (e.g. you'll have to use "-gfoo.bar", not "-g foo.bar").

       Pep  will only filter "regular" files.  It will skip directories, sock-
       ets and "special" files.

       Links are the GOTOs of file systems.  If you run  a  hard  linked  file
       through  pep  using the -o option, the link will not be preserved.  Pep
       will just skip soft linked files.

       Pep searches for the conversion tables requested with the -g option  in
       the following order: first the current directory, then the directory of
       the file PEP.EXE (MS-DOS only), then the directory pointed  to  by  the
       PEP     environment     variable,    and    finally    the    directory
       /ifi/bifrost/a03/gisle/lib/pep.
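
       The search order above can be sketched in Python as follows.  The
       function names and the exe_dir parameter (standing for the directory
       of PEP.EXE, used on MS-DOS only) are hypothetical; the sketch also
       assumes that only regular files count as a match.

```python
import os

DEFAULT_DIR = "/ifi/bifrost/a03/gisle/lib/pep"   # from the FILES section

def search_dirs(exe_dir=None):
    """Directories searched for -g tables, in order."""
    dirs = [os.curdir]
    if exe_dir:                        # MS-DOS only: directory of PEP.EXE
        dirs.append(exe_dir)
    env = os.environ.get("PEP")
    if env:
        dirs.append(env)
    dirs.append(DEFAULT_DIR)
    return dirs

def find_table(name, exe_dir=None):
    """Return the path of the first matching regular file, or None."""
    for d in search_dirs(exe_dir):
        path = os.path.join(d, name)
        if os.path.isfile(path):
            return path
    return None
```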

       Pep knows nothing about the COFF-format and the -s option is  primitive
       compared to the UNIX command strings(1).



Version 2.8                     1995 August 11                          PEP(1)
