frogbak

This is a program for managing backups. It does two kinds of backups: full dumps and overlapping incrementals.

It stores its backups on tape.

It keeps a database of what's on which tape.

It supports a variety of underlying backup programs (tar, dump, etc).

Current status: abandoned
Last modification: 1996
Download: here



FROGBAK(1)                                                          FROGBAK(1)



NAME
       frogbak - schedule and execute backups for a small network

SYNOPSIS
       frogbak [-dryrun] [-summary] [-future #h|#d|#w|#m]
               [-control control_file]
       mkblank tapename
       recycle tapename [tapename...]
       sum_covered control_file [control_file...]
       send_offsite

DESCRIPTION
       These programs provide frogbak services for small networks.  The  algo-
       rithms  they  use were designed for the following environment: one tape
       drive, 20 GB of disk space, several kinds of computers, and a lazy pro-
       grammer  who was paranoid about dumps.  The closer an environment is to
       that, the better the system will work for you.

       The basic design is that the system chooses when to do what kind of
       dump.  It does two kinds of dumps: incrementals and fulls.  Unlike a
       differential, an incremental dump saves the files modified from the
       time of the last dump at the same dump level to the present, rather
       than the files modified from the time of the last dump at a lower
       level to the present.  These dumps are organized into three separate
       tracks.  The tracks are independent of one another, so an incremental
       in one track does not affect the coverage of an incremental in another
       track.  Two of the tracks have only incrementals and the last has only
       full dumps.  This setup means that each file gets saved onto two tapes
       by the incremental tracks.  This notion of tracks is a convenient way
       to think about the behavior of frogbak, but it is not how the behavior
       is implemented: it is implemented by making each incremental save the
       files modified between now and the incremental before the last one.
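
       As a rough illustration of that overlap, the Python sketch below
       computes the window each incremental would cover.  It is not taken
       from the frogbak source: the function name, the use of plain day
       numbers, and the assumption that the first two incrementals reach
       back to the full dump are all made up for the example.

```python
def incremental_windows(dump_days, full_day=0):
    """Return (from_day, to_day) for each incremental, oldest first.

    Each incremental reaches back to the incremental before the previous
    one.  The first two have nothing that old to reach back to, so this
    sketch assumes they start at the full dump (full_day).
    """
    windows = []
    for i, day in enumerate(dump_days):
        start = dump_days[i - 2] if i >= 2 else full_day
        windows.append((start, day))
    return windows


windows = incremental_windows([1, 2, 3, 4, 5])
for start, end in windows:
    print(f"covers day {start} to day {end}")
```

       In this example the windows (1, 3) and (3, 5) chain end to end, as do
       (0, 2) and (2, 4), which is where the two incremental "tracks" come
       from.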

       To  restore  a  filesystem, the last good full dump must be read first.
       If a full dump is bad, just ignore it and skip  to  the  previous  one.
       After  the full dump, every other incremental from the time of the full
       dump until the present must be read.  If an incremental is bad,  switch
       to the other track.
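
       The restore rule can be sketched as follows.  This is a simplified
       Python illustration under stated assumptions, not frogbak code: dump
       records are reduced to hypothetical (from_day, to_day) pairs, and
       only the simplest failure case (the newest incremental is bad) is
       handled.

```python
def restore_chain(full_day, windows, skip_latest=False):
    """Pick every other incremental, newest first, back to the full dump.

    windows is a list of (from_day, to_day) tuples, oldest first.
    skip_latest=True starts from the second-newest incremental, which is
    the "other track" to use when the newest one is bad.
    """
    chain = []
    i = len(windows) - (2 if skip_latest else 1)
    while i >= 0:
        chain.append(windows[i])
        if windows[i][0] <= full_day:
            break  # this incremental reaches back to the full dump
        i -= 2
    return list(reversed(chain))  # oldest first, the order to read them


# Hypothetical coverage windows for five daily incrementals after a full
# dump on day 0; each reaches back to the incremental before the last one.
windows = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5)]

primary = restore_chain(0, windows)
fallback = restore_chain(0, windows, skip_latest=True)
print(primary)
print(fallback)
```

       The fallback chain ends at day 4, so changes made after the
       second-newest incremental are lost when the newest one is unreadable.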

       It  is not expected that dumps will be bad.  Exabyte and DAT media seem
       to be very reliable in  comparison  to  the  9-track  tapes  used  last
       decade.   However,  frogbak  performs the dumps on live filesystems and
       thus sometimes the dump data will be bad even though the media is okay.

       The  choice  of  dumps  is  specified  with  a  control file.  For each
       filesystem, the control file specifies:

              The frequency of incremental dumps.  This places a limit on  how
              often a dump is performed.  Dumps will not occur more often than
              the specified rate.

              The importance of incremental dumps.  The importance is combined
              with the length of time since the last incremental dump,
              relative to the specified frequency of dumps, to produce a
              rating number.  The formula is something like rating =
              time since last dump / frequency * importance.  Thus if the
              frequency is two days, the importance is 50, and it has been ten
              days since an incremental dump, then the rating would be 10/2*50
              = 250.

              The frequency and importance of full dumps.
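
       The rating formula is described only as "something like" the
       expression above, so the following Python is a sketch of that
       approximation rather than the exact code in frogbak.  The function
       and parameter names are hypothetical, and times are assumed to be in
       hours.

```python
def rating(hours_since_last_dump, frequency_hours, importance):
    """rating = time since last dump / frequency * importance."""
    return hours_since_last_dump / frequency_hours * importance


# The worked example from the text: frequency of two days, importance
# of 50, and ten days since the last incremental dump.
example = rating(10 * 24, 2 * 24, 50)
print(example)
```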

       The ratings for both incremental and full dumps are compared on a
       filesystem-by-filesystem basis.  For each filesystem, either the full
       dump or the incremental will be discarded from consideration at this
       point.

       The ratings for all of the filesystems are then sorted  and  dumps  are
       performed in order, based on these priorities.
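
       A minimal Python sketch of this selection step follows.  Two things
       are assumptions, since the text does not spell them out: that the
       lower-rated kind of dump is the one discarded for each filesystem,
       and that the highest rating is dumped first.  The function name and
       the sample ratings are invented for the example.

```python
def plan_dumps(ratings):
    """ratings maps filesystem -> {"incr": rating, "full": rating}."""
    chosen = []
    for filesystem, by_kind in ratings.items():
        # Assumption: the kind with the lower rating is the one discarded.
        kind = max(by_kind, key=by_kind.get)
        chosen.append((by_kind[kind], filesystem, kind))
    # Assumption: the highest rating is dumped first.
    chosen.sort(reverse=True)
    return [(filesystem, kind) for _, filesystem, kind in chosen]


plan = plan_dumps({
    "/usr/local": {"incr": 250, "full": 40},
    "/home": {"incr": 100, "full": 300},
    "/": {"incr": 50, "full": 10},
})
for filesystem, kind in plan:
    print(kind, filesystem)
```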

CONTROL FILES
       There  are  four types of lines that can be in the control files.  They
       are: comments, variable  assignments,  filesystem  control  lines,  and
       average statements.

       Any line beginning with a hash (#) symbol is a comment.

       Any  line  beginning  with  a legal (C-style) identifier followed by an
       equals (=) is a variable assignment.  Variable references may  be  made
       in  variable  assignments and in the dump valuation columns.  Variables
       are recognized because the only other symbols that may occur  in  those
       locations are numbers and math operators like times (*) and plus (+).

       Filesystem control lines are  made  up  of  seven  whitespace-separated
       fields:

       filesystem          Names the filesystem to be dumped.

       host                Names the system that the filesystem is on.

       os                  Names  the  dump  program  to  be  used to dump the
                           filesystem.  The current legal values  for  the  os
                           field  are:  solaris, sunos, freebsd, netbsd, hpux,
                           hp-ux, mach, domain,  ultrix,  sony,  linux,  gtar,
                           dostar, targtar, and xenix.  Dumps can be done with
                           the GNU tar program.  On Linux it is the default
                           and is called tar; on other systems it is usually
                           called gtar.  DOS filesystems can be dumped with gtar if
                           they  are  mounted  on  a unix system.  The current
                           dostar setup assumes that the tar program is really
                           GNU tar.

       ifreq               Specifies the frequency of incremental dumps.  The
                           format is <number><unit>, where <number> is a
                           decimal number and <unit> is one of h, d, w, or m,
                           corresponding to hours, days, weeks, and months.

       ivalue              Specifies the relative value of doing an  incremen-
                           tal  dump on this filesystem after a duration equal
                           to the ifreq.

       ffreq               Specifies the frequency of doing a full dump.  Same
                           format as ifreq.

       fvalue              Specifies  the  value  of doing a full dump of this
                           filesystem after a duration equal to the ffreq.
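
       To make the field layout concrete, here is a hedged Python sketch
       that splits a filesystem control line into its seven fields and
       converts the two frequencies to hours.  The sample line, the helper
       names, and the 30-day month are assumptions for illustration; the
       value fields are left as text because they may contain variables.

```python
import re

# Assumption: a month is treated as 30 days.
HOURS_PER_UNIT = {"h": 1, "d": 24, "w": 24 * 7, "m": 24 * 30}

FIELDS = ("filesystem", "host", "os", "ifreq", "ivalue", "ffreq", "fvalue")


def parse_frequency(text):
    """Convert a string such as '2d' or '1w' into a number of hours."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([hdwm])", text)
    if match is None:
        raise ValueError(f"bad frequency: {text!r}")
    return float(match.group(1)) * HOURS_PER_UNIT[match.group(2)]


def parse_control_line(line):
    """Split one filesystem control line into a dict of its seven fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    entry = dict(zip(FIELDS, parts))
    entry["ifreq"] = parse_frequency(entry["ifreq"])
    entry["ffreq"] = parse_frequency(entry["ffreq"])
    return entry


# Hypothetical control line; ivalue and fvalue refer to variables x and fd.
entry = parse_control_line("/usr/local troy sunos 1d x 2w x/fd")
print(entry)
```

       Because the fields are whitespace-separated, this sketch would not
       accept a value expression written with spaces in it.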

       I recommend setting all of the frequencies to one day.  That way you
       can tell if everything is getting dumped or not.  I further recommend
       setting a variable to be the relative importance of doing full dumps.
       Then when the ivalue is set to x, and the fvalue is set to x/fd, the
       number of full dumps per incremental can be varied by changing the
       value of fd.  This allows the dump system to be tuned easily.

       The last sort of line is an average statement.  The syntax of an
       average statement is average system-name filesystem-1 filesystem-2
       etc...  Normally, each filesystem for a given system is considered
       independently.  This means that they may not be near each other on the
       tape and, further, they may not all make it onto the same tape if the
       tape runs out.  It is easier to restore if everything you need is on
       the same tape and still easier if it is grouped together.  The average
       statement causes the averaged dumps to be placed sequentially on the
       tape.  Their ratings are averaged.
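
       The effect of an average statement on ratings might look like the
       Python sketch below.  It assumes a plain arithmetic mean, which the
       text does not confirm, and the function name and sample ratings are
       hypothetical.

```python
def average_group(ratings, group):
    """Give every filesystem in group the mean of the group's ratings."""
    mean = sum(ratings[fs] for fs in group) / len(group)
    averaged = dict(ratings)
    for fs in group:
        averaged[fs] = mean
    return averaged


ratings = {"/": 50, "/usr": 250, "/home": 90}
averaged = average_group(ratings, ["/", "/usr"])
print(averaged)
```

       With equal ratings, the grouped filesystems sort next to each other,
       which fits the goal of placing them sequentially on the tape.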

RECORDS
       Every time a dump is performed, a record of the dump is stored in a
       file that lists dumps done for that filesystem.  The records for full
       dumps and incremental dumps are stored separately.  Full dump record
       files are named by dropping the leading slash from the filesystem name,
       transforming the remaining slashes (/) to dots (.), and appending
       .full.  Thus /usr/local becomes usr.local.full.  As a special case,
       the root filesystem becomes simply .full.  Make sure you use the -a
       option to ls(1) when listing the directories.

       Incremental dumps are similarly named, but with .incr instead of .full.
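
       The naming rule can be sketched in a few lines of Python.  The
       function name is hypothetical; the behavior is inferred from the
       examples given in this section (/usr/local, the root filesystem, and
       /y).

```python
def record_file_name(filesystem, kind):
    """Map a filesystem name to its record file name.

    kind is "full" or "incr".  The leading slash is dropped and the
    remaining slashes become dots, so "/" maps to just ".full".
    """
    if kind not in ("full", "incr"):
        raise ValueError(f"unknown dump kind: {kind!r}")
    return filesystem.strip("/").replace("/", ".") + "." + kind


for name in ("/usr/local", "/", "/y"):
    print(name, "->", record_file_name(name, "full"))
```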

       All of the records for a given system are stored in
       /master_path/records/hostname.  The master_path is the path to the top
       of the frogbak system's directory.  At Berkeley Research and Trading,
       that is /y/adm/dump.  Thus to find out when the last incremental of /y
       was performed, look in /y/adm/dump/records/troy/y.incr.

       The filesystem dump record files have the following format:

              FROM-DATE TO-DATE TAPE-NAME FILE-NUMBER # COMMENT

       The FROM-DATE is the beginning of the time covered by that particular
       dump.  On full dumps, the FROM-DATE is simply 0.  The TAPE-NAME is the
       symbolic name of the tape that the dump is on.  It is the name that
       was given as an argument to mkblank, and is, hopefully, written on the
       side of the tape.  The FILE-NUMBER field specifies how many files must
       be skipped over on that tape to get to that dump.  Thus if FILE-NUMBER
       is 17 and you wanted to restore that dump, you would need to use
       mt -f device fsf 17 to get to that dump.
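
       As an illustration, a record line in this format could be parsed and
       turned into the matching mt command as in the Python sketch below.
       The date representation is not documented here, so the sketch
       assumes each date is a single token without whitespace and keeps it
       as text; the sample line, class name, and function names are made up.

```python
from typing import NamedTuple


class DumpRecord(NamedTuple):
    from_date: str
    to_date: str
    tape_name: str
    file_number: int
    comment: str


def parse_record(line):
    """Parse 'FROM-DATE TO-DATE TAPE-NAME FILE-NUMBER # COMMENT'."""
    fields, _, comment = line.partition("#")
    from_date, to_date, tape_name, file_number = fields.split()
    return DumpRecord(from_date, to_date, tape_name,
                      int(file_number), comment.strip())


def seek_command(record, device):
    """Build the mt command that skips forward to the recorded dump."""
    return f"mt -f {device} fsf {record.file_number}"


# Hypothetical record of a full dump (FROM-DATE is 0 for full dumps).
record = parse_record("0 736387320 SEQ-037 17 # full dump of /usr/local")
print(seek_command(record, "/dev/nrst0"))
```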

       Although logically the incrementals can be divided into two tracks,
       they are not stored that way in the records database.  In fact, the
       logical division is just an artifact of the fact that incremental dumps
       cover backwards to the incremental dump prior to the previous one.

       To find out when something has been backed up, both the .full and .incr
       records  files must be examined.  They give the times and coverages for
       the filesystems.  To find out if a particular file was backed  up,  the
       dump tape must be read.  No index of files saved is kept.

TAPES
       The information about each dump performed is also stored grouped by
       what tape it is on.  In the directory /master_path/tapes, information
       about each dump tape is stored.  This information includes tape write
       speed performance figures and other tidbits.

       This  information  substantially  duplicates  the  information  in  the
       records directory.

RESTORES
       Each  different kind of system uses a different dump program and thus a
       different restore program.  The basic idea is that on the  system  that
       was  dumped,  give  a  command that pipes the dump output from the tape
       into the restore program.

       It is usually easiest to forward the tape to the correct file before
       logging onto the system to be restored.  The number of files to forward
       over is listed as the fourth field in the system dump records
       database.  On hp-ux systems, the command is
       mt -t /dev/rmt/0mn fsf number_of_files_to_skip.  On BSD-based systems,
       the command is usually mt -f /dev/nrst0 fsf number_of_files_to_skip.

       The  blocksize used to write the tapes is specified in the beginning of
       the frogbak program file.  The value that I use is 112 blocks, or  56k.
       This  size  is  not arbitrary.  On Suns, sizes above 127 blocks are not
       reliable.  Exabytes physically write data in 8k chunks.   Larger  block
       sizes  have  less system overhead and are generally faster.  56k is the
       largest multiple of 8k smaller than 128 blocks.
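
       The arithmetic behind that choice can be checked with a short Python
       sketch.  The 512-byte block follows from 112 blocks being 56k; the
       variable names are only for illustration.

```python
BLOCK_BYTES = 512        # one tape block (112 blocks = 56k)
CHUNK_BYTES = 8 * 1024   # Exabytes physically write data in 8k chunks
LIMIT_BLOCKS = 128       # the block size must stay below this

limit_bytes = LIMIT_BLOCKS * BLOCK_BYTES
# Largest multiple of 8k that is strictly smaller than the limit.
best_bytes = (limit_bytes - 1) // CHUNK_BYTES * CHUNK_BYTES
best_blocks = best_bytes // BLOCK_BYTES
print(best_blocks, "blocks =", best_bytes // 1024, "k")
```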

       Dumps can be written in several different formats depending on the type
       of system being dumped.  In general the dump(8) command is used, but on
       Apollos the wbak(1) command is used, and on Xenix cpio(1) is used.  The
       command needed to restore depends on what was used.  On some servers,
       compression is possible, in which case the dump must be uncompressed
       to restore.

       At Berkeley Research and Trading the command needed to restore most
       systems is: remsh server -n dd if=/dev/rmt/0mn ibs=112b | /etc/restore
       -ivf -.

       Each of the different programs used to do the dumps handles restores in
       a different way.  With wbak(1) and cpio(1), the  set  of  files  to  be
       restored  must  be specified on the command line.  With restore(8), the
       set of files to be restored can be chosen interactively (-i flag).

       Obviously, you must load the right tape before trying to restore from
       it.  Hopefully, each tape will have a paper label that identifies it.
       If it doesn't, or if the label is incorrect, you can identify a dump
       tape by copying off the first file.  The first file on each dump tape
       specifies the tape name and lists which dumps are going to be
       attempted.  If you lose your dump tape database, you may need to use
       this method to restore it.

UTILITIES
       There are several utilities that are part of the frogbak package.  They
       are sum_covered, which adds up how much disk space is backed up by a
       control file; recycle, which marks the tape as erased; send_offsite,
       which figures out which tapes are not needed to do a full restore; and
       mkblank, which names a tape.

       The sum_covered command is useful for partitioning the clients among
       several servers because frogbak doesn't do it for you.  As arguments,
       you must provide the names of control files.

       The mkblank command must be run to initialize blank tapes.  Tapes  must
       be  initialized  before frogbak is run.  The argument to mkblank is the
       name for the tape.  Each tape should have a unique name.   I  recommend
       that the name be a short string followed by a three digit sequence num-
       ber.  In case it isn't obvious, the tape must be in the drive when  you
       run mkblank.

       Although it is possible to just keep buying new tapes, it is not
       necessary.  The recycle program lets frogbak know that the dumps on the
       recycled tape no longer exist and that it is okay to overwrite the
       tape.  The arguments to recycle are the list of tapes (by name) that
       should be marked as recycled.  Nothing is done to the actual tape when
       it is marked recycled; only the database is updated.

       It can be difficult to figure out which tapes are potentially required
       to do a restore.  The send_offsite program will figure out what tapes
       are not required to do a full restore of everything (assuming, of
       course, that all the tapes are good).  Using send_offsite, it is easy
       to pick which tapes can be sent away.  It also shows you how many tapes
       it has been since every system was covered by a full dump.  Only the
       few most recent un-needed tapes are shown.

DAILY TASKS
       It is possible to run frogbak from cron(1).  However, a  labeled  blank
       or  recycled  tape  must  be put in the drive prior to running frogbak.
       Tapes which are not either labeled blank or recycled will be  rejected.

       Blank tapes are made with the mkblank utility.  Recycled tapes are
       made with the recycle program.

       It is important that the output from frogbak be examined each day.   If
       all the dumps run at somewhat standard priorities, then you can tell if
       something has not been dumped recently because  its  priority  will  be
       off.   If  priorities  are  not  standardized,  every  failure  must be
       checked.

       There is no warning system built into frogbak.  You  have  to  be  very
       careful to watch what it does to make sure that nothing gets neglected.

EXAMPLES
       Initialize a new tape and dump to it:

              # mkblank SEQ-037
              # frogbak

       Recycle an old tape and dump to it:

              # recycle SEQ-016
              # frogbak

       Check to see how much disk space is being backed up:

              # sum_covered control.*

       Restore a single file from a dump(8) full dump:

              % rlogin system_to_be_restored -l root
              # rsh system_with_tape -n mt -t /dev/rmt/0mn fsf 8
              # rsh system_with_tape -n dd if=/dev/rmt/0mn | restore -ivf -
              Verify and Initialize tape.
              Dumped from: Sun May  2 20:02:00 1993
              Extract directories from tape
              Initialize symbol table.
              restore > ls
              2 *./        2 *../   16384  dev/  10240  etc/  18433  tmp/

              restore > cd tmp
              restore > ls
              18433  ./                18610  backup.ddout5679  18641  dump.remote
                  2 *../               18643  backup.list5679   18644  rou5688
              18434  5176              18608  bkup.log

              restore > add bkup.log
              Make node ./tmp
              restore > add dump.remote
              restore > extract
              Extract requested files
              extract file ./tmp/bkup.log
              extract file ./tmp/dump.remote
              Add links
              Set directory mode, owner, and times.
              set owner/mode for '.'? [yn] n
              restore > quit

OPTIONS
       Frogbak supports a few options:

       -dryrun               Specifies that dumps  should  not  be  performed.
                             Instead, frogbak looks at its control file and at
                             the records files and figures out what  dumps  it
                             would do.  All of its figuring is sent to
                             standard output for debugging purposes.

       -summary              Like the -dryrun option except that just the pro-
                             posed  set of dumps is printed.  Please note that
                             the summary you get is a summary  of  what  would
                             happen  if you ran frogbak right now.  If frogbak
                             is invoked from cron(8), then it is  likely  that
                             the  actions that are reported now will not match
                             the actions that will actually occur.

       -future amount_of_time
                             Specifies that frogbak should pretend that the
                             time is really sometime in the future.  This is
                             for use with the -summary option.  The
                             amount_of_time string is in the same format as
                             the dump periods in the control file: a number
                             followed by the units: h, d, w, or m for hours,
                             days, weeks, or months.

       -control control_file Specifies that control.control_file should be
                             used instead of control.hostname.

CONFIGURATION
       The real options are the configuration variables.  Settings like
       compression must be specified by changing the frogbak program file
       itself (frogbak is written in perl(1)).

       $do_compress turns compression on and off.  Compression is very handy
       and I recommend using it when you can.  Using it requires a device
       driver that allows odd-sized blocks to be written to tape at the end
       of the dump.  Also, the compress(1) program that comes with most
       operating systems is annoyingly slow.  The latest versions of compress
       are much faster and should be used.

       The $eject option controls whether the tape is ejected after a
       successful dump.

       If you have installed a version of rsh(1) that allows you to specify a
       timeout, turn on $timeout_rsh.

ENVIRONMENT
       There are no ENVIRONMENT variables that are used by the frogbak system.

PORTS
       The  frogbak  system  can be thought of as having a server and clients.
       It is not really a client-server system,  but  since  tape  drives  are
       often  on  servers  and  clients are often what is being backed up, the
       analogy holds some water.

       The server currently works with SunOS 4.*, Mach  2.6,  and  HP-UX  8.*.
       The client side currently supports:

       sunos       Sun-3, and Sun-4 running SunOS 4.*.  The dump(8) program is
                   used.

       mach        Mach 2.6 running on i386 systems.  The dump(8)  program  is
                   used.

       hp-ux       HP-UX  8.*  on  HP9000/400, HP9000/700, and HP9000/800 sys-
                   tems.  The dump(8) program is used.

       ultrix      Ultrix 3.* and 4.*  running  on  MIPS-based  systems.   The
                   dump(8) program is used.

       sony        Sony's  BSD4.3  OS  running  on  their  NEWS  systems.  The
                   dump(8) program is used.

       xenix       SCO Xenix running on an i386.  The cpio(1) program is used.

       domain      Apollo  Domain  OS version 9.6 and above.  The wbak(1) pro-
                   gram is used.

PORTING
       The frogbak system is kinda a pain to move around.  Each of  the  files
       must be customized for each site.  Most, if not all, of the portability
       switches are in the first few lines of each file.  When  modifying  the
       frogbak file itself, search for uses of the various strings like
       SunOS and sunos.

       Please send any portability changes back for incorporation.

OFFSITE
       It is critically important that dumps be stored off-site.
       Unfortunately, frogbak does not provide any help in choosing which
       tapes should go off-site.  In fact, it makes it difficult because each
       tape is a grab-bag of what was highest priority at the time the tape
       was written.

BUGS
       This system is not very well  designed  or  implemented.   It  is  very
       cranky.  However it does work reliably.  The major bugs have to do with
       the design.

       The dump sequence, although pretty good,  is  not  optimal.   A  better
       sequence would be a replicated towers-of-hanoi.  The dump sequence does
       not start off smoothly: until every system has had both a full and an
       incremental dump, frogbak does things in a somewhat odd order.

       When using frogbak, nothing prevents systems from being overlooked.

       Using  the default rsh(1) program (remsh(1) on HP-UX), it is easy for a
       system to hang the dumps.  Rsh does not have a timeout on input and  if
       the  remote  system being dumped crashes, frogbak will hang.  The solu-
       tion for this is to replace rsh(1) with  a  special  version  that  has
       timeouts.

       The  frogbak  system  is only as good as the dump program that is used.
       The BSD dump(8) program can write bogus  dumps  when  used  on  a  live
       filesystem.  This usually is not a problem because everything is dumped
       so many times.

       The /etc/dumpdates file is faked when using dump(8).  Sometimes the
       original /etc/dumpdates file is not restored and annoying email is sent
       by frogbak.

FILES
       /y/adm/dump         The top of the frogbak commands and records tree at
                           Berkeley Research and Trading.

       records/hostname    The directory of information about dumps of
                           hostname.

       tapes/              The directory of information about each dump  tape.

       recycled/           The  directory  of old information about tapes that
                           have been recycled.

       logs/               The directory of dump output logs.  This should be
                           cleaned occasionally because the logs can be fairly
                           large.

       dump.remote         A script that runs on the system to be dumped.  Its
                           standard output must be a dump and nothing else.

       dump.local          A shell script that copies dump.remote to the
                           system that is going to be dumped and then runs it.

       control.hostname    The control file for hostname.

       backup.log.NNNN     Dump log files for invocations of frogbak that did
                           not complete cleanly.

       /dev/rmt/0mn        Tape device on HP-UX.

       /dev/nrst0          Tape device on SunOS.

CREDITS
       Thanks are due to Bruce Markey for figuring out how  to  tune  frogbak.
       Thanks are due to Larry Hubble for allowing a generous copyright notice
       to be applied to frogbak.

AVAILABILITY
       The copyright on this system is a bit murky.  Some work was done on  it
       on  behalf of TRW Financial Systems and they did not give me permission
       to take the changes with  me.   I  would  be  most  surprised  if  they
       objected.

       Berkeley Research and Trading has disclaimed any rights to frogbak that
       they might have.

AUTHOR
       David Muir Sharnoff

SEE ALSO
       dump(8), restore(8), dd(1), rsh(1), mt(1).



Edition                      May 17, 1995                       FROGBAK(1)