NAME
    Wais - access to freeWAIS-sf libraries

SYNOPSIS
    `use Wais;'

DESCRIPTION
    The interface is divided in four major parts.

    SFgate 4.0
              For backward compatibility the functions used in SFgate up to
              version 4 are still present. Their use is deprecated and they
              are not documented here. These functions may no be supported
              in following versions of this module.

    Protocol  XS functions which provide a low-level access to the WAIS
              protocol. E.g. `generate_search_apdu()' constructs a request
              message.

    SFgate 5.0
              Perl functions that implement high-level access to WAIS
              servers. E.g. parallel searching is supported.

    dictionary
              A bunch of XS functions useful for inspecting local databases.

    We will start with the SFgate 5.0 functions.

USAGE
    The main high-level interface are the functions `Wais::Search' and
    `Wais::Retrieve'. Both return a reference to an object of the class
    `Wais::Result'.

  Wais::Search

    Arguments of `Wais::Search' are hash references, one for each database
    to search. The keys of the hashes should be:

    query     The query to submit.

    database  The database which should be searched.

    host      host is optional. It defaults to `'localhost''.

    port      port is optional. It defaults to `210'.

    tag       A tag by which individual results can be associated to a
              database/host/port triple. If omitted defaults to the database
              name.

    relevant  If present must be a reference to an array containing alternating
              document id's and types. Document id's must be of type
              `Wais:Docid'.

              Here is a complete example:

                   $result = Wais::Search({'query'    => 'pfeifer', 
                                           'database' => $db1, 
                                           'host'     => 'ls6',
                                           'relevant' => [$id, 'TEXT']},
                                          {'query'    => 'pfeifer', 
                                           'database' => $db2});

              If *host* is `'localhost'' and *database*`.src' exists, local
              search is performed instead of connecting a server.

              `Wais::Search' will open `$Wais::maxnumfd' connections in
              parallel at most.

  Wais::Retrieve

              `Wais::Retrieve' should be called with named parameters (i.e.
              a hash). Valid parameters are database, host, port, docid, and
              type.

                      $result = Wais::Retrieve('database' => $db,
                                               'docid'    => $id, 
                                               'host'     => 'ls6',
                                               'type'     => 'TEXT');

              Defaults are the same as for `Wais::Search'. In addition type
              defaults to `'TEXT''.

  `Wais:Result'

              The functions `Wais::Search' and `Wais::Retrieve' return
              references to objects blessed into `Wais:Result'. The
              following methods are available:

    diagnostics         Returns and array of diagnostic messages. Each element
                        (if any) is a reference to an array consisting of

              tag                      The tag of the corresponding search
                                       request or `'document'' if the
                                       request was a retrieve request.

              code                     The WAIS diagnostic code.

              message                  A textual diagnostic message.

    header              Returns and array of WAIS document headers. Each element
                        (if any) is a reference to an array consisting of

              tag                      The tag of the corresponding search
                                       request or `'document'' if the
                                       request was a retrieve request.

              score
              lines                    Length of the corresponding dcoument in
                                       lines.

              length                   Length of the corresponding document in
                                       bytes.

              headline
              types                    A reference to an array of types valid
                                       for docid.

              docid                    A reference to the WAIS identifier
                                       blessed into `Wais::Docid'.

    text                Returns the text fetched by `Wais::Retrieve'.

Dictionary
              There are a couple of functions to inspect local databases.
              See the inspect script in the distribution. You need the
              Curses module to run it. Also adapt the directory settings in
              the top part.

  Wais::dictionary

                     %frequency = Wais::dictionary($database);
                     %frequency = Wais::dictionary($database, $field);
                     %frequency = Wais::dictionary($database, 'foo*');
                     %frequency = Wais::dictionary($database,  $field, 'foo*');

              The function returns an array containing alternating the
              matching words in the global or field dictionary matching the
              prefix if given and the freqence of the preceding word. In a
              sclar context, the number of matching word is returned.

  Wais::list_offset

              The function takes the same arguments as Wais::dictionary. It
              returns the same array rsp. wordcount with the word
              frequencies replaced by the offset of the postinglist in the
              inverted file.

  Wais::postings

                     %postings = Wais::postings($database, 'foo');
                     %postings = Wais::postings($database, $field, 'foo');

              Returns and an array containing alternating numeric document
              id's and a reference to an array whichs first element is the
              internal weight if the word with respect to the document. The
              other elements are the word/character positions of the
              occurances of the word in the document. If freeWAIS-sf is
              compiled with `-DPROXIMITY', word positions are returned
              otherwise character postitions.

              In an scalar context the number of occurances of the word is
              returned.

  Wais::headline

                     $headline = Wais::headline($database, $docid);

              The function retrieves the headline (only the text!) of the
              document numbered `$docid'.

  Wais::document

                     $text = &Wais::document($database, $docid);

              The function retrieves the text of the document numbered
              `$docid'.

Protocol
  Wais::generate_search_apdu

                     $apdu = Wais::generate_search_apdu($query,$database);
                     $relevant = [$id1, 'TEXT', $id2, 'HTML'];
                     $apdu = Wais::generate_search_apdu($query,$database,$relevant);

              Document id's must be of type `WAIS::Docid' as returned by
              `Wais::Result::header' or Wais::Search::header. $WAIS::maxdoc
              may be set to modify the number of documents to retrieve.

  Wais::generate_retrieval_apdu

                     $apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
                     $apdu = Wais::generate_retrieval_apdu($database, $docid, 
                                                           $type, $chunk);

              Request to send the `$chunk''s chunk of the document whichs id
              is `$docid' (must be of type `WAIS::Docid'). $chunk defaults
              to `0'. $Wais::CHARS_PER_PAGE may be set to influence the
              chunk size.

  Wais::local_answer

                     $answer = Wais::local_answer($apdu);

              Answer the request by local search/retrieval. The message
              header is stripped from the result for convenience (see the
              code of `Wais::Search' rsp. documentaion of Wais::Search::new
              below).

  Wais::Search::new

                     $result = Wais::Search::new($message);

              Turn the result message in an object of type `Wais::Search'.
              The following methods are available: diagnostics, header, and
              text. Result of the message is pretty the same as for
              `Wais::Result'. Just the tags are missing.

  Wais::Docid::new

                     $result = new Wais::Docid($distserver, $distdb, $distid,
                                   $copyright,  $origserver, $origdb, $origid);

              Only the first four arguments are manatory.

  Wais::Docid::split

                     ($distserver, $distdb, $distid, $copyright, $origserver, 
                      $origdb, $origid) = Wais::Docid::split($result);
                     ($distserver, $distdb, $distid) = Wais::Docid::split($result);
                     ($distserver, $distdb, $distid) = $result->split;

              The inverse of `Wais::Docid::new' =over 10

    diagnostics
              Return an array of references to `[$code, $message]'

    header    Return an array of references to `[$score, $lines, $length,
              $headline, $types, $docid]'.

    text      Returns the chunk of the document requested. For documents larger
              than $Wais::CHARS_PER_PAGE more than one request must be send.

  Wais::Search::DESTROY

    The objects will be destroyed by Perl.

VARIABLES
    $Wais::version
              Generated by: `sprintf(buf, "Wais %3.1f%d", VERSION,
              PATCHLEVEL);'

    $Wais:errmsg
              Set to an verbose error message if something went wrong. Most
              functions return `undef' on failure after setting
              `$Wais:errmsg'.

    $Wais::maxdoc
              Maximum number of hits to return when searching. Defaults to
              `40'.

    $Wais::CHARS_PER_PAGE
              Maximum number of bytes to retrieve in a single retrieve
              request. `Wais:Retrieve' sends multiple requests if necessary
              to retrieve a document. `CHARS_PER_PAGE' defaults to `4096'.

    $Wais::timeout
              Number of seconds to wait for an answer from remote servers.
              Defaults to 120.

    $Wais::maxnumfd
              Maximum number of file descriptors to use simultaneously in
              `Wais::Search'. Defaults to `10'.

Access to the basic freeWAIS-sf reduction functions
    Wais::Type::stemmer(*word*)
    reduces *word* using the well know Porter algorithm.

      AU: Porter, M.F.
      TI: An Algorithm for Suffix Stripping
      JT: Program
      VO: 14
      PP: 130-137
      PY: 1980
      PM: JUL

    Wais::Type::soundex(*word*)
    computes the 4 byte Soundex code for *word*.

      AU: Gadd, T.N.
      TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
          Information Retrieval Systems
      JT: Program
      VO: 22
      NO: 3
      PP: 222-237
      PY: 1988

    Wais::Type::phonix(*word*)
    computes the 8 byte Phonix code for *word*.

      AU: Gadd, T.N.
      TI: PHONIX: The Algorithm
      JT: Program
      VO: 24
      NO: 4
      PP: 363-366
      PY: 1990
      PM: OCT

BUGS
    `Wais::Search' currently splits the request in groups of
    `$Wais::maxnumfd' requests. Since some requests of the group might be
    local and/or some might refer to the same host/port, groups may not use
    all `$Wais::maxnumfd' possible file descriptors. Therefore some
    performance my be lost when more than `$Wais::maxnumfd' requests are
    processed.

AUTHOR
    Ulrich Pfeifer <pfeifer@ls6.informatik.uni-dortmund.de>