Instructions for Setting up a Departmental Broker
With the Harvest system
you can set up a nicely customized departmental Broker. See the University
of Colorado Computer Science Department's Broker (including the
example queries there) for an example.
If you would like to set up a Broker like this, follow these steps:
-
Retrieve the Harvest software and follow the instructions
for setting up a basic Harvest Gatherer and Broker.
When you are asked to fill in the URLs for the Gatherer,
enter a blank line followed by a '.';
when you are later asked if you want to edit the Gatherer's
workload specification, enter yes. In
the text editor, build the list of URLs for the departmental information
services you would like to index. For an example, see the University of
Colorado Computer Science Department's
Gatherer configuration file.
The Local-Mappings and fancy RootNodes specifications
(see example RootNode filters) used in this Broker enhance efficiency
and control, but you can start with a simple list of a few
FTP/Gopher/HTTP/News RootNodes. If you want to learn about the fancier
features, check the Harvest
User's Manual.
At this point, you have installed a ``stock'' Gatherer and Broker, which
you can use to perform queries against a distributed collection of
documents in many different formats. You can customize the Broker
further with search scoping checkboxes that automatically ``AND''
user queries with particular attributes. The University of Colorado
Computer Science Department's Broker allows this. To add this level of
customization to your Broker, continue with the following steps.
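To make the scoping idea concrete, here is a minimal Python sketch of how a checked scope box could be ``AND''ed onto a user's query. It is not the code shipped in the customization package (that script is written in Perl); the SCOPES table, its checkbox names, and the scoped_query function are hypothetical, and only the query syntax follows the example query quoted later on this page.

```python
# Hypothetical mapping from checkbox names to attribute restrictions.
# These names and values are illustrative, not taken from the package.
SCOPES = {
    "faculty": "classification: faculty",
    "numerical": 'group: "numerical computation"',
}


def scoped_query(user_query, checked):
    """AND the user's query with the restriction for each checked box."""
    parts = ["(" + user_query.strip() + ")"]
    for name in checked:
        parts.append("(" + SCOPES[name] + ")")
    return " and ".join(parts)


if __name__ == "__main__":
    print(scoped_query("sparse matrices", ["faculty"]))
```

With no boxes checked, the function returns the user's query unchanged apart from the surrounding parentheses.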
-
After you have installed the ``stock'' Gatherer and Broker, retrieve and
unpack the
departmental Broker customization package.
-
Replace the ``stock'' files that were installed in Step 1 with those
found in the departmental Broker customization package, as follows:
-
In your broker directory, rename the stock query.html to
oldquery.html, and then copy the DeptQuery.html from the
customization package to query.html. This is the main query
page. Because query pages are normally generated automatically, you will need
to edit the new query.html so that it corresponds to your site.
In particular, you'll need to update the hostnames and port numbers
to correspond to the Broker you set up in Step 1 (search for the strings
``ACTION'' and ``NAME="host"''), as well as the document title and
headings. To figure out what should go in the ACTION and NAME strings,
search for these strings in oldquery.html and then update
query.html accordingly.
-
Copy the sample-queries.html file from the customization
package into the broker directory also. This is a list of
sample, hard-coded queries. You should update this list to make sense
for your department. You will need to edit both the displayed query
strings and the HREF values inside the HTML anchor tags.
-
Copy DeptQuery.pl.cgi from the customization package into your
$HARVEST_HOME/cgi-bin directory. This is simply an enhanced
version of the BrokerQuery.pl.cgi program: it adds another CGI
script variable ('scope') and modifies the query string based on that
variable's value.
-
Copy DeptQuery.cf from the customization package into your
$HARVEST_HOME/cgi-bin/lib directory. Make sure this file is
readable by your httpd server. You may name this file something other
than DeptQuery.cf, but if you do, you must also change the name where it
appears in your copy of DeptQuery.html (installed as query.html).
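When editing query.html in the first item above, it can help to pull the existing settings out of oldquery.html rather than searching by hand. The following Python sketch lists every form ACTION value and the VALUE of every input named "host". It assumes the host is stored in an INPUT tag with a VALUE attribute; check your own oldquery.html, since the generated markup may differ. The sample markup and the example.edu hostname are invented for illustration.

```python
from html.parser import HTMLParser


class BrokerSettings(HTMLParser):
    """Collect form ACTION URLs and the VALUE of inputs named "host"."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "form" and attrs.get("action"):
            self.actions.append(attrs["action"])
        elif tag == "input" and attrs.get("name") == "host":
            self.hosts.append(attrs.get("value"))


def broker_settings(html_text):
    """Return (actions, hosts) found in the given HTML text."""
    parser = BrokerSettings()
    parser.feed(html_text)
    return parser.actions, parser.hosts


if __name__ == "__main__":
    # Invented sample; in practice, read the text of oldquery.html instead.
    sample = (
        '<FORM METHOD="POST" '
        'ACTION="http://broker.example.edu/cgi-bin/BrokerQuery.pl.cgi">'
        '<INPUT TYPE="hidden" NAME="host" VALUE="broker.example.edu">'
        "</FORM>"
    )
    print(broker_settings(sample))
```

The values it reports are the ones to carry over into the ACTION and NAME="host" entries of the new query.html.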
-
At this point, you can use your Broker. However, if you want to make it
really fancy, you can customize the Gatherer to apply site-specific
heuristics, so that it generates manual indexing annotations based on local
naming conventions. For example, this is what allows users to perform
queries such as
(group: "numerical computation") and (classification: faculty)
at the University of Colorado Computer Science Department's Broker.
To do this, we wrote two scripts (AddAnnotations and
staff-SOIF; they are included in the departmental Broker
customization package) to generate manual annotations of documents based on
the University of Colorado CS department's file naming conventions.
AddAnnotations is a driver script, and calls staff-SOIF to
apply our local conventions to generate extra annotations. We have
included these scripts as an example of how you can customize your
departmental Broker. Obviously you'll need to modify the scripts to
handle your site's conventions if you want to use them. For details about
the SOIF manipulation programs used by these scripts, see the Harvest User's
Manual.
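As a rough illustration of what a site-specific heuristic looks like, the Python sketch below derives annotation attributes from a document's URL path. It is not a port of AddAnnotations or staff-SOIF: the RULES table, the path prefixes, the example.edu URLs, and the annotate function are all hypothetical, and the sketch returns a plain dictionary instead of writing SOIF records.

```python
from urllib.parse import urlparse

# Hypothetical site conventions: URL path prefix -> annotation attributes.
RULES = [
    ("/faculty/", {"classification": "faculty"}),
    ("/students/", {"classification": "student"}),
]


def annotate(url):
    """Return the extra attributes implied by the URL's path, if any."""
    path = urlparse(url).path
    annotations = {}
    for prefix, attrs in RULES:
        if path.startswith(prefix):
            annotations.update(attrs)
    return annotations


if __name__ == "__main__":
    print(annotate("http://www.cs.example.edu/faculty/smith/home.html"))
```

A real script would then attach these attributes to the Gatherer's records with the SOIF manipulation programs described in the Harvest User's Manual, so that queries such as (classification: faculty) can match them.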