GrADS-DODS Server - Administrator's Guide

Installation

Requirements

The GDS can run on any UNIX platform for which both Java and GrADS are available.

You will need a Java Virtual Machine (JVM) that supports Java 1.3 or higher. Enter java -version at the Unix command prompt to find out which JVM is currently installed on your system. The Java Virtual Machine is a free download from either Sun Microsystems or your operating system manufacturer's website.
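
As a rough illustration, the version check can be scripted. The sketch below assumes that the JVM prints its version in double quotes on the first line of the java -version output (for example, java version "1.4.2"). The function names are hypothetical and are not part of the GDS.

```shell
# Hypothetical helper (not part of the GDS): succeeds if a JVM version
# string such as "1.4.2_05" or "17.0.1" denotes Java 1.3 or higher.
jvm_version_ok() {
    major=${1%%.*}                 # text before the first dot
    rest=${1#*.}                   # text after the first dot
    minor=${rest%%.*}
    major=${major%%[!0-9]*}        # drop suffixes such as "-ea"
    minor=${minor%%[!0-9]*}
    [ -n "$major" ] || return 1
    [ "$major" -gt 1 ] && return 0
    [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 3 ]
}

# Assumption: the version appears in double quotes on the first line,
# which java -version writes to standard error.
installed_jvm_version() {
    java -version 2>&1 | sed -n '1s/[^"]*"\([^"]*\)".*/\1/p'
}
```

A possible use is jvm_version_ok "$(installed_jvm_version)" || echo "JVM too old or not found" >&2, which also reports a problem when no java command is on the path.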

You will also need GrADS. Because the server uses some new features in GrADS, you will need version 1.8 or higher. Handling station data and client uploads requires version 1.9 or higher. GDS version 2.0 requires GrADS 2.0.a3 or higher. The latest version of GrADS is available at the GrADS home page.

Download and setup

The latest version of the GDS is available at the GDS home page as a compressed tar archive.

You do not need root user access to run the GDS. There is no build or system install process, because it is a cross-platform Java application. Any number of GDS instances can run on the same system, as long as they are configured to use different ports (see Tomcat settings).

After unpacking the archive, all you need to do is edit the configuration file to tell the GDS where to find GrADS, using the <invoker> tag. If you are not using a full GrADS distribution, make sure the GrADS executable you specify is capable of opening the types of dataset you wish to use.

Next, double check with other users and/or your system administrator, to make sure that port settings for the GDS do not conflict with ports that are already in use. By default the GDS uses ports 9090 and 9095. See Tomcat settings for instructions on how to change these.

At this point, you should be able to start the server and view the example dataset.

If you plan to serve netCDF, HDF, or OPeNDAP datasets, also make sure that the GADDIR environment variable in the GDS startup shell points to the location of the GrADS supplementary data files (available from the GrADS download page). In particular, the file udunits.dat must be present in this directory, since it is needed for COARDS metadata processing. If GADDIR is not set, the following error will occur when the GDS tries to access any netCDF, HDF, or DODS data:

error: can't import dataset_name; metadata extraction failed for dataset_file; couldn't open dataset_file
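
A minimal sketch of a pre-flight check for this requirement follows. The function name gaddir_ok and the example path are assumptions made for illustration; only the GADDIR variable and the udunits.dat file name come from the text above.

```shell
# Hypothetical pre-flight check (not part of the GDS): succeeds only if the
# directory given as $1 exists and contains udunits.dat.
gaddir_ok() {
    [ -n "$1" ] && [ -d "$1" ] && [ -f "$1/udunits.dat" ]
}

# Example use in the shell that starts the GDS. The path below is a
# placeholder; substitute the location of your GrADS data files.
# GADDIR=/usr/local/lib/grads; export GADDIR
# gaddir_ok "$GADDIR" || echo "GADDIR is unset or lacks udunits.dat" >&2
```

Running such a check before starting the server can surface the problem at startup rather than at the first client request.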

Putting your datasets online

Next you will want to put your data online.

First, make sure that all of your datasets are ready to be opened by GrADS. If you have COARDS-compliant netCDF data, they are ready to go as is. Otherwise, you may need to generate some CTL and/or map files. See the GrADS documentation for more information on how GrADS works with various data formats.

Once this is done, all you need to do to put your datasets online is tell the GDS where they are, using the configuration file. This is done by adding <dataset>, <datadir>, and <datalist> tags inside the <data> tag. If you want, you can organize the way the data appear online using <mapdir> tags.

Note that the GDS does not attempt to access datasets until the first time they are requested by a client, so it may not immediately complain about unusable datasets. Before you invite others to use your server, therefore, it is a good idea to make sure that all of the datasets you are serving work properly, by opening them in your own OPeNDAP-enabled client.

Once you have your data loaded and working, it is highly recommended that you familiarize yourself with the configuration options and administrative tools available by reading the remainder of this documentation.

Serving real-time or frequently updated data

An important warning for serving changing collections of data, such as real-time observations:

Always post new or modified data under new handles. Never modify the contents of an existing dataset.

Many OPeNDAP clients work on the assumption that the dataset they are making requests from will not change. If it does, they may behave erratically, or worse, return the wrong data values to the user. The user may not even realize this has happened.

For example, if you add new model run data daily, use an absolute date, e.g., 01mar2003, in the data file names. Once you have posted the data for March 1, 2003, you can leave it as is until you take it offline or move it to another archive; the next day's data can be posted as 02mar2003. Then, anybody using the March 1 data will be unaffected when you post your March 2 data.

By contrast, if you were to use relative times - e.g., post March 1 data as data.today, and then at some point update data.today to point to March 2 data instead - anyone working with the dataset when you switched it would either suddenly receive error messages without explanation, or worse, would start receiving the data for March 2, believing it was for March 1. The OPeNDAP system has no way of indicating to the client that the contents of data.today have moved to data.yesterday, and data.today has been replaced with an entirely different dataset.

The one exception is that it is safe to simply extend the time dimension of a dataset, as long as you keep the origin (t=0) time of the dataset the same. This will not cause any problems for clients unaware of the change, as all data requests in the older time ranges will still return the same data as before the change.
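
The absolute-date naming convention recommended above is easy to script. The helper below is a sketch only: the name dated_handle is hypothetical, and it takes the date as three numeric arguments instead of relying on any particular date command. It validates the month but not the day.

```shell
# Hypothetical helper: build an absolute-date handle such as 01mar2003
# from a numeric year ($1), month ($2), and day ($3).
dated_handle() {
    year=$1
    month=${2#0}                   # strip one leading zero so 08 is not read as octal
    day=${3#0}
    case $month in
        1) mon=jan ;; 2) mon=feb ;; 3) mon=mar ;; 4) mon=apr ;;
        5) mon=may ;; 6) mon=jun ;; 7) mon=jul ;; 8) mon=aug ;;
        9) mon=sep ;; 10) mon=oct ;; 11) mon=nov ;; 12) mon=dec ;;
        *) echo "invalid month: $2" >&2; return 1 ;;
    esac
    printf '%02d%s%04d\n' "$day" "$mon" "$year"
}
```

For instance, dated_handle 2003 3 1 prints 01mar2003, and dated_handle 2003 03 02 prints 02mar2003.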

Controlling the server

Starting and stopping

Four scripts in the server home directory are used to control the GDS. Each sends brief messages to the terminal, and records its actions in more detail in the file log/console.out.

startserver - Starts a background task, which runs GDS, and will respawn it if the process dies, until stopserver is run.

stopserver - Shuts down the GDS, preventing the process from respawning.

rebootserver - Restarts the GDS.

cleanup - Restarts the GDS, clearing all temporary data such as cached metadata, analysis results, and blocked IP addresses.

These now invoke the corresponding scripts in the bin directory, which have been completely rewritten:

bin/gds-start.sh

bin/gds-stop.sh

bin/gds-cleanup.sh

The behavior of these scripts is similar to the old startup scripts, but there are some major improvements.

First, they can now be run from any directory; it is not necessary to cd to the GDS home directory first.

Second, to reduce the chance of downtime, gds-start.sh creates a process called gds-respawn.sh, which stays running, and attempts to restart Tomcat if it dies unexpectedly. To test this, you can kill the Tomcat process with a KILL signal - gds-respawn.sh will think it has crashed, and attempt to restart it. Note that this does not apply to the TERM signal, which should give Tomcat a chance to shut itself down properly.

Third, gds-start.sh now waits until the GDS is ready to handle requests before exiting, and exits with a non-zero return code if the server fails to start. This allows reliable scripting of follow-on commands which require the GDS to be running, without the need for kludges such as sleep commands or retrieving test pages with a utility like curl.
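
Because the start script reports failure through its return code, a follow-on command can simply be chained to it. The wrapper below is a sketch under that assumption; the name start_then is hypothetical and not part of the GDS.

```shell
# Hypothetical wrapper: run the start command given as $1, then run the
# remaining arguments as a follow-on command only if the start succeeded.
start_then() {
    start_cmd=$1
    shift
    if "$start_cmd"; then
        "$@"
    else
        status=$?
        echo "GDS failed to start (exit code $status)" >&2
        return "$status"
    fi
}
```

A call such as start_then bin/gds-start.sh echo "GDS is up" would print the message only after the server is ready to handle requests.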

Web-based administration

The GDS has a web-based administration interface which is accessed by URLs of the form:

http://localhost:9090/dods/admin?auth=authorization_string&cmd=command_string

The authorization_string given must match the auth setting given in the <service-admin> configuration tag. The command_string can be one of the following:

reload: Checks for changes to the server configuration. This updates the data catalog, privilege sets, and all other settings contained in the GDS configuration file, without the need to take the server offline.

clear: Removes all temporary entries from the catalog.

The authorization string can be kept from appearing in the log files by using POST rather than GET requests to perform administration tasks. There is a form which can be used to do this, at:

http://machinename:9090/admin.html

Alternatively, a utility such as curl can be configured to send the appropriate POST data, which should consist of the portion of the URL that comes after the ?, i.e.:

auth=authorization_string&cmd=command_string
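
A minimal sketch of such a curl request follows. It assumes that the POST is sent to the same /dods/admin URL used for GET requests and that the authorization string contains no characters that need URL encoding. The function names are hypothetical.

```shell
# Hypothetical helpers for posting an administration command with curl.

# Build the POST body from an authorization string ($1) and a command ($2).
admin_post_body() {
    printf 'auth=%s&cmd=%s' "$1" "$2"
}

# Send the body to the admin URL ($3). The --data option makes curl issue
# a POST, so the authorization string stays out of the request URL.
gds_admin() {
    curl --silent --show-error --fail \
         --data "$(admin_post_body "$1" "$2")" "$3"
}
```

For example, gds_admin authorization_string reload http://localhost:9090/dods/admin would request a configuration reload. The --fail option makes curl return a non-zero exit code on an HTTP error status, which is convenient when the call is part of a larger script.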

For convenience, there is a script called gds-reload.sh in the utils directory, which will extract the authorization string from gds.xml, and make a POST request using curl, to trigger a configuration update. This script will run as is on many platforms, and should be easy to customize if needed.

By default, the administration utility can only be invoked through the local network interface; that is, it will only accept requests originating from IP address 127.0.0.1. Additional IP addresses can be allowed to make administration requests by using the admin_enabled attribute of the <privilege> tag.

Because they alter the state of internal structures, administration commands can only run while the server is idle. The administration service has a timeout attribute which controls the length of time it will wait for the server to become idle. If this time expires, it will return with an error message, and the command will not be performed.

Except under very heavy loads, this should not be an issue, as even a momentary idle period is sufficient. However, on certain systems, it appears that normal data requests can occasionally get "hung" in the middle of transmission, and never finish. This prevents administrative commands from running, because the server cannot consider itself idle until all outstanding requests are complete. The cause of this problem is under investigation and will hopefully be resolved soon. In the meantime, when scripting an administrative command, it is advisable to include a fallback that restarts the server if the command times out more than once.
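
The fallback described above might be scripted along the following lines. This is a sketch, not a bundled utility: admin_or_restart is a hypothetical name, and the two commands it runs are passed in by name so that any reload and restart mechanism can be substituted.

```shell
# Hypothetical wrapper: try an administrative command up to $1 times and,
# if every attempt fails, run a fallback command instead.
#   $1 = maximum attempts, $2 = admin command name, $3 = fallback command name
admin_or_restart() {
    max=$1
    tries=0
    while [ "$tries" -lt "$max" ]; do
        if "$2"; then
            return 0
        fi
        tries=$((tries + 1))
    done
    echo "admin command failed $max times; falling back" >&2
    "$3"
}
```

With small wrapper functions such as reload_gds() { utils/gds-reload.sh; } and restart_gds() { ./rebootserver; } (both names are placeholders), the call admin_or_restart 2 reload_gds restart_gds restarts the server only after two failed reload attempts.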

Checking the server status

The script check_gds, found in the server's home directory, can be used to notify the administrator by email and automatically restart a GDS if it goes offline. This script requires the lynx web browser, or some other command-line utility that can download documents via HTTP.

To use the script, edit it to use the base URL and home directory of the server it is to check, and add it as a cron job with the desired frequency. Once in the crontab, the script can be temporarily disabled by placing a file called block_check_gds in the home directory of the GDS being checked.
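
As an illustration, the block file can be managed with two small helpers. The function names are hypothetical, and the crontab line in the comment uses a placeholder path and interval.

```shell
# Hypothetical helpers for pausing and resuming check_gds. $1 is the home
# directory of the GDS being checked.
pause_gds_checks() {
    touch "$1/block_check_gds"
}

resume_gds_checks() {
    rm -f "$1/block_check_gds"
}

# Example crontab entry that runs the check every ten minutes
# (the path is a placeholder):
# 0,10,20,30,40,50 * * * * /path/to/gds/check_gds
```

Pausing the checks before planned maintenance keeps the cron job from restarting a server that was stopped on purpose.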

You can also use the ps command to check on the server processes. The file temp/tomcat.pid contains the server's current process ID, if any.
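
A small sketch of a process check based on the PID file follows. The function name is hypothetical, and the check assumes that the script runs as the same user as the server, since kill -0 fails for processes the caller is not allowed to signal.

```shell
# Hypothetical check: succeeds if temp/tomcat.pid under the GDS home
# directory ($1) names a process that currently exists.
gds_process_alive() {
    pid_file=$1/temp/tomcat.pid
    [ -s "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

For a fuller listing, ps -p "$(cat temp/tomcat.pid)" run from the GDS home directory shows the process entry itself.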

Configuration

Startup parameters

The GDS startup shell uses the values of several environment variables if present. These are:

JAVA - The command GDS should use to start the Java Virtual Machine

JAVA_HOME - The location of a complete Java installation. If set, GDS will use the command $JAVA_HOME/bin/java to start the Java Virtual Machine.

If JAVA is set, JAVA_HOME does nothing. If neither JAVA nor JAVA_HOME is set, GDS will simply use the command java to start the Java Virtual Machine, assuming that the executable is in the system path.

JAVA_OPTS - Arguments to use when starting the Java Virtual Machine for Tomcat. If not set, the default is -server.

ANAGRAM_HOME - Location of the GDS support files (Anagram is the generic framework used to implement the GDS). By default this is the directory above where gds-start.sh is located. GDS expects the support files in scripts/ and bin/ to be present, and must be able to write to the temp/ and log/ subdirectories as well.

ANAGRAM_CONFIG - Location of the XML configuration file that GDS should use, relative to $ANAGRAM_HOME. By default this is gds.xml.
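
The order in which JAVA and JAVA_HOME are consulted can be summarized in a few lines of shell. This is a sketch of the rules stated above, not the actual startup code, and the function name resolve_java is hypothetical.

```shell
# Sketch of the JVM selection order described above; the real logic lives
# in the GDS startup scripts and may differ in detail.
resolve_java() {
    if [ -n "${JAVA:-}" ]; then
        echo "$JAVA"
    elif [ -n "${JAVA_HOME:-}" ]; then
        echo "$JAVA_HOME/bin/java"
    else
        echo java
    fi
}
```

The variables can also be set for a single invocation, for example JAVA_HOME=/usr/java JAVA_OPTS="-server -Xmx512m" bin/gds-start.sh, where the path and the heap size are placeholders.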

Configuration file settings

The GDS derives settings for all of its modules from an XML configuration file. The name of this file is specified at startup; the default is gds.xml. This file is read every time the server starts, and every time a reload command is given using the administrative web interface.

The configuration file reference describes the XML tags that can be used in this file.

Tomcat settings

Note that the pre-configured Tomcat bundled with the GDS is now version 4.1, and is located in the directory tomcat4/, rather than tomcat/.

The configuration file for Tomcat is tomcat4/conf/server.xml. This file can be edited to change the network port that the GDS runs on, and to increase the size of the connection pool, among other things.

Warning: if two or more Tomcat servers on the same machine are trying to use the same port settings, the startup and shutdown commands will affect both servers unpredictably, and only one of them will actually be able to handle requests.

If your server is shutting down unexpectedly, not starting up, or not responding to the "stopserver" script, check whether you or another user are trying to run a second Tomcat server, with the same port settings.

An explanation of Tomcat and the settings in its configuration file can be found at the Tomcat home page.

Security

Both Tomcat and the GrADS-DODS Server support IP-address-based security. Each can be given its own security settings. Tomcat will allow or deny access to the server based on the settings in its configuration files. For finer-grained control, use the <ip_range> and <privilege> tags in the GDS configuration file. These settings allow or deny access to specific datasets and server features.

Advanced Topics

Static web pages

The GDS comes with some static web pages which can be accessed using Tomcat (including this manual). These pages are located in the subdirectory tomcat4/webapps/ROOT. In general, putting your static content on a separate web server is recommended, since it will usually be more efficient than using Tomcat for this. However, Tomcat's static content directory provides an easy way to put up some information about the datasets you are serving, without setting up a separate web server.

Integrating with Apache

The GDS can fairly easily be integrated into an existing Apache web site. All that is required is to set up the link between Tomcat and Apache. Consult the Tomcat home page for more on this process.

Deploying to a different servlet container

It is not necessary to run the GDS with the copy of Tomcat that is distributed with it. Any servlet container that supports the Java Servlet API 2.2 or higher can be used.

The GDS web application is in the directory tomcat4/webapps/dods. You can generate a Web Application archive (WAR) file by changing to src/ and running the makewar script. This will generate an archive called dods.war. To add the WAR to your servlet container, follow the servlet container's instructions.

Once you have added it, you will need to set the property anagram.home to the directory containing the rest of the GDS distribution. This can be set either as a Java system property (using the -D switch when invoking Java) or as a servlet context property (follow the servlet container's instructions).
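
As a sketch of the system-property route, the helper below prints the JVM option for a given GDS directory. The function name and the /opt/gds path are placeholders. CATALINA_OPTS is the variable that Tomcat's own startup scripts read for extra JVM options; other servlet containers have their own mechanisms.

```shell
# Hypothetical helper: print the JVM option that sets anagram.home to the
# GDS distribution directory given as $1.
anagram_home_opt() {
    printf '%s\n' "-Danagram.home=$1"
}

# With Tomcat, one way to pass the option is the CATALINA_OPTS variable
# (the path below is a placeholder):
# CATALINA_OPTS="$(anagram_home_opt /opt/gds)"; export CATALINA_OPTS
```

Java system properties are set with an uppercase -D followed directly by name=value, with no space in between.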

Building from source

Source code is included with the GDS, under the path src/.

To generate Java documentation from the source code, change to src/ and run the makedoc script. The documentation will be placed in src/doc/.

To recompile the source code, change to src/ and run the makejar script.