GrADS-DODS Server - User's Guide

Table of Contents

  • Accessing data from a web browser
      • Browsing server contents
      • Retrieving data subsets as ASCII text
  • Accessing data with an OPeNDAP-enabled analysis tool
      • Opening a dataset and retrieving data subsets
      • Performing remote analysis
      • Uploading data
  • Using remote data in scripts

Accessing data from a web browser

Browsing server contents

To browse a directory of the datasets being served on a GDS, point your web browser to the base URL of the GDS. This will usually be a URL of the form http://machine.domain:9090/dods/.

This directory listing provides three links for each dataset: "info", "dds", and "das". The "info" link leads to a Web page with a brief summary of the dataset, followed by a complete metadata listing. The other two return the DODS Data Descriptor Structure (DDS), which specifies the logical structure of the dataset, and the Data Attribute Structure (DAS), which provides descriptive information about the dataset.
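For example, if a server at http://machine.domain:9090/dods/ were serving a hypothetical dataset named model, these objects would be available at URLs of the form:

    http://machine.domain:9090/dods/model.info
    http://machine.domain:9090/dods/model.dds
    http://machine.domain:9090/dods/model.das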

You can also retrieve a complete dataset listing for a GDS by adding /xml to its base URL.

If you are given a GDS dataset URL, you can enter that URL in your web browser, and get the "info" listing. This listing will contain links back to the dataset directory for the GDS.

Note: Several OPeNDAP data objects with distinct URLs may together be considered a single "dataset" from a scientific point of view. However, the word "dataset" is used here in a technical sense, meaning a single OPeNDAP data object.

back to table of contents

Retrieving data subsets as ASCII text

The GDS can provide subsets of any dataset it is serving, in ASCII comma-delimited format. To retrieve a subset, enter a URL of the form http://gds-base-url/dataset.ascii?constraint.

The constraint portion of the URL should be an OPeNDAP constraint expression. Some basic constraints:

A constraint of the form var will request the complete contents of the variable.

A constraint of the form var[a:b] will return the subset of the variable from index a through index b (OPeNDAP array indices start at zero).

A constraint of the form var[a:n:b] will return every nth element of the subset defined by a and b.

For subsets of variables with multiple dimensions, each dimension must have a constraint. So a constraint for a subset of a three-dimensional variable would appear as var[a1:b1][a2:b2][a3:b3], or var[a1:n1:b1][a2:n2:b2][a3:n3:b3].
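For example, a request for an ASCII subset of a hypothetical three-dimensional variable ps (with dimensions time, latitude, longitude) in a dataset named model might look like this:

    http://machine.domain:9090/dods/model.ascii?ps[0:4][10:20][30:40]

This returns a comma-delimited listing of time steps 0 through 4 over the given latitude and longitude index ranges; the dataset and variable names are illustrative.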

back to table of contents


Accessing data with an OPeNDAP-enabled analysis tool

Opening a dataset and retrieving data subsets

You can retrieve data from a GrADS Data Server using any OPeNDAP-enabled desktop analysis tool (aka "client") such as GrADS, Ferret, Matlab, or IDL. To do this, provide a URL instead of a path name to your client's open command. You can then use the data as if it were a local data set; the client will automatically retrieve data as needed. You may also want to read the notes below on optimizing your scripts to use remote data.
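For example, in GrADS (the dataset name model and the variable ps are illustrative):

    ga-> sdfopen http://machine.domain:9090/dods/model
    ga-> display ps

From this point on, the remote variables behave just like variables in a local file; only the data actually needed for each command is transferred.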

back to table of contents

Performing remote analysis

Remote analysis is most useful for calculations that consume a large quantity of input data but generate a small output, such as averaging and correlation functions. This type of calculation will run much faster on the server, and you will only need to download the small result instead of the entire set of inputs.

In order to do data analysis on the server, you construct a URL containing a GrADS expression, and then open that URL with your client. The server will perform the analysis task and return the results to the client as a DODS dataset containing a single variable called result. This result dataset can then be used exactly as if it were an original data set (see above). It can even be used as input to further analysis expressions on the server, allowing calculations that use multiple stages of intermediate results to be performed remotely.

The URL for an analysis operation is created by appending _expr_ to the server's base URL:
http://machine_name:9090/dods/_expr_
followed immediately by three sets of curly braces containing the arguments:

{dataset1,dataset2,...}{expression}{x1:x2,y1:y2,z1:z2,t1:t2}

The first set of curly braces contains a list of all the datasets on the server that are used in the GrADS expression. If the datasets are in a subdirectory, the name of the subdirectory should be included in the dataset name.

Source datasets can include the results of previous analysis expressions, allowing you to perform multi-stage calculations. To use a previous analysis result as a source, put its shorthand name in the list of datasets. The shorthand name for a result dataset is contained in the dataset's title attribute (in GrADS you can view this by typing q file), and has the form _expr_nnnn where nnnn will be some number.
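For instance, suppose q file shows a previous result's title as _expr_0001 (the number here is illustrative; the actual value is assigned by the server). A second-stage calculation could then list that name as a source dataset and refer to its result variable in the expression:

    ga-> sdfopen http://machine_name:9090/dods/_expr_{_expr_0001}{ave(result.1,t=1,t=12)}{0:0,0:0,500:500,jan1950:jan1950}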

The second set of curly braces contains the GrADS expression to be evaluated. This describes the actual calculation to be performed, using GrADS syntax (see the GrADS home page).

The third set of curly braces contains the boundaries for the expression evaluation in world coordinates (latitude, longitude, elevation, time). These boundaries may not vary in more than three of the four dimensions. The first three coordinate pairs should be given as real numbers. The last pair are time coordinates, and should be in the format recognized by the set time command in GrADS: [hh[mm]z][dd][mon][yyyy]. For example, 0z1jan2000.

Specifically, the analysis is performed as follows:

1. GrADS is invoked.
2. The source datasets are opened in the order they are listed in the first set of curly braces.
3. The dimension environment is set according to the parameters in the third set of curly braces.
4. The expression in the second set of curly braces is evaluated and saved as a new dataset.

Thus, a variable in the nth listed dataset should be referred to as var_name.n in the analysis expression. For instance, if dataset2 contains a variable called foo, this variable should be referred to in the expression as foo.2. The expression will be evaluated against the grid of the first dataset opened.

Here are some examples of remote analysis, using GrADS as a client. Please note that because the URLs in the following examples are so long, they have been split across multiple lines for clarity (and to avoid strange browser behavior); each should nevertheless be entered as a single line.

  1. Global Averaging:
    The following expression will return a timeseries of globally-averaged monthly mean 500mb geopotential height, based on NCEP reanalysis data from the Climate Diagnostics Center being served on the COLA GDS (http://monsoondata.org:9090/):

    ga-> sdfopen http://monsoondata.org:9090/dods/_expr_{rean3d}
    {tloop(aave(z,global))}{0:0,0:0,500:500,jan1948:dec2000}


  2. Variable Comparison:
    A GDS running at NCAR (http://dataportal.ucar.edu:9191/) is distributing a set of ensemble members from the "Climate of the 20th Century" runs of the COLA atmospheric general circulation model. We will compare the relative humidity "rh" from the first two datasets, namely "C20C_A" and "C20C_B". Suppose we want to find a global time-average of their difference at the 1000 mb level in 1960. Using GrADS as our client, we would open the following URL:

    ga-> sdfopen http://dataportal.ucar.edu:9191/dods/_expr_{/C20C/C20C_A,/C20C/C20C_B}
    {ave((rh.1-rh.2),time=1jan1960,time=1dec1960)}
    {0:360,-90:90,1000:1000,1nov1976:1nov1976}
    ga-> display result


    The analysis results are returned in the variable "result" in the opened dataset. Note that the world coordinate boundaries specified in the third set of curly braces fix the time to 1nov1976 -- this can be set to any arbitrary time because the time dimension specification is overridden by the GrADS expression which tells the server to average over the period from January 1960 to December 1960.

  3. A More Complex Analysis Operation:
    Suppose you want to calculate the mean 500mb height anomaly associated with warm tropical SST anomalies. First, the Reynolds SST Analyses are used to create a time series of the area-averaged SST anomaly between 180 and 90W and between 10S and 10N. An "ENSO" mask is then defined for SST anomalies greater than 1 degree. Using this mask, a mean 500mb height is calculated from the NCEP/NCAR Reanalysis Data for the times associated with warm SST anomalies. All these operations are packaged into a single URL:

    ga-> sdfopen http://monsoondata.org:9090/dods/_expr_{ssta,z5a}
    {tmave(const(maskout(aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10),
    aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10)-1.0),1),
    z5a.2(lev=500),t=1, t=600)}
    {0:360,0:90,500:500,jan1950:jan1950}


    The GrADS script sstmask.gs demonstrates this example and contains some additional graphics commands to display the analysis result.

back to table of contents

Uploading data

The GDS also allows the client to upload data that can then be used as a source in analysis expressions. This capability is still under development.

back to table of contents


Using remote data in scripts

You do not have to do anything special to adapt a script to work with remote data. All you need to do is replace local filenames with URLs. This is because from your client program's point of view, a remote dataset behaves exactly like a local dataset except that access is slower.

However, because remote data retrieval is not instantaneous, existing scripts that do not take this into account may run very slowly. Thus it is often desirable to modify the script to improve its efficiency.

The key to writing efficient scripts is fine-tuning your use of I/O requests. DODS-enabled clients such as GrADS only provide the illusion of a continuous connection with a remote dataset. In fact, a new connection is made to the server every time you request data from the I/O layer (for instance, by using the "display" command). The speed of these connections depends on network latency and server response time, but is generally much slower than an equivalent request from a local disk. Thus, reducing the number of network connections, and the quantity of data sent over the network, will often significantly speed up your script.

Here are some guidelines for writing efficient scripts. The examples given use the GrADS scripting language, but the principles apply to most DODS-enabled clients:

  • Avoid multiple opens. Opening a GDS data file generates as many as eight separate network requests, so try to avoid opening the same file more than once.

  • Store remote data locally if you plan to reuse it. DODS has a limited ability to cache remote data locally, the way a web browser does with web pages. However, this only works when you request the exact same subset. Thus, if you use different parts of the same remote data subset in multiple places in your script, you are actually requesting it multiple times over the network.

    To avoid this, request data once from the server, and then store it in local memory or on disk. In GrADS, you can do this using 'define' or 'set gxout fwrite'. For example:

      'sdfopen http://monsoondata.org:9090/dods/model'
      'set lat 22 52'
      'set lon 233 295'
      'set t 1 5'
      'define psfc = ps/100'
      'd psfc'

    Now you can use the variable 'psfc' as many times as you wish in your script without any additional network requests.

    Note to GrADS users: The 'define' command automatically loops through each time step in the dimension environment, so using 'define' may not always improve your performance if you are accessing time series at a single point. For example:

      'set lon -90'
      'set lat 40'
      'set lev 500'
      'set t 1 15'
      'define ztser = z'


    The above example will result in 15 separate requests for data from the server, one for each time step, and each request will only obtain a single data value! If time is the only varying dimension, it is far better to retrieve the data using the 'display' command (which doesn't automatically loop through time) or, if you're going to use the data more than once, to save a local copy with 'set gxout fwrite'.
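
    For instance, here is a minimal sketch of the 'fwrite' approach for the point time series above (the output filename ztser.dat is arbitrary). The single 'd z' fetches the whole 15-step series in one request, and 'fwrite' saves a local binary copy:

      'set lon -90'
      'set lat 40'
      'set lev 500'
      'set t 1 15'
      'set gxout fwrite'
* write the displayed time series to a local binary file (the name is arbitrary)
      'set fwrite ztser.dat'
      'd z'
      'disable fwrite'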

    For a more complex example, take a look at the GrADS script meteogram.gs, which uses 'define' and the 'set gxout fwrite' commands to draw a graphically detailed meteogram for any location in the forecast model domain.

  • Evaluate expressions on the server side when appropriate. It may save time and server resources to package your request into an analysis expression. A good rule of thumb is to use analysis expressions when the size of the result data set is smaller than the total size of the input data, e.g. when doing spatial or time averaging.

    For example, the following script example opens two separate MRF forecast data files and then uses the 'const' function to merge one variable from each of them to form one continuous time series:

      'sdfopen http://monsoondata.org:9090/dods/gfs.2002010800'
      'sdfopen http://monsoondata.org:9090/dods/gfs.2002010800b'
      'set lat 0'
      'set lon 0'
      'set t 1 31'
      'define tt = const(t.1,0,-u) + const(t.2,0,-u)'
      'd tt'


    This second version of the script example creates the same continuous time series using an analysis expression. This script runs three times faster than the first version, and hits the server half as many times.

      baseurl    = 'http://monsoondata.org:9090/dods/_expr_'
      datasets   = '{gfs.2002010800,gfs.2002010800b}'
      expression = '{const(t.1,0,-u)+const(t.2,0,-u)}'
      dimensions = '{0:0,0:0,1000:1000,00Z08JAN2002:00Z23JAN2002}'
      'sdfopen '%baseurl%datasets%expression%dimensions
      'set t 1 31'
      'define tt = result.1'
      'd tt'

    Note however, that there is an overhead on the server associated with each analysis expression. Thus, if the size of the expression output is the same as, or larger than, its inputs, it will be more efficient to retrieve the inputs first, and do the analysis locally.

  • Try to move data requests outside loops. If you are looping over a grid of data points, when possible you should retrieve the whole area you intend to use with a single request, and store it locally for use in the loop. Otherwise you will be making a new network request for each data point inside the loop, which can cause extremely slow performance.
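
    As a sketch of this pattern in the GrADS scripting language (reusing the hypothetical dataset and variable from the 'define' example above), the whole region is retrieved with a single request before the loop, and the loop then reads only the locally stored grid:

      'sdfopen http://monsoondata.org:9090/dods/model'
      'set lat 22 52'
      'set lon 233 295'
      'set t 1'
* the dataset 'model' and variable 'ps' are hypothetical, as in the 'define' example;
* the one network request happens here, and the loop below is purely local
      'define psfc = ps/100'
      lat = 22
      while (lat <= 52)
        'set lat 'lat
        'd psfc'
        lat = lat + 5
      endwhile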

back to table of contents