Table of Contents
- Browsing server contents
- Retrieving data subsets as ASCII text
- Opening a dataset and retrieving data subsets
- Performing remote analysis
- Uploading data
- Using remote data in scripts
Accessing data from a web browser
Browsing server contents
To browse a directory of the datasets being served on a GDS, point your
web browser to the base URL of the GDS. This will usually be a URL of
the form http://machine.domain:9090/dods/ .
This directory listing will provide links to "info", "dds"
and "das" for each dataset. The first link provides a Web page
with a brief summary, followed by a complete metadata listing, for the
dataset. The other two link to the DODS Dataset Descriptor Structure (DDS),
which specifies the logical structure of the dataset, and the Dataset
Attribute Structure (DAS), which provides descriptive information about the dataset.
You can also retrieve a complete dataset listing for a GDS by adding
/xml to its base URL.
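For example, with the base URL above, the complete listing would be retrieved
from http://machine.domain:9090/dods/xml .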
If you are given a GDS dataset URL, you can enter that URL in your web
browser, and get the "info" listing. This listing will contain
links back to the dataset directory for the GDS.
Note: Several OPeNDAP data objects with distinct URLs may together be considered
a single "dataset" from a scientific point of view. However,
the word "dataset" is used here in a technical sense, to mean
a single OPeNDAP data object.
Retrieving data subsets as ASCII text
The GDS can provide subsets of any dataset it is serving, in ASCII comma-delimited
format. To retrieve a subset, enter a URL of the form http://gds-base-url/dataset.ascii?constraint .
The constraint portion of the URL should be an OPeNDAP constraint
expression. Some basic constraints:
- A constraint of the form var requests the complete contents
of the variable.
- A constraint of the form var[a:b] returns the subset of the variable
between the (zero-based) indices a and b, inclusive.
- A constraint of the form var[a:n:b] returns every nth element
of the subset defined by a and b.
For subsets of variables with multiple dimensions, each dimension must
have a constraint. So a constraint for a subset of a three-dimensional
variable would appear as var[a1:b1][a2:b2][a3:b3], or
var[a1:n1:b1][a2:n2:b2][a3:n3:b3]. A complete example request is shown below.
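For example, assuming a hypothetical dataset named model that contains a
three-dimensional variable z, the request
http://machine.domain:9090/dods/model.ascii?z[0:5][10:20][30:2:40]
would return, in comma-delimited ASCII, elements 0 through 5 of the first
dimension, 10 through 20 of the second, and every second element from 30
through 40 of the third.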
Accessing data with an OPeNDAP-enabled analysis tool
Opening a dataset and retrieving data subsets
You can retrieve data from a GrADS Data Server using any OPeNDAP-enabled
desktop analysis tool (aka "client") such as GrADS, Ferret, Matlab, or IDL.
To do this, provide a URL instead of a path name to your client's open
command. You can then use the data as if it were a local dataset; the
client will automatically retrieve data as needed. You may also want to
read the notes below on optimizing your scripts to use remote data.
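For example, in GrADS (the dataset name model and the variable ps here
are hypothetical):
ga-> sdfopen http://machine.domain:9090/dods/model
ga-> display ps
From this point on, the remote variables behave just like variables in a
locally opened file.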
Performing remote analysis
Remote analysis is very useful for doing calculations on remote data
that use a large quantity of input data but generate a small output, such
as averaging and correlation functions. This type of calculation will
run much faster on the server, and you will only need to download the
small result instead of the entire set of inputs.
In order to do data analysis on the server, you construct a URL containing
a GrADS expression, and then open that URL with your client. The server
will perform the analysis task and return the results to the client as
a DODS dataset containing a single variable called result. This result
dataset can then be used exactly as if it were an original dataset
(see above). It can even be used as input to further analysis expressions
on the server, allowing calculations that use multiple stages of
intermediate results to be performed remotely.
The URL for an analysis operation is created by appending _expr_
to the server's base URL:
http://machine_name:9090/dods/_expr_
followed immediately by three sets of curly braces containing the arguments:
{dataset1,dataset2,...}{expression}{x1:x2,y1:y2,z1:z2,t1:t2}
The first set of curly braces contains a list of all the datasets on
the server that are used in the GrADS expression. If the datasets are
in a subdirectory, the name of the subdirectory should be included in
the dataset name.
Source datasets can include the results of previous analysis expressions,
allowing you to perform multi-stage calculations. To use a previous analysis
result as a source, put its shorthand name in the list of datasets. The
shorthand name for a result dataset is contained in the dataset's title
attribute (in GrADS you can view this by typing q file),
and has the form _expr_nnnn, where nnnn is a number. A sketch
of this multi-stage mechanism is given below.
The second set of curly braces contains the GrADS expression to be evaluated.
This describes the actual calculation to be performed, using GrADS syntax
(see the GrADS home page).
The third set of curly braces contains the boundaries for the expression
evaluation in world coordinates (latitude, longitude, elevation, time).
These boundaries may not vary in more than three of the four dimensions.
The first three coordinate pairs should be given as real numbers. The
last pair are time coordinates, and should be in the format recognized
by the set time command in GrADS: [hh[mm]z][dd][mon][yyyy] .
For example, 0z1jan2000 .
Specifically, the analysis is performed as follows:
1. GrADS is invoked.
2. The source datasets are opened in the order they are listed in the
first set of curly braces.
3. The dimension environment is set according to the parameters in the
third set of curly braces.
4. The expression in the second set of curly braces is evaluated and saved
as a new dataset.
Thus, a variable in the n th listed dataset should be referred
to as var_name.n in the analysis expression. For instance,
if dataset2 contains a variable called foo ,
this variable should be referred to in the expression as foo.2 .
The expression will be evaluated against the grid of the first dataset
opened.
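As a sketch of the multi-stage mechanism described above (the dataset name
model, the variable z, the result name _expr_0001, and the expressions
themselves are hypothetical; the long URLs are split across two lines here
but should be entered as one):
ga-> sdfopen http://machine.domain:9090/dods/_expr_{model}
{ave(z,time=1jan2000,time=1dec2000)}{0:360,-90:90,500:500,1jan2000:1jan2000}
ga-> q file
Suppose q file reports the title _expr_0001. That shorthand name can then
be listed as a source dataset in a second expression, which smooths the
first result on the server:
ga-> sdfopen http://machine.domain:9090/dods/_expr_{_expr_0001}
{smth9(result.1)}{0:360,-90:90,500:500,1jan2000:1jan2000}
ga-> display result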
Here are some examples of remote analysis, using GrADS as a client. Note
that because the URLs in the following examples are long, they have been
split across more than one line for clarity (and to avoid strange browser
behavior), but each should be entered as a single line.
- Global Averaging:
The following expression will return a timeseries of globally-averaged
monthly mean 500mb geopotential height, based on NCEP reanalysis data
(from the Climate Diagnostics Center) being served on the COLA GDS
(http://monsoondata.org:9090/):
ga-> sdfopen http://monsoondata.org:9090/dods/_expr_{rean3d}
{tloop(aave(z,global))}{0:0,0:0,500:500,jan1948:dec2000}
- Variable Comparison:
A GDS running at NCAR (http://dataportal.ucar.edu:9191/)
is distributing a set of ensemble members from the "Climate of
the 20th Century" runs of the COLA atmospheric general circulation
model. We will compare the relative humidity "rh"
from the first two datasets, namely "C20C_A" and "C20C_B".
Suppose we want to find a global time-average of their difference at
the 1000 mb level in 1960. Using GrADS as our client, we would open
the following URL:
ga-> sdfopen http://dataportal.ucar.edu:9191/dods/_expr_{/C20C/C20C_A,/C20C/C20C_B}
{ave((rh.1-rh.2),time=1jan1960,time=1dec1960)}
{0:360,-90:90,1000:1000,1nov1976:1nov1976}
ga-> display result
The analysis results are returned in the variable "result" in the opened
dataset. Note that the world coordinate boundaries specified in the
third set of curly braces fix the time to 1nov1976; this can be set
to any arbitrary time, because the time dimension specification is
overridden by the GrADS expression, which tells the server to average
over the period from January 1960 to December 1960.
- A More Complex Analysis Operation:
Suppose you want to calculate the mean 500mb height anomaly associated
with warm tropical SST anomalies. First, the Reynolds SST analyses are
used to create a time series of the area-averaged SST anomaly between
180 and 90W and between 10S and 10N. An "ENSO" mask is then defined for
SST anomalies greater than 1 degree. Using this mask, a mean 500mb height
is calculated from the NCEP/NCAR reanalysis data for the times associated
with warm SST anomalies. All these operations are packaged into a single URL:
ga-> sdfopen http://monsoondata.org:9090/dods/_expr_{ssta,z5a}
{tmave(const(maskout(aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10),
aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10)-1.0),1),
z5a.2(lev=500),t=1, t=600)}
{0:360,0:90,500:500,jan1950:jan1950}
The GrADS script sstmask.gs
illustrates the use of this example and contains some additional graphics
commands to display the analysis result.
Uploading data
The GDS also allows the client to upload data that can then be used as
a source in analysis expressions. This capability is still in under development.
Using remote data in scripts
You do not have to do anything special to adapt a script to work
with remote data. All you need to do is replace local filenames with URLs.
This is because from your client program's point of view, a remote dataset
behaves exactly like a local dataset except that access is slower.
However, because remote data retrieval is not instantaneous, existing
scripts that do not take this into account may run very slowly. Thus it
is often desirable to modify the script to improve its efficiency.
The key to writing efficient scripts is fine-tuning your use of I/O requests.
DODS-enabled clients such as GrADS only provide the illusion of a continuous
connection with a remote dataset. In fact, a new connection is made to
the server every time you request data from the I/O layer (for instance
by using the "display" command). The speed of these connections is dependent
on network latency and server response time, but is generally much slower
than an equivalent request from a local disk. Thus, reducing the number
of network connections, and the quantity of data sent over the network,
will often significantly speed up your script.
Here are some guidelines for writing efficient scripts. The examples
given use the GrADS scripting language, but the principles apply to most
DODS-enabled clients:
- Avoid multiple opens. Opening a GDS data file generates as
many as eight separate network requests, so try to avoid opening the
same file more than once. For example, open the file once at the start
of your script and refer to its variables by file number (e.g. var.1)
thereafter, rather than reopening it later on.
- Store remote data locally if you plan to reuse it. DODS has
a limited ability to cache remote data locally, the way a web browser
does with web pages. However, this only works when you request the exact
same subset. Thus, if you use different parts of the same remote data
subset in multiple places in your script, you are actually requesting
it multiple times over the network.
To avoid this, request data once from the server, and then store it
in local memory or on disk. In GrADS, you can do this using 'define'
or 'set gxout fwrite'. For example:
'sdfopen http://monsoondata.org:9090/dods/model'
* restrict the dimension environment to the region and times needed
'set lat 22 52'
'set lon 233 295'
'set t 1 5'
* define retrieves the data once and stores it in local memory
'define psfc = ps/100'
'd psfc'
Now you can use the variable 'psfc' as many times as you wish
in your script without any additional network requests.
Note to GrADS users: The 'define' command automatically loops
through each time step in the dimension environment, so using 'define'
may not always improve your performance if you are accessing time series
at a single point. For example:
'set lon -90'
'set lat 40'
'set lev 500'
'set t 1 15'
'define ztser = z'
The above example will result in 15 separate requests for data from
the server, one for each time step. Each request will only obtain a single
data value! If time is the only varying dimension, it is far better
to display the data using the 'display' command (which doesn't automatically
loop through time) or, if you're going to display the data more than
once, to use 'set gxout fwrite' to preserve a local copy (see the sketch below).
For a more complex example, take a look at the GrADS script meteogram.gs,
which uses the 'define' and 'set gxout fwrite' commands to draw a graphically
detailed meteogram for any location in the forecast model domain.
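Here is a minimal sketch of the 'set gxout fwrite' approach for the
single-point time series above (the output filename ztser.dat is hypothetical):
'set gxout fwrite'
'set fwrite ztser.dat'
* one display call retrieves the whole series and writes it to disk
'd z'
'disable fwrite'
The series is retrieved in a single request and written to the local binary
file ztser.dat, which can then be read back (via a GrADS descriptor file)
as many times as needed without touching the network.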
- Evaluate expressions on the server side when appropriate. It
may save time and server resources to package your request into an analysis
expression. A good rule of thumb is to use analysis expressions when
the size of the result data set is smaller than the total size of the
input data, e.g. when doing spatial or time averaging.
For example, the following script example opens two separate MRF forecast
data files and then uses the 'const' function to merge one variable
from each of them to form one continuous time series:
'sdfopen http://monsoondata.org:9090/dods/gfs.2002010800'
'sdfopen http://monsoondata.org:9090/dods/gfs.2002010800b'
'set lat 0'
'set lon 0'
'set t 1 31'
'define tt = const(t.1,0,-u) + const(t.2,0,-u)'
'd tt'
This second version of the script example creates
the same continuous time series using an analysis expression. This script
runs three times faster than the first version, and hits the server
half as many times.
baseurl = 'http://monsoondata.org:9090/dods/_expr_'
datasets = '{gfs.2002010800,gfs.2002010800b}'
expression = '{const(t.1,0,-u)+const(t.2,0,-u)}'
dimensions = '{0:0,0:0,1000:1000,00Z08JAN2002:00Z23JAN2002}'
'sdfopen '%baseurl%datasets%expression%dimensions
'set t 1 31'
'define tt = result.1'
'd tt'
Note however, that there is an overhead on the server associated with
each analysis expression. Thus, if the size of the expression output
is the same as, or larger than, its inputs, it will be more efficient
to retrieve the inputs first, and do the analysis locally.
- Try to move data requests outside loops. If you are looping
over a grid of data points, retrieve the whole area you intend to use
with a single request when possible, and store it locally for use in
the loop, as in the sketch below. Otherwise you will be making a new
network request for each data point inside the loop, which can cause
extremely slow performance.
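For instance, the following sketch (the variable name t2m, the region,
and the loop bounds are all hypothetical) retrieves a region once with
'define', then loops over grid points entirely from local memory:
'set lon 233 295'
'set lat 22 52'
'define tgrid = t2m'
* the loop below reads from the defined variable, not from the network
i = 1
while (i <= 25)
  'set x 'i
  'd tgrid'
  i = i + 1
endwhile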