Reading NetCDF and HDF Files with GrADS

Data files in the NetCDF and HDF file formats are called self-describing files (SDF) because the data and metadata are packaged together in the same file. GrADS can read data in NetCDF and HDF formatted files, as long as the data are on a regular grid. The HDF format is very general; the GrADS interface is limited to gridded data sets that fit into the internal 5-D lon/lat/lev/time/ensemble grid space. GrADS handles HDF4 Scientific Data Sets and (as of version 2.0.a7) some HDF5 files. To read the data in an SDF, GrADS needs a certain amount of metadata so that it can place the data in the internal grid space. There are three ways to provide it:

  1. Use the sdfopen command to open the file. This requires the least amount of effort for the user -- simply provide the file name (or an OPeNDAP URL) and GrADS does the rest. If you use the sdfopen command to open your SDF, then all the metadata in the file that GrADS requires must conform to the COARDS conventions. The 'sdfopen' interface does not support the HDF5 format. If sdfopen doesn't work, then ...

  2. Use the xdfopen command to open the file. This requires a bit more effort for the user -- you must write a data descriptor file to supplement or replace the existing metadata so that GrADS can understand it. The syntax of the descriptor file used with xdfopen is not exactly the same as that used in a descriptor file for gridded binary data -- see the documentation page for further details. The xdfopen command provides access to a greater number of SDFs, including many that do not conform to any known standard. The 'xdfopen' interface does not support the HDF5 format. If xdfopen doesn't work, then ...

  3. Use the open command to open the file. This requires the user to write a complete GrADS descriptor file to override all the metadata in the file. Guidance for composing a complete descriptor file for NetCDF, HDF-SDS, or HDF5 gridded data files is given below. Please also see the reference page Elements of a Data Descriptor File. The 'open' interface is recommended if you are templating large numbers of data files together, the data are pre-projected onto a non-lat/lon grid, the variables in the file have different undefined values, or the variables in the file have been packed in a non-standard way. The 'open' interface is the only way to read HDF5 files.

NetCDF and HDF-SDS Descriptor File Components

The data descriptor file is free format, which means the components of each record (line of text) are blank delimited and can appear in any order. Leading blanks at the beginning of each record are removed before parsing. Individual records may not be more than 255 characters long. Each record begins with a specific entry name, followed by a number of arguments or keywords, depending on the entry.
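
The record layout described above can be sketched in a few lines of Python. This only illustrates the stated rules (leading blanks dropped, blank-delimited fields, entry name first, 255-character limit); it is not GrADS's own parser. The function name `parse_record` is hypothetical. The sketch also assumes the length limit applies to the record as written, which the text does not specify.

```python
MAX_RECORD_LENGTH = 255  # limit stated in the text above


def parse_record(line):
    """Split one descriptor record into (entry_name, arguments).

    Illustrative only: mirrors the prose description, not GrADS internals.
    """
    record = line.rstrip("\n")
    # Assumption: the limit is checked before leading blanks are removed.
    if len(record) > MAX_RECORD_LENGTH:
        raise ValueError("record exceeds 255 characters")
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading blanks, which matches the free-format description.
    fields = record.split()
    if not fields:
        return None  # blank line: nothing to parse
    return fields[0], fields[1:]
```

For instance, `parse_record("   UNDEF -9.99e8 _FillValue")` yields the entry name `UNDEF` and the two arguments as separate strings.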

Descriptor file entries used for NetCDF, HDF-SDS, and HDF5 files are:

DSET This entry points to the data file. See the reference page for more details.
DTYPE This entry takes either the 'netcdf' or the 'hdfsds' keyword.
(GrADS version 2.0.a7+) For HDF5, use the 'hdf5_grid' keyword.
TITLE It is good general practice to include a descriptive title in every GrADS descriptor file.
UNDEF

This entry specifies the undefined or missing data value. An optional second argument is the name of the attribute in the SDF that contains the undefined value. This should be used when individual variables in the data file have different undefined values. After data I/O, the missing values in the grid are converted from the variable undef to the file-wide undef (the numerical value in the first argument of the UNDEF record). Then it appears to GrADS that all variables have the same undef value, even if they don't in the SDF. Attribute names are case sensitive, and it is assumed that the name is identical for all variables in the SDF. If the name given does not match any attributes, or if no name is given, the file-wide undef value will be used.
Example: UNDEF -9.99e8 _FillValue

UNPACK This entry is used for data variables that are 'packed' -- i.e. non-float data that need to be converted to float by applying the following formula:
     y = x * scale_factor + add_offset
Only the attribute name for the scale factor is required. If your SDF does not have an offset attribute, the 2nd argument may be omitted, and the offset will be assigned the default value of 0.0. Attribute names are case sensitive, and it is assumed that the names are identical for all variables in the netcdf or hdfsds data file. If the names given do not match any attributes, the scale factor will be assigned a value of 1.0 and the offset will be assigned a value of 0.0. The transformation of packed data is done after the undef test has been applied.
Examples:
UNPACK scale_factor add_offset
UNPACK Slope Intercept
OPTIONS Valid keywords are 'yrev', 'zrev', 'template', and '365_day_calendar'.
CACHESIZE (GrADS version 2.0.a8+) This entry overrides the default size of the cache for reading HDF5 or NetCDF4 files. It is not relevant for other data types. It should not be necessary to set the cache size explicitly unless the data file has especially large chunks. Please see the documentation on compression.
PDEF (GrADS version 1.9b4+) This is used when the SDF contains data on a native projection other than lat/lon, such as a Lambert conformal or polar stereographic grid. See the PDEF documentation for more information.
XDEF
YDEF
ZDEF
TDEF
EDEF
These entries are used to describe the coordinate dimensions in the SDF. The syntax is the same as for binary files. See the reference page for more details. You can use the output from ncdump with the -c option to get information about the coordinate dimensions in the SDF.
VECTORPAIRS

This entry is for explicitly identifying vector component pairs. The VECTORPAIRS entry is only necessary if the data are on a native projection other than lat/lon (i.e., you are using PDEF) and if the winds have to be rotated from a grid-relative sense to an Earth-relative sense. (GrADS has to retrieve both the u and v components in order to do the rotation calculation.)

The arguments are the U-component and V-component variable names, separated by a comma, with no spaces. More than one pair of components may be listed; in this case, the pairs should be separated by a space.
Example:
VECTORPAIRS  u,v  u10,v10  uflx,vflx

VARS
through
ENDVARS

The variable declarations in an SDF descriptor file have a few special features, described below. It is not necessary to include a variable declaration for all the variables in the SDF, only those you wish to read with GrADS.

The varname field has the following syntax:
    SDF_name=>grads_name
SDF_name must exactly match the data variable name in the SDF -- it may contain uppercase letters and non-alphanumeric characters. The grads_name is an alias for SDF_name; it must be fewer than 16 characters long, start with an alphabetic character, and contain no uppercase letters or non-alphanumeric characters. The aliasing of variable names may be omitted (i.e., "SDF_name=>" does not precede grads_name) if the SDF_name already meets the criteria for GrADS variable names listed above. For dtype hdf5_grid, the SDF_name must contain the names of all the nested groups (separated by "/") to which the data set belongs (see example below).

The levs field is an integer that specifies the number of vertical levels the variable contains. Variables that do not have a Z dimension should have a levs value of 0. Variables that do have a Z dimension should have a levs value equal to the znum value specified in the ZDEF statement.

The units field is a comma-delimited list of the varying dimensions of the variable. The dimensions are expressed as x, y, z, t, and e and correspond to the five axes defined by XDEF, YDEF, ZDEF, TDEF, and EDEF. The order of the dimensions listed in the units field is important -- it must describe the shape of the variable as it was written to the SDF data file. For NetCDF files, this information appears in the output from ncdump next to the variable name. For HDF5 files, this information appears in the output from h5dump as the variable's dataspace.

Examples:
Height=>hgt   17   t,z,y,x   Geopotential Height (m)
/HDFEOS/GRIDS/ColumnAmountNO2/Data~Fields/CloudFraction=>cf  15  z,y,x  Cloud Fraction
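
The rules for the varname, levs, and units fields can be checked mechanically. The following Python sketch splits a variable declaration into its fields and validates the GrADS alias against the criteria listed above. The function name `parse_var_declaration` and the returned dictionary layout are hypothetical, chosen only for illustration; GrADS itself does not expose such a function.

```python
import re

# A valid GrADS alias, per the rules above: starts with a letter, contains
# only lowercase letters and digits, and is fewer than 16 characters long.
GRADS_NAME = re.compile(r"[a-z][a-z0-9]{0,14}")


def parse_var_declaration(line):
    """Split 'varname levs units description' into its four fields."""
    parts = line.split(None, 3)  # description may itself contain blanks
    if len(parts) < 3:
        raise ValueError("expected varname, levs, and units fields")
    varname, levs, units = parts[:3]
    description = parts[3] if len(parts) > 3 else ""

    # The alias is optional: without '=>' the same name serves both roles.
    if "=>" in varname:
        sdf_name, grads_name = varname.split("=>", 1)
    else:
        sdf_name = grads_name = varname
    if not GRADS_NAME.fullmatch(grads_name):
        raise ValueError("invalid GrADS variable name: " + grads_name)

    return {
        "sdf_name": sdf_name,      # name as stored in the SDF
        "grads_name": grads_name,  # alias used inside GrADS
        "levs": int(levs),         # 0 for variables without a Z dimension
        "dims": units.split(","),  # storage order, e.g. ['t', 'z', 'y', 'x']
        "description": description,
    }
```

Applied to the first example above, the `dims` entry is `['t', 'z', 'y', 'x']` and the alias is `hgt`. A declaration such as `Height 17 t,z,y,x` is rejected because `Height` contains an uppercase letter and has no alias.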

Usage Notes

  1. The NetCDF data types that GrADS currently handles are short, long, and float. The HDF-SDS data types that are handled are 8-bit ints (int8 and uint8), shorts (int16 and uint16), ints (int32 and uint32) and float. These are all converted to type float after the I/O is done.

  2. The sdfopen/xdfopen interface will automatically handle the unpacking of NetCDF data if the following conditions are met:
       a. The packed data type is "short"
       b. The constants used for the transformation are data type "float"
       c. The attribute names are "scale_factor" or "slope" and "add_offset" or "intercept"
    If the packed data in your SDF does not fit this description, then you must use the open command with a complete descriptor file, providing the attribute names in the UNPACK entry. In this case, the attribute data type may be short, long, float, or double.

  3. If the data in the SDF are not floating-point numbers and require a transformation using the attributes named in the UNPACK entry, GrADS assumes the variable undef value corresponds to the data values as they appear in the file, i.e., before they are transformed using a scale factor and offset. Missing packed data values are assigned the file-wide undef value and are never unpacked.

  4. If your data file contains a variable that varies in a non-world-coordinate dimension (e.g. histogram interval, spectral band, ensemble number) then you can put a non-negative integer in the list of varying dimensions that will become the array index of the extra dimension. For example:

    VAR=>hist0   0   0,y,x   First histogram interval for VAR
    VAR=>hist1   0   1,y,x   Second histogram interval for VAR
    VAR=>hist2   0   2,y,x   Third histogram interval for VAR

    Another option in this example would be to fill the unused Z axis with the histogram intervals:

    ZDEF 3 linear 1 1
    ...
    VAR=>hist   3   z,y,x   VAR Histogram

    In this case, it would appear to GrADS that variable 'hist' varies in Z, but the user would have to remember that the Z levels correspond to histogram intervals and not pressure levels. The latter technique makes it easier to slice through the data, but is not the most accurate representation. And if you don't have an unused world-coordinate axis available, then you still have a way to access all the dimensions of your data variable.

  5. Some SDFs have many more than four coordinate dimensions -- staggered longitude and latitude axes are one example. In this case, it is likely that there will be variables defined on different grids contained in the same SDF. GrADS can only handle one 4D grid per data file -- all the SDF variables listed in a descriptor file must share the same coordinate axes. Multiple descriptor files must be written to describe the variables defined on different grids.
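
The order of operations described for the UNPACK entry and in usage note 3 (undef test first, on the packed values, then the scale and offset transformation) can be written out as a short Python sketch. This illustrates the documented behavior and is not GrADS source code. The function name `unpack_values` and the sample numbers are hypothetical. The defaults of 1.0 and 0.0 follow the UNPACK description, and the file-wide undef of -9.99e8 is taken from the UNDEF example.

```python
def unpack_values(raw, scale_factor=1.0, add_offset=0.0,
                  var_undef=None, file_undef=-9.99e8):
    """Convert packed values to floats, handling missing data first."""
    unpacked = []
    for x in raw:
        if var_undef is not None and x == var_undef:
            # The undef test uses the value as stored in the file; missing
            # values get the file-wide undef and are never unpacked.
            unpacked.append(file_undef)
        else:
            # y = x * scale_factor + add_offset
            unpacked.append(x * scale_factor + add_offset)
    return unpacked


# Hypothetical packed shorts with -32767 marking missing data.
values = unpack_values([100, -32767, 200], scale_factor=0.01,
                       add_offset=273.15, var_undef=-32767)
```

Here the second element of `values` is the file-wide undef, while the other two have been scaled and offset. Testing for undef after the transformation would instead compare against a value that no longer appears in the data.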

Examples

  1. Here is a sample output from ncdump for a file containing ocean model output. This file contains eight coordinate dimensions and nine data variables, which are defined on different combinations of coordinate axes. Five separate descriptor files are required to describe all the variables: one for the velocity components u and v, another for velocity component w, a third for potential temperature, a fourth for wind stress components taux and tauy, and a fifth for surface variables hflx, sflx, and eta.

  2. The Weather Research and Forecasting (WRF) Model can generate NetCDF output on non-lat/lon grids. GrADS can read these files in their native format using a complete descriptor file with a PDEF entry. To extract the arguments for the PDEF entry, you can use the global attribute values, which describe the native grid parameters, as well as the data variables that provide the grid point lat/lon values. The WRF model uses staggered grids, just as the ocean model does in the example above. For the sake of clarity, this WRF ncdump output has been edited to show only the four coordinate axes that are relevant for the data variables used in the example descriptor files. There are many more data variables and coordinate dimensions in the actual output files. First, here is a sample descriptor file to get the native grid point longitude and latitude values. Note that no PDEF statement is included, and the XDEF and YDEF statements do not map to longitude and latitude; they are simply used as abstract grid increments. Any one of these grid points may be used as the reference point in the PDEF entry; this example uses grid point (1,1) with values (-125.898, 26.9628). Finally, here is the descriptor file for four data variables. The WRF model is highly configurable and under active development, so this example should be used only as a guideline.
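
As a rough guide to how the entries described on this page fit together in a complete descriptor file for use with the open command, the Python sketch below assembles one as text. It is not one of the descriptor files referred to in the examples above. The helper `build_descriptor`, the file name `example.nc`, the title, and all grid dimensions are hypothetical and would need to be replaced with values taken from your own file (for instance, from ncdump output).

```python
def build_descriptor(dset, title, undef, xdef, ydef, zdef, tdef,
                     variables, dtype="netcdf"):
    """Return the text of a GrADS descriptor file for an SDF.

    Each argument is the argument string for the entry of the same name;
    'variables' is a list of variable declaration records.
    """
    lines = [
        "DSET " + dset,
        "DTYPE " + dtype,
        "TITLE " + title,
        "UNDEF " + undef,
        "XDEF " + xdef,
        "YDEF " + ydef,
        "ZDEF " + zdef,
        "TDEF " + tdef,
        "VARS %d" % len(variables),  # VARS takes the number of variables
    ]
    lines.extend(variables)
    lines.append("ENDVARS")
    return "\n".join(lines) + "\n"


# All values below are hypothetical placeholders for illustration.
descriptor = build_descriptor(
    dset="^example.nc",  # '^' means relative to the descriptor file
    title="Hypothetical NetCDF data set",
    undef="-9.99e8 _FillValue",
    xdef="144 linear 0 2.5",
    ydef="73 linear -90 2.5",
    zdef="3 linear 1 1",
    tdef="12 linear 00Z01JAN2000 1mo",
    variables=["Height=>hgt  3  t,z,y,x  Geopotential Height (m)"],
)
```

The levs value of the single variable (3) matches the number of levels in the ZDEF entry, as the variable declaration rules require. Entries such as UNPACK, OPTIONS, PDEF, or VECTORPAIRS could be added to the list in the same way when the data call for them.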