8 minute read

Published:

Motivation:

Daymet is one of those datasets that shows up in countless environmental contexts.

When you search online “how to get daymet data” you will likely end up on the ORNL website describing their RESTful API retrieval process. You might also find ways to download complete Daymet datafiles manually, which is likely way more than you are looking for.

A few months ago I was experimenting with some different ways of accessing Daymet data and ended up accessing the data through the Microsoft Planetary Computer. I liked this method because it was able to be done using a Python package (planetary_computer) which was convenient since the rest of my project was in a Python environment. Also, I was interested in getting familiar with the Planetary Computer resources since there is so much data available.

It took a little bit of time to setup the scripts myself, so I though others might benefit from my sharing this code. Below, I briefly introduce the Daymet dataset and the Microsoft Planetary Computer resources. Then I share and explain my code, and show some example usage.

I’ve dropped the source code into a GitHub repo here: github.com/TrevorJA/PC_daymet_retrieval


Brief commentary on Daymet

Daymet is: “… a data product derived from a collection of algorithms and computer software designed to interpolate and extrapolate from daily meteorological observations to produce gridded estimates of daily weather parameters. Weather parameters generated include daily surfaces of minimum and maximum temperature, precipitation, vapor pressure, radiation, snow water equivalent, and day length produced on a 1 km x 1 km gridded surface” (ORNL, here)

The Daymet data is at daily scale, from 1980 through the present year for CONUS, Hawaii and Puerto Rico.

The data variables include:

  • Precipitation (prcp)
  • Shortwave Radiation (srad)
  • Snow water equivalent (swe)
  • Max and min air temperature (tmax and tmin)
  • Water vapor pressure (vp)

Descriptions for each of these can be accessed here: https://daymet.ornl.gov/overview

Brief commentary on the Planetary Computer

The Microsoft Planetary Computer is a cloud resource which provides access to a multi-petabyte catalog of different datasets. The list of datasets is too long to list here, but I encourage you to checkout the Data Catalog to be familiar with datasets available.

Many of these datasets will be familiar to those working in environmental data science contexts, and span many different datatypes such as: biomass/vegetative landcover imagery (e.g., MODIS), climate and weather data (CONUS404, Daymet, gridMET), elevation maps, even the US census data.

Since the planetary computer provides access to so many different datasets, I thought it was worthwhile to get familiar with their Python API in hopes that I can come back to it in the future.

The Planetary Computer uses the SpatioTemporal Asset Catalogs (STAC) specifications for geospatial data. The STAC specification gives a standardized format for the hierarchical organization of large datasets, and is used throughout the Planetary Computer.

This structure consists of Items/Catalogs/Collections in a JSON format. Explanation of the STAC structure is beyond the scope of this post (and beyond my experience level), but I would point you to the STAC docs as a starting point.

In this case the “Collection” we are interested is: daymet-{timescale}-na

Where na is North America (can also use hi for Hawaii or pr for Puerto Rico Daymet data).

Code Explained

I made two Python classes to retrieve and then manage (save and load) the Daymet data, these are aptly called: DayMetRetriever and DayMetManager.

DayMetRetriever

The DayMetRetriever communicates with the Planetary Computer and will return the data as an xarray.Dataset.

For the code below, the DayMetRetriever expects to be provided a timescale during the initialization and can either be “annual”, “monthly” or “daily”.

One important thing here is that the Daymet data is designed with a Lambert Conformal Conic projection. So I include a method for transforming a standard WGS84 latitude/longitude in based bounding box to match the Daymet projection.

The class methods include:

  • DayMetRetriever.transform_bbox_to_daymet(): Transforms a standard WGS84 bounding box (longitude/latitude) into Daymet’s Lambert Conformal Conic projection
  • DayMetRetriever.get_daymet_in_bbox(): Retrieves the Daymet data for a specified geographic region

The retriever works by first initializing with a specified timescale (daily, monthly, or annual), then using the Planetary Computer API to fetch the data. Once the data is retrieved, it’s automatically subset to the specified bounding box.

import planetary_computer
import pystac_client
import pyproj
import pystac
import xarray as xr
import requests

from pathlib import Path
from typing import Union, List, Tuple



# Define Daymet projection for raw data
daymet_proj = "+proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-100 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"


class DayMetRetriever:
    def __init__(self, 
                 timescale: str = "annual"):
        """
        Initializes the DayMetRetriever with a specified timescale.

        Parameters
        ----------
        timescale : str, optional
            The timescale of the Daymet dataset to retrieve. Defaults to "annual".
        """
        self.collection_name = f"daymet-{timescale}-na"
        self.daymet_proj = daymet_proj
        self.ds = None
        
        # Configure logging to suppress Azure storage messages
        import logging
        logging.getLogger('azure.core.pipeline.policies.http_logging_policy').setLevel(logging.WARNING)

    def transform_bbox_to_daymet(self, bbox):
        """
        Transforms a WGS84 bounding box (lon/lat) to Daymet's Lambert Conformal Conic (x/y).
        
        Parameters
        ----------
        bbox : list of float
            Bounding box as [west, south, east, north] in WGS84 (lon/lat).
        
        Returns
        -------
        list of float
            Bounding box transformed to Daymet’s coordinate system: [xmin, ymin, xmax, ymax].
        """
        wgs84 = pyproj.CRS("EPSG:4326")  # WGS84
        daymet_crs = pyproj.CRS(daymet_proj)  # Daymet LCC

        transformer = pyproj.Transformer.from_crs(wgs84, daymet_crs, always_xy=True)

        xmin, ymin = transformer.transform(bbox[0], bbox[1])
        xmax, ymax = transformer.transform(bbox[2], bbox[3])

        return [xmin, ymin, xmax, ymax]


    def get_daymet_in_bbox(self, 
                           bbox: List[float]) -> xr.Dataset:
        """
        Retrieves and returns Daymet data for the specified bounding box.

        Parameters
        ----------
        bbox : list of float
            Bounding box defined as [west, south, east, north] coordinates.

        Returns
        -------
        xr.Dataset
            The retrieved Daymet dataset as an xarray Dataset, containing 3 dimensional (x, y, time)
            data for multiple climate variables.
        """
        if len(bbox) != 4:
            raise ValueError("bbox must contain [west, south, east, north] coordinates")
            
        req = requests.get(
            f"https://planetarycomputer.microsoft.com/api/stac/v1/collections/{self.collection_name}"
            )
        
        collection = pystac.Collection.from_dict(req.json())
        
        signed = planetary_computer.sign(collection.assets["zarr-abfs"])
        storage_options = signed.to_dict()['xarray:storage_options']
        
        # get the dataset
        self.ds = xr.open_zarr(signed.href, 
                               consolidated=True,
                               storage_options=storage_options)
        
        
        # subset to the bounding box
        bbox_xy = self.transform_bbox_to_daymet(bbox)
        self.ds = self.ds.sel(x=slice(bbox_xy[0], bbox_xy[2]), 
                              y=slice(bbox_xy[3], bbox_xy[1]))
        
        return self.ds

DayMetManager

The DayMetManager simples saves and loads data from a NetCDF file.

The class has two main methods:

  • DayMetManager.save_netcdf(): Save an xarray.Dataset to a NetCDF file.
  • DayMetManager.load_netcdf(): Loads the NetCDF file back into an xarray.Dataset

Given that this class is not very exicting or interesting, I am going to leave the details out of this post and refer you to the source file on GitHub: PC_daymet_retrieval/daymet_retriever.py and the example usage below.

Example Usage

The code below shows a short demo usage of the DayMetRetriever and DayMetManager. In this example, I get annual data for the upper corner of the Pacific North West as defined by the bounded box (bbox).

from daymet_retriver import DayMetRetriever, DayMetManager

### Define the bounding box of interest
bbox = [-127.199707,44.710922,-116.938477,50.553891] # PNW
timescale = "annual" # "daily", "monthly", or "annual"


### Retrieve the DayMet data from the planetary computer
# Initialize the DayMetRetriever
print("Retrieving DayMet data...")
retriever = DayMetRetriever(timescale=timescale)
ds_raw = retriever.get_daymet_in_bbox(bbox=bbox)


# take only the prcp for this example,
# exporting the full dataset is very slow
ds_simplified = ds_raw[["prcp"]]


### Save the raw dataset to netCDF
print("Saving dataset to netCDF...")

manager = DayMetManager(base_dir="./data")

output_filename = f"daymet_{timescale}_prcp.nc"
manager.save_netcdf(ds_simplified, output_filename)

print("Done!")

To explain what’s happening in this example:

  1. First, I define a bounding box for the Pacific Northwest region using longitude/latitude coordinates and specify that I want annual data rather than daily or monthly.
  2. I create a DayMetRetriever instance with our chosen timescale and use it to fetch data.
  3. To make this example simpler/faster, I only keep the precipitation (prcp) data, but there is no reason you need to do this.
  4. Finally, I use the DayMetManager to save the precipitation data as a NetCDF file in a local directory.

The example_load.py script shows how to use the DayMetManager.load_netcdf() functionality and plots the following visual of the average annual precipitation in the region:

![[annual_prcp_temporal_avg.pngcenter400]]

Wrap-up

The motivation for this post was two-fold:

  1. Give a simple, ready-to-use code to help people gain access to Daymet data
  2. Raise awareness of the Planetary Computer so that you might take advantage of the other datasets available

I hope you got something out of it, if you made it this far.

Also, if you use the code and find any issues or make improvements, please do share those in the GitHub.