Published:
Motivation:
Daymet is one of those datasets that shows up in countless environmental contexts.
When you search online “how to get daymet data” you will likely end up on the ORNL website describing their RESTful API retrieval process. You might also find ways to download complete Daymet datafiles manually, which is likely way more than you are looking for.
A few months ago I was experimenting with some different ways of accessing Daymet data and ended up accessing the data through the Microsoft Planetary Computer. I liked this method because it was able to be done using a Python package (planetary_computer) which was convenient since the rest of my project was in a Python environment. Also, I was interested in getting familiar with the Planetary Computer resources since there is so much data available.
It took a little bit of time to setup the scripts myself, so I though others might benefit from my sharing this code. Below, I briefly introduce the Daymet dataset and the Microsoft Planetary Computer resources. Then I share and explain my code, and show some example usage.
I’ve dropped the source code into a GitHub repo here: github.com/TrevorJA/PC_daymet_retrieval
Brief commentary on Daymet
Daymet is: “… a data product derived from a collection of algorithms and computer software designed to interpolate and extrapolate from daily meteorological observations to produce gridded estimates of daily weather parameters. Weather parameters generated include daily surfaces of minimum and maximum temperature, precipitation, vapor pressure, radiation, snow water equivalent, and day length produced on a 1 km x 1 km gridded surface” (ORNL, here)
The Daymet data is at daily scale, from 1980 through the present year for CONUS, Hawaii and Puerto Rico.
The data variables include:
- Precipitation (
prcp) - Shortwave Radiation (
srad) - Snow water equivalent (
swe) - Max and min air temperature (
tmaxandtmin) - Water vapor pressure (
vp)
Descriptions for each of these can be accessed here: https://daymet.ornl.gov/overview
Brief commentary on the Planetary Computer
The Microsoft Planetary Computer is a cloud resource which provides access to a multi-petabyte catalog of different datasets. The list of datasets is too long to list here, but I encourage you to checkout the Data Catalog to be familiar with datasets available.
Many of these datasets will be familiar to those working in environmental data science contexts, and span many different datatypes such as: biomass/vegetative landcover imagery (e.g., MODIS), climate and weather data (CONUS404, Daymet, gridMET), elevation maps, even the US census data.
Since the planetary computer provides access to so many different datasets, I thought it was worthwhile to get familiar with their Python API in hopes that I can come back to it in the future.
The Planetary Computer uses the SpatioTemporal Asset Catalogs (STAC) specifications for geospatial data. The STAC specification gives a standardized format for the hierarchical organization of large datasets, and is used throughout the Planetary Computer.
This structure consists of Items/Catalogs/Collections in a JSON format. Explanation of the STAC structure is beyond the scope of this post (and beyond my experience level), but I would point you to the STAC docs as a starting point.
In this case the “Collection” we are interested is: daymet-{timescale}-na
Where na is North America (can also use hi for Hawaii or pr for Puerto Rico Daymet data).
Code Explained
I made two Python classes to retrieve and then manage (save and load) the Daymet data, these are aptly called: DayMetRetriever and DayMetManager.
DayMetRetriever
The DayMetRetriever communicates with the Planetary Computer and will return the data as an xarray.Dataset.
For the code below, the DayMetRetriever expects to be provided a timescale during the initialization and can either be “annual”, “monthly” or “daily”.
One important thing here is that the Daymet data is designed with a Lambert Conformal Conic projection. So I include a method for transforming a standard WGS84 latitude/longitude in based bounding box to match the Daymet projection.
The class methods include:
DayMetRetriever.transform_bbox_to_daymet(): Transforms a standard WGS84 bounding box (longitude/latitude) into Daymet’s Lambert Conformal Conic projectionDayMetRetriever.get_daymet_in_bbox(): Retrieves the Daymet data for a specified geographic region
The retriever works by first initializing with a specified timescale (daily, monthly, or annual), then using the Planetary Computer API to fetch the data. Once the data is retrieved, it’s automatically subset to the specified bounding box.
import planetary_computer
import pystac_client
import pyproj
import pystac
import xarray as xr
import requests
from pathlib import Path
from typing import Union, List, Tuple
# Define Daymet projection for raw data
daymet_proj = "+proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-100 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"
class DayMetRetriever:
def __init__(self,
timescale: str = "annual"):
"""
Initializes the DayMetRetriever with a specified timescale.
Parameters
----------
timescale : str, optional
The timescale of the Daymet dataset to retrieve. Defaults to "annual".
"""
self.collection_name = f"daymet-{timescale}-na"
self.daymet_proj = daymet_proj
self.ds = None
# Configure logging to suppress Azure storage messages
import logging
logging.getLogger('azure.core.pipeline.policies.http_logging_policy').setLevel(logging.WARNING)
def transform_bbox_to_daymet(self, bbox):
"""
Transforms a WGS84 bounding box (lon/lat) to Daymet's Lambert Conformal Conic (x/y).
Parameters
----------
bbox : list of float
Bounding box as [west, south, east, north] in WGS84 (lon/lat).
Returns
-------
list of float
Bounding box transformed to Daymet’s coordinate system: [xmin, ymin, xmax, ymax].
"""
wgs84 = pyproj.CRS("EPSG:4326") # WGS84
daymet_crs = pyproj.CRS(daymet_proj) # Daymet LCC
transformer = pyproj.Transformer.from_crs(wgs84, daymet_crs, always_xy=True)
xmin, ymin = transformer.transform(bbox[0], bbox[1])
xmax, ymax = transformer.transform(bbox[2], bbox[3])
return [xmin, ymin, xmax, ymax]
def get_daymet_in_bbox(self,
bbox: List[float]) -> xr.Dataset:
"""
Retrieves and returns Daymet data for the specified bounding box.
Parameters
----------
bbox : list of float
Bounding box defined as [west, south, east, north] coordinates.
Returns
-------
xr.Dataset
The retrieved Daymet dataset as an xarray Dataset, containing 3 dimensional (x, y, time)
data for multiple climate variables.
"""
if len(bbox) != 4:
raise ValueError("bbox must contain [west, south, east, north] coordinates")
req = requests.get(
f"https://planetarycomputer.microsoft.com/api/stac/v1/collections/{self.collection_name}"
)
collection = pystac.Collection.from_dict(req.json())
signed = planetary_computer.sign(collection.assets["zarr-abfs"])
storage_options = signed.to_dict()['xarray:storage_options']
# get the dataset
self.ds = xr.open_zarr(signed.href,
consolidated=True,
storage_options=storage_options)
# subset to the bounding box
bbox_xy = self.transform_bbox_to_daymet(bbox)
self.ds = self.ds.sel(x=slice(bbox_xy[0], bbox_xy[2]),
y=slice(bbox_xy[3], bbox_xy[1]))
return self.ds
DayMetManager
The DayMetManager simples saves and loads data from a NetCDF file.
The class has two main methods:
DayMetManager.save_netcdf(): Save anxarray.Datasetto a NetCDF file.DayMetManager.load_netcdf(): Loads the NetCDF file back into anxarray.Dataset
Given that this class is not very exicting or interesting, I am going to leave the details out of this post and refer you to the source file on GitHub: PC_daymet_retrieval/daymet_retriever.py and the example usage below.
Example Usage
The code below shows a short demo usage of the DayMetRetriever and DayMetManager. In this example, I get annual data for the upper corner of the Pacific North West as defined by the bounded box (bbox).
from daymet_retriver import DayMetRetriever, DayMetManager
### Define the bounding box of interest
bbox = [-127.199707,44.710922,-116.938477,50.553891] # PNW
timescale = "annual" # "daily", "monthly", or "annual"
### Retrieve the DayMet data from the planetary computer
# Initialize the DayMetRetriever
print("Retrieving DayMet data...")
retriever = DayMetRetriever(timescale=timescale)
ds_raw = retriever.get_daymet_in_bbox(bbox=bbox)
# take only the prcp for this example,
# exporting the full dataset is very slow
ds_simplified = ds_raw[["prcp"]]
### Save the raw dataset to netCDF
print("Saving dataset to netCDF...")
manager = DayMetManager(base_dir="./data")
output_filename = f"daymet_{timescale}_prcp.nc"
manager.save_netcdf(ds_simplified, output_filename)
print("Done!")
To explain what’s happening in this example:
- First, I define a bounding box for the Pacific Northwest region using longitude/latitude coordinates and specify that I want annual data rather than daily or monthly.
- I create a
DayMetRetrieverinstance with our chosen timescale and use it to fetch data. - To make this example simpler/faster, I only keep the precipitation (
prcp) data, but there is no reason you need to do this. - Finally, I use the
DayMetManagerto save the precipitation data as a NetCDF file in a local directory.
The example_load.py script shows how to use the DayMetManager.load_netcdf() functionality and plots the following visual of the average annual precipitation in the region:
| ![[annual_prcp_temporal_avg.png | center | 400]] |
Wrap-up
The motivation for this post was two-fold:
- Give a simple, ready-to-use code to help people gain access to Daymet data
- Raise awareness of the Planetary Computer so that you might take advantage of the other datasets available
I hope you got something out of it, if you made it this far.
Also, if you use the code and find any issues or make improvements, please do share those in the GitHub.
