Modes of ecco_access#

Andrew Delman, 2024-10-10

Introduction

Query-only modes

Direct download modes

In-cloud only access modes

Time comparison of access modes

Introduction#

In the previous tutorial the ecco_access library was introduced, with a few examples of how it can be used to search and access ECCOv4 output available from PO.DAAC. This tutorial summarizes and compares the various modes that ecco_access supports for HTTPS and (in-cloud) S3 data access. We’ll use each mode to search/download/retrieve native grid monthly SSH and wind stress datasets for 6 months (Jan-Jun 2010).

As the time comparison at the end will show, mode = s3_open_fsspec is typically fastest when you are working in the cloud and have the necessary json files available. Hence the other tutorials in the ecco-2024 repository generally use this access mode.

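import numpy as np
import xarray as xr
import matplotlib.pyplot as plt  # used for the SST plot later on
from os.path import join,expanduser

import ecco_access as ea

# home directory, used to build the download paths below
user_home_dir = expanduser('~')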

SSH_shortname = 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4'
windstr_shortname = 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4'

Query-only modes#

These modes return the URLs (ls/query) or S3 file paths (s3_ls/s3_query) to access the ECCO output. These modes only work with ecco_podaac_access (not ecco_podaac_to_xrdataset), since we are not opening a dataset, just querying the location of the data.

Note

The ls and query modes are interchangeable: they are two names for the same functionality. The same is true for s3_ls and s3_query.

ls/query mode#

urls_dict = ea.ecco_podaac_access([SSH_shortname,windstr_shortname],\
                                    StartDate='2010-01',EndDate='2010-06',\
                                    mode='ls')
urls_dict[SSH_shortname]
['https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc']
urls_dict[windstr_shortname]
['https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc']
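
Because the query modes return plain lists of strings, it can be useful to check that a listing covers the months you asked for before downloading anything. The sketch below is not part of ecco_access: the helper name granule_month and its regular expression are assumptions based only on the file names shown in the output above.

```python
import re
from os.path import basename

def granule_month(path_or_url):
    """Return the 'YYYY-MM' stamp embedded in a monthly granule file name, or None."""
    # assumes names of the form ..._mon_mean_YYYY-MM_ECCO_V4r4_..., as in the listing above
    match = re.search(r'_(\d{4}-\d{2})_ECCO_', basename(path_or_url))
    return match.group(1) if match else None

# first and last SSH URLs from the listing above
url_root = ('https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/'
            'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/')
sample_urls = [
    url_root + 'SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
    url_root + 'SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc',
]

months = [granule_month(url) for url in sample_urls]
print(months)
```

The same helper could be applied to urls_dict[SSH_shortname] or to the S3 paths returned by the s3_ls/s3_query mode, since only the file name is inspected.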

s3_ls/s3_query mode#

You can use the s3_ls/s3_query mode to find the S3 bucket file paths for AWS in-cloud access:

s3_paths_dict = ea.ecco_podaac_access([SSH_shortname,windstr_shortname],\
                                        StartDate='2010-01',EndDate='2010-06',\
                                        mode='s3_query')
{'ShortName': 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}
{'ShortName': 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}
s3_paths_dict
{'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4': ['s3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc'],
 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4': ['s3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
  's3://podaac-ops-cumulus-protected/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc']}

The dictionary returned has the locations of each of the queried files on S3. Note that to read the data, the files need to be opened, e.g. using s3fs and NASA Earthdata authentication.

# log in to NASA Earthdata
# (will prompt for credentials if they are not already in ~/.netrc)
ea.setup_earthdata_login_auth()

import requests
import s3fs

# obtain NASA Earthdata credentials for in-cloud S3 access
creds = requests.get('https://archive.podaac.earthdata.nasa.gov/s3credentials').json()
s3 = s3fs.S3FileSystem(anon=False,
                       key=creds['accessKeyId'],
                       secret=creds['secretAccessKey'],
                       token=creds['sessionToken'])

# use list comprehension to open files 
# and create list that can be passed to xarray file opener
open_SSH_files = [s3.open(file) for file in s3_paths_dict[SSH_shortname]]

# open xarray dataset
ds_SSH_curr = xr.open_mfdataset(open_SSH_files,\
                                compat='override',data_vars='minimal',coords='minimal',\
                                parallel=True)
ds_SSH_curr
<xarray.Dataset> Size: 15MB
Dimensions:    (time: 6, tile: 13, j: 90, i: 90, i_g: 90, j_g: 90, nv: 2, nb: 4)
Coordinates: (12/13)
  * i          (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * i_g        (i_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * j          (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j_g        (j_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * tile       (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time       (time) datetime64[ns] 48B 2010-01-16T12:00:00 ... 2010-06-16
    ...         ...
    YC         (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    XG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    YG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    time_bnds  (time, nv) datetime64[ns] 96B dask.array<chunksize=(1, 2), meta=np.ndarray>
    XC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>
    YC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>
Dimensions without coordinates: nv, nb
Data variables:
    SSH        (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    SSHIBC     (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    SSHNOIBC   (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    ETAN       (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
Attributes: (12/57)
    acknowledgement:              This research was carried out by the Jet Pr...
    author:                       Ian Fenty and Ou Wang
    cdm_data_type:                Grid
    comment:                      Fields provided on the curvilinear lat-lon-...
    Conventions:                  CF-1.8, ACDD-1.3
    coordinates_comment:          Note: the global 'coordinates' attribute de...
    ...                           ...
    time_coverage_duration:       P1M
    time_coverage_end:            2010-02-01T00:00:00
    time_coverage_resolution:     P1M
    time_coverage_start:          2010-01-01T00:00:00
    title:                        ECCO Sea Surface Height - Monthly Mean llc9...
    uuid:                         9ce7afa6-400c-11eb-ab45-0cc47a3f49c3

Direct download modes#

download mode#

The download mode directly downloads the queried files under a root directory of your choosing, creating the directory if needed. If ecco_podaac_access is called using this mode, the dictionary returned includes list(s) of the downloaded files that can be passed to xarray.open_mfdataset (or xarray.open_dataset, one file at a time). If ecco_podaac_to_xrdataset is used, the xarray.open_mfdataset step is included and an xarray Dataset is returned.

files_dict = ea.ecco_podaac_access([SSH_shortname,windstr_shortname],\
                                    StartDate='2010-01',EndDate='2010-06',\
                                    mode='download',\
                                    download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC'))
created download directory /home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4
DL Progress: 100%|###########################| 6/6 [00:02<00:00,  2.28it/s]

=====================================
total downloaded: 35.5 Mb
avg download speed: 13.44 Mb/s
Time spent = 2.6406123638153076 seconds


created download directory /home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4
DL Progress: 100%|###########################| 6/6 [00:02<00:00,  2.46it/s]

=====================================
total downloaded: 35.79 Mb
avg download speed: 14.6 Mb/s
Time spent = 2.451402187347412 seconds
files_dict
{'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4': ['/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc'],
 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4': ['/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
  '/home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4/OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc']}

Alternatively, ecco_podaac_to_xrdataset can be called with the same mode. We first remove the files just downloaded, then time the call so that it can be compared with the other access modes:

import os

# remove files just downloaded
for shortname,files_list in files_dict.items():
    for file in files_list:
        os.remove(file)
%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                    StartDate='2010-01',EndDate='2010-06',\
                                    mode='download',\
                                    download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC'))
created download directory /home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4
DL Progress: 100%|###########################| 6/6 [00:01<00:00,  3.21it/s]

=====================================
total downloaded: 35.5 Mb
avg download speed: 18.89 Mb/s
Time spent = 1.878847599029541 seconds


created download directory /home/jovyan/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4
DL Progress: 100%|###########################| 6/6 [00:02<00:00,  2.49it/s]

=====================================
total downloaded: 35.79 Mb
avg download speed: 14.78 Mb/s
Time spent = 2.422295331954956 seconds


CPU times: user 929 ms, sys: 793 ms, total: 1.72 s
Wall time: 5.52 s

Since multiple datasets were queried, the results of ecco_podaac_to_xrdataset are returned in the form of a dictionary. The contents of the wind stress dataset can be seen here:

ds_dict[windstr_shortname]
<xarray.Dataset> Size: 15MB
Dimensions:    (time: 6, tile: 13, j: 90, i: 90, i_g: 90, j_g: 90, nv: 2, nb: 4)
Coordinates: (12/13)
  * i          (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * i_g        (i_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * j          (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j_g        (j_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * tile       (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time       (time) datetime64[ns] 48B 2010-01-16T12:00:00 ... 2010-06-16
    ...         ...
    YC         (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    XG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    YG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    time_bnds  (time, nv) datetime64[ns] 96B dask.array<chunksize=(1, 2), meta=np.ndarray>
    XC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>
    YC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>
Dimensions without coordinates: nv, nb
Data variables:
    EXFtaux    (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    EXFtauy    (time, tile, j, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    oceTAUX    (time, tile, j, i_g) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
    oceTAUY    (time, tile, j_g, i) float32 3MB dask.array<chunksize=(1, 13, 90, 90), meta=np.ndarray>
Attributes: (12/57)
    acknowledgement:              This research was carried out by the Jet Pr...
    author:                       Ian Fenty and Ou Wang
    cdm_data_type:                Grid
    comment:                      Fields provided on the curvilinear lat-lon-...
    Conventions:                  CF-1.8, ACDD-1.3
    coordinates_comment:          Note: the global 'coordinates' attribute de...
    ...                           ...
    time_coverage_duration:       P1M
    time_coverage_end:            2010-02-01T00:00:00
    time_coverage_resolution:     P1M
    time_coverage_start:          2010-01-01T00:00:00
    title:                        ECCO Ocean and Sea-Ice Surface Stress - Mon...
    uuid:                         b48fa40a-400d-11eb-9063-0cc47a3f49c3

download_ifspace mode#

This mode is similar to download, but it will also query how much storage is available at the target download location before carrying out downloads, and returns an error if the space to be occupied by the downloaded files is more than a specified fraction of available storage. The function also takes into account if some or all of the queried files are already on disk, and therefore do not need to be downloaded again.

%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                    StartDate='2010-01',EndDate='2010-06',\
                                    mode='download_ifspace',\
                                    download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC'))
Size of files to be downloaded to instance is 0.0 GB,
which is 0.0% of the 4.538 GB available storage.
Proceeding with file downloads via NASA Earthdata URLs

SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading
DL Progress: 100%|########################| 6/6 [00:00<00:00, 71902.35it/s]

=====================================
total downloaded: 0.0 Mb
avg download speed: 0.0 Mb/s
Time spent = 0.0026917457580566406 seconds



OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading

OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc already exists, and force=False, not re-downloading
DL Progress: 100%|########################| 6/6 [00:00<00:00, 85307.88it/s]

=====================================
total downloaded: 0.0 Mb
avg download speed: 0.0 Mb/s
Time spent = 0.002772808074951172 seconds


CPU times: user 210 ms, sys: 102 ms, total: 312 ms
Wall time: 482 ms

Notice that the size of files to be downloaded was zero, since all of these files were already on disk at the specified location.
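
The check described for this mode can be sketched with the standard library. This is not the ecco_access implementation: the function names, the file_sizes input, and the toy file names are hypothetical, and the 0.5 default simply mirrors the max_avail_frac default mentioned later in this tutorial.

```python
import os
import shutil
import tempfile

def bytes_still_needed(file_sizes, target_dir):
    """Sum the sizes (in bytes) of the files that are not yet present in target_dir."""
    return sum(size for name, size in file_sizes.items()
               if not os.path.exists(os.path.join(target_dir, name)))

def fits_in_storage(n_bytes, target_dir, max_avail_frac=0.5):
    """True if n_bytes is at most max_avail_frac of the free space at target_dir."""
    free_bytes = shutil.disk_usage(target_dir).free
    return n_bytes <= max_avail_frac * free_bytes

# toy demonstration: one of two hypothetical granules is already on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    sizes = {'granule_a.nc': 6_000_000, 'granule_b.nc': 6_000_000}
    with open(os.path.join(tmp_dir, 'granule_a.nc'), 'wb'):
        pass  # create an empty placeholder file
    needed = bytes_still_needed(sizes, tmp_dir)
    zero_fits = fits_in_storage(0, tmp_dir)
    ok_to_download = fits_in_storage(needed, tmp_dir)

print(needed, ok_to_download)
```

When every requested file already exists, the amount still needed is zero, which matches the "0.0 GB" reported in the output above.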

# remove downloaded files
for shortname,files_list in files_dict.items():
    for file in files_list:
        os.remove(file)

download_subset mode#

The download_subset mode is essentially a wrapper for the ecco_podaac_download_subset function, which uses OPeNDAP to allow spatial, temporal, and variable-based subsetting of ECCO datasets and granules at the download stage. Depending on the size of the source dataset (e.g., whether the dataset has a depth dimension or not), this mode may be faster or slower than downloading the full granule files; it will almost certainly be slower than using mode = s3_open_fsspec when you have the json files available. But it can be a space- and time-saver when you are working on your local machine and not in the cloud.

Multiple examples of the ecco_podaac_download_subset functionality are provided in the Downloading Subsets of ECCO Datasets tutorial. Here is one example using ecco_podaac_to_xrdataset to open one year of monthly SST using mode = download_subset (in the ECCOv4r4 output, SST is the top depth layer of THETA):

ds_SST_2015 = ea.ecco_podaac_to_xrdataset('ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4',\
                                            StartDate='2015-01',EndDate='2015-12',\
                                            mode='download_subset',\
                                            download_root_dir=join(user_home_dir,'Downloads',\
                                                                   'ECCO_V4r4_PODAAC','SST_global'),\
                                            vars_to_include=['THETA'],\
                                            k_isel=[0,1,1])
Creating download directory /home/jovyan/Downloads/ECCO_V4r4_PODAAC/SST_global/ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4

Please wait while program searches for the granules ...


Total number of matching granules: 12
DL Progress: 100%|#########################| 12/12 [00:31<00:00,  2.61s/it]

=====================================
total downloaded: 16.73 Mb
avg download speed: 0.53 Mb/s
Time spent = 31.296209812164307 seconds
ds_SST_2015
<xarray.Dataset> Size: 10MB
Dimensions:    (time: 12, k: 1, tile: 13, j: 90, i: 90, j_g: 90, i_g: 90,
                k_p1: 2, k_l: 1, nb: 4, k_u: 1, nv: 2)
Coordinates: (12/24)
    XG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    Zp1        (k_p1) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
    Zl         (k_l) float32 4B dask.array<chunksize=(1,), meta=np.ndarray>
    YC         (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    XC         (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    YG         (tile, j_g, i_g) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>
    ...         ...
  * k_p1       (k_p1) int32 8B 0 1
  * k_u        (k_u) int32 4B 0
  * nb         (nb) float32 16B 0.0 1.0 2.0 3.0
  * nv         (nv) float32 8B 0.0 1.0
  * tile       (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time       (time) datetime64[ns] 96B 2015-01-16T12:00:00 ... 2015-12-16T1...
Data variables:
    THETA      (time, k, tile, j, i) float32 5MB dask.array<chunksize=(1, 1, 13, 90, 90), meta=np.ndarray>
Attributes: (12/63)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_end:               2015-02-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2015-01-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            f4cfd0f6-4181-11eb-9946-0cc47a3f4815
    history_json:                    [{"$schema":"https:\/\/harmony.earthdata...
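
The k: 1 dimension in the output above is consistent with reading k_isel=[0,1,1] as [start, end, stride] with Python slicing conventions (end excluded). That reading is an assumption here, so check the ecco_podaac_download_subset documentation before relying on it. Under that assumption, a selection can be previewed with a plain slice; the helper name, the 50 depth levels, and the second index list are illustrative only.

```python
def preview_isel(isel, dim_length):
    """Indices selected if isel = [start, end, stride] behaves like a Python slice (assumed)."""
    start, end, stride = isel
    return list(range(dim_length))[slice(start, end, stride)]

n_depth_levels = 50  # illustrative length of the k (depth) dimension

top_layer = preview_isel([0, 1, 1], n_depth_levels)           # as in the call above
every_fifth_upper = preview_isel([0, 20, 5], n_depth_levels)  # hypothetical request

print(top_layer)
print(every_fifth_upper)
```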

Now plot tile 11 of SST during Dec 2015:

ds_SST_2015.THETA.isel(time=11,k=0,tile=11).plot(cmap='RdYlBu_r')
plt.title('SST, Dec 2015')
plt.show()
[Figure: SST (top layer of THETA), tile 11, Dec 2015]

In-cloud only access modes#

s3_open mode#

If you are working in the AWS cloud (in region us-west-2), you can open files from S3 storage without downloading them; this is called “direct access”.

%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                        StartDate='2010-01',EndDate='2010-06',\
                                        mode='s3_open')
{'ShortName': 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
{'ShortName': 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
CPU times: user 3.01 s, sys: 1.05 s, total: 4.06 s
Wall time: 14.1 s

s3_open_fsspec mode#

The s3_open mode allows you to access data “remotely” from S3, but it is usually slower than downloading the data. However, the fsspec and kerchunk libraries provide an efficient way to access data by storing pointers to data chunks in json files. These files have been produced for the ECCO datasets, and by using mode = s3_open_fsspec we can access the data much more quickly without downloading it!

Note

For ECCO Hackweek participants, the json files needed for this mode are available on the efs_ecco volume at /efs_ecco/mzz-jsons/. Specify jsons_root_dir=join('/efs_ecco','mzz-jsons') to use this mode.

%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                        StartDate='2010-01',EndDate='2010-06',\
                                        mode='s3_open_fsspec',\
                                        jsons_root_dir=join('/efs_ecco','mzz-jsons'))
CPU times: user 77.7 ms, sys: 4.97 ms, total: 82.6 ms
Wall time: 3.34 s
ds = xr.merge(list(ds_dict.values()))
ds
<xarray.Dataset> Size: 25MB
Dimensions:    (time: 6, tile: 13, j: 90, i: 90, nb: 4, j_g: 90, i_g: 90, nv: 2)
Coordinates: (12/13)
    XC         (tile, j, i) float32 421kB -111.6 -111.3 -110.9 ... -105.6 -111.9
    XC_bnds    (tile, j, i, nb) float32 2MB -115.0 -115.0 ... -115.0 -108.5
    XG         (tile, j_g, i_g) float32 421kB -115.0 -115.0 ... -102.9 -109.0
    YC         (tile, j, i) float32 421kB -88.24 -88.38 -88.52 ... -88.08 -88.1
    YC_bnds    (tile, j, i, nb) float32 2MB -88.18 -88.32 ... -88.18 -88.16
    YG         (tile, j_g, i_g) float32 421kB -88.18 -88.32 ... -87.99 -88.02
    ...         ...
  * i_g        (i_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * j          (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j_g        (j_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89
  * tile       (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time       (time) datetime64[ns] 48B 2010-01-16T12:00:00 ... 2010-06-16
    time_bnds  (time, nv) datetime64[ns] 96B 2010-01-01 ... 2010-07-01
Dimensions without coordinates: nb, nv
Data variables:
    ETAN       (time, tile, j, i) float32 3MB ...
    SSH        (time, tile, j, i) float32 3MB ...
    SSHIBC     (time, tile, j, i) float32 3MB ...
    SSHNOIBC   (time, tile, j, i) float32 3MB ...
    EXFtaux    (time, tile, j, i) float32 3MB ...
    EXFtauy    (time, tile, j, i) float32 3MB ...
    oceTAUX    (time, tile, j, i_g) float32 3MB ...
    oceTAUY    (time, tile, j_g, i) float32 3MB ...
Attributes: (12/57)
    Conventions:                  CF-1.8, ACDD-1.3
    acknowledgement:              This research was carried out by the Jet Pr...
    author:                       Ian Fenty and Ou Wang
    cdm_data_type:                Grid
    comment:                      Fields provided on the curvilinear lat-lon-...
    coordinates_comment:          Note: the global 'coordinates' attribute de...
    ...                           ...
    time_coverage_duration:       P1M
    time_coverage_end:            1992-02-01T00:00:00
    time_coverage_resolution:     P1M
    time_coverage_start:          1992-01-01T12:00:00
    title:                        ECCO Sea Surface Height - Monthly Mean llc9...
    uuid:                         9302811e-400c-11eb-b69e-0cc47a3f49c3
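
To give a feel for what "pointers to data chunks" means, here is a toy sketch using only the standard library. It assumes a simplified kerchunk-style layout in which each key maps either to an inline string or to a [file, byte offset, byte length] triple; the actual ECCO json files are read through fsspec and may be organized differently. The function name, keys, and toy file are hypothetical.

```python
import json
import os
import tempfile

def read_reference(refs, key):
    """Resolve one key of a kerchunk-style reference mapping to raw bytes."""
    entry = refs[key]
    if isinstance(entry, str):
        return entry.encode()       # small items can be stored inline
    path, offset, length = entry    # otherwise: [file, byte offset, byte length]
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.read(length)

with tempfile.TemporaryDirectory() as tmp_dir:
    # a stand-in for a granule: a 6-byte header followed by two 7-byte chunks
    data_path = os.path.join(tmp_dir, 'toy_granule.bin')
    with open(data_path, 'wb') as f:
        f.write(b'HEADER' + b'chunk-0' + b'chunk-1')

    # the reference set, round-tripped through json as it would be when stored in a file
    refs = json.loads(json.dumps({
        '.zattrs': '{"units": "m"}',
        'SSH/0': [data_path, 6, 7],
        'SSH/1': [data_path, 13, 7],
    }))

    second_chunk = read_reference(refs, 'SSH/1')
    attrs = json.loads(read_reference(refs, '.zattrs'))

print(second_chunk, attrs)
```

The point of the design is that only the requested byte range is read, and the metadata needed to locate it comes from the small json file rather than from opening each granule.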

s3_get mode#

The s3_get mode functions much like the download mode, except that the files are accessed in-cloud on S3 and copied to your local instance. If used with ecco_podaac_access, a dictionary containing the file paths/names is returned, which can then be used to open an xarray Dataset.

files_dict = ea.ecco_podaac_access([SSH_shortname,windstr_shortname],\
                                    StartDate='2010-01',EndDate='2010-06',\
                                    mode='s3_get',\
                                    download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC_S3'))
files_dict[SSH_shortname]
{'ShortName': 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc

=====================================
Time spent = 0.44377636909484863 seconds


{'ShortName': 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc

=====================================
Time spent = 0.4701051712036133 seconds
['/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc',
 '/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc',
 '/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc',
 '/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc',
 '/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc',
 '/home/jovyan/Downloads/ECCO_V4r4_PODAAC_S3/ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc']
# remove downloaded files
for shortname,files_list in files_dict.items():
    for file in files_list:
        os.remove(file)

Now use the mode with ecco_podaac_to_xrdataset:

%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                        StartDate='2010-01',EndDate='2010-06',\
                                        mode='s3_get',\
                                        download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC_S3'))
{'ShortName': 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc
downloading SEA_SURFACE_HEIGHT_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc

=====================================
Time spent = 0.43352532386779785 seconds


{'ShortName': 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}

Total number of matching granules: 6
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-01_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-02_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-03_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-04_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-05_ECCO_V4r4_native_llc0090.nc
downloading OCEAN_AND_ICE_SURFACE_STRESS_mon_mean_2010-06_ECCO_V4r4_native_llc0090.nc

=====================================
Time spent = 0.4638087749481201 seconds


CPU times: user 588 ms, sys: 379 ms, total: 968 ms
Wall time: 4.8 s
# remove downloaded files
for shortname,files_list in files_dict.items():
    for file in files_list:
        os.remove(file)

s3_get_ifspace mode#

This mode is similar to s3_get, but it will also query how much storage is available at the target download location before carrying out downloads. If the space to be occupied by the downloaded files is more than a specified fraction of available storage, the files are opened remotely (using s3_open), rather than using s3_get.

%%time

ds_dict = ea.ecco_podaac_to_xrdataset([SSH_shortname,windstr_shortname],\
                                        StartDate='2010-01',EndDate='2010-06',\
                                        mode='s3_get_ifspace',\
                                        download_root_dir=join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC_S3'),\
                                        max_avail_frac=0.01)
{'ShortName': 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}
{'ShortName': 'ECCO_L4_STRESS_LLC0090GRID_MONTHLY_V4R4', 'temporal': '2010-01-02,2010-06-30'}
Size of files to be downloaded to instance is 0.066 GB,
which is 1.47% of the 4.5040000000000004 GB available storage.
Download size is larger than specified fraction of available storage.
Generating file lists to open directly from S3.
CPU times: user 3.15 s, sys: 861 ms, total: 4.01 s
Wall time: 12.3 s

Notice that the files were opened remotely on S3 rather than downloaded, but only because max_avail_frac was set very low (0.01 or 1% of available storage). If max_avail_frac had been left at its default value of 0.5 (50% of available storage), the files would have been downloaded.
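
The decision can be reproduced with the numbers reported in the output above. This is a minimal sketch of the rule as described in this section, not the ecco_access source; the function name choose_s3_mode is hypothetical.

```python
def choose_s3_mode(download_gb, avail_gb, max_avail_frac=0.5):
    """Return 's3_get' if the download fits within the allowed fraction, else 's3_open'."""
    if download_gb <= max_avail_frac * avail_gb:
        return 's3_get'
    return 's3_open'

# values reported in the output above
download_gb, avail_gb = 0.066, 4.504

percent_of_avail = round(100 * download_gb / avail_gb, 2)
mode_strict = choose_s3_mode(download_gb, avail_gb, max_avail_frac=0.01)
mode_default = choose_s3_mode(download_gb, avail_gb)

print(percent_of_avail, mode_strict, mode_default)
```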

Time comparison of access modes#

Based on the examples above, here are the wall times for generating ds_dict with each access mode (except download_subset, which used a different set of data):

  • download: 5.52 s

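  • download_ifspace: comparable to download when the files must be fetched (the 482 ms measured above reflects files that were already on disk)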

  • s3_open: 14.1 s

  • s3_open_fsspec: 3.34 s

  • s3_get: 4.8 s

  • s3_get_ifspace: 12.3 s (similar to s3_open because files were opened remotely)

These numbers will vary depending on the size and layout of the dataset(s) requested. But the “winner” is usually s3_open_fsspec, and the notebooks in our tutorial book use this mode by default. (However, changing the access mode is as easy as changing the mode option in the ecco_podaac_to_xrdataset function!)
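
For a quick side-by-side view, the wall times listed above can be ranked relative to the fastest mode. The numbers are copied from this single run and are only indicative; download_ifspace is left out because it was not timed with a fresh download.

```python
# wall times (seconds) from the examples above
wall_times_s = {
    'download': 5.52,
    's3_open': 14.1,
    's3_open_fsspec': 3.34,
    's3_get': 4.8,
    's3_get_ifspace': 12.3,  # fell back to opening remotely in this run
}

# sort modes from fastest to slowest
ranked = sorted(wall_times_s.items(), key=lambda item: item[1])
fastest_mode, fastest_time = ranked[0]

for mode, seconds in ranked:
    print(f'{mode:>15}: {seconds:5.2f} s ({seconds / fastest_time:.1f}x the fastest)')
```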