Reproducing ECCO Version 4 Release 4#

ECCO Version 4 Release 4 (V4r4) is ECCO’s latest publicly available central estimate (see its data repository on PO.DAAC) and has been used in numerous studies (e.g., Wu et al. (2020)). Wang and Fenty (2023) provide detailed instructions on how to reproduce the ECCO V4r4 estimate. In this tutorial, we follow those instructions and reproduce the ECCO V4r4 estimate on the P-Cluster.

Obtaining Code and Run-time Parameters#

We first connect to the P-Cluster and change to the user’s directory on /efs_ecco, as described in the P-Cluster introduction tutorial:

ssh -i /path/to/privatekey -X USERNAME@34.210.1.198
cd /efs_ecco/USERNAME

We then follow Sections 2 and 3 of Wang and Fenty (2023) to download the MITgcm checkpoint (c66g), the release-specific code, and the run-time parameters. The input files, such as the atmospheric forcing and initial conditions described in Section 4 of Wang and Fenty (2023), are several hundred gigabytes in size. To save time, these input files have already been downloaded and stored on the P-Cluster; they are linked into the run directory by the example run script described below.
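
For reference, a minimal sketch of how such pre-staged inputs are typically linked into a run directory is shown below. The path /efs_ecco/ECCO-shared/V4r4/input and the subdirectory names are hypothetical placeholders; the actual paths are set in the example run script.

# Hypothetical location of the pre-staged V4r4 input files on the P-Cluster;
# the actual paths are defined in the example run script.
INPUTDIR=/efs_ecco/ECCO-shared/V4r4/input

# Link (rather than copy) the forcing and initial-condition files into the
# run directory, since they are several hundred gigabytes in size.
ln -s ${INPUTDIR}/input_forcing/* .
ln -s ${INPUTDIR}/input_init/* .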

Modules#

The modules used on the P-Cluster differ from those specified in Section 5 of Wang and Fenty (2023). They are loaded in the example .bashrc, which should have been downloaded and renamed to /home/USERNAME/.bashrc as described in the P-Cluster introduction tutorial, so that the required modules are loaded automatically. Specifically, the modules are loaded in the example .bashrc as follows:

module load intel-oneapi-compilers-2021.2.0-gcc-11.1.0-adt4bgf
module load intel-oneapi-mpi-2021.2.0-gcc-11.1.0-ibxno3u
module load netcdf-c-4.8.1-gcc-11.1.0-6so76nc
module load netcdf-fortran-4.5.3-gcc-11.1.0-d35hzyr
module load hdf5-1.10.7-gcc-9.4.0-vif4ht3

With these modules pre-loaded by .bashrc, one can skip the module-loading step in the first box of Section 5.1 of Wang and Fenty (2023) and proceed directly to the compilation steps in the second box of that section.
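
As an optional sanity check (not part of Wang and Fenty (2023)), one can confirm that the compiler, MPI, and NetCDF modules are indeed available before compiling:

# List the modules currently loaded via .bashrc
module list

# Confirm that the Intel MPI Fortran wrapper and the NetCDF-Fortran
# configuration tool are on the PATH
which mpiifort
which nf-config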

Compile#

The steps for compiling the code are the same as those described in the second box of Section 5.1 of Wang and Fenty (2023), except for one important change: you need to specify the optfile as ../code/linux_ifort_impi_aws_sysmodule.

cd WORKINGDIR/ECCOV4/release4
mkdir build
cd build
export ROOTDIR=../../../MITgcm
../../../MITgcm/tools/genmake2 -mods=../code -optfile=../code/linux_ifort_impi_aws_sysmodule -mpi
make -j16 depend
make -j16 all
cd ..

This optfile has been specifically customized for the P-Cluster. If successful, the executable mitgcmuv will be generated in the build directory.
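
As a quick check that the build succeeded (a suggestion, not part of the official instructions), verify that the executable was created:

# Run from WORKINGDIR/ECCOV4/release4; the executable should be present in build/
ls -lh build/mitgcmuv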

Run the Model#

After one has issued the compilation steps, successfully compiled the code, and generated the executable in the build directory (WORKINGDIR/ECCOV4/release4/build/mitgcmuv), one can proceed with running the model. For this purpose, we provide an example run script that integrates the model for three months (see below for how to change the run script to conduct a run over 1992–2017, the entire V4r4 integration period).

SLURM Directives#

As described in the P-Cluster introduction tutorial, the P-Cluster uses SLURM as the batch system. There are a few SLURM directives at the beginning of the run script that request the necessary resources for conducting the run. These SLURM directives are as follows:

#SBATCH -J ECCOv4r4
    Job name is ECCOv4r4.

#SBATCH --nodes=3
    Request three nodes.

#SBATCH --ntasks-per-node=36
    Run 36 tasks (processes) on each node.

#SBATCH --time=24:00:00
    Request a wall-clock time of 24 hours.

#SBATCH --exclusive
    No other jobs will be scheduled on the same nodes while the job is running.

#SBATCH --partition=sealevel-c5n18xl-demand
    Request on-demand Amazon EC2 C5n instances (see the C5n tab under Product Details), each with 72 vCPUs and 192 GB of memory.

#SBATCH --mem-per-cpu=1GB
    Allocate 1 GB of memory to each CPU/process/task.

#SBATCH -o ECCOv4r4-%j-out
    Batch output log file (%j is replaced by the job ID).

#SBATCH -e ECCOv4r4-%j-out
    Batch error log file (here directed to the same file as the output log).
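
Put together, the top of run_script_slurm.bash looks roughly like the sketch below. The directives are those listed above; the last lines (changing to the run directory and launching the executable with srun) are illustrative only, since the actual example script also sets up the run directory and links the input files.

#!/bin/bash
#SBATCH -J ECCOv4r4
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=36
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --partition=sealevel-c5n18xl-demand
#SBATCH --mem-per-cpu=1GB
#SBATCH -o ECCOv4r4-%j-out
#SBATCH -e ECCOv4r4-%j-out

# Illustrative launch: run mitgcmuv on the 3 x 36 = 108 requested MPI processes.
cd run
srun ./mitgcmuv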

Submit the Run and Check Job Status#

To submit the run, issue the following command from /efs_ecco/USERNAME/ECCO/V4/r4/WORKINGDIR/ECCOV4/release4:

sbatch run_script_slurm.bash

Once the job is submitted, SLURM will assign it a job ID and show a message like the following:

Submitted batch job 123

We can then check the status of the job with the following command:

squeue

Usually, SLURM takes several minutes to configure a job, with the status (ST) showing CF (for configuring):

             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
               123 sealevel- ECCOv4r4 USERNAME  CF       0:53      3 sealevel-c5n18xl-demand-dy-c5n18xlarge-[1-3]

After a while, squeue will show the status changing to R (running), as shown in the following:

             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
               123 sealevel- ECCOv4r4 USERNAME  R        3:30      3 sealevel-c5n18xl-demand-dy-c5n18xlarge-[1-3]

The run directory is /efs_ecco/USERNAME/ECCO/V4/r4/WORKINGDIR/ECCOV4/release4/run. The 3-month integration takes less than 20 minutes to complete. The string NORMAL END in the batch log file /efs_ecco/USERNAME/ECCO/V4/r4/WORKINGDIR/ECCOV4/release4/ECCOv4r4-123-out indicates a successfully completed run. The run outputs monthly means and snapshots of diagnostic fields in /efs_ecco/USERNAME/ECCO/V4/r4/WORKINGDIR/ECCOV4/release4/run/diags/. These fields can be analyzed using the Jupyter Notebooks presented in some of the ECCO Hackathon tutorials.
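
For example, the completion message and the diagnostic output can be checked with commands such as the following (suggested checks, not part of the run script; job ID 123 is the example used above):

cd /efs_ecco/USERNAME/ECCO/V4/r4/WORKINGDIR/ECCOV4/release4
# Look for the normal-termination message in the batch log
grep "NORMAL END" ECCOv4r4-123-out
# List the monthly means and snapshots written by the run
ls run/diags/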

To conduct the entire 26-year (1992–2017) run instead, comment out the following three lines in the run script; they are what shorten the default integration (nTimeSteps=227903, i.e., 1992–2017) to the 3-month test run (nTimeSteps=2160):

unlink data
cp -p ../namelist/data .
sed -i '/#nTimeSteps=2160,/ s/^#//; /nTimeSteps=227903,/ s/^/#/' data
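
For reference, the sed command above toggles which nTimeSteps entry is active in the data namelist. Assuming the namelist contains the two lines matched by the sed patterns, its effect is roughly as sketched below (the grep simply displays the relevant lines):

# Before the sed command (default namelist, full 1992-2017 integration):
#   #nTimeSteps=2160,
#   nTimeSteps=227903,
#
# After the sed command (3-month test run):
#   nTimeSteps=2160,
#   #nTimeSteps=227903,
grep -n "nTimeSteps" data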

References#

Wang, O., & Fenty, I. (2023). Instructions for reproducing ECCO Version 4 Release 4 (1.5). Zenodo. https://doi.org/10.5281/zenodo.10038354

Wu, W., Zhan, Z., Peng, S., Ni, S., & Callies, J. (2020). Seismic ocean thermometry. Science, 369(6510), 1510–1515. https://doi.org/10.1126/science.abb9519