Module pretraining

This module pretrains the ice-flow emulator (iflo_emulator) on a glacier catalog to improve its performance during glacier forward runs. Pretraining is computationally intensive and can take a few hours to complete. Run this module on its own, without any other IGM modules (the example below enables only pretraining alongside iceflow, which holds the emulator settings). Below is an example of a parameter file:

# @package _global_

defaults:
  - override /inputs: []
  - override /processes: [pretraining, iceflow]
  - override /outputs: []

processes:
  iceflow:
    Nz: 10
    multiple_window_size: 8
    nb_layers: 16
    nb_out_filter: 32
    network: cnn
    new_friction_param: True
    retrain_emulator_lr: 0.0001
    solve_nbitmax: 1000
    solve_stop_if_no_decrease: False
  pretraining:
    epochs: 1000
    data_dir: data/surflib3d_shape_100
    soft_begining: 1000
    min_slidingco: 0.01
    max_slidingco: 0.4
    min_arrhenius: 5
    max_arrhenius: 400

To run this module, you first need access to a glacier catalog. A catalog of mountain glaciers commonly used for pretraining IGM emulators is available here: DOI.

After downloading the dataset (or generating your own), organize the folder surflib3d_shape_100 into two subfolders: train and test.
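The train/test organization can be scripted. The following is a minimal sketch, not part of IGM: the helper name `split_catalog`, the 80/20 split ratio, and the assumption that every entry directly inside the catalog folder is one sample are all hypothetical and should be adapted to the actual layout of your dataset.

```python
import random
import shutil
from pathlib import Path


def split_catalog(root, test_fraction=0.2, seed=0):
    """Move every entry under `root` into `root/train` or `root/test`.

    Assumption: each file or folder directly under `root` is one sample.
    The split ratio and the seed are illustrative choices, not IGM defaults.
    Returns the tuple (number of train entries, number of test entries).
    """
    root = Path(root)
    # Ignore the target subfolders so the function can be re-run safely.
    entries = sorted(p for p in root.iterdir() if p.name not in ("train", "test"))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(entries)
    n_test = max(1, round(len(entries) * test_fraction)) if entries else 0
    for i, entry in enumerate(entries):
        dest = root / ("test" if i < n_test else "train")
        dest.mkdir(exist_ok=True)
        shutil.move(str(entry), str(dest / entry.name))
    return len(entries) - n_test, n_test


# Example call, using the data_dir from the parameter file above:
# split_catalog("data/surflib3d_shape_100")
```

Fixing the seed keeps the split stable between runs, so the same glaciers stay in the test set when the script is re-executed on a fresh copy of the catalog.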

Parameters

Default configuration file (pretraining.yaml):

pretraining:
  data_dir: "/path/to/tfrecords/"
  batch_size: 1
  epochs: 1000
  experiment_name: "name_of_model"
  loss_type: "huber"
  learning_rate: 0.0001
  out_dir: "/path/to/save/models/"
  resume: false

Description of the parameters:

| Name | Description | Default value | Units |
| --- | --- | --- | --- |
| data_dir | Path to the directory containing the TFRecord glacier catalog used for training. | /path/to/tfrecords/ | |
| batch_size | Number of samples per training batch for the neural network. | 1 | |
| epochs | Total number of training epochs for neural network pre-training. | 1000 | |
| experiment_name | Name of the experiment; used as the subdirectory under `out_dir` where checkpoints, figures, and artifacts are saved. | name_of_model | |
| loss_type | Loss function for the data fidelity term: `mse` (mean squared error) or `huber`. | huber | |
| learning_rate | Learning rate for the Adam optimizer. | 0.0001 | |
| out_dir | Base output directory; the experiment subdirectory (named by `experiment_name`) is created here to store checkpoints, figures, and artifacts. | /path/to/save/models/ | |
| resume | If True, restores the latest checkpoint from the experiment directory and continues training from the last saved epoch; raises an error if no checkpoint is found. | False | |
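To illustrate the difference between the two `loss_type` options, here is a minimal pure-Python sketch of the standard MSE and Huber definitions. It is not the module's implementation: the function names are hypothetical, and the threshold `delta=1.0` is an assumption, since the value used by the pretraining module is not stated here.

```python
def mse_loss(pred, target):
    """Mean squared error over paired values."""
    residuals = [p - t for p, t in zip(pred, target)]
    return sum(r * r for r in residuals) / len(residuals)


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones.

    `delta` is the residual size at which the loss switches from quadratic
    to linear; 1.0 is an illustrative value, not a documented IGM default.
    """
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(pred)


pred = [0.0, 0.0]
target = [0.5, 3.0]  # the second sample acts as an outlier
print(mse_loss(pred, target))    # 4.625
print(huber_loss(pred, target))  # 1.3125
```

The outlier dominates the MSE because its residual is squared, while the Huber loss penalizes it only linearly, which makes `huber` less sensitive to a few badly fitted samples.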