va_am package

Submodules

va_am.va_am module

square_dims(size: Union[int, list[int], np.ndarray[int]], ratio_w_h: Union[int, float] = 1)

Returns the grid dimensions needed to plot the encoded data, given the latent dimension and the desired ratio between width and height.

Parameters:
  • size (int, list[int] or np.ndarray[int]) – The latent dimension size.

  • ratio_w_h (int or float) – Desired ratio between width and height.

Returns:

(width, height) dimensions for grid.

Return type:

tuple

Raises:

ValueError – If input size is not positive. If ratio_w_h is <= 0.

Notes

  • Uses sympy.divisors for efficient factor calculation

  • Always returns smaller dimension first in tuple

  • For non-integer ratios, finds closest divisor match
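The divisor search described in the notes can be sketched as follows. This is a minimal, dependency-free illustration and not the package's implementation: closest_grid is a hypothetical name, the divisors are found by trial division instead of sympy.divisors, and the pair is returned as (width, height) without any reordering.

```python
def closest_grid(size: int, ratio_w_h: float = 1.0) -> tuple[int, int]:
    """Return the divisor pair (width, height) of `size` whose
    width/height quotient is closest to `ratio_w_h`."""
    if size <= 0:
        raise ValueError("size must be positive")
    if ratio_w_h <= 0:
        raise ValueError("ratio_w_h must be > 0")
    # Trial division stands in for sympy.divisors in this sketch.
    divisors = [d for d in range(1, size + 1) if size % d == 0]
    # Each divisor w defines exactly one grid: w wide by size // w high.
    width = min(divisors, key=lambda w: abs(w / (size // w) - ratio_w_h))
    return width, size // width
```

For a latent dimension of 12 and a requested ratio of 3, the candidate grids are 1x12, 2x6, 3x4, 4x3, 6x2 and 12x1, and the 6x2 grid matches the ratio exactly.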

standardize_dims(data: Union[DataArray, Dataset])

Standardize dimension names to ‘latitude’ and ‘longitude’ across common variants.

Parameters:

data (xarray.DataArray or xarray.Dataset) – Input data with spatial dimensions to standardize.

Returns:

Data with standardized dimension names.

Return type:

xarray.DataArray or xarray.Dataset

Raises:

ValueError – If input is not an xarray object.

Notes

  • Handles common dimension name variants:

      - Latitude: ‘lat’, ‘latitude’

      - Longitude: ‘lon’, ‘long’, ‘longitude’

  • Case-insensitive matching

  • Preserves all other dimensions unchanged

  • Returns original object if no dimension renaming needed

Examples

>>> import xarray as xr
>>> from va_am.va_am import standardize_dims
>>> ds = xr.Dataset(coords={'LAT': [0, 1], 'LON': [0, 1]})
>>> standardized = standardize_dims(ds)
>>> list(standardized.dims)
['latitude', 'longitude']

runAE(input_dim: Union[int, list[int]], latent_dim: int, arch: int, use_VAE: bool, with_cpu: bool, n_epochs: int, data_pred: Union[np.ndarray, list, xr.DataArray], file_save: str, verbose: bool, compile_params: dict = {}, fit_params: dict() = {})

Function that performs the AE training.

Parameters:
  • input_dim (int or list of int) – Contains the shape of the input data to the keras.model.

  • latent_dim (int) – Represents the shape of the latent (code) space.

  • arch (int) – Value that determines which model architecture should be used to build the model.

  • use_VAE (bool) – Value that determines if the model should be a Variational Autoencoder or not.

  • with_cpu (bool) – Value that determines if the cpu should be used instead of (default) gpu.

  • n_epochs (int) – The number of epochs for the keras.model.

  • data_pred (np.ndarray, list or xr.DataArray) – Data used to train the model, usually the driver/predictor data.

  • file_save (str) – Where to save the .h5 model.

  • verbose (bool) – Value that determines if the execution information should be displayed.

  • compile_params (dict) – Dictionary that contains all the parameters (available depending on the tensorflow/keras version) to use for the .compile() function.

  • fit_params (dict) – Dictionary that contains all the parameters (available depending on the tensorflow/keras version) to use for the .fit() function, except for epochs and verbose.

Returns:

AE.encoder – Keras object that corresponds to the fitted encoder model.

Return type:

keras.model.

Raises:
  • ValueError – If input/output dimensions mismatch.

  • RuntimeError – If GPU device unavailable when requested.

Notes

  • Saves encoder/decoder as separate files

  • Generates training loss plots in ./figures/

  • Automatically creates directories if missing
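One way to read the fit_params contract above: epochs and verbose are supplied through n_epochs and verbose, so they should not appear in the dictionary. The helper below is a hypothetical sketch of such a merge. build_fit_kwargs is not part of va_am, and the real function may resolve conflicts differently.

```python
# Keys the caller provides through n_epochs / verbose, per the parameter docs.
RESERVED_FIT_KEYS = {"epochs", "verbose"}


def build_fit_kwargs(n_epochs, verbose, fit_params=None):
    """Merge user-supplied fit parameters with the reserved ones."""
    fit_params = dict(fit_params or {})  # copy, so the caller's dict is untouched
    clash = RESERVED_FIT_KEYS.intersection(fit_params)
    if clash:
        raise ValueError(f"fit_params must not set {sorted(clash)}")
    return {"epochs": n_epochs, "verbose": int(verbose), **fit_params}
```

The resulting dictionary could then be unpacked into a Keras model.fit(x, y, **kwargs) call.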

get_AE_stats(with_cpu: bool, use_VAE: bool, AE_pre=None, AE_ind=None, pre_indust_pred: Optional[Union[list, ndarray]] = None, indust_pred: Optional[Union[list, ndarray]] = None, data_of_interest_pred: Optional[Union[list, ndarray]] = None, period: str = 'both') -> Union[ndarray, list]

Function used to obtain statistical information about the data encoded by the Autoencoder. It encodes the training data according to the selected period and the specific details of the architecture.

Parameters:
  • use_VAE (bool) – Whether the model is a Variational Autoencoder (VAE).

  • with_cpu (bool) – Whether the CPU should be used instead of the GPU.

  • AE_pre (keras.model) – Encoder keras.model for the pre-industrial period.

  • AE_ind (keras.model) – Encoder keras.model for the post-industrial period.

  • pre_indust_pred (list or np.ndarray) – Driver/predictor data of the pre-industrial period.

  • indust_pred (list or np.ndarray) – Driver/predictor data of the post-industrial period.

  • data_of_interest_pred (list or np.ndarray) – Driver/predictor data of interest.

  • period (str) – Selects which part of the data is used.

Returns:

An ndarray containing the data.

Return type:

list or ndarray.

Raises:

ValueError – If invalid period specified. If missing required models/data for period.

Notes

  • Computes absolute differences in latent space

  • Concatenates results for multi-period analyses

  • Flattens outputs to 1D array
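The three notes above can be sketched with NumPy. This illustrates one plausible flow and is not the package's code: latent_abs_diff, stats_for_period and toy_encoder are hypothetical names, any callable stands in for the Keras encoders, the period values 'pre', 'post' and 'both' are assumed, and the differences are assumed to be taken between every encoded training sample and every encoded sample of interest.

```python
import numpy as np


def latent_abs_diff(encoder, train, interest):
    """Absolute latent-space differences between every training sample
    and every sample of interest, flattened to 1D."""
    z_train = np.asarray(encoder(train))        # shape (n, d)
    z_interest = np.asarray(encoder(interest))  # shape (m, d)
    diff = np.abs(z_train[:, None, :] - z_interest[None, :, :])  # (n, m, d)
    return diff.ravel()


def stats_for_period(period, pre=None, post=None):
    """`pre` and `post` are (encoder, train, interest) triples (assumed layout)."""
    wanted = {"pre": [pre], "post": [post], "both": [pre, post]}
    if period not in wanted:
        raise ValueError(f"invalid period: {period!r}")
    if any(item is None for item in wanted[period]):
        raise ValueError(f"missing model/data for period {period!r}")
    # Concatenate the per-period results into a single 1D array.
    return np.concatenate([latent_abs_diff(*item) for item in wanted[period]])


def toy_encoder(x):
    """Stand-in for a fitted encoder: keeps the first two features."""
    return np.asarray(x, dtype=float)[:, :2]
```

With period='both' the result simply has the pre-industrial differences followed by the post-industrial ones.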

am(file_params_name: str, ident: bool, teleg: bool, save_recons: bool, teleg_file: str = '.secret.txt')

Main function that orchestrates the Analog Method (AM) workflow. It handles configuration loading, preprocessing, analog search execution, and post-processing. Supports Telegram notifications and result saving.

Parameters:
  • file_params_name (str) – Path to the JSON configuration file containing analysis parameters.

  • ident (bool) – Flag to enable heatwave period identification before analysis.

  • teleg (bool) – Flag to enable Telegram notifications for warnings and errors.

  • save_recons (bool) – Flag to save reconstructed data as NetCDF files in the ‘./data/’ directory.

  • teleg_file (str, optional) – Path to Telegram credentials file (default is ‘.secret.txt’).

Raises:
  • OSError – If the configuration file is not found.

  • ValueError – If invalid parameters are detected in the configuration.

Notes

  • Reads parameters from JSON configuration file

  • Handles preprocessing of climate data

  • Performs analog search using specified method

  • Manages Telegram notifications if enabled

  • Saves reconstructions if enabled

  • Executes post-processing for result analysis

The function coordinates the entire AM workflow including optional heatwave identification, data preprocessing, model training, analog search, and result post-processing.
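The configuration-loading step and its two documented failure modes can be sketched with the standard library. load_params and its required argument are hypothetical, and the keys that va_am actually expects in the JSON file are not listed in this section.

```python
import json
from pathlib import Path


def load_params(file_params_name: str, required: tuple = ()) -> dict:
    """Read a JSON configuration file and check for required keys."""
    path = Path(file_params_name)
    if not path.is_file():
        # FileNotFoundError is a subclass of OSError, matching the documented error.
        raise FileNotFoundError(f"configuration file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        params = json.load(fh)
    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    return params
```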

analogSearch(p: int, k: int, data_pred: Union[list, ndarray], data_of_interest_pred: Union[list, ndarray], time_pred: DataArray, data_target: Dataset, enhanced_distance: bool, threshold: Union[int, float], img_size: Union[list, ndarray], iter: int, threshold_offset_counter: int = 20, replace_choice: bool = True, target_var_name: str = 'air', file_time_name: str = 'analogues.npy') -> tuple

Function that performs the Analog Search Method for a given driver/predictor and target variable.

Parameters:
  • p (int) – The order p of the Minkowski distance to use.

  • k (int) – Number of nearest neighbours to search.

  • data_pred (list or ndarray) – Driver/predictor data in which to search.

  • data_of_interest_pred (list or ndarray) – Driver/predictor data to be searched for.

  • time_pred (DataArray) – Time DataArray corresponding to the driver/predictor data being searched.

  • data_target (Dataset) – Target Dataset used to check the target value.

  • enhanced_distance (bool) – Flag that decides whether local proximity has to be computed or not.

  • threshold (int or float) – Threshold used in analogSearch to compute local proximity.

  • img_size (list or ndarray) – List that determine the size of the driver/predictor and target images.

  • iter (int) – How many random neighbours to select.

  • threshold_offset_counter (int) – Number used to perform the local proximity. Default 20.

  • replace_choice (bool) – Flag that indicates whether the iter selected neighbours can be drawn with replacement.

  • target_var_name (str) – The name of the Target Dataset variable in case of working with different Dataset.

  • file_time_name (str) – The name of the file where to save the found analogues.

Returns:

A tuple containing selected driver/predictor and target.

Return type:

tuple

Raises:

ValueError – If k exceeds available analogs. If p <= 0.

Notes

  • Uses scipy.spatial.minkowski_distance

  • Saves selected analog times to numpy file

  • Handles both encoded and raw predictor data
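The core of the search (Minkowski distances, the k nearest neighbours, then a random selection among them) can be sketched with NumPy. This is a simplified illustration under stated assumptions: nearest_analogues is a hypothetical name, the distance is computed directly instead of through scipy.spatial.minkowski_distance, and the enhanced_distance and threshold logic is left out entirely.

```python
import numpy as np


def nearest_analogues(data_pred, query, k, p=2, n_select=1, replace=True, seed=None):
    """Return (nearest, chosen): indices of the k rows of `data_pred` closest
    to `query`, and `n_select` indices drawn at random from those k."""
    if p <= 0:
        raise ValueError("p must be > 0")
    data_pred = np.asarray(data_pred, dtype=float)
    flat = data_pred.reshape(len(data_pred), -1)   # one row per time step
    query = np.asarray(query, dtype=float).ravel()
    if k > len(flat):
        raise ValueError("k exceeds the number of available analogues")
    # Minkowski distance of order p: (sum |x - y|^p)^(1/p)
    dist = (np.abs(flat - query) ** p).sum(axis=1) ** (1.0 / p)
    nearest = np.argsort(dist)[:k]                 # k closest, nearest first
    rng = np.random.default_rng(seed)
    chosen = rng.choice(nearest, size=n_select, replace=replace)
    return nearest, chosen
```

The chosen indices would then be used to look up the matching time steps and target fields.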

calculate_interest_region(interest_region: Union[list, ndarray], dims_list: int, resolution: Union[int, float, str] = 2, is_teleg: bool = False, secret_file: str = './secret.txt') -> list

Method that transforms latitude/longitude degrees into indices. It is used to speed up the other methods by working with numpy arrays instead of Dataset or DataArray objects.

Parameters:
  • interest_region (list or ndarray) – List which contains the latitude and longitude degrees to be converted as index.

  • dims_list (list of int) – List that contains, in this order, the minimum latitude, maximum latitude, minimum longitude and maximum longitude. When resolution is ‘auto’, dims_list should be a tuple with (latitude, longitude).

  • resolution (int, float or str) – Resolution in degrees. Default value is 2°. If resolution is ‘auto’, the resolution is inferred automatically (useful when the resolution is not constant along the dimensions).

  • is_teleg (bool) – Flag that indicates whether the warnings have to be sent to Telegram or not.

  • secret_file (str) – Auxiliary variable, only needed if is_teleg is True, used to read the token and chat_id values.

Returns:

new_interest_region – A list that contains the equivalent index values, as [lat_start_idx, lat_end_idx, lon_start_idx, lon_end_idx]

Return type:

list

Raises:

ValueError – If resolution=’auto’ without coordinate data. If interest_region outside domain bounds.

Notes

  • Handles both 0-360 and -180-180 longitude systems

  • Automatically adjusts out-of-bounds regions

  • Uses np.isclose for coordinate matching with ‘auto’ resolution
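For a fixed resolution, the degree-to-index conversion reduces to simple arithmetic. The sketch below is an illustration only: region_to_indices is a hypothetical name, interest_region is assumed to be ordered as [lat_min, lat_max, lon_min, lon_max], the grid is assumed to be regular with coordinates ascending from the lower bound, and the ‘auto’ resolution mode is not covered.

```python
def region_to_indices(interest_region, dims_list, resolution=2):
    """Convert [lat_min, lat_max, lon_min, lon_max] in degrees to
    [lat_start_idx, lat_end_idx, lon_start_idx, lon_end_idx]."""
    min_lat, max_lat, min_lon, max_lon = dims_list

    def to_index(value, lower, upper):
        clipped = min(max(value, lower), upper)  # pull out-of-bounds requests inside
        return round((clipped - lower) / resolution)

    lat_a, lat_b, lon_a, lon_b = interest_region
    return [
        to_index(lat_a, min_lat, max_lat),
        to_index(lat_b, min_lat, max_lat),
        to_index(lon_a, min_lon, max_lon),
        to_index(lon_b, min_lon, max_lon),
    ]
```

For a domain spanning 30° to 60° latitude and -20° to 20° longitude at 2° resolution, the region [40, 50, -10, 10] maps to the indices [5, 10, 5, 15].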

save_reconstruction(params: dict, reconstructions_Pre_Analog: list, reconstructions_Post_Analog: list, reconstructions_Pre_AE: list, reconstructions_Post_AE: list)

Method that saves the target reconstructions from the completed runs. It does not return anything; it only saves the xarray Datasets to the corresponding files in the data folder. Each file name has the format [name]-[period]-[method]-[time].nc.

Parameters:
  • params (dict) – A dictionary that contains all the needed parameters and configuration. Mainly loaded from the configuration file, with some auxiliary parameters added by other functions.

  • reconstructions_Pre_Analog (list) – A list with the multiple pre-industrial data reconstructions by the Analog Method, for each day (or week).

  • reconstructions_Post_Analog (list) – A list with the multiple post-industrial data reconstructions by the Analog Method, for each day (or week).

  • reconstructions_Pre_AE (list) – A list with the multiple pre-industrial data reconstructions by the AutoEncoder, for each day (or week).

  • reconstructions_Post_AE (list) – A list with the multiple post-industrial data reconstructions by the AutoEncoder, for each day (or week).

Raises:

IOError – If NetCDF writing fails.

Notes

  • Saves files to ./data/ directory

  • Uses xarray for NetCDF export

  • Filenames include execution timestamp

  • Averages multiple reconstructions
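The file naming scheme [name]-[period]-[method]-[time].nc can be sketched as below. reconstruction_filename is a hypothetical helper, and the timestamp layout is an assumption, since the exact time format used by the package is not stated here.

```python
from datetime import datetime


def reconstruction_filename(name, period, method, when=None, folder="./data"):
    """Build a path following the [name]-[period]-[method]-[time].nc pattern."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    return f"{folder}/{name}-{period}-{method}-{stamp}.nc"
```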

perform_preprocess(params: dict) -> tuple

Method that performs the preprocessing stage.

Parameters:

params (dict) – A dictionary with the needed parameters and configuration. Mainly loaded from the configuration file, with some auxiliary parameters added by other functions.

Returns:

A tuple of all needed data.

Return type:

tuple

Raises:
  • FileNotFoundError – If input datasets missing.

  • ValueError – If invalid time ranges.

Notes

  • Handles multiple time resolutions (daily/weekly/monthly)

  • Normalizes data per variable

  • Converts xarray DataArrays to numpy arrays

  • Manages train/test splits

runComparison(params: dict) -> tuple

Method that performs the preprocessing, calls the previous methods, and compares analogSearch against AE + analogSearch.

Parameters:

params (dict) – A dictionary with the needed parameters and configuration. Mainly loaded from the configuration file, with some auxiliary parameters added by other functions.

Returns:

A tuple of 4 elements, each containing the corresponding list of reconstructions.

Return type:

tuple

Raises:

RuntimeError – If model loading fails.

Notes

  • Generates comparison CSV files

  • Produces KDE plots of results

  • Handles both CPU/GPU execution

  • Supports multiple latent dimensions

identify_heatwave_days(params: dict) -> Union[list, ndarray]

Method that performs the identification of the heat wave period, following the definition from http://doi.org/10.1088/1748-9326/10/12/124003.

Parameters:

params (dict) – A dictionary with the needed parameters and configuration. Mainly loaded from the configuration file, with some auxiliary parameters added by other functions.

Returns:

heatwave_period – A list of datetimes that make up the heat wave period.

Return type:

list or ndarray

Raises:

ValueError – If percentile out of [0,100] range.

Notes

  • Uses 90th percentile by default

  • Follows Russo et al. (2015) methodology

  • Generates validation plots in ./figures/

  • Handles both single and multi-day events
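A percentile-threshold detection of heat wave days can be sketched in plain Python. This is a deliberately simplified illustration and not the referenced methodology in full: both function names are hypothetical, a single threshold is applied to the whole series, the percentile uses the nearest-rank rule, and the minimum run length min_length is an assumed parameter.

```python
import math


def percentile_nearest_rank(values, percentile):
    """Nearest-rank percentile of a sequence of numbers."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be within [0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def heatwave_days(temps, threshold, min_length=3):
    """Indices of days in runs of at least `min_length` consecutive
    days with temperature strictly above `threshold`."""
    days, run = [], []
    for i, t in enumerate(temps):
        if t > threshold:
            run.append(i)
            continue
        if len(run) >= min_length:
            days.extend(run)
        run = []
    if len(run) >= min_length:  # close a run that reaches the end of the series
        days.extend(run)
    return days
```

Setting min_length=1 keeps single-day exceedances as events as well.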

post_process(params_file: str, save_stats: bool = True, is_atribution: bool = False, compare_to_am: bool = True, target_stat: str = 'max')

Function that performs the post-processing after the execution of the main code. This method saves a comparative figure and a statistical summary of the results.

Parameters:
  • params_file (str) – Path to the parameters file.

  • save_stats (bool, optional) – Whether the statistical summary is saved, by default True. If False, stats are printed but not saved.

  • is_atribution (bool, optional) – Flag for attribution runs, to get a comparison between Pre/Post period results, by default False.

  • compare_to_am (bool, optional) – Flag for runs that also performed the classical AM, to compare AE-AM against AM, by default True.

  • target_stat (str, optional) – Statistic used to obtain the target value (mean, max, min, etc.), by default “max”.

Raises:

FileNotFoundError – If result files missing.

Notes

  • Generates KDE comparison plots

  • Produces multi-method statistical summaries

  • Handles both attribution and detection modes

  • Supports parallel execution results
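The target_stat option suggests a lookup from a statistic name to a reducing function. The sketch below shows one way to express that. TARGET_STATS and reduce_target are hypothetical, and the set of names the package accepts beyond mean, max and min is not documented here.

```python
import statistics

# Assumed mapping from target_stat names to reducers.
TARGET_STATS = {"max": max, "min": min, "mean": statistics.fmean}


def reduce_target(values, target_stat="max"):
    """Collapse a sequence of target values to a single number."""
    try:
        reducer = TARGET_STATS[target_stat]
    except KeyError:
        raise ValueError(f"unsupported target_stat: {target_stat!r}") from None
    return reducer(values)
```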

va_am(ident: bool = False, method: str = 'day', config_file: str = 'params.json', secret_file: str = 'secrets.txt', verbose: bool = False, teleg: bool = False, period: str = 'both', save_recons: bool = False)

Equivalent to the main function. It provides a way to run the same procedures as the main function by importing it from other Python code.

Parameters:
  • ident (bool) – Flag that indicates whether the period identification task is performed.

  • method (str) – Method to execute, one of: day (default), days, seasons, execs, latents, seasons-execs, latents-execs or latents-seasons-execs.

  • config_file (str) – The name of the params/configuration file.

  • secret_file (str) – The name of the Telegram bot information file.

  • verbose (bool) – Flag that indicates whether verbose information should be shown.

  • teleg (bool) – Flag for sending exceptions to the Telegram bot.

  • period (str) – Period on which to perform the operation: both (default), pre or post.

  • save_recons (bool) – Flag for saving the reconstruction information in an .nc file.
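A call from another script might look like the sketch below. The wrapper name run_single_day is hypothetical, and the import path is an assumption based on the va_am.va_am module name above. All keyword arguments come from the documented signature.

```python
def run_single_day(config_file: str = "params.json") -> None:
    """Run the default single-day analysis from Python code."""
    # Imported lazily so this sketch can be loaded without va_am installed.
    from va_am.va_am import va_am  # assumed import path

    va_am(
        ident=False,          # skip the heat wave identification step
        method="day",
        config_file=config_file,
        verbose=True,
        teleg=False,          # no Telegram notifications
        period="both",
        save_recons=False,
    )
```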

main()

Main

Function prepared for running and managing the program functionality. It uses the argparse module to manage the execution of va_am.py from the command line. To see the help, use:

python va_am.py -h

Module contents