How to…?#

Configuration file#

Regardless the way you will use VA-AM (inside or outside a python code), you will need a configuration file. It has to be a JSON file with a structure as:

{
    "season":                   "all",
    "name":                     "-city_athens1987-",
    "latitude_min":             28,
    "latitude_max":             66,
    "longitude_min":            -8,
    "longitude_max":            50,
    "pre_init":                 "1851-01-06",
    "pre_end":                  "1950-12-31",
    "post_init":                "1951-01-01",
    "post_end":                 "2014-12-28",
    "data_of_interest_init":    "1987-06-01",
    "data_of_interest_end":     "1987-08-31",
    "load_AE":                  false,
    "load_AE_pre":              true,
    "file_AE_pre":              "./models/AE_pre.h5",
    "file_AE_post":             "./models/AE_post.h5",
    "latent_dim":               600,
    "use_VAE":                  true,
    "with_cpu":                 false,
    "n_epochs":                 5,
    "k":                        20,
    "iter":                     1000,
    "interest_region":          [38,38,24,24],
    "resolution":               2,
    "interest_region_type":     "coord",
    "per_what":                 "per_day",
    "remove_year":              false,
    "replace_choice":           true,
    "arch":                     5,
    "verbose":                  true,
    "temp_dataset":             "~/path/to/data/data_dailyMax_t2m_1940-2022.nc",
    "prs_dataset":              "~/path/to/data/prmsl.nc",
    "ident_dataset":            "~/path/to/data_dailyMax_t2m_1940-2022.nc",
    "temp_var_name":            "t2m_dailyMax",
    "p":                        2,
    "enhanced_distance":        true,
    "compile_params":           {
                                    "optimizer":"adamax",
                                    "loss":"mape",
                                    "metrics":["mae", "mse"]
                                },
    "fit_params":               {
                                    "batch_size":64,
                                    "shuffle":true,
                                    "validation_split":0.15
                                }
}

By the flag -f | --configfile or the config_file parameter you can provide the Path to your .json file containing the configuration parameters. If not provided, the program will search for a default params.json file in the directory.

Then, we provide a list of all posible parameters, the type of parameter and a brief description of each one:

Parameter

Type

Description

season

str or list of str

String (or list of strings) that Specify in wich season to perform the method, between: spring, summer, autumn , winter, spring-summer, autumn-winter or all period.

name

str

Arbitrary name for identification of the execution/simulation and result file.

latitude/longitude

int

The defined search region in terms of minimal and maximal latitude and longitude.

interest_region

list of int

Defined interest region where the reconstruction has to be maded, in terms of initial and end latitude and logitude. In should be a subregion of the defined search region. Otherwise it could be also the entire search region, but not bigger that it. See interest_region_type parameter for more details.

interest_region_type

str

Define if the interest_region list refers to list/array index positions (idx option) or to spatial coordinates (coord option).

resolution

int

Coordinates resolution of the dataset (Defaul value 2).

pre/post

str

String with datetime of start (_init) and end (_end) of what we consider pre and post industrial data of our datasets. We can divide the datasets in 2 different states to analyse, or use only one of them (e.g. post) to analyse all your datasets.

period

str

String that indicates in wich period the analysis will be performed. If could be both (default), only pre or only post.

data_of_interest

str

Same as previous, but for specify which is your interest datetime. (See Identify)

load_AE

bool

Flag that specify if the VA sould be loaded from the file_AE. If false, the VA would be re-trained.

load_AE_pre

bool

Same as previous flag, but only for VA in pre epoch.

file_AE

str

Path to where to save the trained models of VA for pre and post. If load_AE is true, also represents from where the models will be loaded.

latent_dim

int

Latent (or code) dimension to which the predictor/driver should be reduced (or codified).

use_VAE

bool

Flag. If true and the arch is compatible, it will use a Variational Autoencoder instead of a normal Autoencoder architecture.

with_cpu

bool

Flag that indicate if the CPU or GPU version of tensorflow should be used, in case of having (or not) a GPU.

n_epochs

int

Number of maximum epoch of training step.

n_execs

int

If method is one of execs, seasons-execs, latents-execs or latents-seasons-execs, it indicates the number of executions to perform with the model (Defaul value 5).

k

int

How many analogue situation to select from the nearest ones. If k = 3 the method will select the 3 nearest analogue situations. (Default value is 20).

iter

int

Number of random extraction to perform from the k nearest analogues, in order to make a reconstruction of the event.

per_what

str

String to specify if the analysis should be diary (per_day) or weekly (per_week). Until now, this are the available option. In later versions montly and yearly analysis will be avaiable.

remove_year

bool

Flag that indicates if the year of the interest period should be removed entirely or not. If false, only the period between data_of_interest_init and data_of_interest_end will be removed from the dataset.

replace_choice

bool

Flag that determines if the iter random selection have to be perfomed with (true) or without (false) replacement.

arch

int

Wich architecture of the available has to to be used. See section for the available architectures.

verbose

bool

If true, several prints and warnings during the exectution will be showed. Also can be controled by -v | --verbose flag or verbose parameter of the outside and inside code execution of program.

temp/prs_dataset

str

Path to target (temp) and predictor/driver (prs) datasets (netcdf4 or grib).

ident_dataset

str

Path to dataset where the identification will be performed. It could be the same (or not) as the target dataset.

temp_var_name

str

Name of target variable in the dataset (default value if not specified is inferred from the dataset).

prs_var_name

str

Name of predictor/driver variable in the dataset. In case you don’t specify it, the name will be inferred automatically. In future multi-variate VA-AM version, this parameter will change, probably to a list of strings or something like this.

p

int

Wich p-Minkowski distance to perform while the analog search, where taxicab distance is p=1, euclidean distance is p=2, and so on (default value 2)

enhanced_distance

bool

Flag that indicates if the enhanced local proximity criterion should be used along with the p-Minkowski distance.

save_recons

bool

Flag that indicates if the reconstruction of the target event should be saved (default value false).

percentile

int

Wich percentile should be used during the identification step (default value 90).

out_preprocess

str or list[str]

What to return from perform_preprocess function. Default value is all. The possible output are: params, img_size, data_prs, data_temp, time_pre_indust_prs, time_indust_prs, data_of_interest_prs, data_of_interest_temp, x_train_pre_prs, x_train_ind_prs, x_test_pre_prs, x_test_ind_prs, pre_indust_prs, pre_indust_temp, indust_prs, indust_temp

compile_params

dict

Dictionary wich contains the configuration input arguments for the model.compile() method, depending on the tensorflow/keras version.

fit_params

dict

Dictionary wich contains the configuration input arguments for the model.fit() method, except for epochs and verbose, depending on the tensorflow/keras version.

Functionality#

This package provide, for now, the below functionality. More are expected in future versions. The github repository have some example of configuration files for some well known heat waves, but you should first check the Configuration file section.

Identify heat waves#

We can perform the identifitacion of the heat wave period, following the definition from Russo paper. You will need a dataset of, ideally, maximum daily (or weekly) temperature as ident_dataset. From that you can perform the identification by by -i | --identifyhw flag or ident parameter as shown below, with the corresponding Configuration file.

# Outside of the python code
$ python -m va_am -i -f "path/to/config-file" ...
# Inside of the python code
from va_am import va_am
va_am(ident=True, config_file="path/to/config-file", ...)

Default methods of package are for Analog search or Va-AM, so you can face 2 different scenarios: you will want to make de itentification as a first step of the other methods, or you will want to only make the identification.

In case you will use the identification as a first step of other methods, it is compatible with all methods except day. E.g., for method execs:

# Outside of the python code
$ python -m va_am -i -m execs -f "path/to/config-file" ...
# Inside of the python code
from va_am import va_am
va_am(ident=True, method="execs", config_file="path/to/config-file",  ...)

In case you will use only the identification, is not required to specify any method. If the -i | --identifyhw flag is used, it will return a warning like Indentify Heat wave period (flag -i --identifyhw) for {params['name'][1:-1]} is not compatible with default 'method' ('day') and this will not be executed indicating that only the identification is going to be performed (instead of defauls day method).

# Outside of the python code
$ python -m va_am -i -f "path/to/config-file" ...
# Inside of the python code
from va_am import va_am
va_am(ident=True, config_file="path/to/config-file",  ...)

Note

If Telegram bot is used you will also recive this warning. See section for more details.

VA-AM methods#

The usual functionality of VA-AM is to use deep learning methods (mainly Autoencoder-based) to enhance the performance of the classic analog. We provide several already-done architectures, such as Variational-Autoencoder , Autoencoder, Deep-Autoencoder, Simetric-Autoencode r, among others (see API reference).

Note

Where the order of architecture in the documentation correspond to its arch value in Configuration file.

For heat wave case a specific architecture is recommended (arch=5)

Is expected to implement in future versions a user-framework or method to use user-own architecture in VA-AM.

Telegram bot#

VA-AM include compatibility with a Telegram bot as warn and allert mechanism. It could be useful when you are performing diferent long task and want to be notified about possibles errors, exceptions and warnings.

To use it is quite easy by -t | --teleg flag or teleg parameter as shown below, but first you will need to fulfill some previous steps:

# Outside of the python code
$ python -m va_am -t ...
# Inside of the python code
from va_am import va_am
va_am(..., teleg=True)

Step 1. Create your own Telegram bot#

For the -t | --teleg option to work, you will need to create your own Telegram bot, which will be who will notify you. BotFather is a built-in Telegram bot that allows you to create another bots. We recommend to follow this Tutorial in order to create the bot.

Note

It is very important to save the token provided by BotFather of your Telegram bot.

Step 2. Create a channel or group#

The next step is to create a Telegram channel or group where you will get the allerts. We recommed the use of a channel, but also a group could be possible. You will need to add your created bot to this channel (or group) and allow it to send message (check the permissions you give to other users/bots as admin of the channel).

When everything ready, you could follow the next step of the Tutorial to get the chat id. Some snippet like the following could give you the chat id:

import requests

TOKEN = "YOUR TELEGRAM BOT TOKEN"
url = f"https://api.telegram.org/bot{TOKEN}/getUpdates"

print(requests.get(url).json())

Note

Chat id is an integer number that represents the channel (or group) which bot is member. It is important to Note that it could be a possitive or negative integer number, so be aware about the - sign.

Step 3. Telegram secrets configuration file#

The last step is to provide a secret file to the program to be able to use your Telegram bot. By the flag -sf | --secretfile or the secret_file parameter you can provide the Path to your .txt (or similar) file containing the secrets.

# Outside of the python code
$ python -m va_am -sr path/to/secret-file ...
# Inside of the python code
from va_am import va_am
va_am(..., secret_file="path/to/secret-file")

If not specified the secret file path, it will be searched at the default secret.txt file.

The scructure of the secret file need to be:

[TOKEN]
[chat-id]
@[user-name]

Important

VA-AM will send exceptions and warnings to the Telegram bot. In order to distinguish better exceptions from warnings, it use your [user-name] to notify you. If not wanted to follow this functionality, you could not provide it and replace @[user-name] by and empty space. In any case, a third row is needed in the file, regardless it is empty, a white/blank space, or your @[user-name].

Caution

DON’T SHARE YOUR SECRET FILE WITH ANYONE!!!!

The [TOKEN] provides absolute access and admin permissions with your bot. In the wrong hands, it could end in a mess (probably your bot will became a spam bot, at best). If your going to use VA-AM in a repository (especially a public one), we recommed you to add your secret file name to the .gitignore file.