MAQRBM package
Submodules
MAQRBM.marbm module
- class MAQRBM.marbm.MARBM(visible_units: int, hidden_units: int, sampler: str = 'SA', seed: Optional[int] = None)[source]
Bases: torch.nn.modules.module.Module
Mode-Assisted Restricted Boltzmann Machine (MARBM).
Restricted Boltzmann machines (RBMs) are a class of generative models that have historically been challenging to train due to the complex nature of their gradient. The MARBM provides a novel approach to training RBMs by combining the standard gradient updates with an off-gradient direction. This direction is constructed using samples from the RBM ground state, also referred to as ‘mode’. By utilizing mode-assisted training, the RBM benefits from faster convergence, improved training stability, and lower converged relative entropy (KL divergence).
Parameters:
- visible_units (int): Number of visible units in the RBM.
- hidden_units (int): Number of hidden units in the RBM.
- sampler (str, optional): Sampler selection. Defaults to 'SA'.
- seed (Optional[int]): Random seed. Defaults to None.
Attributes:
- W (torch.Tensor): Weights connecting the visible and hidden units.
- h_bias (torch.Tensor): Biases associated with the hidden units.
- v_bias (torch.Tensor): Biases associated with the visible units.
- free_energies (list): List to store computed free energies during training.
Methods:
- forward: Compute the forward pass (probability of hidden given visible).
- sample_hidden: Sample from the hidden layer given the visible layer.
- sample_visible: Sample from the visible layer given the hidden layer.
- contrastive_divergence: Perform a Contrastive Divergence (CD) step.
- rbm2qubo: Convert RBM parameters to a QUBO matrix.
- train: Train the RBM using mode-assisted training.
- reconstruct: Reconstruct input data using the trained RBM.
- compute_free_energy: Compute the free energy of a given configuration.
- _mode_train_step: Execute one step of mode-assisted training.
- _cd_train_step: Execute one step of training using Contrastive Divergence.
- compute_and_log_metric(val_loader, data, loss_metric, sigm)[source]
Computes and logs the specified metric using provided data.
Parameters:
- val_loader (torch.utils.data.DataLoader): DataLoader for the validation dataset.
- data (torch.Tensor): Input data tensor.
- loss_metric (str): The loss metric to compute. Can be ‘free_energy’, ‘kl’, ‘mse’, ‘mae’, or ‘ssim’.
Returns: - metric_value (float): Computed metric value.
- compute_free_energy(v: torch.Tensor) torch.Tensor [source]
Compute the free energy of a given configuration.
The free energy is calculated using the formula: F(v) = - v · v_bias - Σ_j softplus((v W)_j + h_bias_j), where the sum runs over the hidden units j.
Parameters: v (torch.Tensor): The visible layer configuration, of shape [batch_size, visible_units].
Returns: torch.Tensor: The computed free energy for each configuration in the batch, of shape [batch_size].
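The formula above can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and W is assumed to have shape [visible_units, hidden_units].

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def free_energy(v, W, v_bias, h_bias):
    """F(v) = -v . v_bias - sum_j softplus((v W)_j + h_bias_j), one value per row."""
    visible_term = v @ v_bias                           # shape [batch_size]
    hidden_term = softplus(v @ W + h_bias).sum(axis=1)  # shape [batch_size]
    return -visible_term - hidden_term

# Tiny example: 2 configurations, 3 visible units, 2 hidden units.
W = np.zeros((3, 2))
v_bias = np.array([1.0, 0.0, -1.0])
h_bias = np.zeros(2)
v = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
F = free_energy(v, W, v_bias, h_bias)  # shape [2]
```

With zero weights and hidden biases, each hidden unit contributes log(2), so the two free energies differ only through the visible-bias term.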
- compute_kl_divergence(data: torch.Tensor) torch.Tensor [source]
Compute the KL divergence between the original data and its reconstruction.
Parameters: - data (torch.Tensor): Original input data, of shape [batch_size, visible_units].
Returns: - torch.Tensor: KL Divergence between the original data and its reconstruction.
- compute_mae(data: torch.Tensor) torch.Tensor [source]
Compute the Mean Absolute Error (MAE) between the original data and its reconstruction.
Parameters: - data (torch.Tensor): Original input data, of shape [batch_size, visible_units].
Returns: - torch.Tensor: Mean Absolute Error between the original data and its reconstruction.
- compute_mse(data: torch.Tensor) torch.Tensor [source]
Compute the Mean Squared Error (MSE) between the original data and its reconstruction.
Parameters: - data (torch.Tensor): Original input data, of shape [batch_size, visible_units].
Returns: - torch.Tensor: Mean Squared Error between the original data and its reconstruction.
- compute_ssim(data: torch.Tensor, image_shape: tuple) torch.Tensor [source]
Compute the Structural Similarity Index Measure (SSIM) loss between the original flattened grayscale data and its reconstruction.
Parameters:
- data (torch.Tensor): Original input flattened data, of shape [batch_size, visible_units].
- image_shape (tuple): The shape (height, width) of the original images before flattening.
Returns: - torch.Tensor: SSIM loss between the original data and its reconstruction.
- contrastive_divergence(input_data: torch.Tensor, k: int = 1) torch.Tensor [source]
Perform one step of Contrastive Divergence (CD) for training the RBM.
The method approximates the gradient of the log-likelihood of the data by running a Gibbs chain for a specified number of steps, k.
Parameters:
- input_data (torch.Tensor): The visible layer data, of shape [batch_size, visible_units].
- k (int, optional): The number of Gibbs sampling steps. Defaults to 1.
Returns: - torch.Tensor: The difference between the outer product of the data and hidden probabilities at the start and the end of the Gibbs chain, of shape [visible_units, hidden_units].
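As a rough illustration of the CD-k estimate described above, the following NumPy sketch runs k Gibbs steps and returns the difference of the two outer products. The names `cd_k` and `sigmoid` are hypothetical, W is assumed to have shape [visible_units, hidden_units], and the result is not averaged over the batch; the library may differ on any of these points.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(v0, W, v_bias, h_bias, k=1, rng=None):
    """Return v0^T p(h|v0) - vk^T p(h|vk) after k Gibbs steps."""
    rng = np.random.default_rng() if rng is None else rng
    h0_prob = sigmoid(v0 @ W + h_bias)   # hidden probabilities at chain start
    vk, hk_prob = v0, h0_prob
    for _ in range(k):
        # Sample hidden units, then visible units, from Bernoulli distributions.
        hk = (rng.random(hk_prob.shape) < hk_prob).astype(v0.dtype)
        vk_prob = sigmoid(hk @ W.T + v_bias)
        vk = (rng.random(vk_prob.shape) < vk_prob).astype(v0.dtype)
        hk_prob = sigmoid(vk @ W + h_bias)
    positive = v0.T @ h0_prob            # data-driven statistics
    negative = vk.T @ hk_prob            # model-driven statistics
    return positive - negative           # shape [visible_units, hidden_units]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))
v0 = rng.integers(0, 2, size=(5, 4)).astype(float)
grad = cd_k(v0, W, np.zeros(4), np.zeros(3), k=2, rng=rng)
```

A weight update would then move W along `grad` scaled by the learning rate.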
- extract_data_only_from_batch(batch)[source]
Extracts data from a given batch.
If the batch is a tuple or a list, it extracts the first element. Otherwise, it returns the batch as it is.
Parameters: - batch (Union[Tuple[torch.Tensor, Any], List[torch.Tensor, Any], torch.Tensor]): Input batch which can be a tuple, list, or a tensor.
Returns: - torch.Tensor: Extracted data from the batch.
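The behavior described above amounts to a small type check. The sketch below is a standalone stand-in (the function name is hypothetical) rather than the method itself; it is useful because DataLoaders over labeled datasets typically yield (data, label) pairs.

```python
def extract_data_only(batch):
    # Tuples and lists are treated as (data, ...) containers: keep the first item.
    if isinstance(batch, (tuple, list)):
        return batch[0]
    # Anything else is assumed to already be the data.
    return batch

features, labels = [[0, 1], [1, 0]], [3, 7]
from_tuple = extract_data_only((features, labels))
from_list = extract_data_only([features, labels])
passthrough = extract_data_only({"data": features})  # stand-in for a bare tensor
```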
- extract_features(v)[source]
Extract the features from the input using the hidden activations of the RBM.
Parameters: - v (torch.Tensor): Input data, corresponding to the visible units.
Returns: - torch.Tensor: The activations of the hidden units which can be used as features for downstream tasks.
- Usage:
    rbm = MARBM(visible_units, hidden_units)
    rbm.load_model(path)
    input_data = …  # Your data for which you wish to extract features
    features = rbm.extract_features(input_data)
    # You can now feed this into the subsequent layer of your model
- forward(v: torch.Tensor) torch.Tensor [source]
Compute the forward pass.
Given the visible units, compute the probability of the hidden units being activated.
- Parameters
v (torch.Tensor) – The visible units.
- Returns
Probability of hidden units being activated.
- Return type
torch.Tensor
- get_visualization_data()[source]
Retrieve the data used for visualization.
- Returns
A tuple of:
metrics_name (str): Name of the metric.
metrics_values (list): A list of metric values collected during training.
sigm_values (list): A list of sigmoid values collected during training.
- Return type
tuple
- kl_divergence(p: torch.Tensor, q: torch.Tensor) torch.Tensor [source]
Compute the Kullback-Leibler (KL) Divergence between two probability distributions.
Parameters:
- p (torch.Tensor): True probability distribution, of shape [batch_size, visible_units].
- q (torch.Tensor): Approximated probability distribution, of shape [batch_size, visible_units].
Returns: - torch.Tensor: KL Divergence between p and q.
Note: Ensure that both p and q are proper probability distributions, i.e., they both sum up to 1 and do not contain any negative values. If they do not sum up to 1, consider normalizing them.
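The normalization advice above can be sketched as follows. This is an illustrative NumPy version with hypothetical helper names. It normalizes each row, adds a small `eps` (an assumption) to avoid log(0), and returns one value per row; the library's reduction over the batch may differ.

```python
import numpy as np

def normalize(x, eps=1e-12):
    # Clip negatives and rescale each row so it sums to 1.
    x = np.clip(x, 0.0, None)
    return x / (x.sum(axis=-1, keepdims=True) + eps)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * (log p_i - log q_i), computed per row."""
    p, q = normalize(p), normalize(q)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

p = np.array([[2.0, 2.0]])   # unnormalized; becomes [0.5, 0.5]
q = np.array([[0.9, 0.1]])
kl_pq = kl_divergence(p, q)
kl_pp = kl_divergence(p, p)
```

KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.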
- load_model(path)[source]
Load the weights and biases of the RBM from a saved state.
- Parameters
path – Path from which to load the model’s state.
- lock_weights()[source]
Lock the weights of the RBM to prevent them from being updated during training. This is useful when utilizing the RBM in transfer learning scenarios, ensuring the pretrained weights remain unchanged.
- preprocess_to_binary(data, threshold=0.5)[source]
Convert the data into binary format based on a given threshold.
Parameters:
- data (torch.Tensor): Input data tensor.
- threshold (float, optional): The threshold for conversion to binary. Default is 0.5.
Returns: - binary_data (torch.Tensor): Data converted to binary format.
- rbm2qubo() numpy.ndarray [source]
Convert RBM parameters to a QUBO (Quadratic Unconstrained Binary Optimization) matrix.
The QUBO matrix is constructed using the weights and biases of the RBM. The diagonal of the QUBO matrix corresponds to biases, and the off-diagonal elements correspond to the weights.
- Returns
The QUBO matrix with shape (n_total, n_total), where n_total = n_visible + n_hidden.
- Return type
numpy.ndarray
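One way to lay out such a matrix is sketched below. The sign convention is an assumption: the sketch negates biases and weights so that x^T Q x equals the standard RBM energy E(v, h) = -v·v_bias - h·h_bias - v W h for x = [v, h], which makes the QUBO minimum coincide with the RBM mode. The function name is hypothetical, and the library's actual layout and signs may differ.

```python
import numpy as np

def rbm_to_qubo(W, v_bias, h_bias):
    """Build an upper-triangular QUBO matrix from RBM parameters."""
    n_visible, n_hidden = W.shape
    n_total = n_visible + n_hidden
    Q = np.zeros((n_total, n_total))
    vis = np.arange(n_visible)
    hid = np.arange(n_visible, n_total)
    Q[vis, vis] = -v_bias               # diagonal: visible biases
    Q[hid, hid] = -h_bias               # diagonal: hidden biases
    Q[:n_visible, n_visible:] = -W      # off-diagonal block: weights
    return Q

W = np.array([[1.0, -2.0],
              [0.5, 0.0]])
v_bias = np.array([0.1, 0.2])
h_bias = np.array([-0.3, 0.4])
Q = rbm_to_qubo(W, v_bias, h_bias)

# Check against the RBM energy for one binary configuration.
v = np.array([1.0, 0.0])
h = np.array([1.0, 1.0])
x = np.concatenate([v, h])
qubo_value = x @ Q @ x
rbm_energy = -(v @ v_bias) - (h @ h_bias) - (v @ W @ h)
```

Because the variables are binary, x_i^2 = x_i, which is why linear bias terms can sit on the diagonal.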
- reconstruct(input_data: torch.Tensor) torch.Tensor [source]
Reconstruct the input data by passing it through the RBM’s hidden layer and then back to the visible layer.
Given an input visible layer, this method computes the activation of the hidden layer using the method sample_hidden, and then reconstructs the visible layer using the method sample_visible. This is a common approach in RBMs for data reconstruction.
Parameters: - input_data (torch.Tensor): A tensor representing the visible layer’s data to be reconstructed.
Shape should be (batch_size, visible_units).
Returns: - torch.Tensor: A tensor of the reconstructed visible layer. Shape is (batch_size, visible_units).
- sample_hidden(v: torch.Tensor) torch.Tensor [source]
Sample from the hidden layer given the visible layer.
Given the state of the visible units, this method computes the probability of each hidden unit being activated and then samples from a Bernoulli distribution based on these probabilities.
Parameters: - v (torch.Tensor): A tensor representing the state of the visible units. It should have a shape of (batch_size, visible_units).
Returns: - torch.Tensor: A tensor representing the sampled state of the hidden units. It will have a shape of (batch_size, hidden_units).
- sample_visible(h: torch.Tensor) torch.Tensor [source]
Sample from the visible layer given the hidden layer.
Given the state of the hidden units, this method computes the probability of each visible unit being activated and then samples from a Bernoulli distribution based on these probabilities.
Parameters: - h (torch.Tensor): A tensor representing the state of the hidden units. It should have a shape of (batch_size, hidden_units).
Returns: - torch.Tensor: A tensor representing the sampled state of the visible units. It will have a shape of (batch_size, visible_units).
- save_model(path)[source]
Save the trained weights and biases of the RBM.
- Parameters
path – Path to save the model’s state.
- set_sampler_parameters(num_reads=None, annealing_time=None, debugging=None, mainnet=None, logging=None)[source]
Set sampler parameters for the RBM.
- Parameters
num_reads (int, optional) – Number of reads for the sampler. If not provided, the existing value remains unchanged.
annealing_time (float, optional) – Annealing time for the sampler. If not provided, the existing value remains unchanged.
debugging (bool, optional) – Debugging mode flag. If not provided, the existing value remains unchanged.
mainnet (bool, optional) – Flag to determine if the Dynex online computing service should be accessed. If not provided, the existing value remains unchanged.
logging (bool, optional) – Logging mode flag. If not provided, the existing value remains unchanged.
Notes
This method updates the sampler parameters for the RBM based on the provided values.
- train(train_loader, val_loader=None, epochs=10, lr=0.01, k=1, sigm_a=20, sigm_b=-6, p_max=0.1, plotper=100, loss_metric='free_energy', save_model_per_epoch=False, save_path='./saved_models')[source]
Trains the MARBM model on provided data using the specified parameters.
Parameters:
- train_loader (torch.utils.data.DataLoader): DataLoader for the training dataset.
- val_loader (torch.utils.data.DataLoader, optional): DataLoader for the validation dataset. Default is None.
- epochs (int, optional): Number of training epochs. Default is 10.
- lr (float, optional): Learning rate for optimization. Default is 0.01.
- k (int, optional): Number of Gibbs sampling steps used in contrastive divergence. Default is 1.
- sigm_a (float, optional): Coefficient for the sigmoidal function determining mode switching. Default is 20.
- sigm_b (float, optional): Bias for the sigmoidal function determining mode switching. Default is -6.
- p_max (float, optional): Upper limit for the probability of the sigmoidal switch function. Must be within (0, 1]. Default is 0.1.
- plotper (int, optional): Frequency for calculating and logging the free energy. Default is 100.
- loss_metric (str, optional): Metric for loss computation. Accepts ‘free_energy’, ‘kl’ or ‘mse’. Default is ‘free_energy’.
- save_model_per_epoch (bool, optional): If True, the model will be saved after every epoch. Default is False.
- save_path (str, optional): Path to the directory where models should be saved. Used only if save_model_per_epoch is True. Default is ‘./saved_models’.
Notes: Training switches stochastically between mode-based training and contrastive divergence. At each step, the probability of selecting mode-based training is given by the sigmoid function ‘sigm = p_max / (1 + np.exp( -sigm_a * (iter_idx + epoch * steps_per_epoch) / total_steps - sigm_b))’. This probability rises as training progresses and is bounded above by p_max, so mode-based updates become more likely in the later epochs, where they help the model converge more effectively. Free energy or KL loss is periodically computed and stored based on the plotper interval.
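To see how the switching probability evolves, the sigmoid from the notes above can be evaluated on its own. The sketch below uses the documented defaults; the function name and the single `step` argument (standing for iter_idx + epoch * steps_per_epoch) are illustrative assumptions.

```python
import math

def mode_probability(step, total_steps, sigm_a=20.0, sigm_b=-6.0, p_max=0.1):
    """Probability of taking a mode-based step at a given global step."""
    progress = step / total_steps   # fraction of training completed, in [0, 1]
    return p_max / (1.0 + math.exp(-sigm_a * progress - sigm_b))

total_steps = 1000
schedule = [mode_probability(s, total_steps) for s in (0, 300, 500, 1000)]
```

With the defaults, the probability is close to zero at the start, reaches p_max / 2 when progress equals -sigm_b / sigm_a = 0.3, and approaches p_max toward the end of training.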
- training: bool