vegasflow package

Module contents

Monte Carlo integration with Tensorflow

Submodules

vegasflow.vflow module

This module contains the VegasFlow class and all its auxiliary functions

The main interfaces of this class are the class VegasFlow and the vegas_wrapper

vegasflow.vflow.importance_sampling_digest(xn, divisions)

Importance sampling algorithm: receives a random array (number of dimensions, number of dim) containing information about from which bins in the grid (n_dims, BINS_MAX+1) the random points have to be sampled

This algorithm is shared between the simplest form of Vegas (VegasFlow: only importance sampling) and Vegas+ (VegasFlowPlus: importance and stratified sampling) and so it has been lifted to its own function

xn: float tensor (n_dim, n_events)
which bins to sample from

divisions: float tensor (n_dims, BINS_MAX+1)
grid of divisions for the importance sampling algorithm

Returns

ind_i (integer tensor (n_events, n_dim)) – index in the divisions grid from which the points should be sampled
x (float tensor (n_events, n_dim)) – random values sampled in the divisions grid
xdelta (float tensor (n_events,)) – weight of the random points

vegasflow.vflow.refine_grid_per_dimension(t_res_sq, subdivisions)

Modifies the boundaries for the vegas grid for a given dimension

Parameters

t_res_sq (tensor) – array of results squared per bin
subdivision (tensor) – current boundaries for the grid

Returns

`new_divisions` – array with the new boundaries of the grid

Return type

tensor

class vegasflow.vflow.VegasFlow(n_dim, n_events, train=True, main_dimension=0, **kwargs)

Bases: MonteCarloFlow

Implementation of the important sampling algorithm from Vegas.

Parameters

n_dim (int) – number of dimensions to be integrated
n_events (int) – number of events per iteration
train (bool) – whether to train the grid
main_dimension (int) – in case of vectorial output, main dimenison in which to train

make_differentiable(): Freeze the grid if the function is to be called within a graph

freeze_grid(): Stops the grid from refining any more

unfreeze_grid(): Enable the refining of the grid

save_grid(file_name)

Save the divisions array in a json file

Parameters

file_name (str) –
checkpoint (Filename in which to save the) –

load_grid(file_name=None, numpy_grid=None)

Load the divisions array from a json file or from a numpy_array

Parameters

file_name (str) –
stored (Filename in which the grid json is) –
numpy_grid (np.array) –
with (Numpy array to substitute divisions) –

refine_grid(arr_res2)

Receives an array with the values of the integral squared per bin per dimension (arr_res2.shape = (n_dim, self.grid_bins)) and reshapes the divisions attribute accordingly

Parameters: arr_res2 (result the integrand sq per dimension and grid bin) –

Function not compiled

vegasflow.vflow.vegas_wrapper(integrand, n_dim, n_iter, total_n_events, **kwargs)

Convenience wrapper

Parameters

integrand (tf.function) –
n_dim (number of dimensions) –
n_iter (number of iterations) –
n_events (number of events per iteration) –

Returns

`final_result` (integral value)
`sigma` (monte carlo error)

vegasflow.vflow.vegas_sampler(*args, **kwargs)

Convenience wrapper for sampling random numbers

Parameters

integrand (tf.function) –
n_dim (number of dimensions) –
n_events (number of events per iteration) –
training_steps (number of training_iterations) –

Returns

`sampler`

Return type

a reference to the generate_random_array method of the integrator class

vegasflow.plain module

Plain implementation of the plainest possible MonteCarlo

class vegasflow.plain.PlainFlow(n_dim, n_events, events_limit=1000000, list_devices=['GPU'], verbose=True, xmin=None, xmax=None, **kwargs)

Bases: MonteCarloFlow

Simple Monte Carlo integrator.

vegasflow.plain.plain_wrapper(*args, **kwargs): Wrapper around PlainFlow

vegasflow.plain.plain_sampler(*args, **kwargs): Wrapper sampler around PlainFlow

vegasflow.monte_carlo module

Abstract class for Monte Carlo integrators implements a distribution of events across multiple devices and tensorflow graph technology

Usage:

In order to implement a new MonteCarloFlow integrator it is necessary to implement (at least) two methods:

_run_event: integrand
This function defines what to do in order to run one event of the Monte Carlo. It is used only for compilation, as the actual integration is done by the run_event method. In order to use the full capabilities of this library, _run_event can take a number of events as its input so it can run more than one event at the same time. All results from _run_event will be accumulated before being passed to _run_iteration.
_run_iteration:
This function defines what to do in a full iteration of the MonteCarlo (i.e., what to do in order to run for n_events)

Device distribution:

The default behaviour is defined in the configflow.py file.

This class will go through the devices given in the list_devices argument and consider them all active and enabled. Then the integration will be broken in batches of events_limit which will be given to the first idle device found. This means if device A is two times faster than device B, it will be expected to get two times as many events. Equally so, if events_limit is greater than n_events, all events will be given to device A as it is the first one found idle.

vegasflow.monte_carlo.print_iteration(it, res, error, extra='', threshold=0.1): Checks the size of the result to select between scientific notation and floating point notation

class vegasflow.monte_carlo.MonteCarloFlow(n_dim, n_events, events_limit=1000000, list_devices=['GPU'], verbose=True, xmin=None, xmax=None, **kwargs)

Bases: ABC

Parent class of all Monte Carlo integrators using tensorflow

Parameters

n_dim (number of dimensions of the integrand) –
n_events (number of events per iteration) –
events_limit (maximum number of events per step) – if events_limit is below n_events each iteration of the MC will be broken down into several steps. Do it in order to limit memory. Note: for a better performance, when n_events is greater than the event limit, n_events should be exactly divisible by events_limit
list_devices (list of device type to use (use None to do the tensorflow default)) –

property n_events: Number of events to run in a single iteration

property events_per_run: Number of events to run in a single step. Use this variable to control how much the memory will be loaded

property history

Returns a list with a tuple of results per iteration This tuple contains:

result: result of each iteration
error: error of the corresponding iteration
histograms: list of histograms for the corresponding iteration

property xjac: The default jacobian is 1 / total number of events

generate_random_array(n_events, *args)

External interface for the generation of random points as a 2D array of (n_events, n_dim). It calls the internal version of _generate_random_array

Parameters

n_events (number of events to generate) –

Returns

`rnds` (array of (n_events, n_dim) random points)
`p(x)` (p(x) associated to the random points)

set_seed(seed): Sets the random seed

get_device(): Looks in the list of devices until it finds a device available, once found makes the device unavailable and returns it

release_device(device): Makes device available again

device_run(ncalls, sent_pc=100.0, **kwargs)

Wrapper function to select a specific device when running the event If the devices were not set, tensorflow default will be used

Parameters: ncalls (number of calls to pass to the integrand) –
Returns: `result`
Return type: raw result from the integrator

set_distribute(queue_object)

Uses dask to distribute the vegasflow run onto a cluster Takes as input a queue_object defining the jobs to be sent

Parameters: queue_object (dask_jobqueue object) –

make_differentiable(): Modifies the attributes of the integration so that it can be compiled inside Tensorflow functions (and, therefore, gradients calculated) Returns a reference to run_event, a method that upon calling it with no arguments will produce results and uncertainties for an integration iteration of ncalls number of events

run_event(tensorize_events=False, **kwargs)

Runs the Monte Carlo event. This corresponds to a number of calls decided by the events_per_run variable. The variable acc is exposed in order to pass the tensor output back to the integrator in case it needs to accumulate.

The main driver of this function is the event attribute which corresponds to the tensorflor compilation of the _run_event method together with the integrand.

Return type: The accumulated result of running all steps

trace(n_events=50): Trace part of the integration (only integrand and random number generator). Note that this is not able to trace post-integration steps as the base class is blind to them

compile(integrand, compilable=True, signature=None, trace=False, check=True)

Receives an integrand, prepares it for integration and tries to compile unless told otherwise.

The input integrand must receive, as an input, an array of random numbers. There are also one optional arguments that will be passed to the function:

weight: weight of each event,

so that the most general signature for the integrand is:

integrand(array_random, weight = None),

the minimal working signature fo the integrand will be

integrand(array_random).

In other words, the integrand must take at least one argument and the integrator will always pass the array of random numbers. For legacy compatibility, the keyword argument n_dim will be accepted but it will fixed to be equal to self.n_dim

This function will try to understand the signature of the function and compile it accordingly, this means:

<1> array_random: DTYPE of shape [None, n_dim] <2> weight: DTYPE of shape [None]

if the function posses any kewyword arguments not included in this list, it will be compiled with a generic tf.function call. This will work most of the time but could trigger retracing on shape-shifting calculations.

If the signature is not to be used, it can be set to false

Parameters

integrand (the function to integrate) –
compilable ((default True) if False, the integration) – is not passed through tf.function
signature ((default: True)) – whether to autodiscover the signature of the integrand
check ((default: True)) – check whether the integrand produces expected results and whether it is vectorial note, with check=False vectorial output will not work

run_integration(n_iter, log_time=True, histograms=None)

Runs the integrator for the chosen number of iterations.

histograms must be a tuple of tf.Variables. At the end of all iterations the histograms per iteration will be output. The variable histograms instead will contain the weighted accumulation of all histograms

Parameters

n_iter (int) – number of iterations
log_time (bool) – flag to decide whether to log the time each iteration takes
histograms (tuple of tf.Variable) – tuple containing the histogram variables so they can be emptied each each iteration

Returns

`final_result` (float) – integral value
`sigma` (float) – monte carlo error

Note: it is possible not to pass any histogram variable and still fill some histogram variable at integration time, but then it is the responsibility of the integrand to empty the histograms each iteration and accumulate them.

vegasflow.monte_carlo.wrapper(integrator_class, integrand, n_dim, n_iter, total_n_events, compilable=True)

Convenience wrapper for integration

Parameters

integrator_class (MonteCarloFlow inherited class) –
integrand (tf.function) –
n_dim (number of dimensions) –
n_iter (number of iterations) –
n_events (number of events per iteration) –

Returns

`final_result` (integral value)
`sigma` (monte carlo error)

vegasflow.monte_carlo.sampler(integrator_class, integrand, n_dim, total_n_events, training_steps=5, compilable=True, return_class=False)

Convenience wrapper for sampling random numbers

Parameters

integrator_class (MonteCarloFlow inherited class) –
integrand (tf.function) –
n_dim (number of dimensions) –
n_events (number of events per iteration) –
training_steps (number of training_iterations) –
return_class (whether to return the full instance of the class or only the random method) –

Returns

`sampler`

Return type

a reference to the generate_random_array method of the integrator class

vegasflow.utils module

This module contains tensorflow_compiled utilities

vegasflow.utils.consume_array_into_indices(input_arr, indices, result_size)

Accumulate the input tensor input_arr into an output tensor of size result_size. The accumulation occurs according to the array of indices.

For instance, input_array = [a,b,c,d] and vector column indices = [[0,1,0,0]].T (with result_size = 2) will result in a final_result: (a+c+d, b)

Parameters

input_arr – Array of results to be consumed
indices – Indices of the bins in which to accumulate the input array
result_size – size of the output array

Returns

Array of size result_size

Return type

final_result

vegasflow.utils.py_consume_array_into_indices(input_arr, indices, result_size): Python interface wrapper for consume_array_into_indices. It casts the possible python-object input into the correct tensorflow types.

vegasflow.utils.generate_condition_function(n_mask, condition='and')

Generates a function that takes a number of masks

and returns a combination of all n_masks for the given condition.

It is possible to pass a list of allowed conditions, in that case the length of the list should be n_masks - 1 and will be apply sequentially.

Note that for 2 masks you can directly use & and |

>>> from vegasflow.utils import generate_condition_function
>>> import tensorflow as tf
>>> f_cond = generate_condition_function(2, condition='or')
>>> t_1 = tf.constant([True, False, True])
>>> t_2 = tf.constant([False, False, True])
>>> full_mask, indices = f_cond(t_1, t_2)
>>> print(f"{full_mask=}

{indices=}”)

full_mask=<tf.Tensor: shape=(3,), dtype=bool, numpy=array([ True, False, True])> indices=<tf.Tensor: shape=(2, 1), dtype=int32, numpy= array([[0],

[2]], dtype=int32)>

>>> f_cond = generate_condition_function(3, condition=['or', 'and'])
>>> t_1 = tf.constant([True, False, True])
>>> t_2 = tf.constant([False, False, True])
>>> t_3 = tf.constant([True, False, False])
>>> full_mask, indices = f_cond(t_1, t_2, t_3)
>>> print(f"{full_mask=}

{indices=}”)

full_mask=<tf.Tensor: shape=(3,), dtype=bool, numpy=array([ True, False, False])> indices=<tf.Tensor: shape=(1, 1), dtype=int32, numpy=array([[0]], dtype=int32)>

n_mask: int: Number of masks the function should accept
condition: str (default=’and’): Condition to apply to all masks. Accepted values are: and, or

condition_to_idx: function: function(*masks) -> full_mask, true indices