tesliper.glassware.array_base

Core functionality of DataArray classes.

This module implements the base class for DataArrays and its core functionality, namely validation of array-like data, along with some helper functions. To implement a DataArray-like container, subclass the ArrayBase class and use one of the ArrayProperty classes to create a validated array-like instance attribute for your new class. You should also provide an associated_genres class attribute to indicate which genres this new DataArray-like class should be used for.

The most basic example may look like this:

>>> class MyDataArray(ArrayBase):
...     associated_genres = ("foo",)
...     filenames = ArrayProperty(dtype=str)
...     values = ArrayProperty(check_against="filenames")
...     def __init__(self, genre, filenames, values, allow_data_inconsistency=False):
...         super().__init__(genre, filenames, values, allow_data_inconsistency)
>>> foo_array = MyDataArray("foo", ["a", "b", "c"], values=[1, 2, 3])

This definition is almost a re-implementation of what ArrayBase already provides, but it is a good starting point for explanation, so let's elaborate on it a little. ArrayBase expects 4 parameters on initialization of its subclass: genre is the genre of the data stored, filenames is a list of conformer identifiers, values is, not surprisingly, a list of data values for each conformer, and allow_data_inconsistency is a boolean flag that controls how array-like attributes are validated.

filenames and values are ArrayProperty instances - values passed to the constructor as parameters of these names will be checked and validated, and stored as numpy.ndarrays. Moreover, filenames will be stored as strings, because we told the ArrayProperty this is our desired data type for this array-like attribute, using dtype=str. The default data type is float, so values will be converted to floats.

>>> foo_array.filenames
array(["a", "b", "c"], dtype=str)
>>> foo_array.values
array([1.0, 2.0, 3.0], dtype=float)

check_against="filenames" tells ArrayProperty to validate values using filenames as a reference for the desired shape of the values array. If the shape differs from the shape of the reference, InconsistentDataError is raised. If you deal with multidimensional data, you can use the check_depth parameter to indicate that arrays should have identical shapes only up to a certain depth. For example, check_depth=2 would accept arrays of shapes (10, 20) and (10, 20, 3) but would raise an exception on arrays shaped (10,) and (10, 3). However, in our simple example it wouldn’t make much sense to check more than the default depth of 1, since filenames has only one dimension.

>>> MyDataArray("foo", ["a", "b", "c"], values=[1, 2, 3, 4])
Traceback (most recent call last):
     ...
InconsistentDataError: values and filenames must have the same shape up to 1 dimensions.
Arrays of shape (3,) and (4,) were given.
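
The depth-limited comparison described above can be pictured with a few lines of plain Python. This is only a sketch of the rule, not tesliper's implementation: shapes_match is a hypothetical helper, and it assumes that "identical up to check_depth dimensions" means the leading check_depth entries of both shape tuples are equal.

```python
def shapes_match(shape, reference, depth=1):
    """Compare only the leading `depth` dimensions of two shape tuples.

    Hypothetical helper, not part of tesliper.
    """
    return tuple(shape[:depth]) == tuple(reference[:depth])


# Shapes mentioned in the text, compared with depth=2.
print(shapes_match((10, 20), (10, 20, 3), depth=2))  # True
print(shapes_match((10,), (10, 3), depth=2))         # False
# The default depth of 1 looks at the first dimension only.
print(shapes_match((3,), (4,)))                      # False
```

With the default depth of 1, only the number of entries has to agree, which is what the filenames example above relies on.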

The above exception is also raised if the values given to ArrayProperty are a jagged sequence, that is, not all entries of the array have an identical number of sub-entries. An example of a jagged array would be [[1, 2], [3]]. Data in this format usually comes from reading calculations of different molecules rather than conformers, or from corrupted or incomplete output files, so it is not allowed by default. However, if you are sure that you want to work with such data, you can pass allow_data_inconsistency=True to your MyDataArray constructor, and ArrayProperty will try to fill in missing values, producing a numpy.ma.masked_array, or at least will ignore inconsistencies. You can choose the fill value by specifying the fill_value parameter on ArrayProperty instantiation.

Finally, we specify associated_genres = ("foo",), which is the only thing in our example that’s not already defined by ArrayBase. This class attribute informs the Conformers object that it should use this ArrayBase subclass to instantiate DataArray-like objects for the data genres specified in associated_genres. It must be specified as a tuple of strings, but may be left empty if no genre should be associated with this particular class. However, the main purpose of ArrayBase is to provide integration with the Conformers machinery. If you wish to use only ArrayProperty’s validation features, you may safely use it in a custom class. Such a class may define an allow_data_inconsistency attribute, but it is optional (False is assumed).

>>> class CustomDataHolder:
...     allow_data_inconsistency=True  # class-level attribute will also work
...     points = ArrayProperty(fill_value=0)
...     def __init__(self, points):
...         self.points = points
...
>>> d = CustomDataHolder(points=((1,2,3),(1,2)))
>>> d.points
masked_array(
  data=[[1.0, 2.0, 3.0],
        [1.0, 2.0, --]],
  mask=[[False, False, False],
        [False, False,  True]],
  fill_value=0)

genre, filenames, values, and allow_data_inconsistency are stored on an ArrayBase subclass automatically if super().__init__() is called. However, if you introduce any new init parameters, you must bind them to the object yourself. Moreover, if you wish to use the automatic initialization of ArrayBase subclasses provided by Conformers, you should either name those additional parameters after the genre you’d like to be retrieved or give them a default value; otherwise Conformers.arrayed() won’t know how to initialize such a class.

Functions

find_best_shape(jagged)

Find shape of an array, that could fit arbitrarily deep, jagged, nested sequence of sequences.

flatten(items[, depth])

Yield items from any nested iterable as chain of values up to given depth.

longest_subsequences(sequences)

Finds lengths of longest subsequences on each level of given nested sequence.

mask(jagged)

Returns a numpy.array of booleans, of shape that best fits given jagged nested sequence jagged.

to_masked(jagged[, dtype, fill_value])

Convert jagged, arbitrarily deep, nested sequence to numpy.ma.masked_array with missing entries masked.

Classes

ArrayBase(genre, filenames, values[, ...])

Base class for data holding objects.

ArrayProperty([fget, fset, fdel, doc, ...])

Property, that validates array-like value given to its setter and stores it as numpy.ndarray.

CollapsibleArrayProperty(fget, ...)

ArrayProperty that stores only one value, if all entries are identical.

DependentParameter(name, kind, genre_getter, ...)

A parameter that depends on the genre of data array.

JaggedArrayProperty(fget, ...)

ArrayProperty for storing intentionally jagged arrays of data.

tesliper.glassware.array_base.longest_subsequences(sequences: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) → Tuple[int, ...]

Finds lengths of longest subsequences on each level of given nested sequence. Each subsequence should have same number of nesting levels.

Parameters

sequences (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.

Returns

Length of the longest subsequence for each nesting level as a tuple.

Return type

tuple of ints

Notes

If the nesting level is not identical in all subsequences, lengths are reported up to the first level of non-iterable elements.

>>> longest_subsequences([[[1, 2]], [[1], 2]])
(2,)

Examples

>>> longest_subsequences([[[1, 2]], [[1]]])
(1, 2)
>>> longest_subsequences([[[1, 2]], [[1], [1], [1]]])
(3, 2)

tesliper.glassware.array_base.find_best_shape(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) → Tuple[int, ...]

Find shape of an array, that could fit arbitrarily deep, jagged, nested sequence of sequences. Reported size for each level of nesting is the length of the longest subsequence on this level.

Parameters

jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.

Returns

Length of the longest subsequence for each nesting level as a tuple.

Return type

tuple of ints

Notes

If the nesting level is not identical in all subsequences, size is reported up to the first level of non-iterable elements.

>>> find_best_shape([[[1, 2]], [[1], 2]])
(2, 2)

Examples

>>> find_best_shape([[[1, 2]], [[1]]])
(2, 1, 2)
>>> find_best_shape([[[1, 2]], [[1], [1], [1]]])
(2, 3, 2)
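
The relation between the two functions documented above can be sketched in plain Python: the best-fitting shape is the outer length followed by the longest subsequence length on each deeper level. The names longest_lengths and best_shape are hypothetical stand-ins, and the level-by-level walk is an assumption about how such a search could work, not the library's actual code.

```python
from collections.abc import Sequence


def _is_nested(item):
    """Treat strings as values, not as sequences of characters."""
    return isinstance(item, Sequence) and not isinstance(item, str)


def longest_lengths(sequences):
    """Longest subsequence length on each nesting level below the outer one."""
    lengths = []
    level = list(sequences)
    # Stop at the first level that contains a non-sequence element.
    while level and all(_is_nested(item) for item in level):
        lengths.append(max(len(item) for item in level))
        level = [sub for item in level for sub in item]
    return tuple(lengths)


def best_shape(jagged):
    """Outer length followed by the longest length on each deeper level."""
    return (len(jagged),) + longest_lengths(jagged)


print(longest_lengths([[[1, 2]], [[1], [1], [1]]]))  # (3, 2)
print(best_shape([[[1, 2]], [[1], [1], [1]]]))       # (2, 3, 2)
print(best_shape([[[1, 2]], [[1], 2]]))              # (2, 2)
```

The last call reproduces the case from the Notes: the bare 2 on the third level stops the search, so only two dimensions are reported.
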

tesliper.glassware.array_base.flatten(items: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]], depth: Optional[int] = None) → Iterator

Yield items from any nested iterable as chain of values up to given depth. If depth is None, yielded sequence is completely flat.

Parameters
  • items (NestedSequence) – Arbitrarily deep, nested sequence of sequences.

  • depth (int, optional) – How deep the flattening should go.

Yields

Any – Values from items as a flattened sequence.
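
A depth-limited flattening generator can be sketched as follows. flatten_sketch is a hypothetical name, and the sketch assumes that depth counts how many nesting levels are unpacked below the outermost one (so depth=0 yields the top-level items unchanged); the source does not spell out this convention, so the real function may count differently.

```python
from collections.abc import Iterable


def flatten_sketch(items, depth=None):
    """Yield values from nested iterables, unpacking at most `depth` levels."""
    for item in items:
        nested = isinstance(item, Iterable) and not isinstance(item, (str, bytes))
        if nested and (depth is None or depth > 0):
            # Recurse with one level of depth used up (None means no limit).
            yield from flatten_sketch(item, None if depth is None else depth - 1)
        else:
            yield item


data = [1, [2, [3, [4]]]]
print(list(flatten_sketch(data)))           # [1, 2, 3, 4]
print(list(flatten_sketch(data, depth=1)))  # [1, 2, [3, [4]]]
```

Strings are yielded whole rather than split into characters, which matters for data such as filenames.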

tesliper.glassware.array_base.mask(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) → numpy.ndarray

Returns a numpy.array of booleans, of shape that best fits given jagged nested sequence jagged. Each boolean value of the output indicates if corresponding value exists in jagged.

Parameters

jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.

Returns

Array of booleans, of shape that best fits jagged, indicating if value of same index exist in jagged.

Return type

numpy.array of bool

Notes

To use output as a mask of numpy.ma.masked_array, it should be inverted.

>>> np.ma.array(values, mask=~mask(jagged))

Examples

>>> mask([[1, 2], [1]])
array([[True, True], [True, False]])
>>> mask([[1, 2], []])
array([[True, True], [False, False]])
>>> mask([[[1], []], [[2, 3]]])
array([[[True, False], [False, False]], [[True, True], [False, False]]])
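
The idea behind the examples above, and the inversion mentioned in the Notes, can be sketched without numpy using nested lists. Both presence and invert are hypothetical helpers; the target shape is passed in explicitly to keep the sketch short, whereas the documented function works it out from the data.

```python
def presence(jagged, shape):
    """Nested lists of booleans: True where `jagged` holds a value."""
    size, rest = shape[0], shape[1:]
    rows = []
    for index in range(size):
        present = index < len(jagged)
        if rest:
            # A missing sub-sequence is treated as an empty one.
            rows.append(presence(jagged[index] if present else [], rest))
        else:
            rows.append(present)
    return rows


def invert(nested):
    """Flip every boolean, as needed for a masked-array style mask."""
    return [invert(item) if isinstance(item, list) else not item for item in nested]


exists = presence([[1, 2], [1]], (2, 2))
print(exists)          # [[True, True], [True, False]]
print(invert(exists))  # [[False, False], [False, True]]
```

In the inverted form True marks a missing entry, which is the convention numpy.ma uses for its mask.
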

tesliper.glassware.array_base.to_masked(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]], dtype: Optional[type] = None, fill_value: Optional[Any] = None) → numpy.ma.core.MaskedArray

Convert jagged, arbitrarily deep, nested sequence to numpy.ma.masked_array with missing entries masked.

Parameters
  • jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.

  • dtype (type, optional) – Data type of the output. If dtype is None, the type of the data is figured out by numpy machinery.

  • fill_value (scalar, optional) – Value used to fill in the masked values when necessary. If None, a default based on the data-type is used.

Returns

Given jagged converted to numpy.ma.masked_array with missing entries masked.

Return type

numpy.ma.core.MaskedArray

Raises

ValueError – If jagged sequence has inconsistent number of dimensions.

Examples

>>> to_masked([[1, 2], [1]])
masked_array(data=[[1, 2], [1, --]], mask=[[False, False], [False, True]])
>>> to_masked([1, [1]])
Traceback (most recent call last):
ValueError: Cannot convert to masked array: jagged sequence has inconsistent
number of dimensions.
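
The padding half of such a conversion can be sketched with nested lists. pad is a hypothetical helper that fills a jagged sequence out to a given rectangular shape; it takes the shape as an argument and does not detect inconsistent nesting, both of which the documented function handles on its own.

```python
def pad(jagged, shape, fill_value=0):
    """Return rectangular nested lists of `shape`, filling gaps with `fill_value`."""
    size, rest = shape[0], shape[1:]
    rows = []
    for index in range(size):
        present = index < len(jagged)
        if rest:
            # Missing sub-sequences are padded as if they were empty.
            rows.append(pad(jagged[index] if present else [], rest, fill_value))
        else:
            rows.append(jagged[index] if present else fill_value)
    return rows


print(pad([[1, 2], [1]], (2, 2)))                 # [[1, 2], [1, 0]]
print(pad([[1, 2], []], (2, 2), fill_value=-1))   # [[1, 2], [-1, -1]]
```

The padded positions are exactly those that a masked array would mark as masked.
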

class tesliper.glassware.array_base.ArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None)

Property, that validates array-like value given to its setter and stores it as numpy.ndarray.

Value given to property setter is:

  1. (optionally) sanitized with user-provided sanitizer function;

  2. (optionally) compared with another array-like attribute of the owner regarding their shape;

  3. transformed to numpy.ndarray of desired data type;

  4. stored in owner’s __dict__.

Setting, getting and deletion of the value may be customized using the standard setter, getter and deleter decorators. Additionally, ArrayProperty provides an ArrayProperty.sanitizer decorator. If a sanitizer function is provided, it is called as the first step of data validation and should return a sanitized array-like value (given the original value as a positional parameter).

Validation regarding the shape of the value is triggered if the check_against parameter is provided. It should be the name of another array-like attribute of the owner, given as a string. The shape of the value is then compared to the shape of this reference attribute. If the shapes are not identical up to the first check_depth dimensions, InconsistentDataError is raised.

Value is always transformed to numpy.ndarray of the specified dtype (float by default). If such a conversion cannot be done because the value is a jagged array, InconsistentDataError will be raised. However, if the owner allows for data inconsistency by defining owner.allow_data_inconsistency = True, non-matching shapes will be ignored and jagged arrays will be padded with fill_value and stored as numpy.ma.masked_array.
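
The four setter steps listed above can be illustrated with a toy data descriptor. This is a deliberately simplified sketch, not the ArrayProperty implementation: ValidatedList and Holder are hypothetical names, plain lists stand in for numpy arrays, only lengths are compared, and the locally defined InconsistentDataError merely mimics the tesliper exception.

```python
class InconsistentDataError(Exception):
    """Stand-in for the tesliper exception of the same name."""


class ValidatedList:
    """Toy data descriptor that walks through the four setter steps."""

    def __init__(self, dtype=float, check_against=None, fsan=None):
        self.dtype = dtype
        self.check_against = check_against
        self.fsan = fsan

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, values):
        if self.fsan is not None:  # 1. optional sanitizing
            values = self.fsan(values)
        if self.check_against is not None:  # 2. optional shape check
            reference = getattr(instance, self.check_against)
            if len(values) != len(reference):
                raise InconsistentDataError(
                    f"{self.name} and {self.check_against} differ in length"
                )
        converted = [self.dtype(value) for value in values]  # 3. conversion
        instance.__dict__[self.name] = converted  # 4. storage in __dict__


class Holder:
    filenames = ValidatedList(dtype=str)
    values = ValidatedList(check_against="filenames")

    def __init__(self, filenames, values):
        self.filenames = filenames
        self.values = values


holder = Holder(["a", "b", "c"], [1, 2, 3])
print(holder.values)  # [1.0, 2.0, 3.0]
try:
    Holder(["a", "b", "c"], [1, 2, 3, 4])
except InconsistentDataError as error:
    print(error)  # values and filenames differ in length
```

Because the descriptor defines __set__, attribute access always goes through it, even though the validated data lives in the instance's own __dict__.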

Parameters
  • fget – Custom getter for attribute. Default one just returns the stored value.

  • fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.

  • fdel – Custom deleter for attribute. Deleting attribute is not supported by default.

  • doc – Attribute’s docstring.

  • dtype – Data type of elements of this array-like attribute.

  • check_against – Which other instance’s attribute should be used as a reference for array’s shape. If shape of this attribute and reference attribute’s are different, an exception is raised. Only first check_depth dimensions are compared.

  • check_depth – How many dimensions should be compared when checking shape of the array.

  • fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to numpy.ma.masked_array constructor as a fill_value.

  • fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms value received by the setter, before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.

getter(fget: Optional[Callable[[Any], Sequence]])

Descriptor to change the getter on an ArrayProperty.

setter(fset: Optional[Callable[[Any, Sequence], None]])

Descriptor to change the setter on an ArrayProperty.

deleter(fdel: Optional[Callable[[Any], None]])

Descriptor to change the deleter on an ArrayProperty.

sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])

Descriptor to change the sanitizer on an ArrayProperty. Function given as parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with values given to ArrayProperty setter. Sanitation is performed before .check_input() is called.
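
How a property-style sanitizer decorator can work is sketched below with a toy descriptor. SanitizedAttribute and Holder are hypothetical names and the mechanics (returning a fresh descriptor that carries the new function, as property.setter does) are an assumption; only the calling convention, one positional argument in and sanitized values out, is taken from the description above.

```python
class SanitizedAttribute:
    """Toy descriptor with a property-style `sanitizer` decorator."""

    def __init__(self, fsan=None):
        self.fsan = fsan
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def sanitizer(self, fsan):
        # Like property.setter, return a copy that uses the new function.
        return type(self)(fsan=fsan)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, values):
        if self.fsan is not None:
            values = self.fsan(values)  # runs before anything else
        instance.__dict__[self.name] = list(values)


class Holder:
    names = SanitizedAttribute()

    @names.sanitizer
    def names(values):
        # Receives only the values, as one positional argument.
        return [value.strip() for value in values]


holder = Holder()
holder.names = [" a", "b "]
print(holder.names)  # ['a', 'b']
```

The decorated function reuses the attribute's name, so the class ends up with a single descriptor that carries the sanitizer.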

check_shape(instance: Any, values: Sequence)

Raises an error if values have different shape than attribute specified as check_against.

check_input(instance: Any, values: Sequence) → numpy.ndarray

Checks if values given to setter have same length as attribute specified with check_against.

Parameters
  • instance – Instance of owner class.

  • values – Values to validate.

Returns

Validated values.

Return type

numpy.ndarray

Raises
  • ValueError – If check_against is not None and the list of given values has a different length than getattr(instance, check_against). If the given list of values cannot be converted to dtype type.

  • InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency.

class tesliper.glassware.array_base.JaggedArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None)

ArrayProperty for storing intentionally jagged arrays of data. InconsistentDataError is only raised if ArrayProperty.check_shape() fails. Given values are converted to a masked array and expanded as needed, regardless of the value of the allow_data_inconsistency attribute.

Parameters
  • fget – Custom getter for attribute. Default one just returns the stored value.

  • fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.

  • fdel – Custom deleter for attribute. Deleting attribute is not supported by default.

  • doc – Attribute’s docstring.

  • dtype – Data type of elements of this array-like attribute.

  • check_against – Which other instance’s attribute should be used as a reference for array’s shape. If shape of this attribute and reference attribute’s are different, an exception is raised. Only first check_depth dimensions are compared.

  • check_depth – How many dimensions should be compared when checking shape of the array.

  • fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to numpy.ma.masked_array constructor as a fill_value.

  • fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms value received by the setter, before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.

check_input(instance: Any, values: Sequence) → numpy.ndarray

Checks if values given to setter have same length as attribute specified with check_against.

Parameters
  • instance – Instance of owner class.

  • values – Values to validate.

Returns

Validated values.

Return type

numpy.ndarray

Raises
  • ValueError – If check_against is not None and the list of given values has a different length than getattr(instance, check_against). If the given list of values cannot be converted to dtype type.

  • InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency.

check_shape(instance: Any, values: Sequence)

Raises an error if values have different shape than attribute specified as check_against.

deleter(fdel: Optional[Callable[[Any], None]])

Descriptor to change the deleter on an ArrayProperty.

getter(fget: Optional[Callable[[Any], Sequence]])

Descriptor to change the getter on an ArrayProperty.

sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])

Descriptor to change the sanitizer on an ArrayProperty. Function given as parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with values given to ArrayProperty setter. Sanitation is performed before .check_input() is called.

setter(fset: Optional[Callable[[Any, Sequence], None]])

Descriptor to change the setter on an ArrayProperty.

class tesliper.glassware.array_base.CollapsibleArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None, strict: bool = False)

ArrayProperty that stores only one value, if all entries are identical.

Parameters
  • fget – Custom getter for attribute. Default one just returns the stored value.

  • fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.

  • fdel – Custom deleter for attribute. Deleting attribute is not supported by default.

  • doc – Attribute’s docstring.

  • dtype – Data type of elements of this array-like attribute.

  • check_against – Which other instance’s attribute should be used as a reference for array’s shape. If shape of this attribute and reference attribute’s are different, an exception is raised. Only first check_depth dimensions are compared.

  • check_depth – How many dimensions should be compared when checking shape of the array.

  • fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to numpy.ma.masked_array constructor as a fill_value.

  • fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms value received by the setter, before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.

  • strict – Boolean flag indicating if check_input() should disallow values that are not all identical. If strict is True it will raise InconsistentDataError when setter is given such values. Defaults to False.

check_shape(instance: Any, values: Sequence)

Raises an error if values have different shape than attribute specified as check_against. Accepts values with size of first dimension equal to 1, even if it is not identical to the size of the first dimension of said attribute.

check_input(instance: Any, values: Union[Sequence, Any]) → numpy.ndarray

If given values is not iterable or is of type str it is returned without change. Otherwise it is validated using ArrayProperty.check_input(), and collapsed to single value if all values are identical. If values are non-uniform and instance doesn’t allow data inconsistency, InconsistentDataError is raised.

Parameters
  • instance – Instance of owner class.

  • values – Values to validate.

Returns

Validated array or single value.

Return type

numpy.ndarray or any

Raises
  • ValueError – If ArrayProperty.check_against is not None and the list of given values has a different length than getattr(instance, ArrayProperty.check_against). If the given list of values cannot be converted to ArrayProperty.dtype type.

  • InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency. If property is declared as strict, given values are non-uniform and instance doesn’t allow data inconsistency.
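
The collapsing rule can be sketched as a plain function. collapse is a hypothetical helper, the locally defined InconsistentDataError mimics the tesliper exception, and the sketch skips the array validation step; it follows the Raises section in assuming that non-uniform values are rejected only when the property is strict and data inconsistency is not allowed.

```python
from collections.abc import Iterable


class InconsistentDataError(Exception):
    """Stand-in for the tesliper exception of the same name."""


def collapse(values, strict=False, allow_data_inconsistency=False):
    """Return a single value if all entries are identical, else the entries."""
    # Strings and non-iterables are passed through untouched.
    if isinstance(values, str) or not isinstance(values, Iterable):
        return values
    values = list(values)
    if values and all(value == values[0] for value in values):
        return values[0]
    if strict and not allow_data_inconsistency:
        raise InconsistentDataError("values are not all identical")
    return values


print(collapse("B3LYP"))                   # B3LYP
print(collapse([298.15, 298.15, 298.15]))  # 298.15
print(collapse([1, 2]))                    # [1, 2]
```

Storing one value instead of many identical ones suits settings that are normally shared by every conformer.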

deleter(fdel: Optional[Callable[[Any], None]])

Descriptor to change the deleter on an ArrayProperty.

getter(fget: Optional[Callable[[Any], Sequence]])

Descriptor to change the getter on an ArrayProperty.

sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])

Descriptor to change the sanitizer on an ArrayProperty. Function given as parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with values given to ArrayProperty setter. Sanitation is performed before .check_input() is called.

setter(fset: Optional[Callable[[Any, Sequence], None]])

Descriptor to change the setter on an ArrayProperty.

class tesliper.glassware.array_base.ArrayBase(genre: str, filenames: Sequence[str], values: Sequence, allow_data_inconsistency: bool = False)

Base class for data holding objects.

It provides automatic registration of its subclasses as DataArray-like representations of all associated_genres declared by said subclass. A subclass should provide an associated_genres class attribute even if it’s not supposed to be directly instantiated with data for any genre; in such a case it should be an empty tuple. Otherwise, associated_genres should be a tuple of genre names as strings.

This base class provides the most basic set of attributes that a DataArray-like object should implement, listed in the Parameters section.

Parameters
  • genre – Name of the data genre that values represent.

  • filenames – Sequence of conformers’ identifiers.

  • values – Sequence of values for genre for each conformer in filenames.

  • allow_data_inconsistency – Flag indicating if instance should allow data inconsistency (see ArrayProperty for details).

abstract property associated_genres

Genres associated with subclassing class.

Should be provided by subclass as a class-level attribute. It will be used to determine what class to use to represent data of a particular genre when requested via the Conformers.arrayed() method. May be an empty sequence if the subclass is not intended to be used directly by tesliper’s machinery.
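
One common way to register subclasses by a class attribute is the __init_subclass__ hook, sketched below. The source does not say how tesliper performs the registration, so this is purely illustrative: RegisteringBase, its registry dictionary, FooArray and Helper are all hypothetical.

```python
class RegisteringBase:
    """Toy base class that records subclasses by their declared genres."""

    registry = {}
    associated_genres = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only genres declared directly on the subclass are registered.
        for genre in cls.__dict__.get("associated_genres", ()):
            RegisteringBase.registry[genre] = cls


class FooArray(RegisteringBase):
    associated_genres = ("foo",)


class Helper(RegisteringBase):
    associated_genres = ()  # not meant to be looked up by genre


print(sorted(RegisteringBase.registry))            # ['foo']
print(RegisteringBase.registry["foo"] is FooArray)  # True
```

An empty tuple lets a subclass take part in the hierarchy without claiming any genre.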

get_repr_args() → Dict[str, Any]

Returns a dictionary that can be used as keyword-value pairs to instantiate an identical object.

classmethod get_init_params() → Dict[str, Union[str, inspect.Parameter]]

Returns parameters used to instantiate this class. genre is a genre of data array that is to be instantiated.

class tesliper.glassware.array_base.DependentParameter(name: str, kind: inspect._ParameterKind, genre_getter: Callable[[str], str], *, default: Any, annotation: Any)

A parameter that depends on the genre of data array. It provides a _genre_getter callable attribute that is used to provide the name of the data genre that should be used for this parameter.

It is hashable as the original inspect.Parameter, however it must be remembered that Python hashes functions based on their identity.
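
The general shape of such a parameter can be sketched as a small inspect.Parameter subclass. GenreDependentParameter is a hypothetical name, the weights_for rule is invented for illustration, and the sketch does not reproduce the hashing, from_parameter or replace behaviour of the documented class.

```python
import inspect


class GenreDependentParameter(inspect.Parameter):
    """Toy inspect.Parameter subclass carrying a genre-mapping callable."""

    def __init__(self, name, kind, genre_getter, *,
                 default=inspect.Parameter.empty,
                 annotation=inspect.Parameter.empty):
        super().__init__(name, kind, default=default, annotation=annotation)
        self._genre_getter = genre_getter

    @property
    def genre_getter(self):
        return self._genre_getter


def weights_for(genre):
    """Hypothetical rule mapping an array's genre to a companion genre."""
    return f"{genre}_weights"


parameter = GenreDependentParameter(
    "weights", inspect.Parameter.POSITIONAL_OR_KEYWORD, weights_for
)
print(parameter.name)                 # weights
print(parameter.genre_getter("foo"))  # foo_weights
```

The getter is called with the genre of the array being built, so one parameter definition can resolve to different genres for different arrays.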

property genre_getter

Should be a function that given a genre of data array being instantiated, returns a genre that should be used for this parameter.

classmethod from_parameter(parameter: inspect.Parameter, genre_getter: Callable[[str], str])

Casts given inspect.Parameter instance to this class.

empty

alias of inspect._empty

replace(*, name=<class 'inspect._void'>, kind=<class 'inspect._void'>, genre_getter=<class 'inspect._void'>, annotation=<class 'inspect._void'>, default=<class 'inspect._void'>)

Creates a customized copy of the DependentParameter.