tesliper.glassware.array_base
Core functionality of DataArray classes.
This module implements the base class for DataArrays and its core
functionality, namely validation of array-like data, along with some helper functions.
To implement a DataArray-like container, subclass the ArrayBase class
and use one of the ArrayProperty classes to create a validated array-like
instance attribute for your new class. You should also provide an associated_genres class
attribute to indicate which genres this new DataArray-like class should be
used for.
The most basic example may look like this:
>>> class MyDataArray(ArrayBase):
...     associated_genres = ("foo",)
...     filenames = ArrayProperty(dtype=str)
...     values = ArrayProperty(check_against="filenames")
...     def __init__(self, genre, filenames, values, allow_data_inconsistency=False):
...         super().__init__(genre, filenames, values, allow_data_inconsistency)
...
>>> foo_array = MyDataArray("foo", ["a", "b", "c"], values=[1, 2, 3])
This definition would be almost a re-implementation of what ArrayBase already
provides, but it is a good starting point for explanation, so let's elaborate on it a
little. ArrayBase expects 4 parameters on initialization of its subclass:
genre is the genre of the data stored, filenames is a list of conformer identifiers,
values is, not surprisingly, a list of data values for each conformer, and
allow_data_inconsistency is a boolean flag that controls how array-like
attributes are validated.
filenames and values are ArrayProperty instances - values passed to the
constructor as parameters of these names will be checked and validated, and stored as
numpy.ndarrays. Moreover, filenames will be stored as strings, because we told the
ArrayProperty this is our desired data type for this array-like attribute,
using dtype=str. The default data type is float, so values will be converted
to floats.
>>> foo_array.filenames
array(["a", "b", "c"], dtype=str)
>>> foo_array.values
array([1.0, 2.0, 3.0], dtype=float)
check_against="filenames" tells ArrayProperty to validate values using
filenames as a reference for the desired shape of the values array. If the shape differs
from the shape of the reference, InconsistentDataError is raised. If you deal
with multidimensional data, you can use the check_depth parameter to indicate that
arrays should have identical shapes only up to a certain depth. For example,
check_depth=2 would accept arrays of shapes (10, 20) and (10, 20, 3), but would raise
an exception for arrays shaped (10,) and (10, 3). However, in our simple example it wouldn’t
make much sense to check more than the default depth of 1, since filenames has only one
dimension.
>>> MyDataArray("foo", ["a", "b", "c"], values=[1, 2, 3, 4])
Traceback (most recent call last):
...
InconsistentDataError: values and filenames must have the same shape up to 1 dimensions.
Arrays of shape (3,) and (4,) were given.
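The depth-limited comparison described above can be pictured with plain NumPy. This is a minimal sketch, not tesliper's implementation: shapes_match is a hypothetical helper, and it assumes that only the leading depth dimensions of both shapes are compared.

```python
import numpy as np


def shapes_match(values, reference, depth=1):
    """Hypothetical helper: compare only the first `depth` dimensions."""
    return np.shape(values)[:depth] == np.shape(reference)[:depth]


reference = np.zeros((10, 20))
print(shapes_match(np.zeros((10, 20, 3)), reference, depth=2))  # True
print(shapes_match(np.zeros((10, 3)), reference, depth=2))      # False
print(shapes_match(np.zeros((10,)), reference, depth=2))        # False
print(shapes_match([1, 2, 3, 4], ["a", "b", "c"]))              # False
```

With the default depth of 1 only the number of entries is compared, which is why the four-element values list above is rejected against three filenames.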
The above exception is also raised if the values given to ArrayProperty are a
jagged sequence, that is, not all entries of the array have an identical number of
sub-entries. An example of a jagged array would be [[1, 2], [3]]. Data in this format
usually comes from reading calculations of different molecules rather than conformers,
or from corrupted or incomplete output files, so it is not allowed by default. However,
if you are sure that you want to work with such data, you can pass
allow_data_inconsistency=True to your MyDataArray constructor and
ArrayProperty will try to fill in missing values, producing a
numpy.ma.masked_array, or at least will ignore inconsistencies. You can choose the
fill value by specifying the fill_value parameter on ArrayProperty instantiation.
Finally, we specify associated_genres = ("foo",), which is the only thing in our
example that’s not already defined by ArrayBase. This class attribute informs the
Conformers object that it should use this ArrayBase subclass to
instantiate DataArray-like objects for data genres specified in
associated_genres. It must be specified as a tuple of strings, but may be left
empty if no genre should be associated with this particular class. However, the main
purpose of ArrayBase is to provide integration with the Conformers
machinery. If you wish to use only ArrayProperty’s validation features, you
may safely use it in a custom class. Such a class may define an allow_data_inconsistency
attribute, but it is optional (False is assumed).
>>> class CustomDataHolder:
...     allow_data_inconsistency = True  # class-level attribute will also work
...     points = ArrayProperty(fill_value=0)
...     def __init__(self, points):
...         self.points = points
...
>>> d = CustomDataHolder(points=((1,2,3),(1,2)))
>>> d.points
masked_array(
  data=[[1.0, 2.0, 3.0],
        [1.0, 2.0, --]],
  mask=[[False, False, False],
        [False, False, True]],
  fill_value=0)
genre, filenames, values, and allow_data_inconsistency are stored on the
ArrayBase subclass automatically if super().__init__() is called. However,
if you introduce any new init parameters, you must bind them to the object yourself.
Moreover, if you wish to use Conformers' automatic initialization of
ArrayBase subclasses, you should either name those additional parameters after the
genre you’d like to be retrieved or give them a default value. Otherwise,
Conformers.arrayed() won’t know how to initialize such a class.
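Why extra init parameters need either a genre-like name or a default can be illustrated with the standard inspect module. The sketch below is an assumption about the general idea only: required_extra_genres, BASE_PARAMS and the Example class are hypothetical, and the actual logic behind Conformers.arrayed() may differ.

```python
import inspect

# Parameters that the base initializer already handles (taken from the text above).
BASE_PARAMS = ("self", "genre", "filenames", "values", "allow_data_inconsistency")


def required_extra_genres(cls):
    """Hypothetical helper: names of extra init parameters without defaults."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, param in params.items()
        if name not in BASE_PARAMS
        and param.default is inspect.Parameter.empty
        and param.kind in (param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY)
    ]


class Example:
    # Stand-in class for illustration; a real one would subclass ArrayBase.
    def __init__(self, genre, filenames, values, freq, scale=1.0,
                 allow_data_inconsistency=False):
        self.freq = freq    # new parameters must be bound by the subclass
        self.scale = scale


print(required_extra_genres(Example))  # ['freq']
```

In this sketch freq has no default, so its name would have to identify the data to fetch, while scale can simply fall back to its default value.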
Functions
find_best_shape – Find shape of an array that could fit an arbitrarily deep, jagged, nested sequence of sequences.
flatten – Yield items from any nested iterable as a chain of values up to the given depth.
longest_subsequences – Finds lengths of the longest subsequences on each level of a given nested sequence.
mask – Returns a numpy.array of booleans, of the shape that best fits the given jagged nested sequence jagged.
to_masked – Convert a jagged, arbitrarily deep, nested sequence to numpy.ma.masked_array with missing entries masked.
Classes
ArrayBase – Base class for data holding objects.
ArrayProperty – Property that validates an array-like value given to its setter and stores it as numpy.ndarray.
CollapsibleArrayProperty – ArrayProperty that stores only one value, if all entries are identical.
DependentParameter – A parameter that depends on the genre of data array.
JaggedArrayProperty – ArrayProperty for storing intentionally jagged arrays of data.
- tesliper.glassware.array_base.longest_subsequences(sequences: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) -> Tuple[int, ...]
Finds lengths of the longest subsequences on each level of a given nested sequence. Each subsequence should have the same number of nesting levels.
- Parameters
sequences (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.
- Returns
Length of the longest subsequence for each nesting level as a tuple.
- Return type
tuple of ints
Notes
If the nesting level is not identical in all subsequences, lengths are reported up to the first level of non-iterable elements.
>>> longest_subsequences([[[1, 2]], [[1], 2]])
(2,)
Examples
>>> longest_subsequences([[[1, 2]], [[1]]])
(1, 2)
>>> longest_subsequences([[[1, 2]], [[1], [1], [1]]])
(3, 2)
- tesliper.glassware.array_base.find_best_shape(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) -> Tuple[int, ...]
Find shape of an array that could fit an arbitrarily deep, jagged, nested sequence of sequences. The reported size for each level of nesting is the length of the longest subsequence on this level.
- Parameters
jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.
- Returns
Length of the longest subsequence for each nesting level as a tuple.
- Return type
tuple of ints
Notes
If the nesting level is not identical in all subsequences, size is reported up to the first level of non-iterable elements.
>>> find_best_shape([[[1, 2]], [[1], 2]])
(2, 2)
Examples
>>> find_best_shape([[[1, 2]], [[1]]])
(2, 1, 2)
>>> find_best_shape([[[1, 2]], [[1], [1], [1]]])
(2, 3, 2)
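The relationship between longest_subsequences and find_best_shape can be sketched in pure Python. The functions below are hypothetical re-creations written only to reproduce the documented examples, not tesliper's source; they assume that strings count as values rather than as nested sequences.

```python
from collections.abc import Sequence


def _is_sequence(obj):
    # Assumption: strings and bytes are values, not nesting levels.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


def longest_subsequences_sketch(sequences):
    """Walk the nesting level by level, recording the longest entry on each."""
    lengths = []
    level = list(sequences)
    # Stop at the first level that contains any non-sequence element.
    while level and all(_is_sequence(item) for item in level):
        lengths.append(max(len(item) for item in level))
        level = [sub for item in level for sub in item]
    return tuple(lengths)


def find_best_shape_sketch(jagged):
    """Prepend the outermost length to the per-level maxima."""
    return (len(jagged),) + longest_subsequences_sketch(jagged)


print(longest_subsequences_sketch([[[1, 2]], [[1], [1], [1]]]))  # (3, 2)
print(find_best_shape_sketch([[[1, 2]], [[1], [1], [1]]]))       # (2, 3, 2)
print(find_best_shape_sketch([[[1, 2]], [[1], 2]]))              # (2, 2)
```

The last call shows the behavior described in the Notes: once a level mixes sequences with plain values, deeper levels are not reported.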
- tesliper.glassware.array_base.flatten(items: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]], depth: Optional[int] = None) -> Iterator
Yield items from any nested iterable as a chain of values up to the given depth. If depth is None, the yielded sequence is completely flat.
- Parameters
items (NestedSequence) – Arbitrarily deep, nested sequence of sequences.
depth (int, optional) – How deep the flattening should be.
- Yields
Any – Values from items as a flattened sequence.
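A depth-limited flattening generator might look like the sketch below. flatten_sketch is a hypothetical stand-in, not the library function, and it assumes that depth counts how many levels of nesting are removed; tesliper's exact interpretation of depth may differ.

```python
from collections.abc import Iterable


def flatten_sketch(items, depth=None):
    """Yield values from nested iterables, removing at most `depth` levels."""
    for item in items:
        # Assumption: strings and bytes are treated as single values.
        nested = isinstance(item, Iterable) and not isinstance(item, (str, bytes))
        if nested and (depth is None or depth > 0):
            yield from flatten_sketch(item, None if depth is None else depth - 1)
        else:
            yield item


data = [[1, [2, [3]]], [4]]
print(list(flatten_sketch(data)))           # [1, 2, 3, 4]
print(list(flatten_sketch(data, depth=1)))  # [1, [2, [3]], 4]
```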
- tesliper.glassware.array_base.mask(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]]) -> numpy.ndarray
Returns a numpy.array of booleans, of the shape that best fits the given jagged nested sequence jagged. Each boolean value of the output indicates if the corresponding value exists in jagged.
- Parameters
jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.
- Returns
Array of booleans, of the shape that best fits jagged, indicating if a value of the same index exists in jagged.
- Return type
numpy.array of bool
Notes
To use the output as a mask of numpy.ma.masked_array, it should be inverted.
>>> np.ma.array(values, mask=~mask(jagged))
Examples
>>> mask([[1, 2], [1]])
array([[True, True], [True, False]])
>>> mask([[1, 2], []])
array([[True, True], [False, False]])
>>> mask([[[1], []], [[2, 3]]])
array([[[True, False], [False, False]], [[True, True], [False, False]]])
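The inversion mentioned in the Notes is needed because NumPy's masked arrays use True to mark entries that are hidden, while the array returned here uses True for entries that exist. The sketch below shows this for the two-level case only; existence_mask_2d is a hypothetical helper, not the library's mask().

```python
import numpy as np


def existence_mask_2d(rows):
    """Hypothetical helper: True where a value exists in a list of lists."""
    width = max((len(row) for row in rows), default=0)
    exists = np.zeros((len(rows), width), dtype=bool)
    for i, row in enumerate(rows):
        exists[i, :len(row)] = True
    return exists


rows = [[1, 2], [1]]
exists = existence_mask_2d(rows)

# Pad the data to a rectangular array, filling only the existing positions.
padded = np.zeros(exists.shape)
padded[exists] = [value for row in rows for value in row]

# numpy.ma expects True for *missing* entries, hence the inversion.
masked = np.ma.array(padded, mask=~exists)
print(exists.tolist())       # [[True, True], [True, False]]
print(masked.mask.tolist())  # [[False, False], [False, True]]
print(masked.sum())          # 4.0
```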
- tesliper.glassware.array_base.to_masked(jagged: Sequence[Union[Any, Sequence[Union[Any, NestedSequence]]]], dtype: Optional[type] = None, fill_value: Optional[Any] = None) -> numpy.ma.core.MaskedArray
Convert a jagged, arbitrarily deep, nested sequence to numpy.ma.masked_array with missing entries masked.
- Parameters
jagged (sequence [of sequences [of...]]) – Arbitrarily deep, nested sequence of sequences.
dtype (type, optional) – Data type of the output. If dtype is None, the type of the data is figured out by numpy machinery.
fill_value (scalar, optional) – Value used to fill in the masked values when necessary. If None, a default based on the data-type is used.
- Returns
Given jagged converted to numpy.ma.masked_array with missing entries masked.
- Return type
numpy.ma.core.MaskedArray
- Raises
ValueError – If the jagged sequence has an inconsistent number of dimensions.
Examples
>>> to_masked([[1, 2], [1]])
array(data=[[1, 2], [1, --]], mask=[[False, False], [False, True]])
>>> to_masked([1, [1]])
Traceback (most recent call last):
ValueError: Cannot convert to masked array: jagged sequence has inconsistent number of dimensions.
- class tesliper.glassware.array_base.ArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None)
Property that validates an array-like value given to its setter and stores it as numpy.ndarray.
The value given to the property setter is:
- (optionally) sanitized with a user-provided sanitizer function;
- (optionally) compared with another array-like attribute of the owner regarding their shape;
- transformed to numpy.ndarray of the desired data type;
- stored in the owner’s __dict__.
Setting, getting and deletion of the value may be customized using the standard setter, getter and deleter decorators. Additionally, ArrayProperty provides an ArrayProperty.sanitizer decorator. If a sanitizer function is provided, it is called as the first step of data validation and should return a sanitized array-like value (given the original value as a positional parameter).
Validation regarding the shape of the value is triggered if the check_against parameter is provided. It should be the name of the owner’s other array-like attribute as a string. The shape of the value is then compared to the shape of this reference attribute. If the shapes are not identical up to the first check_depth dimensions, InconsistentDataError is raised.
The value is always transformed to numpy.ndarray of the specified dtype (float by default). If such a conversion cannot be done because the value is a jagged array, InconsistentDataError will be raised. However, if the owner allows for data inconsistency by defining owner.allow_data_inconsistency = True, non-matching shapes will be ignored and jagged arrays will be padded with fill_value and stored as numpy.ma.masked_array.
- Parameters
fget – Custom getter for attribute. Default one just returns the stored value.
fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.
fdel – Custom deleter for attribute. Deleting attribute is not supported by default.
doc – Attribute’s docstring.
dtype – Data type of elements of this array-like attribute.
check_against – Which other instance’s attribute should be used as a reference for array’s shape. If the shapes of this attribute and the reference attribute are different, an exception is raised. Only the first check_depth dimensions are compared.
check_depth – How many dimensions should be compared when checking shape of the array.
fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to the numpy.ma.masked_array constructor as a fill_value.
fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms the value received by the setter before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.
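The sanitize, check, convert, store sequence described for the setter follows the general Python descriptor pattern. The sketch below is a heavily simplified illustration of that pattern only: ValidatedArray and Holder are hypothetical names, jagged input and masked arrays are not handled, and a plain ValueError stands in for InconsistentDataError.

```python
import numpy as np


class ValidatedArray:
    """Hypothetical, simplified stand-in for ArrayProperty (not tesliper code)."""

    def __init__(self, dtype=float, check_against=None, check_depth=1, sanitizer=None):
        self.dtype = dtype
        self.check_against = check_against
        self.check_depth = check_depth
        self.sanitizer = sanitizer

    def __set_name__(self, owner, name):
        self.name = name  # attribute name under which the array is stored

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, values):
        if self.sanitizer is not None:                 # 1. sanitize
            values = self.sanitizer(values)
        array = np.asarray(values, dtype=self.dtype)   # 2. convert
        if self.check_against is not None:             # 3. compare shapes
            reference = getattr(instance, self.check_against)
            depth = self.check_depth
            if array.shape[:depth] != reference.shape[:depth]:
                raise ValueError(
                    f"{self.name} and {self.check_against} must have the same "
                    f"shape up to {depth} dimensions."
                )
        instance.__dict__[self.name] = array           # 4. store


class Holder:
    names = ValidatedArray(dtype=str)
    values = ValidatedArray(check_against="names")

    def __init__(self, names, values):
        self.names = names    # must be set first, values are checked against it
        self.values = values


holder = Holder(["a", "b"], [1, 2])
print(holder.values.tolist())  # [1.0, 2.0]
```

Storing the array in the instance's __dict__ under the attribute's own name works here because a descriptor defining __set__ takes precedence over the instance dictionary on attribute lookup.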
- getter(fget: Optional[Callable[[Any], Sequence]])
Descriptor to change the getter on an ArrayProperty.
- setter(fset: Optional[Callable[[Any, Sequence], None]])
Descriptor to change the setter on an ArrayProperty.
- deleter(fdel: Optional[Callable[[Any], None]])
Descriptor to change the deleter on an ArrayProperty.
- sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])
Descriptor to change the sanitizer on an ArrayProperty. The function given as a parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with the values given to the ArrayProperty setter. Sanitation is performed before .check_input() is called.
- check_shape(instance: Any, values: Sequence)
Raises an error if values have a different shape than the attribute specified as check_against.
- check_input(instance: Any, values: Sequence) -> numpy.ndarray
Checks if values given to the setter have the same length as the attribute specified with check_against.
- Parameters
instance – Instance of owner class.
values – Values to validate.
- Returns
Validated values.
- Return type
numpy.ndarray
- Raises
ValueError – If check_against is not None and the list of given values has a different length than getattr(instance, check_against). If the given list of values cannot be converted to dtype type.
InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency.
- class tesliper.glassware.array_base.JaggedArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None)
ArrayProperty for storing intentionally jagged arrays of data. InconsistentDataError is only raised if ArrayProperty.check_shape() fails. Given values are converted to a masked array and expanded as needed, regardless of the value of the allow_data_inconsistency attribute.
- Parameters
fget – Custom getter for attribute. Default one just returns the stored value.
fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.
fdel – Custom deleter for attribute. Deleting attribute is not supported by default.
doc – Attribute’s docstring.
dtype – Data type of elements of this array-like attribute.
check_against – Which other instance’s attribute should be used as a reference for array’s shape. If the shapes of this attribute and the reference attribute are different, an exception is raised. Only the first check_depth dimensions are compared.
check_depth – How many dimensions should be compared when checking shape of the array.
fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to the numpy.ma.masked_array constructor as a fill_value.
fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms the value received by the setter before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.
- check_input(instance: Any, values: Sequence) -> numpy.ndarray
Checks if values given to the setter have the same length as the attribute specified with check_against.
- Parameters
instance – Instance of owner class.
values – Values to validate.
- Returns
Validated values.
- Return type
numpy.ndarray
- Raises
ValueError – If check_against is not None and the list of given values has a different length than getattr(instance, check_against). If the given list of values cannot be converted to dtype type.
InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency.
- check_shape(instance: Any, values: Sequence)
Raises an error if values have a different shape than the attribute specified as check_against.
- deleter(fdel: Optional[Callable[[Any], None]])
Descriptor to change the deleter on an ArrayProperty.
- getter(fget: Optional[Callable[[Any], Sequence]])
Descriptor to change the getter on an ArrayProperty.
- sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])
Descriptor to change the sanitizer on an ArrayProperty. The function given as a parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with the values given to the ArrayProperty setter. Sanitation is performed before .check_input() is called.
- setter(fset: Optional[Callable[[Any, Sequence], None]])
Descriptor to change the setter on an ArrayProperty.
- class tesliper.glassware.array_base.CollapsibleArrayProperty(fget: typing.Optional[typing.Callable[[typing.Any], numpy.ndarray]] = None, fset: typing.Optional[typing.Callable[[typing.Any, typing.Sequence], None]] = None, fdel: typing.Optional[typing.Callable[[typing.Any], None]] = None, doc: typing.Optional[str] = None, dtype: type = <class 'float'>, check_against: typing.Optional[str] = None, check_depth: int = 1, fill_value: typing.Any = 0, fsan: typing.Optional[typing.Callable[[typing.Sequence], typing.Sequence]] = None, strict: bool = False)
ArrayProperty that stores only one value, if all entries are identical.
- Parameters
fget – Custom getter for attribute. Default one just returns the stored value.
fset – Custom setter for attribute. Default one stores validated values in instance’s __dict__.
fdel – Custom deleter for attribute. Deleting attribute is not supported by default.
doc – Attribute’s docstring.
dtype – Data type of elements of this array-like attribute.
check_against – Which other instance’s attribute should be used as a reference for array’s shape. If the shapes of this attribute and the reference attribute are different, an exception is raised. Only the first check_depth dimensions are compared.
check_depth – How many dimensions should be compared when checking shape of the array.
fill_value – If values are a jagged array and instance.allow_data_inconsistency is True, this value will be passed to the numpy.ma.masked_array constructor as a fill_value.
fsan – Custom sanitizer for attribute. “Sanitizer” is here understood as a function that transforms the value received by the setter before the value is validated (checked for correctness) and stored on the instance. fsan should return a sanitized value.
strict – Boolean flag indicating if check_input() should disallow values that are not all identical. If strict is True, it will raise InconsistentDataError when the setter is given such values. Defaults to False.
- check_shape(instance: Any, values: Sequence)
Raises an error if values have a different shape than the attribute specified as check_against. Accepts values with the size of the first dimension equal to 1, even if it is not identical to the size of the first dimension of said attribute.
- check_input(instance: Any, values: Union[Sequence, Any]) -> numpy.ndarray
If the given values is not iterable or is of type str, it is returned without change. Otherwise it is validated using ArrayProperty.check_input(), and collapsed to a single value if all values are identical. If values are non-uniform and instance doesn’t allow data inconsistency, InconsistentDataError is raised.
- Parameters
instance – Instance of owner class.
values – Values to validate.
- Returns
Validated array or single value.
- Return type
numpy.ndarray or any
- Raises
ValueError – If ArrayProperty.check_against is not None and the list of given values has a different length than getattr(instance, ArrayProperty.check_against). If the given list of values cannot be converted to ArrayProperty.dtype type.
InconsistentDataError – If values is a list of lists of varying size and instance doesn’t allow data inconsistency. If the property is declared as strict, given values are non-uniform and instance doesn’t allow data inconsistency.
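The collapsing step can be illustrated with a few lines of NumPy. This is a sketch under assumptions rather than the library's logic: collapse_if_uniform is a hypothetical function, a plain ValueError stands in for InconsistentDataError, the allow_data_inconsistency flag is ignored, and keeping a single entry along the first axis is only a guess at how the collapsed value is stored.

```python
import numpy as np


def collapse_if_uniform(values, strict=False):
    """Hypothetical helper: keep one entry if all entries are identical."""
    array = np.asarray(values)
    if array.ndim == 0 or array.shape[0] == 0:
        return array  # nothing to collapse
    if (array == array[0]).all():
        return array[:1]  # assumption: keep a single entry along the first axis
    if strict:
        raise ValueError("values are not uniform")
    return array


print(collapse_if_uniform([2, 2, 2]).tolist())  # [2]
print(collapse_if_uniform([1, 2, 3]).tolist())  # [1, 2, 3]
```

A collapsed array of length 1 along the first axis matches the relaxed rule in check_shape() above, which accepts such values even when the reference attribute is longer.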
- deleter(fdel: Optional[Callable[[Any], None]])
Descriptor to change the deleter on an ArrayProperty.
- getter(fget: Optional[Callable[[Any], Sequence]])
Descriptor to change the getter on an ArrayProperty.
- sanitizer(fsan: Optional[Callable[[Sequence], Sequence]])
Descriptor to change the sanitizer on an ArrayProperty. The function given as a parameter should take one positional argument and return sanitized values. If any sanitizer is provided, it is always called with the values given to the ArrayProperty setter. Sanitation is performed before .check_input() is called.
- setter(fset: Optional[Callable[[Any, Sequence], None]])
Descriptor to change the setter on an ArrayProperty.
- class tesliper.glassware.array_base.ArrayBase(genre: str, filenames: Sequence[str], values: Sequence, allow_data_inconsistency: bool = False)
Base class for data holding objects.
It provides automatic registration of its subclasses as DataArray-like representations of all associated_genres declared by said subclass. A subclass should provide an associated_genres class attribute even if it’s not supposed to be directly instantiated with data for any genre; it should be an empty tuple in such a case. Otherwise, associated_genres should be a tuple of genre names as strings.
This base class provides the most basic set of attributes that a DataArray-like object should implement, listed in the Parameters section.
- Parameters
genre – Name of the data genre that values represent.
filenames – Sequence of conformers’ identifiers.
values – Sequence of values for genre for each conformer in filenames.
allow_data_inconsistency – Flag indicating if the instance should allow data inconsistency (see ArrayProperty for details).
- abstract property associated_genres
Genres associated with the subclassing class.
Should be provided by the subclass as a class-level attribute. It will be used to determine what class to use to represent data of a particular genre when requested via the Conformers.arrayed() method. May be an empty sequence if the subclass is not intended to be used directly by tesliper’s machinery.
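Automatic registration of subclasses by genre can be achieved in several ways in Python. One common option is __init_subclass__, sketched below. This is an illustration of the idea only: RegisteringBase, its registry attribute and the example subclasses are hypothetical, and tesliper's actual mechanism may be different.

```python
class RegisteringBase:
    """Hypothetical stand-in for ArrayBase; not tesliper code."""

    associated_genres = ()
    registry = {}  # maps genre name -> class that represents it

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for genre in cls.associated_genres:
            RegisteringBase.registry[genre] = cls


class FooArray(RegisteringBase):
    associated_genres = ("foo", "bar")  # made-up genre names


class Helper(RegisteringBase):
    associated_genres = ()  # intermediate class, registered for no genre


print(RegisteringBase.registry["foo"].__name__)  # FooArray
print(sorted(RegisteringBase.registry))          # ['bar', 'foo']
```

An empty tuple keeps an intermediate class out of the lookup table, which mirrors the requirement above that associated_genres is always defined.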
- class tesliper.glassware.array_base.DependentParameter(name: str, kind: inspect._ParameterKind, genre_getter: Callable[[str], str], *, default: Any, annotation: Any)
A parameter that depends on the genre of data array. It provides a _genre_getter callable attribute that is used to provide the name of the data genre that should be used for this parameter.
It is hashable, like the original inspect.Parameter; however, it must be remembered that Python hashes functions based on their identity.
- property genre_getter
Should be a function that, given the genre of the data array being instantiated, returns the genre that should be used for this parameter.
- classmethod from_parameter(parameter: inspect.Parameter, genre_getter: Callable[[str], str])
Casts the given inspect.Parameter instance to this class.
- empty
alias of inspect._empty
- replace(*, name=<class 'inspect._void'>, kind=<class 'inspect._void'>, genre_getter=<class 'inspect._void'>, annotation=<class 'inspect._void'>, default=<class 'inspect._void'>)
Creates a customized copy of the DependentParameter.