Advanced guide

tesliper handles data extracted from the source files in the form of specialized objects, called data arrays. These objects are instances of one of the DataArray subclasses (hence sometimes referred to as DataArray-like objects), described here in greater detail. The DataArray base class defines a basic interface and implements data validation, while its subclasses provided by tesliper define how certain data genres should be treated and processed.

Note

Under the hood, data arrays, and tesliper in general, use numpy to provide fast numeric operations on data.

This part of the documentation also shows how to take more control over the data export. Tesliper automates this process quite a bit and exposes only a limited set of the possibilities provided by the underlying writer classes. The sections below show how to use these writer classes directly in your code.

Data array classes

Each DataArray-like object has the following four attributes:

genre

name of the data genre that values represent;

filenames

sequence of conformers’ identifiers as a numpy.ndarray(dtype=str);

values

sequence of values of genre data genre for each conformer in filenames. It is also a numpy.ndarray, but its dtype depends on the particular data array class;

allow_data_inconsistency

a flag that controls the process of data validation. More about data inconsistency will be said later.

Some data arrays may provide more data. For example, spectral data values wouldn’t be complete without information about the band that they correspond to, so data arrays that handle this kind of data also provide a frequencies or wavelengths attribute.

Note

The attributes that hold band information are actually freq and wavelen, respectively; frequencies and wavelengths are convenience aliases.

Creating data arrays

The easiest way to instantiate a data array of the desired data genre is to use the Conformers.arrayed() factory method. It transforms its stored data into the DataArray-like object associated with a particular data genre, ignoring any conformer that is not kept or doesn’t provide data for the requested genre. You may force it to ignore any trimming applied by adding full=True to the call parameters (conformers without data for the requested genre will still be ignored). Moreover, any other keyword parameters provided will be forwarded to the class constructor, allowing you to override any default values.

>>> from tesliper import Conformers
>>> c = Conformers(
...     one={"gib":-123.5},
...     two={},
...     three={"gib": -123.6},
...     four={"gib":-123.7}
... )
>>> c.kept = ["one", "three"]
>>> c.arrayed("gib")
Energies(genre='gib', filenames=['one' 'three'], values=[-123.5 -123.6], t=298.15)
>>> c.arrayed("gib", full=True)
Energies(genre='gib', filenames=['one' 'three' 'four'], values=[-123.5 -123.6 -123.7], t=298.15)
>>> c.arrayed("gib", t=1111)
Energies(genre='gib', filenames=['one' 'three'], values=[-123.5 -123.6], t=1111)
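
The selection rules described above can be pictured with plain dictionaries. The sketch below is not tesliper's implementation: arrayed_sketch and its arguments are hypothetical names, used only to illustrate which conformers end up in the resulting array when trimming is respected or ignored.

```python
def arrayed_sketch(conformers, kept, genre, full=False):
    """Collect (filenames, values) for one genre from a mapping of conformers.

    conformers: dict mapping conformer name -> dict of genre -> value
    kept: set of conformer names that are currently kept (not trimmed)
    full: if True, ignore trimming; conformers lacking the genre are still skipped
    """
    filenames, values = [], []
    for name, data in conformers.items():
        if genre not in data:
            continue  # no data for this genre: always ignored
        if not full and name not in kept:
            continue  # trimmed conformer: ignored unless full=True
        filenames.append(name)
        values.append(data[genre])
    return filenames, values


conformers = {
    "one": {"gib": -123.5},
    "two": {},
    "three": {"gib": -123.6},
    "four": {"gib": -123.7},
}
kept = {"one", "three"}

print(arrayed_sketch(conformers, kept, "gib"))
# (['one', 'three'], [-123.5, -123.6])
print(arrayed_sketch(conformers, kept, "gib", full=True))
# (['one', 'three', 'four'], [-123.5, -123.6, -123.7])
```

Conformer "two" never appears, because it has no "gib" value regardless of the full flag.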

You can also instantiate any data array directly, providing the data yourself.

>>> from tesliper import Energies
>>> Energies(
...     genre='gib',
...     filenames=['one', 'three'],
...     values=[-123.5, -123.6]
... )
Energies(genre='gib', filenames=['one' 'three'], values=[-123.5 -123.6], t=298.15)

Data validation

On instantiation of a data array class, the values provided to its constructor are transformed into a numpy.ndarray of the appropriate type. tesliper tries to convert the given values to the target type where possible; if the elements of values are incompatible with the data array’s dtype, an exception is raised.

>>> from tesliper import IntegerArray
>>> arr = IntegerArray(genre="example", filenames=["one"], values=["1"])
>>> arr
IntegerArray(genre="example", filenames=["one"], values=[1])
>>> type(arr.values)
<class 'numpy.ndarray'>

>>> IntegerArray(genre="example", filenames=["one"], values=["1.0"])
Traceback (most recent call last):
...
ValueError: invalid literal for int() with base 10: '1.0'

>>> IntegerArray(genre="example", filenames=["one"], values=[None])
Traceback (most recent call last):
...
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

The size of values is also checked: its first dimension must be the same size as the number of entries in filenames, otherwise a ValueError is raised.

>>> IntegerArray(genre="example", filenames=["one"], values=[1, 2])
Traceback (most recent call last):
...
ValueError: values and filenames must have the same shape up to 1 dimensions. Arrays of shape (2,) and (1,) were given.

An InconsistentDataError exception is raised when values are multidimensional but provide an uneven number of entries for each conformer (values form a jagged array).

>>> IntegerArray(genre="example", filenames=["one", "two"], values=[[1, 2], [3]])
Traceback (most recent call last):
...
InconsistentDataError: IntegerArray of example genre with unequal number of values for conformer requested.

This behavior may be suppressed if the instance is created with the allow_data_inconsistency=True keyword parameter. In that case no exception is raised if the numbers of entries don’t match, and jagged arrays are turned into numpy.ma.masked_array instead of numpy.ndarray, if possible.

>>> IntegerArray(
...     genre="example",
...     filenames=["one"],
...     values=[1, 2],
...     allow_data_inconsistency=True
... )
IntegerArray(genre="example", filenames=["one"], values=[1,2], allow_data_inconsistency=True)

>>> IntegerArray(
...     genre="example",
...     filenames=["one", "two"],
...     values=[[1, 2], [3]],
...     allow_data_inconsistency=True
... )
IntegerArray(genre='example', filenames=['one' 'two'], values=[[1 2]
 [3 --]], allow_data_inconsistency=True)
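
The checks described in this section can be approximated in plain Python. This is only a sketch under stated assumptions: validate_sketch is a hypothetical helper, it raises ValueError where tesliper raises its own InconsistentDataError, and it pads short rows with None as a stand-in for the masked entries that numpy.ma.masked_array would hold.

```python
def validate_sketch(filenames, values, allow_data_inconsistency=False):
    """Mimic the size and jaggedness checks on plain Python lists.

    Returns a list of rows; when inconsistency is allowed, short rows are
    padded with None (a stand-in for masked entries).
    """
    if len(values) != len(filenames) and not allow_data_inconsistency:
        raise ValueError(
            f"{len(values)} value entries given for {len(filenames)} filenames"
        )
    # treat scalar entries as one-element rows so all entries compare uniformly
    rows = [list(v) if isinstance(v, (list, tuple)) else [v] for v in values]
    longest = max((len(row) for row in rows), default=0)
    if any(len(row) != longest for row in rows):
        if not allow_data_inconsistency:
            raise ValueError("unequal number of values for conformers")
        rows = [row + [None] * (longest - len(row)) for row in rows]
    return rows


print(validate_sketch(["one", "two"], [[1, 2], [3]], allow_data_inconsistency=True))
# [[1, 2], [3, None]]
```

Without allow_data_inconsistency=True, the same jagged input makes the helper raise instead of padding.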

Some data array classes also validate other data provided to their constructors, e.g. Geometry checks if atoms provides an atom specification for each atom in the conformer.

Note

Each validated field is actually an ArrayProperty or its subclass under the hood, which provides the validation mechanism.

Available data arrays

Data arrays provided by tesliper are listed below in categories, along with a short description and a list of the data genres associated with each data array class. More information about a DataArray-like class of interest may be found in the API reference.

Generic types

Simple data arrays that hold data of a particular type. They do not provide any functionality besides initial data validation. They are used by tesliper to segregate simple data and as base classes for other data arrays (this concerns mostly FloatArray).

IntegerArray

For handling data of int type.

Genres associated with this class:

charge

multiplicity

FloatArray

For handling data of float type.

Genres associated with this class:

zpecorr

tencorr

entcorr

gibcorr

BooleanArray

For handling data of bool type.

Genres associated with this class:

normal_termination

optimization_completed

InfoArray

For handling data of str type.

Genres associated with this class:

command

stoichiometry

Spectral data

Each data array in this category provides a freq or wavelen attribute, also accessible by their convenience aliases frequencies and wavelengths. These attributes store information about the frequency or wavelength that the particular spectral value is associated with (the x-axis value of the center of the band).

Activities genres, that is, the genres that may be used to simulate a spectrum, also provide a calculate_spectra() method for this purpose (see VibrationalActivities.calculate_spectra(), ScatteringActivities.calculate_spectra(), and ElectronicActivities.calculate_spectra()), as well as an intensities property that calculates a theoretical intensity for each activity value. A convenience spectra_name property may be used to get the name of the spectra pseudo-genre calculated with a particular activities genre.

>>> act = c["dip"]
>>> act.spectra_name
"ir"
>>> from tesliper import lorentzian
>>> spc = act.calculate_spectra(
...     start=200,  # cm^(-1)
...     stop=1800,  # cm^(-1)
...     step=1,     # cm^(-1)
...     width=5,    # cm^(-1)
...     fitting=lorentzian
... )
>>> type(spc), spc.genre
(<class 'tesliper.glassware.spectra.Spectra'>, 'ir')

VibrationalData

For handling vibrational (IR and VCD related) data that is not a spectral activity.

Genres associated with this class:

mass

frc

emang

ScatteringData

For handling scattering (Raman and ROA related) data that is not a spectral activity.

Genres associated with this class:

depolarp

depolaru

depp

depu

alpha2

beta2

alphag

gamma2

delta2

cid1

cid2

cid3

rc180

ElectronicData

For handling electronic (UV and ECD related) data that is not a spectral activity.

Genres associated with this class:

eemang

VibrationalActivities

For handling vibrational (IR and VCD related) spectral activity data.

Genres associated with this class:

iri

dip

rot
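
To give an idea of what simulating a spectrum from activities involves, here is a minimal sketch of Lorentzian broadening, assuming the theoretical intensities are already known. It is not tesliper's code: lorentzian_sketch is a hypothetical name, the band positions and intensities are made up, and treating width as the half width at half maximum is an assumption of this sketch.

```python
import math


def lorentzian_sketch(centers, intensities, abscissa, width):
    """Sum of Lorentzian bands evaluated at each point of `abscissa`.

    `width` is treated as the half width at half maximum (an assumption),
    in the same units as `centers` and `abscissa`.
    """
    spectrum = []
    for x in abscissa:
        y = sum(
            inten * width / (math.pi * ((x - x0) ** 2 + width ** 2))
            for x0, inten in zip(centers, intensities)
        )
        spectrum.append(y)
    return spectrum


# two made-up bands on a 200-1800 cm^(-1) grid with a 1 cm^(-1) step
abscissa = list(range(200, 1801))
spectrum = lorentzian_sketch([1000, 1500], [2.0, 1.0], abscissa, width=5)
peak_index = max(range(len(spectrum)), key=spectrum.__getitem__)
print(abscissa[peak_index])  # 1000
```

Each band contributes a peak of height intensity / (pi * width) at its center, so the stronger band at 1000 dominates the simulated curve.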

ScatteringActivities

For handling scattering (Raman and ROA related) spectral activity data.

Genres associated with this class:

ramanactiv

ramact

raman1

roa1

raman2

roa2

raman3

roa3

ElectronicActivities

For handling electronic (UV and ECD related) spectral activity data.

Genres associated with this class:

vdip

ldip

vrot

lrot

vosc

losc

Other data arrays

FilenamesArray

Special case of DataArray that holds only filenames. Its values property returns the same as filenames and ignores any value given to its setter. The only genre associated with this class is the filenames pseudo-genre.

Bands

Special kind of data array for band values, to which spectral data or activities correspond. Provides an easy way to convert values between their different representations: frequency, wavelength, and excitation energy. It also makes it easy to locate conformers with imaginary frequencies.

>>> arr = Bands(
...     genre="freq",
...     filenames=["one", "two", "three"],
...     values=[[-15, -10, 105], [30, 123, 202], [-100, 12, 165]]
... )
>>> arr.imaginary
array([2, 0, 1])
>>> arr.find_imaginary()
{'one': 2, 'three': 1}

Genres associated with this class:

freq

wavelen

ex_en
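
The idea behind locating imaginary frequencies and converting between band representations can be sketched without tesliper. The helpers below are hypothetical stand-ins rather than the Bands API; they assume that imaginary frequencies are reported as negative numbers and that frequencies are expressed as wavenumbers in cm^(-1).

```python
def count_imaginary(values):
    """Number of negative (imaginary) frequencies for each conformer."""
    return [sum(1 for v in row if v < 0) for row in values]


def find_imaginary(filenames, values):
    """Map conformer name -> number of imaginary frequencies, skipping zeros."""
    counts = count_imaginary(values)
    return {name: n for name, n in zip(filenames, counts) if n}


def wavenumber_to_wavelength(wavenumber_cm):
    """Convert a wavenumber in cm^(-1) to a wavelength in nm."""
    return 1e7 / wavenumber_cm


filenames = ["one", "two", "three"]
values = [[-15, -10, 105], [30, 123, 202], [-100, 12, 165]]
print(count_imaginary(values))            # [2, 0, 1]
print(find_imaginary(filenames, values))  # {'one': 2, 'three': 1}
print(wavenumber_to_wavelength(2000))     # 5000.0
```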

Energies

For handling data about the energy of conformers. Provides an easy way of calculating Boltzmann distribution-based population of conformers via a populations property.

>>> arr = Energies(
...     genre="gib",
...     filenames=["one", "two", "three"],
...     values=[-123.505977, -123.505424, -123.506271]
... )
>>> arr.deltas  # difference from lowest energy in kcal/mol
array([0.18448779, 0.53150055, 0.        ])
>>> arr.populations
array([0.34222796, 0.19052561, 0.46724643])

Genres associated with this class:

scf

zpe

ten

ent

gib
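
The deltas and populations shown for Energies can be checked by hand with a Boltzmann distribution. The sketch below uses only the standard library. The conversion factor and the gas constant are assumptions chosen here (they may differ slightly from the constants tesliper uses), and the function names merely mirror the properties for readability.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.5095       # assumed conversion factor
BOLTZMANN_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K), assumed


def deltas(energies_hartree):
    """Energy of each conformer relative to the lowest one, in kcal/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


def populations(energies_hartree, t=298.15):
    """Boltzmann populations at temperature t (in kelvin)."""
    kt = BOLTZMANN_KCAL_PER_MOL_K * t
    weights = [math.exp(-d / kt) for d in deltas(energies_hartree)]
    total = sum(weights)
    return [w / total for w in weights]


values = [-123.505977, -123.505424, -123.506271]
print([round(d, 4) for d in deltas(values)])       # [0.1845, 0.5315, 0.0]
print([round(p, 4) for p in populations(values)])  # [0.3422, 0.1905, 0.4672]
```

Raising t flattens the distribution, which is why overriding the default temperature changes the populations but not the deltas.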

Transitions

For handling information about electronic transitions from ground to excited state contributing to each band.

Data is stored in three attributes: ground, excited, and values, which are respectively: a list of ground state electronic subshells, a list of excited state electronic subshells, and a list of coefficients of transitions from the corresponding ground to excited subshell. Each of these arrays is of shape (conformers, bands, max_transitions), where ‘max_transitions’ is the highest number of transitions contributing to a single band across all bands of all conformers.

Allows you to easily calculate the contribution of each transition using contribution, and to find which transition contributes the most to a particular band with highest_contribution.

Genres associated with this class:

transitions

Geometry

For handling information about geometry of conformers.

Genres associated with this class:

last_read_geom

input_geom

optimized_geom

Writing to disk

The Tesliper object provides an easy, but not necessarily flexible, way of writing calculated and extracted data to disk. If your process requires more flexibility in this matter, you may use tesliper’s writer objects directly. This allows you to adjust how generated files are named and gives you more control over what is exported.

Writer classes

A writer object may be created using the writer() factory function. It expects a string parameter that specifies the desired format for data export. tesliper provides writers for "txt", "csv", "xlsx", and "gjf" file formats. The second mandatory parameter is a destination: the (existing) directory to which files should be written. Just like the writing methods of the Tesliper object, the function also takes a mode parameter that defines what should happen if any file already exists. Any additional keyword parameters are forwarded to the writer object constructor.

>>> from tesliper import writer
>>> wrt = writer("txt", "/path/to/dir")
>>> type(wrt)
<class 'tesliper.writing.txt_writer.TxtWriter'>

>>> wrt = writer("txt", "/doesnt/exists")
Traceback (most recent call last):
...
FileNotFoundError: Given destination doesn't exist or is not a directory.

Note

The writer() factory function is used by tesliper mostly to provide dynamic access to the writer class most recently registered (on class definition) to handle a particular format. This is useful when you modify an existing writer class or provide a new one.
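
Registration on class definition can be pictured with Python's __init_subclass__ hook. The sketch below shows the general pattern only and is not tesliper's source: every class and function name in it is hypothetical, and the default mode="x" is an assumption made for the example.

```python
class WriterBaseSketch:
    """Base class that records each subclass under its `extension`."""

    registered = {}  # extension -> most recently defined writer class
    extension = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.extension:
            # a later definition with the same extension replaces the earlier one
            WriterBaseSketch.registered[cls.extension] = cls

    def __init__(self, destination, mode="x"):
        self.destination = destination
        self.mode = mode


def writer_sketch(fmt, destination, mode="x", **kwargs):
    """Return an instance of the class currently registered for `fmt`."""
    try:
        cls = WriterBaseSketch.registered[fmt]
    except KeyError:
        raise ValueError(f"Unknown file format: {fmt}") from None
    return cls(destination, mode=mode, **kwargs)


class TxtWriterSketch(WriterBaseSketch):
    extension = "txt"


class MyTxtWriter(TxtWriterSketch):
    extension = "txt"  # re-registers "txt" to point at the customized class


wrt = writer_sketch("txt", "/path/to/dir")
print(type(wrt).__name__)  # MyTxtWriter
```

With this pattern, code that asks the factory for a "txt" writer picks up the customized class without any change at the call site.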

You can also create any of the writer objects directly, by importing and instantiating its class. The four available writer classes are listed below with a short comment. For more information on which methods they implement and how to use them, refer to the relevant API documentation.

from tesliper import TxtWriter
wrt = TxtWriter(destination="/path/to/dir")

TxtWriter

Generates human-readable text files.

CsvWriter

Generates files in CSV format with optional headers. Allows for the same level of output format customization as Python’s csv.writer (supports specification of dialect and other formatting parameters).

XlsxWriter

Instead of generating multiple files, creates a single .xlsx file and a variable number of spreadsheets inside it.

GjfWriter

Allows you to create input files for a new calculation job in the Gaussian software.

write() and other methods

Writer objects expect the data they receive to be DataArray-like instances. Each writer object provides a write() method for writing arbitrary data arrays to disk. This method dispatches received data arrays to the appropriate writing methods, based on their type. You are free to use either write() for easily writing a number of data genres in batch, or other methods for more control. The table below lists these methods, along with a brief description and the DataArray-like objects for which each method will be called by the writer’s write() method.

Methods used to write certain data

| Writer’s Method | Description | Supported arrays | Created files |
|---|---|---|---|
| generic() | Generic data: any genre that provides one value for each conformer. | DataArray, IntegerArray, FloatArray, BooleanArray, InfoArray | one |
| overview() | General information about conformers: energies, imaginary frequencies, stoichiometry. | Energies | one |
| energies() | Detailed information about conformers’ relative energy, including calculated populations. | Energies | for each genre |
| single_spectrum() | A spectrum, calculated for a single conformer or averaged. | SingleSpectrum | one |
| spectral_data() | Data related to spectral activity, but not convertible to spectra. | VibrationalData, ScatteringData, ElectronicData | for each conformer |
| spectral_activities() | Data that may be used to simulate conformers’ spectra. | VibrationalActivities, ScatteringActivities, ElectronicActivities | for each conformer |
| spectra() | Spectra for multiple conformers. | Spectra | for each conformer |
| transitions() | Electronic transitions from ground to excited state, contributing to each band. | Transitions | for each conformer |
| geometry() | Geometry (positions of atoms in space) of conformers. | Geometry | for each conformer |

The spectral_data() and spectral_activities() methods need some clarification. They create one file for each conformer in the given data arrays, with data from each provided data array joined in the conformer’s file. It’s important to remember that only the values from each data array are written; band values, by contrast, are written only once, as provided with the band parameter. Consequently, mixing vibrational and scattering data with a custom name_template is fine, but mixing either of those with electronic data in a single call is not possible.

Warning

You need to make sure that the data contained in DataArray-like objects cover the same set of conformers when passing multiple data array objects to the write() method or any other writing method. Passing two data arrays with data for different sets of conformers may produce files with corrupted data or fail silently. The Conformers.trim_incomplete() trimming method may be helpful in preventing such failures.
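
A quick guard against this problem is to compare the filenames of all arrays before handing them to a writer. The helper below is a hypothetical sketch, not part of tesliper; it relies only on the filenames attribute that every DataArray-like object provides, and uses SimpleNamespace objects as stand-ins for real data arrays.

```python
from types import SimpleNamespace


def same_conformers(*arrays):
    """True if all given array-like objects list the same conformers, in the same order."""
    if not arrays:
        return True
    reference = list(arrays[0].filenames)
    return all(list(arr.filenames) == reference for arr in arrays[1:])


# stand-ins for data arrays; only the `filenames` attribute matters here
dip = SimpleNamespace(genre="dip", filenames=["conf_one", "conf_four"])
rot = SimpleNamespace(genre="rot", filenames=["conf_one", "conf_four"])
iri = SimpleNamespace(genre="iri", filenames=["conf_one"])

print(same_conformers(dip, rot))       # True
print(same_conformers(dip, rot, iri))  # False
```

Calling such a check right before a writing method turns a silent mismatch into an explicit, early failure.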

Not all writer objects implement each of these writing methods, e.g. GjfWriter, which creates Gaussian input files, only implements the geometry() method (because exporting, e.g., a calculated spectrum as a Gaussian input would be pointless). Trying to write() a data array that should be written by a method that is not implemented, or calling such a method directly, will raise a NotImplementedError.
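
Type-based dispatch with unimplemented methods can be illustrated in a few lines. This is a sketch of the general idea only: the classes below are hypothetical stand-ins, and the way tesliper actually maps array types to writing methods may differ.

```python
class EnergiesSketch:
    """Stand-in for an Energies data array."""


class GeometrySketch:
    """Stand-in for a Geometry data array."""


class WriterSketch:
    """Routes each array to a method chosen by the array's type."""

    def write(self, data):
        handlers = {EnergiesSketch: self.energies, GeometrySketch: self.geometry}
        for array in data:
            handler = handlers.get(type(array))
            if handler is None:
                raise NotImplementedError(
                    f"no writing method for {type(array).__name__}"
                )
            handler(array)

    def energies(self, array):
        raise NotImplementedError("energies() is not supported by this writer")

    def geometry(self, array):
        raise NotImplementedError("geometry() is not supported by this writer")


class GjfWriterSketch(WriterSketch):
    """Implements geometry export only, like a writer of Gaussian input files."""

    def __init__(self):
        self.written = []

    def geometry(self, array):
        self.written.append(type(array).__name__)


wrt = GjfWriterSketch()
wrt.write([GeometrySketch()])
print(wrt.written)  # ['GeometrySketch']
try:
    wrt.write([EnergiesSketch()])
except NotImplementedError as error:
    print("refused:", error)  # refused: energies() is not supported by this writer
```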

Naming files

Usually, calling any of the writing methods will produce multiple files in the destination directory: one for each given genre, each conformer, etc. tesliper provides a reasonable naming scheme for these files, but you can modify it by providing your own name_template in place of the default one. To do this you will need to call the desired writing methods directly, instead of using write().

Each writing method uses the value of the name_template parameter given to the method call to create a filename for each file it generates. name_template should be a string that contains (zero, one, or more) label identifiers in the form of ${identifier}. These identifiers will be substituted to produce a final filename. Available identifiers and their meaning are as follows:

${ext} - appropriate file extension;
${conf} - name of the conformer;
${num} - number of the file according to internal counter;
${genre} - genre of exported data;
${cat} - category of produced output;
${det} - category-specific detail.

The ${ext} identifier is filled with the value of the writer’s extension attribute, whose value is also used to identify a writer class: "txt", "csv", etc. Other values are provided by the particular writing method.

from tesliper import Tesliper, writer
tslr = Tesliper(input_dir="/project/input")
...  # data extracted and trimmed
# tslr.conformers.kept_keys() == {"conf_one", "conf_four"}
freq, dip, rot = tslr["freq"], tslr["dip"], tslr["rot"]
wrt = writer("txt", "/project/default")
wrt.spectral_activities(band=freq, data=[dip, rot])
wrt = writer("txt", "/project/custom")
wrt.spectral_activities(
    band=freq, data=[dip, rot],
    name_template="name_${num}_${genre}.xy"
)

Contents of /project:
.
├───input
│   └─── ...
├───default
│   ├───conf_one.activities-vibrational.txt
│   └───conf_four.activities-vibrational.txt
└───custom
    ├───name_1_freq.xy
    └───name_2_freq.xy
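
The ${identifier} syntax described in this section is the same one used by Python's standard string.Template, so a template can be previewed before any files are written. The snippet below is a sketch that assumes nothing about tesliper's internals; the second template is a hypothetical one, chosen only to mimic the default-style names shown in the listing above.

```python
from string import Template

# preview the custom template for two conformers
template = Template("name_${num}_${genre}.xy")
for num, conf in enumerate(["conf_one", "conf_four"], start=1):
    # unused keyword arguments (here: conf) are simply ignored
    print(template.substitute(num=num, genre="freq", conf=conf))
# name_1_freq.xy
# name_2_freq.xy

# hypothetical template imitating the default-style names
default_like = Template("${conf}.${cat}-${det}.${ext}")
print(default_like.substitute(
    conf="conf_one", cat="activities", det="vibrational", ext="txt"
))
# conf_one.activities-vibrational.txt
```

Note that substitute() raises a KeyError when a template refers to an identifier that is not supplied, which makes typos in identifier names easy to spot.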