Scripting with `tesliper`

This part discusses the basics of using the Python API. For a tutorial on using the graphical interface, see gui.

tesliper provides the Tesliper class as the main entry point to its functionality. This class allows you to easily perform any typical task: read and write files, filter data, calculate and average spectra. It is recommended to read Conventions and Terms to get a general idea of what to expect. The next paragraphs introduce basic use cases with examples and explanations. The examples use simplified mockups rather than real data, so as not to obscure the logic presented.

from tesliper import Tesliper

# extract data from Gaussian output files
tslr = Tesliper(input_dir="./opt_and_freq")
tslr.extract()

# conditional filtering of conformers
tslr.conformers.trim_non_normal_termination()
tslr.conformers.trim_not_optimized()
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.trim_to_range("gib", maximum=10, attribute="deltas")
tslr.conformers.trim_rmsd(threshold=1.0, window_size=0.5, energy_genre="gib")

# calculate and average spectra, export data
tslr.calculate_spectra()
tslr.average_spectra()
tslr.export_energies(fmt="txt")
tslr.export_averaged(fmt="csv")

Reading files

After importing the Tesliper class, we instantiate it with an input_dir parameter, which is a path to the directory containing output files from the quantum chemical calculations software. You may also provide an output_dir parameter, defining where tesliper should write the files it generates. Both of these parameters are optional and default to the current working directory if omitted. You may also provide a wanted_files parameter, which should be a list of filenames that Tesliper should parse, ignoring any other files present in input_dir. Omitting wanted_files means that no file is ignored.

Note

Tesliper also accepts a quantum_software parameter, which is a hint for tesliper on how it should parse the output files it reads. However, only Gaussian software is supported out-of-the-box, and quantum_software="gaussian" is the default value. If you wish to use tesliper with another QC package, you will need to define a custom parser that subclasses the ParserBase class. Refer to its documentation for more information.

You can extract data from the files in input_dir using the Tesliper.extract() method. Tesliper.extract() respects input_dir and wanted_files given to Tesliper, but path and wanted_files parameters provided to the method call take precedence. If you would like to read files in the whole directory tree, you may perform a recursive extraction using extract(recursive=True). So, assuming the following directory structure:

project
├── optimization
│   ├── conf_one.out
│   ├── conf_two.out
│   └── conf_three.out
└── vibrational
    ├── conf_one.out
    ├── conf_two.out
    └── conf_three.out

you could use any of the following to get the same effect.

# option 1: change *input_dir*
tslr = Tesliper(input_dir="./project/optimization")
tslr.extract()
tslr.input_dir = "./project/vibrational"
tslr.extract()

# option 2: override *input_dir* only for one call
tslr = Tesliper(input_dir="./project/optimization")
tslr.extract()
tslr.extract(path="./project/vibrational")

# option 3: read the whole tree
tslr = Tesliper(input_dir="./project")
tslr.extract(recursive=True)

tesliper will try to guess the extension of the files it should parse: e.g. Gaussian output files may have a “.out” or “.log” extension. If those are mixed in the source directory, an exception will be raised. You can prevent this by providing the extension parameter; only files with the given extension will then be parsed.

project
├── conf_one.out
└── conf_two.log

tslr = Tesliper(input_dir="./project")
tslr.extract()  # raises ValueError
tslr.extract(extension="out")  # ok
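Before calling extract() on an unfamiliar directory, it can help to check which extensions are actually present. The snippet below is a minimal sketch using only the standard library; count_extensions() is a hypothetical helper and not part of tesliper, and the candidate extensions are simply the two mentioned above.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(directory, candidates=("out", "log")):
    """Count files in `directory` per candidate extension (hypothetical helper)."""
    counts = Counter()
    for path in Path(directory).iterdir():
        ext = path.suffix.lstrip(".")
        if path.is_file() and ext in candidates:
            counts[ext] += 1
    return counts


# Demo on a throwaway directory that mimics the mixed layout above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("conf_one.out", "conf_two.log"):
        (Path(tmp) / name).touch()
    found = count_extensions(tmp)

# More than one extension present: pass `extension` to extract() explicitly.
mixed = len(found) > 1
print(dict(found), mixed)
```

If mixed is true, choosing one extension up front avoids the error shown above.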

Filtering conformers

Tesliper.extract() will read and parse files it thinks are output files of the quantum chemical software and update the Tesliper.conformers internal data storage. It is a dict-like Conformers instance that stores data for each conformer in the form of an ordinary dict. This inner dict uses genre names as keys and data as values (the form of which depends on the genre itself). Conformers provides a number of methods for filtering the conformers it knows, making it easy to hide data that should be excluded from further analysis. tesliper calls this process trimming. The middle part of the first code snippet is an example of trimming conformers:

tslr.conformers.trim_non_normal_termination()
tslr.conformers.trim_not_optimized()
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.trim_to_range("gib", maximum=10, attribute="deltas")
tslr.conformers.trim_rmsd(threshold=1.0, window_size=0.5, energy_genre="gib")

As you may suspect, trim_non_normal_termination() hides data from calculations that did not terminate normally, trim_not_optimized() hides data from conformers that are not optimized, and trim_imaginary_frequencies() hides data from conformers that have at least one imaginary frequency. More trimming methods are described below.

Conformers that are hidden are referred to as not kept. Information about which conformers are kept and which are not is stored in the Conformers.kept attribute, which may also be manipulated more directly. This is explained in more detail later.

As mentioned earlier, Tesliper.conformers is a dict-like structure, and as such offers the typical functionality of Python’s dicts. However, checking for presence with conf in tslr.conformers or requesting a view with the standard keys(), values(), or items() will operate on the whole data set, ignoring any trimming applied earlier. The Conformers class offers additional kept_keys(), kept_values(), and kept_items() methods that return views that acknowledge trimming.
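The difference between the standard and the kept_* accessors can be illustrated with a toy model. This is only a sketch of the behaviour described above: ToyConformers is a made-up class, not tesliper's Conformers, and it returns plain lists where the real class is described as returning views.

```python
class ToyConformers(dict):
    """Toy stand-in: a dict plus a parallel list of 'kept' flags."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.kept = [True] * len(self)

    def kept_keys(self):
        # Only names whose flag is True.
        return [key for key, keep in zip(self, self.kept) if keep]

    def kept_items(self):
        return [(k, v) for (k, v), keep in zip(self.items(), self.kept) if keep]


confs = ToyConformers(conf_one={"scf": -1.0}, conf_two={"scf": -0.9})
confs.kept = [True, False]  # pretend "conf_two" was trimmed

all_names = list(confs.keys())       # ignores trimming
kept_names = confs.kept_keys()       # respects trimming
still_present = "conf_two" in confs  # membership ignores trimming too
```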

Trimming methods

A number of trimming methods are available besides those mentioned above. Below you will find them listed with a short summary and a link to a more comprehensive explanation in the method’s documentation.

trim_incomplete(): Filters out conformers that don’t contain data for as many expected genres as other conformers.
trim_imaginary_frequencies(): Filters out conformers that contain imaginary frequencies (any number of negative frequency values).
trim_non_matching_stoichiometry(): Filters out conformers that have different stoichiometry than expected.
trim_not_optimized(): Filters out conformers that failed structure optimization.
trim_non_normal_termination(): Filters out conformers whose calculation job did not terminate normally (was erroneous or interrupted).
trim_inconsistent_sizes(): Filters out conformers that have iterable data genres of a different size than most conformers. Helpful when InconsistentDataError occurs.
trim_to_range(): Filters out conformers that have a value of some specific data or property outside of the given range, e.g. their calculated population is less than 0.01.
trim_rmsd(): Filters out conformers that are identical to another conformer, judging by a given threshold of the root-mean-square deviation of atomic positions (RMSD).
select_all(): Marks all conformers as kept.
reject_all(): Marks all conformers as not kept.
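For readers unfamiliar with the RMSD criterion used by trim_rmsd(), the quantity itself can be sketched in a few lines. This is a simplified illustration and not tesliper's implementation: the rmsd() helper is hypothetical, it assumes both structures list the same atoms in the same order, and it performs no superposition of the structures, which a real comparison would typically need. The energy window mentioned earlier is not modelled.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD of two equally ordered lists of (x, y, z) atom positions."""
    if len(coords_a) != len(coords_b):
        raise ValueError("Both structures must have the same number of atoms.")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
second = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]  # second atom moved by 2 along z
value = rmsd(first, second)  # sqrt((0 + 4) / 2)
threshold = 1.0
too_similar = value < threshold  # would be treated as a duplicate if True
```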

Manipulating `Conformers.kept`

Information about which conformers are kept and which are not is stored in the Conformers.kept attribute. It is a list of booleans, one for each conformer stored, defining which conformers should be processed by tesliper.

# assuming "conf_two" has imaginary frequencies
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.kept == [True, False, True]  # True
tslr.export_data(["genres", "to", "export"])
# only files for "conf_one" and "conf_three" are generated

Conformers.kept may be modified using the trimming methods described earlier, but also more directly: by setting it to a new value. Firstly, the most straightforward way is to assign a new list of boolean values to it. This list should have the same number of elements as the number of conformers contained. A ValueError is raised if it doesn’t.

>>> tslr.conformers.kept
[True, True, True]
>>> tslr.conformers.kept = [False, True, False]
>>> tslr.conformers.kept
[False, True, False]
>>> tslr.conformers.kept = [False, True, False, True]
Traceback (most recent call last):
...
ValueError: Must provide boolean value for each known conformer.
4 values provided, 3 excepted.

Secondly, a list of filenames of conformers intended to be kept may be given. Only these conformers will be kept. If a given filename is not in the underlying tslr.conformers’ dictionary, a KeyError is raised.

>>> tslr.conformers.kept = ['conf_one']
>>> tslr.conformers.kept
[True, False, False]
>>> tslr.conformers.kept = ['conf_two', 'other']
Traceback (most recent call last):
...
KeyError: Unknown conformers: other.

Thirdly, a list of integers representing conformers’ indices may be given. Only conformers with the specified indices will be kept. If one of the given integers can’t be translated to a conformer’s index, an IndexError is raised. Indexing with negative values is currently not supported.

>>> tslr.conformers.kept = [1, 2]
>>> tslr.conformers.kept
[False, True, True]
>>> tslr.conformers.kept = [2, 3]
Traceback (most recent call last):
...
IndexError: Indexes out of bounds: 3.

Fourthly, assigning True or False to this attribute will mark all conformers as kept or not kept respectively.

>>> tslr.conformers.kept = False
>>> tslr.conformers.kept
[False, False, False]
>>> tslr.conformers.kept = True
>>> tslr.conformers.kept
[True, True, True]

Warning

The list of kept values may also be modified by setting its elements to True or False. This is advised against, however, as a mistake such as tslr.conformers.kept[:2] = [True, False, False] will break some functionality by forcibly changing the size of the tslr.conformers.kept list.
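The four accepted forms of a new kept value can be summarized as a single normalization step. The function below is an illustrative sketch only: normalize_kept() is a hypothetical name, the error messages are simplified, and tesliper's actual setter may be organized differently.

```python
def normalize_kept(value, names):
    """Turn any accepted form into a list of booleans, one per name."""
    if isinstance(value, bool):  # a single True/False applies to everyone
        return [value] * len(names)
    value = list(value)
    # Booleans are checked before integers, because bool is a subclass of int.
    if value and all(isinstance(v, bool) for v in value):
        if len(value) != len(names):
            raise ValueError("One boolean per known conformer is required.")
        return value
    if all(isinstance(v, str) for v in value):  # conformers' names
        unknown = [v for v in value if v not in names]
        if unknown:
            raise KeyError(f"Unknown conformers: {', '.join(unknown)}")
        return [name in value for name in names]
    if all(isinstance(v, int) for v in value):  # non-negative indices
        bad = [v for v in value if not 0 <= v < len(names)]
        if bad:
            raise IndexError(f"Indexes out of bounds: {bad}")
        return [index in value for index in range(len(names))]
    raise TypeError("Unsupported value for kept.")


names = ["conf_one", "conf_two", "conf_three"]
by_name = normalize_kept(["conf_one"], names)
by_index = normalize_kept([1, 2], names)
by_flag = normalize_kept(False, names)
```

Validating the whole value before storing it is what keeps the list the same length as the number of conformers, which in-place slice assignment does not guarantee.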

Trimming temporarily

Conformers provides two convenience context managers for temporarily trimming its data: untrimmed and trimmed_to(). The first one simply undoes any trimming previously done, allowing you to operate on the full data set or apply new, complex trimming logic. When Python exits the untrimmed context, the previous trimming is restored.

>>> tslr.conformers.kept = [False, True, False]
>>> with tslr.conformers.untrimmed:
...     tslr.conformers.kept
[True, True, True]
>>> tslr.conformers.kept
[False, True, False]

The second one temporarily applies an arbitrary trimming, provided as a parameter to the trimmed_to() call. Any value normally accepted by Conformers.kept may be used here.

>>> tslr.conformers.kept = [True, True, False]
>>> with tslr.conformers.trimmed_to([1, 2]):
...     tslr.conformers.kept
[False, True, True]
>>> tslr.conformers.kept
[True, True, False]

Tip

To trim conformers temporarily without discarding a currently applied trimming, you may use:

with tslr.conformers.trimmed_to(tslr.conformers.kept):
    ...  # temporary trimming upon the current one
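The save-and-restore behaviour of these context managers follows a common Python pattern. The sketch below shows that pattern with contextlib; ToyKept is a made-up class used for illustration and says nothing about how tesliper implements trimmed_to() internally.

```python
from contextlib import contextmanager


class ToyKept:
    """Toy holder of a 'kept' list with a temporary-trimming helper."""

    def __init__(self, kept):
        self.kept = list(kept)

    @contextmanager
    def trimmed_to(self, blade):
        previous = self.kept
        self.kept = list(blade)
        try:
            yield self
        finally:
            self.kept = previous  # restored even if the body raises


toy = ToyKept([True, True, False])
with toy.trimmed_to([False, True, True]):
    inside = list(toy.kept)  # temporary state
after = list(toy.kept)       # original state is back
```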

Simulating spectra

To calculate simulated spectra, you will need to have spectral activities extracted. These will most probably come from a freq or td Gaussian calculation job, depending on the genre of spectra you would like to simulate. tesliper can simulate IR, VCD, UV, ECD, Raman, and ROA spectra, given the calculated values of conformers’ optical activity. When you call Tesliper.calculate_spectra() without any parameters, it will calculate spectra of all available genres, using default activities genres and default parameters, and store them in the Tesliper.spectra dictionary. Aside from this, the calculated spectra are returned by the method.

You can calculate only specific spectra genres by providing a list of their names as a parameter to the Tesliper.calculate_spectra() call. In this case, default activities genres and default parameters are also used to calculate the desired spectra; see Activities genres and Calculation parameters below to learn how this can be customized.

ir_and_uv = tslr.calculate_spectra(["ir", "uv"])
assert ir_and_uv["ir"] is tslr.spectra["ir"]

Calculation parameters

tesliper uses a Lorentzian or Gaussian fitting function to simulate spectra from the corresponding optical activities values. Both of these require you to specify the desired peak width, as well as the beginning, end, and step of the abscissa (x-axis values). If not told otherwise, tesliper will use default values for these parameters and a default fitting function for a given spectra genre. These default values are available via Tesliper.standard_parameters and are as follows.

Default calculation parameters
Parameter	IR, VCD, Raman, ROA	UV, ECD
width	6 [$\mathrm{cm}^{-1}$]	0.35 [$\mathrm{eV}$]
start	800 [$\mathrm{cm}^{-1}$]	150 [$\mathrm{nm}$]
stop	2900 [$\mathrm{cm}^{-1}$]	800 [$\mathrm{nm}$]
step	2 [$\mathrm{cm}^{-1}$]	1 [$\mathrm{nm}$]
fitting	`lorentzian()`	`gaussian()`
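To show how width, start, stop, and step enter a simulation, here is a minimal sketch of Lorentzian broadening in plain Python. It uses the textbook Lorentzian line shape and is not tesliper's lorentzian() function: treating width as the half width at half maximum, the normalization, and the exclusion of the stop value are all assumptions made for this example.

```python
import math


def lorentzian_spectrum(positions, intensities, abscissa, width):
    """Sum of Lorentzian curves; `width` taken as half width at half maximum."""
    spectrum = []
    for x in abscissa:
        total = 0.0
        for center, intensity in zip(positions, intensities):
            total += intensity * width / (math.pi * ((x - center) ** 2 + width**2))
        spectrum.append(total)
    return spectrum


# Vibrational defaults from the table above.
start, stop, step, width = 800, 2900, 2, 6
abscissa = list(range(start, stop, step))  # stop itself is excluded here
# A single made-up band at 1200 cm^-1 with unit intensity.
curve = lorentzian_spectrum([1200.0], [1.0], abscissa, width)
peak_x = abscissa[curve.index(max(curve))]
```

With this definition, the curve drops to half of its maximum at a distance of width from the band position, so a larger width gives broader and lower peaks.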

You can change the parameters used for spectra simulation by altering values in the Tesliper.parameters dictionary. It stores a dict of parameters’ values for each of the spectra genres (“ir”, “vcd”, “uv”, “ecd”, “raman”, and “roa”). start, stop, and step expect their values to be in $\mathrm{cm}^{-1}$ units for vibrational and scattering spectra, and $\mathrm{nm}$ units for electronic spectra. width expects its value to be in $\mathrm{cm}^{-1}$ units for vibrational and scattering spectra, and $\mathrm{eV}$ units for electronic spectra. fitting should be a callable that may be used to simulate peaks as curves, preferably one of: gaussian() or lorentzian().

# change parameters' values one by one
tslr.parameters["uv"]["step"] = 0.5
tslr.parameters["uv"]["width"] = 0.5

tslr.parameters["vcd"].update(  # or with an update
    {"start": 500, "stop": 2500, "width": 2}
)

# "fitting" should be a callable
from tesliper import lorentzian
tslr.parameters["uv"]["fitting"] = lorentzian

Descriptions of parameters
Parameter	`type`	Description
width	`float` or `int`	width of the peak
start	`float` or `int`	the beginning of the spectral range
stop	`float` or `int`	the end of the spectral range
step	`float` or `int`	step of the abscissa
fitting	`Callable`	function used to simulate peaks as curves

Warning

When modifying Tesliper.parameters, be careful not to delete any of the key-value pairs. If you need to revert to the standard parameters’ values, you can just reassign them from Tesliper.standard_parameters.

tslr.parameters["ir"] = {
    "start": 500, "stop": 2500, "width": 2
}  # this will cause problems!
# revert to default values
tslr.parameters["ir"] = tslr.standard_parameters["ir"]
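The reason a plain reassignment is risky can be seen with ordinary dictionaries. The sketch below does not touch tesliper at all: standard_ir is a hand-written copy of the IR defaults from the table earlier in this section, with a placeholder string standing in for the fitting callable.

```python
import copy

# Values copied from the defaults table; "fitting" is a placeholder string
# here, while the real dictionary is described as holding a callable.
standard_ir = {"width": 6, "start": 800, "stop": 2900, "step": 2, "fitting": "lorentzian"}

overrides = {"start": 500, "stop": 2500, "width": 2}

# Replacing the whole dict drops every key that is not in `overrides`.
replaced = dict(overrides)
missing = sorted(set(standard_ir) - set(replaced))

# Merging on top of a copy of the defaults keeps every key.
merged = {**copy.deepcopy(standard_ir), **overrides}
```

Merging, or calling update() as shown earlier, leaves "step" and "fitting" in place, whereas the replaced dictionary lacks both.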

Activities genres

Instead of specifying a spectra genre you’d like to get, you may specify an activities genre you prefer to use to calculate a corresponding spectrum. The table below summarizes which spectra genres may be calculated from which activities genres.

Spectra and corresponding activities genres
Spectra	Default activity	Other activities
IR	dip	iri
VCD	rot
UV	vosc	losc, vdip, ldip
ECD	vrot	lrot
Raman	raman1	ramact, ramanactiv, raman2, raman3
ROA	roa1	roa2, roa3

Warning

If you provide two different genres that map to the same spectra genre, only one of them will be accessible, the other will be thrown away. If you’d like to compare results of simulations using different genres, you need to store the return value of Tesliper.calculate_spectra() call.

>>> out = tslr.calculate_spectra(["vrot", "lrot"])
>>> list(out.keys())  # only one representation returned
["ecd"]
>>> velo = tslr.calculate_spectra(["vrot"])
>>> length = tslr.calculate_spectra(["lrot"])
>>> assert not velo["ecd"] == length["ecd"]  # different

Averaging spectra

Each possible conformer contributes to the compound’s spectrum proportionally to its population in the mixture. tesliper can calculate conformers’ populations from their relative energies, using the Boltzmann distribution. Assuming that any energies genre is available (usually at least scf energies are), after calculating the spectra you want to simulate, you should call Tesliper.average_spectra() to get the final simulated spectra.

>>> averaged = tslr.average_spectra()  # averages available spectra
>>> assert tslr.averaged is averaged  # just a reference

Tesliper.average_spectra() averages each of the spectra stored in the Tesliper.spectra dictionary, using each available energies genre. The generated average spectra are stored in the Tesliper.averaged dictionary, using tuples of ("spectra_genre", "energies_genre") as keys. average_spectra() returns a reference to this attribute.
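The idea behind population-weighted averaging can be sketched without tesliper. The helpers below are hypothetical and only illustrate the Boltzmann distribution: they assume energies given in kcal/mol and spectra that share one abscissa, and all numbers are made up.

```python
import math

GAS_CONSTANT = 0.0019872041  # kcal / (mol * K)


def boltzmann_populations(energies, temperature=298.15):
    """Populations from energies in kcal/mol (illustrative helper)."""
    if temperature <= 0:
        raise ValueError("Temperature must be a positive number.")
    lowest = min(energies)  # only energy differences matter
    factors = [math.exp(-(e - lowest) / (GAS_CONSTANT * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def average_spectrum(spectra, populations):
    """Population-weighted sum of spectra sharing the same abscissa."""
    return [
        sum(p * y for p, y in zip(populations, column))
        for column in zip(*spectra)
    ]


energies = [0.0, 0.5, 2.0]  # made-up relative energies in kcal/mol
populations = boltzmann_populations(energies)
spectra = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # made-up y values per conformer
averaged = average_spectrum(spectra, populations)
```

The populations sum to one, and the lowest-energy conformer dominates the average. Raising the temperature evens the populations out.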

Note

There is also a Tesliper.get_averaged_spectrum() method for calculating a single averaged spectrum using a given spectra genre and energies genre. Value returned by this method is not automatically stored.

Temperature of the system

The Boltzmann distribution depends on the temperature of the system, which is assumed to be room temperature, expressed as $298.15\ \mathrm{K}$ ($25.0^{\circ}\mathrm{C}$). You can change it by setting the Tesliper.temperature attribute to the desired value. This must be done before calculating the average spectrum to have an effect.

>>> tslr.temperature
298.15  # default value in Kelvin
>>> averaged = tslr.average_spectra()  # averages available spectra
>>> tslr.temperature = 300.0  # in Kelvin
>>> high_avg = tslr.average_spectra()
>>> assert not averaged == high_avg  # resulting average is different

Note

The Tesliper.temperature value must be a positive number; absolute zero or lower is not allowed, as it would cause problems in the calculation of the Boltzmann distribution. An attempt to set the temperature to $0\ \mathrm{K}$ or below will raise a ValueError.

Comparing with experiment

The experimental spectrum may be loaded with tesliper from a text or CSV file. The software helps you adjust the shift and scale of your simulated spectra to match the experiment. Unfortunately, tesliper does not offer broad possibilities when it comes to mathematical comparison of the simulated spectra and the experimental one. You will need to use an external library or write your own logic to do that.

Loading experimental spectra

To load an experimental spectrum, use the Tesliper.load_experimental() method. You will need to provide a path to the file (absolute or relative to the current Tesliper.input_dir) and a genre name of the loaded experimental spectrum. When the file is read, its content is stored in the Tesliper.experimental dictionary.

>>> spectrum = tslr.load_experimental("path/to/spectrum.xy", "ir")
>>> tslr.experimental["ir"] is spectrum
True

Adjusting calculated spectra

Spectra calculated and loaded from disk with tesliper are stored as instances of the Spectra or SingleSpectrum classes. Both of them provide scale_to() and shift_to() methods that adjust the scale and offset (respectively) to match another spectrum, provided as a parameter. Parameters found automatically may not be perfect, so you may provide them yourself by manually setting scaling and offset to the desired values.

>>> spectra = tslr.spectra["ir"]
>>> spectra.scaling  # affects spectra.y
1.0
>>> spectra.scale_to(tslr.experimental["ir"])
>>> spectra.scaling
1.32
>>> spectra.offset = 50  # bathochromic shift, affects spectra.x

Corrected values may be accessed via spectra.x and spectra.y, original values may be accessed via spectra.abscissa and spectra.values.
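The relation between original and corrected values can be modelled with a small toy class. This is a sketch under stated assumptions, not the Spectra or SingleSpectrum implementation: it assumes that the offset is added to the abscissa and that the scaling multiplies the values, and its scale_to() uses one simple heuristic (matching the largest absolute intensity) that may differ from what tesliper does.

```python
from dataclasses import dataclass


@dataclass
class ToySpectrum:
    """Toy spectrum: raw data plus corrections applied on access."""

    abscissa: list
    values: list
    scaling: float = 1.0
    offset: float = 0.0

    @property
    def x(self):
        return [a + self.offset for a in self.abscissa]

    @property
    def y(self):
        return [v * self.scaling for v in self.values]

    def scale_to(self, other):
        # One simple choice: match the largest absolute intensity.
        self.scaling = max(abs(v) for v in other.y) / max(abs(v) for v in self.values)


simulated = ToySpectrum([1000, 1010, 1020], [0.0, 2.0, 1.0])
experiment = ToySpectrum([1000, 1010, 1020], [0.1, 3.0, 1.4])
simulated.scale_to(experiment)
simulated.offset = 50
```

Keeping the raw data untouched and applying corrections on access means the scaling and offset can be changed repeatedly without accumulating errors.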

Writing to disk

Once you have the data you care about, either extracted or calculated, you will most probably want to store it, process it with other software, or visualize it. tesliper provides a way to save this data in one of the supported formats: CSV, human-readable text files, or .xlsx spreadsheet files. Moreover, tesliper may produce Gaussian input files, allowing you to easily set up the next step of calculations.

Exporting data

tesliper provides convenient shorthands for exporting certain kinds of data:

Tesliper.export_energies() will export information about conformers’ energies and their calculated populations;
Tesliper.export_spectral_data() will export data that is related to spectral activity, but cannot be used to simulate spectra;
Tesliper.export_activities() will export unprocessed spectral activities, normally used to simulate spectra;
Tesliper.export_spectra() will export spectra calculated so far that are stored in Tesliper.spectra dictionary;
Tesliper.export_averaged() will export each averaged spectrum calculated so far that is stored in Tesliper.averaged dictionary.

Each of these methods takes two parameters: fmt and mode. fmt is the file format to which data should be exported. It should be one of the following values: "txt" (the default), "csv", "xlsx". mode denotes how files should be opened and should be one of: "a" (append to existing file), "x" (the default, only write if file doesn’t exist yet), "w" (overwrite file if it already exists).
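The three mode values carry the same meaning as the corresponding modes of Python's built-in open(). The snippet below demonstrates them on a temporary file; the file name is hypothetical, and whether tesliper passes mode to open() unchanged is an assumption.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "energies.txt"  # hypothetical file name

    with open(target, "x") as handle:  # "x": create, fail if the file exists
        handle.write("first\n")

    try:
        open(target, "x")
    except FileExistsError:
        refused = True  # a second exclusive creation is refused
    else:
        refused = False

    with open(target, "a") as handle:  # "a": append to existing content
        handle.write("second\n")
    appended = target.read_text()

    with open(target, "w") as handle:  # "w": overwrite existing content
        handle.write("third\n")
    overwritten = target.read_text()
```

The default "x" is the safest choice, since it never silently replaces results of an earlier export.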

These export methods will usually produce a number of files in the Tesliper.output_dir, whose names are picked automatically, according to the genre of the exported data and/or the conformer that the data relates to. export_energies() will produce a file for each available energies genre and an additional overview file, export_spectra() will create a file for each spectra genre and each conformer, and so on.

Note

The "xlsx" format is an exception to the above: it will produce only one file, named “tesliper-output.xlsx”, and create multiple spreadsheets inside this file. The appending mode is useful when exporting data to the "xlsx" format, as it allows you to write multiple kinds of data (with calls to several of these methods) to this single destination .xlsx file.

There is also a Tesliper.export_data() method available, which will export only the genres you specifically request (plus “freq” or “wavelen”, if any given genre is spectral data-related). The same rules regarding output format, write mode, and names of produced files apply here.

Tip

If you would like to customize names of the files produced, you will need to directly use one of the writer objects provided by tesliper. Refer to the writing module documentation for more information.

Creating input files

You can use Tesliper.export_job_file() to prepare input files for the quantum chemical calculations software. Apart from the typical fmt (only "gjf" is supported by default) and mode parameters, this method also accepts the geometry_genre and any number of additional keyword parameters, specifying calculations details. geometry_genre should be a name of the data genre, representing conformers’ geometry, that should be used as input geometry. Additional keyword parameters are passed to the writer object, relevant to the fmt requested. Keywords supported by the default "gjf"-format writer are as follows:

route
A calculations route: keywords specifying calculations directives for quantum chemical calculations software.

link0
Dictionary with “link zero” commands, where each key is command’s name and each value is this command’s parameter.

comment
Contents of title section, i.e. a comment about the calculations.

post_spec
Anything that should be placed after conformer’s geometry specification. Will be written to the file as given.

The link0 parameter deserves a more detailed explanation. It supports the standard link zero commands used with Gaussian software, like Mem, Chk, or NoSave. The full list of these commands may be found in the documentation for GjfWriter.link0. Any non-parametric link0 command (i.e. Save, NoSave, and ErrorSave) should be given a True value if it should be included in the link0 section.

Path-like commands, e.g. Chk or RWF, may be parametrized for each conformer. You can put a placeholder inside a given string path, which will be substituted when writing to file. The most useful placeholders are probably ${conf} and ${num} that evaluate to conformer’s name and ordinal number respectively. More information about placeholders may be found in GjfWriter.make_name() documentation.

>>> list(tslr.conformers.kept_keys())
["conf_one", "conf_three"]
>>> tslr.export_job_file(
...     geometry_genre="optimized_geom",
...     route="# td=(nstates=80)",
...     comment="Example of parametrization in .gjf files",
...     link0={
...         "Mem": "10MW",
...         "Chk": "path/to/${conf}.chk",
...         "Save": True,
...     },
... )
>>> [file.name for file in tslr.output_dir.iterdir()]
["conf_one.gjf", "conf_three.gjf"]

The contents of “conf_one.gjf” are then:

%Mem=10MW
%Chk=path/to/conf_one.chk
%Save

# td=(nstates=80)

Example of parametrization in .gjf files

[geometry specification...]
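The ${conf} and ${num} placeholders use the same syntax as Python's string.Template, so the substitution can be previewed with the standard library. This is only a sketch: whether tesliper relies on string.Template internally, and the starting value of the ordinal number, are assumptions.

```python
from string import Template

chk_template = Template("path/to/${conf}.chk")
names = ["conf_one", "conf_three"]

# enumerate() stands in for the ordinal number; its starting value is a guess.
paths = [
    chk_template.substitute(conf=name, num=number)
    for number, name in enumerate(names, start=1)
]
```

Previewing the paths this way is a cheap check that no two conformers end up sharing one checkpoint file.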

Saving session for later

If you’d like to come back to the data currently contained within a tesliper instance, you may serialize it using the Tesliper.serialize() method. Pass the method a filename parameter to set the name of the file under which the session should be stored inside the current output_dir. You may also omit it to use the default ".tslr" name. All data, extracted and calculated, including the current kept status of each conformer, is saved and may be loaded later using the Tesliper.load() class method.

curr_dir = tslr.output_dir
tslr.serialize()
loaded = Tesliper.load(curr_dir / ".tslr")
assert loaded.conformers == tslr.conformers
assert loaded.conformers.kept == tslr.conformers.kept
assert loaded.spectra.keys() == tslr.spectra.keys()
