Scripting with tesliper
This part discusses the basics of using the Python API. For a tutorial on using the graphical interface, see gui.
tesliper provides a Tesliper class as the main entry point to its functionality. This class allows you to easily perform any typical task: read and write files, filter data, and calculate and average spectra. It is recommended to read Conventions and Terms to get a general idea of what to expect. The next paragraphs introduce basic use cases with examples and explanations. The examples do not use real data, but simplified mockups, so as not to obscure the logic presented.
from tesliper import Tesliper
# extract data from Gaussian output files
tslr = Tesliper(input_dir="./opt_and_freq")
tslr.extract()
# conditional filtering of conformers
tslr.conformers.trim_non_normal_termination()
tslr.conformers.trim_not_optimized()
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.trim_to_range("gib", maximum=10, attribute="deltas")
tslr.conformers.trim_rmsd(threshold=1.0, window_size=0.5, energy_genre="gib")
# calculate and average spectra, export data
tslr.calculate_spectra()
tslr.average_spectra()
tslr.export_energies(fmt="txt")
tslr.export_averaged(fmt="csv")
Reading files
After importing the Tesliper class, we instantiate it with an input_dir parameter, which is a path to the directory containing output files from quantum chemical calculations software. You may also provide an output_dir parameter, defining where tesliper should write the files it generates. Both of these parameters are optional and default to the current working directory if omitted. You may also provide a wanted_files parameter, which should be a list of filenames that Tesliper should parse, ignoring any other files present in input_dir. Omitting wanted_files means that no file is ignored.
Note
Tesliper also accepts a quantum_software parameter, which is a hint for tesliper on how it should parse the output files it reads. However, only Gaussian software is supported out-of-the-box, and quantum_software="gaussian" is the default value. If you wish to use tesliper to work with another quantum chemistry package, you will need to define a custom parser that subclasses the ParserBase class. Refer to its documentation for more information.
You can extract data from the files in input_dir using the Tesliper.extract() method. Tesliper.extract() respects the input_dir and wanted_files given to Tesliper, but path and wanted_files parameters provided to the method call take precedence. If you would like to read files in the whole directory tree, you may perform a recursive extraction using extract(recursive=True). So, assuming the following directory structure:
project
├── optimization
│   ├── conf_one.out
│   ├── conf_two.out
│   └── conf_three.out
└── vibrational
    ├── conf_one.out
    ├── conf_two.out
    └── conf_three.out
you could use any of the following to get the same effect.
# option 1: change *input_dir*
tslr = Tesliper(input_dir="./project/optimization")
tslr.extract()
tslr.input_dir = "./project/vibrational"
tslr.extract()
# option 2: override *input_dir* only for one call
tslr = Tesliper(input_dir="./project/optimization")
tslr.extract()
tslr.extract(path="./project/vibrational")
# option 3: read the whole tree
tslr = Tesliper(input_dir="./project")
tslr.extract(recursive=True)
tesliper will try to guess the extension of the files it should parse: e.g. Gaussian output files may have a “.out” or “.log” extension. If those are mixed in the source directory, an exception will be raised. You can prevent this by providing the extension parameter; only files with the given extension will then be parsed.
project
├── conf_one.out
└── conf_two.log
tslr = Tesliper(input_dir="./project")
tslr.extract() # raises ValueError
tslr.extract(extension="out") # ok
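The file-selection rules described in this section (filtering by wanted_files, guessing the extension, and failing on mixed extensions) can be illustrated with a small standalone sketch. This is not tesliper's implementation. The function select_files, the KNOWN_EXTENSIONS tuple, and the choice to match wanted_files against either full names or stems are assumptions made only for illustration.

```python
from pathlib import PurePath
from typing import Iterable, List, Optional, Sequence

# Hypothetical set of extensions recognized as output files (assumption).
KNOWN_EXTENSIONS = (".out", ".log")


def select_files(
    filenames: Iterable[str],
    wanted_files: Optional[Sequence[str]] = None,
    extension: Optional[str] = None,
) -> List[str]:
    """Return names of files to parse, sketching the rules described above."""
    paths = [PurePath(name) for name in filenames]
    if wanted_files is not None:
        # Assumption: a wanted file matches either by full name or by stem.
        wanted = set(wanted_files)
        paths = [p for p in paths if p.name in wanted or p.stem in wanted]
    if extension is None:
        # Guess the extension from the known ones present among the files.
        found = {p.suffix for p in paths if p.suffix in KNOWN_EXTENSIONS}
        if len(found) > 1:
            raise ValueError(f"Mixed extensions found: {', '.join(sorted(found))}")
        if not found:
            return []
        extension = found.pop()
    elif not extension.startswith("."):
        extension = "." + extension
    return sorted(p.name for p in paths if p.suffix == extension)
```

In this sketch, select_files(["conf_one.out", "conf_two.log"]) raises a ValueError. Passing extension="out" returns only "conf_one.out", which mirrors the extract() calls shown above.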
Filtering conformers
Tesliper.extract() will read and parse the files it thinks are output files of the quantum chemical software and update the Tesliper.conformers internal data storage. It is a dict-like Conformers instance that stores data for each conformer in the form of an ordinary dict. This inner dict uses genre names as keys and data as values (the form of which depends on the genre itself). Conformers provides a number of methods for filtering the conformers it knows, allowing you to easily hide data that should be excluded from further analysis. tesliper calls this process trimming. The middle part of the first code snippet is an example of trimming conformers:
tslr.conformers.trim_non_normal_termination()
tslr.conformers.trim_not_optimized()
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.trim_to_range("gib", maximum=10, attribute="deltas")
tslr.conformers.trim_rmsd(threshold=1.0, window_size=0.5, energy_genre="gib")
As you may suspect, trim_non_normal_termination() hides data from calculations that did not terminate normally, trim_not_optimized() hides data from conformers that are not optimized, and trim_imaginary_frequencies() hides data from conformers that have at least one imaginary frequency. More trimming methods are described below.
Hidden conformers are marked as not kept. Information about which conformers are kept and which are not is stored in the Conformers.kept attribute, which may also be manipulated more directly. More on this topic will be explained later.
As mentioned earlier, Tesliper.conformers is a dict-like structure, and as such offers the typical functionality of Python's dicts. However, checking for presence with conf in tslr.conformers or requesting a view with the standard keys(), values(), or items() will operate on the whole data set, ignoring any trimming applied earlier. The Conformers class offers additional kept_keys(), kept_values(), and kept_items() methods that return views that acknowledge trimming.
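To make the difference between the standard dict views and the kept_* methods concrete, here is a minimal hypothetical dict subclass that stores one kept flag per key. KeptDict is not part of tesliper, and for simplicity it returns plain lists rather than views. It only shows why membership tests and keys() ignore trimming while kept_keys() respects it.

```python
from typing import Any, List, Tuple


class KeptDict(dict):
    """Hypothetical dict that keeps one boolean "kept" flag per key, in order."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # One flag per key present at creation; keys added later are not handled.
        self.kept: List[bool] = [True] * len(self)

    def kept_keys(self) -> List[Any]:
        return [key for key, keep in zip(self.keys(), self.kept) if keep]

    def kept_values(self) -> List[Any]:
        return [value for value, keep in zip(self.values(), self.kept) if keep]

    def kept_items(self) -> List[Tuple[Any, Any]]:
        return [item for item, keep in zip(self.items(), self.kept) if keep]


confs = KeptDict(conf_one={"gib": -1.0}, conf_two={"gib": -2.0}, conf_three={"gib": -3.0})
confs.kept = [True, False, True]
print("conf_two" in confs)  # True: membership ignores trimming
print(confs.kept_keys())  # ['conf_one', 'conf_three']
```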
Trimming methods
There are a number of trimming methods available to you besides those mentioned above. Below you will find them listed with a short summary; see each method's documentation for a more comprehensive explanation.
trim_incomplete()
Filters out conformers that don't contain data for as many expected genres as other conformers.
trim_imaginary_frequencies()
Filters out conformers that contain imaginary frequencies (any number of negative frequency values).
trim_non_matching_stoichiometry()
Filters out conformers that have different stoichiometry than expected.
trim_not_optimized()
Filters out conformers that failed structure optimization.
trim_non_normal_termination()
Filters out conformers whose calculation job did not terminate normally (was erroneous or interrupted).
trim_inconsistent_sizes()
Filters out conformers that have iterable data genres of a different size than most conformers. Helpful when InconsistentDataError occurs.
trim_to_range()
Filters out conformers that have a value of some specific data or property outside of the given range, e.g. their calculated population is less than 0.01.
trim_rmsd()
Filters out conformers that are identical to another conformer, judging by a given threshold of the root-mean-square deviation of atomic positions (RMSD).
select_all()
Marks all conformers as kept.
reject_all()
Marks all conformers as not kept.
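Most trimming methods follow the same pattern: inspect some data for each conformer and switch its kept flag to False if a condition fails. The sketch below shows that pattern for imaginary frequencies and for a range of (optionally relative) energies. The function names mirror tesliper's methods, but their signatures, the use_deltas flag, and computing deltas relative to the lowest currently kept value are assumptions. This is not tesliper's actual code.

```python
from typing import List, Sequence


def trim_imaginary_frequencies(
    frequencies: Sequence[Sequence[float]], kept: Sequence[bool]
) -> List[bool]:
    """Mark as not kept every conformer with at least one negative frequency."""
    return [keep and not any(f < 0 for f in freqs) for keep, freqs in zip(kept, frequencies)]


def trim_to_range(
    values: Sequence[float],
    kept: Sequence[bool],
    minimum: float = float("-inf"),
    maximum: float = float("inf"),
    use_deltas: bool = False,
) -> List[bool]:
    """Mark as not kept every conformer whose value lies outside [minimum, maximum].

    With use_deltas=True, values are first made relative to the lowest value
    among the currently kept conformers (an assumption of this sketch).
    """
    if use_deltas:
        lowest = min(v for v, keep in zip(values, kept) if keep)
        values = [v - lowest for v in values]
    # Trimming only ever switches flags off; it never re-keeps a conformer.
    return [keep and minimum <= v <= maximum for keep, v in zip(kept, values)]
```

For example, trim_to_range([0.0, 12.0, 3.0], [True, True, True], maximum=10, use_deltas=True) gives [True, False, True]. This is analogous in spirit to trim_to_range("gib", maximum=10, attribute="deltas") from the first snippet.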
Manipulating Conformers.kept
Information about which conformers are kept and which are not is stored in the Conformers.kept attribute. It is a list of booleans, one for each conformer stored, defining which conformers should be processed by tesliper.
# assuming "conf_two" has imaginary frequencies
tslr.conformers.trim_imaginary_frequencies()
tslr.conformers.kept == [True, False, True] # True
tslr.export_data(["genres", "to", "export"])
# only files for "conf_one" and "conf_three" are generated
Conformers.kept may be modified using the trimming methods described earlier, but also more directly, by setting it to a new value. Firstly, the most straightforward way is to assign a new list of boolean values to it. This list should have the same number of elements as the number of conformers contained; a ValueError is raised if it doesn't.
>>> tslr.conformers.kept
[True, True, True]
>>> tslr.conformers.kept = [False, True, False]
>>> tslr.conformers.kept
[False, True, False]
>>> tslr.conformers.kept = [False, True, False, True]
Traceback (most recent call last):
...
ValueError: Must provide boolean value for each known conformer.
4 values provided, 3 excepted.
Secondly, a list of filenames of conformers intended to be kept may be given. Only these conformers will be kept. If a given filename is not in the underlying tslr.conformers dictionary, a KeyError is raised.
>>> tslr.conformers.kept = ['conf_one']
>>> tslr.conformers.kept
[True, False, False]
>>> tslr.conformers.kept = ['conf_two', 'other']
Traceback (most recent call last):
...
KeyError: Unknown conformers: other.
Thirdly, a list of integers representing conformers' indices may be given. Only conformers with the specified indices will be kept. If one of the given integers can't be translated to a conformer's index, an IndexError is raised. Indexing with negative values is currently not supported.
>>> tslr.conformers.kept = [1, 2]
>>> tslr.conformers.kept
[False, True, True]
>>> tslr.conformers.kept = [2, 3]
Traceback (most recent call last):
...
IndexError: Indexes out of bounds: 3.
Fourthly, assigning True or False to this attribute will mark all conformers as kept or not kept, respectively.
>>> tslr.conformers.kept = False
>>> tslr.conformers.kept
[False, False, False]
>>> tslr.conformers.kept = True
>>> tslr.conformers.kept
[True, True, True]
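The four accepted forms of a new Conformers.kept value can be summarized as one normalization step that turns any of them into a list of booleans. The normalize_kept function below is a hypothetical sketch that reproduces the behaviour shown in the examples above, including the ValueError, KeyError, and IndexError cases. Negative indices are treated as out of bounds, consistent with them not being supported. It is not tesliper's code.

```python
from typing import List, Sequence, Union

KeptSpec = Union[bool, Sequence[Union[bool, str, int]]]


def normalize_kept(spec: KeptSpec, names: Sequence[str]) -> List[bool]:
    """Translate any accepted form of a new `kept` value into a list of booleans."""
    size = len(names)
    if isinstance(spec, bool):
        # True / False: keep or reject everything
        return [spec] * size
    items = list(spec)
    # bools are checked first, because bool is a subclass of int in Python
    if all(isinstance(item, bool) for item in items):
        if len(items) != size:
            raise ValueError(
                "Must provide boolean value for each known conformer. "
                f"{len(items)} values provided, {size} expected."
            )
        return items
    if all(isinstance(item, str) for item in items):
        unknown = [item for item in items if item not in names]
        if unknown:
            raise KeyError(f"Unknown conformers: {', '.join(unknown)}.")
        wanted = set(items)
        return [name in wanted for name in names]
    if all(isinstance(item, int) and not isinstance(item, bool) for item in items):
        # negative indices are rejected, as they are not supported
        bad = [index for index in items if not 0 <= index < size]
        if bad:
            raise IndexError(f"Indexes out of bounds: {', '.join(map(str, bad))}.")
        wanted_indices = set(items)
        return [index in wanted_indices for index in range(size)]
    raise TypeError("Expected booleans, conformer names, or indices, not a mix.")
```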
Warning
The list of kept values may also be modified by setting its elements to True or False. This is advised against, however, as a mistake such as tslr.conformers.kept[:2] = [True, False, False] will break some functionality by forcibly changing the size of the tslr.conformers.kept list.
Trimming temporarily
Conformers provides two convenience context managers for temporarily trimming its data: untrimmed and trimmed_to(). The first one simply undoes any trimming previously done, allowing you to operate on the full data set or apply new, complex trimming logic. When Python exits the untrimmed context, the previous trimming is restored.
>>> tslr.conformers.kept = [False, True, False]
>>> with tslr.conformers.untrimmed:
...     tslr.conformers.kept
...
[True, True, True]
>>> tslr.conformers.kept
[False, True, False]
The second one temporarily applies an arbitrary trimming, provided as a parameter to the trimmed_to() call. Any value normally accepted by Conformers.kept may be used here.
>>> tslr.conformers.kept = [True, True, False]
>>> with tslr.conformers.trimmed_to([1, 2]):
...     tslr.conformers.kept
...
[False, True, True]
>>> tslr.conformers.kept
[True, True, False]
Tip
To trim conformers temporarily without discarding a currently applied trimming, you may use:
with tslr.conformers.trimmed_to(tslr.conformers.kept):
... # temporary trimming upon the current one
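Both context managers can be understood as "save kept, replace it, restore it on exit". The sketch below implements that idea with contextlib.contextmanager on a hypothetical Trimmable class. Exposing untrimmed as a property is one way to support the with obj.untrimmed: syntax used above without a call, but this is an assumption, not a description of tesliper's internals. The try/finally ensures the previous trimming is restored even if the block raises.

```python
from contextlib import contextmanager
from typing import ContextManager, Iterator, List, Sequence


class Trimmable:
    """Hypothetical stand-in for an object with a mutable `kept` list."""

    def __init__(self, size: int) -> None:
        self.kept: List[bool] = [True] * size

    @contextmanager
    def trimmed_to(self, new_kept: Sequence[bool]) -> Iterator[None]:
        saved = list(self.kept)  # copy, so changes inside the block can't leak back
        self.kept = list(new_kept)
        try:
            yield
        finally:
            self.kept = saved  # restored even if the block raises

    @property
    def untrimmed(self) -> ContextManager[None]:
        # a fresh context manager on each access that temporarily keeps everything
        return self.trimmed_to([True] * len(self.kept))


obj = Trimmable(3)
obj.kept = [False, True, False]
with obj.untrimmed:
    print(obj.kept)  # [True, True, True]
print(obj.kept)  # [False, True, False]
```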
Simulating spectra
To simulate spectra, you will need to have spectral activities extracted. These will most probably come from a freq or td Gaussian calculation job, depending on the genre of spectra you would like to simulate. tesliper can simulate IR, VCD, UV, ECD, Raman, and ROA spectra, given the calculated values of conformers' optical activity. When you call Tesliper.calculate_spectra() without any parameters, it will calculate spectra of all available genres using default activities genres and default parameters, and store them in the Tesliper.spectra dictionary. The method also returns the calculated spectra.
You can calculate only specific spectra genres by providing a list of their names as a parameter to the Tesliper.calculate_spectra() call. In this case, too, default activities genres and default parameters will be used to calculate the desired spectra. See Activities genres and Calculation parameters below to learn how this can be customized.
ir_and_uv = tslr.calculate_spectra(["ir", "uv"])
assert ir_and_uv["ir"] is tslr.spectra["ir"]
Calculation parameters
tesliper uses a Lorentzian or Gaussian fitting function to simulate spectra from the corresponding optical activities values. Both of these require specifying the desired width of the peak, as well as the beginning, end, and step of the abscissa (x-axis values). If not told otherwise, tesliper will use default values for these parameters and a default fitting function for a given spectra genre. These default values are available via Tesliper.standard_parameters and are as follows.
Parameter | IR, VCD, Raman, ROA | UV, ECD
---|---|---
width | 6 [\(\mathrm{cm}^{-1}\)] | 0.35 [\(\mathrm{eV}\)]
start | 800 [\(\mathrm{cm}^{-1}\)] | 150 [\(\mathrm{nm}\)]
stop | 2900 [\(\mathrm{cm}^{-1}\)] | 800 [\(\mathrm{nm}\)]
step | 2 [\(\mathrm{cm}^{-1}\)] | 1 [\(\mathrm{nm}\)]
fitting | lorentzian() | gaussian()
You can change the parameters used for spectra simulation by altering values in the Tesliper.parameters dictionary. It stores a dict of parameter values for each of the spectra genres (“ir”, “vcd”, “uv”, “ecd”, “raman”, and “roa”). start, stop, and step expect their values to be in \(\mathrm{cm}^{-1}\) units for vibrational and scattering spectra, and in \(\mathrm{nm}\) units for electronic spectra. width expects its value to be in \(\mathrm{cm}^{-1}\) units for vibrational and scattering spectra, and in \(\mathrm{eV}\) units for electronic spectra. fitting should be a callable that may be used to simulate peaks as curves, preferably one of: gaussian() or lorentzian().
# change parameters' values one by one
tslr.parameters["uv"]["step"] = 0.5
tslr.parameters["uv"]["width"] = 0.5
tslr.parameters["vcd"].update( # or with an update
{"start": 500, "stop": 2500, "width": 2}
)
# "fitting" should be a callable
from tesliper import lorentzian
tslr.parameters["uv"]["fitting"] = lorentzian
Parameter | Description
---|---
width | width of the peak
start | the beginning of the spectral range
stop | the end of the spectral range
step | step of the abscissa
fitting | function used to simulate peaks as curves
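To illustrate how width, start, stop, step, and fitting work together, here is a pure-Python sketch of Lorentzian and Gaussian peak shapes summed over a grid of abscissa values. It assumes that width is the half width at half maximum and that peak positions, width, and abscissa share the same units. tesliper's own gaussian() and lorentzian() may use different conventions. tesliper also handles unit conversion (e.g. width in eV with an abscissa in nm for electronic spectra) and the conversion of activities to intensities; neither is modelled here.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Peak shape: (x, peak position, half width at half maximum) -> intensity at x
Fitting = Callable[[float, float, float], float]


def lorentzian_peak(x: float, center: float, hwhm: float) -> float:
    """Area-normalized Lorentzian with half width at half maximum `hwhm`."""
    return hwhm / (math.pi * ((x - center) ** 2 + hwhm ** 2))


def gaussian_peak(x: float, center: float, hwhm: float) -> float:
    """Area-normalized Gaussian with half width at half maximum `hwhm`."""
    sigma = hwhm / math.sqrt(2 * math.log(2))
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def abscissa(start: float, stop: float, step: float) -> List[float]:
    """Grid from start to stop (inclusive) with the given step."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def simulate(
    positions: Sequence[float],
    intensities: Sequence[float],
    start: float,
    stop: float,
    step: float,
    width: float,
    fitting: Fitting,
) -> Tuple[List[float], List[float]]:
    """Sum one peak per transition over the abscissa grid."""
    xs = abscissa(start, stop, step)
    ys = [
        sum(height * fitting(x, pos, width) for pos, height in zip(positions, intensities))
        for x in xs
    ]
    return xs, ys


# two IR-like bands on the default vibrational grid (800-2900 cm^-1, step 2, width 6)
xs, ys = simulate([1000.0, 1700.0], [1.0, 0.5], 800, 2900, 2, 6, lorentzian_peak)
```

Swapping lorentzian_peak for gaussian_peak changes only the peak shape, which is the role the fitting parameter plays in the table above.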
Warning
When modifying Tesliper.parameters, be careful not to delete any of the key-value pairs. If you need to revert to the standard parameter values, you can reassign them from Tesliper.standard_parameters.
tslr.parameters["ir"] = {
    "start": 500, "stop": 2500, "width": 2
}  # this will cause problems: "step" and "fitting" are missing!
# revert to default values
tslr.parameters["ir"] = tslr.standard_parameters["ir"]
Activities genres
Instead of specifying a spectra genre you’d like to get, you may specify an activities genre you prefer to use to calculate a corresponding spectrum. The table below summarizes which spectra genres may be calculated from which activities genres.
Spectra | Default activity | Other activities
---|---|---
IR | dip | iri
VCD | rot |
UV | vosc | losc, vdip, ldip
ECD | vrot | lrot
Raman | raman1 | ramact, ramanactiv, raman2, raman3
ROA | roa1 | roa2, roa3
Warning
If you provide two different genres that map to the same spectra genre, only one of them will be accessible; the other will be thrown away. If you'd like to compare the results of simulations using different genres, you need to store the return value of each Tesliper.calculate_spectra() call.
>>> out = tslr.calculate_spectra(["vrot", "lrot"])
>>> list(out.keys()) # only one representation returned
["ecd"]
>>> velo = tslr.calculate_spectra(["vrot"])
>>> length = tslr.calculate_spectra(["lrot"])
>>> assert not velo["ecd"] == length["ecd"] # different
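The table above can be expressed as a lookup. This also shows why requesting two activities that map to one spectra genre leaves only one of them: both are stored under the same key. The dictionaries and the resolve function are hypothetical and transcribed from the table. In this sketch the last requested activity wins; that is an assumption, not a statement about which one tesliper keeps.

```python
from typing import Dict, Iterable

# Default activities genre for each spectra genre ("Default activity" column).
DEFAULT_ACTIVITY: Dict[str, str] = {
    "ir": "dip",
    "vcd": "rot",
    "uv": "vosc",
    "ecd": "vrot",
    "raman": "raman1",
    "roa": "roa1",
}

# Every activities genre from the table, mapped to the spectra genre it produces.
ACTIVITY_TO_SPECTRA: Dict[str, str] = {
    "dip": "ir", "iri": "ir",
    "rot": "vcd",
    "vosc": "uv", "losc": "uv", "vdip": "uv", "ldip": "uv",
    "vrot": "ecd", "lrot": "ecd",
    "raman1": "raman", "ramact": "raman", "ramanactiv": "raman",
    "raman2": "raman", "raman3": "raman",
    "roa1": "roa", "roa2": "roa", "roa3": "roa",
}


def resolve(genres: Iterable[str]) -> Dict[str, str]:
    """Map requested genres to {spectra genre: activities genre used}."""
    chosen: Dict[str, str] = {}
    for genre in genres:
        if genre in DEFAULT_ACTIVITY:
            spectra_genre, activity = genre, DEFAULT_ACTIVITY[genre]
        else:
            # raises KeyError for unknown genres
            spectra_genre, activity = ACTIVITY_TO_SPECTRA[genre], genre
        # "vrot" and "lrot" share the "ecd" key, so the later one overwrites the earlier
        chosen[spectra_genre] = activity
    return chosen


print(resolve(["ir", "uv"]))  # {'ir': 'dip', 'uv': 'vosc'}
print(resolve(["vrot", "lrot"]))  # {'ecd': 'lrot'}
```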
Averaging spectra
Each possible conformer contributes to the compound's spectrum in proportion to its population in the mixture. tesliper can calculate conformers' populations from their relative energies, using the Boltzmann distribution. Assuming that any energies genre is available (usually at least scf energies are), you should call Tesliper.average_spectra() after calculating the spectra you want to simulate to get the final simulated spectra.
>>> averaged = tslr.average_spectra() # averages available spectra
>>> assert tslr.averaged is averaged # just a reference
Tesliper.average_spectra() averages each spectrum stored in the Tesliper.spectra dictionary, using each available energies genre. The generated average spectra are stored in the Tesliper.averaged dictionary, using tuples of ("spectra_genre", "energies_genre") as keys. average_spectra() returns a reference to this attribute.
Note
There is also a Tesliper.get_averaged_spectrum() method for calculating a single averaged spectrum using a given spectra genre and energies genre. The value returned by this method is not automatically stored.
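The averaging step described above amounts to computing Boltzmann populations from relative energies and taking a population-weighted sum of the conformers' spectra. The sketch below assumes energies in kcal/mol and uses the gas constant in matching units. tesliper's internal units, constants, and data structures may differ, and the populations and average_spectrum functions are hypothetical. Subtracting the lowest energy before exponentiating keeps the exponentials from underflowing without changing the normalized populations.

```python
import math
from typing import List, Sequence

# Gas constant in kcal/(mol K); energies are assumed to be in kcal/mol.
GAS_CONSTANT = 1.987204e-3


def populations(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Boltzmann populations of conformers with the given energies."""
    if temperature <= 0:
        raise ValueError("Temperature must be a positive number (in Kelvin).")
    lowest = min(energies)
    factors = [math.exp(-(e - lowest) / (GAS_CONSTANT * temperature)) for e in energies]
    total = sum(factors)
    return [factor / total for factor in factors]


def average_spectrum(
    spectra: Sequence[Sequence[float]],
    energies: Sequence[float],
    temperature: float = 298.15,
) -> List[float]:
    """Population-weighted average of spectra sharing the same abscissa."""
    weights = populations(energies, temperature)
    # zip(*spectra) walks over abscissa points, pairing each conformer's value
    return [sum(w * y for w, y in zip(weights, point)) for point in zip(*spectra)]
```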
Temperature of the system
The Boltzmann distribution depends on the temperature of the system, which is assumed to be room temperature, expressed as \(298.15\ \mathrm{K}\) (\(25.0^{\circ}\mathrm{C}\)). You can change it by setting the Tesliper.temperature attribute to the desired value. This must be done before calculating the average spectrum to have an effect.
>>> tslr.temperature
298.15 # default value in Kelvin
>>> averaged = dict(tslr.average_spectra())  # copy, as a reference is returned
>>> tslr.temperature = 300.0 # in Kelvin
>>> high_avg = tslr.average_spectra()
>>> assert not averaged == high_avg # resulting average is different
Note
The Tesliper.temperature value must be a positive number. Absolute zero or lower is not allowed, as it would cause problems in the calculation of the Boltzmann distribution. An attempt to set the temperature to \(0\ \mathrm{K}\) or below will raise a ValueError.
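For reference, the standard form of the Boltzmann populations \(p_i\) and of the population-weighted average spectrum \(\bar{y}(x)\) is given below. Here \(\Delta E_i\) is the energy of conformer \(i\) relative to the most stable one, \(R\) is the gas constant, \(T\) is the temperature, and \(y_i(x)\) is the simulated spectrum of conformer \(i\). The formula shows why the temperature must be positive. At \(T = 0\) the exponent is undefined, and for \(T < 0\) higher-energy conformers would dominate. The exact constants and energy units used by tesliper are not specified here.

```latex
\[
p_i = \frac{\exp\left(-\Delta E_i / RT\right)}{\sum_{j} \exp\left(-\Delta E_j / RT\right)},
\qquad
\bar{y}(x) = \sum_{i} p_i \, y_i(x)
\]
```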
Comparing with experiment
The experimental spectrum may be loaded with tesliper from a text or CSV file. The software helps you adjust the shift and scale of your simulated spectra to match the experiment. Unfortunately, tesliper does not offer broad possibilities when it comes to mathematical comparison of the simulated spectra and the experimental one. You will need to use an external library or write your own logic to do that.
Loading experimental spectra
To load an experimental spectrum, use the Tesliper.load_experimental() method. You will need to provide a path to the file (absolute or relative to the current Tesliper.input_dir) and the genre name of the loaded experimental spectrum. When the file is read, its content is stored in the Tesliper.experimental dictionary.
>>> spectrum = tslr.load_experimental("path/to/spectrum.xy", "ir")
>>> tslr.experimental["ir"] is spectrum
True
Adjusting calculated spectra
Spectra calculated with tesliper or loaded from disk are stored as instances of the Spectra or SingleSpectrum classes. Both of them provide scale_to() and shift_to() methods that adjust the scale and offset (respectively) to match another spectrum, provided as a parameter. Parameters found automatically may not be perfect, so you may provide them yourself by manually setting scaling and offset to the desired values.
>>> spectra = tslr.spectra["ir"]
>>> spectra.scaling # affects spectra.y
1.0
>>> spectra.scale_to(tslr.experimental["ir"])
>>> spectra.scaling
1.32
>>> spectra.offset = 50  # shift by 50 cm^-1 towards higher wavenumbers, affects spectra.x
Corrected values may be accessed via spectra.x and spectra.y; original values may be accessed via spectra.abscissa and spectra.values.
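The relationship between the raw and corrected values can be sketched with a small hypothetical dataclass. It assumes that x is the abscissa plus offset and that y is the values multiplied by scaling. The scale_to_peak method is a deliberately naive stand-in that matches the largest absolute intensities; it is not the algorithm used by tesliper's scale_to().

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleSpectrum:
    """Hypothetical stand-in: raw data plus the two correction attributes."""

    abscissa: List[float]
    values: List[float]
    scaling: float = 1.0
    offset: float = 0.0

    @property
    def x(self) -> List[float]:
        # corrected abscissa: raw values shifted by the offset
        return [a + self.offset for a in self.abscissa]

    @property
    def y(self) -> List[float]:
        # corrected intensities: raw values multiplied by the scaling factor
        return [v * self.scaling for v in self.values]

    def scale_to_peak(self, other: "SimpleSpectrum") -> None:
        """Naive illustration: match the largest absolute intensities."""
        own = max(abs(v) for v in self.values)
        target = max(abs(v) for v in other.y)
        self.scaling = target / own
```

Because only scaling and offset change, the original abscissa and values stay untouched and the correction can always be undone by resetting them to 1.0 and 0.0.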
Writing to disk
Once you have the data you care about, either extracted or calculated, you will most probably want to store it, process it with other software, or visualize it. tesliper provides a way to save this data in one of the supported formats: CSV, human-readable text files, or .xlsx spreadsheet files. Moreover, tesliper may produce Gaussian input files, allowing you to easily set up the next step of calculations.
Exporting data
tesliper provides convenient shorthands for exporting certain kinds of data:
- Tesliper.export_energies() will export information about conformers' energies and their calculated populations;
- Tesliper.export_spectral_data() will export data that is related to spectral activity, but cannot be used to simulate spectra;
- Tesliper.export_activities() will export unprocessed spectral activities, normally used to simulate spectra;
- Tesliper.export_spectra() will export spectra calculated so far that are stored in the Tesliper.spectra dictionary;
- Tesliper.export_averaged() will export each averaged spectrum calculated so far that is stored in the Tesliper.averaged dictionary.
Each of these methods takes two parameters: fmt and mode. fmt is the file format to which data should be exported. It should be one of the following values: "txt" (the default), "csv", or "xlsx". mode denotes how files should be opened and should be one of: "a" (append to an existing file), "x" (the default, only write if the file doesn't exist yet), or "w" (overwrite the file if it already exists).
These export methods will usually produce a number of files in the Tesliper.output_dir, whose names will be picked automatically according to the genre of the exported data and/or the conformer that data relates to. For example, export_energies() will produce a file for each available energies genre and an additional overview file, and export_spectra() will create a file for each spectra genre and each conformer.
Note
The "xlsx" format is an exception to the above: it will produce only one file, named “tesliper-output.xlsx”, and create multiple spreadsheets inside this file. The append mode is useful when exporting data to the "xlsx" format, as it allows you to write multiple kinds of data (with calls to several of these methods) to this single destination .xlsx file.
There is also a Tesliper.export_data() method, which will export only the genres you specifically request (plus “freq” or “wavelen”, if any of the given genres is related to spectral data). The same rules for output format, write mode, and names of produced files apply here.
Tip
If you would like to customize the names of the files produced, you will need to directly use one of the writer objects provided by tesliper. Refer to the writing module documentation for more information.
Creating input files
You can use Tesliper.export_job_file() to prepare input files for the quantum chemical calculations software. Apart from the typical fmt (only "gjf" is supported by default) and mode parameters, this method also accepts the geometry_genre parameter and any number of additional keyword parameters specifying calculation details. geometry_genre should be the name of the data genre, representing conformers' geometry, that should be used as the input geometry. Additional keyword parameters are passed to the writer object relevant to the requested fmt. Keywords supported by the default "gjf"-format writer are as follows:
- route
A calculations route: keywords specifying calculations directives for quantum chemical calculations software.
- link0
Dictionary with “link zero” commands, where each key is command’s name and each value is this command’s parameter.
- comment
Contents of title section, i.e. a comment about the calculations.
- post_spec
Anything that should be placed after conformer’s geometry specification. Will be written to the file as given.
The link0 parameter needs a more detailed explanation. It supports standard link zero commands used with Gaussian software, like Mem, Chk, or NoSave. A full list of these commands may be found in the documentation for GjfWriter.link0. Any non-parametric link0 command (i.e. Save, NoSave, and ErrorSave) should be given a True value if it should be included in the link0 section.
Path-like commands, e.g. Chk or RWF, may be parametrized for each conformer. You can put a placeholder inside a given string path, which will be substituted when writing to file. The most useful placeholders are probably ${conf} and ${num}, which evaluate to the conformer's name and ordinal number, respectively. More information about placeholders may be found in the GjfWriter.make_name() documentation.
>>> list(tslr.conformers.kept_keys())
["conf_one", "conf_three"]
>>> tslr.export_job_file(
... geometry_genre="optimized_geom",
... route="# td=(nstates=80)",
... comment="Example of parametrization in .gjf files",
... link0={
... "Mem": "10MW",
... "Chk": "path/to/${conf}.chk",
... "Save": True,
... },
... )
>>> [file.name for file in tslr.output_dir.iterdir()]
["conf_one.gjf", "conf_three.gjf"]
Then the contents of “conf_one.gjf” are:
%Mem=10MW
%Chk=path/to/conf_one.chk
%Save
# td=(nstates=80)

Example of parametrization in .gjf files

[geometry specification...]
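The link0 handling and placeholder substitution described above can be sketched with Python's string.Template, which uses the same ${name} placeholder syntax. render_link0 is a hypothetical helper. Two assumptions are made here: that GjfWriter relies on string.Template internally, and that a False value means "omit the command". With the link0 dictionary from the example, it reproduces the first three lines of “conf_one.gjf”.

```python
from string import Template
from typing import Dict, List, Union


def render_link0(link0: Dict[str, Union[str, bool]], conf: str, num: int) -> List[str]:
    """Turn a link0 dict into Gaussian "%" lines for one conformer."""
    lines = []
    for command, value in link0.items():
        if value is True:
            lines.append(f"%{command}")  # non-parametric command, e.g. %Save
        elif value is False:
            continue  # assumption: False means the command is left out
        else:
            # ${conf} and ${num} are filled in; unknown placeholders are left as-is
            path = Template(value).safe_substitute(conf=conf, num=num)
            lines.append(f"%{command}={path}")
    return lines


example_link0 = {"Mem": "10MW", "Chk": "path/to/${conf}.chk", "Save": True}
for line in render_link0(example_link0, conf="conf_one", num=1):
    print(line)
```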
Saving session for later
If you'd like to come back to the data currently contained within a tesliper instance, you may serialize it using the Tesliper.serialize() method. Pass it a filename parameter, naming the file under which the session should be stored inside the current output_dir. You may also omit it to use the default ".tslr" name. All data, extracted and calculated, including the current kept status of each conformer, is saved and may be loaded later using the Tesliper.load() class method.
curr_dir = tslr.output_dir
tslr.serialize()
loaded = Tesliper.load(curr_dir / ".tslr")
assert loaded.conformers == tslr.conformers
assert loaded.conformers.kept == tslr.conformers.kept
assert loaded.spectra.keys() == tslr.spectra.keys()