tesliper.extraction.soxhlet
A tool for batch parsing of files from a specified directory.
Classes
Soxhlet – A tool for data extraction from files in a specific directory.
- class tesliper.extraction.soxhlet.Soxhlet(path: Optional[Union[str, pathlib.Path]] = None, purpose: str = 'gaussian', wanted_files: Optional[Iterable[Union[str, pathlib.Path]]] = None, extension: Optional[str] = None, recursive: bool = False)
A tool for data extraction from files in a specific directory. Typical use:
>>> s = Soxhlet('absolute/path_to/working/directory')
>>> data = s.extract()
- Parameters
path (str or pathlib.Path) – String representing the absolute path to the directory containing the files that will be the subject of data extraction.
purpose (str) – Determines which of the registered parsers should be used for extraction. Purposes supported out of the box are “gaussian”, “spectra”, and “parameters”.
wanted_files (list of str or pathlib.Path objects, optional) – List of files that should be loaded for further extraction. If omitted, all output files present in the directory will be processed.
extension (str, optional) – A string representing the file extension of the output files that should be parsed. If omitted, Soxhlet will try to resolve it based on the contents of the directory given in the path parameter.
recursive (bool) – If True, the given path will be searched recursively, extracting data from subdirectories; otherwise subdirectories are ignored and only files placed directly in path will be parsed.
- Raises
FileNotFoundError – If path passed as argument to constructor doesn’t exist or is not a directory.
ValueError – If no parser is registered for given purpose.
- property all_files
List of all files present in the directory bound to the Soxhlet instance. If its recursive attribute is True, files from subdirectories are also included.
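The difference between a recursive and a non-recursive listing can be sketched with pathlib alone. This is an illustration only: `list_files` is a hypothetical helper and is not claimed to be how Soxhlet builds its file list.

```python
from pathlib import Path
from typing import List


def list_files(directory: Path, recursive: bool = False) -> List[Path]:
    """Hypothetical helper (not part of tesliper): list regular files."""
    # rglob("*") descends into subdirectories; glob("*") stays at the top level.
    candidates = directory.rglob("*") if recursive else directory.glob("*")
    # Directories themselves are skipped, only regular files are returned.
    return sorted(p for p in candidates if p.is_file())
```

With `recursive=False`, a file placed in a subdirectory of `directory` is not returned.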
- property files
List of all wanted files available in the given directory. If wanted_files is not specified, evaluates to all files in said directory. If the Soxhlet object’s recursive attribute is True, files from subdirectories are also included.
- property wanted_files: Optional[Set[str]]
Set of files that are desired for data extraction, stored as filenames without an extension. Any iterable of strings or Path objects is transformed to this form.
>>> s = Soxhlet()
>>> s.wanted_files = [Path("./dir/file_one.out"), Path("./dir/file_two.out")]
>>> s.wanted_files
{"file_one", "file_two"}
May also be set to None or another “falsy” value, in which case it is ignored.
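The transformation described above can be approximated as follows. This is a minimal sketch under the assumption that "filename without an extension" corresponds to `Path.stem`; `normalize_wanted` is a hypothetical name, not a tesliper function.

```python
from pathlib import Path
from typing import Iterable, Optional, Set, Union


def normalize_wanted(
    files: Optional[Iterable[Union[str, Path]]],
) -> Optional[Set[str]]:
    """Hypothetical helper: reduce paths to extension-less file names."""
    if not files:
        # None, an empty list, or any other falsy value means "no restriction".
        return None
    # Assumption: Path.stem (directory and final suffix dropped) is the wanted form.
    return {Path(f).stem for f in files}
```

Strings and Path objects may be mixed freely, since both are passed through `Path` before the stem is taken.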
- property output_files: List[pathlib.Path]
List of Gaussian output files, sorted by file name, from the files list associated with the Soxhlet instance.
- filter_files(ext: Optional[str] = None) → List[pathlib.Path]
Filters files from filenames list.
Filters file names in the list associated with the Soxhlet object instance. Returns a list of file names that end with the provided ext string (the file extension) and start with any of the file names associated with the instance as wanted_files, if those were provided.
- Parameters
ext (str) – String representing the file extension.
- Returns
List of filtered filenames as strings.
- Return type
list
- Raises
ValueError – If parameter ext is not given and attribute extension is None.
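The filtering rule described for filter_files (names ending with ext and, if wanted names were given, starting with one of them) might look like the sketch below. `filter_by_extension` is a hypothetical stand-in that takes the file list and wanted names as explicit arguments; it is not the library's implementation.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Set


def filter_by_extension(
    files: Iterable[Path],
    ext: Optional[str],
    wanted: Optional[Set[str]] = None,
) -> List[Path]:
    """Hypothetical stand-in for the filtering rule described above."""
    if ext is None:
        # Mirrors the documented ValueError when no extension is available.
        raise ValueError("No extension given and no default extension is set.")
    # str.startswith accepts a tuple of prefixes; None disables the prefix check.
    prefixes = tuple(wanted) if wanted else None
    return [
        f
        for f in files
        if f.name.endswith(ext)
        and (prefixes is None or f.name.startswith(prefixes))
    ]
```

Note that a prefix match is looser than an exact match: under this reading a wanted name `"file"` would also select `file_one.out`.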
- guess_extension() → str
Tries to figure out which extension should be assumed.
Looks for files whose names end with one of the extensions defined by the currently used parser. Returns the extension if exactly one of them matches. Raises an exception if the extension cannot be easily guessed.
- Returns
The extension of files that are present in filenames list, which current parser can parse.
- Return type
str
- Raises
ValueError – If more than one type of files declared by a current parser as possibly compatible is present in list of filenames.
FileNotFoundError – If none of files declared by a current parser as possibly compatible are present in list of filenames.
TypeError – If current parser does not declare any compatible file extensions.
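The three failure modes listed above can be illustrated with a standalone function. This is a sketch under stated assumptions: the function below takes the file names and the parser's extensions as explicit arguments (the documented method takes none), and the error messages are invented for illustration.

```python
from typing import Iterable, Sequence


def guess_extension(filenames: Iterable[str], parser_extensions: Sequence[str]) -> str:
    """Hypothetical standalone version of the guessing logic described above."""
    if not parser_extensions:
        raise TypeError("Parser does not declare any compatible extensions.")
    names = list(filenames)
    # Keep each declared extension that at least one file name ends with.
    found = [
        ext
        for ext in parser_extensions
        if any(name.endswith(ext) for name in names)
    ]
    if len(found) > 1:
        raise ValueError(f"Ambiguous: files with extensions {found} are present.")
    if not found:
        raise FileNotFoundError("No files with a compatible extension found.")
    return found[0]
```

For example, a directory holding both `.out` and `.log` files, with a parser that declares both extensions, falls into the ValueError case and requires an explicit extension.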
- extract_iter() → Generator[Tuple[str, dict], None, None]
Extracts data from files associated with the Soxhlet instance (via path and wanted_files attributes), using the current parser (determined by the purpose provided on Soxhlet’s instantiation). Implemented as a generator. If the Soxhlet instance’s recursive attribute is True, files from subdirectories are also parsed.
- Yields
tuple – Two item tuple with name of parsed file as first and extracted data as second item, for each file associated with Soxhlet instance.
- extract() → dict
Extracts data from files associated with the Soxhlet instance (via path and wanted_files attributes), using the current parser (determined by the purpose provided on Soxhlet’s instantiation). If the Soxhlet.recursive attribute is True, files from subdirectories are also parsed.
- Returns
dictionary of extracted data, with name of parsed file as key and data as value, for each file associated with Soxhlet instance.
- Return type
dict of dicts
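The relationship between the generator form and the dictionary form can be sketched as below. The functions are hypothetical free-standing analogues that accept a file list and a `parse` callable explicitly, and the use of `Path.stem` as the yielded name is an assumption.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def extract_iter(
    files: Iterable[Path], parse: Callable[[Path], Any]
) -> Iterator[Tuple[str, Any]]:
    """Hypothetical analogue: lazily yield (name, data) for each file."""
    for file in files:
        # Assumption: the yielded name is the file name without its extension.
        yield file.stem, parse(file)


def extract(files: Iterable[Path], parse: Callable[[Path], Any]) -> Dict[str, Any]:
    """Hypothetical analogue: collect everything the generator yields."""
    return dict(extract_iter(files, parse))
```

The generator form parses one file at a time, which can help when the files are large or when processing should stop early; the dictionary form holds all results in memory at once.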
- parse_one(source: Union[str, pathlib.Path]) → Any
Parse one file using the current parser (determined by the purpose provided on Soxhlet’s instantiation) and return the extracted data.
- Parameters
source (str or Path) – Path or Path-like object to a file. May be given as an absolute path or relative to the Soxhlet.path.
- Returns
Data in a format that current parser provides.
- Return type
any
- Raises
FileNotFoundError – If no source file is found.
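How a source given either as an absolute path or relative to a base directory might be resolved is sketched below. `resolve_source` is a hypothetical helper and `base` stands in for Soxhlet.path; the actual lookup performed by parse_one may differ.

```python
from pathlib import Path
from typing import Union


def resolve_source(base: Path, source: Union[str, Path]) -> Path:
    """Hypothetical helper: locate `source`, absolute or relative to `base`."""
    source = Path(source)
    # Absolute paths are used as given; relative ones are joined onto `base`.
    candidate = source if source.is_absolute() else base / source
    if not candidate.is_file():
        raise FileNotFoundError(f"No such file: {candidate}")
    return candidate
```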