tesliper.extraction.parser_base

This module defines the abstract base class for file parsers.

Classes

ParserBase()

Abstract Base Class for parsers implemented as finite state machines.

class tesliper.extraction.parser_base.ParserBase

Abstract Base Class for parsers implemented as finite state machines.

This base class provides methods that organize the work of parsers implemented as finite state machines: it automates registration of methods and functions as the parser’s states, manages their execution, and registers each derived class as the parser used for a certain type of file (this registry is used by the Soxhlet object).

The default parsing flow is as follows:

  1. method parse() is called with a file handle as its argument;

  2. method initial() is set as the ‘workhorse’;

  3. the ‘workhorse’ is called for consecutive lines in the file handle;

  4. initial() checks if any registered trigger matches the current line;

  5. the ‘workhorse’ is changed to the method associated with the first matching trigger;

  6. calling the ‘workhorse’ on consecutive lines continues;

  7. parse() returns a dictionary with the extracted values.

To make this possible, each method marked as a state should return a dictionary (or a sequence convertible to dict) and handle changing the ‘workhorse’ to the next appropriate state. To mark a method as a parser’s state, use the ParserBase.state decorator in the class definition or add a state directly to a parser instance using the ‘add_state’ method.
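The flow described above can be sketched with a small stand-in class. MiniParser, its energy state, the trigger pattern, and the sample lines are all hypothetical; the sketch only mimics the described protocol and is not tesliper’s implementation.

```python
import re


class MiniParser:
    """Hypothetical stand-in that mimics the described parsing flow."""

    def __init__(self):
        # Trigger keys match the keys of the states they activate.
        self.triggers = {"energy": re.compile(r"^Energy section")}
        self.states = {"initial": self.initial, "energy": self.energy}
        self.workhorse = self.initial

    def initial(self, line):
        # Switch to the first state whose trigger matches the line.
        for name, trigger in self.triggers.items():
            if trigger.match(line):
                self.workhorse = self.states[name]
                break
        return {}

    def energy(self, line):
        # Extract one value, then hand control back to initial().
        self.workhorse = self.initial
        return {"energy": float(line.split("=")[1])}

    def parse(self, lines):
        data = {}
        for line in lines:
            # Every state returns something dict.update() accepts.
            data.update(self.workhorse(line))
        return data


lines = ["header", "Energy section", "E = -76.4", "footer"]
result = MiniParser().parse(lines)
print(result)
```

Each state decides for itself when to hand control to another state, which keeps the parse() loop trivial.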

When subclassing ParserBase, implement the initial() and parse() methods. The base versions of these abstract methods implement the basic functionality described above; see the methods’ documentation for more details. If you do not wish to use ParserBase’s default protocol, simply override these methods to your liking. Values for the class attributes extensions and purpose should also be provided.

To register a class derived from ParserBase for use by the Soxhlet object, set its purpose class attribute to the name under which the class should be registered. Setting it to one of the names already defined (e.g. ‘gaussian’) will override the default parser used by the Soxhlet object.

states

Dictionary of parser states, created automatically on object instantiation from object methods marked as states; method name is used as a key by default.

Type

dict

triggers

Dictionary of triggers for parser states, created automatically on object instantiation from object methods marked as states with triggers; key for a particular state trigger should be the same as state’s key in states dictionary.

Type

dict

abstract property extensions

File extensions that should be considered compatible with a parser subclassing ParserBase. Soxhlet uses them to identify which files to parse when reading files in batch. Should be a class attribute holding a tuple of str, where each element is a file extension. May also be an empty tuple if the file discovery feature is not needed for the parser.

abstract property purpose

An identifier for a parser subclassing ParserBase. It allows tesliper to pick the correct parser for each parsing task. A falsy value, e.g. an empty string or None, prevents the parser from being registered for use by tesliper. If a custom subclass uses a purpose that is already known, e.g. “gaussian” or “spectra”, it will override the original parser for this purpose.
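One way such purpose-keyed registration could work is sketched below using __init_subclass__. This is an assumption made for illustration: the text does not say how tesliper performs the registration, and Base, registry, and the three subclasses are hypothetical names.

```python
class Base:
    """Hypothetical base class keeping a purpose -> class registry."""

    registry = {}
    purpose = ""
    extensions = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A falsy purpose keeps the subclass out of the registry.
        if cls.purpose:
            Base.registry[cls.purpose] = cls


class GaussianParser(Base):
    purpose = "gaussian"
    extensions = ("log", "out")


class Helper(Base):
    purpose = ""  # falsy, so never registered


class MyGaussianParser(Base):
    purpose = "gaussian"  # same purpose: replaces GaussianParser
    extensions = ("log",)


print(Base.registry)
```

Because the registry is keyed by purpose, the most recently defined class for a given purpose wins, which matches the override behaviour described for already known purposes.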

property workhorse: Callable

Callable currently used by the parser object as its active state.

The setter accepts either a callable or a string. If a name is passed as a string, it is translated to the method registered as a state under that name. If no method was registered under this name, InvalidStateError is raised. No other checks are performed when the argument is a callable.
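The setter behaviour described above might look roughly like the following sketch. Machine and this InvalidStateError class are hypothetical stand-ins defined here so the example is self-contained; they are not imported from tesliper.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


class Machine:
    def __init__(self):
        self.states = {"initial": self.initial}
        self._workhorse = self.initial

    def initial(self, line):
        return {}

    @property
    def workhorse(self):
        return self._workhorse

    @workhorse.setter
    def workhorse(self, state):
        if callable(state):
            # Callables are accepted as-is, with no further checks.
            self._workhorse = state
            return
        try:
            # Strings are looked up among the registered states.
            self._workhorse = self.states[state]
        except KeyError as error:
            raise InvalidStateError(f"No state registered as {state!r}") from error


machine = Machine()
machine.workhorse = "initial"
by_name = machine.workhorse
machine.workhorse = len  # any callable passes unchecked
by_callable = machine.workhorse
try:
    machine.workhorse = "missing"
except InvalidStateError as error:
    message = str(error)
```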

add_state(state: Callable, name: str = '', trigger: str = '')

Register a callable as a parser’s state.

This method registers a callable under the name key in the states dictionary. If the trigger parameter is given, it is registered under the same key in the triggers dictionary.

Parameters
  • state (Callable) – callable to be registered as a parser’s state

  • name (str, optional) – name under which the callable should be registered; defaults to callable.__name__

  • trigger (str, optional) – string with a regular expression that will be compiled with the re module

Returns

callable object registered as state

Return type

Callable
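As an illustration of the registration described for add_state, the sketch below stores a callable and its compiled trigger under a shared key. Registry and geometry are hypothetical names, and the body is an assumption about how such a method could be written, not tesliper’s source.

```python
import re


class Registry:
    """Hypothetical container mirroring the states/triggers dictionaries."""

    def __init__(self):
        self.states = {}
        self.triggers = {}

    def add_state(self, state, name="", trigger=""):
        # Fall back to the callable's own name when none is given.
        name = name or state.__name__
        self.states[name] = state
        if trigger:
            # The trigger shares its key with the state it activates.
            self.triggers[name] = re.compile(trigger)
        return state


def geometry(line):
    return {"geometry": line.strip()}


registry = Registry()
returned = registry.add_state(geometry, trigger=r"Standard orientation")
registry.add_state(geometry, name="geom_alias")  # no trigger for the alias
```

Returning the callable unchanged is what allows a method like this to double as a decorator.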

remove_state(name: str)

Removes a state from the parser’s registered states.

Parameters

name (str) – name of the state to unregister

Raises

InvalidStateError – if no callable was registered under the name ‘name’

abstract initial(line: str) → dict

An initial parser state.

The default implementation checks whether any of the defined triggers matches the line and, if one does, sets the associated state as the parser’s workhorse. This is an abstract method and must be overridden in a subclass. Its default implementation can still be used, however, by calling super().initial(line) in the subclass’s method.

Notes

The initial() method is always registered as a parser’s state.

Parameters

line (str) – currently parsed line

Returns

empty dictionary

Return type

dict

abstract parse(lines: Iterable) → dict

Parses consecutive elements of an iterable and returns the data found as a dictionary.

The dictionary with extracted data is updated with the workhorse’s return value, so all states should return a dictionary or a compatible sequence. This is an abstract method and must be overridden in a subclass. Its default implementation can still be used, however, by calling data = super().parse(lines) in the subclass’s method.

Notes

After execution, whether successful or interrupted by an exception, workhorse is set back to the initial() method.

Parameters

lines (Iterable) – iterable (e.g. a file handle) that will be parsed line by line

Returns

dictionary with data extracted by parser

Return type

dict

Raises

InvalidStateError – if dictionary can’t be updated with state’s return value
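The reset and error behaviour noted for parse() can be sketched with try/finally. Runner, its states, and this InvalidStateError class are hypothetical stand-ins; the sketch assumes that a failed dict.update() call is what triggers the error.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


class Runner:
    def __init__(self):
        self.workhorse = self.initial

    def initial(self, line):
        if line == "pairs":
            self.workhorse = self.pairs
        elif line == "bad":
            self.workhorse = self.broken
        return {}

    def pairs(self, line):
        # A sequence of key-value pairs is accepted by dict.update().
        return [("value", line)]

    def broken(self, line):
        return 42  # neither a dict nor a sequence of pairs

    def parse(self, lines):
        data = {}
        try:
            for line in lines:
                output = self.workhorse(line)
                try:
                    data.update(output)
                except (TypeError, ValueError) as error:
                    raise InvalidStateError(
                        f"Cannot update data with {output!r}"
                    ) from error
        finally:
            # Runs on success and on failure alike.
            self.workhorse = self.initial
        return data


runner = Runner()
ok = runner.parse(["pairs", "x"])
reset_after_success = runner.workhorse == runner.initial
try:
    runner.parse(["bad", "x"])
    raised = False
except InvalidStateError:
    raised = True
reset_after_failure = runner.workhorse == runner.initial
```

Resetting in a finally clause means the same parser instance can safely be reused for the next file, even after a failed run.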

static state(state=None, trigger=None)

Convenience decorator for registering a method as a parser’s state. It can be used with or without the ‘trigger’ parameter, like this:

>>> @ParserBase.state
... def method(self, arg): pass

or

>>> @ParserBase.state(trigger='triggering regex')
... def method(self, arg): pass

This function marks a method as a parser’s state by defining an is_state attribute on that method and setting its value to True. If trigger is given, it is stored in the method’s trigger attribute. During instantiation of a ParserBase subclass, methods marked as states are registered under the method.__name__ key in its states (and possibly triggers) attribute. The decorator is meaningless when used outside of a ParserBase subclass definition.

Parameters
  • state (Callable) – callable to be registered as a parser’s state

  • trigger (str, optional) – string with a regular expression that will be compiled with the re module

Returns

the callable object registered as a state if ‘state’ was given, or a decorator if only ‘trigger’ was given

Return type

Callable

Raises
  • TypeError – if no arguments given

  • InvalidStateError – if state argument is not callable
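A decorator usable both bare and with a keyword argument, as described for state, can be written along the following lines. This is a minimal sketch under assumptions: the helper _mark and this InvalidStateError class are hypothetical, and only the documented behaviour (the is_state and trigger attributes and the two exceptions) is reproduced.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


def _mark(func, trigger):
    # Hypothetical helper: tag the callable so it can be found later.
    if not callable(func):
        raise InvalidStateError(f"{func!r} is not callable")
    func.is_state = True
    if trigger is not None:
        func.trigger = trigger
    return func


def state(state=None, trigger=None):
    if state is None and trigger is None:
        raise TypeError("state() requires at least one argument")
    if state is None:
        # Used as @state(trigger=...): return the real decorator.
        return lambda func: _mark(func, trigger)
    # Used as a bare @state: the method itself was passed in.
    return _mark(state, trigger)


@state
def plain(line):
    return {}


@state(trigger=r"^SCF Done")
def scf(line):
    return {}
```

Marking methods with attributes rather than registering them immediately defers the actual registration to instantiation time, when bound methods are available.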