tesliper.extraction.parser_base
This module contains a definition of Abstract Base Class for file parsers.
Classes
ParserBase – Abstract Base Class for parsers implemented as finite state machines.
- class tesliper.extraction.parser_base.ParserBase
Abstract Base Class for parsers implemented as finite state machines.
This base class defines some methods to organize the work of parsers implemented as finite state machines: it automates registration of methods and functions as parser’s states, manages its execution, and registers each derived class as the parser used for a certain type of files (this registry is used by the Soxhlet object).
The default parsing flow goes as follows:
1. method parse() is called with a file handle as an argument;
2. method initial() is set as the ‘workhorse’;
3. the ‘workhorse’ is called for consecutive lines in the file handle;
4. initial() checks if any registered trigger matches the current line;
5. the ‘workhorse’ is changed to the method associated with the first matching trigger;
6. calling the ‘workhorse’ on consecutive lines continues;
7. parse() returns a dictionary with extracted values.
To make this possible, each method marked as state should return dictionary (or sequence convertible to dict) and handle changing ‘workhorse’ to next appropriate state. To mark a method as parser’s state use ParserBase.state decorator in class definition or add a state directly to parser instance using ‘add_state’ method.
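The flow described above can be sketched with a small stand-in class. This is not tesliper’s implementation and it does not import tesliper: the class name MiniParser, the ‘energy’ state, and the trigger pattern are all hypothetical, chosen only to show how the ‘workhorse’ moves between states.

```python
import re


class MiniParser:
    """Stand-in illustrating the flow; NOT tesliper's ParserBase."""

    def __init__(self):
        # state name -> callable, state name -> compiled trigger (hypothetical)
        self.states = {"initial": self.initial, "energy": self.energy}
        self.triggers = {"energy": re.compile(r"^ENERGY BLOCK")}
        self.workhorse = self.initial

    def initial(self, line):
        # Switch to the state of the first trigger that matches the line.
        for name, trigger in self.triggers.items():
            if trigger.match(line):
                self.workhorse = self.states[name]
                break
        return {}

    def energy(self, line):
        # A state is responsible for handing control to the next state.
        self.workhorse = self.initial
        return {"energy": float(line)}

    def parse(self, lines):
        self.workhorse = self.initial
        data = {}
        for line in lines:
            # Each state returns a dict that is merged into the result.
            data.update(self.workhorse(line))
        return data


result = MiniParser().parse(["header", "ENERGY BLOCK", "-76.4", "footer"])
print(result)
```

Note that the ‘energy’ state itself switches the ‘workhorse’ back to initial(); a state that forgets to do so would keep receiving every following line.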
When subclassing ParserBase, one should implement initial() and parse() methods. Those abstract methods implement basic functionality, described above. See methods’ documentation for more details. If you wish not to use default ParserBase’s protocol, simply override those methods to your liking. Values for class attributes extensions and purpose should also be provided.
To register class derived from ParserBase for use by Soxhlet object, simply set purpose class attribute to name, under which class should be registered. Setting it to one of names already defined (e.g. ‘gaussian’) will override the default parser used by Soxhlet object.
- states
Dictionary of parser states, created automatically on object instantiation from object methods marked as states; method name is used as a key by default.
- Type
dict
- triggers
Dictionary of triggers for parser states, created automatically on object instantiation from object methods marked as states with triggers; key for a particular state trigger should be the same as state’s key in states dictionary.
- Type
dict
- abstract property extensions
File extensions that should be considered compatible with a parser subclassing ParserBase. It will be used by Soxhlet to identify which files to parse when reading files in batch. Should be a class attribute with a tuple of str, where each element is a file extension. May also be an empty tuple, if files discovery feature is not needed for the parser.
- abstract property purpose
An identifier for a parser subclassing ParserBase. It allows tesliper to pick a correct parser for each parsing task. A falsy value, i.e. an empty string or None, prevents the parser from being registered for use by tesliper. If custom subclass uses a purpose already known, e.g. “gaussian” or “spectra”, it will override the original parser for this purpose.
- property workhorse: Callable
Callable marked as a current state used by parser object.
Setter can take a callable or a string as a parameter. If name as string is passed to setter, it will be translated to a method registered as state. If no method was registered under this name, InvalidStateError will be raised. No other checks are performed when argument is callable.
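A minimal sketch of a setter with the behavior described above, assuming a plain dictionary lookup by name. WorkhorseDemo and the locally defined InvalidStateError are stand-ins written for this illustration; tesliper’s own code may differ.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


class WorkhorseDemo:
    """Hypothetical class; shows only the property logic."""

    def __init__(self):
        self.states = {"initial": self.initial}
        self._workhorse = self.initial

    def initial(self, line):
        return {}

    @property
    def workhorse(self):
        return self._workhorse

    @workhorse.setter
    def workhorse(self, state):
        if isinstance(state, str):
            # A string is translated to the state registered under that name.
            try:
                state = self.states[state]
            except KeyError:
                raise InvalidStateError(f"no state registered as {state!r}") from None
        # A callable is accepted as-is, without further checks.
        self._workhorse = state


demo = WorkhorseDemo()
demo.workhorse = "initial"  # looked up in demo.states
demo.workhorse = len        # any callable is accepted
```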
- add_state(state: Callable, name: str = '', trigger: str = '')
Register callable as parser’s state.
This method registers a callable under name key in states dictionary. If trigger parameter is given, it is registered under the same key in triggers dictionary.
- Parameters
state (Callable) – callable, that is to be registered as parser’s state
name (str, optional) – name under which the callable should be registered; defaults to callable.__name__
trigger (str, optional) – string with regular expression, that will be compiled with re module
- Returns
callable object registered as state
- Return type
Callable
- remove_state(name: str)
Removes the state from parser’s registered states.
- Parameters
name (str) – name of state, that should be unregistered
- Raises
InvalidStateError – if no callable was registered under the name ‘name’
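The registration and removal described for add_state and remove_state can be sketched as follows. StateRegistry and the locally defined InvalidStateError are hypothetical stand-ins; in particular, dropping the trigger together with the state is an assumption of this sketch, not something the text above states.

```python
import re


class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


class StateRegistry:
    """Hypothetical container mirroring the states/triggers dictionaries."""

    def __init__(self):
        self.states = {}
        self.triggers = {}

    def add_state(self, state, name="", trigger=""):
        # The name defaults to the callable's own __name__.
        name = name or state.__name__
        self.states[name] = state
        if trigger:
            # The trigger is stored compiled, under the same key as the state.
            self.triggers[name] = re.compile(trigger)
        return state

    def remove_state(self, name):
        if name not in self.states:
            raise InvalidStateError(f"no state registered as {name!r}")
        del self.states[name]
        # Assumption: the matching trigger, if any, is dropped as well.
        self.triggers.pop(name, None)


def frequencies(line):
    return {}


registry = StateRegistry()
registry.add_state(frequencies, trigger=r"^\s*Frequencies")
registry.remove_state("frequencies")
```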
- abstract initial(line: str) → dict
An initial parser state.
A default implementation checks if any of defined triggers matches a line and sets an associated state as parser’s workhorse, if it does. This is an abstract method and should be overridden in subclass. Its default implementation can be used, however, by calling super().initial(line) in subclass’s method.
Notes
initial() method is always registered as parser’s state.
- Parameters
line (str) – currently parsed line
- Returns
empty dictionary
- Return type
dict
- abstract parse(lines: Iterable) → dict
Parses consecutive elements of iterable and returns data found as dictionary.
Dictionary with extracted data is updated with workhorse’s return value, so all states should return dictionary or compatible sequence. This is an abstract method and should be overridden in subclass. Its default implementation can be used, however, by calling data = super().parse(lines) in subclass’s method.
Notes
After execution - either successful or interrupted by exception - workhorse is set back to initial() method.
- Parameters
lines (Iterable) – iterable (e.g. file handle), that will be parsed, line by line
- Returns
dictionary with data extracted by parser
- Return type
dict
- Raises
InvalidStateError – if dictionary can’t be updated with state’s return value
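One way to get the behavior described for parse(), where the workhorse is reset even after an exception and an incompatible return value is reported as InvalidStateError, is a try/finally block around the loop. The sketch below is an assumption about how this could be done, not tesliper’s code; ResettingParser, its ‘broken’ state, and the "bad" marker line are hypothetical.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


class ResettingParser:
    """Hypothetical parser showing only the reset and error handling."""

    def __init__(self):
        self.workhorse = self.initial

    def initial(self, line):
        # Hypothetical rule: the line "bad" activates a misbehaving state.
        if line == "bad":
            self.workhorse = self.broken
        return {"last": line}

    def broken(self, line):
        return 42  # neither a dict nor a sequence of key-value pairs

    def parse(self, lines):
        self.workhorse = self.initial
        data = {}
        try:
            for line in lines:
                output = self.workhorse(line)
                try:
                    data.update(output)
                except (TypeError, ValueError) as error:
                    raise InvalidStateError(
                        "state returned a value incompatible with dict"
                    ) from error
        finally:
            # Runs on success and on failure alike.
            self.workhorse = self.initial
        return data


parser = ResettingParser()
data = parser.parse(["a", "b"])
```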
- static state(state=None, trigger=None)
Convenience decorator for registering a method as parser’s state. It can be used with or without ‘trigger’ parameter, like this:
>>> @ParserBase.state
... def method(self, arg): pass
or
>>> @ParserBase.state(trigger='triggering regex')
... def method(self, arg): pass
This function marks a method state as parser’s state by defining is_state attribute on said method and setting its value to True. If trigger is given, it is stored in method’s attribute trigger. During instantiation of ParserBase’s subclass, methods marked as states are registered under method.__name__ key in its states (and possibly triggers) attribute. It is meaningless if used outside of ParserBase’s subclass definition.
- Parameters
state (Callable) – callable, that is to be registered as parser’s state
trigger (str, optional) – string with regular expression, that will be compiled with re module
- Returns
callable object registered as state if ‘state’ was given or decorator if only ‘trigger’ was given
- Return type
Callable
- Raises
TypeError – if no arguments given
InvalidStateError – if state argument is not callable
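A decorator that works both bare and with a keyword argument, as described for state(), can be written along these lines. This is a self-contained sketch under the assumptions stated in the text (an is_state attribute set to True, an optional trigger attribute); the module-level function, the helper _mark, and the Demo class are hypothetical and are not tesliper’s source.

```python
class InvalidStateError(Exception):
    """Stand-in for the exception named in the text."""


def _mark(func, trigger):
    # Hypothetical helper: flags the callable and stores its trigger.
    if not callable(func):
        raise InvalidStateError("state must be callable")
    func.is_state = True
    if trigger is not None:
        func.trigger = trigger
    return func


def state(state=None, trigger=None):
    if state is None and trigger is None:
        raise TypeError("at least one argument is required")
    if state is None:
        # Used as @state(trigger=...): return the actual decorator.
        def decorator(func):
            return _mark(func, trigger)
        return decorator
    # Used as bare @state: the decorated function arrives as 'state'.
    return _mark(state, trigger)


class Demo:
    @state
    def plain(self, line):
        return {}

    @state(trigger=r"^Frequencies")
    def freq(self, line):
        return {}

    def helper(self):
        pass


# Collecting marked methods, as a subclass instantiation might do.
marked = sorted(
    name for name, attr in vars(Demo).items() if getattr(attr, "is_state", False)
)
```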