Reference¶
compynator.core¶
The essentials of all parser combinators.
The six basic regex definitions are mapped according to:
Regex |
Compynator |
|---|---|
empty set |
|
epsilon |
|
character |
|
concatenation |
|
alternative |
|
Kleene star |
|
Monadic properties are Succeed for unit, Parser.then for bind, and
optionally Fail for zero.
-
compynator.core.Empty= Parser<Succeed@140076859025104>¶ An empty string. This always succeeds.
-
class
compynator.core.Failure(parser, message, remain='', cause_or_causes=None)¶ A collection of zero ``Result``s.
This class is used to propagate parse failures and incrementally construct a
Success. We usually start out with an instance of this class, then add moreResultobjects to it to produce aSuccess.>>> parser = 'a' + Terminal('b') + 'c' >>> ret = Failure(parser, 'Parser fails.', None) >>> s = ret.add_all(parser('abc')) >>> isinstance(s, Success) True
-
add(result)¶ Returns a
ResultSet(could beself) withresult.
-
add_all(results)¶ Returns a
ResultSet(could beself) with allresults.
-
-
class
compynator.core.ParseContext(options=None)¶ Internal book-keeping data structure.
-
class
compynator.core.Parser(parser_function)¶ Callable that takes a sequence of tokens & returns a
ResultSet.The specific types of inputs and outputs are not known. However, inputs usually are strings. The requirements for inputs are:
they must have
__len__they are indexable and slicable
A parser must return a collection of
Result``s. Their ``valueelements can be any type but theirremainelements must be a slice of the input tokens.This class can be used as a decorator:
@Parser def head(tokens): if len(tokens) >= 1: return Success(result=Result(tokens[0], tokens[1:])) return Failure(head, 'Unable to obtain more tokens', tokens)
In that example,
headis a parser that returns the first element of the sequence of tokens. Thenheadcan be chained (then,+) with other parsers, filtered (where,value), or composed together to be more useful.-
call(callback)¶ Simple wrapper around
filterto always callcallback.In ambiguous grammar (like the example below), there might be repeated results if
callmakes up a part of the variable. Please note the difference in two definitions of the same production rule.>>> count = 0 >>> def cb(r): ... global count ... count += 1 >>> empty = Succeed('') >>> s = ((Terminal('s') + (lambda _: s)) ^ empty).call(cb) >>> r = s('ss') >>> assert len(r) == 3 >>> count 6 >>> count = 0 >>> s = (Terminal('s') + (lambda _: s)) ^ empty >>> r = s.call(cb)('ss') >>> assert len(r) == 3 >>> count 3
-
filter(callback, take_if=True)¶ Executes
callbackon a successful parse and filters results.callbackmust take aResult. Every possible result of a rule will be passed tocallback.If truth value as returned by
callbackis the same astake_if, thatResultobject is included.NOTE: The ordering between
filterandmemoizeis important and may result incallbacknot being invoked.
-
memoize()¶ Memoizes parsed results of
self.The memoization allows for ambiguous grammar to be processed efficiently. See the paper Parser Combinators for Ambiguous Left Recursive Grammars.
This modifier is recommended when the unbiased
__xor__operator is used, or when left recursion is in the grammar:>>> empty = Succeed('') >>> s = ((Terminal('s') + (lambda _: s) + (lambda _: s)) ^ ... empty).memoize() >>> len(s('s' * 20)) 21
Without the
memoizemodifier in the above example, it would take a very long time to parse.
-
parse(tokens)¶ Parses the input
tokensunder the default context.
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
repeat(lower=0, upper=None, reducer=<built-in function concat>, value='', take_all=False)¶ Repeatedly parses [lower, upper] occurrences.
If
upperisNone, there is no upper bound. Thereduceris used to join the results together similar to how it is used inthen. The zeroth parse result (parseris not invoked yet) is aSuccessofvalue. The first reduction is between zeroth and first results. Iftake_all, then all results are returned. If nottake_all, then only the greediest results are returned.>>> p = Terminal('a').repeat() >>> set(p('')) {Result(value='', remain='')} >>> set(p('b')) {Result(value='', remain='b')} >>> set(p('a')) {Result(value='a', remain='')} >>> set(p('aa')) {Result(value='aa', remain='')}
-
skip(binder)¶ Similar to
then, but the reducer takes the first value.
-
then(binder, reducer=<function Parser.<lambda>>)¶ Chains
selfand parser(s) returned bybinderviareducer.This is the
bindfunction in monadic sense.binderis a callable that takes in aResult.valueand returns aParserobject. This parser is then applied onResult.remain.bindercan also be aParserobject. In this case,binderis used directly as the second parser.If not,
binderwill be converted into aTerminal(str(binder)).reducertakes two arguments, the first isResult.valueof this parser, and the second is theResult.valueof the second parser. The result ofreducermakes up the final result of the composed parser.The default
reduceronly takes the secondResult.value.In code, this looks like:
ret = Fail(tokens) for value, remain self(tokens): next_parser = binder(value) for next_value, next_remain in next_parser(remain): final_value = reducer(value, next_value) ret = ret.add(Result(final_value, next_remain))
-
value(converter_or_value)¶ Converts
Result.valueinto a different value.converter_or_valuecan be a callable, or an object. If it is a callable, it takesResult.valueand returns a converted value. If it is a value, that value is used.For example:
>>> digit = One.where(lambda c: '0' <= c <= '9') >>> set(digit('8bc')) {Result(value='8', remain='bc')} >>> digit_as_int = digit.value(int) >>> set(digit_as_int('8bc')) {Result(value=8, remain='bc')}
-
where(predicate)¶ Selects results whose values pass
predicate.predicateis a callable that takesResult.valueand returnsTrueif thatResultshould be included. This is a convenient wrapper aroundfilter.For example:
>>> digit = One.where(lambda c: '0' <= c <= '9') >>> set(digit('abc')) set() >>> set(digit('8bc')) {Result(value='8', remain='bc')}
-
class
compynator.core.Result¶ Holds the parsed results.
Each result is a 2-tuple of value and remaining unparsed sequence of tokens.
NOTE: The input tokens are assumed to be immutable and
len(remain)is sufficient to tell if two ``Result.remain``s are equal.
-
class
compynator.core.ResultSet¶ A sized iterable collection of
Result.To incrementally construct a result set, first start with a
Failure, then add moreResultviaaddoradd_all.-
add(result)¶ Returns a
ResultSet(could beself) withresult.
-
add_all(results)¶ Returns a
ResultSet(could beself) with allresults.
-
-
class
compynator.core.Succeed(value)¶ Always returns a parsed result of
valueregardless of input.For example:
>>> s = Succeed(10) >>> set(s('abc')) {Result(value=10, remain='abc')} >>> set(s('def')) {Result(value=10, remain='def')}
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
class
compynator.core.Success(*args, result=None, results=None)¶ A collection of
Resultin a successful parse.A
Successmust have at least oneResult. The constructor can take either keyword argumentresultorresults, but not both at the same time.-
add(result)¶ Returns a
ResultSet(could beself) withresult.
-
add_all(results)¶ Returns a
ResultSet(could beself) with allresults.
-
-
class
compynator.core.Terminal(terminal)¶ Matches
terminalto the beginning of input tokens.>>> t = Terminal('t') >>> set(t('')) set() >>> set(t('t')) {Result(value='t', remain='')}
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
compynator.core.default_parse_context(tokens)¶ Returns
ParseContextfortokens.
compynator.niceties¶
-
compynator.niceties.Alnum= Parser<_Or@140076859338000>¶ Exactly one ASCII letter or digit.
-
compynator.niceties.Alpha= Parser<_Or@140076859337872>¶ Exactly one letter a-zA-Z
-
class
compynator.niceties.Collect(*parsers)¶ A combinator that runs through all
parsersin sequence and collects their results in a collection of many flattened collections.This is best described with examples:
>>> a, b, c = [Terminal(x) for x in 'abc'] >>> p = Collect(a, b, c) >>> set(p('adc')) set() >>> p('adc') Failure('Failed to collect.', 'adc', [Failure("Parser index 1: Expecting terminal 'b'.", 'dc', ())]) >>> set(p('abc')) {Result(value=('a', 'b', 'c'), remain='')} >>> a = a.repeat(0, 2, take_all=True) >>> p = Collect(a, a, a) >>> rs = p('a') >>> len(rs) 4 >>> Result(value=('', '', ''), remain='a') in rs True >>> Result(value=('', '', 'a'), remain='') in rs True >>> Result(value=('', 'a', ''), remain='') in rs True >>> Result(value=('a', '', ''), remain='') in rs True >>> len(p('aa')) # -/-/-, -/-/a, -/a/-, a/-/-, ... # -/-/aa, -/a/a, -/aa/-, a/-/a, a/a/-, aa/-/- 10
Note that the final
ResultSetcould grow exponentially.-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
compynator.niceties.Digit= Parser<_Filter@140076859081552>¶ Exactly one decimal digit.
-
class
compynator.niceties.Forward¶ A forward declaration of a rule.
This is useful in case the rule is defined recursively. For example, the BNF rule
exp ::= (exp '-' exp) | 'o'could be defined as followed:>>> exp = Forward() >>> exp.is_(((exp + '-' + exp) ^ 'o').memoize()) >>> set(exp('o')) {Result(value='o', remain='')} >>> sorted(exp('o-o')) [Result(value='o', remain='-o'), Result(value='o-o', remain='')]
A forward declaration of Parser is the same as referring to that parser in a lambda:
>>> exp = (Succeed(None).then(lambda _: exp + '-' + exp) ^ 'o').memoize() >>> set(exp('o')) {Result(value='o', remain='')} >>> sorted(exp('o-o')) [Result(value='o', remain='-o'), Result(value='o-o', remain='')]
A
RuntimeErrorwill be raised if aForwardhas not calledis_, or if that method is called more than once.>>> exp = Forward() >>> exp('abc') Traceback (most recent call last): File "<stdin>", line 1, in ? RuntimeError: A forward declaration has no definition. >>> exp.is_('abc') >>> exp.is_('abc') Traceback (most recent call last): File "<stdin>", line 1, in ? RuntimeError: Already defined.
-
is_(parser)¶ Defines a forward declaration.
If
parseris not typedParser, its string representation will be made into aTerminal.This method must be called exactly once for each
Forwardobject. ARuntimeErrorwill be raised if it is called more than once.
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
compynator.niceties.HexDigit= Parser<_Or@140076859081808>¶ Exactly one hexadecimal digit.
-
class
compynator.niceties.ITerminal(terminal)¶ Case insensitive terminal.
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
class
compynator.niceties.Lookahead(parser, take_if=True, value='')¶ Tries
parserbut does not consume input.If the truth value of the parse result is
take_if, aSuccessofvalueis returned. Otherwise, aFailureis returned.-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
compynator.niceties.Lower= Parser<_Filter@140076859336400>¶ Exactly one letter a-z.
-
compynator.niceties.OctDigit= Parser<_Filter@140076859335056>¶ Exactly one octadecimal digit.
-
class
compynator.niceties.Regex(regex)¶ Regex matcher.
-
parse_with_context(tokens, context)¶ Parses input
tokensunder the context ofcontext.
-
-
compynator.niceties.Upper= Parser<_Filter@140076859337616>¶ Exactly one letter A-Z