Expose get_capture and get_captures API to Python #289
kmaehashi wants to merge 2 commits into guidance-ai:main from
Hey @kmaehashi, thanks for the contribution! Exposing the captures through Python is reasonable, especially from a debugging standpoint. Would you mind adding a test or two to
I also notice that the Python API is a little inconsistent with the Rust API, namely
Hi @hudson-ai, thanks for the review! I've just added the tests. I was actually waiting to make sure we were on the same page regarding the API design before writing them, so I'm glad we agree on
Looking good! One more request (would have mentioned sooner, but I had to re-acquaint myself with this code...) -- would you document (and encode in the test) that
I.e.
grm = r"""start: "hello " group1 group2+
group1[capture,lazy]: /[a-z]+/
group2[capture="body"]: /[a-z]{4}/"""
m = matcher(grm)
m.consume_tokens(tokenizer().tokenize_str("hello worldabcd"))
assert m.get_capture("group1") == b"w"
assert m.get_capture("body") == b"abcd"
assert m.get_captures() == [("group1", b"w"), ("body", b"orld"), ("body", b"abcd")]
This answers the question that I found myself asking... "why doesn't get_captures return a dict?"
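For readers following along, here is a minimal sketch of the dict question in plain Python. It uses the list from the example above as literal data and does not call the matcher at all; the helper names last_by_name and group_by_name are hypothetical and are not part of this PR. It only shows that collapsing the pairs into a dict silently drops the earlier "body" entry, while a list of pairs (or a grouping built from it) keeps every capture.

```python
from collections import defaultdict

# Literal data copied from the example in this thread; no matcher is involved.
captures = [("group1", b"w"), ("body", b"orld"), ("body", b"abcd")]


def last_by_name(caps):
    """Collapse pairs into a dict; a later entry overwrites an earlier one with the same name."""
    return dict(caps)


def group_by_name(caps):
    """Keep every captured value, grouped by name in order of first appearance."""
    grouped = defaultdict(list)
    for name, value in caps:
        grouped[name].append(value)
    return dict(grouped)


if __name__ == "__main__":
    print(last_by_name(captures))
    print(group_by_name(captures))
```

In this data the dict form ends up with b"abcd" for "body", the same value the example asserts for get_capture("body"), whereas the grouped form retains both b"orld" and b"abcd".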
Makes sense! Updated accordingly, let me know what you think.
Looks good to me, but please remove the NFS temp file you committed by accident:
Beyond this, any objections @riedgar-ms?
e84c7a5 to 279d366
When debugging grammars, it is handy if users can see which parts of the string are captured by which rule. This PR adds get_capture and get_captures to the Matcher and exposes them to Python so that matched strings for [capture] rules can be inspected from Python.