saint- | Also, does raku automatically add a newline at the end of a file when you slurp it? | 00:01 | |
stevied | The .* will suck up everything. | 00:21 | |
You need \N* | 00:22 | ||
More concise way: `token TOP { | 00:26 | ||
<line>+ %% \v | |||
}` | |||
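A minimal runnable sketch of that suggestion (the grammar name and the `line` token body are assumed, not from the paste):

```raku
grammar Lines {
    token TOP  { <line>+ %% \v }   # %% also permits a trailing newline
    token line { \N* }             # \N = any character except a newline
}

# Re the slurp question above: .slurp returns the file contents
# verbatim -- it does not add a trailing newline itself.
my $text = 'input.txt'.IO.slurp;
say Lines.parse($text)<line>.map(*.Str);
```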
saint- | Gotcha, yeah I got it to work with \N | 00:33 | |
Is \N just a perl regex thing? | 00:34 | ||
I'm guessing \V or \v is just as good, just a bit safer? | 00:35 | ||
Looks like it. I guess I never really knew PCRE, just the standard one-ish | 00:36 | ||
Is there a purpose to the ^ and $ anchors in TOP? | 00:39 | ||
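On the anchors: `.parse` implicitly anchors `TOP` to both ends of the string, so an explicit `^` and `$` there are usually redundant; `.subparse` anchors only at the start. A quick sketch:

```raku
grammar G { token TOP { abc } }
say G.parse('abcdef');      # Nil -- .parse must consume the whole string
say G.subparse('abcdef');   # ｢abc｣ -- anchored at the start only
```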
saint- | Is there a way to say 'not a token' in a grammar? | 00:50 | |
For example, is there any way to get this to work? I want a character token to match any character except a double line-break | 01:00 | ||
www.toptal.com/developers/hastebin...vimofib.pl | |||
The idea is to make a whole paragraph a single token |||
I'm not sure how to get that to work | |||
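The paste isn't shown here, but one common approach is a negative lookahead, so a "paragraph character" is any character that does not start a blank-line break. A sketch (the names are made up):

```raku
grammar Paras {
    token TOP       { <paragraph>+ %% \n\n }
    token paragraph { <pchar>+ }
    token pchar     { <!before \n\n> . }   # anything not starting a paragraph break
}
say Paras.parse("one two\nthree\n\nfour five")<paragraph>.map(*.Str);
```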
klebs | in this error: `===SORRY!===Object of type X in QAST::WVal, but not in SC` what does SC stand for? | 01:28 | |
I have worked around this issue by splitting the whole text on \n\n and processing each chunk separately | 01:34 | ||
saint- | klebs, yeah I figured I could do that, but was just curious as well how to do it in pure raku | 01:36 | |
Or rather pure raku grammar | |||
Think I figured it out if anyone is interested www.toptal.com/developers/hastebin...emaguh.xml | 01:48 | ||
klebs | i have run into problems doing this sort of thing when you want to complexify what you parse as a "block" | 01:59 | |
saint- | I've made it a bit more complex and so far so good | ||
saint- | Will let you know if it starts to break haha | 02:02 | |
klebs | what made it break for me was when i started introducing grammar `rules`, which don't technically care about whitespace. i know you can set the ws token to \h*, but sometimes that isn't what the problem needs either. | 02:07 | |
there might have been some other details as well to consider... i forget | |||
i was basically attempting to mix preprocessing logic and grammar rules | |||
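For reference, a sketch of the `ws` override being described: inside a `rule`, whitespace in the pattern becomes implicit `<.ws>` calls, and redefining `ws` as `\h*` keeps rules from eating newlines (grammar and input are made up):

```raku
grammar Cfg {
    token ws    { \h* }                  # horizontal whitespace only
    rule  pair  { <key> '=' <value> }    # spaces here compile to <.ws>
    token key   { \w+ }
    token value { \w+ }
}
say Cfg.parse('name = Raku', :rule<pair>);   # ｢name = Raku｣
```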
saint- | Interesting | ||
klebs | while that works to a certain extent, sometimes the bugs, when they show up, are mega subtle | ||
like, in which cases does something in a block parse as a rule, and in which cases does it trip your section-snipping logic? | 02:08 | ||
as things get more complex, your snipping logic might trip in odd circumstances; then you try to change your snipping logic, and something you had working before might break | 02:09 | ||
it doesn't happen as much on low-complexity grammars, but if you start trying to add more detail to your block parser, you might hit situations like that | 02:10 | ||
hopefully not -- it mostly happened to me when i was trying to parse python doc comments and extract the relevant items from them (i.e. when the author describes parameters, function outputs, etc.) | 02:11 | ||
what worked for me was to just preprocess everything up front and break it into smaller chunks, which could each be parsed against one of several smaller, less complex grammars |||
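A sketch of that preprocess-then-parse shape (the chunk grammars and sample text are made up):

```raku
grammar RefChunk   { token TOP { '(' \d+ ')' .* } }
grammar PlainChunk { token TOP { .* } }

my $text = "intro text\n\n(1) a reference chunk\n\nmore text";
for $text.split(/\n\n+/) -> $chunk {
    # try the more specific grammar first, fall back to the general one
    with RefChunk.parse($chunk)     { say "ref:   $_" }
    orwith PlainChunk.parse($chunk) { say "plain: $_" }
}
```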
saint- | Based on this code www.toptal.com/developers/hastebin...igojiq.xml do you know how I would print out only the refBlock matches? | 02:28 | |
klebs | you can write an actions class for this purpose | 02:43 | |
do you want the match or the text that matches? | 02:46 | ||
something like: ```raku | 02:48 | ||
grammar Lit { | |||
token TOP { <block>+ } | |||
proto token block { * } | |||
token block:sym<section> { <sectionBreak> } | |||
token block:sym<ref> { <refBlock> } |||
token block:sym<plain> { <plainBlock> } | |||
token sectionBreak { \n\n } | |||
token notSectionBreak { [<!sectionBreak> .] } | |||
token char { <notSectionBreak> | <ref> } | |||
token ref { \(\d+\) } | |||
token refBlock { <ref> <char>+ } | |||
token plainBlock { <char>+ } | |||
} |||
```
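Following up on the actions-class suggestion, a sketch of collecting just the refBlock text with the (fixed) Lit grammar above; the sample input is made up:

```raku
class RefActions {
    has @.refs;
    method refBlock($/) { @!refs.push: ~$/ }   # fires once per refBlock match
}

my $actions = RefActions.new;
Lit.parse("(1) first ref\n\nplain text\n\n(2) second ref", :$actions);
say $actions.refs;
```

Without actions, `$m<block>.grep(*<refBlock>).map({ ~$_<refBlock> })` over the parse result gets at the same matches.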
saint- | klebs either the match or the text that matches, the text ideally | 02:57 | |
Ah hmm | 02:58 | ||
Thanks I'll save that and look at it more | 02:59 | ||
klebs | np | 03:24 | |
deadmarshal | I found something weird :|. I used a range inside the lines method (i know i shouldn't), but it gave back some lines which didn't correspond to the numbers in the range, and it didn't throw any errors or anything. is this normal? | 12:56 | |
like this for example: 'input.txt'.IO.lines(4..12); | 12:57 | ||
MasterDuke | lines takes a limit, so it probably just numified the range (which would be the number of elements) and used that | 13:09 | |
m: say +(4..12) | |||
camelia | 9 | ||
MasterDuke | you could argue that maybe it should use a range as indices (which would likely involve adding a new multi candidate for lines); feel free to open an issue in the rakudo repo about it | 13:11 | ||
deadmarshal | ok thank you | 13:37 | |
gfldex | .lines returns a Positional, so a subscript should work | 14:36 | |
lizmat | yeah, 'input.txt'.IO.lines[4..12] | 16:30 | |
deadmarshal | awesome ;) | 16:40 | |
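A quick sketch of the two behaviours side by side (assuming `input.txt` exists):

```raku
say 'input.txt'.IO.lines(4..12);   # limit: +(4..12) == 9, i.e. the first 9 lines
say 'input.txt'.IO.lines[4..12];   # subscript: lines 5..13 (0-based indexing)
```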