|
01:59
frost joined
02:53
frost left
02:57
frost joined
03:02
m_athias left,
m_athias joined
|
|||
| stevied | in a regex, what is the equivalent of not matching a new line from perl: `[^\n]+` | 03:04 | |
|
03:11
codechurch joined
03:35
frost left
03:36
frost joined
04:02
codechurch left
05:23
frost left
06:06
TempIRCLogger__ left,
qorg11 left,
Manifest0 left,
SmokeMachine left,
CIAvash left,
thowe left,
sivoais left,
mjgardner left
06:07
gfldex left,
anight[m] left,
m_athias left,
Util left,
destroycomputers left,
samebchase left,
lizmat left,
tbrowder left,
codesections left,
discord-raku-bot left,
MasterDuke left,
camelia left
06:13
frost joined,
m_athias joined,
discord-raku-bot joined,
lizmat joined,
TempIRCLogger__ joined,
Util joined,
qorg11 joined,
MasterDuke joined,
Manifest0 joined,
destroycomputers joined,
gfldex joined,
anight[m] joined,
CIAvash joined,
SmokeMachine joined,
tbrowder joined,
samebchase joined,
mjgardner joined,
thowe joined,
sivoais joined,
codesections joined,
camelia joined
|
|||
| I'm totally lost with grammars | 07:21 | ||
| this works: | |||
| ``` | |||
| grammar G { | |||
| token TOP {<blah> k} | |||
| token blah { \w\w\w } | |||
| } | |||
| my $match = G.parse('duck'); | |||
| say $match; | |||
| ``` | |||
| this doesn't match: | |||
| ``` | |||
| grammar G { | |||
| token TOP {<blah> k} | |||
| token blah { \w+ } | |||
| } | |||
| ``` | |||
| Nahita | `\N+` I believe | 07:27 | |
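As a quick check of the `\N+` suggestion: in Raku regexes `\N` matches any character that is not a newline, and the negated character class `<-[\n]>` is the more literal translation of Perl's `[^\n]`. A minimal sketch (the sample string is made up for illustration):

```raku
# Hypothetical two-line input, used only for illustration.
my $text = "first line\nsecond line";

# \N matches any single character that is not a newline.
my $with-escape = $text ~~ / \N+ /;

# <-[\n]> is a negated character class, the closest analogue of Perl's [^\n].
my $with-class = $text ~~ / <-[\n]>+ /;

say ~$with-escape;   # first line
say ~$with-class;    # first line
```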
| stevied | this has got to be a bug. this matches: | 07:48 | |
| ``` | |||
| grammar G { | |||
| token TOP { 'd' <blah> '/' } | |||
| token blah { \w+ } | |||
| } | |||
| my $match = G.parse('duc/'); | |||
| say $match; | |||
| ``` | |||
| this doesn't: | |||
| ``` | |||
| grammar G { | |||
| token TOP { 'd' <blah> 'k' } | |||
| token blah { \w+ } | |||
| } | |||
| ``` | |||
| lizmat | the \w+ is probably too greedy | 07:51 | |
| stevied | ok, sorry, the `/` is not a `\w` character | 07:52 | |
| so that makes sense | |||
| i tried making it non-greedy | |||
| didn't work: `\w+?` | |||
| lizmat | please, I'm a pretty Raku grammar noob myself :-) | 07:53 | |
| stevied | now I don't feel so bad. | 07:54 | |
| lizmat | m: my token blah { \w+? }; say "foo" ~~ / f <blah> o / | ||
| camelia | 「foo」 blah => 「o」 |
||
| stevied | maybe you can't do non-greedy in grammars? | ||
| lizmat | looks to me you can ? | ||
| there's also: raku.land/github:jnthn/Grammar::Debugger | 07:55 | ||
| stevied | but that's not a grammar, right? | 07:56 | |
| m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } } my $match = G.parse('duck'); say $match; | 07:57 | ||
| lizmat | no, but a grammar is just a module of regexen really, with regexen being methods | ||
| stevied | m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } }; my $match = G.parse('duck'); say $match; | ||
| lizmat | m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } }; my $match = G.parse('duck'); say $match; | ||
| camelia | ===SORRY!=== Error while compiling <tmp> Strange text after block (missing semicolon or comma?) at <tmp>:1 ------> grammar G { token TOP { 'd' <blah> 'k' }⏏ token blah { \w+? } }; my $match = G.pa expecting any of: … |
||
| lizmat | m: grammar G { token TOP { 'd' <blah> 'k' }; token blah { \w+? } }; my $match = G.parse('duck'); say $match; | ||
| camelia | (Any) | ||
| lizmat | m: my token blah { \w+? }; say "duck" ~~ / d <blah> k / | 07:58 | |
| camelia | Nil | ||
| stevied | m: grammar G { token TOP { 'd' <blah> 'k' }; token blah { \w+? } }; my $match = G.parse('duck'); say $match; | ||
|
07:58
frost left
|
|||
| lizmat | hmmm | 07:59 | |
| stevied | i gotta get to bed. wanted to go out on a good note but getting nowhere on this | 08:03 | |
| lizmat | sorry, hope we'll be able to provide more clarity in the morn | 08:04 | |
| stevied | using that debugger | ||
| with non-greedy, it's just matching the "u" and nothing else | |||
| lizmat | m: my regex blah { \w+ }; say "duck" ~~ / d <blah> k / | 08:06 | |
| camelia | 「duck」 blah => 「uc」 |
||
| lizmat | it needs to be able to backtrack, that's why it needs to be a regex | ||
| breakfast& | |||
| stevied | oh the TOP needs to be a regex it looks like | 08:08 | |
| I had only tried making the second block a regex | |||
| actually, they both need to be regexes, not tokens | 08:09 | ||
| alright, I'll have to sleep on this. I know what backtracking is but don't quite understand how it works across two different regexes like this. weird shit. | 08:10 | ||
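The behaviour discussed above comes down to `token` implying `:ratchet`: once `\w+` has consumed `uck`, a token never gives characters back, so the trailing `'k'` has nothing left to match. Declaring both rules as `regex` allows backtracking into `<blah>`. A minimal sketch of the two cases side by side (the grammar names `Ratcheting` and `Backtracking` are made up for illustration):

```raku
# Tokens imply :ratchet, so \w+ keeps everything it has consumed.
grammar Ratcheting {
    token TOP  { 'd' <blah> 'k' }
    token blah { \w+ }
}

# Regexes may backtrack, so \w+ can release the final 'k' for TOP to match.
grammar Backtracking {
    regex TOP  { 'd' <blah> 'k' }
    regex blah { \w+ }
}

say so Ratcheting.parse('duck');          # False
say ~Backtracking.parse('duck')<blah>;    # uc
```

This also fits the frugal `\w+?` attempt: under ratchet it takes a single character (`u`) and is never asked to try a longer match, so the token version still fails.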
|
13:55
discord-raku-bot left,
discord-raku-bot joined
13:59
discord-raku-bot left
14:00
discord-raku-bot joined
15:36
frost joined
16:03
frost left
|
|||
| ok, got this working: | 17:41 | ||
| ``` | |||
| grammar G { | |||
| token TOP { .*? ( '<' 'a' <-[ > ]>+ '>' <hypertext> '<' '/' 'a' '>' .*? )+ .* } | |||
| token hypertext { <-[ < ]>+ } | |||
| } | |||
| my @matches = G.parse('<a href="kjsdf">blah 1</a><a href="/">blah 2</a>'); | |||
| say @matches; | |||
| ``` | |||
| it works, but I'm gonna say that grammars are not the ideal tool for parsing html, just like with regexes | |||
| is that the common wisdom? | |||
| lizmat | I think it is :-) | 18:02 | |
| especially since HTML can be improperly formed and still sorta render ok in a browser | 18:03 | ||
| stevied | I'm in the middle of posting to reddit about this right now. Let's see what happens. | ||
| lizmat | stevied++ | ||
| stevied | right. though in my particular situation, I'm parsing an html document created from markdown using a tool. so the html should be well-formed | 18:04 | |
| www.reddit.com/r/rakulang/comments..._to_parse/ | 18:26 | ||
| don't know who that dude in the picture is 🙂 | |||
| m_athias | @stevied#8273 why do you need the .*? in there? it should work just fine without them. | 18:37 | |
| if you want to allow whitespace at the beginning <.ws> works. that way lies madness: writing decent rules to figure out what whitespace is relevant is a pain. | 18:48 | ||
|
18:55
thowe left,
thowe joined
|
|||
| stevied | @m_athias, I don't know why it's in there. I created it with lots of trial and error. I can play with it some more. | 18:57 | |
| are you talking about the one at the beginning or near the end? | 18:58 | ||
| ok, yup. removing that worked | 18:59 | ||
| whoa, removing the second one worked, too | 19:01 | ||
| heh, i clearly don't know what I'm doing | |||
| actually, i take that back. removing those .*? breaks things. I had changed the string getting parsed to remove the text before and after the first and last anchor tags | 19:03 | ||
| ah, dammit. I pasted in the wrong code to reddit | |||
| good catch | |||
| ok, fixed it | 19:04 | ||
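The reason removing the `.*?` only worked on the trimmed string is that `.parse` has to match the whole input, from the first character to the last. One way to tolerate surrounding text without backtracking is to have `TOP` try a link and otherwise skip one character. A rough sketch under that assumption (the grammar names `Links` and `StrictLinks` and the `link` rule are made up for illustration, and `'<a'` would also match tags such as `<abbr>`):

```raku
grammar Links {
    # Try a link first; otherwise consume one character and move on.
    token TOP       { [ <link> || . ]* }
    token link      { '<a' <-[ > ]>* '>' <hypertext> '</a>' }
    token hypertext { <-[ < ]>+ }
}

# Same link rules, but TOP insists the input is nothing except links.
grammar StrictLinks is Links {
    token TOP { <link>+ }
}

my $bare    = '<a href="kjsdf">blah 1</a><a href="/">blah 2</a>';
my $wrapped = "intro $bare outro";

say so StrictLinks.parse($bare);      # True
say so StrictLinks.parse($wrapped);   # False

# Collect the text of every link found in the wrapped input.
my @texts = Links.parse($wrapped)<link>.map({ ~$_<hypertext> });
say @texts;   # [blah 1 blah 2]
```

This is only meant to show the anchoring behaviour; it does not make a grammar a robust HTML parser.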
| alright, so what's the best way to parse html, then? | 19:59 | ||
| I think i'll pose this question to stackoverflow. too many requirements to outline here | |||
| stackoverflow.com/questions/708996...embed-code | 20:08 | ||