#raku-beginner on 28 January 2022 - Raku Programming Language Log

01:59 frost joined 02:53 frost left 02:57 frost joined 03:02 m_athias left, m_athias joined
stevied	in a regex, what is the equivalent of not matching a newline from Perl: `[^\n]+`?	03:04
03:11 codechurch joined 03:35 frost left 03:36 frost joined 04:02 codechurch left 05:23 frost left 06:06 TempIRCLogger__ left, qorg11 left, Manifest0 left, SmokeMachine left, CIAvash left, thowe left, sivoais left, mjgardner left 06:07 gfldex left, anight[m] left, m_athias left, Util left, destroycomputers left, samebchase left, lizmat left, tbrowder left, codesections left, discord-raku-bot left, MasterDuke left, camelia left 06:13 frost joined, m_athias joined, discord-raku-bot joined, lizmat joined, TempIRCLogger__ joined, Util joined, qorg11 joined, MasterDuke joined, Manifest0 joined, destroycomputers joined, gfldex joined, anight[m] joined, CIAvash joined, SmokeMachine joined, tbrowder joined, samebchase joined, mjgardner joined, thowe joined, sivoais joined, codesections joined, camelia joined
stevied	I'm totally lost with grammars	07:21
	this works:
	```
	grammar G {
	token TOP {<blah> k}
	token blah { \w\w\w }
	}
	my $match = G.parse('duck');
	say $match;
	```
	this doesn't match:
	```
	grammar G {
	token TOP {<blah> k}
	token blah { \w+ }
	}
	```
Nahita	`\N+` I believe	07:27
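
Editor's note: Nahita's answer can be checked with a couple of lines. This is only a sketch, and the sample string is invented for illustration. `\N` matches a single character that is not a newline, and the negated character class `<-[\n]>` is the closest spelling to Perl's `[^\n]`.

```raku
# Sketch only: $text is an invented sample, not taken from the log.
my $text = "first line\nsecond line";

# \N matches any one character that is not a newline.
say ($text ~~ / \N+ /).Str;        # first line

# A negated character class, the closest spelling to Perl's [^\n].
say ($text ~~ / <-[\n]>+ /).Str;   # first line
```
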
stevied	this has got to be a bug. this matches:	07:48
	```
	grammar G {
	token TOP { 'd' <blah> '/' }
	token blah { \w+ }
	}
	my $match = G.parse('duc/');
	say $match;
	```
	this doesn't:
	```
	grammar G {
	token TOP { 'd' <blah> 'k' }
	token blah { \w+ }
	}
	```
lizmat	the \w+ is probably too greedy	07:51
stevied	ok, sorry, the `/` is not a `\w` character	07:52
	so that makes sense
	i tried making it non-greedy
	didn't work: `\w+?`
lizmat	please, I'm a pretty Raku grammar noob myself :-)	07:53
stevied	now I don't feel so bad.	07:54
lizmat	m: my token blah { \w+? }; say "foo" ~~ / f <blah> o /
camelia	｢foo｣ blah => ｢o｣
stevied	maybe you can't do non-greedy in grammars?
lizmat	looks to me you can ?
	there's also: raku.land/github:jnthn/Grammar::Debugger	07:55
stevied	but that's not a grammar, right?	07:56
	m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } } my $match = G.parse('duck'); say $match;	07:57
lizmat	no, but a grammar is just a module of regexen really, with regexen being methods
stevied	m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } }; my $match = G.parse('duck'); say $match;
lizmat	m: grammar G { token TOP { 'd' <blah> 'k' } token blah { \w+? } }; my $match = G.parse('duck'); say $match;
camelia	===SORRY!=== Error while compiling <tmp> Strange text after block (missing semicolon or comma?) at <tmp>:1 ------> grammar G { token TOP { 'd' <blah> 'k' }⏏ token blah { \w+? } }; my $match = G.pa expecting any of: …
lizmat	m: grammar G { token TOP { 'd' <blah> 'k' }; token blah { \w+? } }; my $match = G.parse('duck'); say $match;
camelia	(Any)
lizmat	m: my token blah { \w+? }; say "duck" ~~ / d <blah> k /	07:58
camelia	Nil
stevied	m: grammar G { token TOP { 'd' <blah> 'k' }; token blah { \w+? } }; my $match = G.parse('duck'); say $match;
07:58 frost left
lizmat	hmmm	07:59
stevied	i gotta get to bed. wanted to go out on a good note but getting nowhere on this	08:03
lizmat	sorry, hope we'll be able to provide more clarity in the morn	08:04
stevied	using that debugger
	with non-greedy, it's just matching the "u" and nothing else
lizmat	m: my regex blah { \w+ }; say "duck" ~~ / d <blah> k /	08:06
camelia	｢duck｣ blah => ｢uc｣
lizmat	it needs to be able to backtrack, that's why it needs to be a regex
	breakfast&
stevied	oh the TOP needs to be a regex it looks like	08:08
	I had only tried making the second block a regex
	actually, they both need to be regexes, not tokens	08:09
	alright, I'll have to sleep on this. I know what backtracking is but don't quite understand how it works across two different regexes like this. weird shit.	08:10
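
Editor's note: the thread above comes down to `token` versus `regex`. A `token` ratchets: once `\w+` has consumed "uck" it never gives a character back, so the literal 'k' has nothing left to match. A `regex` may backtrack. The sketch below is an illustration only. The grammar names `Ratchet`, `Backtrack` and `StopEarly` are made up for this note, and the third grammar is an assumption about one way to avoid backtracking altogether, not something proposed in the channel.

```raku
# Sketch only: grammar names are hypothetical, chosen for this note.

# token = ratcheting: \w+ eats "uck" and keeps it, so 'k' cannot match.
grammar Ratchet {
    token TOP  { 'd' <blah> 'k' }
    token blah { \w+ }
}

# regex = backtracking allowed. Both rules are regexes, as stevied found:
# TOP must be able to retry <blah>, and blah must be able to give back "k".
grammar Backtrack {
    regex TOP  { 'd' <blah> 'k' }
    regex blah { \w+ }
}

# Alternative that needs no backtracking: never consume the 'k' at all.
grammar StopEarly {
    token TOP  { 'd' <blah> 'k' }
    token blah { <-[k]>+ }
}

say Ratchet.parse('duck');                # Nil
say Backtrack.parse('duck')<blah>.Str;    # uc
say StopEarly.parse('duck')<blah>.Str;    # uc
```

The third form is usually the easier one to reason about in a grammar, because each rule states exactly where it stops instead of relying on a later rule to force it to retreat.
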
13:55 discord-raku-bot left, discord-raku-bot joined 13:59 discord-raku-bot left 14:00 discord-raku-bot joined 15:36 frost joined 16:03 frost left
stevied	ok, got this working:	17:41
	```
	grammar G {
	token TOP { .? ( '<' 'a' <-[ > ]>+ '>' <hypertext> '<' '/' 'a' '>' .? )+ .* }
	token hypertext { <-[ < ]>+ }
	}
	my @matches = G.parse('<a href="kjsdf">blah 1</a><a href="/">blah 2</a>');
	say @matches;
	```
	it works, but I'm gonna say that grammars are not the ideal tool for parsing html, just like with regexes
	is that the common wisdom?
lizmat	I think it is :-)	18:02
	especially since HTML can be improperly formed and still sorta render ok in a browser	18:03
stevied	I'm in the middle of posting to reddit about this right now. Let's see what happens.
lizmat	stevied++
stevied	right. though in my particular situation, I'm parsing an html document created from markdown using a tool. so the html should be well-formed	18:04
	www.reddit.com/r/rakulang/comments..._to_parse/	18:26
	don't know who that dude in the picture is 🙂
m_athias	@stevied#8273 why do you need the .*? in there? it should work just fine without them.	18:37
	if you want to allow whitespace at the beginning <.ws> works. that way lies madness: writing decent rules to figure out what whitespace is relevant is a pain.	18:48
18:55 thowe left, thowe joined
stevied	@m_athias, I don't know why it's in there. I created it with lots of trial and error. I can play with it some more.	18:57
	are you talking about the one at the beginning or near the end?	18:58
	ok, yup. removing that worked	18:59
	whoa, removing the second one worked, too	19:01
	heh, i clearly don't know what I'm doing
	actually, i take that back. removing those .*? breaks things. I had changed the string getting parsed to remove the text before and after the first and last anchor tags	19:03
	ah, dammit. I pasted in the wrong code to reddit
	good catch
	ok, fixed it	19:04
	alright, so what's the best way to parse html, then?	19:59
	I think i'll pose this question to stackoverflow. too many requirements to outline here
	stackoverflow.com/questions/708996...embed-code	20:08
