#raku-beginner on 14 August 2024 - Raku Programming Language Log

This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html Set by lizmat on 8 June 2022.
02:10 kjp left 02:11 kjp joined 02:12 kjp left, kjp joined 07:51 Chanakan left 07:54 Chanakan joined 08:11 cleo left 08:14 Tirifto left 08:15 Tirifto joined 08:22 lizmat left, gfldex left, snonux left 08:34 lizmat joined, gfldex joined, snonux joined
a12l	I'm looking at the code block in docs.raku.org/language/regexes#Mod...ier:_%,_%%	08:53
	say so 'abc,def' ~~ / ^ [\w+] ** 1 % ',' $ /; # OUTPUT: «False␤»
	say so 'abc,def' ~~ / ^ [\w+] ** 2 % ',' $ /; # OUTPUT: «True␤»	08:54
	What do the square brackets do here? Because they don't define a char class here, right?	08:56
lizmat	square brackets in Raku regexes are for grouping	09:15
	docs.raku.org/language/regexes#Non...g_grouping	09:16
	a12l: you probably want <[ ]> docs.raku.org/language/regexes#Enu...and_ranges	09:17
a12l	lizmat: Thanks! Now I understand.	12:52
	Yeah, I'm looking at char classes, but was confused when I read the docs.
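To make the distinction concrete, here is a minimal sketch contrasting plain square brackets (grouping) with `<[ ]>` (a character class). The variable names are made up for illustration and are not from the docs.

```raku
# [ ... ] groups without capturing; ** N sets a repeat count,
# and % ',' requires a comma between the repetitions.
my $one = ('abc,def' ~~ / ^ [\w+] ** 1 % ',' $ /).so;
my $two = ('abc,def' ~~ / ^ [\w+] ** 2 % ',' $ /).so;
say $one;           # False
say $two;           # True

# <[ ... ]> is an enumerated character class: one character from the set.
my $class = 'abc,def' ~~ / <[a..c]>+ /;
say $class.Str;     # abc
```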
14:14 MasterDuke left
a12l	What's the difference between <![ pattern ]> and <-[ pattern ]> in regex?	14:19
lizmat	-[ ] is part of the builder of the charclass	14:26
	m: say "ooo" ~~ / <[\w]>+ /
camelia	｢ooo｣
lizmat	m: say "ooo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o	14:27
camelia	Nil
lizmat	m: say "foo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o
camelia	｢f｣
a12l	lizmat: And how does it differ from <![ ... ]>?	14:36
	Note that I don't know what char classes actually are, and how they differ from other parts of regexes
librasteve	@a12l is that because you didn't read the doc yet or because you read it and it didn't make sense?	14:38
lizmat	<![ ]> is a negated lookaround assertion: docs.raku.org/language/regexes#Loo...assertions
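A small sketch of the difference, using made-up variable names: `<-[o]>` takes one character out of the input, while `<![o]>` only checks the next character and leaves it in place.

```raku
# <-[o]> is a character class: it consumes one character that is not "o".
my $negated-class = 'foo' ~~ / <-[o]> /;
say $negated-class.Str;     # f

# <![o]> is a zero-width assertion: it only checks that the next
# character is not "o", and leaves it for the rest of the regex.
my $assertion = 'foo' ~~ / <![o]> \w+ /;
say $assertion.Str;         # foo

# In "ooo" every position that still has a word character starts with "o".
my $no-match = 'ooo' ~~ / <![o]> \w+ /;
say $no-match.defined;      # False
```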
a12l	I read about half the page about regexes (so not all). And I feel that quite a lot of background knowledge is assumed. And several sections depend on stuff that comes later in the page, which itself depends on other sections.	14:42
librasteve	yeah - the doc can be a bit of a slog to read ... since it covers all the things and there are many	14:43
a12l	@librasteve Do you happen to know a good intro to regexes in Raku for a person not used to regex tools?	14:45
	Have taken a finite automata course at Uni, but I still feel that there's stuff that I haven't seen before	14:46
librasteve	to cut a long story short, a char class is a set of characters that you can match against ... there are many predefined ones, for example \w matches all "word" chars, and you can also make your own with the <[ ... ]> syntax, so in lizmat's example, <[\w]-[o]>, you get all the word chars from \w less the o since that is subtracted from the set via the minus -
	you may like this raku.guide/#_regular_expressions	14:48
a12l	Example: I don't know what "lookahead" means, and the description says "Lookaround assertions, which need a character class in its simpler form, work both ways. They match, but they don't consume a character." The docs never describe what "consume" means.	14:50
librasteve	this sounds like the raku guide (which is intro level) may be too simple for your needs ... tbh I am not sure ... for sure, the regexes reference is indeed quite dense docs.raku.org/language/regexes and especially the parts on Zero-width assertions.	14:56
	I guess the basic idea is that a regex proceeds character by character, usually moving to the next character of the input string on each step. Internally the regex tracks the position count in the input string. This is called "consuming" the character. In the typical case, the regex never goes backward.	14:59
	However, there are some examples when the regex backtracks - I asked my favourite LLM to describe this chatgpt.com/share/5a5fe9a4-f01d-4e...b9f48c2e2e	15:01
	ok so far?	15:06
	then to complete the story, from docs.raku.org/language/regexes#Sum...f_anchors, we have: "Anchors are zero-width regex elements. Hence they do not use up a character of the input string, that is, they do not advance the current position at which the regex engine tries to match. A good mental model is that they match between two characters of an input string, or before the first, or after the last character of an input string."	15:07
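One way to see "consuming" in practice is to look at where a match starts and ends. The sketch below is only an illustration with made-up variable names; it relies on the `.from` and `.to` methods of the Match object.

```raku
# A match records where it started (.from) and where it stopped (.to).
my $digits = 'abc123' ~~ / \d+ /;
say $digits.from;    # 3
say $digits.to;      # 6  (three characters consumed)

# The ^ anchor matches before the first character and consumes nothing.
my $anchor = 'abc123' ~~ / ^ /;
say $anchor.from;    # 0
say $anchor.to;      # 0
```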
a12l	That helps, thanks!	15:08
librasteve	and from just below, we have: "Zero-Width assertions can help you implement your own anchor: it turns another regex into an anchor, making them consume no characters of the input string. There are two variants: lookahead and lookbehind assertions."
a12l	lizmat: Thanks for the explanations
librasteve	to further confuse matters there is the notion of Lookaround Assertions which "work both ways" ... tbh I do not know what that means	15:09
a12l	Feels like 50 percent of Raku is the regex sub language 😛	15:10
librasteve	:cameliathink:
	oh, here's a practical example of a Lookaround Assertion ... first in Lookahead mode:	15:15
	m: say '333' ~~ m/<?[7]> \d+/;	15:17
Raku eval	False
librasteve	m: say '733' ~~ m/<?[7]> \d+/;	15:18
Raku eval	｢733｣
librasteve	^^ so this is saying "check that the first digit is a 7, but leave it (ie do not consume it) for the \d+ to capture"	15:19
	And now in Lookbehind mode:	15:21
	m: say '333$' ~~ m/ \d+ <?[$]>/;
Raku eval	｢333｣
librasteve	m: say '333' ~~ m/ \d+ <?[$]>/;
Raku eval	False
librasteve	^^ and this is saying "check that the digits are followed by a $"	15:22
	so the Lookaround idea is that the same syntax can be used to gate what is captured in "both ways"	15:23
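A note on terminology: both snippets above test the character that comes next, so each is a lookahead, only placed at a different point in the pattern. A lookbehind tests what comes before the current position and is written with `<?after …>`. A minimal sketch with made-up variable names:

```raku
# Lookahead: the digits must be followed by "$", which is not consumed.
my $ahead = '333$' ~~ / \d+ <?before '$'> /;
say $ahead.Str;      # 333

# Lookbehind: the digits must be preceded by "$", which is not consumed.
my $behind = '$333' ~~ / <?after '$'> \d+ /;
say $behind.Str;     # 333
```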
	anyway all this assertion stuff is in the deep end of regex - in practice I very rarely need to use it and fwiw, regardless of language, regex is a great tool for any kind of string checking	15:24
a12l	So a char class always matches against a single character?	15:34
lizmat	yes unless you specify a multiplier, such as + or *	15:35
a12l	And the rest of the regex handles how to match against the rest of the string?
librasteve	The char class is the set of chars that are compared - the quantifier controls how many times the match is done	15:36
	(as lizmat says)
lizmat	s/multiplier/quentifier :-)
	*quantifier
a12l	But can the quantifier be part of the char class, or does it operate on char classes?	15:38
	Because you can have quantifiers that operate on char classes if I remember correctly?
lizmat	quantifiers can NOT be part of a char class	15:39
a12l	And how do nested regex expressions work?
lizmat	they would be just a + or * as a character
	by nesting ?
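The points about quantifiers can be sketched as follows (variable names are made up for illustration): a class alone matches one character, a quantifier after it repeats it, and `+` or `*` inside the brackets are plain characters.

```raku
# A character class on its own matches exactly one character.
my $one = 'abc' ~~ / <[a..z]> /;
say $one.Str;        # a

# A quantifier placed after the class repeats it.
my $many = 'abc' ~~ / <[a..z]>+ /;
say $many.Str;       # abc

# Inside the brackets, + and * are ordinary characters.
my $literal = '2+2' ~~ / <[+*]> /;
say $literal.Str;    # +
```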
a12l	I'm trying to negate the lower char class to match against an uppercase letter	15:48
	"Abc" ~~ rx { <![<lower>]> }
	but I get an empty result
	m: "Abc" ~~ rx { <![<lower>]> }
Raku eval
a12l	I'm guessing that I'm making some very basic error in how I try to negate <lower>, but I don't know what	15:50
librasteve	reads the docs	15:52
	m: "Abc" ~~ rx { <-[lower]> }	15:53
Raku eval
librasteve	m: say "Abc" ~~ rx { <-[lower]> }
Raku eval	｢A｣
librasteve	oops forgot to say the result
	while that works, I think it's better style to include the \ like this:	15:56
	m: say "Abc" ~~ rx { <-[\lower]> }
Raku eval	｢A｣
librasteve	remember the minus - (and plus +) are the way to do set math in character classes	16:00
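One caveat worth checking: inside `<[ ... ]>` the word lower is read as the individual letters l, o, w, e, r, so `<-[lower]>` matches "A" in "Abc" only because "A" is not one of those five letters. The sketch below uses a different input to show this, and assumes that `<-lower>` (the named class negated without square brackets) is the form that was intended. Variable names are made up.

```raku
# <-[lower]> excludes the five letters l, o, w, e, r, so "x" is accepted.
my $letters = 'xAbc' ~~ / <-[lower]> /;
say $letters.Str;      # x

# <-lower> negates the predefined class: the first non-lowercase character.
my $not-lower = 'xAbc' ~~ / <-lower> /;
say $not-lower.Str;    # A

# <upper> or the Unicode property <:Lu> ask for uppercase directly.
my $upper = 'xAbc' ~~ / <:Lu> /;
say $upper.Str;        # A
```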
a12l	I think I misunderstood lizmat before. I interpreted them as saying that I should use <-[]> when defining a char class, and not when using it.	16:05
	What does the backslash add to the expression?	16:07
	And what do the angle brackets do?
librasteve	in the regex slang, which also applies to raku Grammars, angle brackets tell you that you have an inner regex	16:10
	m: my $inner = rx/\d/; say "123Abc" ~~ /<$inner>+/;	16:12
Raku eval	｢123｣
librasteve	m: my regex inner {\d}; say "123Abc" ~~ /<inner>+/;	16:14
Raku eval	｢123｣ inner => ｢1｣ inner => ｢2｣ inner => ｢3｣
librasteve	^^ in the first example, I can use angle brackets to interpolate a variable inside another regex <$inner>; in the second example I use the regex keyword to make a subrule <inner>, no need for the $ this time	16:16
	see here for interpolation docs.raku.org/language/regexes#Reg...erpolation and here for subrules docs.raku.org/syntax/regex	16:17
	(well, you did ask how nesting works)
	so, angle brackets have great power and these are the building blocks of Grammars (a topic for another day)	16:18
	a predefined char class like lower is a reserved use of angle brackets (see the list here docs.raku.org/language/regexes#Pre...r_classes)	16:19
	m: say "123Abc" ~~ /<lower>+/	16:20
Raku eval	｢bc｣ lower => ｢b｣ lower => ｢c｣
librasteve	hopefully that is making sense a bit
	other reserved variants of the angle brackets are with ! and ? as we have seen for zero-width assertions and with inner square brackets [] for char classes often with -+ to combine them	16:22
	m: say "123Abc" ~~ /<[\w]-[b]>+/	16:25
Raku eval	｢123A｣
librasteve	so here I remove b from the total set \w (ie all word chars) and so the match stops	16:26
	m: say "123Abc" ~~ /<[\w]>+/
Raku eval	｢123Abc｣
librasteve	^^ if I leave b in then it carries on
	note: in my investigations today it seems that the backslashed predefined char classes like \w work well with char class set math, I have noticed some issues using the others like <lower> - the docs say "You can also write the backslashed forms for character classes between the [ ]" so that means that they must be from this list only docs.raku.org/language/regexes#Bac...er_classes	16:31
	otherwise the results are unpredictable
	and to your question, the backslash inside the char class brackets <[ ... ]> is pretty much the only escape character that then picks out these backslashed predefined char classes and must also be doubled if you want to include a literal backslash \\	16:33
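A short sketch of the set math discussed above, under the assumption that named classes such as lower are combined by name outside the square brackets (as in `<+lower-[b]>`) rather than written inside them. Variable names are made up for illustration.

```raku
# Backslashed classes such as \d can appear inside <[ ... ]>.
my $no-five = '456' ~~ / <[\d]-[5]>+ /;
say $no-five.Str;    # 4

# Named classes such as lower are combined by name, outside the brackets.
my $no-b = '123Abc' ~~ / <+lower-[b]>+ /;
say $no-b.Str;       # c

# A literal backslash inside the brackets is written doubled.
my $slash = 'a\\b' ~~ / <[\\]> /;
say $slash.chars;    # 1
```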
	thanks for your patience!	16:34
a12l	Thanks for the examples @librasteve!	20:27
antononcube	I would have sent you to talk to LLMs a long time ago...	21:20
21:21 MasterDuke joined
a12l	I use Perplexity (unsure what model they're using), but I get so much contradictory info in the replies that I don't fully trust it.	21:51
	On Raku
antononcube	@a12l Great on using Perplexity. As for better results with Raku, I would (shamelessly) advise using my Raku packages for interacting with LLMs: 1) Install "Jupyter::Kernel" (or other decent Raku REPL) 2) Use the LLM prompt "CodeWriterX" with Raku.	21:59
	"Jupyter::Chatbook" automatically loads the Raku LLM packages "LLM::Functions" and "LLM::Prompts".	22:01
	The prompt "CodeWriterX" can be used both in chat cell and programmatically.
