This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html
Set by lizmat on 8 June 2022.
a12l I'm looking at the code block in docs.raku.org/language/regexes#Mod...ier:_%,_%% 08:53
say so 'abc,def' ~~ / ^ [\w+] ** 1 % ',' $ /; # OUTPUT: «False␤»
say so 'abc,def' ~~ / ^ [\w+] ** 2 % ',' $ /; # OUTPUT: «True␤» 08:54
What do the square brackets do here? They don't define a char class here, right? 08:56
lizmat square brackets in Raku regexes are for grouping 09:15
docs.raku.org/language/regexes#Non...g_grouping 09:16
a12l: you probably want <[ ]> docs.raku.org/language/regexes#Enu...and_ranges 09:17
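To make the distinction concrete, here is a small sketch of the three bracket forms mentioned above. The strings and variable names are made up for illustration, and the comments show what I would expect Rakudo to print:

```raku
# [ ] groups without capturing, so there are no positional captures
my $grouped = 'abc,def' ~~ / ^ [\w+] ** 2 % ',' $ /;
say $grouped.list.elems;             # 0

# ( ) groups and captures each repetition
my $captured = 'abc,def' ~~ / ^ (\w+) ** 2 % ',' $ /;
say $captured[0].join('|');          # abc|def

# <[ ]> is an enumerated character class: one character from the set
say ~('abc,def' ~~ / <[a..c]>+ /);   # abc
```

In the docs example the `[ ]` is only there so that `** 2 % ','` applies to the whole `\w+` group.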
a12l lizmat: Thanks! Now I understand. 12:52
Yeah, I'm looking at char classes, but was confused when I read the docs.
What's the difference between <![ pattern ]> and <-[ pattern ]> in regex? 14:19
lizmat -[ ] is part of the builder of the charclass 14:26
m: say "ooo" ~~ / <[\w]>+ /
camelia 「ooo」
lizmat m: say "ooo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o 14:27
camelia Nil
lizmat m: say "foo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o
camelia 「f」
a12l lizmat: And how does it differ from <![ ... ]>? 14:36
Note that I don't know what char classes actually are, or how they differ from other parts of regexes
librasteve @a12l is that because you didn't read the doc yet or because you read it and it didn't make sense? 14:38
lizmat <![ ]> is a negated lookaround assertion: docs.raku.org/language/regexes#Loo...assertions
a12l I read about half the page about regexes (so not all). And I feel that quite a lot of background knowledge is assumed. Several sections are dependent on stuff that comes later in the page, and those are themselves dependent on other sections. 14:42
librasteve yeah - the doc can be a bit of a slog to read ... since it covers all the things and there are many 14:43
a12l @librasteve Do you happen to know a good intro to regexes in Raku for a person not used to regex tools? 14:45
Have taken a finite automata course at Uni, but I still feel that there's stuff that I haven't seen before 14:46
librasteve to cut a long story short, a char class is a set of characters that you can match against ... there are many predefined ones, for example \w matches all "word" chars, and you can also make your own with the <[ ... ]> syntax, so in lizmat's example, <[\w]-[o]>, you get all the word chars from \w less the o, since that is subtracted from the set via the minus -
you may like this raku.guide/#_regular_expressions 14:48
a12l Example: I don't know what "lookahead" means, and the description says "Lookaround assertions, which need a character class in its simpler form, work both ways. They match, but they don't consume a character." The docs never describe what "consume" means. 14:50
librasteve this sounds like the raku guide (which is intro level) may be too simple for your needs ... tbh I am not sure ... for sure, the regexes reference is indeed quite dense docs.raku.org/language/regexes and especially the parts on Zero-width assertions. 14:56
I guess the basic idea is that a regex proceeds character by character, usually moving to the next character of the input string on each step. Internally the regex tracks the position count in the input string. This is called "consuming" the character. In the typical case, the regex never goes backward. 14:59
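A rough way to see "consuming" in action is to look at where a match starts and ends. This is only a sketch with a made-up string; `.from` and `.to` are the Match methods that report those positions:

```raku
my $m = 'hello world' ~~ / hello /;
say $m.from;     # 0   (where the match starts)
say $m.to;       # 5   (position just after the last consumed character)

# Matching ' world' as well consumes it, so the end position moves on
my $all = 'hello world' ~~ / hello ' world' /;
say $all.to;     # 11

# A lookahead only checks that ' world' follows; nothing more is consumed
my $peek = 'hello world' ~~ / hello <?before ' world'> /;
say $peek.to;    # 5
```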
However, there are some examples when the regex backtracks - I asked my favourite LLM to describe this chatgpt.com/share/5a5fe9a4-f01d-4e...b9f48c2e2e 15:01
ok so far? 15:06
then to complete the story, from docs.raku.org/language/regexes#Sum...f_anchors, we have: "Anchors are zero-width regex elements. Hence they do not use up a character of the input string, that is, they do not advance the current position at which the regex engine tries to match. A good mental model is that they match between two characters of an input string, or before the first, or after the last character of an input string." 15:07
a12l That helps, thanks! 15:08
librasteve and from just below, we have: "Zero-Width assertions can help you implement your own anchor: it turns another regex into an anchor, making them consume no characters of the input string. There are two variants: lookahead and lookbehind assertions."
a12l lizmat: Thanks for the explanations
librasteve to further confuse matters, there is the notion of Lookaround Assertions, which "work both ways" ... tbh I do not know what that means 15:09
a12l Feels like 50 percent of Raku is the regex sub language 😛 15:10
librasteve :cameliathink:
oh, here's a practical example of a Lookaround Assertion ... first in Lookahead mode: 15:15
m: say '333' ~~ m/<?[7]> \d+/; 15:17
Raku eval False
librasteve m: say '733' ~~ m/<?[7]> \d+/; 15:18
Raku eval 「733」
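librasteve ^^ so this is saying "check that the first digit is a 7, but leave it (ie do not consume it) for the \d+ to capture" 15:19
And now with the assertion placed after the digits (strictly speaking this is still a lookahead, since it checks what comes next; a true lookbehind is written <?after ...>): 15:21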
m: say '333$' ~~ m/ \d+ <?[$]>/;
Raku eval 「333」
librasteve m: say '333' ~~ m/ \d+ <?[$]>/;
Raku eval False
librasteve ^^ and this is saying "check that the digits are followed by a $" 15:22
so the Lookaround idea is that the same syntax can be used to gate what is captured in "both ways" 15:23
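For comparison, Raku also has the named forms `<?before ...>` (lookahead) and `<?after ...>` (lookbehind), which take a whole pattern rather than a single character class. A minimal sketch, with strings invented for illustration:

```raku
# Lookahead: digits that are followed by 'USD' (the 'USD' is not consumed)
say ~('42USD' ~~ / \d+ <?before USD> /);       # 42
say so '42EUR' ~~ / \d+ <?before USD> /;       # False

# Lookbehind: digits that are preceded by a literal '$'
say ~('price: $42' ~~ / <?after '$'> \d+ /);   # 42
say so 'price: 42' ~~ / <?after '$'> \d+ /;    # False
```

In both cases only the digits end up in the match; the text that the assertion checked is left alone.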
anyway all this assertion stuff is in the deep end of regex - in practice I very rarely need to use it and fwiw, regardless of language, regex is a great tool for any kind of string checking 15:24
a12l So a char class always matches against a single character? 15:34
lizmat yes unless you specify a multiplier, such as + or * 15:35
a12l And the rest of the regex handles how to match against the rest of the string?
librasteve The char class is the set of chars that are compared - the quantifier controls how many times the match is done 15:36
(as lizmat says)
lizmat s/multiplier/quentifier :-)
*quantifier
a12l But can the quantifier be part of the char class, or does it operate on char classes? 15:38
Because you can have quantifiers that operate on char classes, if I remember correctly?
lizmat quantifiers can NOT be part of a char class 15:39
a12l And how does nested regex expressions work?
lizmat they would be just a + or * as a character
by nesting ?
a12l I'm trying to negate the lower char class to match against an uppercase letter 15:48
"Abc" ~~ rx { <![<lower>]> }
but I get an empty result
m: "Abc" ~~ rx { <![<lower>]> }
Raku eval
a12l I'm guessing that I'm making some very basic error in how I try to negate <lower>, but I don't know what 15:50
librasteve reads the docs 15:52
m: "Abc" ~~ rx { <-[lower]> } 15:53
Raku eval
librasteve m: say "Abc" ~~ rx { <-[lower]> }
Raku eval 「A」
librasteve oops, forgot to say the result
while that works, I think it's better style to include the \ like this: 15:56
m: say "Abc" ~~ rx { <-[\lower]> }
Raku eval 「A」
librasteve remember the minus - (and plus +) are the way to do set math in character classes 16:00
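One caveat worth checking here: as far as I can tell, `<-[lower]>` is an enumerated class that excludes the literal letters l, o, w, e and r, so it matched 「A」 above only because A is not one of those letters. A hedged sketch of forms that negate the predefined class itself (strings made up for illustration):

```raku
say ~('Abc' ~~ / <-lower> /);     # A   (anything that is not <lower>)
say ~('Abc' ~~ / <upper> /);      # A   (predefined uppercase class)
say ~('Abc' ~~ / <:Lu> /);        # A   (Unicode property: uppercase letter)

# The enumerated form happily matches a lowercase letter
say ~('abc' ~~ / <-[lower]> /);   # a   (a is not one of l, o, w, e, r)
```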
a12l I think I misunderstood lizmat before. I took them to mean that I should use <-[]> when defining a char class, not when using one. 16:05
What does the backslash add to the expression? 16:07
And what does the angle brackets do?
librasteve in the regex slang, which also applies to Raku Grammars, angle brackets tell you that you have an inner regex 16:10
m: my $inner = rx/\d/; say "123Abc" ~~ /<$inner>+/; 16:12
Raku eval 「123」
librasteve m: my regex inner {\d}; say "123Abc" ~~ /<inner>+/; 16:14
Raku eval 「123」 inner => 「1」 inner => 「2」 inner => 「3」
librasteve ^^ in the first example, I use angle brackets to interpolate a variable inside another regex, <$inner>; in the second example I use the regex keyword to make a subrule, <inner>, with no need for the $ this time 16:16
see here for interpolation docs.raku.org/language/regexes#Reg...erpolation and here for subrules docs.raku.org/syntax/regex 16:17
(well, you did ask how nesting works)
so, angle brackets have great power and these are the building blocks of Grammars (a topic for another day) 16:18
a predefined char class like lower is a reserved use of angle brackets (see the list here docs.raku.org/language/regexes#Pre...r_classes) 16:19
m: say "123Abc" ~~ /<lower>+/ 16:20
Raku eval 「bc」 lower => 「b」 lower => 「c」
librasteve hopefully that is making sense a bit
other reserved variants of the angle brackets are with ! and ? as we have seen for zero-width assertions and with inner square brackets [] for char classes often with -+ to combine them 16:22
m: say "123Abc" ~~ /<[\w]-[b]>+/ 16:25
Raku eval 「123A」
librasteve so here I remove b from the total set \w (ie all word chars) and so the match stops at the b 16:26
m: say "123Abc" ~~ /<[\w]>+/
Raku eval 「123Abc」
librasteve ^^ if I leave b in then it carries on
note: in my investigations today it seems that the backslashed predefined char classes like \w work well with char class set math, but I have noticed some issues using the others like <lower> - the docs say "You can also write the backslashed forms for character classes between the [ ]", so that means that they must be from this list only docs.raku.org/language/regexes#Bac...er_classes 16:31
otherwise the results are unpredictable
and to your question, the backslash inside the char class brackets <[ ... ]> is pretty much the only escape character; it picks out these backslashed predefined char classes, and must be doubled if you want to include a literal backslash \\ 16:33
thanks for your patience! 16:34
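A small sketch of those points, with invented strings. The last line is my own assumption about how named classes like `lower` join the set math (a leading `+` and the bare name, without square brackets), so treat it as something to verify rather than settled fact:

```raku
# Backslashed class inside the brackets, minus the underscore
say ~('foo_bar 42' ~~ / <[\w] - [_]>+ /);   # foo

# A doubled backslash inside the brackets is one literal backslash
say ~('a\\b' ~~ / <[\\]> /);                # \

# Assumed form for a named class in set math: lowercase letters except b
say ~('123Abc' ~~ / <+lower - [b]>+ /);     # c
```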
a12l Thanks for the examples, @librasteve! 20:27
antononcube I would have sent you to talk to LLMs a long time ago... 21:20
a12l I use Perplexity (unsure what model they're using), but I get so much contradictory info in the replies that I don't fully trust it. 21:51
On Raku
antononcube @a12l Great on using Perplexity. As for better results with Raku, I would (shamelessly) advise using my Raku packages for interacting with LLMs: 1) Install "Jupyter::Kernel" (or other decent Raku REPL) 2) Use the LLM prompt "CodeWriterX" with Raku. 21:59
"Jupyter::Chatbook" automatically loads the Raku LLM packages "LLM::Functions" and "LLM::Prompts". 22:01
The prompt "CodeWriterX" can be used both in a chat cell and programmatically.