This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html Set by lizmat on 8 June 2022. |
02:10
kjp left
02:11
kjp joined
02:12
kjp left,
kjp joined
07:51
Chanakan left
07:54
Chanakan joined
08:11
cleo left
08:14
Tirifto left
08:15
Tirifto joined
08:22
lizmat left,
gfldex left,
snonux left
08:34
lizmat joined,
gfldex joined,
snonux joined
a12l | I'm looking at the code block in docs.raku.org/language/regexes#Mod...ier:_%,_%% | 08:53 | |
say so 'abc,def' ~~ / ^ [\w+] ** 1 % ',' $ /; # OUTPUT: «False» | 08:54 | ||
say so 'abc,def' ~~ / ^ [\w+] ** 2 % ',' $ /; # OUTPUT: «True» | |||
What do the square brackets do here? Because they don't define a char class here, right? | 08:56 | ||
lizmat | square brackets in Raku regexes are for grouping | 09:15 | |
docs.raku.org/language/regexes#Non...g_grouping | 09:16 | ||
a12l: you probably want <[ ]> docs.raku.org/language/regexes#Enu...and_ranges | 09:17 | ||
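For reference, a minimal sketch of the difference lizmat describes, using made-up input strings: plain square brackets group a sub-pattern, while `<[ ]>` enumerates single characters.

```raku
# [ab] groups the two-character sequence "ab"; the + repeats the whole group.
say so 'abab' ~~ / ^ [ab]+ $ /;     # True
say so 'abba' ~~ / ^ [ab]+ $ /;     # False: "ba" is not "ab"

# <[ab]> is a character class: any single 'a' or 'b'; the + repeats that choice.
say so 'abba' ~~ / ^ <[ab]>+ $ /;   # True
```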
a12l | lizmat: Thanks! Now I understand. | 12:52 | |
Yeah, I'm looking at char classes, but was confused when I read the docs. | |||
14:14
MasterDuke left
What's the difference between <![ pattern ]> and <-[ pattern ]> in regex? | 14:19 | ||
lizmat | -[ ] is part of the builder of the charclass | 14:26 | |
m: say "ooo" ~~ / <[\w]>+ / | |||
camelia | 「ooo」 | ||
lizmat | m: say "ooo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o | 14:27 | |
camelia | Nil | ||
lizmat | m: say "foo" ~~ / <[\w]-[o]>+ / # aka, all word chars except o | ||
camelia | 「f」 | ||
a12l | lizmat: And how does it differ from <![ ... ]>? | 14:36 | |
Note that I don't know what char classes actually are, and how they differ from other parts of regexes | |||
librasteve | @a12l is that because you didn't read the doc yet or because you read it and it didn't make sense? | 14:38 | |
lizmat | <![ ]> is a negated lookaround assertion: docs.raku.org/language/regexes#Loo...assertions | |
a12l | I read about half the page about regexes (so not all). And I feel that quite a lot of background knowledge is assumed. And several sections are dependent on stuff that comes later in the page, which is itself dependent on other sections. | 14:42 | |
librasteve | yeah - the doc can be a bit of a slog to read ... since it covers all the things and there are many | 14:43 | |
a12l | @librasteve Do you happen to know a good intro to regexes in Raku for a person not used to regex tools? | 14:45 | |
Have taken a finite automata course at Uni, but I still feel that there's stuff that I haven't seen before | 14:46 | ||
librasteve | to cut a long story short, a char class is a set of characters that you can match against ... there are many predefined ones, for example \w matches all "word" chars, and you can also make your own with the <[ ... ]> syntax, so in lizmat's example, <[\w]-[o]> you get all the word chars from \w less the o, since that is subtracted from the set via the minus - | |
you may like this raku.guide/#_regular_expressions | 14:48 | ||
a12l | Example: I don't know what "lookahead" means, and the description says "Lookaround assertions, which need a character class in its simpler form, work both ways. They match, but they don't consume a character." The docs never describe what "consume" means. | 14:50 | |
librasteve | this sounds like the raku guide (which is intro level) may be too simple for your needs ... tbh I am not sure ... for sure, the regexes reference is indeed quite dense docs.raku.org/language/regexes and especially the parts on Zero-width assertions. | 14:56 | |
I guess the basic idea is that a regex proceeds character by character, usually moving to the next character of the input string on each step. Internally the regex tracks the position count in the input string. This is called "consuming" the character. In the typical case, the regex never goes backward. | 14:59 | ||
However, there are some examples when the regex backtracks - I asked my favourite LLM to describe this chatgpt.com/share/5a5fe9a4-f01d-4e...b9f48c2e2e | 15:01 | ||
ok so far? | 15:06 | ||
then to complete the story, from docs.raku.org/language/regexes#Sum...f_anchors, we have: "Anchors are zero-width regex elements. Hence they do not use up a character of the input string, that is, they do not advance the current position at which the regex engine tries to match. A good mental model is that they match between two characters of an input string, or before the first, or after the last character of an input string." | 15:07 | ||
a12l | That helps, thanks! | 15:08 | |
librasteve | and from just below, we have: "Zero-Width assertions can help you implement your own anchor: it turns another regex into an anchor, making them consume no characters of the input string. There are two variants: lookahead and lookbehind assertions." | |
a12l | lizmat: Thanks for the explanations | |
librasteve | to further confuse matters there is the notion of Lookaround Assertions which "work both ways" ... tbh I do not know what that means | 15:09 | |
a12l | Feels like 50 percent of Raku is the regex sub language 😛 | 15:10 | |
librasteve | :cameliathink: | ||
oh, here's a practical example of a Lookaround Assertion ... first in Lookahead mode: | 15:15 | ||
m: say '333' ~~ m/<?[7]> \d+/; | 15:17 | ||
Raku eval | False | ||
librasteve | m: say '733' ~~ m/<?[7]> \d+/; | 15:18 | |
Raku eval | 「733」 | ||
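librasteve | ^^ so this is saying "check that the first digit is a 7, but leave it (ie do not consume it) for the \d+ to capture" | 15:19 | |
And now with the assertion placed after the digits (strictly this is still a lookahead; a true lookbehind is written <?after ...>): | 15:21 | ||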
m: say '333$' ~~ m/ \d+ <?[$]>/; | |||
Raku eval | 「333」 | ||
librasteve | m: say '333' ~~ m/ \d+ <?[$]>/; | ||
Raku eval | False | ||
librasteve | ^^ and this is saying "check that the digits are followed by a $" | 15:22 | |
so the Lookaround idea is that the same syntax can be used to gate what is captured in "both ways" | 15:23 | ||
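A hedged aside on terminology: in both evals above, the `<?[...]>` assertion inspects the character at the current position without consuming it, so both behave as lookaheads. The sketch below assumes the `<?before ...>` and `<?after ...>` forms described in the regexes docs; the input strings are invented for illustration.

```raku
# Lookahead: the digits must be followed by '%', which is not part of the match.
say '42%' ~~ / \d+ <?before '%'> /;        # 「42」
say so '42' ~~ / \d+ <?before '%'> /;      # False

# Lookbehind: the digits must be preceded by "USD", which is not part of the match.
say 'USD42' ~~ / <?after USD> \d+ /;       # 「42」
say so 'EUR42' ~~ / <?after USD> \d+ /;    # False
```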
anyway all this assertion stuff is in the deep end of regex - in practice I very rarely need to use it and fwiw, regardless of language, regex is a great tool for any kind of string checking | 15:24 | ||
a12l | So a char class always matches against a single character? | 15:34 | |
lizmat | yes, unless you specify a multiplier, such as + or * | 15:35 | |
a12l | And the rest of the regex handles how to match against the rest of the string? | |
librasteve | The char class is the set of chars that are compared - the quantifier controls how many times the match is done | 15:36 | |
(as lizmat says) | |||
lizmat | s/multiplier/quentifier :-) | ||
*quantifier | |||
a12l | But can the quantifier be part of the char class, or does it operate on char classes? | 15:38 | |
Because you can have quantifiers that operate on char classes, if I remember correctly? | |||
lizmat | quantifiers can NOT be part of a char class | 15:39 | |
a12l | And how does nested regex expressions work? | ||
lizmat | they would be just a + or * as a character | ||
by nesting ? | |||
a12l | I'm trying to negate the lower char class to match against an uppercase letter | 15:48 | |
"Abc" ~~ rx { <![<lower>]> } | |||
but I get an empty result | |||
m: "Abc" ~~ rx { <![<lower>]> } | |||
Raku eval | |||
a12l | I'm guessing that I'm making some very basic error in how I try to negate <lower>, but I don't know what | 15:50 | |
librasteve | reads the docs | 15:52 | |
m: "Abc" ~~ rx { <-[lower]> } | 15:53 | ||
Raku eval | |||
librasteve | m: say "Abc" ~~ rx { <-[lower]> } | ||
Raku eval | 「A」 | ||
librasteve | oops, forgot to say the result | |
while that works, I think it's better style to include the \ like this: | 15:56 | ||
m: say "Abc" ~~ rx { <-[\lower]> } | |||
Raku eval | 「A」 | ||
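A caution on the two evals above, offered as a sketch and not a definitive reading: `<-[lower]>` appears to negate the enumerated letters l, o, w, e and r, not the predefined `<lower>` class, so it matches "A" only because "A" is not one of those five letters. The forms `<-lower>`, `<:Lu>` and `<upper>` are assumed here from the regexes docs.

```raku
# <-[lower]> excludes only the letters l, o, w, e, r,
# so a lowercase 'a' still matches while 'w' does not.
say so 'a' ~~ / <-[lower]> /;   # True
say so 'w' ~~ / <-[lower]> /;   # False

# Ways to ask for "not a lowercase letter" or "an uppercase letter".
say 'Abc' ~~ / <-lower> /;      # any character outside the predefined lower class
say 'Abc' ~~ / <:Lu> /;         # Unicode property: uppercase letter
say 'Abc' ~~ / <upper> /;       # predefined class; also captures under the name "upper"
```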
librasteve | remember the minus - (and plus +) are the way to do set math in character classes | 16:00 | |
a12l | I think I misunderstood lizmat before. I took them to mean that <-[]> is used when defining a char class, not when using one. | 16:05 | |
What does the backslash add to the expression? | 16:07 | ||
And what does the angle brackets do? | |||
librasteve | in the regex slang, which also applies to raku Grammars, angle brackets tell you that you have an inner regex | 16:10 | |
m: my $inner = rx/\d/; say "123Abc" ~~ /<$inner>+/; | 16:12 | ||
Raku eval | 「123」 | ||
librasteve | m: my regex inner {\d}; say "123Abc" ~~ /<inner>+/; | 16:14 | |
Raku eval | 「123」 inner => 「1」 inner => 「2」 inner => 「3」 | ||
librasteve | ^^ in the first example, I can use angle brackets to interpolate a variable inside another regex <$inner>, in the second example I use the regex keyword to make a subrule <inner> no need for the $ this time | 16:16 | |
see here for interpolation docs.raku.org/language/regexes#Reg...erpolation and here for subrules docs.raku.org/syntax/regex | 16:17 | ||
(well, you did ask how nesting works) | |||
so, angle brackets have great power and these are the building blocks of Grammars (a topic for another day) | 16:18 | ||
a predefined char class like lower is a reserved use of angle brackets (see the list here docs.raku.org/language/regexes#Pre...r_classes) | 16:19 | ||
m: say "123Abc" ~~ /<lower>+/ | 16:20 | ||
Raku eval | 「bc」 lower => 「b」 lower => 「c」 | ||
librasteve | hopefully that is making sense a bit | ||
other reserved variants of the angle brackets are with ! and ? as we have seen for zero-width assertions and with inner square brackets [] for char classes often with -+ to combine them | 16:22 | ||
m: say "123Abc" ~~ /<[\w]-[b]>+/ | 16:25 | ||
Raku eval | 「123A」 | ||
librasteve | so here I remove b from the total set \w (ie all word chars) and so the match stops | 16:26 | |
m: say "123Abc" ~~ /<[\w]>+/ | |||
Raku eval | 「123Abc」 | ||
librasteve | ^^ if I leave b in then it carries on | ||
note: in my investigations today it seems that the backslashed predefined char classes like \w work well with char class set math, but I have noticed some issues using the others like <lower> - the docs say "You can also write the backslashed forms for character classes between the [ ]", so that means that they must be from this list only: docs.raku.org/language/regexes#Bac...er_classes | 16:31 | ||
otherwise the results are unpredictable | |||
and to your question, the backslash is pretty much the only escape character inside the char class brackets <[ ... ]>: it picks out these backslashed predefined char classes, and it must be doubled (\\) if you want to include a literal backslash | 16:33 | ||
thanks for your patience! | 16:34 | ||
a12l | Thanks for the examples @librasteve! | 20:27 | |
antononcube | I would have sent you to talk to LLMs a long time ago... | 21:20 | |
21:21
MasterDuke joined
a12l | I use Perplexity (unsure what model they're using), but I get so much contradictory info in the replies that I don't fully trust it. | 21:51 | |
On Raku | |||
antononcube | @a12l Great that you're using Perplexity. As for better results with Raku, I would (shamelessly) advise using my Raku packages for interacting with LLMs: 1) Install "Jupyter::Kernel" (or other decent Raku REPL) 2) Use the LLM prompt "CodeWriterX" with Raku. | 21:59 | |
"Jupyter::Chatbook" automatically loads the Raku LLM packages "LLM::Functions" and "LLM::Prompts". | 22:01 | ||
The prompt "CodeWriterX" can be used both in a chat cell and programmatically. |