This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html Set by lizmat on 8 June 2022. |
04:59
teatwo left
07:48
teatime joined
a12l | @antononcube Thanks for the tip! I'm looking into getting Jupyter up and running on my machine | 10:53 | |
11:42
MasterDuke left
antononcube | @a12l Please ask for help if you have any difficulties. (TBH, installing Jupyter is not trivial.) | 12:29 | |
a12l | Yes, see that. Need to postpone it a couple of weeks | 12:38 | |
What does the colon before the character class name mean? Perplexity answers that you should prepend predefined char classes with a colon, but it works fine without one. I do see a difference in the result in the REPL, though, and can't identify what it is | 17:09 | ||
cdn.discordapp.com/attachments/768...60cba& | |||
librasteve | docs.raku.org/syntax/%3C%3Aproperty%3E - AI is often wrong ;-) in this case it is nonsensical to mix : with alpha | 17:12 | |
m: say "Abc" ~~ / <:Ll> / | 17:13 | ||
Raku eval | 「b」 | ||
librasteve | ^^ this works because Ll is a unicode char property | ||
these are listed in the General Category table here en.wikipedia.org/wiki/Unicode_char...r_property | 17:15 | ||
one of the innovations of raku is that it adds deep integration with unicode in regexes | 17:16 | ||
here is a great set of posts on unicode that use raku rx examples ... dev.to/bbkr/utf-8-regular-expressions-20h0 | 17:26 | ||
a12l | But it still works? | 17:37 | |
From what I can understand <:alpha> should use a char class consisting of all unicode chars with the general category prop named "alpha". But from the table in docs.raku.org/syntax/%3C%3Apropert...roperty%3E I can't find any unicode general category named alpha. | 17:39 | ||
So why does it work? | |||
I assume here that alpha is defined somewhere in Raku that it's equal to "<:Letter>" | 17:40 | ||
librasteve | I think that the compiler should flag <\alpha> and <:alpha> as errors - so I will raise a bug issue | 17:41 | |
the docs.raku.org/language/regexes#Pre...er_classes are raku concepts which are much more general and useful than unicode properties - a couple of them do map onto unicode properties as shown in the table - but in general they are not the same | 17:44 | ||
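The point that predefined character classes and Unicode properties are not the same thing can be seen with the underscore. This is a minimal sketch, assuming the documented behaviour that `<alpha>` covers alphabetic characters plus underscore, while `<:L>` is the Unicode general category Letter. The variable names are made up for illustration.

```raku
# <alpha> is a Raku predefined class: alphabetic characters plus underscore.
my $alpha-underscore  = ('_' ~~ / <alpha> /).so;
# <:L> is the Unicode general category "Letter"; underscore is not a letter.
my $letter-underscore = ('_' ~~ / <:L> /).so;

say $alpha-underscore;    # True
say $letter-underscore;   # False

# For an ordinary lowercase letter, the class and the property agree.
say ('b' ~~ / <lower> /).so;   # True
say ('b' ~~ / <:Ll> /).so;     # True
```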
a12l | Good to know! | ||
What's the point of escaping the char class name inside angle brackets? And don't you just escape the first char, and not the whole char class name? | 17:45 | ||
librasteve | please show an example | ||
a12l | You just wrote <\alpha>, and yesterday you wrote > while that works, I think it's better style to include the \ like this: > m: say "Abc" ~~ rx { <-[\lower]> } | 17:47 | |
But maybe the backslash has different semantics here? | |||
librasteve | good catch - these are both errors on my part | 17:48 | |
a12l | Did you just add a backslash by typo? | 17:51 | |
From what I understand, a backslash followed by a single char converts that literal char to the char class with that char as an alias for its name? | 17:52 | ||
E.g. \w is equal to <alnum> , and \d is equal to <digit>? | 17:53 | ||
librasteve | ok - I am going a bit slower than you ... here is the bug report github.com/rakudo/rakudo/issues/5622 | 17:55 | |
that covers the <:alpha> part of the discussion | 17:56 | ||
on the <[\alpha]> part, this was covered in our chat yesterday when I noted that only backslashed char classes can be used as components in custom char classes like <[\w]> | 17:58 | ||
now let me try to address your points one by one | |||
consider this custom char class ... <[\w]+[\\]> this will match all word chars \w plus any backslash char \ in the input string. the first backslash is used to denote the use of one of the built in predefined backslashed char classes as listed here docs.raku.org/language/regexes#Bac...r_classes. like any escape char you then need a way to override it if you want to use it directly. | 18:03 | ||
so, for clarity this is NOT escaping the first character, this is how you bring in the predefined char class \w | 18:04 | ||
exactly! the "shorthand" column in the docs.raku.org/language/regexes#Pre...er_classes table gives those that have an equivalent | 18:07 | ||
sorry for my errors, I am also learning as I go ... hope you are enjoying the raku firehose learning experience | 18:10 | ||
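The custom class `<[\w]+[\\]>` discussed above can be tried out with a short sketch. The sample string and variable names are hypothetical; the only thing taken from the discussion is the class itself, where `\w` brings in the backslashed word class and `\\` stands for a literal backslash.

```raku
# Hypothetical input: in single quotes, \\ is one literal backslash,
# so this is the string  ab\cd ef
my $input = 'ab\\cd ef';

# Plain \w+ stops at the backslash.
my $only-word  = ~($input ~~ / \w+ /);

# The union class accepts word characters and backslashes.
my $word-or-bs = ~($input ~~ / <[\w]+[\\]>+ /);

say $only-word;    # ab
say $word-or-bs;   # ab\cd
```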
ab5tract | librasteve: I’ve flagged this issue before too, but never got around to making the problem solving ticket | 18:11 | |
librasteve | oh and I forgot to search previous issues | 18:12 | |
ab5tract | There’s no validation whatsoever of the builtin character classes. If you provide an invalid one, there’s no complaint or failure object, just a lack of a match | ||
no worries, it’s kind of obscure to begin with. | 18:13 | ||
a12l | Thanks for the explanation @librasteve! | 18:14 | |
Now I'm wondering, what's the difference between a char class and a sub-expression? Or is the former considered a subset of the latter? | 18:15 | ||
If I remember correctly, if I've created a named regex and want to include it in another regex, I would write <name-of-regex> in my new regex? | 18:16 | ||
librasteve | The predefined character classes in the leftmost column are all of the form <name>, a hint to the fact that they are implemented as built-in named regexes. <- that's what the docs have to say about it | 18:20 | |
so, basically yes, this is raku eating its own dogfood | |||
a12l | Now I'm probably being nitpicky (but I want to know exactly, so I don't get confused later), but isn't it actually the case that the predefined char classes have their own names? And when you want to use one in a regex, you surround the name with angle brackets, <name>, so that Raku knows it should treat name as the name of a char class? | 18:23 | |
I.e. I interpreted that part of the docs as if the angle brackets were part of the char class name, but what actually happens is that they show how you would use it in a regex? | 18:24 | ||
librasteve | errr yes and yes ;-) | 18:26 | |
consider this | 18:28 | ||
m: say "Abc" ~~ / <alpha>+ / | |||
Raku eval | 「Abc」 alpha => 「A」 alpha => 「b」 alpha => 「c」 | ||
librasteve | and this | ||
m: my regex my_alpha { \w }; say "Abc" ~~ / <my_alpha>+ / | |||
Raku eval | 「Abc」 my_alpha => 「A」 my_alpha => 「b」 my_alpha => 「c」 | ||
librasteve | it's the same - illustrating how <alpha> and the other built ins work | 18:30 | |
a12l | Thanks! Now I understand why I misunderstood the docs yesterday and tried to write "Abc" ~~ rx { <![<lower>]> } | 18:32 | |
I thought that lower's actual name was <lower>, so I wrote that | 18:33 | ||
ab5tract | m: say “Rr” ~~ /<:Lowercase>/ | 19:28 | |
camelia | 「r」 | ||
ab5tract | m: say “Rr” ~~ /<:Lc>/ | ||
camelia | Nil | ||
ab5tract | m: say “Rr” ~~ /<:Ll>/ | ||
camelia | 「r」 | ||
ab5tract | Right, LowercaseLetter | ||
Note that these are categories, rather than classes | 19:33 | ||
librasteve | aka properties 🙃 | 19:35 | |
a12l | I have a string, and I want to check that there's at least one letter, and if all letters are uppercase. My solution is to first apply ~~ rx { <upper>+ } on the string, and then apply ~~ rx { <alpha>+ }. If those two Match objects contain the same strings I conclude that there's at least one letter, and all letters are uppercase. My question is, how do I compare those Match objects? Unsure how | 20:00 | |
lizmat | have you tried eq ? | 20:01 | |
a12l | Nope. eq compares strings, so does eq automatically cast the objects into strings? | 20:03 | |
lizmat | m: "foo" ~~ / \w+ /; say $/ eq $/ | ||
camelia | True | ||
lizmat | eq coerces Cool objects to strings, and Match objects are Cool | 20:04 | |
m: say Match ~~ Cool | |||
camelia | True | ||
lizmat | m: say "42" + "666" | 20:05 | |
camelia | 708 | ||
a12l | Ah, Cool 😉 | ||
lizmat | even though these were strings, you can add them as long as they can be coerced to something numerical | ||
a12l | Thanks | ||
lizmat | m: say "42" + "foo" | ||
camelia | Cannot convert string to number: base-10 number must begin with valid digits or '.' in '⏏foo' (indicated by ⏏) in block <unit> at <tmp> line 1 |
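As a sketch of the uppercase check asked about above: `eq` does compare two Match objects by their matched text, but without `:g` each match only covers the first run of characters. One possible alternative, which is an assumption and not something proposed in the chat, is to count letters and uppercase letters with `comb` and the Unicode properties `<:L>` and `<:Lu>`. The helper name `all-letters-upper` is made up for illustration.

```raku
# eq coerces both Match objects to strings before comparing.
my $upper-match = 'ABC' ~~ / <upper>+ /;
my $alpha-match = 'ABC' ~~ / <alpha>+ /;
say $upper-match eq $alpha-match;   # True

# Hypothetical helper: True when the string has at least one letter
# and every letter is an uppercase letter.
sub all-letters-upper(Str $s --> Bool) {
    my @letters = $s.comb(/ <:L> /);    # every letter in the string
    my @upper   = $s.comb(/ <:Lu> /);   # every uppercase letter
    return @letters.elems > 0 && @letters.elems == @upper.elems;
}

say all-letters-upper('HELLO, WORLD 42');   # True
say all-letters-upper('Hello');             # False
say all-letters-upper('1234');              # False
```

Counting works here because every uppercase letter is also a letter, so equal counts mean no other kind of letter is present.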
a12l | m: "Tom-ay-to, tom-aaaah-to." ~~ / <alpha>+ / | 20:19 | |
Raku eval | |||
a12l | Wait, it should match Tom (the first part before the first dash)? | 20:20 | |
Why doesn't it match the result from my local Rakudo REPL? | |||
What I'm wondering is what I should do so that it matches all the letters in the string, and not only the ones before the first char that breaks the pattern | 20:23 | ||
lizmat | m: say "Tom-ay-to, tom-aaaah-to." ~~ / <alpha>+ / | 20:24 | |
camelia | 「Tom」 alpha => 「T」 alpha => 「o」 alpha => 「m」 |
lizmat | m: say "Tom-ay-to, tom-aaaah-to." ~~ / <.alpha>+ / | 20:44 | |
camelia | 「Tom」 | ||
lizmat | the non-capturing version | ||
nahita3882 | you didn't print the result of the expression, so we don't see anything in the bot's output | 20:46 | |
in REPL, the printing automatically happens | |||
a12l | ah, thanks! Will keep that in mind | 20:47 | |
nahita3882 | To this end, you need the "global" modifier docs.raku.org/language/regexes#Global | ||
this won't stop at the first match | |||
we enable this "adverb" (adverb because it affects how the match is performed) with :g | 20:48 | ||
can't directly put it like :g/ .../ or /:g .../ because it's a matching adverb | 20:49 | ||
so we need to use m:g/ ... / | |||
np; normally it warns about unused values but with smartmatch operator (~~), i guess it thinks it can have side effects so it doesn't | 20:50 | ||
m: 32 | |||
Raku eval | WARNINGS for /home/glot/main.raku: Useless use of constant integer 32 in sink context (line 1) | 20:51 | |
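The `m:g/ ... /` advice above can be sketched with the same tomato string. The variable names are hypothetical. `<.alpha>` is the non-capturing form shown earlier, and the `comb` line is an alternative added here as an assumption about what reads well, not something suggested in the chat.

```raku
my $s = "Tom-ay-to, tom-aaaah-to.";

# With :g the smartmatch returns every match, not just the first one.
my @words = $s ~~ m:g/ <.alpha>+ /;

say @words.elems;                    # 6
say @words.map(*.Str).join(' ');     # Tom ay to tom aaaah to

# comb with a regex yields the matched strings directly.
say $s.comb(/ <.alpha>+ /).join(' ');   # Tom ay to tom aaaah to
```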
23:12
MasterDuke joined
|