🦋 Welcome to the MAIN() IRC channel of the Raku Programming Language (raku.org). Log available at irclogs.raku.org/raku/live.html . If you're a beginner, you can also check out the #raku-beginner channel! Set by lizmat on 6 September 2022. |
00:00
reportable6 left
00:02
reportable6 joined
00:06
simcop2387 left,
perlbot left
00:07
simcop2387 joined
00:08
perlbot joined
rf | So turns out you need === defined for CArray[my_cstruct_type] to work | 00:28 | |
=== (Str $foo, my_cstruct_type $bar) | |||
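(An illustrative sketch of the shape rf describes: a hypothetical CStruct class plus an extra === candidate. The class name, its field, and the exact signature are assumptions made up for illustration; the log does not confirm precisely which candidate NativeCall ends up needing.)

    use NativeCall;

    # Hypothetical stand-in for rf's my_cstruct_type.
    class MyStruct is repr('CStruct') {
        has int32 $.x;
    }

    # The extra === candidate, mirroring rf's "(Str $foo, my_cstruct_type $bar)".
    # Whether this exact signature is what CArray[MyStruct] requires is rf's
    # claim, not something verified here.
    multi sub infix:<===>(Str $a, MyStruct $b) { False }

    my $arr = CArray[MyStruct].new;
    $arr[0] = MyStruct.new(x => 42);
    say $arr[0].x;    # 42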
00:33
derpydoo joined
01:03
xinming left
01:06
xinming joined
01:11
clsn_ joined
02:11
evalable6 left,
tellable6 left,
bloatable6 left,
statisfiable6 left,
benchable6 left,
squashable6 left,
sourceable6 left,
shareable6 left,
bisectable6 left,
releasable6 left,
unicodable6 left,
committable6 left,
quotable6 left,
nativecallable6 left,
reportable6 left,
linkable6 left,
notable6 left,
greppable6 left,
coverable6 left,
quotable6 joined,
nativecallable6 joined,
sourceable6 joined,
linkable6 joined,
notable6 joined
02:12
tellable6 joined,
shareable6 joined,
committable6 joined,
reportable6 joined
02:13
greppable6 joined,
squashable6 joined,
bisectable6 joined,
statisfiable6 joined,
coverable6 joined,
bloatable6 joined
02:14
benchable6 joined,
evalable6 joined,
releasable6 joined,
unicodable6 joined
02:25
MasterDuke left
02:31
rf left
02:38
codesections left
03:31
Xliff left
03:46
swaggboi left
04:05
swaggboi joined
05:05
coverable6 left,
releasable6 left,
benchable6 left,
evalable6 left,
quotable6 left,
bisectable6 left,
shareable6 left,
sourceable6 left,
greppable6 left,
unicodable6 left,
linkable6 left,
reportable6 left,
notable6 left,
committable6 left,
tellable6 left,
bloatable6 left,
squashable6 left,
statisfiable6 left,
nativecallable6 left,
sourceable6 joined,
nativecallable6 joined
05:06
benchable6 joined,
quotable6 joined,
coverable6 joined,
notable6 joined
05:07
releasable6 joined,
bisectable6 joined,
greppable6 joined,
committable6 joined,
unicodable6 joined,
squashable6 joined,
evalable6 joined,
statisfiable6 joined,
shareable6 joined
05:08
linkable6 joined,
reportable6 joined,
bloatable6 joined,
tellable6 joined
05:11
wbvalid joined
05:18
wbvalid left
05:23
jpn joined
05:28
jpn left
06:00
reportable6 left
06:02
reportable6 joined
06:49
teatime joined
06:52
teatwo left
07:16
Sgeo left
07:29
Max51 joined
07:30
Max51 left
07:45
jpn joined
08:04
jpn left
08:06
jpn joined
08:24
jpn left
08:35
abraxxa joined
08:37
simcop2387 left
08:38
simcop2387 joined,
perlbot left
08:40
perlbot joined
09:04
discord-raku-bot left
09:05
discord-raku-bot joined
09:41
ab5tract joined
09:45
jpn joined
tbrowder__ | g'day, all. does anyone have a working workflows/windows.yml for modules on github? | 11:02 | |
11:13
linkable6 left,
evalable6 left
11:14
linkable6 joined,
evalable6 joined
Nemokosch | wouldn't bet my life on that, good sir. But hope dies last | 11:16 | |
tbrowder__ | 👍🏻 | 11:48 | |
11:50
petro-cuniculo joined
11:53
gcd left
11:57
petro-cuniculo left
12:00
reportable6 left
12:03
reportable6 joined,
abraxxa left
13:03
linkable6 left,
evalable6 left
13:06
evalable6 joined,
linkable6 joined
13:43
rf joined
rf | Morning folks | 13:43 | |
13:44
jgaz joined
13:46
jpn left
13:49
jpn joined
Anton Antonov | @rf Morning, you, Haskell apologist ! | 13:50
And monad-promoter… | 13:52 | ||
Voldenet | Promises are monads and they're everywhere | 13:53 | |
monad-ish | 13:54 | ||
rf | Anton :P | 13:56 | |
13:57
jpn left
14:00
jpn joined
14:05
jpn left
Voldenet | say (await Promise.kept(Promise.kept(42))).WHAT | 14:10 | |
evalable6 | (Promise) | ||
Voldenet | this looks more monadish than the js impl, which would just return 42 in that case
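(For reference, a minimal sketch of the difference Voldenet is pointing at: Raku does not auto-flatten nested Promises the way JS does, so each await unwraps exactly one layer.)

    my $nested = Promise.kept(Promise.kept(42));
    say (await $nested).WHAT;   # (Promise) - the inner Promise, not 42
    say await await $nested;    # 42 - a second await unwraps the inner one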
14:10
jpn joined
[Coke] | would appreciate if someone could review the "is it a bug" question in github.com/Raku/doc/issues/4271 | 14:18 | |
lizmat | my question would be: did it recently change, or has it always been this way? | 14:20 | |
14:20
abraxxa-home joined
Nemokosch | why would if ever topicalize? 🤔 | 14:22 | |
lizmat | yeah, it feels like an implementation detail | ||
Nemokosch | > The with statement is like if, but tests for definedness rather than truth, and it topicalizes on the condition, much like given: | 14:26 | |
so sounds like the documentation contradicts itself | 14:27 | ||
> You may intermix if-based and with-based clauses. this is the interesting part... | 14:28 | ||
m: if 0 { .say } orwith Nil { .say } else { .say } | |||
Raku eval | Nil | ||
Nil | 14:29 | ||
Nemokosch | perhaps this is what it's trying to say | ||
m: if 0 { .say } orwith Nil { .say } elsif 12 { .say } else { .say } | 14:30 | ||
Raku eval | (Any) | ||
Nemokosch | this seems surprising to me, though | ||
the former else clause ran, as an elsif clause, and this time it un-topicalized | |||
14:32
jpn left
14:34
jpn joined
rf | Voldenet: Monads are a container with a map and bind | 14:36 | |
(and return) but that isn't super important | |||
14:39
jpn left
rf | Not sure if promise fits it perfectly | 14:39 | |
dutchie | do you not need return to do the bind/join equivalence | 14:40 | |
Woodi | rf: but monad-ish can mean "closure" too ;)
14:41
simcop2387 left
14:42
perlbot left,
perlbot_ joined,
simcop2387 joined
rf | Woodi: Not sure what you mean by that | 14:42 | |
Woodi | rf: just trying to abuse meanings because of some similarities :) | 14:43
not even sure what "bind" is, too lispy :) | |||
14:43
perlbot_ is now known as perlbot
rf | bind : M a -> (a -> M b) -> M b | 14:43 | |
Woodi | so M is the domain of values ? | 14:44
and which part is the result ?
but assumed functions... | 14:45 | ||
rf | M is a monad, a is the type held within the monad | ||
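(A self-contained sketch of the signature rf is spelling out, bind : M a -> (a -> M b) -> M b, using a made-up Maybe-style container; the class and method names here are purely illustrative, not an existing Raku API.)

    class Maybe {
        has $.value;
        has Bool $.is-just = False;

        # "return"/unit: wrap a plain value into the container
        method unit(Maybe:U: $v) { self.new(value => $v, is-just => True) }

        # bind : M a -> (a -> M b) -> M b
        # unwrap the value and feed it to &f, which itself returns a Maybe
        method bind(&f) { $!is-just ?? f($!value) !! self }

        # map : M a -> (a -> b) -> M b, defined in terms of bind
        method map(&f) { self.bind(-> $v { Maybe.unit(f($v)) }) }
    }

    say Maybe.unit(20).bind(-> $x { Maybe.unit($x + 1) }).map(* * 2).value;   # 42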
Nemokosch | let's keep it simple | 14:48 | |
which operation returns a monad, and which a value? | |||
rf | Bind always returns a monad | 14:49 | |
Woodi | why is it doubled ? a -> M b -> M b ? | 14:50
rf | (a -> M b) is another function | 14:51 | |
Woodi | then what does -> mean ? | 14:52
rf | en.wikipedia.org/wiki/Partial_application | 14:53 | |
exp | lol a wikipedia page on computer science is not going to make things any more understandable | 14:54 | |
they unironically care only about number of facts expressed, not how many people understand what's written | |||
Woodi | so bind is a function that returns a monad that changes values into ... ? | 14:55
I thought of bind in Lisp as some kind of pointer... | 14:56
Nemokosch | oh | ||
14:56
jpn joined
so bind is the one that takes a function that constructs the new monad directly | 14:56 | ||
dutchie | if we stick to just talking about promises, bind corresponds to the then method
Nemokosch | the then method, when you directly return a Promise in the callback | 14:57 | |
dutchie | yeah exactly | ||
the callback is the a -> M b | |||
Woodi | partial application describes currying ? | 14:58
dutchie | the invocant is the M a which gets "unwrapped" and fed into the callback | ||
14:58
perlbot left,
simcop2387 left
Nemokosch | Promise.resolve(42).then(x => { const funky = Math.random()*x; return Promise.resolve(funky); }) | 14:58 | |
in pseudocode that absolutely isn't Javascript ^^ | 14:59 | ||
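(A rough Raku counterpart of that pseudocode, as a sketch only: Raku's .then hands the callback the finished Promise itself, so the value comes out via .result, and a Promise returned from the callback is not flattened the way JS flattens it.)

    my $chained = Promise.kept(42).then(-> $prev {
        my $funky = $prev.result.rand;   # roughly Math.random() * x
        $funky;                          # plain value => becomes the result of $chained
    });
    say await $chained;                  # some number in 0..42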
dutchie | Woodi: they are closely related yes. a "curried" function takes multiple args by returning another function with those args "partially applied" | ||
some people are more precise than others in keeping the two terms distinct | |||
Woodi | dutchie: my math teacher said: understand and then memorize or memorize and then understand :) | 15:00 | |
Nemokosch | yeah I guess think of Haskell | 15:01 | |
from what I know, Haskell only has functions that take one argument | 15:02 | ||
Woodi | so it looks like currying uses partial application, or even is p.a. ...
Nemokosch: only one ? crazy :) | 15:03 | ||
tellable6 | Woodi, I'll pass your message to Nemokosch | ||
Nemokosch | I don't know Haskell syntax but going by this logic, a function that "takes several parameters" would be called like f(1)('asd')(True)
15:04
perlbot joined
where f would return a new function that would return a new function that would return.... you get the idea | 15:04 | ||
Woodi | sounds in order ;) | ||
15:05
simcop2387 joined
Nemokosch | and on each call, the current parameter is built into the returned function | 15:05 | |
at which point it's just a matter of approach if you say "it has n unbound variables" or you say it's an nth order function | 15:06 | ||
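(A small sketch of the currying/partial-application distinction being discussed, using only core Raku; the function names are illustrative.)

    # Curried by hand: each call takes one argument and returns a new
    # function with that argument baked in.
    my &f = -> $a { -> $b { -> $c { "$a $b $c" } } };
    say f(1)('asd')(True);               # 1 asd True

    # Partial application of an ordinary sub, via .assuming:
    sub add3($a, $b, $c) { $a + $b + $c }
    my &add-one-two = &add3.assuming(1, 2);
    say add-one-two(39);                 # 42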
15:10
Sgeo joined
15:11
tbrowder_ joined
rf | Nemo bind will "unwrap" the first monad and feed the unwrapped value to a new function (the second parameter) which returns a new monad | 15:12 | |
github.com/rawleyfowler/Monad-Resu...lt.rakumod | |||
^ That repo implements a monad if you're interested, Woodi | 15:14
Also Nemo you are correct a function call in Haskell is like f(foo)(bar)(baz) | 15:15 | ||
Anton Antonov | @Voldenet "Promises are monads and they're everywhere" -- you are on record, I will verify the monad axioms on promises (and be vocal if you wrong.) | 15:21 | |
@Voldenet "monad-ish" -- nice escape (from rigorous feedback.) | 15:22 | ||
rf | Hahahaha | 15:23 | |
15:25
grondilu joined
Anton Antonov | @rf I considered working on a post that criticizes your monad approach. Decided to postpone it indefinitely. | 15:25
rf | I am not opposed to counter ideas, though, I haven't heard a compelling one against monads yet. | 15:26 | |
Voldenet | I once said something like that about js, as in "ye it's mostly monads", but then it wasn't using composition properly
because then(a).then(b) is different depending on whether return value is Promise or not | 15:27 | ||
rf | then is map | ||
Voldenet | from then(x=>a(b(x))) | ||
Woodi | rf: checking | ||
Anton Antonov | @rf My point of view on monads is how much a monadic system (e.g. a Raku package) makes the code written with it have algebraic properties.
Voldenet | hence my test above | ||
say (await Promise.kept(Promise.kept(42))).WHAT | |||
evalable6 | (Promise) | ||
Voldenet | it's at least not as bad as js | 15:28 | |
rf | Anton: I am more interested in abstracting side-effects than algebraic properties | ||
It benefits the consumers of code to use Monads as well so you can describe the intent of the code | 15:29
Voldenet | I bet you can sort of do algebraic effects in raku if you like pain | 15:30 | |
Anton Antonov | @rf Sure. But, I leverage the algebraic properties when I make translations from natural language DSLs into programming language DSLs. (And vice-versa.) Hence, the algebraic properties for me are important. | ||
Voldenet | and .throw/.resume combo | ||
Anton Antonov | @Voldenet Duly noted.
Nemokosch | then is kind of both bind and map, from what I understand | 15:31 | |
rf | I really dislike exceptions, which is why I made Monad::Result, I think it's very gross to make the caller decipher what possible exceptions can be thrown
then is just map, map : M a -> (a -> b) -> M b | |||
Nemokosch | well, then join me on the dark side and let's dislike control exceptions together 😛 | 15:32 | |
rf | CATCH { default } on every block is just as bad IMO | ||
Plus it's not enforced or implied so uncaught exceptions are far too common | |||
Nemokosch | false negatives are worse than false positives with this really | 15:33 | |
when you only see that some of your assumptions didn't hold | |||
Anton Antonov | @rf You and @Nemokosch must be on the same gray side. (Or same far side gallery.)
Woodi | rf: "exceptions" looks like "sudden explosions" :) but concept of shortcuts in execution flow should be usefull... if we have good behaving code like calculations... | 15:34 | |
Anton Antonov | @rf @Voldenet Here is a (very schematic) flowchart of my monads-for-DSLs workflow: raw.githubusercontent.com/antononc...agents.jpg | ||
rf | That is an interesting approach | 15:36 | |
Woodi: Most software I write needs to be triple redundant and have 0 exceptions, thus why I prefer monads over exceptions. Shortcuts can simply be expressed as function composition assuming the types align | 15:37 | ||
Which is also one of the main concepts behind Humming-Bird ^ | |||
Voldenet | m: class Effect is Exception { has $.x is rw; }; CATCH { when Effect { .x = 42; .resume; } }; my $x = Effect.new; $x.throw; say $x.x | 15:38 | |
camelia | 42 | ||
Voldenet | I'm begging you, don't use the above thing | ||
it sort of works though | |||
rf | Anton you are the first ML person I have heard even use the word Monad :D | 15:39 | |
Anton Antonov | @rf Basically, I can use "your" monads if I can put the operations in a reduce statement. For example, reduce(&my-monad-bind, my-monad-unit-object(), [&some-op1, &some-op2, &some-op3, &take-value]).
rf | Yes that should work | 15:40 | |
Anton Antonov | @rf Ok, good. (Meaning, "you are on record and will try to verify.") | 15:41 | |
@rf ML people use monads, but they do not know and/or use the terminology. | 15:42 | ||
rf | As long as your ops are -> M a -> (a -> M b) -> M b | ||
Voldenet | most people use monads and algebraic effects | ||
Anton Antonov | @rf Right, the associativity rule. | ||
Voldenet | they just buried in N layers of their language abstraction | ||
s/they/they're/ | 15:43 | ||
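(A self-contained sketch of the reduce-over-bind idea Anton describes above, using kept Promises as the container; &bind, &unit and the ops are illustrative names, not an existing API.)

    sub bind(Promise $m, &f --> Promise) {
        # unwrap the settled value and feed it to &f, which returns a new Promise
        f($m.result);
    }

    my &unit = { Promise.kept($_) };
    my @ops  = { Promise.kept($_ + 1) },
               { Promise.kept($_ * 2) },
               { Promise.kept($_ - 4) };

    say reduce(&bind, unit(22), |@ops).result;   # 42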
Anton Antonov | @Voldenet Most Data Science people do not want to program. So, whatever simplifications are used to make the required work more palatable. | 15:44 | |
Voldenet | I remember showing my data sci. code to a data scientist, he scratched his head and said he didn't get the code :/ | 15:46
(I tried to abstract away data science part so I could get to my programming one…) | |||
Anton Antonov | @Voldenet Right, hence, I make/use natural language DSLs for Data Science. They still say the same. | 15:47 | |
Voldenet | that makes sense | ||
Anton Antonov | Hopefully, I am not overestimating the interest in this -- here is an example of data wrangling Python code generation from sequences of natural language commands: github.com/antononcube/RakuForPred...thon.ipynb | 15:51 | |
Or, if you prefer, the Raku code results version: github.com/antononcube/RakuForPred...Raku.ipynb | 15:52 | ||
15:57
perlbot left
15:58
simcop2387 left
16:01
perlbot joined
16:02
simcop2387 joined
clsn_ | So. Haven't worked with raku in a *long* time, and some things have changed. Right now, I can't see how it's possible to make a regex that matches a *combining* character (or set thereof). I can only match base characters and specify combining characters on them if I want, but I'm searching for the actual combining character which may be on any of many bases (and may even have other combining chars with it.) | 16:54 | |
This is not an unrealistic request, by the way. Not everything is like é where the accent isn't something you'd want to search for without the letter. I'm working with Hebrew cantillation marks, which are like punctuation that happen to be written as combining characters. | 16:56 | ||
m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ /\x[0591]/; | 16:59 | ||
camelia | Nil | ||
17:02
linkable6 left,
evalable6 left
17:03
linkable6 joined
17:04
evalable6 joined
Nemokosch | strings are normalized according to NFC | 17:09 | |
clsn_ | Yes, which is fair enough... But rakudo, from what I've seen, matches stuff according to its "NFG". How might I write a regex that can match the 0591 in that string? NFC vs NFD isn't really relevant; none of the characters there are or can be precomposed. | 17:11 | |
Nemokosch | I'd expect a regex to operate on the level of characters, not codepoints | 17:14 | |
clsn_ | Well, from a Unicode perspective, \x[0591] is a character, so I'm not sure what you mean. If you mean by graphemes, that sort of presumes that it doesn't make sense to search for an \x[0591] because it is written as a diacritic, yet that makes just as little sense as saying that it doesn't make sense to search for a comma in a sentence. | 17:15 | |
Nemokosch | I'm not sure if it's still a character after NFC | 17:17 | |
but sure thing, definitely not a grapheme, and a high-level string has characters as graphemes | |||
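(A quick illustration of the grapheme-vs-codepoint distinction being drawn here, using the word from clsn_'s example.)

    my $s = "עֵֽינֵיכֶ֑ם";
    say $s.chars;               # graphemes - what a regex "." sees
    say $s.codes;               # codepoints - more, because of the marks
    say 0x0591 (elem) $s.ords;  # True: the mark is there at codepoint level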
17:18
codesections joined
clsn_ | NFC, as I understand it, is "combine everything that can be combined into precomposed characters," and nothing in the example string can make up a precomposed character. Am I misunderstanding you? | 17:18 | |
Well, then, how would I write a grammar to search for it? It may not be written as a spacing character, but it is exactly as reasonable to search for it as it is to search for a comma or semicolon in English text. | 17:19 | ||
Nemokosch | This is probably beyond me. There is stuff like this docs.raku.org/type/Uni.html | 17:23 | |
but whether it works with regex stuff, no clue | 17:24 | ||
clsn_ | So I could convert it to something more unicode-ish, but can I then use regex--- I see. | ||
This is actually for pretty much the ONLY program I've ever written in rakudo, apart from contributions I made to the actual project. And it *used* to work. Many years ago. | 17:25 | ||
17:26
cfa joined
cfa | bisectable6: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ /\x[0591]/; | 17:26 | |
bisectable6 | cfa, Will bisect the whole range automagically because no endpoints were provided, hang tight | ||
cfa, ¦6c (67 commits): «Nil» | |||
cfa, Nothing to bisect! | |||
Nemokosch | that must have been a lot of years ago for sure | 17:27 | |
probably prior to MoarVM, and MoarVM has been the state-of-the-art runtime since like 2013
clsn_ | It was a REALLY long time ago; I'm not sure I can find quite how long it was. Eh, I probably have logs someplace... | ||
Nemokosch | anyway, now I'm not convinced that it is intended to work | 17:28 | |
clsn_ | Yeah, the latest commit in my repo is from December 2011. | ||
Nemokosch | there is a candidate for smartmatching Uni against Regex github.com/rakudo/rakudo/blob/2022...ex.pm6#L47 but it basically converts to Str and calls it a day | 17:29 | |
clsn_ | It may or may not be "right" for it to work *as stated*, but I think there definitely needs to be some way to make it work, or you're really missing something important. | ||
i.e. converting to some form or another that regex-matches on codepoints or something like that. | 17:30 | ||
Nemokosch | what I doubt, though, is that this is high-level enough to fall into regex territory | ||
clsn_ | s/i\.e\./e.g./ (can't believe I used \ for that...) | ||
Nemokosch | yeah that sounds horrible tbh, to replace a part of a grapheme | 17:31 | |
clsn_ | Well, I still contend that if regexes can't do it in any fashion, then you're failing to capture or make available something very important and not unreasonable for people to want to do. I present my own program as evidence of that (granted, one might argue that I only barely qualify as "people"...) | 17:32 | |
I could easily see someone studying Hebrew or Arabic doing searches for vowel-patterns (which indicate grammatical forms). | |||
Nemokosch | well I'm just saying that it perhaps doesn't fall into regex territory | ||
clsn_ | and the Hebrew Bible cantillations are part of a well-understood and well-defined grammar. | 17:33 | |
Not certain what that really means, or if that answers. You can do that, you just can't use regexes for it? And yet it's matching patterns in a string of characters, isn't that what regexes are supposed to do for a living? Why should someone have to write up their own homegrown regex-matcher just for certain kinds of characters? | 17:34 | ||
Nemokosch | they are not "characters" on Str level | 17:36 | |
clsn_ | My program from way back when would parse a Biblical sentence according to the structure of sentential breaks encoded by the cantillations and output a tree graph in dot format. That's parsing text with a grammar. | ||
Nemokosch | And like, regex is not meant for just any kind of pattern matching. For example, you can't just arbitrarily match binary patterns in the unicode representation | 17:37
I mean, sorry for your loss | |||
cfa | here's another example, | 17:38 | |
m: say "u\x[0308]" ~~ /\x[0308]/ | |||
camelia | Nil | ||
clsn_ | web.meson.org/cache/Esth:8:9.png | ||
Nemokosch | But I'm not convinced that this is a problem with the regex itself, as it clearly works on the principle that a character is a grapheme | ||
clsn_ | I can see that this is a limitation of the way rakudo has chosen to define strings and regexes. But I wonder if that choice is defensible in the face of, well, not being able to do exactly what regexes and grammars are supposed to do. | 17:39 | |
Nemokosch | frankly I don't know about Unicode enough to understand what makes a "combining character" a "character", in this jargon | ||
Again, I don't think regexes (let alone grammars) are supposed to dig this deep | 17:40 | ||
clsn_ | Eh, that's because "character" sounds like it should be some graphical unit, i.e. a grapheme, so it's hard to see a combining character as one. | ||
Voldenet | if you don't mind a performance hit then
Nemokosch | So what is it exactly, that it isn't just called a codepoint? | ||
clsn_ | But whyever not? As I said, it's a very reasonable thing to ask a grammar to do. | ||
Voldenet | m: my $x="עֵֽינֵיכֶ֑ם"; say 0x591 (elem) $x.ords; | 17:41 | |
camelia | True | ||
Nemokosch | You said so yes but it didn't sound any different from saying that grammars are for binary inspection. | ||
clsn_ | In Unicode parlance, character and codepoint can be almost interchangeable. Indeed, I understand what you mean about having trouble seeing it as a character, but coming from a more Unicode-centric POV myself, I find the opposite to be true. | ||
Nemokosch | Also, you earlier made the distinction from é. ("Not everything is like é where the accent isn't something you'd want to search for without the letter.") | 17:42 | |
what backs this distinction up, that could be somehow integrated? | 17:43 | ||
clsn_ | I don't know. Binary patterns are not regex-fodder because they don't generally have structural meaning that's useful for pattern-matching in most strings. Combining characters do. I guess there's some fuzziness in that argument. | ||
Ah, that's a better question... | |||
Nemokosch | Yes, this whole fuzziness | ||
clsn_ | OK, let's see if I can explain what I mean by that, and maybe I'm wrong about the distinction as well... | ||
Nemokosch | that even though "combining characters" fall back into being codepoints and hence just binary data specified by Unicode, they can matter on textual level sometimes apparently | 17:44 | |
clsn_ | An é is, in a sense, a letter in itself. That's (kinda) why it has a precomposed codepoint, or at least why it was thought at some point to be worth encoding precomposed and Unicode inherited it. And even if considered as an e plus an acute accent, there's nothing in common between e+acute and a+acute. They're independent of one another. | 17:45 | |
Nemokosch | > é oof | 17:46 | |
clsn_ | It's not like it's completely impossible, but it would be an odd situation wherein you'd want to search for words with 3 or more accents or something. | ||
Do my unicode chars not come through okay? | |||
Nemokosch | not really. I mean, this is just universally sad. Here we are in 2023 and the best we could get is like, semi-cover fairly similar languages in IT | 17:47 | |
Voldenet | The problem is that one grapheme can be represented by multiple codepoints | 17:48
Nemokosch | anyway. What I think should (and might?) exist is still something like "capture this letter containing codepoint XYZ" | ||
clsn_ | OTOH, Hebrew and Arabic vowels, for example, or even Devanagari combining vowel marks, are more related to themselves and each other than to the letters they are on. á and é have nothing in common, particularly, but का and गा rhyme, both might represent similar grammatical constructions, etc. | 17:49 | |
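(The precomposition point from a few lines up, shown with .ords: Raku normalizes strings to NFC, so e plus a combining acute collapses into the precomposed é, while a Hebrew letter plus a cantillation mark has no precomposed form and stays as two codepoints inside one grapheme.)

    say "e\x[0301]".chars, " ", "e\x[0301]".ords;                # 1 (233)
    say "\x[05D9]\x[0591]".chars, " ", "\x[05D9]\x[0591]".ords;  # 1 (1497 1425)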
17:49
cfa left
clsn_ | Ideally not "containing codepoint XYZ" but "containing a regexp(?) of these codepoints" or at the very least "containing a codepoint out of this set". | 17:49 | |
Nemokosch | ngl this also sounds to me that Unicode itself is either misunderstood or contains problematic concepts | 17:51 | |
teatime | it is complex for sure | ||
clsn_ | It's even more so in Hebrew and Arabic. A word that is CONSONANT + QAMATS(05B3) + CONSONANT + PATAH(05B7) + CONSONANT is very distinctly third-person singular masculine past tense, simple construction. | ||
Nemokosch | like, if this \x[0591] is so useful on its own and an acute accent isn't, why aren't they distinguished on any conceptual level? | 17:52 | |
clsn_ | I don't need to know what the consonants are, but that's what that word means (there are exceptions and phonological concerns and blahblahblah but to first approximation.) | ||
0591 represents the chief sentential pause in the middle of a Biblical verse. | |||
web.meson.org/cache/Esth:3:12:.svg is an even more extreme example (the longest verse in the Hebrew Bible) | 17:55 | ||
The cantillations define and determine that tree. Just as one might parse an English sentence on periods and commas and semicolons (but the cantillations are more precisely-defined and fine-grained.) | |||
From a Unicode perspective, I guess combining characters are combining characters (they do have combining classes, though), and they don't try to distinguish ones which are more or less important than others, probably because they're not suppressing the ones of lesser importance. But here, NFG *does* "suppress" them, in some sense, in that you can't conceive of them without their bearers, and that sucks in the ones that have independent meaning as well. | 17:59 | ||
17:59
abraxxa-home left
18:00
reportable6 left
18:01
reportable6 joined
clsn_ | For that matter, I don't think you can even search for "some hebrew letter followed by a TSERE" or whatever (i.e. use a character class for the base.) | 18:03 | |
Nemokosch | my point is that if they are so important, perhaps they should stand on their own, just like nobody would pretend that a comma or a dot is a combining character | ||
or any punctuation for that matter | |||
lizmat | clsn_: :ignoremark ? | ||
Voldenet | probably ignoremark won't work | 18:04 | |
lizmat | docs.raku.org/language/regexes.html#Ignoremark | ||
why wouldn't it ? | |||
clsn_ | I tried ignoremark. | ||
lizmat | example? | ||
clsn_ | That ignores the mark. But I don't want to ignore the mark! I want to search for a specific mark!! | ||
Voldenet | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1497 (elem) $/.ords }> / | 18:05 | |
camelia | 「י」 | ||
lizmat | well, then search for the char with :ignoremark, and then check whether it is followed by a TSERE ? | ||
Voldenet | there's more than one way to do what you want | ||
<?{ }> is not very elegant solution, but a solution | |||
Nemokosch | a not very elegant solution to a not very elegant task 😅 | 18:06 | |
Voldenet | in fact | ||
m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1497 == $/.ord }> / | |||
camelia | 「י」 | ||
clsn_ | Maybe they should stand on their own. But Unicode considers combiningness from the point of view of graphics, not semantic sense. By adopting that, rakudo has placed ALL the combining characters in the same bucket. If there's a distinction that should be made, it will need to be made in rakudo. | ||
Nemokosch | > But Unicode considers combiningness from the point of view of graphics, not semantic sense. Holdya holdya. So far, all you said was how you have the Unicode perspective. | 18:07 | |
Voldenet | current combining characters situation is probably a tradeoff, since combining characters turn elegant constant-time algos into monsters | ||
clsn_ | I can certainly search codepoint-by-codepoint and find the characters I'm looking for. But then, once more, didn't God create regexes precisely to do this kind of job? I'm looking for the word that contains a \x[0591] in a string of words. How can I do that? | ||
Voldenet | but the above one _is_ the regex | 18:08 | |
… :) | |||
Nemokosch | the only problem with it is that it's slow-ish, really | ||
Voldenet | you can compose it and put more regexes in it | ||
clsn_ | That's how I understand what I think Unicode is doing; maybe I'm wrong about that. | ||
I'm sorry, I'm not seeing how that's working. Expecially since the thing you're matching is a letter without any diacritics. | 18:09 | ||
Nemokosch | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1425== $/.ord }> / | 18:10 | |
Raku eval | Nil | ||
Nemokosch | meh, why ord | ||
clsn_ | Here... here's the whole verse. Please tell me a regex I can use to find the word with the 0591 under it: "כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃" | ||
Nemokosch | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1425 (elem) $/.ords }> / | ||
Raku eval | 「כֶ֑」 | ||
Nemokosch | this was the better one | ||
clsn_ | That's the right letter, yes. Maybe one can do this after all? Placing other dummy letters around it? | 18:11 | |
Nemokosch | this literally does "take the letter and check what it's made of" | ||
clsn_ | (It's Genesis 3:5, btw; I just picked it arbitrarily when trying this out.) | ||
hm. so then could I say... | 18:12 | ||
Nemokosch | in either case, thank you for the journey at least | ||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~ / (\w<?{ 1497 (elem) $/.ords }>) / | ||
camelia | 「י」 0 => 「י」 |
Voldenet | perhaps this, but my terminal outputs it all as spaces | ||
Nemokosch | I wouldn't have thought for the life of me that something that has zero length can be this significant | ||
Voldenet | that… doesn't help | ||
Nemokosch | funky, it turned backwards | 18:13 | |
clsn_ | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}<:Lo>*/;' | 18:14 | |
camelia | ===SORRY!=== Error while compiling <tmp> Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1) at <tmp>:1 ------> ay $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}⏏<:Lo>*/;' … |
clsn_ | bah, sorry, my rakudo regex-fu is very weak, it's been a looong time. | ||
The "turning backwards" is probably an artifact of the Bidi algorithm at work in your terminal, which is the cause of much headache and profanity. | 18:15 | ||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}><:Lo>*/;' | ||
camelia | ===SORRY!=== Error while compiling <tmp> Unable to parse expression in single quotes; couldn't find final "'" (corresponding starter was at line 1) at <tmp>:1 ------> :Lo>*.<?{ 1497 (elem) $/.ords}><:Lo>*/;'⏏<EOL> expecting … |
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~ /<:Lo>*.<?{ 1497 (elem) $/.ords }><:Lo>*/; | 18:16 | |
camelia | 「כִּ֚י יֹדֵ֣עַ」 | ||
Voldenet | apparently it works | ||
clsn_ | Not really, it's the wrong word.
Still, it's catching a whole word... um, a whole PAIR of words... which is... is it better than just a letter? | 18:17 | ||
Wait, 0591 is 1425, not 1427 | |||
Voldenet | right :D | 18:18 | |
clsn_ | 1427 is HEBREW ACCENT SHALSHELET, 0593, which is a VERY rare cantillation and certainly not found in this verse. | ||
You can write 0x0591, right? With hex notation? That'll be less confusing. | |||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; my regex etnahta { .<?{ 1425 (elem) $/.ords }> }; say $x ~~ /<:Lo>*<etnahta><:Lo>*/; | 18:19 | |
camelia | 「עֵֽינֵיכֶ֑ם」 etnahta => 「כֶ֑」 |
Voldenet | you could simply do this | ||
it's probably more sane when you want to compose it | |||
you can use 0x591 if you want, the `{ 1425 (elem) $/.ords }` is just regular Raku code | 18:20
clsn_ | Ugh, hard to read because of the Bidi stuff. But still. That... looks right, actually. | 18:22 | |
Still smacks slightly of not-ideal, but requiring you to use a subroutine just to pick out the combining character you want isn't THAT unreasonable. (though actually, I need to be able to check for any member of a *set* of combining characters, but that's probably generalizable from this.) | 18:24 | ||
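(A sketch of the generalization clsn_ asks for: accept any member of a set of marks instead of a single codepoint. The set below only holds the two accents named earlier in the log and is purely illustrative.)

    my @marks = 0x0591, 0x0593;   # etnahta, shalshelet
    my regex marked { . <?{ so @marks (&) $/.ords }> }
    my $x = "עֵֽינֵיכֶ֑ם";
    say $x ~~ / <:Lo>* <marked> <:Lo>* /;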
Voldenet | I didn't test this for performance, maybe some form of checking substrings of .encode would've been faster | 18:25 | |
clsn_ | What you have here is maybe clumsier than it once was, but still has some elegance, thank you. | ||
Meh, I'm not terribly fussed about performance. Computers are fast enough that even slow for them is still fast, when dealing on the scale and number of instances I'm worried about. | 18:26 | ||
I'll have to see if/how I can shoehorn this in to my old program, but it looks like a promising path. | 18:27 | ||
18:28
grondilu left
clsn_ | Anyway, so thanks very much, and maybe it's something for you to ponder as well. | 18:35 | |
Voldenet | I've sort of given up on expecting much from unicode | 18:36
m: "ł".NFD.say | |||
camelia | NFD:0x<0142> | ||
Voldenet | a common Polish letter, l with a stroke, is defined as a character of its own, so it would never match l anyhow… | 18:37
that doesn't inspire high confidence in the standard itself
clsn_ | Yeah, Unicode has plenty of st00pid in it. Some of it comes from the fact that encoding letters is just plain more complicated than it sounds, but much of it is... well... yeah, st00pid. | 18:39 | |
They have some tables, I think, for dealing with stuff like what you're talking about in SOME cases, but I'm pretty sure not in that case. Whatever; I'm not here to defend Unicode. I am fully aware of its flaws (some of them; I'm sure it has more I don't know about yet) and will not dispute faults you find in it. | 18:42 | ||
18:49
teatwo joined
18:52
teatime left
[Coke] | I'm late, but if you're looking for the accent, then you probably want a different normalized form (with the combining chars split out), and then look for that. | 19:15 | |
m: say <e á é a>.NFD.grep: 0x0301 | 19:17 | ||
camelia | (769 769) | ||
[Coke] | m: say <e á é a>.map(*.NFD).grep(*.grep: 0x0301).map(*.Str) | 19:19 | |
camelia | (á é) | ||
[Coke] | there you go, that's more useful. | ||
you could replace that inner grep with a \c[] with the combining char's name (or the decimal codepoint) or whatever. | 19:22 | ||
This should also work if any of the graphemes have multiple combining chars. | 19:23 | ||
19:29
derpydoo left
19:48
jpn left
[Coke] | www.perlfoundation.org/the-perl--r...rence.html is only showing last year | 19:49 | |
19:51
jpn joined
clsn_ | It's not an accent, and it isn't like I can list all the letters it might be on. And it isn't an NFC/NFD thing, because it isn't something that can be precomposed anyway. But thanks! | 19:54 | |
[Coke] | then you should be able to see it in the ords for that grapheme, no? | 20:03 | |
(you should be able to skip the NFD step if it doesn't need decomposing, I mean.) | 20:04 | ||
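(A small sketch of [Coke]'s suggestion applied to the Hebrew case, where nothing precomposes, so the NFD step can indeed be skipped: grep the words on their .ords. The two words are taken from the verse quoted earlier.)

    my $verse = "וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם";
    say $verse.words.grep({ 0x0591 (elem) .ords });   # the word carrying the etnahta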
20:09
jpn left
clsn_ | You would think. Hm, so use grep instead of ~~? But is that looking through codepoint by codepoint? Which might not be a bad thing, to be fair. | 20:24 | |
So long as it is done a bit more elegantly than just a for-loop through the whole string! :) | |||
20:32
jpn joined
20:39
jpn left
20:43
jpn joined
20:49
jpn left
20:50
rf left
[Coke] | I think this is a raku bug. Tried to install my own module, App::Unicode::Mangler, and got an error line like: | 21:19 | |
[App::Unicode::Mangle] Please u | |||
[App::Unicode::Mangle] se uniparse instead. | |||
I think something is trying to print "nice" whitespace there and failing. | |||
21:24
perlbot left,
simcop2387 left,
perlbot joined
[Coke] | m: "e̸".ords.say # see, this has the ords already - if it was combinable, you'd get the combined char here. | 21:24 | |
camelia | (101 824) | ||
21:25
simcop2387 joined
21:30
perlbot left
21:33
perlbot joined
21:40
perlbot left,
perlbot joined
21:54
jpn joined
22:01
jpn left
guifa | is nqp big integer the same as a Raku Int? | 22:13 | |
[Coke] | ¡nʞɐɹ# 'oʃʃǝH | 22:14 | |
No nqp types are exactly the Raku types. | |||
guifa | how can I convert any old Int into a big int for nqp use? I'm trying to find the fastest way to shift the char codes of a string by X | 22:15 | |
22:17
simcop2387 left,
simcop2387 joined
lizmat | guifa: why would you need bigints for that ? | 22:28 | |
guifa | errr, I guess there are actually two separate ops there and my brain is a bit tired hahaha | 22:29 | |
step one is to do some math on big ints (because I don't want to error if numbers are too big)
step two is then to shift the char codes by X | |||
lizmat | how would that look in Raku ? :-) | 22:30 | |
guifa | the second part, $str.ords.map(* + $adjust-value)>>.chr.join | 22:32
I've been testing around to see the fastest method
lizmat | you realize that .ord will only produce the first codepoint of a grapheme | 22:33 | |
guifa | Yeah -- in this case, it's a guarantee that it's a single codepoint | ||
lizmat | ok, check | ||
so, if $adjust-value is 13, you're doing something like a rot13 | 22:34 | ||
guifa | $str.trans( <0 1 2 3 4 5 6 7 8 9> => <a b c d e f g h i j>) is the fastest native Raku method, but has a huge start-up penalty, so unless numbers are regularly 100+ digits, the current winner is $new := $new ~ ($_ + 49).chr for ^$a.ords;
yup | |||
lizmat | m: use nqp; say nqp::strfromcodes("foo".NFC) # does this give an idea ? | 22:39 | |
camelia | foo | ||
Voldenet | `$new := $new ~ ($_ + 49).chr for ^$a.ords` | 22:41 | |
doesn't it malloc for every character? | 22:42 | ||
No idea how can this be faster | |||
lizmat | m: use nqp; my int32 @a; @a.push($_ + 3) for "foo".NFC; say nqp::strfromcodes(@a) | ||
camelia | irr | ||
guifa | my int32 @temp; nqp::strtocodes($str, nqp::const::NORMALIZE_NFC, @temp); @temp[$_] += $adj for ^@temp; $str := nqp::strfromcodes(@temp)
^^ that's basically about 3% faster than the trans method | 22:43 | ||
lizmat | only 3% ? | ||
guifa | that's why I think there should be a faster way | ||
lizmat | += is generally not the fastest | ||
guifa | also when I tried nqp::for(…, …) it says it expects a block, but I give it one | 22:46 | |
lizmat | nqp::for is an interesting beast :-) | 22:48
Voldenet | the faster way would be to use cstring, then avx256 sum it with 0x3131313131313131 | ||
reject sanity, embrace xs | |||
22:51
japhb left,
japhb joined
guifa | Voldenet: ha, yeah. I mean, I get I'm basically doing something that's solving a problem Raku wasn't made to solve hahaha | 22:51 | |
22:52
ugexe left
guifa | It's just killing me I can't speed up number formatting by much more and probably 40% of it is not being able to do math on strings (understandable, Raku abstracts away a lot of that stuff intentionally) and 40% of it is wanting to support arbitrarily large numbers | 22:54
Voldenet | actually I think that stuff like `$str.ords.map(* + 49).map(*.chr).join` could be rewritten into vectorized form | 22:56 | |
22:57
jgaz left
Voldenet | on the optimizer level | 22:57
Nemokosch | not sure if it would help here but did folks officially give up on moving away from libtommath? | 22:59 | |
in MoarVM that is | |||
23:01
ugexe joined
guifa | Thankfully for formatting with West Arabic digits I can skip the rot'ing, but with any others I'll need to add them in (thankfully, that's an easy optimization) | 23:04 | |
23:12
jpn joined
guifa | okay this is ugly as sin but it's def faster | 23:12 | |
nqp::strtocodes($str, nqp::const::NORMALIZE_NFC, @temp); nqp::bindpos_i(@temp,$_,nqp::add_i(nqp::atpos_i(@temp,$_),$adj)) for ^@temp; $str = nqp::strfromcodes(@temp) | 23:13 | ||
23:17
derpydoo joined
23:18
jpn left
guifa | oh nice | 23:19 | |
changing out that for ^@temp with a my int32 $temp = nqp::elems(@temp); while($temp--, { ^^that mess up there }); knocks off another 15-20% | 23:20
[Coke] | is there a way to find out if your grapheme will render? | 23:21 | |
m: "d͖̤ᷛ᷼f͚ͯᷬ̒ ".uninames.say | 23:22 | ||
camelia | (LATIN SMALL LETTER D COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW COMBINING DIAERESIS BELOW COMBINING LATIN LETTER SMALL CAPITAL G COMBINING DOUBLE INVERTED BREVE BELOW LATIN SMALL LETTER F COMBINING DOUBLE RING BELOW COMBINING LATIN SMALL LETTER… | ||
[Coke] | in my local terminal, that's a box with a ? in it. It's valid unicode, but my terminal can't display it. | 23:23 | |
guifa | not from Raku at least -- you'd need to come up with some way to query the terminal, know what font it will use, and then figure out if the font has that character in its inventory | 23:24 | |
I think dwarren has some modules for the font side of stuff | 23:26
[Coke] | s͔᷹o̟ᷔ ̵̢ę͚a̴̔s᷻́y͖ᷗ ̟᷽t̝̦o̵͡ ̯᷍gᷙ᷅o᷆̽ ̠ᷦoᷖᷪfᷧ̀f᷺ᷝ ᷇̏t̜̊hᷗ͘e̠͑ ̱ᷰṟᷮa᷇ͪíͭl̲ᷤs̩͍ | 23:28 | |
so easy to go off the rails | 23:29 | ||
made some slight improvements to github.com/coke/raku-unicode-mangler - at least it doesn't generate invalid characters now, just a lot of unprintables. :) | |||
Nemokosch | 😄 | 23:32 |