🦋 Welcome to the MAIN() IRC channel of the Raku Programming Language (raku.org). Log available at irclogs.raku.org/raku/live.html . If you're a beginner, you can also check out the #raku-beginner channel! Set by lizmat on 6 September 2022. |
00:00
reportable6 left
00:02
reportable6 joined
00:06
simcop2387 left,
perlbot left
00:07
simcop2387 joined
00:08
perlbot joined
rf | So turns out you need === defined for CArray[my_cstruct_type] to work | 00:28 | |
=== (Str $foo, my_cstruct_type $bar) | |||
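(An illustrative sketch of the shape rf describes: a hypothetical CStruct class plus an extra === candidate. The class name, its field, and the exact signature are assumptions made up for illustration; the log does not confirm precisely which candidate NativeCall ends up needing.)

    use NativeCall;

    # Hypothetical stand-in for rf's my_cstruct_type.
    class MyStruct is repr('CStruct') {
        has int32 $.x;
    }

    # The extra === candidate, mirroring rf's "(Str $foo, my_cstruct_type $bar)".
    # Whether this exact signature is what CArray[MyStruct] requires is rf's
    # claim, not something verified here.
    multi sub infix:<===>(Str $a, MyStruct $b) { False }

    my $arr = CArray[MyStruct].new;
    $arr[0] = MyStruct.new(x => 42);
    say $arr[0].x;    # 42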
00:33
derpydoo joined
01:03
xinming left
01:06
xinming joined
01:11
clsn_ joined
02:11
evalable6 left,
tellable6 left,
bloatable6 left,
statisfiable6 left,
benchable6 left,
squashable6 left,
sourceable6 left,
shareable6 left,
bisectable6 left,
releasable6 left,
unicodable6 left,
committable6 left,
quotable6 left,
nativecallable6 left,
reportable6 left,
linkable6 left,
notable6 left,
greppable6 left,
coverable6 left,
quotable6 joined,
nativecallable6 joined,
sourceable6 joined,
linkable6 joined,
notable6 joined
02:12
tellable6 joined,
shareable6 joined,
committable6 joined,
reportable6 joined
02:13
greppable6 joined,
squashable6 joined,
bisectable6 joined,
statisfiable6 joined,
coverable6 joined,
bloatable6 joined
02:14
benchable6 joined,
evalable6 joined,
releasable6 joined,
unicodable6 joined
02:25
MasterDuke left
02:31
rf left
02:38
codesections left
03:31
Xliff left
03:46
swaggboi left
04:05
swaggboi joined
05:05
coverable6 left,
releasable6 left,
benchable6 left,
evalable6 left,
quotable6 left,
bisectable6 left,
shareable6 left,
sourceable6 left,
greppable6 left,
unicodable6 left,
linkable6 left,
reportable6 left,
notable6 left,
committable6 left,
tellable6 left,
bloatable6 left,
squashable6 left,
statisfiable6 left,
nativecallable6 left,
sourceable6 joined,
nativecallable6 joined
05:06
benchable6 joined,
quotable6 joined,
coverable6 joined,
notable6 joined
05:07
releasable6 joined,
bisectable6 joined,
greppable6 joined,
committable6 joined,
unicodable6 joined,
squashable6 joined,
evalable6 joined,
statisfiable6 joined,
shareable6 joined
05:08
linkable6 joined,
reportable6 joined,
bloatable6 joined,
tellable6 joined
05:11
wbvalid joined
05:18
wbvalid left
05:23
jpn joined
05:28
jpn left
06:00
reportable6 left
06:02
reportable6 joined
06:49
teatime joined
06:52
teatwo left
07:16
Sgeo left
07:29
Max51 joined
07:30
Max51 left
07:45
jpn joined
08:04
jpn left
08:06
jpn joined
08:24
jpn left
08:35
abraxxa joined
08:37
simcop2387 left
08:38
simcop2387 joined,
perlbot left
08:40
perlbot joined
09:04
discord-raku-bot left
09:05
discord-raku-bot joined
09:41
ab5tract joined
09:45
jpn joined
tbrowder__ | g'day, all. does anyone have a working workflows/windows.yml for modules on github? | 11:02 | |
11:13
linkable6 left,
evalable6 left
11:14
linkable6 joined,
evalable6 joined
Nemokosch | wouldn't bet my life on that, good sir. But hope dies last | 11:16 | |
tbrowder__ | 👍🏻 | 11:48 | |
11:50
petro-cuniculo joined
11:53
gcd left
11:57
petro-cuniculo left
12:00
reportable6 left
12:03
reportable6 joined,
abraxxa left
13:03
linkable6 left,
evalable6 left
13:06
evalable6 joined,
linkable6 joined
13:43
rf joined
rf | Morning folks | 13:43 | |
13:44
jgaz joined
13:46
jpn left
13:49
jpn joined
Anton Antonov | @rf Morning, you, Haskell apologist ! | 13:50
And monad-promoter… | 13:52 | ||
Voldenet | Promises are monads and they're everywhere | 13:53 | |
monad-ish | 13:54 | ||
rf | Anton :P | 13:56 | |
13:57
jpn left
14:00
jpn joined
14:05
jpn left
Voldenet | say (await Promise.kept(Promise.kept(42))).WHAT | 14:10 | |
evalable6 | (Promise) | ||
Voldenet | this looks more monadish than the js impl, which would just return 42 in that case
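(For reference, a minimal sketch of the difference Voldenet is pointing at: Raku does not auto-flatten nested Promises the way JS does, so each await unwraps exactly one layer.)

    my $nested = Promise.kept(Promise.kept(42));
    say (await $nested).WHAT;   # (Promise) - the inner Promise, not 42
    say await await $nested;    # 42 - a second await unwraps the inner one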
14:10
jpn joined
[Coke] | would appreciate if someone could review the "is it a bug" question in github.com/Raku/doc/issues/4271 | 14:18 | |
lizmat | my question would be: did it recently change, or has it always been this way? | 14:20 | |
14:20
abraxxa-home joined
Nemokosch | why would if ever topicalize? 🤔 | 14:22 | |
lizmat | yeah, it feels like an implementation detail | ||
Nemokosch | > The with statement is like if, but tests for definedness rather than truth, and it topicalizes on the condition, much like given: | 14:26 | |
so sounds like the documentation contradicts itself | 14:27 | ||
> You may intermix if-based and with-based clauses. this is the interesting part... | 14:28 | ||
m: if 0 { .say } orwith Nil { .say } else { .say } | |||
Raku eval | Nil | ||
Nil | 14:29 | ||
Nemokosch | perhaps this is what it's trying to say | ||
m: if 0 { .say } orwith Nil { .say } elsif 12 { .say } else { .say } | 14:30 | ||
Raku eval | (Any) | ||
Nemokosch | this seems surprising to me, though | ||
the former else clause ran, as an elsif clause, and this time it un-topicalized | |||
14:32
jpn left
14:34
jpn joined
rf | Voldenet: Monads are a container with a map and bind | 14:36 | |
(and return) but that isn't super important | |||
14:39
jpn left
rf | Not sure if promise fits it perfectly | 14:39 | |
dutchie | do you not need return to do the bind/join equivalence | 14:40 | |
Woodi | rf: but monad-ish can mean "closure" too ;)
14:41
simcop2387 left
14:42
perlbot left,
perlbot_ joined,
simcop2387 joined
rf | Woodi: Not sure what you mean by that | 14:42 | |
Woodi | rf: just trying to abuse meanings because of some similarities :) | 14:43
not even sure what "bind" is, too lispy :) | |||
14:43
perlbot_ is now known as perlbot
rf | bind : M a -> (a -> M b) -> M b | 14:43 | |
Woodi | so M is the domain of values ? | 14:44
and which part is the result ?
but assumed functions... | 14:45 | ||
rf | M is a monad, a is the type held within the monad | ||
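(A self-contained sketch of the signature rf is spelling out, bind : M a -> (a -> M b) -> M b, using a made-up Maybe-style container; the class and method names here are purely illustrative, not an existing Raku API.)

    class Maybe {
        has $.value;
        has Bool $.is-just = False;

        # "return"/unit: wrap a plain value into the container
        method unit(Maybe:U: $v) { self.new(value => $v, is-just => True) }

        # bind : M a -> (a -> M b) -> M b
        # unwrap the value and feed it to &f, which itself returns a Maybe
        method bind(&f) { $!is-just ?? f($!value) !! self }

        # map : M a -> (a -> b) -> M b, defined in terms of bind
        method map(&f) { self.bind(-> $v { Maybe.unit(f($v)) }) }
    }

    say Maybe.unit(20).bind(-> $x { Maybe.unit($x + 1) }).map(* * 2).value;   # 42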
Nemokosch | let's keep it simple | 14:48 | |
which operation returns a monad, and which a value? | |||
rf | Bind always returns a monad | 14:49 | |
Woodi | why is it doubled ? a -> M b -> M b ? | 14:50
rf | (a -> M b) is another function | 14:51 | |
Woodi | then what does -> mean ? | 14:52
rf | en.wikipedia.org/wiki/Partial_application | 14:53 | |
exp | lol a wikipedia page on computer science is not going to make things any more understandable | 14:54 | |
they unironically care only about number of facts expressed, not how many people understand what's written | |||
Woodi | so bind is a function that returns a monad that changes values into ... ? | 14:55
I thought of bind in Lisp as some kind of pointer... | 14:56
Nemokosch | oh | ||
14:56
jpn joined
so bind is the one that takes a function that constructs the new monad directly | 14:56 | ||
dutchie | if we stick to just talking about promises, bind corresponds to the then method
Nemokosch | the then method, when you directly return a Promise in the callback | 14:57 | |
dutchie | yeah exactly | ||
the callback is the a -> M b | |||
Woodi | partial application describes currying ? | 14:58
dutchie | the invocant is the M a which gets "unwrapped" and fed into the callback | ||
14:58
perlbot left,
simcop2387 left
Nemokosch | Promise.resolve(42).then(x => { const funky = Math.random()*x; return Promise.resolve(funky); }) | 14:58 | |
in pseudocode that absolutely isn't Javascript ^^ | 14:59 | ||
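(A rough Raku counterpart of that pseudocode, as a sketch only: Raku's .then hands the callback the finished Promise itself, so the value comes out via .result, and a Promise returned from the callback is not flattened the way JS flattens it.)

    my $chained = Promise.kept(42).then(-> $prev {
        my $funky = $prev.result.rand;   # roughly Math.random() * x
        $funky;                          # plain value => becomes the result of $chained
    });
    say await $chained;                  # some number in 0..42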
dutchie | Woodi: they are closely related yes. a "curried" function takes multiple args by returning another function with those args "partially applied" | ||
some people are more precise than others in keeping the two terms distinct | |||
Woodi | dutchie: my math teacher said: understand and then memorize or memorize and then understand :) | 15:00 | |
Nemokosch | yeah I guess think of Haskell | 15:01 | |
from what I know, Haskell only has functions that take one argument | 15:02 | ||
Woodi | so it looks like currying uses partial application, or even is p.a. ...
Nemokosch: only one ? crazy :) | 15:03 | ||
tellable6 | Woodi, I'll pass your message to Nemokosch | ||
Nemokosch | I don't know Haskell syntax but going by this logic, a function that "takes several parameters" would be called like f(1)('asd')(True)
15:04
perlbot joined
where f would return a new function that would return a new function that would return.... you get the idea | 15:04 | ||
Woodi | sounds in order ;) | ||
15:05
simcop2387 joined
Nemokosch | and on each call, the current parameter is built into the returned function | 15:05 | |
at which point it's just a matter of approach if you say "it has n unbound variables" or you say it's an nth order function | 15:06 | ||
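(A small sketch of the currying/partial-application distinction being discussed, using only core Raku; the function names are illustrative.)

    # Curried by hand: each call takes one argument and returns a new
    # function with that argument baked in.
    my &f = -> $a { -> $b { -> $c { "$a $b $c" } } };
    say f(1)('asd')(True);               # 1 asd True

    # Partial application of an ordinary sub, via .assuming:
    sub add3($a, $b, $c) { $a + $b + $c }
    my &add-one-two = &add3.assuming(1, 2);
    say add-one-two(39);                 # 42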
15:10
Sgeo joined
15:11
tbrowder_ joined
rf | Nemo bind will "unwrap" the first monad and feed the unwrapped value to a new function (the second parameter) which returns a new monad | 15:12 | |
github.com/rawleyfowler/Monad-Resu...lt.rakumod | |||
^ That repo implements a monad if you're interested, Woodi | 15:14
Also Nemo you are correct a function call in Haskell is like f(foo)(bar)(baz) | 15:15 | ||
Anton Antonov | @Voldenet "Promises are monads and they're everywhere" -- you are on record, I will verify the monad axioms on promises (and be vocal if you wrong.) | 15:21 | |
@Voldenet "monad-ish" -- nice escape (from rigorous feedback.) | 15:22 | ||
rf | Hahahaha | 15:23 | |
15:25
grondilu joined
Anton Antonov | @rf I considered working on a post that criticizes your monad approach. Decided to postpone it indefinitely. | 15:25
rf | I am not opposed to counter ideas, though, I haven't heard a compelling one against monads yet. | 15:26 | |
Voldenet | I once said something like that about js, as in "ye it's mostly monads", but then it wasn't using composition properly
because then(a).then(b) is different depending on whether return value is Promise or not | 15:27 | ||
rf | then is map | ||
Voldenet | from then(x=>a(b(x))) | ||
Woodi | rf: checking | ||
Anton Antonov | @rf My point of view on monads is how much a monadic system (e.g. a Raku package) makes the code written with it have algebraic properties.
Voldenet | hence my test above | ||
say (await Promise.kept(Promise.kept(42))).WHAT | |||
evalable6 | (Promise) | ||
Voldenet | it's at least not as bad as js | 15:28 | |
rf | Anton: I am more interested in abstracting side-effects than algebraic properties | ||
It benefits the consumers of code to use Monads as well so you can describe the intent of the code | 15:29
Voldenet | I bet you can sort of do algebraic effects in raku if you like pain | 15:30 | |
Anton Antonov | @rf Sure. But, I leverage the algebraic properties when I make translations from natural language DSLs into programming language DSLs. (And vice-versa.) Hence, the algebraic properties for me are important. | ||
Voldenet | and .throw/.resume combo | ||
Anton Antonov | @Voldenet Duly noted.
Nemokosch | then is kind of both bind and map, from what I understand | 15:31 | |
rf | I really dislike exceptions, which is why I made Monad::Result, I think it's very gross to make the caller decipher what possible exceptions can be thrown
then is just map, map : M a -> (a -> b) -> M b | |||
Nemokosch | well, then join me on the dark side and let's dislike control exceptions together 😛 | 15:32 | |
rf | CATCH { default } on every block is just as bad IMO | ||
Plus it's not enforced or implied so uncaught exceptions are far too common | |||
Nemokosch | false negatives are worse than false positives with this really | 15:33 | |
when you only see that some of your assumptions didn't hold | |||
Anton Antonov | @rf You and @Nemokosch must be on the same gray side. (Or same far side gallery.)
Woodi | rf: "exceptions" looks like "sudden explosions" :) but concept of shortcuts in execution flow should be usefull... if we have good behaving code like calculations... | 15:34 | |
Anton Antonov | @rf @Voldenet Here is a (very schematic) flowchart of my monads-for-DSLs workflow: raw.githubusercontent.com/antononc...agents.jpg | ||
rf | That is an interesting approach | 15:36 | |
Woodi: Most software I write needs to be triple redundant and have 0 exceptions, thus why I prefer monads over exceptions. Shortcuts can simply be expressed as function composition assuming the types align | 15:37 | ||
Which is also one of the main concepts behind Humming-Bird ^ | |||
Voldenet | m: class Effect is Exception { has $.x is rw; }; CATCH { when Effect { .x = 42; .resume; } }; my $x = Effect.new; $x.throw; say $x.x | 15:38 | |
camelia | 42 | ||
Voldenet | I'm begging you, don't use the above thing | ||
it sort of works though | |||
rf | Anton you are the first ML person I have heard even use the word Monad :D | 15:39 | |
Anton Antonov | @rf Basically, I can use "your" monads if I can put the operations in a reduce statement. For example, reduce(&my-monad-bind, my-monad-unit-object(), [&some-op1, &some-op2, &some-op3, &take-value]).
rf | Yes that should work | 15:40 | |
Anton Antonov | @rf Ok, good. (Meaning, "you are on record and will try to verify.") | 15:41 | |
@rf ML people use monads, but they do not know and/or use the terminology. | 15:42 | ||
rf | As long as your ops are -> M a -> (a -> M b) -> M b | ||
Voldenet | most people use monads and algebraic effects | ||
Anton Antonov | @rf Right, the associativity rule. | ||
Voldenet | they just buried in N layers of their language abstraction | ||
s/they/they're/ | 15:43 | ||
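(A self-contained sketch of the reduce-over-bind idea Anton describes above, using kept Promises as the container; &bind, &unit and the ops are illustrative names, not an existing API.)

    sub bind(Promise $m, &f --> Promise) {
        # unwrap the settled value and feed it to &f, which returns a new Promise
        f($m.result);
    }

    my &unit = { Promise.kept($_) };
    my @ops  = { Promise.kept($_ + 1) },
               { Promise.kept($_ * 2) },
               { Promise.kept($_ - 4) };

    say reduce(&bind, unit(22), |@ops).result;   # 42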
Anton Antonov | @Voldenet Most Data Science people do not want to program. So, whatever simplifications are used to make the required work more palatable. | 15:44 | |
Voldenet | I remember showing my data sci. code to a data scientist, he scratched his head and said he didn't get the code :/ | 15:46
(I tried to abstract away data science part so I could get to my programming one…) | |||
Anton Antonov | @Voldenet Right, hence, I make/use natural language DSLs for Data Science. They still say the same. | 15:47 | |
Voldenet | that makes sense | ||
Anton Antonov | Hopefully, I am not overestimating the interest in this -- here is an example of data wrangling Python code generation from sequences of natural language commands: github.com/antononcube/RakuForPred...thon.ipynb | 15:51 | |
Or, if you prefer, the Raku code results version: github.com/antononcube/RakuForPred...Raku.ipynb | 15:52 | ||
15:57
perlbot left
15:58
simcop2387 left
16:01
perlbot joined
16:02
simcop2387 joined
clsn_ | So. Haven't worked with raku in a *long* time, and some things have changed. Right now, I can't see how it's possible to make a regex that matches a *combining* character (or set thereof). I can only match base characters and specify combining characters on them if I want, but I'm searching for the actual combining character which may be on any of many bases (and may even have other combining chars with it.) | 16:54 | |
This is not an unrealistic request, by the way. Not everything is like é where the accent isn't something you'd want to search for without the letter. I'm working with Hebrew cantillation marks, which are like punctuation that happen to be written as combining characters. | 16:56 | ||
m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ /\x[0591]/; | 16:59 | ||
camelia | Nil | ||
17:02
linkable6 left,
evalable6 left
17:03
linkable6 joined
17:04
evalable6 joined
Nemokosch | strings are normalized according to NFC | 17:09 | |
clsn_ | Yes, which is fair enough... But rakudo, from what I've seen, matches stuff according to its "NFG". How might I write a regex that can match the 0591 in that string? NFC vs NFD isn't really relevant; none of the characters there are or can be precomposed. | 17:11 | |
Nemokosch | I'd expect a regex to operate on the level of characters, not codepoints | 17:14 | |
clsn_ | Well, from a Unicode perspective, \x[0591] is a character, so I'm not sure what you mean. If you mean by graphemes, that sort of presumes that it doesn't make sense to search for an \x[0591] because it is written as a diacritic, yet that makes just as little sense as saying that it doesn't make sense to search for a comma in a sentence. | 17:15 | |
Nemokosch | I'm not sure if it's still a character after NFC | 17:17 | |
but sure thing, definitely not a grapheme, and a high-level string has characters as graphemes | |||
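(A quick illustration of the grapheme-vs-codepoint distinction being drawn here, using the word from clsn_'s example.)

    my $s = "עֵֽינֵיכֶ֑ם";
    say $s.chars;               # graphemes - what a regex "." sees
    say $s.codes;               # codepoints - more, because of the marks
    say 0x0591 (elem) $s.ords;  # True: the mark is there at codepoint level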
17:18
codesections joined
clsn_ | NFC, as I understand it, is "combine everything that can be combined into precomposed characters," and nothing in the example string can make up a precomposed character. Am I misunderstanding you? | 17:18 | |
Well, then, how would I write a grammar to search for it? It may not be written as a spacing character, but it is exactly as reasonable to search for it as it is to search for a comma or semicolon in English text. | 17:19 | ||
Nemokosch | This is probably beyond me. There is stuff like this docs.raku.org/type/Uni.html | 17:23 | |
but whether it works with regex stuff, no clue | 17:24 | ||
clsn_ | So I could convert it to something more unicode-ish, but can I then use regex--- I see. | ||
This is actually for pretty much the ONLY program I've ever written in rakudo, apart from contributions I made to the actual project. And it *used* to work. Many years ago. | 17:25 | ||
17:26
cfa joined
cfa | bisectable6: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ /\x[0591]/; | 17:26 | |
bisectable6 | cfa, Will bisect the whole range automagically because no endpoints were provided, hang tight | ||
cfa, ¦6c (67 commits): «Nil» | |||
cfa, Nothing to bisect! | |||
Nemokosch | that must have been a lot of years ago for sure | 17:27 | |
probably prior to MoarVM, and MoarVM has been the state-of-the-art runtime since like 2013
clsn_ | It was a REALLY long time ago; I'm not sure I can find quite how long it was. Eh, I probably have logs someplace... | ||
Nemokosch | anyway, now I'm not convinced that it is intended to work | 17:28 | |
clsn_ | Yeah, the latest commit in my repo is from December 2011. | ||
Nemokosch | there is a candidate for smartmatching Uni against Regex github.com/rakudo/rakudo/blob/2022...ex.pm6#L47 but it basically converts to Str and calls it a day | 17:29 | |
clsn_ | It may or may not be "right" for it to work *as stated*, but I think there definitely needs to be some way to make it work, or you're really missing something important. | ||
i.e. converting to some form or another that regex-matches on codepoints or something like that. | 17:30 | ||
Nemokosch | what I doubt, though, is that this is high-level enough to fall into regex territory | ||
clsn_ | s/i\.e\./e.g./ (can't believe I used \ for that...) | ||
Nemokosch | yeah that sounds horrible tbh, to replace a part of a grapheme | 17:31 | |
clsn_ | Well, I still contend that if regexes can't do it in any fashion, then you're failing to capture or make available something very important and not unreasonable for people to want to do. I present my own program as evidence of that (granted, one might argue that I only barely qualify as "people"...) | 17:32 | |
I could easily see someone studying Hebrew or Arabic doing searches for vowel-patterns (which indicate grammatical forms). | |||
Nemokosch | well I'm just saying that it perhaps doesn't fall into regex territory | ||
clsn_ | and the Hebrew Bible cantillations are part of a well-understood and well-defined grammar. | 17:33 | |
Not certain what that really means, or if that answers. You can do that, you just can't use regexes for it? And yet it's matching patterns in a string of characters, isn't that what regexes are supposed to do for a living? Why should someone have to write up their own homegrown regex-matcher just for certain kinds of characters? | 17:34 | ||
Nemokosch | they are not "characters" on Str level | 17:36 | |
clsn_ | My program from way back when would parse a Biblical sentence according to the structure of sentential breaks encoded by the cantillations and output a tree graph in dot format. That's parsing text with a grammar. | ||
Nemokosch | And like, regex is not meant for just any kind of pattern matching. For example, you can't just arbitrarily match binary patterns in the unicode representation | 17:37
I mean, sorry for your loss | |||
cfa | here's another example, | 17:38 | |
m: say "u\x[0308]" ~~ /\x[0308]/ | |||
camelia | Nil | ||
clsn_ | web.meson.org/cache/Esth:8:9.png | ||
Nemokosch | But I'm not convinced that this is a problem with the regex itself, as it clearly works on the principle that a character is a grapheme | ||
clsn_ | I can see that this is a limitation of the way rakudo has chosen to define strings and regexes. But I wonder if that choice is defensible in the face of, well, not being able to do exactly what regexes and grammars are supposed to do. | 17:39 | |
Nemokosch | frankly I don't know about Unicode enough to understand what makes a "combining character" a "character", in this jargon | ||
Again, I don't think regexes (let alone grammars) are supposed to dig this deep | 17:40 | ||
clsn_ | Eh, that's because "character" sounds like it should be some graphical unit, i.e. a grapheme, so it's hard to see a combining character as one. | ||
Voldenet | if you don't mind a performance hit then
Nemokosch | So what is it exactly, that it isn't just called a codepoint? | ||
clsn_ | But whyever not? As I said, it's a very reasonable thing to ask a grammar to do. | ||
Voldenet | m: my $x="עֵֽינֵיכֶ֑ם"; say 0x591 (elem) $x.ords; | 17:41 | |
camelia | True | ||
Nemokosch | You said so yes but it didn't sound any different from saying that grammars are for binary inspection. | ||
clsn_ | In Unicode parlance, character and codepoint can be almost interchangeable. Indeed, I understand what you mean about having trouble seeing it as a character, but coming from a more Unicode-centric POV myself, I find the opposite to be true. | ||
Nemokosch | Also, you earlier made the distinction from é. ("Not everything is like é where the accent isn't something you'd want to search for without the letter.") | 17:42 | |
what backs this distinction up, that could be somehow integrated? | 17:43 | ||
clsn_ | I don't know. Binary patterns are not regex-fodder because they don't generally have structural meaning that's useful for pattern-matching in most strings. Combining characters do. I guess there's some fuzziness in that argument. | ||
Ah, that's a better question... | |||
Nemokosch | Yes, this whole fuzziness | ||
clsn_ | OK, let's see if I can explain what I mean by that, and maybe I'm wrong about the distinction as well... | ||
Nemokosch | that even though "combining characters" fall back into being codepoints and hence just binary data specified by Unicode, they can matter on textual level sometimes apparently | 17:44 | |
clsn_ | An é is, in a sense, a letter in itself. That's (kinda) why it has a precomposed codepoint, or at least why it was thought at some point to be worth encoding precomposed and Unicode inherited it. And even if considered as an e plus an acute accent, there's nothing in common between e+acute and a+acute. They're independent of one another. | 17:45 | |
Nemokosch | > é oof | 17:46 | |
clsn_ | It's not like it's completely impossible, but it would be an odd situation wherein you'd want to search for words with 3 or more accents or something. | ||
Do my unicode chars not come through okay? | |||
Nemokosch | not really. I mean, this is just universally sad. Here we are in 2023 and the best we could get is like, semi-cover fairly similar languages in IT | 17:47 | |
Voldenet | The problem is that one grapheme can be represented by multiple codepoints | 17:48
Nemokosch | anyway. What I think should (and might?) exist is still something like "capture this letter containing codepoint XYZ" | ||
clsn_ | OTOH, Hebrew and Arabic vowels, for example, or even Devanagari combining vowel marks, are more related to themselves and each other than to the letters they are on. á and é have nothing in common, particularly, but का and गा rhyme, both might represent similar grammatical constructions, etc. | 17:49 | |
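(The precomposition point from a few lines up, shown with .ords: Raku normalizes strings to NFC, so e plus a combining acute collapses into the precomposed é, while a Hebrew letter plus a cantillation mark has no precomposed form and stays as two codepoints inside one grapheme.)

    say "e\x[0301]".chars, " ", "e\x[0301]".ords;                # 1 (233)
    say "\x[05D9]\x[0591]".chars, " ", "\x[05D9]\x[0591]".ords;  # 1 (1497 1425)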
17:49
cfa left
clsn_ | Ideally not "containing codepoint XYZ" but "containing a regexp(?) of these codepoints" or at the very least "containing a codepoint out of this set". | 17:49 | |
Nemokosch | ngl this also sounds to me that Unicode itself is either misunderstood or contains problematic concepts | 17:51 | |
teatime | it is complex for sure | ||
clsn_ | It's even more so in Hebrew and Arabic. A word that is CONSONANT + QAMATS(05B3) + CONSONANT + PATAH(05B7) + CONSONANT is very distinctly third-person singular masculine past tense, simple construction. | ||
Nemokosch | like, if this \x[0591] is so useful on its own and an acute accent isn't, why aren't they distinguished on any conceptual level? | 17:52 | |
clsn_ | I don't need to know what the consonants are, but that's what that word means (there are exceptions and phonological concerns and blahblahblah but to first approximation.) | ||
0591 represents the chief sentential pause in the middle of a Biblical verse. | |||
web.meson.org/cache/Esth:3:12:.svg is an even more extreme example (the longest verse in the Hebrew Bible) | 17:55 | ||
The cantillations define and determine that tree. Just as one might parse an English sentence on periods and commas and semicolons (but the cantillations are more precisely-defined and fine-grained.) | |||
From a Unicode perspective, I guess combining characters are combining characters (they do have combining classes, though), and they don't try to distinguish ones which are more or less important than others, probably because they're not suppressing the ones of lesser importance. But here, NFG *does* "suppress" them, in some sense, in that you can't conceive of them without their bearers, and that sucks in the ones that have independent meaning as well. | 17:59 | ||
17:59
abraxxa-home left
18:00
reportable6 left
18:01
reportable6 joined
clsn_ | For that matter, I don't think you can even search for "some hebrew letter followed by a TSERE" or whatever (i.e. use a character class for the base.) | 18:03 | |
Nemokosch | my point is that if they are so important, perhaps they should stand on their own, just like nobody would pretend that a comma or a dot is a combining character | ||
or any punctuation for that matter | |||
lizmat | clsn_: :ignoremark ? | ||
Voldenet | probably ignoremark won't work | 18:04 | |
lizmat | docs.raku.org/language/regexes.html#Ignoremark | ||
why wouldn't it ? | |||
clsn_ | I tried ignoremark. | ||
lizmat | example? | ||
clsn_ | That ignores the mark. But I don't want to ignore the mark! I want to search for a specific mark!! | ||
Voldenet | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1497 (elem) $/.ords }> / | 18:05 | |
camelia | 「י」 | ||
lizmat | well, then search for the char with :ignoremark, and then check whether it is followed by a TSERE ? | ||
Voldenet | there's more than one way to do what you want | ||
<?{ }> is not very elegant solution, but a solution | |||
Nemokosch | a not very elegant solution to a not very elegant task 😅 | 18:06 | |
Voldenet | in fact | ||
m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1497 == $/.ord }> / | |||
camelia | 「י」 | ||
clsn_ | Maybe they should stand on their own. But Unicode considers combiningness from the point of view of graphics, not semantic sense. By adopting that, rakudo has placed ALL the combining characters in the same bucket. If there's a distinction that should be made, it will need to be made in rakudo. | ||
Nemokosch | > But Unicode considers combiningness from the point of view of graphics, not semantic sense. Holdya holdya. So far, all you said was how you have the Unicode perspective. | 18:07 | |
Voldenet | current combining characters situation is probably a tradeoff, since combining characters turn elegant constant-time algos into monsters | ||
clsn_ | I can certainly search codepoint-by-codepoint and find the characters I'm looking for. But then, once more, didn't God create regexes precisely to do this kind of job? I'm looking for the word that contains a \x[0591] in a string of words. How can I do that? | ||
Voldenet | but the above one _is_ the regex | 18:08 | |
… :) | |||
Nemokosch | the only problem with it is that it's slow-ish, really | ||
Voldenet | you can compose it and put more regexes in it | ||
clsn_ | That's how I understand what I think Unicode is doing; maybe I'm wrong about that. | ||
I'm sorry, I'm not seeing how that's working. Expecially since the thing you're matching is a letter without any diacritics. | 18:09 | ||
Nemokosch | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1425== $/.ord }> / | 18:10 | |
Raku eval | Nil | ||
Nemokosch | meh, why ord | ||
clsn_ | Here... here's the whole verse. Please tell me a regex I can use to find the word with the 0591 under it: "כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃" | ||
Nemokosch | m: my $x="עֵֽינֵיכֶ֑ם"; say $x ~~ / .<?{ 1425 (elem) $/.ords }> / | ||
Raku eval | 「כֶ֑」 | ||
Nemokosch | this was the better one | ||
clsn_ | That's the right letter, yes. Maybe one can do this after all? Placing other dummy letters around it? | 18:11 | |
Nemokosch | this literally does "take the letter and check what it's made of" | ||
clsn_ | (It's Genesis 3:5, btw; I just picked it arbitrarily when trying this out.) | ||
hm. so then could I say... | 18:12 | ||
Nemokosch | in either case, thank you for the journey at least | ||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~ / (\w<?{ 1497 (elem) $/.ords }>) / | ||
camelia | 「י」 0 => 「י」 |
Voldenet | perhaps this, but my terminal outputs it all as spaces | ||
Nemokosch | I wouldn't have thought for the life of me that something that has zero length can be this significant | ||
Voldenet | that… doesn't help | ||
Nemokosch | funky, it turned backwards | 18:13 | |
clsn_ | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}<:Lo>*/;' | 18:14 | |
camelia | ===SORRY!=== Error while compiling <tmp> Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1) at <tmp>:1 ------> ay $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}⏏<:Lo>*/;' … |
clsn_ | bah, sorry, my rakudo regex-fu is very weak, it's been a looong time. | ||
The "turning backwards" is probably an artifact of the Bidi algorithm at work in your terminal, which is the cause of much headache and profanity. | 18:15 | ||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~/<:Lo>*.<?{ 1497 (elem) $/.ords}><:Lo>*/;' | ||
camelia | ===SORRY!=== Error while compiling <tmp> Unable to parse expression in single quotes; couldn't find final "'" (corresponding starter was at line 1) at <tmp>:1 ------> :Lo>*.<?{ 1497 (elem) $/.ords}><:Lo>*/;'⏏<EOL> expecting … |
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; say $x ~~ /<:Lo>*.<?{ 1497 (elem) $/.ords }><:Lo>*/; | 18:16 | |
camelia | 「כִּ֚י יֹדֵ֣עַ」 | ||
Voldenet | apparently it works | ||
clsn_ | Not really, it's the wrong word.
Still, it's catching a whole word... um, a whole PAIR of words... which is... is it better than just a letter? | 18:17 | ||
Wait, 0591 is 1425, not 1427 | |||
Voldenet | right :D | 18:18 | |
clsn_ | 1427 is HEBREW ACCENT SHALSHELET, 0593, which is a VERY rare cantillation and certainly not found in this verse. | ||
You can write 0x0591, right? With hex notation? That'll be less confusing. | |||
Voldenet | m: my $x="כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּיוֹם֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִהְיִיתֶם֙ כֵּֽאלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָרָֽע׃"; my regex etnahta { .<?{ 1425 (elem) $/.ords }> }; say $x ~~ /<:Lo>*<etnahta><:Lo>*/; | 18:19 | |
camelia | 「עֵֽינֵיכֶ֑ם」 etnahta => 「כֶ֑」 |
Voldenet | you could simply do this | ||
it's probably more sane when you want to compose it | |||
you can use 0x591 if you want, the `{ 1425 (elem) $/.ords }` is just regular Raku code | 18:20
clsn_ | Ugh, hard to read because of the Bidi stuff. But still. That... looks right, actually. | 18:22 | |
Still smacks slightly of not-ideal, but requiring you to use a subroutine just to pick out the combining character you want isn't THAT unreasonable. (though actually, I need to be able to check for any member of a *set* of combining characters, but that's probably generalizable from this.) | 18:24 | ||
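(A sketch of the generalization clsn_ asks for: accept any member of a set of marks instead of a single codepoint. The set below only holds the two accents named earlier in the log and is purely illustrative.)

    my @marks = 0x0591, 0x0593;   # etnahta, shalshelet
    my regex marked { . <?{ so @marks (&) $/.ords }> }
    my $x = "עֵֽינֵיכֶ֑ם";
    say $x ~~ / <:Lo>* <marked> <:Lo>* /;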
Voldenet | I didn't test this for performance, maybe some form of checking substrings of .encode would've been faster | 18:25 | |
clsn_ | What you have here is maybe clumsier than it once was, but still has some elegance, thank you. | ||
Meh, I'm not terribly fussed about performance. Computers are fast enough that even slow for them is still fast, when dealing on the scale and number of instances I'm worried about. | 18:26 | ||
I'll have to see if/how I can shoehorn this in to my old program, but it looks like a promising path. | 18:27 | ||
18:28
grondilu left
clsn_ | Anyway, so thanks very much, and maybe it's something for you to ponder as well. | 18:35 | |
Voldenet | I've sort of given up on expecting much from unicode | 18:36
m: "ł".NFD.say | |||
camelia | NFD:0x<0142> | ||
Voldenet | a common Polish letter, l with a stroke, is defined as a character of its own, so it would never match l anyhow… | 18:37
that doesn't inspire high confidence in the standard itself
clsn_ | Yeah, Unicode has plenty of st00pid in it. Some of it comes from the fact that encoding letters is just plain more complicated than it sounds, but much of it is... well... yeah, st00pid. | 18:39 | |
They have some tables, I think, for dealing with stuff like what you're talking about in SOME cases, but I'm pretty sure not in that case. Whatever; I'm not here to defend Unicode. I am fully aware of its flaws (some of them; I'm sure it has more I don't know about yet) and will not dispute faults you find in it. | 18:42 | ||
18:49
teatwo joined
18:52
teatime left
[Coke] | I'm late, but if you're looking for the accent, then you probably want a different normalized form (with the combining chars split out), and then look for that. | 19:15 | |
m: say <e á é a>.NFD.grep: 0x0301 | 19:17 | ||
camelia | (769 769) | ||
[Coke] | m: say <e á é a>.map(*.NFD).grep(*.grep: 0x0301).map(*.Str) | 19:19 | |
camelia | (á é) | ||
[Coke] | there you go, that's more useful. | ||
you could replace that inner grep with a \c[] with the combining char's name (or the decimal codepoint) or whatever. | 19:22 | ||
This should also work if any of the graphemes have multiple combining chars. | 19:23 | ||
19:29
derpydoo left
19:48
jpn left
[Coke] | www.perlfoundation.org/the-perl--r...rence.html is only showing last year | 19:49 | |
19:51
jpn joined
clsn_ | It's not an accent, and it isn't like I can list all the letters it might be on. And it isn't an NFC/NFD thing, because it isn't something that can be precomposed anyway. But thanks! | 19:54 | |
[Coke] | then you should be able to see it in the ords for that grapheme, no? | 20:03 | |
(you should be able to skip the NFD step if it doesn't need decomposing, I mean.) | 20:04 | ||
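(A small sketch of [Coke]'s suggestion applied to the Hebrew case, where nothing precomposes, so the NFD step can indeed be skipped: grep the words on their .ords. The two words are taken from the verse quoted earlier.)

    my $verse = "וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם";
    say $verse.words.grep({ 0x0591 (elem) .ords });   # the word carrying the etnahta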
20:09
jpn left
clsn_ | You would think. Hm, so use grep instead of ~~? But is that looking through codepoint by codepoint? Which might not be a bad thing, to be fair. | 20:24 | |
So long as it is done a bit more elegantly than just a for-loop through the whole string! :) | |||
20:32
jpn joined
20:39
jpn left
20:43
jpn joined
20:49
jpn left
20:50
rf left
[Coke] | I think this is a raku bug. Tried to install my own module, App::Unicode::Mangler, and got an error line like: | 21:19 | |
[App::Unicode::Mangle] Please u | |||
[App::Unicode::Mangle] se uniparse instead. | |||
I think something is trying to print "nice" whitespace there and failing. | |||
21:24
perlbot left,
simcop2387 left,
perlbot joined
[Coke] | m: "e̸".ords.say # see, this has the ords already - if it was combinable, you'd get the combined char here. | 21:24 | |
camelia | (101 824) | ||
21:25
simcop2387 joined
21:30
perlbot left
21:33
perlbot joined
21:40
perlbot left,
perlbot joined
21:54
jpn joined
22:01
jpn left
guifa | is nqp big integer the same as a Raku Int? | 22:13 | |
[Coke] | ¡nʞɐɹ# 'oʃʃǝH | 22:14 | |
No nqp types are exactly the Raku types. | |||
guifa | how can I convert any old Int into a big int for nqp use? I'm trying to find the fastest way to shift the char codes of a string by X | 22:15 | |
22:17
simcop2387 left,
simcop2387 joined
lizmat | guifa: why would you need bigints for that ? | 22:28 | |
guifa | errr, I guess there are actually two separate ops there and my brain is a bit tired hahaha | 22:29 | |
step one is to do some math on big ints (because I don't want to error if numbers are too big)
step two is then to shift the char codes by X | |||
lizmat | how would that look in Raku ? :-) | 22:30 | |
guifa | the second part, $str.ords.map(* + $adjust-value)>>.chr.join | 22:32
I've been testing around to see the fastest method
lizmat | you realize that .ord will only produce the first codepoint of a grapheme | 22:33 | |
guifa | Yeah -- in this case, it's a guarantee that it's a single codepoint | ||
lizmat | ok, check | ||
so, if $adjust-value is 13, you're doing something like a rot13 | 22:34 | ||
guifa | $str.trans( <0 1 2 3 4 5 6 7 8 9> => <a b c d e f g h i j>) is the fastest native Raku method, but has a huge start-up penalty, so unless numbers are regularly 100+ digits, the current winner is $new := $new ~ ($_ + 49).chr for ^$a.ords;
yup | |||
lizmat | m: use nqp; say nqp::strfromcodes("foo".NFC) # does this give an idea ? | 22:39 | |
camelia | foo | ||
Voldenet | `$new := $new ~ ($_ + 49).chr for ^$a.ords` | 22:41 | |
doesn't it malloc for every character? | 22:42 | ||
No idea how can this be faster | |||
lizmat | m: use nqp; my int32 @a; @a.push($_ + 3) for "foo".NFC; say nqp::strfromcodes(@a) | ||
camelia | irr | ||
guifa | my int32 @temp; nqp::strtocodes($str, nqp::const::NORMALIZE_NFC, @temp); @temp[$_] += $adj for ^@temp; $str := nqp::strfromcodes(@temp)
^^ that's basically about 3% faster than the trans method | 22:43 | ||
lizmat | only 3% ? | ||
guifa | that's why I think there should be a faster way | ||
lizmat | += is generally not the fastest | ||
guifa | also when I tried nqp::for(…, …) it says it expects a block, but I give it one | 22:46 | |
lizmat | nqp::for is an interesting beast :-) | 22:48
Voldenet | the faster way would be to use cstring, then avx256 sum it with 0x3131313131313131 | ||
reject sanity, embrace xs | |||
22:51
japhb left,
japhb joined
guifa | Voldenet: ha, yeah. I mean, I get I'm basically doing something that's solving a problem Raku wasn't made to solve hahaha | 22:51 | |
22:52
ugexe left
guifa | It's just killing me I can't speed up number formatting by much more and probably 40% of it is not being able to do math on strings (understandable, Raku abstracts away a lot of that stuff intentionally) and 40% of it is wanting to support arbitrarily large numbers | 22:54
Voldenet | actually I think that stuff like `$str.ords.map(* + 49).map(*.chr).join` could be rewritten into vectorized form | 22:56 | |
22:57
jgaz left
Voldenet | on the optimizer level | 22:57
Nemokosch | not sure if it would help here but did folks officially give up on moving away from libtommath? | 22:59 | |
in MoarVM that is | |||
23:01
ugexe joined
guifa | Thankfully for formatting with West Arabic digits I can skip the rot'ing, but with any others I'll need to add them in (thankfully, that's an easy optimization) | 23:04 | |
23:12
jpn joined
guifa | okay this is ugly as sin but it's def faster | 23:12 | |
nqp::strtocodes($str, nqp::const::NORMALIZE_NFC, @temp); nqp::bindpos_i(@temp,$_,nqp::add_i(nqp::atpos_i(@temp,$_),$adj)) for ^@temp; $str = nqp::strfromcodes(@temp) | 23:13 | ||
23:17
derpydoo joined
23:18
jpn left
guifa | oh nice | 23:19 | |
changing out that for ^@temp with a my int32 $temp = nqp::elems(@temp); while($temp--, { ^^that mess up there }); knocks off another 15-20% | 23:20
[Coke] | is there a way to find out if your grapheme will render? | 23:21 | |
m: "d͖̤ᷛ᷼f͚ͯᷬ̒ ".uninames.say | 23:22 | ||
camelia | (LATIN SMALL LETTER D COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW COMBINING DIAERESIS BELOW COMBINING LATIN LETTER SMALL CAPITAL G COMBINING DOUBLE INVERTED BREVE BELOW LATIN SMALL LETTER F COMBINING DOUBLE RING BELOW COMBINING LATIN SMALL LETTER… | ||
[Coke] | in my local terminal, that's a box with a ? in it. It's valid unicode, but my terminal can't display it. | 23:23 | |
guifa | not from Raku at least -- you'd need to come up with some way to query the terminal, know what font it will use, and then figure out if the font has that character in its inventory | 23:24 | |
I think dwarren has some modules for the font side of stuff | 23:26
[Coke] | s͔᷹o̟ᷔ ̵̢ę͚a̴̔s᷻́y͖ᷗ ̟᷽t̝̦o̵͡ ̯᷍gᷙ᷅o᷆̽ ̠ᷦoᷖᷪfᷧ̀f᷺ᷝ ᷇̏t̜̊hᷗ͘e̠͑ ̱ᷰṟᷮa᷇ͪíͭl̲ᷤs̩͍ | 23:28 | |
so easy to go off the rails | 23:29 | ||
made some slight improvements to github.com/coke/raku-unicode-mangler - at least it doesn't generate invalid characters now, just a lot of unprintables. :) | |||
Nemokosch | 😄 | 23:32 |