#moarvm on 16 May 2022 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:07 reportable6 left 00:08 reportable6 joined 00:17 [Coke] joined 00:38 [Coke] left 00:44 [Coke] joined 00:51 [Coke] left 01:02 [Coke] joined 01:08 [Coke] left, [Coke] joined
japhb	lizmat: A little bit looser than that, I think. Since nqp::sha1 encodes as utf-8, it should produce the same hash if the original text was NFC utf-8 (and thus should round trip to the same bytes)	01:11
01:16 frost joined 01:25 [Coke] left 01:41 [Coke] joined 01:54 [Coke] left 01:59 [Coke] joined 02:05 [Coke] left 02:18 tellable6 left, evalable6 left, evalable6 joined, tellable6 joined 04:12 [Coke] joined 04:18 [Coke] left 04:29 [Coke] joined 04:33 [Coke] left 04:35 [Coke] joined 05:23 frost left 05:41 ilogger2 left, harrow left 05:42 ilogger2 joined 05:45 [Coke] left, [Coke] joined 05:48 gfldex left, gfldex joined, harrow joined
Nicholas	good ,	06:02
06:07 reportable6 left 06:08 reportable6 joined 06:12 [Coke] left
MasterDuke	i wasn't surprised that sha1 was different from binarysha1, it was that binarysha1 of `int @a =` was different from binarysha1 of `$a :=`	06:18
tellable6	2022-05-16T02:29:41Z #raku-dev <vrurg> MasterDuke Do you know why do I get 'java.lang.RuntimeException: java.lang.NoSuchFieldException: field_1' when use nqp::atomicbindattr/nqp::casattr and even just nqp::cas in the core?
06:44 [Coke] joined 06:56 [Coke] left
lizmat	japhb: but does nqp::sha1 encode as utf-8? Doesn't it just read the underlying Uni ints ?	07:19
07:24 [Coke] joined 07:39 [Coke] left 07:43 sena_kun joined 07:47 sena_kun left 07:49 sena_kun joined 08:19 frost joined 09:19 [Coke] joined 09:44 [Coke] left 09:50 [Coke] joined 10:08 [Coke] left 10:20 [Coke] joined 10:27 Altai-man joined, [Coke] left 11:19 Kaipei left 11:30 Kaipei joined 12:07 reportable6 left 12:09 reportable6 joined 12:33 Kaipei left 12:41 [Coke] joined 12:52 Kaipei joined
lizmat	something weird I just noticed in compiling regexes:	13:11
	m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..1_000_000 /
camelia	1.674430109 matched
lizmat	m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..* /	13:12
camelia	0.029003786 matched
lizmat	feels to me there is room for optimization here: by codegenning an 1..* with perhaps a postfix check on the length of the match, and discarding if not in range ?	13:13
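
lizmat's suggestion might look roughly like the C sketch below: scan as if the quantifier were `1..*`, then apply the range afterwards. The names `scan_run` and `match_range` are hypothetical, it handles only a single-character atom, and it is not how NQP generates regex code. It also clamps the count to the upper bound rather than discarding the match, on the assumption that a greedy quantifier in an unanchored-at-end pattern would simply stop at the maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Count leading occurrences of `atom`, as an unbounded `atom ** 1..*` scan would. */
static size_t scan_run(const char *s, size_t len, char atom) {
    size_t n = 0;
    while (n < len && s[n] == atom)
        n++;
    return n;
}

/* Hypothetical post-check for `^ atom ** min..max`.
 * Returns the matched length, or -1 if fewer than `min` atoms were found.
 * Assumption: a run longer than `max` is clamped to `max`, not rejected. */
static long match_range(const char *s, size_t len, char atom,
                        size_t min, size_t max) {
    size_t n;
    assert(min <= max);
    n = scan_run(s, len, atom);
    if (n < min)
        return -1;
    return (long)(n > max ? max : n);
}
```

The cost of `match_range` does not depend on the size of `max`, which is the property the suggestion is after; whether the same trick is sound for arbitrary atoms with backtracking is not something this sketch addresses.
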
nine	Why does compiling / ^ a ** 1..1_000_000 / take more than a second in the first place?	13:16
	nine@sphinx:~> /usr/bin/time rakudo -e '/ ^ a ** 1..1_000_000 /'	13:17
	1.35user 0.01system 0:01.35elapsed 101%CPU (0avgtext+0avgdata 398048maxresident)k
	Seems quite excessive
Nicholas	it has to make a MEEEEELION things, and that doesn't come cheap?
	(as in, does '/ ^ a ** 1..10_000_000 /' take 10 times as long?)	13:18
lizmat	m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..2_000_000 /	13:19
camelia	3.606554059 matched
lizmat	I'd say that is roughly true, even worse :-)
nine	github.com/Raku/nqp/blob/master/sr...A.nqp#L445
Nicholas	it was worse than linear for me too (on a very debugging build) - factor of 24 slower to create 10 times as many	13:23
	GC, GC, it's off to sweep we go	13:24
jnthnwrthngtn	It'll be producing an NFA with the node repeated each time, because that's how you do repetitions with an explicit upper bound in an NFA	13:33
	Of course, we never actually use the NFA at present, so it's rather wasteful	13:34
lizmat	why are we not using the NFA ?
	or just in this example ?
jnthnwrthngtn	At present we only use it when we need to make an LTM decision, so in an alternation or a proto regex	13:36
lizmat	and we can't know this at compile time, I guess	13:38
jnthnwrthngtn	We probably can, or at least can convey it to the right place	13:41
	I mean, an anonymous /.../ regex can't possibly be a proto regex
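
To illustrate why an explicit upper bound is so much more expensive than `1..*`, here is a toy C sketch of the state layout jnthnwrthngtn describes: one state per permitted repetition for `min..max`, versus a short chain ending in a self-loop for `min..*`. The `State` and `Nfa` types and both functions are hypothetical and far simpler than NQP's real NFA builder; the point is only that the state count grows with `max` in one case and not in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_EDGE ((size_t)-1)

/* One state of a toy automaton for a single-character repetition. */
typedef struct {
    size_t next;      /* state reached after consuming the atom, or NO_EDGE */
    int    accepting; /* non-zero if the match may stop here */
} State;

typedef struct {
    State *states;
    size_t count;
} Nfa;

/* Build states for `atom ** min..max`; pass unbounded != 0 for `min..*`
 * (in which case `max` is ignored). The caller frees nfa.states. */
static Nfa build_repetition(size_t min, size_t max, int unbounded) {
    Nfa nfa;
    size_t i;
    assert(unbounded || min <= max);
    nfa.count  = (unbounded ? min : max) + 1;
    nfa.states = malloc(nfa.count * sizeof(State));
    assert(nfa.states != NULL);
    for (i = 0; i < nfa.count; i++) {
        nfa.states[i].next      = i + 1 < nfa.count ? i + 1 : NO_EDGE;
        nfa.states[i].accepting = i >= min;
    }
    if (unbounded)
        nfa.states[nfa.count - 1].next = nfa.count - 1; /* self-loop */
    return nfa;
}

/* Length of the longest accepted prefix of s made of `atom`, or -1 if none. */
static long longest_match(const Nfa *nfa, const char *s, size_t len, char atom) {
    size_t state = 0, pos = 0;
    long best = nfa->states[0].accepting ? 0 : -1;
    while (pos < len && s[pos] == atom && nfa->states[state].next != NO_EDGE) {
        state = nfa->states[state].next;
        pos++;
        if (nfa->states[state].accepting)
            best = (long)pos;
    }
    return best;
}
```

With these definitions, `build_repetition(1, 1000000, 0)` allocates 1,000,001 states while `build_repetition(1, 0, 1)` allocates 2, which matches the shape (though certainly not the exact cost) of the timings lizmat and nine reported.
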
14:47 frost left
lizmat	and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2022/05/16/2022-20-439-468/	14:54
15:56 linkable6 left, evalable6 left, evalable6 joined 15:58 linkable6 joined 16:50 [Coke] left 17:02 Altai-man left 17:24 [Coke] joined 18:06 reportable6 left, reportable6 joined
MasterDuke	lizmat: yep, sha1 encodes as utf8 github.com/MoarVM/MoarVM/blob/mast...on.c#L3200	18:56
19:09 MasterDuke left 19:31 MasterDuke joined 20:00 [Coke] left, [Coke] joined
MasterDuke	anyone have some intuition whether it would make sense to create storage type specific versions of things like MVM_string_gi_get_grapheme github.com/MoarVM/MoarVM/blob/mast...#L155-L187 ?	20:07
	to get rid of some of those branches in what are frequently hot paths	20:08
japhb	MasterDuke: Since you asked for "some intuition", I have three pieces :-) -- 1. Depending on its callers, MVM_string_get_grapheme_at_nocheck may be soaking up some of the improvement you could make, by taking the shorter paths. 2. Depending on how smart the compiler, I can see a fair amount of code hoisting and optimization possible, so you'd probably want to look at the actual produced assembly code. 3.	20:33
	There might still be some advantage from fast-pathing a string that contains no MVM_STRING_STRAND components by grapheme type, using function pointers or something, but I expect a cascade of API changes from that, which could end up being a big diff with lots of opportunities for mistakes ....	20:34
	In this case the iterator is acting as a generic core of an inner loop that has been extracted into something that probably doesn't get inlined, so I suspect that's a fair percentage of the overall cost.	20:36
	OK, I guess that was #4.
MasterDuke	all true. but i look at something like github.com/MoarVM/MoarVM/blob/mast...#L582-L605 and wonder if even a single other path for `storage_type == MVM_STRING_GRAPHEME_32 && !replacement` might be worth it	20:37
	so that we can speed up encoding things to utf8 in a probably very common case	20:39
japhb	Are you trying to speed up grapheme iteration or codepoint iteration?	20:42
20:42 sena_kun left
MasterDuke	oh	20:42
	ha	20:43
	oh wait again, MVM_string_ci_get_codepoint does end up calling MVM_string_gi_get_grapheme	20:44
	i'm trying to speed up utf8 encoding	20:45
japhb	Yeah, I figured it probably would at some point, but it wasn't clear to me whether you were doing a holistic thing, or just trying to speed up something you'd determined to be a hotspot -- and if so, which it was. :-)	20:46
MasterDuke	well, i've seen MVM_string_utf8_encode_substr in a couple different profiles, but i'll admit i don't have anything more detailed than that level	20:47
japhb	I have two conflicting voices inside my head right now -- one is going "Yes! Pierce all the abstractions, make a fastpath that peels paint off the nearest office walls!" and the other is going "Oh god that will be a maintenance nightmare! What happens the next time Unicode decides to change some set of key rules?" Of course, encoding from already-normalized internal strings is probably less likely to	20:49
	change than decoding unnormalized utf8.
	For this I guess there's an old trick -- replace the actual iteration with a constant, and see if it gets any faster.	20:50
MasterDuke	looks like perf thinks a decent chunk of the time spent in *_encode_substr is from MVM_string_gi_get_grapheme
japhb	Ah, actual data! :-D	20:51
20:52 sena_kun joined
MasterDuke	maybe i'll experiment with doing just enough to create that single fast path and see if it helps	20:53
japhb	(y)	20:54
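
The single fast path MasterDuke has in mind could be sketched in C along the following lines. Everything here is hypothetical: `ToyString`, `StorageType`, `utf8_put`, `encode_generic` and `encode` are stand-ins, not MoarVM's types or functions, and the sketch ignores strands, synthetic graphemes, replacement handling and code point validation. It only shows the general technique of checking the storage type once and then running a loop with no per-element dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage kinds, loosely modelled on the ones named in the chat. */
typedef enum { STORAGE_32, STORAGE_8 } StorageType;

typedef struct {
    StorageType type;
    size_t      length;
    const void *data;   /* int32_t[] or int8_t[], depending on type */
} ToyString;

/* Write one code point as UTF-8; returns the number of bytes written (1 to 4).
 * No validation (surrogates, out-of-range values) is attempted. */
static size_t utf8_put(uint32_t cp, uint8_t *out) {
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Generic path: branches on the storage type for every element.
 * `out` must have room for 4 bytes per element. */
static size_t encode_generic(const ToyString *s, uint8_t *out) {
    size_t i, n = 0;
    for (i = 0; i < s->length; i++) {
        uint32_t cp = s->type == STORAGE_32
            ? (uint32_t)((const int32_t *)s->data)[i]
            : (uint32_t)((const int8_t *)s->data)[i];
        n += utf8_put(cp, out + n);
    }
    return n;
}

/* Fast path: the storage type is checked once, outside the loop. */
static size_t encode(const ToyString *s, uint8_t *out) {
    if (s->type == STORAGE_32) {
        const int32_t *g = (const int32_t *)s->data;
        size_t i, n = 0;
        for (i = 0; i < s->length; i++)
            n += utf8_put((uint32_t)g[i], out + n);
        return n;
    }
    return encode_generic(s, out);
}
```

As japhb notes, a compiler may already hoist the branch out of a loop like `encode_generic` when everything is visible to it, so any gain from a hand-written fast path would need to be confirmed with a profiler rather than assumed.
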
21:33 sena_kun left 22:04 samcv left 22:05 samcv joined 22:14 codesections left, codesections joined 22:18 discord-raku-bot left, discord-raku-bot joined 22:28 [Coke] left 22:37 [Coke] joined
