Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:07 reportable6 left 00:08 reportable6 joined 00:17 [Coke] joined 00:38 [Coke] left 00:44 [Coke] joined 00:51 [Coke] left 01:02 [Coke] joined 01:08 [Coke] left, [Coke] joined
japhb lizmat: A little bit looser than that, I think. Since nqp::sha1 encodes as utf-8, it should produce the same hash *if* the original text was NFC utf-8 (and thus should round trip to the same bytes) 01:11
01:16 frost joined 01:25 [Coke] left 01:41 [Coke] joined 01:54 [Coke] left 01:59 [Coke] joined 02:05 [Coke] left 02:18 tellable6 left, evalable6 left, evalable6 joined, tellable6 joined 04:12 [Coke] joined 04:18 [Coke] left 04:29 [Coke] joined 04:33 [Coke] left 04:35 [Coke] joined 05:23 frost left 05:41 ilogger2 left, harrow left 05:42 ilogger2 joined 05:45 [Coke] left, [Coke] joined 05:48 gfldex left, gfldex joined, harrow joined
Nicholas good *, * 06:02
06:07 reportable6 left 06:08 reportable6 joined 06:12 [Coke] left
MasterDuke i wasn't surprised that sha1 was different from binarysha1, it was that binarysha1 of `int @a =` was different from binarysha1 of `$a :=` 06:18
tellable6 2022-05-16T02:29:41Z #raku-dev <vrurg> MasterDuke Do you know why I get 'java.lang.RuntimeException: java.lang.NoSuchFieldException: field_1' when I use nqp::atomicbindattr/nqp::casattr and even just nqp::cas in the core?
06:44 [Coke] joined 06:56 [Coke] left
lizmat japhb: but does nqp::sha1 encode as utf-8? Doesn't it just read the underlying Uni ints ? 07:19
07:24 [Coke] joined 07:39 [Coke] left 07:43 sena_kun joined 07:47 sena_kun left 07:49 sena_kun joined 08:19 frost joined 09:19 [Coke] joined 09:44 [Coke] left 09:50 [Coke] joined 10:08 [Coke] left 10:20 [Coke] joined 10:27 Altai-man joined, [Coke] left 11:19 Kaipei left 11:30 Kaipei joined 12:07 reportable6 left 12:09 reportable6 joined 12:33 Kaipei left 12:41 [Coke] joined 12:52 Kaipei joined
lizmat something weird I just noticed in compiling regexes: 13:11
m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..1_000_000 /
camelia 1.674430109
lizmat m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..* / 13:12
camelia 0.029003786
lizmat feels to me there is room for optimization here: by codegenning a 1..* with perhaps a postfix check on the length of the match, and discarding if not in range ? 13:13
nine Why does compiling / ^ a ** 1..1_000_000 / take more than a second in the first place? 13:16
nine@sphinx:~> /usr/bin/time rakudo -e '/ ^ a ** 1..1_000_000 /' 13:17
1.35user 0.01system 0:01.35elapsed 101%CPU (0avgtext+0avgdata 398048maxresident)k
Seems quite excessive
Nicholas it has to make a MEEEEELION things, and that doesn't come cheap?
(as in, does '/ ^ a ** 1..10_000_000 /' take 10 times as long?) 13:18
lizmat m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..2_000_000 / 13:19
camelia 3.606554059
lizmat I'd say that is roughly true, even worse :-)
nine github.com/Raku/nqp/blob/master/sr...A.nqp#L445
Nicholas it was worse than linear for me too (on a very debugging build) - factor of 24 slower to create 10 times as many 13:23
GC, GC, it's off to sweep we go 13:24
jnthnwrthngtn It'll be producing an NFA with the node repeated each time, because that's how you do repetitions with an explicit upper bound in an NFA 13:33
Of course, we never actually use the NFA at present, so it's rather wasteful 13:34
lizmat why are we not using the NFA ?
or just in this example ?
jnthnwrthngtn At present we only use it when we need to make an LTM decision, so in an alternation or a proto regex 13:36
lizmat and we can't know this at compile time, I guess 13:38
jnthnwrthngtn We probably can, or at least can convey it to the right place 13:41
I mean, an anonymous /.../ regex can't possibly be a proto regex
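To make the cost jnthnwrthngtn describes concrete, here is a minimal C sketch of a quantifier automaton in which state i means "i repetitions consumed". It is a simplified model, not NQP's actual NFA builder: the `QuantNFA` struct and the `nfa_*` functions are hypothetical names invented for this illustration. It shows that a bounded `min..max` quantifier needs a number of states proportional to `max`, while an unbounded `min..*` needs only `min + 1` states plus a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model for illustration only; not NQP's NFA representation.
 * State i means "i repetitions of the atom 'a' have been consumed". */
typedef struct {
    size_t num_states; /* states allocated for the quantifier */
    size_t min;        /* states with index >= min are accepting */
    int    loops;      /* nonzero: the last state has an edge back to itself */
} QuantNFA;

/* `a ** min..max`: one state per possible repetition count, 0..max. */
QuantNFA nfa_bounded(size_t min, size_t max) {
    assert(min <= max);
    QuantNFA n = { max + 1, min, 0 };
    return n;
}

/* `a ** min..*`: states 0..min, with a self-loop on the last one. */
QuantNFA nfa_unbounded(size_t min) {
    QuantNFA n = { min + 1, min, 1 };
    return n;
}

/* Length of the longest prefix of s accepted by the automaton, or -1 if
 * no prefix is accepted. Only the atom 'a' is modelled. */
long nfa_longest_match(const QuantNFA *n, const char *s) {
    size_t state = 0, pos = 0;
    long best = (n->min == 0) ? 0 : -1;
    while (s[pos] == 'a') {
        if (state + 1 < n->num_states) {
            state++;            /* advance to the next repeated node */
        } else if (n->loops) {
            /* unbounded: stay in the last state */
        } else {
            break;              /* bounded: upper limit reached */
        }
        pos++;
        if (state >= n->min)
            best = (long)pos;
    }
    return best;
}
```

Under this model, `nfa_bounded(1, 1000000)` allocates 1,000,001 states while `nfa_unbounded(1)` allocates 2. That is in the same spirit as the gap between the two camelia timings earlier in the log, though the real compile cost also includes allocation and GC work that this sketch does not model.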
14:47 frost left
lizmat and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2022/05/16/2022-20-439-468/ 14:54
15:56 linkable6 left, evalable6 left, evalable6 joined 15:58 linkable6 joined 16:50 [Coke] left 17:02 Altai-man left 17:24 [Coke] joined 18:06 reportable6 left, reportable6 joined
MasterDuke lizmat: yep, sha1 encodes as utf8 github.com/MoarVM/MoarVM/blob/mast...on.c#L3200 18:56
19:09 MasterDuke left 19:31 MasterDuke joined 20:00 [Coke] left, [Coke] joined
MasterDuke anyone have some intuition whether it would make sense to create storage type specific versions of things like MVM_string_gi_get_grapheme github.com/MoarVM/MoarVM/blob/mast...#L155-L187 ? 20:07
to get rid of some of those branches in what are frequently hot paths 20:08
japhb MasterDuke: Since you asked for "some intuition", I have three pieces :-) -- 1. Depending on its callers, MVM_string_get_grapheme_at_nocheck may be soaking up some of the improvement you could make, by taking the shorter paths. 2. Depending on how smart the compiler is, I can see a fair amount of code hoisting and optimization being possible, so you'd probably want to look at the actual produced assembly code. 3. 20:33
There might still be some advantage from fast-pathing a string that contains no MVM_STRING_STRAND components by grapheme type, using function pointers or something, but I expect a cascade of API changes from that, which could end up being a big diff with lots of opportunities for mistakes .... 20:34
In this case the iterator is acting as a generic core of an inner loop that has been extracted into something that probably doesn't get inlined, so I suspect that's a fair percentage of the overall cost. 20:36
OK, I guess that was #4.
MasterDuke all true. but i look at something like github.com/MoarVM/MoarVM/blob/mast...#L582-L605 and wonder if even a single other path for `storage_type == MVM_STRING_GRAPHEME_32 && !replacement` might be worth it 20:37
so that we can speed up encoding things to utf8 in a probably very common case 20:39
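The kind of single fast path being floated here can be sketched in C as below. This is only an illustration under strong assumptions: `DemoString`, `StorageType`, and every function name are hypothetical stand-ins, not MoarVM's `MVMString` or its API. The sketch also assumes every stored element is a valid non-negative code point (no synthetic graphemes, no strands, no replacement handling). The point it shows is hoisting the storage-type check out of the per-element loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not MoarVM's real string layout. */
typedef enum { STORAGE_32, STORAGE_8 } StorageType;

typedef struct {
    StorageType type;
    size_t      len;
    union { const int32_t *g32; const int8_t *g8; } data;
} DemoString;

DemoString demo_from_32(const int32_t *g, size_t len) {
    DemoString s; s.type = STORAGE_32; s.len = len; s.data.g32 = g; return s;
}

DemoString demo_from_8(const int8_t *g, size_t len) {
    DemoString s; s.type = STORAGE_8; s.len = len; s.data.g8 = g; return s;
}

/* Generic accessor: branches on the storage type for every element. */
static int32_t demo_get(const DemoString *s, size_t i) {
    return s->type == STORAGE_32 ? s->data.g32[i] : (int32_t)s->data.g8[i];
}

/* Writes cp as UTF-8 and returns the byte count (1..4).
 * Assumes cp is a valid Unicode scalar value. */
static size_t utf8_put(uint8_t *out, uint32_t cp) {
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Generic path: goes through the branching accessor each iteration.
 * out must have room for 4 * s->len bytes. */
size_t encode_generic(const DemoString *s, uint8_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++)
        n += utf8_put(out + n, (uint32_t)demo_get(s, i));
    return n;
}

/* Fast path: storage type is checked once, then the loop reads the
 * 32-bit buffer directly. */
size_t encode_fast32(const DemoString *s, uint8_t *out) {
    assert(s->type == STORAGE_32);
    const int32_t *g = s->data.g32;
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++)
        n += utf8_put(out + n, (uint32_t)g[i]);
    return n;
}

/* Dispatcher: picks the fast path when its precondition holds. */
size_t encode(const DemoString *s, uint8_t *out) {
    return s->type == STORAGE_32 ? encode_fast32(s, out)
                                 : encode_generic(s, out);
}
```

Whether this wins anything in practice depends on how much the compiler already hoists, which is japhb's point 2 above, so the generated assembly or a profile would be the real arbiter.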
japhb Are you trying to speed up grapheme iteration or codepoint iteration? 20:42
20:42 sena_kun left
MasterDuke oh 20:42
ha 20:43
oh wait again, MVM_string_ci_get_codepoint does end up calling MVM_string_gi_get_grapheme 20:44
i'm trying to speed up utf8 encoding 20:45
japhb Yeah, I figured it probably would at some point, but it wasn't clear to me whether you were doing a holistic thing, or just trying to speed up something you'd determined to be a hotspot -- and if so, which it was. :-) 20:46
MasterDuke well, i've seen MVM_string_utf8_encode_substr in a couple different profiles, but i'll admit i don't have anything more detailed than that level 20:47
japhb I have two conflicting voices inside my head right now -- one is going "Yes! Pierce all the abstractions, make a fastpath that peels paint off the nearest office walls!" and the other is going "Oh god that will be a maintenance nightmare! What happens the next time Unicode decides to change some set of key rules?" Of course, encoding from already-normalized internal strings is probably less likely to change than *decoding* unnormalized utf8. 20:49
For this I guess there's an old trick -- replace the actual iteration with a constant, and see if it gets any faster. 20:50
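A minimal C sketch of that trick, assuming a toy loop rather than MoarVM's encoder: the element fetch is passed in as a function pointer, so it can be swapped for one that returns a constant, and the two runs can be timed against each other. All names here (`fetch_fn`, `fetch_real`, `fetch_const`, `consume`, `time_consume`) are hypothetical, and `clock()` is only a coarse timer.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fetch hook standing in for "the actual iteration". */
typedef int32_t (*fetch_fn)(const int32_t *buf, size_t i);

/* Real fetch: reads the element from the buffer. */
int32_t fetch_real(const int32_t *buf, size_t i) {
    return buf[i];
}

/* Constant fetch: ignores the buffer entirely and returns 'a'. */
int32_t fetch_const(const int32_t *buf, size_t i) {
    (void)buf;
    (void)i;
    return 'a';
}

/* Stand-in for the per-element work done downstream of the iterator.
 * Unsigned arithmetic wraps, so overflow is well defined. */
uint64_t consume(const int32_t *buf, size_t len, fetch_fn fetch) {
    uint64_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc = acc * 31 + (uint32_t)fetch(buf, i);
    return acc;
}

/* Runs consume() once and returns the CPU seconds it took. */
double time_consume(const int32_t *buf, size_t len, fetch_fn fetch,
                    uint64_t *result) {
    clock_t start = clock();
    *result = consume(buf, len, fetch);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

If the run with `fetch_const` is not noticeably faster than the run with `fetch_real` on a large buffer, the iteration is probably not where the time goes. The result is stored through `result` so the compiler cannot discard the loop.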
MasterDuke looks like perf thinks a decent chunk of the time spent in *_encode_substr is from MVM_string_gi_get_grapheme
japhb Ah, actual data! :-D 20:51
20:52 sena_kun joined
MasterDuke maybe i'll experiment with doing just enough to create that single fast path and see if it helps 20:53
japhb (y) 20:54
21:33 sena_kun left 22:04 samcv left 22:05 samcv joined 22:14 codesections left, codesections joined 22:18 discord-raku-bot left, discord-raku-bot joined 22:28 [Coke] left 22:37 [Coke] joined