Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
00:07
reportable6 left
00:08
reportable6 joined
00:17
[Coke] joined
00:38
[Coke] left
00:44
[Coke] joined
00:51
[Coke] left
01:02
[Coke] joined
01:08
[Coke] left,
[Coke] joined
japhb | lizmat: A little bit looser than that, I think. Since nqp::sha1 encodes as utf-8, it should produce the same hash *if* the original text was NFC utf-8 (and thus should round trip to the same bytes) | 01:11 | |
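japhb's condition can be made concrete. nqp::sha1 hashes the UTF-8 encoding of the string, so the digest depends on the exact bytes, and two canonically equivalent spellings of the same text encode to different bytes. The C sketch below uses hypothetical helpers (`utf8_encode`, `utf8_encode_all`; these are not MoarVM's encoder) to show that precomposed "é" (U+00E9) and decomposed "e" + U+0301 produce different byte sequences. Assuming the string is emitted as NFC when it is encoded back out, only input that was already NFC survives the round trip byte-for-byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode codepoint as UTF-8.
   Writes 1-4 bytes to out and returns the count, or 0 if cp is not a
   Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates cannot be encoded */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode a codepoint sequence into buf (caller provides 4 * n bytes).
   Returns the total number of bytes written. */
size_t utf8_encode_all(const uint32_t *cps, size_t n, unsigned char *buf) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = utf8_encode(cps[i], buf + len);
        assert(k > 0 && "input must be valid Unicode scalar values");
        len += k;
    }
    return len;
}
```

With `{0xE9}` the bytes are C3 A9; with `{0x65, 0x301}` they are 65 CC 81. Hashing those two byte strings therefore gives different SHA-1 digests, even though the texts are canonically equivalent.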
01:16
frost joined
01:25
[Coke] left
01:41
[Coke] joined
01:54
[Coke] left
01:59
[Coke] joined
02:05
[Coke] left
02:18
tellable6 left,
evalable6 left,
evalable6 joined,
tellable6 joined
04:12
[Coke] joined
04:18
[Coke] left
04:29
[Coke] joined
04:33
[Coke] left
04:35
[Coke] joined
05:23
frost left
05:41
ilogger2 left,
harrow left
05:42
ilogger2 joined
05:45
[Coke] left,
[Coke] joined
05:48
gfldex left,
gfldex joined,
harrow joined
Nicholas | good *, * | 06:02 | |
06:07
reportable6 left
06:08
reportable6 joined
06:12
[Coke] left
MasterDuke | i wasn't surprised that sha1 was different from binarysha1, it was that binarysha1 of `int @a =` was different from binarysha1 of `$a :=` | 06:18 | |
tellable6 | 2022-05-16T02:29:41Z #raku-dev <vrurg> MasterDuke Do you know why do I get 'java.lang.RuntimeException: java.lang.NoSuchFieldException: field_1' when use nqp::atomicbindattr/nqp::casattr and even just nqp::cas in the core? | ||
06:44
[Coke] joined
06:56
[Coke] left
lizmat | japhb: but does nqp::sha1 encode as utf-8? Doesn't it just read the underlying Uni ints ? | 07:19 | |
07:24
[Coke] joined
07:39
[Coke] left
07:43
sena_kun joined
07:47
sena_kun left
07:49
sena_kun joined
08:19
frost joined
09:19
[Coke] joined
09:44
[Coke] left
09:50
[Coke] joined
10:08
[Coke] left
10:20
[Coke] joined
10:27
Altai-man joined,
[Coke] left
11:19
Kaipei left
11:30
Kaipei joined
12:07
reportable6 left
12:09
reportable6 joined
12:33
Kaipei left
12:41
[Coke] joined
12:52
Kaipei joined
lizmat | something weird I just noticed in compiling regexes: | 13:11 | |
m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..1_000_000 / | |||
camelia | 1.674430109 matched |
lizmat | m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..* / | 13:12 | |
camelia | 0.029003786 matched |
lizmat | feels to me there is room for optimization here: by codegenning a 1..* with perhaps a postfix check on the length of the match, and discarding if not in range ? | 13:13 | |
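A C sketch of lizmat's idea for the simplest case: a single-character atom under a counted quantifier needs only a loop counter, not one state per permitted repetition. One wrinkle: on a run longer than the upper bound, `^ a ** 1..3` still matches (the first three characters), so instead of discarding an over-long run, this sketch clamps the count at the maximum. The function name and the `max < 0` convention for `*` are assumptions for illustration; this is not how NQP's regex compiler is structured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: match `ch ** min..max` at the start of s, as
   `^ a ** min..max` would. max < 0 stands for `*` (unbounded).
   Returns the number of characters matched, or -1 on failure. */
long match_char_repeat(const char *s, char ch, long min, long max) {
    assert(ch != '\0');                          /* would run past the terminator */
    assert(min >= 0 && (max < 0 || max >= min));
    long count = 0;
    /* Greedy scan that stops at max: a bounded greedy quantifier facing a
       longer run still succeeds, consuming only the first max copies. */
    while (s[count] == ch && (max < 0 || count < max))
        count++;
    return count >= min ? count : -1;
}
```

Because this approach never materializes per-repetition states, the cost of compiling `1..1_000_000` and `1..*` would be the same; only the runtime comparison against `max` differs.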
nine | Why does compiling / ^ a ** 1..1_000_000 / take more than a second in the first place? | 13:16 | |
nine@sphinx:~> /usr/bin/time rakudo -e '/ ^ a ** 1..1_000_000 /' | 13:17 | ||
1.35user 0.01system 0:01.35elapsed 101%CPU (0avgtext+0avgdata 398048maxresident)k | |||
Seems quite excessive | |||
Nicholas | it has to make a MEEEEELION things, and that doesn't come cheap? | ||
(as in, does '/ ^ a ** 1..10_000_000 /' take 10 times as long?) | 13:18 | ||
lizmat | m: my $a = "a" x 1_000_000; say now - BEGIN now; say "matched" if $a ~~ / ^ a ** 1..2_000_000 / | 13:19 | |
camelia | 3.606554059 matched |
lizmat | I'd say that is roughly true, even worse :-) | ||
nine | github.com/Raku/nqp/blob/master/sr...A.nqp#L445 | ||
Nicholas | it was worse than linear for me too (on a very debugging build) - factor of 24 slower to create 10 times as many | 13:23 | |
GC, GC, it's off to sweep we go | 13:24 | ||
jnthnwrthngtn | It'll be producing an NFA with the node repeated each time, because that's how you do repetitions with an explicit upper bound in an NFA | 13:33 | |
Of course, we never actually use the NFA at present, so it's rather wasteful | 13:34 | ||
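To illustrate jnthnwrthngtn's point: in a textbook NFA, a counted repetition with a finite upper bound expands into one node per allowed repetition, while an unbounded `*` needs only a self-loop. The C sketch below is a simplified, hypothetical model, not NQP's QRegex NFA code; `Node`, `nfa_build` and `nfa_longest_match` are illustrative names. It shows why `a ** 1..1_000_000` costs a million nodes to build while `a ** 1..*` costs one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical NFA for `ch ** min..max`. Being in state i means i copies of
   ch have been consumed; node i consumes ch and moves to i + 1. State n is
   the accepting state. */
typedef struct {
    char ch;   /* character consumed by this node */
    int  skip; /* 1 if i >= min: enough copies seen, may jump to accept */
    int  loop; /* 1 if this node may consume ch and also stay put (max = *) */
} Node;

/* Textbook expansion: a finite max needs max nodes; an unbounded max needs
   min nodes plus a self-loop. Requires min >= 1; max < 0 means `*`. */
Node *nfa_build(char ch, int min, int max, int *n_out) {
    assert(min >= 1 && (max < 0 || max >= min));
    int n = max < 0 ? min : max;
    Node *nodes = malloc(sizeof(Node) * (size_t)n);
    assert(nodes != NULL);
    for (int i = 0; i < n; i++) {
        nodes[i].ch = ch;
        nodes[i].skip = max >= 0 && i >= min; /* copies past min are optional */
        nodes[i].loop = 0;
    }
    if (max < 0)
        nodes[n - 1].loop = 1;
    *n_out = n;
    return nodes;
}

/* Simulate the NFA on s; return the longest matched prefix length, or -1. */
int nfa_longest_match(const Node *nodes, int n, const char *s) {
    char *cur = calloc((size_t)n + 1, 1), *next = calloc((size_t)n + 1, 1);
    assert(cur != NULL && next != NULL);
    int best = -1;
    cur[0] = 1;
    for (size_t pos = 0;; pos++) {
        /* Epsilon moves: a state with enough copies may go straight to accept. */
        for (int i = 0; i < n; i++)
            if (cur[i] && nodes[i].skip)
                cur[n] = 1;
        if (cur[n])
            best = (int)pos;
        if (s[pos] == '\0')
            break;
        memset(next, 0, (size_t)n + 1);
        int alive = 0;
        for (int i = 0; i < n; i++) {
            if (cur[i] && nodes[i].ch == s[pos]) {
                next[i + 1] = 1;
                if (nodes[i].loop)
                    next[i] = 1;
                alive = 1;
            }
        }
        if (!alive)
            break;
        char *tmp = cur; cur = next; next = tmp;
    }
    free(cur);
    free(next);
    return best;
}
```

For example, `nfa_build('a', 1, 1000000, &n)` sets `n` to 1000000, while `nfa_build('a', 1, -1, &n)` sets it to 1. Both automata match the same prefixes of a run of a's up to the bound, so building the large one is pure overhead when nothing ever consults the NFA.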
lizmat | why are we not using the NFA ? | ||
or just in this example ? | |||
jnthnwrthngtn | At present we only use it when we need to make an LTM decision, so in an alternation or a proto regex | 13:36 | |
lizmat | and we can't know this at compile time, I guiess | 13:38 | |
*guess | |||
jnthnwrthngtn | We probably can, or at least can convey it to the right place | 13:41 | |
I mean, an anonymous /.../ regex can't possibly be a proto regex | |||
14:47
frost left
lizmat | and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2022/05/16/2022-20-439-468/ | 14:54 | |
15:56
linkable6 left,
evalable6 left,
evalable6 joined
15:58
linkable6 joined
16:50
[Coke] left
17:02
Altai-man left
17:24
[Coke] joined
18:06
reportable6 left,
reportable6 joined
MasterDuke | lizmat: yep, sha1 encodes as utf8 github.com/MoarVM/MoarVM/blob/mast...on.c#L3200 | 18:56 | |
19:09
MasterDuke left
19:31
MasterDuke joined
20:00
[Coke] left,
[Coke] joined
MasterDuke | anyone have some intuition whether it would make sense to create storage type specific versions of things like MVM_string_gi_get_grapheme github.com/MoarVM/MoarVM/blob/mast...#L155-L187 ? | 20:07 | |
to get rid of some of those branches in what are frequently hot paths | 20:08 | ||
japhb | MasterDuke: Since you asked for "some intuition", I have three pieces :-) -- 1. Depending on its callers, MVM_string_get_grapheme_at_nocheck may be soaking up some of the improvement you could make, by taking the shorter paths. 2. Depending on how smart the compiler is, I can see a fair amount of code hoisting and optimization possible, so you'd probably want to look at the actual produced assembly code. 3. | 20:33 | |
There might still be some advantage from fast-pathing a string that contains no MVM_STRING_STRAND components by grapheme type, using function pointers or something, but I expect a cascade of API changes from that, which could end up being a big diff with lots of opportunities for mistakes .... | 20:34 | ||
In this case the iterator is acting as a generic core of an inner loop that has been extracted into something that probably doesn't get inlined, so I suspect that's a fair percentage of the overall cost. | 20:36 | ||
OK, I guess that was #4. | |||
MasterDuke | all true. but i look at something like github.com/MoarVM/MoarVM/blob/mast...#L582-L605 and wonder if even a single other path for `storage_type == MVM_STRING_GRAPHEME_32 && !replacement` might be worth it | 20:37 | |
so that we can speed up encoding things to utf8 in a probably very common case | 20:39 | ||
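A C sketch of the kind of fast path under discussion: check the storage type once and run a tight loop over a flat 32-bit buffer, falling back to a per-grapheme accessor (which branches on the storage type on every call) otherwise. Everything here (`Str`, `StorageType`, `get_grapheme`, `put_utf8`, `encode_utf8`) is hypothetical and much simpler than MoarVM's MVMString. It ignores strands and the `replacement` option, and it assumes every grapheme is a single valid codepoint. A real implementation would also have to expand synthetic (multi-codepoint) graphemes; in this sketch, any value that is not a valid codepoint just trips an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified string model (not MoarVM's MVMString). */
typedef enum { STORAGE_GRAPHEME_32, STORAGE_GRAPHEME_8 } StorageType;

typedef struct {
    StorageType storage_type;
    const int32_t *blob_32; /* used when storage_type == STORAGE_GRAPHEME_32 */
    const uint8_t *blob_8;  /* used when storage_type == STORAGE_GRAPHEME_8 */
    size_t length;
} Str;

/* Encode one valid codepoint as UTF-8; returns the number of bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Generic accessor: re-checks the storage type on every call. */
static int32_t get_grapheme(const Str *s, size_t i) {
    switch (s->storage_type) {
    case STORAGE_GRAPHEME_32: return s->blob_32[i];
    case STORAGE_GRAPHEME_8:  return s->blob_8[i];
    }
    return -1; /* unreachable for valid storage types */
}

/* Encode s into out (caller provides 4 * length bytes); returns bytes written. */
size_t encode_utf8(const Str *s, unsigned char *out) {
    size_t len = 0;
    if (s->storage_type == STORAGE_GRAPHEME_32) {
        /* Fast path: one type check, then a tight loop over the flat buffer,
           with ASCII (the common case) handled inline. */
        const int32_t *g = s->blob_32;
        for (size_t i = 0; i < s->length; i++) {
            if (g[i] >= 0 && g[i] < 0x80)
                out[len++] = (unsigned char)g[i];
            else
                len += put_utf8((uint32_t)g[i], out + len);
        }
        return len;
    }
    /* Generic path: per-grapheme accessor with its storage-type branch. */
    for (size_t i = 0; i < s->length; i++)
        len += put_utf8((uint32_t)get_grapheme(s, i), out + len);
    return len;
}
```

Whether the special case pays off has to be measured. As japhb notes, the compiler may already hoist the type check out of a generic loop, so comparing perf profiles or the generated assembly before and after is the way to find out.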
japhb | Are you trying to speed up grapheme iteration or codepoint iteration? | 20:42 | |
20:42
sena_kun left
MasterDuke | oh | 20:42 | |
ha | 20:43 | ||
oh wait again, MVM_string_ci_get_codepoint does end up calling MVM_string_gi_get_grapheme | 20:44 | ||
i'm trying to speed up utf8 encoding | 20:45 | ||
japhb | Yeah, I figured it probably would at some point, but it wasn't clear to me whether you were doing a holistic thing, or just trying to speed up something you'd determined to be a hotspot -- and if so, which it was. :-) | 20:46 | |
MasterDuke | well, i've seen MVM_string_utf8_encode_substr in a couple different profiles, but i'll admit i don't have anything more detailed than that level | 20:47 | |
japhb | I have two conflicting voices inside my head right now -- one is going "Yes! Pierce all the abstractions, make a fastpath that peels paint off the nearest office walls!" and the other is going "Oh god that will be a maintenance nightmare! What happens the next time Unicode decides to change some set of key rules?" Of course, encoding from already-normalized internal strings is probably less likely to | 20:49 | |
change than *decoding* unnormalized utf8. | |||
For this I guess there's an old trick -- replace the actual iteration with a constant, and see if it gets any faster. | 20:50 | ||
MasterDuke | looks like perf thinks a decent chunk of the time spent in *_encode_substr is from MVM_string_gi_get_grapheme | ||
japhb | Ah, actual data! :-D | 20:51 | |
20:52
sena_kun joined
MasterDuke | maybe i'll experiment with doing just enough to create that single fast path and see if it helps | 20:53 | |
japhb | (y) | 20:54 | |
21:33
sena_kun left
22:04
samcv left
22:05
samcv joined
22:14
codesections left,
codesections joined
22:18
discord-raku-bot left,
discord-raku-bot joined
22:28
[Coke] left
22:37
[Coke] joined