Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
timo1 this isn't microoptimization, this is nanooptimization, except when you do nanotechnology you can do impressive things you wouldn't be able to do with regular "small stuff", and nanooptimization is just useless :P 09:52
nine It looks impressive though ;) 09:53
timo1 this improvement in the bytecode came from changing the nqp source code tho, so it really literally only applies in this one frame 10:41
nine It's a common frame though 10:42
lizmat and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2023/02/06/2023-...en-davies/ 11:53
el gatito (** advocate) what is a "frame"? 13:41
Voldenet stack frame perhaps 13:49
nine Actually in this context a piece of bytecode, i.e. a code block 13:53
el gatito (** advocate) oh 13:54
timo1 nine: are we talking about the same code? the piece of code inside EXPORTHOW.nqp inside the nqp source? 14:04
does this actually run more than once?
nine I guess once in every process? 14:06
timo1 i see it run up to twice during build (after putting in an nqp::sin_n that fprintfs to stderr) 14:10
nqp startup is only 0.04s so this can barely do anything :P 14:14
japhb I'd love to get that down an order of magnitude, but I suspect that requires more large-scale engineering. :-) 14:16
timo1 for sure
nine I have always wondered what exactly we spend that 100ms on in rakudo startup 14:17
timo1 we do a load of deserialization of stuff for example 14:18
nine Is there so much that we have to deserialize right away? 14:19
timo1 the "work queue" nature of the deserialization work code makes it a little tricky to attribute work to what "caused" it 14:24
japhb Deserialization is lazy, isn't it? 14:44
Kind of wonder if there's any performance benefit to figuring out what needs to be deserialized for -e '' and just doing that always, as fast as we can. 14:45
timo1 deserialization is lazy, yes 14:47
you're thinking maybe less "context switches" would benefit startup performance if we blaze through a whole chunk of serialized data ahead of time? 14:49
nine Even more if we store that stuff close together and benefit from caching 14:53
timo1 how hard is it going to be to reshuffle objects in the serialized blob? 14:54
japhb Yeah, what both of you said
Woodi maybe just serialize it once, compile it, save it as an executable to disk, and then just load that at startup? if that is too specific, then offer options for compiling different executables, e.g. for one-liners, for long-running services? and if it can un-jit when needed, then nothing is lost 15:57
Woodi ultimate option: generating asm code / an executable that just does what was asked, like generating a grep-like binary without the object stuff... 16:01
Voldenet perl5 starts in 5ms, nqp takes 33ms to start, raku takes 140ms to start, so nqp takes around 25% of startup time 16:07
timo1 the nqp command does some things that the raku command doesn't have to, i'm not sure how much sense it makes to express nqp as a fraction of raku startup 16:08
for example, rakudo shouldn't load the nqp grammar, actions, world, and optimizer 16:09
Voldenet I see, so the only real way to measure perf reliably is to actually use the profiler 16:11
timo1 i would say that is accurate, yeah
don't forget that rakudo doesn't start up much slower than perl5 once you make perl5 load a class-support module like Moo or Moose 16:12
Voldenet Moo itself takes 15ms to load 16:15
timo1 moose takes a lot longer, right? since moo is kind of "moose but lighter"? 16:16
Voldenet Yeah, moo and mouse are a lot faster, moose takes 120ms 16:18
nine While that makes us look less bad, it distracts from the fact that we could load a lot faster 16:19
timo1 right, not saying we shouldn't improve load times, it is definitely a goal up there in terms of priority 16:20
you think perl5 without any modules is something we could eventually reach? or at least load in 2x to 3x the time? 16:24
nine I honestly don't know. Perl doesn't have to do much when starting up 16:28
Voldenet python3: 23ms, nodejs: 60ms, ruby: 50ms 16:29
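Startup numbers like these are easy to reproduce with `perf stat` or plain wall-clock timing; a minimal sketch that times a child interpreter running an empty program (python3 is used because it is certainly present; substitute `["raku", "-e", ""]`, `["nqp", "-e", ""]`, or `["perl", "-e", ""]` where available):

```python
import subprocess
import time

def startup_ms(cmd, runs=5):
    """Wall-clock time, in ms, for the fastest of several runs of cmd.
    Taking the minimum filters out scheduling and cache-warmup noise."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        best = min(best, (time.perf_counter() - t0) * 1000)
    return best

print(f"python3 -c '': {startup_ms(['python3', '-c', '']):.1f} ms")
```

For the spesh on/off comparison above, the same harness works by passing `env={**os.environ, "MVM_SPESH_DISABLE": "1"}` to `subprocess.run`.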
timo1 rakudo runs just over 200M branches during startup and misses 1.94% of them, where python runs 15M and misses 4.20% (lol) of them 16:30
how do we feel about turning spesh on a little later than when the program starts? 16:36
disabling spesh gives me a time of 0.181 wallclock vs spesh enabled gives 0.178, but the task-clock is 171msec vs 255msec, which just means when spesh is on we use 1.43 cpus and when it's off we use 0.95 cpus 16:37
Voldenet page-faults is an especially big number on raku (17045) compared to python3 (1111), nodejs (2494), ruby (2263) 16:46
timo1 yeah that probably has something to do with how much ram we use also 16:49
and probably how much of the files we map we actually read from?
Voldenet ram usage probably matters a bit, but nodejs uses 3x more rss than ruby and there's not much of a difference in startup time 16:53
timo1 interesting 16:55
we can use perf to measure where page faults tend to happen
haha, 61% in __memset_sse2_unaligned_erms, 13.4% mi_page_free_list_extend, 9.18% in _dl_relocate_object 16:57
here, MVM_bytecode_unpack has 2.28%, maybe_grow_hash lands at 1.26%, MVM_spesh_log_entry at a surprising (to me) 1.1%, another .85% in MVM_spesh_log_decont, 0.73% in MVM_serialization_demand_object, 0.55% in MVM_spesh_log_type 16:59
this is the page-faults performance counter, not the minor-faults one 17:00
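Besides perf's `page-faults` counter, Linux exposes the minor/major fault split per process through getrusage; a sketch that reports the faults a child interpreter incurs (counts vary by system, so treat the numbers as relative):

```python
import resource
import subprocess

def child_faults(cmd):
    """Run cmd and report the minor/major page faults it incurred.
    RUSAGE_CHILDREN accumulates over all waited-for children, so we
    take a before/after difference."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True, capture_output=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)

# python3 as a stand-in; substitute ["raku", "-e", ""] to reproduce
# the comparison in the discussion above.
minor, major = child_faults(["python3", "-c", ""])
print(f"python3 -c '': {minor} minor, {major} major faults")
```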
Voldenet so apparently just allocating larger chunks could improve performance 17:01
timo1 for spesh logs we already allocate one big chunk that we just write data to linearly 17:02
Voldenet I can see why page faults here are surprising
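The "one big chunk that we just write data to linearly" pattern timo describes for the spesh log is essentially a bump allocator; a toy version (hypothetical record format, not MoarVM's) shows why it touches pages sequentially and at most once, rather than faulting all over the heap:

```python
import struct

class BumpLog:
    """Append fixed-size records into one preallocated buffer; the write
    cursor only moves forward, so pages are touched in order, once."""

    ENTRY = struct.Struct("<II")   # (kind, payload) -- toy record format

    def __init__(self, capacity_entries):
        self.buf = bytearray(capacity_entries * self.ENTRY.size)
        self.pos = 0

    def append(self, kind, payload):
        if self.pos + self.ENTRY.size > len(self.buf):
            raise IndexError("log chunk full")  # real code would flush/rotate
        self.ENTRY.pack_into(self.buf, self.pos, kind, payload)
        self.pos += self.ENTRY.size

    def entries(self):
        for off in range(0, self.pos, self.ENTRY.size):
            yield self.ENTRY.unpack_from(self.buf, off)


log = BumpLog(1024)
for i in range(3):
    log.append(1, i)
print(list(log.entries()))   # [(1, 0), (1, 1), (1, 2)]
```

The one-time cost of faulting in the big `bytearray` up front (e.g. via memset, as the `__memset_sse2_unaligned_erms` profile entry above suggests) is what remains.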