Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
timo1 | this isn't microoptimization, this is nanooptimization, except when you do nanotechnology you can do impressive things you wouldn't be able to do with regular "small stuff", and nanooptimization is just useless :P | 09:52
nine | It looks impressive though ;) | 09:53
timo1 | this improvement in the bytecode came from changing the nqp source code tho, so it really literally only applies in this one frame | 10:41
nine | It's a common frame though | 10:42
lizmat | and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2023/02/06/2023-...en-davies/ | 11:53
el gatito (** advocate) | what is a "frame"? | 13:41
Voldenet | stack frame perhaps | 13:49
nine | Actually in this context a piece of bytecode, i.e. a code block | 13:53
el gatito (** advocate) | oh | 13:54
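A loose C sketch of the distinction nine is drawing; the struct names and fields below are illustrative only, not MoarVM's actual layout:

    #include <stdint.h>
    #include <stddef.h>

    /* A static frame: one compiled code block (the bytecode itself). */
    typedef struct {
        uint8_t *bytecode;      /* the instructions for this block */
        size_t   bytecode_size;
        uint32_t num_locals;    /* local slots each invocation needs */
    } StaticFrame;

    /* A call frame: one *invocation* of a static frame on the stack. */
    typedef struct CallFrame {
        StaticFrame      *code;   /* which code block is running */
        struct CallFrame *caller; /* who invoked it */
        void            **locals; /* this invocation's local storage */
    } CallFrame;

So "a common frame" above means a code block that many programs end up running, not a particular stack entry.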
timo1 | nine: are we talking about the same code? the piece of code inside EXPORTHOW.nqp inside the nqp source? | 14:04
does this actually run more than once?
nine | I guess once in every process? | 14:06
timo1 | i see it run up to twice during build (after putting in an nqp::sin_n whose implementation fprintfs to stderr) | 14:10
nqp startup is only 0.04s, so this can barely do anything :P | 14:14
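The counting trick timo1 just described, sketched in C: patch some op's implementation to print to stderr, then count the lines. The op body here is invented for illustration:

    #include <stdio.h>
    #include <math.h>

    /* stand-in for an op implementation (e.g. what sits behind
     * nqp::sin_n); the temporary probe prints once per call, so the
     * number of lines on stderr is the number of executions */
    static double op_sin_n(double n) {
        fprintf(stderr, "sin_n probe hit\n");
        return sin(n);
    }

    int main(void) {
        op_sin_n(0.5);   /* "sin_n probe hit" appears exactly once */
        return 0;
    }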
japhb | I'd love to get that down an order of magnitude, but I suspect that requires more large-scale engineering. :-) | 14:16
timo1 | for sure
nine | I have always wondered what exactly we spend that 100ms on in rakudo startup | 14:17
timo1 | we do a load of deserialization of stuff for example | 14:18
nine | Is there so much that we have to deserialize right away? | 14:19
timo1 | the "work queue" nature of the deserialization code makes it a little tricky to attribute work to what "caused" it | 14:24
japhb | Deserialization is lazy, isn't it? | 14:44
Kind of wonder if there's any performance benefit to figuring out what needs to be deserialized for -e '' and just doing that always, as fast as we can. | 14:45
timo1 | deserialization is lazy, yes | 14:47
you're thinking maybe fewer "context switches" would benefit startup performance if we blaze through a whole chunk of serialized data ahead of time? | 14:49
nine | Even more so if we store that stuff close together and benefit from caching | 14:53
timo1 | how hard is it going to be to reshuffle objects in the serialized blob? | 14:54
japhb | Yeah, what both of you said
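A hypothetical sketch of that eager-prefetch idea. demand_object() below stands in for the lazy deserialization entry point (in MoarVM that role is played by MVM_serialization_demand_object); the object table and index list are made up:

    #include <stddef.h>
    #include <stdio.h>

    typedef struct { int deserialized; } Obj;

    static Obj heap[1024];   /* stand-in for the object table */

    /* lazy entry point: deserialize an object on first demand */
    static Obj *demand_object(size_t index) {
        if (!heap[index].deserialized)
            heap[index].deserialized = 1;   /* pretend to read the blob */
        return &heap[index];
    }

    /* indexes known (say, from profiling `raku -e ''`) to always be
     * needed at startup */
    static const size_t startup_set[] = { 0, 1, 2 };

    int main(void) {
        /* eager pass: touch everything up front, in blob order, rather
         * than faulting it in piecemeal on first use */
        for (size_t i = 0; i < sizeof startup_set / sizeof *startup_set; i++)
            demand_object(startup_set[i]);
        printf("prefetched %zu objects\n",
               sizeof startup_set / sizeof *startup_set);
        return 0;
    }

nine's caching point is presumably why the index list would want to be sorted in blob order: sequential reads over one region beat scattered demand-driven reads.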
Woodi | maybe just serialize it once, compile it, save it as an executable to disk and then just load that at startup? if that is too specific, there could be options for compiling different executables, e.g. for one-liners or for long-running services? and if it can un-jit when needed, then nothing is lost | 15:57
Woodi | ultimate option: generating asm code / an executable that just does what was asked, like generating a grep-like binary without the object stuff... | 16:01
Voldenet | perl5 starts in 5ms, nqp takes 33ms to start, raku takes 140ms to start, so nqp accounts for around 25% of raku's startup time | 16:07
timo1 | the nqp command does some things that the raku command doesn't have to, i'm not sure how much sense it makes to express nqp as a fraction of raku startup | 16:08
for example, rakudo shouldn't load the nqp grammar, actions, world, and optimizer | 16:09
Voldenet | I see, so the only real way to measure perf reliably is to actually use the profiler | 16:11
timo1 | i would say that is accurate, yeah
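For reference, startup numbers like these can be taken by timing a child process from fork to exit; a minimal C sketch, with the command line as a placeholder (a real measurement would average many runs):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pid_t pid = fork();
        if (pid == 0) {
            /* child: run the interpreter with an empty program */
            execlp("perl", "perl", "-e", "", (char *)NULL);
            _exit(127);                       /* exec failed */
        }
        waitpid(pid, NULL, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("startup+exit: %.1f ms\n", ms);
        return 0;
    }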
timo1 | don't forget that rakudo doesn't start up too much slower than perl5 if you include some support for classes like moo or moose or whichever | 16:12
Voldenet | Moo itself takes 15ms to load | 16:15
timo1 | moose takes a lot longer, right? since moo is kind of "moose but lighter"? | 16:16
Voldenet | Yeah, moo and mouse are a lot faster, moose takes 120ms | 16:18
nine | While that makes us look less bad, it distracts from the fact that we could load a lot faster | 16:19
timo1 | right, not saying we shouldn't improve load times, it is definitely a goal up there in terms of priority | 16:20
you think perl5 without any modules is something we could eventually reach? or at least load in 2x to 3x the time? | 16:24
nine | I honestly don't know. Perl doesn't have to do much when starting up | 16:28
Voldenet | python3: 23ms, nodejs: 60ms, ruby: 50ms | 16:29
timo1 | rakudo runs just over 200M branches during startup and misses 1.94% of them, whereas python runs 15M and misses 4.20% (lol) of them | 16:30
how do we feel about turning spesh on a little later than when the program starts? | 16:36
disabling spesh gives me 0.181s wallclock vs 0.178s with spesh enabled, but the task-clock is 171msec vs 255msec, which just means that with spesh on we use 1.43 cpus and with it off we use 0.95 cpus | 16:37
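The branch numbers above come from hardware performance counters; on Linux they can also be read in-process with perf_event_open(2). A self-contained sketch with a trivial stand-in workload (not how the rakudo numbers were gathered):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static int open_counter(unsigned long long config) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof attr;
        attr.config = config;
        attr.disabled = 1;
        attr.exclude_kernel = 1;
        /* measure this process, on any cpu */
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main(void) {
        int fd_br  = open_counter(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
        int fd_mis = open_counter(PERF_COUNT_HW_BRANCH_MISSES);
        if (fd_br < 0 || fd_mis < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd_br,  PERF_EVENT_IOC_ENABLE, 0);
        ioctl(fd_mis, PERF_EVENT_IOC_ENABLE, 0);

        volatile long sum = 0;               /* branchy stand-in workload */
        for (long i = 0; i < 1000000; i++)
            if (i % 3) sum += i;

        ioctl(fd_br,  PERF_EVENT_IOC_DISABLE, 0);
        ioctl(fd_mis, PERF_EVENT_IOC_DISABLE, 0);

        long long branches = 0, misses = 0;
        read(fd_br,  &branches, sizeof branches);
        read(fd_mis, &misses,   sizeof misses);
        printf("branches: %lld, missed: %lld (%.2f%%)\n",
               branches, misses, 100.0 * misses / branches);
        return 0;
    }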
Voldenet | page-faults is an especially big number on raku (17045) compared to python3 (1111), nodejs (2494), ruby (2263) | 16:46
timo1 | yeah, that probably has something to do with how much ram we use, too | 16:49
and probably with how much of the files we map we actually read from?
Voldenet | ram usage probably matters a bit, but nodejs uses 3x more rss than ruby and there's not much of a difference in startup time | 16:53
timo1 | interesting | 16:55
we can use perf to measure where page faults tend to happen
haha, 61% in __memset_sse2_unaligned_erms, 13.4% in mi_page_free_list_extend, 9.18% in _dl_relocate_object | 16:57
here, MVM_bytecode_unpack has 2.28%, maybe_grow_hash lands at 1.26%, MVM_spesh_log_entry at a surprising (to me) 1.1%, another 0.85% in MVM_spesh_log_decont, 0.73% in MVM_serialization_demand_object, 0.55% in MVM_spesh_log_type | 16:59
this is the page-faults performance counter, not the minor-faults one | 17:00
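As a cross-check on fault counts like these without reaching for perf, getrusage(2) reports a process's own minor and major page faults; a minimal sketch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void) {
        /* touch a big allocation so some minor faults actually happen */
        size_t n = 64 * 1024 * 1024;
        char *p = malloc(n);
        if (!p) return 1;
        memset(p, 0, n);

        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        printf("minor faults: %ld, major faults: %ld\n",
               ru.ru_minflt, ru.ru_majflt);
        free(p);
        return 0;
    }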
Voldenet | so apparently just allocating larger chunks could improve performance | 17:01
timo1 | for spesh logs we already allocate one big chunk that we just write data to linearly | 17:02
Voldenet | I can see why page faults here are surprising
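The "one big chunk, written linearly" pattern timo1 describes is essentially a bump allocator; a generic C sketch, with sizes and overflow behavior made up rather than taken from MoarVM's spesh log code:

    #include <stddef.h>
    #include <stdlib.h>
    #include <stdio.h>

    typedef struct {
        char  *base;   /* the single big chunk */
        size_t used;
        size_t cap;
    } Bump;

    static int bump_init(Bump *b, size_t cap) {
        b->base = malloc(cap);
        b->used = 0;
        b->cap  = cap;
        return b->base != NULL;
    }

    /* hand out the next n bytes: page faults are paid page by page, in
     * address order, instead of scattered over many small allocations */
    static void *bump_alloc(Bump *b, size_t n) {
        if (b->used + n > b->cap)
            return NULL;          /* a real log would flush or rotate */
        void *p = b->base + b->used;
        b->used += n;
        return p;
    }

    int main(void) {
        Bump log;
        if (!bump_init(&log, 1 << 20))    /* one 1 MiB chunk up front */
            return 1;
        for (int i = 0; i < 1000; i++)
            bump_alloc(&log, 64);         /* linear writes, same chunk */
        printf("used %zu bytes\n", log.used);
        free(log.base);
        return 0;
    }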