Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
timo | did we ever have a discussion about zstd put right into moarvm vs only system-wide dependency? | 01:05 | |
i would be okay with shipping it | |||
MasterDuke | i always assumed it'd be a 3rdparty with a --has-zstd option like most of the other external libs we use | 01:17 | |
01:22
japhb joined
|
|||
timo | that's how it is now i think | 01:22 | |
MasterDuke | it's not in 3rdparty | 01:28 | |
timo | oh yes i misread | 01:33 | |
wonder what else we may want to try zstd for | 01:54 | ||
one thing you can't have when you have to decompress stuff before you use it is sharing the memory from stuff you mmapped off disk with other processes | 01:56 | ||
unless you decompress, use, and immediately throw away the decompressed bits again, then you're at least not keeping a copy per process of the stuff around | 01:57 | ||
there's many places where we use random access, too. like in our serialized blobs | 01:58 | ||
on the other hand, if the compression ratio is truly huge, having to read a little bit extra in order to find the spot you want might not matter in the end | 02:00 | ||
and of course if you have one single object that has a huge serialized blob all on its own, you don't need to random-access into the middle, you only need to reliably and quickly seek to the beginning of any one serialized object | 02:02 | ||
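Editor's sketch of the trade-off described above: if each serialized object is compressed on its own and an offset index is kept, a reader can still seek to the start of any one object without decompressing the rest. This is a minimal illustration, not MoarVM's serialization format. `pack_objects` and `load_object` are hypothetical names, and Python's `zlib` is used only as a stand-in for zstd.

```python
import zlib  # stand-in for zstd, which is not in the Python standard library


def pack_objects(objects):
    """Compress each serialized object separately and record where it starts."""
    index = []   # one (offset, compressed_length) pair per object
    chunks = []
    offset = 0
    for raw in objects:
        packed = zlib.compress(raw, 9)
        index.append((offset, len(packed)))
        chunks.append(packed)
        offset += len(packed)
    return b"".join(chunks), index


def load_object(blob, index, i):
    """Seek straight to object i and decompress only that object."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length])


# Hypothetical sample data: two small objects and one large, repetitive one.
objects = [b"small object", b"\x00" * 4096, b"another small object"]
blob, index = pack_objects(objects)
```

The cost of this layout is that small objects compress poorly in isolation, which is the concern the later compression measurements in this log raise.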
MasterDuke | why/when do we randomly access our serialized blobs? aren't they deserialized linearly? | 02:09 | |
timo | when we "demand" one object, like the first time a specific wval is called, we jump to the spot where it lives and start reading sequentially, but objects do have inter-dependencies, and when we hit one of those, we jump to many different points in the blob | 02:10 | |
MasterDuke | ah, didn't realize | ||
timo | gist.github.com/timo/0c52f79d46e71...c0ec68ef2b here's a printout of "make install" of rakudo with any object that is serialized to a blob of over 2048 printed out | 02:34 | |
i'm a little surprised to see MAST::Bytecode objects be put into serialization. i wonder what that's from | 02:35 | ||
aah, i suppose NativeCall generates bytecode? could that be it? | 02:41 | ||
MasterDuke | the two biggest are an NQPArray (158707) and an NFAType (214023). any idea what's in them? | 02:43 | |
timo | i bet the first array is actually the same as what's in the NFAType :P | 02:45 | |
MasterDuke | doh | 02:48 | |
timo | the gist has a patch in it now, feel free to experiment | 02:49 | |
also, there's other kinds of things than just objects that we may serialize, you can copy the same thing into their functions, but the "get debug name" won't work for non-objects. there's a different get debug name function for STables though | |||
and you can breakpoint the fprintf and look at the object in question | 02:50 | ||
heading out o/ | |||
08:26
sena_kun joined
|
|||
timo | the gist now has compression results (but not yet timings) for the serialized blobs at different compression levels | 09:33 | |
so even at the highest compression levels, for the majority of rakudo build files the savings are relatively modest | 10:04 | ||
exceptions do exist, such as one NQPArray that apparently turned to 10% its size at levels 19 and above | 10:08 | ||
and a few arrays that probably just have the same value from start to end in them? they get factors of like 150x and up | |||
outside of that, there are no spots where the compressed serialized blob is compressed better than 10x | 10:10 | ||
i'm limiting compression attempts to anything larger than 512 (i started with 2048) | 10:12 | ||
going to the smaller sizes should really benefit from training a dictionary, but that leaves the issue of managing the dictionary since a .moarvm file would become unreadable without the correct dictionary available | 10:18 | ||
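Editor's sketch of the dictionary issue: a shared dictionary helps small inputs, but data compressed with it cannot be read back without that same dictionary. The example uses zlib's preset-dictionary feature as a stand-in; zstd's trained dictionaries work differently in detail. `SHARED_DICT` and the sample blob are made up for illustration.

```python
import zlib  # stand-in: zlib preset dictionaries, not zstd's trained dictionaries

# Hypothetical shared dictionary built from strings that recur across blobs.
SHARED_DICT = b"MAST::Frame MAST::Bytecode NQPArray NFAType NativeCall::Types "

small_blob = b"NQPArray of MAST::Bytecode for NativeCall::Types"


def compress_with_dict(data, zdict):
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(data, zdict):
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


def readable_without_dict(data):
    """Try to decompress with no dictionary supplied."""
    try:
        zlib.decompressobj().decompress(data)
        return True
    except zlib.error:
        return False


plain = zlib.compress(small_blob, 9)
with_dict = compress_with_dict(small_blob, SHARED_DICT)
```

Here `with_dict` comes out shorter than `plain`, while `readable_without_dict(with_dict)` is false. That is the management problem raised above: the dictionary would have to ship with, or be locatable from, every file that depends on it.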
even with the lower limit at 256 bytes, core setting serialized blob gets "Considered compression for 1592633 bytes vs not for 3327807 bytes (32.368%)", so 68% are blobs smaller than 256 bytes | 10:21 | ||
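Editor's note on the arithmetic: the quoted percentage is a share of bytes, not of blob count. The sketch below recomputes it from the two byte totals in the log. `considered_share` is a hypothetical helper showing how such a figure could be derived from a list of blob sizes, assuming "larger than the limit" is the criterion.

```python
def considered_share(blob_sizes, threshold):
    """Fraction of all serialized bytes that sit in blobs larger than `threshold`."""
    considered = sum(size for size in blob_sizes if size > threshold)
    total = sum(blob_sizes)
    return considered / total if total else 0.0


# Byte totals quoted in the log for the core setting with a 256-byte limit.
considered_bytes = 1592633
skipped_bytes = 3327807
share = considered_bytes / (considered_bytes + skipped_bytes)
skipped_share = 1.0 - share
```

`share` rounds to the 32.368% quoted above, so roughly 68% of the serialized bytes live in blobs under the limit.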
nine | Why would we need to compress sources when using them for annotation? Even the complete CORE.c setting is just 3.2 MB of sources. That's tiny by today's standards. Even more so when comparing with the 14 MB of bytecode this source compiles to. 14 vs. 17 MB, who would even notice? | 10:42 | |
timo | nine: do you know where all the MAST::Bytecode objects come from in the core modules installation? | 10:55 | |
nine | Not sure what you mean | ||
timo | gist.github.com/timo/0c52f79d46e71...d-txt-L556 | 10:56 | |
i'm not sure i'm reading the RMD output correctly | |||
MAST::Frame and MAST::Bytecode objects make it into that serialization | |||
nine | MAST::Bytecode is basically a lightweight Buf | 10:57 | |
timo | the name i see pop up before and after this bit is NativeCall::Types, but i'm not sure why that module would have many of these objects, and so big, too | ||
right, that's what i figured | |||
nine | It's used for several things by QASTCompilerMAST including call site identifiers | 10:58 | |
timo | ok but call site identifiers are probably not 32 kilobytes big :D | 10:59 | |
nine | Sounds more like an SC | ||
timo | hm, rr can figure this out | 11:00 | |
[FATAL src/ReplayTask.cc:130:validate_regs()] | 11:02 | ||
it occurs to me that before merging the new mvmroot i should really make sure the code that our target compilers generate for them isn't total trash | 12:27 | ||
ahahaha, i copied the source, replaced MVMROOT with MVMROOT_OLD in the copy, and gcc compiles the function to just a jmp to the other function | 12:44 | ||
and i found a way to make the code less terrible | 12:49 | ||
Geth | MoarVM/coolroot: 3f0058ba69 | (Timo Paulssen)++ | src/gc/roots.h chain one nonvoid root push into the next This frees us from having to declare one trash variable per object we want to root. Also, ensure there's enough temp root space once up front and then do all the pushes with a fast path for hopefully more efficient code generated by our Sufficiently Smart Compiler™. |
13:30 | |
MoarVM/coolroot: c5e0c37210 | (Timo Paulssen)++ | src/gc/roots.h chain one nonvoid root push into the next This frees us from having to declare one trash variable per object we want to root. Also, ensure there's enough temp root space once up front and then do all the pushes with a fast path for hopefully more efficient code generated by our Sufficiently Smart Compiler™. |
13:36 | ||
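Editor's sketch of the idea in the commit message above: check for space once for all roots, then push each one through a path that does no capacity check. This is a toy model in Python, not the C macros in src/gc/roots.h. The `TempRoots` class, its method names and the growth policy are all hypothetical.

```python
class TempRoots:
    """Toy stand-in for a thread's temp root stack (not MoarVM's real layout)."""

    def __init__(self, capacity=16):
        self.slots = [None] * capacity
        self.count = 0
        self.capacity_checks = 0  # counts how often we had to test for room

    def ensure_space(self, n):
        """Make sure n more roots fit; growing is the unlikely path."""
        self.capacity_checks += 1
        needed = self.count + n
        if needed > len(self.slots):
            grow_by = max(needed - len(self.slots), len(self.slots))
            self.slots.extend([None] * grow_by)

    def push_fast(self, ref):
        """Push without checking; the caller has already ensured space."""
        self.slots[self.count] = ref
        self.count += 1

    def push_checked(self, ref):
        """Old style: every single push pays for a capacity check."""
        self.ensure_space(1)
        self.push_fast(ref)

    def pop(self, n=1):
        self.count -= n


def root_all(roots, *refs):
    """New style: one capacity check up front, then only fast pushes."""
    roots.ensure_space(len(refs))
    for ref in refs:
        roots.push_fast(ref)
```

Rooting three objects with `root_all` performs one capacity check, where three `push_checked` calls perform three.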
timo | 170403169 ensure space for 2 roots | 13:44 | |
12225803 ensure space for 4 roots | |||
1296490 ensure space for 3 roots | |||
godbolt.org/z/8Y3Gx744d | 13:53 | ||
here's some example code that (i hope!) doesn't allow the compiler to make unreasonable shortcuts | |||
godbolt.org/z/YnvbW1GEK also with MSVC now | 14:04 | ||
i don't see anything obviously wrong with it right away | |||
but also, my eyes kind of glaze over | |||
clang thinks my code doesn't deserve to exist in the output | 14:47 | ||
got it. final(?) godbolt link in the github pull request | 14:52 | ||
how do y'all feel in here about using MVM_ROOT for the coolroot and keep MVMROOT for the oldroot? | 14:55 | ||
so, i'm thinking the relation between how often we push and pop roots compared to how often we run GC ... it's a pretty huge rift, right? | 15:31 | ||
i'm not saying we should do stack walking instead of explicit pushes and pops, but ... :) :) :) :) | 15:32 | ||
for a core setting compile i get 0.4575% hits in the gc_root_temp_push function | 16:40 | ||
so ... i guess it's not really worth trying to optimize too much? | |||
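Editor's note on why the figure above suggests little payoff: if profile hits are taken as roughly proportional to runtime (an assumption), then even making the function free bounds the overall speedup at 1 / (1 - fraction). `max_speedup` is a hypothetical helper.

```python
def max_speedup(fraction_of_runtime):
    """Upper bound on overall speedup if that fraction of runtime cost nothing."""
    return 1.0 / (1.0 - fraction_of_runtime)


# Share of hits in gc_root_temp_push during a core setting compile, from the log.
root_push_fraction = 0.4575 / 100
bound = max_speedup(root_push_fraction)
```

The bound works out to under half a percent of total runtime, which supports the conclusion drawn in the log.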
Geth | MoarVM/coolroot: 80a3790551 | (Timo Paulssen)++ | src/gc/roots.h fix wrong check for need to grow temp root array |
16:46 | |
MoarVM/coolroot: 91b9ebbcd7 | (Timo Paulssen)++ | src/gc/roots.h fix check for MVM_TEMP_ROOT_DEBUG (like in main branch) |
|||
MoarVM/coolroot: 354e624bf2 | (Timo Paulssen)++ | src/gc/roots.h mark need for growing temp root array unlikely |
|||
timo | nine: since you already have experience with gcc plugins, how hard do you think it is to write a checker that at a given point there's never more than x temp roots on the temp root stack? | 16:47 | |
then we could have an *even cheaper* temproot variant for functions that we have statically proven don't need to check if tc->num_temproots is above 16 or below | 16:48 | ||
nine | Doesn't sound very hard | 17:36 | |
timo | though a good chunk of these functions are probably public API so we couldn't do that in the first place | 17:38 | |
nine | I wish someone would take care of MoarVM's stability issues :/ | 17:42 | |
timo | that sounds like a tricky and fuzzy target | 17:43 | |
nine | Program terminated with signal SIGSEGV, Segmentation fault. | 17:44 | |
#0 0x00007ff57525ce74 in MVM_gc_collect_free_gen2_unmarked (executing_thread=executing_thread@entry=0x3a929e40, tc=tc@entry=0x3a929e40, global_destruction=global_destruction@entry=0) at src/gc/collect.c:773 | |||
773 if (REPR(obj)->gc_free) | |||
[Current thread is 1 (Thread 0x7ff575953b80 (LWP 188986))] | |||
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.40-1.1.x86_64 libtommath1-x86-64-v3-debuginfo-1.3.0-1.1.x86_64 libuv1-debuginfo-1.48.0-1.1.x86_64 libzstd1-x86-64-v3-debuginfo-1.5.6-1.1.x86_64 | |||
(gdb) p obj | |||
$1 = (MVMObject *) 0x3ffbd460 | |||
(gdb) p *obj | |||
$2 = {header = {sc_forward_u = {forwarder = 0x0, sc = {sc_idx = 0, idx = 0}, st = 0x0}, owner = 0, flags1 = 0 '\000', flags2 = 2 '\002', size = 0}, st = 0x0} | |||
MasterDuke | nine: didn't you fix something recently? i have seen a lot less flapping since whatever it was you did | 17:45 | |
but yeah, guess there still are some bugs... | |||
timo | nine: any chance of a rr recording? | 17:46 | |
and which exact moar version? | 17:48 | ||
i don't think obj->st should be allowed to be zero | |||
nine | Of course it defies rr | 17:58 | |
MasterDuke | chaos mode? | ||
nine | That's what I'm trying. Though that has never helped me before | ||
Geth | MoarVM/coolroot: 12 commits pushed by (Timo Paulssen)++ review: github.com/MoarVM/MoarVM/compare/3...9a6f2c007a |
18:06 | |
timo | ^- rebase on top of current main, plus adding the missing root that nine found recently | ||
MasterDuke | oh, i was going to ask about that, but i thought the old name used the new code so it would have been picked up automatically? | 18:09 | |
timo | you mean the MVMROOT? | 18:10 | |
MasterDuke | yeah | 18:11 | |
timo | the new macro uses the old name, but it's not source-compatible, you do have to rewrite your code to use it | ||
MasterDuke | oh, right | ||
timo | MVMROOT(tc, a, {...}); becomes MVMROOT(tc, a) {...} | 18:12 | |
MasterDuke | i just tried this patch gist.github.com/MasterDuke17/66435...343f834d50 and the total number of allocations and number of temporary allocations when compiling CORE.c dropped from ~69.7m and ~6.4m to ~68.8m and ~5.4m | 18:35 | |
but it doesn't seem to be any faster or reduce the total memory used | 18:36 | ||
a different patch+rebootstrapping nqp to handle getting a VMNull from runproto drops allocations and temp allocations to ~66.7m and ~4m | 19:46 | ||
lizmat | that feels like becoming significant ? | 19:47 | |
patrickb | Random guess, could the recent moar instability be caused by us disabling exprjit? | 20:14 | |
lizmat | feels unexpected to me, but still possible: bugs hiding other bugs is not an uncommon phenomenon | 20:15 | |
nine | I'm pretty sure the instability was there before | 20:17 | |
lizmat | also: some instability may be caused by recent speed improvements, making undiscovered race conditions more likely? | 20:19 | |
the disable of exprjit might well be one of those | |||
22:08
sena_kun left
|