Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:15
MasterDuke joined
|
|||
MasterDuke | ==25674== Process terminating with default action of signal 6 (SIGABRT) | 00:15 | |
==25674== at 0x5206898: __pthread_kill_implementation (pthread_kill.c:44) | |||
==25674== by 0x51B683B: raise (raise.c:26) | |||
==25674== by 0x51A1A7F: abort (abort.c:79) | |||
==25674== by 0x4C3BABB: uv_mutex_destroy (thread.c:360) | |||
==25674== by 0x4B597A3: gc_free (ConcBlockingQueue.c:80) | |||
==25674== by 0x4B1C18B: MVM_gc_collect_free_gen2_unmarked (collect.c:738) | |||
==25674== by 0x4B188D3: MVM_gc_global_destruction (orchestrate.c:795) | |||
==25674== by 0x4C11747: MVM_vm_destroy_instance (moar.c:660) | |||
==25674== by 0x12161B: main (main.c:531) | |||
00:35
MasterDuke left
02:01
lizmat left
|
|||
timo | isn't that the problem that we've always had? | 09:00 | |
09:16
Geth joined
|
|||
timo | we aren't "creating" the MVMP6int in that case at all, the P6intBody is just slapped inside of a P6OpaqueBody with an offset and the size specified in the STable's storage spec; there is no MVMP6int that corresponds to the Body at all, and since we only use as much space as the chosen int inside the body, it's not equivalent to an MVMP6intBody either | 09:53 | |
10:07
lizmat_ left,
lizmat joined
10:17
sena_kun joined
|
|||
Geth | MoarVM/main: 282254418f | timo++ (committed using GitHub Web editor) | src/spesh/args.c Don't turn slurpies into huge amounts of code (#1921) For example, a file with a huge hash literal created with the `infix:<,>` operator may end up with a callsite with thousands of entries, and peak memory usage from register facts and other stuff can balloon quite massively. |
10:57 | |
10:59
^Dan joined
|
|||
timo | lizmat: try setting MVM_SPESH_BLOCKING=1 for the compilation of Emoji::Text; getting the memory usage depends on one of the infix:<,> calls being OSR'd, and whether or not that happens is subject to timing of data accumulating in the spesh logs etc etc | 10:59 | |
lizmat | when I run: raku lib/Text/Emoji.rakumod with the snapper, it doesn't show any difference with or without MVM_SPESH_BLOCKING | 11:01 | |
Initial/Final Size: 131 / 2584 Kbytes | |||
timo | OK, can you make a spesh log and check `grep 'r..\?.5000' emoji.speshlog.txt`? | 11:05 | |
lizmat | before or after I bump Rakudo ? | 11:06 | |
timo | that should be the one with high memory usage | 11:08 | |
lizmat | const_i64_16 r16(5000), liti16(687) | 11:09 | |
sp_getarg_o r17(5000), liti16(687) | |||
bindpos_o r2(2), r16(5000), r17(5000) | |||
r16(5000): usages=1, flags=2 KnVal | |||
r17(5000): usages=1, flags=0 | |||
11:10
^Dan left
|
|||
timo | ok, that should be using lots of memory | 11:11 | |
lizmat | that's before bump | ||
timo | Initial/Final Size: 152328 / 3343044 Kbytes | 11:12 | |
100153 5.10 | |||
928155 4.32 3133152 | |||
28628 7.60 5568 | |||
lizmat | after the bump, no output on the grep, and the snapper output: | 11:14 | |
Initial/Final Size: 135 / 200 Kbytes | |||
so that's 2584 -> 200 so yes, that's good :-) | 11:15 | ||
timo | uhhhh there's no way rakudo uses 200 kilobytes | ||
i *wish* | |||
lizmat | heh, maybe that K should be an M ? | 11:16 | |
timo | on your system yes, on my system no | ||
because there is also no way that rakudo uses 3 terabytes of ram for that | |||
lizmat | ah, maybe on MacOS there's a factor 1000 difference in reporting hmmm | 11:17 | |
timo | probably, yeah | 11:18 | |
you're using nqp::getrusage right? | |||
i think getrusage itself is kind of platform-dependent already | |||
but we are using a libuv provided function for that | |||
"The maximum resident set size is reported in kilobytes, the unit most platforms use natively." | 11:19 | ||
this is what the libuv documentation says | |||
lizmat | it takes the max-rss of the getrusage | ||
timo | github.com/bnoordhuis/libuv/commit...39ce8e3c13 | 11:20 | |
lizmat | my int $b2kb = VM.osname eq 'darwin' ?? 10 !! 0; | 11:21 | |
perhaps that should go now | |||
timo | 1.45.0 is supposed to have the fix | ||
lizmat | right, and snapper never got updated | ||
timo | and we are on 1.50.0 | 11:22 | |
lizmat | right | ||
timo | oh, i see | ||
cool | |||
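A minimal C sketch of the unit mismatch discussed above. It assumes plain POSIX getrusage rather than the libuv wrapper that nqp::getrusage goes through, and the helper names are hypothetical. Linux reports ru_maxrss in kilobytes while macOS reports it in bytes, so a caller that wants kilobytes only needs the shift-by-10 on macOS, and only when nothing upstream has already normalized the value:

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: convert a raw ru_maxrss value to kilobytes.
 * raw_is_bytes is non-zero on platforms that report bytes (macOS). */
long normalize_max_rss_to_kb(long raw, int raw_is_bytes) {
    assert(raw >= 0);
    return raw_is_bytes ? raw >> 10 : raw;
}

/* Hypothetical helper: peak resident set size of this process in kilobytes,
 * or -1 if getrusage fails. */
long max_rss_kilobytes(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
#if defined(__APPLE__)
    return normalize_max_rss_to_kb(usage.ru_maxrss, 1); /* bytes on macOS */
#else
    return normalize_max_rss_to_kb(usage.ru_maxrss, 0); /* kilobytes on Linux */
#endif
}
```

If the value has already been normalized to kilobytes by a library layer, applying the shift a second time makes the result too small by a factor of 1024, which would match the "200 Kbytes" reading above.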
lizmat | Initial/Final Size: 138736 / 206592 Kbytes # after removing the MacOS factor | 11:29 | |
timo | do we want to put a _ or , in there for thousands-separators? | ||
nah, don't bother | 11:31 | ||
plenty of other stuff to do | |||
lizmat | m: say 206592.polymod(1000 xx *).reverse.join("_") # polymod to the rescue | 11:32 | |
camelia | 206_592 | ||
timo | that's nicer than the solution with .comb because it needs .reverse | ||
i also kind of like using ansi escapes with colors or bold / normal to make separate parts of a number stand out | 11:39 | ||
but we don't have ansi escapes in core yet right? | 11:40 | ||
lizmat | nope, we don't, but we could in the mapper: it's basically 2 constants that need to be defined for bold | 11:41 | |
colours I'm not too sure about | |||
if we would like to have them in that report, I mean :-) | |||
timo | plus / minus making sure it doesn't look like ass on windows | 11:42 | |
could just be VM.osname ne 'windows' or whatever | 11:43 | ||
lizmat | so what would you like bolded in the snapper output ? | 11:44 | |
timo | m: dd 123456789.polymod(1000 xx *) Z (|("bold", "normal") xx *) | 11:45 | |
camelia | ((789, "bold"), (456, "normal"), (123, "bold")).Seq | ||
timo | hm. there it would be cool to start on the right instead of the left tbh | ||
so back to the reverse + Z + reverse :D | 11:46 | ||
lizmat | well... I'll think about it, or you beat me to it :-) | 11:47 | |
timo | really not that important | ||
github.com/MoarVM/MoarVM/pull/1921...2708189210 - you could put an edit or additional comment here to explain the source of the confusion, maybe a link to the fix as well | 11:55 | ||
aww yiss i made the zstd speshlog output a lot smaller | |||
m: dd 1761366808.polymod(1024 xx *) | 11:57 | ||
camelia | (792, 788, 655, 1).Seq | ||
timo | m: dd 1761366808.polymod(1000 xx *) | 11:58 | |
camelia | (808, 366, 761, 1).Seq | ||
timo | 1.66 gigs down to 113 megs with my new patch, but it was 191 megs without | 11:59 | |
lizmat | m: say "1234567890".flip.comb(3).join("_").flip | ||
camelia | 1_234_567_890 | ||
lizmat | another way not using polymod | ||
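For comparison, a rough C sketch of the same digit grouping, counting groups of three from the right as in the polymod and flip/comb one-liners. The function name and signature are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes n into out with sep between groups of three
 * digits, counted from the right. Returns out, or NULL if out is too small. */
char *group_thousands(unsigned long long n, char sep, char *out, size_t out_size) {
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%llu", n);
    assert(len > 0 && (size_t)len < sizeof digits);

    /* digits + one separator per full group after the first + NUL */
    size_t needed = (size_t)len + (size_t)(len - 1) / 3 + 1;
    if (out == NULL || out_size < needed)
        return NULL;

    size_t pos = 0;
    for (int i = 0; i < len; i++) {
        /* A separator goes before any digit that starts a group of three. */
        if (i > 0 && (len - i) % 3 == 0)
            out[pos++] = sep;
        out[pos++] = digits[i];
    }
    out[pos] = '\0';
    return out;
}
```

With a 32-byte buffer, group_thousands(206592, '_', buf, sizeof buf) fills buf with "206_592".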
timo | i was being stupid and that change i just praised myself for was already on the branch, i just accidentally stopped rebasing after the first conflict | 13:01 | |
Geth | MoarVM/zstd_speshlog: 34b78f3251 | (Timo Paulssen)++ | 4 files compress spesh log output with zstd |
13:02 | |
MoarVM/zstd_speshlog: 8bf7aea7a5 | (Timo Paulssen)++ | 8 files check SPESH_LOG for .zst, store flag, fix truncated bits |
|||
MoarVM/zstd_speshlog: 3339f7c3a4 | (Timo Paulssen)++ | 6 files DumpStr->MVMDumpStr, keep more stuff in one ds to zstd at once This gives drastically better compression rate. We can use the spots where one frame ends and another starts to efficiently skip through the compressed file and identify semantically relevant pieces even when loading really big spesh logs (like the core setting compilation generating a ~1.5GiB file) |
|||
13:04
MasterDuke joined
|
|||
MasterDuke | timo: yeah, the 15-gh_1202.t backtrace hasn't changed | 13:05 | |
timo | that's not the one with MVM_gc_global_destruction right? | 13:06 | |
MasterDuke | it is | 13:07 | |
btw, a MVMP6int had to be created at one point, right? i mean it's just standard assigning to an attribute | 13:09 | ||
github.com/MoarVM/MoarVM/blob/main...s.nqp#L819 | 13:10 | ||
i don't have the hll backtrace, but there are two more of those errors at #0 0xffffa4500064 in MVMP6int_set_int src/6model/reprs/P6int.c:83 and #0 0xffffa4500264 in MVMP6int_get_int src/6model/reprs/P6int.c:93 | 13:13 | ||
timo | oh. why are we running a spec test with full cleanup? | ||
no wonder it's failing | |||
MasterDuke | hm, that might have just been me running that test in a loop under valgrind and putting --full-cleanup to reduce the noise from valgrind | 13:14 | |
timo | the assigning to the attribute happens after an integer has been put into a register | ||
and the P6int_get_int and P6int_set_int are called on the MVMP6intBody that is "theoretically" inside the P6opaque | |||
but the way we manage that bit of memory is not compatible with the way C does | 13:15 | ||
MasterDuke | ok, so how do we make it compatible? | 13:16 | |
timo | we shouldn't | ||
MasterDuke | well, i don't think relying on undefined behavior is a good idea | ||
timo | yeah, we need to toss the approach based on "union" in the trash | ||
it doesn't reflect the reality of what we're doing, and we don't want to make P6opaque worse (and CStruct wrong) by forcing the other parts of the code to fit with what the requirements are for the union of these different int types | 13:18 | ||
what we are doing is really much more like working with (void *) and casting to the integer type that the STable is identifying it as, rather than (MVMP6intBody *) and using the union's .i8 / .i16 / .i32 or whatever | 13:19 | ||
MasterDuke | so just cast down to the desired size, but assign to the full size? | 13:20 | |
timo | if we assign to the full size we would overwrite nearby attributes | ||
i guess it depends on what you mean by assign there | |||
MasterDuke | ah! you're saying there actually is only 32bits there and we're casting it to the 32bit union member? | 13:23 | |
timo | yes that's what i mean | ||
i think in most cases we're just using the union as a shorthand for the syntax of casting to the different types | |||
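A small C sketch of the "(void *) plus the width the STable says" idea described above. This is not MoarVM code: the helper names, the flat byte buffer standing in for a P6opaque body, and the use of memcpy are all assumptions for illustration. Each access touches exactly as many bytes as the declared width, so a neighbouring attribute packed right behind a 32-bit slot is left alone:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store value as an integer of the given bit width at
 * body + offset, writing only that many bytes. */
void store_int_at(void *body, size_t offset, unsigned bits, int64_t value) {
    unsigned char *slot = (unsigned char *)body + offset;
    switch (bits) {
        case 8:  { int8_t  v = (int8_t)value;  memcpy(slot, &v, sizeof v); break; }
        case 16: { int16_t v = (int16_t)value; memcpy(slot, &v, sizeof v); break; }
        case 32: { int32_t v = (int32_t)value; memcpy(slot, &v, sizeof v); break; }
        case 64: { int64_t v = value;          memcpy(slot, &v, sizeof v); break; }
        default: assert(0 && "unsupported integer width");
    }
}

/* Hypothetical helper: load an integer of the given bit width from
 * body + offset, reading only that many bytes, and sign-extend it. */
int64_t load_int_at(const void *body, size_t offset, unsigned bits) {
    const unsigned char *slot = (const unsigned char *)body + offset;
    switch (bits) {
        case 8:  { int8_t  v; memcpy(&v, slot, sizeof v); return v; }
        case 16: { int16_t v; memcpy(&v, slot, sizeof v); return v; }
        case 32: { int32_t v; memcpy(&v, slot, sizeof v); return v; }
        case 64: { int64_t v; memcpy(&v, slot, sizeof v); return v; }
        default: assert(0 && "unsupported integer width");
    }
    return 0;
}
```

memcpy is used here only because it avoids alignment and aliasing questions in a standalone example; whether that is the right replacement for the union in MoarVM itself is not something the discussion above settles.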
MasterDuke | unrelated, but i just got a coredump in t/spec/S16-filehandles/argfiles.t | 13:33 | |
on the laptop, and one in t/spec/S06-currying/positional.rakudo.moar on the desktop | 13:34 | ||
i kind of wish we didn't deliberately cause a segfault during spectests | 13:38 | ||
t/spec/S11-modules/versioning.t and t/spec/S06-currying/positional.t on the laptop | |||
t/spec/S16-io/lines.t on the laptop | 13:45 | ||
timo | do you have any hints yet what they are from? | 13:46 | |
MasterDuke | no, i'm not actually seeing any core files | ||
on the laptop i'm running 15-gh_1202.t in a loop under valgrind, with normal spectests running in another terminal. on the desktop i'm running 15-gh_1202.t in a loop recording them with rr, with a normal spectest running in another terminal | 13:49 | ||
timo | do you have an ulimit for core size, or do you need to look in coredumpctl if that's catching the files? | 13:50 | |
MasterDuke | it's unlimited | ||
on the desktop it just seemed to catch the strdup ones we deliberately cause. guess i could just comment that test out for now... | 13:51 | ||
i need to install coredumpctl on the laptop... | |||
timo | no need for that | 13:53 | |
if it's not installed, then they are probably not landing in there | |||
check `cat /proc/sys/kernel/core_pattern` | 13:54 | ||
MasterDuke | yeah, but i do want it anyway | ||
timo | has this on my system: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h | ||
coredumpctl works by looking in the journal for core dumps | |||
MasterDuke | heh, that's what i have now | ||
timo | oh huh | ||
MasterDuke | i just installed coredumpctl | 13:55 | |
timo | OK | ||
`journalctl MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1 -o verbose` can give you information about stored core dumps as well, without the coredumpctl tool | 13:56 | ||
but now that you have it, might as well use it | |||
MasterDuke | t/spec/S02-types/subset-6e.t on the desktop | 13:57 | |
timo | what the fuck is going on that you get so many crashes?! | ||
MasterDuke | just run spectests in a loop | 13:58 | |
got a core for the t/spec/S02-types/subset-6e.t one on the desktop | 13:59 | ||
gist.github.com/MasterDuke17/3ab10...a113bf3a5c | |||
timo | ok, there we are exiting, the malloc heap is being torn down, and it looks like the async IO thread is running a callback for a child process having exited | 14:02 | |
so like, we already called exit() from MVM_vm_exit, which normally we have decided is good enough to tear stuff down, but mimalloc is doing work long enough for another thread to have the opportunity to jump right in the middle of that | 14:03 | ||
MasterDuke | oh, mimalloc just released 2.2.2, we should probably try to update to that | 14:05 | |
we're on 2.1.7 | |||
but is there a way we can make this safe regardless of how long mimalloc takes? | 14:08 | ||
timo | yes, i have a branch, hold on | 14:09 | |
MasterDuke | gist.github.com/MasterDuke17/d39a1...2db099fed2 t/spec/6.c/MISC/bug-coverage.t on laptop | ||
Geth | MoarVM/eventloop_dont_crash_in_regular_exit: eddc725444 | (Timo Paulssen)++ | src/moar.c In regular vm exit, stop and join eventloop This prevents stuff from happening on the event loop thread while mimalloc is destroying its structures in its atexit handler, which can otherwise cause aborts and other undesired behaviour. |
14:10 | |
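The general shape of "stop and join the event loop before exiting" can be sketched with plain pthreads and C11 atomics. This is a simplified stand-in, not the code from the commit: MoarVM's event loop is libuv-based, and every name below is hypothetical. The point is only that once the join returns, no callback can still be running on that thread while atexit handlers tear down the allocator:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event loop running on its own thread. */
typedef struct {
    pthread_t   thread;
    atomic_bool keep_running;
    bool        started;
} EventLoop;

static void *event_loop_main(void *arg) {
    EventLoop *loop = arg;
    while (atomic_load(&loop->keep_running)) {
        /* A real loop would dispatch I/O and child-exit callbacks here. */
        sched_yield();
    }
    return NULL;
}

/* Start the loop thread. Returns 0 on success, -1 on failure. */
int event_loop_start(EventLoop *loop) {
    assert(loop != NULL);
    atomic_init(&loop->keep_running, true);
    loop->started =
        pthread_create(&loop->thread, NULL, event_loop_main, loop) == 0;
    return loop->started ? 0 : -1;
}

/* Ask the loop to stop, then wait until its thread has really finished.
 * Intended to be called before exit(). Returns 0 on success. */
int event_loop_stop_and_join(EventLoop *loop) {
    assert(loop != NULL);
    if (!loop->started)
        return 0;
    atomic_store(&loop->keep_running, false);
    if (pthread_join(loop->thread, NULL) != 0)
        return -1;
    loop->started = false;
    return 0;
}
```

A loop that blocks waiting for events would also need a wake-up mechanism so it notices the stop request; the busy loop here sidesteps that to keep the example short.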
MasterDuke | i'll run with that for a bit | 14:12 | |
timo | usually there's also a bit of console output before an abort(), is there anything for that one? | ||
the backtraces are not very helpful there | 14:13 | ||
not sure what __cxa_finalize is about | |||
oh it's also related to atexit? | |||
MasterDuke | no, i think running under prove or something gobbles the stderr | 14:14 | |
timo | there is probably a flag for that | 14:15 | |
looks like -v does that | 14:16 | ||
MasterDuke | i can try that later, gotta go afk for a while. but i'm running everything on that branch | 14:18 | |
not ok 1 - shell output | |||
# Failed test 'shell output' | |||
# at rakudo/t/02-rakudo/15-gh_1202.t line 17 | |||
# expected a match with: /[ <[0..4]> \n ] ** 250/ | |||
# got: "1\n2\n3\n0\n4\n0\n1\n3\n4\n2\n3\n4\n1\n2\n0\n0\n2\n3\n1\n4\n3\n4\n0\n1\n2\n2\n1\n0\n3\n4\n1\n0\n3\n4\n2\n0\n2\n3\n4\n1\n2\n3\n4\n1\n0\n0\n2\n4\n1\n3\n1\n3\n4\n2\n0\n2\n1\n3\n0\n4\n0\n1\n2\n3\n4\n2\n0\n1\n4\n3\n3\n0\n1\n4\n2\n0\n4\n3\n2\n1\n0\n1\n2\n3\n4\n2\n3\n1\n0\n4\n0\n1\n3\n2\n4\n0\n3\n1\n2\n4\n2\n3\n | |||
4\n0\n1\n0\n3\n1\n2\n4\n1\n3\n0\n2\n4\n1\n0\n4\n2\n3\n0\n1\n2\n3\n4\n3\n2\n0\n1\n4\n0\n2\n1\n3\n4\n2\n1\n0\n3\n4\n0\n2\n4\n3\n1\n0\n3\n2\n4\n1\n0\n3\n4\n1\n2\n0\n1\n2\n3\n4\n0\n2\n4\n3\n1\n1\n4\n0\n2\n3\n0\n3\n1\n2\n4\n0\n1\n2\n3\n4\n0\n1\n2\n3\n4\n2\n4\n1\n0\n3\n1\n2\n3\n4\n0\n0\n1\n2\n3\n4\n0\n1\n2\n3\n4\n1\n2\n3\n4\n0\n2\n4\n0\n3\n1\n2\n3\n0\n1\ | |||
n4\n0\n1\n3\n4\n2\n1\n0\n2\n3\n4\n1\n2\n0\n3\n4\n1\n0\n4\n3\n2\n0\n2\n3\n4\n1\n" | |||
ok 2 - all runs completed | |||
# You failed 1 test of 2 | |||
that's weird, never seen that before | |||
timo | it occurs to me that that one test shouldn't be exiting when there's still child processes exiting. possibly the test is wrong | 14:21 | |
that string has 5 numbers missing | 14:23 | ||
did you have that under "rr record"? | 14:32 | ||
MasterDuke | yeah | 14:39 | |
timo | rr record on gh_1202 sometimes crashes rr for me | 14:40 | |
can you pack and send that recording? | |||
MasterDuke | hm, i did rr pack, now do i just zip up the directory? or does it create a single file somewhere? | 14:46 | |
timo | the whole directory | ||
AIUI it adds a file that has extra stuff from your system | 14:47 | ||
MasterDuke | 452mb raku-119.tar.zst | 14:48 | |
what's a good way to send that? | |||
timo | one sec | 14:49 | |
DM'd you a link | 14:51 | ||
ok one of my regular (non-rr-recorded) runs of gh1202.t got "MoarVM panic: Internal error: invalid thread ID -59560384 in GC work pass" | 14:53 | ||
MasterDuke | things *seem* to be a bit more stable on your branch... | 14:55 | |
uploaded | 14:56 | ||
timo | [FATAL src/ReplaySession.cc:255:ReplaySession()] Trace was recorded on a machine with different CPUID values | 14:59 | |
and CPUID faulting is not enabled; replay will not work. | |||
bah. | |||
MasterDuke | let me try re-packing with --disable-cpuid-faulting | 15:01 | |
timo | you can get into the process that crashes with `rr replay -f 4066353`, that's from the first column in `rr ps` for the one that exited with -11 | ||
MasterDuke | re uploaded | 15:04 | |
timo | hum. still getting that error | 15:06 | |
i'm not sure if the --disable-cpuid-faulting flag does something for "rr pack", possibly only for "rr record" | |||
it doesn't help when i put it on my "rr replay" commandline either | 15:07 | ||
MasterDuke | ah | 15:08 | |
oh, just got another t/spec/S06-currying/positional.t on the laptop | |||
gist.github.com/MasterDuke17/904f5...ef37d11f03 | 15:09 | ||
unfortunately rr doesn't seem to work in asahi | |||
timo | huh that tells us barely anything with just the stack trace. and there's only a single thread? not even the spesh thread? | 15:11 | |
MasterDuke | get a lot of these for the valgrind loop, but no vm level crash, so no backtrace | 15:13 | |
An operation first awaited: | 15:14 | ||
in block <unit> at -e line 1 | |||
Died with the exception: | |||
Failed to write 2 bytes to filehandle: Broken pipe | |||
in block at -e line 1 | |||
timo | finally i have a recording of gh1202.t | ||
MasterDuke | nice | ||
timo | oh so that process just exited | 15:15 | |
Geth | MoarVM: MasterDuke17++ created pull request #1922: Bump mimalloc to v2.2.2 |
15:19 | |
timo | AFKBBL | 15:45 | |
15:59
MasterDuke left
|
|||
timo | hm. is this the same thing i've investigated before but didn't know how to fix? | 20:25 | |
nine: what do you know about the async io and eventloop stuff? | 20:31 | ||
nine | barely anything |