Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:04 reportable6 joined 00:20 shareable6 joined 00:21 quotable6 joined, bloatable6 joined, notable6 joined 00:22 benchable6 joined 01:22 statisfiable6 joined 04:14 benchable6 left, tellable6 left, nativecallable6 left, evalable6 left, shareable6 left, linkable6 left, quotable6 left, statisfiable6 left, bisectable6 left, reportable6 left, releasable6 left, sourceable6 left, coverable6 left, committable6 left, squashable6 left, greppable6 left, unicodable6 left, bloatable6 left, notable6 left, coverable6 joined, squashable6 joined 04:15 nativecallable6 joined, reportable6 joined, tellable6 joined, linkable6 joined, bisectable6 joined 04:16 notable6 joined, statisfiable6 joined, sourceable6 joined 04:17 committable6 joined 05:16 unicodable6 joined 05:52 frost joined 06:02 reportable6 left 06:15 greppable6 joined, releasable6 joined, quotable6 joined 06:16 benchable6 joined, evalable6 joined 06:37 frost is now known as llllweqwe, llllweqwe is now known as frost
nine Nicholas: MoarVM fails to build on the Open Build Service with: [ 40s] src/core/coerce.c:2:10: fatal error: ryu/ryu.h: No such file or directory 06:40
06:48 frost is now known as niner 06:49 niner is now known as frost 06:58 frost left, frost joined 07:02 reportable6 joined 07:14 bloatable6 joined 07:15 shareable6 joined 08:45 Geth left, Geth joined
dogbert2 nine++ (SEGV fix), does this mean that you're now out of work :) 09:33
nine dogbert2: no, the t/02-rakudo/14-revisions.t one still needs fixing. Have to re-reproduce it though first. I did catch it in rr once, but by finishing the other fix, I've changed libmoar.so and rr doesn't cope too well 09:40
dogbert2 I might have found something new 09:42
haven't seen this before: MoarVM oops: Unknown handler in inline cache entry
nine Sounds new, yes.
dogbert2 'MVM_SPESH_NODELAY=1 ./rakudo-m -Ilib t/02-rakudo/15-gh_1202.t', 6k nursery and MVM_GC_DEBUG=1 09:43
09:45 evalable6 left, linkable6 left
lizmat nine: nqp seems to build ok though 09:46
shall I bump MoarVM anyway?
nine lizmat: without MoarVM, NQP can't build? 09:47
lizmat I mean, nqp builds with MoarVM on head
nine Not on the OBS
lizmat I understand, but locally I don't have that problem 09:48
which feels... weird?
in any case, I wanted to say that I cannot reproduce the failure to build
10:46 evalable6 joined 10:48 linkable6 joined
dogbert2 it seems as if (sometimes) entry->run_dispatch is null in github.com/MoarVM/MoarVM/blob/new-...che.c#L228 11:23
but not always, hmm 11:26
anyways, running this test file in gdb hangs, i.e. the program does not finish 11:42
lizmat will not bump MoarVM until the OBS issue is fixed 11:56
12:01 linkable6 left, evalable6 left 12:02 evalable6 joined, linkable6 joined, reportable6 left 12:05 reportable6 joined 14:42 frost left
nine Apparently it's far easier to reproduce dead locks in t/02-rakudo/14-revisions.t than to reproduce that segfault that I needed to re-reproduce 16:50
dogbert2 some bugs are very good a hiding 16:59
*at
nine At least, that dead lock seems to be fairly straight forward to fix. It's also on master btw.
dogbert2 that sounds good, aren't they usually hard to find? 17:00
nine I very much hope, this finally fixes those build timeouts I see on the OBS. 17:01
Oh, it won't :/ It only appears with MVM_SPESH_BLOCKING set 17:02
dogbert2 so there might be more deadlock bugs hiding 17:04
trying to run t/02-rakudo/14-revisions.t under ASAN doesn't seem to work, the program hangs :( 17:10
nine It's because send_log creates a new spesh log entry with an already locked mutex, then marks the current thread blocked for GC (which might enter the GC right away). The spesh worker thread picks up that spesh lock and waits for the mutex . 17:11
The spesh thread never joins the GC, so everyone ends up waiting
Good news is: I just cought the segfault again :)
lizmat gesundheit! 17:12
nine And apparently a different dead lock, too 17:14
timo uh oh what have i come back to 17:20
17:35 evalable6 left, linkable6 left
MasterDuke reminds me, i still haven't done anything about that gc'ed mutex, have i? 17:35
17:37 linkable6 joined
dogbert2 and here I was thinking that we were running out of bugs :) 17:44
18:02 reportable6 left 18:05 reportable6 joined
lizmat dogbert2: nah, they're just mutating into more resilient and more hard to find ones :-) 18:06
timo is it bad to miss nidividual spesh log entries? we could just drop some if weve got that one mutex already locked. otoh, not necessarily want to make moar more complex for an optional feature 18:17
nine Nah, just needs marking the spesh thread blocked while waiting for that mutex 18:29
The new deadlock is more interesting and it could be the one I've been struggling with for quite some time. The suspect here is the async io thread. 18:30
dogbert2 and what kind of shenanigans is that thread up to then? 18:31
18:37 evalable6 joined
nine Thread 6: waiting for queue's cond var gc_status STOLEN 18:38
Thread 5: waiting for tc->instance->cond_blocked_can_continue gc_status STOLEN
Thread 4: waiting for tc->instance->cond_gc_finish gc_status NORMAL
Thread 3: waiting for queue's cond var gc_status STOLEN
Thread 2: waiting for tc->instance->cond_gc_start gc_status INTERRUPT
Thread 1: waitinf for queue's cond var gc_status STOLEN
So, 3 worker threads waiting for jobs to appear in the queue, blocked for GC. Thread 4 orchestrating the GC run and just waiting for the other threads to finish. But thread 2 is actually waiting for the GC run to start! 18:42
Thread 2 is the async io thread
MasterDuke should have known, the second child is always the troublemaker 18:48
nine Wait a minute.... I'm a second child :D 18:49
japhb ... ;-)
I guess I should have said: ∴‭ …‭ 😉‭ 18:51
dogbert2 heh, interesting 18:52
nine AFAICT Thread 2 should have been signalled and tc->instance->gc_start is 0 (which is the condition the thread is waiting for). No idea why it never wakes up from that uv_cond_wait 19:27
MasterDuke potential libuv bug? 19:29
fwiw, looks like 1.42.0 was released last month github.com/libuv/libuv/blob/v1.42.0/ChangeLog 19:32
19:36 MasterDuke58 joined 19:39 MasterDuke left
nine libuv just wraps pthread_cond_wait and pthread_cond_broadcast 19:41
About that segfault. Doesn't this look kinda fishy? 20:09
#12 0x00007fb22c4a8cb0 in dispatch_polymorphic (tc=0x1902e70, entry_ptr=0x442f7c0, seen=0x5f0ce88, id=<optimized out>, callsite=<optimized out>, arg_indices=0x29330d5e7352, source=0x4393088, sf=<optimized out>, bytecode_offset=230) at src/disp/inline_cache.c:176
#13 0x00007fb22c407d2a in MVM_interp_run (tc=0x6084068, initial_invoke=0x189, initial_invoke@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, invoke_data=0x189, invoke_data@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, outer_runloop=0xffffffffffffffff, outer_runloop@entry=0x0) at src/core/interp.c:5507
MasterDuke58 because i'm not thinking too quickly today: if a function takes a c str which may have been malloced or may not, and then throws an exception, the only way to know if it should free that str is if there's also a flag passed in (because the caller knows whether it was malloced or not)?
nine dispatch_polymorphic is being called with a different tc than the MVM_interp_run it's called from 20:10
MasterDuke58: yes
MasterDuke58 cool
is it a valid tc?
nine Yes, it's a different thread's. Actually the segfaulting one's 20:11
MasterDuke58 tut tut, those poly dispatchers, sharing tcs all over the place... 20:13
nine Ah, the tc in dispatch_polymorphic is valid. And I believe it's also correct. It also appears in the other thread's backtrace, because the thread running dispatch_polymorphic is currently marked blocked for GC and waiting in send_log. 20:27
The tc=0x6084068 is bogus however. I think it's just gdb getting confused.
Geth MoarVM: MasterDuke17++ created pull request #1528:
Free filename if exception when loading bytecode
20:39
nine Oooooh....I got it! 20:42
It's actually fairly bening. We're dealing with a half initialized frame. And the problem only occurs if logging the frame entry in the spesh log triggers a send_log and if that happens precisely when some thread wants to start a GC run. 20:43
MasterDuke58 sounds like an uncommon occurrence 20:45
nine This GC run will happen before the call to MVM_args_proc_setup which would initialize the frame's params member. And this params member is what we trip over, because it contains bogus pointers
Uncommon indeed. That's why it takes so many tries to catch it in rr.
20:49 dogbert2 left
nine I'm not even sure if this is new-disp specific. Ye olde MVM_frame_invoke on master seems to have exactly the same structure as MVM_frame_dispatch. 20:50
I dare say this is good enough work to call it a day. There's still tomorrow for a fix :) 20:51
MasterDuke58 nice. i'm about off to play some more horizon zero dawn. just took down redmaw last night 20:52
21:04 dogbert17 joined 21:36 squashable6 left 22:38 squashable6 joined