Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
nine | Nicholas: MoarVM fails to build on the Open Build Service with: [ 40s] src/core/coerce.c:2:10: fatal error: ryu/ryu.h: No such file or directory | 06:40 | |
dogbert2 | nine++ (SEGV fix), does this mean that you're now out of work :) | 09:33 | |
nine | dogbert2: no, the t/02-rakudo/14-revisions.t one still needs fixing. Have to re-reproduce it though first. I did catch it in rr once, but by finishing the other fix, I've changed libmoar.so and rr doesn't cope too well | 09:40 | |
dogbert2 | I might have found something new | 09:42 | |
haven't seen this before: MoarVM oops: Unknown handler in inline cache entry | |||
nine | Sounds new, yes. | ||
dogbert2 | 'MVM_SPESH_NODELAY=1 ./rakudo-m -Ilib t/02-rakudo/15-gh_1202.t', 6k nursery and MVM_GC_DEBUG=1 | 09:43 | |
lizmat | nine: nqp seems to build ok though | 09:46 | |
shall I bump MoarVM anyway? | |||
nine | lizmat: without MoarVM, NQP can't build? | 09:47 | |
lizmat | I mean, nqp builds with MoarVM on head | ||
nine | Not on the OBS | ||
lizmat | I understand, but locally I don't have that problem | 09:48 | |
which feels... weird? | |||
in any case, I wanted to say that I cannot reproduce the failure to build | |||
dogbert2 | it seems as if (sometimes) entry->run_dispatch is null in github.com/MoarVM/MoarVM/blob/new-...che.c#L228 | 11:23 | |
but not always, hmm | 11:26 | ||
anyways, running this test file in gdb hangs, i.e. the program does not finish | 11:42 | ||
lizmat will not bump MoarVM until the OBS issue is fixed | 11:56 | ||
nine | Apparently it's far easier to reproduce deadlocks in t/02-rakudo/14-revisions.t than to reproduce that segfault that I needed to re-reproduce | 16:50 | |
dogbert2 | some bugs are very good at hiding | 16:59 | |
nine | At least, that deadlock seems to be fairly straightforward to fix. It's also on master btw. | ||
dogbert2 | that sounds good, aren't they usually hard to find? | 17:00 | |
nine | I very much hope this finally fixes those build timeouts I see on the OBS. | 17:01 | |
Oh, it won't :/ It only appears with MVM_SPESH_BLOCKING set | 17:02 | ||
dogbert2 | so there might be more deadlock bugs hiding | 17:04 | |
trying to run t/02-rakudo/14-revisions.t under ASAN doesn't seem to work, the program hangs :( | 17:10 | ||
nine | It's because send_log creates a new spesh log entry with an already locked mutex, then marks the current thread blocked for GC (which might enter the GC right away). The spesh worker thread picks up that spesh lock and waits for the mutex. | 17:11 | |
The spesh thread never joins the GC, so everyone ends up waiting | |||
Good news is: I just cought the segfault again :) | |||
lizmat | gesundheit! | 17:12 | |
nine | And apparently a different deadlock, too | 17:14 | |
timo | uh oh what have i come back to | 17:20 | |
MasterDuke | reminds me, i still haven't done anything about that gc'ed mutex, have i? | 17:35 | |
dogbert2 | and here I was thinking that we were running out of bugs :) | 17:44 | |
lizmat | dogbert2: nah, they're just mutating into more resilient and harder-to-find ones :-) | 18:06 | |
timo | is it bad to miss individual spesh log entries? we could just drop some if we've got that one mutex already locked. otoh, we don't necessarily want to make moar more complex for an optional feature | 18:17 | |
nine | Nah, just needs marking the spesh thread blocked while waiting for that mutex | 18:29 | |
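A minimal sketch of the fix nine describes, assuming the waiting side is the spesh worker taking a shared spesh-log mutex: mark the thread GC-blocked before the wait and unblocked once the lock is held. MVM_gc_mark_thread_blocked, MVM_gc_mark_thread_unblocked and the uv_mutex_* calls are real MoarVM/libuv APIs; the function and mutex names are invented for illustration.

```c
#include "moar.h"   /* MVMThreadContext, MVM_gc_mark_thread_*, uv_mutex_t */

/* Illustrative only: spesh_log_mutex and spesh_worker_take_log are invented
 * names. The point is the ordering: a thread that may block on a lock held
 * across a GC hand-off first marks itself blocked, so a GC run can proceed
 * without waiting for it. */
static void spesh_worker_take_log(MVMThreadContext *tc, uv_mutex_t *spesh_log_mutex) {
    MVM_gc_mark_thread_blocked(tc);     /* GC may now work around this thread */
    uv_mutex_lock(spesh_log_mutex);     /* safe to wait here, even across a GC run */
    MVM_gc_mark_thread_unblocked(tc);   /* rejoin normal GC participation */

    /* ... consume the spesh log entry ... */

    uv_mutex_unlock(spesh_log_mutex);
}
```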
The new deadlock is more interesting and it could be the one I've been struggling with for quite some time. The suspect here is the async io thread. | 18:30 | ||
dogbert2 | and what kind of shenanigans is that thread up to then? | 18:31 | |
nine | Thread 6: waiting for queue's cond var gc_status STOLEN | 18:38 | |
Thread 5: waiting for tc->instance->cond_blocked_can_continue gc_status STOLEN | |||
Thread 4: waiting for tc->instance->cond_gc_finish gc_status NORMAL | |||
Thread 3: waiting for queue's cond var gc_status STOLEN | |||
Thread 2: waiting for tc->instance->cond_gc_start gc_status INTERRUPT | |||
Thread 1: waiting for queue's cond var gc_status STOLEN |||
So, 3 worker threads waiting for jobs to appear in the queue, blocked for GC. Thread 4 orchestrating the GC run and just waiting for the other threads to finish. But thread 2 is actually waiting for the GC run to start! | 18:42 | ||
Thread 2 is the async io thread | |||
MasterDuke | should have known, the second child is always the troublemaker | 18:48 | |
nine | Wait a minute.... I'm a second child :D | 18:49 | |
japhb | ... ;-) | ||
I guess I should have said: ā“ā ā¦ā šā | 18:51 | ||
dogbert2 | heh, interesting | 18:52 | |
nine | AFAICT Thread 2 should have been signalled and tc->instance->gc_start is 0 (which is the condition the thread is waiting for). No idea why it never wakes up from that uv_cond_wait | 19:27 | |
MasterDuke | potential libuv bug? | 19:29 | |
fwiw, looks like 1.42.0 was released last month github.com/libuv/libuv/blob/v1.42.0/ChangeLog | 19:32 | ||
19:36
MasterDuke58 joined
19:39
MasterDuke left
nine | libuv just wraps pthread_cond_wait and pthread_cond_broadcast | 19:41 | |
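For reference, the canonical shape of that wait in libuv terms, as a sketch with invented struct and field names (not MoarVM's actual ones): the waiter re-checks the predicate in a loop under the mutex, and the waker flips the predicate and broadcasts while holding the same mutex, so the wakeup cannot be lost between the check and the wait.

```c
#include <uv.h>

/* Sketch only: GCGate and gc_start here are stand-ins, not MoarVM fields. */
typedef struct {
    uv_mutex_t mutex;
    uv_cond_t  cond;
    int        gc_start;   /* predicate the waiter blocks on */
} GCGate;

/* Waiter side (e.g. a thread waiting for gc_start to drop back to 0). */
static void wait_until_gc_start_clear(GCGate *g) {
    uv_mutex_lock(&g->mutex);
    while (g->gc_start != 0)               /* loop guards against spurious wakeups */
        uv_cond_wait(&g->cond, &g->mutex);
    uv_mutex_unlock(&g->mutex);
}

/* Waker side: flip the predicate and broadcast under the same mutex;
 * doing either outside the lock is how wakeups get lost. */
static void clear_gc_start(GCGate *g) {
    uv_mutex_lock(&g->mutex);
    g->gc_start = 0;
    uv_cond_broadcast(&g->cond);
    uv_mutex_unlock(&g->mutex);
}
```

In general, a waiter whose predicate is already 0 but which never returns from uv_cond_wait points at the predicate change or the broadcast happening outside the mutex, rather than at libuv itself.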
About that segfault. Doesn't this look kinda fishy? | 20:09 | ||
#12 0x00007fb22c4a8cb0 in dispatch_polymorphic (tc=0x1902e70, entry_ptr=0x442f7c0, seen=0x5f0ce88, id=<optimized out>, callsite=<optimized out>, arg_indices=0x29330d5e7352, source=0x4393088, sf=<optimized out>, bytecode_offset=230) at src/disp/inline_cache.c:176 | |||
#13 0x00007fb22c407d2a in MVM_interp_run (tc=0x6084068, initial_invoke=0x189, initial_invoke@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, invoke_data=0x189, invoke_data@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, outer_runloop=0xffffffffffffffff, outer_runloop@entry=0x0) at src/core/interp.c:5507 | |||
MasterDuke58 | because i'm not thinking too quickly today: if a function takes a C string which may or may not have been malloced, and then throws an exception, the only way to know whether it should free that string is if there's also a flag passed in (because the caller knows whether it was malloced or not)? | ||
nine | dispatch_polymorphic is being called with a different tc than the MVM_interp_run it's called from | 20:10 | |
MasterDuke58: yes | |||
MasterDuke58 | cool | ||
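A plain-C sketch of the ownership-flag pattern MasterDuke and nine just agreed on (invented names throughout; the PR mentioned below may do it differently): only the caller knows whether the string was heap-allocated, so it passes that knowledge along and the callee frees on the error path accordingly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: load_file is a made-up stand-in for a function that can
 * bail out partway through; returning -1 stands in for throwing. Under this
 * (assumed) contract the callee frees the name only when it throws, and the
 * caller keeps ownership on the success path. */
static int load_file(char *filename, int filename_is_malloced) {
    FILE *fh = fopen(filename, "rb");
    if (!fh) {
        if (filename_is_malloced)
            free(filename);   /* without the flag we couldn't know this is safe */
        return -1;            /* the real code would throw the exception here */
    }
    /* ... actually load the contents ... */
    fclose(fh);
    return 0;
}
```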
is it a valid tc? | |||
nine | Yes, it's a different thread's. Actually the segfaulting one's | 20:11 | |
MasterDuke58 | tut tut, those poly dispatchers, sharing tcs all over the place... | 20:13 | |
nine | Ah, the tc in dispatch_polymorphic is valid. And I believe it's also correct. It also appears in the other thread's backtrace, because the thread running dispatch_polymorphic is currently marked blocked for GC and waiting in send_log. | 20:27 | |
The tc=0x6084068 is bogus however. I think it's just gdb getting confused. | |||
Geth | MoarVM: MasterDuke17++ created pull request #1528: Free filename if exception when loading bytecode | 20:39 | |
nine | Oooooh....I got it! | 20:42 | |
It's actually fairly benign. We're dealing with a half-initialized frame. And the problem only occurs if logging the frame entry in the spesh log triggers a send_log, and if that happens precisely when some thread wants to start a GC run. | 20:43 | |
MasterDuke58 | sounds like an uncommon occurrence | 20:45 | |
nine | This GC run will happen before the call to MVM_args_proc_setup, which would initialize the frame's params member. And this params member is what we trip over, because it contains bogus pointers | ||
Uncommon indeed. That's why it takes so many tries to catch it in rr. | |||
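A plain-C analogy of that window (made-up names, not MoarVM source): whatever can hand the frame to a GC-visible observer has to run after the fields the observer walks are in a defined state; the crash nine describes comes from hitting the reverse order in a very narrow race.

```c
#include <stddef.h>

/* Stand-ins: Frame.params plays the role of the frame's params member, and
 * observer_walk plays the role of the GC tracing it. */
typedef struct {
    void  **params;
    size_t  num_params;
} Frame;

static void observer_walk(const Frame *f) {
    for (size_t i = 0; i < f->num_params; i++)
        (void)f->params[i];          /* only safe once params is initialized */
}

/* Analogue of MVM_args_proc_setup: bring params into a defined state. */
static void setup_params(Frame *f, void **args, size_t n) {
    f->params     = args;
    f->num_params = n;
}

int main(void) {
    void *args[2] = { 0, 0 };
    Frame f;

    setup_params(&f, args, 2);   /* initialize first ...                       */
    observer_walk(&f);           /* ... then publish to observers; the crash
                                    came from the GC (via spesh logging and
                                    send_log) seeing the frame before setup.   */
    return 0;
}
```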
nine | I'm not even sure if this is new-disp specific. Ye olde MVM_frame_invoke on master seems to have exactly the same structure as MVM_frame_dispatch. | 20:50 | |
I dare say this is good enough work to call it a day. There's still tomorrow for a fix :) | 20:51 | ||
MasterDuke58 | nice. i'm about off to play some more horizon zero dawn. just took down redmaw last night | 20:52 | |