#moarvm on 11 August 2021 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:04 reportable6 joined 00:20 shareable6 joined 00:21 quotable6 joined, bloatable6 joined, notable6 joined 00:22 benchable6 joined 01:22 statisfiable6 joined 04:14 benchable6 left, tellable6 left, nativecallable6 left, evalable6 left, shareable6 left, linkable6 left, quotable6 left, statisfiable6 left, bisectable6 left, reportable6 left, releasable6 left, sourceable6 left, coverable6 left, committable6 left, squashable6 left, greppable6 left, unicodable6 left, bloatable6 left, notable6 left, coverable6 joined, squashable6 joined 04:15 nativecallable6 joined, reportable6 joined, tellable6 joined, linkable6 joined, bisectable6 joined 04:16 notable6 joined, statisfiable6 joined, sourceable6 joined 04:17 committable6 joined 05:16 unicodable6 joined 05:52 frost joined 06:02 reportable6 left 06:15 greppable6 joined, releasable6 joined, quotable6 joined 06:16 benchable6 joined, evalable6 joined 06:37 frost is now known as llllweqwe, llllweqwe is now known as frost
nine	Nicholas: MoarVM fails to build on the Open Build Service with: [ 40s] src/core/coerce.c:2:10: fatal error: ryu/ryu.h: No such file or directory	06:40	Copy link Message link Add to gist Remove
06:48 frost is now known as niner 06:49 niner is now known as frost 06:58 frost left, frost joined 07:02 reportable6 joined 07:14 bloatable6 joined 07:15 shareable6 joined 08:45 Geth left, Geth joined
dogbert2	nine++ (SEGV fix), does this mean that you're now out of work :)	09:33	Copy link Message link Add to gist Remove
nine	dogbert2: no, the t/02-rakudo/14-revisions.t one still needs fixing. Have to re-reproduce it though first. I did catch it in rr once, but by finishing the other fix, I've changed libmoar.so and rr doesn't cope too well	09:40	Copy link Message link Add to gist Remove
dogbert2	I might have found something new	09:42	Copy link Message link Add to gist Remove
	haven't seen this before: MoarVM oops: Unknown handler in inline cache entry		Copy link Message link Add to gist Remove
nine	Sounds new, yes.		Copy link Message link Add to gist Remove
dogbert2	'MVM_SPESH_NODELAY=1 ./rakudo-m -Ilib t/02-rakudo/15-gh_1202.t', 6k nursery and MVM_GC_DEBUG=1	09:43	Copy link Message link Add to gist Remove
09:45 evalable6 left, linkable6 left
lizmat	nine: nqp seems to build ok though	09:46	Copy link Message link Add to gist Remove
	shall I bump MoarVM anyway?		Copy link Message link Add to gist Remove
nine	lizmat: without MoarVM, NQP can't build?	09:47	Copy link Message link Add to gist Remove
lizmat	I mean, nqp builds with MoarVM on head		Copy link Message link Add to gist Remove
nine	Not on the OBS		Copy link Message link Add to gist Remove
lizmat	I understand, but locally I don't have that problem	09:48	Copy link Message link Add to gist Remove
	which feels... weird?		Copy link Message link Add to gist Remove
	in any case, I wanted to say that I cannot reproduce the failure to build		Copy link Message link Add to gist Remove
10:46 evalable6 joined 10:48 linkable6 joined
dogbert2	it seems as if (sometimes) entry->run_dispatch is null in github.com/MoarVM/MoarVM/blob/new-...che.c#L228	11:23	Copy link Message link Add to gist Remove
	but not always, hmm	11:26	Copy link Message link Add to gist Remove
	anyways, running this test file in gdb hangs, i.e. the program does not finish	11:42	Copy link Message link Add to gist Remove
	lizmat will not bump MoarVM until the OBS issue is fixed	11:56	Copy link Message link Add to gist Remove
12:01 linkable6 left, evalable6 left 12:02 evalable6 joined, linkable6 joined, reportable6 left 12:05 reportable6 joined 14:42 frost left
nine	Apparently it's far easier to reproduce dead locks in t/02-rakudo/14-revisions.t than to reproduce that segfault that I needed to re-reproduce	16:50	Copy link Message link Add to gist Remove
dogbert2	some bugs are very good a hiding	16:59	Copy link Message link Add to gist Remove
	*at		Copy link Message link Add to gist Remove
nine	At least, that dead lock seems to be fairly straight forward to fix. It's also on master btw.		Copy link Message link Add to gist Remove
dogbert2	that sounds good, aren't they usually hard to find?	17:00	Copy link Message link Add to gist Remove
nine	I very much hope, this finally fixes those build timeouts I see on the OBS.	17:01	Copy link Message link Add to gist Remove
	Oh, it won't :/ It only appears with MVM_SPESH_BLOCKING set	17:02	Copy link Message link Add to gist Remove
dogbert2	so there might be more deadlock bugs hiding	17:04	Copy link Message link Add to gist Remove
	trying to run t/02-rakudo/14-revisions.t under ASAN doesn't seem to work, the program hangs :(	17:10	Copy link Message link Add to gist Remove
nine	It's because send_log creates a new spesh log entry with an already locked mutex, then marks the current thread blocked for GC (which might enter the GC right away). The spesh worker thread picks up that spesh lock and waits for the mutex .	17:11	Copy link Message link Add to gist Remove
	The spesh thread never joins the GC, so everyone ends up waiting		Copy link Message link Add to gist Remove
	Good news is: I just cought the segfault again :)		Copy link Message link Add to gist Remove
lizmat	gesundheit!	17:12	Copy link Message link Add to gist Remove
nine	And apparently a different dead lock, too	17:14	Copy link Message link Add to gist Remove
timo	uh oh what have i come back to	17:20	Copy link Message link Add to gist Remove
17:35 evalable6 left, linkable6 left
MasterDuke	reminds me, i still haven't done anything about that gc'ed mutex, have i?	17:35	Copy link Message link Add to gist Remove
17:37 linkable6 joined
dogbert2	and here I was thinking that we were running out of bugs :)	17:44	Copy link Message link Add to gist Remove
18:02 reportable6 left 18:05 reportable6 joined
lizmat	dogbert2: nah, they're just mutating into more resilient and more hard to find ones :-)	18:06	Copy link Message link Add to gist Remove
timo	is it bad to miss nidividual spesh log entries? we could just drop some if weve got that one mutex already locked. otoh, not necessarily want to make moar more complex for an optional feature	18:17	Copy link Message link Add to gist Remove
nine	Nah, just needs marking the spesh thread blocked while waiting for that mutex	18:29	Copy link Message link Add to gist Remove
	The new deadlock is more interesting and it could be the one I've been struggling with for quite some time. The suspect here is the async io thread.	18:30	Copy link Message link Add to gist Remove
dogbert2	and what kind of shenanigans is that thread up to then?	18:31	Copy link Message link Add to gist Remove
18:37 evalable6 joined
nine	Thread 6: waiting for queue's cond var gc_status STOLEN	18:38	Copy link Message link Add to gist Remove
	Thread 5: waiting for tc->instance->cond_blocked_can_continue gc_status STOLEN		Copy link Message link Add to gist Remove
	Thread 4: waiting for tc->instance->cond_gc_finish gc_status NORMAL		Copy link Message link Add to gist Remove
	Thread 3: waiting for queue's cond var gc_status STOLEN		Copy link Message link Add to gist Remove
	Thread 2: waiting for tc->instance->cond_gc_start gc_status INTERRUPT		Copy link Message link Add to gist Remove
	Thread 1: waitinf for queue's cond var gc_status STOLEN		Copy link Message link Add to gist Remove
	So, 3 worker threads waiting for jobs to appear in the queue, blocked for GC. Thread 4 orchestrating the GC run and just waiting for the other threads to finish. But thread 2 is actually waiting for the GC run to start!	18:42	Copy link Message link Add to gist Remove
	Thread 2 is the async io thread		Copy link Message link Add to gist Remove
MasterDuke	should have known, the second child is always the troublemaker	18:48	Copy link Message link Add to gist Remove
nine	Wait a minute.... I'm a second child :D	18:49	Copy link Message link Add to gist Remove
japhb	... ;-)		Copy link Message link Add to gist Remove
	I guess I should have said: ∴‭ …‭ 😉‭	18:51	Copy link Message link Add to gist Remove
dogbert2	heh, interesting	18:52	Copy link Message link Add to gist Remove
nine	AFAICT Thread 2 should have been signalled and tc->instance->gc_start is 0 (which is the condition the thread is waiting for). No idea why it never wakes up from that uv_cond_wait	19:27	Copy link Message link Add to gist Remove
MasterDuke	potential libuv bug?	19:29	Copy link Message link Add to gist Remove
	fwiw, looks like 1.42.0 was released last month github.com/libuv/libuv/blob/v1.42.0/ChangeLog	19:32	Copy link Message link Add to gist Remove
19:36 MasterDuke58 joined 19:39 MasterDuke left
nine	libuv just wraps pthread_cond_wait and pthread_cond_broadcast	19:41	Copy link Message link Add to gist Remove
	About that segfault. Doesn't this look kinda fishy?	20:09	Copy link Message link Add to gist Remove
	#12 0x00007fb22c4a8cb0 in dispatch_polymorphic (tc=0x1902e70, entry_ptr=0x442f7c0, seen=0x5f0ce88, id=<optimized out>, callsite=<optimized out>, arg_indices=0x29330d5e7352, source=0x4393088, sf=<optimized out>, bytecode_offset=230) at src/disp/inline_cache.c:176		Copy link Message link Add to gist Remove
	#13 0x00007fb22c407d2a in MVM_interp_run (tc=0x6084068, initial_invoke=0x189, initial_invoke@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, invoke_data=0x189, invoke_data@entry=0x7fb22c52c6a0 <toplevel_initial_invoke>, outer_runloop=0xffffffffffffffff, outer_runloop@entry=0x0) at src/core/interp.c:5507		Copy link Message link Add to gist Remove
MasterDuke58	because i'm not thinking too quickly today: if a function takes a c str which may have been malloced or may not, and then throws an exception, the only way to know if it should free that str is if there's also a flag passed in (because the caller knows whether it was malloced or not)?		Copy link Message link Add to gist Remove
nine	dispatch_polymorphic is being called with a different tc than the MVM_interp_run it's called from	20:10	Copy link Message link Add to gist Remove
	MasterDuke58: yes		Copy link Message link Add to gist Remove
MasterDuke58	cool		Copy link Message link Add to gist Remove
	is it a valid tc?		Copy link Message link Add to gist Remove
nine	Yes, it's a different thread's. Actually the segfaulting one's	20:11	Copy link Message link Add to gist Remove
MasterDuke58	tut tut, those poly dispatchers, sharing tcs all over the place...	20:13	Copy link Message link Add to gist Remove
nine	Ah, the tc in dispatch_polymorphic is valid. And I believe it's also correct. It also appears in the other thread's backtrace, because the thread running dispatch_polymorphic is currently marked blocked for GC and waiting in send_log.	20:27	Copy link Message link Add to gist Remove
	The tc=0x6084068 is bogus however. I think it's just gdb getting confused.		Copy link Message link Add to gist Remove
Geth	MoarVM: MasterDuke17++ created pull request #1528: Free filename if exception when loading bytecode	20:39	Copy link Message link Add to gist Remove
nine	Oooooh....I got it!	20:42	Copy link Message link Add to gist Remove
	It's actually fairly bening. We're dealing with a half initialized frame. And the problem only occurs if logging the frame entry in the spesh log triggers a send_log and if that happens precisely when some thread wants to start a GC run.	20:43	Copy link Message link Add to gist Remove
MasterDuke58	sounds like an uncommon occurrence	20:45	Copy link Message link Add to gist Remove
nine	This GC run will happen before the call to MVM_args_proc_setup which would initialize the frame's params member. And this params member is what we trip over, because it contains bogus pointers		Copy link Message link Add to gist Remove
	Uncommon indeed. That's why it takes so many tries to catch it in rr.		Copy link Message link Add to gist Remove
20:49 dogbert2 left
nine	I'm not even sure if this is new-disp specific. Ye olde MVM_frame_invoke on master seems to have exactly the same structure as MVM_frame_dispatch.	20:50	Copy link Message link Add to gist Remove
	I dare say this is good enough work to call it a day. There's still tomorrow for a fix :)	20:51	Copy link Message link Add to gist Remove
MasterDuke58	nice. i'm about off to play some more horizon zero dawn. just took down redmaw last night	20:52	Copy link Message link Add to gist Remove
21:04 dogbert17 joined 21:36 squashable6 left 22:38 squashable6 joined

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!