github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm | Set by AlexDaniel on 12 June 2018.
00:10
lucasb left
01:10
Kaiepi left
01:14
Kaiepi joined
01:19
Kaiepi left,
Kaiepi joined
02:21
pamplemousse_ left
02:32
Kaiepi left,
Kaiepi joined
samcv | jnthn, neat | 04:03 | |
do you have a link to the issue? | 04:04 | ||
06:37
squashable6 left
06:40
squashable6 joined
06:53
domidumont joined
07:15
patrickb joined
patrickb | re that open suse patch: It's already in master: github.com/MoarVM/MoarVM/commit/f1...74b820eb39 | 07:22 | |
There is no weird reason behind the need for that patch: when building a shared moar and installing to /usr, the moarshared variable stays uninitialized and thus causes grief. That happens on every platform. | 07:23 | ||
07:24
mst left,
mst joined,
mst left,
mst joined,
ChanServ sets mode: +o mst
09:06
zakharyas joined
jnthn | samcv: github.com/croservices/cro-core/issues/11 | 09:14 | |
10:05
domidumont left
10:13
reportable6 left,
shareable6 left,
greppable6 left,
committable6 left
10:14
bisectable6 left,
quotable6 left,
evalable6 left,
shareable6 joined,
committable6 joined,
bisectable6 joined
10:17
greppable6 joined,
quotable6 joined,
evalable6 joined
10:18
reportable6 joined
10:21
sena_kun joined
10:51
sena_kun left
10:52
sena_kun joined,
sena_kun left
10:53
sena_kun joined
11:00
zakharyas left
11:05
zakharyas joined
11:33
domidumont joined
11:38
domidumont left
11:52
domidumont joined
12:39
sena_kun left
nine | A bit of a worrisome segfault: #0 __GI___pthread_mutex_lock (mutex=0xffffffff00000057) at ../nptl/pthread_mutex_lock.c:67 | 12:59 | |
#1 0x00007f54a45d3d39 in uv_mutex_lock (mutex=mutex@entry=0xffffffff00000057) at 3rdparty/libuv/src/unix/thread.c:310 | |||
#2 0x00007f54a4510096 in push (tc=0x55e2f4e69670, st=<optimized out>, root=<optimized out>, data=<optimized out>, value=..., kind=<optimized out>) at src/6model/reprs/ConcBlockingQueue.c:158 | |||
Especially considering that it's produced (sometimes) by this rather simple script: gist.github.com/niner/0f24bdb76080...3f6376b4b0 | 13:00 | ||
lizmat | nine: does the program you run actually matter? | 13:16 | |
13:21
lucasb joined
nine | Of course I cannot reproduce the segfault when I run it with perl6-gdb-m. Though the deciding difference is that I ran it through xargs -P5 (5 parallel processes) in the segfaulting case and can only use -P1 for gdb | 13:23 | |
timotimo | hum, the processes shouldn't influence each other, but system load can sometimes change program behavior | 13:25 | |
fwiw, rr has a "chaos" mode that perturbs scheduler events in an attempt to make weird bugs happen more often | |||
but iirc you're on ryzen? | |||
nine | Oh, I can also run the other processes in a different shell of course | 13:28 | |
13:28
pamplemousse joined
13:37
sena_kun joined
nine | Which doesn't seem to make it fail either | 13:45 | |
lizmat: yes, managed to make it segfault even when it run()s /bin/true | 13:54 | ||
lizmat | yikes, but probably also the reason we see flappers in spectest | 13:55 | |
so good that it is somewhat reproducible | |||
13:56
pamplemousse left
nine | Just guessing, but what if between this call to MVM_gc_mark_thread_blocked github.com/MoarVM/MoarVM/blob/mast...eue.c#L159 and the following line the GC runs and collects root and thus body? | 14:02 | |
Ah, no, can't be as root is MVM_ROOTed | 14:03 | ||
Err....not collect but move to gen2. Then the body pointer would be out of date while root would still be ok | |||
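A self-contained sketch (invented types; not MoarVM source) of the failure mode nine hypothesizes here, and partly rules out just below: a moving collector fixes up the rooted pointer, but a raw pointer derived from it before the blocking window keeps pointing at the poisoned old location, the kind of thing that yields 0xffffffff-style garbage like the mutex address in the backtrace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented stand-ins for an MVMObject and its body. */
typedef struct { int elems; } Body;
typedef struct { Body body; } Object;

static Object *root;  /* plays the MVM_ROOTed pointer: the GC updates it */

/* Simulate the GC moving the object (e.g. nursery -> gen2). */
static void fake_gc_move(void) {
    Object *copy = malloc(sizeof *copy);
    memcpy(copy, root, sizeof *copy);
    memset(root, 0xff, sizeof *root);  /* poison the old location */
    root = copy;                       /* the root is fixed up... */
}

int main(void) {
    root = malloc(sizeof *root);
    root->body.elems = 42;

    Body *body = &root->body;  /* cached before the blocking window */
    fake_gc_move();            /* GC runs while the thread is "blocked" */

    /* ...but the cached body pointer was not updated. */
    printf("stale body->elems: %d\n", body->elems);
    printf("via updated root:  %d\n", root->body.elems);
    return 0;
}
```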
14:06
zakharyas left
nine | Nah, the body is malloced | 14:06 | |
14:08
robertle joined
nine | I'm starting to believe that it just doesn't happen in gdb | 14:16 | |
nor with a debug build | 14:39 | ||
15:01
domidumont left
15:17
robertle left
15:32
patrickb left
timotimo | we may actually be able to remove the "better JIT-compilation of big integer operations" entry from the moarvm.org roadmap page | 15:44 | |
the wording of "better optimization around closures" seems a little odd: "Today's optimizer does a poor job of, and has an inability to inline, first class functions and closures" | 15:45 | ||
either the "and" or the "an" wants negated i think?! | |||
and perhaps we'll want to turn the commit hashes in the releases page into links, and maybe even change the [abcdef] into [X] or [commit] and multiples into [commit, commit, commit, ...] or [commit 1, 2, 3, 4] | 15:47 | ||
16:05
robertle joined
16:27
AlexDaniel left
nine | Seems like the error can be reproduced with just: perl6 -e 'my $err = run("/usr/bin/true", :err).err.slurp-rest' | 16:47 | |
It just takes a lot of tries | |||
timotimo: could you try to catch the error in rr on your machine? | 16:48 | ||
16:52
brrt joined,
sena_kun left,
sena_kun joined
brrt | \o | 16:58 | |
pamplemousse: dev.to/jeremycmorgan/creating-trim...-core-4m08 is maybe of interest to you | 16:59 | ||
.tell pamplemousse check out dev.to/jeremycmorgan/creating-trim...-core-4m08 | |||
yoleaux | brrt: I'll pass your message to pamplemousse. | ||
17:08
chloekek joined
17:15
pamplemousse joined
17:39
sena_kun left
17:40
sena_kun joined
nine | Oh, I finally got a coredump of the failure with debug symbols! | 17:40 | |
Turns out, rr does work a bit on Ryzen. At least enough to run the failing program with chaos mode. It fails to replay, but I can at least open the coredump with plain gdb | 17:41 | ||
While the MVMConcBlockingQueueBody is still there, it's apparently corrupted: $5 = {head = 0xffffffff00000017, tail = 0x18001100000001, elems = 67124680, head_lock = {__data = {__lock = 23, __count = 4294967295, __owner = 1, __nusers = 1572881, __kind = 67124880, __spins = 0 | 17:44 | ||
17:46
sena_kun left,
sena_kun joined
nine | Oh, and the ConcBlockingQueue in question is the ThreadPoolScheduler::Queue | 17:58 | |
pamplemousse | brrt: Thanks for the article! I looked at using the self-contained executables as a model for what to do, but when I was digging through .NET Core's implementation of it, I realized it probably wasn't the most viable way for me to attempt it if I wanted to finish by August, so I have been mostly using the framework-dependent executable as inspiration. I'm hoping to keep moving towards having a fully self-contained executable as an option, though | 18:00 | |
yoleaux | 16:59Z <brrt> pamplemousse: check out dev.to/jeremycmorgan/creating-trim...-core-4m08 | ||
brrt | nine: that's bad.... | 18:15 | |
pamplemousse: cool, hoped you'd find it interesting | 18:16 | ||
I think, regarding your project, 'there's more than one way to do it' applies | |||
18:20
zakharyas joined
nine | With a 4K nursery size, it fails more often with perl6: src/6model/sc.c:401: MVM_SC_WB_OBJ: Assertion `!(obj->header.flags & MVM_CF_FORWARDER_VALID)' failed. trying to bindkey_o on an MVMContext | 18:25 | |
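The tiny-nursery trick is worth noting: MoarVM's nursery size is a compile-time constant, so forcing near-constant GC runs is a one-line source tweak (assuming the constant still lives in src/gc/collect.h under this name).

```c
/* In src/gc/collect.h (assumed location and default): shrinking the
 * nursery makes almost every allocation able to trigger a collection,
 * flushing out missing roots and stale pointers much sooner. */
#define MVM_NURSERY_SIZE 4096   /* default is on the order of megabytes */
```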
brrt | I'm afk. Hope the european folks are handling the heat wave well | 18:27 | |
18:28
brrt left
nine | It's not much, but at least I now know that the ThreadPoolScheduler::Queue does not get freed by the GC. So it's not a use-after-free situation | 18:40 | |
Also curious. body always seems to be 0xffffffff00000017 | |||
Huh....the MVM_SC_WB_OBJ assertion failure is in the binder when calling the multi ThreadPoolScheduler.cue, which...would push onto a ConcBlockingQueue. Coincidence? I think not. | 19:02 | ||
19:08
domidumont joined
19:12
domidumont left
19:17
MasterDuke joined
nine | The error seems to happen when Proc::Async's done handler tries to keep the exit_promise. | 19:23 | |
Now if only I knew what the assert(!(obj->header.flags & MVM_CF_FORWARDER_VALID)); is trying to prevent... | |||
19:25
sena_kun left,
sena_kun joined
nine | Or what MVM_SC_WB_OBJ does in general. It appears to me that it doesn't like to run concurrently with the GC | 19:38 | |
OTOH every other thread is only waiting for GC to start | 19:40 | ||
Ah, I understand. When the GC copies an object from the old nursery to the new one (and probably to gen2) it stores the pointer to the copy in the old object's header.sc_forward_u.forwarder and sets the MVM_CF_FORWARDER_VALID flag to indicate that the sc part of the union is invalid | 19:45 | ||
In my case the MVM_CF_NURSERY_SEEN flag is set, so the object (the 'cue' method's lexpad) is in the new nursery | 19:48 | ||
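A compilable sketch of the forwarding scheme nine just described; sc_forward_u, forwarder, and MVM_CF_FORWARDER_VALID are the real MoarVM names used in this log, but the layout and flag value here are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define MVM_CF_FORWARDER_VALID (1 << 0)  /* flag value invented here */

typedef struct Collectable Collectable;
struct Collectable {
    uint16_t flags;
    union {
        void        *sc;         /* meaningful while the flag is clear */
        Collectable *forwarder;  /* where the object moved to, once set */
    } sc_forward_u;
};

/* When the GC copies an object, the old copy becomes a signpost: */
static void forward(Collectable *old, Collectable *new_loc) {
    old->sc_forward_u.forwarder = new_loc;
    old->flags |= MVM_CF_FORWARDER_VALID;
}

/* Anyone still holding the old pointer can chase the breadcrumb: */
static Collectable *follow(Collectable *maybe_old) {
    return (maybe_old->flags & MVM_CF_FORWARDER_VALID)
         ? maybe_old->sc_forward_u.forwarder
         : maybe_old;
}

int main(void) {
    Collectable a = {0}, b = {0};
    forward(&a, &b);
    printf("forwarded: %s\n", follow(&a) == &b ? "yes" : "no");
    return 0;
}
```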
19:51
zakharyas left
nine | But I guess the MVM_CF_FORWARDER_VALID should only be a temporary state during GC? So how can it be set when everyone's still waiting to start with GC? | 20:07 | |
So....maybe it's a leftover from a previous GC run? That'd be the case with a missing MVM_ROOT I guess | 20:08 | ||
Ok. bindkey_o gets the hashy object from the register into obj, then calls the bind_key repr function, then calls MVM_SC_WB_OBJ on obj. But MVMContext's bind_key may actually allocate, triggering the GC. So interp's obj may indeed be out of date after bind_key. | 20:17 | ||
Now since registers are rooted automatically, I wonder what's better: MVMROOT obj, or just accessing the original register again? | 20:22 | ||
Well accessing the register must be faster | 20:31 | ||
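A self-contained sketch (invented types; not MoarVM source) of why re-reading the register is the safe option, and the shape of the fix Geth reports below: the register file is part of the GC root set and gets fixed up on a move, while a pointer cached in a local across an allocating call does not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; } Object;
typedef union { Object *o; } Register;

static Register reg[1];  /* the "register file": part of the GC root set */

/* Stands in for MVM_SC_WB_OBJ: it dereferences the object it is given. */
static void write_barrier(const char *label, Object *obj) {
    printf("%s barrier sees id=%d\n", label, obj->id);
}

/* Stands in for a bind_key that allocates and thereby triggers a moving
 * GC run: the object is copied, the register is fixed up, and the old
 * location is poisoned. */
static void allocating_bind_key(void) {
    Object *copy = malloc(sizeof *copy);
    memcpy(copy, reg[0].o, sizeof *copy);
    memset(reg[0].o, 0xff, sizeof(Object));
    reg[0].o = copy;
}

int main(void) {
    reg[0].o = malloc(sizeof(Object));
    reg[0].o->id = 7;

    Object *obj = reg[0].o;  /* the buggy pattern: cache, then call */
    allocating_bind_key();

    write_barrier("stale  ", obj);       /* reads poisoned memory */
    write_barrier("re-read", reg[0].o);  /* correct: id=7         */
    return 0;
}
```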
20:33
pamplemousse left
21:12
chloekek left
jnthn | nine: Sorry, was busy earlier and then afk now so didn't get to follow the debugging... | 21:15 | |
nine: Seeing something with MVM_CF_FORWARDER_VALID set when not in the middle of a GC run means dealing with an out of date pointer to an object that has moved | |||
nine: You may get more clues by setting MVM_GC_DEBUG to 1, which checks for object references in fromspace, BUT the slowdown may hide what sounds like a very time-sensitive bug. | 21:16 | ||
Hm, bind_key causing allocation is...likely to be an issue | 21:17 | ||
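For reference, MVM_GC_DEBUG is likewise a compile-time define (its exact location is an assumption here); turning it on trades a lot of speed for paranoia like the fromspace checks quoted further down.

```c
/* In src/moar.h (assumed location): 0 disables the extra checks; 1 is
 * enough for the fromspace-reference checks jnthn mentions, at a
 * significant slowdown that can mask timing-sensitive bugs. */
#define MVM_GC_DEBUG 1
```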
21:20
robertle left
Geth | MoarVM: 0082687ec9 | (Stefan Seifert)++ | src/core/interp.c | 21:35 | |
Fix possible memory corruption in bindkey_*

bindkey reads the target object from a register, calls the bind_key repr function and then calls MVM_SC_WB_OBJ with the object. The repr function however may allocate and thus trigger a GC run which may move the target object. In that case we'd end up calling MVM_SC_WB_OBJ on the outdated copy of the object. Fix by reading it fresh from the register as those get updated automatically by the GC.
nine | That got me some 10K runs without error. But the loop running it in rr ended with: MoarVM panic: Adding pointer 0x6c272c001460 to past fromspace to GC worklist | 21:36 | |
So there may be another issue still | |||
jnthn: well good to know that I was on the right track :) | 21:37 | ||
tc->nursery_alloc is 0x6c272c001460, i.e. the same address as the work item | 21:42 | ||
jnthn | hmm, where's the report of the thing I just pushed... | 21:44 | |
So I discovered that my debug/profiler concurrency fixes from the other day managed to break something, and just patched it. | |||
nine | jnthn: maybe the push got rejected because mine came in between? | ||
timotimo | it's been a quarter hour though :) | ||
nine | oh | ||
jnthn | nine: No, I pulled first :) | 21:45 | |
lizmat | d80e296c82e6a2b65256 is what I see after a git pull | ||
jnthn | And it shows on github | ||
lizmat | Unbreak debugger instrumentation | ||
jnthn | That's the one | ||
nine | yep, seeing it too | ||
lizmat | so I guess Geth is forgetting / awol | 21:46 | |
jnthn | OK, just GitHub being slow sending notifications | ||
Well, or Geth | |||
I could believe the two equally :) | |||
nine: Which bind_key allocates, btw? | |||
nine | jnthn: the one in MVMContext | 21:47 | |
because frame walker | |||
jnthn | oops, I didn't think fw allocated though... | 21:48 | |
ohh...hmm | |||
nine | The presence of MVM_gc_root_temp_push in bind_key strongly suggests to me that allocation might happen ;) | 21:49 | |
jnthn | oh, hmm, it passes 1 to vivify... | 21:50 | |
but why is it vivifying a lexical it's about to bind to... | |||
nine | jnthn: how likely do you think it is that the "Adding pointer 0x6c272c001460 to past fromspace to GC worklist" is just an off-by-one error in the GC debug check? | 21:51 | |
jnthn | hmm | 21:52 | |
if (thread_tc && thread_tc->nursery_fromspace && \ | |||
(char *)(c) >= (char *)thread_tc->nursery_fromspace && \ | |||
(char *)(c) < (char *)thread_tc->nursery_fromspace + \ | |||
thread_tc->nursery_fromspace_size) \ | |||
If c is the pointer and thread_tc->nursery_fromspace is the start of the fromspace then I'd think being equal just means the thing is allocated right at the start of it | |||
So it doesn't immediately look wrong to me | 21:53 | ||
nine | Isn't this the check? if ((char *)*item_to_add >= (char *)tc->nursery_alloc && \ | ||
(char *)*item_to_add < (char *)tc->nursery_alloc_limit) \ | |||
MVM_panic(1, "Adding pointer %p to past fromspace to GC worklist", \ | |||
jnthn | Oh, but that's the wrong check... | 21:54 | |
Right, two similar errors :) | |||
But same logic applies, I think | |||
It doesn't look wrong to me | |||
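Both quoted checks use the same half-open interval shape, which is why equality with the start address is in range rather than off by one; a throwaway illustration:

```c
#include <stdio.h>

int main(void) {
    char space[16];                   /* stands in for the fromspace */
    char *start = space, *c = space;  /* c points at its very start  */
    size_t size = sizeof space;

    /* Half-open interval [start, start + size): start itself is inside. */
    printf("in range: %d\n", c >= start && c < start + size);  /* 1 */
    return 0;
}
```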
nine | Ok, just checking. It would have been nice to find that the much less often run debug code was the one that's wrong :) | 21:57 | |
jnthn | Yes, indeed | ||
nine | Anyway, running stuff with a tiny nursery shows that there are still a couple of GC related issues left... | 21:58 | |
jnthn | Did you ever get to the bottom of that other framewalker issue, btw? | 22:00 | |
nine | What was that? | ||
jnthn | github.com/MoarVM/MoarVM/issues/1113 | 22:01 | |
nine | Oh, no, I almost forgot about it. Our main issue has been a deadlock that I fixed yesterday. Segfaults are rather trivial to work around using systemd's Restart=always ;) | 22:03 | |
jnthn | hah :) | ||
nine | Maybe I can have another look at it tomorrow | ||
jnthn | Somehow it triggers a lot more often on macOS | 22:00 | |
No idea why | |||
nine | Maybe it's just coincidence. But as long as we don't know how it comes about... | 22:04 | |
jnthn is trying to get his PerlCon prep done this week so he doesn't have to do any (or at least much) of it during his vacation next week :) | |||
nine | That sounds like a sensible plan! | 22:05 | |
I'm a bit sad to miss PerlCon this year. But only a little as the reason is me getting married this Thursday :) | 22:07 | ||
jnthn | Oh! That's an excellent reason to miss it! Congratulations; have a lovely day. | ||
timotimo | oh, congrats :) | ||
nine | Thank you :) | 22:10 | |
Reminds me...I should go to bed now. Good night! | 22:11 | ||
jnthn | Rest well; 'night o/ | 22:14 | |
22:43
pamplemousse joined
23:42
pamplemousse left
23:43
pamplemousse joined