github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:10 lucasb left 01:10 Kaiepi left 01:14 Kaiepi joined 01:19 Kaiepi left, Kaiepi joined 02:21 pamplemousse_ left 02:32 Kaiepi left, Kaiepi joined
samcv jnthn, neat 04:03
do you have a link to the issue? 04:04
06:37 squashable6 left 06:40 squashable6 joined 06:53 domidumont joined 07:15 patrickb joined
patrickb re that openSUSE patch: It's already in master: github.com/MoarVM/MoarVM/commit/f1...74b820eb39 07:22
There is no weird reason for the need for that patch: when building a shared MoarVM and installing to /usr, the moarshared variable stays uninitialized and thus causes grief. That happens on every platform. 07:23
07:24 mst left, mst joined, mst left, mst joined, ChanServ sets mode: +o mst 09:06 zakharyas joined
jnthn samcv: github.com/croservices/cro-core/issues/11 09:14
10:05 domidumont left 10:13 reportable6 left, shareable6 left, greppable6 left, committable6 left 10:14 bisectable6 left, quotable6 left, evalable6 left, shareable6 joined, committable6 joined, bisectable6 joined 10:17 greppable6 joined, quotable6 joined, evalable6 joined 10:18 reportable6 joined 10:21 sena_kun joined 10:51 sena_kun left 10:52 sena_kun joined, sena_kun left 10:53 sena_kun joined 11:00 zakharyas left 11:05 zakharyas joined 11:33 domidumont joined 11:38 domidumont left 11:52 domidumont joined 12:39 sena_kun left
nine A bit of a worrisome segfault: #0 __GI___pthread_mutex_lock (mutex=0xffffffff00000057) at ../nptl/pthread_mutex_lock.c:67 12:59
#1 0x00007f54a45d3d39 in uv_mutex_lock (mutex=mutex@entry=0xffffffff00000057) at 3rdparty/libuv/src/unix/thread.c:310
#2 0x00007f54a4510096 in push (tc=0x55e2f4e69670, st=<optimized out>, root=<optimized out>, data=<optimized out>, value=..., kind=<optimized out>) at src/6model/reprs/ConcBlockingQueue.c:158
Especially considering that it's produced (sometimes) by this rather simple script: gist.github.com/niner/0f24bdb76080...3f6376b4b0 13:00
lizmat nine: does the program you run, actually matter ? 13:16
13:21 lucasb joined
nine Of course I cannot reproduce the segfault when I run it with perl6-gdb-m. Though the deciding difference is that I ran it through xargs -P5 (5 parallel processes) in the segfaulting case and can only use -P1 for gdb 13:23
timotimo hum, the processes shouldn't influence each other, but system load can sometimes change program behavior 13:25
fwiw, rr has a "chaos" mode that perturbs scheduler events in an attempt to make weird bugs happen more often
but iirc you're on ryzen?
nine Oh, I can also run the other processes in a different shell of course 13:28
13:28 pamplemousse joined 13:37 sena_kun joined
nine Which doesn't seem to make it fail either 13:45
lizmat: yes, managed to make it segfault even when it run()s /bin/true 13:54
lizmat yikes, but probably also the reason we see flappers in spectest 13:55
so good that it is somewhat reproducible
13:56 pamplemousse left
nine Just guessing, but what if between this call to MVM_gc_mark_thread_blocked github.com/MoarVM/MoarVM/blob/mast...eue.c#L159 and the following line the GC runs and collects root and thus body? 14:02
Ah, no, can't be as root is MVM_ROOTed 14:03
Err....not collect but move to gen2. Then the body pointer would be out of date while root would still be ok
14:06 zakharyas left
nine Nah, the body is malloced 14:06
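A sketch of the window nine is describing, shaped after the push() frame in the backtrace above; the lock field and the exact line order are shorthand here, not the literal ConcBlockingQueue.c source:

    #include "moar.h"

    /* REPR push, per the backtrace: the thread marks itself GC-blocked
     * before taking the queue lock, so a collection can run in between. */
    static void push_sketch(MVMThreadContext *tc, MVMSTable *st, MVMObject *root,
                            void *data, MVMRegister value, MVMuint16 kind) {
        MVMConcBlockingQueueBody *body = (MVMConcBlockingQueueBody *)data;

        MVM_gc_mark_thread_blocked(tc);   /* other threads may now run a GC... */
        uv_mutex_lock(&body->head_lock);  /* ...while this thread waits here   */
        MVM_gc_mark_thread_unblocked(tc);

        /* The theory: if the GC moved the queue object (nursery -> gen2) in
         * that window and the body lived inline in the object, `data` would
         * now be stale and the mutex address garbage, matching the
         * 0xffffffff00000057 mutex in frame #1.  As noted just above, though,
         * this body is malloc'd, so it does not move with the object. */
        uv_mutex_unlock(&body->head_lock);
    }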
14:08 robertle joined
nine I'm starting to believe that it just doesn't happen in gdb 14:16
nor with a debug build 14:39
15:01 domidumont left 15:17 robertle left 15:32 patrickb left
timotimo we may actually be able to remove the "better JIT-compilation of big integer operations" entry from the moarvm.org roadmap page 15:44
the wording of "better optimization around closures" seems a little odd: "Today's optimizer does a poor job of, and has an inability to inline, first class functions and closures" 15:45
either the "and" or the "an" wants negated i think?!
and perhaps we'll want to turn the commit hashes in the releases page into links, and maybe even change the [abcdef] into [X] or [commit] and multiples into [commit, commit, commit, ...] or [commit 1, 2, 3, 4] 15:47
16:05 robertle joined 16:27 AlexDaniel left
nine Seems like the error can be reproduced with just: perl6 -e 'my $err = run("/usr/bin/true", :err).err.slurp-rest' 16:47
It just takes a lot of tries
timotimo: could you try to catch the error in rr on your machine? 16:48
16:52 brrt joined, sena_kun left, sena_kun joined
brrt \o 16:58
pamplemousse: dev.to/jeremycmorgan/creating-trim...-core-4m08 is maybe of interest to you 16:59
.tell pamplemousse check out dev.to/jeremycmorgan/creating-trim...-core-4m08
yoleaux brrt: I'll pass your message to pamplemousse.
17:08 chloekek joined 17:15 pamplemousse joined 17:39 sena_kun left 17:40 sena_kun joined
nine Oh, I finally got a coredump of the failure with debug symbols! 17:40
Turns out, rr does work a bit on Ryzen. At least enough to run the failing program with chaos mode. It fails to replay, but I can at least open the coredump with plain gdb 17:41
While the MVMConcBlockingQueueBody is still there, it's apparently corrupted: $5 = {head = 0xffffffff00000017, tail = 0x18001100000001, elems = 67124680, head_lock = {__data = {__lock = 23, __count = 4294967295, __owner = 1, __nusers = 1572881, __kind = 67124880, __spins = 0 17:44
17:46 sena_kun left, sena_kun joined
nine Oh, and the ConcBlockingQueue in question is the ThreadPoolScheduler::Queue 17:58
pamplemousse brrt: Thanks for the article! I looked at using the self-contained executables as a model for what to do, but when I was digging through .NET Core's implementation of it, I realized it probably wasn't the most viable way for me to attempt it if I wanted to finish by August, so I have been mostly using the framework-dependent executable as inspiration. I'm hoping to keep moving towards having a fully self-contained executable as an option though 18:00
yoleaux 16:59Z <brrt> pamplemousse: check out dev.to/jeremycmorgan/creating-trim...-core-4m08
brrt nine: that's bad.... 18:15
pamplemousse: cool, hoped you'd find it interesting 18:16
I think, regarding your project, 'there's more than one way to do it' applies
18:20 zakharyas joined
nine With a 4K nursery size, it fails more often with perl6: src/6model/sc.c:401: MVM_SC_WB_OBJ: Assertion `!(obj->header.flags & MVM_CF_FORWARDER_VALID)' failed. trying to bindkey_o on an MVMContext 18:25
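Presumably the 4K nursery here comes from recompiling with the nursery constant shrunk; a sketch, assuming that constant is still MVM_NURSERY_SIZE in src/gc/collect.h:

    /* src/gc/collect.h (sketch): shrinking the per-thread nursery forces a
     * collection after only a few allocations, so objects move far more
     * often and GC ordering bugs become much easier to trigger. */
    #define MVM_NURSERY_SIZE 4096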
brrt I'm afk. Hope the european folks are handling the heat wave well 18:27
18:28 brrt left
nine It's not much, but at least I now know that the ThreadPoolScheduler::Queue does not get freed by the GC. So it's not a use-after-free situation 18:40
Also curious. body always seems to be 0xffffffff00000017
Huh....the MVM_SC_WB_OBJ assertion failure is in the binder when calling the multi ThreadPoolScheduler.cue, which...would push onto a ConcBlockingQueue. Coincidence? I think not. 19:02
19:08 domidumont joined 19:12 domidumont left 19:17 MasterDuke joined
nine The error seems to happen when Proc::Async's done handler tries to keep the exit_promise. 19:23
Now if only I knew what the assert(!(obj->header.flags & MVM_CF_FORWARDER_VALID)); is trying to prevent...
19:25 sena_kun left, sena_kun joined
nine Or what MVM_SC_WB_OBJ does in general. It appears to me that it doesn't like to run concurrently with the GC 19:38
OTOH every other thread is only waiting for GC to start 19:40
Ah, I understand. When the GC copies an object from the old nursery to the new one (and probably to gen2) it stores the pointer to the copy in the old object's header.sc_forward_u.forwarder and sets the MVM_CF_FORWARDER_VALID flag to indicate that the sc part of the union is invalid 19:45
In my case the MVM_CF_NURSERY_SEEN flag is set, so the object (the 'cue' method's lexpad) is in the new nursery 19:48
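In code form, the mechanism nine just described looks roughly like this (a sketch of the pattern, not a specific MoarVM function):

    #include "moar.h"

    /* Once the copying GC has moved an object, the old header holds the new
     * address in the sc_forward_u union and MVM_CF_FORWARDER_VALID marks the
     * SC part of that union as invalid; following the forwarder yields the
     * object's current location. */
    static MVMObject * resolve_forwarder_sketch(MVMObject *obj) {
        if (obj->header.flags & MVM_CF_FORWARDER_VALID)
            return (MVMObject *)obj->header.sc_forward_u.forwarder;
        return obj;
    }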
19:51 zakharyas left
nine But I guess the MVM_CF_FORWARDER_VALID should only be a temporary state during GC? So how can it be set when everyone's still waiting to start with GC? 20:07
So....maybe it's a leftover from a previous GC run? That'd be the case with a missing MVM_ROOT I guess 20:08
Ok. bindkey_o gets the hashy object from the register into obj, then calls the bind_key repr function, then calls MVM_SC_WB_OBJ on obj. But MVMContext's bind_key may actually allocate, triggering the GC. So interp's obj may indeed be out of date after bind_key. 20:17
Now since registers are rooted automatically, I wonder what's better? MVMROOT obj or just access the original register again? 20:22
Well accessing the register must be faster 20:31
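In code, the shape of the problem and of the fix nine is weighing, paraphrased from the bindkey_o case in src/core/interp.c (not a verbatim diff; register offsets follow the interpreter's usual GET_REG convention):

    /* Before: obj is read from its register once and reused after bind_key. */
    MVMObject *obj = GET_REG(cur_op, 0).o;
    REPR(obj)->ass_funcs.bind_key(tc, STABLE(obj), obj, OBJECT_BODY(obj),
        GET_REG(cur_op, 2).o, GET_REG(cur_op, 4), MVM_reg_obj);
    /* bind_key (MVMContext's, here) may allocate and so trigger a GC that
     * moves the object; the register is updated by the GC, the local C
     * variable is not, so the write barrier can see a stale pointer: */
    MVM_SC_WB_OBJ(tc, obj);

    /* After: re-read the register, which the GC keeps up to date. */
    MVM_SC_WB_OBJ(tc, GET_REG(cur_op, 0).o);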
20:33 pamplemousse left 21:12 chloekek left
jnthn nine: Sorry, was busy earlier and then afk now so didn't get to follow the debugging... 21:15
nine: Seeing something with MVM_CF_FORWARDER_VALID set when not in the middle of a GC run means dealing with an out of date pointer to an object that has moved
nine: You may get more clues by setting MVM_GC_DEBUG to 1, which checks for object references in fromspace, BUT the slowdown may hide what sounds like a very time-sensitive bug. 21:16
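For reference, enabling that is a one-line change plus a rebuild, assuming the knob still lives in src/gc/debug.h:

    /* src/gc/debug.h (sketch): a nonzero setting enables extra sanity checks,
     * such as panicking when a pointer into fromspace is seen, at a
     * noticeable runtime cost -- hence the caveat about time-sensitive bugs. */
    #define MVM_GC_DEBUG 1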
Hm, bind_key causing allocation is...likely to be an issue 21:17
21:20 robertle left
Geth MoarVM: 0082687ec9 | (Stefan Seifert)++ | src/core/interp.c
Fix possible memory corruption in bindkey_*

bindkey reads the target object from a register, calls the bind_key repr function and then calls MVM_SC_WB_OBJ with the object. The repr function however may allocate and thus trigger a GC run which may move the target object. In that case we'd end up calling MVM_SC_WB_OBJ on the outdated copy of the object. Fix by reading it fresh from the register as those get updated automatically by the GC. 21:35
nine That got me some 10K runs without error. But the loop running it in rr ended with: MoarVM panic: Adding pointer 0x6c272c001460 to past fromspace to GC worklist 21:36
So there may be another issue still
jnthn: well good to know that I was on the right track :) 21:37
tc->nursery_alloc is 0x6c272c001460, i.e. the same address as the work item 21:42
jnthn hmm, where's the report of the thing I just pushed... 21:44
So I discovered that my debug/profiler concurrency fixes from the other day managed to break something, and just patched it.
nine jnthn: maybe the push got rejected because mine came in between?
timotimo it's been a quarter hour though :)
nine oh
jnthn nine: No, I pulled first :) 21:45
lizmat d80e296c82e6a2b65256 is what I see after a git pull
jnthn And it shows on github
lizmat Unbreak debugger instrumentation
jnthn That's the one
nine yep, seeing it too
lizmat so I guess Geth is forgetting / awol 21:46
jnthn OK, just GitHub being slow sending notifications
Well, or Geth
I could believe the two equally :)
nine: Which bind_key allocates, btw?
nine jnthn: the one in MVMContext 21:47
because frame walker
jnthn oops, I didn't think fw allocated though... 21:48
ohh...hmm
nine The presence of MVM_gc_root_temp_push in bind_key strongly suggests to me that allocation might happen ;) 21:49
jnthn oh, hmm, it passes 1 to vivify... 21:50
but why is it vivifying a lexical it's about to bind to...
nine jnthn: how likely do you think it is that the "Adding pointer 0x6c272c001460 to past fromspace to GC worklist" is just an off-by-one error in the GC debug check? 21:51
jnthn hmm 21:52
if (thread_tc && thread_tc->nursery_fromspace && \
(char *)(c) >= (char *)thread_tc->nursery_fromspace && \
(char *)(c) < (char *)thread_tc->nursery_fromspace + \
thread_tc->nursery_fromspace_size) \
If c is the pointer and thread_tc->nursery_fromspace is the start of the fromspace then I'd think being equal just means the thing is allocated right at the start of it
So it doesn't immediately look wrong to me 21:53
nine Isn't this the check? if ((char *)*item_to_add >= (char *)tc->nursery_alloc && \
(char *)*item_to_add < (char *)tc->nursery_alloc_limit) \
MVM_panic(1, "Adding pointer %p to past fromspace to GC worklist", \
jnthn Oh, but that's the wrong check... 21:54
Right, two similar errors :)
But same logic applies, I think
It doesn't look wrong to me
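Restated as a standalone predicate (my wording of the semantics behind the check quoted above): nursery_alloc is the bump-allocation pointer and nursery_alloc_limit the end of the space being allocated into, so the half-open range is exactly the not-yet-allocated tail, and a work-list pointer landing anywhere in it, including right at nursery_alloc, cannot refer to an object that has actually been allocated.

    #include "moar.h"

    /* True when p points into the not-yet-allocated remainder of the current
     * nursery allocation area -- the condition the debug check panics on. */
    static int points_into_unallocated_nursery(MVMThreadContext *tc, void *p) {
        return (char *)p >= (char *)tc->nursery_alloc
            && (char *)p <  (char *)tc->nursery_alloc_limit;
    }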
nine Ok, just checking. Would have been nice to find the much less often run debug code being wrong :) 21:57
jnthn Yes, indeed
nine Anyway, running stuff with a tiny nursery shows that there are still a couple of GC related issues left... 21:58
jnthn Did you ever get to the bottom of that other framewalker issue, btw? 22:00
nine What was that?
jnthn github.com/MoarVM/MoarVM/issues/1113 22:01
nine Oh, no, I almost forgot about it. Our main issue has been a deadlock that I fixed yesterday. Segfaults are rather trivial to work around using systemd's Restart=always ;) 22:03
jnthn hah :)
nine Maybe I can have another look at it tomorrow
jnthn Somehow it triggers a lot more often on macOS
No idea why
nine Maybe it's just coincidence. But as long as we don't know how it comes about... 22:04
jnthn is trying to get his PerlCon prep done this week so he doesn't have to do any (or at least much) of it during his vacation next week :)
nine That sounds like a sensible plan! 22:05
I'm a bit sad to miss PerlCon this year. But only a little as the reason is me getting married this Thursday :) 22:07
jnthn Oh! That's an excellent reason to miss it! Congratulations; have a lovely day.
timotimo oh, congrats :)
nine Thank you :) 22:10
Reminds me...I should go to bed now. Good night! 22:11
jnthn Rest well; 'night o/ 22:14
22:43 pamplemousse joined 23:42 pamplemousse left 23:43 pamplemousse joined