github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm | Set by AlexDaniel on 12 June 2018.
00:10
lucasb left
01:10
Kaiepi left
01:14
Kaiepi joined
01:19
Kaiepi left,
Kaiepi joined
02:21
pamplemousse_ left
02:32
Kaiepi left,
Kaiepi joined
samcv | jnthn, neat | 04:03 | |
do you have a link to the issue? | 04:04 | ||
06:37
squashable6 left
06:40
squashable6 joined
06:53
domidumont joined
07:15
patrickb joined
patrickb | re that open suse patch: It's already in master: github.com/MoarVM/MoarVM/commit/f1...74b820eb39 | 07:22 | |
There is no weird reason behind the need for that patch: when building a shared moar and installing to /usr, the moarshared variable stays uninitialized and thus causes grief. That happens on every platform. | 07:23 | ||
07:24
mst left,
mst joined,
mst left,
mst joined,
ChanServ sets mode: +o mst
09:06
zakharyas joined
jnthn | samcv: github.com/croservices/cro-core/issues/11 | 09:14 | |
10:05
domidumont left
10:13
reportable6 left,
shareable6 left,
greppable6 left,
committable6 left
10:14
bisectable6 left,
quotable6 left,
evalable6 left,
shareable6 joined,
committable6 joined,
bisectable6 joined
10:17
greppable6 joined,
quotable6 joined,
evalable6 joined
10:18
reportable6 joined
10:21
sena_kun joined
10:51
sena_kun left
10:52
sena_kun joined,
sena_kun left
10:53
sena_kun joined
11:00
zakharyas left
11:05
zakharyas joined
11:33
domidumont joined
11:38
domidumont left
11:52
domidumont joined
12:39
sena_kun left
nine | A bit of a worrisome segfault: #0 __GI___pthread_mutex_lock (mutex=0xffffffff00000057) at ../nptl/pthread_mutex_lock.c:67 | 12:59 | |
#1 0x00007f54a45d3d39 in uv_mutex_lock (mutex=mutex@entry=0xffffffff00000057) at 3rdparty/libuv/src/unix/thread.c:310 | |||
#2 0x00007f54a4510096 in push (tc=0x55e2f4e69670, st=<optimized out>, root=<optimized out>, data=<optimized out>, value=..., kind=<optimized out>) at src/6model/reprs/ConcBlockingQueue.c:158 | |||
Especially considering that it's produced (sometimes) by this rather simple script: gist.github.com/niner/0f24bdb76080...3f6376b4b0 | 13:00 | ||
lizmat | nine: does the program you run actually matter? | 13:16 | |
13:21
lucasb joined
nine | Of course I cannot reproduce the segfault when I run it with perl6-gdb-m. Though the deciding difference is that I ran it through xargs -P5 (5 parallel processes) in the segfaulting case and can only use -P1 for gdb | 13:23 | |
timotimo | hum, the processes shouldn't influence each other, but system load can sometimes change program behavior | 13:25 | |
fwiw, rr has a "chaos" mode that perturbs scheduler events in an attempt to make weird bugs happen more often | |||
but iirc you're on ryzen? | |||
nine | Oh, I can also run the other processes in a different shell of course | 13:28 | |
13:28
pamplemousse joined
13:37
sena_kun joined
nine | Which doesn't seem to make it fail either | 13:45 | |
lizmat: yes, managed to make it segfault even when it run()s /bin/true | 13:54 | ||
lizmat | yikes, but probably also the reason we see flappers in spectest | 13:55 | |
so good that it is somewhat reproducible | |||
13:56
pamplemousse left
nine | Just guessing, but what if between this call to MVM_gc_mark_thread_blocked github.com/MoarVM/MoarVM/blob/mast...eue.c#L159 and the following line the GC runs and collects root and thus body? | 14:02 | |
Ah, no, can't be as root is MVM_ROOTed | 14:03 | ||
Err....not collect but move to gen2. Then the body pointer would be out of date while root would still be ok | |||
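A self-contained sketch (invented types; not MoarVM source) of the failure mode nine hypothesizes here, and partly rules out just below: a moving collector fixes up the rooted pointer, but a raw pointer derived from it before the blocking window keeps pointing at the poisoned old location, the kind of thing that yields 0xffffffff-style garbage like the mutex address in the backtrace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented stand-ins for an MVMObject and its body. */
typedef struct { int elems; } Body;
typedef struct { Body body; } Object;

static Object *root;  /* plays the MVM_ROOTed pointer: the GC updates it */

/* Simulate the GC moving the object (e.g. nursery -> gen2). */
static void fake_gc_move(void) {
    Object *copy = malloc(sizeof *copy);
    memcpy(copy, root, sizeof *copy);
    memset(root, 0xff, sizeof *root);  /* poison the old location */
    root = copy;                       /* the root is fixed up... */
}

int main(void) {
    root = malloc(sizeof *root);
    root->body.elems = 42;

    Body *body = &root->body;  /* cached before the blocking window */
    fake_gc_move();            /* GC runs while the thread is "blocked" */

    /* ...but the cached body pointer was not updated. */
    printf("stale body->elems: %d\n", body->elems);
    printf("via updated root:  %d\n", root->body.elems);
    return 0;
}
```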
14:06
zakharyas left
nine | Nah, the body is malloced | 14:06 | |
14:08
robertle joined
nine | I'm starting to believe that it just doesn't happen in gdb | 14:16 | |
nor with a debug build | 14:39 | ||
15:01
domidumont left
15:17
robertle left
15:32
patrickb left
timotimo | we may actually be able to remove the "better JIT-compilation of big integer operations" entry from the moarvm.org roadmap page | 15:44 | |
the wording of "better optimization around closures" seems a little odd: "Today's optimizer does a poor job of, and has an inability to inline, first class functions and closures" | 15:45 | ||
either the "and" or the "an" wants negated i think?! | |||
and perhaps we'll want to turn the commit hashes in the releases page into links, and maybe even change the [abcdef] into [X] or [commit] and multiples into [commit, commit, commit, ...] or [commit 1, 2, 3, 4] | 15:47 | ||
16:05
robertle joined
16:27
AlexDaniel left
nine | Seems like the error can be reproduced with just: perl6 -e 'my $err = run("/usr/bin/true", :err).err.slurp-rest' | 16:47 | |
It just takes a lot of tries | |||
timotimo: could you try to catch the error in rr on your machine? | 16:48 | ||
16:52
brrt joined,
sena_kun left,
sena_kun joined
brrt | \o | 16:58 | |
pamplemousse: dev.to/jeremycmorgan/creating-trim...-core-4m08 is maybe of interest to you | 16:59 | ||
.tell pamplemousse check out dev.to/jeremycmorgan/creating-trim...-core-4m08 | |||
yoleaux | brrt: I'll pass your message to pamplemousse. | ||
17:08
chloekek joined
17:15
pamplemousse joined
17:39
sena_kun left
17:40
sena_kun joined
nine | Oh, I finally got a coredump of the failure with debug symbols! | 17:40 | |
Turns out, rr does work a bit on Ryzen. At least enough to run the failing program with chaos mode. It fails to replay, but I can at least open the coredump with plain gdb | 17:41 | ||
While the MVMConcBlockingQueueBody is still there, it's apparently corrupted: $5 = {head = 0xffffffff00000017, tail = 0x18001100000001, elems = 67124680, head_lock = {__data = {__lock = 23, __count = 4294967295, __owner = 1, __nusers = 1572881, __kind = 67124880, __spins = 0 | 17:44 | ||
17:46
sena_kun left,
sena_kun joined
nine | Oh, and the ConcBlockingQueue in question is the ThreadPoolScheduler::Queue | 17:58 | |
pamplemousse | brrt: Thanks for the article! I looked at using the self-contained executables as a model for what to do, but when I was digging through .NET Core's implementation of it, I realized it probably wasn't the most viable way for me to attempt it if I wanted to finish by August, so I have been mostly using the framework-dependent executable as inspiration. I'm hoping to keep moving towards having a fully self-contained executable as an option, though | 18:00 | |
yoleaux | 16:59Z <brrt> pamplemousse: check out dev.to/jeremycmorgan/creating-trim...-core-4m08 | ||
brrt | nine: that's bad.... | 18:15 | |
pamplemousse: cool, hoped you'd find it interesting | 18:16 | ||
I think, regarding your project, 'there's more than one way to do it' applies | |||
18:20
zakharyas joined
nine | With a 4K nursery size, it fails more often with perl6: src/6model/sc.c:401: MVM_SC_WB_OBJ: Assertion `!(obj->header.flags & MVM_CF_FORWARDER_VALID)' failed. trying to bindkey_o on an MVMContext | 18:25 | |
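The tiny-nursery trick is worth noting: MoarVM's nursery size is a compile-time constant, so forcing near-constant GC runs is a one-line source tweak (assuming the constant still lives in src/gc/collect.h under this name).

```c
/* In src/gc/collect.h (assumed location and default): shrinking the
 * nursery makes almost every allocation able to trigger a collection,
 * flushing out missing roots and stale pointers much sooner. */
#define MVM_NURSERY_SIZE 4096   /* default is on the order of megabytes */
```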
brrt | I'm afk. Hope the european folks are handling the heat wave well | 18:27 | |
18:28
brrt left
nine | It's not much, but at least I now know that the ThreadPoolScheduler::Queue does not get freed by the GC. So it's not a use-after-free situation | 18:40 | |
Also curious. body always seems to be 0xffffffff00000017 | |||
Huh....the MVM_SC_WB_OBJ assertion failure is in the binder when calling the multi ThreadPoolScheduler.cue, which...would push onto a ConcBlockingQueue. Coincidence? I think not. | 19:02 | ||
19:08
domidumont joined
19:12
domidumont left
19:17
MasterDuke joined
nine | The error seems to happen when Proc::Async's done handler tries to keep the exit_promise. | 19:23 | |
Now if only I knew what the assert(!(obj->header.flags & MVM_CF_FORWARDER_VALID)); is trying to prevent... | |||
19:25
sena_kun left,
sena_kun joined
nine | Or what MVM_SC_WB_OBJ does in general. It appears to me that it doesn't like to run concurrently with the GC | 19:38 | |
OTOH every other thread is only waiting for GC to start | 19:40 | ||
Ah, I understand. When the GC copies an object from the old nursery to the new one (and probably to gen2) it stores the pointer to the copy in the old object's header.sc_forward_u.forwarder and sets the MVM_CF_FORWARDER_VALID flag to indicate that the sc part of the union is invalid | 19:45 | ||
In my case the MVM_CF_NURSERY_SEEN flag is set, so the object (the 'cue' method's lexpad) is in the new nursery | 19:48 | ||
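A compilable sketch of the forwarding scheme nine just described; sc_forward_u, forwarder, and MVM_CF_FORWARDER_VALID are the real MoarVM names used in this log, but the layout and flag value here are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define MVM_CF_FORWARDER_VALID (1 << 0)  /* flag value invented here */

typedef struct Collectable Collectable;
struct Collectable {
    uint16_t flags;
    union {
        void        *sc;         /* meaningful while the flag is clear */
        Collectable *forwarder;  /* where the object moved to, once set */
    } sc_forward_u;
};

/* When the GC copies an object, the old copy becomes a signpost: */
static void forward(Collectable *old, Collectable *new_loc) {
    old->sc_forward_u.forwarder = new_loc;
    old->flags |= MVM_CF_FORWARDER_VALID;
}

/* Anyone still holding the old pointer can chase the breadcrumb: */
static Collectable *follow(Collectable *maybe_old) {
    return (maybe_old->flags & MVM_CF_FORWARDER_VALID)
         ? maybe_old->sc_forward_u.forwarder
         : maybe_old;
}

int main(void) {
    Collectable a = {0}, b = {0};
    forward(&a, &b);
    printf("forwarded: %s\n", follow(&a) == &b ? "yes" : "no");
    return 0;
}
```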
19:51
zakharyas left
nine | But I guess the MVM_CF_FORWARDER_VALID should only be a temporary state during GC? So how can it be set when everyone's still waiting to start with GC? | 20:07 | |
So....maybe it's a leftover from a previous GC run? That'd be the case with a missing MVM_ROOT I guess | 20:08 | ||
Ok. bindkey_o gets the hashy object from the register into obj, then calls the bind_key repr function, then calls MVM_SC_WB_OBJ on obj. But MVMContext's bind_key may actually allocate, triggering the GC. So interp's obj may indeed be out of date after bind_key. | 20:17 | ||
Now since registers are rooted automatically, I wonder what's better: MVMROOT obj, or just accessing the original register again? | 20:22 | ||
Well accessing the register must be faster | 20:31 | ||
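A self-contained sketch (invented types; not MoarVM source) of why re-reading the register is the safe option, and the shape of the fix Geth reports below: the register file is part of the GC root set and gets fixed up on a move, while a pointer cached in a local across an allocating call does not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; } Object;
typedef union { Object *o; } Register;

static Register reg[1];  /* the "register file": part of the GC root set */

/* Stands in for MVM_SC_WB_OBJ: it dereferences the object it is given. */
static void write_barrier(const char *label, Object *obj) {
    printf("%s barrier sees id=%d\n", label, obj->id);
}

/* Stands in for a bind_key that allocates and thereby triggers a moving
 * GC run: the object is copied, the register is fixed up, and the old
 * location is poisoned. */
static void allocating_bind_key(void) {
    Object *copy = malloc(sizeof *copy);
    memcpy(copy, reg[0].o, sizeof *copy);
    memset(reg[0].o, 0xff, sizeof(Object));
    reg[0].o = copy;
}

int main(void) {
    reg[0].o = malloc(sizeof(Object));
    reg[0].o->id = 7;

    Object *obj = reg[0].o;  /* the buggy pattern: cache, then call */
    allocating_bind_key();

    write_barrier("stale  ", obj);       /* reads poisoned memory */
    write_barrier("re-read", reg[0].o);  /* correct: id=7         */
    return 0;
}
```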
20:33
pamplemousse left
21:12
chloekek left
jnthn | nine: Sorry, was busy earlier and then afk now so didn't get to follow the debugging... | 21:15 | |
nine: Seeing something with MVM_CF_FORWARDER_VALID set when not in the middle of a GC run means dealing with an out of date pointer to an object that has moved | |||
nine: You may get more clues by setting MVM_GC_DEBUG to 1, which checks for object references in fromspace, BUT the slowdown may hide what sounds like a very time-sensitive bug. | 21:16 | ||
Hm, bind_key causing allocation is...likely to be an issue | 21:17 | ||
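For reference, MVM_GC_DEBUG is likewise a compile-time define (its exact location is an assumption here); turning it on trades a lot of speed for paranoia like the fromspace checks quoted further down.

```c
/* In src/moar.h (assumed location): 0 disables the extra checks; 1 is
 * enough for the fromspace-reference checks jnthn mentions, at a
 * significant slowdown that can mask timing-sensitive bugs. */
#define MVM_GC_DEBUG 1
```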
21:20
robertle left
Geth | MoarVM: 0082687ec9 | (Stefan Seifert)++ | src/core/interp.c | 21:35 | |
Fix possible memory corruption in bindkey_*

bindkey reads the target object from a register, calls the bind_key repr function and then calls MVM_SC_WB_OBJ with the object. The repr function however may allocate and thus trigger a GC run which may move the target object. In that case we'd end up calling MVM_SC_WB_OBJ on the outdated copy of the object. Fix by reading it fresh from the register as those get updated automatically by the GC.
nine | That got me some 10K runs without error. But the loop running it in rr ended with: MoarVM panic: Adding pointer 0x6c272c001460 to past fromspace to GC worklist | 21:36 | |
So there may be another issue still | |||
jnthn: well good to know that I was on the right track :) | 21:37 | ||
tc->nursery_alloc is 0x6c272c001460, i.e. the same address as the work item | 21:42 | ||
jnthn | hmm, where's the report of the thing I just pushed... | 21:44 | |
So I discovered that my debug/profiler concurrency fixes from the other day managed to break something, and just patched it. | |||
nine | jnthn: maybe the push got rejected because mine came in between? | ||
timotimo | it's been a quarter hour though :) | ||
nine | oh | ||
jnthn | nine: No, I pulled first :) | 21:45 | |
lizmat | d80e296c82e6a2b65256 is what I see after a git pull | ||
jnthn | And it shows on github | ||
lizmat | Unbreak debugger instrumentation | ||
jnthn | That's the one | ||
nine | yep, seeing it too | ||
lizmat | so I guess Geth is forgetting / awol | 21:46 | |
jnthn | OK, just GitHub being slow sending notifications | ||
Well, or Geth | |||
I could believe the two equally :) | |||
nine: Which bind_key allocates, btw? | |||
nine | jnthn: the one in MVMContext | 21:47 | |
because frame walker | |||
jnthn | oops, I didn't think fw allocated though... | 21:48 | |
ohh...hmm | |||
nine | The presence of MVM_gc_root_temp_push in bind_key strongly suggests to me that allocation might happen ;) | 21:49 | |
jnthn | oh, hmm, it passes 1 to vivify... | 21:50 | |
but why is it vivifying a lexical it's about to bind to... | |||
nine | jnthn: how likely do you think it is that the "Adding pointer 0x6c272c001460 to past fromspace to GC worklist" is just an off-by-one error in the GC debug check? | 21:51 | |
jnthn | hmm | 21:52 | |
if (thread_tc && thread_tc->nursery_fromspace && \ | |||
(char *)(c) >= (char *)thread_tc->nursery_fromspace && \ | |||
(char *)(c) < (char *)thread_tc->nursery_fromspace + \ | |||
thread_tc->nursery_fromspace_size) \ | |||
If c is the pointer and thread_tc->nursery_fromspace is the start of the fromspace then I'd think being equal just means the thing is allocated right at the start of it | |||
So it doesn't immediately look wrong to me | 21:53 | ||
nine | Isn't this the check? if ((char *)*item_to_add >= (char *)tc->nursery_alloc && \ | ||
(char *)*item_to_add < (char *)tc->nursery_alloc_limit) \ | |||
MVM_panic(1, "Adding pointer %p to past fromspace to GC worklist", \ | |||
jnthn | Oh, but that's the wrong check... | 21:54 | |
Right, two similar errors :) | |||
But same logic applies, I think | |||
It doesn't look wrong to me | |||
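Both quoted checks use the same half-open interval shape, which is why equality with the start address is in range rather than off by one; a throwaway illustration:

```c
#include <stdio.h>

int main(void) {
    char space[16];                   /* stands in for the fromspace */
    char *start = space, *c = space;  /* c points at its very start  */
    size_t size = sizeof space;

    /* Half-open interval [start, start + size): start itself is inside. */
    printf("in range: %d\n", c >= start && c < start + size);  /* 1 */
    return 0;
}
```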
nine | Ok, just checking. It would have been nice to find that the much less often run debug code was the one that's wrong :) | 21:57 | |
jnthn | Yes, indeed | ||
nine | Anyway, running stuff with a tiny nursery shows that there are still a couple of GC related issues left... | 21:58 | |
jnthn | Did you ever get to the bottom of that other framewalker issue, btw? | 22:00 | |
nine | What was that? | ||
jnthn | github.com/MoarVM/MoarVM/issues/1113 | 22:01 | |
nine | Oh, no, I almost forgot about it. Our main issue has been a deadlock that I fixed yesterday. Segfaults are rather trivial to work around using systemd's Restart=always ;) | 22:03 | |
jnthn | hah :) | ||
nine | Maybe I can have another look at it tomorrow | ||
jnthn | Somehow it triggers a lot more often on macOS | 22:00 | |
No idea why | |||
nine | Maybe it's just coincidence. But as long as we don't know how it comes about... | 22:04 | |
jnthn is trying to get his PerlCon prep done this week so he doesn't have to do any (or at least much) of it during his vacation next week :) | |||
nine | That sounds like a sensible plan! | 22:05 | |
I'm a bit sad to miss PerlCon this year. But only a little as the reason is me getting married this Thursday :) | 22:07 | ||
jnthn | Oh! That's an excellent reason to miss it! Congratulations; have a lovely day. | ||
timotimo | oh, congrats :) | ||
nine | Thank you :) | 22:10 | |
Reminds me...I should go to bed now. Good night! | 22:11 | ||
jnthn | Rest well; 'night o/ | 22:14 | |
22:43
pamplemousse joined
23:42
pamplemousse left
23:43
pamplemousse joined