| IRC logs at
Set by AlexDaniel on 12 June 2018.
nine dogbert11: oh, that's interesting 06:45
nwc10 good *, * 06:47
nine That a segfault is connected to GC may (yet again) explain the seeming randomness of segfaults we see on CI
Nicholas good *, brrt 07:40
brrt good * Nicholas 07:54
dogbert11 nine: now I 'only' have to catch it in the debugger :) 09:07
sena_kun hi, folks 09:18
what is the state of the revert revert revert commit? I remember it exposed some issues we had to address before the release, are they patched already or should we do a re-re-re-revert, as the release is tomorrow?
also, if there are any new blockers, please share. 09:19
nine sena_kun: AFAIK the commit is still in there. Have any issues come up? 10:34
sena_kun nine, not yet, though I have no means to do a Blin run now as usual, so I was wondering if something has shown up on your (plural) side. 10:36
nine Blin would be mighty helpful... 10:40
sena_kun :/
dogbert11 now I'm running with optimizations on, an 8k nursery and the gc debug flag set to two. It has now stopped, in gdb, with 'non-AsyncTask fetched from eventloop active work list' 11:00 11:02
nine: is it possible to make something out of this or do we need to catch things earlier? 11:03
nine dogbert11: the immediate question is: what _did_ it catch? 11:09
So, good *s work here the same as on freenode? Checked
nine Apparently a VMNull because the array slot work_idx is NULL 11:39
dogbert11 nine: (gdb) p REPR(task_obj)->ID 11:48
value has been optimized out
nine Yeah, you have to get it from the source: call MVM_repr_at_pos_o(tc, tc->instance->event_loop_active, work_idx)
Or: p ((MVMArray*)(tc->instance->event_loop_active))->body.slots.o[1] 11:49
dogbert11 (gdb) p ((MVMArray*)(tc->instance->event_loop_active))->body.slots.o[1]
$1 = (MVMObject *) 0x0
tbrowder hi, working on issue #1469 has led to needing a CFLAGS change for libuv that may conflict with other libs. a casual look at the build situation, confirmed by MasterDuke17, shows all objects being built with the same CFLAGS. seems to me we should compile 3rdparty libs with the same CFLAGS they use. 11:53
would require an overhaul of build but it would be more robust for future 3rdparty libs 11:54
dogbert11 nine: in case you want to try teasing the error out, here's the 'golf':
I have also updated the Panic gist a bit, i.e. with some 'l' commands, your 'p' command and 'info threads' 11:57
nine oh a golf. That's useful!
dogbert11 more like a bogey :) 11:57
I'm running with 8k nursery and GC_DEBUG=1
nine of course it refuses to break in rr 11:59
nine OTOH use Test can be removed from the golf 12:03
nine The segfault happens because when run-one is called args[1] is NULL 12:39
The most curious thing about this is: since args[1] is a register, it should never be NULL 12:43
dogbert11 so how can that happen? 12:45
it sounds like you've managed to repro :)
nine at SETTING::src/core.c/ThreadPoolScheduler.pm6:297 (/home/nine/rakudo/blib/CORE.c.setting.moarvm:) 12:58
That's where the call happens 12:59
And the NULL we get from nqp::shift($queue)
Added an assert in ConcBlockingQueue's shift and it triggers
dogbert11 cool 13:00
lizmat so it's shifting from the queue when it shouldn't? or another thread beat it to it ? 13:21
nine No, the whole point of ConcBlockingQueue is that it's safe to use from different threads. It's just that somehow a NULL ends up in that queue. But in both unshift and push we explicitly guard against that 13:22
lizmat so the number of elems is > 0 when the shift produces a NULL, so it really sits in the queue, is what you're saying ? 13:39
nine yes 13:48
lizmat is it clear if the value got produced by a push or an unshift ? 13:49
also: you said: "it's safe to use from different threads" 13:50
are we 200% sure of that ?
because *if* the guard in unshift / push is correct, the only other way *I* see is that another thread snatched it and thus you're looking at element #1 really, and if there is none left, that'd be a NULL ? 13:51
nine Well it's meant to be thread safe. Of course the implementation may have bugs 13:52
lizmat well, if it walks like a duck and talks like a duck (aka , push and unshift have guarded against NULL entry) 13:53
jnthn The bugs there in the past have always been about GC handling around the lock acquisitions
lizmat it can only be a duck (aka, a race on the queue.shift) 13:54
jnthn At least, those I can remember have :)
nine Well this bug seems to require a small nursery to reproduce, so maybe there's yet another GC handling issue there 13:56
Well the node got into the queue via push and it definitely had a value back then 14:02
dogbert11 (gdb) bt
#0 MVM_panic (exitCode=0, messageFormat=0x0) at src/core/exceptions.c:853
#1 0x00007ffff78d85d2 in gc_mark (tc=0x7fffe00d42e0, st=0x5555555b5178, data=0x5555576392e8, worklist=0x7fffdc1cbec0) at src/6model/reprs/MVMCode.c:48 14:03
#2 0x00007ffff7896c99 in MVM_gc_mark_collectable (tc=0x7fffe00d42e0, worklist=0x7fffdc1cbec0, new_addr=0x5555576392d0) at src/gc/collect.c:439
#3 0x00007ffff7890a40 in MVM_gc_root_add_gen2s_to_worklist (tc=0x7fffe00d42e0, worklist=0x7fffdc1cbec0) at src/gc/roots.c:349
#4 0x00007ffff7893870 in MVM_gc_collect (tc=0x7fffe00d42e0, what_to_do=1 '\001', gen=0 '\000') at src/gc/collect.c:155
#5 0x00007ffff788766f in run_gc (tc=0x7fffe00d42e0, what_to_do=1 '\001') at src/gc/orchestrate.c:443
#6 0x00007ffff78882e4 in MVM_gc_enter_from_interrupt (tc=0x7fffe00d42e0) at src/gc/orchestrate.c:728
Adding pointer %p to past fromspace to GC worklist 14:05
nine: should I do a MVM_dump_backtrace(tc) or something else 14:07
nine Can you have a look at what that collectable actually is? 14:08
dogbert11 48 MVM_gc_worklist_add(tc, worklist, &body->outer); is it body->outer we want?
nine Or even body itself since that's the one containing the outdated pointer. What code object is it? 14:11
dogbert11 (gdb) p *body 14:12
$3 = {sf = 0x55555741f070, outer = 0x7fffdc22cbb8, code_object = 0x0, name = 0x555556d1c110, state_vars = 0x0, is_static = 1, is_compiler_stub = 0}
nine name and sf-> are of interest 14:13
dogbert11 so how do I get an MVMString to something readable? 14:15
nine MVM_dump_string(tc, string)
dogbert11 thx
nine Or if it's not a debug build MVM_string_utf8_maybe_encode_C_string(tc, string) 14:16
dogbert11 I'll try that as well 14:17
(gdb) p MVM_string_utf8_maybe_encode_C_string(tc,
$8 = 0x7fffdc5b49b0 ""
(gdb) p MVM_string_utf8_maybe_encode_C_string(tc, body->name)
$9 = 0x7fffdc151dd0 ""
I'm probably doing something wrong but it seems to be the empty string 14:23
nine How on earth? It looks like we're pushing the same MVMConcBlockingQueueNode onto two different queues! A poll on the one queue sets the node's value to NULL (when it becomes the new dummy head node) and a shift on the other queue then finds the broken node 14:27
dogbert11 oops 14:29
nine It gets weirder: even after replacing the FSA with plain calloc, not freeing the nodes at all anymore and commenting out the NULL assignment, I still get NULLs in node values 14:49
dogbert11 the plot thickens, will this be a one-line fix? 14:56
nine I fear there will only be a fix when I manage to reproduce it in rr. Because I'm running out of ideas. There's just no code left that would overwrite a queue node's value with NULL 14:59
dogbert11 and rr is not cooperating 15:09
tbrowder seems embarrassing to use python in our tool chain 15:39
nine feel free to change that :)
nine This just doesn't make sense. It's always the ConcBlockingQueueNode's value that suddenly turns into NULL, while its next pointer stays intact. So it's a very precise change. 16:18
It's probably not a random memory overwrite, as nothing else seems to get hit, and replacing usage of the FSA with malloc would surely change the behavior, as we're talking about different memory areas. But it stays the same 16:19
But ConcBlockingQueueNodes are only used and modified in src/6model/reprs/ConcBlockingQueue.c and I already removed all setting to NULL 16:20
So what's left?
Geth MoarVM: tbrowder++ created pull request #1497:
Define _GNU_SOURCE for GNU builds
Geth MoarVM: tbrowder++ created pull request #1498:
Quell compiler warnings on Linux with gcc
tbrowder nine: see last PR, two uninitialized values giving warnings about vfork and jumps 20:01
MasterDuke just got a segfault in t/spec/S17-lowlevel/cas.t with only change being an 8k nursery 20:19
haven't been able to catch it in rr though 20:25
ran it under rr ~250 times, but never an error of any kind 20:33
dogbert11 MasterDuke: I got it as well 20:38
0x00007ffff79b9e3b in evaluate_guards (gs=0x555558c0cac8, gs=0x555558c0cac8, callsite=0x555558c0cac8, guard_offset=0x7fffeea5ab66, tc=0x7fffe00d6ea0) at src/spesh/plugin.c:85
85 outcome = STABLE(test) == gs->guards[pos].u.type;
MasterDuke interesting 20:39