github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:02
reportable6 left
00:04
reportable6 joined
01:04
lucasb left
01:06
frost-lab joined
01:33
ggoebel left
04:05
shareable6 left,
reportable6 left,
bloatable6 left,
squashable6 left,
releasable6 left,
benchable6 left,
linkable6 left,
evalable6 left,
committable6 left,
nativecallable6 left,
coverable6 left,
tellable6 left,
notable6 left,
unicodable6 left,
sourceable6 left,
quotable6 left,
greppable6 left,
statisfiable6 left,
bisectable6 left,
nativecallable6 joined,
sourceable6 joined,
coverable6 joined
04:06
committable6 joined,
notable6 joined,
greppable6 joined,
releasable6 joined,
tellable6 joined,
evalable6 joined,
unicodable6 joined
04:07
bloatable6 joined,
reportable6 joined,
quotable6 joined,
benchable6 joined,
linkable6 joined
04:08
statisfiable6 joined,
bisectable6 joined,
shareable6 joined,
squashable6 joined
05:37
statisfiable6 left,
notable6 left,
coverable6 left,
greppable6 left,
nativecallable6 left,
bloatable6 left,
shareable6 left,
benchable6 left,
releasable6 left,
committable6 left,
linkable6 left,
unicodable6 left,
sourceable6 left,
bisectable6 left,
quotable6 left,
tellable6 left,
evalable6 left,
reportable6 left,
squashable6 left
05:38
shareable6 joined,
committable6 joined,
evalable6 joined
05:39
unicodable6 joined,
statisfiable6 joined,
notable6 joined,
nativecallable6 joined,
releasable6 joined,
linkable6 joined,
greppable6 joined,
quotable6 joined,
tellable6 joined
05:40
bisectable6 joined,
reportable6 joined,
squashable6 joined,
sourceable6 joined,
bloatable6 joined,
benchable6 joined,
coverable6 joined
05:49
domidumont joined
06:02
reportable6 left
06:04
reportable6 joined
06:41
squashable6 left
06:44
squashable6 joined
|
|||
nine | dogbert11: oh, that's interesting | 06:45 | |
nwc10 | good *, * | 06:47 | |
nine | That a segfault is connected to GC may (yet again) explain the seeming randomness of segfaults we see on CI | ||
07:38
brrt joined
|
|||
Nicholas | good *, brrt | 07:40 | |
brrt | good * Nicholas | 07:54 | |
08:07
sena_kun left
08:10
[Coke]_ joined
08:11
sena_kun joined
08:13
[Coke] left
08:21
hankache joined
08:37
zakharyas joined
08:57
Voldenet_ joined
08:58
Voldenet left
|
|||
dogbert11 | nine: now I 'only' have to catch it in the debugger :) | 09:07 | |
09:10
Voldenet_ is now known as Voldenet,
Voldenet left,
Voldenet joined
|
|||
sena_kun | hi, folks | 09:18 | |
how is the state of the revert revert revert commit? I remember it exposed some issues we had to address before the release; are they patched already, or should we do a re-re-re-revert, as the release is tomorrow? | |||
also, if there are any new blockers, please share. | 09:19 | ||
09:26
brrt left
10:15
ggoebel joined
10:23
ggoebel left
10:27
brrt joined
10:33
ggoebel joined
|
|||
nine | sena_kun: AFAIK the commit is still in there. Have any issues come up? | 10:34 | |
sena_kun | nine, not yet, though I currently have no way to do a Blin run as usual, so I was wondering if something has shown up on your (plural) side. | 10:36 | |
10:40
hankache left,
hankache joined
|
|||
nine | Blin would be mighty helpful... | 10:40 | |
sena_kun | :/ | ||
10:42
hankache left,
hankache joined
10:59
hankache left
|
|||
dogbert11 | now I'm running with optimizations on, an 8k nursery and the gc debug flag set to two. It has now stopped, in gdb, with 'non-AsyncTask fetched from eventloop active work list' | 11:00 | |
gist.github.com/dogbert17/e4a3993b...853d014649 | 11:02 | ||
nine: is it possible to make something out of this or do we need to catch things earlier? | 11:03 | ||
nine | dogbert11: the immediate question is: what _did_ it catch? | 11:09 | |
So, good *s work here the same as on freenode? Checked | |||
11:10
zakharyas left
11:35
brrt left
|
|||
nine | Apparently a VMNull because the array slot work_idx is NULL | 11:39 | |
11:40
avar left
11:41
avar joined,
avar left,
avar joined,
avar left
11:42
avar joined,
avar left,
avar joined,
avar left
11:43
avar joined,
avar left,
avar joined
|
|||
dogbert11 | nine: (gdb) p REPR(task_obj)->ID | 11:48 | |
value has been optimized out | |||
:( | |||
nine | Yeah, you have to get it from the source: call MVM_repr_at_pos_o(tc, tc->instance->event_loop_active, work_idx) | ||
Or: p ((MVMArray*)(tc->instance->event_loop_active))->body.slots.o[1] | 11:49 | ||
dogbert11 | (gdb) p ((MVMArray*)(tc->instance->event_loop_active))->body.slots.o[1] | ||
$1 = (MVMObject *) 0x0 | |||
tbrowder | hi, working on issue #1469 has led to needing a CFLAGS change for libuv that may conflict with other libs. a casual look at the build situation, confirmed by MasterDuke17, shows all objects being built with the same CFLAGS. seems to me we should compile 3rdparty libs with the same CFLAGS they use. | 11:53 | |
would require an overhaul of build but it would be more robust for future 3rdparty libs | 11:54 | ||
dogbert11 | nine: in case you want to try teasing the error out, here's the 'golf': gist.github.com/dogbert17/8eded7bd...02c1781405 | ||
I have also updated the Panic gist a bit, i.e. with some 'l' commands, your 'p' command and 'info threads' | 11:57 | ||
nine | oh a golf. That's useful! | ||
11:57
hankache joined
|
|||
dogbert11 | more like a bogey :) | 11:57 | |
I'm running with 8k nursery and GC_DEBUG=1 | |||
nine | of course it refuses to break in rr | 11:59 | |
12:02
reportable6 left
|
|||
nine | OTOH use Test can be removed from the golf | 12:03 | |
12:04
reportable6 joined
12:25
hankache left
|
|||
nine | The segfault happens because when run-one is called args[1] is NULL | 12:39 | |
The most curious thing about this is: since args[1] is a register it must not ever be NULL | 12:43 | ||
dogbert11 | so how can that happen? | 12:45 | |
it sounds like you've managed to repro :) | |||
12:47
brrt joined
|
|||
nine | at SETTING::src/core.c/ThreadPoolScheduler.pm6:297 (/home/nine/rakudo/blib/CORE.c.setting.moarvm:) | 12:58 | |
That's where the call happens | 12:59 | ||
And the NULL we get from nqp::shift($queue) | |||
Added an assert in ConcBlockingQueue's shift and it triggers | |||
dogbert11 | cool | 13:00 | |
13:14
brrt left,
brrt joined
13:19
brrt left
|
|||
lizmat | so it's shifting from the queue when it shouldn't? or another thread beat it to it ? | 13:21 | |
nine | No, the whole point of ConcBlockingQueue is that it's safe to use from different threads. It's just that somehow a NULL ends up in that queue. But in both unshift and push we explicitly guard against that | 13:22 | |
lizmat | so the number of elems is > 0 when the shift produces a NULL, so it really sits in the queue, is what you're saying ? | 13:39 | |
nine | yes | 13:48 | |
lizmat | is it clear if the value got produced by a push or an unshift ? | 13:49 | |
also: you said: "it's safe to use from different threads" | 13:50 | ||
are we 200% sure of that ? | |||
because *if* the guard in unshift / push is correct, the only other way *I* see is that another thread snatched it and thus you're looking at element #1 really, and if there is none left, that'd be a NULL ? | 13:51 | ||
nine | Well it's meant to be thread safe. Of course the implementation may have bugs | 13:52 | |
lizmat | well, if it walks like a duck and talks like a duck (aka, push and unshift have guarded against NULL entry) | 13:53 | |
jnthn | The bugs there in the past have always been about GC handling around the lock acquisitions | ||
lizmat | it can only be a duck (aka, a race on the queue.shift) | 13:54 | |
jnthn | At least, those I can remember have :) | ||
nine | Well this bug seems to require a small nursery to reproduce, so maybe there's yet another GC handling issue there | 13:56 | |
Well the node got into the queue via push and it definitely had a value back then | 14:02 | ||
dogbert11 | (gdb) bt | ||
#0 MVM_panic (exitCode=0, messageFormat=0x0) at src/core/exceptions.c:853 | |||
#1 0x00007ffff78d85d2 in gc_mark (tc=0x7fffe00d42e0, st=0x5555555b5178, data=0x5555576392e8, worklist=0x7fffdc1cbec0) at src/6model/reprs/MVMCode.c:48 | 14:03 | ||
#2 0x00007ffff7896c99 in MVM_gc_mark_collectable (tc=0x7fffe00d42e0, worklist=0x7fffdc1cbec0, new_addr=0x5555576392d0) at src/gc/collect.c:439 | |||
#3 0x00007ffff7890a40 in MVM_gc_root_add_gen2s_to_worklist (tc=0x7fffe00d42e0, worklist=0x7fffdc1cbec0) at src/gc/roots.c:349 | |||
#4 0x00007ffff7893870 in MVM_gc_collect (tc=0x7fffe00d42e0, what_to_do=1 '\001', gen=0 '\000') at src/gc/collect.c:155 | |||
#5 0x00007ffff788766f in run_gc (tc=0x7fffe00d42e0, what_to_do=1 '\001') at src/gc/orchestrate.c:443 | |||
#6 0x00007ffff78882e4 in MVM_gc_enter_from_interrupt (tc=0x7fffe00d42e0) at src/gc/orchestrate.c:728 | |||
Adding pointer %p to past fromspace to GC worklist | 14:05 | ||
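The panic text suggests that the GC debug build checks each pointer before queueing it for marking, and rejects one that still points into the nursery half vacated by an earlier collection: in a copying nursery, live objects have already been moved out of that half, so such a pointer is stale. A minimal sketch of that kind of range check, assuming a simple two-half (semispace) nursery; `Semispace`, `points_into_fromspace` and `worklist_add_checked` are hypothetical names, not MoarVM's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the nursery half that was evacuated by the
   most recent minor collection (illustrative only, not MoarVM's layout). */
typedef struct {
    uintptr_t start;  /* first byte of the old fromspace */
    size_t    size;   /* size of one nursery half in bytes */
} Semispace;

/* Returns 1 if ptr points into the old fromspace, i.e. at memory whose
   live objects have already been copied elsewhere, 0 otherwise. */
int points_into_fromspace(const Semispace *fs, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    return p >= fs->start && p - fs->start < fs->size;
}

/* Debug-style guard run before adding an item to the mark worklist.
   Returns 0 when the pointer is acceptable and -1 when it is stale;
   a real collector would panic at that point instead of returning. */
int worklist_add_checked(const Semispace *fs, const void *ptr) {
    if (ptr == NULL)
        return 0;   /* empty slots are simply skipped */
    if (points_into_fromspace(fs, ptr))
        return -1;  /* stale: "Adding pointer %p to past fromspace ..." */
    return 0;
}
```

The check is purely an address-range test, so it is cheap enough to run on every worklist addition in a debug build; it does not say which object holds the stale pointer, which is why the backtrace (here, `gc_mark` on an `MVMCode` body) still has to be inspected.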
nine: should I do an MVM_dump_backtrace(tc) or something else? | 14:07 | |
nine | Can you have a look at what that collectable actually is? | 14:08 | |
dogbert11 | 48 MVM_gc_worklist_add(tc, worklist, &body->outer); is it body->outer we want? | ||
14:11
gugod joined
|
|||
nine | Or even body itself since that's the one containing the outdated pointer. What code object is it? | 14:11 | |
dogbert11 | (gdb) p *body | 14:12 | |
$3 = {sf = 0x55555741f070, outer = 0x7fffdc22cbb8, code_object = 0x0, name = 0x555556d1c110, state_vars = 0x0, is_static = 1, is_compiler_stub = 0} | |||
nine | name and sf->body.name are of interest | 14:13 | |
dogbert11 | so how do I get an MVMString to something readable? | 14:15 | |
nine | MVM_dump_string(tc, string) | ||
dogbert11 | thx | ||
nine | Or if it's not a debug build MVM_string_utf8_maybe_encode_C_string(tc, string) | 14:16 | |
dogbert11 | I'll try that as well | 14:17 | |
(gdb) p MVM_string_utf8_maybe_encode_C_string(tc, body.name) | |||
$8 = 0x7fffdc5b49b0 "" | |||
(gdb) p MVM_string_utf8_maybe_encode_C_string(tc, body->name) | |||
$9 = 0x7fffdc151dd0 "" | |||
(gdb) | |||
I'm probably doing something wrong but it seems to be the empty string | 14:23 | ||
nine | How on earth? It looks like we're pushing the same MVMConcBlockingQueueNode onto two different queues! A poll on the one queue sets the node's value to NULL (when it becomes the new dummy head node) and a shift on the other queue then finds the broken node | 14:27 | |
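The hypothesis above hinges on how a queue with a dummy head node recycles nodes: when shift/poll removes the first item, that item's node becomes the new dummy head and its value slot is cleared. A minimal single-threaded sketch, assuming a Michael & Scott style layout with the head and tail locks omitted; `Node`, `Queue` and the `queue_*` functions are hypothetical, not MoarVM's ConcBlockingQueue code. Linking one node into two queues reproduces the described symptom: the second queue still reports an item, but shifting it yields NULL. The later experiments in this log (removing the NULL assignment entirely) did not make the NULLs disappear, so this illustrates the hypothesis rather than a confirmed cause.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node and queue types; not MoarVM's actual structs. */
typedef struct Node {
    void        *value;
    struct Node *next;
} Node;

typedef struct {
    Node *head;  /* dummy node; head->next is the first real item */
    Node *tail;  /* last node (the dummy itself when empty) */
} Queue;

/* A real two-lock queue would also hold separate head and tail mutexes;
   they are omitted because this sketch only shows the node handling. */
void queue_init(Queue *q) {
    Node *dummy = calloc(1, sizeof *dummy);
    if (dummy == NULL)
        abort();
    q->head = q->tail = dummy;
}

/* Append an existing node. NULL values are rejected, mirroring the guard
   that push/unshift are said to have. Returns 0 on success, -1 otherwise. */
int queue_push_node(Queue *q, Node *n) {
    if (n->value == NULL)
        return -1;
    n->next = NULL;
    q->tail->next = n;
    q->tail = n;
    return 0;
}

int queue_push(Queue *q, void *value) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    if (queue_push_node(q, n) != 0) {
        free(n);
        return -1;
    }
    return 0;
}

/* Remove the first item. Returns 1 and stores its value in *out if the
   queue was non-empty, 0 otherwise. The first real node becomes the new
   dummy head, so its value slot is cleared and the old dummy is freed. */
int queue_shift(Queue *q, void **out) {
    Node *old_head = q->head;
    Node *first = old_head->next;
    if (first == NULL)
        return 0;         /* a blocking queue would wait here instead */
    *out = first->value;
    first->value = NULL;  /* first is now the dummy head */
    q->head = first;
    free(old_head);
    return 1;
}
```

Pushing one node onto queues `a` and `b` and shifting each once returns the real value from `a` and NULL from `b`, even though `b` was non-empty: exactly the "elems > 0 but shift produces NULL" pattern discussed above, without any NULL ever passing the push guard.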
14:27
domidumont left
|
|||
dogbert11 | oops | 14:29 | |
14:29
domidumont joined
14:41
zakharyas joined
14:46
frost-lab left
14:48
lucasb joined
|
|||
nine | It gets weirder: even after replacing the FSA with plain calloc, not freeing the nodes at all anymore and commenting out the NULL assignment, I still get NULLs in node values | 14:49 | |
dogbert11 | the plot thickens, will this be a one-line fix? | 14:56 | |
nine | I fear it will be a fix at all only when I manage to reproduce in rr. Because I'm running out of ideas. There's just no code left that would overwrite a queue node's value with NULL | 14:59 | |
dogbert11 | and rr is not cooperating | 15:09 | |
tbrowder | seems embarrassing to use python in our tool chain | 15:39 | |
nine | feel free to change that :) | ||
15:42
nevore joined
|
|||
nine | This just doesn't make sense. It's always the ConcBlockingQueueNode's value that suddenly turns into NULL, while its next pointer stays intact. So it's a very precise change. | 16:18 | |
It's probably not a random memory overwrite as nothing else seems to get hit and when I replace usage of the FSA with malloc that would surely change the behavior as we're talking about different memory areas. But it stays the same | 16:19 | ||
But ConcBlockingQueueNodes are only used and modified in src/6model/reprs/ConcBlockingQueue.c and I already removed all setting to NULL | 16:20 | ||
So what's left? | |||
16:37
domidumont left
16:46
nevore left
|
|||
Geth | MoarVM: tbrowder++ created pull request #1497: Define _GNU_SOURCE for GNU builds | 16:52 | |
17:08
cog left
17:09
cog joined
17:11
[Coke] joined
17:19
ggoebel left
17:20
[Coke]_ left
18:02
reportable6 left,
reportable6 joined
18:16
Altreus left
18:51
MasterDuke joined
18:52
zakharyas left
19:14
linkable6 left,
linkable6 joined
19:15
linkable6 left,
tellable6 left,
evalable6 left,
shareable6 left
19:16
tellable6 joined,
evalable6 joined
19:17
linkable6 joined
19:18
shareable6 joined,
linkable6 left
19:21
linkable6 joined
19:27
shareable6 left,
nativecallable6 left,
evalable6 left,
linkable6 left,
greppable6 left,
bisectable6 left,
unicodable6 left,
reportable6 left,
squashable6 left,
benchable6 left,
statisfiable6 left,
committable6 left,
sourceable6 left,
bloatable6 left,
releasable6 left,
coverable6 left,
quotable6 left,
tellable6 left,
notable6 left
19:30
[Coke] is now known as {Coke},
{Coke} is now known as [Coke]
19:45
zakharyas joined
19:47
nativecallable6 joined
19:48
bisectable6 joined,
notable6 joined,
sourceable6 joined,
releasable6 joined
19:49
squashable6 joined,
coverable6 joined,
evalable6 joined,
tellable6 joined,
greppable6 joined,
committable6 joined,
shareable6 joined,
quotable6 joined
19:50
reportable6 joined,
benchable6 joined,
bloatable6 joined,
unicodable6 joined,
linkable6 joined,
statisfiable6 joined
|
|||
Geth | MoarVM: tbrowder++ created pull request #1498: Quell compiler warnings on Linux with gcc | 19:59 | |
tbrowder | nine: see last PR, two uninitialized values giving warnings about vfork and jumps | 20:01 | |
20:13
MasterDuke left,
MasterDuke joined
20:19
zakharyas left
|
|||
MasterDuke | just got a segfault in t/spec/S17-lowlevel/cas.t with only change being an 8k nursery | 20:19 | |
haven't been able to catch it in rr though | 20:25 | ||
ran it under rr ~250 times, but never an error of any kind | 20:33 | ||
dogbert11 | MasterDuke: I got it as well | 20:38 | |
0x00007ffff79b9e3b in evaluate_guards (gs=0x555558c0cac8, gs=0x555558c0cac8, callsite=0x555558c0cac8, guard_offset=0x7fffeea5ab66, tc=0x7fffe00d6ea0) at src/spesh/plugin.c:85 | |||
85 outcome = STABLE(test) == gs->guards[pos].u.type; | |||
MasterDuke | interesting | 20:39 | |
20:52
[Coke] left
21:53
lucasb left
21:54
kawaii left
21:59
kawaii joined,
lucasb joined
22:02
ggoebel joined
23:34
evalable6 left,
squashable6 left
23:36
evalable6 joined
23:37
squashable6 joined
|