05:25
kjp left
05:26
kjp joined
05:29
kjp left
05:30
kjp joined
05:31
kjp left
05:32
kjp joined
09:11
sena_kun joined
13:30
dogbert17 joined
dogbert17 | m: await (^5).map({start { print qqx{echo $_} } }) | 13:38 | |
camelia | 2 0 4 1 3 |
dogbert17 | There's an insidious bug hiding here. Sometimes the code is terminated with "MoarVM panic: Collectable 0x2000a329230 in fromspace accessed" | 13:40 | |
I'm running with MVM_GC_DEBUG=2 and a small nursery | |||
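Since the panic only shows up intermittently, one way to hammer on it is to wrap the failing one-liner in a loop; a minimal sketch, assuming a MoarVM rebuilt with MVM_GC_DEBUG=2 and a shrunken MVM_NURSERY_SIZE as described above (the iteration counts are arbitrary):

    # stress harness for the intermittent fromspace panic
    for ^100 {
        # same shape as the failing one-liner, output redirected to keep the terminal quiet
        await (^5).map({ start { print qqx{echo $_ > /dev/null} } });
        note "batch done";   # the last line printed before a panic narrows the window down
    }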
lizmat | does it also happen without JITting? | 13:43 | |
dogbert17 | I can check ... | 13:44 | |
It does happen with MVM_JIT_DISABLE=1 as well | 13:46 | ||
The stack trace in gdb always looks the same | 13:48 | ||
#0 MVM_panic (exitCode=1, messageFormat=0x7ffff79a8e70 "Collectable %p in fromspace accessed") at src/core/exceptions.c:855 | 13:49 | ||
#1 0x00007ffff771235c in check_reg (tc=0x2000a130f00, reg_base=0x200120c1bb0, idx=20) at src/core/interp.c:13 | |||
#2 0x00007ffff7712da5 in MVM_interp_run (tc=0x2000a130f00, initial_invoke=0x7ffff7779736 <thread_initial_invoke>, invoke_data=0x2000a0931c0, outer_runloop=0x0) at src/core/interp.c:231 | |||
#3 0x00007ffff77798e9 in start_thread (data=0x2000a0931c0) at src/core/threads.c:101 | |||
#4 0x00007ffff734eac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 | |||
#5 0x00007ffff73e0850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 | |||
14:36
Geth__ joined
14:38
japhb_ joined,
jnthn1 joined
14:41
Geth left,
japhb left,
jnthn left
14:52
[Coke] left
timo | did we get the recent spawn setup gc root fix wrong? | 16:36 | |
16:49
[Coke] joined
timo | raku.zulipchat.com/user_uploads/56.../image.png | 19:19 | |
for ^100 { await (^120).map({ start { print qqx{echo $_ > /dev/null}; } }); note ((ENTER now) R- now) } | 19:20 | ||
92.41% 0.00% async io thread libmoar.so [.] uv_spawn | |||
dogbert17 | timo: on a higher level it looks like this | 19:32 | |
(gdb) p MVM_dump_backtrace(tc) | 19:33 | ||
at SETTING::src/core.c/ThreadPoolScheduler.rakumod:406 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:affinity-queue) | |||
from SETTING::src/core.c/ThreadPoolScheduler.rakumod:841 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:queue) | |||
from SETTING::src/core.c/Proc/Async.rakumod:419 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start-internal) | |||
from SETTING::src/core.c/Proc/Async.rakumod:315 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:actually-start) | |||
from SETTING::src/core.c/Proc/Async.rakumod:325 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start) | |||
from SETTING::src/core.c/Proc.rakumod:185 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:spawn-internal) | |||
from SETTING::src/core.c/Proc.rakumod:178 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:shell) | |||
from SETTING::src/core.c/Proc.rakumod:251 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:QX) | |||
timo | can you also MVM_dump_bytecode(tc)? | 19:34 | |
that line points at a relatively big nqp::stmts tree | 19:36 | ||
dogbert17 | will do ... | ||
it's a bit longer so I put it here: gist.github.com/dogbert17/23618069...cdd011d570 | 19:40 | ||
timo | hm, are you able to rr record? | 19:52 | |
it looks like this is asploding inside a handler for return or similar | 19:54 | ||
it could be that's the return from `nqp::unless( $cand.working, (return $queue), )` but impossible to say without reverse-stepping | 19:55 | ||
setting RAKUDO_SCHEDULER_DEBUG might be interesting to see if it happens immediately after it spits out something | 19:57 | ||
dogbert17 | dilbert@dilbert-VirtualBox:~/repos/rakudo$ MVM_JIT_DISABLE=1 RAKUDO_SCHEDULER_DEBUG=1 ./rakudo-m -e 'await (^5).map({start { print qqx{echo $_} } })' | 20:02 | |
[SCHEDULER 50657] Created initial general worker thread | |||
[SCHEDULER 50657] Supervisor started | |||
[SCHEDULER 50657] Supervisor thinks there are 7 CPU cores | |||
[SCHEDULER 50657] Added a general worker thread | |||
[SCHEDULER 50657] Added a general worker thread | |||
[SCHEDULER 50657] Added a general worker thread | |||
[SCHEDULER 50657] Added a general worker thread | |||
[SCHEDULER 50657] Created initial affinity worker thread | |||
[SCHEDULER 50657] Added a general worker thread | 20:03 | ||
after this the numbers 0..4 are printed out in random order. When it fails it's always after the line 'Created initial affinity worker thread' | |||
i.e. the last 'Added a general worker thread' line is not printed | 20:04 | ||
timo | can you turn spesh off entirely and see if it still asplodes? | 20:08 | |
dogbert17 | sure | ||
will have to run the code a few times ... | 20:10 | ||
it does fail with MVM_SPESH_DISABLE=1 as well | |||
timo | did you say something about rr? | 20:26 | |
what kind of system is this emulating? | 20:27 | ||
putting a note or scheduler-debug in the !affinity-queue method when entering the method, and when hitting the returns (probably actually want one for each of the protected blocks), as well as the "method queue" which is the only caller to self!affinity-queue, could be helpful | 20:34 | ||
messing around with LEAVE phasers is probably risky in terms of making the code / control flow change enough for the bug to go into hiding | 20:36 | ||
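Roughly what that suggestion could look like, as a hypothetical standalone sketch: a note at method entry plus one before each return path. The class, attributes and helper logic below are invented for illustration; this is not the real ThreadPoolScheduler code.

    class TracedScheduler {
        has @!workers;

        method affinity-queue() {
            note "[trace {$*THREAD.id}] entered affinity-queue";
            for @!workers -> $cand {
                unless $cand<working> {
                    note "[trace {$*THREAD.id}] handing out the queue of an idle worker";
                    return $cand<queue>;                      # first return path
                }
            }
            note "[trace {$*THREAD.id}] no idle worker, adding a new one";
            my $queue = Channel.new;
            @!workers.push: %( :$queue, :!working );
            return $queue;                                    # second return path
        }
    }

    my $s = TracedScheduler.new;
    say $s.affinity-queue === $s.affinity-queue;   # True: the second call reuses the idle worker's queue

Combined with the RAKUDO_SCHEDULER_DEBUG output, the last traced line printed before a panic would show which path the crashing thread took.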
how small exactly is your nursery? | 20:49 | ||
... i was installing to a prefix i wasn't running from ... | 20:52 | ||
dogbert17 | #define MVM_NURSERY_SIZE 28000 | 20:53 | |
#define MVM_GC_DEBUG 2 | 20:54 | ||
those are the changes that I've made | |||
timo | because i can't reproduce the crash at all | ||
dogbert17 | I believe that the gc debug flag needs to be on | 20:55 | |
timo | i do have it turned on | 20:56 | |
to 2 as well | |||
dogbert17 | cool, and the nursery? | ||
timo | #define MVM_NURSERY_THREAD_START (131072 / 64) | 20:57 | |
#define MVM_NURSERY_SIZE MVM_NURSERY_THREAD_START | |||
dogbert17 | I haven't touched those | 20:58 | |
timo | now i'm dividing by 512 instead | ||
m: say 131072 / 512 | |||
camelia | 256 | ||
dogbert17 | --- a/src/gc/collect.h | ||
timo | oops that's rather a bit too small huh | ||
dogbert17 | +++ b/src/gc/collect.h | ||
@@ -1,6 +1,6 @@ | |||
/* The maximum size of the nursery area. Note that since it's semi-space | |||
* copying, we could actually have double this amount allocated per thread. */ | |||
-#define MVM_NURSERY_SIZE 4194304 | |||
+#define MVM_NURSERY_SIZE 28000 | |||
On my primary computer I'm running Linux under VirtualBox and I'm pretty certain that rr doesn't work under that configuration | 21:00 | ||
timo | you could run a linux under qemu-system i think that allows you to rr record LOL | ||
dogbert17 | perhaps my second machine, RPi 5, can handle it | 21:01 | |
timo | not sure rr works under arm | ||
dogbert17 | I can at least install it, version 5.6.0 | ||
timo | cool | 21:02 | |
dogbert17 | let me see if I can repro the problem on that machine | 21:04 | |
yes I could | 21:07 | ||
rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2 # what is this I wonder | 21:09 | ||
timo | i think it's about being able to do side channel attacks more easily if you get sufficiently high-resolution timer values | 21:10 | |
timers / performance counters | 21:11 | ||
dogbert17 | ok, I changed the value but now rr complains about other things | ||
rr: Saving execution to trace directory `/home/dilbert/.local/share/rr/rakudo-m-0'. | 21:12 | ||
[FATAL ./src/PerfCounters.cc:779:read_ticks()] 3 (speculatively) executed strex instructions detected. | |||
On aarch64, rr only supports applications making use of LSE | |||
atomics rather than legacy LL/SC-based atomics. | |||
timo | ;( | 21:24 | |
dogbert17 | perhaps I need to set some mysterious gcc flag while building MoarVM | 21:27 | |
timo | time for a compile time flag that replaces all our usages of atomics with explicit locks ... one global lock for all atomic stuff | 21:29 | |
22:48
sena_kun left
timo | hm, i wonder. is it enough to just run the program forced-single-threaded and make all atomic operations non-atomic? | 23:37 |