dogbert17 m: await (^5).map({start { print qqx{echo $_} } }) 13:38
camelia 2
0
4
1
3
dogbert17 There's an insidious bug hiding here. Sometimes the code is terminated with "MoarVM panic: Collectable 0x2000a329230 in fromspace accessed" 13:40
I'm running with MVM_GC_DEBUG=2 and a small nursery
lizmat does it also happen without JITting? 13:43
dogbert17 I can check ... 13:44
It does happen with MVM_JIT_DISABLE=1 as well 13:46
The stack trace in gdb always looks the same 13:48
#0 MVM_panic (exitCode=1, messageFormat=0x7ffff79a8e70 "Collectable %p in fromspace accessed") at src/core/exceptions.c:855 13:49
#1 0x00007ffff771235c in check_reg (tc=0x2000a130f00, reg_base=0x200120c1bb0, idx=20) at src/core/interp.c:13
#2 0x00007ffff7712da5 in MVM_interp_run (tc=0x2000a130f00, initial_invoke=0x7ffff7779736 <thread_initial_invoke>, invoke_data=0x2000a0931c0, outer_runloop=0x0) at src/core/interp.c:231
#3 0x00007ffff77798e9 in start_thread (data=0x2000a0931c0) at src/core/threads.c:101
#4 0x00007ffff734eac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5 0x00007ffff73e0850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
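For context on what that panic is checking, a sketch rather than the actual MoarVM source: with MVM_GC_DEBUG enabled, check_reg() validates every register it reads, and a collectable that still points into fromspace, the semispace evacuated by the last nursery collection, means some reference was never added to the GC roots and so never got updated to the object's new address. Roughly, with simplified stand-in types:
#include <stdio.h>
#include <stdlib.h>
#define NURSERY_SIZE 28000                 /* the shrunken nursery used in this session */
struct fake_tc {                           /* stand-in for MVMThreadContext */
    char *nursery_fromspace;               /* start of the evacuated semispace */
};
/* If a supposedly live collectable still lies inside fromspace, its reference
 * was missed during the last collection; MoarVM calls MVM_panic() here. */
static void check_not_fromspace(struct fake_tc *tc, void *collectable) {
    char *p = (char *)collectable;
    if (p && p >= tc->nursery_fromspace
          && p <  tc->nursery_fromspace + NURSERY_SIZE) {
        fprintf(stderr, "Collectable %p in fromspace accessed\n", collectable);
        abort();
    }
}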
timo did we get the recent spawn setup gc root fix wrong? 16:36
timo raku.zulipchat.com/user_uploads/56.../image.png 19:19
for ^100 { await (^120).map({ start { print qqx{echo $_ > /dev/null}; } }); note ((ENTER now) R- now) } 19:20
92.41% 0.00% async io thread libmoar.so [.] uv_spawn
dogbert17 timo: on a higher level it looks like this 19:32
(gdb) p MVM_dump_backtrace(tc) 19:33
at SETTING::src/core.c/ThreadPoolScheduler.rakumod:406 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:affinity-queue)
from SETTING::src/core.c/ThreadPoolScheduler.rakumod:841 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:queue)
from SETTING::src/core.c/Proc/Async.rakumod:419 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start-internal)
from SETTING::src/core.c/Proc/Async.rakumod:315 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:actually-start)
from SETTING::src/core.c/Proc/Async.rakumod:325 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start)
from SETTING::src/core.c/Proc.rakumod:185 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:spawn-internal)
from SETTING::src/core.c/Proc.rakumod:178 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:shell)
from SETTING::src/core.c/Proc.rakumod:251 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:QX)
timo can you also MVM_dump_bytecode(tc)? 19:34
that line points at a relatively big nqp::stmts tree 19:36
dogbert17 will do ...
it's a bit longer so I put it here: gist.github.com/dogbert17/23618069...cdd011d570 19:40
timo hm, are you able to rr record? 19:52
it looks like this is asploding inside a handler for return or similar 19:54
it could be that's the return from `nqp::unless( $cand.working, (return $queue), )` but impossible to say without reverse-stepping 19:55
setting RAKUDO_SCHEDULER_DEBUG might be interesting to see if it happens immediately after it spits out something 19:57
dogbert17 dilbert@dilbert-VirtualBox:~/repos/rakudo$ MVM_JIT_DISABLE=1 RAKUDO_SCHEDULER_DEBUG=1 ./rakudo-m -e 'await (^5).map({start { print qqx{echo $_} } })' 20:02
[SCHEDULER 50657] Created initial general worker thread
[SCHEDULER 50657] Supervisor started
[SCHEDULER 50657] Supervisor thinks there are 7 CPU cores
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Created initial affinity worker thread
[SCHEDULER 50657] Added a general worker thread 20:03
after this the numbers 0..4 are printed out in random order. When it fails it's always after the line 'Created initial affinity worker thread'
i.e. the last 'Added a general worker thread' line is not printed 20:04
timo can you turn spesh off entirely and see if it still asplodes? 20:08
dogbert17 sure
will have to run the code a few times ... 20:10
it does fail with MVM_SPESH_DISABLE=1 as well
timo did you say something about rr? 20:26
what kind of system is this emulating? 20:27
putting a note or scheduler-debug call in the !affinity-queue method when entering it and when hitting the returns (probably one for each of the protected blocks), as well as in "method queue", which is the only caller of self!affinity-queue, could be helpful 20:34
messing around with LEAVE phasers is probably risky in terms of making the code / control flow change enough for the bug to go into hiding 20:36
how small exactly is your nursery? 20:49
... i was installing to a prefix i wasn't running from ... 20:52
dogbert17 #define MVM_NURSERY_SIZE 28000 20:53
#define MVM_GC_DEBUG 2 20:54
those are the changes that I've made
timo because i can't reproduce the crash at all
dogbert17 I believe that the gc debug flag needs to be on 20:55
timo i do have it turned on 20:56
to 2 as well
dogbert17 cool, and the nursery?
timo #define MVM_NURSERY_THREAD_START (131072 / 64) 20:57
#define MVM_NURSERY_SIZE MVM_NURSERY_THREAD_START
dogbert17 I haven't touched those 20:58
timo now i'm dividing by 512 instead
m: say 131072 / 512
camelia 256
dogbert17 --- a/src/gc/collect.h
timo oops that's rather a bit too small huh
dogbert17 +++ b/src/gc/collect.h
@@ -1,6 +1,6 @@
/* The maximum size of the nursery area. Note that since it's semi-space
* copying, we could actually have double this amount allocated per thread. */
-#define MVM_NURSERY_SIZE 4194304
+#define MVM_NURSERY_SIZE 28000
On my primary computer I'm running Linux under VirtualBox and I'm pretty certain that rr doesn't work under that configuration 21:00
timo you could run a linux under qemu-system i think that allows you to rr record LOL
dogbert17 perhaps my second machine, RPi 5, can handle it 21:01
timo not sure rr works under arm
dogbert17 I can at least install it, version 5.6.0
timo cool 21:02
dogbert17 let me see if I can repro the problem on that machine 21:04
yes I could 21:07
rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2 # what is this I wonder 21:09
timo i think it's about being able to do side channel attacks more easily if you get sufficiently high-resolution timer values 21:10
timers / performance counters 21:11
dogbert17 ok, I changed the value but now rr complains about other things
rr: Saving execution to trace directory `/home/dilbert/.local/share/rr/rakudo-m-0'. 21:12
[FATAL ./src/PerfCounters.cc:779:read_ticks()] 3 (speculatively) executed strex instructions detected.
On aarch64, rr only supports applications making use of LSE
atomics rather than legacy LL/SC-based atomics.
timo ;( 21:24
dogbert17 perhaps I need to set some mysterious gcc flag while building MoarVM 21:27
timo time for a compile time flag that replaces all our usages of atomics with explicit locks ... one global lock for all atomic stuff 21:29
timo hm, i wonder. is it enough to just run the program forced-single-threaded and make all atomic operations non-atomic? 23:37
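Continuing the same made-up naming, the forced-single-threaded variant would drop the lock as well, since with one OS thread there is nothing to synchronise against; the sketch below assumes that really holds for the whole run.
#ifdef FORCE_SINGLE_THREADED               /* hypothetical flag */
/* A plain read-modify-write meets the CAS contract only because no second
 * thread can ever observe or interleave with it. */
static void *cas_ptr(void **addr, void *expected, void *desired) {
    void *old = *addr;
    if (old == expected)
        *addr = desired;
    return old;
}
#endif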