dogbert17 m: await (^5).map({start { print qqx{echo $_} } }) 13:38
camelia 2
0
4
1
3
dogbert17 There's an insidious bug hiding here. Sometimes the code is terminated with "MoarVM panic: Collectable 0x2000a329230 in fromspace accessed" 13:40
I'm running with MVM_GC_DEBUG=2 and a small nursery
lizmat does it also happen without JITting? 13:43
dogbert17 I can check ... 13:44
It does happen with MVM_JIT_DISABLE=1 as well 13:46
The stack trace in gdb always looks the same 13:48
#0 MVM_panic (exitCode=1, messageFormat=0x7ffff79a8e70 "Collectable %p in fromspace accessed") at src/core/exceptions.c:855 13:49
#1 0x00007ffff771235c in check_reg (tc=0x2000a130f00, reg_base=0x200120c1bb0, idx=20) at src/core/interp.c:13
#2 0x00007ffff7712da5 in MVM_interp_run (tc=0x2000a130f00, initial_invoke=0x7ffff7779736 <thread_initial_invoke>, invoke_data=0x2000a0931c0, outer_runloop=0x0) at src/core/interp.c:231
#3 0x00007ffff77798e9 in start_thread (data=0x2000a0931c0) at src/core/threads.c:101
#4 0x00007ffff734eac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5 0x00007ffff73e0850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
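For context on what that panic is checking, a sketch rather than the actual MoarVM source: with MVM_GC_DEBUG enabled, check_reg() validates every register it reads, and a collectable that still points into fromspace, the semispace evacuated by the last nursery collection, means some reference was never added to the GC roots and so never got updated to the object's new address. Roughly, with simplified stand-in types:
#include <stdio.h>
#include <stdlib.h>
#define NURSERY_SIZE 28000                 /* the shrunken nursery used in this session */
struct fake_tc {                           /* stand-in for MVMThreadContext */
    char *nursery_fromspace;               /* start of the evacuated semispace */
};
/* If a supposedly live collectable still lies inside fromspace, its reference
 * was missed during the last collection; MoarVM calls MVM_panic() here. */
static void check_not_fromspace(struct fake_tc *tc, void *collectable) {
    char *p = (char *)collectable;
    if (p && p >= tc->nursery_fromspace
          && p <  tc->nursery_fromspace + NURSERY_SIZE) {
        fprintf(stderr, "Collectable %p in fromspace accessed\n", collectable);
        abort();
    }
}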
timo did we get the recent spawn setup gc root fix wrong? 16:36
timo raku.zulipchat.com/user_uploads/56.../image.png 19:19
for ^100 { await (^120).map({ start { print qqx{echo $_ > /dev/null}; } }); note ((ENTER now) R- now) } 19:20
92.41% 0.00% async io thread libmoar.so [.] uv_spawn
dogbert17 timo: on a higher level it looks like this 19:32
(gdb) p MVM_dump_backtrace(tc) 19:33
at SETTING::src/core.c/ThreadPoolScheduler.rakumod:406 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:affinity-queue)
from SETTING::src/core.c/ThreadPoolScheduler.rakumod:841 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:queue)
from SETTING::src/core.c/Proc/Async.rakumod:419 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start-internal)
from SETTING::src/core.c/Proc/Async.rakumod:315 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:actually-start)
from SETTING::src/core.c/Proc/Async.rakumod:325 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:start)
from SETTING::src/core.c/Proc.rakumod:185 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:spawn-internal)
from SETTING::src/core.c/Proc.rakumod:178 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:shell)
from SETTING::src/core.c/Proc.rakumod:251 (/home/dilbert/repos/rakudo/blib/CORE.c.setting.moarvm:QX)
timo can you also MVM_dump_bytecode(tc)? 19:34
that line points at a relatively big nqp::stmts tree 19:36
dogbert17 will do ...
it's a bit longer so I put it here: gist.github.com/dogbert17/23618069...cdd011d570 19:40
timo hm, are you able to rr record? 19:52
it looks like this is asploding inside a handler for return or similar 19:54
it could be that's the return from `nqp::unless( $cand.working, (return $queue), )` but impossible to say without reverse-stepping 19:55
setting RAKUDO_SCHEDULER_DEBUG might be interesting to see if it happens immediately after it spits out something 19:57
dogbert17 dilbert@dilbert-VirtualBox:~/repos/rakudo$ MVM_JIT_DISABLE=1 RAKUDO_SCHEDULER_DEBUG=1 ./rakudo-m -e 'await (^5).map({start { print qqx{echo $_} } })' 20:02
[SCHEDULER 50657] Created initial general worker thread
[SCHEDULER 50657] Supervisor started
[SCHEDULER 50657] Supervisor thinks there are 7 CPU cores
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Added a general worker thread
[SCHEDULER 50657] Created initial affinity worker thread
[SCHEDULER 50657] Added a general worker thread 20:03
after this the numbers 0..4 are printed out in random order. When it fails it's always after the line 'Created initial affinity worker thread'
i.e. the last 'Added a general worker thread' line is not printed 20:04
timo can you turn spesh off entirely and see if it still asplodes? 20:08
dogbert17 sure
will have to run the code a few times ... 20:10
it does fail with MVM_SPESH_DISABLE=1 as well
timo did you say something about rr? 20:26
what kind of system is this emulating? 20:27
putting a note or scheduler-debug call in the !affinity-queue method when entering it and when hitting the returns (probably one for each of the protected blocks), as well as in "method queue", which is the only caller of self!affinity-queue, could be helpful 20:34
messing around with LEAVE phasers is probably risky in terms of making the code / control flow change enough for the bug to go into hiding 20:36
how small exactly is your nursery? 20:49
... i was installing to a prefix i wasn't running from ... 20:52
dogbert17 #define MVM_NURSERY_SIZE 28000 20:53
#define MVM_GC_DEBUG 2 20:54
those are the changes that I've made
timo because i can't reproduce the crash at all
dogbert17 I believe that the gc debug flag needs to be on 20:55
timo i do have it turned on 20:56
to 2 as well
dogbert17 cool, and the nursery?
timo #define MVM_NURSERY_THREAD_START (131072 / 64) 20:57
#define MVM_NURSERY_SIZE MVM_NURSERY_THREAD_START
dogbert17 I haven't touched those 20:58
timo now i'm dividing by 512 instead
m: say 131072 / 512
camelia 256
dogbert17 --- a/src/gc/collect.h
timo oops that's rather a bit too small huh
dogbert17 +++ b/src/gc/collect.h
@@ -1,6 +1,6 @@
/* The maximum size of the nursery area. Note that since it's semi-space
* copying, we could actually have double this amount allocated per thread. */
-#define MVM_NURSERY_SIZE 4194304
+#define MVM_NURSERY_SIZE 28000
On my primary computer I'm running Linux under VirtualBox and I'm pretty certain that rr doesn't work under that configuration 21:00
timo you could run a linux under qemu-system i think that allows you to rr record LOL
dogbert17 perhaps my second machine, RPi 5, can handle it 21:01
timo not sure rr works under arm
dogbert17 I can at least install it, version 5.6.0
timo cool 21:02
dogbert17 let me see if I can repro the problem on that machine 21:04
yes I could 21:07
rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2 # what is this I wonder 21:09
timo i think it's about being able to do side channel attacks more easily if you get sufficiently high-resolution timer values 21:10
timers / performance counters 21:11
dogbert17 ok, I changed the value but now rr complains about other things
rr: Saving execution to trace directory `/home/dilbert/.local/share/rr/rakudo-m-0'. 21:12
[FATAL ./src/PerfCounters.cc:779:read_ticks()] 3 (speculatively) executed strex instructions detected.
On aarch64, rr only supports applications making use of LSE
atomics rather than legacy LL/SC-based atomics.
timo ;( 21:24
dogbert17 perhaps I need to set some mysterious gcc flag while building MoarVM 21:27
timo time for a compile time flag that replaces all our usages of atomics with explicit locks ... one global lock for all atomic stuff 21:29
timo hm, i wonder. is it enough to just run the program forced-single-threaded and make all atomic operations non-atomic? 23:37
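Continuing the same made-up naming, the forced-single-threaded variant would drop the lock as well, since with one OS thread there is nothing to synchronise against; the sketch below assumes that really holds for the whole run.
#ifdef FORCE_SINGLE_THREADED               /* hypothetical flag */
/* A plain read-modify-write meets the CAS contract only because no second
 * thread can ever observe or interleave with it. */
static void *cas_ptr(void **addr, void *expected, void *desired) {
    void *old = *addr;
    if (old == expected)
        *addr = desired;
    return old;
}
#endif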