Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
07:40 sena_kun joined
lizmat_ I just had a MoarVM crash on upload to zef: gist.github.com/lizmat/02f45bd5044...a7e3dfd404 09:49
just leaving it here for possible inspiration: it did not re-occur
09:49 lizmat_ left 09:50 lizmat joined 11:08 sena_kun left 11:37 lizmat left 12:20 lizmat joined 12:21 lizmat_ joined 12:24 lizmat left 12:35 lizmat_ left, lizmat joined 14:55 sena_kun joined
timo unfortunately, "invalid thread id in gc work pass" is impossible to debug with just a traceback, since that's some kind of memory corruption that happened earlier and is only unearthed by the next gc run 15:59
btw for those who use "perf stat", there's a -r "repeats" argument that will run your command multiple times and give you the averages and standard deviations for all the counts (also -d, which can be given up to three times for progressively more detail, may be interesting sometimes) 17:37
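for example (workload made up, just something to measure):
  perf stat -r 5 -d raku -e 'my $x = 0; $x++ for ^1_000_000'
runs it five times and prints mean and stddev for every counter; each extra -d adds another layer of more detailed events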
lizmat ok, I'll remove the gist then 17:50
18:12 MasterDuke joined
MasterDuke caught `MoarVM panic: non-AsyncTask fetched from eventloop active work list` while running t/spec/S32-io/IO-Socket-Async.t in rr 18:16
any suggestions for debugging? 18:23
also caught a fail of t/spec/S12-construction/destruction.t. planned 6 tests but only ran 5, although there was no message or obvious error 18:30
timo ok for the non-asynctask fetched one, you'd use "rr replay -M" to get a number for when the error happens (optional) and "rr replay -g <that number>" to reach that point 19:25
alternatively just "rr replay" and "break MVM_panic", then answer "yes" if gdb asks whether to make the breakpoint pending until a library is loaded
then you'd go back to the spot where it popped the item off the queue 19:26
and now i'd have to look at the code to see how exactly you'd add a good watchpoint for the creation of that item
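so concretely, either of these gets you to the panic (event number made up for the example):
  rr replay -M               # re-prints the program's output annotated with event numbers
  rr replay -g 12345         # start the debugger right at that event
or
  rr replay
  (gdb) break MVM_panic      # answer y if gdb asks to make it pending on a future library load
  (gdb) continue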
19:57 Nicholas left 20:07 Nicholas joined 20:58 sena_kun left
MasterDuke yeah, i don't really know the async socket code to know what's supposed to happen 21:19
timo right, i'm not sure the sockets have too much to do with it though. we'd find out what puts the broken item in there by reverse-continue-ing while watching the item 21:26
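i.e. once we're stopped where the bad item comes off the queue, something like this (the watch expression is made up, it'd be whatever the slot is called at that point in the code):
  (gdb) watch -l active_work[work_idx]   # location watchpoint on the slot itself
  (gdb) reverse-continue                 # run backwards until something writes to it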
do you have time now to go through it?
MasterDuke couple minutes
timo we could do a tmux share with tmate, it also has a read-only mode if you prefer
MasterDuke never used tmate before, how does that work 21:27
timo from your side you just "tmate" and copy what it tells you and send that to me and i'd be connected to your session 21:28
MasterDuke off-topic, but this looks like something you'd be interested in: pypy.org/posts/2024/07/mining-jit-...ns-z3.html 21:32
timo oh i like the title already 22:06
so i've looked at the recording a little bit and here's what i think i'm seeing
1) we cancel the read on a socket
in eventloop.c:56 "cancel_work" and asyncsocket.c:153 "read_cancel" we call into eventloop_remove_active_work, which nulls out the "active_work" slot for work_idx 0 22:08
that was step 2)
3) then in asyncsocket.c:996 in "listen_cancel" we're calling uv_close with the on_listen_cancelled callback 22:10
4) we come back down to async_handler which uv_async_io was calling for us 22:12
5) in uv_run we reach uv__run_closing_handles 22:14
6) uv__finish_close -> uv__stream_destroy -> uv__stream_flush_write_queue -> uv__write_callbacks, then in uv__finish_close we call handle->close_cb, which finally reaches on_listen_cancelled in asyncsocket.c:989 22:15
7) the first thing on_listen_cancelled does is try to MVM_io_eventloop_send_cancellation_notification on MVM_io_eventloop_get_active_work, which we saw earlier was nulled out already 22:16
i don't know terribly much about the "active work" system and what we expect to happen in what order 22:17
i may have to go through again and make sure i'm actually looking at the same thing in each of these steps and that there's nothing unrelated in between 22:18
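if the ordering is really the whole story, the obvious band-aid would be a guard in on_listen_cancelled, roughly like this (struct and field names guessed from memory rather than checked against asyncsocket.c, so it's the shape of the idea, not a patch):
  static void on_listen_cancelled(uv_handle_t *handle) {
      ListenInfo       *li = (ListenInfo *)handle->data;
      MVMThreadContext *tc = li->tc;
      /* read_cancel may already have gone through eventloop_remove_active_work
       * and cleared this slot; assuming remove_active_work invalidates the
       * stored index when it does that, only send the notification if the
       * index still looks valid */
      if (li->work_idx >= 0) {
          MVMObject *work = MVM_io_eventloop_get_active_work(tc, li->work_idx);
          MVM_io_eventloop_send_cancellation_notification(tc, (MVMAsyncTask *)work);
          MVM_io_eventloop_remove_active_work(tc, &(li->work_idx));
      }
  }
though that would mostly paper over the real question of why listen_cancel still runs after the slot was removed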
not being able to "call" stuff in an rr session is all the more reason to add more stuff to the moar gdb plugins 22:27
m: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close; 23:18
camelia ===SORRY!=== Error while compiling <tmp>
Unexpected block in infix position (missing statement control word before the expression?)
at <tmp>:1
------> ocket::Async.listen("0.0.0.0", 5000).tap⏏ -> $conn { }; my $conn = await IO::Soc
timo m: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap: -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close;
camelia Failed to resolve host name '0.0.0.0' with family 0.
Error: Address family for hostname not supported
in block <unit> at <tmp> line 1
timo m: my $t = IO::Socket::Async.listen("0.0.0.0", 54345).tap: -> $conn { }; $t.close; 23:19
camelia Failed to resolve host name '0.0.0.0' with family 0.
Error: Address family for hostname not supported
in block <unit> at <tmp> line 1
timo maybe not allowed to listen on camelia?
m: my $t = IO::Socket::Async.listen("127.0.0.1", 54345).tap: -> $conn { }; $t.close;
camelia Failed to resolve host name '127.0.0.1' with family 0.
Error: Address family for hostname not supported
in block <unit> at <tmp> line 1
MasterDuke evalable6: my $t = IO::Socket::Async.listen("127.0.0.1", 54345).tap: -> $conn { }; $t.close;
evalable6
timo evalable: my $t = IO::Socket::Async.listen("0.0.0.0", 54345).tap: -> $conn { }; $t.close; 23:20
evalable6
timo evalable: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap: -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close;
evalable6 (exit code 1) ()
MoarVM panic: non-AsyncTask fetched from eventloop active work list
timo there, this reliably reproduces the bug
now we can pepper the code with fprintf and see if we can see something :D 23:21
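something crude like this where the slot gets cleared, plus a matching line where on_listen_cancelled fetches it, would already show us the ordering (purely diagnostic, to be ripped out again):
  fprintf(stderr, "remove_active_work: clearing slot %d on tc %p\n", work_idx, (void *)tc);
then run the reproducer from above and see which of the two prints comes first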