Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
07:40  sena_kun joined
lizmat_ | I just had a MoarVM crash on upload to zef: gist.github.com/lizmat/02f45bd5044...a7e3dfd404 | 09:49
        | just leaving it here for possible inspiration: it did not re-occur
09:49  lizmat_ left
09:50  lizmat joined
11:08  sena_kun left
11:37  lizmat left
12:20  lizmat joined
12:21  lizmat_ joined
12:24  lizmat left
12:35  lizmat_ left, lizmat joined
14:55  sena_kun joined
timo | unfortunately, "invalid thread id in gc work pass" is impossible to debug with just a traceback, since that's some kind of memory corruption that happened earlier and is only unearthed by the next gc run | 15:59
     | btw for those who use "perf stat", there's a -r "repeats" argument that will run your command multiple times and give you the averages and standard deviations for all the counts (also, -d can be given up to three times for progressively more detailed counters, which may be interesting sometimes) | 17:37
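A minimal sketch of the flags timo mentions (the benchmark command here is a placeholder):

    # run the workload 5 times; perf reports the mean and standard deviation per counter
    perf stat -r 5 ./your-benchmark

    # -d can be repeated up to three times for progressively more detailed counter sets
    perf stat -r 5 -d -d ./your-benchmark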
lizmat | ok, I'll remove the gist then | 17:50
18:12  MasterDuke joined
MasterDuke | caught `MoarVM panic: non-AsyncTask fetched from eventloop active work list` while running t/spec/S32-io/IO-Socket-Async.t in rr | 18:16
           | any suggestions for debugging? | 18:23
           | also caught a fail of t/spec/S12-construction/destruction.t: planned 6 tests but only ran 5, although there was no message or obvious error | 18:30
timo | ok for the non-AsyncTask-fetched one, you'd use "rr replay -M" to get a number for when the error happens (optional) and "rr replay -g <that number>" to reach that point | 19:25
     | alternatively just "rr replay" and "break MVM_panic", then "yes" if it asks to delay until a library is loaded
     | then you'd go back to the spot where it popped the item off the queue | 19:26
     | and now i'd have to look at the code to see how exactly you'd add a good watchpoint for the creation of that item
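A sketch of that rr workflow, assuming a recording of the failing test already exists (the event number is a placeholder):

    rr replay -M            # stdio output is prefixed with [rr <pid> <event-number>]
    rr replay -g 12345      # jump straight to that event number

    # alternatively, break on the panic itself
    rr replay
    (rr) break MVM_panic
    (rr) continue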
19:57  Nicholas left
20:07  Nicholas joined
20:58  sena_kun left
MasterDuke | yeah, i don't really know the async socket code well enough to know what's supposed to happen | 21:19
timo | right, i'm not sure the sockets have too much to do with it though. we'd find out what puts the broken item in there by reverse-continue-ing while watching the item | 21:26
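A sketch of that approach, assuming you are stopped at the panic; the expression for the active-work slot is a placeholder, since the real one depends on how you reach it from the current frame:

    (rr) watch -l <expression-for-the-active_work-slot>   # hardware watchpoint on that memory location
    (rr) reverse-continue                                  # runs backwards to the write that stored the item there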
timo | do you have time now to go through it?
MasterDuke | couple minutes
timo | we could do a tmux share with tmate; it also has a read-only mode if you prefer
MasterDuke | never used tmate before, how does that work? | 21:27
timo | from your side you just run "tmate", copy what it tells you, and send that to me, and i'd be connected to your session | 21:28
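Roughly what that looks like; the session strings below are invented placeholders:

    $ tmate
    $ tmate show-messages
    ssh session: ssh AbCdEf123@nyc1.tmate.io              # share this for read-write access
    ssh session read only: ssh ro-XyZ789@nyc1.tmate.io    # or this one for read-only viewing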
MasterDuke | off-topic, but this looks like something you'd be interested in: pypy.org/posts/2024/07/mining-jit-...ns-z3.html | 21:32
timo | oh i like the title already | 22:06
     | so i've looked at the recording a little bit and here's what i think i'm seeing
     | 1) we cancel the read on a socket
     | 2) in eventloop.c:56 "cancel_work", asyncsocket.c:153 "read_cancel", we call eventloop_remove_active_work, which nulls out the "active_work" slot for work_idx 0 | 22:08
     | 3) then in asyncsocket.c:996 in "listen_cancel" we're calling uv_close with the on_listen_cancelled callback | 22:10
     | 4) we come back down to async_handler, which uv_async_io was calling for us | 22:12
     | 5) in uv_run we reach uv__run_closing_handles | 22:14
     | 6) uv__finish_close -> uv__stream_destroy -> uv__stream_flush_write_queue -> uv__write_callbacks, then in uv__finish_close we call handle->close_cb, which finally reaches on_listen_cancelled in asyncsocket.c:989 | 22:15
     | 7) the first thing on_listen_cancelled does is try to call MVM_io_eventloop_send_cancellation_notification on the result of MVM_io_eventloop_get_active_work, which we saw earlier was nulled out already | 22:16
     | i don't know terribly much about the "active work" system and what we expect to happen in what order | 22:17
     | i may have to go through again and make sure i'm actually looking at the same thing in each of these steps and that there's nothing unrelated in between | 22:18
     | not being able to "call" stuff in an rr session is all the more reason to add more stuff to the moar gdb plugins | 22:27
timo | m: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close; | 23:18
camelia | ===SORRY!=== Error while compiling <tmp>: Unexpected block in infix position (missing statement control word before the expression?) at <tmp>:1 ------> ocket::Async.listen("0.0.0.0", 5000).tap⏏ -> $conn { }; my $conn = await IO::Soc…
timo | m: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap: -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close;
camelia | Failed to resolve host name '0.0.0.0' with family 0. Error: Address family for hostname not supported in block <unit> at <tmp> line 1
timo | m: my $t = IO::Socket::Async.listen("0.0.0.0", 54345).tap: -> $conn { }; $t.close; | 23:19
camelia | Failed to resolve host name '0.0.0.0' with family 0. Error: Address family for hostname not supported in block <unit> at <tmp> line 1
timo | maybe not allowed to listen on camelia?
     | m: my $t = IO::Socket::Async.listen("127.0.0.1", 54345).tap: -> $conn { }; $t.close;
camelia | Failed to resolve host name '127.0.0.1' with family 0. Error: Address family for hostname not supported in block <unit> at <tmp> line 1
MasterDuke | evalable6: my $t = IO::Socket::Async.listen("127.0.0.1", 54345).tap: -> $conn { }; $t.close;
evalable6 | (no output)
timo | evalable: my $t = IO::Socket::Async.listen("0.0.0.0", 54345).tap: -> $conn { }; $t.close; | 23:20
evalable6 | (no output)
timo | evalable: my $t = IO::Socket::Async.listen("0.0.0.0", 5000).tap: -> $conn { }; my $conn = await IO::Socket::Async.connect("127.0.0.1", 5000); $conn.close; say $conn.Supply.list; $t.close;
evalable6 | (exit code 1) () MoarVM panic: non-AsyncTask fetched from eventloop active work list
timo | there, this reliably reproduces the bug
     | now we can pepper the code with fprintf and see if we can see something :D | 23:21
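A minimal sketch of the kind of tracing timo means, assuming on_listen_cancelled has the usual uv_close_cb signature; the message text is arbitrary:

    /* at the top of on_listen_cancelled in asyncsocket.c (sketch only) */
    static void on_listen_cancelled(uv_handle_t *handle) {
        fprintf(stderr, "[debug] on_listen_cancelled: handle=%p\n", (void *)handle);
        fflush(stderr);
        /* ... existing body: fetch the active work entry and send the cancellation notification ... */
    }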