lizmat_ I just had a MoarVM crash on upload to zef: gist.github.com/lizmat/02f45bd5044...a7e3dfd404 09:49
just leaving it here for possible inspiration: it did not re-occur
timo unfortunately, "invalid thread id in gc work pass" is impossible to debug with just a traceback, since that's some kind of memory corruption that happened earlier that's just unearthed by the next gc run 15:59
btw for those who use "perf stat", there's a -r "repeats" argument that will run your command multiple times and give you the averages and standard deviations for all the counts (also -d, which can be given once or up to three times, may be interesting sometimes) 17:37
lizmat ok, I'll remove the gist then 17:50
MasterDuke caught `MoarVM panic: non-AsyncTask fetched from eventloop active work list` while running t/spec/S32-io/IO-Socket-Async.t in rr 18:16
any suggestions for debugging? 18:23
also caught a fail of t/spec/S12-construction/destruction.t. planned 6 tests but only ran 5, although there was no message or obvious error 18:30
timo ok for the non-asynctask fetched one, you'd use "rr replay -M" to get a number for when the error happens (optional) and "rr replay -g <that number>" to reach that point 19:25
alternatively just "rr replay" and "break MVM_panic" then "yes" if it asks to delay until a library is loaded
then you'd go back to the spot where it popped the item off the queue 19:26
and now i'd have to look at the code to see how exactly you'd add a good watchpoint for the creation of that item
MasterDuke yeah, i don't know the async socket code well enough to know what's supposed to happen 21:19
timo right, i'm not sure the sockets have too much to do with it though. we'd find out what puts the broken item in there by reverse-continue-ing while watching the item 21:26
do you have time now to go through it?
MasterDuke couple minutes
timo we could do a tmux share with tmate, it also has a read-only mode if you prefer
MasterDuke never used tmate before, how does that work 21:27
timo from your side you just "tmate" and copy what it tells you and send that to me and i'd be connected to your session 21:28
MasterDuke ot, but this looks like something you'd be interested in pypy.org/posts/2024/07/mining-jit-...ns-z3.html 21:32
timo oh i like the title already 22:06
so i've looked at the recording a little bit and here's what i think i'm seeing
1) we cancel the read on a socket
2) in eventloop.c:56 "cancel_work", asyncsocket.c:153 "read_cancel" we call eventloop_remove_active_work, which nulls out the "active_work" slot for work_idx 0 22:08
3) then in asyncsocket.c:996 in "listen_cancel" we're calling uv_close with the on_listen_cancelled callback 22:10
4) we come back down to async_handler which uv_async_io was calling for us 22:12
5) in uv_run we reach uv__run_closing_handles 22:14
6) uv__finish_close -> uv__stream_destroy -> uv__stream_flush_write_queue -> uv__write_callbacks, then in uv__finish_close we call handle->close_cb, which finally reaches on_listen_cancelled in asyncsocket.c:989 22:15
7) the first thing on_listen_cancelled does is try to MVM_io_eventloop_send_cancellation_notification on MVM_io_eventloop_get_active_work, which we saw earlier was nulled out already 22:16
i don't know terribly much about the "active work" system and what we expect to happen in what order 22:17
i may have to go through again and make sure i'm actually looking at the same thing in each of these steps and that there's nothing unrelated in between 22:18
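The ordering in steps 1 to 7 can be modeled in a few lines of C. This is only a sketch of the suspected sequence, not MoarVM code: `Task`, `active_work`, `add_active_work`, `remove_active_work`, `on_close_cancelled` and `cancel_then_close` are all hypothetical stand-ins, and a plain NULL stands in for whatever the real slot holds once it has been cleared. It assumes the close callback looks the task up again by index after the slot was already emptied, which is what the recording appears to show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT MoarVM's real types or API. */
typedef struct {
    const char *name;
} Task;

#define MAX_WORK 4
static Task *active_work[MAX_WORK]; /* models the event loop's active work list */

/* Store a task in the first free slot and return its index. */
static int add_active_work(Task *task) {
    for (int i = 0; i < MAX_WORK; i++) {
        if (active_work[i] == NULL) {
            active_work[i] = task;
            return i;
        }
    }
    assert(!"active work list full");
    return -1;
}

/* Models step 2: cancelling clears the slot. */
static void remove_active_work(int work_idx) {
    active_work[work_idx] = NULL;
}

/* Models step 7: the close callback looks the task up again by index.
 * Returns 0 if a task was found, -1 if the slot had already been cleared. */
static int on_close_cancelled(int work_idx) {
    Task *task = active_work[work_idx];
    if (task == NULL) {
        fprintf(stderr, "slot %d no longer holds a task\n", work_idx);
        return -1;
    }
    printf("sending cancellation notification for %s\n", task->name);
    return 0;
}

/* Replays the order from the recording: clear first, run the callback later. */
static int cancel_then_close(void) {
    static Task listener = { "listener" };
    int work_idx = add_active_work(&listener);
    remove_active_work(work_idx);          /* steps 1-2 */
    /* steps 3-6: close is requested; the loop invokes the callback later */
    return on_close_cancelled(work_idx);   /* step 7 */
}
```

Calling `cancel_then_close()` from a `main` returns -1, because the deferred callback finds an empty slot. Whether the real fix would be to reorder the removal or to have the callback tolerate an empty slot depends on how the "active work" system is meant to behave, which is the open question above.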
not being able to "call" stuff in an rr session is all the more reason to add more stuff to the moar gdb plugins 22:27
