github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
timotimo finally back at my regular desktop system ... 08:03
... wow, one of the drives on the raid isn't showing up, eh? that's great 08:05
dumarchie Isn't github.com/MoarVM/MoarVM/blob/82a3...#L160-L164 a classical example of a data race? 08:14
nine timotimo: can you re-add it? 08:36
I've had devices fall out of the RAID twice in the past couple of weeks. But a simple add and they're up again 08:37
MasterDuke timotimo: ping re github.com/MoarVM/MoarVM/pull/1399 09:53
Geth MoarVM/modern-raku-script-extensions: a7efa956f9 | (Elizabeth Mattijsen)++ | 6 files
Fix some .p6 -> .raku changes that were missed

Nine++ for the spot
10:02
dumarchie MasterDuke, can you have a look at github.com/MoarVM/MoarVM/issues/12...-744273918 ? 10:05
MasterDuke i'm not sure what's the race there? 10:08
but i suspect you'll really need nine, jnthn, timotimo, or nwc10 to take a look 10:09
dumarchie Two threads see the cache is not properly initialized. Both assign a new object. Then later one of them finds out the object is different than expected.
MasterDuke hm 10:11
dumarchie I'm not a C programmer, but maybe even the pointer itself could become corrupted? 10:15
MasterDuke there is a mutex that's used for some multicache operations. e.g., github.com/MoarVM/MoarVM/blob/82a3...che.c#L209 10:17
you could try wrapping that init in a lock/unlock 10:18
dumarchie I guess it's safe to use the mutex_multi_cache_add for initialization as well. I'll give it a try. It's probably more interesting than writing a hello_world.c :) 10:22
Otoh, do we want a lock/unlock for every MVM_multi_cache_add ? 10:23
jnthn iirc, the idea was that threads may race to install the multi cache, and whoever loses just ends up with that being GC'd later
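The "race to install, loser gets discarded" idea jnthn describes can be sketched in plain C11. This is only an illustration under stated assumptions: `Cache`, `new_cache` and `install_or_adopt` are hypothetical names, not MoarVM's API, and the compare-and-swap is a swapped-in technique for showing the pattern, not a claim about how MoarVM installs its multi cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache object; not MoarVM's real type. */
typedef struct { int id; } Cache;

static Cache *new_cache(int id) {
    Cache *c = malloc(sizeof *c);
    assert(c != NULL);
    c->id = id;
    return c;
}

/* Try to publish `fresh` into an empty slot and return whichever cache
 * ends up installed. The loser is freed here; in a garbage-collected VM
 * it would simply become unreachable and be collected later. */
static Cache *install_or_adopt(_Atomic(Cache *) *slot, Cache *fresh) {
    assert(fresh != NULL);
    Cache *expected = NULL;
    if (atomic_compare_exchange_strong(slot, &expected, fresh))
        return fresh;       /* we won the race */
    free(fresh);            /* we lost: drop our copy */
    return expected;        /* the failed CAS stored the winner here */
}
```

Every caller ends up using the same installed object, whichever thread got there first, so losing the race costs only a wasted allocation.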
tellable6 hey jnthn, you have a message: gist.github.com/9b64d0622fc34b57db...90e6562af7
dumarchie jnthn, that makes sense to me. But does the cur_node check in line 258 make sense from that perspective? 10:26
dumarchie line 257 that is 10:29
MasterDuke dumarchie: do you still have the problem with a current rakudo? according to one of your recent comments, you still have a rakudo from 2020.10, even though your moarvm is from 2020.11 10:34
dumarchie All my recent debugging was with Rakudo v2020.10-275-gc63f078a2 10:36
MasterDuke there were some changes to cas between now and then, but i think they were just optimizations and shouldn't have changed how it works
dumarchie I don't see them in github.com/dumarchie/rakudo/compar...udo:master 10:39
MasterDuke github.com/rakudo/rakudo/commit/1c...03dac6f8e3 10:40
oh ha. github.com/rakudo/rakudo/commit/96...ba491a4663 was by you 10:41
dumarchie :) I checked and they are part of the Rakudo I built 10:42
MasterDuke but it says v2020.10-275-gc63f078a2 ? something's off then 10:43
did you pull tags? 10:47
dumarchie iirc I just did a `git pull upstream master` 10:48
That should pull tags, right? 10:49
MasterDuke nope. need to do a round of pull/push with `--tags`
dumarchie Oh... maybe a `--rebase` does it as well? That's what I normally use. 10:50
MasterDuke but this does make it interesting that i can't repro. i thought you had an old rakudo, but if not then i wonder why i don't get the panic
what's your OS and hardware? 10:51
dumarchie Windows 10. Let me figure out how to get CPU info. 10:52
Intel Core i5-6200U 10:53
I guess most devs have an AMD? 10:54
MasterDuke well, i know nine and i do 10:56
dumarchie A colleague of mine also failed to repro on Linux on AMD.
MasterDuke yeah, i'm on linux also
dumarchie But note that I can't consistently repro. Only once in a while I have the "Switching to Thread" followed by a panic. 10:58
MasterDuke what compiler are you using? 10:59
i usually build with gcc, but i'll try switching to clang
dumarchie For my latest build I used gcc provided by MinGW 11:00
MasterDuke been running in a loop for a couple minutes, still no panic 11:06
dumarchie Maybe you can speed it up with `benchmark/stack.raku 100` to limit the number of values pushed and popped per run. 11:12
MasterDuke couple minutes of that each with gcc and clang, no panic 11:23
dumarchie You also didn't see the "Switching to Thread"? 11:24
MasterDuke nope 11:28
dumarchie Can you limit the number of CPU cores MoarVM uses? 11:50
MasterDuke i think so 11:55
dumarchie Maybe it helps if you limit them to 4 or 2. 12:02
MasterDuke been running various configurations with 2 cores, no dice so far 12:02
MasterDuke how long does it usually take for you to get an error? 12:04
dumarchie It varies between 1 and about 20 runs. 12:10
MasterDuke i've probably done over 1k across the different configurations. seems like it might be a windows thing 12:11
i think [Coke] runs on windows, maybe he can repro to confirm 12:12
afk for a bit 12:13
nine dumarchie: ignore the "Switching to Thread". That's just gdb saying that it switches to the thread that called abort() as that's most likely what you as a user want to look at 14:02
tellable6 nine, I'll pass your message to dumarchie
nine Naively putting that allocation inside the locked area can cause a deadlock: allocation may trigger garbage collection. If another thread is waiting for that mutex, it won't be able to enter the GC, so GCing threads are waiting for that thread to join and that thread is waiting for the multi cache mutex to become available 14:08
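One common way to avoid the deadlock nine describes is to do the allocation before taking the mutex and only install the result while holding it. The sketch below uses hypothetical names (`Cache`, `cache_lock`, `get_or_create_cache`), and `calloc` merely stands in for a VM allocation that could trigger garbage collection; it is not MoarVM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical types and names; not MoarVM's real API. */
typedef struct { int entries; } Cache;

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static Cache *shared_cache = NULL;

/* Return the shared cache, creating it on first use. */
static Cache *get_or_create_cache(void) {
    /* Allocate BEFORE locking: in a VM this is the step that may
     * trigger garbage collection, so no mutex is held during it. */
    Cache *fresh = calloc(1, sizeof *fresh);
    assert(fresh != NULL);

    pthread_mutex_lock(&cache_lock);
    if (shared_cache == NULL) {
        shared_cache = fresh;   /* first caller installs its object */
        fresh = NULL;
    }
    Cache *result = shared_cache;
    pthread_mutex_unlock(&cache_lock);

    free(fresh);                /* free(NULL) is a no-op for the winner */
    return result;
}
```

The critical section contains only pointer comparisons and assignments, so a thread holding the mutex never blocks on anything that needs other threads to make progress. The price is an allocation that is sometimes thrown away.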
dumarchie nine, good point I guess 14:37
tellable6 2020-12-14T14:02:39Z #moarvm <nine> dumarchie: ignore the "Switching to Thread". That's just gdb saying that it switches to the thread that called abort() as that's most likely what you as a user want to look at
dumarchie Maybe it would be better to obtain and manipulate a private pointer to the `cache` (i.e. the `cache_obj->body`) instead of manipulating and checking the `cache_obj` itself? 14:58
MasterDuke dumarchie: is valgrind available for windows? 15:00
nine dumarchie: I don't see what that would change 15:02
also that's exactly what happens in line 165 15:03
dumarchie: did you notice that the message says cur_node != 0, re-check == 0000000000000000? In my book, 0000000000000000 is decidedly 0. Previous messages in the same GH issue said <nil> which should be quite 0, too 15:09
dumarchie: what does gdb say that cur_node actually is?
dumarchie How do I ask gdb? 15:11
nine p cur_node 15:12
dumarchie Let me try to trigger another panic.
nine in that call frame. From MVM_panic you'll have to do up
dumarchie You mean `(gdb) up` and then `(gdb) p cur_node` ? 15:13
nine yes
MasterDuke might need to recompile with `--optimize=0` 15:18
dumarchie Just MoarVM, I suppose. How do I do that efficiently? 15:20
nine Just run MoarVM's Configure.pl manually and do a make install. No need to touch nqp or rakudo 15:21
dumarchie Running `nqp\MoarVM>perl Configure.pl --optimize=0 --debug=3` 15:28
[Coke] 15:39
dumarchie OK, I hit the breakpoint and did `bt`. Should I also do `f 1`? 15:42
With just `up` and `p cur_node` I get `$1 = <optimized out>` 15:44
MasterDuke that should be the same as `up`, so yeah
you did `make install` after the configure? 15:45
dumarchie Yes, both in `nqp\MoarVM` and in the main `rakudo` directory. But I see that *install\bin\moar.dll* was not changed... 15:47
nine dumarchie: did you still have the debugger running when doing make install?
That could prevent make install from replacing the file 15:48
dumarchie No, I'm using just one command prompt.
Geth MoarVM/try_fix_multi_cache_add: f19e0e9f67 | (Stefan Seifert)++ | 2 files
Try to fix "Corrupt multi dispatch cache: cur_node != 0, re-check == 0"

MVM_multi_cache_find did a couple more checks when looking for a multi candidate than MVM_multi_cache_add, i.e. it was more picky. This could lead to the check for a pre-existing candidate to not find one and the code for finding the right tree node to extend getting surprised by finding a matching candidate after all.
15:53
nine dumarchie: can you please try this patch?
dumarchie It looks like the first `gmake install` from *nqp\MoarVM>* installed in *nqp\MoarVM\install\bin\* and the second `gmake install` thought that was OK... 15:56
nine dumarchie: oh, you have to specify the --prefix to Configure.pl
Same prefix as for rakudo 15:57
dumarchie The other files in the main *install\bin\* _were_ updated ...
nine by the make install in rakudo
dumarchie I guess so. To apply your patch I can just do a `git pull` in *nqp\MoarVM\* ? 15:59
nine and `git checkout try_fix_multi_cache_add`
then make install. After you ran Configure.pl again with a correct --prefix 16:00
dumarchie That would be `--prefix="c:\raku\rakudo\install\"` in my case, I guess. 16:02
nine looks good
dumarchie Do you prefer the first or the last compile error? 16:05
First: src\gen\config.c:238: error: unterminated argument list invoking macro "MVMROOT" 16:06
nine There shouldn't be any :D
what the? I was nowhere near that code
dumarchie src\gen\config.c:26:5: error: 'MVMROOT' undeclared (first use in this function); did you mean 'MVMIOOps'?
src\gen\config.c:26:12: error: expected ';' at end of input 16:07
MVMROOT(tc, config, {
^ 16:08
nine There's something seriously wrong.
Can you please try a `git clean -xfd` followed by `perl Configure.pl --prefix="c:\raku\rakudo\install"` and `make install`?
dumarchie OK 16:09
nine If that alone doesn't help, then maybe deleting the c:\raku\rakudo\install completely before that will 16:10
dumarchie I did the `git clean -xfd` and `perl Configure.pl --optimize=0 --debug=3 --prefix="c:\raku\rakudo\install\"`.
nine No need for --optimize=0 --debug=3 16:11
dumarchie Just to be sure: does it matter where I do the `gmake install`?
nine in the MoarVM directory
dumarchie Same error. I will delete both c:\raku\rakudo\install and c:\raku\rakudo\nqp\MoarVM\install and try again. 16:14
B.t.w. `git status` tells me "modified: 3rdparty/libuv (untracked content)". Does that matter? 16:17
config.o is now compiled fine, so I guess not 16:20
nine step by step :) 16:21
dumarchie Okay, compiled the moar binaries. I guess I now have to do another `gmake install` in c:\raku\rakudo\ 16:25
nine no 16:26
or yes, since you deleted install previously
in that case you also need a make install in nqp
dumarchie Indeed, thanks :) 16:27
+++ Rakudo installed succesfully! 16:28
5th run without debugger: 16:30
MoarVM panic: Corrupt multi dispatch cache: cur_node != 0, re-check == 000000000892F3C0
Would you like me to try again with gdb?
nine Well....this is interesting. This is the first time that re-check actually isn't 0 16:32
please, gdb would be nice
Actually it would have been a good idea to just extend the error message to include cur_node's actual value
Then one wouldn't need the debugger for that
dumarchie Shall I wait for your commit? 16:38
Geth MoarVM/try_fix_multi_cache_add: a344c580ed | (Stefan Seifert)++ | src/6model/reprs/MVMMultiCache.c
Add some more diagnostics to "Corrupt multi dispatch cache" error
16:40
nine there ^^^
MasterDuke -cur_node? why the minus? 16:42
nine Because leaf nodes (containing the index into the results array) in the multi dispatch cache tree are marked by being negative 16:46
MasterDuke ah
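The sign trick nine mentions, marking leaf nodes by making them negative, might look roughly like this. The encoding details are assumptions made for illustration (0 meaning "no node", result indexes starting at 1, the `NodeRef` type and helper names); the log only states that leaves are negative and hold an index into the results array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding, assumed for illustration: 0 means "no node",
 * positive values refer to inner tree nodes, and a negative value is a
 * leaf whose magnitude is an index into a results array (so result
 * indexes start at 1). */
typedef int32_t NodeRef;

static int is_leaf(NodeRef n) { return n < 0; }

static NodeRef make_leaf(int32_t result_idx) {
    assert(result_idx > 0);     /* 0 cannot be negated into a leaf */
    return -result_idx;
}

static int32_t leaf_result_idx(NodeRef n) {
    assert(is_leaf(n));
    return -n;                  /* the minus recovers the index */
}
```

Under this scheme a single integer field can hold either a link to another node or a final result, and negating a leaf value yields the index to look up.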
dumarchie Hmm, now it takes more time to reproduce the panic. 16:54
MasterDuke but it does still panic? 16:56
dumarchie Not yet
dumarchie Finally: MoarVM panic: Corrupt multi dispatch cache: cur_node != 0: -1 (000000000891F3C0), re-check == 000000000891F3C0 16:58
nine Makes me wonder: what if there are actually 2 bugs here and I did fix the one? 16:59
dumarchie In line 210, the `MVM_multi_cache_find` could operate on a `cache_obj` that is modified by a thread that has not reached line 209 yet, I think 17:03
nine How would that be modified? 17:04
dumarchie Line 162.
nine cache_obj is a local variable
line 162 allocates a new object that other threads do not have any reference to 17:05
dumarchie I'm not going to argue with you :) 17:06
I don't know C. I just saw the parameter declared as `MVMObject *cache_obj`
dumarchie I guess that dereferences the pointer? 17:08
nine that declares a pointer. cache_obj is just a pointer. dereferencing is via -> 17:11
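The distinction nine draws, between assigning to a pointer parameter and dereferencing it with `->`, can be shown in a few lines of C. `Obj`, `rebind_local` and `write_through` are hypothetical names for this sketch and have nothing to do with `MVMObject` itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type; not MoarVM's MVMObject. */
typedef struct { int value; } Obj;

/* Reassigning the parameter changes only this function's local copy
 * of the pointer; the caller's pointer and object are untouched. */
static void rebind_local(Obj *obj) {
    static Obj other = { 99 };
    obj = &other;
    (void)obj;      /* silence "set but not used" warnings */
}

/* Writing through the pointer with -> changes the shared object. */
static void write_through(Obj *obj) {
    assert(obj != NULL);
    obj->value = 42;
}
```

After `rebind_local(p)` the caller's `p` still points at the original object with its original value; only `write_through(p)` produces a change that other holders of the same pointer could observe.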
I only see 2 ways how a second check could find a candidate where the first one didn't: either the cache changed, or the thing we're looking for changed between the first and the second find 17:13
The cache is protected by the mutex. Unless there's some other code that modifies the cache without taking the mutex, I don't see how the cache could change 17:14
MasterDuke there is a MVM_MULTICACHE_DEBUG that enables a `dump_cache`. dump at every find?
dumarchie Afk to buy food. 17:15
nine Following the hypothesis that the thing we're looking for changes mid-way: we're looking for an MVMCallCapture. That's a set of arguments. The arguments to cas are: (Mu $target is rw, Mu \expected, Mu \value) 18:08
nine What if $target gets changed by a different thread between us checking for that cas candidate for the first time and the second time? 18:13
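The hypothesis can be modelled as a check-then-recheck on a key that another thread is allowed to change in between. The sketch is a single-threaded simulation with hypothetical names (`Target`, `cache_has_entry_for`, `lookups_disagree`), where a callback stands in for the concurrent writer. It illustrates the hypothesis only and says nothing about whether this is the actual cause of the panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a "target" container whose definedness is part
 * of the lookup key, and a cache that only knows the defined case. */
typedef struct { bool defined; } Target;

static bool cache_has_entry_for(const Target *t) {
    return t->defined;      /* cache holds only the "defined" candidate */
}

/* Stand-in for a concurrent writer running between the two lookups. */
typedef void (*Interleave)(Target *);

static void concurrent_store(Target *t) { t->defined = true; }
static void no_interference(Target *t)  { (void)t; }

/* Returns true if the first lookup missed but the re-check found a
 * match, which is the kind of disagreement the panic reports. */
static bool lookups_disagree(Target *t, Interleave between) {
    assert(t != NULL && between != NULL);
    bool first = cache_has_entry_for(t);
    between(t);
    bool second = cache_has_entry_for(t);
    return !first && second;
}
```

In this model a mutex around the cache would not help, because the cache itself never changes; only the value being looked up does.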
timotimo yooooo 18:38
i was without a usable desktop machine for a couple of days :| 18:39
nine ouch 18:40
timotimo upgraded my media file storage from a 12tb raid0 to a 14tb raid1 18:43
three disks to two disks
nine Can't confirm my hypothesis of the changing capture 18:57
lizmat and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2020/12/14/2020-...wikipedia/ 19:09
dumarchie nine: in my code the type of value contained in `$target` changes from `:U` to `:D` 19:31
dumarchie Furthermore the panic appears to be triggered by `prefix:<⚛>`, but maybe that first sees a `:U` and then a `:D` because another thread already performed a `cas`. 19:46
nine yes, that's what I figured, but I still cannot reproduce it even with creating those exact circumstances 20:00
dumarchie nine: I enabled MVM_MULTICACHE_DEBUG and created a gist with the output of the last four additions before a panic: gist.github.com/dumarchie/1dec8e93...71276ae3f4 21:20
The first and last tree dump appear to be the same cache. Maybe the difference between them tells you something.