github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:02
evalable6 left,
committable6 left,
linkable6 left
00:03
evalable6 joined,
committable6 joined
00:04
linkable6 joined
00:30
frost-lab left
01:07
dogbert11 joined
01:10
dogbert17 left
02:10
quotable6 left,
notable6 left,
squashable6 left,
bisectable6 left,
committable6 left,
bloatable6 left,
evalable6 left,
linkable6 left,
nativecallable6 left,
releasable6 left,
benchable6 left,
greppable6 left,
coverable6 left,
sourceable6 left,
shareable6 left,
unicodable6 left,
tellable6 left,
statisfiable6 left
02:11
sourceable6 joined,
greppable6 joined,
nativecallable6 joined,
bloatable6 joined
02:12
tellable6 joined,
notable6 joined,
releasable6 joined,
squashable6 joined,
coverable6 joined,
linkable6 joined,
committable6 joined
02:13
shareable6 joined,
benchable6 joined,
quotable6 joined,
statisfiable6 joined,
bisectable6 joined
02:14
evalable6 joined,
unicodable6 joined
02:41
lucasb left
03:13
leont left
05:36
MasterDuke left
07:40
domidumont joined
07:43
domidumont left
07:49
domidumont joined
07:56
domidumont left
07:58
domidumont joined
|
|||
timotimo | finally back at my regular desktop system ... | 08:03 | |
... wow, one of the drives on the raid isn't showing up, eh? that's great | 08:05 | ||
08:09
sena_kun joined
08:14
dumarchie joined
|
|||
dumarchie | Isn't github.com/MoarVM/MoarVM/blob/82a3...#L160-L164 a classical example of a data race? | 08:14 | |
08:16
sivoais left,
camelia left
08:20
Geth left,
Geth joined
08:21
domidumont left
08:22
sivoais joined,
camelia joined
08:28
domidumont joined
|
|||
nine | timotimo: can you re-add it? | 08:36 | |
I've had devices fall out of the RAID twice in the past couple of weeks. But a simple add and they're up again | 08:37 | ||
08:52
zakharyas joined
09:08
Altai-man joined
09:10
sena_kun left
09:22
MasterDuke joined
|
|||
MasterDuke | timotimo: ping re github.com/MoarVM/MoarVM/pull/1399 | 09:53 | |
Geth | MoarVM/modern-raku-script-extensions: a7efa956f9 | (Elizabeth Mattijsen)++ | 6 files Fix some .p6 -> .raku changes that were missed Nine++ for the spot |
10:02 | |
dumarchie | MasterDuke, can you have a look at github.com/MoarVM/MoarVM/issues/12...-744273918 ? | 10:05 | |
MasterDuke | i'm not sure what's the race there? | 10:08 | |
but i suspect you'll really need nine, jnthn, timotimo, or nwc10 to take a look | 10:09 | ||
dumarchie | Two threads see the cache is not properly initialized. Both assign a new object. Then later one of them finds out the object is different than expected. | ||
MasterDuke | hm | 10:11 | |
dumarchie | I'm not a C programmer, but maybe even the pointer itself could become corrupted? | 10:15 | |
MasterDuke | there is a mutex that's used for some multicache operations. e.g., github.com/MoarVM/MoarVM/blob/82a3...che.c#L209 | 10:17 | |
you could try wrapping that init in a lock/unlock | 10:18 | ||
dumarchie | I guess it's safe to use the mutex_multi_cache_add for initialization as well. I'll give it a try. It's probably more interesting than writing a hello_world.c :) | 10:22 | |
Otoh, do we want a lock/unlock for every MVM_multi_cache_add ? | 10:23 | ||
jnthn | iirc, the idea was that threads may race to install the multi cache, and whoever loses just ends up with that being GC'd later | ||
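The "race to install, loser gets collected" idea jnthn describes can be sketched in portable C11 with an atomic compare-and-swap. This is only an illustration under stated assumptions: `Cache`, `cache_slot` and `get_or_install_cache` are hypothetical names, not MoarVM code, and `free` stands in for what a garbage collector would do to the losing object later.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache object; not MoarVM's real type. */
typedef struct { int num_entries; } Cache;

/* Shared slot that starts out empty. */
static _Atomic(Cache *) cache_slot = NULL;

/* Return the installed cache, creating and installing one if the slot is empty. */
static Cache *get_or_install_cache(void) {
    Cache *seen = atomic_load(&cache_slot);
    if (seen != NULL)
        return seen;

    /* Several threads may reach this point and each allocate a candidate. */
    Cache *fresh = calloc(1, sizeof *fresh);
    assert(fresh != NULL);

    /* Only one thread can swap NULL for its candidate. */
    Cache *expected = NULL;
    if (atomic_compare_exchange_strong(&cache_slot, &expected, fresh))
        return fresh;

    /* We lost the race: drop our candidate (a GC would reclaim it in a VM)
       and use the winner, which the failed CAS stored in `expected`. */
    free(fresh);
    return expected;
}
```

With this shape every caller ends up using the same object, whichever thread wins. Whether the code under discussion gives that guarantee is exactly what the conversation is trying to establish.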
tellable6 | hey jnthn, you have a message: gist.github.com/9b64d0622fc34b57db...90e6562af7 | ||
dumarchie | jnthn, that makes sense to me. But does the cur_node check in line 258 make sense from that perspective? | 10:26 | |
10:28
frost-lab joined
10:29
Kaeipi joined
|
|||
dumarchie | line 257 that is | 10:29 | |
10:29
Kaiepi left
|
|||
MasterDuke | dumarchie: do you still have the problem with a current rakudo? according to one of your recent comments, you still have a rakudo from 2020.10, even though your moarvm is from 2020.11 | 10:34 | |
dumarchie | All my recent debugging was with Rakudo v2020.10-275-gc63f078a2 | 10:36 | |
MasterDuke | there were some changes to cas between now and then, but i think they were just optimizations and shouldn't have changed how it works | ||
dumarchie | I don't see them in github.com/dumarchie/rakudo/compar...udo:master | 10:39 | |
MasterDuke | github.com/rakudo/rakudo/commit/1c...03dac6f8e3 | 10:40 | |
oh ha. github.com/rakudo/rakudo/commit/96...ba491a4663 was by you | 10:41 | ||
dumarchie | :) I checked and they are part of the Rakudo I built | 10:42 | |
MasterDuke | but it says v2020.10-275-gc63f078a2 ? something's off then | 10:43 | |
did you pull tags? | 10:47 | ||
dumarchie | iirc I just did a `git pull upstream master` | 10:48 | |
That should pull tags, right? | 10:49 | ||
MasterDuke | nope. need to do a round of pull/push with `--tags` | ||
dumarchie | Oh... maybe a `--rebase` does it as well? That's what I normally use. | 10:50 | |
MasterDuke | but this does make it interesting that i can't repro. i thought you had an old rakudo, but if not then i wonder why i don't get the panic | |
what's your OS and hardware? | 10:51 | ||
dumarchie | Windows 10. Let me figure out how to get CPU info. | 10:52 | |
Intel Core i5-6200U | 10:53 | ||
I guess most devs have an AMD? | 10:54 | ||
MasterDuke | well, i know nine and i do | 10:56 | |
dumarchie | A colleague of mine also failed to repro on Linux on AMD. | ||
MasterDuke | yeah, i'm on linux also | ||
dumarchie | But note that I can't consistently repro. Only once in a while I have the "Switching to Thread" followed by a panic. | 10:58 | |
MasterDuke | what compiler are you using? | 10:59 | |
i usually build with gcc, but i'll try switching to clang | |||
dumarchie | For my latest build I used gcc provided by MinGW | 11:00 | |
11:02
tib left
|
|||
MasterDuke | been running in a loop for a couple minutes, still no panic | 11:06 | |
dumarchie | Maybe you can speed it up with `benchmark/stack.raku 100` to limit the number of values pushed and popped per run. | 11:12 | |
MasterDuke | couple minutes of that each with gcc and clang, no panic | 11:23 | |
dumarchie | You also didn't see the "Switching to Thread"? | 11:24 | |
MasterDuke | nope | 11:28 | |
dumarchie | Can you limit the number of CPU cores MoarVM uses? | 11:50 | |
MasterDuke | i think so | 11:55 | |
dumarchie | Maybe it helps if you limit them to 4 or 2. | 12:02 | |
12:02
linkable6 left,
evalable6 left
|
|||
MasterDuke | been running various configurations with 2 cores, no dice so far | 12:02 | |
12:03
linkable6 joined
|
|||
MasterDuke | how long does it usually take for you to get an error? | 12:04 | |
12:05
evalable6 joined
|
|||
dumarchie | It varies between 1 and about 20 runs. | 12:10 | |
MasterDuke | i've probably done over 1k across the different configurations. seems like it might be a windows thing | 12:11 | |
i think [Coke] runs on windows, maybe he can repro to confirm | 12:12 | ||
afk for a bit | 12:13 | ||
13:02
lizmat_ joined
13:04
lizmat left
13:09
sena_kun joined
13:10
Altai-man left
13:12
frost-lab left
13:20
zakharyas left
13:28
leont joined
14:00
domidumont1 joined
14:02
dumarchie left,
domidumont left
|
|||
nine | dumarchie: ignore the "Switching to Thread". That's just gdb saying that it switches to the thread that called abort() as that's most likely what you as a user want to look at | 14:02 | |
tellable6 | nine, I'll pass your message to dumarchie | ||
14:06
lucasb joined
|
|||
nine | Naively putting that allocation inside the locked area can cause a deadlock: allocation may trigger garbage collection. If another thread is waiting for that mutex, it won't be able to enter the GC, so GCing threads are waiting for that thread to join and that thread is waiting for the multi cache mutex to become available | 14:08 | |
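One common way around the hazard nine describes is to allocate before taking the mutex and only install under it, so nothing that could trigger a collection runs while the lock is held. The sketch below uses plain pthreads. `Cache`, `allocate_cache`, `cache_mutex` and `ensure_cache` are hypothetical names for illustration, `calloc` stands in for a GC-managed allocation, and this is not the change that was made in MoarVM.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache object; not MoarVM's real type. */
typedef struct { int num_entries; } Cache;

static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static Cache *shared_cache = NULL;

/* Stand-in for an allocation that might trigger GC in a real VM. */
static Cache *allocate_cache(void) {
    Cache *c = calloc(1, sizeof *c);
    assert(c != NULL);
    return c;
}

static Cache *ensure_cache(void) {
    /* Allocate first, while no lock is held. */
    Cache *fresh = allocate_cache();

    /* The critical section only reads and writes a pointer. */
    pthread_mutex_lock(&cache_mutex);
    if (shared_cache == NULL) {
        shared_cache = fresh;
        fresh = NULL;           /* ownership moved to the shared slot */
    }
    Cache *result = shared_cache;
    pthread_mutex_unlock(&cache_mutex);

    /* If the slot was already filled, discard the unused candidate.
       free(NULL) is a no-op, so this is safe in both cases. */
    free(fresh);
    return result;
}
```

The price is an allocation on every call, including calls that find the slot already filled. The sketch accepts that cost to keep the locked region free of allocation.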
14:11
lizmat_ is now known as lizmat
14:29
zakharyas joined
14:36
dumarchie joined
|
|||
dumarchie | nine, good point I guess | 14:37 | |
tellable6 | 2020-12-14T14:02:39Z #moarvm <nine> dumarchie: ignore the "Switching to Thread". That's just gdb saying that it switches to the thread that called abort() as that's most likely what you as a user want to look at | ||
14:44
zakharyas1 joined
14:47
zakharyas left
|
|||
dumarchie | Maybe it would be better to obtain and manipulate a private pointer to the `cache` (i.e. the `cache_obj->body`) instead of manipulating and checking the `cache_obj` itself? | 14:58 | |
MasterDuke | dumarchie: is valgrind available for windows? | 15:00 | |
nine | dumarchie: I don't see what that would change | 15:02 | |
also that's exactly what happens in line 165 | 15:03 | ||
dumarchie: did you notice that the message says cur_node != 0, re-check == 0000000000000000? In my book, 0000000000000000 is decidedly 0. Previous messages in the same GH issue said <nil> which should be quite 0, too | 15:09 | ||
dumarchie: what does gdb say that cur_node actually is? | |||
dumarchie | How do I ask gdb? | 15:11 | |
nine | p cur_node | 15:12 | |
dumarchie | Let me try to trigger another panic. | |
nine | in that call frame. From MVM_panic you'll have to do up | ||
dumarchie | You mean `(gdb) up` and then `(gdb) p cur_node` ? | 15:13 | |
nine | yes | ||
MasterDuke | might need to recompile with `--optimize=0` | 15:18 | |
dumarchie | Just MoarVM, I suppose. How do I do that efficiently? | 15:20 | |
nine | Just run MoarVM's Configure.pl manually and do a make install. No need to touch nqp or rakudo | 15:21 | |
dumarchie | Running `nqp\MoarVM>perl Configure.pl --optimize=0 --debug=3` | 15:28 | |
15:34
MasterDuke left,
Kaeipi left
15:35
Kaeipi joined,
Kaeipi left
15:36
Kaeipi joined
|
|||
[Coke] | 15:39 | ||
15:41
MasterDuke joined
|
|||
dumarchie | OK, I hit the breakpoint and did `bt`. Should I also do `f 1`? | 15:42 | |
With just `up` and `p cur_node` I get `$1 = <optimized out>` | 15:44 | ||
MasterDuke | that should be the same as `up`, so yeah | ||
you did `make install` after the configure? | 15:45 | ||
dumarchie | Yes, both in `nqp\MoarVM` and in the main `rakudo` directory. But I see that *install\bin\moar.dll* was not changed... | 15:47 | |
nine | dumarchie: did you still have the debugger running when doing make install? | ||
That could prevent make install from replacing the file | 15:48 | ||
dumarchie | No, I'm using just one command prompt. | ||
Geth | MoarVM/try_fix_multi_cache_add: f19e0e9f67 | (Stefan Seifert)++ | 2 files Try to fix "Corrupt multi dispatch cache: cur_node != 0, re-check == 0" MVM_multi_cache_find did a couple more checks when looking for a multi candidate than MVM_multi_cache_add, i.e. it was more picky. This could lead to the check for a pre-existing candidate to not find one and the code for finding the right tree node to extend getting surprised by finding a matching candidate after all. |
15:53 | |
nine | dumarchie: can you please try this patch? | ||
dumarchie | It looks like the first `gmake install` from *nqp\MoarVM>* installed in *nqp\MoarVM\install\bin\* and the second `gmake install` thought that was OK... | 15:56 | |
nine | dumarchie: oh, you have to specify the --prefix to Configure.pl | ||
Same prefix as for rakudo | 15:57 | ||
dumarchie | The other files in the main *install\bin\* _were_ updated ... | ||
nine | by the make install in rakudo | ||
dumarchie | I guess so. To apply your patch I can just do a `git pull` in *nqp\MoarVM\* ? | 15:59 | |
nine | and `git checkout try_fix_multi_cache_add` | ||
then make install. After you ran Configure.pl again with a correct --prefix | 16:00 | ||
dumarchie | That would be `--prefix="c:\raku\rakudo\install\"` in my case, I guess. | 16:02 | |
nine | looks good | ||
dumarchie | Do you prefer the first or the last compile error? | 16:05 | |
First: src\gen\config.c:238: error: unterminated argument list invoking macro "MVMROOT" | 16:06 | ||
nine | There shouldn't be any :D | ||
what the? I was nowhere near that code | |||
dumarchie | src\gen\config.c:26:5: error: 'MVMROOT' undeclared (first use in this function); did you mean 'MVMIOOps'? | ||
src\gen\config.c:26:12: error: expected ';' at end of input | 16:07 | ||
MVMROOT(tc, config, { | |||
^ | 16:08 | ||
nine | There's something seriously wrong. | ||
Can you please try a `git clean -xfd` followed by `perl Configure.pl --prefix="c:\raku\rakudo\install"` and `make install`? | |||
dumarchie | OK | 16:09 | |
nine | If that alone doesn't help, then maybe deleting the c:\raku\rakudo\install completely before that will | 16:10 | |
dumarchie | I did the `git clean -xfd` and `perl Configure.pl --optimize=0 --debug=3 --prefix="c:\raku\rakudo\install\"`. | |
nine | No need for --optimize=0 --debug=3 | 16:11 | |
dumarchie | Just to be sure: does it matter where I do the `gmake install`? | ||
nine | in the MoarVM directory | ||
dumarchie | Same error. I will delete both c:\raku\rakudo\install and c:\raku\rakudo\nqp\MoarVM\install and try again. | 16:14 | |
B.t.w. `git status` tells me "modified: 3rdparty/libuv (untracked content)". Does that matter? | 16:17 | ||
config.o is now compiled fine, so I guess not | 16:20 | ||
nine | step by step :) | 16:21 | |
dumarchie | Okay, compiled the moar binaries. I guess I now have to do another `gmake install` in c:\raku\rakudo\ | 16:25 | |
nine | no | 16:26 | |
or yes, since you deleted install previously | |||
in that case you also need a make install in nqp | |||
dumarchie | Indeed, thanks :) | 16:27 | |
+++ Rakudo installed succesfully! | 16:28 | ||
5th run without debugger: | 16:30 | ||
MoarVM panic: Corrupt multi dispatch cache: cur_node != 0, re-check == 000000000892F3C0 | |||
Would you like me to try again with gdb? | |||
nine | Well....this is interesting. This is the first time that re-check actually isn't 0 | 16:32 | |
please, gdb would be nice | |||
Actually it would have been a good idea to just extend the error message to include cur_node's actual value | |||
Then one wouldn't need the debugger for that | |||
dumarchie | Shall I wait for your commit? | 16:38 | |
Geth | MoarVM/try_fix_multi_cache_add: a344c580ed | (Stefan Seifert)++ | src/6model/reprs/MVMMultiCache.c Add some more diagnostics to "Corrupt multi dispatch cache" error |
16:40 | |
nine | there ^^^ | ||
MasterDuke | -cur_node? why the minus? | 16:42 | |
nine | Because leaf nodes (containing the index into the results array) in the multi dispatch cache tree are marked by being negative | 16:46 | |
MasterDuke | ah | ||
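The sign trick nine mentions, where a negative value marks a leaf and its magnitude is the index into the results array, can be sketched as follows. The helper names and the exact convention (0 meaning "empty", positive meaning "another tree node") are assumptions made for illustration, not taken from `MVMMultiCache.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding for a tree slot:
 *   value > 0  : index of another tree node to continue matching at
 *   value == 0 : empty, no entry
 *   value < 0  : leaf; -value is the index into the results array
 */

/* Encode a result index (assumed to start at 1) as a leaf marker. */
static int32_t make_leaf(int32_t result_index) {
    assert(result_index > 0);
    return -result_index;
}

/* A slot is a leaf exactly when it is negative. */
static int is_leaf(int32_t slot) {
    return slot < 0;
}

/* Recover the result index from a leaf slot by negating it again. */
static int32_t leaf_result_index(int32_t slot) {
    assert(is_leaf(slot));
    return -slot;
}
```

Under this convention, printing `-cur_node` for a leaf shows the result index itself, which is one plausible reason for a diagnostic to negate the value.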
dumarchie | Hmm, now it takes more time to reproduce the panic. | 16:54 | |
MasterDuke | but it does still panic? | 16:56 | |
dumarchie | Not yet | ||
16:56
dumarchie left
16:57
dumarchie joined
|
|||
dumarchie | Finally: MoarVM panic: Corrupt multi dispatch cache: cur_node != 0: -1 (000000000891F3C0), re-check == 000000000891F3C0 | 16:58 | |
nine | Makes me wonder: what if there are actually 2 bugs here and I did fix the one? | 16:59 | |
dumarchie | In line 210, the `MVM_multi_cache_find` could operate on a `cache_obj` that is modified by a thread that has not reached line 209 yet, I think | 17:03 | |
nine | How would that be modified? | 17:04 | |
dumarchie | Line 162. | ||
nine | cache_obj is a local variable | ||
line 162 allocates a new object that other threads do not have any reference to | 17:05 | ||
dumarchie | I'm not going to argue with you :) | 17:06 | |
I don't know C. I just saw the parameter declared as `MVMObject *cache_obj` | |||
17:08
Altai-man joined
|
|||
dumarchie | I guess that dereferences the pointer? | 17:08 | |
17:10
sena_kun left
|
|||
nine | that declares a pointer. cache_obj is just a pointer. dereferencing is via -> | 17:11 | |
I only see 2 ways how a second check could find a candidate where the first one didn't: either the cache changed, or the thing we're looking for changed between the first and the second find | 17:13 | ||
The cache is protected by the mutex. Unless there's some other code that modifies the cache without taking the mutex, I don't see how the cache could change | 17:14 | ||
MasterDuke | there is a MVM_MULTICACHE_DEBUG that enables a `dump_cache`. dump at every find? | ||
dumarchie | Afk to buy food. | 17:15 | |
17:49
vrurg left
18:04
domidumont1 left
|
|||
nine | Following the hypothesis that the thing we're looking for changes mid-way: we're looking for an MVMCallCapture. That's a set of arguments. The arguments to cas are: (Mu $target is rw, Mu \expected, Mu \value) | 18:08 | |
18:12
vrurg joined
|
|||
nine | What if $target gets changed by a different thread between us checking for that cas candidate for the first time and the second time? | 18:13 | |
18:17
vrurg left
|
|||
timotimo | yooooo | 18:38 | |
i was without a usable desktop machine for a couple of days :| | 18:39 | ||
nine | ouch | 18:40 | |
timotimo | upgraded my media file storage from a 12tb raid0 to a 14tb raid1 | 18:43 | |
three disks to two disks | |||
nine | Can't confirm my hypothesis of the changing capture | 18:57 | |
lizmat | and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2020/12/14/2020-...wikipedia/ | 19:09 | |
19:30
zakharyas1 left
|
|||
dumarchie | nine: in my code the type of value contained in `$target` changes from `:U` to `:D` | 19:31 | |
19:33
MasterDuke left
|
|||
dumarchie | Furthermore the panic appears to be triggered by `prefix:<⚛>`, but maybe that first sees a `:U` and then a `:D` because another thread already performed a `cas`. | 19:46 | |
nine | yes, that's what I figured, but I still cannot reproduce it even with creating those exact circumstances | 20:00 | |
20:02
dumarchie left
20:17
vrurg joined
20:18
patrickb joined
20:21
vrurg left
21:09
sena_kun joined
21:10
Altai-man left
21:18
dumarchie joined
|
|||
dumarchie | nine: I enabled MVM_MULTICACHE_DEBUG and created a gist with the output of the last four additions before a panic: gist.github.com/dumarchie/1dec8e93...71276ae3f4 | 21:20 | |
The first and last tree dump appear to be the same cache. Maybe the difference between them tells you something. | |||
21:35
patrickb left
21:44
zakharyas joined
21:53
zakharyas left
21:56
vrurg joined
22:01
vrurg left
22:04
vrurg joined
22:09
vrurg left
22:12
sena_kun left
22:19
MasterDuke joined
23:28
vrurg joined
23:33
vrurg left
23:56
vrurg joined
|