github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:12 Altai-man_ joined 00:14 sena_kun left 02:01 jnthn left 02:12 sena_kun joined 02:14 Altai-man_ left 04:04 mst left 04:12 Altai-man_ joined 04:15 sena_kun left 05:50 Util left 06:09 Util joined 06:13 sena_kun joined 06:15 Altai-man_ left 06:34 Geth_ joined 06:40 Geth left 06:43 Geth joined 06:48 Geth left 07:16 zakharyas joined 07:33 squashable6 left 07:35 squashable6 joined 09:12 Altai-man_ joined 09:14 domidumont joined, sena_kun left 09:18 raku-bridge1 joined, raku-bridge left 09:19 raku-bridge joined 09:20 raku-bridge1 left
nwc10 good *, #moarvm 10:27
finally, after something like 24 hours of looping, I can provoke the "oops" on MoarVM master with UT Hash
there exists, somehow, a rare race condition whereby tc->instance->mutex_object_ids can already have an entry for obj 10:28
when MVM_gc_object_id is called
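For orientation, a compilable miniature of the invariant being described: an id table guarded by a mutex (standing in for tc->instance->object_ids under mutex_object_ids) plus a per-object flag standing in for MVM_CF_HAS_OBJECT_ID. The "oops" corresponds to finding a table entry for an object whose flag is still clear. The table, helper names and layout here are illustrative only, not MoarVM's actual src/gc/objectid.c.

    /* Illustrative miniature, not MoarVM's objectid.c: an object-id table
     * guarded by a mutex plus a per-object "has object id" flag.           */
    #include <assert.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HAS_OBJECT_ID 0x0100            /* stand-in for MVM_CF_HAS_OBJECT_ID */
    #define MAX_IDS       1024

    typedef struct { uint16_t flags; uint16_t size; } Obj;

    static pthread_mutex_t id_mutex = PTHREAD_MUTEX_INITIALIZER; /* cf. mutex_object_ids */
    static Obj     *id_table[MAX_IDS];                           /* cf. object_ids       */
    static uint64_t next_id = 1;

    static uint64_t lookup(Obj *o) {        /* linear scan stands in for the hash */
        for (uint64_t i = 1; i < next_id; i++)
            if (id_table[i] == o)
                return i;
        return 0;
    }

    static uint64_t object_id(Obj *o) {     /* rough shape of the code path discussed */
        uint64_t id;
        pthread_mutex_lock(&id_mutex);
        if (o->flags & HAS_OBJECT_ID) {
            id = lookup(o);                 /* the flag promises an entry exists */
        }
        else {
            /* The "oops": an entry already present for an object whose flag
             * is clear breaks this invariant.                                */
            assert(lookup(o) == 0);
            id = next_id++;
            id_table[id] = o;
            o->flags |= HAS_OBJECT_ID;      /* the read-modify-write discussed below */
        }
        pthread_mutex_unlock(&id_mutex);
        return id;
    }

    int main(void) {
        Obj o = { 0, 32 };
        printf("id = %llu, again = %llu\n",
               (unsigned long long)object_id(&o),
               (unsigned long long)object_id(&o));
        return 0;
    }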
11:20 zakharyas left 12:13 sena_kun joined 12:14 Altai-man_ left 12:28 mst joined, ChanServ sets mode: +o mst
nine nwc10: but, but, but, how? 12:28
12:37 zakharyas joined 12:56 domidumont left 13:08 dogbert17 joined
dogbert17 nwc10: have you tried using tsan? 13:10
tellable6 2020-08-03T10:19:58Z #moarvm <nwc10> dogbert17: yes, that was my assumption too. I think that it has to be an existing corner-case bug that I've exposed. So, I've added similar code to MoarVM master, and now I have that one looping until the test fails
dogbert17 according to tsan we have (among other things): SUMMARY: ThreadSanitizer: data race src/gc/objectid.c:29 in MVM_gc_object_id 13:17
nine is there any more information about that? I just don't see it 13:54
14:02 squashable6 left
dogbert17 nine: my bad, here's an example: gist.github.com/dogbert17/7105f17e...0bf8ebc7d5 14:04
14:04 squashable6 joined 14:12 Altai-man_ joined 14:15 sena_kun left
MasterDuke huh, github.com/MoarVM/MoarVM/blob/mast...rp.c#L6419 appears in both the read and write backtrace. but that looks like a line that really shouldn't be getting hit 14:17
dogbert17 hmm, perhaps I should run with --no-optimize 14:22
I built MoarVM with 'perl Configure.pl --debug --tsan --prefix=../../install/' which might have been a bit sloppy 14:23
nine Setting an object's header flags that way is racy! 14:24
MasterDuke what way? 14:28
14:28 patrickb joined
nine obj->header.flags |= MVM_CF_HAS_OBJECT_ID 14:29
MasterDuke that's done in a couple different places. what should be done instead? 14:30
14:30 lucasb joined
MasterDuke get flags into a temp variable, update that, then set them to the temp variable? 14:32
nine That'd be just the same. What it needs is an atomic CAS 14:37
Which is complicated by the fact that those work only on native sized ints while flags is an MVMuint16 14:38
14:39 patrickb left
MasterDuke so we're forced to pay the cost of making every single object bigger? 14:42
nine No. We're just gonna have to swap more than just those flags
Depending on sizeof(AO_t) it's either flags and size or owner, flags and size 14:43
The good thing is that owner and size won't be changing 14:44
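Because compare-and-swap only works on native-sized words, the plan above is to CAS the whole word containing flags, carrying size (and, with an 8-byte AO_t, owner) along unchanged. A minimal sketch of that technique, using C11 atomics as a stand-in for MoarVM's AO_t/MVM_cas layer; the packed layout and names are assumptions for illustration:

    /* Minimal sketch: set a flag bit by CAS-ing the native-sized word that
     * also holds owner and size; C11 atomics stand in for AO_t/MVM_cas.
     * The packed layout (owner 0..31, flags 32..47, size 48..63) is assumed. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HAS_OBJECT_ID ((uint64_t)0x0100 << 32)  /* flag bit, pre-shifted into the flags field */

    typedef struct { _Atomic uint64_t header; } Obj;

    static void set_flag(Obj *o, uint64_t flag_bit) {
        uint64_t old = atomic_load(&o->header);
        /* Retry until no other thread changed the word underneath us; owner
         * and size travel along unchanged, so nothing can be lost.          */
        while (!atomic_compare_exchange_weak(&o->header, &old, old | flag_bit))
            ;                                /* a failed CAS refreshes 'old' */
    }

    int main(void) {
        Obj o;
        atomic_init(&o.header, ((uint64_t)32 << 48) | 1);  /* size = 32, owner = 1, flags = 0 */
        set_flag(&o, HAS_OBJECT_ID);
        printf("header = %016llx\n", (unsigned long long)atomic_load(&o.header));
        return 0;
    }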
MasterDuke heh, this sounds like a problem that someone other than i should fix 14:46
14:52 patrickb joined 15:25 patrickb left 15:27 Kaiepi left, Kaiepi joined
dogbert17 note that the gist I posted refers to the XXX-A-Better-Hash branch 16:19
nwc10 oh my (Been out for the afternoon) 16:45
So it's an even deeper bug than I suspected
(at the point of going out I was thinking "in my backtrace, why is size 0?")
data race is because some other thread updates flags? 16:48
timotimo alternatively, we can just… put a MVM_barrier after the flag update 16:49
nine Yes. MVM_gc_object_id can be called by any thread, not only the owner of the object. 16:50
nwc10 and the issue here is that every *other* thread is accessing flags without any sort of write barrier?
(in that, MVM_gc_object_id is doing things with a mutex locked, and I *thought* that unlocking a mutex was a write barrier. But this stuff is well beyond my comfort zone) 16:51
nine MVM_gc_write_barrier_hit_by is probably the most likely competitor when writing to those flags
nwc10 anyway, it's a real bug, and it's an existing bug, because I can recreate it on master.
nine Both MVM_gc_write_barrier_hit_by and MVM_gc_object_id read an object's flags and write them back with an additional bit set. May well happen that this bit gets lost if they compete. 16:52
MVM_gc_object_id actually reads it twice, but of course the compiler is allowed to reduce that to just one read 16:54
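A standalone way to see that lost update (no MoarVM code involved): two threads each set a different bit with a plain |=. Note that a memory barrier after the store, as suggested earlier, only orders the writes; it does not make the read-modify-write atomic, so a bit can still be dropped. Compiling this with -fsanitize=thread produces the same kind of report as the tsan run above:

    /* Standalone demonstration (not MoarVM code) of the lost update: two
     * threads each set a different bit with a plain read-modify-write.
     * ThreadSanitizer flags the race; occasionally a bit is actually lost. */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BIT_A 0x0001     /* think MVM_CF_HAS_OBJECT_ID        */
    #define BIT_B 0x0002     /* think the write-barrier "hit" bit */

    static volatile uint16_t flags;

    static void *set_a(void *arg) { (void)arg; flags |= BIT_A; return NULL; }  /* racy |= */
    static void *set_b(void *arg) { (void)arg; flags |= BIT_B; return NULL; }  /* racy |= */

    int main(void) {
        int lost = 0;
        for (int i = 0; i < 100000; i++) {
            pthread_t a, b;
            flags = 0;
            pthread_create(&a, NULL, set_a, NULL);
            pthread_create(&b, NULL, set_b, NULL);
            pthread_join(a, NULL);
            pthread_join(b, NULL);
            if (flags != (BIT_A | BIT_B))    /* one of the two updates vanished */
                lost++;
        }
        printf("lost an update in %d of 100000 runs\n", lost);
        return 0;
    }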
nwc10 "benchmarking might reveal" - it might be faster, but it might just be simpler, to get rid of the flag MVM_CF_HAS_OBJECT_ID and rely totally on hash lookups in tc->instance->object_ids 17:07
If done that way, we don't need to start using atomic ops in the GC write barrier code
17:11 domidumont joined
nine True. Write barriers are certainly much hotter than objectid 17:11
17:13 sena_kun joined 17:15 Altai-man_ left 17:19 patrickb joined
timotimo just wait until someone puts every single object in the vm into an object hash 17:31
nine But....writing to a hash triggers a write barrier ;) 17:33
Actually MVM_gc_write_barrier_hit_by would only need an atomic op for setting the flag, i.e. the first time the object got hit. We can still test for the flag using a normal op 17:37
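A sketch of that "plain test, atomic set" idea, again with C11 atomics purely as a stand-in; the flag name and helper are hypothetical, not MoarVM's API:

    /* Sketch of "plain test, atomic set": the hot path is a cheap relaxed
     * read; only the first hit pays for the atomic OR, which, unlike a
     * plain |=, cannot drop a concurrently-set bit.  Names are hypothetical. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HIT_BY_BARRIER 0x0004            /* hypothetical flag bit */

    typedef struct { _Atomic uint16_t flags; } Obj;

    static void barrier_hit(Obj *o) {
        if (!(atomic_load_explicit(&o->flags, memory_order_relaxed) & HIT_BY_BARRIER))
            atomic_fetch_or(&o->flags, HIT_BY_BARRIER);
    }

    int main(void) {
        Obj o = { 0 };
        barrier_hit(&o);
        barrier_hit(&o);                     /* second call skips the atomic op */
        printf("flags = %04x\n", (unsigned)atomic_load(&o.flags));
        return 0;
    }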
17:42 raku-bridge left, raku-bridge joined, raku-bridge left, raku-bridge joined
timotimo are you suggesting the flag still sticks around even though we rely on the hash, and we'll just use the flag to not have to look at the hash so often? 17:43
nine that's what we currently do 17:44
17:45 zakharyas left
timotimo ah 17:46
nwc10 also, after thinking about this in the bath (but not yet having looked at the code): we need MVM_CF_HAS_OBJECT_ID to exist for the GC to be efficient - it needs to know whether to promote an object from the nursery immediately 18:01
18:03 domidumont left 19:12 Altai-man_ joined 19:14 sena_kun left 19:25 sena_kun joined 19:26 Altai-man_ left 19:51 zakharyas joined
dogbert17 fun fact, running t/spec/S17-lowlevel/cas.t causes tsan to report ~1800 data race warnings 20:21
20:40 zakharyas left 21:24 Altai-man_ joined 21:26 sena_kun left
timotimo yeah, neither tsan nor coverity understand cas 21:37
21:39 Kaeipi joined, Kaiepi left 21:58 linkable6 left 22:06 bisectable6 left, bisectable6 joined 22:10 bisectable6 left
timotimo and/or clang analyze, i think 22:34
22:42 linkable6 joined 22:47 leont left 22:55 linkable6 left, squashable6 left, nativecallable6 left, notable6 left, greppable6 left, coverable6 left, tellable6 left, shareable6 left, benchable6 left, evalable6 left, releasable6 left, sourceable6 left, guifa2 left, leedo left, xiaomiao left 22:57 linkable6 joined, squashable6 joined, guifa2 joined, nativecallable6 joined, notable6 joined, greppable6 joined, coverable6 joined, tellable6 joined, shareable6 joined, benchable6 joined, evalable6 joined, sourceable6 joined, releasable6 joined, leedo joined, xiaomiao joined 23:01 bisectable6 joined 23:20 patrickz joined 23:24 patrickb left