github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:40 Kaiepi joined 00:59 lucasb left 03:50 rba left, rba joined 05:23 [Coke] left 05:26 [Coke] joined, [Coke] left, [Coke] joined 06:32 evalable6 left, linkable6 left, linkable6 joined 06:33 evalable6 joined 08:01 domidumont joined 08:42 MasterDuke joined 08:44 zakharyas joined
nwc10 good *, #moarvm 09:53
jnthn morning o/ 10:31
nwc10 \o
jnthn Fresh snow! Fresh coffee! Life could be worse. 10:32
nwc10 We had a very small amount of snow overnight.
But our here in the flatlands there was always less snow than up in the hills where we first lived
jnthn I had to check it, but it turns out Vienna is some couple of hundred meters higher than Prague 10:38
moritz oh, Austria had the high ground. That explains a lot, in hindsight :-) 10:41
MasterDuke we actually got about an inch (enough and wet enough to make a snowman) two days ago and it's been just barely cold enough that it's sort of stuck around 10:48
nine Almost all the snow here melted over the weekend, but yesterday night it started snowing again and everything's white now :) Though that will last only a couple more days. Will get much warmer 10:59
nwc10 *had* a very small amount of snow. "Strange bright thing in the sky" has seen to it
I though it likely that you were going to report stuff that makes us all a bit jealous :-) 11:00
11:07 MasterDuke left 11:09 MasterDuke joined 12:22 zakharyas left 12:52 Voldenet left 12:54 Voldenet joined, Voldenet left, Voldenet joined
[Coke] We've had a dip in temperatures, but no snow expected until later today. 13:29
(below freezing most of the week)
MasterDuke nine: i set a watchpoint on the address in frame->work[i].o, but nothing jumped out at me. a bunch of str and vmarray functions in the backtraces, but still nothing in any of the code i touched 13:31
13:34 zakharyas joined
MasterDuke last time jnthn spotted some missing MVM_ASSIGN_REFs and some that needed changing. maybe he can see something in the new diff gist.github.com/MasterDuke17/fd236...07aad7722f (this is implementing the removal via deopt/log/stats/plan/worker instead of directly via deopt) 13:35
jnthn Maybe when I escape from a somewhat involved rebase... 13:45
nine MasterDuke: only one watchpoint? 13:48
MasterDuke jnthn: yeah, no worries
nine: yep, should there be more? 13:49
nine Yes. I meant you need to trace that value (the pointer to the outdated object) back to the place where it gets outdated. That means that you will have to set watchpoints on every place where the pointer gets stored
MasterDuke oh, ah... 13:50
right
nine I.e. first on frame->work[i].o. There you see who writes to the register. Clear that watchpoint and set one on where the source is (may be a different register or a local variable)
13:53 lucasb joined 14:12 brrt joined
MasterDuke fyi, not terribly surprised, but no problems building nqp and rakudo if i comment out the call to MVM_spesh_candidate_discard_one in worker.c 14:14
jnthn Hmm 14:39
Just been rebasing new-disp, and see a failure in the dispatch tests: MoarVM oops: MVM_fixed_size_alloc request for 0 bytes 14:40
Anyone know if something changed there?
nwc10 yes
yes, I know, yes we did, but forget *exactly* what
but there was a reason to add a sanity check 14:41
commit 60070970c4a74627c60ac11a3d3fd111d8f7cad2
14:41 linkable6 left
nwc10 oh, because "we corrupt the malloc heap" 14:42
14:43 linkable6 joined
jnthn Oh? 14:45
malloc also doesn't care for zero size allocations though, I thought..
nwc10 IIRC it's implementation defined what you get - a pointer (to nothign you can use) or NULL 14:46
however, I suspect there was somethign else going on here
but I forget *why* that commit got found
jnthn I'm not even sure I'd agree the caller is confused, it just forces more conditionals on the caller side
Anyway, will track it down and adapt 14:47
nwc10 nine might remember
nine POSIX says: If the size of the space requested is 0, the behavior is implementation-defined: either a null pointer shall be returned, or the behavior shall be as if the size were some non-zero value, except that the behavior is undefined if the returned pointer is used to access an object.
nwc10 OK, that's what I remembered and paraphrased
nine I guess the FSA could just return NULL and properly ignore requests to free an area of 0 bytes 14:48
nwc10 I really can't remember. I remember that buggy code could end up calling MVM_fixed_size_alloc with 0 14:50
"be happy" that it got a non-NULL pointer 14:51
and then did something daft that must have corrupted the heap
MVM_fixed_size_alloc_zeroed() called with a size of 0 would have ended up calling memset(..., 0, 0) which likely is undefined behaviour if ... is NULL
I really can't remember, other than it felt like the better choice at the time, to deal with the observed problems 14:52
jnthn It was easy to adapt to it 14:53
So far as I understand the malloc rule, it's implementation defined exactly what it does, but a malloc(0) and later free of the thing you got back should be consistent 14:54
nwc10 that was my understanding too
jnthn OK, NQP tests in new-disp are happy again 15:03
Something looks less good in the Rakudo tests on a rebased new-disp...
CALL-ME handling NYI in new dispatcher; got NQPRoutine
Hmm
Probably a Rakudo change rather than a mis-rebase 15:05
Geth MoarVM/new-disp: 168 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Daniel Green)++
review: github.com/MoarVM/MoarVM/compare/0...03ba30e786
jnthn Now let's see if I can get this thing done without too much more moving under me :) 15:09
nwc10 maybe you can do something to amuse ASAN, because right now it doesn't find the rebased new-disp at all exciting. (a good thing) 15:48
jnthn Dispatch resumption has enough moving parts that there's every chance! 15:49
nwc10 fun, excitement, technicolor bactraces
MasterDuke nine: just to be explicit, this is what you meant? gist.github.com/MasterDuke17/462f7...9152947165 15:54
nine MasterDuke: in principle yes. But it looks like you took a wrong turn. Initially you see 0x55963e330140, that's the address of the broken object. But after the second reverse-cont you get a change from 0x55963e2f4840 to 0x0. That's a different address, so you watched the wrong place 15:58
I think you wanted watch -l *((MVMString **)((char *)GET_REG(cur_op, 2).o + GET_UI16(cur_op, 4)))
in line 49
MasterDuke ok, thanks 15:59
nine As this shows, one needs to be extra careful as its super easy to get a wrong result. I usually double-check every step and also print the addresses of the memory locations and the contents of index variables like the i in frame->work[i].o 16:00
That way its also easy to re-instate a watch, like watch -l 0x55963e330140 16:01
MasterDuke made that change and the immediate next break is __memmove_avx_unaligned, so i assume i actually want whatever is in frame 1 (process_worklist)? 16:02
nine Also printing the result of e.g. GET_UI16(cur_op, 4) makes it easier to correlate with the output of MVM_dump_bytecode(tc)
what's the backtrace at that point?
MasterDuke gist updated 16:03
nine Ah, yes, that's memcpy(new_addr, item, item->size) in the GC, i.e. the garbage collector moved the object that contains the faulty pointer. 16:05
16:06 MasterDuke left, MasterDuke joined
MasterDuke so watch item? 16:07
nine Again, in principle yes. But the thing you're interested in is not item itself, but an attribute in item. What kind of object is item in the first place? 16:11
The first hit is in sp_get_s, i.e. get a string (from an attribute) at a fixed offset
MasterDuke isn't it an iter? 16:13
MVMIter
nine does that contain a string? 16:14
MasterDuke not exactly. `struct { MVMStrHashIterator curr; MVMStrHashIterator next; } hash_state;` 16:15
nine In that case it may not be the usual GC issue at all and instead is a mismatch between the code and the data
Which considering you're working with spesh candidates is not that far fetched. Could it be that you're running the wrong spesh candidate with respect to the frame's registers? 16:16
MasterDuke given that this only happens after removing a spesh candidate, seems possible 16:17
gist.github.com/MasterDuke17/fd236...h-L98-L164 is what causes this
nine Have you checked if it's running a removed spesh candidate? 16:19
MasterDuke hm. that would have made sense to do. not sure exactly how to do that 16:22
nine I guess the most straight forward way is to overwrite the memory before freeing it. Same as I did for GC_DEBUG 3 16:25
You will notice pretty fast if it's trying to treat 0xfefefefefefefefefefefe as bytecode :) 16:26
MasterDuke good idea 16:28
17:02 sxmx left 17:07 sxmx joined
MasterDuke segfault instead of MVM_panic 17:17
nine that's good!
Why does it segfault exactly? Just to be sure
MasterDuke running it under rr now to find out 17:21
i re-ran the entire nqp build, looks like the segfault was in a different, earlier step 17:22
this is not fast 17:23
17:23 sxmx left 18:00 brrt left 18:03 MasterDuke left 18:06 domidumont left
nine If anyone is in the mood for a bit of a detective story demonstrating tools like rr, I'm quite proud to have just solved a mystery at work :) gist.github.com/niner/7f7cb0a79db9...47d44c2029 18:26
[Coke] nine++ 18:34
18:52 zakharyas left 19:15 sxmx joined 20:04 MasterDuke joined
MasterDuke ah, no ->spesh_cand in this frame 20:21
so segv here github.com/MoarVM/MoarVM/blob/mast...opt.c#L247 20:24
and what do you know, some 0xefefefefefefefef show up gist.github.com/MasterDuke17/b33ca...9bd5154a57 20:26
20:41 MasterDuke left 20:42 MasterDuke joined
nine An important step forward :) 20:42
20:48 ilogger2 joined
MasterDuke hm. sticking an `if *->spesh_cand` everywhere won't be fun. there must be something better 20:51
20:55 ilogger2_ joined
jnthn MasterDuke: The idea was that the candidate would stay around long enough that everything referencing it would just continue to work fine 21:35
Thus why it because possible to GC
Nothing should NULL it out on a frame that is still running and using it
And the removal from the log guards mean that no *new* calls will end up using it
So it goes away once everything that was in it goes out of dynamic scope 21:36
MasterDuke hm
jnthn The free_at_safepoint I thought was for the guard tree and array of candidates
*not* for the candidate itself
The memory associated with the candidate (slots, specialized bytecode, JITted machine code) gets cleared up in the gc_free of MVMSpeshCandidate 21:37
MasterDuke i don't think i ever remove/overwrite the candidate from anywhere
jnthn Having a NULL spesh cand means that the thing isn't specialized; if we're somehow in specialized code but spesh_cand isn't set, something is very odd. 21:38
MasterDuke i.e., not *->spesh_cand. i just don't memcpy it when copying the array
jnthn I thought it was an array of pointers to spesh candidates nowadays, which are the GC-able objects? 21:39
f->spesh_cand->body.inlines is consistent with that too
MasterDuke gist.github.com/MasterDuke17/fd236...-L134-L143
jnthn If the memory on the other end of f->spesh_cand is junk, then that feels like a GC issue 21:40
Yeah, that code matches my expectations
Or at least, it does to the degree it's a pointer array 21:41
MasterDuke i added `memset(cands_and_arg_guards->spesh_candidates, 0xfe, (new_num + 1) * sizeof(MVMSpeshCandidate *));` between the `MVM_spesh_arg_guard_destroy` and `MVM_fixed_size_free_at_safepoint(..., cands_and_arg_guards->spesh_candidates);`
jnthn oh
MasterDuke and now gist.github.com/MasterDuke17/b33ca...og-L54-L55
jnthn oh, I think there's an odering issue here 21:42
+ spesh->body.spesh_cands_and_arg_guards = new_cands_and_arg_guards;
+
+ MVM_spesh_arg_guard_regenerate(tc, &(new_cands_and_arg_guards->spesh_arg_guard),
+ new_cands_and_arg_guards->spesh_candidates, new_num);
new_cands_and_arg_guards must be fully set up *before* we assign it
Otherwise something can read it before we have regenerated the guard tree, and see the old index
Swapping those two statements around will likely help something, and I can imagine it fixing this 21:43
*ordering
.oO( code smell )
MasterDuke i definitely wasn't 100% sure of the correct ordering. i pretty much tried to do the reverse of what happens when adding a candidate, but some of it did/does need to be in the same order 21:46
jnthn Yes, the general principle is get everything in order, and then publish it when it's ready 21:47
Heh, "in order" in the sense of "organized, completed" :)
MasterDuke dropping GC_DEBUG down to 1 for speed got past where that segv was happening, but a different one popped up, so i bumped it back up to 3 and restarted the build. it's gonna take a little while to get back to where it was happening 21:50
jnthn OK. afk for a bit for a game
MasterDuke looks like the same segv as before 22:27
22:47 leont joined 23:05 harrow joined
jnthn back for a bit 23:06
MasterDuke huh. now f->static_info is 0 23:16
gist.github.com/MasterDuke17/57b58...0701b2b2eb 23:17
jnthn Hm, that sounds like a rather different kind of problem :s 23:24
At a wild guess, it's created an uninlined frame from broken inline state, but...I'm a bit surprised it got this far 23:25
Is the inline table getting marked properly?
MasterDuke i changed `if (f->spesh_cand->body.inlines) {` to `if (f->spesh_cand && f->spesh_cand->body.inlines) {` 23:26
jnthn If you have rr then I guess you have that to hand
If we end up in this place and f->spesh_cand is NULL something was really rotten already
MasterDuke what is the inline table? 23:27
jnthn f->spesh_cand->body.inlines is an array of inlines; when we uninline the static frame comes from there 23:28
MasterDuke f is not in a good state 23:29
jnthn Ah, then all bets are off :)
Though ooc, you could try running with MVM_SPESH_INLINE_DISABLE=1 just to see if it is related to that at all 23:30
MasterDuke gist updated 23:31
jnthn wow, zeroes all the way 23:35
There's a #define at the top of deopt.c that tells something about the deopt it's doing; could you try turning that on?
It'll maybe give us a clue about what it things its doing 23:36
*thinks
MasterDuke i'll give that a run after this MVM_SPESH_INLINE_DISABLE=1 finishes
ugh, i need to stop setting GC_DEBUG to 3, it's just so slow 23:38
jnthn 3 is a mixed bag in terms of what it finds/hides also 23:41
Compare to 1, 2 is a fairly straight "checks more things", but 3 changes how often GC happens 23:42
MasterDuke a while ago i was only getting an error to appear under 3, but i think that was fixed and i could go back, i've just kept it there for no particularly good reason 23:43