github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:40
Kaiepi joined
00:59
lucasb left
03:50
rba left,
rba joined
05:23
[Coke] left
05:26
[Coke] joined,
[Coke] left,
[Coke] joined
06:32
evalable6 left,
linkable6 left,
linkable6 joined
06:33
evalable6 joined
08:01
domidumont joined
08:42
MasterDuke joined
08:44
zakharyas joined
|
|||
nwc10 | good *, #moarvm | 09:53 | |
jnthn | morning o/ | 10:31 | |
nwc10 | \o | ||
jnthn | Fresh snow! Fresh coffee! Life could be worse. | 10:32 | |
nwc10 | We had a very small amount of snow overnight. | ||
But our here in the flatlands there was always less snow than up in the hills where we first lived | |||
jnthn | I had to check it, but it turns out Vienna is some couple of hundred meters higher than Prague | 10:38 | |
moritz | oh, Austria had the high ground. That explains a lot, in hindsight :-) | 10:41 | |
MasterDuke | we actually got about an inch (enough and wet enough to make a snowman) two days ago and it's been just barely cold enough that it's sort of stuck around | 10:48 | |
nine | Almost all the snow here melted over the weekend, but yesterday night it started snowing again and everything's white now :) Though that will last only a couple more days. Will get much warmer | 10:59 | |
nwc10 | *had* a very small amount of snow. "Strange bright thing in the sky" has seen to it | ||
I though it likely that you were going to report stuff that makes us all a bit jealous :-) | 11:00 | ||
11:07
MasterDuke left
11:09
MasterDuke joined
12:22
zakharyas left
12:52
Voldenet left
12:54
Voldenet joined,
Voldenet left,
Voldenet joined
|
|||
[Coke] | We've had a dip in temperatures, but no snow expected until later today. | 13:29 | |
(below freezing most of the week) | |||
MasterDuke | nine: i set a watchpoint on the address in frame->work[i].o, but nothing jumped out at me. a bunch of str and vmarray functions in the backtraces, but still nothing in any of the code i touched | 13:31 | |
13:34
zakharyas joined
|
|||
MasterDuke | last time jnthn spotted some missing MVM_ASSIGN_REFs and some that needed changing. maybe he can see something in the new diff gist.github.com/MasterDuke17/fd236...07aad7722f (this is implementing the removal via deopt/log/stats/plan/worker instead of directly via deopt) | 13:35 | |
jnthn | Maybe when I escape from a somewhat involved rebase... | 13:45 | |
nine | MasterDuke: only one watchpoint? | 13:48 | |
MasterDuke | jnthn: yeah, no worries | ||
nine: yep, should there be more? | 13:49 | ||
nine | Yes. I meant you need to trace that value (the pointer to the outdated object) back to the place where it gets outdated. That means that you will have to set watchpoints on every place where the pointer gets stored | ||
MasterDuke | oh, ah... | 13:50 | |
right | |||
nine | I.e. first on frame->work[i].o. There you see who writes to the register. Clear that watchpoint and set one on where the source is (may be a different register or a local variable) | ||
13:53
lucasb joined
14:12
brrt joined
|
|||
MasterDuke | fyi, not terribly surprised, but no problems building nqp and rakudo if i comment out the call to MVM_spesh_candidate_discard_one in worker.c | 14:14 | |
jnthn | Hmm | 14:39 | |
Just been rebasing new-disp, and see a failure in the dispatch tests: MoarVM oops: MVM_fixed_size_alloc request for 0 bytes | 14:40 | ||
Anyone know if something changed there? | |||
nwc10 | yes | ||
yes, I know, yes we did, but forget *exactly* what | |||
but there was a reason to add a sanity check | 14:41 | ||
commit 60070970c4a74627c60ac11a3d3fd111d8f7cad2 | |||
14:41
linkable6 left
|
|||
nwc10 | oh, because "we corrupt the malloc heap" | 14:42 | |
14:43
linkable6 joined
|
|||
jnthn | Oh? | 14:45 | |
malloc also doesn't care for zero size allocations though, I thought.. | |||
nwc10 | IIRC it's implementation defined what you get - a pointer (to nothign you can use) or NULL | 14:46 | |
however, I suspect there was somethign else going on here | |||
but I forget *why* that commit got found | |||
jnthn | I'm not even sure I'd agree the caller is confused, it just forces more conditionals on the caller side | ||
Anyway, will track it down and adapt | 14:47 | ||
nwc10 | nine might remember | ||
nine | POSIX says: If the size of the space requested is 0, the behavior is implementation-defined: either a null pointer shall be returned, or the behavior shall be as if the size were some non-zero value, except that the behavior is undefined if the returned pointer is used to access an object. | ||
nwc10 | OK, that's what I remembered and paraphrased | ||
nine | I guess the FSA could just return NULL and properly ignore requests to free an area of 0 bytes | 14:48 | |
nwc10 | I really can't remember. I remember that buggy code could end up calling MVM_fixed_size_alloc with 0 | 14:50 | |
"be happy" that it got a non-NULL pointer | 14:51 | ||
and then did something daft that must have corrupted the heap | |||
MVM_fixed_size_alloc_zeroed() called with a size of 0 would have ended up calling memset(..., 0, 0) which likely is undefined behaviour if ... is NULL | |||
I really can't remember, other than it felt like the better choice at the time, to deal with the observed problems | 14:52 | ||
jnthn | It was easy to adapt to it | 14:53 | |
So far as I understand the malloc rule, it's implementation defined exactly what it does, but a malloc(0) and later free of the thing you got back should be consistent | 14:54 | ||
nwc10 | that was my understanding too | ||
jnthn | OK, NQP tests in new-disp are happy again | 15:03 | |
Something looks less good in the Rakudo tests on a rebased new-disp... | |||
CALL-ME handling NYI in new dispatcher; got NQPRoutine | |||
Hmm | |||
Probably a Rakudo change rather than a mis-rebase | 15:05 | ||
Geth | MoarVM/new-disp: 168 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Daniel Green)++ review: github.com/MoarVM/MoarVM/compare/0...03ba30e786 |
||
jnthn | Now let's see if I can get this thing done without too much more moving under me :) | 15:09 | |
nwc10 | maybe you can do something to amuse ASAN, because right now it doesn't find the rebased new-disp at all exciting. (a good thing) | 15:48 | |
jnthn | Dispatch resumption has enough moving parts that there's every chance! | 15:49 | |
nwc10 | fun, excitement, technicolor bactraces | ||
MasterDuke | nine: just to be explicit, this is what you meant? gist.github.com/MasterDuke17/462f7...9152947165 | 15:54 | |
nine | MasterDuke: in principle yes. But it looks like you took a wrong turn. Initially you see 0x55963e330140, that's the address of the broken object. But after the second reverse-cont you get a change from 0x55963e2f4840 to 0x0. That's a different address, so you watched the wrong place | 15:58 | |
I think you wanted watch -l *((MVMString **)((char *)GET_REG(cur_op, 2).o + GET_UI16(cur_op, 4))) | |||
in line 49 | |||
MasterDuke | ok, thanks | 15:59 | |
nine | As this shows, one needs to be extra careful as its super easy to get a wrong result. I usually double-check every step and also print the addresses of the memory locations and the contents of index variables like the i in frame->work[i].o | 16:00 | |
That way its also easy to re-instate a watch, like watch -l 0x55963e330140 | 16:01 | ||
MasterDuke | made that change and the immediate next break is __memmove_avx_unaligned, so i assume i actually want whatever is in frame 1 (process_worklist)? | 16:02 | |
nine | Also printing the result of e.g. GET_UI16(cur_op, 4) makes it easier to correlate with the output of MVM_dump_bytecode(tc) | ||
what's the backtrace at that point? | |||
MasterDuke | gist updated | 16:03 | |
nine | Ah, yes, that's memcpy(new_addr, item, item->size) in the GC, i.e. the garbage collector moved the object that contains the faulty pointer. | 16:05 | |
16:06
MasterDuke left,
MasterDuke joined
|
|||
MasterDuke | so watch item? | 16:07 | |
nine | Again, in principle yes. But the thing you're interested in is not item itself, but an attribute in item. What kind of object is item in the first place? | 16:11 | |
The first hit is in sp_get_s, i.e. get a string (from an attribute) at a fixed offset | |||
MasterDuke | isn't it an iter? | 16:13 | |
MVMIter | |||
nine | does that contain a string? | 16:14 | |
MasterDuke | not exactly. `struct { MVMStrHashIterator curr; MVMStrHashIterator next; } hash_state;` | 16:15 | |
nine | In that case it may not be the usual GC issue at all and instead is a mismatch between the code and the data | ||
Which considering you're working with spesh candidates is not that far fetched. Could it be that you're running the wrong spesh candidate with respect to the frame's registers? | 16:16 | ||
MasterDuke | given that this only happens after removing a spesh candidate, seems possible | 16:17 | |
gist.github.com/MasterDuke17/fd236...h-L98-L164 is what causes this | |||
nine | Have you checked if it's running a removed spesh candidate? | 16:19 | |
MasterDuke | hm. that would have made sense to do. not sure exactly how to do that | 16:22 | |
nine | I guess the most straight forward way is to overwrite the memory before freeing it. Same as I did for GC_DEBUG 3 | 16:25 | |
You will notice pretty fast if it's trying to treat 0xfefefefefefefefefefefe as bytecode :) | 16:26 | ||
MasterDuke | good idea | 16:28 | |
17:02
sxmx left
17:07
sxmx joined
|
|||
MasterDuke | segfault instead of MVM_panic | 17:17 | |
nine | that's good! | ||
Why does it segfault exactly? Just to be sure | |||
MasterDuke | running it under rr now to find out | 17:21 | |
i re-ran the entire nqp build, looks like the segfault was in a different, earlier step | 17:22 | ||
this is not fast | 17:23 | ||
17:23
sxmx left
18:00
brrt left
18:03
MasterDuke left
18:06
domidumont left
|
|||
nine | If anyone is in the mood for a bit of a detective story demonstrating tools like rr, I'm quite proud to have just solved a mystery at work :) gist.github.com/niner/7f7cb0a79db9...47d44c2029 | 18:26 | |
[Coke] | nine++ | 18:34 | |
18:52
zakharyas left
19:15
sxmx joined
20:04
MasterDuke joined
|
|||
MasterDuke | ah, no ->spesh_cand in this frame | 20:21 | |
so segv here github.com/MoarVM/MoarVM/blob/mast...opt.c#L247 | 20:24 | ||
and what do you know, some 0xefefefefefefefef show up gist.github.com/MasterDuke17/b33ca...9bd5154a57 | 20:26 | ||
20:41
MasterDuke left
20:42
MasterDuke joined
|
|||
nine | An important step forward :) | 20:42 | |
20:48
ilogger2 joined
|
|||
MasterDuke | hm. sticking an `if *->spesh_cand` everywhere won't be fun. there must be something better | 20:51 | |
20:55
ilogger2_ joined
|
|||
jnthn | MasterDuke: The idea was that the candidate would stay around long enough that everything referencing it would just continue to work fine | 21:35 | |
Thus why it because possible to GC | |||
Nothing should NULL it out on a frame that is still running and using it | |||
And the removal from the log guards mean that no *new* calls will end up using it | |||
So it goes away once everything that was in it goes out of dynamic scope | 21:36 | ||
MasterDuke | hm | ||
jnthn | The free_at_safepoint I thought was for the guard tree and array of candidates | ||
*not* for the candidate itself | |||
The memory associated with the candidate (slots, specialized bytecode, JITted machine code) gets cleared up in the gc_free of MVMSpeshCandidate | 21:37 | ||
MasterDuke | i don't think i ever remove/overwrite the candidate from anywhere | ||
jnthn | Having a NULL spesh cand means that the thing isn't specialized; if we're somehow in specialized code but spesh_cand isn't set, something is very odd. | 21:38 | |
MasterDuke | i.e., not *->spesh_cand. i just don't memcpy it when copying the array | ||
jnthn | I thought it was an array of pointers to spesh candidates nowadays, which are the GC-able objects? | 21:39 | |
f->spesh_cand->body.inlines is consistent with that too | |||
MasterDuke | gist.github.com/MasterDuke17/fd236...-L134-L143 | ||
jnthn | If the memory on the other end of f->spesh_cand is junk, then that feels like a GC issue | 21:40 | |
Yeah, that code matches my expectations | |||
Or at least, it does to the degree it's a pointer array | 21:41 | ||
MasterDuke | i added `memset(cands_and_arg_guards->spesh_candidates, 0xfe, (new_num + 1) * sizeof(MVMSpeshCandidate *));` between the `MVM_spesh_arg_guard_destroy` and `MVM_fixed_size_free_at_safepoint(..., cands_and_arg_guards->spesh_candidates);` | ||
jnthn | oh | ||
MasterDuke | and now gist.github.com/MasterDuke17/b33ca...og-L54-L55 | ||
jnthn | oh, I think there's an odering issue here | 21:42 | |
+ spesh->body.spesh_cands_and_arg_guards = new_cands_and_arg_guards; | |||
+ | |||
+ MVM_spesh_arg_guard_regenerate(tc, &(new_cands_and_arg_guards->spesh_arg_guard), | |||
+ new_cands_and_arg_guards->spesh_candidates, new_num); | |||
new_cands_and_arg_guards must be fully set up *before* we assign it | |||
Otherwise something can read it before we have regenerated the guard tree, and see the old index | |||
Swapping those two statements around will likely help something, and I can imagine it fixing this | 21:43 | ||
*ordering | |||
.oO( code smell ) |
|||
MasterDuke | i definitely wasn't 100% sure of the correct ordering. i pretty much tried to do the reverse of what happens when adding a candidate, but some of it did/does need to be in the same order | 21:46 | |
jnthn | Yes, the general principle is get everything in order, and then publish it when it's ready | 21:47 | |
Heh, "in order" in the sense of "organized, completed" :) | |||
MasterDuke | dropping GC_DEBUG down to 1 for speed got past where that segv was happening, but a different one popped up, so i bumped it back up to 3 and restarted the build. it's gonna take a little while to get back to where it was happening | 21:50 | |
jnthn | OK. afk for a bit for a game | ||
MasterDuke | looks like the same segv as before | 22:27 | |
22:47
leont joined
23:05
harrow joined
|
|||
jnthn back for a bit | 23:06 | ||
MasterDuke | huh. now f->static_info is 0 | 23:16 | |
gist.github.com/MasterDuke17/57b58...0701b2b2eb | 23:17 | ||
jnthn | Hm, that sounds like a rather different kind of problem :s | 23:24 | |
At a wild guess, it's created an uninlined frame from broken inline state, but...I'm a bit surprised it got this far | 23:25 | ||
Is the inline table getting marked properly? | |||
MasterDuke | i changed `if (f->spesh_cand->body.inlines) {` to `if (f->spesh_cand && f->spesh_cand->body.inlines) {` | 23:26 | |
jnthn | If you have rr then I guess you have that to hand | ||
If we end up in this place and f->spesh_cand is NULL something was really rotten already | |||
MasterDuke | what is the inline table? | 23:27 | |
jnthn | f->spesh_cand->body.inlines is an array of inlines; when we uninline the static frame comes from there | 23:28 | |
MasterDuke | f is not in a good state | 23:29 | |
jnthn | Ah, then all bets are off :) | ||
Though ooc, you could try running with MVM_SPESH_INLINE_DISABLE=1 just to see if it is related to that at all | 23:30 | ||
MasterDuke | gist updated | 23:31 | |
jnthn | wow, zeroes all the way | 23:35 | |
There's a #define at the top of deopt.c that tells something about the deopt it's doing; could you try turning that on? | |||
It'll maybe give us a clue about what it things its doing | 23:36 | ||
*thinks | |||
MasterDuke | i'll give that a run after this MVM_SPESH_INLINE_DISABLE=1 finishes | ||
ugh, i need to stop setting GC_DEBUG to 3, it's just so slow | 23:38 | ||
jnthn | 3 is a mixed bag in terms of what it finds/hides also | 23:41 | |
Compare to 1, 2 is a fairly straight "checks more things", but 3 changes how often GC happens | 23:42 | ||
MasterDuke | a while ago i was only getting an error to appear under 3, but i think that was fixed and i could go back, i've just kept it there for no particularly good reason | 23:43 |