github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:00
reportable6 left,
reportable6 joined
00:44
sena_kun left
02:23
pamplemousse left,
pamplemousse joined
02:34
MasterDuke left
04:46
pamplemousse left
05:29
robertle left
06:00
reportable6 left
06:02
reportable6 joined
06:56
robertle joined
07:33
brrt joined
|
|||
brrt | \o | 07:48 | |
07:59
zakharyas joined
08:06
squashable6 left
08:10
squashable6 joined
08:22
brrt left
09:02
brrt joined
09:56
brrt left
|
|||
Guest37021 | nine: do you want to reach 25 GC related fixes? | 11:15 | |
dogbert@dogbert-ubuntu:~/repos/rakudo$ while ./perl6-m -Ilib t/spec/S02-literals/quoting.t; do :; done | 11:16 | ||
MoarVM panic: Collectable 0x560f1920d000 in fromspace accessed | |||
nursery size was set to 2048 | |||
11:17
zakharyas left
|
|||
nine | LOL I think even parrot was faster: Stage parse : 5293.738 | 11:32 | |
lizmat | not if you had any non-ascii characters in the setting | 11:33 | |
I once killed it with that after several hours | 11:34 | ||
nine | Guest37021: that's odd...it doesn't show up here despite running a collection at every allocation | 11:36 | |
Guest37021: do you have a backtrace of that? | 11:38 | ||
lizmat | perhaps it only happens with a nursery size of 2048 ? | 11:43 | |
nine | But why would it? | 11:46 | |
Well, this string is definitely corrupted: (gdb) call fprintf(stderr, "%p -> %p -> %p\n", orig, orig->body.storage.strands[0].blob_string, orig->body.storage.strands[0].blob_string->body.storage.strands[0].blob_string) | 11:47 | ||
0x67d55f0 -> 0x5be2dc0 -> 0x5be2dc0 | |||
String has a strand that's again a string consisting of strands (which I think we don't allow) and the first strand is the string itself | |||
12:00
reportable6 left
|
|||
Guest37021 | nine: backtrace coming up, just came back from a quick lunch | 12:00 | |
12:01
reportable6 joined
|
|||
Guest37021 | oops, I lied nursery size is set to 512 | 12:02 | |
gist.github.com/dogbert17/9e4bd97d...c3156635c8 | |||
the problem vanishes if I set MVM_JIT_DISABLE=1 | 12:03 | ||
nine | Looks like the actual error happens either in the JITed code or somewhere before that | 12:18 | |
Guest37021 | so how do we tackle this? | 12:20 | |
12:26
robertle left
12:28
robertle joined
12:45
pamplemousse joined
|
|||
nine | This is really odd: when I turn on MVM_DEBUG_STRANDS stuff reliably segfaults | 12:47 | |
But not in any of the checking code. Instead it fails in MVM_string_gi_init because that is used on strings that have storage type MVM_STRING_STRANDS but 0 strands | 12:48 | ||
Ah, god damn it! if (s->body.storage_type = MVM_STRING_STRAND) | 12:50 | ||
nine's debugging rule #1: the bug is always in my own code! | 12:51 | ||
Guest37021 | oops | 12:52 | |
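The slip nine quotes at 12:50 is the familiar C mix-up of `=` and `==`. A minimal sketch, assuming a hypothetical `DemoString` struct and made-up tag values (not MoarVM's real definitions), shows why such a condition is always true and silently retags the string. That would fit the reported symptom of strings with storage type MVM_STRING_STRANDS but 0 strands:

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only; MoarVM's real storage
 * tags and string layout are defined in its own headers. */
enum { DEMO_STORAGE_GRAPHEMES = 0, DEMO_STORAGE_STRANDS = 1 };

typedef struct {
    int storage_type;
    int num_strands;
} DemoString;

/* Buggy form: `=` stores DEMO_STORAGE_STRANDS into the field, and the
 * condition then tests the stored (non-zero) value, so it is always true. */
static int is_strands_buggy(DemoString *s) {
    assert(s != 0);
    if (s->storage_type = DEMO_STORAGE_STRANDS)
        return 1;
    return 0;
}

/* Intended form: `==` only compares and leaves the string untouched. */
static int is_strands_fixed(const DemoString *s) {
    assert(s != 0);
    return s->storage_type == DEMO_STORAGE_STRANDS;
}
```

GCC and Clang report the buggy form under `-Wparentheses` (enabled by `-Wall`), which is one inexpensive way to catch this class of error before it reaches a debugger.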
13:11
pamplemousse left,
pamplemousse joined,
zakharyas joined
|
|||
timotimo | whoops | 13:11 | |
Guest37021 | timotimo: did you see that MasterDuke had received his Ryzen 3700X machine? | 13:16 | |
timotimo | yeah | ||
Guest37021 | spectest run in 133 secs and Rakudo parse time of 36s is very fast | 13:17 | |
timotimo | jealous of the performance, but i depend on rr too often | ||
Guest37021 | there's always the Intel 9900k | 13:18 | |
a tad expensive though | |||
nwc10 | timotimo: "rr" in this context is? | 13:24 | |
13:28
brrt joined
|
|||
Guest37021 | dogbert@dogbert-ubuntu:~/repos/rakudo$ while ./perl6-m -Ilib t/spec/S17-procasync/no-runaway-file-limit.t; do :; done | 13:30 | |
1..1 | |||
timotimo | mozilla's "rr" is a recorder and reverse debugger | ||
Guest37021 | Segmentation fault (core dumped) | ||
13:45
sena_kun joined
13:59
brrt left
|
|||
nine | How on earth can a string end up in its own strands? | 13:59 | |
13:59
lucasb joined
|
|||
timotimo | it could be an empty string :P | 14:01 | |
yeah, that don't make much sense either | |||
use rr to figure out what writes the strands array :) | |||
nine | I guess not even rr would be of much use with objects moving around all the time? | 14:06 | |
timotimo | well, the strands array in particular is a malloced thing | 14:11 | |
so that doesn't move unless it gets realloced | |||
nine | If I had to guess, I'd say it's a missing write barrier somewhere | 14:20 | |
But adding MVM_gc_write_barrier_hit(tc, (MVMCollectable *)result); after every write to a result->body.storage.strands[0].blob_string makes it explode right after startup in the ModuleLoader | 14:28 | ||
timotimo | um, why would you put barrier_hit in there | 14:29 | |
the write barrier is supposed to be conditional | 14:30 | ||
sena_kun | o/ | 14:31 | |
so I took two heap snapshots in a separate thread, one after 10 seconds of running and the second after 20 seconds; between them about 1 GB of memory was eaten | |||
what can I do to be helpful next? | 14:32 | ||
timotimo | open both in one instance of the moarvm-heapanalyzer each | ||
sena_kun | done, now `summary`? | 14:33 | |
timotimo | probably wants the "incidents" branch of the moarvm heapanalyzer because i started working on stuff in there and never bothered to pick apart the feature back into master | ||
or something | |||
sena_kun | oki, getting it... | 14:34 | |
timotimo | but yeah, summary can be interesting | ||
and also "top objects by count", "top objects by size", "top frames by count", "top frames by size" | |||
sena_kun | github gists are ok? | ||
timotimo | sure | 14:35 | |
nine | timotimo: probably because I don't know what I'm doing :) | 14:36 | |
timotimo | it seems a little dangerous to hit the wb every time | ||
like, wouldn't that add the same memory location in the inter-generation set multiple times? | 14:37 | ||
and then the same pointer would be updated multiple times, which might upset an assert somewhere perhaps | |||
sena_kun | gist.github.com/Altai-man/b6e6c40b...46dadf4460 <- this is without incidents branch, now preparing branch version... | ||
seems like only Buf is growing | |||
timotimo, I can re-run it with incidents branch, should I? | 14:39 | ||
nine | I think I meant MVM_gc_write_barrier(tc, (MVMCollectable *)result, (MVMCollectable *)a); instead | ||
timotimo | ok | ||
sena_kun: now you can 'find 1000 objects type="Buf"' i think? maybe? | 14:40 | ||
sena_kun | got a bunch of IDs | ||
timotimo | right, and you can "path" each of the IDs | ||
sena_kun | only 13 of them | ||
timotimo | even better | 14:41 | |
you can "info" them to get their size | |||
(maybe only in the incidents branch tho) | |||
"path" will let you find what keeps a given object alive | 14:42 | ||
sena_kun | I am running `perl6 -Ip6-app-moarvm-heapanalyzer/lib p6-app-moarvm-heapanalyzer/bin/moar-ha heapsnapshot-493-2.mvmheap` but info is not working | 14:43 | |
I checked out the branch, of course | |||
Guest37021 | timotimo: do you see anything suspicious here: gist.github.com/dogbert17/1c8a0dcc...f9681f9ef6 | ||
timotimo | oh maybe it's "show" rather than "info" | ||
sena_kun | don't understand. :S | 14:44 | |
timotimo | Guest37021: whenever i see "dogbert17" at the top of a gist, it's very suspicious already | ||
sena_kun: just "show 1234" | |||
sena_kun | d'oh | ||
timotimo, sorry | |||
nine | And that does seem to help! | ||
Guest37021 | :) | 14:45 | |
timotimo | Guest37021: well, that looks damning | 14:46 | |
Guest37021 | it's with a 512 byte nursery | 14:47 | |
nine | This could have caused a number of string related issues | ||
sena_kun | `show` for all objects returns `Buf (Object) --[ <STable> ]--> Buf (STable) (34330)`
nine | With lots of action at a distance | ||
timotimo | that's the list of things it points at. isn't there a section at the top with some infos? | 14:48 | |
Guest37021: could try to use the FSA_DEBUG define to figure this one out | 14:49 | ||
sena_kun | timotimo, gist.github.com/Altai-man/b6e6c40b...ile-3-info | ||
timotimo | damn it, it's supposed to show some extra info | 14:50 | |
Guest37021 | timotimo: will do | ||
timotimo | well, you can get the "path" to each of these objects and see which seem unrelated | ||
sena_kun | timotimo, how can I know unrelated if not? gist.github.com/Altai-man/b6e6c40b...le-4-paths | 14:53 | |
I see traces to HTTP::HPACK, HTTP2 code, Async::SSL | |||
and the leak is visible when using HTTP/2, while HTTP/1.1 is fine, so all components are used | 14:54 | ||
Geth | MoarVM: 695a24de35 | (Stefan Seifert)++ | src/strings/ops.c Fix memory corruption caused by missing write barriers in string ops. result may be promoted to gen2 in some cases while one of its strands is still in the nursery. |
14:55 | |
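The commit message above and the earlier remark that the write barrier is supposed to be conditional can be illustrated with a generic generational-GC sketch. Everything here is an assumption for illustration: `DemoObj`, `DEMO_GEN2`, `RememberedSet` and the de-duplication flag are hypothetical names, not MoarVM's actual types or macros. The idea is that only a store of a nursery reference into an old-generation object needs recording, so that the collector can later find and update that pointer:

```c
#include <stddef.h>

#define DEMO_GEN2 1u            /* hypothetical flag: object is in the old generation */
#define DEMO_REMEMBERED_MAX 16  /* arbitrary capacity for the sketch */

typedef struct DemoObj {
    unsigned flags;
    int in_remembered_set;   /* guards against recording the same owner twice */
    struct DemoObj *ref;     /* the single reference slot of this toy object */
} DemoObj;

typedef struct {
    DemoObj *items[DEMO_REMEMBERED_MAX];
    size_t count;
} RememberedSet;

/* Unconditional "hit": record the owner so the next nursery collection
 * scans it. The flag keeps repeated hits from adding duplicates. */
static void barrier_hit(RememberedSet *rs, DemoObj *owner) {
    if (!owner->in_remembered_set && rs->count < DEMO_REMEMBERED_MAX) {
        owner->in_remembered_set = 1;
        rs->items[rs->count++] = owner;
    }
}

/* Conditional barrier: only an old object pointing at a young one matters. */
static void write_barrier(RememberedSet *rs, DemoObj *owner, DemoObj *target) {
    if ((owner->flags & DEMO_GEN2) && target && !(target->flags & DEMO_GEN2))
        barrier_hit(rs, owner);
}

/* A reference store that always goes through the barrier first. */
static void assign_ref(RememberedSet *rs, DemoObj *owner, DemoObj *target) {
    write_barrier(rs, owner, target);
    owner->ref = target;
}
```

A bare `owner->ref = target;` without the barrier compiles and usually works, which is why this kind of omission tends to surface only under a very small nursery or very frequent collections.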
timotimo | yeah, that's not quite as obvious | 14:56 | |
let me go dig up code that shows the size of an object | |||
you can try to manually apply the commit 024a06db1b5065e20edcfb1af351ef80aeaae33a | 14:59 | ||
sena_kun | timotimo, to analyzer? | 15:00 | |
timotimo | yes | ||
can also ugly-hack it in | |||
only need to see which object id has which size and unmanaged size | |||
unmanaged size will be the interesting one for Buf | |||
sena_kun | `cherry-pick 024a06db1b5065e20edcfb1af351ef80aeaae33a` says bad object | 15:02 | |
timotimo | oh, perhaps it's only in my repo | 15:03 | |
[email@hidden.address] | |||
sena_kun | github.com/jnthn/p6-app-moarvm-hea...80aeaae33a | ||
no, it is here | |||
hmm | |||
timotimo | since it's just a one-line patch | ||
just copy-paste it :D | |||
sena_kun | oki | ||
ok, now `show` shows sizes as `56 + 496 bytes` | 15:06 | ||
timotimo | yeah, that's not big | ||
sena_kun | 56 + 1511251968 bytes | ||
:) | |||
timotimo | hm, that could be big | ||
sena_kun | path is... | ||
timotimo | of course that depends on how big a byte is on your system | ||
sena_kun | I also see 56 + 32768 bytes | 15:07 | |
everything else is not above 1024 | |||
now the not fun part, or not sure... | |||
path is very undescriptive | 15:08 | ||
timotimo | "path" shows only the shortest path; that's the reason i built the "incidents" command | ||
sena_kun | gist.github.com/Altai-man/b6e6c40b...#file-path | 15:09 | |
I am running `incidents 106880` | |||
timotimo | ha, yeah, that's very short | ||
"last handler result" has something to do with what implements "return" and other stuff | 15:10 | ||
nine: good work :) | |||
sena_kun | gist.github.com/Altai-man/b6e6c40b...-incidents | 15:11 | |
timotimo | OK, looking at the scalar object (with incidents again) should give us the name of the variable that holds on to it | 15:12 | |
but the frame is also interesting | |||
sena_kun | (Cro::HTTP2::FrameParser):31) is `whenever $in -> Cro::TCP::Message $packet` | ||
hmm | 15:13 | ||
timotimo | it's possible that the http2 frame parser keeps adding bytes to one end of the buf and consuming bytes from the other and somehow rakudo isn't reclaiming the space in front of the "moving window" of data? | ||
sena_kun | no idea, but I am going to play a bit with FrameParser, like print packages and see what it can give us... | ||
timotimo | unfortunately, the "actual size" of the buffer isn't visible to perl6 code, you can get at it with gdb, though | 15:14 | |
sena_kun | oh, so it won't help me much? | ||
we use $buffer for storing incomplete message piece, but we `$buffer = Buf.new;` it on every new package, so hmm | 15:15 | ||
timotimo | a cleverly placed nqp::sin(1), turning off the jit, and a corresponding breakpoint in interp.c can get you a long way | ||
sena_kun | >OK, looking at the scalar object (with incidents again) should give us the name of the variable that holds on to it | 15:16 | |
how can I do it? | |||
ah, incidents and ID... | |||
timotimo | literally just "incidents 122704" | ||
sena_kun | <anon> (FrameParser.pm6 (Cro::HTTP2::FrameParser):22) (Frame) (106858) | ||
a supply. :S | |||
timotimo | then "show 106858" will give you all links, including the one to the scalar, which will tell us the name | 15:17 | |
if it's in a lexical, that is | |||
sena_kun | there are a lot of scalars, and the one with 122704 is $_. | 15:18 | |
timotimo | OK | ||
well, we've closed in a lot already | |||
sena_kun | ah, stop | ||
gist.github.com/Altai-man/8c6e6987...7aa54a50fb <- am I interpreting it correctly? seems like it is $buffer, after all | 15:19 | ||
timotimo | yeah, that's $buffer | 15:21 | |
gotta be afk for a bit | |||
sena_kun | oh wow | 15:24 | |
seems like it is fixed | |||
I have a feeling that it somehow relied on a bug somewhere | |||
let me check... | 15:25 | ||
15:25
AlexDaniel is now known as testable9291
15:27
testable9291 is now known as AlexDaniel
|
|||
nine | Guest37021: does my commit by any chance fix your issue, too? | 15:32 | |
15:34
robertle left
|
|||
sena_kun | timotimo, committed a working fix. thanks a lot for your help! | 15:40 | |
15:43
zakharyas left
|
|||
dogbert17 | nine: let me check ... | 15:56 | |
this still fails with a 512 byte nursery: while ./perl6-m -Ilib t/spec/S02-literals/quoting.t; do :; done | 16:00 | ||
timotimo | sena_kun: you're very welcome! | ||
dogbert17 | t/spec/S17-procasync/no-runaway-file-limit.t still SEGV's | 16:02 | |
timotimo | dogbert17: any difference in output with the fsa ebug thing turne on? | 16:06 | |
oh no | |||
my d key is starting to misbehave i think :o | |||
ddddd - pressed 8 times, got 5 ( | |||
also, : is on the d key so the smiley also b0rked | |||
lizmat | ah, the joys of a German keyboard :-) | ||
dogbert17 | timotimo: no difference | 16:07 | |
timotimo | dogbert17: none at all, though? add_page shouldn't even exist any more if that's turned on | ||
dogbert17 | perhaps a 512 byte nursery is too small | ||
timotimo | lizmat: wellllll, it's actually a neo2 variant of the german keyboard, so the d is actually on what others have as the ƶ key | ||
lizmat | ah... ok :-) | ||
I wouldn't know: the "d" only has "d" on it on my keyboard | 16:08 | ||
timotimo | :) | 16:10 | |
the key that has a "d" on it works fine | |||
see: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | |||
lizmat | .oO( encrypted keyboards, the new trend! ) |
16:11 | |
timotimo | at one point i tweeted a tweet where i typed on de as if i had neo2 or the other way around, which resulted in basically gibberish, and i tagged it with #neo2 | ||
a few months ago some magazine calling themselves "neo2 magazine" appeared out of nowhere | |||
now every other day i get a notification on twitter that they liked that tweet | 16:12 | ||
lizmat | :-) | ||
timotimo | like, they have some automated thing that keeps liking that same tweet over and over and over again | ||
dogbert17 | I'll recheck | 16:13 | |
nine | dogbert17: LOL apparently your bug goes away when we collect garbage all the time?! | 16:19 | |
I can reproduce it easily with just a 512 byte nursery | |||
Yes, something happens in the JITed code | 16:27 | ||
16:29
brrt joined
|
|||
brrt | oh hey, I hear the signal | 16:37 | |
timotimo | brrtman, the brrtsignal is burning bright! | 16:38 | |
dogbert17 | nine: cool, where is brrt :) | 16:41 | |
lizmat | nnnnnnnnnnnnnnnnnnnnnn brrtman! | ||
brrt | ... but what's the issue | ||
dogbert17 | timotimo: gist.github.com/dogbert17/5da2ca38...88f75b23c5 | ||
brrt: gist.github.com/dogbert17/9e4bd97d...c3156635c8 | 16:42 | ||
16:42
pamplemousse_ joined
16:44
pamplemousse left
16:47
brrt left,
brrt joined
|
|||
nine | Well it's at least not the expr jit | 16:56 | |
16:57
sena_kun left,
sena_kun joined
|
|||
brrt | hmmm | 16:59 | |
well that reduces the surface area quite considerably | |||
nine | Since MVM_repr_bind_attr_inso gets called from JITed code, we're looking for a frame where the bindattr did not get turned into sp_bind_o | ||
16:59
mst joined,
ChanServ sets mode: +o mst
|
|||
brrt | nine: does jit-bisect.pl help? | 16:59 | |
nine | This program cannot be bisected: 256 at nqp/MoarVM/tools/jit-bisect.pl line 181. | 17:05 | |
Apparently jit-bisect.pl only disables the expr jit when doing the initial run | 17:06 | ||
Ah, the lego JIT does not support bisection at all | 17:09 | ||
But since I can catch it in the debugger, shouldn't I be able to find out which frame is currently running? | 17:10 | ||
Even tc->cur_frame claims to be !cursor_start. But in the spesh log I don't find any untouched bindattr_* ops | 17:14 | ||
timotimo | nine: it could be an attrref | 17:15 | |
nine | Oh, there are multiple specializations! | ||
timotimo | ah, yes there are | ||
nine | One of them does a getattr_o, then an getlexperinvtype_o (which may allocate), then the bindattr_o | 17:21 | |
So what's the JIT equivalent of an MVMROOT? | |||
brrt: ^^^ | |||
timotimo | it doesn't have that, it just has all locals registered with the gc if they are object locals | 17:22 | |
other than that you can always gc_root_temp_push gc_root_temp_pop | |||
nine | Apparently this one isn't? | ||
timotimo | this is the template jit, yeah? | 17:25 | |
nine | no | ||
timotimo | exprjit? | ||
nine | Ok, confusion of terms. It's the lego jit | ||
timotimo | OK | ||
brrt | nine: no, there's nothing like MVMROOT | 17:26 | |
timotimo | that will store everything back into the locals after every compiled instruction | ||
17:26
sena_kun left
|
|||
brrt | it's not something we do.... any idea which template does it? | 17:26 | |
nine: lego jit does support bisect, try jit-bisect.pl with --spesh option | |||
nine | Of course it's also possible that the pointer returned from getattr_o was already outdated | ||
timotimo | that would be impressive | 17:27 | |
brrt | (I've been meaning to unify that behaviour) | ||
brrt will have a quick look | |||
17:30
chloekek joined
|
|||
nine | A debug check in getattr_o does not trigger though | 17:36 | |
But there's a second bindattr_o in that spesh cand. One of the 2 PHI nodes is the result of an invocation | 17:37 | ||
17:41
brrt left
|
|||
nine | Stupid me...it's easy to identify the right bindattr_o: I do have the name of the attribute that gets set | 17:43 | |
So it's bindattr_o r1(2), r8(6), lits($!regexsub), r3(15), liti16(-1) | |||
Which is the result of a getcodeobj which gives me a place to add a check | 17:45 | ||
Gotcha! This check triggers. getcodeobj returns an outdated pointer | 17:46 | ||
Alas the code_object was already outdated when we enter MVM_frame_get_code_object. | 17:52 | ||
timotimo | wow | 17:54 | |
nine | So the next challenge will be to find out where that MVMCode thing comes from | 17:55 | |
18:00
reportable6 left,
reportable6 joined
|
|||
nine | It's not from deserialization of closures as those will be allocated in gen2 | 18:01 | |
19:17
robertle joined
19:37
brrt joined
|
|||
nine | Huh! I replaced the setcodeobj primitive in the JIT with a call to a new MVM_frame_set_code_object function that just does the MVM_ASSIGN_REF thing and the error went away | 20:21 | |
So either I made a mistake, or the JIT implementation of setcodeobj is simply wrong. I suspect the latter, because it does assign a reference without any GC related magic. It just sets a pointer | 20:22 | ||
And even worse! The JIT implementation of setcodeobj was done by me! Now I'm back to my debugging rule #1: the bug is always in my own code | 20:23 | ||
timotimo | the jit doesn't have MVMROOT, but i'm pretty sure it has MVM_ASSIGN or what it's called | 20:28 | |
nine | check_wb and hit_wb | 20:29 | |
Geth | MoarVM: 9c969b1bac | (Stefan Seifert)++ | src/jit/x64/emit.dasc Add missing write barrier to lego JIT implementation of setcodeobj. Thanks to dogbert17++ for finding the error! |
||
nine | I think that's now #26. I guess I can go on vacation now with a pretty good conscience :) | 20:31 | |
timotimo | publish the patch you're using to cause GC on every allocation so someone else can also give it a try while you're away? | 20:32 | |
Geth | MoarVM: 35101eb850 | (Stefan Seifert)++ | 6 files New GC debug level: MVM_GC_DEBUG 3 will trigger the GC on every allocation. In addition it will poison the past fromspace by overwriting it with 0xef (which is easily recognizable as garbage and cause explosions when dereferenced). Note that this will also disable destructors as they rely on STables remaining in past fromspace till the next GC run |
20:36 | |
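The poisoning described in the commit message above can be sketched with a toy two-space nursery. This is not MoarVM's collector: `DemoNursery`, its fixed 64-byte spaces and the single-root `nursery_collect` are hypothetical simplifications. It only shows why overwriting the old fromspace with 0xEF turns a stale pointer into obvious garbage instead of plausible-looking old data:

```c
#include <assert.h>
#include <string.h>

#define DEMO_NURSERY_BYTES 64
#define DEMO_POISON_BYTE 0xEF

typedef struct {
    unsigned char space_a[DEMO_NURSERY_BYTES];
    unsigned char space_b[DEMO_NURSERY_BYTES];
    unsigned char *tospace;    /* new and surviving objects live here */
    unsigned char *fromspace;  /* previous tospace, dead after a collection */
    size_t used;               /* bump-allocation offset into tospace */
} DemoNursery;

static void nursery_init(DemoNursery *n) {
    n->tospace = n->space_a;
    n->fromspace = n->space_b;
    n->used = 0;
}

/* Bump allocation; returns NULL when the current space is full. */
static void *nursery_alloc(DemoNursery *n, size_t bytes) {
    if (bytes > DEMO_NURSERY_BYTES - n->used)
        return NULL;
    void *p = n->tospace + n->used;
    n->used += bytes;
    return p;
}

/* Flip the spaces, copy the one rooted object across, update the root,
 * then poison the space the object used to live in. */
static void nursery_collect(DemoNursery *n, void **root, size_t root_bytes) {
    unsigned char *old = n->tospace;
    n->tospace = n->fromspace;
    n->fromspace = old;
    n->used = 0;
    if (*root) {
        void *moved = nursery_alloc(n, root_bytes);
        assert(moved != NULL);
        memcpy(moved, *root, root_bytes);
        *root = moved;
    }
    memset(n->fromspace, DEMO_POISON_BYTE, DEMO_NURSERY_BYTES);
}
```

Any pointer that was not updated through a root still refers to the old space, so reading through it yields bytes of 0xEF, and treating those bytes as a pointer is likely to fault at once rather than corrupt memory quietly.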
timotimo | destructors means finalizers? | 20:37 | |
nine | yes | ||
A couple more debug assertions and a nonworking implementation of saving those STables even when poisoning the rest: gist.github.com/niner/065c8585c477...816b54cd4b | 20:38 | ||
dogbert17: it's fixed! | 20:39 | ||
dogbert17 | nine+++, very impressive | 20:43 | |
timotimo | .o( time for another point release so that the upcoming rakudo star can be really good? ) | ||
dogbert17 | we might actually be out of fromspace errors now | 20:44 | |
I had just run a spectest with a 512 byte nursery, there were 5 fromspace errors but nine's setcodeobj fix seems to have taken care of them all :) | 20:45 | ||
nine | Ok, boarding now. Wish you all a nice 3 weeks! | 20:56 | |
timotimo | have a good one! | 20:57 | |
nine | thanks! | ||
20:58
travis-ci joined
|
|||
travis-ci | MoarVM build errored. Stefan Seifert 'New GC debug level | 20:58 | |
travis-ci.org/MoarVM/MoarVM/builds/567116024 github.com/MoarVM/MoarVM/compare/9...101eb8501d | |||
20:58
travis-ci left
|
|||
dogbert17 | have a nice vacation | 21:00 | |
21:09
lucasb left
|
|||
brrt | noine++ | 21:15 | |
nine++ well done | 21:16 | ||
21:17
chloekek left
21:27
MasterDuke joined
21:33
pamplemousse_ left
21:38
brrt left
21:48
pamplemousse_ joined
21:59
tellable6 joined
22:02
tellable6 left
|
|||
Geth | MoarVM: b3469f9264 | (Daniel Green)++ | src/spesh/optimize.c Actually optimize smrt_intify when able |
22:04 | |
22:23
travis-ci joined
|
|||
travis-ci | MoarVM build passed. Daniel Green 'Actually optimize smrt_intify when able' | 22:23 | |
travis-ci.org/MoarVM/MoarVM/builds/567145134 github.com/MoarVM/MoarVM/compare/3...469f926454 | |||
22:23
travis-ci left
|