github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:00 reportable6 left, reportable6 joined 00:44 sena_kun left 02:23 pamplemousse left, pamplemousse joined 02:34 MasterDuke left 04:46 pamplemousse left 05:29 robertle left 06:00 reportable6 left 06:02 reportable6 joined 06:56 robertle joined 07:33 brrt joined
brrt \o 07:48
07:59 zakharyas joined 08:06 squashable6 left 08:10 squashable6 joined 08:22 brrt left 09:02 brrt joined 09:56 brrt left
Guest37021 nine: do you want to reach 25 GC related fixes? 11:15
dogbert@dogbert-ubuntu:~/repos/rakudo$ while ./perl6-m -Ilib t/spec/S02-literals/quoting.t; do :; done 11:16
MoarVM panic: Collectable 0x560f1920d000 in fromspace accessed
nursery size was set to 2048
11:17 zakharyas left
nine LOL I think even parrot was faster: Stage parse : 5293.738 11:32
lizmat not if you had any non-ascii characters in the setting 11:33
I once killed it with that after several hours 11:34
nine Guest37021: that's odd...it doesn't show up here despite running a collection at every allocation 11:36
Guest37021: do you have a backtrace of that? 11:38
lizmat perhaps it only happens with a nursery size of 2048 ? 11:43
nine But why would it? 11:46
Well, this string is definitely corrupted: (gdb) call fprintf(stderr, "%p -> %p -> %p\n", orig, orig->body.storage.strands[0].blob_string, orig->body.storage.strands[0].blob_string->body.storage.strands[0].blob_string) 11:47
0x67d55f0 -> 0x5be2dc0 -> 0x5be2dc0
String has a strand that's again a string consisting of strands (which I think we don't allow) and the first strand is the string itself
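The invariant nine describes (a strand must be a flat string, never a strand string and never the owner itself) can be sketched as a debug check. This is a minimal sketch assuming hypothetical toy types; `ToyString`, `toy_strands_ok` and `toy_assert_strands_ok` are illustrative names, not MoarVM's MVMString layout or its MVM_DEBUG_STRANDS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; MoarVM's real MVMString layout differs. */
typedef enum { TOY_FLAT, TOY_STRANDS } ToyStorage;

typedef struct ToyString ToyString;
struct ToyString {
    ToyStorage   storage_type;
    size_t       num_strands;  /* only meaningful for TOY_STRANDS */
    ToyString  **strands;      /* each strand is expected to be flat */
};

/* Returns 1 if s satisfies the invariant, 0 otherwise. */
int toy_strands_ok(const ToyString *s) {
    if (s->storage_type != TOY_STRANDS)
        return 1;
    for (size_t i = 0; i < s->num_strands; i++) {
        const ToyString *strand = s->strands[i];
        if (strand == NULL || strand == s)        /* missing or self-referencing */
            return 0;
        if (strand->storage_type == TOY_STRANDS)  /* nested strand string */
            return 0;
    }
    return 1;
}

/* Debug-build variant that aborts as soon as the invariant is broken. */
void toy_assert_strands_ok(const ToyString *s) {
    assert(toy_strands_ok(s));
}
```

A string whose first strand is the string itself, like the `0x5be2dc0 -> 0x5be2dc0` chain printed above, would fail this check on the first loop iteration.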
12:00 reportable6 left
Guest37021 nine: backtrace coming up, just came back from a quick lunch 12:00
12:01 reportable6 joined
Guest37021 oops, I lied nursery size is set to 512 12:02
gist.github.com/dogbert17/9e4bd97d...c3156635c8
the problem vanishes if I set MVM_JIT_DISABLE=1 12:03
nine Looks like the actual error happens either in the JITed code or somewhere before that 12:18
Guest37021 so how do we tackle this? 12:20
12:26 robertle left 12:28 robertle joined 12:45 pamplemousse joined
nine This is really odd: when I turn on MVM_DEBUG_STRANDS stuff reliably segfaults 12:47
But not in any of the checking code. Instead it fails in MVM_string_gi_init because that is used on strings that have storage type MVM_STRING_STRANDS but 0 strands 12:48
Ah, god damn it! if (s->body.storage_type = MVM_STRING_STRAND) 12:50
nine's debugging rule #1: the bug is always in my own code! 12:51
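The slip quoted above is an assignment where a comparison was meant, which would also explain strings that claim strand storage but have 0 strands: the check itself overwrites the storage type. A minimal sketch, assuming hypothetical constants and a toy struct (the real values of MoarVM's storage-type constants are not shown in this log):

```c
#include <assert.h>

/* Hypothetical values chosen for illustration only. */
enum { TOY_GRAPHEME_32 = 0, TOY_STRAND = 3 };

typedef struct { int storage_type; } ToyBody;

/* Buggy: '=' assigns, so the body is overwritten and the condition is
 * true whenever TOY_STRAND is non-zero. The doubled parentheses only
 * silence the compiler warning that would normally flag this line. */
int is_strand_buggy(ToyBody *b) {
    if ((b->storage_type = TOY_STRAND))
        return 1;
    return 0;
}

/* Fixed: '==' compares and leaves the body untouched. */
int is_strand_fixed(const ToyBody *b) {
    return b->storage_type == TOY_STRAND;
}

/* Small demonstration of the side effect of the buggy check. */
void toy_demo(void) {
    ToyBody b = { TOY_GRAPHEME_32 };
    assert(!is_strand_fixed(&b));          /* correct answer, no side effect */
    assert(is_strand_buggy(&b));           /* wrong answer ... */
    assert(b.storage_type == TOY_STRAND);  /* ... and the type got clobbered */
}
```

With GCC or Clang, writing the buggy condition with single parentheses and compiling with `-Wall` produces a warning about an assignment used as a truth value.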
Guest37021 oops 12:52
13:11 pamplemousse left, pamplemousse joined, zakharyas joined
timotimo whoops 13:11
Guest37021 timotimo: did you see that MasterDuke had received his Ryzen 3700X machine? 13:16
timotimo yeah
Guest37021 spectest run in 133 secs and Rakudo parse time of 36s is very fast 13:17
timotimo jealous of the performance, but i depend on rr too often
Guest37021 there's always the Intel 9900k 13:18
a tad expensive though
nwc10 timotimo: "rr" in this context is? 13:24
13:28 brrt joined
Guest37021 dogbert@dogbert-ubuntu:~/repos/rakudo$ while ./perl6-m -Ilib t/spec/S17-procasync/no-runaway-file-limit.t; do :; done 13:30
1..1
timotimo mozilla's "rr" is a recorder and reverse debugger
Guest37021 Segmentation fault (core dumped)
13:45 sena_kun joined 13:59 brrt left
nine How on earth can a string end up in its own strands? 13:59
13:59 lucasb joined
timotimo it could be an empty string :P 14:01
yeah, that don't make much sense either
use rr to figure out what writes the strands array :)
nine I guess not even rr would be of much use with objects moving around all the time? 14:06
timotimo well, the strands array in particular is a malloced thing 14:11
so that doesn't move unless it gets realloced
nine If I had to guess, I'd say it's a missing write barrier somewhere 14:20
But adding MVM_gc_write_barrier_hit(tc, (MVMCollectable *)result); after every write to a result->body.storage.strands[0].blob_string makes it explode right after startup in the ModuleLoader 14:28
timotimo um, why would you put barrier_hit in there 14:29
the write barrier is supposed to be conditional 14:30
sena_kun o/ 14:31
so I took two heap snapshots in a separate thread, one after 10 seconds of running, the second after 20 seconds; between them about 1 GB of memory was eaten
what can I do to be helpful next? 14:32
timotimo open both in one instance of the moarvm-heapanalyzer each
sena_kun done, now `summary`? 14:33
timotimo probably wants the "incidents" branch of the moarvm heapanalyzer because i started working on stuff in there and never bothered to pick apart the feature back into master
or something
sena_kun oki, getting it... 14:34
timotimo but yeah, summary can be interesting
and also "top objects by count", "top objects by size", "top frames by count", "top frames by size"
sena_kun github gists are ok?
timotimo sure 14:35
nine timotimo: probably because I don't know what I'm doing :) 14:36
timotimo it seems a little dangerous to hit the wb every time
like, wouldn't that add the same memory location in the inter-generation set multiple times? 14:37
and then the same pointer would be updated multiple times, which might upset an assert somewhere perhaps
sena_kun gist.github.com/Altai-man/b6e6c40b...46dadf4460 <- this is without incidents branch, now preparing branch version...
seems like only Buf is growing
timotimo, I can re-run it with incidents branch, should I? 14:39
nine I think I meant MVM_gc_write_barrier(tc, (MVMCollectable *)result, (MVMCollectable *)a); instead
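The difference between the two calls discussed here (an unconditional "hit" versus a barrier that first checks generations) can be sketched with a toy model. Everything below is an assumption for illustration: `ToyObj`, `ToyGC`, `toy_barrier_hit`, `toy_write_barrier` and `toy_assign_ref` are hypothetical names, not MoarVM's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; names and layout are not MoarVM's. */
enum { TOY_GEN2 = 1 };            /* flag: object lives in the old generation */
#define TOY_REMEMBERED_MAX 16

typedef struct ToyObj ToyObj;
struct ToyObj {
    unsigned flags;
    ToyObj  *ref;
};

typedef struct {
    ToyObj *remembered[TOY_REMEMBERED_MAX]; /* old objects that point at young ones */
    size_t  num_remembered;
} ToyGC;

/* Unconditional "hit": always records the owner, even when no
 * old-to-young reference exists. */
void toy_barrier_hit(ToyGC *gc, ToyObj *owner) {
    assert(gc->num_remembered < TOY_REMEMBERED_MAX);
    gc->remembered[gc->num_remembered++] = owner;
}

/* Conditional barrier: record only when an old object is about to
 * reference a young one. */
void toy_write_barrier(ToyGC *gc, ToyObj *owner, ToyObj *target) {
    if ((owner->flags & TOY_GEN2) && target && !(target->flags & TOY_GEN2))
        toy_barrier_hit(gc, owner);
}

/* Store a reference and apply the barrier in one step. */
void toy_assign_ref(ToyGC *gc, ToyObj *owner, ToyObj *target) {
    toy_write_barrier(gc, owner, target);
    owner->ref = target;
}
```

In this toy, calling `toy_barrier_hit` after every write would grow the remembered list for young owners too, which is the kind of redundant entry timotimo worries about.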
timotimo ok
sena_kun: now you can 'find 1000 objects type="Buf"' i think? maybe? 14:40
sena_kun got a bunch of IDs
timotimo right, and you can "path" each of the IDs
sena_kun only 13 of them
timotimo even better 14:41
you can "info" them to get their size
(maybe only in the incidents branch tho)
"path" will let you find what keeps a given object alive 14:42
sena_kun I am running `perl6 -Ip6-app-moarvm-heapanalyzer/lib p6-app-moarvm-heapanalyzer/bin/moar-ha heapsnapshot-493-2.mvmheap` but info is not working 14:43
I checked out the branch, of course
Guest37021 timotimo: do you see anything suspicious here: gist.github.com/dogbert17/1c8a0dcc...f9681f9ef6
timotimo oh maybe it's "show" rather than "info"
sena_kun don't understand. :S 14:44
timotimo Guest37021: whenever i see "dogbert17" at the top of a gist, it's very suspicious already
sena_kun: just "show 1234"
sena_kun d'oh
timotimo, sorry
nine And that does seem to help!
Guest37021 :) 14:45
timotimo Guest37021: well, that looks damning 14:46
Guest37021 it's with a 512 byte nursery 14:47
nine This could have caused a number of string related issues
sena_kun show for all objects returns me `Buf (Object) --[ <STable> ]--> Buf (STable) (34330)`
nine With lots of action at a distance
timotimo that's the list of things it points at. isn't there a section at the top with some infos? 14:48
Guest37021: could try to use the FSA_DEBUG define to figure this one out 14:49
sena_kun timotimo, gist.github.com/Altai-man/b6e6c40b...ile-3-info
timotimo damn it, it's supposed to show some extra info 14:50
Guest37021 timotimo: will do
timotimo well, you can get the "path" to each of these objects and see which seem unrelated
sena_kun timotimo, how can I tell which ones are unrelated? gist.github.com/Altai-man/b6e6c40b...le-4-paths 14:53
I see traces to HTTP::HPACK, HTTP2 code, Async::SSL
and the leak is visible when using HTTP/2, while HTTP/1.1 is fine, so all components are used 14:54
Geth MoarVM: 695a24de35 | (Stefan Seifert)++ | src/strings/ops.c
Fix memory corruption caused by missing write barriers in string ops

result may be promoted to gen2 in some cases while one of its strands is still in the nursery.
14:55
timotimo yeah, that's not quite as obvious 14:56
let me go dig up code that shows the size of an object
you can try to manually apply the commit 024a06db1b5065e20edcfb1af351ef80aeaae33a 14:59
sena_kun timotimo, to analyzer? 15:00
timotimo yes
can also ugly-hack it in
only need to see which object id has which size and unmanaged size
unmanaged size will be the interesting one for Buf
sena_kun `cherry-pick 024a06db1b5065e20edcfb1af351ef80aeaae33a` says bad object 15:02
timotimo oh, perhaps it's only in my repo 15:03
[email@hidden.address]
sena_kun github.com/jnthn/p6-app-moarvm-hea...80aeaae33a
no, it is here
hmm
timotimo since it's just a one-line patch
just copy-paste it :D
sena_kun oki
ok, now `show` shows sizes as `56 + 496 bytes` 15:06
timotimo yeah, that's not big
sena_kun 56 + 1511251968 bytes
:)
timotimo hm, that could be big
sena_kun path is...
timotimo of course that depends on how big a byte is on your system
sena_kun I also see 56 + 32768 bytes 15:07
everything else is not above 1024
now the not fun part, or not sure...
path is very undescriptive 15:08
timotimo "path" shows only the shortest path; that's the reason i built the "incidents" command
sena_kun gist.github.com/Altai-man/b6e6c40b...#file-path 15:09
I am running `incidents 106880`
timotimo ha, yeah, that's very short
"last handler result" has something to do with what implements "return" and other stuff 15:10
nine: good work :)
sena_kun gist.github.com/Altai-man/b6e6c40b...-incidents 15:11
timotimo OK, looking at the scalar object (with incidents again) should give us the name of the variable that holds on to it 15:12
but the frame is also interesting
sena_kun (Cro::HTTP2::FrameParser):31) is `whenever $in -> Cro::TCP::Message $packet`
hmm 15:13
timotimo it's possible that the http2 frame parser keeps adding bytes to one end of the buf and consuming bytes from the other and somehow rakudo isn't reclaiming the space in front of the "moving window" of data?
sena_kun no idea, but I am going to play a bit with FrameParser, like print packets and see what it can give us...
timotimo unfortunately, the "actual size" of the buffer isn't visible to perl6 code, you can get at it with gdb, though 15:14
sena_kun oh, so it won't help me much?
we use $buffer for storing an incomplete message piece, but we `$buffer = Buf.new;` it on every new packet, so hmm 15:15
timotimo a cleverly placed nqp::sin(1), turning off the jit, and a corresponding breakpoint in interp.c can get you a long way
sena_kun >OK, looking at the scalar object (with incidents again) should give us the name of the variable that holds on to it 15:16
how can I do it?
ah, incidents and ID...
timotimo literally just "incidents 122704"
sena_kun <anon> (FrameParser.pm6 (Cro::HTTP2::FrameParser):22) (Frame) (106858)
a supply. :S
timotimo then "show 106858" will give you all links, including the one to the scalar, which will tell us the name 15:17
if it's in a lexical, that is
sena_kun there are a lot of scalars, and the one with 122704 is $_. 15:18
timotimo OK
well, we've closed in a lot already
sena_kun ah, stop
gist.github.com/Altai-man/8c6e6987...7aa54a50fb <- am I interpreting it correctly? seems like it is $buffer, after all 15:19
timotimo yeah, that's $buffer 15:21
gotta be afk for a bit
sena_kun oh wow 15:24
seems like it is fixed
I have a feeling that it somehow relied on a bug somewhere
let me check... 15:25
15:25 AlexDaniel is now known as testable9291 15:27 testable9291 is now known as AlexDaniel
nine Guest37021: does my commit by any chance fix your issue, too? 15:32
15:34 robertle left
sena_kun timotimo, committed a working fix. thanks a lot for your help! 15:40
15:43 zakharyas left
dogbert17 nine: let me check ... 15:56
this still fails with a 512 byte nursery: while ./perl6-m -Ilib t/spec/S02-literals/quoting.t; do :; done 16:00
timotimo sena_kun: you're very welcome!
dogbert17 t/spec/S17-procasync/no-runaway-file-limit.t still SEGV's 16:02
timotimo dogbert17: any difference in output with the fsa ebug thing turne on? 16:06
oh no
my d key is starting to misbehave i think :o
ddddd - pressed 8 times, got 5 (
also, : is on the d key so the smiley also b0rked
lizmat ah, the joys of a German keyboard :-)
dogbert17 timotimo: no difference 16:07
timotimo dogbert17: none at all, though? add_page shouldn't even exist any more if that's turned on
dogbert17 perhaps a 512 byte nursery is too small
timotimo lizmat: wellllll, it's actually a neo2 variant of the german keyboard, so the d is actually on what others have as the ƶ key
lizmat ah... ok :-)
I wouldn't know: the "d" only has "d" on it on my keyboard 16:08
timotimo :) 16:10
the key that has a "d" on it works fine
see: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
lizmat
.oO( encrypted keyboards, the new trend! )
16:11
timotimo at one point i tweeted a tweet where i typed on de as if i had neo2 or the other way around, which resulted in basically gibberish, and i tagged it with #neo2
a few months ago some magazine calling themselves "neo2 magazine" appeared out of nowhere
now every other day i get a notification on twitter that they liked that tweet 16:12
lizmat :-)
timotimo like, they have some automated thing that keeps liking that same tweet over and over and over again
dogbert17 I'll recheck 16:13
nine dogbert17: LOL apparently your bug goes away when we collect garbage all the time?! 16:19
I can reproduce it easily with just a 512 byte nursery
Yes, something happens in the JITed code 16:27
16:29 brrt joined
brrt oh hey, I hear the signal 16:37
timotimo brrtman, the brrtsignal is burning bright! 16:38
dogbert17 nine: cool, where is brrt :) 16:41
lizmat nnnnnnnnnnnnnnnnnnnnnn brrtman!
brrt ... but what's the issue
dogbert17 timotimo: gist.github.com/dogbert17/5da2ca38...88f75b23c5
brrt: gist.github.com/dogbert17/9e4bd97d...c3156635c8 16:42
16:42 pamplemousse_ joined 16:44 pamplemousse left 16:47 brrt left, brrt joined
nine Well it's at least not the expr jit 16:56
16:57 sena_kun left, sena_kun joined
brrt hmmm 16:59
well that reduces the surface area quite considerably
nine Since MVM_repr_bind_attr_inso gets called from JITed code, we're looking for a frame where the bindattr did not get turned into sp_bind_o
16:59 mst joined, ChanServ sets mode: +o mst
brrt nine: does jit-bisect.pl help? 16:59
nine This program cannot be bisected: 256 at nqp/MoarVM/tools/jit-bisect.pl line 181. 17:05
Apparently jit-bisect.pl only disables the expr jit when doing the initial run 17:06
Ah, the lego JIT does not support bisection at all 17:09
But since I can catch it in the debugger, shouldn't I be able to find out which frame is currently running? 17:10
Even tc->cur_frame claims to be !cursor_start. But in the spesh log I don't find any untouched bindattr_* ops 17:14
timotimo nine: it could be an attrref 17:15
nine Oh, there are multiple specializations!
timotimo ah, yes there are
nine One of them does a getattr_o, then a getlexperinvtype_o (which may allocate), then the bindattr_o 17:21
So what's the JIT equivalent of an MVMROOT?
brrt: ^^^
timotimo it doesn't have that, it just has all locals registered with the gc if they are object locals 17:22
other than that you can always gc_root_temp_push gc_root_temp_pop
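Why a pointer held across an allocating op has to be visible to a moving collector can be shown with a toy temp-root stack. This is only a sketch under stated assumptions: `ToyGC`, `toy_root_push`, `toy_root_pop` and `toy_move` are hypothetical stand-ins for the idea behind temp roots, not MoarVM's functions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ROOTS 8

typedef struct { int payload; } ToyObj;

typedef struct {
    ToyObj **roots[TOY_MAX_ROOTS]; /* addresses of pointer variables to fix up */
    size_t   num_roots;
} ToyGC;

/* Register the address of a local pointer so the collector can update it. */
void toy_root_push(ToyGC *gc, ToyObj **slot) {
    assert(gc->num_roots < TOY_MAX_ROOTS);
    gc->roots[gc->num_roots++] = slot;
}

/* Unregister the most recently pushed slot. */
void toy_root_pop(ToyGC *gc) {
    assert(gc->num_roots > 0);
    gc->num_roots--;
}

/* Simulates a moving collection: the object at `from` is copied to `to`
 * and every registered slot that pointed at `from` is redirected. */
void toy_move(ToyGC *gc, ToyObj *from, ToyObj *to) {
    *to = *from;
    for (size_t i = 0; i < gc->num_roots; i++)
        if (*gc->roots[i] == from)
            *gc->roots[i] = to;
}
```

A pointer variable that was never pushed keeps its old address after `toy_move`, which is the toy equivalent of the outdated pointers being hunted here.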
nine Apparently this one isn't?
timotimo this is the template jit, yeah? 17:25
nine no
timotimo exprjit?
nine Ok, confusion of terms. It's the lego jit
timotimo OK
brrt nine: no, there's nothing like MVMROOT 17:26
timotimo that will store everything back into the locals after every compiled instruction
17:26 sena_kun left
brrt it's not something we do.... any idea which template does it 17:26
nine: lego jit does support bisect, try jit-bisect.pl with --spesh option
nine Of course it's also possible that the pointer returned from getattr_o was already outdated
timotimo that would be impressive 17:27
brrt (I've been meaning to unify that behaviour)
brrt will have a quick look
17:30 chloekek joined
nine A debug check in getattr_o does not trigger though 17:36
But there's a second bindattr_o in that spesh cand. One of the 2 PHI nodes is the result of an invocation 17:37
17:41 brrt left
nine Stupid me...it's easy to identify the right bindattr_o: I do have the name of the attribute that gets set 17:43
So it's bindattr_o r1(2), r8(6), lits($!regexsub), r3(15), liti16(-1)
Which is the result of a getcodeobj which gives me a place to add a check 17:45
Gotcha! This check triggers. getcodeobj returns an outdated pointer 17:46
Alas the code_object was already outdated when we enter MVM_frame_get_code_object. 17:52
timotimo wow 17:54
nine So the next challenge will be to find out where that MVMCode thing comes from 17:55
18:00 reportable6 left, reportable6 joined
nine It's not from deserialization of closures as those will be allocated in gen2 18:01
19:17 robertle joined 19:37 brrt joined
nine Huh! I replaced the setcodeobj primitive in the JIT with a call to a new MVM_frame_set_code_object function that just does the MVM_ASSIGN_REF thing and the error went away 20:21
So either I made a mistake, or the JIT implementation of setcodeobj is simply wrong. I suspect the latter, because it does assign a reference without any GC related magic. It just sets a pointer 20:22
And even worse! The JIT implementation of setcodeobj was done by me! Now I'm back to my debugging rule #1: the bug is always in my own code 20:23
timotimo the jit doesn't have MVMROOT, but i'm pretty sure it has MVM_ASSIGN or what it's called 20:28
nine check_wb and hit_wb 20:29
Geth MoarVM: 9c969b1bac | (Stefan Seifert)++ | src/jit/x64/emit.dasc
Add missing write barrier to lego JIT implementation of setcodeobj

  Thanks to dogbert17++ for finding the error!
nine I think that's now #26. I guess I can go on vacation now with a pretty good conscience :) 20:31
timotimo publish the patch you're using to cause GC on every allocation so someone else can also give it a try while you're away? 20:32
Geth MoarVM: 35101eb850 | (Stefan Seifert)++ | 6 files
New GC debug level

MVM_GC_DEBUG 3 will trigger the GC on every allocation. In addition it will poison the past fromspace by overwriting it with 0xef (which is easily recognizable as garbage and cause explosions when dereferenced). Note that this will also disable destructors as they rely on STables remaining in past fromspace till the next GC run
20:36
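The poisoning idea from the commit message can be illustrated with a toy semispace nursery. This is a rough sketch under assumptions: the toy poisons fromspace immediately after evacuating survivors, which is a simplification, and `ToyNursery`, `toy_collect` and `TOY_POISON` are hypothetical names rather than MoarVM code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POISON 0xEF  /* same byte value the commit message mentions */

typedef struct {
    unsigned char *tospace;    /* where new objects are allocated */
    unsigned char *fromspace;  /* previous tospace, dead after a collection */
    size_t         size;
} ToyNursery;

/* Returns 1 on success, 0 if allocation failed. */
int toy_nursery_init(ToyNursery *n, size_t size) {
    n->tospace   = calloc(1, size);
    n->fromspace = calloc(1, size);
    n->size      = size;
    if (!n->tospace || !n->fromspace) {
        free(n->tospace);
        free(n->fromspace);
        n->tospace = n->fromspace = NULL;
        return 0;
    }
    return 1;
}

/* Flip the semispaces, copy the first live_bytes bytes as "survivors",
 * then overwrite the old space so stale pointers read obvious garbage. */
void toy_collect(ToyNursery *n, size_t live_bytes) {
    assert(live_bytes <= n->size);
    unsigned char *old = n->tospace;
    n->tospace   = n->fromspace;
    n->fromspace = old;
    memcpy(n->tospace, n->fromspace, live_bytes);
    memset(n->fromspace, TOY_POISON, n->size);
}

void toy_nursery_destroy(ToyNursery *n) {
    free(n->tospace);
    free(n->fromspace);
    n->tospace = n->fromspace = NULL;
}
```

Any pointer that still refers to the old space after `toy_collect` now reads `0xEF` bytes, so a missed update shows up at the first dereference instead of much later.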
timotimo destructors means finalizers? 20:37
nine yes
A couple more debug assertions and a nonworking implementation of saving those STables even when poisoning the rest: gist.github.com/niner/065c8585c477...816b54cd4b 20:38
dogbert17: it's fixed! 20:39
dogbert17 nine+++, very impressive 20:43
timotimo .o( time for another point release so that the upcoming rakudo star can be really good? )
dogbert17 we might actually be out of fromspace errors now 20:44
I had just run a spectest with a 512 byte nursery, there were 5 fromspace errors but nine's setcodeobj fix seems to have taken care of them all :) 20:45
nine Ok, boarding now. Wish you all a nice 3 weeks! 20:56
timotimo have a good one! 20:57
nine thanks!
20:58 travis-ci joined
travis-ci MoarVM build errored. Stefan Seifert 'New GC debug level' 20:58
travis-ci.org/MoarVM/MoarVM/builds/567116024 github.com/MoarVM/MoarVM/compare/9...101eb8501d
20:58 travis-ci left
dogbert17 have a nice vacation 21:00
21:09 lucasb left
brrt noine++ 21:15
nine++ well done 21:16
21:17 chloekek left 21:27 MasterDuke joined 21:33 pamplemousse_ left 21:38 brrt left 21:48 pamplemousse_ joined 21:59 tellable6 joined 22:02 tellable6 left
Geth MoarVM: b3469f9264 | (Daniel Green)++ | src/spesh/optimize.c
Actually optimize smrt_intify when able
22:04
22:23 travis-ci joined
travis-ci MoarVM build passed. Daniel Green 'Actually optimize smrt_intify when able' 22:23
travis-ci.org/MoarVM/MoarVM/builds/567145134 github.com/MoarVM/MoarVM/compare/3...469f926454
22:23 travis-ci left