lizmat And another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/02/12/...om-apopka/ 06:56
brrt good * #moarvm 10:31
dogbert2_ good morning brrt 10:35
are you still fighting with the JIT bug?
brrt not at the moment, but it's still unfixed, yes
how badly is the release process affected 10:36
dogbert2_ some 'mean' person put the release blocker tag on it
do you have any idea how long the bug has been present? 10:38
brrt no. i know the template has been present for months
i think that's valid though
stuff should not break 10:39
(putting release blocker on it)
on the other hand, we're bus-factor constrained
dogbert2_ yeah, just wondering if it was present when the previous release was done
brrt we could ehm. check
:-)
dogbert2_ indeed
brrt anyway, i'm not going to say 'this was broken before, so i don't care about fixing it' 10:40
dogbert2_ I don't doubt that you'll fix it :-) 10:41
but I still think it's interesting to see for how long it's been there
brrt for long, that's for sure, but it just might not have been triggerable 10:42
dogbert2_ true, 2017.12 is ok while 2018.01 is not 10:43
brrt aha 10:44
dogbert2_ the bisectbot points to github.com/MoarVM/MoarVM/commit/db...bf3b4afb59
dunno if it gives any clues though 10:46
brrt not really
i know fairly well where the script miscompiles
i just don't know what the miscompile is, precisely :-) 10:47
dogbert2_ so it's difficult to debug then 10:53
brrt yeah. doesn't break on 'first try', either 11:11
so the normal 'set-a-breakpoint' doesn't work here
oh, hang on, i have an idea 11:12
dogbert2_ .oO 11:17
Geth MoarVM: wukgdu++ created pull request #801:
fix format string's parameters
12:33
MoarVM: c27af6a54b | wukgdu++ | src/io/syncsocket.c
fix format string's parameters
12:38
MoarVM: 48d98a1831 | (Zoffix Znet)++ (committed using GitHub Web editor) | src/io/syncsocket.c
Merge pull request #801 from wukgdu/fix_format

fix format string's parameters
12:47
robertle I am trying to understand some of the tradeoffs of huge-stack-and-lazy-page vs segmented-stack-extend-in-vm vs non-segmented-stack-and-realloc strategies in the wider context, sparked by something the guile guy wrote, and was wondering what moarvm is doing, and more importantly why the design decisions were made that way. anyone know? 12:49
I mostly understand the consequence of having the stack move around in terms of being able to reference directly into the stack of course, and that huge-alloc doesn't really work on 32bit 12:50
but I am wondering about the cost of checking whether you have enough space left on the stack 12:51
is there some more clever way than just checking? I thought I read something about protected memory after the stack and trapping or so, but can't find it anymore. some lisp source or so...
brrt so, i'm a bit vague on details, but iirc the stack used to be a): a tree (because we wanted to support closures) and b): noncontiguous, and currently we do have a contiguous stack, but i'm not sure whether it's huge or not 12:53
i don't think it is actually very large
also, you only need to check on subroutine invocation
in perl6, subroutine invocation is not *quite* as cheap as it could be 12:54
so the added cost of one more check is not that significant
robertle but it's a tree of contiguous stack segments, right? not a tree of individual stack entries? 12:55
some lisp/scheme things have the latter, but that sounds prohibitively expensive
brrt contiguous segments. and we don't have a tree anymore (again, iirc) 13:00
because we moved that to an optimistic scheme wherein we'd only copy the stack to heap in the case of taking a continuation 13:01
robertle a, so a closure means a full stack copy?
I was trying to understand the paper by dybvig, where he explains how you can have closures with the environment mostly on the stack. too dumb though... 13:03
brrt not sure about a closure, but a continuation does 13:10
robertle right! I think that is also what makes me wonder about the stack strategy: if you can do most things on the stack, that would allow you to get really fast in some languages. but if every push onto the stack involves checking if there is still space, then that introduces a lot of checks... 13:12
jnthn At present, we try to allocate invocation records in a contiguous buffer. This is used for frames that don't "escape" - that is, end up referenced from the heap. If that happens, they get promoted to GC-managed objects (and, critically, get the generational treatment, which turns out to be very important for programs that have large numbers of closures or continuations held at once) 13:23
We also note which frames have a tendency to get promoted anyway, and allocate those directly with the GC in the future
robertle ok, but how do you grow that contiguous buffer? and how do you know when to grow it? 13:24
jnthn We don't grow it, we keep a linked list of them 13:25
robertle ok, and each has a fixed size?
jnthn It's done with a bounds check
Yes
robertle ok, understood
jnthn You'd need to recurse or have a pretty deep stack to fill the segment though 13:26
robertle but this is just the method invocation "call stack", you don't use that as a general purpose work stack like native code does?
jnthn Meaning the branch is predictable, which means it isn't so bad
robertle right because the buffer is rarely full 13:27
jnthn No, MoarVM doesn't use the "system stack" to represent the stack of the program it is running. Generally it runs very shallow on the real system stack.
The only time that ever happens is with native callbacks
When we don't really have a choice
robertle ok, I get that. what I meant was that this buffer is used for method invocation "frames", but you don't push/pop stuff onto it while executing a method? 13:28
timotimo that's right
robertle k
timotimo we're register-based, but our registers are basically offsets into the frame size
so no pushing and popping happens
jnthn Correct, we keep a register set
Also worth noting that we do quite a lot of inlining
robertle so where do you spill to if you run out of regs?
jnthn We don't run out of regs. 13:29
Each frame specifies the number it needs in its metadata
We just allocate that much space
robertle ok, get it. I think that's what guile does too
jnthn Of course, the JIT compiler has to worry about such things :)
brrt can probably tell you what happens there, but it must be something that means the GC knows where we spilled to 13:30
robertle ok, great food for thought. thanks! 13:33
jnthn There's no doubt lots of ways we can do better in all of this, fwiw. 13:36
As with most things, we're working under resource constraints, so "how quickly can we implement X" is often a design consideration too :) 13:37
jnthn git push 14:33
d'oh :)
Geth MoarVM: da41e397f1 | (Jonathan Worthington)++ | src/6model/reprs/MVMSpeshLog.c
Implement unmanaged_size in MVMSpeshLog repr

This means the GC understands the amount of space it really takes, and so can trigger a full collection in a far more timely manner if we are doing nothing but accumulating spesh logs (why that happens is another issue, however). With this, the "leak" reported in Rakudo #1513 does at least reach an upper boundary and stop growing. Prior to this, since only the directly allocated memory of the spesh log was accounted for, it would have taken a very long time for the GC to decide enough had been promoted into gen2 to do a full collection (long enough for the memory use to grow giant).
dogbert2_ .oO jnthn reclaiming memory 14:40
jnthn Yeah, figured out the spurious log entries too 14:49
timotimo that might explain why the heap analyzer didn't account for everything in these
jnthn Indeed, it also uses that
dogbert2_ jnthn: do you think that your fix will affect github.com/MoarVM/MoarVM/issues/680 as well 14:52
jnthn Doubtful
Though they can't hurt
dogbert2_ I'm retesting that moving the variable declarations still impacts maxrss, guess I should test it with your patch as well 14:53
Geth MoarVM: 004680a03a | (Jonathan Worthington)++ | src/spesh/log.h
Don't spesh log if we have a spesh_cand

This check will rule out most cases we shouldn't be logging nice and quickly. It also rules out some cases we did not before, namely that where we performed OSR. That meant we had a spesh correlation ID in place (since the frame was entered through the non-specialized path initially), resulting in the frame wrongly being considered logged beyond being specialized and OSR'd. That in turn resulted in spurious spesh log entries, and was at the root of the memory growth issue in Rakudo #1513.
14:58
dogbert2_ notes that the mysterious change of maxrss when running the gist in github.com/MoarVM/MoarVM/issues/680 remains, i.e. moving out declarations of @tags and @commits from the loop 15:09
dogbert2_ original code has a maxrss of 531128k while the modified code stays at 327884k 15:11
jnthn Yeah, it's an interesting observation 15:12
Hm, the bug is filed against MoarVM but I don't know that it's going to turn out to be there
dogbert2_ perhaps it should be moved 15:13
jnthn Well, doesn't matter in a sense 15:14
travis-ci MoarVM build failed. Jonathan Worthington 'Don't spesh log if we have a spesh_cand 15:14
travis-ci.org/MoarVM/MoarVM/builds/340521536 github.com/MoarVM/MoarVM/compare/d...4680a03a0c
jnthn Yowser 15:15
That fix has made the expr JIT very reliably explosive, it seems
nwc10 this looks just like what I'm getting with a (gcc) ASAN build 15:16
src/spesh/log.c:152:41: runtime error: member access within null pointer of type 'struct MVMSpeshLog'
jnthn Yup, and MVM_JIT_EXPR_DISABLE=1 seems to help
Why on earth would the JITted code be trying to do something with the spesh log, though?! 15:17
nwc10 I was going to ask earlier "why do I seem to be in a minority of one?"
brrt o.O
jnthn I don't know how the above change, short of resulting in less polluted spesh data, could cause that change 15:18
nwc10 for me culprit(s) seem to be MoarVM commit 0e737146b73d994d9bd38208088771deb4dd6f4d or its parent
and yes, with MVM_JIT_EXPR_DISABLE=1 I can build
(not yet finished, but past that SEGV)
brrt hmmm 15:19
jnthn Yeah, the commit at HEAD seems to make it trip up over the expr jit bug a lot more 15:23
Making sure of that 15:24
jnthn Yes, HEAD~1 completes the NQP build 15:25
HEAD trips over the EXPR JIT 15:26
The only thing it could be doing is making the spesh log contain less junk 15:27
brrt and thereby making it compile more frames and breaking faster
jnthn Yeah
So we...get worse because we got better :P
brrt it's probably a good thing though 15:28
jnthn Well yes, in that it gives you a very ready supply of reproductions :)
brrt nwc10 has been consistently reporting this problem and i've consistently not been able to find anything
indeed
jnthn dogbert2_: I can reproduce #680 15:35
dogbert2_: As well as the effect of moving the decls
To the heap analyzer! 15:36
dogbert2_ hooray :)
so now it's time for the heap analyzer to show what it's made of :) 15:37
nwc10 software :-(
jnthn SEGV :P 15:38
Well, the analyzer not, but the snapshot mechanism apparently :/
ooh 15:40
That's a silly typo, and it may have been around for a year or more 15:41
Geth MoarVM: cf523c89c0 | (Jonathan Worthington)++ | src/profiler/heapsnapshot.c
Test the current thread's frame in heap snapshot

Fixes a bug that can in the best case cause a SEGV (which is how I discovered it), and in the worst case lead to missing data in the report.
15:44
[Coke] oops. :) 15:49
jnthn huh, what... 15:54
The heap snapshot came out as a binary file, but the analyzer doesn't read that?
Just complains about invalid utf-8 15:55
That's after uninstalling and reinstalling it
Even installing the latest version from the repo doesn't help 15:59
dogbert2_ jnthn: have you checked commit de6dceda8102fab4b58ebe03 16:00
jnthn huh what, I added a line to print out the exception and it worked?!
dogbert2_: Which repo? 16:01
dogbert2_ i.e. MoarVM, title is 'Merge branch 'heapsnapshot_binary_format''
jnthn Yeah, I'm not sure what's going on, but did at least now get it to load the snapshot 16:02
dogbert2_ cool, the solution is getting closer and closer 16:03
travis-ci MoarVM build failed. Jonathan Worthington 'Test the current thread's frame in heap snapshot 16:04
travis-ci.org/MoarVM/MoarVM/builds/340543283 github.com/MoarVM/MoarVM/compare/0...523c89c06a
jnthn Hm, didn't really reveal what I was thinking 16:04
ah, ok, now it does 16:05
I added a class LeakTracer {} and then a my $x = LeakTracer.new in the loop 16:06
And there are as many instances of that in the final snapshot as there are iterations
Also, darn, the binary format loads faster at least, when it works :P 16:07
ahhh 16:08
timotimo <3
jnthn gist.github.com/jnthn/c56ddd837508...c49e1031c6 16:09
So the timer (from Promise.in) stays active, because there's not a cancellation mechanism for a Promise
It takes a closure 16:10
Uh, refs a closure
Which is the timer callback
timotimo ooooooh
jnthn and the snapshot tells the rest of the story
uh, path even
timotimo wow, so any react or supply that has a promise.in will keep around everything reachable for that particular whenever through its call stack? 16:12
jnthn Yup
timotimo oh, is that only if the promise.in doesn't actually resolve? i.e. if the react shuts down before that? 16:13
jnthn Right, because there's currently no cancellation mechanism on a Promise
timotimo so would we mix in a cancel method to some where we know we can do it and have a .^can in the react implementation? 16:16
not sure if we can have something sensible for start blocks; if a task is currently awaiting it could throw an exception like in java, but i think that'll lead to some rather ugly code 16:17
dogbert2_ removed the promise.in code from the original gist, maxrss 255360k 16:18
timotimo passing a callback to be called on cancellation might be a way, but that'll lead to lots of boilerplate for signalling across that the work is supposed to be done
jnthn Yeah 16:19
this change
- whenever Promise.in(10) {
+ whenever Supply.interval(10, 10).head {
Eliminates the leak
Well, "leak" in that we don't actually lose track of memory
We just keep it around a good bit longer than needed 16:20
So, this is very much not a MoarVM issue
Actually a language design issue :)
So, who's the concurrency designer? :P
timotimo: fwiw, I think a Promise::Cancellable subclass of Promise could be a way to go. It'd just have an overridden method Supply that maps tap close to the cancellation 16:21
timotimo "good bit longer"; do we eventually reclaim those closures/continuations?
jnthn Yes
timotimo oh, when the time elapses? 16:22
jnthn Right
Will make sure of it now, but that matches the data I saw
timotimo i had imagined it a lot worse in my head :)
but it also explains why moving the array outside fixes things; the closures all refer to the same array and old data is overwritten every time
jnthn Because my first attempt showed only one LeakTracer instance
Right. :) 16:23
To get all of the instances, I had to shorten the time the program ran for
By making it collect less data
oh, another way to verify this
Bump up to Promise.in(40) and see if we end up using even more memory 16:24
Hm, curiously not 16:25
timotimo not enough iterations? or does it run forever?
jnthn It runs for 60s
I made the Promise.in 100s now
So "never"
Hm, curious. Doesn't have quite the impact I expected 16:26
Which means it may actually be as bad as timotimo feared 16:28
timotimo that it never actually reclaims it at all?
jnthn yeah
I need to check in a few more places
timotimo: yes, it was that bad :S 16:36
jnthn Either Rakudo can do the cancel itself after a one-shot timer fires, or we can just clean it up in MoarVM 16:37
jnthn I've done a patch for the second 16:37
It's still more maxrss than replacing it with Supply.interval(10, 10)
But it's a lot less than it was
jnthn spectests 16:38
japhb jnthn: BTW, is today the beginning of your grant work already? (If so, AWESOME BTW) 16:39
jnthn Yes :)
Decided to start out with some leak hunting :)
japhb I think that's an excellent choice. :-)
timotimo jnthn++ # grant request approved 16:42
i'm also glad my early work on the heap analyzer has already made working more comfortable for jnthn :) 16:43
jnthn d'oh, my fix busts some tests
[Coke] is reminded he has many grant related things to post tonight. :|
(which is :| only because it's work for me. :) 16:44
timotimo i'm still bummed i haven't properly started my grant work yet, but the apartment search and subsequent move - which is not actually finished in any way yet - have left me pretty much drained of energy 16:55
[Coke] timotimo: please note that the GC has rules about long running grants with no progress. :(
Geth MoarVM: c6519f4c32 | (Jonathan Worthington)++ | src/io/timers.c
Clean up one-shot timers after firing

Otherwise, we will end up holding on to the callback functions for them, which is a memory leak. We could in theory have solved the issue by making Rakudo do the cancellation upon first firing also, but this feels a tad more robust.
16:56
[Coke] (for the kind that go through the voting part of the GC)
jnthn So there was a MoarVM issue in github.com/MoarVM/MoarVM/issues/680 after all 16:57
timotimo [Coke]: should not be a problem once work starts, though, right? 16:58
[Coke] timotimo: except it's been 4 months since the grant was awarded. 17:02
timotimo can you point me at the rules in question? 17:03
[Coke] We're discussing it on the GC list now, obv. Alan will reach out to you if needed. 17:04
timotimo ah
[Coke] www.perlfoundation.org/rules_of_operation - Linked to off the main nav on that site.; section 2.6
timotimo yeah, that's a sensible rule 17:07
travis-ci MoarVM build failed. Jonathan Worthington 'Clean up one-shot timers after firing 17:14
travis-ci.org/MoarVM/MoarVM/builds/340577255 github.com/MoarVM/MoarVM/compare/c...519f4c32d9
dogbert17 jnthn++: very nice, the unmodified script now shows a maxrss of 206364k, which is less than half of what it was before 17:38
jnthn yay :) 17:39
[Coke] jnthn++ 17:53
jnthn wanders home 18:02
nine :q 19:43
brrt good * 20:40
i also get breakage in nqp build
however, i don't get a breakage when MVM_SPESH_BLOCKING=1 20:43
it is also sensitive to MVM_JIT_EXPR_DISABLE=1 20:45
w.t.f
aha, that's interesting 20:49
where do we insert the MVM_spesh_log_static things? 20:51
timotimo you mean getlexstatic_o? 20:55
brrt oh
is that a thing
timotimo it is
brrt hang on a minute
timotimo we gen that op instead of getlex if we know something is not going to change, or something like that 20:56
Geth MoarVM: a01cdb449c | (Bart Wiegmans)++ | 2 files
Disable getlexstatic_o for the time being

This breaks the NQP build, but only when MVM_SPESH_BLOCKING isn't set. No idea why yet.
20:59
brrt i see 21:05
anyway, the template looks good, so i'm curious where the jit fails 21:06
timotimo hm, do we actually log anything when the frame is already jitted? 21:07
brrt don't know 21:09
but i do note that this case looks suspiciously like the sp_p6ogetvt_o fail 21:10
and i hope the cause is the same
timotimo ah, i already forgot what went wrong with that one
brrt one thing that was wrong, but not *the* thing interestingly, is that we wouldn't allocate registers for live ranges created during the tile rollup process 21:11
prescription for linear scan allocation is to iterate by popping off the heap
doesn't work when you have individual instructions that need caretaking by the register allocator
like CALL, for one thing 21:12
so, we iterate over each tile (instruction) as well
and because we can allocate the last live range before running out of tiles to process
we could miss processing tiles, and i had a 'rollup' loop to process all the last tiles
but, if those last tiles would then create new live ranges (because of spilling...), those then wouldn't be processed 21:13
so... altogether, i fixed that by having a single loop do both things
and either proceed on tiles, or proceed on the live ranges 21:14
this works, except, it doesn't actually fix the thing that was broken, which is Something Else that I don't know about just yet
travis-ci MoarVM build passed. Bart Wiegmans 'Disable getlexstatic_o for the time being 21:16
travis-ci.org/MoarVM/MoarVM/builds/340678599 github.com/MoarVM/MoarVM/compare/c...1cdb449c96
brrt thanks timotimo++, wouldn't have found it otherwise 21:17
timotimo hm? 21:20
brrt disabling getlexstatic_o 21:22
timotimo oh? i literally just grepped interp.c for that function :) 21:47