02:56
ilbot3 joined
|
|||
lizmat | And another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/02/12/...om-apopka/ | 06:56 | |
08:08
brrt joined
08:31
zakharyas joined
08:38
zakharyas joined
08:39
reportable6 joined
08:41
zakharyas joined
09:00
zakharyas joined
09:31
notable6 joined
10:17
Kaiepi joined
10:29
bloatable6 joined
|
|||
brrt | good * #moarvm | 10:31 | |
dogbert2_ | good morning brrt | 10:35 | |
are you still fighting with the JIT bug? | |||
brrt | not at the moment, but it's still unfixed, yes | ||
how badly is the release process affected | 10:36 | ||
dogbert2_ | some 'mean' person put the release blocker tag on it | ||
do you have any idea about for how long the bug has been present? | 10:38 | ||
brrt | no. i know the template has been present for months | ||
i think that's valid though | |||
stuff should not break | 10:39 | ||
(putting release blocker on it) | |||
on the other hand, we're bus-factor constrained | |||
dogbert2_ | yeah, just wondering if it was present when the previous release was dome | ||
*done | |||
brrt | we could ehm. check | ||
:-) | |||
dogbert2_ | indeed | ||
brrt | anyway, i'm not going to say 'this was broken before, so i don't care about fixing it' | 10:40 | |
dogbert2_ | I don't doubt that you'll fix it :-) | 10:41 | |
but I still think it's interesting to see for how long it's been there | |||
brrt | for long, that's for sure, but it just might not have been triggerable | 10:42 | |
dogbert2_ | true, 2017.12 is ok while 2018.01 is not | 10:43 | |
brrt | aha | 10:44 | |
dogbert2_ | the bisectbot points to github.com/MoarVM/MoarVM/commit/db...bf3b4afb59 | ||
dunno if it gives any clues though | 10:46 | ||
brrt | not really | ||
i know fairly well where the script miscompiles | |||
i just don't know what the miscompile is, precisely :-) | 10:47 | ||
10:52
shareable6 joined
|
|||
dogbert2_ | so it's difficult to debug then | 10:53 | |
11:10
zakharyas1 joined
|
|||
brrt | yeah. doesn't break on 'first try', either | 11:11 | |
so the normal 'set-a-breakpoint' doesn't work here | |||
oh, hang on, i have an idea | 11:12 | ||
11:15
zakharyas joined
|
|||
dogbert2_ | .oO | 11:17 | |
11:18
zakharyas joined
11:25
zakharyas joined
11:45
brrt joined
|
|||
Geth | MoarVM: wukgdu++ created pull request #801: fix format string's parameters |
12:33 | |
MoarVM: c27af6a54b | wukgdu++ | src/io/syncsocket.c fix format string's parameters |
12:38 | ||
MoarVM: 48d98a1831 | (Zoffix Znet)++ (committed using GitHub Web editor) | src/io/syncsocket.c Merge pull request #801 from wukgdu/fix_format fix format string's parameters |
|||
12:47
robertle joined
|
|||
robertle | I am trying to understand some of the tradeoffs of huge-stack-and-lazy-page vs segmented-stack-extend-in-vm vs non-segmented-stack-and-realloc strategies in the wider context, sparked by something the guile guy wrote, and was wondering what moarvm is doing, and more importantly why the design decisions were made that way. does anyone know? | 12:49 | |
I mostly understand the consequence of having the stack move around in terms of being able to reference directly into the stack of course, and that huge-alloc doesn't really work on 32bit | 12:50 | ||
but I am wondering about the cost of checking whether you have enough space left on the stack | 12:51 | ||
is there some more clever way than just checking? I thought I did read something about protected memory after the stack and trapping or so, but can't find it anymore. some lisp source or so... | |||
brrt | so, i'm a bit vague on details, but iirc the stack used to be a): a tree (because we wanted to support closures) and b): noncontiguous, and currently we do have a contiguous stack, but i'm not sure whether it's huge or not | 12:53 | |
i don't think it is actually very large | |||
also, you only need to check on subroutine invocation | |||
in perl6, subroutine invocation is not *quite* as cheap as it could be | 12:54 | ||
so the added cost of one more check is not that significant | |||
robertle | but it's a tree of contiguous stack segments, right? not a tree of individual stack entries? | 12:55 | |
some lisp/scheme things have the latter, but that sounds prohibitively expensive | |||
brrt | contiguous segments. and we don't have a tree anymore (again, iirc) | 13:00 | |
because we moved that to an optimistic scheme wherein we'd only copy the stack to heap in the case of taking a continuation | 13:01 | ||
robertle | ah, so a closure means a full stack copy? | ||
I was trying to understand the paper by dybvig, where he explains how you can have closures with the environment mostly on the stack. too dumb though... | 13:03 | ||
brrt | not sure about a closure, but a continuation does | 13:10 | |
robertle | right! I think that is also what makes me wonder about the stack strategy: if you can do most things on the stack, that would allow you to get really fast in some languages. but if every push onto the stack involves checking if there is still space, then that introduces a lot of checks... | 13:12 | |
jnthn | At present, we try to allocate invocation records in a contiguous buffer. This is used for frames that don't "escape" - that is, end up referenced from the heap. If that happens, they get promoted to GC-managed objects (and, critically, get the generational treatment, which turns out to be very important for programs that have large numbers of closures or continuations held at once) | 13:23 | |
We also note which frames have a tendency to get promoted anyway, and allocate those directly with the GC in the future | |||
robertle | ok, but how do you grow that contiguous buffer? and how do you know when to grow it? | 13:24 | |
jnthn | We don't grow it, we keep a linked list of them | 13:25 | |
robertle | ok, and each has a fixed size? | ||
jnthn | It's done with a bounds check | ||
Yes | |||
robertle | ok, understood | ||
jnthn | You'd need to recurse or have a pretty deep stack to fill the segment though | 13:26 | |
robertle | but this is just the method invocation "call stack", you don't use that as a general purpose work stack like native code does? | ||
jnthn | Meaning the branch is predictable, which means it isn't so bad | ||
robertle | right because the buffer is rarely full | 13:27 | |
jnthn | No, MoarVM doesn't use the "system stack" to represent the stack of the program it is running. Generally it runs very shallow on the real system stack. | ||
The only time that ever happens is with native callbacks | |||
When we don't really have a choice | |||
robertle | ok, I get that. what I meant was that this buffer is used for method invocation "frames", but you don't push/pop stuff onto it while executing a method? | 13:28 | |
timotimo | that's right | ||
robertle | k | ||
timotimo | we're register-based, but our registers are basically offsets into the frame size | ||
so no pushing and popping happens | |||
jnthn | Correct, we keep a register set | ||
Also worth noting that we do quite a lot of inlining | |||
robertle | so where do you spill to if you run out of regs? | ||
jnthn | We don't run out of regs. | 13:29 | |
Each frame specifies the number it needs in its metadata | |||
We just allocate that much space | |||
robertle | ok, get it. I think that's what guile does too | ||
jnthn | Of course, the JIT compiler has to worry about such things :) | ||
brrt can probably tell you what happens there, but it must be something that means the GC knows where we spilled to | 13:30 | ||
robertle | ok, great food for thought. thanks! | 13:33 | |
jnthn | There's no doubt lots of ways we can do better in all of this, fwiw. | 13:36 | |
As with most things, we're working under resource constraints, so "how quickly can we implement X" is often a design consideration too :) | 13:37 | ||
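The scheme jnthn and timotimo describe above (fixed-size buffers kept in a linked list, one bounds check per invocation, registers addressed as offsets into the frame) can be sketched roughly as follows. This is a minimal illustration in Python, not MoarVM's code: `Segment`, `CallStack`, `SEGMENT_SLOTS`, `set_reg` and `get_reg` are hypothetical names, and promotion of escaping frames to GC-managed objects is left out.

```python
# Hypothetical sketch: fixed-size frame segments kept in a linked list.
# None of these names come from MoarVM; they only illustrate the idea.

SEGMENT_SLOTS = 8  # tiny on purpose so the example overflows quickly


class Segment:
    def __init__(self, prev=None):
        self.slots = [None] * SEGMENT_SLOTS  # contiguous register storage
        self.used = 0                        # index of the next free slot
        self.prev = prev                     # link to the previous segment


class CallStack:
    def __init__(self):
        self.top = Segment()
        self.frames = []  # (segment, base, num_regs) for each active frame

    def push_frame(self, num_regs):
        """Reserve num_regs registers, the count a frame's metadata declares."""
        if num_regs > SEGMENT_SLOTS:
            raise ValueError("frame larger than a segment")
        # The single bounds check, performed once per invocation.
        if self.top.used + num_regs > SEGMENT_SLOTS:
            # Chain a new segment instead of growing (and moving) the old one.
            self.top = Segment(prev=self.top)
        seg, base = self.top, self.top.used
        seg.used += num_regs
        self.frames.append((seg, base, num_regs))
        return seg, base

    def pop_frame(self):
        seg, base, _ = self.frames.pop()
        seg.used = base
        # Drop a segment once it is empty, but always keep the first one.
        if seg.used == 0 and seg.prev is not None and seg is self.top:
            self.top = seg.prev

    def segment_count(self):
        count, seg = 0, self.top
        while seg is not None:
            count, seg = count + 1, seg.prev
        return count


def set_reg(frame, index, value):
    seg, base = frame
    seg.slots[base + index] = value  # a register is just base + offset


def get_reg(frame, index):
    seg, base = frame
    return seg.slots[base + index]
```

Because a full segment is never reallocated, frames already handed out do not move, which sidesteps the "stack moves around" problem robertle raises. The check in `push_frame` almost always takes the same branch, which is the predictability jnthn refers to.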
13:55
zakharyas joined
14:28
zakharyas joined
|
|||
jnthn | git push | 14:33 | |
d'oh :) | |||
Geth | MoarVM: da41e397f1 | (Jonathan Worthington)++ | src/6model/reprs/MVMSpeshLog.c Implement unmanaged_size in MVMSpeshLog repr This means the GC understands the amount of space it really takes, and so can trigger a full collection in a far more timely manner if we are doing nothing but accumulating spesh logs (why that happens is another issue, however). With this, the "leak" reported in Rakudo #1513 does at least reach an upper boundary and stop growing. Prior to this, since only the directly allocated memory of the spesh log was accounted for, it would have taken a very long time for the GC to decide enough had been promoted into gen2 to do a full collection (long enough for the memory use to grow giant). |
||
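The effect described in that commit message can be modelled with a toy collector that triggers a full collection once enough bytes have been promoted. This is only a sketch under simplifying assumptions: `ToyGC`, `Obj`, `FULL_COLLECT_THRESHOLD` and the byte counts are invented for illustration and do not reflect MoarVM's actual thresholds or API.

```python
# Hypothetical model of why reporting unmanaged memory matters to the GC.
FULL_COLLECT_THRESHOLD = 1000  # promoted bytes that trigger a full collection


class Obj:
    def __init__(self, managed, unmanaged, reports_unmanaged):
        self.managed = managed              # bytes allocated by the GC itself
        self.unmanaged = unmanaged          # bytes malloc'd behind the object
        self.reports_unmanaged = reports_unmanaged

    def unmanaged_size(self):
        # A repr without this hook effectively reports zero.
        return self.unmanaged if self.reports_unmanaged else 0


class ToyGC:
    def __init__(self):
        self.promoted_bytes = 0
        self.full_collections = 0

    def promote(self, obj):
        self.promoted_bytes += obj.managed + obj.unmanaged_size()
        if self.promoted_bytes >= FULL_COLLECT_THRESHOLD:
            self.full_collections += 1
            self.promoted_bytes = 0


def promotions_until_full_collection(reports_unmanaged):
    """Count how many 10+490 byte objects pile up before a full collection."""
    gc = ToyGC()
    count = 0
    while gc.full_collections == 0:
        gc.promote(Obj(managed=10, unmanaged=490,
                       reports_unmanaged=reports_unmanaged))
        count += 1
    return count
```

In this model, objects that report their unmanaged size trigger a full collection after 2 promotions, while objects that hide it need 100 promotions, by which point fifty times as much real memory is being held.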
dogbert2_ | .oO jnthn reclaiming memory | 14:40 | |
14:43
zakharyas1 joined
|
|||
jnthn | Yeah, figured out the spurious log entries too | 14:49 | |
timotimo | that might explain why the heap analyzer didn't account for everything in these | ||
jnthn | Indeed, it also uses that | ||
dogbert2_ | jnthn: do you think that your fix will affect github.com/MoarVM/MoarVM/issues/680 as well | 14:52 | |
jnthn | Doubtful | ||
Though they can't hurt | |||
dogbert2_ | I'm retesting that moving the variable declarations still impacts maxrss, guess I should test it with your patch as well | 14:53 | |
Geth | MoarVM: 004680a03a | (Jonathan Worthington)++ | src/spesh/log.h Don't spesh log if we have a spesh_cand This check will rule out most cases we shouldn't be logging nice and quickly. It also rules out some cases we did not before, namely that where we performed OSR. That meant we had a spesh correlation ID in place (since the frame was entered through the non-specialized path initially), resulting in the frame wrongly being considered logged beyond being specialized and OSR'd. That in turn resulted in spurious spesh log entries, and was at the root of the memory growth issue in Rakudo #1513. |
14:58 | |
dogbert2_ notes that the mysterious change of maxrss when running the gist in github.com/MoarVM/MoarVM/issues/680 remains, i.e. moving out declarations of @tags and @commits from the loop | 15:09 | ||
dogbert2_ original code has a maxrss of 531128k while the modified code stays at 327884k | 15:11 | ||
15:11
zakharyas joined
|
|||
jnthn | Yeah, it's an interesting observation | 15:12 | |
Hm, the bug is filed against MoarVM but I don't know it's going to turn out to be there | |||
dogbert2_ | perhaps it should be moved | 15:13 | |
jnthn | Well, doesn't matter in a sense | 15:14 | |
15:14
travis-ci joined
|
|||
travis-ci | MoarVM build failed. Jonathan Worthington 'Don't spesh log if we have a spesh_cand | 15:14 | |
travis-ci.org/MoarVM/MoarVM/builds/340521536 github.com/MoarVM/MoarVM/compare/d...4680a03a0c | |||
15:14
travis-ci left
|
|||
jnthn | Yowser | 15:15 | |
That fix has made the expr JIT very reliably explosive, it seems | |||
nwc10 | this looks just like what I'm getting with a (gcc) ASAN build | 15:16 | |
src/spesh/log.c:152:41: runtime error: member access within null pointer of type 'struct MVMSpeshLog' | |||
jnthn | Yup, and MVM_JIT_EXPR_DISABLE=1 seems to help | ||
Why on earth would the JITted code be trying to do something with the spesh log, though?! | 15:17 | ||
nwc10 | I was going to ask earlier "why do I seem to be in a minority of one?" | ||
brrt | o.O | ||
jnthn | I don't know how the above change, short of resulting in less polluted spesh data, could cause that change | 15:18 | |
nwc10 | for me culprit(s) seem to be MoarVM commit 0e737146b73d994d9bd38208088771deb4dd6f4d or its parent | ||
and yes, with MVM_JIT_EXPR_DISABLE=1 I can build | |||
(not yet finished, but past that SEGV) | |||
brrt | hmmm | 15:19 | |
jnthn | Yeah, the commit at HEAD seems to make it trip up over the expr jit bug a lot more | 15:23 | |
Making sure of that | 15:24 | ||
15:25
AlexDaniel joined
|
|||
jnthn | Yes, HEAD~1 completes the NQP build | 15:25 | |
HEAD trips over the EXPR JIT | 15:26 | ||
The only thing it could be doing is making the spesh log contain less junk | 15:27 | ||
brrt | and thereby making it compile more frames and breaking faster | ||
jnthn | Yeah | ||
So we...get worse because we got better :P | |||
brrt | it's probably a good thing though | 15:28 | |
jnthn | Well yes, in that it gives you a very ready supply of reproductions :) | ||
brrt | nwc10 has been consistently reporting this problem and i've consistently not been able to find anything | ||
indeed | |||
15:32
zakharyas joined
|
|||
jnthn | dogbert2_: I can reproduce #680 | 15:35 | |
dogbert2_: As well as the effect of moving the decls | |||
To the heap analyzer! | 15:36 | ||
dogbert2_ | hooray :) | ||
so now it's time for the heap analyzer to show what it's made of :) | 15:37 | ||
nwc10 | software :-( | ||
jnthn | SEGV :P | 15:38 | |
Well, the analyzer not, but the snapshot mechanism apparently :/ | |||
ooh | 15:40 | ||
That's a silly typo, and it may have been around for a year or more | 15:41 | ||
15:43
unicodable6 joined
|
|||
Geth | MoarVM: cf523c89c0 | (Jonathan Worthington)++ | src/profiler/heapsnapshot.c Test the current thread's frame in heap snapshot Fixes a bug that can in the best case cause a SEGV (which is how I discovered it), and in the worst case lead to missing data in the report. |
15:44 | |
[Coke] | oops. :) | 15:49 | |
jnthn | huh, what... | 15:54 | |
The heap snapshot came out as a binary file, but the analyzer doesn't read that? | |||
Just complains about invalid utf-8 | 15:55 | ||
That's after uninstalling and reinstalling it | |||
Even installing the latest version from the repo doesn't help | 15:59 | ||
dogbert2_ | jnthn: have you checked commit de6dceda8102fab4b58ebe03 | 16:00 | |
jnthn | huh what, I added a line to print out the exception and it worked?! | ||
dogbert2_: Which repo? | 16:01 | ||
dogbert2_ | i.e. MoarVM, title is 'Merge branch 'heapsnapshot_binary_format'' | ||
jnthn | Yeah, I'm not sure what's going on, but did at least now get it to load the snapshot | 16:02 | |
dogbert2_ | cool, the solutions is getting closer and closer | ||
*solution | 16:03 | ||
16:03
travis-ci joined
|
|||
travis-ci | MoarVM build failed. Jonathan Worthington 'Test the current thread's frame in heap snapshot | 16:04 | |
travis-ci.org/MoarVM/MoarVM/builds/340543283 github.com/MoarVM/MoarVM/compare/0...523c89c06a | |||
16:04
travis-ci left
|
|||
jnthn | Hm, didn't really reveal what I was thinking | 16:04 | |
ah, ok, now it does | 16:05 | ||
I added a class LeakTracer {} and then a my $x = LeakTracer.new in the loop | 16:06 | ||
And there are as many instances of that in the final snapshot as there are iterations | |||
Also, darn, the binary format loads faster at least, when it works :P | 16:07 | ||
ahhh | 16:08 | ||
timotimo | <3 | ||
jnthn | gist.github.com/jnthn/c56ddd837508...c49e1031c6 | 16:09 | |
So the timer (from Promise.in) stays active, because there's not a cancellation mechanism for a Promise | |||
It takes a closure | 16:10 | ||
Uh, refs a closure | |||
Which is the timer callback | |||
timotimo | ooooooh | ||
jnthn | and the snapshot tells the rest of the story | ||
uh, path even | |||
timotimo | wow, so any react or supply that has a promise.in will keep around everything reachable for that particular whenever through its call stack? | 16:12 | |
jnthn | Yup | ||
timotimo | oh, is that only if the promise.in doesn't actually resolve? i.e. if the react shuts down before that? | 16:13 | |
jnthn | Right, because there's currently no cancellation mechanism on a Promise | ||
timotimo | so would we mix in a cancel method to some where we know we can do it and have a .^can in the react implementation? | 16:16 | |
not sure if we can have something sensible for start blocks; if a task is currently awaiting it could throw an exception like in java, but i think that'll lead to some rather ugly code | 16:17 | ||
dogbert2_ | removed the promise.in code from the original gist, maxrss 255360k | 16:18 | |
timotimo | passing a callback to be called on cancellation might be a way, but that'll lead to lots of boilerplate for signalling across that the work is supposed to be done | ||
jnthn | Yeah | 16:19 | |
this change | |||
- whenever Promise.in(10) { | |||
+ whenever Supply.interval(10, 10).head { | |||
Eliminates the leak | |||
Well, "leak" in that we don't actually lose track of memory | |||
We just keep it around a good bit longer than needed | 16:20 | ||
So, this is very much not a MoarVM issue | |||
Actually a language design issue :) | |||
So, who's the concurrency designer? :P | |||
timotimo: fwiw, I think a Promise::Cancellable subclass of Promise could be a way to go. It'd just have an overridden method Supply that maps tap close to the cancellation | 16:21 | ||
timotimo | "good bit longer"; do we eventually reclaim those closures/continuations? | ||
jnthn | Yes | ||
timotimo | oh, when the time elapses? | 16:22 | |
jnthn | Right | ||
Will make sure of it now, but that matches the data I saw | |||
timotimo | i had imagined it a lot worse in my head :) | ||
but it also explains why moving the array outside fixes things; the closures all refer to the same array and old data is overridden every time | |||
jnthn | Because my first attempt showed only one LeakTracer instance | ||
Right. :) | 16:23 | ||
To get all of the instances, I had to shorten the time the program ran for | |||
By making it collect less data | |||
oh, another way to verify this | |||
Bump up to Promise.in(40) and see if we end up using even more memory | 16:24 | ||
Hm, curiously not | 16:25 | ||
timotimo | not enough iterations? or does it run forever? | ||
jnthn | It runs for 60s | ||
I made the Promise.in 100s now | |||
So "never" | |||
Hm, curious. Doesn't have quite the impact I expected | 16:26 | ||
Which means it may actually be as bad as timotimo feared | 16:28 | ||
timotimo | that it never actually reclaims it at all? | ||
jnthn | yeah | ||
I need to check in a few more places | |||
timotimo: yes, it was that bad :S | 16:36 | ||
16:36
shareable6 joined
|
|||
jnthn | Either Rakudo can do the cancel itself after a one-shot timer fires, or we can just clean it up in MoarVM | 16:37 | |
16:37
benchable6 joined
|
|||
jnthn | I've done a patch for the second | 16:37 | |
It's still more maxrss than replacing it with Supply.interval(10, 10) | |||
But it's a lot less than it was | |||
jnthn spectests | 16:38 | ||
japhb | jnthn: BTW, is today the beginning of your grant work already? (If so, AWESOME BTW) | 16:39 | |
jnthn | Yes :) | ||
Decided to start out with some leak hunting :) | |||
japhb | I think that's an excellent choice. :-) | ||
timotimo | jnthn++ # grant request approved | 16:42 | |
i'm also glad my early work on the heap analyzer has already made working more comfortable for jnthn :) | 16:43 | ||
jnthn | d'oh, my fix busts some tests | ||
[Coke] is reminded he has many grant related things to post tonight. :| | |||
(which is :| only because it's work for me. :) | 16:44 | ||
timotimo | i'm still bummed i haven't properly started my grant work yet, but the apartment search and subsequent move - which is not actually finished in any way yet - have left me pretty much drained of energy | 16:55 | |
[Coke] | timotimo: please note that the GC has rules about long running grants with no progress. :( | ||
Geth | MoarVM: c6519f4c32 | (Jonathan Worthington)++ | src/io/timers.c Clean up one-shot timers after firing Otherwise, we will end up holding on to the callback functions for them, which is a memory leak. We could in theory have solved the issue by making Rakudo do the cancellation upon first firing also, but this feels a tad more robust. |
16:56 | |
[Coke] | (for the kind that go through the voting part of the GC) | ||
jnthn | So there was a MoarVM issue in github.com/MoarVM/MoarVM/issues/680 after all | 16:57 | |
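The leak fixed in c6519f4c32 comes down to a timer table that keeps referencing a callback closure after the one-shot timer has fired, so everything the closure captured stays reachable. A minimal Python sketch of that pattern follows; it is not the code in src/io/timers.c, and `TimerQueue`, `Payload`, `make_callback` and the `release_after_fire` switch are hypothetical names introduced for illustration.

```python
import gc
import weakref


class Payload:
    """Stands in for data a closure captures (such as @tags and @commits)."""


def make_callback(payload):
    def callback():
        return payload  # the closure keeps payload reachable
    return callback


class TimerQueue:
    def __init__(self, release_after_fire):
        self.release_after_fire = release_after_fire
        self.timers = []

    def one_shot(self, due, callback):
        self.timers.append({"due": due, "callback": callback, "fired": False})

    def advance(self, now):
        for timer in list(self.timers):
            if timer["fired"] or timer["due"] > now:
                continue
            timer["callback"]()
            timer["fired"] = True
            if self.release_after_fire:
                # Forget the timer, and with it the last reference to the closure.
                self.timers.remove(timer)


def payload_survives(release_after_fire):
    """Return True if the captured payload is still alive after the timer fired."""
    queue = TimerQueue(release_after_fire)
    payload = Payload()
    ref = weakref.ref(payload)
    queue.one_shot(due=10, callback=make_callback(payload))
    del payload            # only the timer's closure refers to it now
    queue.advance(now=10)  # the timer fires
    gc.collect()
    return ref() is not None
```

With `release_after_fire=False` the payload outlives the firing for as long as the queue exists, which mirrors the "never actually reclaims it" behaviour found above; with `True` it becomes collectable as soon as the timer has fired.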
timotimo | [Coke]: should not be a problem once work starts, though, right? | 16:58 | |
[Coke] | timotimo: except it's been 4 months since the grant was awarded. | 17:02 | |
timotimo | can you point me at the rules in question? | 17:03 | |
[Coke] | We're discussing it on the GC list now, obv. Alan will reach out to you if needed. | 17:04 | |
timotimo | ah | ||
[Coke] | www.perlfoundation.org/rules_of_operation - Linked to off the main nav on that site.; section 2.6 | ||
timotimo | yeah, that's a sensible rule | 17:07 | |
17:09
hoelzro_ joined
17:14
travis-ci joined
|
|||
travis-ci | MoarVM build failed. Jonathan Worthington 'Clean up one-shot timers after firing | 17:14 | |
travis-ci.org/MoarVM/MoarVM/builds/340577255 github.com/MoarVM/MoarVM/compare/c...519f4c32d9 | |||
17:14
travis-ci left
17:20
statisfiable6 joined
17:36
dogbert17 joined
|
|||
dogbert17 | jnthn++: very nice, the unmodified script now shows a maxrss of 206364k, which is less than half of what it was before | 17:38 | |
jnthn | yay :) | 17:39 | |
[Coke] | jnthn++ | 17:53 | |
jnthn wanders home | 18:02 | ||
18:04
zakharyas joined
18:06
zakharyas joined
19:23
squashable6 joined
|
|||
nine | :q | 19:43 | |
20:21
zakharyas joined
20:38
bart__ joined
|
|||
brrt | good * | 20:40 | |
i also get breakage in nqp build | |||
however, i don't get a breakage when MVM_SPESH_BLOCKING=1 | 20:43 | ||
it is also sensitive to MVM_JIT_EXPR_DISABLE=1 | 20:45 | ||
w.t.f | |||
aha, that's interesting | 20:49 | ||
where do we insert the MVM_spesh_log_static things? | 20:51 | ||
timotimo | you mean getlexstatic_o? | 20:55 | |
brrt | oh | ||
is that a thing | |||
timotimo | it is | ||
brrt | hang on a minute | ||
timotimo | we gen that op instead of getlex if we know something is not going to change, or something like that | 20:56 | |
Geth | MoarVM: a01cdb449c | (Bart Wiegmans)++ | 2 files Disable getlexstatic_o for the time being This breaks the NQP build, but only when MVM_SPESH_BLOCKING isn't set. No idea why yet. |
20:59 | |
brrt | i see | 21:05 | |
anyway, the template looks good, so i'm curious where the jit fails | 21:06 | ||
timotimo | hm, do we actually log anything when the frame is already jitted? | 21:07 | |
brrt | don't know | 21:09 | |
but i do note that this case looks suspiciously like the sp_p6ogetvt_o fail | 21:10 | ||
and i hope the cause is the same | |||
timotimo | ah, i already forgot what went wrong with that one | ||
21:10
ChanServ joined
|
|||
brrt | one thing that was wrong, but not *the* thing interestingly, is that we wouldn't allocate registers for live ranges created during the tile rollup process | 21:11 | |
prescription for linear scan allocation is to iterate by popping-off the heap | |||
doesn't work when you have individual instructions that need caretaking by the register allocator | |||
like CALL, for one thing | 21:12 | ||
so, we iterate over each tile (instruction) as well | |||
and because we can allocate the last live range before running out of tiles to process | |||
we could miss processing tiles, and i had a 'rollup' loop to process all the last tiles | |||
but, if those last tiles would then create new live ranges (because of spilling...), those then wouldn't be processed | 21:13 | ||
so... altogether, i fixed that by having a single loop do both things | |||
and either proceed on tiles, or proceed on the live ranges | 21:14 | ||
this works, except, it doesn't actually fix the thing that was broken, which is Something Else that I don't know about just yet | |||
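The control-flow issue brrt describes can be shown with a deliberately simplified model: live ranges are reduced to their start positions in a min-heap, tiles are just indices, and a tile may spawn a new live range (as a spill would). This is a hypothetical sketch of the two loop shapes only, not the expr JIT's allocator; `process_tile`, `two_loop`, `single_loop` and the `spawns` table are invented for illustration.

```python
import heapq


def process_tile(index, spawns, heap):
    """Handle one tile; some tiles create a new live range (e.g. by spilling)."""
    for start in spawns.get(index, []):
        heapq.heappush(heap, start)


def two_loop(range_starts, num_tiles, spawns):
    """Pop live ranges off the heap, then 'roll up' the remaining tiles."""
    heap = list(range_starts)
    heapq.heapify(heap)
    allocated, tile = [], 0
    while heap:
        start = heapq.heappop(heap)
        while tile < num_tiles and tile < start:
            process_tile(tile, spawns, heap)
            tile += 1
        allocated.append(start)
    # Rollup: the heap ran dry, but tiles are left over.
    while tile < num_tiles:
        process_tile(tile, spawns, heap)  # may push ranges that nobody pops
        tile += 1
    return allocated


def single_loop(range_starts, num_tiles, spawns):
    """One loop that proceeds either on a live range or on a tile."""
    heap = list(range_starts)
    heapq.heapify(heap)
    allocated, tile = [], 0
    while heap or tile < num_tiles:
        if heap and (tile >= num_tiles or heap[0] <= tile):
            allocated.append(heapq.heappop(heap))
        else:
            process_tile(tile, spawns, heap)
            tile += 1
    return allocated
```

With live ranges starting at 0 and 2, six tiles, and tile 4 spawning a range at position 5, the two-loop shape allocates only `[0, 2]` while the single loop also reaches the late range and yields `[0, 2, 5]`.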
21:16
travis-ci joined
|
|||
travis-ci | MoarVM build passed. Bart Wiegmans 'Disable getlexstatic_o for the time being | 21:16 | |
travis-ci.org/MoarVM/MoarVM/builds/340678599 github.com/MoarVM/MoarVM/compare/c...1cdb449c96 | |||
21:16
travis-ci left
|
|||
brrt | thanks timotimo++, wouldn't have found it otherwise | 21:17 | |
timotimo | hm? | 21:20 | |
brrt | disabling getlexstatic_o | 21:22 | |
timotimo | oh? i literally just grepped interp.c for that function :) | 21:47 | |
22:18
greppable6 joined
22:31
Kaiepi joined
23:09
dogbert2 joined
|