japhb 2 MB seems like a nice savings to me ... 00:03
Or am I misreading?
jnthn No, you're not, though we're talking about on CORE.setting :) 00:05
We'll save some CPU too though, I imagine
timotimo aye, the time fetching the strings indirectly and then comparing them ... that's gotta be costly
mostly the fetching 00:06
TimToady compiling src/mast/compiler.o
src/mast/compiler.c: In function ‘form_bytecode_output’:
src/mast/compiler.c:1255:53: error: ‘MVMThreadContext’ has no member named ‘str_consts’
that's at HEAD
timotimo oh, why was that thing called "vm"?
hold on. 00:07
jnthn Didn't you, like, at least try to compile your change? :P
timotimo some of these macros take a "vm" argument, but use "tc" instead
dalek MoarVM: 9154ac3 | (Timo Paulssen)++ | src/mast/nodes_moar.h:
this argument was called "vm" misleadingly ...
00:08
00:50 cognome joined 01:46 FROGGS_ joined 03:18 jimmyz joined
jimmyz Stage parse : 34.374, before ~36s Stage mast : 11.899, before ~13s 03:19
since yesterday
xiaomiao death by a thousand papercuts :) 03:43
04:21 ventica joined 07:00 ventica joined
nwc10 m: say 6.243e+01/6.636e+01; say 6.243e+01/6.597e+01 07:17
camelia rakudo-moar a1a236: OUTPUT«0.940777576853526␤0.946339245111414␤»
nwc10 jnthn: about 6% less time than mid yesterday's slight slowdown, and 5.5% less than yesterday morning. I think 07:18
jnthn: but I do need to be careful, as building parrot tanks the machine performance, because it coredumps twice, filling the disk sufficiently to make it slow
setting *with* -flto 6.257e+01 08:44
setting without -flto 6.243e+01
so, it makes things fractionally slower, but within the noise 08:45
so, not a free win
at least, not for setting building
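The percentages quoted above can be re-derived from the raw timings. This is only a sketch: the variable names are made up for illustration, and the mapping of 6.636e+01 to "mid yesterday" and 6.597e+01 to "yesterday morning" is an assumption based on the order of nwc10's remarks.

```python
# Setting-build timings in seconds, copied from the messages above.
# The labels on the two older runs are an assumption (see lead-in).
without_flto = 62.43       # latest run, built without -flto
with_flto = 62.57          # same build with -flto
mid_yesterday = 66.36
yesterday_morning = 65.97


def percent_change(new, old):
    """Signed percentage change of `new` relative to `old` (negative means faster)."""
    return (new / old - 1.0) * 100.0


print(f"vs mid yesterday:     {percent_change(without_flto, mid_yesterday):+.2f}%")
print(f"vs yesterday morning: {percent_change(without_flto, yesterday_morning):+.2f}%")
print(f"-flto vs no -flto:    {percent_change(with_flto, without_flto):+.2f}%")
```

The last figure is a fraction of a percent, which fits the "within the noise" reading.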
timotimo thank you for measuring! 08:53
jnthn afternoon, #moarvm 11:17
FROGGS_ hi all 11:23
dalek MoarVM: c350fe0 | jnthn++ | src/6model/serialization.c:
Toss dead macro.
11:42
timotimo oh hey jnthn 12:26
did you see the performance of "for" benchmarks is still less than it was at our last release?
jnthn Hmm 12:38
Oh, because they stupidly put parens around their ranges.
timotimo %)
again? :D
time to fix that exact thing a second time
jnthn And the code-path that stripped away such things is applied after the opt
timotimo yeah, phase ordering problem yada yada 12:39
jnthn Heh. "Don't write superstitious parens; it'll make your code slower" isn't such a bad thing :P
timotimo %) 12:40
if i want to specialize smart_numify and smart_strify in the case where they cannot directly unbox, i'd have to do a method call on them; would i need to somehow find a correct callsite to give to prepargs? 12:41
or is there a prepargs-less method for invocation?
jnthn No, always need that.
timotimo in that case i'll do the "can unbox, yay" optimizations first and ignore the method call ones for now 12:42
hum. smart_strify only tries to unbox a str if the object is concrete ... so i have to test for that fact, too 12:49
timotimo wonders if brrt can work on moar-jit today 12:52
wow, the smrt_strify -> unbox opt seems to trigger *very* often 12:54
(only tested nqp so far) 12:55
ah, damn, rakudo compilation seems to stumble over it
LOL 13:01
damn you, debug output
since gen-cat has been dogfooded, it put a whole bunch of "optimized a call! yay!" lines into the resulting source code %) 13:02
jnthn Yes, fprintf(stderr,...) advised :P 13:04
timotimo that is very true 13:10
hm. i don't suppose we have a spesh opcode (or general opcode, really) to just take a pointer to MVMObject, reinterpret it as a pointer to MVMString and put it into a register's .s? 13:16
like sp_get_s without the first pointer dereferencing and with an offset of 0
jnthn Um, no, and that sounds dangerous 13:20
What do you want it for?
timotimo smart_strify checks the reprid of the object and if it is MVMString, it just casts 13:22
jnthn That...uh...should never actually occur
timotimo in that case, i'll just remove that case from smrt_strify itself :)
jnthn Can you try removing that case? 13:23
I really hope we don't rely on it.
timotimo sure
maybe i should have put a printf in there instead of removing it 13:28
oh 13:29
it would throw a "cannot stringify this" exception
that's fine, then
doesn't occur anywhere in rakudo's build
does that seem good enough for me to commit the patch?
jnthn Hm, well, spectest is nice but maybe do that after your other improvements 13:31
timotimo will do
13:44 cognome joined, cognominal joined 13:46 cognominal joined
timotimo 29.947 :D 13:47
sadly, the "advanced" smrt_strify cases don't seem to get triggered by either the build or "make test"
will spectest now.
did i understand correctly that we can't currently put new method calls into spesh'd code? 13:50
jnthn I think perhaps we can, it's just tricky to deal with callsite stuff 13:51
timotimo if the interface was more evilness-friendly, i could directly try to inline the target method :P
jnthn But we need to look at that a bit anyway
Well, no, we should just emit the method call and then let the normal logic look over it to decide if it's inlinable.
timotimo but then i'd also have to find a proper spesh candidate and all that
aye
jnthn Composition always beats hacks. 13:52
timotimo i wasn't very serious about that :)
oh, lock.rakudo.moar crashes?
jnthn Hmm
Try it again just in case it's a one-off?
timotimo was about to
jnthn needs to do more stress testing on that sort of stuff
timotimo waiting for it to finish first 13:53
the failure from combinations.t was already reported in #perl6 by lizmat
although it seems like i have a crash and they had a "not ok" 13:54
spectests are fine 13:57
i (or the test harness) misinterpreted an exit(1) as a crash
oh, d'oh
i didn't save the changes in coerce.c ...
jnthn fail 13:58
dalek MoarVM: 35687e1 | (Timo Paulssen)++ | src/core/coerce.c:
we should never depend on this working.
14:14
MoarVM: cfafc8d | (Timo Paulssen)++ | src/spesh/optimize.c:
some smrt_strify can be optimized into simpler ops

like unboxing a string or unboxing num/int and coercing.
timotimo i guess smrt_numify may be worth even more, as it's probably commonly used instead of elems on arrays and hashes 14:31
jnthn yeah 14:32
nwc10 timotimo: did you mean to commit fprintf(stderr, "spesh'd a smrt_strify to unbox and coerce a %d\n", register_type); 14:35
and the other sprintf? 14:36
timotimo er, no 14:38
:)
dalek MoarVM: 255b466 | (Timo Paulssen)++ | src/spesh/optimize.c:
didn't mean to keep the debug output around
jnthn Seems you forgot to release the temp reg at the end? 14:43
timotimo ah, yes. will fix that in a bit 14:49
actually, why not right now 14:52
(i blame the heat)
dalek MoarVM: 8a9fd7f | jnthn++ | src/mast/compiler.c:
Toss unused field.
14:54
MoarVM: 681ec90 | jnthn++ | src/mast/compiler.c:
Extract label handling code into functions.

Tidies the code, and will make the upcoming refactor a little easier.
MoarVM: a56d606 | jnthn++ | src/mast/compiler.c:
Switch over to using label identity for matching.

Means we can eliminate a couple of hashes, but also that labels will no longer need to have a unique name generated.
MoarVM: 82ca33d | jnthn++ | src/mast/nodes_moar.h:
Remove name from MAST_Label; now unused.
15:08
timotimo hm. my numify → elems + coerce_in opt doesn't seem to be correct :\ 15:11
dalek MoarVM: 4656e18 | jnthn++ | lib/MAST/Nodes.nqp:
Remove name from MAST::Label and its constructor.

Breaking API change; requires NQP and Rakudo updates.
15:14
timotimo oh, it could be that the call gets tossed by the unused optimization?
jnthn timotimo: Is that one you've committed?
timotimo not yet
jnthn OK, good...I need to bump
timotimo this time i'm testing properly before i commit! :P
jnthn Yes, you probably should be setting usages up on things you add. 15:15
timotimo on temp registers, too?
jnthn Yes
timotimo that may explain it :)
jnthn Dead code elimination will happily kill instructions involving temp registers too :)
timotimo that fixed it, yay 15:17
this optimization runs often
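The bug discussed above (a freshly inserted instruction vanishing because its temp register had no recorded usage) can be illustrated with a toy model. This is a sketch only: `Ins`, `rewrite_numify`, `eliminate_dead` and the `usages` dict are hypothetical stand-ins, not MoarVM's spesh data structures; only the op names `elems` and `coerce_in` come from the conversation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Ins:
    op: str
    dst: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]     # registers read


def rewrite_numify(obj: str, result: str, tmp: str,
                   usages: Dict[str, int], bump_usage: bool) -> List[Ins]:
    """Replace a numify of `obj` with elems + coerce_in via a temp register."""
    if bump_usage:
        # Record that coerce_in reads the temp register.
        usages[tmp] = usages.get(tmp, 0) + 1
    return [Ins("elems", tmp, (obj,)), Ins("coerce_in", result, (tmp,))]


def eliminate_dead(instructions: List[Ins], usages: Dict[str, int]) -> List[Ins]:
    """Drop instructions whose destination has a recorded usage count of zero."""
    return [ins for ins in instructions
            if ins.dst is None or usages.get(ins.dst, 0) > 0]


def surviving_ops(bump_usage: bool) -> List[str]:
    usages = {"result": 1}    # the original numify result is read later
    code = rewrite_numify("obj", "result", "tmp", usages, bump_usage)
    return [ins.op for ins in eliminate_dead(code, usages)]


print(surviving_ops(False))   # elems is removed, leaving coerce_in reading garbage
print(surviving_ops(True))    # both instructions survive
```

The elimination pass trusts the usage counters rather than rescanning the code, so any pass that adds a read has to keep the counters in sync, temp registers included.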
nwc10 jnthn: "works" on "my" machine - 2 spectests currently aren't clear 15:46
or are flapping
15:51 zakharyas joined
dalek MoarVM: 84b5348 | (Timo Paulssen)++ | src/spesh/optimize.c:
release the temp register at the end.
15:53
MoarVM: acaa897 | (Timo Paulssen)++ | src/spesh/optimize.c:
spesh smrt_numify, bump usage counter of temp reg.

this triggers especially often in combination with a MVMArray or MVMHash repr'd object and gives us a
  (usually optimized) elems call + a coerce_in
timotimo after a spectest and a shower i feel confident pushing this
nwc10: yeah, they are for me, too. but when i run them manually, they succeed :(
a moar-jit with current master merged is at 30.5 seconds stage parse for me 16:03
that's hardly any worse than master alone.
jnthn Yeah. Thing is, when I tried the JIT on various hot loop stuff - even with objecty code - I was seeing a 50% or so win. 16:08
timotimo did you try counting how often we deopt in the core setting compilation? 16:09
jnthn Yeah. Quite a lot. 16:10
timotimo or should i try that while you do more awesome optimization stuff? :3
jnthn And then I managed to reduce it a good bit
But it's not that costly so far as I can tell
timotimo how often compared to jumping into jitted code?
jnthn In fact, deopt from JIT is cheaper than from interpreter in terms of the cost of the deopt itself.
timotimo ah, that count was before the recent work
jnthn Didn't count how often we run JITted code
timotimo fair enough, but if we deopt all the damn time, we'll end up interping all our code instead of running the jitted code :) 16:11
jnthn Yeah.
timotimo we could probably generate code in the jit output that counts how many opcodes we executed before we bailed due to deopt 16:12
or we could postpone that to a bit later 16:13
did brrt say he'd be AFK
all weekend?
nwc10 timotimo: sometimes that means that they are badly written. IIRC one failing was to assume the current directory 16:14
jnthn Think he said he was busy this weekend, yeah
timotimo ah, ok
then i don't need to wonder what's up
turns out, that smart numify/stringify were already implemented in the jit anyway
but i bet the spesh'd solution ends up cheaper in good cases
does it make any sense to spesh away a "not" instruction after an instruction where we know how to negate the result by choosing another instruction? like an isnull + not_i could be just isnonnull 16:16
nwc10 that sounds like bad codegen 16:17
timotimo .o( because the jit doesn't not_i yet )
nwc10 however, I guess that those sorts of sequences can appear as the result of inlining 16:19
timotimo not only that 16:20
nwc10 so, "how often?" and "how costly?" "how much benefit?"
timotimo every time the isnull is the result of one operation and the not_i is the result of another ...
the jit bails out of 36 frames in the core setting because it sees not_i
oh, many more of those are actually isnull_s 16:21
more than isnull itself
jnthn isnull_s and isnull should compile into the same assembly, surely.
oh, no, wait
They won't because of the VMNull thing. 16:22
timotimo at least they are already both implemented :) 16:23
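The isnull + not_i idea floated above amounts to a small peephole rewrite. The following is a rough sketch under simplifying assumptions: instructions are plain `(op, dst, srcs)` tuples in one straight-line block, only adjacent pairs are considered, and the function name `fuse_isnull_not` is invented for illustration. It is not how spesh represents or walks code.

```python
from collections import Counter


def fuse_isnull_not(instructions):
    """Fuse `isnull t, x` followed by `not_i r, t` into `isnonnull r, x`.

    The pair is only fused when `t` is read by nothing except the not_i,
    since otherwise the isnull result is still needed.
    """
    reads = Counter(src for _, _, srcs in instructions for src in srcs)
    out = []
    i = 0
    while i < len(instructions):
        op, dst, srcs = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (op == "isnull" and nxt is not None and nxt[0] == "not_i"
                and nxt[2] == (dst,) and reads[dst] == 1):
            out.append(("isnonnull", nxt[1], srcs))
            i += 2            # consume both instructions
        else:
            out.append(instructions[i])
            i += 1
    return out


block = [
    ("isnull", "r1", ("obj",)),
    ("not_i", "r2", ("r1",)),
    ("if_i", None, ("r2",)),
]
print(fuse_isnull_not(block))
```

Whether this is worth doing still depends on the questions nwc10 raises: how often the pair shows up and how much each occurrence costs.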
you don't happen to know of some somewhat low-hanging optimization i could look at next? :) 16:42
i suppose if i am to implement some feature or ecosystem-related thing instead it'd end up being "gui frontend for the debugger", which will yak-shave-reduce to "improve GTK::Simple" 16:44
jnthn How's GTK::Simple doing these days?
timotimo it displays windows, buttons and labels :P
it's kinda hard to tell what's still in scope for GTK::Simple and what isn't 16:45
and how to move things into separate modules while still maintaining compatibility between the things
though since we got NativeCast now, there's no need to have the same class repr the OpaquePointer we're playing with
jnthn It's not LHF, but I have pondered that CAPHASH may want to cease to exist, and we build Match objects more directly out of $!cstack 16:47
We'd have to implement building Rakudo's ones too
So we get less code re-use...but Match object construction is so hot path that building an intermediate data structure every time is kinda costly. 16:48
Especially given the intermediate data structure is a hash 16:49
And hash lookups are one of the things we spend most time doing in CORE.setting compilation.
I'm not sure it's LHF, but it is at least "just" NQP and Perl 6 code to write :)
timotimo oof 16:52
commute & 16:58
i'll have a look later :)
17:21 ventica joined 17:53 cognome joined
timotimo now i've finished the commute and also some grocerisation 17:54
FROGGS jnthn: that CAPHASH removal sounds like awesome 17:59
japhb jnthn: still backlogging, so this may be resolved, but: The "superstitious parens in for loops" in the benchmarks were for three reasons: 1. Because it helps align with perl5, so I can visually see if I've typoed, 2. Because Perl 5 converts will accidentally do this *all the time*, and 3. Because it really shouldn't matter for performance, so if it does, I call that a bug worth catching. :-) 18:05
nwc10 m: say 6.1774e+01/6.243e+01; say 6.1774e+01/6.597e+01 18:57
camelia rakudo-moar e036e2: OUTPUT«0.989492231299055␤0.936395331211156␤»
nwc10 so 1% less than last time, and 6.4% less than yesterday morning 18:58
timotimo what happened since last time
?
nwc10 I don't know.
once upon a time it was "this week". Right now, it seems to be "this hour" 18:59
I guess, really, it's "this morning"
jnthn Coulda been the labels improvements 19:01
Also timotimo++'s patches
nwc10 does perl6bench like it? 19:02
timotimo oh :3
jnthn perl6bench doesn't measure compilation time really
nwc10 these are mostly compilation time fixies? 19:03
er, fixes
they don't help more general code paths?
jnthn My labels thing was 19:04
timotimo's are more general.
japhb jnthn: perl6-bench does measure compile time for each test, it just subtracts it from the run time of the test ... or did you mean, the compile time for the compiler itself? 19:14
timotimo the latter, i believe
jnthn No, I meant for the test...OK, I guess what I shoulda said is "doesn't appear in the graphs" - which is the right thing in many senses. :) 19:15
Though it could be interesting to know about compile time improvements over time :)
japhb jnthn: Just turn off the compile time ignoring 19:16
--/ignore-compile and/or --/ignore-setup 19:17
(Because bench defaults both to on.)
Mind you, you'll then see the combination of compile and run time, so hmmm.
Maybe I need a plot mode where it just shows the compile time for each test. 19:18
(Since the compile time is in the timings file, it's just normally subtracted out at analysis time) 19:19
19:23 cognome joined 19:53 ventica joined 20:35 ventica joined 20:59 ilbot3 joined
jnthn *sigh* That took some doing... 22:36
dalek MoarVM: 3e8e534 | jnthn++ | src/6model/s (3 files):
Prepare for lazy deserialization.
22:37
MoarVM: 9539fcd | jnthn++ | src/6model/ (4 files):
Start storing serialization reader in the SCRef.

We'll need to keep it around for deserialization. Move cleanup to the SCRef GC.
MoarVM: 0c30c2b | jnthn++ | src/ (3 files):
Make "allocate in gen2" tracking reentrant.
MoarVM: 7a722dc | jnthn++ | src/ (4 files):
Switch deserialization to take place lazily.

Now things are only deserialized on "first touch". Unfortunately, we are very touchy, as little is set up to take advantage of this. Even before looking into using it better, however, it takes another 2.5MB off the base memory of Rakudo with CORE.setting loaded.
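The "first touch" behaviour described in that commit message can be pictured with a generic lazy-loading container. This is a minimal sketch of the general technique, not MoarVM's serialization context code: the `LazyContext` class, its JSON-encoded blobs and the `decoded_count` counter are all assumptions made for the example.

```python
import json

_UNLOADED = object()   # sentinel, so that a decoded value of None is still cached


class LazyContext:
    """Holds serialized blobs and decodes each one only when first requested."""

    def __init__(self, blobs):
        self._blobs = list(blobs)
        self._objects = [_UNLOADED] * len(self._blobs)
        self.decoded_count = 0          # how many blobs have been decoded so far

    def get(self, index):
        if self._objects[index] is _UNLOADED:
            self._objects[index] = json.loads(self._blobs[index])
            self.decoded_count += 1
        return self._objects[index]


ctx = LazyContext(['{"name": "Int"}', '{"name": "Str"}', 'null'])
print(ctx.decoded_count)    # nothing decoded at load time
print(ctx.get(1)["name"])   # first touch decodes only entry 1
print(ctx.decoded_count)
```

The saving depends on how many entries are never touched, which is why the commit message notes that the code is currently "very touchy".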
timotimo nice :) 22:59
dalek MoarVM: 58fdbb2 | jnthn++ | src/ (4 files):
A little STable cleanup.

Kill two fields we don't, and won't, use. Also re-order a bit to try and get better cache access patterns.
23:10