japhb | 2 MB seems like a nice savings to me ... | 00:03 | |
Or am I misreading? | |||
jnthn | No, you're not, though we're talking about on CORE.setting :) | 00:05 | |
We'll save some CPU too though, I imagine | |||
timotimo | aye, the time fetching the strings indirectly and then comparing them ... that's gotta be costly | ||
mostly the fetching | 00:06 | ||
TimToady | compiling src/mast/compiler.o | ||
src/mast/compiler.c: In function ‘form_bytecode_output’: | |||
src/mast/compiler.c:1255:53: error: ‘MVMThreadContext’ has no member named ‘str_consts’ | |||
that's at HEAD | |||
timotimo | oh, why was that thing called "vm"? | ||
hold on. | 00:07 | ||
jnthn | Didn't you, like, at least try to compile your change? :P | ||
timotimo | some of these macros take a "vm" argument, but use "tc" instead | ||
dalek | MoarVM: 9154ac3 | (Timo Paulssen)++ | src/mast/nodes_moar.h: this argument was called "vm" misleadingly ... | 00:08 | |
00:50
cognome joined
01:46
FROGGS_ joined
03:18
jimmyz joined
jimmyz | Stage parse : 34.374, before ~36s Stage mast : 11.899, before ~13s | 03:19 | |
since yesterday | |||
xiaomiao | death by a thousand papercuts :) | 03:43 | |
04:21
ventica joined
07:00
ventica joined
nwc10 | m: say 6.243e+01/6.636e+01; say 6.243e+01/6.597e+01 | 07:17 | |
camelia | rakudo-moar a1a236: OUTPUT«0.940777576853526␤0.946339245111414␤» | ||
nwc10 | jnthn: about 6% less time than mid yesterday's slight slowdown, and 5.5% less than yesterday morning. I think | 07:18 | |
jnthn: but I do need to be careful, as building parrot tanks the machine performance, because it coredumps twice, filling the disk sufficiently to make it slow | |||
setting *with* -flto 6.257e+01 | 08:44 | ||
setting without -flto 6.243e+01 | |||
so, it makes things fractionally slower, but within the noise | 08:45 | ||
so, not a free win | |||
at least, not for setting building | |||
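The ratios nwc10 quotes can be turned into percentages with a little arithmetic. A minimal sketch in Python; the helper name `percent_less` is made up for illustration, and the timings are the setting-build figures quoted above (presumably seconds):

```python
def percent_less(new, old):
    """Return how many percent smaller `new` is than `old`."""
    return (1.0 - new / old) * 100.0


# Setting build times quoted in the log (assumed to be seconds).
vs_mid_yesterday = percent_less(62.43, 66.36)
vs_yesterday_morning = percent_less(62.43, 65.97)
# A negative result means "slower": with -flto the build took longer.
flto_vs_plain = percent_less(62.57, 62.43)

print(round(vs_mid_yesterday, 2))
print(round(vs_yesterday_morning, 2))
print(round(flto_vs_plain, 2))
```

This gives roughly 5.9% and 5.4% less time, in line with the "about 6%" and "5.5%" estimates, and about 0.2% more time with -flto, which fits "fractionally slower, but within the noise".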
timotimo | thank you for measuring! | 08:53 | |
jnthn | afternoon, #moarvm | 11:17 | |
FROGGS_ | hi all | 11:23 | |
dalek | MoarVM: c350fe0 | jnthn++ | src/6model/serialization.c: Toss dead macro. | 11:42 | |
timotimo | oh hey jnthn | 12:26 | |
did you see the performance of "for" benchmarks is still less than it was at our last release? | |||
jnthn | Hmm | 12:38 | |
Oh, because they stupidly put parens around their ranges. | |||
timotimo | %) | ||
again? :D | |||
time to fix that exact thing a second time | |||
jnthn | And the code-path that stripped away such things is applied after the opt | ||
timotimo | yeah, phase ordering problem yada yada | 12:39 | |
jnthn | Heh. "Don't write superstitious parens; it'll make your code slower" isn't such a bad thing :P | ||
timotimo | %) | 12:40 | |
if i want to specialize smart_numify and smart_strify in the case where they cannot directly unbox, i'd have to do a method call on them; would i need to somehow find a correct callsite to give to prepargs? | 12:41 | ||
or is there a prepargs-less method for invocation? | |||
jnthn | No, always need that. | ||
timotimo | in that case i'll do the "can unbox, yay" optimizations first and ignore the method call ones for now | 12:42 | |
hum. smart_strify only tries to unbox a str if the object is concrete ... so i have to test for that fact, too | 12:49 | ||
timotimo wonders if brrt can work on moar-jit today | 12:52 | ||
wow, the smrt_strify -> unbox opt seems to trigger *very* often | 12:54 | ||
(only tested nqp so far) | 12:55 | ||
ah, damn, rakudo compilation seems to stumble over it | |||
LOL | 13:01 | ||
damn you, debug output | |||
since gen-cat has been dogfooded, it put a whole bunch of "optimized a call! yay!" lines into the resulting source code %) | 13:02 | ||
jnthn | Yes, fprintf(stderr,...) advised :P | 13:04 | |
timotimo | that is very true | 13:10 | |
hm. i don't suppose we have a spesh opcode (or general opcode, really) to just take a pointer to MVMObject, reinterpret it as a pointer to MVMString and put it into a register's .s? | 13:16 | ||
like sp_get_s without the first pointer dereferencing and with an offset of 0 | |||
jnthn | Um, no, and that sounds dangerous | 13:20 | |
What do you want it for? | |||
timotimo | smart_strify checks the reprid of the object and if it is MVMString, it just casts | 13:22 | |
jnthn | That...uh...should never actually occur | ||
timotimo | in that case, i'll just remove that case from smrt_strify itself :) | ||
jnthn | Can you try removing that case? | 13:23 | |
I really hope we don't rely on it. | |||
timotimo | sure | ||
maybe i should have put a printf in there instead of removing it | 13:28 | ||
oh | 13:29 | ||
it would throw a "cannot stringify this" exception | |||
that's fine, then | |||
doesn't occur anywhere in rakudo's build | |||
does that seem good enough for me to commit the patch? | |||
jnthn | Hm, well, spectest is nice but maybe do that after your other improvements | 13:31 | |
timotimo | will do | ||
13:44
cognome joined,
cognominal joined
13:46
cognominal joined
timotimo | 29.947 :D | 13:47 | |
sadly, the "advanced" smrt_strify cases don't seem to get triggered by either the build or "make test" | |||
will spectest now. | |||
did i understand correctly that we can't currently put new method calls into spesh'd code? | 13:50 | ||
jnthn | I think perhaps we can, it's just tricky to deal with callsite stuff | 13:51 | |
timotimo | if the interface was more evilness-friendly, i could directly try to inline the target method :P | ||
jnthn | But we need to look at that a bit anyway | ||
Well, no, we should just emit the method call and then let the normal logic look over it to decide if it's inlinable. | |||
timotimo | but then i'd also have to find a proper spesh candidate and all that | ||
aye | |||
jnthn | Composition always beats hacks. | 13:52 | |
timotimo | i wasn't very serious about that :) | ||
oh, lock.rakudo.moar crashes? | |||
jnthn | Hmm | ||
Try it again just in case it's a one-off? | |||
timotimo | was about to | ||
jnthn needs to do more stress testing on that sort of stuff | |||
timotimo | waiting for it to finish first | 13:53 | |
the failure from combinations.t was already reported in #perl6 by lizmat | |||
although it seems like i have a crash and they had a "not ok" | 13:54 | ||
spectests are fine | 13:57 | ||
i (or the test harness) misinterpreted an exit(1) as a crash | |||
oh, d'oh | |||
i didn't save the changes in coerce.c ... | |||
jnthn | fail | 13:58 | |
dalek | MoarVM: 35687e1 | (Timo Paulssen)++ | src/core/coerce.c: we should never depend on this working. | 14:14 | |
MoarVM: cfafc8d | (Timo Paulssen)++ | src/spesh/optimize.c: some smrt_strify can be optimized into simpler ops like unboxing a string or unboxing num/int and coercing. | |||
timotimo | i guess smrt_numify may be worth even more, as it's probably commonly used instead of elems on arrays and hashes | 14:31 | |
jnthn | yeah | 14:32 | |
nwc10 | timotimo: did you mean to commit fprintf(stderr, "spesh'd a smrt_strify to unbox and coerce a %d\n", register_type); | 14:35 | |
and the other sprintf? | 14:36 | ||
timotimo | er, no | 14:38 | |
:) | |||
dalek | MoarVM: 255b466 | (Timo Paulssen)++ | src/spesh/optimize.c: didn't mean to keep the debug output around | |||
jnthn | Seems you forgot to release the temp reg at the end? | 14:43 | |
timotimo | ah, yes. will fix that in a bit | 14:49 | |
actually, why not right now | 14:52 | ||
(i blame the heat) | |||
dalek | MoarVM: 8a9fd7f | jnthn++ | src/mast/compiler.c: Toss unused field. | 14:54 | |
MoarVM: 681ec90 | jnthn++ | src/mast/compiler.c: Extract label handling code into functions. Tidies the code, and will make the upcoming refactor a little easier. | |||
MoarVM: a56d606 | jnthn++ | src/mast/compiler.c: Switch over to using label identity for matching. Means we can eliminate a couple of hashes, but also that labels will no longer need to have a unique name generated. | |||
MoarVM: 82ca33d | jnthn++ | src/mast/nodes_moar.h: Remove name from MAST_Label; now unused. | 15:08 | |
timotimo | hm. my numify → elems + coerce_in opt doesn't seem to be correct :\ | 15:11 | |
dalek | MoarVM: 4656e18 | jnthn++ | lib/MAST/Nodes.nqp: Remove name from MAST::Label and its constructor. Breaking API change; requires NQP and Rakudo updates. | 15:14 | |
timotimo | oh, it could be that the call gets tossed by the unused optimization? | ||
jnthn | timotimo: Is that one you've committed? | ||
timotimo | not yet | ||
jnthn | OK, good...I need to bump | ||
timotimo | this time i'm testing properly before i commit! :P | ||
jnthn | Yes, you probably should be setting usages up on things you add. | 15:15 | |
timotimo | on temp registers, too? | ||
jnthn | Yes | ||
timotimo | that may explain it :) | ||
jnthn | Dead code elimination will happily kill instructions involving temp registers too :) | ||
timotimo | that fixed it, yay | 15:17 | |
this optimization runs often | |||
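The failure mode jnthn describes (an inserted instruction vanishing because the register it writes has no recorded usages) can be illustrated with a toy model. This is a sketch in Python under loud assumptions: the tuple layout, the `usages` dictionary and the function name `eliminate_dead` are all hypothetical and are not MoarVM's spesh data structures.

```python
def eliminate_dead(instructions, usages):
    """Drop instructions whose destination register has no recorded usages.

    instructions: list of (op, dest_register, source_register) tuples.
    usages: dict mapping register name -> number of recorded reads.
    The counts are maintained by hand, as in the discussion above, so a
    forgotten bump makes a live instruction look dead.
    """
    return [ins for ins in instructions if usages.get(ins[1], 0) > 0]


# smrt_numify on an array rewritten as elems + coerce_in via temp t0.
rewritten = [("elems", "t0", "r0"), ("coerce_in", "r1", "t0")]

# Usage of the temp register was never bumped: elems gets thrown away.
forgot_bump = eliminate_dead(rewritten, {"r1": 1})

# Usage bumped for the temp register too: both instructions survive.
with_bump = eliminate_dead(rewritten, {"r1": 1, "t0": 1})

print(forgot_bump)
print(with_bump)
```

In the first call only the coerce_in remains, reading a register nothing writes any more, which matches the "doesn't seem to be correct" symptom reported at 15:11.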
nwc10 | jnthn: "works" on "my" machine - 2 spectests currently aren't clear | 15:46 | |
or are flapping | |||
15:51
zakharyas joined
dalek | MoarVM: 84b5348 | (Timo Paulssen)++ | src/spesh/optimize.c: release the temp register at the end. | 15:53 | |
MoarVM: acaa897 | (Timo Paulssen)++ | src/spesh/optimize.c: spesh smrt_numify, bump usage counter of temp reg. this triggers especially often in combination with a MVMArray or MVMHash repr'd object and gives us a (usually optimized) elems call + a coerce_in | |||
timotimo | after a spectest and a shower i feel confident pushing this | ||
nwc10: yeah, they are for me, too. but when i run them manually, they succeed :( | |||
a moar-jit with current master merged is at 30.5 seconds stage parse for me | 16:03 | ||
that's hardly any worse than master alone. | |||
jnthn | Yeah. Thing is, when I tried the JIT on various hot loop stuff - even with objecty code - I was seeing a 50% or so win. | 16:08 | |
timotimo | did you try counting how often we deopt in the core setting compilation? | 16:09 | |
jnthn | Yeah. Quite a lot. | 16:10 | |
timotimo | or should i try that while you do more awesome optimization stuff? :3 | ||
jnthn | And then I managed to reduce it a good bit | ||
But it's not that costly so far as I can tell | |||
timotimo | how often compared to jumping into jitted code? | ||
jnthn | In fact, deopt from JIT is cheaper than from interpreter in terms of the cost of the deopt itself. | ||
timotimo | ah, that count was before the recent work | ||
jnthn | Didn't count how often we run JITted code | ||
timotimo | fair enough, but if we deopt all the damn time, we'll end up interping all our code instead of running the jitted code :) | 16:11 | |
jnthn | Yeah. | ||
timotimo | we could probably generate code in the jit output that counts how many opcodes we executed before we bailed due to deopt | 16:12 | |
or we could postpone that to a bit later | 16:13 | ||
did brrt say he'd be AFK all weekend? | |||
nwc10 | timotimo: sometimes that means that they are badly written. IIRC one failing was to assume the current directory | 16:14 | |
jnthn | Think he said he was busy this weekend, yeah | ||
timotimo | ah, ok | ||
then i don't need to wonder what's up | |||
turns out, that smart numify/stringify were already implemented in the jit anyway | |||
but i bet the spesh'd solution ends up cheaper in good cases | |||
does it make any sense to spesh away a "not" instruction after an instruction where we know how to negate the result by choosing another instruction? like an isnull + not_i could be just isnonnull | 16:16 | ||
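The rewrite timotimo is asking about is a small peephole pass. A hedged sketch in Python follows; the op names `isnull`, `not_i` and `isnonnull` come from the discussion, but the tuple representation, the `usages` map and the function name `fuse_negations` are assumptions made for illustration and do not reflect how spesh is actually written.

```python
# Ops that have a directly negated counterpart (only the pair named above).
NEGATED = {"isnull": "isnonnull"}


def fuse_negations(instructions, usages):
    """Fuse `X dest, src` followed by `not_i out, dest` into `negated-X out, src`.

    instructions: list of (op, dest_register, source_register) tuples.
    usages: dict mapping register name -> number of reads of that register.
    The fusion is only applied when the intermediate result is read exactly
    once, so no other instruction depends on the un-negated value.
    """
    out = []
    i = 0
    while i < len(instructions):
        op, dest, src = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (op in NEGATED and nxt is not None and nxt[0] == "not_i"
                and nxt[2] == dest and usages.get(dest, 0) == 1):
            # Write straight into the not_i's destination register.
            out.append((NEGATED[op], nxt[1], src))
            i += 2
        else:
            out.append(instructions[i])
            i += 1
    return out


pair = [("isnull", "r1", "r0"), ("not_i", "r2", "r1")]
fused = fuse_negations(pair, {"r1": 1, "r2": 1})
# If r1 is also read elsewhere, the pair has to stay as it is.
kept = fuse_negations(pair, {"r1": 2, "r2": 1})
print(fused)
print(kept)
```

This version only looks at adjacent instructions; handling the case where the two ops come from different operations and are separated by other code would need the usage information to locate the single reader.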
nwc10 | that sounds like bad codegen | 16:17 | |
timotimo | .o( because the jit doesn't not_i yet ) | ||
nwc10 | however, I guess that those sorts of sequences can appear as the result of inlining | 16:19 | |
timotimo | not only that | 16:20 | |
nwc10 | so, "how often?" and "how costly?" "how much benefit?" | ||
timotimo | every time the isnull is the result of one operation and the not_i is the result of another ... | ||
the jit bails out of 36 frames in the core setting because it sees not_i | |||
oh, many more of those are actually isnull_s | 16:21 | ||
more than isnull itself | |||
jnthn | isnull_s and isnull should compile into the same assembly, surely. | ||
oh, no, wait | |||
They won't because of the VMNull thing. | 16:22 | ||
timotimo | at least they are already both implemented :) | 16:23 | |
you don't happen to know of some somewhat low-hanging optimization i could look at next? :) | 16:42 | ||
i suppose if i am to implement some feature or ecosystem-related thing instead it'd end up being "gui frontend for the debugger", which will yak-shave-reduce to "improve GTK::Simple" | 16:44 | ||
jnthn | How's GTK::Simple doing these days? | ||
timotimo | it displays windows, buttons and labels :P | ||
it's kinda hard to tell what's still in scope for GTK::Simple and what isn't | 16:45 | ||
and how to move things into separate modules while still maintaining compatibility between the things | |||
though since we got NativeCast now, there's no need to have the same class repr the OpaquePointer we were playing with | |||
jnthn | It's not LHF, but I have pondered that CAPHASH may want to cease to exist, and we build Match objects more directly out of $!cstack | 16:47 | |
We'd have to implement building Rakudo's ones too | |||
So we get less code re-use...but Match object construction is so hot path that building an intermediate data structure every time is kinda costly. | 16:48 | ||
Especially given the intermediate data structure is a hash | 16:49 | ||
And hash lookups are one of the things we spend most time doing in CORE.setting compilation. | |||
I'm not sure it's LHF, but it is at least "just" NQP and Perl 6 code to write :) | |||
timotimo | oof | 16:52 | |
commute & | 16:58 | ||
i'll have a look later :) | |||
17:21
ventica joined
17:53
cognome joined
timotimo | now i've finished the commute and also some grocerisation | 17:54 | |
FROGGS | jnthn: that CAPHASH removal sounds like awesome | 17:59 | |
japhb | jnthn: still backlogging, so this may be resolved, but: The "superstitious parens in for loops" in the benchmarks were for three reasons: 1. Because it helps align with perl5, so I can visually see if I've typoed, 2. Because Perl 5 converts will accidentally do this *all the time*, and 3. Because it really shouldn't matter for performance, so if it does, I call that a bug worth catching. :-) | 18:05 | |
nwc10 | m: say 6.1774e+01/6.243e+01; say 6.1774e+01/6.597e+01 | 18:57 | |
camelia | rakudo-moar e036e2: OUTPUT«0.989492231299055␤0.936395331211156␤» | ||
nwc10 | so 1% less than last time, and 6.4% less than yesterday morning | 18:58 | |
timotimo | what happened since last time | ||
? | |||
nwc10 | I don't know. | ||
once upon a time it was "this week". Right now, it seems to be "this hour" | 18:59 | ||
I guess, really, it's "this morning" | |||
jnthn | Coulda been the labels improvements | 19:01 | |
Also timotimo++'s patches | |||
nwc10 | does perl6bench like it? | 19:02 | |
timotimo | oh :3 | ||
jnthn | perl6bench doesn't measure compilation time really | ||
nwc10 | these are mostly compilation time fixies? | 19:03 | |
er, fixes | |||
they don't help more general code paths? | |||
jnthn | My labels thing was | 19:04 | |
timotimo's are more general. | |||
japhb | jnthn: perl6-bench does measure compile time for each test, it just subtracts it from the run time of the test ... or did you mean, the compile time for the compiler itself? | 19:14 | |
timotimo | the latter, i believe | ||
jnthn | No, I meant for the test...OK, I guess what I shoulda said is "doesn't appear in the graphs" - which is the right thing in many senses. :) | 19:15 | |
Though it could be interesting to know about compile time improvements over time :) | |||
japhb | jnthn: Just turn off the compile time ignoring | 19:16 | |
--/ignore-compile and/or --/ignore-setup | 19:17 | ||
(Because bench defaults both to on.) | |||
Mind you, you'll then see the combination of compile and run time, so hmmm. | |||
Maybe I need a plot mode where it just shows the compile time for each test. | 19:18 | ||
(Since the compile time is in the timings file, it's just normally subtracted out at analysis time) | 19:19 | ||
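What japhb describes amounts to subtracting recorded components from a total at analysis time. A rough model in Python, not perl6-bench's actual analysis code: the function name `reported_time`, its parameters, and the idea that setup time is a separately recorded component are all assumptions made for illustration.

```python
def reported_time(total, compile_time, setup_time=0.0,
                  ignore_compile=True, ignore_setup=True):
    """Return the time that would be plotted for one test.

    Both ignore flags default to on, mirroring the defaults mentioned
    above; switching one off leaves that component in the result.
    """
    result = total
    if ignore_compile:
        result -= compile_time
    if ignore_setup:
        result -= setup_time
    return result


# Hypothetical timings for one test, in seconds.
run_only = reported_time(2.5, 0.5, 0.25)
with_compile = reported_time(2.5, 0.5, 0.25, ignore_compile=False)
print(run_only)
print(with_compile)
```

A compile-time-only plot mode, as floated at 19:18, would report the `compile_time` component on its own instead of subtracting it.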
19:23
cognome joined
19:53
ventica joined
20:35
ventica joined
20:59
ilbot3 joined
jnthn | *sigh* That took some doing... | 22:36 | |
dalek | MoarVM: 3e8e534 | jnthn++ | src/6model/s (3 files): Prepare for lazy deserialization. | 22:37 | |
MoarVM: 9539fcd | jnthn++ | src/6model/ (4 files): Start storing serialization reader in the SCRef. We'll need to keep it around for deserialization. Move cleanup to the SCRef GC. | |||
MoarVM: 0c30c2b | jnthn++ | src/ (3 files): Make "allocate in gen2" tracking reentrant. | |||
MoarVM: 7a722dc | jnthn++ | src/ (4 files): Switch deserialization to take place lazily. Now things are only deserialized on "first touch". Unfortunately, we are very touchy, as little is set up to take advantage of this. Even before looking into using it better, however, it takes another 2.5MB off the base memory of Rakudo with CORE.setting loaded. | |||
timotimo | nice :) | 22:59 | |
dalek | MoarVM: 58fdbb2 | jnthn++ | src/ (4 files): A little STable cleanup. Kill two fields we don't, and won't, use. Also re-order a bit to try and get better cache access patterns. | 23:10 | |