timotimo i'll check it out 00:07
79.04user 1.04system 1:20.35elapsed 99%CPU (0avgtext+0avgdata 1092980maxresident)k 00:11
so that's a small part of it
78.69user 1.09system 1:20.09elapsed 99%CPU (0avgtext+0avgdata 1092796maxresident)k 00:34
another run, a bit better timing apparently
01:14 FROGGS_ joined 06:24 mj41 joined
nwc10 jnthn: strange, but I think I'm testing the correct thing. Anyway, the new origin/inline also still happy 07:07
07:26 FROGGS[mobile] joined
jnthn nwc10: OK, thanks. 09:15
09:21 lizmat joined 09:46 mj41 joined
dalek Heuristic branch merge: pushed 95 commits to MoarVM by jnthn 09:57
timotimo inline has been inlined to master? :) 09:58
nwc10 running dumbbench suggests that the allocator makes it about 4% faster than no allocator
it's sad that malloc() still doesn't win 09:59
timotimo huh?
jnthn Yes, merged.
Time to get more feedback :) 10:00
nwc10 I didn't study jnthn's allocator *too* closely, but it seemed to be a general purpose allocator, not specifically particular sizes
so I couldn't see (fundamentally) why it should be faster than malloc
jnthn nwc10: It stores stuff by size classes and freeing also requires knowing the size allocated. 10:01
nwc10 aha. the latter might be part of the win. 10:02
a malloc could store stuff by size classes. Hence my initial confusion
timotimo uh 10:10
did nqp builds use to be that fast?
39.95user 0.95system 0:41.40elapsed 98%CPU (0avgtext+0avgdata 161788maxresident)k
jnthn Maybe not :) 10:12
Rakudo build got a bit faster, after all... 10:13
nwc10 dumbbench thinks that the setting build is about 2.5% faster 10:14
but reports vary, depending on how many outliers it threw away
timotimo without inline, but with CGOTO: 10:15
37.75user 0.83system 0:38.75elapsed 99%CPU (0avgtext+0avgdata 120728maxresident)k
jnthn timotimo: Any reason we can't turn CGOTO on by default if we detect GCC?
Also, what happens if we cgoto + inline? :) 10:16
timotimo 36.17user 1.00system 0:37.86elapsed 98%CPU (0avgtext+0avgdata 161856maxresident)k
the memory usage increase is a bit worrying, IMO.
jnthn Yes, I'm curious where that comes from. 10:17
timotimo 36.08user 0.98system 0:37.24elapsed 99%CPU (0avgtext+0avgdata 161892maxresident)k
so that's somewhat stable ... ish 10:18
nwc10 is the memory use lower on the commit before the custom allocator?
jnthn Good question.
timotimo 38.04user 1.00system 0:39.21elapsed 99%CPU (0avgtext+0avgdata 146088maxresident)k 10:19
that's on 7a52289
nwc10 jnthn: if I understand the source code well enough from skimming it, you're allocating using a bin for objects 1-8, a bin for 9-16, ... 257-264 ... 1017-1024
jnthn Right. 10:20
nwc10 jemalloc isn't using as many bins: www.canonware.com/download/jemalloc...alloc.html
(see "Table 1. Size classes")
and, I'm guessing, MoarVM simply isn't allocating stuff in some of the bin sizes 10:21
and *is* allocating a lot of things of the same size
so having lots of bins wins. 10:22
jnthn Also, an unused bin is near-enough free
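The bin scheme nwc10 describes (sizes 1-8 share a bin, 9-16 the next, up to 1024, with free needing the size to pick the bin) can be sketched in a few lines of C. The `fsa_*` names here are invented for illustration and are not MoarVM's actual API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of a size-class allocator: one free list per
   8-byte size class, anything over 1024 bytes falls through to malloc. */
#define FSA_BINS      128
#define FSA_BIN_BYTES 8

typedef struct fsa_node { struct fsa_node *next; } fsa_node;
static fsa_node *fsa_free_list[FSA_BINS];

static int fsa_bin_for(size_t bytes) {
    if (bytes == 0 || bytes > FSA_BINS * FSA_BIN_BYTES)
        return -1;                        /* too big: use malloc */
    return (int)((bytes - 1) / FSA_BIN_BYTES);
}

static void *fsa_alloc(size_t bytes) {
    int bin = fsa_bin_for(bytes);
    if (bin < 0)
        return malloc(bytes);
    if (fsa_free_list[bin]) {             /* reuse: pop the free list head */
        fsa_node *n = fsa_free_list[bin];
        fsa_free_list[bin] = n->next;
        return n;
    }
    return malloc((size_t)(bin + 1) * FSA_BIN_BYTES);
}

/* Freeing needs the size: it selects which bin's free list to push onto,
   which is the extra information malloc's free() never gets. */
static void fsa_free(void *p, size_t bytes) {
    int bin = fsa_bin_for(bytes);
    if (bin < 0) { free(p); return; }
    ((fsa_node *)p)->next = fsa_free_list[bin];
    fsa_free_list[bin] = (fsa_node *)p;
}
```

An unused bin here is just a NULL free-list head, which is the "near-enough free" point.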
timotimo doesn't really know the build system stuff, so can't easily turn on cgoto when gcc is found
vendethiel ooh, inline got merged ? 10:31
timotimo aye
vendethiel nice ! jnthn++
timotimo inline got inlined :3
vendethiel (everyone else contributing/that has contributed)++ 10:32
timotimo that's mostly nwc10
i didn't do anything :)
jnthn: are there spesh analysis/improvements i could try to build that would benefit greatly from inlined stuff? 10:36
jnthn timotimo: Maybe; one other thing you might like to try, looking at my profile output here, is to look at nqp_nfa_run 10:37
timotimo: And see about using the fixed size allocator instead of malloc/free in there 10:38
timotimo is that c-level or nqp-level?
jnthn C level
timotimo ah
jnthn Apparently 1.1% of setting build time goes on that malloc/free pair.
timotimo that doesn't immediately sound like a huge deal; don't we do lots and lots of nfa during setting compilation? 10:39
jnthn A potential 1% saving for a few lines tweaking is quite a bit. 10:41
timotimo that's 1% if you can make it 10x faster :)
haven't run an actual profile in a long time 10:45
a c-level profile, that is
jnthn Ah, found some memory management fail. 10:46
timotimo that's good :) 10:47
dalek arVM: 9d440a3 | jnthn++ | src/core/fixedsizealloc.c:
Add mechanism for debugging fixed size alloc/free.

Can set a flag where it checks the allocated and freed sizes match up, and panics if they fail to.
jnthn We fail that check, and it seems it happens if we deopt. 10:48
nwc10 jnthn: one thing I was wondering was whether the outermost level of the fixed size stuff could be an inline function - the one that decides if it is in a bin or not 10:49
so that, if one changes the "bin detection" code to "never uses a bin" in a way that the C compiler's optimiser can see
then it can generate code that always uses malloc 10:50
which keeps OpenBSD happy
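nwc10's suggestion could look roughly like this: keep the bin-or-not decision in a static inline predicate, so a compile-time switch turns it into a constant and the optimiser folds every caller down to plain malloc. The names and the `FSA_NO_BINS` flag are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: building with -DFSA_NO_BINS makes the predicate a
   compile-time constant 0, so the compiler's optimiser can see the bin
   path is dead and generate code that always uses malloc (friendly to
   OpenBSD's hardened malloc and malloc-replacing tools). */
static inline int fsa_use_bin(size_t bytes) {
#ifdef FSA_NO_BINS
    (void)bytes;
    return 0;
#else
    return bytes > 0 && bytes <= 1024;
#endif
}

static void *fsa_alloc_sketch(size_t bytes) {
    if (!fsa_use_bin(bytes))
        return malloc(bytes);
    /* bin path elided for brevity */
    return malloc(bytes);
}
```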
timotimo iiuc this is about very short-lived objects, which would benefit from having an all-at-once free step 10:53
there's no way to do this on the stack, aye? 10:54
at least for the nfa?
jnthn nwc10: On MSVC at least, considering I couldn't breakpoint an optimized build inside of that outermost one, it already was doing an inline there. 10:56
timotimo: Well, it *is* possible we could allocate one big chunk of memory for the NFA processing and then free it. 10:57
Yes, it's short lived. 10:58
timotimo jnthn: that's not what's going on with the fixed size allocator?
is that allocator itself long-lived?
jnthn The allocator lives for the whole process 11:00
timotimo ah, ok
in that case, yeah, the nfa could possibly benefit from a short-lived allocator
jnthn Not really 11:01
timotimo OK, what do i know :)
jnthn It's just that it makes 4 calls to malloc/free when it could do 2, and then it could use the fixed size allocator which seems to be cheaper than malloc.
timotimo that does sound like a win, aye 11:03
i wonder how many serious security-related bugs lie hidden in moarvm's code 11:04
nwc10 only 1 known use-after-free 11:05
not tried using valgrind to find uninit warnings
dalek arVM: 3bf1aa7 | jnthn++ | src/core/frame. (2 files):
Fix freeing of frame memory to correct bucket.

Before we sometimes ended up putting it back in the wrong one, if we deoptimized. This corrects that issue, hopefully improving memory use.
jnthn timotimo: Please try with that, but it seems to help here.
timotimo sure
37.11user 1.17system 0:38.66elapsed 99%CPU (0avgtext+0avgdata 144480maxresident)k 11:07
that's a bit better than before you put the fixed size allocator in
jnthn ah, good 11:10
So it was that.
timotimo still 20mb more than before we had inline at all
does that seem like a sane amount of ram usage for inlining things?
jnthn A little higher than I'd expect 11:11
timotimo i'm generally in favor of having much less ram usage in moarvm, but that's not connected to any particular "work item" 11:12
jnthn Well, also I don't know to what degree it's a VM-level issue and to what degree we need to be more frugal with memory at a higher level.
timotimo fair enough 11:13
there's still the issue with strings being stored many, many times in ram
jnthn It's like QAST node construction.
We've been optimizing all kinds, but the way QAST nodes get created is basically performance hostile.
timotimo is that still the case? 11:14
jnthn Yes.
timotimo ah, that's where we iterate over names and call methods to set attributes?
jnthn Right, meaning that every single one of those method calls is a late-bound lookup
timotimo yeah, ouch!
jnthn And it's a megamorphic callsite, so there's basically nothing the optimizer can do. 11:15
timotimo can we perhaps get that to use nqp::bindattr directly?
instead of the methods?
jnthn Well, having constructors that are more specialized to the nodes may also help
Additionally, not all nodes have children.
timotimo mhm. lots more typing, but better performance for all backends i suspect
jnthn But every single SVal, NVal, WVal, etc. currently has an array allocated for them.
timotimo right, SVal, IVal, WVal, NVal wouldn't have children
the same treatment annotations got might not be that helpful for children lists, right? 11:16
because we really do want to keep the positional_delegate
jnthn yeah, we want that for API reasons too
timotimo should we have a QAST::ChildlessNode as the top of the class hierarchy and then derive one with a children array? 11:17
jnthn No
I'd be more inclined to write a role
timotimo mhm
jnthn And it's composed by the node classes that have children.
timotimo another idea would be to bind nqp::null to the children list?
oh, that'll be problematic if we iterate over nodes without knowing if they'll have children or not 11:18
jnthn Also we waste the 8 bytes for the pointer we don't need.
timotimo what we could do is bind the same empty list to all childless nodes 11:19
how does that sound?
jnthn No, we should do the role thing I'm suggesting.
timotimo how does that interact with trying to iterate over nodes?
will we get a .list method call emitted for all places that would be problematic? 11:20
in that case we could return a global empty list object from that and otherwise have the role provide the list
jnthn I think we can do it transparently to the current usage
That is, this can be done as an internal refactor to the QAST nodes without breaking anything. 11:21
timotimo that would be nice indeed
only very few qast nodes survive past the compilation stage of a program's lifetime, right? 11:22
there's the qast nodes that survive to make inlining in the optimizer possible, do they survive past the last compilation stage?
well, to be fair, the maxrss in building is surely dominated by the compilation phases, as there's very little code being run there 11:23
jnthn Yeah, we serialize the QAST tree for things that we view as inlineable, yes 11:25
Though it's quite restricted.
timotimo aye, i recall that 11:26
11:32 JimmyZ_ joined 12:14 vendethiel joined
dalek Heuristic branch merge: pushed 117 commits to MoarVM/moar-jit by bdw 12:49
jnthn That's some catch-up :) 12:50
nwc10 jnthn: does your compiler do link time optimisation? In that, can it inline the non-static functions that are used for the allocator? (just curious)
12:51 cognominal joined
jnthn Yes. 12:51
With the default MoarVM build options, anyway.
nwc10 Ah OK. So I guess that makes those functions behave pretty much like they were static
anyway, this is all possibly premature optimisation (and therefore wrong). You've already made it easy to disable the functionality, and always use the system malloc (or the malloc replacing tool) 12:52
./perl6-m t/spec/S17-promise/allof.t 12:59
;==8851==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fd93c1a9272 sp 0x7fffe15273b0 bp 0x7fffe15273f0 T0)
oh, that's supposed to be red.
anyway, ungood.
master/master/nom
lizmat fwiw, I see that test failing intermittently in the spectest 13:02
over on #perl6 I was just handwaving about a static .WHICH for an object 13:03
I think we're getting to the point that the current non-constant nature of .WHICH is starting to cause problems
jnthn WHICH really wants a re-visit in many ways. 13:04
The current implementation is doomed to be slow also.
And I doubt it has good entropy.
lizmat so how bad is the idea of a per-thread simple int64 counter ? 13:11
jnthn Well, but where to store it? 13:12
We don't want to make every object 8 bytes bigger...
And for, say, Int, the identity is tied up in the value 13:13
nwc10 8 bytes bigger is >2% more peak memory
it's 2% just using 8 bytes per P6Opaque
lizmat do we want to start playing variable struct size tricks like in P5 ? 13:16
nwc10 bugger. t/spec/S17-promise/allof.t passes first time under valgrind
lizmat: probably not. Because if thread 2 can change the size of a structure (and move it) then every *read* in thread 1 needs to grab a mutex to prevent thread 2 from doing that at the wrong time. 13:17
and, if reads need mutexes, deadlock becomes much easier. 13:18
(oh, that's second order)
reads become much slower 13:19
lizmat yeah, so it makes much more sense to just add it to the struct ?
nwc10 what, add a "which" to the object header?
lizmat isn't that what we're talking about ? 13:20
nwc10 yes. that also sucks, because memory usage will increase by (maybe) 5%
lizmat however, in this case it doesn't seem needed:
$ 6 'say 42.REPR; say 42.WHICH'
P6opaque
Int|42
so maybe we need a P6opaquevalue ? 13:21
that wouldn't need the .which in the struct ?
or maybe treat anything that needs a non-value based .WHICH differently wrt to allocating ? 13:23
jnthn Well, thing is that *most* objects don't ever have .WHICH called on them 13:24
We should associate the cost with using the feature.
13:24 zakharyas joined
lizmat are you talking CPU or memory cost ? 13:26
jnthn Both 13:27
lizmat I'm assuming code depends on the fixed length of a P6opaque?
jnthn More generally, I'm thinking about having the storage of WHICH values be more like a hash table arrangement.
lizmat what would be the key?
and would you clean it up when an object gets destroyed? 13:28
jnthn The object - the trickiness here being it needs to be VM-supported.
Right.
13:28 brrt joined
lizmat and that hash would be per thread, I assume ? 13:29
otherwise we get serious locking issues, no?
jnthn Probably needs to be
otoh, then we get different issues
jnthn doesn't see any particularly easy solutions 13:30
lizmat would the simple approach maybe not be best? 13:32
jnthn No. 13:33
lizmat take the 8byte per Opaque hit, only set it when actually asked for?
at least until we think of something better ?
jnthn No, we should work out the better thing, not pile up technical debt.
13:34 mj41 joined
jnthn It woulda been nice if the spec had been as lenient as Java's .hashCode() spec, which can change over an object's lifetime... 13:35
lizmat well, then maybe we need to pick this up at a higher level?
jnthn But it's not, which is a Tricky Problem. But a big memory usage increase on everything isn't a great answer.
lizmat or maybe only assign some .WHICH when it gets moved out of the nursery (and *then* add the extra 8 bytes) 13:37
and if a .WHICH is called on something not in the nursery, move it out?
*in the nursery rather
jnthn You can't "just move it out", but one idea TimToady++ hinted at that can be feasible is using the gen2 address if it's already there, or pre-allocating a gen2 slot for the object if we are asked for its WHICH and keeping a table of nursery objects => WHICH values. 13:38
And we remove those entries at GC time, due to collection or movement.
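The scheme TimToady hinted at, very roughly in C (all names invented; a real version needs VM support and the GC-time cleanup jnthn mentions): gen2 addresses serve directly as the WHICH, while nursery objects get a stable id on first request, remembered in a side table keyed by their current address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a WHICH side table: only objects actually asked
   for their WHICH pay any cost, matching "associate the cost with using
   the feature". A toy fixed-size linear-probe table stands in for a real
   growable one. */
#define WHICH_SLOTS 64

typedef struct { void *obj; uintptr_t which; } which_entry;
static which_entry which_table[WHICH_SLOTS];
static uintptr_t next_which = 1;

static uintptr_t which_for(void *obj, int in_gen2) {
    size_t i;
    if (in_gen2)
        return (uintptr_t)obj;   /* gen2 never moves: address is stable */
    for (i = 0; i < WHICH_SLOTS; i++) {
        size_t slot = (size_t)(((uintptr_t)obj >> 3) + i) % WHICH_SLOTS;
        if (which_table[slot].obj == obj)
            return which_table[slot].which;   /* already assigned */
        if (which_table[slot].obj == NULL) {  /* first WHICH request */
            which_table[slot].obj = obj;
            which_table[slot].which = next_which++;
            return which_table[slot].which;
        }
    }
    return 0;                    /* toy table full; real one would grow */
}
```

At GC time the real thing would drop entries for collected objects and re-key moved ones, which is the part that makes it need VM support.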
lizmat is trying to serve as a catalyst :-) 13:40
brrt oh, i wanted to mention, creating a 'move / copy' node for the jit runs into the register selection explosion problem again, so i'm not doing that (yet)
nwc10 I like TimToady's suggestion. I think it could work well. 13:41
can do that without more RAM by (ab)using the union in the object header, but would need another flag to say that it's being done, and slow SC access 13:42
(you'd put the real SC pointer into the pre-allocated gen2 space)
dalek arVM: 22773f2 | jnthn++ | src/spesh/args.c:
Don't refuse to spesh if we've a slurpy positional
13:44
jnthn timotimo: Feel free to give qast_refactor branches in NQP and Rakudo a spin. 13:54
timotimo 36.30user 0.95system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 142724maxresident)k 14:03
2mb less usage apparently
but about 1s less time? could very well be noise. 14:04
jnthn That's NQP build?
timotimo aye
jnthn OK. Rakudo one could be interesting too. :)
timotimo OK 14:12
refactor'd: 76.05user 0.95system 1:17.56elapsed 99%CPU (0avgtext+0avgdata 820128maxresident)k 14:14
14:15 brrt joined
jnthn tries to find the previous numbers :) 14:15
timotimo i'm making new ones
master'd: 76.37user 1.03system 1:17.60elapsed 99%CPU (0avgtext+0avgdata 826456maxresident)k 14:17
jnthn Hmm, a memory win, not so much of a performance one, curiously. 14:21
timotimo beware the noise
i didn't shut down all running programs :)
jnthn ah
walk :) And when I'm back, I'll look at the spesh args missing thing where it doesn't know how to handle boxing/unboxing and so bails. 14:39
14:52 betterworld joined 15:02 btyler joined 15:08 brrt left
nwc10 jnthn: for those 2 branches, t/spec/S17-scheduler/every.t can fail with a NULL pointer at 16:01
#0 0x7f1e40b4f0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121
#1 0x7f1e40b4f1b1 in MVM_fixed_size_alloc_zeroed src/core/fixedsizealloc.c:144
#2 0x7f1e40adac20 in allocate_frame src/core/frame.c:201
but not reliably
total fails are: t/spec/S06-macros/opaque-ast.rakudo.moar t/spec/S06-macros/unquoting.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/S17-scheduler/every.t t/spec/integration/advent2012-day23.t 16:02
the S17 are ASAN. The other 3 are
===SORRY!===P6opaque: no such attribute '$!position'
jnthn Hmm, that sounds like "missing a commit"
nwc10 This is nqp version 2014.05-14-g2147886 built on MoarVM version 2014.05-121-g22773f2 16:03
This is perl6 version 2014.05-193-g6d23540 built on MoarVM version 2014.05-121-g22773f2
jnthn Yes, I just pushed the missing one. D'oh.
Thought the error looked very familiar... 16:04
timotimo how often do we have slurpy positional subroutines/methods in nqp and rakudo source respectively? 16:12
hm. so a slurpy positional argument will turn into a list. and we know exactly how big that list is at spesh-time. do i smell a specialization opportunity? 16:14
though, we probably often do things like iterate over these and stuff like that
jnthn timotimo: Yeah, we can do something there, I suspect 16:16
timotimo a fact flag "KNOWN_ARRAY_SIZE"?
probably more like "KNOWN_ELEMENT_COUNT" 16:17
jnthn Oh, I wasn't thinking of even going that far.
timotimo another thing is that if we have a method that has slurpy positional arguments and we "just pass it on" to another, spesh will see it involves flattening and bail out, won't it?
jnthn Just potentially using the sp_getarg_ ops to grab the args and put them into the array. 16:18
Yes
Obviously, there's a chance to do better there, but not sure how easy it is.
timotimo if we know we just got these arguments from a slurpy positional, we can probably assume it's safe
i'm not sure i know how that sp_getarg_ thing you mentioned would work; will the positionals that'll end up in the slurped array just be available like regular positionals? 16:19
jnthn Well, I think it actually probably wants to go the other way around.
As in, "I see I get called with a flattening callsite, and I take a slurpy there"
timotimo oh, as in: instead of flattening this array and slurping it again, let's just pass the array directly" 16:20
that seems more sensible, i agree
no spesh: 40.23user 0.91system 0:41.37elapsed 99%CPU (0avgtext+0avgdata 118300maxresident)k 16:35
spesh: 36.37user 0.93system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 144524maxresident)k
that's the complete nqp build
no spesh: 84.33user 1.02system 1:25.91elapsed 99%CPU (0avgtext+0avgdata 722140maxresident)k 16:39
spesh: 77.57user 1.07system 1:18.86elapsed 99%CPU (0avgtext+0avgdata 826312maxresident)k
that's the complete rakudo build
m: say (1 * 60 + 18) / (1 * 60 + 25)
camelia rakudo-moar 7f22e9: OUTPUT«0.917647␤»
timotimo this is with inline already; i thought inline would do crazy improvements to the parse time, what with inlining proto regexes and such :/ 16:40
but 9% isn't bad either.
jnthn Well, remember it's just taking out invocation overhead. 16:43
timotimo that contains argument passing and returning already, right? 16:44
and cross-invocation-dead-code-elimination and constant-folding?
jnthn Not the latter two yet really. 16:45
It's being a bit conservative so as not to ruin the inline annotations.
timotimo oh
huh, what is this. the very first thing that gets spesh'd has a named parameter operation removed, which had BB(3) as its label, but BB(3) is still listed as that block's successor? 16:46
rather: as one of the successors
i wonder if this leads to less dead code elimination than is necessary 16:49
i wonder if BBs should be merged if they become completely linear during spesh?
that's probably not easy to do given the dominance tree and stuff?
jnthn It's also not worth it at all.
BBs don't correspond to anything at runtime. 16:50
gist.github.com/jnthn/2050e5ed6e8991e24e53 # example of inline making a difference.
timotimo OK
oh, that's not too shabby :) 16:51
jnthn Yeah. It's just that if you look at profiles of CORE.setting compilation and similar, invocation overhead is only so much 16:52
timotimo i s'pose that's fair 16:53
dalek arVM: dd80dbf | (Timo Paulssen)++ | src/spesh/optimize.c:
put in a missing break
17:02
timotimo does it sound sensible to spesh coerce_in and coerce_ni? 17:04
probably not much that can be done, eh? 17:05
i see at least one const_n + coerce_ni 17:07
er, actually const_i + coerce_in
a whole lot of coerces of those two come directly after smrt_numify 17:08
hum. these const_i's are all 16bit ints; so replacing the const_i + coerce with a const_n will give us a 64bit num in its place 17:11
should still be a win, right?
would also get rid of a bit of interpretation overhead? i would assume with coerce and const_i, the interpreter overhead is many times what the operation itself takes 17:13
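The "should still be a win" reasoning holds because an IEEE double's 53-bit mantissa represents every 16-bit integer exactly, so replacing a const_i16 + coerce_in pair with one const_n changes nothing observable. A quick check (helper name invented):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check for the fold above: the value a const_n would carry
   round-trips losslessly for any 16-bit integer, since doubles represent
   all integers up to 2^53 exactly. */
static int const_fold_exact(int32_t i) {
    double d = (double)i;    /* what the folded const_n would hold */
    return (int32_t)d == i;  /* nonzero iff the round trip is lossless */
}
```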
jnthn Well, it's an instruction cheaper, yes. 17:15
nwc10 jnthn: ./perl6-m t/spec/S17-scheduler/every.t can SEGV: 17:22
#0 0x7f421a79b0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121
#1 0x7f421a7fce31 in bind_key src/6model/reprs/MVMHash.c:86
./perl6-m t/spec/S17-promise/allof.t can SEGV
#0 0x7f948135a0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121
#1 0x7f94813bbe31 in bind_key src/6model/reprs/MVMHash.c:86
so, something isn't quite as threadsafe as it should be.
jnthn aye
Looks like
nwc10 both are NULL pointers
threads are hard, let's go asyncing. 17:23
timotimo should we build a smrt_intify? because i see a whole bunch of smrt_numify followed directly by coerce_ni
hm, actually ... that wouldn't be much help
because we still have to parse the stuff after the . because there could be an E in there 17:24
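That point in one line of C: an intify can't stop parsing at the '.', because an exponent after it can push digits back into the integer part, so the whole string must be numified first and the result truncated:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustration of why a smrt_intify would still parse the full string:
   numify with strtod, then truncate. A naive "read digits until '.'"
   intify would get "2.5e3" badly wrong. */
static long intify(const char *s) {
    return (long)strtod(s, NULL);
}
```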
nwc10 or 17:25
core (noun), plural coredump
jnthn timotimo: I think it already exists. 17:26
timotimo there's smrt_numify and smrt_strify 17:27
those are the only ones with smrt_ or ify in their name
jnthn hm, you're right :) 17:36
In other news, I just finally managed to get the instrumented profile in VS to work.
17:36 FROGGS joined
timotimo it's still kinda questionable if that would really help 17:36
knowing that the result is going to be intified
FROGGS o/
timotimo o/ FROGGS
jnthn While that runs, I'm going to find some food :) 17:37
wow, it wrote 18GB so far. Good job it's on something with half a terabyte to hand...
bbiab 17:38
timotimo 76.60user 1.06system 1:17.88elapsed 99%CPU (0avgtext+0avgdata 826528maxresident)k 17:46
vs 17:48
76.11user 1.10system 1:17.41elapsed 99%CPU (0avgtext+0avgdata 826520maxresident)k
so the coerce thing isn't worth terribly much. not really surprising 17:49
(first line is with coerce spesh thingie, second is without)
dalek arVM: 87221ba | (Timo Paulssen)++ | src/spesh/optimize.c:
can do coerce_in of literals at spesh-time.
17:50
jnthn lol 17:52
CORE.setting running with instrumented profiling got done while I was shopping :)
80GB.
FROGGS not a SSD me thinks 17:54
jnthn No
Spinning rust, and boy is it making a racket now as it analyses the data.
FROGGS that is also a problem of SSDs, they are so fast, when something writes stuff to them in an infiniloop you almost can't stop it 17:55
18:38 zakharyas joined 18:40 mj41 joined 18:45 bcode joined 18:55 mj41 joined
timotimo it's like having LTE on your phone, but a 1gb data limit 19:59
so, what's going on now? :) 20:00
the analysis hopefully is already done? :D
jnthn Yeah. 20:04
Took ages :)
But it got done while I coked, ate, etc.
uh, cooked :)
timotimo sadly, the spesh_diff tool is broken with the current spesh log format 20:05
somehow ...
jnthn Curiously, the instrumented profiler thinks we spend about half as much time in GC as the sampling profile does. 20:07
timotimo huh, that's weird. 20:08
jnthn getattribute is still by some way the most costly thing we do. 20:11
'cus, I assume, spesh can't handle most of the getattribute/bindattribute in Cursors.
That's a pretty strong indicator that I should work on that in 2014.07. :) 20:12
timotimo that's half a month in the future! :(
anything simple i could try to bang my head against in the mean time? 20:14
jnthn No, I mean, for the 2014.07 release
timotimo ah, ok
jnthn I don't really want to go optimizing much further at this point.
Would rather work on fixes, making sure stuff works well for this week's release. 20:15
Then after it can get back to opts :)
timotimo ah ... yeah, that *is* fair 20:16
we do have some known problems with our async and multithreaded things on moar, for example
jnthn Well, we know there's problems. :P 20:19
Anyway, interesting to look through the report. 20:21
String comp comes up fairly high, but a lot of that is 'cus we're still hitting the attribute access slow path so often. 20:22
timotimo mhm 20:23
jnthn 2.6% is spent in smart_numify. Not such a smart move.
1.3% in smart_stringify
timotimo i kind of sort of wish we could give Rat a big speed boost 20:57
it seems likely to me that many people who come to try out p6 are going to be using the / operator and stumbling over the pretty tough performance hit
jnthn Well, step 1 is to write benchmarks for it in perl6-bench, so we understand the magnitude of the problem and how we can improve it :) 20:58
timotimo oh, of course :) 20:59
i could have thought of that
21:07 cognominal joined
dalek arVM/moar-jit: 1b1eac4 | (Bart Wiegmans)++ | / (8 files):
Configure JIT with environmental variables.

This should make the JIT play more nicely. Also supports hello world :-)
21:08
21:08 brrt joined
tadzik :o 21:10
brrt: are the generated files being commited to not depend on lua? 21:11
brrt oh.. yes
oh, good of you to mention that
i forgot the win32 x64 files
tadzik :)
dalek arVM/moar-jit: 1537dcd | (Bart Wiegmans)++ | src/jit/emit_win32_x64.c:
Forgot the win32 x64 dynasm output.
21:12
tadzik do you have like a commit hook to regen all those files?
that might be handy 21:13
brrt not yet
yep
jnthn + MVMString * s = sf->body.cu->body.strings[idx];
+ | mov64 TMP, (uintptr_t)s
About that, it assumes gen2 and thus non-moving, which is fine for the string heap, but need to be careful when it comes to, say, spesh slots.
brrt yes, i know, it's hacky, but the alternative was i started up coding a call to MVM_strings_get() which - afaik - doesn't exist yet, and the commit was big enough as it is :-) 21:14
i'm somewhat against ripping moarvm interp open and diverging before i've got a chance to merge, is what i mean :-) 21:15
jnthn *nod* 21:16
brrt hmm
i'm looking at the getlex_** ops, they look tricky (i.e. not really what i want to encode in a single MVMJitCallC node 21:17
in that the return value is a pointer that needs to be dereferenced before i can store it in the register 21:18
jnthn I think for the JIT we can do some case analysis on those.
brrt case analysis? 21:19
jnthn For example, if outers is 0, then it's just looking directly into ->env
For i/n/s.
The auto-viv doesn't happen.
brrt agreed
not for s, either?
jnthn For o you can know if it's going to auto-viv
No 21:20
brrt ok, seems fair 21:21
fwiw, getlex isn't really the problem, the getlex_n* ops are :-)
jnthn Oh...how so? 21:22
Those are the named forms
And so not so hot
As they handle the (less common) late-bound cases.
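The case analysis jnthn describes can be modelled with a toy frame chain: the op encodes how many outer frames to walk, and when outers is 0 the lookup is a plain indexed load from the current frame's ->env, which a JIT can emit inline. Struct and field names here only mimic MoarVM's:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the getlex fast path: walk `outers` frames out, then
   index into that frame's environment. With outers == 0 the loop body
   never runs and the whole thing is one load. Names are illustrative. */
typedef union { long i64; double n64; } Register;

typedef struct Frame {
    struct Frame *outer;  /* lexically enclosing frame, NULL at the top */
    Register     *env;    /* this frame's lexical storage */
} Frame;

static Register getlex(Frame *f, int outers, int idx) {
    while (outers-- > 0)
        f = f->outer;     /* skipped entirely when outers == 0 */
    return f->env[idx];
}
```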
21:23 donaldh joined
jnthn brrt: The if file handle then fprintf thing will get tiresome, I suspect; I suggest an MVM_INLINE function. 21:26
brrt yes, it does get tiresome, but how do i pass varargs through to printf? 21:27
jnthn - because they return a pointer
long story short
i call function
pointer is stored in %rax
pointer is to be dereferenced into some temporary register
temporary register is to be copied into moarvm register space 21:28
thats... annoying
especially considering what happens if value-of-pointer happens to be a float
jnthn brrt: See MVM_exception_throw_adhoc or MVM_panic for examples of vararg-handling functions
brrt ok, i'll do that :-) 21:29
jnthn They pass to sprintf, but it should be abou tth esame trick.
wow, so typing
What makes it annoying in the float case?
brrt oh, i see 21:31
floats are 80 bits wide on x86_64
my guess is they still are when you return them as MVMnum64
that is a guess, though 21:32
jnthn Hm, I was sure MVMRegister - the union with that in it - came out as 8 bytes wide
brrt then... i hope i'm wrong
i'm just not sure what happens when you stash them in an integer register - obviously you can't do math on them :-) but if the bits come out ok, then it still should be ok 21:33
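That worry can be checked directly: an MVMnum64 in memory is a 64-bit IEEE double (the 80-bit format only exists inside the x87 FPU's internal registers), so round-tripping the bits through a 64-bit integer slot loses nothing. The union below mirrors what MVMRegister's layout presumably is, as an assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MVMRegister: a union of the integer and num
   views, 8 bytes wide. Punning through the union copies the raw bits. */
typedef union { int64_t i64; double n64; } Reg;

static double through_int_register(double in) {
    Reg r;
    int64_t bits;
    r.n64 = in;
    bits = r.i64;    /* "stash it in an integer register" */
    r.i64 = bits;
    return r.n64;    /* the bits come out unchanged */
}
```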
b
oops 21:34
dalek arVM/moar-jit: 9e8e69b | (Bart Wiegmans)++ | / (5 files):
More low-hanging fruit opcodes.
21:46
brrt off for tonight
21:46 brrt left
jnthn sleep & 22:32
FROGGS gnight jnthn 22:33
lizmat gnight jnthn 22:34
timotimo gnite jnthn :) 22:36
23:43 daxim joined