timotimo | i'll check it out | 00:07 | |
79.04user 1.04system 1:20.35elapsed 99%CPU (0avgtext+0avgdata 1092980maxresident)k | 00:11 | ||
so that's a small part of it | |||
78.69user 1.09system 1:20.09elapsed 99%CPU (0avgtext+0avgdata 1092796maxresident)k | 00:34 | ||
another run, a bit better timing apparently | |||
01:14 FROGGS_ joined
06:24 mj41 joined
nwc10 | jnthn: strange, but I think I'm testing the correct thing. Anyway, the new origin/inline also still happy | 07:07 | |
07:26 FROGGS[mobile] joined
jnthn | nwc10: OK, thanks. | 09:15 | |
09:21 lizmat joined
09:46 mj41 joined
dalek | Heuristic branch merge: pushed 95 commits to MoarVM by jnthn | 09:57 | |
timotimo | inline has been inlined to master? | 09:58 | |
:) | |||
nwc10 | running dumbbench suggests that the allocator makes it about 4% faster than no allocator | ||
it's sad that malloc() still doesn't win | 09:59 | ||
timotimo | huh? | ||
jnthn | Yes, merged. | ||
Time to get more feedback :) | 10:00 | ||
nwc10 | I didn't study jnthn's allocator *too* closely, but it seemed to be a general purpose allocator, not specifically particular sizes | ||
so I couldn't see (fundamentally) why it should be faster than malloc | |||
jnthn | nwc10: It stores stuff by size classes and freeing also requires knowing the size allocated. | 10:01 | |
nwc10 | aha. the latter might be part of the win. | 10:02 | |
a malloc could store stuff by size classes. Hence my initial confusion | |||
timotimo | uh | 10:10 | |
did nqp builds use to be that fast? | |||
39.95user 0.95system 0:41.40elapsed 98%CPU (0avgtext+0avgdata 161788maxresident)k | |||
jnthn | Maybe not :) | 10:12 | |
Rakudo build got a bit faster, after all... | 10:13 | ||
nwc10 | dumbbench thinks that the setting build is about 2.5% faster | 10:14 | |
but reports vary, depending on how many outliers it threw away | |||
timotimo | without inline, but with CGOTO: | 10:15 | |
37.75user 0.83system 0:38.75elapsed 99%CPU (0avgtext+0avgdata 120728maxresident)k | |||
jnthn | timotimo: Any reason we can't turn CGOTO on by default if we detect GCC? | ||
Also, what happens if we cgoto + inline? :) | 10:16 | ||
timotimo | 36.17user 1.00system 0:37.86elapsed 98%CPU (0avgtext+0avgdata 161856maxresident)k | ||
the memory usage increase is a bit worrying, IMO. | |||
jnthn | Yes, I'm curious where that comes from. | 10:17 | |
timotimo | 36.08user 0.98system 0:37.24elapsed 99%CPU (0avgtext+0avgdata 161892maxresident)k | ||
so that's somewhat stable ... ish | 10:18 | ||
nwc10 | is the memory use lower on the commit before the custom allocator? | ||
jnthn | Good question. | ||
timotimo | 38.04user 1.00system 0:39.21elapsed 99%CPU (0avgtext+0avgdata 146088maxresident)k | 10:19 | |
that's on 7a52289 | |||
nwc10 | jnthn: if I understand the source code well enough from skimming it, you're allocating using a bin for objects 1-8, a bin for 9-16, ... 257-264 ... 1017-1024 | ||
jnthn | Right. | 10:20 | |
nwc10 | jemalloc isn't using as many bins: www.canonware.com/download/jemalloc...alloc.html | ||
(see "Table 1. Size classes") | |||
and, I'm guessing, MoarVM simply isn't allocating stuff in some of the bin sizes | 10:21 | ||
and *is* allocating a lot of things of the same size | |||
so having lots of bins wins. | 10:22 | ||
jnthn | Also, an unused bin is near-enough free | ||
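The bin scheme nwc10 describes (1-8, 9-16, ..., 1017-1024, with freeing requiring the size) can be sketched as a minimal size-class allocator. All names here are illustrative, not MoarVM's actual fixed-size-allocator API; this just shows why an unused bin is near-enough free (an empty free-list head) and why free needs the size (to pick the bin).

```c
#include <assert.h>
#include <stdlib.h>

#define BIN_WIDTH 8
#define NUM_BINS  128   /* 128 * 8 = 1024 bytes: largest binned size */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;
static FreeNode *bins[NUM_BINS];   /* an unused bin is just a NULL head */

/* Map a size to its bin: 1-8 -> 0, 9-16 -> 1, ..., 1017-1024 -> 127. */
static int bin_for(size_t size) {
    if (size == 0 || size > NUM_BINS * BIN_WIDTH)
        return -1;                 /* too big: fall through to malloc */
    return (int)((size - 1) / BIN_WIDTH);
}

static void *fsa_alloc(size_t size) {
    int b = bin_for(size);
    if (b < 0)
        return malloc(size);
    if (bins[b]) {                 /* pop a previously freed block */
        FreeNode *n = bins[b];
        bins[b] = n->next;
        return n;
    }
    /* Empty bin: grab a fresh block of the bin's full width. */
    return malloc((size_t)(b + 1) * BIN_WIDTH);
}

/* Freeing requires knowing the allocated size, as jnthn notes,
 * so we know which bin's free list to push onto. */
static void fsa_free(void *p, size_t size) {
    int b = bin_for(size);
    if (b < 0) { free(p); return; }
    FreeNode *n = p;
    n->next = bins[b];
    bins[b] = n;
}
```

Repeated alloc/free of same-sized objects then reuses the same block with no malloc call at all, which is where the win over a general-purpose malloc comes from.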
timotimo doesn't really know the build system stuff, so can't easily turn on cgoto when gcc is found | |||
vendethiel | ooh, inline got merged ? | 10:31 | |
timotimo | aye | ||
vendethiel | nice ! jnthn++ | ||
timotimo | inline got inlined :3 | ||
vendethiel | (everyone else contributing/that has contributed)++ | 10:32 | ||
timotimo | that's mostly nwc10 | ||
i didn't do anything :) | |||
jnthn: are there spesh analysis/improvements i could try to build that would benefit greatly from inlined stuff? | 10:36 | ||
jnthn | timotimo: Maybe; one other thing you might like to try, looking at my profile output here, is to look at nqp_nfa_run | 10:37 | |
timotimo: And see about using the fixed size allocator instead of malloc/free in there | 10:38 | ||
timotimo | is that c-level or nqp-level? | ||
jnthn | C level | ||
timotimo | ah | ||
jnthn | Apparently 1.1% of setting build time goes on that malloc/free pair. | ||
timotimo | that doesn't immediately sound like a huge deal; don't we do lots and lots of nfa during setting compilation? | 10:39 | |
jnthn | A potential 1% saving for a few lines tweaking is quite a bit. | 10:41 | |
timotimo | that's 1% if you can make it 10x faster :) | ||
haven't run an actual profile in a long time | 10:45 | ||
a c-level profile, that is | |||
jnthn | Ah, found some memory management fail. | 10:46 | |
timotimo | that's good :) | 10:47 | |
dalek | MoarVM: 9d440a3 | jnthn++ | src/core/fixedsizealloc.c: Add mechanism for debugging fixed size alloc/free. Can set a flag where it checks the allocated and freed sizes match up, and panics if they fail to. | |
jnthn | We fail that check, and it seems it happens if we deopt. | 10:48 | |
nwc10 | jnthn: one thing I was wondering was whether the outermost level of the fixed size stuff could be an inline function - the one that decides if it is in a bin or not | 10:49 | |
so that, if one changes the "bin detection" code to "never uses a bin" in a way that the C compiler's optimiser can see | |||
then it can generate code that always uses malloc | 10:50 | ||
which keeps OpenBSD happy | |||
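nwc10's suggestion can be sketched like so: if the outermost bin-or-not decision is a static inline gated on a compile-time constant, then configuring "never use a bin" lets the optimiser collapse every call site into a plain malloc, keeping the system allocator (and tools that replace it) fully in charge. The flag and function names here are hypothetical, not MoarVM's real build knobs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical compile-time switch: set to 0 to always use malloc. */
#ifndef FSA_USE_BINS
#define FSA_USE_BINS 1
#endif

#define FSA_MAX_BINNED 1024

/* Out-of-line slow path; stubbed with malloc here so the sketch links. */
static void *fsa_alloc_from_bin(size_t size) { return malloc(size); }

/* With FSA_USE_BINS defined to 0, the first condition is a constant,
 * so the compiler can reduce fsa_alloc() to a direct malloc() call. */
static inline void *fsa_alloc(size_t size) {
    if (!FSA_USE_BINS || size > FSA_MAX_BINNED)
        return malloc(size);
    return fsa_alloc_from_bin(size);
}
```

The point is that disabling the feature costs nothing at runtime: the branch is folded away, which is what keeps OpenBSD's hardened malloc (or valgrind/ASAN) seeing every allocation.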
timotimo | iiuc this is about very short-lived objects, which would benefit from having an all-at-once free step | 10:53 | |
there's no way to do this on the stack, aye? | 10:54 | ||
at least for the nfa? | |||
jnthn | nwc10: On MSVC at least, considering I couldn't breakpoint an optimized build inside of that outermost one, it already was doing an inline there. | 10:56 | |
timotimo: Well, it *is* possible we could allocate one big chunk of memory for the NFA processing and then free it. | 10:57 | ||
Yes, it's short lived. | 10:58 | ||
timotimo | jnthn: that's not what's going on with the fixed size allocator? | ||
is that allocator itself long-lived? | |||
jnthn | The allocator lives for the whole process | 11:00 | |
timotimo | ah, ok | ||
in that case, yeah, the nfa could possibly benefit from a short-lived allocator | |||
jnthn | Not really | 11:01 | |
timotimo | OK, what do i know :) | ||
jnthn | It's just that it makes 4 calls to malloc/free when it could do 2, and then it could use the fixed size allocator which seems to be cheaper than malloc. | ||
timotimo | that does sound like a win, aye | 11:03 | |
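The "4 calls to malloc/free when it could do 2" pattern generalises: two arrays with identical lifetimes can be carved out of one block, so one malloc and one free cover both. This is only an illustration of the idea, not MoarVM's actual nqp_nfa_run code; the names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Two same-lifetime work arrays (say, an NFA's "current states" and
 * "next states") sharing a single allocation. */
typedef struct {
    int64_t *curr;
    int64_t *next;
} StateLists;

static StateLists nfa_lists_alloc(size_t n_states) {
    StateLists s;
    /* One block big enough for both halves. */
    s.curr = malloc(2 * n_states * sizeof(int64_t));
    s.next = s.curr + n_states;   /* second array starts halfway in */
    return s;
}

static void nfa_lists_free(StateLists s) {
    free(s.curr);                 /* one free releases both arrays */
}
```

Swapping malloc for the fixed-size allocator in such a helper is then a localised change, which is the "few lines tweaking" jnthn has in mind.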
i wonder how many serious security-related bugs lie hidden in moarvm's code | 11:04 | ||
nwc10 | only 1 known use-after-free | 11:05 | |
not tried using valgrind to find uninit warnings | |||
dalek | MoarVM: 3bf1aa7 | jnthn++ | src/core/frame. (2 files): Fix freeing of frame memory to correct bucket. Before we sometimes ended up putting it back in the wrong one, if we deoptimized. This corrects that issue, hopefully improving memory use. | |
jnthn | timotimo: Please try with that, but it seems to help here. | ||
timotimo | sure | ||
37.11user 1.17system 0:38.66elapsed 99%CPU (0avgtext+0avgdata 144480maxresident)k | 11:07 | ||
that's a bit better than before you put the fixed size allocator in | |||
jnthn | ah, good | 11:10 | |
So it was that. | |||
timotimo | still 20mb more than before we had inline at all | ||
does that seem like a sane amount of ram usage for inlining things? | |||
jnthn | A little higher than I'd expect | 11:11 | |
timotimo | i'm generally in favor of having much less ram usage in moarvm, but that's not connected to any particular "work item" | 11:12 | |
jnthn | Well, also I don't know to what degree it's a VM-level issue and to what degree we need to be more frugal with memory at a higher level. | ||
timotimo | fair enough | 11:13 | |
there's still the issue with strings being stored many, many times in ram | |||
jnthn | It's like QAST node construction. | ||
We've been optimizing all kinds, but the way QAST nodes get created is basically performance hostile. | |||
timotimo | is that still the case? | 11:14 | |
jnthn | Yes. | ||
timotimo | ah, that's where we iterate over names and call methods to set attributes? | ||
jnthn | Right, meaning that every single one of those method calls is a late-bound lookup | ||
timotimo | yeah, ouch! | ||
jnthn | And it's a megamorphic callsite, so there's basically nothing the optimizer can do. | 11:15 | |
timotimo | can we perhaps get that to use nqp::bindattr directly? | ||
instead of the methods? | |||
jnthn | Well, having constructors that are more specialized to the nodes may also help | ||
Additionally, not all nodes have children. | |||
timotimo | mhm. lots more typing, but better performance for all backends i suspect | ||
jnthn | But every single SVal, NVal, WVal, etc. currently has an array allocated for them. | ||
timotimo | right, SVal, IVal, WVal, NVal wouldn't have children | ||
the same treatment annotations got might not be that helpful for children lists, right? | 11:16 | ||
because we really do want to keep the positional_delegate | |||
jnthn | yeah, we want that for API reasons too | ||
timotimo | should we have a QAST::ChildlessNode as the top of the class hierarchy and then derive one with a children array? | 11:17 | |
jnthn | No | ||
I'd be more inclined to write a role | |||
timotimo | mhm | ||
jnthn | And it's composed by the node classes that have children. | ||
timotimo | another idea would be to bind nqp::null to the children list? | ||
oh, that'll be problematic if we iterate over nodes without knowing if they'll have children or not | 11:18 | ||
jnthn | Also we waste the 8 bytes for the pointer we don't need. | ||
timotimo | what we could do is bind the same empty list to all childless nodes | 11:19 | |
how does that sound? | |||
jnthn | No, we should do the role thing I'm suggesting. | ||
timotimo | how does that interact with trying to iterate over nodes? | ||
will we get a .list method call emitted for all places that would be problematic? | 11:20 | ||
in that case we could return a global empty list object from that and otherwise have the role provide the list | |||
jnthn | I think we can do it transparently to the current usage | ||
That is, this can be done as an internal refactor to the QAST nodes without breaking anything. | 11:21 | ||
timotimo | that would be nice indeed | ||
only very few qast nodes survive past the compilation stage of a program's lifetime, right? | 11:22 | ||
there's the qast nodes that survive to make inlining in the optimizer possible, do they survive past the last compilation stage? | |||
well, to be fair, the maxrss in building is surely dominated by the compilation phases, as there's very little code being run there | 11:23 | ||
jnthn | Yeah, we serialize the QAST tree for things that we view as inlineable, yes | 11:25 | |
Though it's quite restricted. | |||
timotimo | aye, i recall that | 11:26 | |
11:32 JimmyZ_ joined
12:14 vendethiel joined
dalek | Heuristic branch merge: pushed 117 commits to MoarVM/moar-jit by bdw | 12:49 | |
jnthn | That's some catch-up :) | 12:50 | |
nwc10 | jnthn: does your compiler do link time optimisation? In that, can it inline the non-static functions that are used for the allocator? (just curious) | ||
12:51 cognominal joined
jnthn | Yes. | 12:51 | |
With the default MoarVM build options, anyway. | |||
nwc10 | Ah OK. So I guess that that makes those functions behave pretty much like they were static | ||
anyway, this is all possibly premature optimisation (and therefore wrong). You've already made it easy to disable the functionality, and always use the system malloc (or the malloc-replacing tool) | 12:52 | ||
./perl6-m t/spec/S17-promise/allof.t | 12:59 | ||
==8851==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fd93c1a9272 sp 0x7fffe15273b0 bp 0x7fffe15273f0 T0) | |||
oh, that's supposed to be red. | |||
anyway, ungood. | |||
master/master/nom | |||
lizmat | fwiw, I see that test failing intermittently in the spectest | 13:02 | |
over on #perl6 I was just handwaving about a static .WHICH for an object | 13:03 | ||
I think we're getting to the point that the current non-constant nature of .WHICH is starting to cause problems | |||
jnthn | WHICH really wants a re-visit in many ways. | 13:04 | |
The current implementation is doomed to be slow also. | |||
And I doubt it has good entropy. | |||
lizmat | so how bad is the idea of a per-thread simple int64 counter ? | 13:11 | |
jnthn | Well, but where to store it? | 13:12 | |
We don't want to make every object 8 bytes bigger... | |||
And for, say, Int, the identity is tied up in the value | 13:13 | ||
nwc10 | 8 bytes bigger is >2% more peak memory | ||
it's 2% just using 8 bytes per P6Opaque | |||
lizmat | do we want to start playing variable struct size tricks like in P5 ? | 13:16 | |
nwc10 | bugger. t/spec/S17-promise/allof.t passes first time under valgrind | ||
lizmat: probably not. Because if thread 2 can change the size of a structure (and move it) then every *read* in thread 1 needs to grab a mutex to prevent thread 2 from doing that at the wrong time. | 13:17 | ||
and, if reads need mutexes, deadlock becomes much easier. | 13:18 | ||
(oh, that's second order) | |||
reads become much slower | 13:19 | ||
lizmat | yeah, so it makes much more sense to just add it to the struct ? | ||
nwc10 | what, add a "which" to the object header? | ||
lizmat | isn't that what we're talking about ? | 13:20 | |
nwc10 | yes. that also sucks, because memory usage will increase by (maybe) 5% | ||
lizmat | however, in this case it doesn't seem needed: | ||
$ 6 'say 42.REPR; say 42.WHICH' | |||
P6opaque | |||
Int|42 | |||
so maybe we need a P6opaquevalue ? | 13:21 | ||
that wouldn't need the .which in the struct ? | |||
or maybe treat anything that needs a non-value based .WHICH differently wrt to allocating ? | 13:23 | ||
jnthn | Well, thing is that *most* objects don't ever have .WHICH called on them | 13:24 | |
We should associate the cost with using the feature. | |||
13:24 zakharyas joined
lizmat | are you talking CPU or memory cost ? | 13:26 | |
jnthn | Both | 13:27 | |
lizmat | I'm assuming code depends on the fixed length of an P6opaque? | ||
jnthn | More generally, I'm thinking about having the storage of WHICH values be more like a hash table arrangement. | ||
lizmat | what would be the key? | ||
and would you clean it up when an object gets destroyed? | 13:28 | ||
jnthn | The object - the trickiness here being it needs to be VM-supported. | ||
Right. | |||
13:28 brrt joined
lizmat | and that hash would be per thread, I assume ? | 13:29 | |
otherwise we get serious locking issues, no? | |||
jnthn | Probably needs to be | ||
otoh, then we get different issues | |||
jnthn doesn't see any particularly easy solutions | 13:30 | ||
lizmat | would the simple approach maybe not be best? | 13:32 | |
jnthn | No. | 13:33 | |
lizmat | take the 8byte per Opaque hit, only set it when actually asked for? | ||
at least until we think of something better ? | |||
jnthn | No, we should work out the better thing, not pile up technical debt. | ||
13:34 mj41 joined
jnthn | It woulda been nice if the spec had been as lenient as Java's .hashCode() spec, which can change over an object's lifetime... | 13:35 | ||
lizmat | well, then maybe we need to pick this up at a higher level? | ||
jnthn | But it's not, which is a Tricky Problem. But a big memory usage increase on everything isn't a great answer. | ||
lizmat | or maybe only assign some .WHICH when it gets moved out of the nursery (and *then* add the extra 8 bytes) | 13:37 | |
and if a .WHICH is called on something not in the nursery, move it out? | |||
*in the nursery rather | |||
jnthn | You can't "just move it out", but one idea TimToady++ hinted at that can be feasible is using the gen2 address if it's already there, or pre-allocating a gen2 slot for the object if we are asked for its WHICH and keeping a table of nursery objects => WHICH values. | 13:38 | |
And we remove those entries at GC time, due to collection or movement. | |||
lizmat is trying to serve as a catalyst :-) | 13:40 | ||
brrt | oh, i wanted to mention, creating a 'move / copy' node for the jit runs into the register selection explosion problem again, so i'm not doing that (yet) | ||
nwc10 | I like TimToady's suggestion. I think it could work well. | 13:41 | |
can do that without more RAM by (ab)using the union in the object header, but would need another flag to say that it's being done, and slow SC access | 13:42 | ||
(you'd put the real SC pointer into the pre-allocated gen2 space) | |||
dalek | MoarVM: 22773f2 | jnthn++ | src/spesh/args.c: Don't refuse to spesh if we've a slurpy positional | 13:44 | |
jnthn | timotimo: Feel free to give qast_refactor branches in NQP and Rakudo a spin. | 13:54 | |
timotimo | 36.30user 0.95system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 142724maxresident)k | 14:03 | |
2mb less usage apparently | |||
but about 1s less time? could very well be noise. | 14:04 | ||
jnthn | That's NQP build? | ||
timotimo | aye | ||
jnthn | OK. Rakudo one could be interesting too. :) | ||
timotimo | OK | 14:12 | |
refactor'd: 76.05user 0.95system 1:17.56elapsed 99%CPU (0avgtext+0avgdata 820128maxresident)k | 14:14 | ||
14:15 brrt joined
jnthn tries to find the previous numbers :) | 14:15 | ||
timotimo | i'm making new ones | ||
master'd: 76.37user 1.03system 1:17.60elapsed 99%CPU (0avgtext+0avgdata 826456maxresident)k | 14:17 | ||
jnthn | Hmm, a memory win, not so much of a performance one, curiously. | 14:21 | |
timotimo | beware the noise | ||
i didn't shut down all running programs :) | |||
jnthn | ah | ||
walk :) And when I'm back, I'll look at the spesh args missing thing where it doesn't know how to handle boxing/unboxing and so bails. | 14:39 | ||
14:52 betterworld joined
15:02 btyler joined
15:08 brrt left
nwc10 | jnthn: for those 2 branches, t/spec/S17-scheduler/every.t can fail with a NULL pointer at | 16:01 | |
#0 0x7f1e40b4f0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f1e40b4f1b1 in MVM_fixed_size_alloc_zeroed src/core/fixedsizealloc.c:144 | |||
#2 0x7f1e40adac20 in allocate_frame src/core/frame.c:201 | |||
but not reliably | |||
total fails are: t/spec/S06-macros/opaque-ast.rakudo.moar t/spec/S06-macros/unquoting.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/S17-scheduler/every.t t/spec/integration/advent2012-day23.t | 16:02 | ||
the S17 are ASAN. The other 3 are | |||
===SORRY!===P6opaque: no such attribute '$!position' | |||
jnthn | Hmm, that sounds like "missing a commit" | ||
nwc10 | This is nqp version 2014.05-14-g2147886 built on MoarVM version 2014.05-121-g22773f2 | 16:03 | |
This is perl6 version 2014.05-193-g6d23540 built on MoarVM version 2014.05-121-g22773f2 | |||
jnthn | Yes, I just pushed the missing one. D'oh. | ||
Thought the error looked very familiar... | 16:04 | ||
timotimo | how often do we have slurpy positional subroutines/methods in nqp and rakudo source respectively? | 16:12 | |
hm. so a slurpy positional argument will turn into a list. and we know exactly how big that list is at spesh-time. do i smell a specialization opportunity? | 16:14 | ||
though, we probably often do things like iterate over these and stuff like that | |||
jnthn | timotimo: Yeah, we can do something there, I suspect | 16:16 | |
timotimo | a fact flag "KNOWN_ARRAY_SIZE"? | ||
probably more like "KNOWN_ELEMENT_COUNT" | 16:17 | ||
jnthn | Oh, I wasn't thinking of even going that far. | ||
timotimo | another thing is that if we have a method that has slurpy positional arguments and we "just pass it on" to another, spesh will see it involves flattening and bail out, won't it? | ||
jnthn | Just potentially using the sp_getarg_ ops to grab the args and put them into the array. | 16:18 | |
Yes | |||
Obviously, there's a chance to do better there, but not sure how easy it is. | |||
timotimo | if we know we just got these arguments from a slurpy positional, we can probably assume it's safe | ||
i'm not sure i know how that sp_getarg_ thing you mentioned would work; will the positionals that'll end up in the slurped array just be available like regular positionals? | 16:19 | ||
jnthn | Well, I think it actually probably wants to go the other way around. | ||
As in, "I see I get called with a flattening callsite, and I take a slurpy there" | |||
timotimo | oh, as in: instead of flattening this array and slurping it again, let's just pass the array directly" | 16:20 | |
that seems more sensible, i agree | |||
no spesh: 40.23user 0.91system 0:41.37elapsed 99%CPU (0avgtext+0avgdata 118300maxresident)k | 16:35 | ||
spesh: 36.37user 0.93system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 144524maxresident)k | |||
that's the complete nqp build | |||
no spesh: 84.33user 1.02system 1:25.91elapsed 99%CPU (0avgtext+0avgdata 722140maxresident)k | 16:39 | ||
spesh: 77.57user 1.07system 1:18.86elapsed 99%CPU (0avgtext+0avgdata 826312maxresident)k | |||
that's the complete rakudo build | |||
m: say (1 * 60 + 18) / (1 * 60 + 25) | |||
camelia | rakudo-moar 7f22e9: OUTPUT«0.917647␤» | ||
timotimo | this is with inline already; i thought inline would do crazy improvements to the parse time, what with inlining proto regexes and such :/ | 16:40 | |
but 9% isn't bad either. | |||
jnthn | Well, remember it's just taking out invocation overhead. | 16:43 | |
timotimo | that contains argument passing and returning already, right? | 16:44 | |
and cross-invocation-dead-code-elimination and constant-folding? | |||
jnthn | Not the latter two yet really. | 16:45 | |
It's being a bit conservative so as not to ruin the inline annotations. | |||
timotimo | oh | ||
huh, what is this. the very first thing that gets spesh'd has a named parameter operation removed, which had BB(3) as its label, but BB(3) is still listed as that block's successor? | 16:46 | ||
rather: as one of the successors | |||
i wonder if this leads to less dead code elimination than is necessary | 16:49 | ||
i wonder if BBs should be merged if they become completely linear during spesh? | |||
that's probably not easy to do given the dominance tree and stuff? | |||
jnthn | It's also not worth it at all. | ||
BBs don't correspond to anything at runtime. | 16:50 | ||
gist.github.com/jnthn/2050e5ed6e8991e24e53 # example of inline making a difference. | |||
timotimo | OK | ||
oh, that's not too shabby :) | 16:51 | ||
jnthn | Yeah. It's just that if you look at profiles of CORE.setting compilation and similar, invocation overhead is only so much | 16:52 | |
timotimo | i s'pose that's fair | 16:53 | |
dalek | MoarVM: dd80dbf | (Timo Paulssen)++ | src/spesh/optimize.c: put in a missing break | 17:02 | |
timotimo | does it sound sensible to spesh coerce_in and coerce_ni? | 17:04 | |
probably not much that can be done, eh? | 17:05 | ||
i see at least one const_n + coerce_ni | 17:07 | ||
er, actually const_i + coerce_in | |||
a whole lot of coerces of those two come directly after smrt_numify | 17:08 | ||
hum. these const_i's are all 16bit ints; so replacing the const_i + coerce with a const_n will give us a 64bit num in its place | 17:11 | ||
should still be a win, right? | |||
would also get rid of a bit of interpretation overhead? i would assume with coerce and const_i, the interpreter overhead is many times what the operation itself takes | 17:13 | ||
jnthn | Well, it's an instruction cheaper, yes. | 17:15 | |
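The const_i + coerce_in fold timotimo describes boils down to: when a coerce_in's input is a known integer literal, do the int-to-num conversion at spesh time and rewrite the pair's result as a single const_n. The structs and op names below are a toy representation for illustration, not spesh's real graph.

```c
#include <assert.h>

typedef enum { OP_CONST_I, OP_CONST_N, OP_COERCE_IN } Op;

/* Toy instruction: one payload slot per literal kind, plus the index
 * of the producing instruction for coerce_in. */
typedef struct {
    Op op;
    long ival;      /* payload for const_i */
    double nval;    /* payload for const_n */
    int src;        /* producer index, used by coerce_in */
} Ins;

/* Fold const_i + coerce_in into const_n; returns 1 if folded. */
static int fold_coerce_in(Ins *ins, int idx) {
    if (ins[idx].op != OP_COERCE_IN)
        return 0;
    Ins *src = &ins[ins[idx].src];
    if (src->op != OP_CONST_I)
        return 0;                         /* input not a known literal */
    ins[idx].op = OP_CONST_N;             /* becomes a num literal... */
    ins[idx].nval = (double)src->ival;    /* ...coerced at spesh time */
    return 1;
}
```

As noted, a 16-bit const_i becomes a 64-bit const_n, so the operand gets wider, but one interpreted instruction (and its dispatch overhead) disappears.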
nwc10 | jnthn: ./perl6-m t/spec/S17-scheduler/every.t can SEGV: | 17:22 | |
#0 0x7f421a79b0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f421a7fce31 in bind_key src/6model/reprs/MVMHash.c:86 | |||
./perl6-m t/spec/S17-promise/allof.t can SEGV | |||
#0 0x7f948135a0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f94813bbe31 in bind_key src/6model/reprs/MVMHash.c:86 | |||
so, something isn't quite as threadsafe as it should be. | |||
jnthn | aye | ||
Looks like | |||
nwc10 | both are NULL pointers | ||
threads are hard, let's go asyncing. | 17:23 | ||
timotimo | should we build a smrt_intify? because i see a whole bunch of smrt_numify followed directly by coerce_ni | ||
hm, actually ... that wouldn't be much help | |||
because we still have to parse the stuff after the . because there could be an E in there | 17:24 | ||
nwc10 | or | 17:25 | |
core (noun), plural coredump | |||
jnthn | timotimo: I think it already exists. | 17:26 | |
timotimo | there's smrt_numify and smrt_strify | 17:27 | |
those are the only ones with smrt_ or ify in their name | |||
jnthn | hm, you're right :) | 17:36 | |
In other news, I just finally managed to get the instrumented profile in VS to work. | |||
17:36 FROGGS joined
timotimo | it's still kinda questionable if that would really help | 17:36 | |
knowing that the result is going to be intified | |||
FROGGS | o/ | ||
timotimo | o/ FROGGS | ||
jnthn | While that runs, I'm going to find some food :) | 17:37 | |
wow, it wrote 18GB so far. Good job it's on something with half a terabyte to hand... | |||
bbiab | 17:38 | ||
timotimo | 76.60user 1.06system 1:17.88elapsed 99%CPU (0avgtext+0avgdata 826528maxresident)k | 17:46 | |
vs | 17:48 | ||
76.11user 1.10system 1:17.41elapsed 99%CPU (0avgtext+0avgdata 826520maxresident)k | |||
so the coerce thing isn't worth terribly much. not really surprising | 17:49 | ||
(first line is with coerce spesh thingie, second is without) | |||
dalek | MoarVM: 87221ba | (Timo Paulssen)++ | src/spesh/optimize.c: can do coerce_in of literals at spesh-time. | 17:50 | |
jnthn | lol | 17:52 | |
CORE.setting running with instrumented profiling got done while I was shopping :) | |||
80GB. | |||
FROGGS | not a SSD me thinks | 17:54 | |
jnthn | No | ||
Spinning rust, and boy is it making a racket now as it analyses the data. | |||
FROGGS | that is also a problem of SSDs, they are so fast, when something write stuff to it in an infiniloop you almost can't stop it | 17:55 | |
18:38 zakharyas joined
18:40 mj41 joined
18:45 bcode joined
18:55 mj41 joined
timotimo | it's like having an LTE on your phone, but a 1gb data limit | 19:59 | |
so, what's going on now? :) | 20:00 | ||
the analysis hopefully is already done? :D | |||
jnthn | Yeah. | 20:04 | |
Took ages :) | |||
But it got done while I coked, ate, etc. | |||
uh, cooked :) | |||
timotimo | sadly, the spesh_diff tool is broken with the current spesh log format | 20:05 | |
somehow ... | |||
jnthn | Curiously, the instrumented profiler thinks we spend about half as much time in GC as the sampling profile does. | 20:07 | |
timotimo | huh, that's weird. | 20:08 | |
jnthn | getattribute is still by some way the most costly thing we do. | 20:11 | |
'cus, I assume, spesh can't handle most of the getattribute/bindattribute in Cursors. | |||
That's a pretty strong indicator that I should work on that in 2014.07. :) | 20:12 | ||
timotimo | that's half a month in the future! :( | ||
anything simple i could try to bang my head against in the mean time? | 20:14 | ||
jnthn | No, I mean, for the 2014.07 release | ||
timotimo | ah, ok | ||
jnthn | I don't really want to go optimizing much further at this point. | ||
Would rather work on fixes, making sure stuff works well for this week's release. | 20:15 | ||
Then after it can get back to opts :) | |||
timotimo | ah ... yeah, that *is* fair | 20:16 | |
we do have some known problems with our async and multithreaded things on moar, for example | |||
jnthn | Well, we know there's problems. :P | 20:19 | |
Anyway, interesting to look through the report. | 20:21 | ||
String comp comes up fairly high, but a lot of that is 'cus we're still hitting the attribute access slow path so often. | 20:22 | ||
timotimo | mhm | 20:23 | |
jnthn | 2.6% is spent in smart_numify. Not such a smart move. | ||
1.3% in smart_stringify | |||
timotimo | i kind of sort of wish we could give Rat a big speed boost | 20:57 | |
it seems likely to me that many people who come to try out p6 are going to be using the / operator and stumbling over the pretty tough performance hit | |||
jnthn | Well, step 1 is to write benchmarks for it in perl6-bench, so we understand the magnitude of the problem and how we can improve it :) | 20:58 | |
timotimo | oh, of course :) | 20:59 | |
i could have thought of that | |||
21:07 cognominal joined
dalek | MoarVM/moar-jit: 1b1eac4 | (Bart Wiegmans)++ | / (8 files): Configure JIT with environmental variables. This should make the JIT play more nicely. Also supports hello world :-) | 21:08 | |
21:08 brrt joined
tadzik | :o | 21:10 | |
brrt: are the generated files being committed to not depend on lua? | 21:11 | ||
brrt | oh.. yes | ||
oh, good of you to mention that | |||
i forgot the win32 x64 files | |||
tadzik | :) | ||
dalek | MoarVM/moar-jit: 1537dcd | (Bart Wiegmans)++ | src/jit/emit_win32_x64.c: Forgot the win32 x64 dynasm output. | 21:12 | |
tadzik | do you have like a commit hook to regen all those files? | ||
that might be handy | 21:13 | ||
brrt | not yet | ||
yep | |||
jnthn | + MVMString * s = sf->body.cu->body.strings[idx]; + | mov64 TMP, (uintptr_t)s | ||
About that, it assumes gen2 and thus non-moving, which is fine for the string heap, but need to be careful when it comes to, say, spesh slots. | |||
brrt | yes, i know, it's hacky, but the alternative was i started up coding a call to MVM_strings_get() which - afaik - doesn't exist yet, and the commit was big enough as it is :-) | 21:14 | ||
i'm somewhat against ripping moarvm interp open and diverging before i've got a chance to merge, is what i mean :-) | 21:15 | ||
jnthn | *nod* | 21:16 | |
brrt | hmm | ||
i'm looking at the getlex_** ops, they look tricky (i.e. not really what i want to encode in a single MVMJitCallC node | 21:17 | ||
in that the return value is a pointer that needs to be dereferenced before i can store it in the register | 21:18 | ||
jnthn | I think for the JIT we can do some case analysis on those. | ||
brrt | case analysis? | 21:19 | |
jnthn | For example, if outers is 0, then it's just looking directly into ->env | ||
For i/n/s. | |||
The auto-viv doesn't happen. | |||
brrt | agreed | ||
not for s, either? | |||
jnthn | For o you can know if it's going to auto-viv | ||
No | 21:20 | ||
brrt | ok, seems fair | 21:21 | |
fwiw, getlex isn't really the problem, getlex_n. are :-) | |||
jnthn | Oh...how so? | 21:22 | |
Those are the named forms | |||
And so not so hot | |||
As they handle the (less common) late-bound cases. | |||
21:23 donaldh joined
jnthn | brrt: The if file handle then fprintf thing will get tiresome, I suspect; I suggest an MVM_INLINE function. | 21:26 | |
brrt | yes, it does get tiresome, but how do i pass varargs through to printf? | 21:27 | |
jnthn - because they return a pointer | |||
long story short | |||
i call function | |||
pointer is stored in %rax | |||
pointer is to be dereferenced into some temporary register | |||
temporary register is to be copied into moarvm register space | 21:28 | ||
thats... annoying | |||
especially considering what happens if value-of-pointer happens to be a float | |||
jnthn | brrt: See MVM_exception_throw_adhoc or MVM_panic for examples of vararg-handling functions | ||
brrt | ok, i'll do that :-) | 21:29 | |
jnthn | They pass to sprintf, but it should be about the same trick. | ||
wow, so typing | |||
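The varargs pass-through jnthn points at is the standard va_list/vfprintf trick: accept `...`, turn it into a va_list, and hand it to the v-variant of the printf function, with the file-handle check folded in. The function name jit_log is illustrative; the technique is the same one MVM_exception_throw_adhoc uses with sprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* "if file handle then fprintf" as a single helper: forwards its
 * varargs to vfprintf, and is a no-op when logging is disabled. */
static void jit_log(FILE *log_fh, const char *fmt, ...) {
    va_list args;
    if (!log_fh)
        return;                    /* JIT logging not enabled */
    va_start(args, fmt);
    vfprintf(log_fh, fmt, args);   /* v-variant takes the va_list */
    va_end(args);
}
```

Call sites then shrink to `jit_log(fh, "bailing on op %d\n", op);` with no per-site handle check.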
What makes it annoying in the float case? | |||
brrt | oh, isee | 21:31 | |
floats are 80 bits wide on x86_64 | |||
my guess is they still are when you return them as MVMnum64 | |||
that is a guess, though | 21:32 | ||
jnthn | Hm, I was sure MVMRegister - the union with that in it - came out as 8 bytes wide | ||
brrt | then... i hope i'm wrong | ||
i'm just not sure what happens when you stash them in an integer register - obviously you can't do math on them :-) but if the bits come out ok, then it still should be ok | 21:33 | ||
b | |||
oops | 21:34 | ||
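brrt's guess can be checked directly: the 80-bit width is the x87 FPU's internal format, but in memory and in registers an MVMnum64 is an 8-byte IEEE 754 double, and its bit pattern round-trips through a 64-bit integer unchanged so long as no arithmetic touches it. A small sketch (the typedef mirrors MoarVM's MVMnum64 being a C double):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef double MVMnum64;   /* MoarVM's num type: 64-bit IEEE double */

/* Copy a double's raw bits into a 64-bit integer, as happens when a
 * float value is staged through a general-purpose register. */
static uint64_t num_to_bits(MVMnum64 n) {
    uint64_t bits;
    memcpy(&bits, &n, sizeof bits);
    return bits;
}

static MVMnum64 bits_to_num(uint64_t bits) {
    MVMnum64 n;
    memcpy(&n, &bits, sizeof n);
    return n;
}
```

So stashing a returned MVMnum64 in an integer register and copying it into MoarVM register space is bit-preserving; the only thing you lose is the ability to do float math on it in transit.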
dalek | MoarVM/moar-jit: 9e8e69b | (Bart Wiegmans)++ | / (5 files): More low-hanging fruit opcodes. | 21:46 | |
brrt off for tonight | |||
21:46 brrt left
jnthn | sleep & | 22:32 | |
FROGGS | gnight jnthn | 22:33 | |
lizmat | gnight jnthn | 22:34 | |
timotimo | gnite jnthn :) | 22:36 | |
23:43 daxim joined