timotimo | i'll check it out | 00:07 | |
79.04user 1.04system 1:20.35elapsed 99%CPU (0avgtext+0avgdata 1092980maxresident)k | 00:11 | ||
so that's a small part of it | |||
78.69user 1.09system 1:20.09elapsed 99%CPU (0avgtext+0avgdata 1092796maxresident)k | 00:34 | ||
another run, a bit better timing apparently | |||
01:14 FROGGS_ joined
06:24 mj41 joined
nwc10 | jnthn: strange, but I think I'm testing the correct thing. Anyway, the new origin/inline also still happy | 07:07 | |
07:26 FROGGS[mobile] joined
jnthn | nwc10: OK, thanks. | 09:15 | |
09:21 lizmat joined
09:46 mj41 joined
dalek | Heuristic branch merge: pushed 95 commits to MoarVM by jnthn | 09:57 | |
timotimo | inline has been inlined to master? | 09:58 | |
:) | |||
nwc10 | running dumbbench suggests that the allocator makes it about 4% faster than no allocator | ||
it's sad that malloc() still doesn't win | 09:59 | ||
timotimo | huh? | ||
jnthn | Yes, merged. | ||
Time to get more feedback :) | 10:00 | ||
nwc10 | I didn't study jnthn's allocator *too* closely, but it seemed to be a general purpose allocator, not specifically particular sizes | ||
so I couldn't see (fundamentally) why it should be faster than malloc | |||
jnthn | nwc10: It stores stuff by size classes and freeing also requires knowing the size allocated. | 10:01 | |
nwc10 | aha. the latter might be part of the win. | 10:02 | |
a malloc could store stuff by size classes. Hence my initial confusion | |||
timotimo | uh | 10:10 | |
did nqp builds use to be that fast? | |||
39.95user 0.95system 0:41.40elapsed 98%CPU (0avgtext+0avgdata 161788maxresident)k | |||
jnthn | Maybe not :) | 10:12 | |
Rakudo build got a bit faster, after all... | 10:13 | ||
nwc10 | dumbbench thinks that the setting build is about 2.5% faster | 10:14 | |
but reports vary, depending on how many outliers it threw away | |||
timotimo | without inline, but with CGOTO: | 10:15 | |
37.75user 0.83system 0:38.75elapsed 99%CPU (0avgtext+0avgdata 120728maxresident)k | |||
jnthn | timotimo: Any reason we can't turn CGOTO on by default if we detect GCC? | ||
Also, what happens if we cgoto + inline? :) | 10:16 | ||
timotimo | 36.17user 1.00system 0:37.86elapsed 98%CPU (0avgtext+0avgdata 161856maxresident)k | ||
the memory usage increase is a bit worrying, IMO. | |||
jnthn | Yes, I'm curious where that comes from. | 10:17 | |
timotimo | 36.08user 0.98system 0:37.24elapsed 99%CPU (0avgtext+0avgdata 161892maxresident)k | ||
so that's somewhat stable ... ish | 10:18 | ||
nwc10 | is the memory use lower on the commit before the custom allocator? | ||
jnthn | Good question. | ||
timotimo | 38.04user 1.00system 0:39.21elapsed 99%CPU (0avgtext+0avgdata 146088maxresident)k | 10:19 | |
that's on 7a52289 | |||
nwc10 | jnthn: if I understand the source code well enough from skimming it, you're allocating using a bin for objects 1-8, a bin for 9-16, ... 257-264 ... 1017-1024 | ||
jnthn | Right. | 10:20 | |
nwc10 | jemalloc isn't using as many bins: www.canonware.com/download/jemalloc...alloc.html | ||
(see "Table 1. Size classes") | |||
and, I'm guessing, MoarVM simply isn't allocating stuff in some of the bin sizes | 10:21 | ||
and *is* allocating a lot of things of the same size | |||
so having lots of bins wins. | 10:22 | ||
jnthn | Also, an unused bin is near-enough free | ||
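The bin scheme nwc10 describes (1-8, 9-16, ..., 1017-1024, with freeing requiring the size) can be sketched as a minimal size-class allocator. All names here are illustrative, not MoarVM's actual fixed-size-allocator API; this just shows why an unused bin is near-enough free (an empty free-list head) and why free needs the size (to pick the bin).

```c
#include <assert.h>
#include <stdlib.h>

#define BIN_WIDTH 8
#define NUM_BINS  128   /* 128 * 8 = 1024 bytes: largest binned size */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;
static FreeNode *bins[NUM_BINS];   /* an unused bin is just a NULL head */

/* Map a size to its bin: 1-8 -> 0, 9-16 -> 1, ..., 1017-1024 -> 127. */
static int bin_for(size_t size) {
    if (size == 0 || size > NUM_BINS * BIN_WIDTH)
        return -1;                 /* too big: fall through to malloc */
    return (int)((size - 1) / BIN_WIDTH);
}

static void *fsa_alloc(size_t size) {
    int b = bin_for(size);
    if (b < 0)
        return malloc(size);
    if (bins[b]) {                 /* pop a previously freed block */
        FreeNode *n = bins[b];
        bins[b] = n->next;
        return n;
    }
    /* Empty bin: grab a fresh block of the bin's full width. */
    return malloc((size_t)(b + 1) * BIN_WIDTH);
}

/* Freeing requires knowing the allocated size, as jnthn notes,
 * so we know which bin's free list to push onto. */
static void fsa_free(void *p, size_t size) {
    int b = bin_for(size);
    if (b < 0) { free(p); return; }
    FreeNode *n = p;
    n->next = bins[b];
    bins[b] = n;
}
```

Repeated alloc/free of same-sized objects then reuses the same block with no malloc call at all, which is where the win over a general-purpose malloc comes from.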
timotimo doesn't really know the build system stuff, so can't easily turn on cgoto when gcc is found | |||
vendethiel | ooh, inline got merged ? | 10:31 | |
timotimo | aye | ||
vendethiel | nice ! jnthn++ | ||
timotimo | inline got inlined :3 | ||
vendethiel | (everyone else contributing/that has contributed)++ | 10:32 | ||
timotimo | that's mostly nwc10 | ||
i didn't do anything :) | |||
jnthn: are there spesh analysis/improvements i could try to build that would benefit greatly from inlined stuff? | 10:36 | ||
jnthn | timotimo: Maybe; one other thing you might like to try, looking at my profile output here, is to look at nqp_nfa_run | 10:37 | |
timotimo: And see about using the fixed size allocator instead of malloc/free in there | 10:38 | ||
timotimo | is that c-level or nqp-level? | ||
jnthn | C level | ||
timotimo | ah | ||
jnthn | Apparently 1.1% of setting build time goes on that malloc/free pair. | ||
timotimo | that doesn't immediately sound like a huge deal; don't we do lots and lots of nfa during setting compilation? | 10:39 | |
jnthn | A potential 1% saving for a few lines tweaking is quite a bit. | 10:41 | |
timotimo | that's 1% if you can make it 10x faster :) | ||
haven't run an actual profile in a long time | 10:45 | ||
a c-level profile, that is | |||
jnthn | Ah, found some memory management fail. | 10:46 | |
timotimo | that's good :) | 10:47 | |
dalek | MoarVM: 9d440a3 | jnthn++ | src/core/fixedsizealloc.c: Add mechanism for debugging fixed size alloc/free. Can set a flag where it checks the allocated and freed sizes match up, and panics if they fail to. | |
jnthn | We fail that check, and it seems it happens if we deopt. | 10:48 | |
nwc10 | jnthn: one thing I was wondering was whether the outermost level of the fixed size stuff could be an inline function - the one that decides if it is in a bin or not | 10:49 | |
so that, if one changes the "bin detection" code to "never uses a bin" in a way that the C compiler's optimiser can see | |||
then it can generate code that always uses malloc | 10:50 | ||
which keeps OpenBSD happy | |||
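nwc10's suggestion can be sketched like so: if the outermost bin-or-not decision is a static inline gated on a compile-time constant, then configuring "never use a bin" lets the optimiser collapse every call site into a plain malloc, keeping the system allocator (and tools that replace it) fully in charge. The flag and function names here are hypothetical, not MoarVM's real build knobs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical compile-time switch: set to 0 to always use malloc. */
#ifndef FSA_USE_BINS
#define FSA_USE_BINS 1
#endif

#define FSA_MAX_BINNED 1024

/* Out-of-line slow path; stubbed with malloc here so the sketch links. */
static void *fsa_alloc_from_bin(size_t size) { return malloc(size); }

/* With FSA_USE_BINS defined to 0, the first condition is a constant,
 * so the compiler can reduce fsa_alloc() to a direct malloc() call. */
static inline void *fsa_alloc(size_t size) {
    if (!FSA_USE_BINS || size > FSA_MAX_BINNED)
        return malloc(size);
    return fsa_alloc_from_bin(size);
}
```

The point is that disabling the feature costs nothing at runtime: the branch is folded away, which is what keeps OpenBSD's hardened malloc (or valgrind/ASAN) seeing every allocation.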
timotimo | iiuc this is about very short-lived objects, which would benefit from having an all-at-once free step | 10:53 | |
there's no way to do this on the stack, aye? | 10:54 | ||
at least for the nfa? | |||
jnthn | nwc10: On MSVC at least, considering I couldn't breakpoint an optimized build inside of that outermost one, it already was doing an inline there. | 10:56 | |
timotimo: Well, it *is* possible we could allocate one big chunk of memory for the NFA processing and then free it. | 10:57 | ||
Yes, it's short lived. | 10:58 | ||
timotimo | jnthn: that's not what's going on with the fixed size allocator? | ||
is that allocator itself long-lived? | |||
jnthn | The allocator lives for the whole process | 11:00 | |
timotimo | ah, ok | ||
in that case, yeah, the nfa could possibly benefit from a short-lived allocator | |||
jnthn | Not really | 11:01 | |
timotimo | OK, what do i know :) | ||
jnthn | It's just that it makes 4 calls to malloc/free when it could do 2, and then it could use the fixed size allocator which seems to be cheaper than malloc. | ||
timotimo | that does sound like a win, aye | 11:03 | |
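The "4 calls to malloc/free when it could do 2" pattern generalises: two arrays with identical lifetimes can be carved out of one block, so one malloc and one free cover both. This is only an illustration of the idea, not MoarVM's actual nqp_nfa_run code; the names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Two same-lifetime work arrays (say, an NFA's "current states" and
 * "next states") sharing a single allocation. */
typedef struct {
    int64_t *curr;
    int64_t *next;
} StateLists;

static StateLists nfa_lists_alloc(size_t n_states) {
    StateLists s;
    /* One block big enough for both halves. */
    s.curr = malloc(2 * n_states * sizeof(int64_t));
    s.next = s.curr + n_states;   /* second array starts halfway in */
    return s;
}

static void nfa_lists_free(StateLists s) {
    free(s.curr);                 /* one free releases both arrays */
}
```

Swapping malloc for the fixed-size allocator in such a helper is then a localised change, which is the "few lines tweaking" jnthn has in mind.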
i wonder how many serious security-related bugs lie hidden in moarvm's code | 11:04 | ||
nwc10 | only 1 known use-after-free | 11:05 | |
not tried using valgrind to find uninit warnings | |||
dalek | MoarVM: 3bf1aa7 | jnthn++ | src/core/frame. (2 files): Fix freeing of frame memory to correct bucket. Before we sometimes ended up putting it back in the wrong one, if we deoptimized. This corrects that issue, hopefully improving memory use. | |
jnthn | timotimo: Please try with that, but it seems to help here. | ||
timotimo | sure | ||
37.11user 1.17system 0:38.66elapsed 99%CPU (0avgtext+0avgdata 144480maxresident)k | 11:07 | ||
that's a bit better than before you put the fixed size allocator in | |||
jnthn | ah, good | 11:10 | |
So it was that. | |||
timotimo | still 20mb more than before we had inline at all | ||
does that seem like a sane amount of ram usage for inlining things? | |||
jnthn | A little higher than I'd expect | 11:11 | |
timotimo | i'm generally in favor of having much less ram usage in moarvm, but that's not connected to any particular "work item" | 11:12 | |
jnthn | Well, also I don't know to what degree it's a VM-level issue and to what degree we need to be more frugal with memory at a higher level. | ||
timotimo | fair enough | 11:13 | |
there's still the issue with strings being stored many, many times in ram | |||
jnthn | It's like QAST node construction. | ||
We've been optimizing all kinds, but the way QAST nodes get created is basically performance hostile. | |||
timotimo | is that still the case? | 11:14 | |
jnthn | Yes. | ||
timotimo | ah, that's where we iterate over names and call methods to set attributes? | ||
jnthn | Right, meaning that every single one of those method calls is a late-bound lookup | ||
timotimo | yeah, ouch! | ||
jnthn | And it's a megamorphic callsite, so there's basically nothing the optimizer can do. | 11:15 | |
timotimo | can we perhaps get that to use nqp::bindattr directly? | ||
instead of the methods? | |||
jnthn | Well, having constructors that are more specialized to the nodes may also help | ||
Additionally, not all nodes have children. | |||
timotimo | mhm. lots more typing, but better performance for all backends i suspect | ||
jnthn | But every single SVal, NVal, WVal, etc. currently has an array allocated for them. | ||
timotimo | right, SVal, IVal, WVal, NVal wouldn't have children | ||
the same treatment annotations got might not be that helpful for children lists, right? | 11:16 | ||
because we really do want to keep the positional_delegate | |||
jnthn | yeah, we want that for API reasons too | ||
timotimo | should we have a QAST::ChildlessNode as the top of the class hierarchy and then derive one with a children array? | 11:17 | |
jnthn | No | ||
I'd be more inclined to write a role | |||
timotimo | mhm | ||
jnthn | And it's composed by the node classes that have children. | ||
timotimo | another idea would be to bind nqp::null to the children list? | ||
oh, that'll be problematic if we iterate over nodes without knowing if they'll have children or not | 11:18 | ||
jnthn | Also we waste the 8 bytes for the pointer we don't need. | ||
timotimo | what we could do is bind the same empty list to all childless nodes | 11:19 | |
how does that sound? | |||
jnthn | No, we should do the role thing I'm suggesting. | ||
timotimo | how does that interact with trying to iterate over nodes? | ||
will we get a .list method call emitted for all places that would be problematic? | 11:20 | ||
in that case we could return a global empty list object from that and otherwise have the role provide the list | |||
jnthn | I think we can do it transparently to the current usage | ||
That is, this can be done as an internal refactor to the QAST nodes without breaking anything. | 11:21 | ||
timotimo | that would be nice indeed | ||
only very few qast nodes survive past the compilation stage of a program's lifetime, right? | 11:22 | ||
there's the qast nodes that survive to make inlining in the optimizer possible, do they survive past the last compilation stage? | |||
well, to be fair, the maxrss in building is surely dominated by the compilation phases, as there's very little code being run there | 11:23 | ||
jnthn | Yeah, we serialize the QAST tree for things that we view as inlineable, yes | 11:25 | |
Though it's quite restricted. | |||
timotimo | aye, i recall that | 11:26 | |
11:32 JimmyZ_ joined
12:14 vendethiel joined
dalek | Heuristic branch merge: pushed 117 commits to MoarVM/moar-jit by bdw | 12:49 | |
jnthn | That's some catch-up :) | 12:50 | |
nwc10 | jnthn: does your compiler do link time optimisation? In that, can it inline the non-static functions that are used for the allocator? (just curious) | ||
12:51 cognominal joined
jnthn | Yes. | 12:51 | |
With the default MoarVM build options, anyway. | |||
nwc10 | Ah OK. So I guess that that makes those functions behave pretty much like they were static | ||
anyway, this is all possibly premature optimisation (and therefore wrong). You've already made it easy to disable the functionality, and always use the system malloc (or the malloc-replacing tool) | 12:52 | ||
./perl6-m t/spec/S17-promise/allof.t | 12:59 | ||
==8851==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fd93c1a9272 sp 0x7fffe15273b0 bp 0x7fffe15273f0 T0) | |||
oh, that's supposed to be red. | |||
anyway, ungood. | |||
master/master/nom | |||
lizmat | fwiw, I see that test failing intermittently in the spectest | 13:02 | |
over on #perl6 I was just handwaving about a static .WHICH for an object | 13:03 | ||
I think we're getting to the point that the current non-constant nature of .WHICH is starting to cause problems | |||
jnthn | WHICH really wants a re-visit in many ways. | 13:04 | |
The current implementation is doomed to be slow also. | |||
And I doubt it has good entropy. | |||
lizmat | so how bad is the idea of a per-thread simple int64 counter ? | 13:11 | |
jnthn | Well, but where to store it? | 13:12 | |
We don't want to make every object 8 bytes bigger... | |||
And for, say, Int, the identity is tied up in the value | 13:13 | ||
nwc10 | 8 bytes bigger is >2% more peak memory | ||
it's 2% just using 8 bytes per P6Opaque | |||
lizmat | do we want to start playing variable struct size tricks like in P5 ? | 13:16 | |
nwc10 | bugger. t/spec/S17-promise/allof.t passes first time under valgrind | ||
lizmat: probably not. Because if thread 2 can change the size of a structure (and move it) then every *read* in thread 1 needs to grab a mutex to prevent thread 2 from doing that at the wrong time. | 13:17 | ||
and, if reads need mutexes, deadlock becomes much easier. | 13:18 | ||
(oh, that's second order) | |||
reads become much slower | 13:19 | ||
lizmat | yeah, so it makes much more sense to just add it to the struct ? | ||
nwc10 | what, add a "which" to the object header? | ||
lizmat | isn't that what we're talking about ? | 13:20 | |
nwc10 | yes. that also sucks, because memory usage will increase by (maybe) 5% | ||
lizmat | however, in this case it doesn't seem needed: | ||
$ 6 'say 42.REPR; say 42.WHICH' | |||
P6opaque | |||
Int|42 | |||
so maybe we need a P6opaquevalue ? | 13:21 | ||
that wouldn't need the .which in the struct ? | |||
or maybe treat anything that needs a non-value based .WHICH differently wrt to allocating ? | 13:23 | ||
jnthn | Well, thing is that *most* objects don't ever have .WHICH called on them | 13:24 | |
We should associate the cost with using the feature. | |||
13:24 zakharyas joined
lizmat | are you talking CPU or memory cost ? | 13:26 | |
jnthn | Both | 13:27 | |
lizmat | I'm assuming code depends on the fixed length of an P6opaque? | ||
jnthn | More generally, I'm thinking about having the storage of WHICH values be more like a hash table arrangement. | ||
lizmat | what would be the key? | ||
and would you clean it up when an object gets destroyed? | 13:28 | ||
jnthn | The object - the trickiness here being it needs to be VM-supported. | ||
Right. | |||
13:28 brrt joined
lizmat | and that hash would be per thread, I assume ? | 13:29 | |
otherwise we get serious locking issues, no? | |||
jnthn | Probably needs to be | ||
otoh, then we get different issues | |||
jnthn doesn't see any particularly easy solutions | 13:30 | ||
lizmat | would the simple approach maybe not be best? | 13:32 | |
jnthn | No. | 13:33 | |
lizmat | take the 8byte per Opaque hit, only set it when actually asked for? | ||
at least until we think of something better ? | |||
jnthn | No, we should work out the better thing, not pile up technical debt. | ||
13:34 mj41 joined
jnthn | It woulda been nice if the spec had been as lenient as Java's .hashCode() spec, which can change over an object's lifetime... | 13:35 | ||
lizmat | well, then maybe we need to pick this up at a higher level? | ||
jnthn | But it's not, which is a Tricky Problem. But a big memory usage increase on everything isn't a great answer. | ||
lizmat | or maybe only assign some .WHICH when it gets moved out of the nursery (and *then* add the extra 8 bytes) | 13:37 | |
and if a .WHICH is called on something not in the nursery, move it out? | |||
*in the nursery rather | |||
jnthn | You can't "just move it out", but one idea TimToady++ hinted at that can be feasible is using the gen2 address if it's already there, or pre-allocating a gen2 slot for the object if we are asked for its WHICH and keeping a table of nursery objects => WHICH values. | 13:38 | |
And we remove those entries at GC time, due to collection or movement. | |||
lizmat is trying to serve as a catalyst :-) | 13:40 | ||
brrt | oh, i wanted to mention, creating a 'move / copy' node for the jit runs into the register selection explosion problem again, so i'm not doing that (yet) | ||
nwc10 | I like TimToady's suggestion. I think it could work well. | 13:41 | |
can do that without more RAM by (ab)using the union in the object header, but would need another flag to say that it's being done, and slow SC access | 13:42 | ||
(you'd put the real SC pointer into the pre-allocated gen2 space) | |||
dalek | MoarVM: 22773f2 | jnthn++ | src/spesh/args.c: Don't refuse to spesh if we've a slurpy positional | 13:44 | |
jnthn | timotimo: Feel free to give qast_refactor branches in NQP and Rakudo a spin. | 13:54 | |
timotimo | 36.30user 0.95system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 142724maxresident)k | 14:03 | |
2mb less usage apparently | |||
but about 1s less time? could very well be noise. | 14:04 | ||
jnthn | That's NQP build? | ||
timotimo | aye | ||
jnthn | OK. Rakudo one could be interesting too. :) | ||
timotimo | OK | 14:12 | |
refactor'd: 76.05user 0.95system 1:17.56elapsed 99%CPU (0avgtext+0avgdata 820128maxresident)k | 14:14 | ||
14:15 brrt joined
jnthn tries to find the previous numbers :) | 14:15 | ||
timotimo | i'm making new ones | ||
master'd: 76.37user 1.03system 1:17.60elapsed 99%CPU (0avgtext+0avgdata 826456maxresident)k | 14:17 | ||
jnthn | Hmm, a memory win, not so much of a performance one, curiously. | 14:21 | |
timotimo | beware the noise | ||
i didn't shut down all running programs :) | |||
jnthn | ah | ||
walk :) And when I'm back, I'll look at the spesh args missing thing where it doesn't know how to handle boxing/unboxing and so bails. | 14:39 | ||
14:52 betterworld joined
15:02 btyler joined
15:08 brrt left
nwc10 | jnthn: for those 2 branches, t/spec/S17-scheduler/every.t can fail with a NULL pointer at | 16:01 | |
#0 0x7f1e40b4f0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f1e40b4f1b1 in MVM_fixed_size_alloc_zeroed src/core/fixedsizealloc.c:144 | |||
#2 0x7f1e40adac20 in allocate_frame src/core/frame.c:201 | |||
but not reliably | |||
total fails are: t/spec/S06-macros/opaque-ast.rakudo.moar t/spec/S06-macros/unquoting.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/S17-scheduler/every.t t/spec/integration/advent2012-day23.t | 16:02 | ||
the S17 are ASAN. The other 3 are | |||
===SORRY!===P6opaque: no such attribute '$!position' | |||
jnthn | Hmm, that sounds like "missing a commit" | ||
nwc10 | This is nqp version 2014.05-14-g2147886 built on MoarVM version 2014.05-121-g22773f2 | 16:03 | |
This is perl6 version 2014.05-193-g6d23540 built on MoarVM version 2014.05-121-g22773f2 | |||
jnthn | Yes, I just pushed the missing one. D'oh. | ||
Thought the error looked very familiar... | 16:04 | ||
timotimo | how often do we have slurpy positional subroutines/methods in nqp and rakudo source respectively? | 16:12 | |
hm. so a slurpy positional argument will turn into a list. and we know exactly how big that list is at spesh-time. do i smell a specialization opportunity? | 16:14 | ||
though, we probably often do things like iterate over these and stuff like that | |||
jnthn | timotimo: Yeah, we can do something there, I suspect | 16:16 | |
timotimo | a fact flag "KNOWN_ARRAY_SIZE"? | ||
probably more like "KNOWN_ELEMENT_COUNT" | 16:17 | ||
jnthn | Oh, I wasn't thinking of even going that far. | ||
timotimo | another thing is that if we have a method that has slurpy positional arguments and we "just pass it on" to another, spesh will see it involves flattening and bail out, won't it? | ||
jnthn | Just potentially using the sp_getarg_ ops to grab the args and put them into the array. | 16:18 | |
Yes | |||
Obviously, there's a chance to do better there, but not sure how easy it is. | |||
timotimo | if we know we just got these arguments from a slurpy positional, we can probably assume it's safe | ||
i'm not sure i know how that sp_getarg_ thing you mentioned would work; will the positionals that'll end up in the slurped array just be available like regular positionals? | 16:19 | ||
jnthn | Well, I think it actually probably wants to go the other way around. | ||
As in, "I see I get called with a flattening callsite, and I take a slurpy there" | |||
timotimo | oh, as in: instead of flattening this array and slurping it again, let's just pass the array directly" | 16:20 | |
that seems more sensible, i agree | |||
no spesh: 40.23user 0.91system 0:41.37elapsed 99%CPU (0avgtext+0avgdata 118300maxresident)k | 16:35 | ||
spesh: 36.37user 0.93system 0:37.52elapsed 99%CPU (0avgtext+0avgdata 144524maxresident)k | |||
that's the complete nqp build | |||
no spesh: 84.33user 1.02system 1:25.91elapsed 99%CPU (0avgtext+0avgdata 722140maxresident)k | 16:39 | ||
spesh: 77.57user 1.07system 1:18.86elapsed 99%CPU (0avgtext+0avgdata 826312maxresident)k | |||
that's the complete rakudo build | |||
m: say (1 * 60 + 18) / (1 * 60 + 25) | |||
camelia | rakudo-moar 7f22e9: OUTPUT«0.917647␤» | ||
timotimo | this is with inline already; i thought inline would do crazy improvements to the parse time, what with inlining proto regexes and such :/ | 16:40 | |
but 9% isn't bad either. | |||
jnthn | Well, remember it's just taking out invocation overhead. | 16:43 | |
timotimo | that contains argument passing and returning already, right? | 16:44 | |
and cross-invocation-dead-code-elimination and constant-folding? | |||
jnthn | Not the latter two yet really. | 16:45 | |
It's being a bit conservative so as not to ruin the inline annotations. | |||
timotimo | oh | ||
huh, what is this. the very first thing that gets spesh'd has a named parameter operation removed, which had BB(3) as its label, but BB(3) is still listed as that block's successor? | 16:46 | ||
rather: as one of the successors | |||
i wonder if this leads to less dead code elimination than is necessary | 16:49 | ||
i wonder if BBs should be merged if they become completely linear during spesh? | |||
that's probably not easy to do given the dominance tree and stuff? | |||
jnthn | It's also not worth it at all. | ||
BBs don't correspond to anything at runtime. | 16:50 | ||
gist.github.com/jnthn/2050e5ed6e8991e24e53 # example of inline making a difference. | |||
timotimo | OK | ||
oh, that's not too shabby :) | 16:51 | ||
jnthn | Yeah. It's just that if you look at profiles of CORE.setting compilation and similar, invocation overhead is only so much | 16:52 | |
timotimo | i s'pose that's fair | 16:53 | |
dalek | MoarVM: dd80dbf | (Timo Paulssen)++ | src/spesh/optimize.c: put in a missing break | 17:02 | |
timotimo | does it sound sensible to spesh coerce_in and coerce_ni? | 17:04 | |
probably not much that can be done, eh? | 17:05 | ||
i see at least one const_n + coerce_ni | 17:07 | ||
er, actually const_i + coerce_in | |||
a whole lot of coerces of those two come directly after smrt_numify | 17:08 | ||
hum. these const_i's are all 16bit ints; so replacing the const_i + coerce with a const_n will give us a 64bit num in its place | 17:11 | ||
should still be a win, right? | |||
would also get rid of a bit of interpretation overhead? i would assume with coerce and const_i, the interpreter overhead is many times what the operation itself takes | 17:13 | ||
jnthn | Well, it's an instruction cheaper, yes. | 17:15 | |
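The const_i + coerce_in fold timotimo describes boils down to: when a coerce_in's input is a known integer literal, do the int-to-num conversion at spesh time and rewrite the pair's result as a single const_n. The structs and op names below are a toy representation for illustration, not spesh's real graph.

```c
#include <assert.h>

typedef enum { OP_CONST_I, OP_CONST_N, OP_COERCE_IN } Op;

/* Toy instruction: one payload slot per literal kind, plus the index
 * of the producing instruction for coerce_in. */
typedef struct {
    Op op;
    long ival;      /* payload for const_i */
    double nval;    /* payload for const_n */
    int src;        /* producer index, used by coerce_in */
} Ins;

/* Fold const_i + coerce_in into const_n; returns 1 if folded. */
static int fold_coerce_in(Ins *ins, int idx) {
    if (ins[idx].op != OP_COERCE_IN)
        return 0;
    Ins *src = &ins[ins[idx].src];
    if (src->op != OP_CONST_I)
        return 0;                         /* input not a known literal */
    ins[idx].op = OP_CONST_N;             /* becomes a num literal... */
    ins[idx].nval = (double)src->ival;    /* ...coerced at spesh time */
    return 1;
}
```

As noted, a 16-bit const_i becomes a 64-bit const_n, so the operand gets wider, but one interpreted instruction (and its dispatch overhead) disappears.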
nwc10 | jnthn: ./perl6-m t/spec/S17-scheduler/every.t can SEGV: | 17:22 | |
#0 0x7f421a79b0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f421a7fce31 in bind_key src/6model/reprs/MVMHash.c:86 | |||
./perl6-m t/spec/S17-promise/allof.t can SEGV | |||
#0 0x7f948135a0b1 in MVM_fixed_size_alloc src/core/fixedsizealloc.c:121 | |||
#1 0x7f94813bbe31 in bind_key src/6model/reprs/MVMHash.c:86 | |||
so, something isn't quite as threadsafe as it should be. | |||
jnthn | aye | ||
Looks like | |||
nwc10 | both are NULL pointers | ||
threads are hard, let's go asyncing. | 17:23 | ||
timotimo | should we build a smrt_intify? because i see a whole bunch of smrt_numify followed directly by coerce_ni | ||
hm, actually ... that wouldn't be much help | |||
because we still have to parse the stuff after the . because there could be an E in there | 17:24 | ||
nwc10 | or | 17:25 | |
core (noun), plural coredump | |||
jnthn | timotimo: I think it already exists. | 17:26 | |
timotimo | there's smrt_numify and smrt_strify | 17:27 | |
those are the only ones with smrt_ or ify in their name | |||
jnthn | hm, you're right :) | 17:36 | |
In other news, I just finally managed to get the instrumented profile in VS to work. | |||
17:36 FROGGS joined
timotimo | it's still kinda questionable if that would really help | 17:36 | |
knowing that the result is going to be intified | |||
FROGGS | o/ | ||
timotimo | o/ FROGGS | ||
jnthn | While that runs, I'm going to find some food :) | 17:37 | |
wow, it wrote 18GB so far. Good job it's on something with half a terabyte to hand... | |||
bbiab | 17:38 | ||
timotimo | 76.60user 1.06system 1:17.88elapsed 99%CPU (0avgtext+0avgdata 826528maxresident)k | 17:46 | |
vs | 17:48 | ||
76.11user 1.10system 1:17.41elapsed 99%CPU (0avgtext+0avgdata 826520maxresident)k | |||
so the coerce thing isn't worth terribly much. not really surprising | 17:49 | ||
(first line is with coerce spesh thingie, second is without) | |||
dalek | MoarVM: 87221ba | (Timo Paulssen)++ | src/spesh/optimize.c: can do coerce_in of literals at spesh-time. | 17:50 | |
jnthn | lol | 17:52 | |
CORE.setting running with instrumented profiling got done while I was shopping :) | |||
80GB. | |||
FROGGS | not a SSD me thinks | 17:54 | |
jnthn | No | ||
Spinning rust, and boy is it making a racket now as it analyses the data. | |||
FROGGS | that is also a problem of SSDs, they are so fast, when something write stuff to it in an infiniloop you almost can't stop it | 17:55 | |
18:38 zakharyas joined
18:40 mj41 joined
18:45 bcode joined
18:55 mj41 joined
timotimo | it's like having an LTE on your phone, but a 1gb data limit | 19:59 | |
so, what's going on now? :) | 20:00 | ||
the analysis hopefully is already done? :D | |||
jnthn | Yeah. | 20:04 | |
Took ages :) | |||
But it got done while I coked, ate, etc. | |||
uh, cooked :) | |||
timotimo | sadly, the spesh_diff tool is broken with the current spesh log format | 20:05 | |
somehow ... | |||
jnthn | Curiously, the instrumented profiler thinks we spend about half as much time in GC as the sampling profile does. | 20:07 | |
timotimo | huh, that's weird. | 20:08 | |
jnthn | getattribute is still by some way the most costly thing we do. | 20:11 | |
'cus, I assume, spesh can't handle most of the getattribute/bindattribute in Cursors. | |||
That's a pretty strong indicator that I should work on that in 2014.07. :) | 20:12 | ||
timotimo | that's half a month in the future! :( | ||
anything simple i could try to bang my head against in the mean time? | 20:14 | ||
jnthn | No, I mean, for the 2014.07 release | ||
timotimo | ah, ok | ||
jnthn | I don't really want to go optimizing much further at this point. | ||
Would rather work on fixes, making sure stuff works well for this week's release. | 20:15 | ||
Then after it can get back to opts :) | |||
timotimo | ah ... yeah, that *is* fair | 20:16 | |
we do have some known problems with our async and multithreaded things on moar, for example | |||
jnthn | Well, we know there's problems. :P | 20:19 | |
Anyway, interesting to look through the report. | 20:21 | ||
String comp comes up fairly high, but a lot of that is 'cus we're still hitting the attribute access slow path so often. | 20:22 | ||
timotimo | mhm | 20:23 | |
jnthn | 2.6% is spent in smart_numify. Not such a smart move. | ||
1.3% in smart_stringify | |||
timotimo | i kind of sort of wish we could give Rat a big speed boost | 20:57 | |
it seems likely to me that many people who come to try out p6 are going to be using the / operator and stumbling over the pretty tough performance hit | |||
jnthn | Well, step 1 is to write benchmarks for it in perl6-bench, so we understand the magnitude of the problem and how we can improve it :) | 20:58 | |
timotimo | oh, of course :) | 20:59 | |
i could have thought of that | |||
21:07 cognominal joined
dalek | MoarVM/moar-jit: 1b1eac4 | (Bart Wiegmans)++ | / (8 files): Configure JIT with environmental variables. This should make the JIT play more nicely. Also supports hello world :-) | 21:08 | |
21:08 brrt joined
tadzik | :o | 21:10 | |
brrt: are the generated files being committed to not depend on lua? | 21:11 | ||
brrt | oh.. yes | ||
oh, good of you to mention that | |||
i forgot the win32 x64 files | |||
tadzik | :) | ||
dalek | MoarVM/moar-jit: 1537dcd | (Bart Wiegmans)++ | src/jit/emit_win32_x64.c: Forgot the win32 x64 dynasm output. | 21:12 | |
tadzik | do you have like a commit hook to regen all those files? | ||
that might be handy | 21:13 | ||
brrt | not yet | ||
yep | |||
jnthn | + MVMString * s = sf->body.cu->body.strings[idx]; + | mov64 TMP, (uintptr_t)s | ||
About that, it assumes gen2 and thus non-moving, which is fine for the string heap, but need to be careful when it comes to, say, spesh slots. | |||
brrt | yes, i know, it's hacky, but the alternative was i started up coding a call to MVM_strings_get() which - afaik - doesn't exist yet, and the commit was big enough as it is :-) | 21:14 | ||
i'm somewhat against ripping moarvm interp open and diverging before i've got a chance to merge, is what i mean :-) | 21:15 | ||
jnthn | *nod* | 21:16 | |
brrt | hmm | ||
i'm looking at the getlex_** ops, they look tricky (i.e. not really what i want to encode in a single MVMJitCallC node | 21:17 | ||
in that the return value is a pointer that needs to be dereferenced before i can store it in the register | 21:18 | ||
jnthn | I think for the JIT we can do some case analysis on those. | ||
brrt | case analysis? | 21:19 | |
jnthn | For example, if outers is 0, then it's just looking directly into ->env | ||
For i/n/s. | |||
The auto-viv doesn't happen. | |||
brrt | agreed | ||
not for s, either? | |||
jnthn | For o you can know if it's going to auto-viv | ||
No | 21:20 | ||
brrt | ok, seems fair | 21:21 | |
fwiw, getlex isn't really the problem, getlex_n. are :-) | |||
jnthn | Oh...how so? | 21:22 | |
Those are the named forms | |||
And so not so hot | |||
As they handle the (less common) late-bound cases. | |||
21:23 donaldh joined
jnthn | brrt: The if file handle then fprintf thing will get tiresome, I suspect; I suggest an MVM_INLINE function. | 21:26 | |
brrt | yes, it does get tiresome, but how do i pass varargs through to printf? | 21:27 | |
jnthn - because they return a pointer | |||
long story short | |||
i call function | |||
pointer is stored in %rax | |||
pointer is to be dereferenced into some temporary register | |||
temporary register is to be copied into moarvm register space | 21:28 | ||
thats... annoying | |||
especially considering what happens if value-of-pointer happens to be a float | |||
jnthn | brrt: See MVM_exception_throw_adhoc or MVM_panic for examples of vararg-handling functions | ||
brrt | ok, i'll do that :-) | 21:29 | |
jnthn | They pass to sprintf, but it should be about the same trick. | ||
wow, so typing | |||
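The varargs pass-through jnthn points at is the standard va_list/vfprintf trick: accept `...`, turn it into a va_list, and hand it to the v-variant of the printf function, with the file-handle check folded in. The function name jit_log is illustrative; the technique is the same one MVM_exception_throw_adhoc uses with sprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* "if file handle then fprintf" as a single helper: forwards its
 * varargs to vfprintf, and is a no-op when logging is disabled. */
static void jit_log(FILE *log_fh, const char *fmt, ...) {
    va_list args;
    if (!log_fh)
        return;                    /* JIT logging not enabled */
    va_start(args, fmt);
    vfprintf(log_fh, fmt, args);   /* v-variant takes the va_list */
    va_end(args);
}
```

Call sites then shrink to `jit_log(fh, "bailing on op %d\n", op);` with no per-site handle check.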
What makes it annoying in the float case? | |||
brrt | oh, isee | 21:31 | |
floats are 80 bits wide on x86_64 | |||
my guess is they still are when you return them as MVMnum64 | |||
that is a guess, though | 21:32 | ||
jnthn | Hm, I was sure MVMRegister - the union with that in it - came out as 8 bytes wide | ||
brrt | then... i hope i'm wrong | ||
i'm just not sure what happens when you stash them in an integer register - obviously you can't do math on them :-) but if the bits come out ok, then it still should be ok | 21:33 | ||
b | |||
oops | 21:34 | ||
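brrt's guess can be checked directly: the 80-bit width is the x87 FPU's internal format, but in memory and in registers an MVMnum64 is an 8-byte IEEE 754 double, and its bit pattern round-trips through a 64-bit integer unchanged so long as no arithmetic touches it. A small sketch (the typedef mirrors MoarVM's MVMnum64 being a C double):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef double MVMnum64;   /* MoarVM's num type: 64-bit IEEE double */

/* Copy a double's raw bits into a 64-bit integer, as happens when a
 * float value is staged through a general-purpose register. */
static uint64_t num_to_bits(MVMnum64 n) {
    uint64_t bits;
    memcpy(&bits, &n, sizeof bits);
    return bits;
}

static MVMnum64 bits_to_num(uint64_t bits) {
    MVMnum64 n;
    memcpy(&n, &bits, sizeof n);
    return n;
}
```

So stashing a returned MVMnum64 in an integer register and copying it into MoarVM register space is bit-preserving; the only thing you lose is the ability to do float math on it in transit.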
dalek | MoarVM/moar-jit: 9e8e69b | (Bart Wiegmans)++ | / (5 files): More low-hanging fruit opcodes. | 21:46 | |
brrt off for tonight | |||
21:46 brrt left
jnthn | sleep & | 22:32 | |
FROGGS | gnight jnthn | 22:33 | |
lizmat | gnight jnthn | 22:34 | |
timotimo | gnite jnthn :) | 22:36 | |
23:43 daxim joined