01:47 ilbot3 joined 02:26 tokuhiro_ joined 03:27 tokuhiro_ joined 03:39 tokuhiro_ joined 07:03 Ven joined 07:59 Ven joined, zakharyas joined
JimmyZ A very nice book 'Static Single Assignment Book': ssabook.gforge.inria.fr/latest/book.pdf # almost complete, project address gforge.inria.fr/scm/?group_id=1950 08:26
08:48 FROGGS joined 08:50 lizmat joined 09:09 FROGGS joined
jnthn JimmyZ++ # nice link indeed! 09:42
JimmyZ scm.gforge.inria.fr/anonscm/svn/ss...torial.pdf # PPT for some more info :) 09:45
jnthn m: say "SSA".flip # :P 09:48
camelia rakudo-moar c54773: OUTPUT«ASS␤»
timotimo i wonder how many techniques we cannot reliably implement because older versions are not available in our ssa implementation 09:49
of course we can always allocate a temporary register and set its value right after the desired version gets written 09:50
jnthn "older versions"?
Oh, I think I know the issue you mean 09:51
It's a trade-off.
If you make the other one you get more costly/difficult deopt
timotimo yes 09:52
i remember that
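A minimal sketch of the workaround timotimo describes, in Python rather than MoarVM's C: once a register has been overwritten its older SSA version is gone, so a copy is made into a fresh temporary right after the desired version is written. The instruction layout, the `set` op name, and `preserve_version` are hypothetical and not taken from the spesh code.

```python
# Hypothetical SSA instruction form: (op, dest, operands), where dest and
# each operand are (register, version) pairs. Not MoarVM's real data layout.

def preserve_version(instructions, reg, version, tmp_reg):
    """Return a new instruction list with a copy to tmp_reg inserted
    immediately after the instruction that writes (reg, version)."""
    result = []
    inserted = False
    for op, dest, operands in instructions:
        result.append((op, dest, operands))
        if not inserted and dest == (reg, version):
            # Copy the freshly written version into the temporary so that
            # later code can still read it after reg is overwritten.
            result.append(("set", (tmp_reg, 0), [(reg, version)]))
            inserted = True
    if not inserted:
        raise ValueError(f"no instruction writes r{reg}({version})")
    return result


code = [
    ("const_i", (1, 0), []),          # r1(0) = some constant
    ("add_i",   (1, 1), [(1, 0)]),    # r1(1) computed from r1(0)
    ("mul_i",   (1, 2), [(1, 1)]),    # r1(2) computed from r1(1)
]
patched = preserve_version(code, reg=1, version=1, tmp_reg=9)
```

The price of this approach is one extra register and one extra copy per preserved version.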
hum, now i remember i got nowhere with my deopt bridges thing yet 09:53
not getting paid enough for complicated things :) 09:54
not actually convinced i really can deal with complicated things that much better when I'm getting paid... 10:12
10:15 Ven joined
timotimo i think i have the impostor syndrome 10:18
but still better than impastor or inpasta
arnsholt Is that where you make a lot of copy-pasta? 10:26
timotimo mhhh pasta 10:30
jnthn: were you able to find out why MapIterCommon doesn't have its new method spesh'd? 10:47
if not, do you want me to litter the code with debug statements and figure it out?
jnthn timotimo: No, if you could look into that it'd be great
Because it's the kind of method that *should* spesh really well
timotimo sure, did you use a specific benchmark for it? 10:48
jnthn 'cus it's just a bunch of binds to attributes
for ^1000 { for ^1000 { } }
Well, that may hit the for -> while opt maybe
If that still works
timotimo i recently fixed it 10:49
it used to look for an &infix:<,> in the QAST, which was removed during GLR
jnthn OK, well, just my @a = ^1000; for @a { for @a { } }
timotimo rebuilding rakudo now 10:50
interesting 10:55
in the profile i'm looking at it gets called 1001 times
about 4/5th of those calls were even jitted
jnthn Probably something initializ-y
Oh?
timotimo how did you reach the conclusion it doesn't get speshed?
that piece of code initializes a crapton of IntLexRef 11:00
3005001
2002000 of those in sink-all and 1002001 in pull-one
same with a :=, but i suspect that can be eased by implementing push-exactly or something in range's iterator 11:02
jnthn Wow 11:03
I need to look at the lex ref issues there
Maybe that's what I'll do this evening
I really need to do several hours on a $other-job today
timotimo ah, sure
jnthn But good to know
timotimo there's an infix:<<> in the code you gave that gets only speshed, not jitted
1002001 calls, 7.04% (155.9ms) 11:04
jnthn How did I conclude it wasn't? Because in the Text::CSV profiler output it isn't being speshed
timotimo (exclusive time)
jnthn So we'll need to look deeper :(
timotimo i'll grab Text::CSV onto my laptop as well
all the flaky mobile connections :| 11:05
jnthn if you liked it shoulda put 4G on it 11:06
timotimo occasionally i do get 4G
how do you invoke the benchmark? 11:07
jnthn I created a file with this 1000 times: 11:08
hello,","," ",world,"!"
And then
cat test-small.csv | perl6-m -Ilib --profile test-t.pl
You'll need to grab Slang::Tuxic and File::Temp and File::Directory::Tree also
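One way to produce the input file jnthn describes, sketched in Python. The line content and the `test-small.csv` file name come from the log; the function names are hypothetical.

```python
from pathlib import Path

# The exact line quoted in the log.
CSV_LINE = 'hello,","," ",world,"!"'


def build_test_csv(repeats=1000):
    """Return the benchmark input: CSV_LINE repeated, one copy per line."""
    return (CSV_LINE + "\n") * repeats


def write_test_csv(path="test-small.csv", repeats=1000):
    """Write the benchmark input to path and return the path."""
    Path(path).write_text(build_test_csv(repeats), encoding="utf-8")
    return path


if __name__ == "__main__":
    write_test_csv()
```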
timotimo just noticed that 11:10
rebootstrapping panda right now 11:11
um ... even with test-t.pl i get almost 99% jitted new 11:15
src/gen/m-CORE.setting:2696
perhaps more worrying is sink-all of sequential map being 100% interpreted and 4.66% exclusive time 11:16
and 520351 BOOTCode being allocated inside BUILDALL's while loop 11:17
there's only 5010 calls to BUILDALL according to the routines tab 11:18
hehe. 11:25
List's iterator method has a class :: does Iterator in it
that generates code to take a whole bunch of closures
it gets called 36013 times
allocates 180065 BOOTCode in total 11:26
m: say 180065 / 36013 11:30
camelia rakudo-moar 7c9911: OUTPUT«5␤»
timotimo that's how many methods that class has :P
i've just moved it out of the method and i'll see if it makes a big difference
yeah, though i couldn't report it due to flaky network >_> 11:38
55 instead of 59 gc runs 11:40
we may want to move some more classes for iterators out of the methods that use them, to prevent taking closures for all the methods
jnthn Wait, why are they taking closures?!
If they are something's up with our code-gen 11:41
timotimo they shouldn't be?
jnthn The only time a method should take a closure is if it's an l-value
uh damn
r-value
In a class body it's always in sink context 11:42
timotimo how else would we have classes defined in inner scopes be able to refer to closed-over values?
jnthn Classes aren't closures
timotimo ah!
well, then you can fix it :)
jnthn Thus why we've had and fixed various bugs where people did refer to lexicals :)
OK, I'll put it on my todo list along with looking at the code-gen issues that make too many lexicalrefs 11:43
Hm, if we can fix these two then we would get GC runs down a lot
timotimo likely (and hopefully) 11:44
jnthn And so improve performance a whole lot
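A rough analogy for the List iterator finding above, written in Python because it shows the same allocation pattern: a class declared inside a function gets a fresh class object and fresh method objects on every call, while a hoisted class is built once. This only illustrates the cost timotimo measured (36013 calls times 5 methods gives 180065 code objects). Per jnthn, Rakudo should not need to clone the methods at all, and all names here are hypothetical.

```python
def iterator_inline(items):
    # The class body runs on every call, so a new class object and a new
    # function object per method are allocated each time.
    class InlineIter:
        def __init__(self, items):
            self.items = items
            self.pos = 0

        def pull_one(self):
            # None stands in for an end-of-iteration sentinel.
            if self.pos >= len(self.items):
                return None
            value = self.items[self.pos]
            self.pos += 1
            return value

    return InlineIter(items)


class HoistedIter:
    # Defined once at module level: the methods are created a single time.
    def __init__(self, items):
        self.items = items
        self.pos = 0

    def pull_one(self):
        if self.pos >= len(self.items):
            return None
        value = self.items[self.pos]
        self.pos += 1
        return value


def iterator_hoisted(items):
    return HoistedIter(items)


a, b = iterator_inline([1]), iterator_inline([1])
c, d = iterator_hoisted([1]), iterator_hoisted([1])
```

Here `type(a)` and `type(b)` are distinct classes with distinct `pull_one` functions, whereas `c` and `d` share one class.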
nwc10 other bloggage: morepypy.blogspot.co.at/2015/09/pyp...ments.html 11:47
timotimo our GC isn't the fastest 11:48
we still have those bunchtons of gen2 roots that are irking me a bit
i'd love common gc run times to dwindle below 2ms :|
actually, if we want to ever be able to do 60fps game development or something, 2ms is still more than a single video frame 11:52
11:55 Ven joined
timotimo maybe incremental GC would be a thing to consider at some point? i have no idea what requirements that adds to the rest of the VM and if we can get there easily enough 11:59
jnthn On my box I see GC times of 3ms-4ms in various cases 12:02
nwc10 for now, I suspect that we get bigger net wins by doing other stuff, relying on KISS and the tail end of Moore's Law. 12:03
but that's just an opinion.
jnthn m: say 1/60 12:05
camelia rakudo-moar 7c9911: OUTPUT«0.016667␤»
jnthn That's a lot more than 0.002 :P
nwc10 jnthn: even if you're now in SECAM territory, surely as you're still in Europe, the correct fraction is 1/50 :-)
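The frame-budget arithmetic from this exchange, as a small Python sketch. The frame rates and the 2 ms to 4 ms pause figures come from the log; the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


def pause_share(pause_ms, fps):
    """Fraction of one frame's budget consumed by a pause of pause_ms."""
    return pause_ms / frame_budget_ms(fps)


if __name__ == "__main__":
    for fps in (60, 50):
        print(f"{fps} fps: {frame_budget_ms(fps):.2f} ms per frame")
    for pause in (2, 3, 4):
        share = pause_share(pause, 60)
        print(f"{pause} ms GC pause uses {share:.0%} of a 60 fps frame")
```

A 2 ms pause fits inside a 60 fps frame several times over, but it still takes a visible slice of the budget that rendering and game logic also have to share.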
timotimo wow, i want some of these gc times 12:07
oh, i thought millisecond meant 1/100 second, haha, that's fail
but still, i hardly ever get gc times as good as jnthn's getting :( 12:10
or is that really just "in some cases"?
jnthn I had 3.x ms average GC for the for lines('file'.IO) { } 12:11
6-7ms is common in apps that are retaining more stuff
timotimo let's see. 12:12
ah, for that test-small.csv isn't big enough by far :)
7 to 8 ms in that 12:13
can our machines' performance differ this drastically?
i'll teach the jit about continuationreset, that'll make pull-one jittable, which is at 15% exclusive time in the for-lines benchmark 12:18
jnthn Wait, are you on latest? 12:19
I made for 'foo'.IO.lines { } not use continuations 12:20
timotimo oh!
jnthn But sure, do that anyway :)
timotimo shall i still go ahead?
jnthn Because it'll make every gather/take thing faster :)
timotimo right. first i'll have to get off this train, though
damn, and my fav song of this album just came on :(
12:47 JimmyZ left, JimmyZ joined
timotimo if something in interp.c sets the cur_op before calling the C function in question, i'll mark it :invokish in the oplist, so that the jit doesn't explode, right? 12:50
FROGGS sounds reasonable
timotimo though in this case it's not because it invokes stuff, but because it records the cur_op into the continuation's address 12:51
FROGGS :throwish seems to have a similar effect 12:52
timotimo BBL 12:59
13:18 virtualsue joined 13:37 brrt joined
brrt \o 13:37
13:37 Ven joined
FROGGS hi brrt 13:41
brrt hi FROGGS 13:42
13:49 virtualsue left 14:33 tokuhiro_ joined 14:48 Ven joined
hoelzro jnthn: regarding that string heap optimization, is the optimization that MoarVM SCs no longer have their own string heaps, and just expect the code to refer to the string heap in the bytecode itself? 15:21
I really want to fix the nqp-js problem, and I think the only way to do that is to truly understand that optimization
jnthn hoelzro: It's exactly that, yes 15:22
hoelzro: Actually all we used to do was just build a string array
15:22 Ven joined
jnthn So there was a huge push arr, "foo" sequence 15:22
And the change was just to get SCs to use identical indexes to the string heap of the bytecode file itself 15:23
So we could save that
Which saved a bunch of work at startup
hoelzro so the dependency string heap reference will almost definitely have a different index after the optimization, right? since it's referring to all strings in the compunit? 15:25
or is that wrong? a compunit with a single SC would have essentially the same string heap as the SC itself, maybe?
hoelzro looks at this as a good thing, because he never really understood the serialization stuff before 15:26
jnthn iirc, the serializer pushes the unique strings into a list
And then keeps that list somewhere internal in the VM 15:27
Oh, on the current CompUnit I think
hoelzro is there a way to get --dump to dump things like the string heap, or lower level info on the SCs in the compunit?
jnthn And then uses it when it does the MAST -> bytecode
Not that I'm aware of
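A toy model of the two schemes jnthn describes, in Python. It is not the MoarVM serializer: `StringHeap`, `serialize_old`, and `serialize_new` are hypothetical names, and the "startup ops" list merely stands in for the long `push arr, "foo"` sequence mentioned above.

```python
class StringHeap:
    """De-duplicating list of strings; add() returns the string's index."""

    def __init__(self):
        self.strings = []
        self._index = {}

    def add(self, s):
        if s not in self._index:
            self._index[s] = len(self.strings)
            self.strings.append(s)
        return self._index[s]


def serialize_old(sc_strings):
    """Old scheme: the SC has its own heap, rebuilt by code at startup."""
    sc_heap = StringHeap()
    refs = [sc_heap.add(s) for s in sc_strings]
    startup_ops = [("push", s) for s in sc_heap.strings]
    return refs, startup_ops


def serialize_new(sc_strings, compunit_heap):
    """New scheme: SC references are indexes into the compunit's heap."""
    refs = [compunit_heap.add(s) for s in sc_strings]
    return refs, []  # nothing left to rebuild at startup


heap = StringHeap()
for s in ("say", "foo"):  # strings the bytecode already uses
    heap.add(s)
old_refs, old_ops = serialize_old(["foo", "bar", "foo"])
new_refs, new_ops = serialize_new(["foo", "bar", "foo"], heap)
```

In this toy model the same string lands at a different index once the heap is shared, which is the effect hoelzro asks about. The log itself does not confirm how the real indexes shift.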
16:04 FROGGS joined 17:14 Ven joined 18:30 arnsholt joined
timotimo i'm puzzled 18:45
as soon as the jit kicks in on "my num @values; loop { @values.push: 0.0e1 }" it complains "expected num register!" 18:46
18:46 vendethiel joined
timotimo but the jit code that's responsible for what gets emitted there should really put MVM_reg_num64 into the slot that decides what happens 18:47
19:16 brrt joined 19:26 tokuhiro_ joined 19:38 Peter_R joined 20:07 Ven joined 21:03 brrt joined
brrt holy mother of irregular instruction encoding 21:07
timotimo: is that the old JIT? and what line says that? 21:08
apparently, kids, if and only if the register number of an indexed register is 4, then we need a second modrm byte, or something 21:10
timotimo m) 21:11
brrt: what do you mean "what line says that"? 21:12
brrt what line says 'expected num register!'
timotimo ah
that's from push
brrt i expect we use push_n for that? 21:14
timotimo it's as if this line was wrong:
m)+ (op == MVM_OP_push_n || op == MVM_OP_unshift_n) ? MVM_reg_num64 :
(without the facepalm smiley in front)
jitlog says it's been devirtualized
and speshlog says it's actually a push_n
brrt hmmm 21:15
timotimo oh, my str @foo is NYI?
perhaps a GLR thing?
brrt i think .. i dunno
timotimo it was also NYI pre-glr 21:17
brrt the bytecode generation issue is an irregularity in the encoding of rbp 21:24
timotimo x86 is hard
FROGGS rbp?
timotimo base pointer?
brrt yes.... and also r12, since that looks just like rbp from the perspective of x86 21:26
x86 is really, really hard
21:28 tokuhiro_ joined
brrt actually, it's rsp, not rbp 21:30
anyway... 21:31
it looks like something i can crack
FROGGS ++brrt
brrt keep the ++'s for when the commit comes :-P
FROGGS sure :o)
always got some in my pocket 21:32
timotimo sounds like you know what way to go and all that might stand in your way is missing infrastructure for keeping the information that modrm needs to become bigger ... or something
brrt hmm yeah, i guess 21:34
timotimo does it seem like that's the last problem on your way? 21:35
brrt last instruction encoding problem i'm aware of, yes
timotimo awesome :)
brrt there is a cheap-but-ugly workaround. it means giving up r12
and not use it at all 21:36
timotimo sure
then you'll see if everything else works but that :)
brrt that'd work. but it'd only be a matter of time before somebody would choose to push dynasm over the limit again
possibly me
timotimo giving up just a single register doesn't sound terrible
brrt well, it's also all stack relative stuff 21:37
r12 and rsp
timotimo oh
that's more interesting, then
brrt why they are irregular, i don't know
timotimo otherwise i'd have said "if fixing this takes too long, skipping it will get us to a working code gen faster"
brrt well, i'm going to think about it more 21:38
fairly sure this can be fixed
timotimo mhm
brrt we have the 'meaning' of the vreg at runtime
so we can actually add/rewrite the bytes
but it's tricky 21:39
(the good bit is, it was already broken before i started ^^)
timotimo yeah :)
"we" being "inside the dynasm internals", right?
brrt yes 21:41
the runtime
but it requires i study the entire bytes-meaning table 21:42
timotimo urgh
you'll has a bachelor of ft'aghn after that
brrt wiki.osdev.org/X86-64_Instruction_E...addressing 21:43
and this beauty here: wiki.osdev.org/X86-64_Instruction_E...dressing_2
timotimo oh, that's not gigantic
it's just a lot 21:44
brrt yeah, it's manageable, it's just highly irregular
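A small Python sketch of the irregularity brrt ran into, encoding `mov dst, [base + disp]` by hand. In the ModRM byte, rm=100 does not mean "rsp" but "a SIB byte follows", so rsp and r12 (which share the low three bits 100) need that extra byte when used as a base. A separate quirk affects rbp and r13, which cannot be encoded with mod=00. This is not DynASM code; the function name and register constants are made up for illustration.

```python
RAX, RCX, RSP, RBP, R12, R13 = 0, 1, 4, 5, 12, 13


def encode_mov_load(dst, base, disp=0):
    """Encode `mov dst, [base + disp]` for 64-bit registers (no index)."""
    rex = 0x48 | ((dst >> 3) << 2) | (base >> 3)  # REX.W plus R and B bits
    low = base & 7
    if disp == 0 and low != 5:
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        # Also covers rbp/r13 with disp == 0: mod=00 with rm=101 means
        # RIP-relative addressing, so an explicit zero disp8 is required.
        mod, disp_bytes = 0b01, disp.to_bytes(1, "little", signed=True)
    else:
        mod, disp_bytes = 0b10, disp.to_bytes(4, "little", signed=True)
    modrm = (mod << 6) | ((dst & 7) << 3) | low
    out = bytes([rex, 0x8B, modrm])
    if low == 4:
        # rm=100 is the "SIB follows" escape, so rsp/r12 as a base need
        # a SIB byte: scale=00, index=100 (none), base=100.
        out += bytes([0x24])
    return out + disp_bytes


if __name__ == "__main__":
    for name, reg in (("rcx", RCX), ("rsp", RSP), ("r12", R12), ("rbp", RBP)):
        print(f"mov rax, [{name}] ->", encode_mov_load(RAX, reg).hex(" "))
```

Because the special case depends on the concrete register number, an encoder that only learns the register at runtime has to be able to add the extra byte at that point, which matches the difficulty brrt describes.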
21:45 kjs_ joined
timotimo turn off your pattern recognition brain parts and it'll feel better, eh? :D 21:45
you won't even notice there's no regularity to it!
brrt right :-)
i'm going to sleep
see you tomorrow!
timotimo good night brrt! 21:46
22:29 tokuhiro_ joined 22:53 kjs_ joined 23:40 tokuhiro_ joined