| timotimo | huh | 00:04 | |
| why is there an implementation of istype in the jit? | |||
| 1) it turns into a call to MVM_6model_istype, but pretends the return value is void and | |||
| 2) it can be invokish | |||
| dalek | arVM/jit-moar-ops: 8c79da9 | (Timo Paulssen)++ | src/jit/graph.c: istype is invokish and its return value ought to be INT, not VOID so i just comment it out for the time being. |
00:07 | |
| arVM/jit-moar-ops: d25255a | (Timo Paulssen)++ | src/jit/graph.c: support for bindpos_i is trivial to add |
|||
| arVM/jit-moar-ops: f5229ae | (Timo Paulssen)++ | src/ (3 files): add isint/isnum/isstr/islist/ishash in the interest of finding out what other ops will cause bails down the road. These ops ought to be implemented by generating a tiny bit of assembly code instead of a 15cb79e | (Timo Paulssen)++ | src/ (3 files): add isint/isnum/isstr/islist/ishash in the interest of finding out what other ops will cause bails down the road. These ops ought to be implemented by generating a tiny bit of assembly code instead of a C call, but i leave that for brrt++ to do. |
|||
| timotimo | wtf | 00:11 | |
| out of nowhere, 128 bails for jumplist have appeared | |||
| with jit-moar-ops, i cannot build nqp any more. i get a strange error: | 00:19 | ||
| /usr/bin/perl tools/build/gen-cat.pl moar src/QRegex/P5Regex/Grammar.nqp src/QRegex/P5Regex/Actions.nqp src/QRegex/P5Regex/Compiler.nqp > gen/moar/stage2/NQPP5QRegex.nqp | |||
| ./nqp-m --target=mbc --output=NQPP5QRegex.moarvm \ | |||
| gen/moar/stage2/NQPP5QRegex.nqp | |||
| Unhandled exception: Missing or wrong version of dependency 'gen/moar/stage1/nqpmo.nqp' | |||
| at <unknown>:1 (./NQPHLL.moarvm::1070) | |||
| well, jumplist is getting more and more interesting :) | 00:31 | ||
| dalek | arVM/jit-moar-ops: 7483c74 | (Timo Paulssen)++ | src/jit/graph.c: implement iscclass |
00:37 | |
| arVM/jit-moar-ops: 55a74f0 | (Timo Paulssen)++ | src/jit/graph.c: implement nfarunalt |
|||
| arVM/jit-moar-ops: f5cd5be | (Timo Paulssen)++ | src/jit/graph.c: implement substr_s and index_s |
|||
| timotimo | sadly can't implement ne_s, as it'd need a negation in addition to the call to MVM_string_equal ... | 00:39 | |
| i think that's pretty much all the hot ops i can implement properly | 00:42 | ||
| the rest is just ops that we run into fewer than 10 times | 00:43 | ||
| hoelzro | I would say "cool", but I don't really understand what you're doing right now timotimo =) | 00:45 | |
| dalek | arVM/jit-moar-ops: cc9e819 | (Timo Paulssen)++ | src/jit/graph.c: fix setelemspos |
00:47 | |
| arVM/jit-moar-ops: bd0c3b9 | (Timo Paulssen)++ | src/jit/graph.c: push_n and push_s are also easy. |
|||
| timotimo | well | ||
| the jit will be invoked for every frame that spesh generates and that is considered even hotter still | 00:48 | ||
| it will just optimistically run and as soon as it sees an op that it doesn't know, it'll bail | |||
| it also writes "BAIL" and the name of the op into the jitlog | |||
| hoelzro | ah, I read about that on your blog | ||
| timotimo | yup | ||
| hoelzro | pretty cool stuff =) | ||
| timotimo | these ops i'm implementing are ops that i can just map 1:1 to C function calls | ||
| hoelzro | but something about how the JIT doesn't generate faster code yet? | 00:49 | |
| timotimo | well, there's multiple parts to that problem | ||
| for one, it doesn't know how to deal with perl6's extops, like p6bind, p6bool, ... | |||
| so it wouldn't be used in any of the benchmarks we have | |||
| hoelzro | ah ha | ||
| timotimo | the other part is that the "mainline" of programs have variadic arguments, which makes spesh itself bail before even trying to do anything | 00:50 | |
| hoelzro | so it may benefit NQP, but not Rakudo (yet)? | ||
| timotimo | so neither the perl6 benchmarks nor the nqp benchmarks will show any difference at all | ||
| except slower run-time, because it tries to jit things here and there | |||
| hoelzro | ah ha | ||
| timotimo | it can benefit perl6 programs, but only for frames where we generate super tight code and also don't hit any extops | ||
| hoelzro | I see | ||
| timotimo | i have no idea how many that would be | ||
| gist.github.com/timo/f92ff69eb4045.../revisions - look at these diffs, it's fun to look at how implementing one op removes it completely, but ups the counter on a bunch of other ops | 00:51 | |
| hoelzro | yiks | 00:52 | |
| *yikes | |||
| timotimo | refresh for one more | ||
| ah, look! | |||
| i implemented push_s and 4 bails disappeared, no bails were added | |||
| that means we were able to jit-compile 4 more frames! :) | |||
| oh, interesting | 00:53 | ||
| all of these 4 frames were versions of !dba | |||
| hoelzro | \o/ | 00:55 | |
| timotimo | the sum of the "bytecode size" lines is 429597 | ||
| whatever that means :) | |||
| hoelzro shrugs | 00:56 | ||
| I burned myself out on Moar stuff last night | |||
| so timotimo++ for soldiering on | 00:57 | ||
| timotimo | gist.github.com/timo/f13611f6d587bb1e9188 - lookie here | 00:59 | |
| hoelzro | whoa | 01:00 | |
| timotimo | turning off the jit actually makes it faster at the moment | ||
| hoelzro | 40 second parse? | ||
| timotimo | but it's still way faster than without spesh at all | ||
| yeah, this is just my laptop | |||
| hoelzro | I think I get 120 seconds | ||
| timotimo | huh? | 01:01 | |
| when was that %) | |||
| hoelzro tries again | |||
| timotimo | well, okay, this laptop is only half a year old | ||
| hoelzro | gah, I'll wait until after nqp-js tests finish | ||
| timotimo | and was pretty up-to-date at that time | ||
| hoelzro | I built my desktop like 3 months ago o_O | ||
| and it's the fastest machine I've built perl6 on! | |||
| ok, let's see... | 01:02 | ||
| timotimo | there must be something wrong, surely | ||
| hoelzro | ok, I'm on drugs | 01:05 | |
| 46 seconds | |||
| maybe I was thinking about parrot speeds? | |||
| hoelzro should really resume S26 work | |||
| timotimo | i like S26 | 01:06 | |
| hoelzro | you wanna help? =) | ||
| timotimo | i've already done a lot way back when :S | 01:07 | |
| the code was strenuous to wade through, tbh | |||
| i had hoped i could refactor the parsing of pod completely | |||
| make it more "parametric" | |||
| hoelzro | ugh, I had the same hope | ||
| at this point, I feel like I've done an "ok" job | |||
| timotimo | (>^_^)> | 01:08 | |
| hoelzro | there's still much to be done | ||
| so I should stop chasing Moar bugs, playing with nqp-js, and messing with lib/Test.pm =) | 01:09 | ||
| timotimo | now i implemented tc, lc, uc and it split into le_s (which is now at 1) and ordat (which is now at 6) | ||
| hoelzro | tbf, the first was a result of the last! | ||
| huh | |||
| it's like trying to plug a leaky boat | |||
| timotimo | ordat wants a bit more than just a call to the C, it wants to verify the string's length, too | ||
| well, yeah :) | 01:10 | ||
| but it's kind of a fun boat | |||
| hoelzro | =) | ||
| timotimo | tomorrow brrt will implement sp_findmeth, which will cause an explosion of new bails | ||
| hoelzro | oh joy | 01:11 | |
| timotimo | it'll potentially spread out into 358 individual ops and ++ all of them :P | ||
| hoelzro .oO( bailing out a boat? ) | |||
| timotimo | (i don't think we have that many ops) | ||
| oh, huh | |||
| 650 is the last line of the oplist file that contains non-sp ops | |||
| hoelzro | so, how does that work? I figured once you implemented the JITing for one op, it wouldn't show up again? | ||
| timotimo | 40 is the first one | ||
| so we have 610 individual ops? wow. | |||
| that's right | 01:12 | ||
| hoelzro | so everytime you implement one, you potentially incr the counts of other unimpl'd ones? | ||
| timotimo | yup | ||
| maybe even from 0 | |||
| hoelzro | grr | 01:13 | |
| timotimo | look at the very first diff on that page | ||
| that was when i implemented bindpos_i | |||
| 32 less for bindpos_i | |||
| 1 more iscclass, 1 more indexat, 20 more islist, 10 more sp_findmeth | |||
| hoelzro | nuts | 01:14 | |
| timotimo | so not a single extra frame got compiled after that | ||
| but later i implemented islist | |||
| that's the next diff, one up | |||
| that diff confused me, though | |||
| didn't actually do the counts there | 01:15 | ||
| oh, i also un-implemented istype there | |||
| i'm surprised it didn't blow up majorly; istype would always leave the value in the register untouched | |||
| oh, hold on | 01:16 | ||
| i think i was wrong about *that* part | |||
| but still, istype is an invokish op, the jit has to be extra careful around these | |||
| hoelzro | I see | ||
| it's pretty cool | |||
| I can't wait to see how well Moar does after the JIT is fully in place | 01:17 | ||
| timotimo | aye. | ||
| i wonder how long the core setting compilation would take if we subtracted the time it takes to attempt a compilation and bail in the middle of it | 01:18 | ||
| i.e. if it'd be worth it already | 01:19 | ||
| the thing is, we still read and write from the working memory before and after every single translated op | |||
| brrt is going to implement a nontrivial feature in dynasm for that to improve | 01:20 | ||
| TimToady | you might want some kind of pragma to say "Don't attempt to jit this bit for now." | ||
| timotimo | i.e. keeping values in registers between ops | ||
| TimToady | if you know it's going to bail | ||
| timotimo | i'd assume it's not worth the time it'd take to implement :) | 01:21 | |
| TimToady | 'course then you fix the optimizer to work with that bit, and then it doesn't jit, d'oh | ||
| timotimo | gotta run for now :) | 01:22 | |
| hoelzro | later timotimo | ||
| timotimo | :) | 01:26 | |
|
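A rough sketch of the bail bookkeeping timotimo describes above: the JIT gives up on a frame at the first op it doesn't know, so each frame is counted under exactly one op, and implementing that op can simply move the frame's bail to a later op. The op names come from the log, but the numeric ids, `Frame`, `first_unsupported` and `count_bails` are made up for illustration and are not MoarVM's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical op ids for illustration; not MoarVM's real opcode numbers. */
enum { OP_SET, OP_BINDPOS_I, OP_ISLIST, OP_SP_FINDMETH, OP_PUSH_S, NUM_OPS };

typedef struct {
    const int *ops;   /* ops of the frame, in order */
    size_t     len;
} Frame;

/* Returns the first op the toy JIT cannot handle, or -1 if the whole frame compiles. */
int first_unsupported(const Frame *f, const bool supported[NUM_OPS]) {
    for (size_t i = 0; i < f->len; i++) {
        assert(f->ops[i] >= 0 && f->ops[i] < NUM_OPS);
        if (!supported[f->ops[i]])
            return f->ops[i];
    }
    return -1;
}

/* Tallies one bail per frame under the op that caused it.
 * Returns the number of frames that compiled completely. */
size_t count_bails(const Frame *frames, size_t n,
                   const bool supported[NUM_OPS], size_t bails[NUM_OPS]) {
    size_t compiled = 0;
    memset(bails, 0, sizeof(size_t) * NUM_OPS);
    for (size_t i = 0; i < n; i++) {
        int op = first_unsupported(&frames[i], supported);
        if (op < 0)
            compiled++;
        else
            bails[op]++;
    }
    return compiled;
}
```

With three toy frames `{set, bindpos_i, islist}`, `{bindpos_i, sp_findmeth}` and `{set, push_s}` and only `set` supported, the tally is two bails for `bindpos_i` and one for `push_s`. Marking `bindpos_i` as supported still compiles zero frames; its two bails reappear as one `islist` and one `sp_findmeth`. Marking `push_s` as supported removes a bail without adding any, which is the case where a frame actually gets compiled.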
02:01
btyler joined
02:05
btyler joined
02:29
jimmyz joined
|
|||
| jimmyz | Stage parse : 39.273, before: 44 | 02:29 | |
| since yesterday | 02:30 | ||
| timotimo | sweet :) | 02:31 | |
| jnthn made some pretty radical improvements in nqp, and a few in moarvm as well | |||
| i'm assuming you're on nom/master/master? | |||
|
02:35
btyler joined
|
|||
| jimmyz | timotimo: yeah | 02:37 | |
| timotimo | yeah what exactly? :) | ||
| lizmat: just don't try the moarvm/jit-moar-ops branch with --enable-jit; it's slower than what we have with spesh, but it's still faster than without spesh | 02:38 | ||
| i only really wanted to benchmark the json thing, but i ended up kicking off a complete benchmark run ... oh well | 02:39 | ||
| when it's done with rakudo-moar, nqp-moar will only take about half the time ... :) | |||
| jimmyz | timotimo: I'm on nom/master/master :P | 02:40 | |
|
02:41
btyler joined
|
|||
| timotimo | ah | 02:44 | |
|
03:42
itz joined
04:19
ventica joined
|
|||
| ventica | o/ | 04:19 | |
|
05:36
jnap joined
05:59
FROGGS joined
|
|||
| nwc10 | jnthn: setting... | 06:02 | |
| was: cmd: Rounded run time per iteration: 7.3881e+01 +/- 9.3e-02 (0.1%) | |||
| now: cmd: Rounded run time per iteration: 7.1820e+01 +/- 4.2e-02 (0.1%) | |||
| win. (about 2.5% win) | |||
|
06:14
brrt joined
|
|||
| brrt | \o | 06:15 | |
| timotimo: istype is correct i believe, the function is actually marked as invokish in oplist (which means it's handled), and it's passed the address of the destination register, thus the return value is void | |||
| thanks for bringing it to my attention, though :-) | 06:16 | ||
| sergot | hi o/ | 06:17 | |
| brrt | also, seeing that core.setting benchmark makes me sad | 06:18 | |
| o/ sergot | |||
| oh, you said that already | 06:21 | ||
| much timotimo++ for hard work, though | 06:22 | ||
|
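brrt's point about istype, that a function handed the address of the destination register has nothing left to return, can be illustrated with a toy C function. This is only a sketch: `Reg` and `toy_istype` are invented names, the "type check" is a plain pointer comparison, and the invokish part (where the real check may need to call into user code) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a VM register; a real register type has more members. */
typedef union {
    int64_t i64;
    void   *o;
} Reg;

/* Hypothetical helper in the style brrt describes: no return value,
 * the answer is stored through the destination pointer instead. */
void toy_istype(const void *obj_type, const void *wanted_type, Reg *dest) {
    assert(dest != NULL);
    dest->i64 = (obj_type == wanted_type) ? 1 : 0;
}
```

A caller that treats the call as returning void is therefore correct, as long as it passes the register's address rather than its value.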
06:40
jnap joined
06:54
zakharyas joined
|
|||
| nwc10 | brrt: which benchmark, and why sad? | 07:18 | |
| brrt | nwc10: gist.github.com/timo/f13611f6d587bb1e9188 this one | ||
| and although i think there's a logical enough explanation for it, it still hurts | |||
| nwc10 | I didn't say, but I didn't see faster setting compiles with JIT | 07:19 | |
|
07:40
oetiker joined
07:46
jnap joined,
oetiker joined
|
|||
| nwc10 | ==30641== in use at exit: 776,094,561 bytes in 2,625,245 blocks | 08:02 | |
| ==30641== total heap usage: 17,741,594 allocs, 15,116,349 frees, 3,670,670,682 bytes allocated | |||
| not quite a fair comparison with irclog.perlgeek.de/moarvm/2014-07-29#i_9101664 as the former is the JIT, the latter is not, and JIT will allocate more | 08:03 | ||
| but quite a lot less allocation | |||
|
08:03
FROGGS joined
|
|||
| brrt is further triangulating the regex bug | 08:06 | ||
| hmm, very interesting | 08:42 | ||
| brrt is now hoping this is some sort of OSR bug | 08:44 | ||
| correction, a JIT-OSR bug | 08:47 | ||
| and, it's not that | 08:51 | ||
| /me brb | 08:52 | ||
|
09:03
jnap joined
09:25
brrt joined
|
|||
| brrt | ok, long story short, it isn't OSR | 09:37 | |
| jnthn tries to find yesterday's number | 09:38 | ||
| (for allocated) | |||
| brrt | o/ jnthn | 09:39 | |
| jnthn | nwc10: oh wow, that's like, a third to a quarter of what it was. | 09:40 | |
| o/ brrt | |||
| In other news, yay, I have a working keyboard | |||
| On my laptop | |||
| brrt | \o/ | ||
| jnthn | :))))) <== I can smile again and it works | ||
| brrt | in other news, i'm hardly a step further in the regex bug business | 09:41 | |
| jnthn | Regex bug? | ||
| It was a mis-code-gen, or? | |||
| brrt | probably | 09:42 | |
| basically, the t/qregex/01-qregex.nqp test breaks reliably with iter, but not without | 09:43 | ||
|
09:43
japhb_ joined
|
|||
| brrt | because the QRegex compiler doesn't compile the regex with jit | 09:43 | |
| and.. i have /still/ no idea why exactly this is | 09:44 | ||
|
09:44
[Coke]_ joined,
TimToady_ joined
09:45
cognome joined
|
|||
| brrt | boxing ops aren't normally invokish, are they? | 10:06 | |
| jnthn | No | ||
| Never | |||
| brrt | hmm no, you're right | 10:08 | |
| that'd be weird | |||
| ok, no need to look there | 10:09 | ||
| jnthn | lunch; bbiab | 10:12 | |
|
10:25
jnap joined
|
|||
| brrt back from lunch :-) | 10:37 | ||
| y this no work | 10:50 | ||
| nwc10 | jnthn: as stated in #perl6, MoarVM about 6.5 times faster than parrot at compiling the setting | 10:53 | |
| will take 27 more 2.5% speedups to get to a factor of 10 | 10:56 | ||
| brrt | \o/ | ||
| brrt wonders if JIT will get us there once it ACTUALLY WORKS | |||
| and what an actual optimizing JIT could make of regexes | |||
| well, it isn't istrue combined with iter | 11:18 | ||
| because that looks just dandy | |||
| tl;dr | 11:19 | ||
| 'ok frame is ok' | |||
|
11:28
jnap joined
|
|||
| brrt | y u compile so much | 11:44 | |
| timotimo | brrt: do you feel like jit-moar-ops could be merged into master? it seems to compile rakudo and nqp just fine | 12:28 | |
|
12:30
Ven joined
|
|||
| timotimo | also, i would revert the "omg istype is probably broken!" commit | 12:30 | |
| oh, i was wrong, there's still the "missing or wrong version of dependency nqpmo.nqp" thing when building nqp | |||
| brrt | make clean | 12:31 | |
| :-) | |||
| i'll check it in a bit | 12:32 | ||
| i'm stepping through the frame now | |||
| timotimo | make clean doesn't help | 12:34 | |
| brrt | hmm what | ||
| weird | |||
| 32 bytes seems a bit small for fastcreate, no? | 12:43 | ||
| timotimo | don't know? | ||
| don't understand the context | |||
| i was wondering ... if we have a string that's mostly ascii-encodable, but has a few multibyte-utf8-chars here and there, we could pick it apart into multiple strands and have direct jump-to-correct-byte-for-offset access to big parts of the string | 12:58 | ||
| nwc10 | you can do that by using any fixed-width encoding, without needing strands | 12:59 | |
| timotimo | yeah, but if we have a 20k character string with a single ö in it, we'd blow it up to 4x the size (if we use ucs-32) | 13:00 | |
| nwc10 | possibly at the cost of more memory. But the overhead of tracking the multiple strands will not be free | ||
| agree. but implementing the 8 bit NFG would also address this | |||
| timotimo | next thing i'll do is get a c-level profile of parse-json | ||
| nwc10 | mostly it's only going to be 2x the size, if we can use ucs-16 | 13:01 | |
| timotimo | except ucs-16 isn't a fixed-width encoding | 13:02 | |
| well, not in the strict sense | |||
| if we do analyze the string up front, then yeah | |||
| set and decont are kinda hot | 13:07 | ||
| interesting, guardconc is also considered hot | 13:09 | ||
| well, all these "hot" things are probably just "warm" | 13:10 | ||
| another warm instruction is in getspeshslot | |||
| brrt | getspeshslot? i thought i did that already | ||
| as well as guardconc | |||
| timotimo | i'm probably reading this wrong | 13:11 | |
| this is trying (and failing) to interpret a report from perf, not a jit bail log | |||
| MVM_frame_decref is warm, too | 13:12 | ||
| are you interested in a bail log for parse-json? | 13:14 | ||
| gist.github.com/timo/89aafc240d13748e4278 - this is from the perl6 version of that benchmark | 13:15 | ||
| AFK | |||
| brrt | yes, i am | 13:20 | |
| paramnameused is a big one | |||
| I CAN'T FIND THE SOURCE OF THIS BLOODY BUG | |||
| and i'm not starting anything until i found it | 13:23 | ||
| jnthn | brrt: Hm, I should probably take a look this evening :) | 13:35 | |
| See if I can spot anything of note :) | 13:36 | ||
| brrt | i'd be delighted :-) | 13:37 | |
| this feels something between Real Work and a Complete Waste Of Time | |||
|
13:38
klaas-janstol joined
|
|||
| brrt | because i'm truly so little further | 13:38 | |
| jnthn | What do we know so far? Where do you have to disable JIT to make it work? | 13:39 | |
| brrt | what i know is that disabling iter and friends make it run, if only because it disables many if not all regex methods | 13:40 | |
| jnthn | ah, ok | ||
| Well, every regex method uses jumptable too | 13:41 | ||
| So it won't be a regex method per se | |||
| It may well be something in the regex engine though | |||
| brrt | timotimo also reported not being able to run because of missing or wrong dependency complaints, but i typically get that when moarvm and nqp no longer feel like they agree | ||
| jnthn | like MATCH or CAPHASH | ||
| brrt | well, i now have the exact frame that was compiled just the moment before the errors start | ||
| so i'm hoping that should give some insight | 13:42 | ||
| and yes, it uses iter | |||
| but i've not seen anything weird yet | |||
| timotimo | brrt: fwiw, disabling the jit for only the single call that fails with "blah dependency" doesn't fix it; disabling the jit completely does, however | 14:32 | |
| brrt | you've made quite a few problems for me this week :-) | 14:33 | |
| timotimo | the good kind of problem, i hope | 14:35 | |
| brrt | pff | 14:36 | |
| lets just say you've uncovered a boat load of jit bugs :-) | |||
| timotimo | :) | ||
| brrt | waitaminute | 14:38 | |
| why... why does shift_o bottom out as an u8? | 14:40 | ||
| timotimo | oh noes! did i do that wrong? ;( | 14:45 | |
| i don't see anything wrong with my implementation off-hand | |||
| dalek | arVM/jit-moar-ops: e4f28ab | (Timo Paulssen)++ | src/jit/graph.c: istype is actually totally correct; i didn't see the REG_ADDR also, it's marked as "invokish", so the jit knows how to deal with that part of the problem at least. |
14:47 | |
| brrt | no, probably not | ||
| hmm.. gdb is just funky i guess | 14:52 | ||
| timotimo | that's not a good sign ... | 14:54 | |
| brrt | that's optimization for you | 14:55 | |
| dalek | arVM/jit-moar-ops: 542fe59 | (Timo Paulssen)++ | src/jit/graph.c: implement uc, lc, tc. |
15:05 | |
| arVM/jit-moar-ops: 5239e3e | (Timo Paulssen)++ | src/ (3 files): implement splice. |
15:06 | ||
| arVM/jit-moar-ops: bca161a | (Timo Paulssen)++ | src/jit/graph.c: implement split. |
15:24 | ||
| timotimo | implementing split caused 2 more frames to be compiled in the core setting! :) | ||
| 1.5k frames still bailing out, though | |||
| [Coke] | q; is having other folks committing to the jit stuff at all confusing for final grading? | 15:26 | |
| (it's awesome for the community and I entirely support it; just don't want to screw up grading) | |||
| timotimo | the contributions i make are all trivial | ||
| brrt | \o/ | 15:39 | |
| not so trivial, they have to be correct too | |||
| git can get you a log of all my commits and changes | |||
|
15:43
ventica joined
|
|||
| TimToady | brrt++'s work is already awesome; it would be difficult to ruin his grade :) | 15:49 | |
| just having the framework in place is a great thing, even if there are still enough opcodes and/or bugs at the end to mask the eventual performance improvements | 15:51 | ||
| JIT by definition tends to perform poorly when there are scattered weak links in the chain of opcodes, which can result when you have code in a language that is not well designed for JIT :) | 15:53 | ||
| brrt | i would add that if you didn't know you were going to do a JIT right at the start (e.g., LuaJIT-2.0), it can be difficult in any language | ||
| TimToady | I think the JIT performance vs effort curve will typically look like a hockey stick. | 15:54 | |
| brrt | and i would further add that some things which are o so simple in plain-old-c can be devilish in ASM :-) | ||
| brrt certainly hopes so | |||
| until we hit another barrier | |||
| and then it'll be flat again for a while, and given enough effort, might increase again | 15:55 | ||
| TimToady | pity we don't have a language designer around to fix the language... :) | ||
| brrt | so perhaps sigmoidal? | ||
| TimToady | prolly | ||
| carlin | sounds like a stair-case | 15:57 | |
| TimToady | "It'f fubconfiouf!" --Sigmoid Frund | ||
| brrt dinner & | 15:59 | ||
|
16:45
brrt joined
|
|||
| brrt | we should be able to dump the moarvm call stack using moar-gdb.py | 16:46 | |
| jnthn | Can you not just MVM_dump_backtrace(tc); in gdb? | ||
| timotimo | you should be able to | 16:47 | |
|
16:50
klaas-janstol joined
17:02
FROGGS joined
17:33
vendethiel joined
|
|||
| brrt | i'll try, but tc has been... optimized out :-( | 17:39 | |
| why, that does explain a lot | 17:41 | ||
| although it puzzles me why there's an invoke there | 17:51 | ||
| jnthn | "there"? | ||
| jnthn has nommed and will attempt to work in his unwanted free home sauna | |||
| brrt | there is this, it seems | 17:53 | |
| github.com/perl6/nqp/blob/master/s...r.nqp#L353 | 17:54 | ||
| jnthn | I'm extremely surprised if you're in a JIT-compiled cclass_elem in so far as we don't compile jumptabl yet... | 17:55 | |
| brrt | well, the code fragment within is compilable :-) | ||
| jnthn | ah, yes | ||
| brrt | anyway, bbi2h or so :-) | 17:56 | |
| jnthn | ok | ||
| brrt | i think my %seen creates a bindlex? | ||
| jnthn | It's just a decl afaik | ||
| brrt | there's a bindlex i can't really explain otherwise | 17:57 | |
| but, i'll be really off now | |||
| jnthn | OK. Lemme know when you're back; I can look through the disassembly. | 17:58 | |
| m: say 1041064 * 64 | 18:05 | ||
| camelia | rakudo-moar 11e193: OUTPUT«66628096␤» | ||
| jnthn | m: say (1041064 * 64) / 4194304 | 18:06 | |
| camelia | rakudo-moar 11e193: OUTPUT«15.885376␤» | ||
| timotimo | what awesome patch do you have for us now? :) | 18:11 | |
| also ... aaw, no brrt for 2 hours? :( | |||
|
18:14
Ven joined
|
|||
| jnthn | Will let you know in 5 mins if I have one :) | 18:24 | |
| TimToady is always disappointed when he reads that jnthn will not be available till evening, till he remembers that jnthn's evening comes, like, nine hours earlier :) | 18:35 | ||
| jnthn | :) | ||
| I realized that every alternation and protoregex evaluation ended up taking a closure for a needless reason. | |||
| FROGGS | I believe there are a lot of spots in our codebase like that | 18:36 | |
| TimToady | sugoi! | 18:37 | |
| which, as in english awesome/awful or terrific/terrible, can mean both really good and really bad, though usually good | 18:38 | ||
| cf awfully good and terribly good | |||
| not that STD's regex engine isn't chockablock full of closures to emulate laziness... | 18:41 | ||
| jnthn | m: say 5524580 - 4483251 | 18:48 | |
| camelia | rakudo-moar 509b1a: OUTPUT«1041329␤» | ||
| jnthn | Yeah, I'll happily make a million less closures, thanks. | ||
| TimToady | .oO(we merely have to call the slow-path binder a million times instead...) |
18:49 | |
| jnthn | There's no slow-path binder in NQP :) | 18:50 | |
|
18:59
cognome joined,
cognominal__ joined
19:02
Ven joined
|
|||
| japhb_ | Waiting for the day when jnthn can say "There's no slow path in NQP" ... | 19:06 | |
|
19:35
brrt joined,
cognome joined
|
|||
| brrt | jnthn, timotimo: i'm back | 19:35 | |
| timotimo | yay :) | 19:37 | |
| turns out i was able to occupy myself with friends & food until brrt came back :) | |||
| brrt | i'll have fully annotated disassembly in about an hour or 2 | ||
| (annotating disassembly is expensive) | 19:38 | |
| dalek | arVM: d51a9cf | jnthn++ | src/ (2 files): Cache fates array rather than re-allocating. |
||
| arVM: a4ac569 | jnthn++ | / (3 files): Cache frame index, to avoid a linear scan. Turns out linear scans for frame indexes dominated assembly time. With this, stage mbc for Rakudo's CORE.setting is a third of what it was. |
|||
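The "cache frame index, to avoid a linear scan" idea from the commit message can be sketched as follows. This is an assumption-laden toy, not the actual MoarVM change: `AsmFrame`, `FrameTable`, `register_frame`, `index_by_scan` and `index_cached` are hypothetical names, and the table has a fixed capacity to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 64

typedef struct {
    const char *name;
    size_t      cached_index;  /* filled in once, when the frame is registered */
} AsmFrame;

typedef struct {
    AsmFrame *frames[MAX_FRAMES];
    size_t    count;
} FrameTable;

/* Appends a frame and remembers its position inside the frame itself. */
size_t register_frame(FrameTable *t, AsmFrame *f) {
    assert(t->count < MAX_FRAMES);
    f->cached_index = t->count;
    t->frames[t->count++] = f;
    return f->cached_index;
}

/* The slow way: O(n) per lookup, O(n^2) when done once per frame reference. */
size_t index_by_scan(const FrameTable *t, const AsmFrame *f) {
    for (size_t i = 0; i < t->count; i++)
        if (t->frames[i] == f)
            return i;
    return (size_t)-1;
}

/* The cached way: O(1), valid as long as frames are never reordered or removed. */
size_t index_cached(const AsmFrame *f) {
    return f->cached_index;
}
```

Both lookups return the same index; the cached one simply avoids walking the table every time a frame is referenced while the bytecode is being assembled.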
| jnthn | m: say 35 + 73 | 19:39 | |
| camelia | rakudo-moar c9ad80: OUTPUT«108␤» | ||
| jnthn | Well under 2 minutes for a full NQP and Rakudo re-build on my box these days. :) | 19:40 | |
| vendethiel | you got a good box :P | 19:41 | |
| what was it like, with last year's parrot ? :) | 19:42 | ||
| jnthn | I forget. I suspect CORE.setting alone was in the 2 minute region though. | ||
| timotimo | holy hell! | 19:43 | |
| down to a third %) | |||
| that is pretty fantastic | |||
| jnthn | Stage mbc was the shortest stage. But still, it's a nice win. :) | 19:44 | |
| timotimo | i always thought it ought to be faster than that | 19:45 | |
| the closure thing ought to make all our parsing faster, no? | 19:47 | |
| btyler | 'Stage mbc : 0.539' :) | ||
| hoelzro | great scott | 19:48 | |
| timotimo | er ... huh | ||
| oh! | |||
| stage mbc | |||
| i thought we were talking about stage mast | |||
| well, that's still a nice little win :) | |||
| jnthn | btyler: Heh, 0.287 here :) | 19:50 | |
| timotimo | 0.275 | 19:51 | |
| i win :) | |||
| jnthn | :P | 19:55 | |
| tadzik | "stage mbc for Rakudo's CORE.setting is a third of what it was" ( ͡° ͜ʖ ͡°) | 20:00 | |
| timotimo | what does that face mean? :\ | 20:02 | |
| jnthn | "I know more weird chars than you" | 20:03 | |
| tadzik | yeah :) | 20:04 | |
| timotimo | oke | 20:06 | |
| brrt | ok, apparently now i'll be iterating a hash, let's see if that does anything strange | 20:30 | |
| oh, my dumb ass just stepped over the rest of the frame, completely not finding why it doesn't work | 20:42 | ||
| jnthn | I normally use a horse rather than an ass, tbh... :P | ||
| Is the bug certainly in that frame? | 20:43 | ||
| Also, do you know exactly which frame it is? | |||
| brrt | what i know is that there is no bug before that frame is compiled :-) | ||
| i used the break-on-print / break-on-compile technique | 20:44 | ||
| basically, most of the time the compiler JITs quite a few frames of itself, right? so i need to skip these before getting to the frame that i'm really interested in | 20:45 | ||
| i can try the reverse as well, checking if the bug persists if i put something uncompileable in that frame | |||
|
20:46
Ven joined
|
|||
| brrt | anyway, iterating on a VMArray works perfectly | 20:46 | |
| related question: what's still really uncompileable? | 20:48 | ||
| (plenty of things, of course.. but :-) | 20:54 | ||
| jnthn | Well, since we're actually hitting jumplist now sometimes, that means fixing that would get us compiling various regexes. | 20:55 | |
| Well, and tokens/rules | |||
| brrt shivers | |||
| exciting, also very scary | 20:56 | ||
| right now, i'm suspicious of sp_fastcreate | 20:57 | ||
| jnthn | Oh? | ||
| It is JITted to quite a bit of code. | |||
| brrt | basically, it creates an object of only 32 bytes large | 20:58 | |
| thats big enough for a header, but nothing more | |||
| it's a hash iter all right | 21:00 | ||
| and the object is a VMHash | 21:01 | ||
| so far, so good | |||
| also, the somewhat funky loop body is transformed into a routine call for reasons unbeknownst to me | 21:03 | |
| if they hadn't, i wouldn't have seen this bug, so... | |||
|
21:04
klaas-janstol joined
|
|||
| brrt | further spesh opportunities would be - i'd say - creating sp_istrue_iter (and preferably, sp_istrue_iterhash / sp_istrue_iterarray), and transforming if_o for iters into those + if_i | 21:06 | |
| that should help a bundle wrt to making a fast implementation | 21:07 | ||
| and it should be easy since iter is almost always followed by unless_o or if_o | |||
| jnthn | Yeah | ||
| brrt | but that's just, like, my opinion :-) | 21:08 | |
| jnthn | It makes sense. | ||
| brrt | (in fact, spesh gives us an opportunity to disintermediate so many things) | ||
| it's really exciting | |||
| i should not forget that, probably, as i'm struggling to understand this | 21:09 | ||
| jnthn | Yes, spesh is really about taking away various bits of late binding. | ||
| brrt | and it should also remove the completely unnecessary invokish guard in this case, since istrue for an iter never invokes | 21:10 | |
| jnthn | yeah | 21:12 | |
| brrt | MVMIter shift always returns the hash for a return value? that seems.. odd | 21:18 | |
| jnthn | Not the hash | 21:22 | |
| The iterator. | |||
| brrt | you're quite right | 21:23 | |
| jnthn | Did investigating sp_fastcreate go any further? | 21:28 | |
| brrt | no, not really | 21:29 | |
| the object behaves exactly as intended | |||
| it's a VMHash btw | 21:30 | ||
| nwc10 | jnthn: ASAN barfage | ||
| I need to go to bed | |||
| ==8038==ERROR: AddressSanitizer: heap-use-after-free on address 0x619000356b80 at pc 0x7fa6838f49bc bp 0x7fff6e2d9900 sp 0x7fff6e2d98f8 | |||
| WRITE of size 8 at 0x619000356b80 thread T0 | |||
| #0 0x7fa6838f49bb in nqp_nfa_run src/6model/reprs/NFA.c:406 | |||
| ]blah | 21:31 | ||
| 0x619000356b80 is located 0 bytes inside of 1088-byte region [0x619000356b80,0x619000356fc0) | |||
| freed by thread T0 here: | |||
| #0 0x7fa6841778e6 in __interceptor_realloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:93 | |||
| #1 0x7fa6838f4970 in nqp_nfa_run src/6model/reprs/NFA.c:403 | |||
| brrt | and it is precisely as large as it needs to be in fastcreate, namely 32 bytes | ||
| nwc10 | without using a debugger, I guess that the problem is | ||
| fates = (MVMint64 *)realloc(fates, sizeof(MVMint64) * fate_arr_len); | |||
| and I would hazard a guess that fate_arr_len is 0 | |||
| t/spec/S05-metasyntax/longest-alternative.rakudo.moar | 21:32 | ||
| Nope | 21:33 | ||
| (gdb) p total_fates | |||
| $1 = 1 | |||
| (gdb) p fate_arr_len | |||
| $2 = 1 | |||
| jnthn | Huh, what on earth frees it... | ||
| nwc10 | #1 0x7ffff66d6970 in nqp_nfa_run src/6model/reprs/NFA.c:403 | 21:34 | |
| the realloc() 3 lines before that kaboom | |||
| jnthn | ah | ||
| Darn, yes | 21:35 | ||
| nwc10 | oh, it never assigns back to tc->fates | 21:36 | |
| I mean tc->nfa_fates | 21:37 | ||
| jnthn | yeah, testing a fix here now | ||
| And wondering if it's actually to blame for the SEGV I thought a different refactor had caused. | |||
| Yes, it was | 21:38 | ||
| nwc10++ | |||
| nwc10 | win! | ||
| can I go to bed please? :-) | |||
| jnthn | Thanks, that's saved me quite a headache. | ||
| Yes. Sleep well. :) | |||
| nwc10 | glad to be of assistance | 21:39 | |
| brrt | sleep well nwc10 :-) | 21:40 | |
| you're always of assistance :-) | |||
| (really, hash lookups flatten strings? i can only say... wow) | 21:41 | ||
| jnthn | Yeah, for the moment. | ||
| brrt | :-) | ||
| jnthn | We use uthash but need to moarph it some more to handle our strandy strings. | 21:42 | |
| brrt | all strings will get their own brand new normalization right? | ||
| hmm i see | |||
| then using strandy strings might sometimes be much more expensive than imagined | |||
| jnthn | Yeah, for the moment. | ||
| The string opts so far aren't the end of the work :) | 21:43 | ||
| Just enough of an improvement for some previously painful benchmarks. | |||
| brrt | is suppose VMIter has a value method | 21:44 | |
| s/is/i/ | 21:45 | ||
| dalek | arVM: 2f16928 | jnthn++ | src/6model/reprs/NFA.c: Fix the fates allocation optimization. It failed to update a code path that reallocated, leading to a use after free bug. Found by nwc10++ using ASAN. |
||
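The bug found above (a `realloc` whose result never made it back into `tc->nfa_fates`) can be sketched as follows. This is only an illustration: `ThreadCtx`, `nfa_fates_len` and `ensure_fates` are hypothetical names, not the code in `src/6model/reprs/NFA.c`, and the growth policy is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thread context; only the cached buffer matters. */
typedef struct {
    int64_t *nfa_fates;      /* cached buffer, reused between NFA runs */
    size_t   nfa_fates_len;  /* capacity, in elements */
} ThreadCtx;

/* Make sure the cached buffer can hold `needed` fates.
 * Returns the usable buffer, or NULL if the allocation failed. */
static int64_t *ensure_fates(ThreadCtx *tc, size_t needed) {
    assert(tc != NULL);
    if (needed > tc->nfa_fates_len) {
        size_t new_len = tc->nfa_fates_len ? tc->nfa_fates_len : 1;
        while (new_len < needed)
            new_len *= 2;
        int64_t *grown = realloc(tc->nfa_fates, sizeof(int64_t) * new_len);
        if (grown == NULL)
            return NULL;  /* the old buffer is still valid and still owned by tc */
        /* The step the buggy path skipped: realloc may move the block, so the
         * cached pointer must be updated or it dangles (use after free). */
        tc->nfa_fates     = grown;
        tc->nfa_fates_len = new_len;
    }
    return tc->nfa_fates;
}
```

A caller would use the returned pointer (or `tc->nfa_fates`) after every call, never a copy taken before it.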
| jnthn | brrt: Well, VMIter is just a representation | ||
| brrt: But yeah, it can be on a type with a .value method. | |||
| brrt: NQP defines an NQPHashIter type with one, iirc | 21:46 | ||
| brrt | hmm ok | ||
| i suppose my understanding of 6model is still rather limited | |||
| i.e. i treat a REPR as a class | |||
| but that's pretty much wrong :-) | |||
| jnthn | Right | ||
| REPR is about memory layout | 21:47 | ||
| Type = meta-object + REPR | |||
| An STable is per type (the "S" meaning "Shared", as in "stuff things of the same type share in common") | |||
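The relationship jnthn describes (REPR is memory layout, a type is meta-object plus REPR, and the STable holds what objects of one type share) might be pictured like this. The struct and field names are hypothetical and much simpler than MoarVM's real definitions; the point is only that two types can share one REPR while having different meta-objects.

```c
#include <assert.h>
#include <stddef.h>

/* A representation: knows about memory layout only, nothing about methods. */
typedef struct {
    const char *name;
    size_t      body_size;
} Repr;

typedef struct Object Object;

/* Shared table: one per type, pairing a REPR with a meta-object. */
typedef struct {
    const Repr *repr;  /* how instances are laid out */
    Object     *how;   /* meta-object: an ordinary object that knows the methods */
} STable;

/* Every object header points at the STable of its type. */
struct Object {
    const STable *st;
};

static const Repr *repr_of(const Object *o) { return o->st->repr; }
static Object     *meta_of(const Object *o) { return o->st->how; }

/* Two objects have the same type exactly when they share an STable. */
static int same_type(const Object *a, const Object *b) {
    assert(a != NULL && b != NULL);
    return a->st == b->st;
}
```

Under this sketch, a "VMIter" REPR could sit behind one type that has a `.value` method and another that does not, which is why treating a REPR as a class goes wrong.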
| brrt | ok, i think i get that | 21:49 | |
| premature optimizer me would say 'how nice would it be if these (meta-obj + repr) could be aligned in memory' | 21:50 | |
| jnthn | Well, a meta-obj is just a normal object | ||
| brrt | i wonder why i'm starting to like nqp | 21:55 | |
| f..... | 21:56 | ||
| oh bloody hell | |||
| this can't be serious | |||
| call MVM_dump_backtrace, ASAN kicks in, crashburn | 21:57 | ||
| jnthn | :( | ||
| Did it point out some interesting kind of corruption? | |||
| brrt | no, not really, probably just because clang optimized something away | ||
| well, i'm pretty sure now the problem is /somewhere/ in that frame | 22:03 | ||
| except that it seems to work precisely as advertised | |||
| as in, /precisely/ as advertised | |||
| why am i so sure? well, i inserted something that i knew wouldn't compile - namely, eq_n, and lo and behold, test no longer crashes | 22:04 | ||
| the secret is in there | 22:05 | ||
| also, despite not being American and never going to be a customer of such a data provider, this scares me terribly: blogs.wsj.com/digits/2014/07/30/spr...less-plan/ | 22:07 | |
| imagine an internet with only 4 sites | 22:09 | ||
| jnthn | ugh | ||
| brrt | if i had that when i was young, i never ever would have gotten even as far as i have | ||
| no irc | |||
| no mailing lists | |||
| no msdn, no wikipedia, no random guys' blogs | 22:10 | |
| (or gals) | |||
| no slackware, fedora, debian | |||
| jnthn | Yeah. Wow. | 22:12 | |
| Quite a step backwards. | |||
| brrt | btw, i find it ironic that it's easier to debug a JIT frame for me than to debug interpreted frame | 22:13 | |
| jnthn | o.O :) | 22:14 | |
| brr | 22:17 | ||
| brrt: oh no, I just spotted something and you're not going to like it :( | |||
| for %seen { | 22:18 | ||
| next if $_.value < 2; | |||
| self.worry("Repeated character (" ~ $_.key ~ ") unexpectedly found in character class"); | |||
| } | |||
| Could it be that "next" there? | |||
| That's implemented as an exception. | 22:19 | ||
| I shoulda seen it before, but I was looking at the previous loop :( | |||
| brrt | o really? | 22:21 | |
| well, that's certainly possible, i guess | |||
| jnthn | Yeah. Stick a breakpoint in throwcatdyn or so | 22:22 | |
| brrt | ok | ||
| if that's it, you are my hero | |||
| gist.github.com/bdw/899799631900fab733fa this is by the way, everything i've collected so far | 22:26 | ||
| that... seems to be it, yet, throwcatdyn is actually called | 22:30 | ||
| jnthn | Yes, trace into it and see what it does... | 22:31 | |
| It'll unwind a frame and then...do something...with the PC | |||
| brrt | well, the caller frame indeed has a handler | 22:38 | |
| oh, lord | |||
| timotimo | ooooh, are we close to a fix for the hang issues? :) | ||
| brrt | well.... we're close to the /reason/ for the hang issues, that much seems certain | 22:39 | |
| brrt should have debugged unoptimized code long ago | |||
| i suppose the 'next' exception should've been caught by the frame above it? | 22:45 | ||
| jnthn | Yes | 22:46 | |
| exception.c attempts to put the PC in the right place... | |||
| ...then I don't quite know what will happen. | |||
| brrt | well, it seems to jump way above that | ||
| i mean way way above that | |||
| why would that be? | |||
| jnthn | Well, it will unwind the current frame | 22:47 | |
| frame.c has that logic | |||
| Maybe it's then finding some outdated JIT re-entry address? | |||
| And re-entering the JITted code at the wrong place | 22:48 | ||
| Rather than at the place the exception handler points to | |||
| TimToady | long run this one oughta just turn into a goto and never throw | ||
| jnthn | TimToady: Yeah, that relies on the loop body being inlined...which I don't know why it isn't, tbh | 22:49 | |
| TimToady | basically needs to go to whatever NEXTish logic there is | ||
| p5 jumps to the continue block, but optimizes that to jump directly to the while condition, loosely speaking | 22:51 | ||
| (in the absence of a continue) | |||
| maybe lexical reentry conditions are a problem though | 22:52 | ||
| jnthn | Yeah | ||
| It's already on the Moar todo list this month to optimize various throws into gotos. | 22:53 | ||
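The difference between `next` as a thrown control exception and `next` as a plain jump can be illustrated in C, using `setjmp`/`longjmp` as a stand-in for the throw and `continue` as the goto. This is an analogy only, not how MoarVM or NQP implement either path; all names below are made up, and the loop mirrors the `for %seen` body quoted earlier (skip entries with a count below 2, otherwise record a worry).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf next_handler;  /* where a thrown "next" lands */

/* Loop body compiled as a separate block: "next" has to unwind out of it. */
static void body(int count, volatile int *worries) {
    if (count < 2)
        longjmp(next_handler, 1);  /* "next" as a throw */
    (*worries)++;
}

/* Version 1: the body is invoked, so "next" is a non-local exit. */
static int worries_via_throw(const int *counts, size_t n) {
    volatile int worries = 0;
    assert(counts != NULL || n == 0);
    for (volatile size_t i = 0; i < n; i++) {
        if (setjmp(next_handler) == 0)
            body(counts[i], &worries);
        /* a thrown "next" resumes here and the loop moves on */
    }
    return worries;
}

/* Version 2: the body is inlined, so "next" is just a jump to the loop step. */
static int worries_via_goto(const int *counts, size_t n) {
    int worries = 0;
    for (size_t i = 0; i < n; i++) {
        if (counts[i] < 2)
            continue;  /* "next" as a goto */
        worries++;
    }
    return worries;
}
```

Both versions compute the same result; the second simply never needs a handler lookup, which is the optimization discussed above and depends on the body being inlined.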
| TimToady | for %seen has an implicit $_ | ||
| jnthn | Right | ||
| I'm a bit surprised it hasn't managed to optimize that away, though. | |||
| TimToady | but if you know you're gonna clobber it anyway | ||
| no need to reclone | |||
| or whatever it does on entry to the block | 22:54 | ||
| could become a state var, really | |||
| jnthn | Well, creates a frame in theory, but just re-uses the frame from last time in reality. | ||
| TimToady waves hands encourageingly :) | |||
| *ging | 22:55 | ||
| jnthn | I thought I'd taught NQP how to turn such for loops into not needing to invoke, though. | ||
| I'll have to look into why the opt didn't work out this time. | |||
| TimToady | maybe it's does, and that's why there's no catcher for 'next'? | ||
| *it | 22:56 | ||
| just a wag | |||
| about all I do anymore :) | |||
|
23:03
ventica joined
|
|||
| brrt | as it seems to be, it never re-enters the JIT at all | 23:04 | |
| which is what i experience here, too | 23:05 | |
| oh, i see what's the problem | |||
| timotimo | yay! | ||
| brrt | basically, search_frame_handlers searches handlers on basis of the current bytecode offset | 23:06 | |
| i.e. i suppose the handler is a mapping between a throwpoint and a catchpoint | |||
| jnthn | Ahhhh | ||
| brrt | clearly, when in the JIT, the current bytecode is the magic bytecode | 23:07 | |
| jnthn | Right. | ||
| brrt | so that doesn't appear in any handler | ||
| so it walks the stack all the way until it finds something that /does/ have a handler, and continues from there | |||
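A lookup keyed on the current bytecode offset, as brrt describes it, might look roughly like the sketch below. The `Handler` layout and `find_handler` are hypothetical, not MoarVM's `search_frame_handlers`; the half-open range is an assumption made for the example. It shows why an offset that belongs to no range in the frame (as with the JIT's magic bytecode) finds nothing, so the search moves on to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A handler maps a protected bytecode range (throw points) to a catch point. */
typedef struct {
    uint32_t start_offset;  /* first protected offset, inclusive */
    uint32_t end_offset;    /* end of protected range, exclusive */
    uint32_t goto_offset;   /* where execution resumes when the handler fires */
} Handler;

/* Return the first handler whose range covers `pc`, or NULL if none does. */
static const Handler *find_handler(const Handler *handlers, size_t n, uint32_t pc) {
    assert(handlers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pc >= handlers[i].start_offset && pc < handlers[i].end_offset)
            return &handlers[i];
    }
    return NULL;  /* caller would continue the search in the next frame up */
}
```

With one handler covering offsets 10 to 39, a throw at offset 20 is caught, while an offset computed from some unrelated buffer matches nothing in this frame.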
| the quick fix is to disable JIT compilation on graphs that have handlers, i think? | 23:08 | ||
| and then tomorrow work on adding handlers to the JIT | |||
| somehow, at least | |||
| brrt ponders a bit on how that should look, but it will probably involve dynamic labels much like OSR | 23:10 | ||
| jnthn | brrt: Yes, that could work though might rule out a lot of things | ||
| brrt | well, it /should/ rule out a lot of things | 23:13 | |
| it's broken :-) | |||
| i'll fix it, but for now, it won't work | 23:14 | ||
| jnthn | *nod* | ||
| Yes, agree with the way forward. | |||
| timotimo | will you put in that hotfix before bed or tomorrow? | 23:15 | |
| brrt | right now is when i'm testing it | ||
| again, lizmat++ and woolfy++ got me a computer that can speedwise compete with jnthn's :-) so i don't have to wait for compilations so long anymoe | |||
| more | |||
| no more hanging during building | 23:16 | ||
| nqp | |||
| jnthn | \o/ | 23:17 | |
| brrt | btw, jnthn, it might interest you that the block within for %seen is invoked by invoke_o after JIT (and spesh) | 23:18 | |
| i.e., not even fastinvoke | |||
| jnthn | Yeah, I know why :) | 23:20 | |
| dalek | arVM/moar-jit: 7d17cb0 | (Bart Wiegmans)++ | src/jit/graph.c: Don't compile frames that have handlers Handlers and the JIT don't play so nicely together, yet. This is because searching a frame handler looks for the current bytecode offset, which is all off in the JIT because it uses a magic bytecode. So I'll disable the handlers for now. This fixes a long-standing issue whereby adding the iter ops would cause infinite loops and other weird behavior. |
||
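The stopgap in this commit amounts to a guard at the point where the JIT decides whether to build a graph for a frame. A minimal sketch, assuming a hypothetical `FrameInfo` that exposes the handler count (the real check lives in `src/jit/graph.c` and is not reproduced here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what the JIT knows about a frame before compiling. */
typedef struct {
    size_t num_handlers;  /* exception handlers registered for the frame */
} FrameInfo;

typedef enum {
    JIT_COMPILE,            /* go ahead and build the JIT graph */
    JIT_BAIL_HAS_HANDLERS   /* leave the frame to the interpreter */
} JitDecision;

/* Refuse to compile any frame with handlers until handler lookup
 * understands JIT-compiled code. */
static JitDecision jit_decide(const FrameInfo *frame) {
    assert(frame != NULL);
    if (frame->num_handlers > 0)
        return JIT_BAIL_HAS_HANDLERS;
    return JIT_COMPILE;
}
```

The cost is coverage: any frame containing a loop with `next`, or any other handler, falls back to the interpreter.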
| brrt | jnthn++ for actually finding the cause | 23:23 | |
| timotimo | that was tough. | 23:25 | |
| jnthn | aye | ||
| brrt++ # persistence | |||
| brrt | :-) | ||
| we're not out of the woods yet | |||
| jit-moar-ops, even with the current fix, can't compile nqp | |||
| timotimo | darn | 23:26 | |
| brrt | current error: No applicable candidates found to dispatch to for 'compile_node'. | ||
| timotimo | uh ... huh? | 23:27 | |
| brrt hopes something went wrong in jit-moar-ops | |||
| but i doubt it | |||
| what's wrong with that, you think? | |||
|
23:27
itz_ joined
|
|||
| brrt | oh, that's not a JIT bug | 23:31 | |
| still happens with MVM_JIT_DISABLE=1 :-) | |||
| and not a spesh bug | |||
| maybe a corrupted-install bug | |||
| let's check that first | 23:32 | ||
| that's not it | 23:33 | |
| jnthn | brrt: Do you have a latest NQP? | 23:34 | |
| brrt | i think i do | ||
| hmm | 23:37 | ||
| seems like a jit bug after all | |||
| or, you know, i just don't know | |||
| maybe this is the bug timotimo was talking about | 23:38 | ||
| jnthn | Sleep on it? :) | ||
| Or are you too zoned in? :) | |||
| dalek | arVM/jit-moar-ops: 7d17cb0 | (Bart Wiegmans)++ | src/jit/graph.c: Don't compile frames that have handlers Handlers and the JIT don't play so nicely together, yet. This is because searching a frame handler looks for the current bytecode offset, which is all off in the JIT because it uses a magic bytecode. So I'll disable the handlers for now. This fixes a long-standing issue whereby adding the iter ops would cause infinite loops and other weird behavior. |
23:39 | |
| MoarVM/jit-moar-ops: d14351c | (Bart Wiegmans)++ | src/jit/graph.c: | |||
| MoarVM/jit-moar-ops: Merge branch 'moar-jit' into jit-moar-ops | |||
| timotimo | may need a clean and then a JIT_DISABLE run from the very beginning | ||
| that's how one of my problems behaved | |||
| brrt | yeah, i suspect that will fix it | ||
| but that's nasty | |||
| it means something in the compilation went wrong | |||
| timotimo | yes | 23:40 | |
| in a way that doesn't make it crash | |||
| brrt | i mean that's which circle of debugging hell? | ||
| jnthn | tbh, I'd pull the patches from timotimo++'s branch in one or two at a time until you hit the one that breaks | 23:41 | |
| brrt | yep, that seems like a plan | ||
| again :-) | |||
| :-) | |||
| brrt is going to sleep on it | |||
| jnthn | 'night o/ | ||
| brrt | timotimo: if you're still awake, btw, when did this first appear? | 23:42 | |
| 'night :-) | 23:43 | ||
| timotimo | oof | 23:44 | |
| last night? dunno :( | |||
| brrt | well, doesn't matter, i'll find it eventually | ||
|
23:44
brrt left
|
|||
| timotimo | good luck! | 23:46 | |