00:14
pyrimidine joined
|
|||
samcv | well what was the switch checking | 01:27 | |
every single value of the numbers? | |||
like 0-9? | |||
MasterDuke | samcv: gist.github.com/MasterDuke17/e14fe...0a98abdbb9 this is the code i used to create the switch | 01:28 | |
samcv | MasterDuke, yeah you need to group it by digits | 01:39 | |
MasterDuke | by digits? | 01:40 | |
samcv | uhm | ||
m: say "0".ord.base(2); say "9".ord.base(2) | |||
camelia | rakudo-moar cfae23: OUTPUT«110000␤111001␤» | ||
samcv | maybe i need to think about this a little more | 01:42 | |
uhm but if you look by hex | 01:43 | ||
m: say '0'.base(16).say; '9'.base(16).say | |||
camelia | rakudo-moar cfae23: OUTPUT«No such method 'base' for invocant of type 'Str' in block <unit> at <tmp> line 1» | ||
samcv | m: say '0'.ord.base(16).say; '9'.ord.base(16).say | ||
camelia | rakudo-moar cfae23: OUTPUT«30␤True␤39␤» | ||
samcv | so you want to switch ignoring the last hex digit | 01:44 | |
so you can combine all of each set of 0-9 into one 'case' | |||
MasterDuke | more like the pre-existing if statements you mean? | 01:46 | |
samcv | yeah i guess if is more optimized than i thought i guess | 01:47 | |
but we can just generate all the if statements maybe | |||
MasterDuke | for the Unicode Nd digits? | 01:48 | |
samcv | yeAH | ||
MasterDuke | well, that might be nice, but i'm way more interested in making the non-Unicode case faster | 01:49 | |
samcv | yes but | ||
MasterDuke | that's going to be the 99.99% use case | ||
samcv | if we consolidate them together don't we have to do less checks? | ||
and not process each point twice? | 01:50 | ||
MasterDuke | twice? | ||
samcv | uhm let me look at the function again, sorry | 01:51 | |
also that switch would be very slow because they're not in order | |||
the numbers are totally out of order so it would be slow | |||
MasterDuke | well, the ascii digits are first | 01:52 | |
samcv | yeah but it needs a really weird jump table | 01:53 | |
also that table doesn't have everything in it, it's missing a lot | 01:55 | ||
you are only adding Nd general category | |||
MasterDuke | i guess i could extract the ascii digits out into their own switch first | ||
well, isn't that the same as the existing code? | 01:56 | ||
(the Nd general category thing) | |||
samcv | uh | ||
idk | |||
what function is it in MVM again? | |||
MasterDuke | github.com/MoarVM/MoarVM/blob/mast...rce.c#L352 | 02:00 | |
samcv | it does look like it checks Nd | 02:02 | |
wtf why is it checking that :( | |||
MasterDuke | what would it check instead? | ||
samcv | and it looks up the property code... | ||
yeah this is gonna be slow | |||
Numeric_Type > 0 | 02:03 | ||
MasterDuke | ah, but we don't want to do Nl or No | ||
samcv | you want things with numeric values right? | 02:04 | |
MasterDuke | no, just digits | ||
samcv | define digits | ||
like <:Digit> ? | |||
MasterDuke | because unlike the highlander, there can be more than one | ||
samcv | why general category Nd though | 02:05 | |
i mean sure it's the category for numeric digits | |||
you just want only things from 0-9? | 02:06 | ||
in value? | |||
MasterDuke | so the string "12۳4" should give the value 1234 | ||
samcv | of course | ||
idk what you mean by digit. do you want all unicode cp's that have a value? | |||
only ones from value 0-9? | 02:07 | ||
MasterDuke | but "1Ⅷ" is invalid | ||
timotimo | then say "decimal" :) | ||
yoleaux2 | 27 Jan 2017 13:54Z <samcv> timotimo: so I have added some indexes to names.c for every other codepoint | ||
MasterDuke | Ⅷ by itself is valid | ||
samcv | yeah. i had no clue wtf you meant lol | ||
MasterDuke | timotimo++ | ||
timotimo | samcv: you wanted me to try to make lookup from codepoint to name work with your changes? | ||
02:08
pyrimidine joined
|
|||
samcv | so you want Numeric_Type=Decimal then | 02:08 | |
that is what you want | |||
MasterDuke | :shrug: | ||
all i know is .uniprop eq 'Nd' is what i want | |||
samcv | you sure | 02:09 | |
i really don't think | |||
MasterDuke | yeah, that's what TimToady has said | ||
samcv | that is what you want. though it may have full overlap | ||
looking up the property value is gonna be faster than getting the property code for general category | |||
and THEN string comparing that it is 'Nd' | |||
gonna be slow | |||
and it'll have to do that every loop | |||
MasterDuke | true, but only if there is in fact a Nd char in the string | 02:10 | |
02:11
Geth joined
|
|||
samcv | why though. you want decimal digits right? | 02:11 | |
MasterDuke | and i'm way less concerned about speed in that case. sure it should be fast, but the usual ascii case is of most concern | ||
samcv | but it checks them together? | ||
exactly. it has to check that every time and it'll be slow. that's what i'm trying to say | |||
and so by fixing that it will be faster going through that ifelse loop | |||
MasterDuke | ah, yes, the switch does | ||
samcv | you mean the if else if else? | 02:12 | |
MasterDuke | the if else will short circuit on ascii chars and not do the slow lookup | 02:13 | |
samcv | yes | ||
i guess so | |||
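[Editor's sketch] The short-circuit described above can be illustrated roughly as follows. This is not the MoarVM code: `slow_decimal_value` is a hypothetical stand-in for the Unicode property lookup and lists only three Nd blocks, and the input is assumed to be already-decoded code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the slow Unicode property lookup.
 * Only three Nd blocks are listed here; a real table covers many more. */
int slow_decimal_value(uint32_t cp) {
    static const uint32_t zeros[] = { 0x0660, 0x06F0, 0x0966 };
    for (size_t i = 0; i < sizeof zeros / sizeof zeros[0]; i++)
        if (cp >= zeros[i] && cp <= zeros[i] + 9)
            return (int)(cp - zeros[i]);
    return -1;
}

/* Decimal value of a code point, or -1 if it is not a decimal digit. */
int decimal_value(uint32_t cp) {
    if (cp >= '0' && cp <= '9')   /* fast path: ASCII digits, no lookup */
        return (int)(cp - '0');
    if (cp < 0x80)                /* any other ASCII is never a digit */
        return -1;
    return slow_decimal_value(cp);
}

/* Parse leading decimal digits; returns how many code points were consumed. */
size_t parse_decimal(const uint32_t *cps, size_t len, int64_t *out) {
    int64_t value = 0;
    size_t i = 0;
    for (; i < len; i++) {
        int d = decimal_value(cps[i]);
        if (d < 0)
            break;
        value = value * 10 + d;
    }
    *out = value;
    return i;
}
```

With this sketch, the code points of "12۳4" (where ۳ is U+06F3) parse to 1234, while "1Ⅷ" stops after the first digit because U+2167 is not a decimal digit. A pure-ASCII string never reaches the slow function.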
timotimo | samcv: i didn't have time today and time will be sparse tomorrow as well .. but feel free to leave a few more comments with some extra detail in the backlog | ||
samcv | kk | 02:14 | |
i did leave comments in backlog? | |||
np tho | |||
timotimo | a little bit | ||
but i don't exactly know what i'm supposed to try to make work :) | |||
samcv | ok the consume a string function needs to be able to skip a specified number of strings | 02:15 | |
and it also needs to be able to center itself and start from the first 0 | |||
because i may give it some position that has the first base40 number not be a 0 | 02:16 | ||
and it needs to skip to the 0, and then skip a specified number of 0's further, if i tell it to skip 1 | |||
before returning. if that makes sense. i have a table that holds values of every other cp's location | |||
timotimo disappears already | |||
samcv | as we had talked about | ||
so right now it just consumes a string and assumes it will be started at the beginning, but i need it to go to the beginning of the next name | 02:17 | ||
so if it sees 600, 255, 0 < three 'base40' numbers i just made up | 02:18 | ||
it has to skip to the 0, then grab a string | |||
or however many further down i request | |||
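[Editor's sketch] One possible reading of the skipping behaviour samcv asks for, assuming the names are stored as 16-bit "base40" values with a 0 ending each name. The function name `next_name_start` and the buffer layout are illustrative assumptions, not taken from names.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index where a name starts: scan forward from pos to the first
 * 0 terminator, then pass `skip` further terminators. Returns len if the
 * buffer ends before enough terminators are seen. */
size_t next_name_start(const uint16_t *buf, size_t len, size_t pos, size_t skip) {
    size_t zeros_needed = skip + 1;
    while (pos < len) {
        if (buf[pos++] == 0 && --zeros_needed == 0)
            return pos;
    }
    return len;
}
```

For a buffer beginning 600, 255, 0 the call with `pos = 0, skip = 0` returns index 3, the slot right after the 0, which is where the next name would be read from.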
MasterDuke | i just tried extracting the ascii values out into their own switch before the larger one. it was a little faster than the combined one, but still slower than the existing code | 02:21 | |
samcv | what are you using to bench? can i see | ||
MasterDuke | perf stat -B ./nqp-m -e 'my int $i := 0; my int $s := 0; while ++$i < 1_000_000 { $s := $s + nqp::radix2(10, "1234567890123456789", 0, 0)[0] }' | 02:23 | |
i added a radix2 nqp op that calls a radix2 moar function to make it easier to compare | 02:24 | ||
one thing that should help with speed is getting nqp::radix speshed and JITted (neither of which i know how to do) | 02:27 | |
02:27
ggoebel joined
|
|||
MasterDuke | i just have all my changes locally, are you interested enough for me to push them to my moarvm fork on github? | 02:28 | |
samcv | well i sped it up 50% | ||
by implementing that thing i said | |||
for numbers that are non-ascii | |||
MasterDuke | cool | 02:30 | |
samcv | :) | ||
and can make it even faster when we don't need to call atoi | 02:31 | ||
and just have the values be integer like i have in my rewrite for UCD | 02:32 | ||
MasterDuke | could it use MVM_unicode_codepoint_get_property_int() instead of MVM_unicode_codepoint_get_property_cstr()? | 02:34 | |
samcv | well if the property were integer it could | 02:42 | |
but atm it's just an array of strings | |||
but yeah that's one of the many things i'm trying to fix in the rewrite | 02:46 | |
MasterDuke | hm, yeah, just removing the atoi and using _int() instead of _cstr() gives 36 for ۳ | ||
02:48
ilbot3 joined
|
|||
MasterDuke | ah, just subtracting 33 makes _int() work, and is faster | 03:12 | |
samcv | lol subtracting 33 | 03:46 | |
except they're not in order :P | 03:47 | ||
goes NaN, -1, 0, 1, 3, 2, 5, 7, 4, 11, 9 | |||
err wait the numeric_value might be in order. numeric_value_numerator def isn't | 03:48 | ||
MasterDuke | heh, yeah, just checked. the first couple i checked worked, but it's definitely not correct | ||
samcv | also there's 1.5 in between 1 and 2 | ||
yep | |||
XD | 03:49 | ||
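[Editor's sketch] Why subtracting a constant from the integer property only appears to work: the integer is an index into a table of value strings, and that table is not sorted by value. The table below uses the order samcv quotes for the numerator values; the offset of 2 fits this small table only and is not the 33 mentioned in the chat. Function names are illustrative.

```c
#include <stdlib.h>

/* Order taken from the list quoted in the chat; a real table is longer. */
static const char *const numerator_strings[] = {
    "NaN", "-1", "0", "1", "3", "2", "5", "7", "4", "11", "9"
};

/* Correct but slower: resolve the index to its string, then parse it. */
long value_via_string(int index) {
    return strtol(numerator_strings[index], NULL, 10);
}

/* Tempting shortcut: treat the index itself as the value, minus an offset. */
long value_via_offset(int index) {
    return index - 2;
}
```

Indexes 2 and 3 agree under both functions (values 0 and 1), which matches "the first couple i checked worked". Index 4 holds "3", so the shortcut returns 2 where the table says 3.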
MasterDuke, we also use ascii strings for storing the decomposition values | 03:51 | ||
m: 'ā'.NFD.say | |||
camelia | rakudo-moar c6e37e: OUTPUT«NFD:0x<0061 0304>» | ||
MasterDuke | can they be reordered to work? or is that something the unicode spec defines | ||
samcv | idk there's so many things that need fixing tho, i already have it working in my rewrite though | 03:52 | |
numeric value's being integer values that is | |||
and not strings | |||
MasterDuke | ok, i won't play around with that part anymore | ||
samcv | not sure how to make the ascii faster tbh tho | ||
MasterDuke | TimToady mentioned python has some very fast code for them that he thought might be worth stealing | 03:54 | |
i know almost nothing about python or where to look for that code, but maybe you do | |||
samcv | don't know much about python either | ||
you mean for Nd's? or just for atoi in general | 03:55 | ||
MasterDuke | irclog.perlgeek.de/perl6-dev/2017-...i_13970619 | 03:57 | |
samcv | hmm looks like we support fullwidth letters even though unicode doesn't list them as AHex digits | 04:41 | |
MasterDuke | well, we support more than hex characters (when radix is > 16) | 05:24 | |
sleep & | |||
samcv | how does that relate to fullwidth letters? | 05:25 | |
night | 05:26 | ||
MasterDuke | samcv: i mean when the radix is >16 we allow letters higher than the hex digits (e.g., radix == 36, a-z are allowed letters/digits/characters, not just the a-f of hex) | 12:29 | |
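[Editor's sketch] The point about radixes above 16 can be shown with a small ASCII-only digit classifier. Treating upper and lower case alike is an assumption here, and fullwidth letters are deliberately left out; `digit_in_radix` is an illustrative name, not a MoarVM function.

```c
/* Value of an ASCII character as a digit in `radix` (2..36), or -1 if the
 * character is not a valid digit for that radix. */
int digit_in_radix(int c, int radix) {
    int v;
    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'z')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z')
        v = c - 'A' + 10;
    else
        return -1;
    return v < radix ? v : -1;   /* 'g' is fine in base 36, not in base 16 */
}
```

So 'f' is accepted for radix 16 and 'g' is not, while radix 36 accepts every letter up to 'z'.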
MasterDuke | how do you make an nqp:: op speshable/speshed/(this word has lost all meaning to me)? | 18:40 | |
jnthn | That's a property of MoarVM ops rather than nqp:: ops. | 18:44 | |
And beyond that, it depends what you want to do with it, exactly | |||
src/spesh/ is where stuff happens, src/spesh/graph.h is a good point to start reading | 18:45 | ||
MasterDuke | well, MVM_radix is just interpreted, how would we get it speshed | ||
jnthn | Which op is it? | 18:46 | |
I'm trying to figure out what you're wanting to do :) | 18:47 | ||
MasterDuke | just radix i believe | ||
jnthn | It may be that we don't JIT it | ||
Though it's a fairly large C function so the best we're going to manage, I suspect, is to JIT it into a C function call. That'd still prevent us having to interpret frames with it in. | 18:48 | |
MasterDuke | i thought things had to be speshable before they could be JITted | ||
jnthn | src/jit/graph.c is the place to look for that | ||
Not really. | |||
spesh = specialization, it mostly means producing devirtualized versions of code | |||
There are various ops where there's no interesting things to do in terms of specialization, but we can still have calls to their implementation JITted into machine code | 18:49 | ||
MasterDuke | same with the coerce_* ops | ||
jnthn | case MVM_OP_radix_I: return MVM_bigint_radix; | 18:50 | |
That's already in src/jit/graph.c | |||
As are | |||
case MVM_OP_coerce_ni: | |||
case MVM_OP_coerce_in: | |||
And a bunch more | |||
MasterDuke | but not regular radix | ||
jnthn | Ah, indeed | 18:51 | |
That just is a function call to MVM_radix | |||
So should be quite easy to add the required bits to src/jit/graph.c | 18:52 | ||
Shouldn't need edits beyond to that one file | |||
And should look like radix_I does | |||
Though the exact number of arguments may differ | |||
MasterDuke | cool. i'll give that a try | ||
timotimo | if the argument to radix is statically (i.e. at spesh time) known to be a constant number, we could dispatch to an optimized version that handles that exact radix, i.e. base 10 | 19:04 | |
MasterDuke | hm, a profile of nqp::radix_I shows 100% interpreted frames and no speshed or JITted | ||
timotimo | frames not getting speshed can have a whole host of reasons | 19:05 | |
among other things not being run often enough, but that's unlikely if you're building a benchmark that calls it often | |||
MasterDuke | this was my test: my int $i := 0; my $s := 0; my $t := nqp::knowhow().new_type(:name("TestBigInt"), :repr("P6bigint")); while ++$i < 1_000_000 { $s := $s + nqp::radix_I(10, "12345678", 0, 0, $t)[0] } | ||
timotimo | jitting, on the other hand, can be prevented by a single op not being implemented in the jit | 19:06 | |
you can figure out if a frame gets prevented like that by outputting a jitlog (i.e. MVM_JIT_LOG=foo.txt) and searching for "BAIL:" | 19:07 | ||
MasterDuke | there were a couple: `BAIL: op <throwpayloadlex>` | 19:08 | |
copying the radix_I case in src/jit/graph.c and adapting it for radix compiles. my benchmark still runs, but nothing seems to have changed | 19:15 | |
timotimo | i might have tried to build that in the past | ||
how many cases did you copy? | |||
you need at least three, i think | |||
MasterDuke | ah, right | 19:16 | |
fixed the other place, still the same | 19:17 | ||
i gotta afk for a while, but i'll play around some more later | |||
timotimo | well, if the frame with radix in it already doesn't get speshed, it won't have a chance to get jitted in the first place | 19:18 | |
19:22
pyrimidine joined
19:47
pyrimidine joined
|
|||
MasterDuke | timotimo: it looks like pretty simple/straightforward code to me (my test code that is). any reason it wouldn't get speshed? | 21:02 | |
timotimo | *shrug*, i usually don't know myself when i have my own code | 21:04 | |
21:07
domidumont joined
|
|||
MasterDuke | jnthn: got any ideas about why i see radix_I (and my newly added to src/jit/graph.c radix) just as interpreted, not speshed or JITted in the test code i pasted above? | 21:20 | |
21:39
MasterDuke joined
21:44
pyrimidine joined
|
|||
jnthn | MasterDuke: I'd expect OSR to be happening there. Not clear why it isn't. | 22:21 | |
MasterDuke | a profile definitely says no OSR | 22:22 | |
i also ran the same code with my system nqp (so not using my modified moarvm at all), same results | 22:25 | ||
jnthn | *nod* | 22:26 | |
Yeah, I get that here too | |||
22:48
agentzh joined
|
|||
agentzh | hi guys | 22:48 | |
jnthn | o/ | 22:49 | |
agentzh | i've run into a performance issue in the rakudo compilation time where the --profile-compile report shows that get() at SETTING::src/core/IO/Handle.pm:128 takes most of the Exclusive Time (88%) | ||
jnthn | o.O | ||
agentzh | anyone willing to do me a favor to optimize that thing away? :) | ||
jnthn | What's the call count? | ||
agentzh | jnthn: is it the Entries column in the routines report? | ||
jnthn | Yeah | ||
agentzh | 557 | ||
jnthn | What on earth is it reading... | 22:50 | |
agentzh | it's a pretty large perl 6 project | ||
jnthn | Hmm | ||
I wonder if it's pre-comp database files | |||
agentzh | 6K LOC excluding empty lines and comment lines. | ||
MasterDuke | FWIW, i just perf recorded reading a 1m line file | ||
43.92% moar libmoar.so [.] MVM_string_utf8_decodestream | |||
23.08% moar libmoar.so [.] find_separator.isra.6 | |||
agentzh | or 8K LOC if including empty lines and comment lines. | ||
MasterDuke: is that an artificial p6 file? | 22:51 | ||
mine is a real thing. | |||
ported from a working perl 5 project. | |||
jnthn | Oh, one other thing | 22:52 | |
That's blocking I/O | |||
MasterDuke | it wasn't source, i was just doing an nqp::readlinefh through it | ||
jnthn | And it might be reading from a pipe | ||
I think precomp spawns subprocesses | |||
And reads from stdout to get info about the results of that process | 22:53 | ||
agentzh | mine is 46 .pm6 files for 46 compilation units on the file system. | ||
jnthn | So it's possible that what you're actually seeing there is we spend 88% of time waiting on a spawned process to do its thing | ||
agentzh | the report is generated from a partial compilation. | ||
otherwise the resulting profiling report is too big to render in my web browser. | 22:54 | ||
the whole thing takes 30+ sec to compile on my side. | |||
jnthn | Yes, but does that trigger compilation? | ||
uh, to be more clear | |||
agentzh | while the report i'm referring to is just a 6 ~ 13 sec partial compilation. | ||
jnthn | Does it trigger compilation of modules? | ||
agentzh | already slow enough for me to cry :) | ||
jnthn | :( | 22:55 | |
agentzh | jnthn: it triggers a few dependent modules to recompile. | ||
even though those modules do not change at all. | |||
nor is the API of the edited file. | |||
MasterDuke | agentzh: you're on linux? | ||
agentzh | MasterDuke: yes. | ||
MasterDuke | can you run `perf record -g --call-graph dwarf <whatever command you run to compile here>`? | 22:56 | |
jnthn | Could also look at the RAKUDO_MODULES_DEBUG (iirc) output to see what exactly it's doing | ||
agentzh | assuming that the profiling report is correct, the IO::Handle::get() thing looks like a very low hanging fruit? | ||
unfortunately i don't have the skillset to optimize that thing myself. so i'm asking for help here :) | |||
it's really a showstopper for me. | |||
imagine a 6 ~ 13 sec delay everytime i make a single edit. | |||
feeling like hacking on a machine sitting on the moon. | |||
*grin* | 22:57 | ||
MasterDuke: trying | |||
MasterDuke | and then when that's done, `perf report --call-graph=none --no-children` | ||
22:58
zakharyas joined
|
|||
agentzh | [ perf record: Woken up 1949 times to write data ] [ perf record: Captured and wrote 487.745 MB perf.data (60689 samples) ] | 22:58 | |
running the 2nd cmd. | 22:59 | ||
gist.github.com/agentzh/c760f72509...380ffae504 | |||
MasterDuke | huh | 23:00 | |
agentzh | need more? | ||
jnthn | agentzh: It triggers recompilation of modules even if nothing in them or their dependencies changed? | 23:01 | |
If so that sounds decidedly odd | |||
agentzh | jnthn: it triggers recompilation of modules depending directly and indirectly on the edited module. | 23:02 | |
using today's rakudo git nom. | |||
jnthn | That's right. | ||
agentzh | it would be great if it can check if the API changes in the dependency module. | ||
jnthn | hehehe | ||
agentzh | since i seldom or never change the module API. | ||
jnthn | Oh, if only Perl 6 was such a simple language that was reasonable to actually do :) | 23:03 | |
agentzh | solely compiling a single CU is fast enough to bear. | ||
but i have to run the tests. | |||
it's all about test driven dev :) | |||
jnthn | *nod* | ||
I think this really boils down to "the compiler needs to be faster" | |||
Which I totally agree with | |||
agentzh | the 7sec ~ 13sec delay is driving me nuts :) | ||
jnthn | I managed to knock 20% off CORE.setting, which is a huge compilation unit, a couple of weeks back. TimToady++ is working on improving parse time. | 23:04 | |
agentzh | good to see TimToady is still hacking on rakudo himself :) | ||
MasterDuke: anything useful found in my report? | 23:05 | ||
jnthn | I think the .get thing is a false lead, though. I expect it's blocking in .get waiting to read the results of the precompilation process that it spawned, and all the work is in the subprocess. | ||
So much as I'd love an easy win...I fear there ain't one here. :( | |||
agentzh | the profiling is measuring wallclocks instead of CPU ticks? | ||
jnthn | Yes | ||
agentzh | jnthn: i see. | 23:06 | |
TimToady | I wonder if we could treat 'need' as less of a dep than 'use'... | ||
agentzh | TimToady: i was actually wondering if need can help :) | ||
TimToady | it does restrict you to the OO interface, to the first approximation | ||
agentzh | TimToady: that would be sweet :) | 23:07 | |
TimToady | but I doubt the precomp treats it as less likely to influence the downstream parse | ||
agentzh is hoping p6 comes with something like the C header files. | 23:08 | ||
MasterDuke | agentzh: i don't think so. there doesn't appear to be file reading moarvm functions there, so i'll defer to jnthn's explanation | ||
agentzh | MasterDuke: got it. | 23:09 | |
merging all the CUs into a single p6 file helps a bit. | |||
jnthn | If it only helps a bit, then that pretty much incriminates compilation time... | ||
TimToady | that's essentially what we do with compiling the setting, where we deal with a 60-second turnaround or so | 23:10 | |
so we have plenty of motivation to get the compiler faster our own selves :) | |||
jnthn | Yeah, that's the one I knocked the 20% off recently... :) | ||
Now we just need to find another dozen ways to do that ;) | 23:11 | ||
TimToady | well, most of the ways I know are likelier to knock 5% or 10% off, but still | ||
jnthn | *nod* | ||
TimToady | it's about time for me to test how our dynvar overhead is again, too | ||
jnthn | It'd be kinda nice if we could set off parallel precompilation of dependencies | ||
Trouble is, a `use` can switch out the language | 23:12 | ||
TimToady | we keep adding them, and the dynvar cache was already overstressed | ||
jnthn | Meaning it can cause the `use` after it to compile differently | ||
TimToady | we could say that 'need' promised not to depend on anything OO in the subsequent parse | ||
jnthn | Guess that doesn't prevent us from speculating though | ||
After all, MoarVM speculates happily all over the place and just deopts if it guessed wrong :) | 23:13 | ||
.oO( "just" deopts... ) |
|||
TimToady | we could also try to get a better handle on which modules are actively trying to modify the parse of their users | 23:15 | |
jnthn | There's also that | ||
TimToady | agentzh: are you defining any of your own operators? That's known to slow parsing way down at the moment. | ||
agentzh | with all the CUs merged into a single .p6 file, it now always takes 7.2 sec. | 23:16 | |
i'm amazed to see it's as fast as a typical partial compilation. | |||
jnthn | It's kinda silly in 2016 to be doing stuff that prevents us from parallelizing compilation though :) | ||
agentzh | when always compiling everything in the project. | ||
jnthn | Uh, in 2017 too | ||
agentzh | TimToady: nope. the only thing i'm redefining is .Str(), which does not count as my own operators, right? | 23:17 | |
jnthn | Nope | ||
agentzh | jnthn: yeah, my box has 8 logical CPU cores :) | 23:18 | |
jnthn: i also tried to do parallel recompilation myself via a makefile. | |||
jnthn: but oddly enough it does not make things faster. | |||
jnthn: the module later compiled still recompile its dependencies even though those deps are just manually recompiled right before. | 23:19 | ||
TimToady | that seems a bit like a bug | ||
agentzh | TimToady: i tried both perl6 /path/to/file.pm6 and perl6 -c /path/to/file.pm6 in my makefile commands. | ||
the whole make -j8 time is still 30+ sec. | 23:20 | ||
like before without -j8 | |||
TimToady | something to ask nine about over in #perl6-dev when he's awake | ||
agentzh | seems like rakudo does not test timestamps correctly. | ||
brokenchicken | IIRC we don't use timestamps because they're not precise enough. | 23:21 | |
jnthn | afaik, it doesn't precomp entrypoints | ||
Only dependencies | |||
So you might have had more luck with perl6 -e 'use Module::To::Compile' | 23:22 | ||
agentzh | oh, good to know | 23:23 | |
will try. | |||
okay, i tried again, even without perl6 -e 'use xxx', make -j8 seems to be working now. | 23:31 | ||
maybe i messed things up the last time i tried make -j8. | |||
now it takes 6.8s to compile everything. | |||
MasterDuke | jnthn: nqp-m -e 'my $f := nqp::open("small_compile.sql", "r");my str $l; while $l := nqp::readlinefh($f) { }', where small_compile.sql is 1m lines, also has no speshed or JITted frames | ||
agentzh | oh, wait, i need to wipe off .precomp... | ||
forgot that one. | 23:32 | ||
jnthn | MasterDuke: What about if you do the same code with perl6-m ? | ||
agentzh | okay, with .precomp wiped out, make -j8 takes 31.9 sec to compile everything. | 23:33 | |
the same as a single compilation of the root from scratch. | |||
MasterDuke | ah, that's 33% speshed | ||
but was 60ms slower | 23:34 | ||
agentzh | brokenchicken: sorry to hear that. | 23:35 | |
MasterDuke | 1 OSR | ||
agentzh | hmm, seems like i have to give up the rakudo route for now due to the compilation time issue. | 23:36 | |
tried everything already. | |||
will jump directly to my perl 6 dialect compiler targeting luajit which will soon support linking and partial compilation. | 23:37 | ||
my plan was to use rakudo as the intermediate reference impl. | 23:38 | ||
TimToady | jnthn: it appears that dynvar lookup is still around 5% compiling the setting | 23:40 | |
MasterDuke | agentzh: ah right, you work at cloudflare don't you? i've read some really good blog posts from them | ||
agentzh | MasterDuke: i used to. i quit CF 3 months ago. | ||
MasterDuke: now i'm setting up my own company, OpenResty Inc. | |||
MasterDuke | TimToady: if you're interested, here's a --profile-compile of a recent rakudo build, sorted by exclusive time. gist.github.com/MasterDuke17/77230...805c3274b5 | 23:41 | |
agentzh: cool | |||
jnthn | TimToady: $*ACTIONS would be a nice one to be rid of :) | ||
MasterDuke | TimToady: and here's a perf record. gist.github.com/MasterDuke17/aff44...6d1d0ba537 | 23:42 | |
agentzh | MasterDuke: maybe generating a flame graph is more intuitive to find places to optimize? | ||
at least some times. | |||
MasterDuke | TimToady: and here's a list of the names and counts of what find_symbol is called with. gist.github.com/MasterDuke17/a3b00...c4b884946b | 23:43 | |
agentzh | and an inverted flame graph can give you the same and also more info than the exclusive sort. | ||
*exclusive time | |||
MasterDuke | TimToady: and here are the names passed to MATCH (and their counts). gist.github.com/MasterDuke17/ad31d...43e02f67bb | 23:44 | |
agentzh: i'm not a viz/ui person at all. and i think trying to do that in a web browser would still choke on the amount of data a profile of the rakudo build creates | 23:45 | ||
but i agree that a flame graph would be good somehow | 23:46 | ||
TimToady | jnthn: $*WANTEDOUTERBLOCK actually has a bit more time, for fewer calls; a better cache and maybe some interning will help there, but yeah, ACTIONS, LANG, and PRAGMAS shouldn't really be using dynvars in the first place | 23:48 | |
23:51
pyrimidine joined
|
|||
TimToady | MasterDuke: your files crash my ff tab | 23:52 | |
I did see your text version earlier | 23:53 | ||
MasterDuke | TimToady: whoops. and, text version? | 23:55 | |
TimToady | well, I saw something that showed find_symbol was hot | 23:56 | |
MasterDuke | ah, this? irclog.perlgeek.de/moarvm/2017-01-28#i_14005549 | 23:57 | |
jnthn | find_symbol I can probably figure out something to help with | 23:58 | |
TimToady notes that it's probably not a coincidence that find_symbol calls $*WANTEDOUTERBLOCK | |||
jnthn | Oh. | 23:59 | |
o.O | |||
TimToady | which is really only used during the interspersed "second" pass | ||
so there are a lot of not-there's the rest of the time | |||
MasterDuke | a little while ago i tried adding a find_symbol2 that didn't take an array, since the vast majority of the time it's just a single element in the array |