00:14 pyrimidine joined
samcv well what was the switch checking 01:27
every single value of the numbers?
like 0-9?
MasterDuke samcv: gist.github.com/MasterDuke17/e14fe...0a98abdbb9 this is the code i used to create the switch 01:28
samcv MasterDuke, yeah you need to group it by digits 01:39
MasterDuke by digits? 01:40
samcv uhm
m: say "0".ord.base(2); say "9".ord.base(2)
camelia rakudo-moar cfae23: OUTPUT«110000␤111001␤»
samcv maybe i need to think about this a little more 01:42
uhm but if you look by hex 01:43
m: say '0'.base(16).say; '9'.base(16).say
camelia rakudo-moar cfae23: OUTPUT«No such method 'base' for invocant of type 'Str'␤ in block <unit> at <tmp> line 1␤␤»
samcv m: say '0'.ord.base(16).say; '9'.ord.base(16).say
camelia rakudo-moar cfae23: OUTPUT«30␤True␤39␤»
samcv so you want to switch ignoring the last hex digit 01:44
so you can combine all of each set of 0-9 into one 'case'
MasterDuke more like the pre-existing if statements you mean? 01:46
samcv yeah, i guess `if` is more optimized than i thought 01:47
but we can just generate all the if statements maybe
MasterDuke for the Unicode Nd digits? 01:48
samcv yeah
MasterDuke well, that might be nice, but i'm way more interested in making the non-Unicode case faster 01:49
samcv yes but
MasterDuke that's going to be the 99.99% use case
samcv if we consolidate them together don't we have to do less checks?
and not process each point twice? 01:50
MasterDuke twice?
samcv uhm let me look at the function again, sorry 01:51
also that switch would be very slow because they're not in order
the numbers are totally out of order so it would be slow
MasterDuke well, the ascii digits are first 01:52
samcv yeah but it needs a really weird jump table 01:53
also that table doesn't have everything in it, it's missing a lot 01:55
you are only adding Nd general category
MasterDuke i guess i could extract the ascii digits out into their own switch first
well, isn't that the same as the existing code? 01:56
(the Nd general category thing)
samcv uh
idk
what function is it in MVM again?
MasterDuke github.com/MoarVM/MoarVM/blob/mast...rce.c#L352 02:00
samcv it does look like it checks Nd 02:02
wtf why is it checking that :(
MasterDuke what would it check instead?
samcv and it looks up the property code...
yeah this is gonna be slow
Numeric_Type > 0 02:03
MasterDuke ah, but we don't want to do Nl or No
samcv you want things with numeric values right? 02:04
MasterDuke no, just digits
samcv define digits
like <:Digit> ?
MasterDuke because unlike the highlander, there can be more than one
samcv why general category Nd though 02:05
i mean sure it's the category for numeric digits
you just want only things from 0-9? 02:06
in value?
MasterDuke so the string "12۳4" should give the value 1234
samcv of course
idk what you mean by digit. do you want all unicode cp's that have a value?
only ones from value 0-9? 02:07
MasterDuke but "1Ⅷ" is invalid
timotimo then say "decimal" :)
yoleaux2 27 Jan 2017 13:54Z <samcv> timotimo: so I have added some indexes to names.c for every other codepoint
MasterDuke Ⅷ by itself is valid
samcv yeah. i had no clue wtf you meant lol
MasterDuke timotimo++
timotimo samcv: you wanted me to try to make lookup from codepoint to name work with your changes?
02:08 pyrimidine joined
samcv so you want Numeric_Type=Decimal then 02:08
that is what you want
MasterDuke :shrug:
all i know is .uniprop eq 'Nd' is what i want
samcv you sure 02:09
i really don't think
MasterDuke yeah, that's what TimToady has said
samcv that is what you want. though it may have full overlap
looking up the property value is gonna be faster than getting the property code for general category
and THEN string comparing that it is 'Nd'
gonna be slow
and it'll have to do that every loop
MasterDuke true, but only if there is in fact a Nd char in the string 02:10
02:11 Geth joined
samcv why though. you want decimal digits right? 02:11
MasterDuke and i'm way less concerned about speed in that case. sure it should be fast, but the usual ascii case is of most concern
samcv but it checks them together?
exactly. it has to check that every time and it'll be slow. that's what i'm trying to say
and so by fixing that it will be faster going through that ifelse loop
MasterDuke ah, yes, the switch does
samcv you mean the if else if else? 02:12
MasterDuke the if else will short circuit on ascii chars and not do the slow lookup 02:13
samcv yes
i guess so
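The short-circuit described above can be sketched in Python. This is only an illustration of the idea, not the MoarVM code: `parse_decimal` is a hypothetical name, Python's `unicodedata` module stands in for the MoarVM property lookup, and unlike MVM_radix it returns only the value.

```python
import unicodedata

def parse_decimal(s):
    """Parse a string of decimal digits, accepting any Unicode Nd digit."""
    value = 0
    for ch in s:
        cp = ord(ch)
        if 0x30 <= cp <= 0x39:
            # Fast path: ASCII '0'..'9', no property lookup needed.
            digit = cp - 0x30
        elif unicodedata.category(ch) == "Nd":
            # Slow path: only reached for non-ASCII characters.
            digit = unicodedata.decimal(ch)
        else:
            # Nl/No characters such as U+2167 (roman numeral eight) end up here.
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value
```

With this sketch, "12۳4" parses to 1234 and "1Ⅷ" is rejected, matching the behaviour MasterDuke asks for; an all-ASCII string never touches the property lookup.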
timotimo samcv: i didn't have time today and time will be sparse tomorrow as well .. but feel free to leave a few more comments with some extra detail in the backlog
samcv kk 02:14
i did leave comments in backlog?
np tho
timotimo a little bit
but i don't exactly know what i'm supposed to try to make work :)
samcv ok the consume a string function needs to be able to skip a specified number of strings 02:15
and it also needs to be able to center itself and start from the first 0
because i may give it some position that has the first base40 number not be a 0 02:16
and it needs to skip to the 0, and then skip a specified number of 0's further, if i tell it to skip 1
before returning. if that makes sense. i have a table that holds values of every other cp's location
timotimo disappears already
samcv as we had talked about
so right now it just consumes a string and assumes it will be started at the beginning, but i need it to go to the beginning of the next name 02:17
so if it sees 600, 255, 0 < three 'base40' numbers i just made up 02:18
it has to skip to the 0, then grab a string
or however many further down i request
MasterDuke i just tried extracting the ascii values out into their own switch before the larger one. it was a little faster than the combined one, but still slower than the existing code 02:21
samcv what are you using to bench? can i see
MasterDuke perf stat -B ./nqp-m -e 'my int $i := 0; my int $s := 0; while ++$i < 1_000_000 { $s := $s + nqp::radix2(10, "1234567890123456789", 0, 0)[0] }' 02:23
i added a radix2 nqp op that calls a radix2 moar function to make it easier to compare 02:24
one thing that should help with speed is getting nqp::radix speshed and JITted (neither of which i know how to do) 02:27
02:27 ggoebel joined
MasterDuke i just have all my changes locally; are you interested enough for me to push them to my moarvm fork on github? 02:28
samcv well i sped it up 50%
by implementing that thing i said
for numbers that are non-ascii
MasterDuke cool 02:30
samcv :)
and can make it even faster when we don't need to call atoi 02:31
and just have the values be integer like i have in my rewrite for UCD 02:32
MasterDuke could it use MVM_unicode_codepoint_get_property_int() instead of MVM_unicode_codepoint_get_property_cstr()? 02:34
samcv well if the property were integer it could 02:42
but atm it's just an array of strings
but yeah that's one of the many things i'm trying to fix in the rewrite 02:46
MasterDuke hm, yeah, just removing the atoi and using _int() instead of _cstr() gives 36 for ۳
02:48 ilbot3 joined
MasterDuke ah, just subtracting 33 makes _int() work, and is faster 03:12
samcv lol subtracting 33 03:46
except they're not in order :P 03:47
goes NaN, -1, 0, 1, 3, 2, 5, 7, 4, 11, 9
err wait the numeric_value might be in order. numeric_value_numerator def isn't 03:48
MasterDuke heh, yeah, just checked. the first couple i checked worked, but it's definitely not correct
samcv also there's 1.5 in between 1 and 2
yep
XD 03:49
MasterDuke, we also use ascii strings for storing the decomposition values 03:51
m: 'ā'.NFD.say
camelia rakudo-moar c6e37e: OUTPUT«NFD:0x<0061 0304>␤»
MasterDuke can they be reordered to work? or is that something the unicode spec defines
samcv idk there's so many things that need fixing tho, i already have it working in my rewrite though 03:52
numeric values being integer values, that is
and not strings
MasterDuke ok, i won't play around with that part anymore
samcv not sure how to make the ascii faster tbh tho
MasterDuke TimToady mentioned python has some very fast code for them that he thought might be worth stealing 03:54
i know almost nothing about python or where to look for that code, but maybe you do
samcv don't know much about python either
you mean for Nd's? or just for atoi in general 03:55
MasterDuke irclog.perlgeek.de/perl6-dev/2017-...i_13970619 03:57
samcv hmm looks like we support fullwidth letters even though unicode doesn't list them as AHex digits 04:41
MasterDuke well, we support more than hex characters (when radix is > 16) 05:24
sleep &
samcv how does that relate to fullwidth letters? 05:25
night 05:26
MasterDuke samcv: i mean when the radix is >16 we allow letters higher than the hex digits (e.g., radix == 36, a-z are allowed letters/digits/characters, not just the a-f of hex) 12:29
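A minimal Python sketch of the radix rule described above, assuming ASCII input only. `digit_value` is a hypothetical helper, not a MoarVM function, and it leaves out the Nd digits and fullwidth letters mentioned earlier in the log.

```python
def digit_value(ch, radix):
    """Return the value of ch as a digit in the given radix, or -1 if invalid."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    cp = ord(ch)
    if 0x30 <= cp <= 0x39:      # '0'..'9' map to 0..9
        value = cp - 0x30
    elif 0x61 <= cp <= 0x7A:    # 'a'..'z' map to 10..35
        value = cp - 0x61 + 10
    elif 0x41 <= cp <= 0x5A:    # 'A'..'Z' map to 10..35
        value = cp - 0x41 + 10
    else:
        return -1
    # A letter is only a digit if its value fits below the radix.
    return value if value < radix else -1
```

So 'g' is rejected for radix 16 but accepted for radix 36, where the whole range a-z is usable.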
MasterDuke how do you make an nqp:: op speshable/speshed/(this word has lost all meaning to me)? 18:40
jnthn That's a property of MoarVM ops rather than nqp:: ops. 18:44
And beyond that, it depends what you want to do with it, exactly
src/spesh/ is where stuff happens, src/spesh/graph.h is a good point to start reading 18:45
MasterDuke well, MVM_radix is just interpreted, how would we get it speshed
jnthn Which op is it? 18:46
I'm trying to figure out what you're wanting to do :) 18:47
MasterDuke just radix i believe
jnthn It may be that we don't JIT it
Though it's a fairly large C function so the best we're going to machine, I suspect, is to JIT it into a C function call. That'd still prevent us having to interpret frames with it in. 18:48
MasterDuke i thought things had to be speshable before they could be JITted
jnthn src/jit/graph.c is the place to look for that
Not really.
spesh = specialization, it mostly means producing devirtualized versions of code
There are various ops where there's no interesting things to do in terms of specialization, but we can still have calls to their implementation JITted into machine code 18:49
MasterDuke same with the coerce_* ops
jnthn case MVM_OP_radix_I: return MVM_bigint_radix; 18:50
That's already in src/jit/graph.c
As are
case MVM_OP_coerce_ni:
case MVM_OP_coerce_in:
And a bunch more
MasterDuke but not regular radix
jnthn Ah, indeed 18:51
That just is a function call to MVM_radix
So should be quite easy to add the required bits to src/jit/graph.c 18:52
Shouldn't need edits beyond to that one file
And should look like radix_I does
Though the exact number of arguments may differ
MasterDuke cool. i'll give that a try
timotimo if the argument to radix is statically (i.e. at spesh time) known to be a constant number, we could dispatch to an optimized version that handles that exact radix, i.e. base 10 19:04
MasterDuke hm, a profile of nqp::radix_I shows 100% interpreted frames and no speshed or JITted
timotimo frames not getting speshed can have a whole host of reasons 19:05
among other things not being run often enough, but that's unlikely if you're building a benchmark that calls it often
MasterDuke this was my test: my int $i := 0; my $s := 0; my $t := nqp::knowhow().new_type(:name("TestBigInt"), :repr("P6bigint")); while ++$i < 1_000_000 { $s := $s + nqp::radix_I(10, "12345678", 0, 0, $t)[0] }
timotimo jitting, on the other hand, can be prevented by a single op not being implemented in the jit 19:06
you can figure out if a frame gets prevented like that by outputting a jitlog (i.e. MVM_JIT_LOG=foo.txt) and searching for "BAIL:" 19:07
MasterDuke there were a couple: `BAIL: op <throwpayloadlex>` 19:08
copying the radix_I case in src/jit/graph.c and adapting it for radix compiles. my benchmark still runs, but nothing seems to have changed 19:15
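The jitlog search timotimo suggests can be scripted. The sketch below assumes every bail-out line has the same shape as the one quoted above (`BAIL: op <name>`); the real log may contain other forms, and `count_bails` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, taken from the single example quoted in the log.
BAIL_RE = re.compile(r"BAIL: op <([^>]+)>")

def count_bails(lines):
    """Count how often each op name appears in 'BAIL: op <name>' lines."""
    counts = Counter()
    for line in lines:
        m = BAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Passing it an open file handle for the log written via MVM_JIT_LOG (for example `count_bails(open("foo.txt"))`) would show which unimplemented ops most often stop a frame from being JITted.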
timotimo i might have tried to build that in the past
how many cases did you copy?
you need at least three, i think
MasterDuke ah, right 19:16
fixed the other place, still the same 19:17
i gotta afk for a while, but i'll play around some more later
timotimo well, if the frame with radix in it already doesn't get speshed, it won't have a chance to get jitted in the first place 19:18
19:22 pyrimidine joined 19:47 pyrimidine joined
MasterDuke timotimo: it looks like pretty simple/straightforward code to me (my test code that is). any reason it wouldn't get speshed? 21:02
timotimo *shrug*, i usually don't know myself when i have my own code 21:04
21:07 domidumont joined
MasterDuke jnthn: got any ideas about why i see radix_I (and my newly added to src/jit/graph.c radix) just as interpreted, not speshed or JITted in the test code i pasted above? 21:20
21:39 MasterDuke joined 21:44 pyrimidine joined
jnthn MasterDuke: I'd expect OSR to be happening there. Not clear why it isn't. 22:21
MasterDuke a profile definitely says no OSR 22:22
i also ran the same code with my system nqp (so not using my modified moarvm at all), same results 22:25
jnthn *nod* 22:26
Yeah, I get that here too
22:48 agentzh joined
agentzh hi guys 22:48
jnthn o/ 22:49
agentzh i've run into a performance issue in the rakudo compilation time where the --profile-compile report shows that get() at SETTING::src/core/IO/Handle.pm:128 takes most of the Exclusive Time (88%)
jnthn o.O
agentzh anyone willing to do me a favor to optimize that thing away? :)
jnthn What's the call count?
agentzh jnthn: is it the Entries column in the routines report?
jnthn Yeah
agentzh 557
jnthn What on earth is it reading... 22:50
agentzh it's a pretty large perl 6 project
jnthn Hmm
I wonder if it's pre-comp database files
agentzh 6K LOC excluding empty lines and comment lines.
MasterDuke FWIW, i just perf recorded reading a 1m line file
43.92% moar libmoar.so [.] MVM_string_utf8_decodestream
23.08% moar libmoar.so [.] find_separator.isra.6
agentzh or 8K LOC if including empty lines and comment lines.
MasterDuke: is that an artificial p6 file? 22:51
mine is a real thing.
ported from a working perl 5 project.
jnthn Oh, one other thing 22:52
That's blocking I/O
MasterDuke it wasn't source, i was just doing an nqp::readlinefh through it
jnthn And it might be reading from a pipe
I think precomp spawns subprocesses
And reads from stdout to get info about the results of that process 22:53
agentzh mine is 46 .pm6 files for 46 compilation units on the file system.
jnthn So it's possible that what you're actually seeing there is we spend 88% of time waiting on a spawned process to do its thing
agentzh the report is generated from a partial compilation.
otherwise the resulting profiling report is too big to render in my web browser. 22:54
the whole thing takes 30+ sec to compile on my side.
jnthn Yes, but does that trigger compilation?
uh, to be more clear
agentzh while the report i'm referring to is just a 6 ~ 13 sec partial compilation.
jnthn Does it trigger compilation of modules?
agentzh already slow enough for me to cry :)
jnthn :( 22:55
agentzh jnthn: it triggers a few dependent modules to recompile.
even though those modules do not change at all.
nor is the API of the edited file.
MasterDuke agentzh: you're on linux?
agentzh MasterDuke: yes.
MasterDuke can you run `perf record -g --call-graph dwarf <whatever command you run to compile here>`? 22:56
jnthn Could also look at the RAKUDO_MODULES_DEBUG (iirc) output to see what exactly it's doing
agentzh assuming that the profiling report is correct, the IO::Handle::get() thing looks like a very low hanging fruit?
unfortunately i don't have the skillset to optimize that thing myself. so i'm asking for help here :)
it's really a showstopper for me.
imagine a 6 ~ 13 sec delay everytime i make a single edit.
feeling like hacking on a machine sitting on the moon.
*grin* 22:57
MasterDuke: trying
MasterDuke and then when that's done, `perf report --call-graph=none --no-children`
22:58 zakharyas joined
agentzh [ perf record: Woken up 1949 times to write data ] [ perf record: Captured and wrote 487.745 MB perf.data (60689 samples) ] 22:58
running the 2nd cmd. 22:59
gist.github.com/agentzh/c760f72509...380ffae504
MasterDuke huh 23:00
agentzh need more?
jnthn agentzh: It triggers recompilation of modules even if nothing in them or their dependencies changed? 23:01
If so that sounds decidedly odd
agentzh jnthn: it triggers recompilation of modules depending directly and indirectly on the edited module. 23:02
using today's rakudo git nom.
jnthn That's right.
agentzh it would be great if it can check if the API changes in the dependency module.
jnthn hehehe
agentzh since i seldom or never change the module API.
jnthn Oh, if only Perl 6 was such a simple language that was reasonable to actually do :) 23:03
agentzh solely compiling a single CU is fast enough to bear.
but i have to run the tests.
it's all about test driven dev :)
jnthn *nod*
I think this really boils down to "the compiler needs to be faster"
Which I totally agree with
agentzh the 7sec ~ 13sec delay is driving me nuts :)
jnthn I managed to knock 20% off CORE.setting, which is a huge compilation unit, a couple of weeks back. TimToady++ is working on improving parse time. 23:04
agentzh good to see TimToady is still hacking on rakudo himself :)
MasterDuke: anything useful found in my report? 23:05
jnthn I think the .get thing is a false lead, though. I expect it's blocking in .get waiting to read the results of the precompilation process that it spawned, and all the work is in the subprocess.
So much as I'd love an easy win...I fear there ain't one here. :(
agentzh the profiling is measuring wallclocks instead of CPU ticks?
jnthn Yes
agentzh jnthn: i see. 23:06
TimToady I wonder if we could treat 'need' as less of a dep than 'use'...
agentzh TimToady: i was actually wondering if need can help :)
TimToady it does restrict you to the OO interface, to the first approximation
agentzh TimToady: that would be sweet :) 23:07
TimToady but I doubt the precomp treats it as less likely to influence the downstream parse
agentzh is hoping p6 comes with something like the C header files. 23:08
MasterDuke agentzh: i don't think so. there doesn't appear to be file reading moarvm functions there, so i'll defer to jnthn's explanation
agentzh MasterDuke: got it. 23:09
merging all the CUs into a single p6 file helps a bit.
jnthn If it only helps a bit, then that pretty much incriminates compilation time...
TimToady that's essentially what we do with compiling the setting, where we deal with a 60-second turnaround or so 23:10
so we have plenty of motivation to get the compiler faster our own selves :)
jnthn Yeah, that's the one I knocked the 20% off recently... :)
Now we just need to find another dozen ways to do that ;) 23:11
TimToady well, most of the ways I know are likelier to knock 5% or 10% off, but still
jnthn *nod*
TimToady it's about time for me to test how our dynvar overhead is again, too
jnthn It'd be kinda nice if we could set off parallel precompilation of dependencies
Trouble is, a `use` can switch out the language 23:12
TimToady we keep adding them, and the dynvar cache was already overstressed
jnthn Meaning it can cause the `use` after it to compile differently
TimToady we could say that 'need' promised not to depend on anything OO in the subsequent parse
jnthn Guess that doesn't prevent us from speculating though
After all, MoarVM speculates happily all over the place and just deopts if it guessed wrong :) 23:13
.oO( "just" deopts... )
TimToady we could also try to get a better handle on which modules are actively trying to modify the parse of their users 23:15
jnthn There's also that
TimToady agentzh: are you defining any of your own operators? That's known to slow parsing way down at the moment.
agentzh with all the CUs merged into a single .p6 file, it now always takes 7.2 sec. 23:16
i'm amazed to see it's as fast as a typical partial compilation.
jnthn It's kinda silly in 2016 to be doing stuff that prevents us from parallelizing compilation though :)
agentzh when always compiling everything in the project.
jnthn Uh, in 2017 too
agentzh TimToady: nope. the only thing i'm redefining is .Str(), which does not count as my own operators, right? 23:17
jnthn Nope
agentzh jnthn: yeah, my box has 8 logical CPU cores :) 23:18
jnthn: i also tried to do parallel recompilation myself via a makefile.
jnthn: but oddly enough it does not make things faster.
jnthn: the module compiled later still recompiles its dependencies even though those deps were just manually recompiled right before. 23:19
TimToady that seems a bit like a bug
agentzh TimToady: i tried both perl6 /path/to/file.pm6 and perl6 -c /path/to/file.pm6 in my makefile commands.
the whole make -j8 time is still 30+ sec. 23:20
like before without -j8
TimToady something to ask nine about over in #perl6-dev when he's awake
agentzh seems like rakudo does not test timestamps correctly.
brokenchicken IIRC we don't use timestamps because they're not precise enough. 23:21
jnthn afaik, it doesn't precomp entrypoints
Only dependencies
So you might have had more luck with perl6 -e 'use Module::To::Compile' 23:22
agentzh oh, good to know 23:23
will try.
okay, i tried again, even without perl6 -e 'use xxx', make -j8 seems to be working now. 23:31
maybe i messed things up the last time i tried make -j8.
now it takes 6.8s to compile everything.
MasterDuke jnthn: nqp-m -e 'my $f := nqp::open("small_compile.sql", "r");my str $l; while $l := nqp::readlinefh($f) { }', where small_compile.sql is 1m lines, also has no speshed or JITted frames
agentzh oh, wait, i need to wipe off .precomp...
forgot that one. 23:32
jnthn MasterDuke: What about if you do the same code with perl6-m ?
agentzh okay, with .precomp wiped out, make -j8 takes 31.9 sec to compile everything. 23:33
the same as a single compilation of the root from scratch.
MasterDuke ah, that's 33% speshed
but was 60ms slower 23:34
agentzh brokenchicken: sorry to hear that. 23:35
MasterDuke 1 OSR
agentzh hmm, seems like i have to give up the rakudo route for now due to the compilation time issue. 23:36
tried everything already.
will jump directly to my perl 6 dialect compiler targeting luajit which will soon support linking and partial compilation. 23:37
my plan was to use rakudo as the intermediate reference impl. 23:38
TimToady jnthn: it appears that dynvar lookup is still around 5% compiling the setting 23:40
MasterDuke agentzh: ah right, you work at cloudflare don't you? i've read some really good blog posts from them
agentzh MasterDuke: i used to. i quit CF 3 months ago.
MasterDuke: now i'm setting up my own company, OpenResty Inc.
MasterDuke TimToady: if you're interested, here's a --profile-compile of a recent rakudo build, sorted by exclusive time. gist.github.com/MasterDuke17/77230...805c3274b5 23:41
agentzh: cool
jnthn TimToady: $*ACTIONS would be a nice one to be rid of :)
MasterDuke TimToady: and here's a perf record. gist.github.com/MasterDuke17/aff44...6d1d0ba537 23:42
agentzh MasterDuke: maybe generating a flame graph is more intuitive to find places to optimize?
at least some times.
MasterDuke TimToady: and here's a list of the names and counts of what find_symbol is called with. gist.github.com/MasterDuke17/a3b00...c4b884946b 23:43
agentzh and an inverted flame graph can give you the same and also more info than the exclusive sort.
*exclusive time
MasterDuke TimToady: and here are the names passed to MATCH (and their counts). gist.github.com/MasterDuke17/ad31d...43e02f67bb 23:44
agentzh: i'm not a viz/ui person at all. and i think trying to do that in a web browser would still choke on the amount of data a profile of the rakudo build creates 23:45
but i agree that a flame graph would be good somehow 23:46
TimToady jnthn: $*WANTEDOUTERBLOCK actually has a bit more time, for fewer calls; a better cache and maybe some interning will help there, but yeah, ACTIONS, LANG, and PRAGMAS shouldn't really be using dynvars in the first place 23:48
23:51 pyrimidine joined
TimToady MasterDuke: your files crash my ff tab 23:52
I did see your text version earlier 23:53
MasterDuke TimToady: whoops. and, text version? 23:55