samcv | ok i think i have it working where we can use a faster hex2decimal number thing | 01:43 | |
because what it does to get the decomposition characters, it reads a string with hex numbers delimited by spaces | |||
and strtol has to get the number, then store back as a string in the original pointer. so i wrote one where we just keep it as a char ** instead of char * and just move the char** pointer down a bit | 01:44 | ||
have to profile now | |||
but should take trivial processing power | |||
and when it sees a space (or non-ASCII) it returns what it's seen so far, and sets the pointer to one after this non-hex char | 01:45 | |
and passes the NFD-round trip spectest :) | |||
hmm i'm still seeing calls to strtol | 01:50 | ||
ah it was the atoi function. now uses 12% less cpu :) | 01:59 | ||
down from 14.5% to 2.28 | |||
from 6.5seconds slurping this 200MB file to 5.7 | 02:01 | ||
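The cursor-advancing parser samcv describes above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the names `hex_digit_value` and `parse_next_hex` are hypothetical, and the sketch assumes the input is a NUL-terminated string of hex numbers separated by single non-hex characters such as spaces.

```c
/* Hypothetical helper: value of one hex digit, or -1 for any other char. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse the hex number at *cursor and move *cursor past it.
 * Parsing stops at the first non-hex character; the cursor is then
 * set to one past that character (or left on the terminating NUL).
 * Returns the parsed value, or -1 if no hex digit was found. */
static long parse_next_hex(const char **cursor) {
    const char *p = *cursor;
    long value = 0;
    int seen_digit = 0;
    int d;

    while ((d = hex_digit_value(*p)) >= 0) {
        value = value * 16 + d;
        seen_digit = 1;
        p++;
    }
    if (*p != '\0')
        p++;               /* skip the delimiter */
    *cursor = p;           /* the caller resumes from here next time */
    return seen_digit ? value : -1;
}
```

Because the caller's pointer is updated in place, nothing has to be copied back into the original buffer between calls, which is the saving discussed above.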
02:48
ilbot3 joined
|
|||
samcv | yeah | 02:59 | |
timotimo | so this is utf8, i wonder how utf8-c8 compares when the content is 100% valid utf8 | ||
samcv | well. the same | 03:00 | |
timotimo | you'd certainly hope so | ||
samcv | since they both call it when composing graphemes | ||
timotimo | i meant in general, though | ||
samcv | ah | ||
you mean between utf8-c8 and normal utf8? | |||
timotimo | yes | ||
and latin1 is probably a whole lot faster because it just YOLOs all those bytes | 03:01 | ||
samcv | weird. www.google.com/trends/explore?date...l%206,perl | 03:02 | |
oh can more easily see trend if you look since 2004 | 03:03 | ||
www.google.com/trends/explore?date...l%206,perl | |||
timotimo | oof | ||
mst | samcv: blah, doesn't look any better if I correct 'perl 6' to 'perl6' either | 03:15 | |
samcv | yeah | 03:33 | |
www.google.com/trends/explore?q=perl%206 | 03:34 | ||
obviously we just need to release 6.d :P | |||
timotimo | samcv: would you be interested in doing the 200MB file with utf8 vs utf8-c8? | ||
samcv | how do I use utf8-c8? | ||
from nqp/perl6. whichever | |||
specify the encoding or something? | |||
timotimo | just :enc<utf8-c8> | ||
samcv | (in perl 6) kkk | ||
col | |||
* kk *cool | 03:35 | ||
utf8 5.7s, utf8-c8 7.8 (both with patch applied) | 03:37 | ||
timotimo | interesting, so it's a noticeable impact | ||
samcv | yeah | ||
with the patch and plain utf-8 getting 40% (exclusive) cpu usage by MVM_string_utf8_decodestream and 17.68% by the codepoint normalization function (inclusive) | 03:41 | ||
but we have 10% happening in MVM_unicode_codepoint_get_property_cstr | 03:43 | ||
so i can shave another 10% off the cpu | |||
timotimo | nice | 03:44 | |
samcv | off the 14% (12% speedup using faster atoi + 2% faster by inlining). and the 2% time going to uh | ||
the fast_atoi should go away too as i make def's and check the property value as integer | |||
timotimo, maybe we can find some library that uses SIMD to do utf8-decode | 05:02 | ||
that would be really fast | |||
i think arm has SIMD now too? | |||
apparently even SSE4.2 has a string comparison function. interesting | 05:06 | ||
05:47
synopsebot6 joined
05:49
synopsebot6 joined
05:50
synopsebot6 joined
05:54
synopsebot6 joined
05:55
synopsebot6 joined
06:00
synopsebot6 joined
06:12
mst joined
07:15
domidumont joined
07:20
domidumont joined
07:40
FROGGS joined
|
|||
nine | samcv: couldn't you combine strlen and fast_atoi into a single function? A fast_max_3_digit_atoi of sorts? | 08:13 | |
samcv | for what? | ||
for ccc? | |||
why do we check the length of the string anyway. do you know? | 08:14 | ||
i think it would return like `ccc3` or `ccc51` but those are all more than 3 characters | |||
er wait. no it returns just the numbers | 08:15 | ||
well i guess it returns like. numbers or whatever, but they're stored as strings. will have to roll that into my ucd2c.pl that i generate the brackets with | 08:16 | ||
that generates integer properties only with no string value | |||
yeah i will make it integer instead of using the string thing soonish. would be nice to change it for now just to get that performance improvement | 08:18 | ||
i also wrote a much faster hex string function, so it can check decompositions faster, instead of having to parse the first hex in the string "DD38 EAT" then copy the rest of the string to the same pointer location | 08:19 | ||
but that will not have as much impact unless we're having to decompose a lot. the ccc is checked for any codepoint going through MVM_unicode_normalizer_process_codepoint_full | 08:20 | |
dalek | MoarVM: d0e297f | samcv++ | src/strings/normalize.c: Use much faster atoi function. 14% less CPU use when slurping a Unicode file. This should greatly speed up any calls made to ccc which is used extensively in MVM_unicode_normalizer_process_codepoint_full and canonical_composition. |
||
MoarVM: 36f3385 | niner++ | src/strings/normalize.c: Merge pull request #483 from samcv/fast_atoi Use much faster atoi function. 14% less CPU use when slurping a Unicode file |
|||
samcv | oh also in canonical_sort and canonical_composition is used extensively too | ||
samcv | and RE the ccc function, I was thinking about rewriting it to rely less on ccc, maybe even throw it out completely, so we only have to look up grapheme_cluster_break except in rare cases | 08:21 | |
thanks for the merge :) | 08:22 | ||
nine | samcv: just picking the smallest of nits really. I'd appreciate having operators surrounded by spaces: value * 10 as especially in C my brain immediately interprets an asterisk running into a name as pointer. | ||
samcv | ah | ||
yeah that doesn't look that nice | 08:23 | ||
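nine's suggestion of folding the length check into the conversion could look something like the sketch below. It borrows the name `fast_max_3_digit_atoi` from the message above, but the body is an assumption, not the merged code; it also follows the spacing nine asks for (`value * 10`).

```c
/* Sketch: convert a decimal string of one to three digits to an int.
 * Returns -1 for an empty string, a string longer than three
 * characters, or any non-digit character, so no separate strlen
 * call is needed. */
static int fast_max_3_digit_atoi(const char *s) {
    int value = 0;
    int i;

    for (i = 0; i < 3 && s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        value = value * 10 + (s[i] - '0');
    }
    /* Reject empty input and anything with a fourth character. */
    if (i == 0 || s[i] != '\0')
        return -1;
    return value;
}
```

Canonical combining class values fit in 0..255, which is why three digits would be enough for the ccc case discussed here.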
08:27
brrt joined
|
|||
brrt | hey #moarvm | 08:30 | |
samcv | hey brrt :) | ||
brrt | i have a christmas present, and it's either late or early depending on your background | ||
dalek | MoarVM/even-moar-jit: ab07774 | brrt++ | / (3 files): linear_scan can now work for NQP compilation Fixed the seemingly last few bugs that prevented linear_scan from working correctly. NQP now compiles and works, and I can start implementing spilling. NB: NQP commit a6c267fcd11cc3c69ec579345c8db959de78af1c, because it's not fully up to date with master. |
08:31 | |
brrt | this means that as far as I know (and pending a merge with master) the new linear scan algorithm is feature-equivalent with the old inline register allocator | 08:32 | |
08:35
zakharyas joined
|
|||
nine | Yeah! | 08:40 | |
brrt: so how close are we to the point where a horde of coders can chime in and add JIT ops? | 08:41 | ||
brrt | actually, considerably closer | ||
one thing that should theoretically be handled now, is conditional constructs | 08:42 | |
e.g. (if (foo?) x y) resolves to either x or y | |||
if you don't yet see how that is a register allocation problem, well, i didn't either | 08:43 | ||
but the first priority is to have spilling implemented | 08:44 | ||
it's a simple algorithm, really | |||
(the spilling) | |||
but not triggered very often | |||
one of the things that might be a bit tricky is handling conflicting register requirements | 08:45 | ||
dalek | Heuristic branch merge: pushed 95 commits to MoarVM/even-moar-jit by bdw | 09:02 | |
09:08
brrt joined
09:18
brrt joined
|
|||
nine | Ok, I'll bite. How are conditionals a register allocation problem especially? | 09:30 | |
brrt | they're not, but if you maintain your IR in SSA form the register allocator needs to have PHI resolution built in :-P | 09:31 | |
so, in my case, handling conditionals was a register allocator problem | 09:32 | ||
because at the end of the conditional, both value paths must store their value in the same location | 09:33 | ||
samcv | .tell jnthn so i found the code for qt's ascii and utf-8 decoder code.woboq.org/qt5/qtbase/src/core...c.cpp.html | 09:35 | |
damn yoleaux is not here | |||
09:36
brrt joined
09:43
brrt joined
|
|||
brrt | there ought to be yoleaux2? | 09:51 | |
hmmm | |||
isn't | |||
samcv | :( | 09:53 | |
jnthn | Lazy bots. | 09:57 | |
morning o/ | |||
samcv | morning jnthn | 09:58 | |
oh i should probably close like a bunch of unicode tickets i opened | |||
since i fixed them | |||
nwc10 | good UGT, #moarvm | 09:59 | |
brrt | good *, nwc10, jnthn | 10:03 | |
samcv | good *, every*! | 10:08 | |
jnthn | :) | ||
jnthn tries to catch up on backlog here | |||
samcv++ # speedup | 10:11 | ||
brrt++ # linear allocator \o/ | |||
re "we first put them into codepoints, then into graphemes" from #perl6-dev - since the normalizer is a streaming thing we often only ever have one codepoint sat in the intermediate buffer. | 10:13 | ||
(Which is why there's a fast-path for that case, iirc) | |||
samcv | yep | ||
was that to me. | 10:14 | ||
samcv tries to check what i said | |||
ah ok i see | |||
jnthn | fwiw, I'm fine with us using SIMD ops and so forth when available for faster utf-8 decoding provided the detection of when we have them is robust | 10:16 | |
arnsholt | samcv: I'm a fan of GH's "closes #XYZ" handling =) | 10:17 | |
samcv | yeah that's nice | ||
arnsholt | Saves that additional step (and saves you from forgetting it) | ||
samcv | oh jnthn ok so this hex function i made can replace the strtol we have github.com/samcv/MoarVM/commit/504...1bc1b26907 | ||
though the commit nine already merged accounts for most of the calls to strtol (called by atoi) | 10:18 | |
jnthn | Yeah, we ccc a lot more than we decmop :) | ||
samcv | i think i said it here already. instead of using strtol to put what remains of the string back in the original strings position | ||
jnthn | *decomp | ||
samcv | yeah :) | ||
we use char** and set a further down pointer location and use that. this implementation works, haven't like cleaned it up obviously but wanted to know if it seems fine to do it that way | 10:19 | ||
jnthn | [0 ... 255] = -1, | ||
jnthn wonders how portable between compilers that syntax is | |||
It looks too nice for MSVC :P | 10:20 | ||
But yeah, overall seems reasonable. | 10:21 | ||
// bit aligned access into this table is considerably ... | |||
Did you mean word aligned? | |||
Or 4-byte aligned | 10:22 | ||
samcv | i did not write that part of it ha | ||
jnthn | :P | ||
hexdec2 wants marking static too so we don't leak it | |||
samcv | it seemed ambiguous to me too | ||
jnthn | I think it's a thinko | ||
It's probably true if you replace "bit" with the appropriate unit | 10:23 | |
Well, may be true | |||
samcv | technically isn't everything bit aligned? | ||
hahaha | |||
jnthn | Yes! :D | ||
samcv | (one would hope) | ||
jnthn | I mean, certainly reading, say, a 16-bit or 32-bit value when it's not aligned to such a boundary will be slower | 10:24 | |
About reading bytes, not so sure | |||
samcv | also the -1 also easily makes it break for unknown characters as well jnthn | 10:25 | |
like when it sees a space it breaks and returns it up to that point. then it can resume after the space on next invocation | |||
jnthn | Yes, the -1 is a nice trick :) | 10:26 | |
Gotta tend to something else, back soon | |||
10:41
brrt joined
|
|||
samcv | not sure why this breaks a few NFKC tests github.com/samcv/MoarVM/commit/36b...f81cb1ea91 | 11:00 | |
the codepoints that fail both respond with Y/N or 1/0, i checked so hmm | 11:05 | ||
11:13
Dunearhp joined,
Ven joined
11:25
brrt joined
11:29
Ven joined
14:37
yoleaux2 joined
15:10
lizmat_ joined
15:13
Dunearhp_ joined
16:02
dogbert17 joined
|
|||
japhb | jnthn: After fighting with some ecosystem problems last night (JSON::Tiny was failing lots of tests), I was able to confirm your callsame fixes for the golfed version indeed fixed the original as well -- THANK YOU again! | 16:17 | |
jnthn | Hurrah :) | 16:28 | |
Warnings gone too? | |||
16:34
synopsebot6 joined
|
|||
japhb | jnthn: I didn't run it a pile of times to be sure, but I didn't see any in the limited time I had this morning. | 16:45 | |
jnthn | OK, sounds good. It felt likely to me that they had the same root cause. | 16:49 | |
(And I never saw them again when testing the fix also) | 16:50 | ||
17:02
domidumont joined
17:08
Ven joined
17:28
Ven joined
17:47
Ven joined
18:07
Ven joined
18:27
Ven joined
|
|||
timotimo | what the hell is [0 ... 255] = -1 syntax, i have never heard of that before | 18:28 | |
notviki | m: d @ = -1 | 18:31 | |
camelia | rakudo-moar 40d7de: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Undeclared routine:␤ d used at line 1␤␤» | ||
notviki | m: dd @ = -1 | ||
camelia | rakudo-moar 40d7de: OUTPUT«Array @ = [-1]␤» | ||
timotimo | i'm talking about C syntax, though | 18:32 | |
+static const long hextable[] = { | |||
+ [0 ... 255] = -1, // bit aligned access into this table is considerably | |||
notviki | aw :( | ||
geekosaur | go home C, you're drunk | 18:36 | |
dalek | MoarVM: c03a35f | jnthn++ | src/profiler/heapsnapshot.c: Fix typo. |
18:37 | |
MoarVM: d2fb3c3 | jnthn++ | src/profiler/heapsnapshot.c: Fix heap snapshot crash on eventloop thread. It has no cur_frame, since it's not purposed to execute code. |
|||
timotimo | so ... is that actually going to work in MSVC, too? | 18:38 | |
jnthn | I dunno, but I'm not betting on it :) | 18:46 | |
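For context on the portability question: the `[0 ... 255] = -1` range designator is a GNU C extension rather than standard C, so a compiler without that extension would be expected to reject it. One portable alternative, sketched here with hypothetical names (`hex_table`, `init_hex_table`, `hex_value`), is to fill the table at run time. The sketch assumes an ASCII-compatible character set and is not thread-safe as written.

```c
#include <string.h>

/* Lookup table: hex digit value for each byte, -1 for everything else. */
static signed char hex_table[256];
static int hex_table_ready = 0;

/* Fill the table once, using only standard C. */
static void init_hex_table(void) {
    int i;

    memset(hex_table, -1, sizeof hex_table);   /* default: not a hex digit */
    for (i = 0; i < 10; i++)
        hex_table['0' + i] = (signed char)i;
    for (i = 0; i < 6; i++) {
        hex_table['A' + i] = (signed char)(10 + i);
        hex_table['a' + i] = (signed char)(10 + i);
    }
    hex_table_ready = 1;
}

/* Returns 0..15 for a hex digit, -1 for any other byte (space included). */
static int hex_value(unsigned char c) {
    if (!hex_table_ready)
        init_hex_table();   /* lazy init; a real VM would do this at startup */
    return hex_table[c];
}
```

The -1 default keeps the behaviour jnthn calls a nice trick: any unknown byte, including the space delimiter, ends the current number.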
18:50
geekosaur joined
19:08
Ven joined
19:23
Ven joined
19:34
diakopter joined
19:36
masak_ joined,
domidumont joined,
btyler joined,
samcv joined
19:44
Ven joined
|
|||
jnthn | *sigh* So I wonder what exactly I was thinking when I introduced the getrefref ops | 19:59 | |
*getregref | 20:00 | ||
They mean the lifetime of ->work isn't promised to be over at frame exit | |||
lizmat | it's not too late to change them? | 20:01 | |
jnthn | They're used for QAST localref support | ||
Rakudo doesn't even mention it | |||
NQP only tests and implements it but doesn't sue it | |||
*use it | |||
The memory leak I just found is rooted in this. | |||
But also needing to support it is an utter pain | |||
For all backends | |||
I guess my line of thought may have been "it'd be nice to lower more lexicals to locals" | 20:02 | ||
But...it'll be horrible | |||
timotimo | that's a thing i hadn't considered when i last looked at that :S | 20:03 | |
20:03
Ven joined
|
|||
timotimo | so ... we can totally still throw that out | 20:03 | |
jnthn | Indeed. | ||
I can't think of any other reason we can't just kill ->work always upon frame LEAVE | 20:04 | |
jnthn should think about dinner and some rest from the keyboard | |||
timotimo | right, that makes sense | ||
jnthn | Will have a look further tomorrow | ||
Guess they went in during the intense year of 2016 | 20:05 | ||
*2015 | |||
timotimo | so, we were leaking stuff because frames were being kept alive and their -> work kept big things alive? | ||
jnthn | Yup | ||
timotimo | and we could just throw out -> work much earlier | ||
jnthn | And also because anything with a ->work stays in inter-gen | ||
timotimo | got it | ||
jnthn | 'cus it assumes we're still in it | ||
So yeah | |||
moar-ha was right | |||
timotimo | a-ha! | 20:06 | |
jnthn | Righty, food... :) | ||
My wife just came home so I can't forget to eat dinner for any longer. :) | |||
nwc10 | I was wondering how you'd managed to get away with it for so long | ||
timotimo | have a good one! | 20:07 | |
i got up at like 7pm today :S | |||
notviki | what's inter-gen? | 20:10 | |
timotimo | you know how we have minor and major GC collections? | 20:11 | |
notviki | sorta, yeah | ||
timotimo | OK | ||
so imagine we create an object (in the nursery) and add it to some big list (that already lives in the old generation) | |||
notviki | ok | 20:12 | |
timotimo | and every other pointer in the nursery that points to this object no longer lives, for example we've already left the routine we created that object in or something | ||
notviki | ok | ||
timotimo | now if we hit a minor collection, and we only follow stuff in the nursery | ||
we'll not reach that object, and thus consider it dead | 20:13 | ||
but that's wrong, as it is actually kept alive by the object in the old generation | |||
and also: the object in the old generation has a pointer to that object which needs to be updated | |||
that's where the inter-generational roots come in | |||
whenever an object that lives in the old generation gets something assigned to it that lives in the nursery, that pointer (that's inside the old-gen object) gets added to the set of "inter-generational roots" | 20:14 | ||
notviki | Thanks. | ||
So all the "root" stuff I've seen in commits is about GC? | |||
timotimo | yup | 20:15 | |
notviki | Thanks. | ||
timotimo | the most difficult part about this roots stuff is when C code comes in | ||
because then there's pointers to GC-managed objects on the C stack, and whenever GC happens the underlying objects can change their position in memory | |||
if the pointers on the C stack aren't "rooted", the GC won't know to update those pointers | 20:16 | ||
(and also things might be considered dead if only the C stack was holding on to them) | |||
when we have nqp or p6 code that works with objects, we're unable to forget to root stuff, because the whole stack has roots set up for it automatically | |||
other GC things for C languages tend to scan the stack for "pointers that could point at GC-managed objects", but that has to be "conservative", i.e. it may consider some things pointers that aren't actually pointers; for that reason those GCs aren't allowed to be moving GCs, because they could accidentally change values on the stack that aren't actually pointers | 20:17 | ||
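The inter-generational roots timotimo describes above are usually maintained by a write barrier. The following is a deliberately tiny sketch of that idea and not MoarVM's actual implementation: the `Obj` layout, the fixed-size `intergen_roots` array, and `assign_field` are all hypothetical, and the sketch remembers the old-generation object itself rather than the individual pointer slot.

```c
#include <stddef.h>

enum { GEN_NURSERY, GEN_OLD };

/* Hypothetical object with a single reference field. */
typedef struct Obj {
    int generation;      /* GEN_NURSERY or GEN_OLD */
    int remembered;      /* already in the inter-generational root set? */
    struct Obj *field;
} Obj;

#define MAX_ROOTS 64
static Obj *intergen_roots[MAX_ROOTS];
static size_t num_intergen_roots = 0;

/* Store value into owner->field, running the write barrier first.
 * If an old-generation object starts pointing at a nursery object,
 * the owner is added to the root set so a minor collection can find
 * (and later update) that pointer.
 * Returns 1 on success, 0 if the fixed-size root set is full
 * (a real collector would grow it instead). */
static int assign_field(Obj *owner, Obj *value) {
    if (owner->generation == GEN_OLD && value != NULL
            && value->generation == GEN_NURSERY && !owner->remembered) {
        if (num_intergen_roots == MAX_ROOTS)
            return 0;
        intergen_roots[num_intergen_roots++] = owner;
        owner->remembered = 1;
    }
    owner->field = value;
    return 1;
}
```

During a minor collection, a collector of this shape would scan `intergen_roots` in addition to the nursery's own roots, which is how the object in the example above stays alive.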
notviki wonders how long it'd take to learn all of this stuff :( | 20:19 | ||
timotimo | hah | ||
there's a book called "The GC Handbook" or something, that's considered to be The Definitive Book On GCs | 20:20 | ||
notviki | I'll give it a read then | 20:22 | |
20:23
Ven joined
|
|||
timotimo | i haven't read it yet | 20:25 | |
if i understand correctly, it covers every imaginable technique, whereas moar only uses a few (because many are mutually exclusive) | |||
did you see "the secret life of garbage collectors"? | 20:26 | ||
notviki | Do you have a CS degree? | ||
timotimo | i do not | ||
notviki | That gives me hope :} | ||
(to ever learn this stuff, I mean) | |||
geekosaur doesn't either, and has a reasonable grasp of practical GC | |||
timotimo | i did start a CS degree, though | 20:27 | |
geekosaur | mostly obtained operationally (i.e. by diving expeditions in various garbage collectors, and reading discussions thereof) | ||
that said, I, uh, learn somewhat unconventionally :) | |||
20:53
Ven joined
21:23
Ven joined
21:36
Ven_ joined
|
|||
dalek | MoarVM: 592e536 | samcv++ | Configure.pl: Use /usr/bin/env perl for ./Configure.pl We already do this for NQP and Rakudo now, and will make it be more compatible on Unixish systems. Windows systems are unaffected by this change. |
21:41 | |
MoarVM: 7361f46 | niner++ | Configure.pl: Merge pull request #484 from samcv/#! Use /usr/bin/env perl for ./Configure.pl |
|||
MoarVM: 40ee0e8 | samcv++ | tools/ucd2c.pl: Generate Decomposition_Type Unicode prop. #define's in ucd2c.pl |
22:51 | ||
MoarVM: d119127 | samcv++ | / (4 files): Decompose_Type, Unicode: use int lookup instead of str for better perf. Also generate Canonical_Combining_Class #define's in ucd2c.pl. This will reduce our need to compare strings. |
|||
MoarVM: 69e2a24 | lizmat++ | / (4 files): Merge pull request #485 from samcv/get_property_int_a 1994c6e | jnthn++ | src/strings/normalize.c: Remove left-behind comment. |
|||
jnthn | samcv: That commit has commented out code left-behind: github.com/MoarVM/MoarVM/commit/d1...bc21b3R285 | 23:14 | |
lizmat: Pity you didn't catch that in code review. If merging MoarVM PRs, please read them very carefully. :-) | |||
lizmat | jnthn: ok, I figured that samcv could be working on this while you were aslpee | ||
*asleep | |||
will refrain from doing these types of MoarVM PR's in the future | 23:15 | ||
timotimo | asloop* | 23:16 | |
japhb | .oO( Sloop, there it is ... ) |
23:17 | |
jnthn | In this case it's harmless in terms of effects on users, but in general the cost of hunting bugs that slip into Moar tends to be quite high compared to in NQP/Rakudo. | ||
Including my own bugs. I sometimes wonder if we'd move slower, but smoother if we PR'd/reviewed everything non-trivial here. | 23:18 | ||
(And then waste less time bug-hunting, so actually move at least as fast anyway in the long run, but with less annoyance downstream.) | 23:19 | ||
jnthn | righty, sleep...hopefully... | 23:52 | |
o/ |