01:48 ilbot3 joined
samcv good * 04:11
so it looks like the max number of codepoints => 1 or more collation keys is 3 04:18
so we are dealing 3 max cp there. and the max number of collation keys which match a certain codepoint or sequence is 18 04:19
though the very vast majority only have 1-3 04:20
array => [(9060 32 26) (9116 32 26) (9157 32 26) (521 32 26) (8971 32 26) (9116 32 26) (9116 32 26) (9137 32 26) (521 32 26) (9070 32 26) (9116 32 26) (9158 32 26) (9137 32 26) (521 32 26) (9143 32 26) (9049 32 26) (9116 32 26) (9123 32 26)], codepoints => [65018], comment => ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
one codepoint.... to 18 collation keys...
the array up there is what i'm calling collation keys (primary, secondary, tertiary factors for each key)
so we will need to check for the next codepoint ahead in only certain cases. and rare cases 2 ahead. and i think that should be reasonable from a speed standpoint 04:21
and having a ton of collation keys for a single codepoint is not really going to impact any of the other simple codepoints 04:22
that sigature must be pretty long to need 18 collation keys though heh 04:23
u: ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM 04:24
unicodable6 samcv, U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM [Lo] (ļ·ŗ)
samcv totally unreadable with monospace :P
it's a ligature for three words. wow 04:25
geekosaur not much better with a proportioinal font. I have a dreadful suspicion someone tried to do calligraphy in a computer font... 04:30
samcv well. it's three words 04:31
it means 'may peace be upon him' 04:32
well it's three words in arabic. in english it's 5 :)
and the words are on top of each other
now with the UCA you can collate an equivilant of 18 characters in arabic together with the actual 18 characters :) 04:33
hahaha
oh and importantly. there's only 62 "special" starter codepoints. these are codepoints that have a possiblily of having to look at the next codepoint 04:37
to see if it matches one of the special collation keys for series of cp's
so that's not too many
04:41 geekosaur joined
samcv though certain codepoints have 48 possibilities. so have 48 possiblities going off the first codepoint. eek 04:42
06:13 domidumont joined 06:18 domidumont joined 07:04 domidumont joined 07:20 domidumont joined 08:54 domidumont joined
nine Trying to profile zef, I get a segfault caused by MVM_repr_get_int being called with a NULL object from JITed code: gist.github.com/niner/d5d2e6cf104a...26be972d2d 08:59
timotimo nine: it probably uses multiple threads? that makes the profiler tend to go boom 09:16
i mean if it uses a proc it now turns on the event loop thread, which makes the profiler unhappy 09:18
nine timotimo: no, there's only a single thread running 09:21
timotimo hmm 09:23
yeah, that's bad then :)
and disabling the jit makes it no longer asplode?
nine At least it works on 2017.06 09:34
Bisecting this sucks as I have to recompile moar+nqp+rakudo, otherwise I get errors like "Unhandled exception: Hash keys must be concrete strings" 09:36
timotimo: I should have checked if the JIT is to blame. Reverting your recent "JIT some more ops" commit gives me a shorter backtrace but still leaves the segfault 09:43
gist.github.com/niner/75f4f03fcdf5...634ee4ba3f
So MVM_nfa_run_alt get's called with a NULL labels parameter and passes it on to MVM_repr_at_pos_i 09:45
timotimo i'll be afk for much of the day again 09:56
10:39 domidumont joined
samcv ok so i visualized the data i have here with max depth 3 would a trie still be a good plan? idk. gist.github.com/0239ee41677dbc064d...92af3f01ef here it is. obviously the terminal hashes are the terminal points 11:17
as most cases it's only two codeponitns in a row. the max is 3
1st -> 2nd -> 3rd codepoints or whatever
since i have only 60 starters or so. maybe i should store indices of an array in the unicode data 11:19
check the codepoint, gives me a point in an array which from there hmm. though also makes me think that some of these may be faster to have code to check if the next cp is between a certain range of values. and a switch or something 11:20
since the highest depth is almost always 2. i get pointed to this array number, from 0 to 60 and then have a switch beneith that if the codepoint is in the proper range 11:21
anybody got any suggestions? 11:22
moritz use a linked list for the second (or maybe third) character, and all that follow? 12:13
timotimo if the numbers we have are low enough, encode a "contination bit" in the numers? 12:22
12:41 AlexDaniel joined 13:19 domidumont joined 13:41 brrt joined
brrt (my infinite loop is OSR) 13:43
osr disabled, it finishes
nine: i'll investigate at some point. i want to have a jit-bisect for specific ops... 13:51
timotimo oh, interesting 13:52
brrt well, 13:55
it's virtually impossible, really
timotimo that's what bugs are :P 13:56
14:49 zakharyas joined 14:51 ggoebel joined 17:27 zakharyas joined 17:32 MasterDuke joined 17:48 domidumont joined 19:12 zakharyas joined 19:17 domidumont joined 20:09 AlexDaniel joined 22:08 lizmat joined