01:48
ilbot3 joined
samcv | good * | 04:11 | |
so it looks like the max number of codepoints => 1 or more collation keys is 3 | 04:18 | ||
so we are dealing with 3 cp max there. and the max number of collation keys which match a certain codepoint or sequence is 18 | 04:19 | ||
though the very vast majority only have 1-3 | 04:20 | ||
array => [(9060 32 26) (9116 32 26) (9157 32 26) (521 32 26) (8971 32 26) (9116 32 26) (9116 32 26) (9137 32 26) (521 32 26) (9070 32 26) (9116 32 26) (9158 32 26) (9137 32 26) (521 32 26) (9143 32 26) (9049 32 26) (9116 32 26) (9123 32 26)], codepoints => [65018], comment => ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM | |||
one codepoint.... to 18 collation keys... | |||
the array up there is what i'm calling collation keys (primary, secondary, tertiary factors for each key) | |||
so we will need to check for the next codepoint ahead in only certain cases. and rare cases 2 ahead. and i think that should be reasonable from a speed standpoint | 04:21 | ||
and having a ton of collation keys for a single codepoint is not really going to impact any of the other simple codepoints | 04:22 | ||
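[Editor's sketch, not from the log] The mapping samcv describes, from a sequence of 1 to 3 codepoints to a list of (primary, secondary, tertiary) collation keys, could look roughly like this. The U+FDFA row is copied from the array pasted above; the other rows, the names TABLE, MAX_SEQ and collation_elements, and the (0, 0, 0) fallback are all made up for illustration. It only handles contiguous sequences and is not MoarVM's implementation.

```python
# Collation keys as (primary, secondary, tertiary) triples.
# Primaries for U+FDFA are copied from the array pasted in the log;
# every key in that array has secondary 32 and tertiary 26.
FDFA_PRIMARIES = [9060, 9116, 9157, 521, 8971, 9116, 9116, 9137, 521,
                  9070, 9116, 9158, 9137, 521, 9143, 9049, 9116, 9123]

TABLE = {
    (0xFDFA,): [(p, 32, 26) for p in FDFA_PRIMARIES],
    # The rows below are made-up placeholders, not real UCA data.
    (0x61,): [(100, 32, 2)],
    (0x61, 0x62): [(150, 32, 2)],
    (0x61, 0x62, 0x63): [(175, 32, 2), (176, 32, 2)],
}
MAX_SEQ = 3  # longest codepoint sequence with its own entry, per the log


def collation_elements(cps):
    """Greedy longest-match: try 3 codepoints, then 2, then 1."""
    out, i = [], 0
    while i < len(cps):
        for n in range(min(MAX_SEQ, len(cps) - i), 0, -1):
            keys = TABLE.get(tuple(cps[i:i + n]))
            if keys is not None:
                out.extend(keys)
                i += n
                break
        else:
            # Hypothetical stand-in for codepoints missing from the table.
            out.append((0, 0, 0))
            i += 1
    return out
```

A single codepoint with 18 keys only costs something when that codepoint actually shows up; the simple one-key rows go through the same lookup unchanged.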
that ligature must be pretty long to need 18 collation keys though heh | 04:23 | ||
u: ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM | 04:24 | ||
unicodable6 | samcv, U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM [Lo] (ﷺ) | ||
samcv | totally unreadable with monospace :P | ||
it's a ligature for three words. wow | 04:25 | ||
geekosaur | not much better with a proportional font. I have a dreadful suspicion someone tried to do calligraphy in a computer font... | 04:30 | |
samcv | well. it's three words | 04:31 | |
it means 'may peace be upon him' | 04:32 | ||
well it's three words in arabic. in english it's 5 :) | |||
and the words are on top of each other | |||
now with the UCA you can collate an equivalent of 18 characters in arabic together with the actual 18 characters :) | 04:33 | ||
hahaha | |||
oh and importantly. there's only 62 "special" starter codepoints. these are codepoints that have a possibility of having to look at the next codepoint | 04:37 | ||
to see if it matches one of the special collation keys for series of cp's | |||
so that's not too many | |||
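[Editor's sketch, not from the log] The "special starter" idea amounts to a cheap gate in front of the lookahead: only codepoints that begin some multi-codepoint entry ever peek at the next one or two codepoints. A minimal illustration, assuming the starter set is derived from the multi-codepoint entries themselves. SEQUENCES holds made-up values, and STARTERS and match_length are hypothetical names.

```python
# Made-up multi-codepoint entries (not real UCA data).
SEQUENCES = {
    (0x61, 0x62),
    (0x61, 0x62, 0x63),
    (0x78, 0x79),
}
# Every codepoint that begins at least one multi-codepoint entry.
STARTERS = frozenset(seq[0] for seq in SEQUENCES)


def match_length(cps, i):
    """Return how many codepoints the entry starting at index i consumes."""
    if cps[i] not in STARTERS:
        return 1  # fast path: no lookahead at all
    for n in (3, 2):  # rare case first (2 ahead), then 1 ahead
        if i + n <= len(cps) and tuple(cps[i:i + n]) in SEQUENCES:
            return n
    return 1  # a starter that is not followed by a matching sequence
```

With roughly 62 starters, the membership test is the only extra work for every other codepoint.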
04:41
geekosaur joined
samcv | though certain codepoints have 48 possibilities. so have 48 possibilities going off the first codepoint. eek | 04:42 | |
06:13
domidumont joined
06:18
domidumont joined
07:04
domidumont joined
07:20
domidumont joined
08:54
domidumont joined
nine | Trying to profile zef, I get a segfault caused by MVM_repr_get_int being called with a NULL object from JITed code: gist.github.com/niner/d5d2e6cf104a...26be972d2d | 08:59 | |
timotimo | nine: it probably uses multiple threads? that makes the profiler tend to go boom | 09:16 | |
i mean if it uses a proc it now turns on the event loop thread, which makes the profiler unhappy | 09:18 | ||
nine | timotimo: no, there's only a single thread running | 09:21 | |
timotimo | hmm | 09:23 | |
yeah, that's bad then :) | |||
and disabling the jit makes it no longer asplode? | |||
nine | At least it works on 2017.06 | 09:34 | |
Bisecting this sucks as I have to recompile moar+nqp+rakudo, otherwise I get errors like "Unhandled exception: Hash keys must be concrete strings" | 09:36 | ||
timotimo: I should have checked if the JIT is to blame. Reverting your recent "JIT some more ops" commit gives me a shorter backtrace but still leaves the segfault | 09:43 | ||
gist.github.com/niner/75f4f03fcdf5...634ee4ba3f | |||
So MVM_nfa_run_alt gets called with a NULL labels parameter and passes it on to MVM_repr_at_pos_i | 09:45 | ||
timotimo | i'll be afk for much of the day again | 09:56 | |
10:39
domidumont joined
samcv | ok so i visualized the data i have here. with max depth 3, would a trie still be a good plan? idk. gist.github.com/0239ee41677dbc064d...92af3f01ef here it is. obviously the terminal hashes are the terminal points | 11:17 | |
as in most cases it's only two codepoints in a row. the max is 3 | |||
1st -> 2nd -> 3rd codepoints or whatever | |||
since i have only 60 starters or so. maybe i should store indices of an array in the unicode data | 11:19 | ||
check the codepoint, gives me a point in an array which from there hmm. though also makes me think that some of these may be faster to have code to check if the next cp is between a certain range of values. and a switch or something | 11:20 | ||
since the highest depth is almost always 2. i get pointed to this array number, from 0 to 60 and then have a switch beneath that if the codepoint is in the proper range | 11:21 | ||
anybody got any suggestions? | 11:22 | ||
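[Editor's sketch, not from the log] One way to read the "index in the unicode data, then a range check and a switch" idea: each starter codepoint carries a small index into a table, each table entry records the range of possible next codepoints, and anything outside that range is rejected before a real lookup happens. Everything here (STARTER_INDEX, STARTER_TABLE, second_level, the codepoints and the string payloads) is hypothetical and stands in for real data.

```python
# Per-codepoint property: index into STARTER_TABLE (made-up values).
STARTER_INDEX = {0x61: 0, 0x78: 1}

# One entry per starter: the range of valid next codepoints, plus the
# exact matches inside that range. Payloads are placeholder strings.
STARTER_TABLE = [
    {"lo": 0x62, "hi": 0x64, "next": {0x62: "keys-for-61-62",
                                      0x64: "keys-for-61-64"}},
    {"lo": 0x79, "hi": 0x79, "next": {0x79: "keys-for-78-79"}},
]


def second_level(cp, next_cp):
    """Return the payload for the pair (cp, next_cp), or None if no entry."""
    idx = STARTER_INDEX.get(cp)
    if idx is None:
        return None  # not a starter: never look ahead
    entry = STARTER_TABLE[idx]
    if not (entry["lo"] <= next_cp <= entry["hi"]):
        return None  # cheap range rejection before the exact lookup
    return entry["next"].get(next_cp)
```

The rare third level could repeat the same shape one step further down; with at most 3 codepoints per sequence this stays a fixed, shallow structure rather than a general trie.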
moritz | use a linked list for the second (or maybe third) character, and all that follow? | 12:13 | |
timotimo | if the numbers we have are low enough, encode a "continuation bit" in the numbers? | 12:22 | |
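[Editor's sketch, not from the log] One possible reading of the continuation-bit suggestion: codepoints never exceed 0x10FFFF, so they need only 21 bits, which leaves spare high bits in a 32-bit slot. A flag there can mean "this sequence continues in the next slot", so all sequences fit in one flat array with no separate length fields. The flag position and the names CONT, encode and decode are assumptions for illustration.

```python
CONT = 1 << 31  # hypothetical flag bit, well above the 21 bits a codepoint needs


def encode(seqs):
    """Flatten sequences; every element except the last in a sequence gets CONT."""
    flat = []
    for seq in seqs:
        last = len(seq) - 1
        for j, cp in enumerate(seq):
            flat.append(cp | CONT if j < last else cp)
    return flat


def decode(flat):
    """Rebuild the sequences by cutting after each element without CONT."""
    seqs, cur = [], []
    for v in flat:
        cur.append(v & ~CONT)  # strip the flag to recover the codepoint
        if not (v & CONT):     # no flag: this element ends the sequence
            seqs.append(tuple(cur))
            cur = []
    return seqs
```

For example, encode([(0x61, 0x62), (0x78,)]) stores three slots, and only the first one carries the flag.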
12:41
AlexDaniel joined
13:19
domidumont joined
13:41
brrt joined
brrt | (my infinite loop is OSR) | 13:43 | |
osr disabled, it finishes | |||
nine: i'll investigate at some point. i want to have a jit-bisect for specific ops... | 13:51 | ||
timotimo | oh, interesting | 13:52 | |
brrt | well, | 13:55 | |
it's virtually impossible, really | |||
timotimo | that's what bugs are :P | 13:56 | |
14:49
zakharyas joined
14:51
ggoebel joined
17:27
zakharyas joined
17:32
MasterDuke joined
17:48
domidumont joined
19:12
zakharyas joined
19:17
domidumont joined
20:09
AlexDaniel joined
22:08
lizmat joined