timotimo | heh, the moarvm.org repo already has the fixes, but the website itself is still on an older revision | 00:12 | |
samcv | hmm i'm getting a crash when i call MVM_free on something i've MVM_realloc'd | 01:24 | |
anybody know of any gotchas that could maybe cause that? it only crashes on the free | |||
An interesting article on use of conditionals llewellynfalco.blogspot.com/2016/02...gn-in.html | 01:37 | ||
i've since taken its advice and now only use the less than or less than or equal to sign when programming. it's resulted in much clearer and consistently typed conditionals | |||
ok so i used realloc(ptr, 0) to free it and that probably works from what the documentation on realloc says | 01:42 | ||
01:52
ilbot3 joined
03:17
solarbunny joined
04:01
MasterDuke joined
06:13
brrt joined
|
|||
brrt | good * #moarvm | 06:26 | |
samcv | brrt, you should check out that article i posted | 06:34 | |
brrt | i should first backlog | ||
that's an interesting idea, yes | 06:41 | ||
samcv | i've been loving it since i started doing it several weeks ago | 06:43 | |
makes everything more readable and the code and conditionals just end up looking nicer | |||
nwc10 | good_ *, jnthn_ | 06:44 | |
brrt | ohai nwc10 | ||
nwc10 | jnthn: that panic seems to also crop up in some spectests. It's always the same negative value in the message, and ASAN makes no comment | ||
07:05
domidumont joined
07:12
domidumont joined
07:15
brrt joined
07:36
geekosaur joined
07:42
geekosaur joined
07:43
geekosaur joined
07:44
geekosaur joined
07:58
zakharyas joined
08:06
zakharyas joined
08:18
TimToady joined
08:53
robertle joined
|
|||
jnthn_ | moarning o/ | 08:53 | |
samcv | morning jnthn_ :) | 08:59 | |
brrt | moarning jnthn, samcv | 09:04 | |
jnthn | Grr, two mornings in a row the internet connection here breaks for about 30 minutes :/ | 09:22 | |
Bemusingly, I don't see the crash people have been reporting even in stresstest | 09:25 | ||
brrt | i'm not seeing a crash either, fwiw | 09:27 | |
timotimo | i only get it with some debug stuff turned up | ||
and it's only a guaranteed on-startup-crash when i change the is_on_stack | 09:28 | ||
maybe i did the is_on_stack function wrong? | |||
jnthn | Maybe :) | 09:32 | |
Hm, no geth? | 09:55 | ||
nine | Geth_: status | 10:06 | |
10:07
Geth joined
|
|||
nine | Geth_: status | 10:07 | |
Geth | MoarVM: 8d7f9d404c | (Jonathan Worthington)++ | 3 files Remove now-unused pool_index in MVMStaticFrame. Plus code to increment it. |
||
MoarVM: 5a5b1be977 | (Jonathan Worthington)++ | 12 files Eliminate now-unused Lexotic REPR, lexotic cache. |
|||
MoarVM: f302bbed14 | (Jonathan Worthington)++ | src/core/instance.h Remove a dead field. |
|||
timotimo | moarvm's shrinking ... and i like it | 10:08 | |
jnthn | :) | ||
jnthn was doing the trans-siberian when that song was at the height of its popularity | 10:11 | ||
It followed me across the whole of darn Siberia. Every. Single. City. | 10:12 | ||
Geth | MoarVM: fdd04a6a99 | (Jonathan Worthington)++ | 2 files Remove unused gotolexotic function. |
10:13 | |
timotimo | what song was that? "i'm so excited"? | ||
jnthn | hah, no, that Katy Perry one...I thought you were referencing it :P :P | 10:14 | |
Seems I've killed off all the lexotic leftovers | 10:15 | ||
The thing that motivated me to do that was pondering cleaning up invocation a bit | 10:16 | ||
Now there's only 3 things that can hang off the invoke pointer on an STable | |||
Which means it is somewhat endangered :) | 10:17 | ||
10:17
travis-ci joined
|
|||
travis-ci | MoarVM build errored. Jonathan Worthington 'Remove now-dead newlexotic functions.' | 10:17 | |
travis-ci.org/MoarVM/MoarVM/builds/258490308 github.com/MoarVM/MoarVM/compare/0...b7fb2f4c78 | |||
10:17
travis-ci left
|
|||
jnthn | pull fail | 10:17 | |
oh hah, after that change memory moved just enough that now I do get a spectest fail too | 10:19 | ||
t/spec/integration/deep-recursion-initing-native-array.t | |||
Oh no, until I do a debug build | 10:20 | ||
nine | Of course. That would have been too easy. | ||
timotimo | i'm not sure which katy perry song we're talking about %) | 10:22 | |
nine | Probably "I kissed a girl" as jnthn's trans-siberian trip was some time ago | 10:25 | |
jnthn | Yeah, that one :) | 10:27 | |
timotimo | ah, i see | 10:32 | |
yeah, it has "and i liked it", here i said "and i like it" :P | |||
"i shrunk moarvm and i liked it" | |||
"the taste of the git commit ..." what exactly? | 10:33 | ||
Geth | MoarVM: 82c282efda | (Jonathan Worthington)++ | src/core/frame.c Zero owner of a stack frame also. The flags being zero is a sufficient test for whether the frame is a stack frame or not when we know it's certainly a frame. However, when we have an unknown collectable, the fully zeroed flags would also be a normal object. The temp roots mechanism thus uses a zero owner to identify this situation. Ensure the owner is zeroed. |
||
timotimo | oh i thought zeroed flags would be "not a normal object" | 10:34 | |
jnthn | No, I forgot that we do that :) | ||
We do it 'cus it means object allocation never has to set any flags :) | |||
So the common case is cheap | |||
timotimo | ah | ||
jnthn | An alternative to save, like, 1 instruction, would be to introduce a flag set on stack frames on the call stack | 10:35 | |
Instead of 0 | |||
spectest looks happy with GC stressing | 10:38 | ||
So hopefully that nails it | 10:39 | ||
One upshot of MVMStaticFrame having spesh stats hung off of it is that it often ends up needing to be marked every single collection | 10:42 | ||
Well, that's not quite true | 10:43 | ||
But since spesh stats sample values, it means a lot more frames end up in the inter-gen roots | |||
We thus spend 2.32% of CORE.setting compilation cycles in gc_mark of a static frame | 10:44 | ||
Since there's a load of stuff to mark | 10:45 | ||
timotimo | oh, hmm | ||
right | |||
jnthn | Static frame's gc_mark is thus called 35 million times | 10:46 | |
timotimo | the sampled values never get stale, do they? like, we don't throw out stuff that never gets used any more in the future? | 10:49 | |
jnthn | There's a throwing out mechanism | ||
If the stats weren't updated for N spesh runs then they are thrown away | |||
timotimo | that's good | ||
samcv | i create an array that has as many elements as the longest sequence of codepoints. i'll put codepoints into this array. 0, 1, 2 then wrap 0, 1, 2. so it just continuously puts codepoints in there | ||
always storing only 3 to prevent having to store too much for really long strings | 10:50 | ||
timotimo | oh so like a ring buffer? | ||
jnthn | I guess we could have an MVMSpeshStats repr | ||
samcv | then when it finds a mismatch it puts the three codepoints which might be 2, 0, 1 for example since it wraps continuously | ||
yeah | |||
and then pass that through to the collation function | |||
jnthn | Just to hang the stats off | ||
And then that'd end up in the inter-gen set | 10:51 | ||
10:51
brrt joined
|
|||
samcv | right now i do tricks so i don't have to go the whole length of the string before starting collation. since how the UCA works: primary_1, primary_2 .... primary_n, secondary_1 etc | 10:51 | |
jnthn | Alternatively, we could just keep an array of MVMSpeshStats ** in the instance and just store an index in the MVMStaticFrame | ||
samcv | is how the ordering of them go. so if you just naively did that you'd have to go the whole length of the string and that'd be no fun | ||
so need to now solve how to not have to push them starting from the very beginning of the string | 10:52 | ||
jnthn | And have the throwing away done by index | ||
And re-use indices | |||
timotimo | that'll naturally throw out old stuff | 10:53 | |
jnthn | It's a bit tricky to reason about compared to just having a spesh stats repr though | 10:55 | |
And we'd still need to make sure that if a static frame gets collected then we throw out the stats | 10:56 | ||
And it gets tricky if two threads are doing GC and want to toss stuff out and add the indices to an "unused" array | 10:58 | ||
Then we need a lock on that | |||
And urgh | |||
So I suspect the stats holder object is the way to go | |||
I wonder if we should hang *all* spesh stuff off such an object | 11:00 | ||
Tempting. | 11:01 | ||
Then for frames that are never called at all (which in a typical program is most of them) we save 24 bytes (64-bit) | 11:02 | ||
And the MVMStaticFrame itself will never be forced into the inter-gen roots list | |||
(The most of them meaning all of the CORE.setting frames we don't call) | 11:03 | ||
timotimo | that doesn't sound bad; is it problematic to install the spesh object in a threaded environment? | 11:05 | |
jnthn | I was figuring we'd allocate it on first call | 11:06 | |
When we do all the other static frame setup work | |||
Which we do under lock | |||
So it's safe | |||
It's an extra indirection to reach the arg guards is all | 11:07 | ||
But when we're paying so much in the static frame gc_mark... | |||
timotimo | OK | ||
jnthn | Yeah, I think that's a good bet | 11:10 | |
And I can move things bit by bit | |||
So, will do that. But...after lunch :) | |||
timotimo | in come lots of "body." | 11:13 | |
11:15
travis-ci joined
|
|||
travis-ci | MoarVM build passed. Jonathan Worthington 'Zero owner of a stack frame also. | 11:15 | |
travis-ci.org/MoarVM/MoarVM/builds/258502483 github.com/MoarVM/MoarVM/compare/f...c282efdae8 | |||
11:15
travis-ci left
11:59
committable6 joined,
quotable6 joined,
bloatable6 joined,
bisectable6 joined,
greppable6 joined,
evalable6 joined,
coverable6 joined,
unicodable6 joined,
benchable6 joined,
statisfiable6 joined
12:39
zakharyas joined
|
|||
Geth | MoarVM: 3f26aa9a6e | (Jonathan Worthington)++ | 11 files Add spesh data REPR; hang off static frame. All of the spesh-related fields in MVMStaticFrame will move into this in order to get better generational GC behavior (by cutting down how many times we need to mark static frames) and making static frames a bit smaller in the case they aren't used (which, for something like CORE.setting, is a lot of the time for typical programs). |
12:49 | |
MoarVM: 838f7cf70c | (Jonathan Worthington)++ | 13 files Move candidates and entries count to spesh object. |
13:09 | ||
nine | I think unused code is actually quite common even outside the setting. Think about all those times you use some module just for one of its gazillion functions. Or about loading a full DBIx::Class schema with all table definitions (and custom methods) just for inserting some entries in a single table in a script. | 13:13 | |
jnthn | Yeah, good point :) | 13:19 | |
Saving 24 bytes off each of those could quickly add up :) | |||
nine | I guess that's also true for all the generated accessor methods? | 13:20 | |
jnthn | Yup | 13:21 | |
brrt | jnthn; the spesh guard tree thingy… do we ever update that as more specializations come in? | 13:24 | |
timotimo | yeah | ||
we use the "free at safepoint" mechanism to make replacing it safe | 13:25 | ||
brrt | so, you mentioned at some point that we should JIT compile that | ||
jnthn | We update it by copying it and tweaking and then installing the new one as just a pointer update | ||
brrt | i'm guessing that at the point we are compiling a JIT frame, we at least have the current version of that tree | ||
jnthn | And yeah, safepoint thing as timo mentioned it | 13:26 | |
brrt | so we can compile a new tree into the latest JIT frame | ||
jnthn | At the moment we delete the tree after | ||
oops | |||
update the tree after | |||
brrt | hmmm | ||
jnthn | But there's no need for that | ||
It's just the way it is | |||
I can't think of a reason why we can't re-order | |||
I mean, we can't *publish* the new tree until the compilation is done | |||
But we can produce it and have it available earlier | 13:27 | ||
brrt | i'm asking mostly because if we'd allocate it separately, we need at least a whole page for it | ||
whereas if we do it in the compiled frame, we can append it after | |||
jnthn | ah | ||
True | |||
Though we lose the ability to free older ones | 13:28 | ||
But still, it's probably very little code, so... | |||
brrt | why would we lose that? | ||
jnthn | Because we still need the JITted candidate | 13:29 | |
I mean, most of the time it's a non-issue as probably there's space left over in the page we put JIT output on anyway | |||
Geth | MoarVM: 5fdfddf300 | (Jonathan Worthington)++ | 15 files Move spesh arg guards into spesh object. |
13:36 | |
MoarVM: 7fae17c96e | (Jonathan Worthington)++ | 8 files Move spesh stats to spesh object. |
13:41 | ||
MoarVM: eff1b1f35d | (Jonathan Worthington)++ | src/core/frame.c Fix typo; MasterDuke17++. |
|||
13:41
lizmat joined
|
|||
Geth | MoarVM: 1db5c5020a | (Jonathan Worthington)++ | src/spesh/candidate.c No longer force static frame into gen2 after spesh All of the spesh-related data is on the static frame spesh object instead now, so just barrier that. |
13:47 | |
jnthn | Now for a callgrind run to see if it helped :) | 13:48 | |
14:10
Voldenet joined
14:20
zakharyas joined
|
|||
jnthn | Yeah, it's pushed static frame marking way down the profile | 14:32 | |
Geth | MoarVM: 2bfee71331 | (Jonathan Worthington)++ | 2 files Break MVM_sc_get_sc into inline and slow path. This gets called incredibly often, but in the common case does very little work. Split it up into a static inline and a normal function for the slow path part, since the fast part will most probably be less insturctions than making the call. |
14:43 | |
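The fast-path/slow-path split described in that commit message can be sketched in isolation. This is a generic illustration, not the actual `MVM_sc_get_sc` code: `Context`, `Object`, `get_context` and `get_context_slow` are hypothetical names, and the cached-pointer check stands in for whatever cheap test the real fast path performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { int id; } Context;

typedef struct {
    Context *cached;   /* NULL until the context has been resolved once */
    int      key;
} Object;

static Context contexts[4] = { {0}, {1}, {2}, {3} };
static int slow_calls = 0;   /* instrumentation for the example only */

/* Slow path: an ordinary out-of-line function for the rare case. */
static Context *get_context_slow(Object *obj) {
    assert(obj != NULL);
    slow_calls++;
    obj->cached = &contexts[obj->key & 3];
    return obj->cached;
}

/* Fast path: small enough to inline at every call site, so the common
 * case costs a load and a branch rather than a function call. */
static inline Context *get_context(Object *obj) {
    return obj->cached != NULL ? obj->cached : get_context_slow(obj);
}
```

After the first call on an object, later calls take only the inline branch, and `slow_calls` stays unchanged.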
timotimo | yes, yes, we must get rid of these terrible insturctions :) | 14:48 | |
jnthn | :P | 14:51 | |
15:03
domidumont joined
15:09
brrt joined
|
|||
Geth | MoarVM: 5c67d732b1 | (Jonathan Worthington)++ | src/gc/collect.c calloc a tospace instead of memset old fromspace. Callgrind had the memset as a huge cost, and believes this to be far cheaper. Wallclock timings show it seems to be slightly faster, though not by ~5%; not all instructions are created equal, and the callgrind counts are in terms of instructions, and the outcome does show a ~5% reduction there. |
15:30 | |
MoarVM: 1d3a139bf2 | (Jonathan Worthington)++ | 4 files Avoid range check on every SC object access. By validating them as part of bytecode validation to ensure they aren't too big, and using unsigned instead of signed there and in the access to avoid the zero check. |
|||
MoarVM: MasterDuke17++ created pull request #619: Remove unnecessary variable in MVM_string_(ascii|latin1)_encode_substr |
15:46 | ||
16:38
robertle joined
|
|||
jnthn went home but his brane kept working on how we'll take spesh forward anyway... | 18:41 | ||
Basically, we cheated really hard on aliasing so far. We presumed that just because we guarded a Scalar and it contained something then it would continue to contain that. This didn't really work so we ended up with a "block it if we see a call", which is both too strong (most calls immediately decont their args) and too weak (we don't know the Scalar isn't aliased elsewhere, and being touched by another thread) | 18:43 | ||
lizmat | jnthn: a datapoint: "my @a = (^10000)>>.Str" with --profile reliably segfaults for me, with ^1000 it doesn't | ||
jnthn | lizmat: That's not really related to what I'm thinking about now :) | ||
But maybe you meant a general thing that wants a look at :) | 18:44 | ||
lizmat | true, it wasn't related to it in any way... | ||
jnthn | Ah :) | ||
lizmat | sorry to have interrupted your train of thought | ||
jnthn | OK, not sure why that'd be; could you RT it so it doesn't get lost? | ||
lizmat | will do | 18:45 | |
jnthn | Thanks | ||
Anyway, doing a more proper alias analysis is useful for cutting down on our GC. But we kinda have a problem in that almost everything in Perl 6 is a call. All operators are multi-dispatch. And we pass them scalar containers. | 18:46 | ||
Anyway, I had the idea a while back that spesh could calculate frame-level facts, and attach them to the spesh candidate | |||
Included "we always decont this straight away" for each object arg, and later richer escape info. Plus "we always return type T" | 18:47 | ||
The other thing I just realized is that our facts need to become lattices like: | 18:48 | ||
Morphic (?) | 18:49 | ||
/ .... \ | |||
Int Str | |||
\ / | |||
Unknown (?) | 18:50 | ||
Uh, crappy IRC drawing :) | |||
18:51
zakharyas joined
|
|||
jnthn | So in a case like my $i = 0; while $i < 100 { $i++ } | 18:51 | |
We can do an abstract interpretation | |||
Actually that's an awkward example, let's do | 18:52 | ||
So in a case like my $i = 0; while $i < 100 { $i = $i + 1 } | |||
So here the PHI at the entrance to the loop body has $i as containing an Int | |||
We then say "OK, if we presume that we call the $i + 1 candidate we'd be allowed to, what'd it give us?" and we know from the frame facts it's an Int | 18:53 | ||
Which then gets stored into $i | |||
Then we have to join our content model and at the join point both say Int and Int so we stay on Int | 18:54 | ||
If they instead could differ then we'd end up at Morphic | |||
Which may have impacts on other things | |||
So we'd iterate to a fixed point | |||
Which the lattices promise we'll reach | 18:55 | ||
And only then would we go and do the various transformations | |||
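The join and fixed-point iteration described above can be sketched on the four-point lattice from the drawing. This is a toy model, not spesh code: the `TypeFact` enum, `fact_join`, `loop_fixed_point` and the two transfer functions are hypothetical, and the names Morphic and Unknown follow the tentative labels in the drawing.

```c
#include <assert.h>
#include <stddef.h>

/* Unknown at the bottom, Morphic at the top, concrete types in between. */
typedef enum { FACT_UNKNOWN = 0, FACT_INT, FACT_STR, FACT_MORPHIC } TypeFact;

/* Join (least upper bound) of two facts. */
static TypeFact fact_join(TypeFact a, TypeFact b) {
    if (a == FACT_UNKNOWN) return b;
    if (b == FACT_UNKNOWN) return a;
    return a == b ? a : FACT_MORPHIC;
}

/* A transfer function models "given this fact about $i on entry to the
 * loop body, what fact holds for the value stored back into $i?" */
typedef TypeFact (*Transfer)(TypeFact in);

/* Models $i = $i + 1 where the callee's frame facts say Int + Int is Int. */
static TypeFact transfer_add_int(TypeFact in) {
    return in == FACT_INT ? FACT_INT : in;
}

/* Models a body that always stores a Str, e.g. $i = ~$i. */
static TypeFact transfer_to_str(TypeFact in) {
    (void)in;
    return FACT_STR;
}

/* Iterate the fact at the loop-head PHI until it stops changing. The fact
 * only ever moves up the lattice, and the lattice has finite height, so
 * the loop terminates. */
static TypeFact loop_fixed_point(TypeFact init, Transfer body, int *iterations) {
    assert(body != NULL);
    TypeFact at_phi = init;
    int n = 0;
    for (;;) {
        TypeFact joined = fact_join(at_phi, body(at_phi));
        n++;
        if (joined == at_phi)
            break;
        at_phi = joined;
    }
    if (iterations != NULL)
        *iterations = n;
    return at_phi;
}
```

With `transfer_add_int` and an initial fact of Int, the PHI stays at Int after one pass. With `transfer_to_str` it climbs to Morphic and stabilizes on the second pass.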
lizmat is afk again& | 18:58 | ||
jnthn | Anyways, I think that gives me enough of an idea of how to move forward that I can relax and enjoy my weekend now :P | 19:01 | |
[Coke] | \o/ | 19:08 | |
Zoffix | \o/ | 19:10 | |
timotimo | that's good | 19:27 | |
21:07
brrt joined
21:15
praisethemoon joined
|
|||
brrt | good * #moarvm | 21:16 | |
timotimo | yo brrt | 21:17 | |
how's you? | |||
brrt | happy that it's weekend | ||
how are you? | |||
timotimo | i'm okay | ||
a bit frustrated by my continuing lack of productivity | 21:18 | ||
21:22
praisethemoon joined
|
|||
[Coke] | I feel your pain. | 21:22 | |
(about, to be clear, my own productivity) | 21:23 | ||
mst | similar | 21:28 | |
too much staring at an empty editor | |||
22:02
dogbert2 joined
|
|||
dogbert2 | there's at least one bug still lurking in the shadows | 22:02 | |
timotimo | crashbug? | 22:04 | |
dogbert2 | yes | ||
read all about it :) gist.github.com/dogbert17/3cc123e3...52eee4fb3c | |||
samcv | woo my ring buffer at least seems to work. now to run tests on it | 22:12 | |
though most of the unicode tests are all really short so won't see any difference there. but it shouldn't matter if i chop off the front | |||
and also having the ringbuffer allows me to break any resulting collation ties easily. i already know which string would be greater or less than the other if broken by a codepoint tie | 22:15 | ||
atm i have it go back through the string again from start to finish in case the collation values are the same. now i can just store that value and return it if the collation values end up the same | |||
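A minimal sketch of the ring buffer idea described above, assuming a fixed capacity of 3 (the longest codepoint sequence with its own collation keys, per the discussion). `CodeRing`, `ring_push` and `ring_snapshot` are hypothetical names, not the ones used on the collation-arrays branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 3

typedef struct {
    int32_t  codes[RING_SIZE];
    uint64_t count;   /* total codepoints pushed so far */
} CodeRing;

static void ring_init(CodeRing *r) {
    assert(r != NULL);
    r->count = 0;
}

/* Store a codepoint, overwriting the oldest slot once the ring is full. */
static void ring_push(CodeRing *r, int32_t cp) {
    r->codes[r->count % RING_SIZE] = cp;
    r->count++;
}

/* Copy the retained codepoints into out, oldest first. out must have room
 * for RING_SIZE entries. Returns how many were copied. */
static uint32_t ring_snapshot(const CodeRing *r, int32_t *out) {
    assert(out != NULL);
    uint32_t n = r->count < RING_SIZE ? (uint32_t)r->count : RING_SIZE;
    uint64_t start = r->count - n;
    for (uint32_t i = 0; i < n; i++)
        out[i] = r->codes[(start + i) % RING_SIZE];
    return n;
}
```

After pushing a, b, c, d, e the slots hold d, e, c in storage order, and the snapshot unwraps them to c, d, e, which is the order a collation function would need.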
timotimo | that sounds like good news | 22:17 | |
samcv | yep :) | 22:18 | |
timotimo | dogbert2: does it trigger reliably? did you set anything up like a smaller nursery? | 22:19 | |
i think i see what's wrong | 22:21 | ||
dogbert2 | timotimo: not reliable, I had to run with a small nursery but it still doesn't trigger every time | ||
I ran with '#define MVM_NURSERY_SIZE (32768 * 8)' | 22:23 | ||
timotimo | right | ||
the spesh thread doesn't know to wait while the threadcontext gets destroyed while it's joined | 22:24 | ||
so it blindly reaches for where there used to be a thread to fetch some log data | |||
and bam | |||
samcv | anybody know if i'm wrong that when you MVM_malloc something, and then use MVM_realloc, that you can't just call MVM_free on it? | 22:26 | |
because if i had some allocated memory that had been resized with MVM_realloc, mvm would crash when i ran MVM_free | |||
timotimo | didn't you say you realloced it to 0? | 22:27 | |
samcv | i did | ||
timotimo | i'd say that the implementation treats that as if you had used free on it | ||
and another free would be explosive | |||
samcv | well not quite | ||
Geth | MoarVM/collation-arrays: 8 commits pushed by (Samantha McVey)++
|
||
samcv | timotimo, github.com/MoarVM/MoarVM/blob/coll....c#L62-L68 this is what i do now | 22:28 | |
after reading the documentation for realloc more closely | |||
timotimo | huh, that's strange? | ||
samcv | it says it may return NULL or a pointer suitable to be freed | 22:29 | |
wondering if we do this other places in moarvm where we use realloc? | |||
timotimo | but why do you realloc to 0 in the first place? | ||
samcv | because otherwise it crashes moarvm | ||
timotimo | o_O | 22:30 | |
samcv | yep | ||
timotimo | valgrind should tell you why exactly, i'd expect | ||
samcv | it prints out a big stacktrace | ||
timotimo | like it'll tell you if it had already been freed, or if it's not a pointer that was returned by any *alloc | 22:31 | |
it looks like we never actually root an MVMThreadContext | 22:39 | ||
we clearly just use the status to tell when it may be destroyed | |||
dogbert2 | aha | 22:46 | |
timotimo | i'm not sure what the right step forwards is | 22:48 | |
but jnthn will know :) | 22:49 | ||
Geth | MoarVM: b161c85143 | (Timo Paulssen)++ | src/profiler/instrument.c be safe about marking NULL call graph nodes |
22:50 | |
22:55
praisethemoon joined
|
|||
timotimo | dogbert2: want to open a RT about the join issue? | 22:56 | |
something along the lines of "the spesh worker doesn't cope with a tc being freed while it's working on data contributed by it" | 22:57 | ||
jnthn | timotimo: Is that profiler patch for the profiler bug lizmat mentioned earlier? | 23:10 | |
samcv | jnthn, can i make a proposal for MVM_string_compare? | 23:21 | |
jnthn | samcv: sure | 23:22 | |
samcv | currently it isn't predictable when synthetics are involved. we don't need to expand most synthetics, but only one. i.e. if the different grapheme between the two is a synthetic, we should return based on codepoint instead of the synthetic number | ||
all other synthetics we encounter just get left aloe | |||
*alone | |||
so shouldn't slow it down except a very tiny amount | 23:23 | ||
jnthn | Well, it's predictable if you consider it a codepoint level operation... :) | 23:25 | |
oh wait | |||
It ain't | |||
samcv | well. if synthetics get added in a different order then it will have a different response | ||
yep | |||
jnthn | It's scanning graphemes | ||
d'oh | |||
Yeah, it isn't doing what I thought it was doing | |||
I dunno, maybe the easiest way to do it is with codepoint iters... | 23:26 | ||
samcv | yeah. we save time not doing a codepoint iterator, which is good. it's not needed except if the deciding codepoint is a synthetic | ||
then it matters | |||
jnthn | Or the spiritual equivalent | ||
Yeah | |||
samcv | codepoint iterator won't be as fast and isn't needed | ||
jnthn | Well, it depends :) | ||
samcv | we can do a graphemecodepoint iterator the thing i just created a few days back | ||
but only if one of the deciding codepoints is a synthetic | |||
jnthn | Codepoint iter is faster if you've strandy stuff going on | ||
samcv | wait why? | 23:27 | |
jnthn | 'cus it doesn't have to re-scan the strands to figure out which one has the grapheme you want | ||
samcv | codepoint iterators are more work | ||
since it has to iterate each grapheme | |||
and we *don't* want to iterate by codepoint | |||
we want to iterate by grapheme and only compare one that differs | 23:28 | ||
otherwise the strings will become unaligned | |||
if one grapheme expands more than another | |||
oh wait | |||
that makes no sense | |||
hah | |||
jnthn | I don't think that could ever cause a different result | ||
:) | |||
samcv | well it won't become misaligned | ||
jnthn | 'cus we return as soon as we see the difference | ||
samcv | but it will have to expand graphemes for no reason | ||
jnthn | But yeah, agree | 23:29 | |
samcv | and codepoint iterator is just a grapheme + codepoint iterator. it iterates both | ||
jnthn | My point wasn't so much about codepoint iter so much as about grapheme iterator and codepoint iterator in general | ||
timotimo | jnthn: i stumbled upon a null deref and fixed it, but forgot to push for hours. i don't know if it also fixes what liz found | ||
jnthn | Which is that MVM_string_get_grapheme_at_nocheck is cheap when it is just a buffer string and you can fixed index. | ||
timotimo | damn, without the patch it doesn't crash either | 23:30 | |
samcv | oh | ||
ah | |||
jnthn | But when you've a string of strands it might have to look through 10 strands to see their lengths to know that the grapheme it wants is actually in the 11th one, for example | ||
samcv | yeah | ||
good point. so we should use a grapheme iterator probably | |||
jnthn | Yeah | ||
I think what I wanted to say is that we probably want the semantics we'd get if we used codepoint iter | 23:31 | ||
Not that using codepoint iter was the best way to implement it | |||
samcv | oh. wait how does codepoint iter have different semantics to grapheme iterator? | ||
jnthn | Though maybe what you're suggesting is subtly different | ||
samcv | the codepoint iterator is just a grapheme iterator but if it's synthetic iterates by codepoint then once done with that synthetic grabs next from the grapheme iterator | 23:32 | |
jnthn | iiuc, you're saying that when we see a synthetic, then we'll look at the codepoints it would consist of | ||
samcv | yeah | ||
jnthn | Which is I think going to give about the same results, but faster, as if we just used the codepoint iter | ||
timotimo | it was a bit strange to me that the title said "crash with --profile" but the examples in the body didn't have --profile | ||
samcv | so we don't waste time expanding synthetics that are going to be equal | ||
yeah | |||
jnthn | Yeah, I think we're agreeing :) | 23:33 | |
samcv | cool :-) | ||
jnthn | I fully agree the semantics today are weird. I didn't realize it did that. | ||
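The comparison agreed on above can be sketched as follows: walk both strings grapheme by grapheme, and only when two graphemes differ expand them to codepoints to decide the order. This is a toy model, not `MVM_string_compare`: it assumes synthetics are negative grapheme values indexing a table, and the table, the codepoint values and all function names are hypothetical. The tie-break on codepoint count follows samcv's later suggestion in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const int32_t *codes;
    size_t         num_codes;
} Synthetic;

/* Hypothetical table: grapheme -1 is synths[0], -2 is synths[1], and so on.
 * The values are illustrative only. */
static const int32_t syn0[] = { 0x1EB9, 0x300 };
static const int32_t syn1[] = { 0x1EB9, 0x301 };
static const int32_t syn2[] = { 0x1EB9, 0x300, 0x308 };
static const Synthetic synths[] = { { syn0, 2 }, { syn1, 2 }, { syn2, 3 } };
#define NUM_SYNTHS (sizeof(synths) / sizeof(synths[0]))

/* Point *codes at the codepoints making up grapheme *g; return how many. */
static size_t expand(const int32_t *g, const int32_t **codes) {
    if (*g >= 0) {           /* an ordinary codepoint stands for itself */
        *codes = g;
        return 1;
    }
    size_t idx = (size_t)(-(*g) - 1);
    assert(idx < NUM_SYNTHS);
    *codes = synths[idx].codes;
    return synths[idx].num_codes;
}

/* Returns -1, 0 or 1. Equal graphemes are never expanded. */
static int compare_graphemes(const int32_t *a, size_t alen,
                             const int32_t *b, size_t blen) {
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == b[i])
            continue;
        const int32_t *ca, *cb;
        size_t na = expand(&a[i], &ca);
        size_t nb = expand(&b[i], &cb);
        size_t m = na < nb ? na : nb;
        for (size_t j = 0; j < m; j++)
            if (ca[j] != cb[j])
                return ca[j] < cb[j] ? -1 : 1;
        if (na != nb)        /* one is a prefix of the other: shorter sorts first */
            return na < nb ? -1 : 1;
    }
    return alen < blen ? -1 : alen > blen ? 1 : 0;
}
```

In this table grapheme -1 is numerically greater than -2, yet it sorts first because its second codepoint is smaller. The result therefore no longer depends on the order in which synthetics happened to be created.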
Geth | MoarVM/collation-arrays: 5fc8dce423 | (Samantha McVey)++ | src/strings/unicode_ops.c Implement a ring buffer so only different codepoints need processing The ring buffers hold the exact number of codepoints which comprise the longest sequence of codepoints which map to its own collation keys in the Unicode Collation Algorithm. As of Unicode 9.0 this number was 3. This also allows us to return based on codepoint based on the determination we found using the ring buffer |
23:36 | |
samcv | also we should make sure to use MVMuint64 or MVMint64 for every "i" we use iterating a string. since if the string is the 2**32 long, then i will roll back to 0. at least i assume | 23:38 | |
or maybe i'm wrong | |||
timotimo | what the hell, it crashes with that exact number, but not with a higher number | 23:39 | |
but with the patch cherry-picked it doesn't crash immediately | |||
jnthn | samcv: MVMuint32 would be safe since the grapheme count is currently a 32-bit number | 23:40 | |
samcv: We could decide we want to support strings of more than 4GB I guess... :) | 23:41 | ||
samcv | 32bit signed? | ||
timotimo | didn't someone just upgrade that for us? | ||
jnthn | I thought it was unsigned | ||
samcv | so it can be max 2**31 ? | ||
yeah it is unsigned | |||
jnthn | But we index from 0 so a for loop up to the int max value is safe? | 23:42 | |
samcv | for (i = 0; i < str_len; i++) { } ah you're right ok so | ||
i think that should be fine | |||
since i would == str_len and then it'd end the loop | |||
jnthn | We could go 64-bit | ||
Yeah | |||
samcv | but str_len = 2**32 -1 then? | ||
jnthn | Don't know anybody has hit the limit yet though :) | ||
Yeah | 23:43 | ||
samcv | i can do 'a' x 2 ** 32 - 2 | ||
but not - 1 | |||
jnthn | But when doing things like finding the lenght of a join/concat we should really use 64-bit | ||
*length | 23:44 | ||
samcv | yeah | ||
jnthn | Though if we used 64-bit everywhere we're fine. It's just such a ridiculously big number ;) | ||
samcv | jnthn, i'm also going to make it so if all the codepoints in the synthetic match, it goes based on which has the most codepoints | 23:50 | |
(in that synthetic) | 23:51 | ||
timotimo | jnthn: got a hot take for "tc gets freed because thread_join but spesh is still working on stuff submitted from it"? | ||
samcv | since that could totally happen | ||
will come in useful that you can use MVM_string_grapheme_ci on non-synthetics | 23:53 | ||
i think i used that as well to greatly simplify the code so it wouldn't have to have a huge number of branches | 23:54 |