jnthn | Well, the alternative would be that they're mutable and we have to enforce locking on every single time a string is even just read. :) | 00:42 | |
02:20
mtj_ joined
02:56
ilbot3 joined
03:02
unicodable6 joined
03:34
colomon joined
05:09
MasterDuke joined
|
|||
MasterDuke | are there plans to make grammars/regexes/etc work on strands? | 05:10 | |
Geth | MoarVM/master: 4 commits pushed by MasterDuke17++, (Samantha McVey)++ | 05:28 | |
samcv | MasterDuke: well i mean they can work on strands if you just remove nqp::optimizedindexing or edit moarvm to always return the same string when you call that op | ||
it's gotten much closer now than it used to be since all the speedups i did during my grant | 05:29 | ||
i know that it was still slower when running a regex on something but i'm curious if you concat then do a regex on a string and then concat again and then regex again it may be faster not to flatten | |||
07:03
geospeck joined
07:51
geospeck joined
08:02
geospeck joined
|
|||
nine | jnthn: yes, I've seen the link and read deopt.c but other than the comment I can't find anything in there that's actually specific to gotos. At least nothing that I'd understand as such. | 09:10 | |
09:25
japhb joined
10:48
domidumont joined
10:50
lizmat joined
10:54
domidumont joined
10:58
domidumont joined
11:40
robertle joined
13:20
AlexDaniel joined
|
|||
MasterDuke | samcv: i don't think you can just the nqp::indexingoptimized here github.com/perl6/nqp/blob/master/s...r.nqp#L417 | 13:56 | |
i'm pretty sure jnthn has said (that at least for now), flattening is required for regexes | |||
it just causes lots of memory use in the case of something like this: | 13:57 | ||
c: HEAD use snapper; my $a = "a" x 1_000_000; for ^1_000 { $a ~~ /./ } | |||
committable6 | MasterDuke, gist.github.com/8d626a2023014dad63...6800940570 | ||
MasterDuke | because it creates a new flattened $a every iteration | 13:58 | |
timotimo | it's not required as in "it won't work if we don't do it", it'll just probably have a severe performance impact | 15:04 | |
jnthn also points out that strands are an *optimization* that saves memory in a bunch of cases; the non-stand behavior is the baseline. | 15:06 | ||
*non-strand | 15:07 | ||
MasterDuke | right, i'm trying to see if strands can be used in more cases | 15:08 | |
jnthn | Yeah, but I think regexes are a place where it's a pretty good assumption that we don't want that. We want to index as fast as possible. | 15:12 | |
15:13
tangible6 joined
|
|||
MasterDuke | very true, but this case (though artificial), isn't all that fast and it spends a lot of its time mallocing and grapheme iterating the strand into the flat buffer | 15:15 | |
m: my $a = "a" x 1_000_000; for ^1_000 { $a ~~ /./ }; say now - INIT now | |||
camelia | 3.3505512 | ||
MasterDuke | m: use nqp; my $a = "a" x 1_000_000; my $b = nqp::indexingoptimized($a); for ^1_000 { $b ~~ /./ }; say now - INIT now | 15:16 | |
camelia | 0.02798563 | ||
jnthn | That benchmark is artificial in a number of ways, yes. | 15:17 | |
It'd be unusual to apply a regex to the same string a lot of times for one | 15:18 | ||
MasterDuke | hm. any way to make the grapheme iterator smarter? when it sees a strand with repetitions, is there a faster way to create the flat string? | ||
jnthn | And I struggle to think of a real-world situation where one would use the x operator and then do a regex match. | ||
Quite possibly, yes | 15:19 | ||
I mean, if we were to be creating an 8-bit string, then it is just a memset :) | |||
MasterDuke | heh, was just going to ask if was as simple as that | ||
jnthn | We could detect this case in indexingoptimized I guess | 15:20 | |
15:20
lizmat joined
|
|||
jnthn | Though it feels like we're doing it for the sake of an artificial benchmark :) | 15:20 | |
otoh, that's how many people measure performance... | |||
MasterDuke | and it's practically a philosophy question whether there are any non-artificial benchmarks... | 15:21 | |
jnthn | This is true | ||
And it's not like benchmarks that meausre features or small numbers of features in isolation don't have some value. This one just feels like it's testing a combination that I'm struggling to see how up in a real program :) | 15:23 | ||
*show | |||
timotimo | jnthn: when you regex match a unary number to figure out if it's a prime :) | 15:24 | |
MasterDuke | i could imagine something more like: my $a = $some-variable-string x $some-variable-count; for @list-of-regexes -> $r { die "this is still kind of artificial" if $a ~~ $r } | 15:26 | |
hm. to collape strands, would it likely be faster in general to just loop over each strand, memsetting or memcpying as required, with a re_nfg at the end, instead of using the grapheme iterator? | 15:42 | ||
15:57
domidumont joined
16:11
zakharyas joined
17:08
MasterDuke joined
17:48
geospeck joined
18:18
geospeck joined
|
|||
lizmat | m: say $*THREAD.app_lifetime # shouldn't that need to be True ? | 18:22 | |
camelia | False | ||
lizmat | not that it would technically make a difference, but informationally that would be more correct, no ? | 18:31 | |
samcv | timotimo: MasterDuke not flattening has like 2x less performance impact compared to before for most of the ops | 18:32 | |
since my changes during my grant | |||
i believe the thing that made me not decide to do it was the performance of it with INTERPOLATE or something i forget | |||
for my string benchmarks i usually will take a long book like tolsky and then cut it into 8ths and then concatenate them together | 18:34 | ||
after altering moarvm so indexingoptimized doesn't actually flatten and compare to normally | |||
timotimo | can you run a comparison between a regular string and that same string but with "(" in front and ")" in the back as a string with three strands? i.e. we'll always be hitting the same strand effectively, but we'll still have to do a bit of calculation each time we index into the whole string? | 18:36 | |
samcv | just 3 strands? yeah i can do that | 18:37 | |
i mean i made it faster for moving to the right index point for strands as well as making functions use grapheme iterator instead of grapheme_at, and made grapheme_at faster by speeding up the seeking | 18:38 | ||
and many other changes, but let me see if i remembered correctly if it was interpolated things that were slow that made me think it wasn't a good idea to change it | |||
we could also not flatten if it's only one strand that's repeated | 18:39 | ||
timotimo | i'd also be interested to see how regexes that have to go back and forth a bunch will perform | 18:44 | |
if you have a cached grapheme iterator, going forwards is really fast, but going backwards isn't | |||
timotimo afk for a few hours | 18:46 | ||
samcv | still much faster than it used to be :) but yeah i'm curious too | 18:47 | |
i still haven't gotten it to be able to go backward with just a move_to maybe something worth thinking about after some benchmarks. timotimo if you're still there what would be the best way to make something that has to go backwards | 18:48 | ||
timotimo | you could have lookaheads and lookbehinds | 18:52 | |
though i think one of these reverses the string and matches "forwards" anyway | |||
lizmat | which reminds me of @a.reverse using an iterator that starts at the end and goes back | 18:57 | |
without actually reversing anything | |||
not sure how applicable that would be here | |||
samcv | someone else may be better to write a regex that will for sure backtrack or something idk. i mean not all look behinds have to reverse? or do they | 18:59 | |
19:15
zakharyas joined
19:17
dogbert17 joined
|
|||
jnthn | lizmat: app_lifetime being False for the main thread is actually correct, in that the meaning of app_lifetime is "doesn't block termination" | 19:23 | |
lizmat: And the main thread decidedly *does* block that | |||
lizmat | ok, reverted, | 19:25 | |
really afk now& | |||
jnthn | Thanks | ||
The VM exits when the only threads left are app_lifetime | |||
21:48
ggoebel joined
22:44
evalable6 joined
23:36
MasterDuke joined
23:40
MasterDuke_ joined
|