[00:00] *** kjp left
[00:00] *** kjp joined
[06:09] *** japhb left
[06:09] *** japhb joined
[07:33] *** [TuxCM] joined
[08:08] *** [TuxCM] left
[09:21] *** finanalyst joined
[10:27] *** finanalyst left
[12:19] *** [TuxCM] joined
[13:30] *** [TuxCM] left
[14:19] *** [TuxCM] joined
[15:46] *** [TuxCM] left
[15:56] *** finanalyst joined
[15:59] *** [TuxCM] joined
[16:08] *** [TuxCM] left
[17:08] *** sjm_ joined
[17:54] *** finanalyst left
[21:09] *** MasterDuke joined
[21:10] <MasterDuke> i'm planning to (finally) get Text::Diff::Sift4 into the fez/zef ecosystem. as part of that move, i thought i'd try to un-nqpify it, hoping that rakudo has gotten faster enough that it's not necessary anymore

[21:11] <MasterDuke> i've gotten it down to only 3x slower (with nqp ops being used in only one specific way)

[21:12] <MasterDuke> but i'm not seeing many more things to try to get that last bit

[21:15] <MasterDuke> here are the two versions if anybody is interested: https://gist.github.com/MasterDuke17/d8403e036ab778bcaa47310dcdd89b56

[21:16] <ugexe> maybe predeclare my Bool $isTrans

[21:18] <MasterDuke> i did try that, seems to make a small-but-noticeable negative difference

[21:20] <MasterDuke> yep, my rough benchmark went from ~1.46s to ~1.8s

[21:22] *** finanalyst joined
[21:22] <MasterDuke> i just re-measured the nqp version to make sure i was using the same benchmark, and turns out it's only 2x faster (i.e., ~0.75s for the more nqp version, ~01.46 for the less nqp version)

[21:23] <MasterDuke> a profile just says all the time is spent in the main `while` loop, nothing else really stands out

[21:25] <MasterDuke> i was originally using a hash for offsets in the raku version, but switch to a class was a decent improvement

[21:28] <ugexe> is it under the inline limit? potentially you could split it up more to do so

[21:29] <ugexe> also thats a bit unintuitive that predeclare slows it down but not impossible

[21:30] <MasterDuke> yeah, wasn't expecting that at all

[21:30] <MasterDuke> i haven't looked at a spesh log in a while, i'll take a look

[21:30] <MasterDuke> but afk for a bit

[21:43] <ugexe> you could technically calculate one of the ords once, the nqp::ordat($s2, $c2)

[21:43] <ugexe> instead of twice

[21:45] *** MasterDuke left
[21:49] <ugexe> $s1.chars / $s2.chars would also be faster although obviously not important in the overall performance of the function

[21:54] <ugexe> ($x + $y).abs will be slightly faster than abs($x + $y)

[22:07] <ugexe> i suspect it might be faster to do `my Bool $isTrans;` instead of `my Bool $isTrans = False;`

[22:11] <ugexe> more algorithmically i wonder if the splice could be moved outside the inner loop so it isn't called as much

[22:36] *** finanalyst left
[22:51] *** MasterDuke joined
[22:53] <MasterDuke> ugexe: the method form of a bunch of those things (e.g., `abs`, `chars`) is slower. i believe it's because the sub form accepts natives, but the method form has to box the native it's called upon

[22:53] <MasterDuke> m: my int $a; for ^10_000_000 -> int $i { $a = abs($i + $a) }; say now - INIT now; say $a;

[22:53] <camelia> rakudo-moar b527cefcb: OUTPUT: «0.066115068␤49999995000000␤»

[22:54] <MasterDuke> m: my int $a; for ^10_000_000 -> int $i { $a = ($i + $a).abs }; say now - INIT now; say $a;

[22:54] <camelia> rakudo-moar b527cefcb: OUTPUT: «2.012868178␤49999995000000␤»

[22:57] <ugexe> ah

[23:09] <MasterDuke> with some of the recommendations i'm now at ~1.36s

[23:10] <MasterDuke> actually closer to ~1.30s

[23:14] <MasterDuke> ugexe: and i'm not sure it's safe to cache `nqp::ordat($s2, $c2)`, since i believe `$c2` can change between the first and second call

[23:14] <timo> in theory, we can eliminate the boxing in spesh, but i'm not up to date on the exact circumstances that may prevent that

[23:15] <MasterDuke> timo: well, even if nothing prevents it, it wouldn't happen until spesh decided to optimize the call, right?

[23:21] <MasterDuke> afk for a bit again

[23:30] <timo> yes, true. the call also has to be inlined, or it won't work

[23:30] <timo> unless the static optimizer happens to already have done it? do we actually do that?

[23:40] *** MasterDuke left
