🦋 Welcome to the IRC channel of the core developers of the Raku Programming Language (raku.org #rakulang). This channel is logged for the purpose of history keeping about its development | evalbot usage: 'm: say 3;' or /msg camelia m: ... | log inspection situation still under development | For MoarVM see #moarvm
Set by lizmat on 22 May 2021.
00:03 reportable6 left
Geth ¦ rakudo: coke self-assigned 42 as sum of cubes - doesn't work on Windows github.com/rakudo/rakudo/issues/3176 00:25
¦ rakudo: coke self-assigned The Str.succ method on 1, ¹, ⒈ and ① works differently github.com/rakudo/rakudo/issues/3379 00:28
¦ rakudo: coke self-assigned Raku repl cant't correctly decode `π` or any other no-acsii chars github.com/rakudo/rakudo/issues/4540 00:51
01:03 linkable6 left, reportable6 joined 03:04 linkable6 joined 03:31 evalable6 joined 04:15 codesections5 joined 04:20 [Tux] left, codesections left, codesections5 is now known as codesections 04:22 MasterDuke left 04:23 Kaipi joined 04:24 Kaiepi left, Altai-man joined 04:27 bisectable6_ joined, committable6_ joined 04:28 sourceable6_ joined, sena_kun left, committable6 left, bisectable6 left, sourceable6 left, rypervenche left, releasable6 left, notable6 left, ugexe left, [Tux] joined 04:29 ugexe joined, jdv_ left, jdv joined, rypervenche joined 04:32 SmokeMachine_ joined, shareable6_ joined 04:33 unicodable6_ joined, coverable6 left, SmokeMachine left, shareable6 left, unicodable6 left, SmokeMachine_ is now known as SmokeMachine, tbrowder left, tbrowder joined 04:41 nativecallable6 left, nativecallable6 joined 04:47 gfldex_ joined, gfldex left 04:48 statisfiable6 left, statisfiable6 joined 04:49 benchable6 left, benchable6 joined 04:50 squashable6_ joined 04:51 squashable6 left 05:01 bloatable6_ joined 05:03 bloatable6 left 05:25 releasable6 joined 05:27 coverable6 joined 05:28 notable6 joined 06:02 reportable6 left 06:05 reportable6 joined
Geth problem-solving/JJ-patch-1: a17ae4d921 | (Juan Julián Merelo Guervós)++ (committed using GitHub Web editor) | solutions/meta/TheRakuFoundation.md
Addressing @vrurg comments
06:26
07:22 MasterDuke joined 08:31 lizmat_ joined 09:06 lizmat_ left, Geth joined 09:07 TempIRCLogger joined 09:08 gfldex_ is now known as gfldex 09:29 lizmat_ joined, Geth left 09:30 Geth joined, TempIRCLogger__ joined 09:31 RakuIRCLogger left, lizmat left, TempIRCLogger left 09:32 lizmat_ left, lizmat joined 10:53 linkable6 left, evalable6 left, evalable6 joined 11:55 linkable6 joined 12:03 reportable6 left 12:04 reportable6 joined 12:37 jgaz joined 13:02 jgaz left 13:49 jgaz joined 14:21 jgaz left 18:02 reportable6 left 18:06 japhb left 18:09 japhb joined
timo interpreted with the rakudo compiler using the best optimization level. 19:16
$ raku --optimize=0 prime.rk
^- ???
github.com/PlummersSoftwareLLC/Pri...1/prime.rk
MasterDuke ha 19:20
timo uh
it is actually faster with --optimize=0
with a "my uint8 @!Bits" and using 0 and 1 as True and False respectively, it's almost 10x faster 19:24
25 passes instead of just 3
MasterDuke oh damn 19:25
timo setting optimize to 0 there gives just 16 passes
now i'm at 41 passes 19:27
80 19:28
MasterDuke sounds like a rakudo issue is needed 19:29
japhb timo: I'm confused. What are you changing between different runs?
timo well, first i went from "my @!Bits is default(True)" to "my int8 @!Bits" and creating it to size in BUILD 19:30
then i made more and more variables native ints. also, trying uint first, which is slower than int, so changing it to int
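The change timo describes, a flat native array of 0/1 byte flags sized up front instead of an array of boxed Bools with a default value, maps onto the classic byte-array sieve. A minimal sketch in Python for illustration (the trick itself is language-independent; the function name `sieve` is ours, not from the discussed solution):

```python
def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes over a flat byte array of 0/1 flags."""
    # Allocate to full size up front, everything marked "prime" (1),
    # instead of growing a boolean array lazily with a default value.
    bits = bytearray(b"\x01" * (limit + 1))
    bits[0:2] = b"\x00\x00"                  # 0 and 1 are not prime
    f = 2
    while f * f <= limit:
        if bits[f]:
            # null out every multiple of f, starting at f*f
            count = (limit - f * f) // f + 1
            bits[f * f :: f] = b"\x00" * count
        f += 1
    return [i for i, flag in enumerate(bits) if flag]
```

Keeping the flags as plain 0/1 bytes makes every write an unboxed store, which is roughly the effect timo gets from `my uint8 @!Bits` versus `is default(True)`.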
japhb I'm weirded out by uint being slower than int (I can imagine UInt being slower than Int, but not the native versions) 19:31
timo i guess we have uint/int conversions all over the place 19:32
we're not exactly smart about that yet
japhb timo: I've also found that we don't handle the final bit correctly for uint64 -- I had to special case in a few places in CBOR::Simple to work around that. 19:34
And when I read through the MoarVM JIT code, it certainly looks like we're playing fast and loose with the difference between int64 and uint64
timo perhaps we're clipping it off somewhere? you'll want to inspect the mbc
indeed
plummerssoftwarellc.github.io/Prim...mp;sd=True 19:36
plummerssoftwarellc.github.io/Prim...mp;sd=True 19:37
MasterDuke sounds like you're well on your way to moving it up that list 19:40
timo interesting, a lisp implementation beats even the fastest C++ and D implementations
japhb Wow, PrimeView is quite possibly the most overengineered tool for that task I've ever seen 19:41
timo well, we're faster than one Clojure implementation ...
i could send this as a pull request and describe it as an improvement over the solution by draco1006 19:43
MasterDuke have you looked at a profile or spesh log to figure out what's up with the --optimize=0 being faster?
timo not yet 19:44
especially since the benefit goes away at higher speeds 19:45
it's also allowed to have multithreaded implementations, would have to read up on what that entails, exactly 19:48
lizmat for reference: it looks like test-t is *not* slower with --optimize=0 19:49
timo it looks like the base rules require either that i use 1 for prime or to generate an array of prime numbers
japhb lizmat: Meaning it's the same speed?
lizmat within noise, yes 19:50
japhb timo: Can you use a buf8, and .allocate all bytes to 1?
lizmat: Huh. That's kinda odd
lizmat I'd even say it is tending to be a tad faster with --optimize=0 19:51
MasterDuke what about with --optimize=off ?
lizmat stage Optimize: 0.009 secs 19:52
then it tends to be a bit slower
timo the dockerfile for that solution uses the 2021.04 rakudo star image 19:53
i would suggest that, since it uses no dependencies at all, we instead give it a base raku image
MasterDuke yeah
timo what's our fastest way to initialize a native int array to all 1s? 19:56
japhb m: my $buf = buf8.allocate(1_000_000, 1); say now - INIT now; 19:57
camelia 0.014089853
japhb I *think* that's near the top
timo okay buf instead of int @foo 19:58
m: say buf8.allocate(1_000_000, 1);
camelia Buf[uint8]:0x<01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 …
japhb Yeah, Blob has 2-arg allocate, intarray only has 1-arg allocate 19:59
lizmat intarray has allocate ?
m: my int @a; @a.allocate(1000) 20:00
camelia No such method 'allocate' for invocant of type 'array[int]'
in block <unit> at <tmp> line 1
timo i'm down to 20 runs now
japhb github.com/rakudo/rakudo/blob/mast...uf.pm6#L82
Is that better or worse?
timo worse, but i'm not allowed to submit the faster one because its values are inverted 20:01
japhb Ah, interesting
timo somehow i'm getting very strange profiler output 20:02
japhb lizmat: Sorry, CArray was the other one with allocate, braino there
lizmat well, perhaps native arrays should have an allocate 20:03
timo looks like the "last" does it 20:04
20:04 reportable6 joined
timo unfortunately my code allocates a whole boatload of IntLexRef, and so the garbage collector runs a whole lot 20:04
only 6% of total time, but of course setting up and using the IntLexRef objects costs something 20:05
japhb lizmat: I'm not entirely clear on the intended differences between the APIs of Blob, array, and CArray. (I understand that CArray and VMArray are different REPRs, I'm talking purely about the differences in Raku-level API) 20:06
lizmat I wonder whether IntLexRefs could be prevented by a similar trick that is being used for scalar parameters, aka bind directly ? 20:08
japhb In other words, I can't tell what differences are accidents of history and which are intentional design
lizmat I think the differences are just that: accidents of history
japhb Feels like fixing that wants a tracking issue then 20:10
timo now i would love it if my nullOut method, the one that goes through from beginning to end and sets all multiples of one number to 0, wouldn't allocate IntLexRef objects 20:27
MasterDuke what if you do the trick of creating a variable with the value of 0 and then just assigning that? 20:28
timo using nqp::bindpos_i directly instead of @!Bits.ASSIGN-POS, which is just a null-check against the index followed by a bindpos_i, i go from 20 to 90 20:29
lizmat I have a feeling we could do that null-check in a special dispatch program 20:30
timo hm, potentially using the "guard not equal" after a number check, but how do we emit the number arithmetic there 20:31
lizmat I wouldn't know :-) 20:32
timo using the nqp op got gc runs down to 25 from 10x that amount 20:35
for 90 passes through in the 5 seconds it runs, allocate now takes 1 second, or 1.25 ms per entry. the size we allocate is one million 20:38
so here's a silly idea 20:39
japhb Re: nqp::ops and GC runs -- that's one of the reasons that both JSON::Fast and CBOR::Simple use a non-trivial amount of nqp. 20:40
timo m: my buf8 $a .= new; $a[0, 1, 2, 3, 4] = 1; $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); say $a.elems; say $a.pick(100);
camelia Cannot unbox a type object (Nil) to int.
in block <unit> at <tmp> line 1
timo m: my buf8 $a .= new; $a[0, 1, 2, 3, 4] >>=>> 1; $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); say $a.elems; say $a.pick(100); 20:41
camelia 5===SORRY!5=== Error while compiling <tmp>
Missing << or >>
at <tmp>:1
------> 3y buf8 $a .= new; $a[0, 1, 2, 3, 4] >>=>7⏏5> 1; $a.splice($a.elems, 0, $a); $a.spli
expecting any of:
infix
infix stopper
timo m: my buf8 $a .= new; $a[0, 1, 2, 3, 4] »=» 1; $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); say $a.elems; say $a.pick(100);
camelia 5120
(1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)
timo m: my buf8 $a .= new; $a[0, 1, 2, 3, 4] »=» 1; $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); $a.splice($a.elems, 0, $a); say $a.elems; say now - INIT now
camelia 5120
0.006869058
timo m: my buf8 $a .= buf8.allocate(5120, 1); say $a.elems; say now - INIT now 20:42
camelia No such method 'buf8' for invocant of type 'Buf[uint8]'. Did you mean
'Buf'?
in block <unit> at <tmp> line 1
timo m: my buf8 $a = buf8.allocate(5120, 1); say $a.elems; say now - INIT now
camelia 5120
0.002985832
timo hm. not very good like this, is it
lizmat nope :-)
japhb timo: You mean your idea of self-splicing?
MasterDuke couldn't allocate just memset a large chunk of memory?
timo perhaps it starts being better at bigger sizes
MasterDuke: we don't have a memset op :D 20:43
japhb timo: Although that would be nice, yes ....
MasterDuke not yet we don't...
timo doesn't even have to be an op now that we have the syscalls mechanism
lizmat but that would be MoarVM specific, no? 20:44
timo yes
i wonder how to get much better than the current implementation of allocate 20:45
perhaps loop unrolling?
it's using lexicals for $i, which isn't optimal 20:47
japhb nqp::while?
timo github.com/rakudo/rakudo/blob/mast...uf.pm6#L82
japhb Also, avoiding $i++? 20:48
My fast copy loops look like this: github.com/japhb/CBOR-Simple/blob/...kumod#L407 (or in the 8-bit case, the slightly simplified github.com/japhb/CBOR-Simple/blob/...umod#L427) 20:49
timo param_rp_i r8(1), liti16(2) # [000] bailed argument spesh: expected arg flag 2 to be int or box an int; type at position was IntLexRef
the loop that it turns into is pretty tight, were it not for the usage of lexicals 20:52
getlex, const_i64_16, add_i, bindlex, getlex, lt_i, unless_i, getlex, bindpos, goto 20:54
lizmat PRs to make buf8.allocate faster are welcome :-) 20:58
japhb There's something sublime about thinking 'Meh, 14ns/element is TOO DANG SLOW' 20:59
timo if we get it into moarvm and get it to actually use memset in case we have an 8 bit buffer, we can reach more than that, right? because of vector operations 21:01
japhb (I mean, I don't disagree. But my high-school self trying to speed up 8086 code is grinning from ear to ear.)
Oh certainly. Heck, I'm pretty sure we can do slightly better without even editing MoarVM 21:02
timo if we use 1 bit per element instead of 8 bit per element, that would give us a lot less memory we have to comb through, even though we'll have to do bit-level fiddling 21:04
21:04 linkable6 left, evalable6 left
timo if i'm not mistaken, we'll at most have to have 8 masks no matter what $factor is (for nulling out bits at $factor distance starting at 0) 21:04
21:05 linkable6 joined
timo another 2x can be saved by not even storing what even elements would have 21:05
(my at most 8 masks actually assumes we want to work with every bit, not just every odd one)
OTOH then we still have to make sure the return value is according to the rules 21:06
gotta have a look at some of the faster implementations if they have any tricks
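The 1-bit-per-element layout under discussion can be illustrated as follows, again in Python with helper names of our choosing. This shows only the basic mask mechanics per read and write, not the precomputed-masks optimization timo mentions:

```python
def sieve_bits(limit: int) -> list[int]:
    """Bit-packed sieve: 1 bit per candidate instead of 1 byte."""
    nbytes = (limit + 8) // 8                # enough bytes for bits 0..limit
    bits = bytearray(b"\xff" * nbytes)       # all candidates start as "prime"

    def clear(i: int) -> None:
        bits[i >> 3] &= ~(1 << (i & 7))      # mask depends only on i mod 8

    def test(i: int) -> int:
        return bits[i >> 3] & (1 << (i & 7))

    clear(0)
    clear(1)
    f = 2
    while f * f <= limit:
        if test(f):
            for m in range(f * f, limit + 1, f):
                clear(m)
        f += 1
    return [i for i in range(limit + 1) if test(i)]
```

Since the mask for index i depends only on i mod 8, clearing every f-th bit cycles through a small fixed set of masks, which is the observation behind timo's "at most 8 masks" remark.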
here's a python implementation, it sets zeroes with self._bits[start :: step] = b"\x00" * (size // step + bool(size % step)) 21:09
and self._bits = bytearray(b"\x01") * ((self._size + 1) // 2) 21:10
so they are only saving odd elements anyway?
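The odd-only storage used by the quoted Python solution (index i stands for the number 2*i + 1, so even numbers cost no memory at all) can be sketched like this; note that stepping f slots in the half-size array automatically skips the even multiples of f, because consecutive odd multiples of f differ by 2f:

```python
def sieve_odds(limit: int) -> list[int]:
    """Sieve storing flags for odd numbers only: index i means 2*i + 1."""
    size = (limit + 1) // 2                 # flags for 1, 3, 5, ...
    bits = bytearray(b"\x01" * size)
    bits[0] = 0                             # 1 is not prime
    f = 3
    while f * f <= limit:
        if bits[f // 2]:
            start = f * f // 2              # index of f*f (odd, since f is odd)
            count = len(range(start, size, f))
            bits[start::f] = b"\x00" * count
        f += 2
    return [2] + [2 * i + 1 for i in range(size) if bits[i]]
```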
hold up, they return a Seq that iterates primes, they don't return an array of primes at all! 21:11
that is allowed? 21:12
then i can go back to not having to allocate at all, you know 21:18
japhb Returning an iterator and returning an allocated value are wildly different performance-wise. If that's how other languages are jumping ahead, that's definitely apples and oranges. 21:24
timo github.com/PlummersSoftwareLLC/Pri...issues/630 - relevant to this question, but i didn't read all of it yet and gotta go afk for a bit first 21:25
"the contributing document clearly states that the solutions can return a pre-sieved sieve buffer as one of the options (just that the sieving must occur on every pass of the timed loop as is done here)," 21:26
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $elems, int $value) { my $blob := nqp::setelems(nqp::create(self),$elems); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i, 1)), $elems), nqp::bindpos_i($blob,$i,$value)) } }; my $t0 = now; B8.allocate(1_000_000, 1); my $t1 = now; B8.allo(1_000_000, 1); my $t2 = now; .say for $t1-$t0, $t2-$t1; 21:27
camelia 0.010042925
0.0084734
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $elems, int $value) { my $blob := nqp::setelems(nqp::create(self),$elems); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i, 1)), $elems), nqp::bindpos_i($blob,$i,$value)) } }; my $t0 = now; B8.allocate(10_000_000, 1); my $t1 = now; B8.allo(10_000_000, 1); my $t2 = now; .say for $t1-$t0,
camelia 0.098266638
japhb $t2-$t1;
GAH 21:28
m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allocate(10_000_000, 1); my $t1 = now; B8.allo(10_000_000, 1); my $t2 = now; .say for $t1-$t0, $t2-$t1; 21:29
camelia 0.097264776
0.081980577
japhb Not like a doubling of speed, but at least 2-digit %age 21:33
timo don't forget to swap them around to make sure warmup and such aren't distorting the results
MasterDuke i would time those two in two separate invocations
timo in favor of the second impl 21:34
yeah, or that
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allocate(10_000_000, 1); my $t1 = now; .say for $t1-$t0;
camelia 0.094383851
timo probably have the same code for both cases so compilation doesn't make a difference either. just have like an if/else that decides at run time which one to pick
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allo(10_000_000, 1); my $t1 = now; .say for $t1-$t0;
camelia 0.087945616
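The benchmarking hygiene being applied here, timing each variant in its own invocation and trying both orders so warmup effects don't favor whichever ran second, looks like this with Python's `timeit` (the two fill strategies below are stand-ins for illustration, not the Raku code under test):

```python
import timeit

def fill_bulk(n: int) -> bytearray:
    return bytearray(b"\x01" * n)        # one bulk copy, like a 2-arg allocate

def fill_loop(n: int) -> bytearray:
    buf = bytearray(n)
    for i in range(n):                   # one interpreted write per element
        buf[i] = 1
    return buf

n = 100_000
# repeat and take the min to cut scheduler noise; timing each variant in
# its own call means neither one "warms up" the process for the other.
t_bulk = min(timeit.repeat(lambda: fill_bulk(n), number=20, repeat=5))
t_loop = min(timeit.repeat(lambda: fill_loop(n), number=20, repeat=5))
print(f"bulk: {t_bulk:.4f}s  loop: {t_loop:.4f}s")
```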
timo if we can take this speedup to the core, so be it :) :) 21:35
japhb Yeah, I think there's still a valid difference there. 21:36
timo can you compare the speeds at different array sizes as well?
japhb timo: Effect is there for both 1_000_000 and 10_000_000; much smaller is getting down to noise levels; how much bigger are you thinking of? 21:38
s/Effect/Improvement/
m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allocate(100_000_000, 1); my $t1 = now; .say for $t1-$t0; 21:40
camelia 0.877247417
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allo(100_000_000, 1); my $t1 = now; .say for $t1-$t0;
camelia 0.833379933
22:05 linkable6 left
timo i was thinking smaller; can a measurable difference be teased out by repeating it a hundred times or whatever? 22:05
22:06 linkable6 joined 22:07 evalable6 joined
timo if we have a new, faster loop template, this could be spread all over the core setting 22:12
95 with japhb's allo method 22:51
did i do something wrong? lmao
19 when i go back to [] instead of ASSIGN-POS and AT-POS 22:53
MasterDuke [] isn't being inlined?
timo i think it was 23:02
let me look again
ok, [] is inlined, but ASSIGN-POS isn't inlined 23:06
# [012] could not inline 'ASSIGN-POS' (4081) candidate 0: bytecode is too large to inline 23:19
670 whole bytes 23:20
there's three dispatches in there that have all never been dispatched, that's all from the bounds checking and exception throwing in the error case 23:22
it also contains a dynamic variable lookup to give something more interesting than "Index" in some cases
i'm trying a private method that creates the failure now, see if that makes the bytecode small enough 23:28
# [013] inline-preventing instruction: param_rp_o 23:34
# [012] could not inline 'ASSIGN-POS' (4082) candidate 0: target has a :noinline instruction
*sigh* lmao
param_rp_i r3(1), liti16(1) # [000] bailed argument spesh: expected arg flag 1 to be int or box an int; type at position was IntLexRef 23:36
^- this most probably caused that
would be great to have more control over when refs are passed, especially when we're in the core and calling other stuff from core 23:41
japhb timo: I had to deal with some $RL stuff (BAK now), but there is another variant of the loop I wanted to try, I'll post in a couple minutes. 23:46
timo d'oh, the frame is small enough now of course, but the argument passed to ASSIGN-POS was the issue that prevented inlining also 23:47
japhb Ah dang, the other method I wanted to try is slower, sigh 23:51
m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t0 = now; B8.allo(100_000_000, 1); my $t1 = now; .say for $t1-$t0; 23:55
camelia 0.832321031
japhb Dang it, not what I meant to do
m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t = now; B8.allo(100_000, 1) for ^1_000; say now - $t; 23:56
camelia 0.854373476
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t = now; B8.allo(1_000, 1) for ^100_000; say now - $t;
camelia 0.782978554
japhb m: use nqp; class B8 does Buf[uint8] is repr('VMArray') is array_type(uint8) { method allo(Blob:U: int $e, int $v) { my $b := nqp::setelems(nqp::create(self),$e); my int $i = -1; nqp::while(nqp::islt_i(($i = nqp::add_i($i,1)),$e), nqp::bindpos_i($b,$i,$v)) } }; my $t = now; B8.allocate(1_000, 1) for ^100_000; say now - $t;
camelia 0.898699618
japhb timo: ^^ # Yep, difference between my tweak and core's is larger for looping over smaller allocations. 23:57
23:57 squashable6_ left
timo cool 23:58
cool 23:59
23:59 squashable6 joined