Geth Lingua-Number/main: e356b64237 | (Elizabeth Mattijsen)++ | 14 files
Prepare for release in zef ecosystem using mi6

Alas, the original tests are failing, so it's **not** ready yet to be actually released
Lingua-Number/main: f578aec4ec | (Elizabeth Mattijsen)++ | .travis.yml
Not doing Travis anymore
lizmat These timings feel counter-intuitive: 10:32
m: sub a(str $a) { $a ~ $a }; a("foo") for ^10000000; say now - ENTER now
camelia 0.775085136
lizmat m: my constant &a = -> str $a { $a ~ $a }; a("foo") for ^10000000; say now - ENTER now
camelia 2.8015152
lizmat almost 4x as slow?
slightly more performant: 10:33
m: my constant &a = { $_ ~ $_ }; a("foo") for ^10000000; say now - ENTER now
camelia 2.302535727
lizmat but still... ? 10:34
I wonder whether that's a dispatch deficiency
also remarkable: 10:36
m: my constant &a = { .lc }; a("foo") for ^10000000; say now - ENTER now
camelia 0.56984057
lizmat m: my constant &a = -> $_ { .lc }; a("foo") for ^10000000; say now - ENTER now 10:37
camelia 0.695784301
lizmat m: my constant &a = *.lc ; a("foo") for ^10000000; say now - ENTER now
camelia 0.577406613
lizmat another weirdo: 11:06
m: my str $a = "a"; my &b = * ~ $a; say b "b"
camelia ba
lizmat m: my str $a = "a"; my constant &b = * ~ $a; say b "b"
camelia Lexical with name '$a' has wrong type. real type 8 wanted type 7
in block <unit> at <tmp> line 1
nine lizmat: look at the generated QAST 11:09
Pointy block looks up $_ in outer lex. I bet that's what makes it slow 11:10
finanalyst lizmat: sorry to distract you. I sent an email with a glitch. No problem if you don't want to be distracted 11:15
lizmat nine: but if $_ is in the sig, why would it need to lookup in the outer lex? 11:17
finanalyst: looking :-)
ab5tract m: Q| my str $a = "a"; my &b = * ~ $a; say b "b"|.AST.EVAL 11:20
camelia ba
ab5tract Ooof
lizmat m: Q| my str $a = "a"; my constant &b = * ~ $a; say b "b"|.AST.EVAL 11:21
camelia ===SORRY!===
Unknown compilation input 'qast'
ab5tract lizmat: did you already file a bug report for that one? Because that needs to be dug into for sure 11:22
lizmat ab5tract: don't bother with this until nine has merged the BEGIN work branch
ab5tract Ok :)
lizmat ab5tract: no, didn't do any issues yet
Geth rakudo/main: 46511d59cb | (Elizabeth Mattijsen)++ | src/core.c/RakuAST/LegacyPodify.rakumod
RakuAST: fix podifying issue spotted by finanalyst++
ab5tract m: my constant &a = { .lc }; a("foo") for ^10000000; say now - ENTER now
camelia 0.588587184
finanalyst Thank you
lizmat finanalyst: you're welcome: if all bugs were these little ones :-) 11:25
nine lizmat: which branch? 11:26
lizmat the branch jnthn started and you worked on since ? 11:27
the one that started with 0 tests passing and now something like 7 files still failing ? 11:28
nine ah, yeah 11:32
Geth rakudo/lizmat-Block.WhateverCode: 7b89fb2153 | (Elizabeth Mattijsen)++ | src/core.c/Block.rakumod
Introduce a Block.WhateverCode coercer

This is really just a quick implementation of an idea wrt performance. Since WhateverCodes cannot have phasers, they can be treated differently in .map / for loops taking a simpler path. However, from the grammar it is impossible to create slightly more complex WhateverCodes, even though they are actually quite simple, e.g.: { $_ ~ $_ } .
rakudo: lizmat++ created pull request #5596:
Introduce a Block.WhateverCode coercer
ab5tract Do you reckon that RakuAST will allow us to easily bypass laziness in the case of .IO.lines.elems? 14:14
And - importantly - if so, how?
Asking because right now it reportedly segfaults or otherwise dies at 4 million lines or fewer 14:16
lizmat ab5tract: I don't see how we could produce .elems without reading the whole file? 16:19
ab5tract But presumably the lazy version of reading the whole file is much slower than the eager one? 16:20
Either way, we can’t actually do the thing at the moment
lizmat .IO.slurp.lines.elems tends to be a little bit faster, if the file is not like humongous 16:22
ab5tract But what about when the file is humongous? Why are we unable to do the same thing that q:x( wc -l ) can easily accomplish? 16:24
My impression was that it was the overhead of lazy evaluation. Would be happy to learn I’m wrong and know how to proceed in addressing it 16:28
lizmat NFG
ab5tract So we are doomed to be incapable? 16:29
lizmat to mimic wc -l, we could use .read().indices("\n".ord) ? 16:30
thus bypassing NFG ?
ab5tract Excellent! So that circles back to my original question: is taking that approach potentially unlocked by RakuAST? 16:33
I guess I’m asking: what does a RakuAST-based optimizer look like? 16:34
lizmat that's a good question I'm not sure yet
it all depends on how much we want to keep of the current static optimize stage 16:35
some timings on a 810 MB text file: 16:41
say "sixteenth.txt".IO.slurp.lines.elems # 17.7 seconds
say "sixteenth.txt".IO.lines.elems # 38.65 seconds 16:42
wc -l sixteenth.txt # 0.86 seconds 16:43
and apparently we don't have an .indices on Buf :-(
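The byte-counting idea above (mimicking wc -l by reading raw bytes so that no decoding or NFG normalization happens) could be sketched roughly as follows. This is only an illustration: `count-newlines` and its `:chunk-size` parameter are hypothetical names, not part of Rakudo, and the per-chunk `grep` is not tuned for speed.

```raku
# Hypothetical helper, not part of Rakudo: count "\n" bytes in a file
# without decoding it to Str, so NFG normalization is never involved.
sub count-newlines(IO() $path, Int :$chunk-size = 1_048_576) {
    my $fh = $path.open(:bin);              # binary mode: .read returns a Buf
    my int $count = 0;
    loop {
        my $buf = $fh.read($chunk-size);    # empty Buf at end of file
        last unless $buf.elems;
        # 10 is "\n".ord; count the bytes in this chunk that match it
        $count += $buf.list.grep(* == 10).elems;
    }
    $fh.close;
    $count
}
```

Like wc -l, this counts newline bytes, so a final line without a trailing newline is not counted.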
ab5tract Even if the optimizer did something outrageous and used Q:x to call wc, it would be a massive win… 16:55
nine: do you have any envisionings re: what a RakuAST-based optimizer would look like, or when it would be appropriate to break ground on one? 16:58
lizmat except that wc is not a thing on Windows ? 17:36
ab5tract then presumably we would fall back or use an alternative… 17:49
Shelling out to wc is an outrageous approach anyway, but presumably feasible for a RakuAST based optimizer 17:51
lizmat indeed 17:57
nine Laziness is a feature, not a bug. That's even more true for things like .IO.lines.elems. You absolutely don't want to read the whole file into memory first, then construct an array with a string for each line just to count the array's elements. wc doesn't do that either. 18:01
Even if all of that were not the case, the static optimizer couldn't do anything about this, unless it could prove that the object you call .IO.lines.elems on is just a string or at least a type with an IO method of which we know that it will return an IO::Path from the setting. 18:02
Actually the problem seems to be that we are not lazy enough 18:05
ab5tract I’m not talking about doing the wrong-headed thing. I’m asking whether we can ever unlock doing the right-headed thing 18:12
nine Just make GetLineFast a PredictiveIterator and give it a count-only method 18:13
lizmat :-) 18:14
nine Or have Seq.elems *not* cache the sequence
lizmat yeah, I tried that once...
nine That seems weird anyway. A Seq is supposed to be read only once
lizmat well, the amount of code that breaks is impressive if you do that :-) 18:15
nine The amount of code that *is* broken 18:16
lizmat the problem is really that a lot of code expects core methods (that return a Seq) to cache stuff 18:17
also Bool
unless the iterator is a PredictiveIterator, calling .Bool on a Seq will turn on caching 18:18
if $seq { say "start"; .say for $seq }
nine That just looks like a broken pattern 18:28
lizmat if foo -> $seq { .say for $seq } 18:29
fwiw, I *think* we could actually handle the Bool case better: by replacing the iterator in a Seq by a wrapper iterator that would first produce the first value, and then replace itself by the original iterator on subsequent fetches 18:30
nine Well for .Bool Seq would only have to cache the very first value. But I really don't see the point of .elems caching. If you call .elems on a Seq it's pretty clear that it must be consumed. 18:38
lizmat to us it is, to many Rakoons it isn't :-( 18:39
nine Then that's just one of the things they have to learn. I am not aware of any other languages where iterators try to cater to people who don't understand what an iterator is. You learn the concept once and then you can reap the benefits. 18:40
ab5tract Weren’t iterators introduced years after Seq? 18:41
lizmat here's the roast fallout of making Seq.elems *not* cache 18:53
and that's just roast
pretty sure the ecosystem fallout would be much bigger
ugexe would it be possible to know when e.g. ecosystem code is used in a way to suggest it needs the sequence to be cached and to throw a warning? 19:06
in other words: is it possible for rakudo to tell me if i'm doing this in any of my code?
lizmat in RakuAST we might be able to 19:13
[Tux] Rakudo v2024.05-27-g46511d59c (v6.d) on MoarVM 2024.05-5-gf48abb710
csv-ip5xs          0.270 - 0.271
csv-ip5xs-20       1.134 - 1.208
csv-parser         1.555 - 1.639
csv-test-xs-20     0.142 - 0.142
test               1.950 - 1.958
test-t             0.425 - 0.433
test-t --race      0.273 - 0.277
test-t-20          5.091 - 5.214
test-t-20 --race   1.234 - 1.249
tux.nl/Talks/CSV6/speed4-20.html / tux.nl/Talks/CSV6/speed4.html tux.nl/Talks/CSV6/speed.log
ab5tract If caching a sequence is so wrong, why was it implemented in the first place? 19:45
lizmat the answer is the PositionalBindFailover role 19:46
to make $seq[42] work
nine Well if you're missing a .cache we already tell you. After all it's missing when you actually do try to read the Seq again. The error contains "(you might solve this by adding .cache on usages of the $kind_name, or by assigning the $kind_name into an array)" 19:50
ugexe if that was catching the aforementioned uses then it wouldn't break the roast or ecosystem 20:02
because they would have already been giving that error 20:03
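For reference, the "read only once" behaviour and the .cache remedy mentioned above can be seen in a minimal sketch like the one below. It assumes a Seq that has not been cached by some earlier call, which is exactly the case the discussion is about; the variable names are illustrative only.

```raku
# A Seq is a one-shot iterator: the first full pass consumes it.
my \once = (1..3).map(* * 2);
my @first = once;                      # consumes the underlying iterator

# A second pass over the same uncached Seq throws; capture the error.
try { my @second = once }
my $err = $!;
note "second pass failed: {$err.message}" if $err;

# With .cache the values are remembered, so repeated passes work.
my \kept = (1..3).map(* * 2).cache;
my @a = kept;
my @b = kept;
say @a.join(",") eq @b.join(",");
```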
