disbot2 <librasteve> yeah, I get that this is not at the Raku level … hard-core FP folks will fail Raku for that (rightly) … really, Raku(do) is a GC language with a functional attitude. Which is a fine thing btw, since the GC will recover the memory anyway if you do very complex recursive things with, e.g., multiple objects. 11:48
timo without tail-call optimization, recursive code keeps accumulating stack frames that hold on to references to objects, so the GC can reclaim neither the memory of those objects nor the memory of the stack frames themselves 11:58
with a language that has procedural features like loops, the lack of tail-call optimization is not such a big issue 12:21
in a language where you have to loop by doing a call to the loop body from the end of the loop body, you've got a problem when there isn't a tail call optimization that can throw away the stack frames from previous iterations 12:22
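A small C sketch of the loop-via-recursion problem from the last two messages, with made-up function names: `sum_rec` "loops" by calling itself at its own end, and since C does not guarantee tail-call elimination, each step may keep a frame (and whatever it references) alive until the recursion unwinds; `sum_loop` does the same work in one reused frame.

```c
/* "Looping" by recursion: the recursive call is in tail position, but C
   does not guarantee tail-call elimination, so without it every step keeps
   its own stack frame alive until n reaches 0 (O(n) stack). */
unsigned long long sum_rec(unsigned long long n, unsigned long long acc) {
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n);
}

/* The same computation with a procedural loop: a single frame is reused
   for every iteration (O(1) stack), whether or not the compiler does TCO. */
unsigned long long sum_loop(unsigned long long n) {
    unsigned long long acc = 0;
    for (; n > 0; n--)
        acc += n;
    return acc;
}
```

For example, `sum_rec(10000, 0)` and `sum_loop(10000)` both return 50005000, but only the loop version is guaranteed to run in constant stack space.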
in a language where you have a syntactic difference between loops and recursion you may be very surprised when your compiler or runtime decides that your call was actually an optimizable tail call and suddenly your stack traces have what looks like unexplainable holes in them 12:24
C is going to get explicit tail calls with a syntax "return goto" (or "goto return"?) where you explicitly opt into the semantics of a tail call where your previous stack frames disappear 12:25
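As far as I know `return goto` is still at the proposal stage, so it can't be compiled today. This sketch assumes a compiler-specific stand-in instead: the `musttail` statement attribute on a `return`, which clang has supported since version 13 and which, if I'm not mistaken, GCC added in version 15. The `MUSTTAIL` macro and the even/odd functions are hypothetical illustrations, not part of any proposal.

```c
#include <stdbool.h>

/* Use the musttail statement attribute when the compiler advertises it.
   It turns "this call might be optimized into a jump" into "this call must
   be a jump (or compilation fails)", so the caller's frame is discarded. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* unsupported: plain call, TCO left to the optimizer */
#endif

bool is_odd(unsigned n);

/* Mutually recursive functions with identical signatures (musttail requires
   matching caller/callee types); each hands off to the other in tail position. */
bool is_even(unsigned n) {
    if (n == 0)
        return true;
    MUSTTAIL return is_odd(n - 1);
}

bool is_odd(unsigned n) {
    if (n == 0)
        return false;
    MUSTTAIL return is_even(n - 1);
}
```

With the attribute available the call chain should use constant stack however large `n` is; without it the results are the same, but stack depth grows with `n` unless the optimizer happens to turn the calls into jumps.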
now the interpreter loop, especially with computed goto, is already very similar to what the tail-call version of the interpreter looks like, but the important bit is apparently that the compiler has a much easier time optimizing the little snippets than one huge function with a boatload of internal labels and jumps in it 12:28
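For readers who haven't seen computed goto: a toy version of the "one huge function with labels" style, for a hypothetical four-opcode stack machine (not MoarVM's instruction set). It relies on the labels-as-values extension that GCC and clang provide (`&&label`, `goto *ptr`), and it does no bounds checking.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Computed-goto dispatch: one function, one label per opcode, and every
   handler jumps straight to the next handler via a table of label addresses. */
long run_cgoto(const long *code) {
    static void *const labels[] = {
        [OP_PUSH] = &&do_push, [OP_ADD] = &&do_add,
        [OP_MUL]  = &&do_mul,  [OP_HALT] = &&do_halt,
    };
    long stack[64];          /* no overflow checks: toy example */
    size_t sp = 0;           /* number of values on the stack */
    const long *pc = code;   /* points at the next opcode or operand */

#define DISPATCH() goto *labels[*pc++]
    DISPATCH();

do_push:
    stack[sp++] = *pc++;     /* the operand follows the opcode */
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
#undef DISPATCH
    return stack[sp - 1];
}
```

Running it on `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` computes (2 + 3) * 4 = 20.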
with some versions of some compilers, interp.c already takes by far the longest of any file to compile
like, a ridiculous amount of time spent compiling just that file in some circumstances
though i think there used to be a compiler bug (maybe it was clang?) that made it especially bad, which was presumably fixed some time ago? maybe i'm misremembering 12:29
disbot2 <librasteve> thanks for the detailed explanation… I’m going to study it carefully 13:29
timo sure, do feel free to ask follow-up questions, too 13:33
i tend to write these explanations in the hopes of other readers of the channel also benefitting even if they don't speak up, so more questions from readers - especially those not intimately familiar with the details yet - are also useful 13:34
disbot2 <librasteve> =b
disbot2 <librasteve> So my initial view that the GC recovers the memory anyway is incorrect (recursion uses stack memory; the GC recovers heap memory). That said, since Raku has procedural loop syntax (unlike pure FP languages, I infer), it does not suffer the stack bloat that tail-call optimization solves. All that is at the Raku level, nothing to do with MoarVM internals. 14:12
timo yeah, that sounds about right 14:18
turning the interpreter loop from the "big function with labels that we goto to" into "a bunch of functions that tail-call into each other" has given good performance improvements in other projects such as CPython 14:22
primarily by allowing the C compiler to do a better job
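A toy sketch of the tail-calling style, for a hypothetical four-opcode stack machine: each opcode gets its own small function, and every handler ends by tail-calling the next opcode's handler through a table, so the compiler can optimize each handler on its own. It uses the `musttail` attribute where the compiler advertises it (clang, and GCC 15 if I'm not mistaken) and otherwise falls back to ordinary calls, which only stay shallow if the optimizer turns them into jumps. CPython's real implementation is generated and much more involved; every name here is made up.

```c
#include <stddef.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* unsupported: plain call */
#endif

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

typedef struct { long stack[64]; size_t sp; } VM;   /* no overflow checks */
typedef long handler_fn(VM *vm, const long *pc);    /* one signature for all */

static handler_fn op_push, op_add, op_mul, op_halt;
static handler_fn *const handlers[] = {
    [OP_PUSH] = op_push, [OP_ADD] = op_add,
    [OP_MUL]  = op_mul,  [OP_HALT] = op_halt,
};

/* Fetch the opcode at pc and tail-call its handler with pc past it. */
#define DISPATCH(vm, pc) MUSTTAIL return handlers[*(pc)]((vm), (pc) + 1)

static long op_push(VM *vm, const long *pc) {
    vm->stack[vm->sp++] = *pc;    /* pc points at the operand */
    DISPATCH(vm, pc + 1);
}

static long op_add(VM *vm, const long *pc) {
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    DISPATCH(vm, pc);
}

static long op_mul(VM *vm, const long *pc) {
    vm->sp--;
    vm->stack[vm->sp - 1] *= vm->stack[vm->sp];
    DISPATCH(vm, pc);
}

static long op_halt(VM *vm, const long *pc) {
    (void)pc;
    return vm->stack[vm->sp - 1];
}

long run_tailcall(const long *code) {
    VM vm = { .sp = 0 };
    return handlers[*code](&vm, code + 1);
}
```

On `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` this returns 20, i.e. (2 + 3) * 4.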
Geth MoarVM/tail_call_interpreter: 5d286163f6 | (Timo Paulssen)++ | 4 files
Split MVM_CGOTO into "has feature" and "use for interp"

so that we can have interp.c as musttail and program.c as Computed Goto
14:31
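A sketch of the idea behind splitting the macro, with hypothetical macro names (the commit message doesn't show MoarVM's real ones): one set of macros only records what the compiler can do, and separate per-file policy macros decide what each file actually uses, so the interpreter can pick musttail while another file (program.c in the commit) keeps computed goto.

```c
/* Capability macros: what the compiler supports (hypothetical names). */
#if defined(__GNUC__)            /* GCC and clang provide labels as values */
#  define HAS_CGOTO 1
#else
#  define HAS_CGOTO 0
#endif

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define HAS_MUSTTAIL 1
#  endif
#endif
#ifndef HAS_MUSTTAIL
#  define HAS_MUSTTAIL 0
#endif

/* Policy macros: what each file chooses to use. The interpreter prefers
   musttail and only falls back to computed goto; the other file simply uses
   computed goto whenever the compiler has it. */
#define INTERP_USE_MUSTTAIL HAS_MUSTTAIL
#define INTERP_USE_CGOTO    (HAS_CGOTO && !INTERP_USE_MUSTTAIL)
#define PROGRAM_USE_CGOTO   HAS_CGOTO

enum dispatch { DISPATCH_SWITCH, DISPATCH_CGOTO, DISPATCH_MUSTTAIL };

enum dispatch interp_dispatch(void) {
#if INTERP_USE_MUSTTAIL
    return DISPATCH_MUSTTAIL;
#elif INTERP_USE_CGOTO
    return DISPATCH_CGOTO;
#else
    return DISPATCH_SWITCH;
#endif
}

enum dispatch program_dispatch(void) {
#if PROGRAM_USE_CGOTO
    return DISPATCH_CGOTO;
#else
    return DISPATCH_SWITCH;
#endif
}
```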
timo lizmat: can you test if this commit ^ gives you a performance improvement over main now? 22:39
Geth MoarVM/spesh_str_eq: b2983533ea | (Timo Paulssen)++ | 10 files
Spesh-optimize string comparisons against spesh-time-known strings

Put the length and cached hash code of the constant string into the op as arguments and use that to quickly reject strings that don't match.
In case the constant string is 8 characters or shorter and the graphemes fit into 8bit storage, we drop the usage of the constant string's register completely and instead store the content as a 64bit integer constant in the op arguments.
23:34
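A simplified C sketch of the fast path described in that commit, under loudly stated assumptions: strings are plain 8-bit byte arrays (so the "graphemes fit into 8 bits" condition always holds), every string's hash is already cached, and FNV-1a stands in for whatever hash MoarVM really uses. `Str`, `ConstEq` and the functions are hypothetical names. The point is the order of the checks (length, then hash, then content), and that a constant of at most 8 bytes can be compared as one 64-bit integer without keeping its original storage around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string: 8-bit characters plus a cached hash. */
typedef struct { const unsigned char *data; size_t len; uint64_t hash; } Str;

/* FNV-1a, used here only as an illustrative hash. */
uint64_t fnv1a(const unsigned char *p, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

Str make_str(const char *s) {
    Str r = { (const unsigned char *)s, strlen(s), 0 };
    r.hash = fnv1a(r.data, r.len);
    return r;
}

/* What the optimizer would bake into the op once the constant is known. */
typedef struct {
    size_t len;
    uint64_t hash;
    bool packed;                /* content stored in `bits` (len <= 8) */
    uint64_t bits;              /* up to 8 bytes packed into one integer */
    const unsigned char *data;  /* only needed when !packed */
} ConstEq;

static uint64_t pack8(const unsigned char *p, size_t len) {
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

ConstEq prepare_const_eq(Str c) {
    ConstEq k = { c.len, c.hash, c.len <= 8, 0, c.data };
    if (k.packed) {
        k.bits = pack8(c.data, c.len);
        k.data = NULL;          /* the constant's storage is no longer used */
    }
    return k;
}

bool const_eq(const ConstEq *k, Str s) {
    if (s.len != k->len) return false;    /* cheap reject: length */
    if (s.hash != k->hash) return false;  /* cheap reject: cached hash */
    if (k->packed) return pack8(s.data, s.len) == k->bits;
    return memcmp(s.data, k->data, s.len) == 0;
}
```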
timo 44.774 +- 0.647 seconds time elapsed --> 43.949 +- 0.206 23:43
not sure it's a decisive improvement
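A quick back-of-the-envelope check on those two numbers (they look like `perf stat -r` output), assuming the ± figures are independent one-sigma uncertainties:

```latex
\begin{aligned}
\Delta &= 44.774 - 43.949 = 0.825\ \text{s} \quad (\approx 1.8\%\ \text{of the baseline}) \\
\sigma_\Delta &= \sqrt{0.647^2 + 0.206^2} = \sqrt{0.461} \approx 0.679\ \text{s} \\
\Delta / \sigma_\Delta &\approx 1.2
\end{aligned}
```

A difference of roughly 1.2 combined standard deviations is suggestive but well short of decisive, consistent with the "not sure" assessment.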