00:29 MasterDuke joined 00:39 quotable6 joined, tangible6 joined, releasable6 joined, coverable6 joined, nativecallable6 joined, bloatable6 joined, committable6 joined, bisectable6 joined, unicodable6 joined, greppable6 joined, statisfiable6 joined, benchable6 joined, evalable6 joined, squashable6 joined 01:00 lizmat joined 01:35 evalable6 joined 02:05 MasterDuke_ joined 02:40 lizmat joined 02:55 ilbot3 joined 03:41 evalable6 joined 09:29 lizmat joined 09:51 domidumont joined 09:56 domidumont joined 12:16 robertle joined 12:59 benchable6 joined
timotimo it occurs to me that if we know the types of our p6 bigints, i.e. Int, we could spesh the "get_boxed_ref" part out of all the bigint ops 13:10
it's not responsible for a very large chunk, but it's big enough for my tastes 13:14
16 billion incl for the whole MVM_bigint_add and 702 million incl for get_boxed_ref
of course when we have our jit be very smart about bigint calculations, we might compile a path that doesn't even allocate Int objects at all? 13:15
jnthn With EA we can at least "allocate" them on the stack 13:18
timotimo right 13:19
in this benchmark a significant chunk of time is spent in mp_set_long :\ 13:20
6 billion Ir out of the 16 13:21
and out of those 6 billion, 5 billion are in mp_mul_2d
bbl 13:22
13:29 evalable6 joined
nine So a MVM_SPESH_ANN_DEOPT_INLINE annotation is really just a "the inlined code had _any_ kind of deopt annotation" marker? 13:38
That means the annotation of the original code could have been a MVM_SPESH_ANN_DEOPT_ALL_INS, i.e. the goto carrying the annotation used to be an invoke_*. In that case the real case to look at is the MVM_SPESH_ANN_DEOPT_ALL_INS one, i.e. entering an inlined BB. 13:44
dogbert17 doc question: what would be a good, one line, description of the Encoding class. Currently we don't have any 14:13
nine Another question: does it really matter what I do? A no_op should already be much better than a goto and will be turned into literally nothing by the JIT. So trying to get rid of it doesn't sound like useful use of my time. 14:18
lizmat nine: could it be that Inline::Python still depends on Panda? 14:34
nine lizmat: I wouldn't be surprised 14:35
MasterDuke nine: don't you only get failures in a very few spots? how hard is it to benchmark no_op vs removed entirely? 14:36
jnthn nine: Well, it's kicking whatever deeper problem exists here down the road for you or somebody else to solve, I guess... 14:37
nine jnthn: and the point of the exercise is actually for me to learn about inlining in spesh. And the no_op trick doesn't work in all cases anyway... 14:38
jnthn Ah, if it doesn't work in all cases, it's less attractive.
nine jnthn: what really scares me is that you meant this to be a kind of beginner exercise as an entry for me. How hard will my actual goal be then?? 14:39
Geth MoarVM/inline_in_place: 4b25ee8603 | (Timo Paulssen)++ (committed by Stefan Seifert) | 4 files
Put inlined blocks between their caller and its succ

Previously inlined callees were added to the end of the basic block list. We now put the inlined blocks into the list at the position of the invoke op. However we cannot yet get rid of the goto ops entering and exiting the inlined code as that would lead to odd bugs possibly related to the annotations on these ops.
MoarVM/inline_in_place: b4b8c5765e | (Stefan Seifert)++ | src/spesh/optimize.c
Turn inline_end annotated pointless gotos into no_ops

We can't remove them completely as that causes weird deopt issues. But turning them into no_ops should give almost the same benefits without the cost.
nine I pushed it anyway as it documents the findings in a way.
jnthn You'd not be alone in putting things off for later in spesh. A good while back I marked lastexpayload :noinline to avoid having to debug the issues inlining it brought up. Only in the last week or so did I fix one, golf another into something in the JIT that brrt++ fixed, and there's still one more issue to hunt down. 14:41
And the issues rarely show up in nice, simple, isolated examples. 14:42
As to how hard things are - this tends to hinge on whether in the course of the task, something turns up that makes an in-theory straightforward change expose some other problem, and potentially an existing bug. 14:47
Which I guess is the issue here
bbiab 14:48
14:48 evalable6 joined
timotimo so anyway, this for loop code, it passes the 32bit boundary if you set $n to 2047 (surely a conicidence!!), so the default of 10⁴ is spending the vast majority of its time doing big int math for under 64 bit numbers 15:14
er, it passes the boundary if you set it to 2048, 2047 is the last one below the boundary 15:17
15:45 zakharyas joined 15:50 zakharyas joined 16:22 evalable6 joined 16:35 domidumont joined
dogbert17 added some, possibly interesting, info to the thread on github.com/rakudo/rakudo/issues/1202 16:46
20:40 Ven joined 21:00 MasterDuke joined
dogbert17 interesting, stresstesting on a 64 bit VM uncovers problems I don't see on my 32 bit VM 21:03
here's one example, gist.github.com/dogbert17/cebea492...1a183813fc 21:07
21:13 Ven joined 22:11 MasterDuke joined 22:30 unicodable6 joined