Geth MoarVM/atomic_accesses_to_spesh_arg_guard: 681eb074bf | (Timo Paulssen)++ | 8 files
Make accesses to spesh arg guard atomic

This is an experiment to make MoarVM freezing on ARM Debian machines a thing of the past.
The theory is that other cores see the pointer to the new arg guard tree before they see all the data inside it, ... (6 more lines)
14:35
timo just a line i missed, force-pushed
lizmat so, what to think of a piece of code that fails on Windows, but succeeds on Linux and MacOS 15:16
and the way to make it succeed on Windows is to remove "constant" from 'my constant %foo = ....'
github.com/lizmat/Text-Emoji/actio...5595871721 15:18
timo can you get RMD turned on in that runner? 15:20
lizmat hmmm... lemme see 15:24
RAKUDO_MODULE_DEBUG=1 right?
timo i think so
lizmat can't do it in the yaml, trying another approach 15:31
WTF, with RAKUDO_MODULE_DEBUG, it succeeds :-( 15:34
timo hahaha 15:36
well, could be nondeterministic
lizmat yeah, I wonder if it is some race condition that additional RMD processing is preventing from happening 15:37
timo could be related to accidental output during compile time to stdout or stderr? i think only when RMD is set do we take output from the precompilation process?
lizmat argh
no, looked at the wrong one :-(
no visible difference
argh, misspelled RMD 15:39
timo number and order of environment variables can have an effect on program behaviour and performance, unfortunately 15:40
lizmat github.com/lizmat/Text-Emoji/actio...5597592922
the absence of a name in line 802 "( )" is weird: that should be Text::Emoji 15:43
timo > RMD: Precompiling C:\Users\RUNNER~1\AppData\Local\Temp/.zef.1736869173.4736\1736869175.4736.9404.634034198098\sources\9D3785E74CB5300C1C8530677B2A45CF25C84F2A failed: 3221225725 15:44
3221225725?
is that like, the return code of the subprocess?
lizmat I haz no idea
timo m: say 3221225725.base(16)
camelia C00000FD
timo there we go, that's helpful
stack exhaustion. possibly an infinite recursion inside C code (since our raku-level stack actually lives on the heap) 15:45
try turning spesh off please
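For reference, the decoding done above can be sketched as follows. Python is used purely for illustration, and the helper name `describe_ntstatus` is made up for this example. 0xC00000FD is the Windows NTSTATUS value STATUS_STACK_OVERFLOW, and the top two bits of an NTSTATUS encode its severity; only that one code is listed here, so the table is deliberately not exhaustive.

```python
# Minimal sketch: interpret a Windows process exit code as an NTSTATUS value.
# Only the status code seen in this log is named; everything else is "unknown".
KNOWN_STATUS = {0xC00000FD: "STATUS_STACK_OVERFLOW"}
SEVERITY = ["success", "informational", "warning", "error"]


def describe_ntstatus(exit_code):
    """Return (hex form, severity, name) for a 32-bit exit code."""
    code = exit_code & 0xFFFFFFFF      # some tools report the code as signed
    hex_form = f"{code:08X}"
    severity = SEVERITY[code >> 30]    # top two bits carry the severity
    name = KNOWN_STATUS.get(code, "unknown")
    return hex_form, severity, name


print(describe_ntstatus(3221225725))
# ('C00000FD', 'error', 'STATUS_STACK_OVERFLOW')
```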
lizmat MVM_SPESH_DISABLE=1
right?
timo yep
spesh is one of the not terribly many things in moarvm that can actually recurse 15:46
the patch to limit recursion in spesh went in the release that you're using there, didn't it? 15:47
| * | a041dd10f - spesh: don't deeply recurse into inline attempts (3 months ago) <timo> 15:48
linkable6 (2024-10-28) github.com/MoarVM/MoarVM/commit/a041dd10f2 spesh: don't deeply recurse into inline attempts
lizmat Welcome to Rakudo™ v2024.12. 15:49
so... :-)
timo it did make it in it looks like
maybe there's something else, then.
in any case, good to have something that cleanly reproduces
lizmat added a command-line argument to the runner for convenience :-) 15:50
timo yee-owch 15:51
not a beautiful error message there
lizmat github.com/lizmat/Text-Emoji/actio...5598260703
well, I think it's a symptom from the other thing we saw: no module name 15:52
argh
timo why is it trying to generate the usage there
lizmat yeah 15:53
timo disable-spesh vs spesh-disable?
lizmat yeah :-(
timo the syscall stat-flags or stat-is-executable might have a bug on windows 15:55
those are per-platform
lizmat but that would be noticeable on all Windows CI jobs, wouldn't it ? 15:56
timo MVMint64 n = MVM_string_index_from_end(tc, stat_obj->body.filename, tc->instance->str_consts.dot, 0);
lizmat it's only this particular module, that has two large constants
github.com/lizmat/Text-Emoji/actio...5598526897 15:57
same error as before
timo this here is quite possibly unrelated to the failure we were looking at before
and only causes trouble when generating the usage in some cases
lizmat argh... I should stop coding :-(
timo hm? 16:00
out of time?
lizmat nah... too many stoopid errors
timo turning spesh off fixed the issue? 16:01
lizmat passes with spesh disabled
yup
timo ok, please try with MVM_SPESH_INLINE_DISABLE=1 but not SPESH_DISABLE
lizmat fails with inline disable 16:06
timo interesting. then it's not that 16:11
like, it's not another bug somewhere in inlining that i missed
if we turn spesh logging on, is it possible to get the spesh log out? maybe only the last couple thousand lines?
lizmat lemme see if I can put that in 16:12
MVM_SPESH_LOG=filename right
timo yes 16:14
we need like the bat signal for windows using raku devs 16:15
lizmat heh
would you like the spesh log to be shown always if specified, or just if there were failures ? 16:17
timo this would only be for when you `raku run-tests --with-speshlog` or something? 16:18
lizmat yes
timo probably fine for "just if there was a failure" 16:19
lizmat 2000 lines ?
timo not sure. probably fine 16:20
lizmat say $spesh-log.lines(:!chomp).tail(2000).join; 16:21
timo OK
aha 16:23
lizmat there's your spesh log
timo 1) that looks terrible, 2) we need more output
and not just a little bit more 16:24
lizmat all?
timo 3700 more lines just to get to the next interesting bit. maybe the last 20k lines would be enough. but i don't know what the log will look like exactly.
lizmat 20K coming up 16:26
timo what the fuck is happening 16:27
lizmat lemme remove the installing test 16:28
timo oh. 16:29
goodness ...
try setting MVM_SPESH_NODELAY=1 and/or MVM_SPESH_BLOCKING=1 on your own machine and try running the same thing
hm. no crash at least. 16:30
lizmat with MVM_SPESH_LOG you mean? 16:31
doesn't crash for me either
I wonder if disabling the JIT would make a difference
timo no need for the spesh log. my intuition was maybe wrong 16:32
i find it maybe not a great idea to spesh functions where we have over nine thousand arguments 16:33
lizmat especially at compile time :-) 16:36
timo well, that's not so much an issue i don't think
we can maybe just disable the slurpy arguments optimization for "way too many arguments" 16:37
lizmat disabling the JIT doesn't fix
well, that code at compile time will never get hot, as it's only executed once ?
timo yeah, without more output from the spesh log i can't tell why it got hot enough 16:41
btw especially when precomp is involved, we might possibly not be looking at the right spesh log file. if you put a %d in the filename it'll make one per PID instead of having any additional moar processes overwrite it 16:43
lizmat that feels like a good addition :-) 16:44
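With one spesh log per PID, the "show the last N lines on failure" step has to look at several files instead of one. A minimal sketch of that, in Python for illustration only: the file name pattern `spesh-*.log` (as would result from a hypothetical `MVM_SPESH_LOG=spesh-%d.log`) and the function name `tail_logs` are assumptions, not something the test runner in this log actually uses.

```python
from collections import deque
from pathlib import Path


def tail_logs(directory, pattern="spesh-*.log", keep=20000):
    """Return {file name: last `keep` lines} for every matching log file."""
    tails = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            # A bounded deque keeps only the newest `keep` lines in memory.
            tails[path.name] = list(deque(handle, maxlen=keep))
    return tails


# Usage: print the tail of every per-PID log found in the current directory.
# for name, lines in tail_logs(".").items():
#     print(f"==> {name}")
#     print("".join(lines), end="")
```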
timo it's probably a lot of work to get a "custom" moar build into the github runner? 16:45
turning spesh off for now could be an option. maybe only on windows, too. 16:46
lizmat the version of Moar comes with the docker image, afaik
if there'd be a Docker image with the most current version of everything and debugging enabled and such 16:47
then it would be a simple change in the yaml ?
timo i think docker on windows can only run linux stuff inside containers
we're just downloading and extracting rakudo.org/dl/rakudo/rakudo-moar-2...4-msvc.zip 16:48
lizmat not sure how that is specified in the yaml 16:50
timo no, it's part of the setup-raku action implementation i assume 16:51
SETTING::src/core.c/List.rakumod:1488 is not doing things optimally for sure here
lizmat looks
timo ooooh
it has to check if there is any Slip in there 16:52
that sounds like we want to do an optimization in the QAST Optimizer of rakudo and dispatch to a different candidate of infix:<,> or even put the stuff in line without a call entirely
hm, do we really have to copy the stuff from the vmargarray? can we nqp::splice instead of loop and bindpos? 16:53
ah i misread
lizmat that code was written by a lot less experienced /me 16:54
timo we all learn :)
lizmat ah, and that's being called with the large constant of course...
hmmm
timo it might actually be being called by the optimizer at compile time 16:55
OTOH, building the list with infix:<,> to then make a Hash out of that is also wasteful in the first place
lizmat ok, lemme see if I can tweak that
so what would be the best way to do that? 16:56
timo not sure if we have something for that at the moment
lizmat I don't think so either :-)
I could change it to make a lot of %h<foo> := value statements
since that constant hash is generated from JSON
timo could even just parse the json at precomp time and only put the resulting Hash in the result of the constant bla = do { ... } 16:57
theoretically, no need to go through raku code at all
lizmat well, the JSON hash is not in the correct format 16:58
timo the do block can do the necessary precomputation as well, or is it in a very wrong format?
lizmat it has way more information than needed
[ 16:59
{
"emoji": "😀"
, "description": "grinning face"
, "category": "Smileys & Emotion"
etc etc etc
timo OK
how about code that looks like one string constant for keys, one string constant for values, then two calls to .split() to generate two lists and shift() both lists in lockstep and assign the results
and all that happens inside the constant's do block 17:00
no need to generate code for a frame that has a couple thousand assignments in it
still not sure what actually causes the stack overflow on windows 17:02
lizmat well, spesh, as with it disabled it doesn't crash ? 17:03
timo well, yeah, but spesh is big :)
.o( oooh big spesh )
lizmat since this is Windows only, I wonder: C-compiler setting for stack size ?
timo there is a setting for initial stack size that we had to bump up for musl-libc long ago
that's the initial stack size of additional threads 17:04
lizmat ok, lemme rework the initialization without a large list
timo ah the spesh log actually gives me an idea 17:05
i wish i could just get a debug session, or just a static debug printf into this stupid worker 17:09
i'd like to see if the failure / error code with speshlog was still 3221225725, so could you turn spesh log and RMD back on? 17:14
lizmat ah, I just reworked it without the list 17:22
timo is it better?
lizmat without the long list it doesn't crash on Windows
but takes significantly longer to compile 17:23
timo we can postpone the debugging. we'll want to have a branch or tag or maybe just the commit sha1 so we can reference it from a github issue
lizmat ok, going to stash that
and then run with speshlog and rmd back on
timo would you like to give the "long string + split + shift" suggestion a try? 17:25
lizmat no, because that can't work 17:26
timo can't?
lizmat some emojis have several aliases
so split won't cut it
timo can just put the same emoji in the list multiple times 17:27
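The "long string + split + shift" idea, including the aliases case, can be sketched like this. It is written in Python for illustration only; in the module it would be Raku inside the constant's do block. The sample names and emoji are hypothetical stand-ins, and the space separator assumes that neither names nor emoji contain spaces.

```python
from collections import deque

# Hypothetical sample data; the real table would have thousands of entries.
# An emoji with several aliases is simply repeated in VALUES, once per alias.
KEYS = "grinning smile laughing satisfied"
VALUES = "😀 😄 😆 😆"


def build_lookup(keys, values, sep=" "):
    """Split both strings and consume them in lockstep to fill one table."""
    names = deque(keys.split(sep))
    emojis = deque(values.split(sep))
    if len(names) != len(emojis):
        raise ValueError("keys and values must have the same number of entries")
    lookup = {}
    while names:
        # "shift" one element off the front of each list per iteration
        lookup[names.popleft()] = emojis.popleft()
    return lookup


EMOJI = build_lookup(KEYS, VALUES)
print(EMOJI["laughing"] == EMOJI["satisfied"])  # True
```

The point of this shape is that the generated code contains two string constants and one small loop, rather than a frame with thousands of arguments or assignments.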
lizmat github.com/lizmat/Text-Emoji/actio...5604124480 17:29
timo m: say 3221225725.base(16) 17:30
camelia C00000FD
timo right. ok. 17:31
hm. so it actually succeeded in compiling and then before doing anything else with spesh crashed?
lizmat looks like
timo can i get one with SPESH_BLOCKING as well maybe?
lizmat I wonder: would it help if I ran another script *without logging* before so it would get precompiled ? 17:32
sure
timo might help, but depends on what exactly we're trying to do right now; do we a) want to get your module to compile reliably also under windows, b) figure out what exactly causes the crash and maybe how to reproduce it outside of windows, c) something else? 17:35
lizmat b I'd say
because this could be a symptom of a deeper problem on Windows ?
I can tell windows users to install that module with MVM_SPESH_DISABLE=1 17:36
and I can make it CI clean with --spesh-disable in the yaml 17:37
github.com/lizmat/Text-Emoji/actio...5604553576
dinner& 17:47
timo ok. yeah, i'm not sure why it'd crash at that moment, though it's possible there's still text in the internal buffer that the process has for the spesh log that hasn't been flushed to the file at the time of the crash 17:51
lizmat doesn't it flush immediately ? 18:07
timo no, there are explicit flush statements, and much of the code uses a custom little buffer appender thingie to construct the strings for parts of the spesh log 18:13
lizmat ah, too bad... so no black box :-( 18:21
timo and anything to change that also has to go through some kind of build cycle to even arrive at the github actions runner
lizmat right 18:23
lizmat timo: re infix:<,>, looked at reducing bytecode size of it, but it appears to always be above the limit for inlining 19:51
so there's little point in trying to shift code around
timo OK 19:52
well, we do generate bytecode in spesh based on the actual args passed for slurpy arrays right 19:55
if the callsite has a load of arguments, it'll certainly be above the threshold already by itself
lizmat also for (|) signature ? 19:56
timo yes 19:57
lizmat how can it now at compile time whether it'd be 2 or 2000 args ?
*know 19:58
timo that's at spesh time
we spesh based on a given callsite
lizmat aahh... ok *phew* :-)
so maybe spesh should give up after X ?
timo yeah, i'd put a limit. can be high, like 2000 or so
lizmat why that high? I'd say these cases would occur generally only once in the lifetime of a process, or at compile time 19:59
which I guess is the same :-)
timo right, it would be interesting to see at what point it's beneficial to generate the code and when it's better to keep the regular slurpy positional / slurpy named op as is 20:06
for small values it can be interesting since spesh then knows a little bit about what's in the hash or array 20:07
lizmat fwiw, would it be useful at that level if there were an immutable version of nqp::list ? 20:11
timo not sure. escape analysis can at this point recognize when the array or hash has been created and not modified and not yet passed to any other places 20:20
lizmat ack 20:26
timo we don't do much with that yet, i had a branch runcy_funcy_optimizations which did a little bit more with the arrays and hashes that came from slurpy ops, but not very much 20:43
lizmat sanity check: 20:56
1. nuke install dir
2. build Rakudo from scratch
3. goto zef dir, and do `raku -I. bin/zef install . --/test`
4. rename .precomp to .precomp-old 20:57
5. go back to rakudo root, nuke install dir
6. build Rakudo from scratch (with --gen-moar --make-install BTW, in both cases) 20:58
7. goto zef dir, and do `raku -I. bin/zef install . --/test`
after that, should the binary files in .precomp and .precomp-old be identical ?
in my case, they are not :-( 20:59
timo yeah rakudo is currently not reproducible 21:02
the .moarvm files of rakudo are part of the dependency lists in the precompilation results 21:03
so their own hashes should also change
lizmat I thought nine had done a lot of work to make them reproducible ? 21:05
(or nine_)
timo yep, we're about one step away from reaching the goal
i believe nine did a lot already, i added a bit more a few months ago 21:06
lizmat I see
timo there's still a case where a few frames in - i think core.c.setting? - change their order 21:07
lizmat it is *very* annoying, because sometimes if I don't nuke the .precomp in the zef dist, it crashes trying to install zef 21:08
timo so if a precompiled file were to attempt to call any of the frames that have changed their index between one version of this file and the other
lizmat yup big bada boom
timo yeah that sounds bad, indeed 21:09
lizmat fortunately, not many users run this scenario :)
but I do, a lot :(
timo i think we might have the case that the blib/ or runtime/ moarvm files which don't have a sha1 in their name or path, they just get used whatever they are 21:10
and there is no check for compatibility there
so if you overwrite them by doing a rebuild and make install, then it just blindly uses the file
then runs into some change and explodes
instead of realizing up front that the dependency isn't valid any more in order to do a re-precompilation
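The "realize up front that the dependency isn't valid any more" idea amounts to recording a checksum for each dependency at precompilation time and comparing it before the file is used. The sketch below is a generic illustration in Python, not a description of how Rakudo's precompilation store works; the function names and the shape of the `recorded` mapping are assumptions.

```python
import hashlib
from pathlib import Path


def file_sha1(path):
    """SHA-1 of a file's current contents, as a hex string."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def stale_dependencies(recorded):
    """recorded maps dependency path -> SHA-1 noted at precompilation time.

    Returns the paths that are missing or whose contents have changed, so the
    caller can re-precompile instead of blindly loading a mismatched file.
    """
    stale = []
    for path, expected in recorded.items():
        candidate = Path(path)
        if not candidate.is_file() or file_sha1(candidate) != expected:
            stale.append(str(path))
    return stale
```

A caller would run `stale_dependencies` before loading a precompiled unit and trigger a rebuild if the returned list is non-empty.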
lizmat yeah, feels like something like that 21:12
timo i have a prototype for exploding more helpfully, but it's not baked through yet 21:16
lizmat afk& 21:37