01:30 librasteve_ left
[Coke] the inline::perl5 failure is reproducible on rakudo HEAD on mac os, even. 02:05
05:19 ShimmerFairy left 05:30 ShimmerFairy joined 07:00 sivoais_ left 07:02 sivoais joined 07:03 Pixi left, ShimmerFairy left
patrickb [Coke]: What are your exact steps for reproduction? 07:27
(it doesn't show for me on MacOS)
07:35 melezhik joined
melezhik . 07:35
09:44 melezhik left 11:13 Pixi joined 11:16 ShimmerFairy joined
[Coke] rakubrew build head. rakubrew build-zef. zef install --verbose Inline::Perl5 13:55
"rakubrew build moar 6f557f1ec; rakubrew switch moar-6f557f1ec; rakubrew build-zef; zef install --verbose Inline::Perl5" 13:56
This is perl 5, version 40, subversion 2 (v5.40.2) built for darwin-thread-multi-2level
ProductVersion:15.6 13:57
BuildVersion:24G84
I'll see if I can find the exact failing test for you 14:06
hurm. doing it with zef look instead, everything dies with "cannot locate native library" 14:08
github.com/niner/Inline-Perl5/blob...ter/t/v6.t but it looks like that will be a little challenging to try to golf that. 14:13
gist.github.com/coke/9efc8f083eb6a...607d4b7731 is the full trace on the error, at least. 14:29
Let me know if folks think this is a release blocker. 14:30
I would vote yes since we've had it on two separate machines (despite the fact that those machines are both mine, they are very different. :)
I am also OK with delaying the release to January to get this resolved and keep us moving forward - please raise any objections to that. 14:38
lizmat confirmed error on MacOS 14:41
14:41 librasteve_ joined
patrickb [Coke]: rakubrew build moar-blead that is, correct? 14:41
lizmat when doing zef install. however, running the v6.t file separately in an Inline::Perl5 checkout passes all tests 14:42
[Coke] I used the specific hash (see above)
lizmat: do we need an IP5 release?
lizmat but it *does* happen if I first nuke .precomp
[Coke] what if you use the last release tag instead of HEAD, i mean?
lizmat [Coke]: added my stacktrace to your stack trace gist 14:44
timo bisectable6: help
bisectable6 timo, Like this: bisectable6: old=2015.12 new=HEAD exit 1 if (^∞).grep({ last })[5] // 0 == 4 # See wiki for more examples: github.com/Raku/whateverable/wiki/Bisectable
timo committable6: releases class Test { has str $.test; has str $.two }; say Test.new(:test("tada")).raku; 14:45
committable6 timo, gist.github.com/f6e63e0862a7ee9411...8e1344ca27 14:46
timo bisectable6: good=2019.11 bad=2020.01 class Test { has str $.test; has str $.two }; say Test.new(:test("tada")).raku; 14:48
bisectable6 timo, Bisecting by exit code (old=2019.11 new=2020.01). Old exit code: 0
patrickb I believe this is an issue with the var-args changes related to unions 14:49
lizmat note this only happens on precomp
a second run without nuking precomp, passes
timo [Coke]: if something fails in "zef look" but not outside, check the environment variables inside and outside; could be related to how zef spawns the shell 14:51
14:52 bisectable6 left
[Coke] yup, I assume especially on macos with its strict "where are my libraries" rules. I'm not going to put in the work to get the same failure INSIDE zef look though, when it already fails from the normal command line. 14:54
lizmat patrickb timo looks like the issue is spesh related: can't get it to fail with: rm -rf .precomp; MVM_SPESH_DISABLE=1 raku -I. t/v6.t 14:56
timo were we on the way to throwing out dyncall in favor of libffi? 14:58
releasable6 Next release in ≈4 days and ≈3 hours. There are no known blockers. Please log your changes in the ChangeLog: github.com/rakudo/rakudo/wiki/ChangeLog-Draft 15:00
lizmat patrickb timo none of the other MVM_SPESH...DISABLE options let it pass 15:01
only MVM_SPESH_DISABLE=1 allows the code to run without errors 15:02
all others generate: Internal error: unhandled dyncall argument type 0 processing int argument 0 in MVM_nativecall_dispatch
note that I'm running this on Apple silicon, so not JITting
timo argument type 0 is "void" 15:04
how does that get in there, i wonder
15:05 bisectable6 joined
lizmat another data point: if I "dd" all of the arguments before the p5_call_function, it also always passes 15:06
timo does the spesh bisect tool still work?
lizmat if I just put the arguments in an array, it does *not* pass 15:07
if I create a sub foo(*@a) { }, and put all of the arguments as arguments to a call to "foo", it passes 15:08
the plot thickens: if I specifically initialize the native int32 variables ($retvals, $err, $type), it passes 15:10
timo spesh bisect lives in moarvm's tools folder
though it probably will just point at the exact function we also see in the stack trace
lizmat ok, it looks like "int $retvals" is the only one that needs initialization 15:11
method call-simple-args(Str $function, **@args) {
- my int32 $retvals;
+ my int32 $retvals = 0;
fixes all testing for Inline::Perl5 15:12
timo does it still say "processing int argument 0"? 15:13
lizmat no, all tests pass 15:14
another fix is apparently moving the definition of "my int32 $retvals" **after** the definition of "my Int $j = 0" 15:15
so it feels like a combination 15:16
of allocation of native ints
timo could also be related to how argument passing in dispatchers works 15:17
lizmat with this change: 15:18
- my Int $j = 0;
+ my Int $j is default(0);
I get a flapper:
most of the time, the v6.t test file passes, when it doesn't: 15:19
[Coke] Wonder why the PTY work would trigger this. :|
lizmat MoarVM oops: Internal error: unhandled dyncall argument type for str 50
Internal error: unhandled dyncall argument type 0 processing int argument 0 in MVM_nativecall_dispatch
[Coke]: it's related to spesh, as the problem goes away if spesh is disabled 15:20
timo could you set a breakpoint in a debugger for me, or alternatively put some debug output in your moarvm?
src/core/nativecall.c:228
it has the comment "should never be reached", but it's potentially where our 0 comes from 15:21
lizmat [Coke]: could be that the PTY work changed some subtle timings, which uncovered a gremlin
[Coke] (timings) ahhh 15:22
lizmat timo: what would you like me to put there ?
timo sorry, i think i looked wrong, that is actually unreachable code, right? 15:23
patrickb determine type from arg should only be called in the var arg case. 15:25
inline p5 doesn't do that.
timo dyncall argument type 50 is not assigned to any meaning
lizmat yeah, looks unreachable to me 15:26
patrickb so this is memory corruption related? 15:27
[Coke] it was type 42 when I saw it. 15:28
patrickb we've seen 42, 50 and 0 15:29
lizmat repeats: it is spesh related 15:30
perhaps spesh isn't sufficiently aware of the varargs changes ?
or PTY changes ?
timo these values come from inside the NativeCallBody
patrickb looks at the var-args PR again 15:31
timo the serialize and deserialize code doesn't look wrong, but that could be one way how a wrong value could sneak in that would only show up if you first nuke precomp?
actually, not sure if we go through serialize and deserialize if we freshly pre-comp to run it, vs if we just load stuff
15:33 Pixi` joined
timo liz, can you break on the oops or the MVM_exception_throw_adhoc and print out what the "body" from MVM_nativecall_dispatch has inside it? also the "args" 15:34
15:35 Pixi left
patrickb timo: yes there is intent to nuke dyncall. That currently blocks on getting libffi compilable on Windows (probably me who's gonna do that). 15:35
timo patrickb: do you have any way to repro the issue liz and coke are seeing? you don't have a mac i assume? 15:36
patrickb I reproed the issue on Linux.
lizmat I'm in the middle of writing an advent post for tomorrow 15:37
patrickb rakubrew build moar-blead;rakubrew switch moar-blead;rakubrew build-zef;zef install --verbose Inline::Perl5
that's possibly golfable
timo: There is the Mac stadium Mac that you also have access to iirc. 15:38
timo right, I haven't tried that in a while 15:39
Segmentation fault (core dumped) rakudo -I . t/v6.t 15:40
:D
the segfault i had was inside a block guarded by "if (body->variadic)" when doing a native call dispatch inside call-simple-args 15:44
oh, actually, that may have been using an older precomp 15:45
patrickb: did you bump the serialization version when you added the variadic flag to nativecall body's serialization?
patrickb no
lizmat oooh... that could be an easy fix then? 15:46
patrickb I didn't realize.
Could that cause this behavior?
timo it can only give us trouble if we accidentally load a serialized precomp belonging to a pre-variadic merge
how would we have that, but not rebuild the precomps anyway because of a newer rakudo? 15:47
patrickb Which is not the case in most of the tests I performed.
timo in my case, it's probably from not running Configure.pl often enough to regenerate version information?
yeah, you all built stuff completely fresh
i couldn't get the segfault again, either 15:48
patrickb Is there any point in bumping the version now (given that we've had unrelated commits in the meantime)?
lizmat perhaps would be wise to do anyway? at least to eliminate a possible cause ? 15:50
timo it feels cleaner to do the bump, and have the read and write of "variadic" be dependent on the version in use
but i'm not sure if it's possible for us or anyone in the wild to actually run into that problem? 15:51
anyway i have a segfault now that i can properly look at because this time i recorded it
(and i'm using libffi)
patrickb I'll have a look at the version bump thing. At least I'll learn what I missed last time.
timo we have two versions, one for bytecode and one for serialization 15:53
you'll want the serialization one, it's like in the 40s or so at the moment
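The version-gated read and write of "variadic" suggested above could look roughly like the following. This is a toy sketch, not MoarVM's serialization code: `ToyBody`, `toy_serialize`, `toy_deserialize` and the version constant `TOY_VARIADIC_MIN_VERSION` are all hypothetical names, and the real serialization version number is different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: first serialization version that writes the variadic flag. */
#define TOY_VARIADIC_MIN_VERSION 2u

/* Hypothetical stand-in for a native call body; not MoarVM's real layout. */
typedef struct {
    int32_t num_args;
    int32_t variadic;
} ToyBody;

/* Writes the body as a stream of int32 values; returns how many were written.
 * The variadic flag is only written for new-enough versions. */
static size_t toy_serialize(const ToyBody *body, uint32_t version, int32_t *out) {
    size_t pos = 0;
    out[pos++] = body->num_args;
    if (version >= TOY_VARIADIC_MIN_VERSION)
        out[pos++] = body->variadic;
    return pos;
}

/* Reads the body back; returns how many int32 values were consumed.
 * Streams written by an older version have no variadic flag, so default it
 * to 0 instead of reading whatever value happens to follow. */
static size_t toy_deserialize(const int32_t *in, uint32_t version, ToyBody *body) {
    size_t pos = 0;
    body->num_args = in[pos++];
    body->variadic = (version >= TOY_VARIADIC_MIN_VERSION) ? in[pos++] : 0;
    return pos;
}
```

With the gate in place, a reader handed an old stream consumes one value fewer and never interprets unrelated data as the flag.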
my tools are giving me a bit of trouble right now 15:59
disbot6 <melezhik.> FWIW we can build Rakudo head in brownie and run Inline::Perl5 tests on it 16:05
<melezhik.> brw.sparrowhub.io/project/brw-orch 16:06
timo it doesn't crash reliably on my end either
patrickb timo: I'm interested, how do you try to approach this? dd? 16:08
timo do you know rr? 16:09
patrickb s/dd/rr/
so, yeah
timo hehe.
there is also ddd
but there is not rrr
it's so annoying when i rebuild moar with -O1 and i still get boatloads of "<optimized out>" when trying to print stuff 16:10
patrickb isn't ddd a pretty ancient graphical gdb frontend?
timo yup
i'm not sure any other program really tried to replicate its "make a navigable graph of memory" feature
ok, so, i'm ending up in a block with `if (body->variadic) { ... }` around it, but near the start of the function when get_nc_body runs i print *body and its variadic is actually 0 16:15
16:16 finanalyst joined
patrickb I have some time available in about 3 hours. I'll try to join the fun then. 16:23
So there is some memory corruption going on. Is there any chance to rule out GC? 16:26
lizmat patrickb: did you see my workarounds for the issue ? 16:30
patrickb lizmat: turning off spesh you mean? 16:33
lizmat well, that's one way
the others are:
16:33 Pixi` left
lizmat - my Int $j = 0; 16:33
+ my Int $j is default(0);
in Inline::Perl5, "call-simple-args" 16:34
patrickb what we know up to now: needs precomp, needs spesh, does corrupt nc body. Depends on how it's called.
lizmat another fix is apparently moving the definition of "my int32 $retvals" **after** the definition of "my Int $j = 0" 16:35
- my int32 $retvals;
+ my int32 $retvals = 0;
and that was the final fix I found
s/fix/workaround 16:36
patrickb I think changing the code to hopefully not trigger this anymore is losing our chance to find and fix this (even though I'm unsure if we'll be able to pull this off.)
timo that would surely shuffle some stuff around, which could potentially just mask the issue
lizmat well, my point was that perhaps the way this shuffles could point at a cause
timo to be fair, this is not the right thing to be doing when you have a bad headache already :)
patrickb I'm afk for now, but I'll report back later.
16:39 Pixi joined
timo i'll plop my recording into pernosco 16:44
it'll take a moment to ingest 16:47
there it is 16:49
ah 16:53
we used to take all relevant information out of the nativecall body before calling anything of consequence
so we didn't bother making sure we update the pointer after a GC run 16:54
but that wasn't made clear in the code at all
looks like there's another bug there too that we haven't hit yet 16:55
well, at least in the libffi based code i'm looking at right now 16:56
the MVM_NATIVECALL_ARG_CPPSTRUCT case of argument passing can call MVM_nativecall_make_cppstruct which allocates, and after that, "body" is potentially no longer valid 16:57
and then if there's a MVM_NATIVECALL_ARG_CALLBACK after that, we read the body->arg_info array from there 16:58
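The hazard described above (holding a pointer into an object's body across something that can allocate, and therefore move the object) can be modelled in a few lines. This is a toy model under stated assumptions, not MoarVM code: the two static "spaces", `toy_gc_move`, `ToyBody` and the poison byte 0x2A are all made up for illustration, and the real fix in the pull request may look quite different.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a native call body; not MoarVM's real layout. */
typedef struct {
    int16_t num_args;
    int16_t arg_types[4];
} ToyBody;

typedef struct {
    ToyBody body;
} ToyObject;

/* Two "semispaces": a toy collection copies the object to the other one
 * and poisons the old copy, the way a moving GC invalidates old addresses. */
static ToyObject space_a, space_b;
static ToyObject *live = &space_a;

static void toy_setup(void) {
    memset(&space_a, 0, sizeof space_a);
    memset(&space_b, 0, sizeof space_b);
    live = &space_a;
    live->body.num_args = 1;
    live->body.arg_types[0] = 7;
}

/* Stands in for any allocation that may trigger a GC run. */
static void toy_gc_move(void) {
    ToyObject *to = (live == &space_a) ? &space_b : &space_a;
    memcpy(to, live, sizeof *to);
    memset(live, 0x2A, sizeof *live);   /* old location now holds garbage */
    live = to;
}

/* Buggy pattern: the body pointer is taken before the "allocation". */
static int16_t toy_read_type_stale(int i) {
    ToyBody *body = &live->body;
    toy_gc_move();
    return body->arg_types[i];          /* reads the poisoned old copy */
}

/* Safer pattern: re-derive the body pointer after anything that can move it. */
static int16_t toy_read_type_refetched(int i) {
    toy_gc_move();
    ToyBody *body = &live->body;
    return body->arg_types[i];
}
```

After `toy_setup()`, `toy_read_type_refetched(0)` returns the stored type, while `toy_read_type_stale(0)` returns a meaningless value built from the poison bytes, which is the same flavour of symptom as an argument type that "is not assigned to any meaning".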
patrickb Whoop! Thank you so much for digging into this 17:00
timo this isn't the problem that coke and liz see though i think 17:01
well, it is possible that it is 17:08
github.com/MoarVM/MoarVM/pull/1976 is the pull request with what I assume is the fix 17:14
i'm sorry i wasn't available to do a thorough code review of your branch for the merge, i might have spotted it back then 17:17
lizmat, [Coke], can you build your moarvm with this pull request? 17:21
i seem to recall i started building something to make the missing-root-finding gcc plugin usable again by grabbing an old-enough gcc in a container ... i wonder if it could have spotted that the body pointer goes into an object that can move 17:25
lizmat timo: so, this PR would be a fix for other potential issues, right? 17:33
timo yeah, i'm not sure exactly how it could be responsible for what you're seeing 17:35
i can't reproduce the same issue on my end 17:38
i also missed adding an MVMROOT around the case where we may create a struct object 17:42
lizmat just decided to restart a different advent blog post, not being able to find the right tone for the current approach 17:44
[Coke] timo: testing on my mac... 17:51
ooh, can use rakubrew triple for this. neat 17:53
timo useful
patrickb: did you look at src/spesh/disp.c at all for the vararg changes? there's a function "translate_dispatch_program" that handles native calls that may need changes. could be as simple as detecting the unsupported new thing and bailing out, there's at least one example of that if you search for MVM_spesh_graph_add_comment 18:11
it handles, among many other things, native calls, i should say
could be at that point anything related to vararg support isn't relevant any more, or not yet 18:15
> Label followed by a declaration is a C23 extension clang(-Wc23-extensions) 18:20
could this be related to the procops.c compilation failures?
src/io/procops.c:1018:9: error: expected expression 18:21
MVMObject *msg_box = NULL;
[Coke] heh: rakubrew switch moar-HEAD-HEAD-nativecall_beware_the_moving_pointer 18:30
timo t/09-moar/Line_Break__LineBreak.t is passing TODOs and t/04-nativecall/02-simple-args.t is failing its test number 14, that's not so good ... 18:45
is that just because the rakudo we're using to CI moarvm is very latest, and so is the nqp? and we didn't update the tests yet to account for the bump we would do? 18:46
lizmat is the reason why the test is TODOd clear ?
some tests have been TODOed because they depend on the optimizer, and since RakUAST doesn't have one yet, they were failing there 18:47
[Coke] timo: with your branch, I can now install Inline::Perl5 on my mac
timo huh, so it really was this? that's *weird* tho 18:49
lizmat *phew* 18:51
timo "# Many codepoints return XX instead of ID. These codepoints are undefined, but unicode spec has specified that they should regardless be ID" from t/09-moar/UnipropCheck.rakumod
i don't get this test :) 18:53
ah, my local rakudo was also rather out of date 18:54
dev.azure.com/MoarVM/MoarVM/_build...amp;l=4770 here's a link to the LineBreak property test failing in azure 19:01
> ok 106 - postfix hyper primes properly # TODO ensure that hyper operators prime as expected 19:04
not sure that's one of the tests that depended on the optimizer?
it's "ok" with both rakudo_rakuast set to 0 and 1, but in one of the cases it's todo'd (wrongly) 19:06
lizmat perhaps ShimmerFairy has an idea
timo oh, unexpectedly passed TODOs don't make the CI red 19:08
patrickb the msg_box failure can be fixed by simply adding a ; to the label line
timo is the change i made that moves the label in front of the "if" statement also correct?
i looked and every place that goto's that label also sets the variable that the if checks
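For reference, the "add a ; to the label line" fix works because, before C23, a label has to be followed by a statement and a declaration is not a statement. The empty statement after the label satisfies the grammar. A minimal sketch with a hypothetical function `toy_cleanup_demo` (this is not the procops.c code):

```c
#include <stdlib.h>

/* Returns 0 on success and -1 on failure; buf is always released. */
static int toy_cleanup_demo(int fail_early) {
    int status = -1;
    char *buf = malloc(16);

    if (buf == NULL || fail_early)
        goto done;

    status = 0;

done: ;                               /* empty statement after the label */
    int released = (buf != NULL);     /* a declaration may now follow */
    free(buf);
    return released ? status : -1;
}
```

Without the `;`, pre-C23 compilers reject `done:` immediately followed by `int released = ...` with an "expected expression" style error, or accept it only as an extension with a warning.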
patrickb looks 19:09
timo also, can you look at why the nativecall tests fail on some variants? for example dev.azure.com/MoarVM/MoarVM/_build...amp;l=2154 here 19:10
.o( also, the real test will happen when I build debian packages of the latest ) 19:11
the failures are only on clang, not on gcc? 19:24
only on clang and dyncall, not clang and libffi 19:25
19:37 finanalyst left
timo i think i see it 19:38
well, i see *something*
godbolt.org/z/EnajheTMW gcc compares the constant we're interested in with the contents of the dil register, aka the lowest 8 bits of edi, while clang compares the full edi register 19:39
lizmat oof... subtle! 19:40
timo could be we're only setting the lowest 8 bits of the register from our side and leaving some trash in the upper bits?
> ok 15 - # SKIP Cannot test TakeUint16(0xFFFE) with clang without -O0 19:41
presumably this is similar?
oh 19:49
$dil and $edi both contain "-2", meaning $edi is 0xfffffffe and $dil is 0xfe 19:50
so guess why in the clang compiled code `cmp edi, 254` doesn't work how we expect it 19:51
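The register observation above can be spelled out in plain C by treating edi as a 32-bit value and dil as its lowest 8 bits. The function names are hypothetical; they only mimic the two comparison styles seen in the compiler output.

```c
#include <stdbool.h>
#include <stdint.h>

/* gcc-style check: compare only the lowest 8 bits (dil) against 254. */
static bool toy_check_low_byte(uint32_t edi) {
    return (uint8_t)edi == 254;
}

/* clang-style check: compare the whole 32-bit register (cmp edi, 254). */
static bool toy_check_full_register(uint32_t edi) {
    return edi == 254;
}
```

With edi holding 0xfffffffe, the low-byte check sees 0xfe and succeeds, while the full-register check compares 4294967294 against 254 and fails. If the caller had zero-extended the byte (edi = 0x000000fe), both checks would agree.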
gitlab.com/x86-psABIs/x86-64-ABI/-...0bcc213236 and github.com/llvm/llvm-project/issues/12579 seem related 19:57
groups.google.com/g/x86-64-abi/c/h...V4lCRQAQAJ 19:59
i believe this is not just for 8 bit but also 16 bit values 20:00
github.com/rakudo/rakudo/issues/16...-377443272 kaiepi already analyzed this long ago 20:02
lizmat oO( kaiepi still being missed :-( ) 20:03
timo quite :(
patrickb: in any case, feel free to not be bothered too much by this particular issue ... unless you have a good idea for how we should behave in light of this mess 20:07
20:21 librasteve_ left
patrickb I think I don't fully understand the issue yet. There are multiple registers referring to the same data, but with different sizes? 20:23
And clang confuses the sizes? 20:24
edi 32 bits, dil 8bits. but dil == lowest 8 bits of edi? 20:25
timo yeah, you can refer to different-sized versions of the same register by different names 20:29
actually, libffi has a uchar argument type, maybe that is exactly what we need
explaining why it's broken on dyncall but works on libffi? 20:30
presumably ffi_type_uchar causes the right kind of extension to give clang what it wants?
patrickb Is this issue actually about call conventions? 20:32
timo yes
clang wishes that the calling convention is "the caller extends the register"
the calling convention says "undefined. if you look at the bits outside of the defined type, you can keep the pieces"
patrickb So edi is 32 bits. But we only write 8 bits. 20:33
Understood. Okay. And clang typically works out, because it creates its own calls.
But now that we're using a ffi lib we want to play in its park. 20:34
I guess we have to play by its rules then.
And libffi already does what's needed. That's also why most projects don't hit this issue. 20:35
timo bugs.llvm.org/show_bug.cgi?id=44228#c4 - one word from LLVM about it, and the groups.google.com link above has a rebuttal or so
patrickb I guess given we plan to move to libffi exclusively Soon™️, we can just punt on this issue? 20:36
timo presumably values below 127 can be compared against fine, so we can pass two different values and if one of them gives the expected result, we can at least display a hint next to the failed test 20:47
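The two-value probe idea could be sketched as below. It assumes the caller sign-extends the 8-bit argument into the 32-bit register, which matches the register dump earlier in the log but is an assumption about the caller, not a documented guarantee. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed caller behaviour: an 8-bit value is sign-extended to 32 bits. */
static uint32_t toy_sign_extended_register(uint8_t value) {
    return value >= 0x80u ? (0xFFFFFF00u | value) : value;
}

/* Callee that compares the full register against the expected byte. */
static bool toy_full_width_callee(uint32_t reg, uint8_t expected) {
    return reg == (uint32_t)expected;
}

/* Probe: a value with the top bit clear passes, one with it set fails.
 * That pattern hints at an extension mismatch, not a broken call. */
static bool toy_looks_like_extension_mismatch(void) {
    bool low_ok  = toy_full_width_callee(toy_sign_extended_register(100), 100);
    bool high_ok = toy_full_width_callee(toy_sign_extended_register(254), 254);
    return low_ok && !high_ok;
}
```

A test harness could use the same pair of calls against the real native function and print a hint when only the high value fails.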
patrickb Could we just fix up dyncall by up casting manually? 21:15
Actually I recall Tassilo (one of the dyncall devs) telling me, that it's part of the library contract, that the user up casts stuff. And that the documentation is lacking in that regard. 21:16
21:29 lucs_ is now known as lucs
patrickb I've looked it up again. That info was about varargs only. 21:35
timo ah, yeah, varargs are special yet again
patrickb Still, if full int width is passed in every case anyways, can't we just null out the full width by default when we are on clang? 21:36
timo it isn't about whether we are on clang, since a moar compiled with gcc may encounter libraries compiled with clang and vice versa 21:37
ab5tract patrickb: do you have an example for reproducing R#6038 ? 21:39
linkable6 R#6038 [open]: github.com/rakudo/rakudo/pull/6038 RakuAST: Default unnamed packages to `my` scope instead of `our`
ab5tract If you have one handy, I wanted to try adding `anon` to `RakuAST::Package.allowed-scopes`
timo if we just always pass a 32bit integer or bigger, we can run afoul of ABIs on other targets behaving differently
also, i haven't checked but i assume when passing arguments via the stack once you run out of registers, it's different yet again 21:43
and if we put a check "is it going to be in a register or on the stack" before choosing whether to use dcArgChar / dcArgShort or upgrade to dcArgInt, then we're re-inventing half of dyncall inside of the code where we use dyncall 21:45
are we still on a very old fork of dyncall? does dyncall maybe have UChar vs Char now? 21:46