|
01:30
librasteve_ left
|
|||
| [Coke] | the inline::perl5 failure is reproducible on rakudo HEAD on mac os, even. | 02:05 | |
|
05:19
ShimmerFairy left
05:30
ShimmerFairy joined
07:00
sivoais_ left
07:02
sivoais joined
07:03
Pixi left,
ShimmerFairy left
|
|||
| patrickb | [Coke]: What are your exact steps for reproduction? | 07:27 | |
| (it doesn't show for me on MacOS) | |||
|
07:35
melezhik joined
|
|||
| melezhik | . | 07:35 | |
|
09:44
melezhik left
11:13
Pixi joined
11:16
ShimmerFairy joined
|
|||
| [Coke] | rakubrew build head. rakubrew build-zef. zef install --verbose Inline::Perl5 | 13:55 | |
| "rakubrew build moar 6f557f1ec; rakubrew switch moar-6f557f1ec; rakubrew build-zef; zef install --verbose Inline::Perl5" | 13:56 | ||
| This is perl 5, version 40, subversion 2 (v5.40.2) built for darwin-thread-multi-2level | |||
| ProductVersion:15.6 | 13:57 | ||
| BuildVersion:24G84 | |||
| I'll see if I can find the exact failing test for you | 14:06 | ||
| hurm. doing it with zef look instead, everything dies with "cannot locate native library" | 14:08 | ||
| github.com/niner/Inline-Perl5/blob...ter/t/v6.t but it looks like that will be a little challenging to try to golf that. | 14:13 | ||
| gist.github.com/coke/9efc8f083eb6a...607d4b7731 is the full trace on the error, at least. | 14:29 | ||
| Let me know if folks think this is a release blocker. | 14:30 | ||
| I would vote yes since we've had it on two separate machines (despite the fact that those machines are both mine, they are very different. :) | |||
| I am OK with delaying the release to January to get this resolved and have us moving forward - please raise any objections to that, too. | 14:38 | ||
| lizmat | confirmed error on MacOS | 14:41 | |
|
14:41
librasteve_ joined
|
|||
| patrickb | [Coke]: rakubrew build moar-blead that is, correct? | 14:41 | |
| lizmat | when doing zef install. however, running the v6.t file separately in an Inline::Perl5 checkout passes all tests | 14:42 | |
| [Coke] | I used the specific hash (see above) | ||
| lizmat: do we need an IP5 release? | |||
| lizmat | but it *does* happen if I first nuke .precomp | ||
| [Coke] | what if you use the last release tag instead of HEAD, i mean? | ||
| lizmat | [Coke]: added my stacktrace to your stack trace gist | 14:44 | |
| timo | bisectable6: help | ||
| bisectable6 | timo, Like this: bisectable6: old=2015.12 new=HEAD exit 1 if (^∞).grep({ last })[5] // 0 == 4 # See wiki for more examples: github.com/Raku/whateverable/wiki/Bisectable | ||
| timo | committable6: releases class Test { has str $.test; has str $.two }; say Test.new(:test("tada")).raku; | 14:45 | |
| committable6 | timo, gist.github.com/f6e63e0862a7ee9411...8e1344ca27 | 14:46 | |
| timo | bisectable6: good=2019.11 bad=2020.01 class Test { has str $.test; has str $.two }; say Test.new(:test("tada")).raku; | 14:48 | |
| bisectable6 | timo, Bisecting by exit code (old=2019.11 new=2020.01). Old exit code: 0 | ||
| patrickb | I believe this is an issue with the var-args changes related to unions | 14:49 | |
| lizmat | note this only happens on precomp | ||
| a second run without nuking precomp, passes | |||
| timo | [Coke]: if something fails in "zef look" but not outside, check the environment variables inside and outside; could be related to how zef spawns the shell | 14:51 | |
|
14:52
bisectable6 left
|
|||
| [Coke] | yup, I assume especially on macos with strict "where are my libraries" rules. I'm not going to make it work to get the same failure INSIDE zef look though, when it fails from the normal command line. | 14:54 | |
| lizmat | patrickb timo looks like the issue is spesh related: can't get it to fail with: rm -rf .precomp; MVM_SPESH_DISABLE=1 raku -I. t/v6.t | 14:56 | |
| timo | were we on the way to throwing out dyncall in favor of libffi? | 14:58 | |
| releasable6 | Next release in ≈4 days and ≈3 hours. There are no known blockers. Please log your changes in the ChangeLog: github.com/rakudo/rakudo/wiki/ChangeLog-Draft | 15:00 | |
| lizmat | patrickb timo none of the other MVM_SPESH...DISABLE options let it pass | 15:01 | |
| only MVM_SPESH_DISABLE=1 allows the code to run without errors | 15:02 | ||
| all others generate: Internal error: unhandled dyncall argument type 0 processing int argument 0 in MVM_nativecall_dispatch | |||
| note that I'm running this on Apple silicon, so not JITting | |||
| timo | argument type 0 is "void" | 15:04 | |
| how does that get in there, i wonder | |||
|
15:05
bisectable6 joined
|
|||
| lizmat | another data point: if I "dd" all of the arguments before the p5_call_function, it also always passes | 15:06 | |
| timo | does the spesh bisect tool still work? | ||
| lizmat | if I just put the arguments in an array, it does *not* pass | 15:07 | |
| if I create a sub foo(*@a) { }, and put all of the arguments as arguments to a call to "foo", it passes | 15:08 | ||
| the plot thickens: if I specifically initialize the native int32 variables ($retvals, $err, $type), it passes | 15:10 | ||
| timo | spesh bisect lives in moarvm's tools folder | ||
| though it probably will just point at the exact function we also see in the stack trace | |||
| lizmat | ok, it looks like "int $retvals" is the only one that needs initialization | 15:11 | |
| method call-simple-args(Str $function, **@args) { | |||
| - my int32 $retvals; | |||
| + my int32 $retvals = 0; | |||
| fixes all testing for Inline::Perl5 | 15:12 | ||
| timo | does it still say "processing int argument 0"? | 15:13 | |
| lizmat | no, all tests pass | 15:14 | |
| another fix is apparently moving the definition of "my int32 $retvals" **after** the definition of "my Int $j = 0" | 15:15 | ||
| so it feels like a combination | 15:16 | ||
| of allocation of native ints | |||
| timo | could also be related to how argument passing in dispatchers works | 15:17 | |
| lizmat | with this change: | 15:18 | |
| - my Int $j = 0; | |||
| + my Int $j is default(0); | |||
| I get a flapper: | |||
| most of the time, the v6.t test file passes, when it doesn't: | 15:19 | ||
| [Coke] | Wonder why the PTY work would trigger this. :| | ||
| lizmat | MoarVM oops: Internal error: unhandled dyncall argument type for str 50 | ||
| Internal error: unhandled dyncall argument type 0 processing int argument 0 in MVM_nativecall_dispatch | |||
| [Coke]: it's related to spesh, as the problem goes away if spesh is disabled | 15:20 | ||
| timo | could you set a breakpoint in a debugger for me, or alternatively put some debug output in your moarvm? | ||
| src/core/nativecall.c:228 | |||
| it has the comment "should never be reached", but it's potentially where our 0 comes from | 15:21 | ||
| lizmat | [Coke]: could be that the PTY work changed some subtle timings, which uncovered a gremlin | ||
| [Coke] | (timings) ahhh | 15:22 | |
| lizmat | timo: what would you like me to put there ? | ||
| timo | sorry, i think i looked wrong, that is actually unreachable code, right? | 15:23 | |
| patrickb | determine type from arg should only be called in the var arg case. | 15:25 | |
| inline p5 doesn't do that. | |||
| timo | dyncall argument type 50 is not assigned to any meaning | ||
| lizmat | yeah, looks unreachable to me | 15:26 | |
| patrickb | so this is memory corruption related? | 15:27 | |
| [Coke] | it was type 42 when I saw it. | 15:28 | |
| patrickb | we've seen 42, 50 and 0 | 15:29 | |
| lizmat repeats: it is spesh related | 15:30 | ||
| perhaps spesh isn't sufficiently aware of the varargs changes ? | |||
| or PTY changes ? | |||
| timo | these values come from inside the NativeCallBody | ||
| patrickb looks at the var-args PR again | 15:31 | ||
| timo | the serialize and deserialize code doesn't look wrong, but that could be one way how a wrong value could sneak in that would only show up if you first nuke precomp? | ||
| actually, not sure if we go through serialize and deserialize if we freshly pre-comp to run it, vs if we just load stuff | |||
|
15:33
Pixi` joined
|
|||
| timo | liz, can you break on the oops or the MVM_exception_throw_adhoc and print out what the "body" from MVM_nativecall_dispatch has inside it? also the "args" | 15:34 | |
|
15:35
Pixi left
|
|||
| patrickb | timo: yes there is intent to nuke dyncall. That currently blocks on getting libffi compilable on Windows (probably me who's gonna do that). | 15:35 | |
| timo | patrickb: do you have any way to repro the issue liz and coke are seeing? you don't have a mac i assume? | 15:36 | |
| patrickb | I reproed the issue on Linux. | ||
| lizmat | I'm in the middle of writing an advent post for tomorrow | 15:37 | |
| patrickb | rakubrew build moar-blead;rakubrew switch moar-blead;rakubrew build-zef;zef install --verbose Inline::Perl5 | ||
| that's possibly golfable | |||
| timo: There is the Mac stadium Mac that you also have access to iirc. | 15:38 | ||
| timo | right, I haven't tried that in a while | 15:39 | |
| Segmentation fault (core dumped) rakudo -I . t/v6.t | 15:40 | ||
| :D | |||
| the segfault i had was inside a block guarded by "if (body->variadic)" when doing a native call dispatch inside call-simple-args | 15:44 | ||
| oh, actually, that may have been using an older precomp | 15:45 | ||
| patrickb: did you bump the serialization version when you added the variadic flag to nativecall body's serialization? | |||
| patrickb | no | ||
| lizmat | oooh... that could be an easy fix then? | 15:46 | |
| patrickb | I didn't realize. | ||
| Could that cause this behavior? | |||
| timo | it can only give us trouble if we accidentally load a serialized precomp belonging to a pre-variadic merge | ||
| how would we have that, but not rebuild the precomps anyway because of a newer rakudo? | 15:47 | ||
| patrickb | Which is not the case in most of the tests I performed. | ||
| timo | in my case, it's probably from not running Configure.pl often enough to regenerate version information? | ||
| yeah, you all built stuff completely fresh | |||
| i couldn't get the segfault again, either | 15:48 | ||
| patrickb | Is there any point in bumping the version now (given that we've had unrelated commits in the meantime). | ||
| lizmat | perhaps would be wise to do anyway? at least to eliminate a possible cause ? | 15:50 | |
| timo | it feels cleaner to do the bump, and have the read and write of "variadic" be dependent on the version in use | ||
| but i'm not sure if it's possible for us or anyone in the wild to actually run into that problem? | 15:51 | ||
| anyway i have a segfault now that i can properly look at because this time i recorded it | |||
| (and i'm using libffi) | |||
| patrickb | I'll have a look at the version bump thing. At least I'll learn what I missed last time. | ||
| timo | we have two versions, one for bytecode and one for serialization | 15:53 | |
| you'll want the serialization one, it's like in the 40s or so at the moment | |||
| my tools are giving me a bit of trouble right now | 15:59 | ||
| disbot6 | <melezhik.> FWIW we can build Rakudo head in brownie and run Inline::Perl5 tests on it | 16:05 | |
| <melezhik.> brw.sparrowhub.io/project/brw-orch | 16:06 | ||
| timo | it doesn't crash reliably on my end either | ||
| patrickb | timo: I'm interested, how do you try to approach this? dd? | 16:08 | |
| timo | do you know rr? | 16:09 | |
| patrickb | s/dd/rr/ | ||
| so, yeah | |||
| timo | hehe. | ||
| there is also ddd | |||
| but there is not rrr | |||
| it's so annoying when i rebuild moar with -O1 and i still get boatloads of "<optimized out>" when trying to print stuff | 16:10 | ||
| patrickb | isn't ddd a pretty ancient graphical gdb frontend? | ||
| timo | yup | ||
| i'm not sure any other program really tried to replicate its "make a navigable graph of memory" feature | |||
| ok, so, i'm ending up in a block with `if (body->variadic) { ... }` around it, but near the start of the function when get_nc_body runs i print *body and its variadic is actually 0 | 16:15 | ||
|
16:16
finanalyst joined
|
|||
| patrickb | I have some time available in about 3 hours. I'll try to join the fun then. | 16:23 | |
| So there is some memory corruption going on. Is there any chance to rule out GC? | 16:26 | ||
| lizmat | patrickb: did you see my workarounds for the issue ? | 16:30 | |
| patrickb | lizmat: turning off spesh you mean? | 16:33 | |
| lizmat | well, that's one way | ||
| the others are: | |||
|
16:33
Pixi` left
|
|||
| lizmat | - my Int $j = 0; | 16:33 | |
| + my Int $j is default(0); | |||
| in Inline::Perl5, "call-simple-args" | 16:34 | ||
| patrickb | what we know up to now: needs precomp, needs spesh, does corrupt nc body. Depends on how it's called. | ||
| lizmat | another fix is apparently moving the definition of "my int32 $retvals" **after** the definition of "my Int $j = 0" | 16:35 | |
| - my int32 $retvals; | |||
| + my int32 $retvals = 0; | |||
| and that was the final fix I found | |||
| s/fix/workaround | 16:36 | ||
| patrickb | I think changing the code to hopefully not trigger this anymore is losing our chance to find and fix this (even though I'm unsure if we'll be able to pull this off.) | ||
| timo | that would surely shuffle some stuff around, which could potentially just mask the issue | ||
| lizmat | well, my point was that perhaps the way this shuffles could point at a cause | ||
| timo | to be fair, this is not the right thing to be doing when you have a bad headache already :) | ||
| patrickb | I'm afk for now, but I'll report back later. | ||
|
16:39
Pixi joined
|
|||
| timo | i'll plop my recording into pernosco | 16:44 | |
| it'll take a moment to ingest | 16:47 | ||
| there it is | 16:49 | ||
| ah | 16:53 | ||
| we used to take all relevant information out of the nativecall body before calling anything of consequence | |||
| so we didn't bother making sure we update the pointer after a GC run | 16:54 | ||
| but that wasn't made clear in the code at all | |||
| looks like there's another bug there too that we haven't hit yet | 16:55 | ||
| well, at least in the libffi based code i'm looking at right now | 16:56 | ||
| the MVM_NATIVECALL_ARG_CPPSTRUCT case of argument passing can call MVM_nativecall_make_cppstruct which allocates, and after that, "body" is potentially no longer valid | 16:57 | ||
| and then if there's a MVM_NATIVECALL_ARG_CALLBACK after that, we read the body->arg_info array from there | 16:58 | ||
| patrickb | Whoop! Thank you so much for digging into this | 17:00 | |
| timo | this isn't the problem that coke and liz see though i think | 17:01 | |
| well, it is possible that it is | 17:08 | ||
| github.com/MoarVM/MoarVM/pull/1976 is the pull request with what I assume is the fix | 17:14 | ||
| i'm sorry i wasn't available to do a thorough code review of your branch for the merge, i might have spotted it back then | 17:17 | ||
| lizmat, [Coke], can you build your moarvm with this pull request? | 17:21 | ||
| i seem to recall i started building something to make the missing-root-finding gcc plugin usable again by grabbing an old-enough gcc in a container ... i wonder if it could have spotted that the body pointer goes into an object that can move | 17:25 | ||
| lizmat | timo: so, this PR would be a fix for other potential issues, right? | 17:33 | |
| timo | yeah, i'm not sure exactly how it could be responsible for what you're seeing | 17:35 | |
| i can't reproduce the same issue on my end | 17:38 | ||
| i also missed adding an MVMROOT around the case where we may create a struct object | 17:42 | ||
| lizmat just decided to restart a different advent blog post because not being able to find the right tone for the current approach | 17:44 | ||
| [Coke] | timo: testing on my mac... | 17:51 | |
| ooh, can use rakubrew triple for this. neat | 17:53 | ||
| timo | useful | ||
| patrickb: did you look at src/spesh/disp.c at all for the vararg changes? there's a function "translate_dispatch_program" that handles native calls that may need changes. could be as simple as detecting the unsupported new thing and bailing out, there's at least one example of that if you search for MVM_spesh_graph_add_comment | 18:11 | ||
| it handles, among many other things, native calls, i should say | |||
| could be at that point anything related to vararg support isn't relevant any more, or not yet | 18:15 | ||
| > Label followed by a declaration is a C23 extension clang(-Wc23-extensions) | 18:20 | ||
| could this be related to the procops.c compilation failures? | |||
| src/io/procops.c:1018:9: error: expected expression | 18:21 | ||
| MVMObject *msg_box = NULL; | |||
| [Coke] heh: rakubrew switch moar-HEAD-HEAD-nativecall_beware_the_moving_pointer | 18:30 | ||
| timo | t/09-moar/Line_Break__LineBreak.t is passing TODOs and t/04-nativecall/02-simple-args.t is failing its test number 14, that's not so good ... | 18:45 | |
| is that just because the rakudo we're using to CI moarvm is very latest, and so is the nqp? and we didn't update the tests yet to account for the bump we would do? | 18:46 | ||
| lizmat | is the reason why the test is TODOd clear ? | ||
| some tests have been TODOed because they depend on the optimizer, and since RakUAST doesn't have one yet, they were failing there | 18:47 | ||
| [Coke] | timo: with your branch, I can now install Inline::Perl5 on my mac | ||
| timo | huh, so it really was this? that's *weird* tho | 18:49 | |
| lizmat | *phew* | 18:51 | |
| timo | "# Many codepoints return XX instead of ID. These codepoints are undefined, but unicode spec has specified that they should regardless be ID" from t/09-moar/UnipropCheck.rakumod | ||
| i don't get this test :) | 18:53 | ||
| ah, my local rakudo was also rather out of date | 18:54 | ||
| dev.azure.com/MoarVM/MoarVM/_build...amp;l=4770 here's a link to the LineBreak property test failing in azure | 19:01 | ||
| > ok 106 - postfix hyper primes properly # TODO ensure that hyper operators prime as expected | 19:04 | ||
| not sure that's one of the tests that depended on the optimizer? | |||
| it's "ok" with both rakudo_rakuast set to 0 and 1, but in one of the cases it's todo'd (wrongly) | 19:06 | ||
| lizmat | perhaps ShimmerFairy has an idea | ||
| timo | oh, unexpectedly passing TODOs don't make the CI red | 19:08 | |
| patrickb | the msg_box failure can be fixed by simply adding a ; to the label line | ||
| timo | is the change i made that moves the label in front of the "if" statement also correct? | ||
| i looked and every place that goto's that label also sets the variable that the if checks | |||
| patrickb looks | 19:09 | ||
| timo | also, can you look at why the nativecall tests fail on some variants? for example dev.azure.com/MoarVM/MoarVM/_build...amp;l=2154 here | 19:10 | |
| .o( also, the real test will happen when I build debian packages of the latest ) | 19:11 | ||
| the failures are only on clang, not on gcc? | 19:24 | ||
| only on clang and dyncall, not clang and libffi | 19:25 | ||
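The stale "body" pointer that timo diagnoses at 16:53 to 16:58 (a pointer into an object that the GC may move, kept across a call that allocates) can be modelled with a toy in C. Everything here is hypothetical and only illustrates the pattern: `ToyBody`, the two-slot "heap" and the `toy_*` functions are not MoarVM code, and the junk byte written into the old location is an arbitrary choice.

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a native call body living inside a movable object. */
typedef struct {
    int32_t arg_type;
} ToyBody;

/* Two semispaces with one object slot each; `toy_live` is the rooted reference. */
static ToyBody toy_space[2];
static ToyBody *toy_live = &toy_space[0];

/* Simulates an allocation that triggers a moving GC: the object is copied
 * to the other space and the old location is overwritten with junk. */
static void toy_allocate_and_gc(void) {
    ToyBody *from = toy_live;
    ToyBody *to = (from == &toy_space[0]) ? &toy_space[1] : &toy_space[0];
    *to = *from;
    memset(from, 0x2A, sizeof *from);  /* old copy now holds garbage */
    toy_live = to;                     /* only the root is updated */
}

/* Buggy pattern: caches the body pointer across an allocation. */
int32_t toy_read_stale(void) {
    ToyBody *body = toy_live;
    toy_allocate_and_gc();
    return body->arg_type;             /* reads the junk left behind */
}

/* Fixed pattern: re-fetches the body through the root after allocating. */
int32_t toy_read_refreshed(void) {
    toy_allocate_and_gc();
    ToyBody *body = toy_live;
    return body->arg_type;
}
```

In this model the stale read returns whatever was left in the old slot, which is at least consistent with the seemingly arbitrary argument type values reported earlier in the log; whether that is exactly how those values arose is not established here.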
|
19:37
finanalyst left
|
|||
| timo | i think i see it | 19:38 | |
| well, i see *something* | |||
| godbolt.org/z/EnajheTMW gcc compares the constant we're interested in with the contents of the dil register, aka the lowest 8 bits of edi, while clang compares the full edi register | 19:39 | ||
| lizmat | oof... subtle! | 19:40 | |
| timo | could be we're only setting the lowest 8 bits of the register from our side and leaving some trash in the upper bits? | ||
| > ok 15 - # SKIP Cannot test TakeUint16(0xFFFE) with clang without -O0 | 19:41 | ||
| presumably this is similar? | |||
| oh | 19:49 | ||
| $dil and $edi both contain "-2", meaning $edi is 0xfffffffe and $dil is 0xfe | 19:50 | ||
| so guess why in the clang compiled code `cmp edi, 254` doesn't work how we expect it | 19:51 | ||
| gitlab.com/x86-psABIs/x86-64-ABI/-...0bcc213236 and github.com/llvm/llvm-project/issues/12579 seem related | 19:57 | ||
| groups.google.com/g/x86-64-abi/c/h...V4lCRQAQAJ | 19:59 | ||
| i believe this is not just for 8 bit but also 16 bit values | 20:00 | ||
| github.com/rakudo/rakudo/issues/16...-377443272 kaiepi already analyzed this long ago | 20:02 | ||
| lizmat | oO( kaiepi still being missed :-( ) | 20:03 | |
| timo | quite :( | ||
| patrickb: in any case, feel free to not be bothered too much by this particular issue ... unless you have a good idea for how we should behave in light of this mess | 20:07 | ||
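The edi/dil observation above can be reproduced without any assembly by modelling the 32-bit register as a `uint32_t`. This is a sketch under assumptions: the function names are hypothetical, and the upper bits are modelled as a sign extension because that yields the 0xfffffffe timo reports seeing in $edi; the log does not establish how those bits actually got set.

```c
#include <stdint.h>

/* Models edi after a caller stored an 8-bit value and the upper 24 bits
 * ended up as copies of the top bit, so 0xFE becomes 0xFFFFFFFE. */
uint32_t reg_after_sign_extending_caller(uint8_t value) {
    return (value & 0x80u) ? (0xFFFFFF00u | value) : (uint32_t)value;
}

/* Models a caller that clears the upper 24 bits instead. */
uint32_t reg_after_zero_extending_caller(uint8_t value) {
    return (uint32_t)value;
}

/* gcc-style callee from the godbolt listing: looks only at the low 8 bits (dil). */
int callee_compares_low_byte(uint32_t reg, uint8_t expected) {
    return (uint8_t)reg == expected;
}

/* clang-style callee: compares the whole 32-bit register (cmp edi, 254). */
int callee_compares_full_register(uint32_t reg, uint8_t expected) {
    return reg == (uint32_t)expected;
}
```

For 0xFE the low-byte comparison succeeds on either register state, while the full-register comparison succeeds only when the upper bits were cleared. Values with the top bit clear, such as 0x7E, look the same under both extensions, so both callee styles agree on them.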
|
20:21
librasteve_ left
|
|||
| patrickb | I think I don't fully understand the issue yet. There are multiple registers referring to the same data, but with different sizes? | 20:23 | |
| And clang confuses the sizes? | 20:24 | ||
| edi 32 bits, dil 8bits. but dil == lowest 8 bits of edi? | 20:25 | ||
| timo | yeah, you can refer to different-sized versions of the same register by different names | 20:29 | |
| actually, libffi has a uchar argument type, maybe that is exactly what we need | |||
| explaining why it's broken on dyncall but works on libffi? | 20:30 | ||
| presumably ffi_type_uchar causes the right kind of extension to give clang what it wants? | |||
| patrickb | Is this issue actually about call conventions? | 20:32 | |
| timo | yes | ||
| clang wishes that the calling convention is "the caller extends the register" | |||
| the calling convention says "undefined. if you look at the bits outside of the defined type, you can keep the pieces" | |||
| patrickb | So edi is 32 bits. But we only write 8 bits. | 20:33 | |
| Understood. Okay. And clang typically works out, because it creates its own calls. | |||
| But now that we're using a ffi lib we want to play in its park. | 20:34 | ||
| I guess we have to play by its rules then. | |||
| And libffi already does what's needed. That's also why most projects don't hit this issue. | 20:35 | ||
| timo | bugs.llvm.org/show_bug.cgi?id=44228#c4 - one word from LLVM about it, and the groups.google.com link above has a rebuttal or so | ||
| patrickb | I guess given we plan to move to libffi exclusively Soon™️, we can just punt on this issue? | 20:36 | |
| timo | presumably values below 127 can be compared against fine, so we can pass two different values and if one of them gives the expected result, we can at least display a hint next to the failed test | 20:47 | |
| patrickb | Could we just fix up dyncall by up casting manually? | 21:15 | |
| Actually I recall Tassilo (one of the dyncall devs) telling me, that it's part of the library contract, that the user up casts stuff. And that the documentation is lacking in that regard. | 21:16 | ||
|
21:29
lucs_ is now known as lucs
|
|||
| patrickb | I've looked it up again. That info was about varargs only. | 21:35 | |
| timo | ah, yeah, varargs are special yet again | ||
| patrickb | Still, if full int width is passed in every case anyways, can't we just null out the full width by default when we are on clang? | 21:36 | |
| timo | it isn't about whether we are on clang, since a moar compiled with gcc may encounter libraries compiled with clang and vice versa | 21:37 | |
| ab5tract | patrickb: do you have an example for reproducing R#6038 ? | 21:39 | |
| linkable6 | R#6038 [open]: github.com/rakudo/rakudo/pull/6038 RakuAST: Default unnamed packages to `my` scope instead of `our` | ||
| ab5tract | If you have one handy, I wanted to try adding `anon` to `RakuAST::Package.allowed-scopes` | ||
| timo | if we just always pass a 32bit integer or bigger, we can run afoul of ABIs on other targets behaving differently | ||
| also, i haven't checked but i assume when passing arguments via the stack once you run out of registers, it's different yet again | 21:43 | ||
| and if we put a check "is it going to be in a register or on the stack" before choosing whether to use dcArgChar / dcArgShort or upgrade to dcArgInt, then we're re-inventing half of dyncall inside of the code where we use dyncall | 21:45 | ||
| are we still on a very old fork of dyncall? does dyncall maybe have UChar vs Char now? | 21:46 | ||