Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:00  bartolin left, discord-raku-bot left, colemanx left, discord-raku-bot joined, Colt left
00:02  Colt joined, reportable6 left
00:04  [Coke] left
00:05  bartolin joined, reportable6 joined, [Coke] joined
00:17  squashable6 left
00:18  squashable6 joined
01:00  colemanx joined
ugexe | I'm trying to debug why macOS Monterey usually locks up when running e.g. shell("ls"), but I don't know if this lldb output actually has anything useful -- gist.github.com/ugexe/8c34b26c5edb...cf377ae334 | 01:44
02:42  frost joined
06:03  reportable6 left
nine | Won't work, but why not give it a try anyway: export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES | 08:00
08:03  reportable6 joined
08:14  linkable6 left
08:15  linkable6 joined
Nicholas | good *, #moarvm | 08:50
jnthnwrthngtn | Are threads 4 and 5 making progress, or is it just coincidence that they are both deep in OS things and trying to acquire a lock? | 08:58
jnthnwrthngtn | meeting, bbiab | 08:59
timo | ugexe: do we have info about the process that was spawned by thread 5? | 10:07
10:11  patrickb joined
jnthnwrthngtn | I'd guess thread 5 is the shell to run `ls`; the thing that surprises me more is that uv_cpu_info leads to a fork | 11:19
11:41  Altai-man joined
12:03  reportable6 left
12:07  frost left
12:49  linkable6 left
ugexe | if I repeat the example in the gist it gives the same relevant lldb output each time, so I assume it's not making progress (same if I resume + interrupt) | 13:14
jnthnwrthngtn | OK, so it really is stuck there. | 13:17
lizmat | github.com/rakudo/rakudo/pull/4659 | 13:18
lizmat | Make sure that nqp::cpucores is only called once ever
lizmat | if this fixes it, it should probably be done at the MoarVM level | 13:19
jnthnwrthngtn | If this fixes it, it should probably be reported to libuv and maybe Apple. | 13:23
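(A minimal sketch of the kind of call-once caching the PR above aims at; hypothetical code, not the actual #4659 diff:)

    use nqp;
    my int $cores = 0;                            # 0 means "not queried yet"
    sub cpu-cores(--> Int:D) {
        $cores = nqp::cpucores if $cores == 0;    # ask the OS at most once
        $cores;
    }
    say cpu-cores;    # later calls reuse the cached value, so no further forks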
ugexe | that does actually work around it | 13:30
ugexe | maybe someone with a non-Monterey Mac could run lldb on `shell("ls")` to see what it used to do | 13:32
ugexe | I can do more debugging later if anyone has any suggestions, but I'm off to work for now | 13:34
13:51  linkable6 joined
14:03  reportable6 joined
14:16  patrickb left
lizmat | $ r 'use nqp; nqp::cpucores for ^10000' | 14:26
lizmat | real 0m0.466s
lizmat | $ r 'use nqp; Kernel.cpu-cores for ^10000'
lizmat | real 0m0.173s
lizmat | that's with #4659 applied | 14:27
ugexe | Kernel.cpu-cores is public API though, so if the number of CPU cores visible can change at runtime (I have no idea) then we probably shouldn't cache that value for that method | 14:29
[Coke] | I have a (checks) Big Sur Mac if you want me to check something. | 14:31
timo | you can turn off cores via /sys somewhere | 14:33
jnthnwrthngtn | I think hot-changing the number of CPU cores is a thing; for example, a virtual machine may be able to have more cores added while it's running. | 14:35
ugexe | could someone tell [Coke] the lldb or gdb commands to run a script (containing just `shell("ls")`) so they can get the `bt all` output?
ugexe | I don't think the runner works, and I only otherwise know how to attach to a running process (that script will finish running before he can attach to it)
jnthnwrthngtn | However, I'm not sure it's terribly common, and since our thread pool already fixates its maximum at creation time anyway... ;) | 14:36
timo | if a user is calling that method regularly, they will have a reason, probably?
jnthnwrthngtn | Well, TPS was doing nqp::cpucores regularly, but honestly I don't remember thinking "oh, we should do this for a fresh value"; it was more "make things work" and "don't expect it to be too costly" | 14:39
[Coke] | ugexe: what about 'shell("ls");sleep 30' or somesuch?
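(A hypothetical script along those lines, so the process lives long enough to attach a debugger by PID; the file name and comments are illustrative:)

    # hang.raku: reproduce the hang, then linger for the debugger
    say "pid: $*PID";    # PID to attach to, e.g. with `lldb -p <pid>`
    shell("ls");         # the call that locks up on Monterey
    sleep 30;            # window in which to attach and run `bt all`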
[Coke] | someone forwarded me an article on hardening release builds that we might want to adopt in part. | 14:41
[Coke] | cheatsheetseries.owasp.org/cheatsh...eoi7qt4p2Q | 14:42
[Coke] | cheatsheetseries.owasp.org/cheatsh...Sheet.html
14:43  Guest12 joined
ugexe | TPS caching the result from Kernel.cpu-cores wouldn't surprise me, but Kernel.cpu-cores itself caching might | 14:43
nine | [Coke]: just `gdb rakudo -e '...'` is what I do | 14:44
[Coke] | and once I'm in gdb? | 14:45
[Coke] | (Surprised to find I have a gdb lying around on my mac.) | 14:46
ugexe | `bt all` or `bt full` or some such
[Coke] | both say "no stack" | 14:48
nine | First `run` | 14:49
nine | Then ctrl+c (or whatever on the mac) to interrupt at the point where you want that backtrace
[Coke] | sorry, yes, just figured that out. run says:
[Coke] | Unable to find Mach task port for process-id 15321: (os/kern) failure (0x5).
[Coke] | (please check gdb is codesigned - see taskgated(8))
[Coke] | so I'm guessing the system is *also* surprised there's a gdb on my old mac. | 14:50
[Coke] | give me a bit.
ugexe | do you have lldb?
[Coke] | aye | 14:52
ugexe | that might work better on a mac
[Coke] | stub code executed, process exited with status = 1
[Coke] | (I have a moar-2021.10 release installed) | 14:59
vrurg | Doing `RAKUDO_OPTIMIZER_DEBUG=1 make install` in rakudo ends up with 'Cannot find method 'scope' on object of type QAST::Var+{QAST::SpecialArg}'. I'm still trying to figure out what's going on, but perhaps somebody has good ideas? So far I see that mixing in the SpecialArg role sometimes wipes out the parents array on the HOW. | 15:04
vrurg | Could be a moar issue. | 15:05
jnthnwrthngtn | Maybe, but moar doesn't really know about the content of meta-objects; so far as it's concerned they're just, well, objects. | 15:09
jnthnwrthngtn | How do you conclude it wipes out the parents array? | 15:10
15:13  [Coke] left
15:16  [Coke] joined
[Coke] | .' | 15:24
vrurg | jnthnwrthngtn: was dumping HOW fields. | 15:36
vrurg | Weird thing: when it happens to routine parameters (__lowered_param_*) they're OK right after the mixin, but may start failing a few lines down, within the same lower_signature sub. Since no races could be involved, and initially they all have a >0 number of parents, I conclude that the array gets emptied accidentally. Loss of data by the VM is not excluded yet. | 15:41
vrurg | OK, it's probably a pure NQP issue after all. Tried with the JVM backend and got 'Unhandled exception: Sub+{is-implementation-detail} object coerced to string'. Won't be able to work on this until later today, so if somebody wants to look into it, I don't mind. ;) | 16:07
vrurg | afk for a couple of hours.
ugexe | gist.github.com/ugexe/8c34b26c5edb...cf377ae334 -- updated the gist to also include the lldb output from Big Sur (which does not fork) | 16:08
ugexe | the Big Sur example is on 2021.08 whereas Monterey is on master, fwiw | 16:09
ugexe | hmm, in the Big Sur example it doesn't call uv__get_cpu_speed, among other things | 16:58
17:41  patrickb joined
[Coke] | ugexe++ | 17:46
18:00  Altai-man left
18:02  reportable6 left
18:24  Guest12 left
ugexe | rakudo 2020.08.2 works, 2020.10 does not. these correspond to github.com/libuv/libuv/commit/87f0...e65e1b81c7 | 18:28
18:54  squashable6 left
18:56  squashable6 joined
19:23  patrickb left
19:44  MasterDuke joined
ugexe | __si_module_static_mdns_block_invoke -- I can't find anything on si_module_static_mdns, but it kind of feels strange that it is being queried as part of dlopen | 19:54
ugexe | strange in that it looks like it might be mDNS | 19:55
20:04  reportable6 joined
20:16  [Coke] left
vrurg | BTW, do we have an article/paper where the exact technical meaning of wanted/unwanted is explained? I was googling a while ago, with no apparent success. | 20:17
timo | that's what powers the "useless use of" messages
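(An illustration of that connection, my own example rather than from the log: a node whose value goes unused is marked unwanted, and sunk literals draw a warning:)

    sub f() {
        42;          # value never used ("unwanted"); warns something like
                     # "Useless use of constant integer 42 in sink context"
        return 1;    # value consumed ("wanted"); no warning
    }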
20:22  [Coke] joined
vrurg | timo: that's where my understanding basically ends. :) It looks like there are nuances I'm not aware of. For example, does it affect the lowering of vars? | 20:32
timo | hm, maybe that's only influenced by whether a lexical is used by an inner scope or not | 20:33
20:35  [Coke] left
vrurg | It should. But somehow I probably fail to indicate this... Whatever, that's what I was debugging when I encountered the NQP bug with mixins. So, the bug goes first. | 20:35
timo | how do you mean "indicate"? | 20:36
vrurg | First I thought that the 'wanted' attribute of a node means no variables involved are to be lowered. Second, I see that the lowering code is using some annotations to skip variables. But this part of the optimizer is rather big and I haven't gotten deeper into it yet. | 20:39
20:43  [Coke] joined
timo | I'm a little surprised to hear wanted was involved in variable lowering | 20:49
vrurg | I didn't say it was, I _thought_ it was. :) That's why I'd like to find out more about it. Otherwise I'm either monkey-copying from other code, or trying to guess, sometimes the hard way. | 20:52
timo | I'm headachy today; I'm not sure I can be of too much help | 20:54
20:54  [Coke] left
vrurg | timo: don't bother, and get well soon! I'll figure it out eventually. :) | 20:57
timo | I can recommend asking many questions in here anyway | 21:00
Geth | MoarVM: 8a684b3304 | (Stefan Seifert)++ | 2 files | 21:01
Geth | Fix out of bounds read of PHI facts in spesh
Geth | During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (7 more lines)
Geth | MoarVM: 7d58542da1 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 2 files
Geth | Merge pull request #1610 from MoarVM/fix_phi_out_of_bounds_read
Geth | Fix out of bounds read of PHI facts in spesh
MasterDuke | is there any difference between `has Int $.a is default(0)` and `has Int $.a = 0` if $!a is never set to Nil? | 21:02
lizmat | not sure anymore; jnthnwrthngtn reworked that part with new-disp, I believe | 21:03
21:26  [Coke] joined
jnthnwrthngtn | MasterDuke: I suspect the first is cheaper | 21:29
jnthnwrthngtn | MasterDuke: Although it's probably relatively fine margins, and likely vanishes once PEA is far enough along | 21:30
jnthnwrthngtn | nine: One merged, one with a minor comment but approved; the other one I need more time on, but posted an initial concern. | 21:31
MasterDuke | performance-wise, I tried a sort of micro-benchmark and it seemed like maybe `= 0` was *very* slightly faster (with MVM_SPESH_BLOCKING=1), but it was small enough I'd need to re-run a lot more times to be sure. anyway, they're close enough that I'm not going to change existing code | 21:32
MasterDuke | but in this case I was actually more interested in semantics | 21:33
jnthnwrthngtn | They should result in the same outcome (assuming a literal value) | 21:34
MasterDuke | thanks, good to know
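(For context: the two declarations differ exactly when Nil is assigned, which the question above rules out; a small sketch of the semantics as I understand them, with made-up class names:)

    class WithDefault { has Int $.a is rw is default(0) }
    class WithInit    { has Int $.a is rw = 0 }

    my $wd = WithDefault.new;
    $wd.a = Nil;
    say $wd.a;    # 0     : assigning Nil re-installs the declared default

    my $wi = WithInit.new;
    $wi.a = Nil;
    say $wi.a;    # (Int) : assigning Nil falls back to the bare type object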
jnthnwrthngtn | Actually, if you never use $a beyond that point PEA may already be eating the difference.
timo | the spesh log could perhaps give a little bit of insight, but I couldn't analyze one right now
jnthnwrthngtn | Also --profile and see if a load of Scalar allocations are getting optimized away :) | 21:35
jnthnwrthngtn | But if they aren't, then a) the allocation dominates, b) the literal 0 means easy pickings for tossing out the type guard, and that leaves an attribute bind, which is a couple of machine instructions | 21:36
jnthnwrthngtn | (And if PEA does work to its full potential and your loop literally is no more than a variable decl with a default or an assignment, then both become an empty loop.) | 21:37
MasterDuke | m: use nqp; class I { has Int $.n is default(0); method from-posix-nanos(I:U: Int:D $nanos --> I:D) { nqp::p6bindattrinvres(nqp::create(I),I,q|$!n|,$nanos) } }; my $a; my $b = nqp::time; $a = I.from-posix-nanos($b) for ^10_000_000; say now - INIT now; say $a | 21:38
camelia | 0.266082961
camelia | I.new(n => 1638394682931903452)
MasterDuke | m: use nqp; class I { has Int $.n = 0; method from-posix-nanos(I:U: Int:D $nanos --> I:D) { nqp::p6bindattrinvres(nqp::create(I),I,q|$!n|,$nanos) } }; my $a; my $b = nqp::time; $a = I.from-posix-nanos($b) for ^10_000_000; say now - INIT now; say $a
camelia | 0.265698586
camelia | I.new(n => 1638394692289436231)
MasterDuke | ^^^ was my benchmark
timo | huh, does that even make a difference when you're using bindattr like that? | 21:40
jnthnwrthngtn | Oh | 21:41
jnthnwrthngtn | Yeah, then it's irrelevant :)
jnthnwrthngtn | Because create doesn't run BUILDALL and friends anyway
jnthnwrthngtn | So there's no way the default closure will apply | 21:42
jnthnwrthngtn | And the Scalar is being discarded
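(A sketch of that point about nqp::create bypassing BUILDALL, as I understand it; the class names are made up:)

    use nqp;
    class WithInit    { has Int $.n = 42 }            # set by the BUILDALL-run initializer
    class WithDefault { has Int $.n is default(42) }  # default lives in the container itself

    say WithInit.new.n;              # 42    : .new runs the initializer
    say nqp::create(WithInit).n;     # (Int) : raw allocation, initializer never runs
    say nqp::create(WithDefault).n;  # 42    : the container default still applies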
jnthnwrthngtn | m: for ^10_000_000 { my $x is default(0) }; say now - INIT now
camelia | 1.430858666
jnthnwrthngtn | m: for ^10_000_000 { my $x = 0 }; say now - INIT now | 21:43
camelia | 0.207846194
jnthnwrthngtn | Huh? :)
jnthnwrthngtn | m: for ^10_000_000 { my $x }; say now - INIT now
camelia | 0.16197103
jnthnwrthngtn | OK, that surprises me a lot
jnthnwrthngtn | m: for ^10_000_000 { }; say now - INIT now
camelia | 0.059409284
jnthnwrthngtn | I wonder if `is default` somehow frustrates lexical -> local lowering and thus PEA? | 21:44
jnthnwrthngtn | Worth a dig. Perhaps a LHF
* jnthnwrthngtn afk for a bit
MasterDuke | 10000009 Scalar allocations | 21:45
MasterDuke | same with the `= 0` version | 21:46
MasterDuke | is default(0): In total, 10004354 call frames were entered and exited by the profiled code. Inlining eliminated the need to create 9 call frames (that's 0%). | 21:47
MasterDuke | = 0: In total, 9476 call frames were entered and exited by the profiled code. Inlining eliminated the need to create 19994888 call frames (that's 99.95%).
21:48  linkable6 left, linkable6 joined
MasterDuke | `sp_runbytecode_v r13(2), liti64(140269206293408), liti16(0), r6(2), r5(2) # [015] could not inline 'set' (21) candidate 0: bytecode is too large to inline` maybe? | 21:49
MasterDuke | well, there were a couple of 'too large' could-not-inlines, and one `inline-preventing instruction: takeclosure` + `could not inline '' (3) candidate 0: target has a :noinline instruction` | 21:52
timo | huh, 99.95% inlined should make a big performance difference | 22:02
MasterDuke | is default(0) also did 687 GCs vs 114 for = 0 | 22:04
timo | could the profiler have made an unfortunate impact somehow? | 22:10
MasterDuke | well, we saw the 7x perf difference here on camelia | 22:11
timo | what were you doing differently when you got a difference so small it wasn't easily measurable? | 22:12
MasterDuke | I wasn't actually assigning; I was using bindattr | 22:13
MasterDuke | also I wasn't creating a new variable in the loop body
timo | ok | 22:14
timo | that makes sense now
MasterDuke | gist.github.com/MasterDuke17/be5bc...9435d0743c perf reports for the two cases | 22:15
timo | I would really have expected default(0) to be faster than = 0, since in theory the thing on the right could be code that needs to be run, but we do already recognize it's a static value, so why aren't the two the exact same?
MasterDuke | seems to me to be something about not being able to inline 'unit' because of the takeclosure | 22:16
MasterDuke | which doesn't happen in the spesh log of the assigning version | 22:17
timo | try isolating the for loop in its own sub? | 22:21
MasterDuke | m: sub foo() { for ^10_000_000 { my $x is default(0) } }; foo; say now - INIT now | 22:22
camelia | 1.725214816
MasterDuke | m: sub foo() { for ^10_000_000 { my $x = 0 } }; foo; say now - INIT now
camelia | 0.20743934
timo | the default(0) is even slower now | 22:27
timo | +/- camelia being rather noisy, I imagine | 22:28
MasterDuke | `sub foo { for ^10_000_000 { my $x is default(0) }; return 3 }; sub bar { say (5_000_000_000..5_000_001_000).grep(*.is-prime).tail; say foo; say (^1_000_000).pick gcd (^1_000_000).pick }; bar; say now - INIT now` shows the same behavior | 22:39
MasterDuke | in the after of 'foo': `inline-preventing instruction: takeclosure` + `could not inline '' (3) candidate 0: target has a :noinline instruction` | 22:40
timo | maybe the output from --target=optimize gives a hint | 22:45
MasterDuke | added to gist | 22:47
MasterDuke | gist updated with both versions | 22:49
MasterDuke | oh. `- QAST::Var(lexical $x :decl(static))` for assign vs `- QAST::Var(lexical $x :decl(contvar)) :lexical_used_implicitly<?>` for is default | 22:52
MasterDuke | gist.github.com/MasterDuke17/be5bc...ze-L71-L96 compared to gist.github.com/MasterDuke17/be5bc...ze-L72-L88 | 22:54
MasterDuke | ha, with the rakudo option `--optimize=0`, assign takes 2.8s, but is default takes the same 1.3s | 23:02
japhb | MasterDuke: Another big difference between those two gists is the presence of an extra block in the second (versus a lot more attribute binding in the first) | 23:14
MasterDuke | adding `$v.implicit-lexical-usage = False;` here github.com/rakudo/rakudo/blob/mast...le.pm6#L78 drops 0.05s off the is default version, but does have a single failing spectest | 23:17
MasterDuke | not ok 182 - Failure.new as a default value on an unconstrained Scalar works
MasterDuke | # Failed test 'Failure.new as a default value on an unconstrained Scalar works'
MasterDuke | # at t/spec/S02-names/is_default.rakudo.moar line 395
MasterDuke | # Error: Object of type Failure in QAST::WVal, but not in SC
23:51  childlikempress joined, moon-child left
23:52  childlikempress is now known as moon-child