ugexe I'm trying to debug why macOS Monterey usually locks up when running e.g. shell("ls"), but I don't know if this lldb output actually has anything useful -- gist.github.com/ugexe/8c34b26c5edb...cf377ae334 01:44
nine Won't work, but why not give it a try anyway: export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES 08:00
Nicholas good *, #moarvm 08:50
jnthnwrthngtn Are threads 4 and 5 making progress, or is it just coincidence that they are both deep in OS things and trying to acquire a lock? 08:58
meeting, bbiab 08:59
timo ugexe: do we have info about the process that was spawned by thread 5? 10:07
jnthnwrthngtn I'd guess thread 5 is the shell to run `ls`, the thing that surprises me more is that uv_cpu_info leads to a fork 11:19
ugexe if i repeat the example in the gist it gives the same relevant lldb output each time, so i assume its not making progress (same if i resume + interrupt) 13:14
jnthnwrthngtn OK, so it really is stuck there. 13:17
lizmat github.com/rakudo/rakudo/pull/4659 13:18
Make sure that nqp::cpucores is only called once ever
if this fixes it, should probably be done at the MoarVM level 13:19
jnthnwrthngtn If this fixes it, it should probably be reported to libuv and maybe Apple. 13:23
ugexe that does actually work around it 13:30
maybe someone with a non-Monterey mac could run lldb on `shell("ls")` to see what it used to do 13:32
i can do more debugging later if anyone has any suggestions, but i'm off to work for now 13:34
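A minimal C sketch of the "query the core count once, then reuse it" idea from the discussion above, as it might look if done below the Rakudo level. This is not MoarVM's or libuv's actual code: `cached_cpu_cores` and `cpu_query_call_count` are hypothetical names, and `sysconf(_SC_NPROCESSORS_ONLN)` stands in for the `uv_cpu_info` call seen in the backtrace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical cache; 0 means "not queried yet". */
static atomic_long cached_cpu_count = 0;
/* Counts how often the OS was actually asked (for illustration only). */
static atomic_int cpu_query_calls = 0;

/* Returns the number of online CPU cores, asking the OS only while the
 * cache is still empty. Assumption: two threads racing on the very first
 * call may both query once; a real VM would want once-style
 * initialization to rule that out. */
long cached_cpu_cores(void) {
    long n = atomic_load(&cached_cpu_count);
    if (n == 0) {
        n = sysconf(_SC_NPROCESSORS_ONLN);
        if (n < 1)
            n = 1; /* fall back to one core if the query fails */
        atomic_fetch_add(&cpu_query_calls, 1);
        atomic_store(&cached_cpu_count, n);
    }
    assert(n >= 1);
    return n;
}

/* Number of real OS queries performed so far. */
int cpu_query_call_count(void) {
    return atomic_load(&cpu_query_calls);
}
```

A cache like this never notices cores being added or removed at runtime, which is the trade-off raised later in the log for the public `Kernel.cpu-cores` method.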
lizmat $ r 'use nqp; nqp::cpucores for ^10000' 14:26
$ r 'use nqp; Kernel.cpu-cores for ^10000'
that's with #4659 applied 14:27
ugexe Kernel.cpu-cores is public api though, so if the number of cpu cores visible can change at runtime (i have no idea) then we probably shouldnt cache that value for that method 14:29
[Coke] I have a (checks) Big Sur mac if you want me to check something. 14:31
timo you can turn off cores via /sys somewhere 14:33
jnthnwrthngtn I think hot-changing number of CPU cores is a thing, for example a virtual machine may be able to have more cores added while it's running. 14:35
ugexe could someone tell [Coke] the lldb or gdb commands to run a script (containing just `shell("ls")`) so they can get the `bt all` output?
i don't think the runner works, and I only otherwise know how to attach to a running process (that script will finish running before he can attach to it)
jnthnwrthngtn However, I'm not sure it's terribly common, and since our thread pool already fixates its maximum at creation time anyway... ;) 14:36
timo if a user is calling that method regularly, they will have a reason, probably?
jnthnwrthngtn Well, TPS was doing nqp::cpucores regularly, but honestly I don't remember thinking "oh, we should do this for a fresh value", it was more "make things work" and "don't expect it to be too costly" 14:39
[Coke] ugexe: what about 'shell("ls");sleep 30' or somesuch?
someone forwarded me an article on hardening release builds that we might want to adopt in part. 14:41
cheatsheetseries.owasp.org/cheatsh...eoi7qt4p2Q 14:42
ugexe TPS caching the result from Kernel.cpu-cores wouldnt surprise me, but Kernel.cpu-cores itself caching might 14:43
nine [Coke]: just gdb rakudo -e '...' is what I do 14:44
[Coke] and once I'm in gdb? 14:45
(Surprised to find I have a gdb lying around on my mac.) 14:46
ugexe `bt all` or `bt full` or some such
[Coke] both say "no stack" 14:48
nine First `run` 14:49
Then ctrl+c (or whatever on the mac) to interrupt at the point where you want that backtrace
[Coke] sorry, yes just figured that out. run says:
Unable to find Mach task port for process-id 15321: (os/kern) failure (0x5).
(please check gdb is codesigned - see taskgated(8))
so I'm guessing the system is *also* surprised there's a gdb on my old mac. 14:50
give me a bit.
ugexe do you have lldb?
[Coke] aye 14:52
ugexe that might work better on a mac
[Coke] stub code executed, process exited with status = 1
(I have a moar-2021.10 release installed) 14:59
vrurg Doing `RAKUDO_OPTIMIZER_DEBUG=1 make install` in rakudo ends up with 'Cannot find method 'scope' on object of type QAST::Var+{QAST::SpecialArg}'. I'm still trying to figure out what's going on, but perhaps somebody has good ideas? So far I see that mixing in the SpecialArg role sometimes wipes out the parents array on the HOW. 15:04
Could be a moar issue. 15:05
jnthnwrthngtn Maybe, but moar doesn't really know about the content of meta-objects; so far as it's concerned they're just, well, objects. 15:09
How do you conclude it wipes out the parents array? 15:10
vrurg jnthnwrthngtn: was dumping HOW fields. 15:36
Weird thing, when it happens to routine parameters (__lowered_param_*) they're ok right after mixin, but may start failing a few lines down, within the same lower_signature sub. Since no races could be involved, and initially they all have >0 parents, I conclude that the array gets emptied accidentally. Loss of data by the VM is not excluded yet. 15:41
Ok, probably it's a pure NQP issue, after all. Tried with JVM backend and got 'Unhandled exception: Sub+{is-implementation-detail} object coerced to string'. Won't be able to work on this until later today, so if somebody wants to look into it – I don't mind. ;) 16:07
afk for a couple of hours.
ugexe gist.github.com/ugexe/8c34b26c5edb...cf377ae334 -- updated the gist to also include the lldb output from big sur (which does not fork) 16:08
the big sur example is on 2021.08 whereas monterey is on master fwiw 16:09
hmm in the big sur example it doesnt call uv__get_cpu_speed among other things 16:58
[Coke] ugexe++ 17:46
ugexe rakudo 2020.08.2 works, 2020.10 does not. these correspond to github.com/libuv/libuv/commit/87f0...e65e1b81c7 18:28
ugexe __si_module_static_mdns_block_invoke -- i cant find anything on si_module_static_mdns but it kind of feels strange it is being queried as part of dlopen 19:54
strange in that it looks like it might be mDNS 19:55
vrurg BTW, do we have an article/paper where the exact technical meaning of wanted/unwanted is explained? I was googling a while ago, with no apparent success. 20:17
timo that's what powers "useless use of" messages
vrurg timo: that's where my understanding basically ends. :) It looks like there are nuances I'm not aware of. For example, does it affect lowering of vars? 20:32
timo hm, maybe that's only influenced by whether a lexical is used by an inner scope or not 20:33
vrurg It should. But somehow I probably fail to indicate this... Whatever, that's what I was debugging when encountered the NQP bug with mixins. So, the bug goes first. 20:35
timo how do you mean "indicate"? 20:36
vrurg First I thought that the 'wanted' attribute of a node means no variables involved are to be lowered. Second, I see that lowering code is using some annotations to skip variables. But this part of the optimizer is rather big and I didn't get deeper into it yet. 20:39
timo i'm a little surprised to hear wanted was involved in variable lowering 20:49
vrurg I don't say it, I _thought_ it is. :) That's why I'd like to find out more about it. Otherwise I'm either monkey-copying from other code or trying to guess, sometimes the hard way. 20:52
timo i'm headachy today, i'm not sure if i can be of too much help 20:54
vrurg timo: don't bother and get well soon! I'll figure it out eventually. :) 20:57
timo i can recommend asking many questions in here anyway 21:00
Geth MoarVM: 8a684b3304 | (Stefan Seifert)++ | 2 files
Fix out of bounds read of PHI facts in spesh

During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (7 more lines)
MoarVM: 7d58542da1 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 2 files
Merge pull request #1610 from MoarVM/fix_phi_out_of_bounds_read

Fix out of bounds read of PHI facts in spesh
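A rough C sketch of the kind of guard the commit message above describes: check that a PHI node still has at least one register to read before looking at the first one. The commit body is truncated in this log, so this is not the actual patch; `Facts`, `merge_phi_facts` and the fields are hypothetical, simplified stand-ins for spesh's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified facts about one register version. */
typedef struct {
    int known;   /* 1 if the type of the value is known */
    int type_id; /* only meaningful when known == 1 */
} Facts;

/* Merges the facts of all registers a PHI node reads.
 * Returns 1 and fills *out when every input has the same known type.
 * Returns 0 when nothing can be concluded, including the case where
 * dead-writer removal left the PHI with no inputs at all. */
int merge_phi_facts(const Facts *inputs, size_t num_inputs, Facts *out) {
    size_t i;
    Facts merged;

    assert(out != NULL);

    /* The guard: without it, inputs[0] below would be read out of bounds. */
    if (num_inputs == 0)
        return 0;

    merged = inputs[0];
    for (i = 1; i < num_inputs; i++) {
        if (!inputs[i].known || inputs[i].type_id != merged.type_id)
            merged.known = 0;
    }
    if (!merged.known)
        return 0;

    *out = merged;
    return 1;
}
```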
MasterDuke is there any difference between `has Int $.a is default(0)` and `has Int $a = 0` if $!a is never set to Nil? 21:02
lizmat not sure anymore, jnthnwrthngtn reworked that part with new-disp I believe 21:03
jnthnwrthngtn MasterDuke: I suspect the first is cheaper 21:29
MasterDuke: Although it's probably relatively fine margins, and likely vanishes once PEA is far enough along 21:30
nine: One merged, one with minor comment but approved, the other one I need more time on, but posted an initial concern. 21:31
MasterDuke performance-wise, i tried a sort of micro-benchmark and it seemed like maybe `= 0` was *very* slightly faster (with MVM_SPESH_BLOCKING=1), but it was small enough i'd need to re-run a lot more times to be sure. anyway they're close enough i'm not going to change existing code 21:32
but in this case i was actually more interested in semantics 21:33
jnthnwrthngtn They should result in the same outcome (assuming a literal value) 21:34
MasterDuke thanks, good to know
jnthnwrthngtn Actually if you never use $a beyond that point PEA may already be eating the difference.
timo spesh log could perhaps give a little bit of insight, but i couldn't analyze one right now
jnthnwrthngtn Also --profile and see if a load of Scalar allocations are getting optimized away :) 21:35
But if they aren't then a) the allocation dominates, b) the literal 0 means easy pickings for tossing out the type guard, and that leaves an attribute bind, which is a couple of machine instructions 21:36
(And if PEA does work to its full potential and your loop literally is no more than a variable decl with default or an assignment, then both become an empty loop.) 21:37
MasterDuke m: use nqp; class I { has Int $.n is default(0); method from-posix-nanos(I:U: Int:D $nanos --> I:D) { nqp::p6bindattrinvres(nqp::create(I),I,q|$!n|,$nanos) } }; my $a; my $b = nqp::time; $a = I.from-posix-nanos($b) for ^10_000_000; say now - INIT now; say $a 21:38
camelia 0.266082961
I.new(n => 1638394682931903452)
MasterDuke m: use nqp; class I { has Int $.n = 0; method from-posix-nanos(I:U: Int:D $nanos --> I:D) { nqp::p6bindattrinvres(nqp::create(I),I,q|$!n|,$nanos) } }; my $a; my $b = nqp::time; $a = I.from-posix-nanos($b) for ^10_000_000; say now - INIT now; say $a
camelia 0.265698586
I.new(n => 1638394692289436231)
MasterDuke ^^^ was my benchmark
timo huh, does that even make a difference when you're using bindattr like that? 21:40
jnthnwrthngtn Oh 21:41
Yeah, then it's irrelevant :)
Because create doesn't run BUILDALL and friends anyway
So there's no way the default closure will apply 21:42
And the Scalar is being discarded
m: for ^10_000_000 { my $x is default(0) }; say now - INIT now
camelia 1.430858666
jnthnwrthngtn m: for ^10_000_000 { my $x = 0 }; say now - INIT now 21:43
camelia 0.207846194
jnthnwrthngtn Huh? :)
m: for ^10_000_000 { my $x }; say now - INIT now
camelia 0.16197103
jnthnwrthngtn OK, that surprises me a lot
m: for ^10_000_000 { }; say now - INIT now
camelia 0.059409284
jnthnwrthngtn I wonder if `is default` somehow frustrates lexical -> local lowering and thus PEA? 21:44
Worth a dig. Perhaps a LHF
jnthnwrthngtn afk for a bit
MasterDuke 10000009 Scalar allocations 21:45
same with the `= 0` version 21:46
is default(0): In total, 10004354 call frames were entered and exited by the profiled code. Inlining eliminated the need to create 9 call frames (that's 0%). 21:47
= 0: In total, 9476 call frames were entered and exited by the profiled code. Inlining eliminated the need to create 19994888 call frames (that's 99.95%).
MasterDuke `sp_runbytecode_v r13(2), liti64(140269206293408), liti16(0), r6(2), r5(2) # [015] could not inline 'set' (21) candidate 0: bytecode is too large to inline` maybe? 21:49
well, there were a couple 'too large' could not inlines, and one `inline-preventing instruction: takeclosure` + `could not inline '' (3) candidate 0: target has a :noinline instruction` 21:52
timo huh, 99.95% inlined should make a big performance difference 22:02
MasterDuke is default(0) also did 687 GCs vs 114 for = 0 22:04
timo could the profiler have made an unfortunate impact somehow? 22:10
MasterDuke well, we saw the 7x perf difference here on camelia 22:11
timo what were you doing differently when you got a difference so small it wasn't easily measurable? 22:12
MasterDuke i wasn't actually assigning, i was using bindattr 22:13
also i wasn't creating a new variable in the loop body
timo ok 22:14
that makes sense now
MasterDuke gist.github.com/MasterDuke17/be5bc...9435d0743c perf reports for the two cases 22:15
timo i would really have expected default(0) to be faster than = 0 since in theory the thing on the right could be code that needs to be run, but we do already recognize it's a static value, so why aren't the two the exact same
MasterDuke seems to me to be something about not being able to inline 'unit' because of the takeclosure 22:16
which doesn't happen in the spesh log of the assigning version 22:17
timo try isolating the for loop in its own sub? 22:21
MasterDuke m: sub foo() { for ^10_000_000 { my $x is default(0) } }; foo; say now - INIT now 22:22
camelia 1.725214816
MasterDuke m: sub foo() { for ^10_000_000 { my $x = 0 } }; foo; say now - INIT now
camelia 0.20743934
timo the default(0) is even slower now 22:27
+/- camelia being rather noisy i imagine 22:28
MasterDuke `sub foo { for ^10_000_000 { my $x is default(0) }; return 3 }; sub bar { say (5_000_000_000..5_000_001_000).grep(*.is-prime).tail; say foo; say (^1_000_000).pick gcd (^1_000_000).pick }; bar; say now - INIT now` shows the same behavior 22:39
in the after of 'foo'. `inline-preventing instruction: takeclosure` + `could not inline '' (3) candidate 0: target has a :noinline instruction` 22:40
timo maybe the output from --target=optimize gives a hint 22:45
MasterDuke added to gist 22:47
gist updated with both versions 22:49
oh. `- QAST::Var(lexical $x :decl(static))` for assign vs `- QAST::Var(lexical $x :decl(contvar)) :lexical_used_implicitly<?>` for is default 22:52
gist.github.com/MasterDuke17/be5bc...ze-L71-L96 compared to gist.github.com/MasterDuke17/be5bc...ze-L72-L88 22:54
ha, with rakudo option `--optimize=0`, assign takes 2.8s, but is default takes the same 1.3 23:02
japhb MasterDuke: Another big difference between those two gists is the presence of an extra block in the second (versus a lot more attribute binding in the first) 23:14
MasterDuke adding `$v.implicit-lexical-usage = False;` here github.com/rakudo/rakudo/blob/mast...le.pm6#L78 drops 0.05s off the is default version, but does have a single failing spectest 23:17
not ok 182 - Failure.new as a default value on an unconstrained Scalar works 23:18
# Failed test 'Failure.new as a default value on an unconstrained Scalar works'
# at t/spec/S02-names/is_default.rakudo.moar line 395
# Error: Object of type Failure in QAST::WVal, but not in SC