Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:08
reportable6 left
00:23
squashable6 joined
01:09
reportable6 joined
01:37
CaCode left
02:37
tellable6 left,
greppable6 left,
squashable6 left,
bisectable6 left,
committable6 left,
bloatable6 left,
evalable6 left,
statisfiable6 left,
releasable6 left,
sourceable6 left,
quotable6 left,
benchable6 left,
linkable6 left,
shareable6 left,
notable6 left,
reportable6 left,
coverable6 left,
unicodable6 left,
nativecallable6 left,
nativecallable6 joined,
linkable6 joined
02:38
greppable6 joined,
bisectable6 joined
02:39
squashable6 joined,
reportable6 joined,
sourceable6 joined,
coverable6 joined
02:40
tellable6 joined
03:24
vrurg joined
03:27
vrurg_ left
03:37
shareable6 joined
03:38
evalable6 joined
03:39
bloatable6 joined,
notable6 joined,
benchable6 joined
03:40
statisfiable6 joined,
vrurg left,
vrurg joined
04:39
unicodable6 joined,
quotable6 joined
04:50
CaCode joined
05:40
committable6 joined
05:46
samebchase joined
06:07
reportable6 left
06:40
releasable6 joined,
vrurg_ joined
06:42
vrurg left
08:25
CaCode left
Geth | MoarVM/master: 4 commits pushed by (Nicholas Clark)++, niner++ | 08:56
09:10
reportable6 joined
09:35
CaCode joined
09:48
Colt left
09:49
Colt joined
nine | Looks like LibXML suffers from at least 2 bugs. One spesh-related (but not JIT) and one that appears even without spesh. And one test seems to get stuck currently because LibXML is trying to fetch something from the network but just hangs instead (or runs into a timeout, wasn't patient enough to check) | 09:53
10:49
evalable6 left,
linkable6 left
10:50
linkable6 joined
10:51
evalable6 joined
dogbert17 | nine: still down in the signed/unsigned rabbit hole? | 10:59
nine | That's on pause while I am debugging Blin issues | 11:01
dogbert17 | I stumbled upon what I believe is a NativeCall-related bug yesterday | 11:03
github.com/MoarVM/MoarVM/issues/1621 | 11:05
nine | Oh, the non-spesh bug exposed by LibXML is about Proxy. Once we encounter a Proxy in the args list, we stop processing arguments and delegate to raku-native-dispatch-deproxy instead. This causes us to not install guards for the remaining arguments | 11:10
dogbert17 | so you found one of the bugs already, impressive | 11:11
nine | Now I've even got a fix. Though it's kinda awful and I don't understand 100% why it doesn't work the other way. | 11:59
12:07
reportable6 left
MasterDuke | how do i turn on github.com/rakudo/rakudo/blob/mast...#L301-L305 when building rakudo? moarvm/nqp/rakudo were all built with `make MVM_TRACING=1 install`, but when i run with --tracing it's instead going to the default (even though the value would be triggering that case) | 12:20
also, when i just remove that #if, tracing still doesn't happen, though maybe that's because the MVM_interp_enable_tracing call is happening before the MVM_vm_create_instance call? | 12:24
well, moving MVM_interp_enable_tracing after the MVM_vm_create_instance doesn't change anything | 12:27
14:03
CaCode left
14:08
reportable6 joined
14:47
[Coke] left
14:51
[Coke] joined
15:54
Nicholas left
nine | So, I've rewritten Proxy support in NativeCall so that instead of the clever but inefficient trick that would require that ugly fix, it works like multi dispatch and uses the ProxyReaderFactory. Works like a charm, except that now in a few cases the dispatch just does not get resumed | 16:44
16:58
linkable6 left,
evalable6 left
dogbert17 | uh oh | 17:00
MasterDuke | every problem in computer science can be solved by adding a layer of indirection, time to break out the ProxyReaderFactoryFactory! | 17:09
nine | This isn't Java. The solution is obviously a ProxyReaderFactoryProxy | 17:39
lizmat | .oO( with maybe a touch of sudo ? ) | 17:50
17:58
Guest1254 joined
18:00
evalable6 joined
18:08
reportable6 left
18:09
reportable6 joined
18:59
linkable6 joined
19:36
Guest1254 left
19:47
Guest125 joined
19:48
Guest125 left
japhb | lizmat: sudo MakeMeAProxyReaderFactoryProxy ? | 20:27
sudo write-my-boilerplate | 20:28
21:13
kjp left
timo | `no nonsense;` | 21:20
MasterDuke | huh. using alloca where possible in the rakudo runner took 56,532,210 more instructions for 100 runs of `MVM_SPESH_BLOCKING=1 raku -e ''` | 21:31
timo | oh? that is odd | 21:34
MasterDuke | i do add conditionals for if the size is too big, but i still sort of expected it to be fewer instructions overall. going to check elapsed time with /usr/bin/time now | 21:36
timo | huh
MasterDuke | well, /usr/bin/time isn't very precise, but for 100 runs it thinks using alloca is 0.19s slower (in total) | 21:41
the patch is gist.github.com/MasterDuke17/497d9...9b29845492 if you're curious | 21:42
maybe some of those checks for the size can be assumed to always be ok and removed | 21:44
21:47
kjp joined
timo | ok the first thing i see is file path lengths | 21:51
we should be able to get a proper value for "maximum file path length the system allows"
so anything more than that would be an error later anyway
MasterDuke | "Linux has a maximum filename length of 255 characters for most filesystems (including EXT4), and a maximum path of 4096 characters" to quote the first hit on google | 21:54
timo | we'll want to have this value for every system we support
and of course never accidentally write past our allocated buffer when we are fed something longer
MasterDuke | looks like for bsd max path is 1024 | 21:55
and macos | 21:56
oh, and windows is 260
timo | oh christ, 260? | 21:57
that's nothing?!
21:57
harrow left
MasterDuke | well, of course it's more complicated than that docs.microsoft.com/en-us/windows/w...limitation | 21:57
but that looks like the initial default | 21:58
i'll just remove all those checks and see if it's faster. if not, no reason to continue the experiment | 22:00
timo | ah, good point | 22:02
to get an upper and lower bound
MasterDuke | huh. even with no checks, 6,889,088 more instructions for alloca and 0.04s more time according to /usr/bin/time | 22:23
22:26
harrow joined
timo | that is very odd. what does alloca actually compile to, then? | 22:39
just the call into malloc is short, but then you'll also run through malloc, which does a bunch of stuff, whereas alloca should essentially be barely a change at all?? | 22:40
Ir also goes by cache lines, right? could it have something to do, then, with little changes giving us whole-cacheline differences at once? | 22:42
even if the malloc implementation just stays in cache most of the time, it'd still be more code than many alloca usages combined?
but 7 million is out of how much? 10,000 million? | 22:43
MasterDuke | what do you mean, out of? | 22:52
95769046524 total instructions for 100 runs with alloca, 95762157436 total instructions on master | 22:53
timo | m: say 95769046524 * 100 / 95762157436 | 22:54
camelia | 100.007193956553
timo | i wonder. will this be difficult to measure reliably until we've made everything else a lot faster already? | 22:55
perhaps the big difference is when we have a multithreaded program, but valgrind wouldn't tell us in that case | 22:56
also, can you try jemalloc or some other malloc implementation? | 22:57
MasterDuke | well, i figure the empty program is the best possible case for this since this is just about trying to speed up startup | 22:58
i'll try with mimalloc. btw, did you see my comments about using it a little while ago? | 22:59
timo | i did not | 23:00
MasterDuke | colabti.org/irclogger/irclogger_lo...-10-16#l56 | 23:01
heh. with mimalloc LD_PRELOADed, alloca is .17 faster than master (though guess now i have to try master with mimalloc) | 23:03
timo | oh i see that i actually even replied to that
.17 seconds?
MasterDuke | they just released 2.0, i think we should seriously consider bundling/using it with moarvm | 23:04
well, sum of all the elapsed times with master is 1827, 1649 for alloca+mimalloc | 23:05
timo | empty program still?
MasterDuke | yeah
timo | that means a regular run of empty program is 1.8 seconds? | 23:06
MasterDuke | 0.18
timo | you only used 100 for valgrind i guess?
MasterDuke | for both
timo | m: say 1827 / 100 | 23:07
camelia | 18.27
timo | i'm confused :) :)
MasterDuke | divide by 100 again | 23:08
/usr/bin/time reports 0.18, i log 18 to a file (and do that 100 times) | 23:09
and then just sum all the lines in the file | 23:14
hm, wonder if using hyperfine would be more rigorous | 23:16
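hyperfine would indeed handle the warmup, repetition, and statistics (mean ± standard deviation, outlier detection) that the sum-lines-in-a-file approach does by hand. A hypothetical comparison of the two builds (the install paths are placeholders, not real ones from the log):

```shell
# Placeholder paths: builds without and with the alloca patch.
hyperfine --warmup 3 \
  'MVM_SPESH_BLOCKING=1 ./install-master/bin/raku -e ""' \
  'MVM_SPESH_BLOCKING=1 ./install-alloca/bin/raku -e ""'
```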
23:19
discord-raku-bot left
MasterDuke | master+mimalloc is 1561 for startup time | 23:19
23:19
discord-raku-bot joined
MasterDuke | so somehow it's really looking like alloca isn't helping in this case | 23:20
(but mimalloc is)
ugh, so much slower to run callgrind 100 times... | 23:25
timo | oh for sure
MasterDuke | an interesting yet involved experiment would be to rip out the fsa and just see how mimalloc does | 23:28
timo | mhh the code is already there, you'd just want to toss out the size checks including allocating the size bit before the data | 23:29
MasterDuke | oh right, FSA_SIZE_DEBUG is essentially that, isn't it? | 23:30
interesting...
maybe that'll be a christmas break project
alloca+mimalloc is 162919560 instructions more than master+mimalloc | 23:31
and master is 9017008487 instructions more than master+mimalloc | 23:33
all total for 100 runs