🦋 Welcome to the IRC channel of the core developers of the Raku Programming Language (raku.org #rakulang). This channel is logged for the purpose of history keeping about its development | evalbot usage: 'm: say 3;' or /msg camelia m: ... | Logs available at irclogs.raku.org/raku-dev/live.html | For MoarVM see #moarvm Set by lizmat on 8 June 2022.
00:08 JimmyZhuo joined
JimmyZhuo | timo: 6guts.wordpress.com/2023/06/18/rec...re-summit/ (starting from "There was a long-standing problem that looked like the regex engine massively leaked memory in certain cases") | 00:12
timo: and 6guts.wordpress.com/2019/01/02/my-...-for-2019/ (Improving compilation times)
timo: consider that CORE.c.setting is only 3.2MB, yet compiling it takes about 2GB of memory and a long time. there could be a lot of improvement here, but it's also a headache of a job. | 00:24
00:49 JimmyZhuo left
nine | I profiled CORE.c.setting compilation and uploaded the SQL to niner.name/files/profile.sql.zst | 07:13
07:19 lizmat joined
nine | The trouble with profiling CORE.c.setting is that when the profiler finds nqp::gethllsym('Raku', '@END_PHASERS') it attaches a new one with the self.dump_profile_data call, but those END phasers are never executed. | 07:26
I simply removed that check: gist.github.com/niner/b5c6fa2ab415...31c669a0be | 07:27
07:34 lizmat left
nine | One thing that occurred to me recently: since RakuAST nodes are supposed to be usable from Raku user space code, we use IMPL-WRAP-LIST everywhere we return some NQP array, to turn it into a proper List. All internal callers then use IMPL-UNWRAP-LIST to get at the NQP array. | 07:48
But, but, BUT! Every method call is already wrapped in an nqp::hllize op, precisely because it could be a call to a foreign language (such as NQP). hllize already turns NQP arrays into Lists. Thus I'm pretty sure all those IMPL-WRAP-LIST calls, and the creation of List objects that goes with them, are unnecessary. | 07:50
IMPL-UNWRAP-LIST on the other hand may have to stay, as it's allowed to e.g. subclass RakuAST nodes in Raku. | 07:51
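A minimal sketch of the hllize point (the Demo class is made up purely for illustration; it just shows an NQP array crossing a method-call boundary, taking nine's statement that method-call return values get hllized as given):

    use nqp;

    class Demo {
        # returning the raw NQP array: the nqp::hllize op that wraps every
        # method call should present this to Raku-space callers as a List,
        # which is what would make an explicit IMPL-WRAP-LIST redundant
        method names() { nqp::list('routine', 'block') }
    }

    say Demo.new.names.^name;   # expected: List

The unwrap direction has to survive because a Raku subclass may override such a method to return a genuine List, which NQP-level internals then need to unpack.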
We spend 10% of the time in GC | 08:26
08:32 lizmat joined
08:47 lizmat left
12:51 MasterDuke joined
MasterDuke | heh. i can't even use moarperf to open nine++'s profile on this laptop (only 24gb of ram). it's opening on my desktop (32gb of ram), but it's slow | 12:54
currently taking multiple minutes and hasn't loaded the overview page yet
timo: got any suggestions to make it about 100x faster? | 12:56
could it be multi-threaded at all? it's pegging a core, but just one | 13:00
well, i can see the routines page. interesting, compile_var (NQP::src/vm/moar/QAST/QASTCompilerMAST.nqp:1596) is 0% jitted, even though it's called 523k times | 13:04
will have to make a spesh log on my desktop to see why... | 13:05
in jnthn's blog that JimmyZhuo linked he mentioned adjusting the nursery for large arrays. that still hasn't been done, but would probably be pretty simple. would it also make sense for hashes? | 13:31
6guts.wordpress.com/2023/06/18/rec...re-summit/
13:39 MasterDuke left
timo | i can't get the sql file into sqlite3 because it says "out of memory", but there's probably a setting or command or something i can use to get sqlite3 to allow more | 14:05
... when we output sql data, do we correctly escape stuff or do we open up the possibility of SQL injection through crafted sub names or type names? :D | 14:12
ah, we weren't splitting the "insert into allocations" line when spitting out the sql, i guess? | 14:16
also got an idea for getting the sql file ever so slightly smaller ... | 14:41
14:51 MasterDuke joined
MasterDuke | timo: yeah, looks like calls is the only one that's split. the rest (except for allocations) are all pretty small; i guess allocations must also have been small when this was originally tested. should probably split it too | 14:52
timo | yeah, i'm writing one sub that handles it generically
MasterDuke | nice
timo | routines may also become relatively big, though nowhere near as big as calls, of course | 14:53
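The generic shape such a sub might take (a Raku sketch of the idea only, not timo's actual NQP patch; the names and the 5000-row batch size are illustrative):

    # emit one INSERT statement per batch of rows, so that no single
    # statement grows past what sqlite3 will parse in one go
    sub emit-insert-batched($table, @rows, :$batch-size = 5000, :&out = &say) {
        for @rows.rotor($batch-size, :partial) -> @batch {
            out "INSERT INTO $table VALUES " ~ @batch.join(', ') ~ ';';
        }
    }

    # stand-in data; in the real backend the rows come from the profiler
    my @allocation-rows = '(1,1,1)', '(2,1,3)';
    emit-insert-batched('allocations', @allocation-rows);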
MasterDuke | hm, so far most of the JIT mentions in the "after" output for `compile_var` are a couple of 'not devirtualized (type unknown)' | 15:05
but i get `JIT was successful and compilation took 11211us`. wonder why nine's profile showed 0% jit? | 15:06
it is quite large. 'Frame size: 26904 bytes (2176 from inlined frames)' and 'Bytecode size: 88715 byte'
it's not the biggest though | 15:08
timo | is it mostly inlined?
if it's inlined into a frame that's speshed but not jitted, that could be the explanation
MasterDuke | i don't think it could be inlined, it's way too big, right? | 15:09
timo | i haven't looked yet | 15:11
MasterDuke | `inline of 'compile_var'` doesn't show up in the spesh log
timo | still up to my elbows in the sql profile output code
MasterDuke | no worries. probably going to be afk for a while soon | 15:13
15:36 MasterDuke left
16:41 MasterDuke joined
MasterDuke | so far it's looking like most IMPL-WRAP-LISTs can be removed | 16:56
timo | is there known trouble with state variables and rakudo's -n flag? | 17:09
probably a "don't fix it, rakuast will be able to do it better" kind of thing | 17:10
MasterDuke | i think so | 17:14
timo | it was much much shorter to write with something starting with lines() instead of using -n and state | 17:17
MasterDuke | ha | 17:18
timo | anyway I made a thing that splits the much-too-long lines and now sqlite3 can ingest the file just fine
MasterDuke | nice
timo | rakudo -p -e 'if .chars > 20000 { my $tblnm = .substr(0, 100).comb(/:s INSERT INTO <( \S+ )> VALUES /); my @bits; for .split("), (").rotor(5000, :partial) -> $values { @bits.push($values.join("), (")); }; $_ = @bits.join(");\nINSERT INTO $tblnm VALUES ("); }' profile.sql > splitted_profile.sql | 17:20
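Unrolled for readability, that one-liner does roughly the following (same logic as above, just with whitespace and comments added):

    for lines() {
        if .chars > 20000 {
            # recover the table name from the head of the oversized INSERT
            my $tblnm = .substr(0, 100).comb(/:s INSERT INTO <( \S+ )> VALUES /);
            # re-batch the value tuples, 5000 per statement
            my @bits;
            for .split("), (").rotor(5000, :partial) -> $values {
                @bits.push($values.join("), ("));
            }
            # close each batch and open a fresh INSERT for the next one
            say @bits.join(");\nINSERT INTO $tblnm VALUES (");
        }
        else { .say }   # short lines pass through untouched
    }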
MasterDuke | would it make sense to add that to the profile writing code in nqp? | 17:21
timo | no
only useful for files that were already written and couldn't be loaded
we have 1 (one) of those
BBIAB
MasterDuke | well, i mean we split up calls when writing the profile; should we generically split up anything too long? | 17:22
timo | yes, that's a patch I have locally also | 17:23
MasterDuke | ah | 17:24
timo | but i didn't want to have to re-profile core setting compilation | 17:32
MasterDuke | well, if you push your patch somewhere i'm going to have to try to re-profile all these IMPL-WRAP-LIST removals... | 17:33
timo | yes, i will very soon
MasterDuke | cool | 17:35
btw, did you see github.com/timo/moarperf/issues/18 ? | 17:36
timo | yes, I've got WIP on updating all the dependencies, including many code changes | 17:40
nine | MasterDuke: have you tested using RakuAST nodes in Raku code after IMPL-WRAP-LIST removal?
Geth | nqp/profile_sql_output_split_more_stuff: 320f5ad0ad | (Timo Paulssen)++ | src/vm/moar/HLL/Backend.nqp | 17:42
Refactor sql output, split more of the statements we can output
nine++ tested sql profiler output with core setting compilation using RakuAST, and the resulting profile had an "allocations" insert statement that was 221 million characters long, which sqlite3 didn't like parsing as a single statement.
This patch also has some attempts for re-using lists more.
MasterDuke | nine: not sure exactly what you mean. i'm testing by branching off the bootstrap-rakuast branch, building rakudo, and running spectests
timo | oops, my splitting must have b0rked the profile, maybe | 17:44
the allocations data may not only have been a bit too much to go into sqlite as one line; creating stats over it for moarperf to use may also take a while. that's one of the things moarperf does during startup, and the rest of the UI isn't usable before it finishes | 17:54
MasterDuke | it might be faster to create some indices first | 17:55
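For example (a DBIish sketch; the column names are guesses for illustration, not moarperf's actual schema):

    use DBIish;

    my $dbh = DBIish.connect('SQLite', database => 'profile.sqlite3');

    # index the columns the startup stats queries filter and join on,
    # before letting those queries run
    $dbh.do('CREATE INDEX IF NOT EXISTS idx_allocations_call ON allocations (call_id)');
    $dbh.do('CREATE INDEX IF NOT EXISTS idx_allocations_type ON allocations (type_id)');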
nine | MasterDuke: the point of those IMPL-WRAP-LIST calls has never been to benefit the code in src/Raku/ast. It's so that user-level Raku code can use RakuAST nodes just fine. | 17:58
> RAKUDO_RAKUAST=1 ./rakudo-m -e 'say RakuAST::ArgList.new.args.^name' | 17:59
List
MasterDuke | don't we have some tests that use rakuast nodes? or are those in m-test, not m-spectest?
nine | There are no spectests for RakuAST.
Would be weird, as it's still in development
timo | only the stuff in t/ that's not in t/spec | 18:00
i assume
MasterDuke | `dan@athena:~/r/rakudo$ RAKUDO_RAKUAST=1 ./rakudo-m -e 'say RakuAST::Routine.new.attach-target-names.^name'` | 18:02
`List`
that was `self.IMPL-WRAP-LIST(['routine', 'block'])` and is now just `['routine', 'block']`
nine | excellent! | 18:03
Without the IMPL-WRAP-LIST we'll probably be able to spesh away the IMPL-UNWRAP-LIST calls completely | 18:07
MasterDuke | fwiw, building rakudo doesn't really seem faster | 18:10
timo | actually, it's difficult to say what exactly is keeping moarperf from starting up, as it's running sqlite3 code on three of its threads
MasterDuke | hm. the all-allocs query (minus some of the generated columns) runs pretty fast when i run it in the sqlite command line | 18:26
timo | hm, can i tell sqlite3 "hey, this db is readonly now, so no need to lock out other threads" or so? | 18:27
MasterDuke | pretty sure there's a way to do that, but i've only ever used sqlite programmatically from c | 18:29
even what i believe is the full query is pretty fast. it might just be the json handling in raku that's slow? | 18:30
timo | right, you can put ?mode=readonly on the path you connect to. not sure if DBIish wants me to give that parameter differently | 18:32
no, it was working inside sqlite3_step or whatever it's called, which should be the SQL engine proper
MasterDuke | does DBIish use sqlite3_open_v2? apparently that's what you need to use to be able to pass the readonly flag | 18:33
doesn't look like it does | 18:34
timo | overview-data in 676.446943261
There were 223,843,653 object allocations. The dynamic optimizer was able to eliminate the need to allocate 11,760 additional objects (that's 0.01%) | 18:35
d'oh :D
MasterDuke | `all-allocs in 1690.779572046`
timo | woof. | 18:36
ok, so flame graphs really don't get a lot of value out of very many entries under the same node, since each of them would be rendered absolutely tiny | 18:38
it also looks like moarperf can accidentally work on the same kind of thing multiple times in parallel. the "only do it once" logic doesn't lock a new attempt out from starting | 18:41
MasterDuke | oops | 18:44
timo | no, i misread the code. actually that is just done inside load_file, not called every time by the Routes, i think | 18:47
no, i double misread | 18:49
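For reference, a once-only guard for that kind of startup work usually looks something like this (a generic Raku sketch, not moarperf's actual code; compute-stats is a stand-in):

    my %started;
    my $guard = Lock.new;

    sub compute-stats($kind) { "stats for $kind" }   # stand-in for the real work

    sub stats-for($kind) {
        # the first caller starts the work; concurrent callers for the same
        # kind get the same Promise instead of launching a duplicate run
        my $promise = $guard.protect({ %started{$kind} //= start compute-stats($kind) });
        await $promise;
    }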
timo | routine overview in 13.110527253 | 18:50
MasterDuke | that was interesting. i just restarted it and now `all-allocs in 9.995467819` | 18:51
timo | i think it matters a lot if the flamegraph data is requested at the same time or not | 18:53
MasterDuke | maybe the flamegraph should have to be explicitly requested? | 18:57
timo | also a possibility | 18:58
whew, the performance of the flame graph function seems to suck | 18:59
MasterDuke | the js or the raku?
ha. i nearly had a heart attack when i saw `Stage parse : 60.037`, and then i remembered i was ssh'ed into my desktop | 19:01
timo | i put a "$totals++" inside sub children-of and it reached 59 after 30 seconds and 119 at 60 seconds
hrmpf. is there a `.blabla` i can put as a query on my sqlite DBIish handle or something to get it readonly? if i put ?mode=readonly it just creates a file that has that name :D | 19:03
ah, it may only work if it's a file:// URL, actually
DBDish::SQLite: Can't connect: unable to open database file (14) | 19:04
is that about the connect function you mentioned before?
MasterDuke | i don't think it'll work; i think DBIish has to use *_open_v2 instead of just *_open
the non-v2 doesn't take flags | 19:05
timo | SQLITE_OPEN_NOMUTEX | 19:08
"The new database connection will use the 'multi-thread' threading mode. This means that separate threads are allowed to use SQLite at the same time, as long as each thread is using a different database connection."
i want this :3
MasterDuke | just a yak shave away | 19:09
timo | and sqlite3_config can also be used to set up multi-threaded mode before any connections are made
19:32 finanalyst left, finanalyst joined
timo | i don't dare invent the API for this, but I'm giving multithreaded a try | 19:35 | |
19:37 finanalyst left
MasterDuke | nice | 19:38
timo | oh, i might not even have had to change anything inside DBIish, as long as i just use multiple db handles | 19:40
19:48 finanalyst joined
19:54 finanalyst left
timo | okay, getting some amount of multi-core utilisation | 19:55
anyway, i bet everybody's waiting with bated breath for the top routines by exclusive time? | 20:00
nine | Well... | 20:01
20:01 MasterDuke left
20:02 finanalyst joined
nine | ...to be honest, actually not terribly much. I think there's another list that's more useful, albeit more difficult to interpret. | 20:02
While common wisdom is that you look at the top routine by exclusive time and optimize that, this is only useful for finding places where micro-optimizations pay off.
timo | that's true, yeah | 20:04
you can't see in there when it'd be worth chopping off entire branches of the call graph tree
searching for stuff that's got a high "ms per entry" finds routines that have fewer entries but still high-ish time | 20:05
nine | At this point I suspect that most gains are to be found in algorithmic improvements. Those are more visible in the inclusive tab, but of course the top entry there will just be the entry point of your program. | 20:06
timo | yes indeed
nine | So it takes a lot of filtering to find the good stuff
timo | method-def from Grammar, for example, has 25.9s inclusive total and 7.9k entries, so 3.28ms per entry. but i have to assume that also includes the body of the method, and probably more than two thirds of the code in the setting would be inside methods? | 20:07
20:08 lizmat joined
timo | statement-control:sym<if> is similar to that; 1.4k entries, 5.7s total inclusive, so 4ms per entry | 20:09
the same reasoning makes this unsurprising: block has 2k entries and 32s total, so 15.46ms per entry | 20:10
looking at exclusive time again, MATCH is in first place and run_alt in second, with 2s and 1.4s respectively | 20:16
mark-whitespace in Grammar has 1s exclusive, 2.34s inclusive. a good chunk of the time in callees is horizontal-whitespace with 475ms, !alt with 441ms, and MARKER with 107ms; the rest is smaller | 20:19
attach from Actions has 1.14m entries and 12.3s total inclusive, almost all of which is under to-begin-time. i don't actually know what attach is for? maybe it's completely unsurprising | 20:22
hum. i'm not sure this is correct? the callees of as_mast (QASTCompilerMAST:651) include many different compile_node subs, and their inclusive times add up to a multiple of the inclusive time of as_mast, though none of them individually is more than as_mast. now i have to wonder if that's a problem in the frontend, in the sql queries, or in the profiler itself | 20:27
20:50 lizmat left
21:53 finanalyst left
nine | IIRC as_mast can also be called somewhere down from the original as_mast call | 22:04
attach = set the node's origin, bring the node to begin time, and attach it to the tree. so that's expected