🦋 Welcome to the IRC channel of the core developers of the Raku Programming Language (raku.org #rakulang). This channel is logged for the purpose of history keeping about its development | evalbot usage: 'm: say 3;' or /msg camelia m: ... | Logs available at irclogs.raku.org/raku-dev/live.html | For MoarVM see #moarvm
Set by lizmat on 8 June 2022.
00:08 JimmyZhuo joined
JimmyZhuo timo: 6guts.wordpress.com/2023/06/18/rec...re-summit/ (started with There was a long-standing problem that looked like the regex engine massively leaked memory in certain cases) 00:12
timo: and 6guts.wordpress.com/2019/01/02/my-...-for-2019/ (Improving compilation times)
timo: consider CORE.c.setting is only 3.2MB, but compiling it takes about 2GB of memory and a long time; there could be a lot of improvement here. but it's also a headache job. 00:24
00:49 JimmyZhuo left
nine I profiled CORE.c.setting compilation and uploaded the SQL to niner.name/files/profile.sql.zst 07:13
07:19 lizmat joined
nine The trouble with profiling CORE.c.setting is that when the profiler finds nqp::gethllsym('Raku', '@END_PHASERS') it attaches a new one containing the self.dump_profile_data call, but those END phasers are never executed. 07:26
I simply removed that check: gist.github.com/niner/b5c6fa2ab415...31c669a0be 07:27
07:34 lizmat left
nine One thing that occurred to me recently: since RakuAST nodes are supposed to be usable from Raku user space code, we use IMPL-WRAP-LIST everywhere we return some NQP array, to turn it into a proper List. All internal callers then use IMPL-UNWRAP-LIST to get at the NQP array. 07:48
But, but, BUT! Every method call is already wrapped in an nqp::hllize op precisely because it could be a call to a foreign language (such as NQP). hllize already turns NQP arrays into Lists. Thus I'm pretty sure all those IMPL-WRAP-LIST calls and the creation of List objects that goes with them are unnecessary. 07:50
IMPL-UNWRAP-LIST on the other hand may have to stay, as it's allowed to e.g. subclass RakuAST nodes in Raku. 07:51
We spend 10 % of the time in GC 08:26
08:32 lizmat joined 08:47 lizmat left 12:51 MasterDuke joined
MasterDuke heh. i can't even use moarperf to open nine++'s profile on this laptop (only 24gb of ram). it's opening on my desktop (32gb of ram), but it's slow 12:54
currently taking multiple minutes and hasn't loaded the overview page yet
timo: got any suggestions to make it about 100x faster? 12:56
could it be multi-threaded at all? it's pegging a core, but just one 13:00
well, i can see the routines page. interesting, compile_var (NQP::src/vm/moar/QAST/QASTCompilerMAST.nqp:1596) is 0% jitted, even though it's called 523k times 13:04
will have to make a spesh log on my desktop to see why... 13:05
in jnthn's blog that JimmyZhuo linked he mentioned adjusting the nursery for large arrays. that still hasn't been done, but would probably be pretty simple. would it also make sense for hashes? 13:31
6guts.wordpress.com/2023/06/18/rec...re-summit/
13:39 MasterDuke left
timo i can't get the sql file into sqlite3 because it says "out of memory", but there's probably a setting or command or something i can use to get sqlite3 to allow more 14:05
... when we output sql data, do we correctly escape stuff or do we open up the possibility of SQL injection through crafted sub names or type names? :D 14:12
ah we weren't splitting the "insert into allocations" line when spitting out the sql i guess? 14:16
also got an idea for getting the sql file ever so slightly smaller ... 14:41
14:51 MasterDuke joined
MasterDuke timo: yeah, looks like calls is the only one split. the rest (except for allocations) are all pretty small, i guess allocations must have also been so when originally testing. should probably split it also 14:52
timo yeah, i'm writing one sub that handles it generically
MasterDuke nice
timo routines may also become relatively big, though nowhere near as big as calls of course 14:53
MasterDuke hm, so far most of the JIT mentions in the after of `compile_var` are a couple 'not devirtualized (type unknown)' 15:05
but i get `JIT was successful and compilation took 11211us`. wonder why nine's profile showed 0% jit?
it is quite large. 'Frame size: 26904 bytes (2176 from inlined frames)' and 'Bytecode size: 88715 byte' 15:06
it's not the biggest though 15:08
timo is it mostly inlined?
if it's inlined into a frame that's speshed but not jitted, that could be the explanation
MasterDuke i don't think it could be inlined, it's way too big, right? 15:09
timo i haven't looked yet 15:11
MasterDuke `inline of 'compile_var'` doesn't show up in the spesh log
timo still up to my elbows in the sql profile output code
MasterDuke no worries. probably going to be afk for a while soon 15:13
15:36 MasterDuke left 16:41 MasterDuke joined
MasterDuke so far it's looking like most IMPL-WRAP-LISTs can be removed 16:56
timo is there known trouble with state variables and rakudo's -n flag? 17:09
probably a "don't fix it, rakuast will be able to do it better" kind of thing 17:10
MasterDuke i think so 17:14
timo it was much much shorter to write with something starting with lines() instead of using -n and state 17:17
MasterDuke ha 17:18
timo anyway I made a thing that splits the much-too-long lines and now sqlite3 can ingest the file just fine
MasterDuke nice
timo rakudo -p -e 'if .chars > 20000 { my $tblnm = .substr(0, 100).comb(/:s INSERT INTO <( \S+ )> VALUES /); my @bits; for .split("), (").rotor(5000, :partial) -> $values { @bits.push($values.join("), (")); }; $_ = @bits.join(");\nINSERT INTO $tblnm VALUES ("); }' profile.sql > splitted_profile.sql 17:20
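For readers who don't speak Raku, the one-liner above can be sketched in Python. This is a hypothetical re-implementation, not the patch that landed in NQP: it finds the table name in the statement's first 100 characters, splits the row groups on `"), ("`, and rejoins them as several smaller INSERT statements (5000 row groups each in the original; `chunk` here).

```python
import re

def split_insert(line, chunk=5000, threshold=20000):
    """Split an oversized 'INSERT INTO <table> VALUES (...), (...);'
    statement into several statements of at most `chunk` row groups."""
    if len(line) <= threshold:
        return line
    # table name is assumed to appear in the first 100 characters,
    # as in the Raku one-liner's .substr(0, 100)
    m = re.search(r"INSERT INTO (\S+) VALUES ", line[:100])
    table = m.group(1)
    groups = line.split("), (")
    parts = [groups[i:i + chunk] for i in range(0, len(groups), chunk)]
    joined = ["), (".join(p) for p in parts]
    # close each statement and open the next on a fresh INSERT
    return (");\nINSERT INTO %s VALUES (" % table).join(joined)
```

Because the split happens purely on the `"), ("` separator, crafted string values containing that byte sequence could still confuse it, which echoes timo's earlier SQL-injection aside.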
MasterDuke would it make sense to add that to the profile writing code in nqp? 17:21
timo no
only useful for files that were already written and couldn't be loaded
we have 1 (one) of those
BBIAB
MasterDuke well i mean we split up calls when writing the profile, should we generically split up anything too long? 17:22
timo yes, that's a patch I have locally also 17:23
MasterDuke ah 17:24
timo but i didn't want to have to re-profile core setting compilation 17:32
MasterDuke well, if you push your patch somewhere i'm going to have to try to re-profile all these IMPL-WRAP-LIST removals... 17:33
timo yes, i will very soon
MasterDuke cool 17:35
btw, did you see github.com/timo/moarperf/issues/18 ? 17:36
timo yes, I've got WIP on updating all the dependencies including many code changes 17:40
nine MasterDuke: have you tested using RakuAST nodes in Raku code after IMPL-WRAP-LIST removal?
Geth nqp/profile_sql_output_split_more_stuff: 320f5ad0ad | (Timo Paulssen)++ | src/vm/moar/HLL/Backend.nqp
Refactor sql output, split more of the statements we can output

  nine++ tested sql profiler output with core setting compilation using
RakuAST, and the resulting profile had an "allocations" insert statement that was 221 million characters long, which sqlite3 didn't like parsing as a single statement.
This patch also has some attempts for re-using lists more.
17:42
MasterDuke nine: not sure exactly what you mean. i'm testing by branching off the bootstrap-rakuast branch and building rakudo and running spectests
timo oops, my splitting must have b0rked the profile maybe 17:44
the allocations data may not only have been a bit too much to go into sqlite as one line, but may also take a little long to create stats over for moarperf to use, which is one of the things it does during startup, and the rest of the stuff is not usable before that is finished 17:54
MasterDuke it might be faster to create some indices first 17:55
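The index idea can be illustrated with Python's sqlite3 module. The schema below is invented for the sketch (the real profiler schema differs); the point is just that an index on the column the stats queries filter or join on turns full scans into index searches.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical subset of a profile schema, for illustration only
    CREATE TABLE allocations (call_id INTEGER, type_id INTEGER, count INTEGER);
    -- an index on the filter/join column speeds up per-call aggregation
    CREATE INDEX IF NOT EXISTS idx_alloc_call ON allocations (call_id);
""")
con.executemany("INSERT INTO allocations VALUES (?, ?, ?)",
                [(1, 7, 40), (1, 8, 2), (2, 7, 5)])
rows = con.execute(
    "SELECT SUM(count) FROM allocations WHERE call_id = 1"
).fetchone()
# EXPLAIN QUERY PLAN confirms the query searches via the index
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(count) FROM allocations WHERE call_id = 1"
).fetchall()
```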
nine MasterDuke: the point of those IMPL-WRAP-LIST calls has never been to benefit the code in src/Raku/ast. It's so that user level Raku code can use RakuAST nodes just fine. 17:58
> RAKUDO_RAKUAST=1 ./rakudo-m -e 'say RakuAST::ArgList.new.args.^name' 17:59
List
MasterDuke don't we have some tests that use rakuast nodes? or are those in m-test, not m-spectest?
nine There are no spectests for RakuAST.
Would be weird as it's still in development
timo only the stuff in t/ that's not in t/spec 18:00
i assume
MasterDuke `dan@athena:~/r/rakudo$ RAKUDO_RAKUAST=1 ./rakudo-m -e 'say RakuAST::Routine.new.attach-target-names.^name' 18:02
List`
that was `self.IMPL-WRAP-LIST(['routine', 'block'])` and is now `['routine', 'block']`
nine excellent! 18:03
Without the IMPL-WRAP-LIST we'll probably be able to spesh away the IMPL-UNWRAP-LIST calls completely 18:07
MasterDuke fwiw, building rakudo doesn't really seem faster 18:10
timo actually, difficult to say what exactly is keeping the moarperf from starting up, as it's doing sqlite3 code on three of its threads
MasterDuke hm. the all-allocs query (minus some of the generated columns) runs pretty fast when i'm in sqlite command line 18:26
timo hm, can i tell sqlite3 "hey this db is readonly now, so no need to lock out other threads" or so? 18:27
MasterDuke pretty sure there's a way to do that, but i've only ever programmatically used sqlite directly from c 18:29
even what i believe is the full query is pretty fast. it might just be the json handling in raku that's slow? 18:30
timo right, you can put ?mode=readonly on the path you connect to. not sure if DBIish wants me to give that parameter differently 18:32
no, it was working inside sqlite_step or what it's called, that should be the SQL engine
MasterDuke does DBIish use sqlite3_open_v2? apparently that's what you need to use to be able to pass the readonly flag 18:33
doesn't look like it does 18:34
timo overview-data in 676.446943261
There were 223,843,653 object allocations. The dynamic optimizer was able to eliminate the need to allocate 11,760 additional objects (that's 0.01%) 18:35
d'oh :D
MasterDuke `all-allocs in 1690.779572046`
timo woof. 18:36
ok so flame graphs really don't get a lot of worth out of very many entries under the same node, since they would be completely just absolutely tiny 18:38
it also looks like moarperf can accidentally work on the same kind of thing multiple times in parallel. the "only do it once" logic doesn't lock a new attempt out from starting 18:41
MasterDuke oops 18:44
timo no i misread the code, actually that is just done inside load_file, not called every time by the Routes i think 18:47
no i double misread 18:49
routine overview in 13.110527253 18:50
MasterDuke that was interesting. i just restarted it and now `all-allocs in 9.995467819` 18:51
timo i think it matters a lot if the flamegraph data is requested at the same time or not 18:53
MasterDuke maybe the flamegraph should have to be explicitly requested? 18:57
timo also a possibility 18:58
whew, the performance of the flame graph function seems to suck 18:59
MasterDuke the js or the raku?
ha. i nearly had a heart attack when i saw `Stage parse      :  60.037` and then i remembered i was ssh'ed into my desktop 19:01
timo i put a "$totals++" inside sub children-of and it's reached 59 after 30 seconds and 119 at 60 seconds
hrmpf. is there a `.blabla` i can put as a query to my sqlite dbiish handle or something to get it readonly? if i put ?mode=readonly it just creates a file that has that name :D 19:03
ah it may only work if it's a file:// url actually
DBDish::SQLite: Can't connect: unable to open database file (14) 19:04
is that about the connect function you mentioned before?
MasterDuke i don't think it'll work, i think DBIish has to use *open_v2 instead of just *_open
the non-v2 doesn't take flags 19:05
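Python's sqlite3 module shows the URI-style read-only open being discussed; under the hood it passes SQLITE_OPEN_READONLY via sqlite3_open_v2, which is what DBIish would need to expose. A minimal sketch:

```python
import os
import sqlite3
import tempfile

# create a throwaway database to open read-only afterwards
path = os.path.join(tempfile.mkdtemp(), "profile.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE t (x INTEGER)")
rw.execute("INSERT INTO t VALUES (1)")
rw.commit()
rw.close()

# file: URI with ?mode=ro opens the database read-only
ro = sqlite3.connect("file:%s?mode=ro" % path, uri=True)
val = ro.execute("SELECT x FROM t").fetchone()[0]
try:
    ro.execute("INSERT INTO t VALUES (2)")
    writable = True
except sqlite3.OperationalError:
    # "attempt to write a readonly database"
    writable = False
```

This also matches timo's observation that `?mode=readonly` tacked onto a bare path just creates a file with that name: the mode parameter is only interpreted when the whole thing is a `file:` URI.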
timo SQLITE_OPEN_NOMUTEX 19:08
The new database connection will use the "multi-thread" threading mode. This means that separate threads are allowed to use SQLite at the same time, as long as each thread is using a different database connection.
i want this :3
MasterDuke just a yak shave away 19:09
timo and sqlite3_config can also be used to set up multi threaded mode before any connections are made
19:32 finanalyst left, finanalyst joined
timo i don't dare invent the API for this, but I'm giving multithreaded a try 19:35
19:37 finanalyst left
MasterDuke nice 19:38
timo oh i might not even have had to change anything inside DBIish as long as i just use multiple db handles 19:40
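The "multiple db handles" approach can be sketched in Python: SQLite's default serialized/multi-thread modes already allow concurrent use from several threads as long as each thread opens its own connection, so no library changes are needed. The schema and queries here are invented for the example.

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "profile.db")
init = sqlite3.connect(path)
init.execute("CREATE TABLE nums (n INTEGER)")
init.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(100)])
init.commit()
init.close()

results = {}

def worker(name, query):
    # each thread opens its own read-only connection
    con = sqlite3.connect("file:%s?mode=ro" % path, uri=True)
    results[name] = con.execute(query).fetchone()[0]
    con.close()

threads = [
    threading.Thread(target=worker, args=("total", "SELECT SUM(n) FROM nums")),
    threading.Thread(target=worker, args=("count", "SELECT COUNT(*) FROM nums")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```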
19:48 finanalyst joined 19:54 finanalyst left
timo okay getting some amount of multi-core utilisation 19:55
anyway, i bet everybody's waiting with bated breath for the top routines by exclusive time? 20:00
nine Well... 20:01
20:01 MasterDuke left 20:02 finanalyst joined
nine ...to be honest, actually not terribly much. I think there's another list that's more useful, albeit more difficult to interpret. 20:02
While common wisdom is that you look at the top routine by exclusive time and optimize that, this is only useful for finding places where micro optimizations pay off.
timo that's true, yeah 20:04
you can't see in there when it'd be worth chopping off entire branches of the call graph tree
searching for stuff that's got "ms per entry" finds stuff that has fewer entries and still high-ish time 20:05
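The "ms per entry" filter is easy to express as a query. The table and column names below are invented (the real profile schema differs); the numbers are loosely modeled on the figures quoted later in this log, where MATCH is hot but cheap per call while method-def has few entries but high inclusive time per entry.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE routines (name TEXT, entries INTEGER, inclusive_ms REAL)"
)
con.executemany("INSERT INTO routines VALUES (?, ?, ?)", [
    ("MATCH",      2000000,  2000.0),   # many entries, cheap each
    ("method-def",    7900, 25900.0),   # few entries, expensive each
])
# sort by inclusive time divided by entry count
rows = con.execute("""
    SELECT name, inclusive_ms / entries AS ms_per_entry
    FROM routines
    ORDER BY ms_per_entry DESC
""").fetchall()
```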
nine At this point I suspect that most gains can be found algorithmically. Those are more visible in the inclusive tab, but of course the top one there will just be the entry point of your program. 20:06
timo yes indeed
nine So it takes a lot of filtering to find the good stuff
timo method-def from Grammar for example has 25.9s inclusive total, 7.9k entries, so 3.28ms per entry. but i have to assume that also includes the body of the method, and probably more than two thirds of the code in the setting would be inside methods? 20:07
20:08 lizmat joined
timo statement-control:sym<if> is similar to that; 1.4k entries, 5.7s total inclusive, so 4ms per entry 20:09
same reasoning makes this not be a surprise: block has 2k entries, 32s total, 15.46ms per entry 20:10
looking at exclusive time again, MATCH is first place, run_alt is second place. 2s and 1.4s respectively 20:16
mark-whitespace in Grammar has 1s exclusive, 2.34s inclusive. the time in callees is a good chunk horizontal-whitespace with 475ms, !alt with 441ms, MARKER with 107ms and the rest is less 20:19
attach from Actions has 1.14m entries, 12.3s total inclusive, almost all of which is under to-begin-time. i don't actually know what attach is for? maybe it's completely unsurprising 20:22
hum. i'm not sure this is correct? the callees of as_mast (QASTCompilerMAST:651) have many different compile_node subs, and they all have inclusive times that add up to a multiple of the inclusive time of as_mast, but none of them are more than as_mast. now i have to wonder if that's a problem in the frontend, in the sql queries, or in the profiler itself 20:27
20:50 lizmat left 21:53 finanalyst left
nine IIRC as_mast can also be called somewhere down from the original as_mast call 22:04
attach = set node origin, bring node to begin time and attach it to the tree, so expected