00:08 evalable6 joined 01:05 evalable6 joined
MasterDuke timotimo: doh, thanks. getting farther in stage parse now 01:09
samcv MasterDuke, can i see your patch so far? 01:10
MasterDuke samcv: github.com/MasterDuke17/MoarVM/tre...ng_storage 01:12
Geth MoarVM: c5c389101d | (Samantha McVey)++ | .appveyor.yml
Appveyor: disable VS2017 builds

We can't reenable them until we know where SetEnv.cmd is, since nmake isn't in the path without running it.
01:13
01:56 ilbot3 joined
samcv yeah i did 01:58
i didn't know it was that low, the binsize
what is the size of it?
i see MVM_FSA_BINS is 96. that can't be 96 bytes can it?
i mean if it's really that small, then why don't we just use alloca in MVM_string_join? 02:05
since we don't need it to persist
MasterDuke i don't remember the exact value, but a quick fprintf shows a byte size of 772 equals a bin of 96 02:07
samcv so you can only allocate up to 772 bytes before it just mallocs it then?
MasterDuke up to something about 772, don't know the exact value 02:08
760 i think 02:10
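(A minimal sketch of the bin arithmetic being discussed, assuming 96 bins with an 8-byte step; only MVM_FSA_BINS = 96 comes from the log, the granularity and names are assumptions, not the actual MoarVM source.)

```c
/* Rough sketch (not the actual MoarVM source) of a fixed-size allocator
 * front end with 96 bins at an assumed 8-byte granularity. With these
 * numbers the largest request a bin can serve is 96 * 8 = 768 bytes;
 * anything bigger falls through to plain malloc, which is in the same
 * ballpark as the ~760-772 byte cutoff observed above. */
#include <stdlib.h>

#define FSA_BINS      96   /* cf. MVM_FSA_BINS */
#define FSA_BIN_BYTES  8   /* assumed per-bin step size */

static void *fsa_alloc_sketch(size_t bytes) {
    size_t bin = (bytes - 1) / FSA_BIN_BYTES;  /* 1..8 -> bin 0, 761..768 -> bin 95 */
    if (bin >= FSA_BINS)
        return malloc(bytes);                  /* too big for any bin: plain malloc */
    /* ... otherwise this would pop an entry from the bin's free list / page ... */
    return malloc((bin + 1) * FSA_BIN_BYTES);  /* stand-in for the bin allocation */
}
```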
geekosaur samcv, you might want to be careful with alloca anyway. although on currently supported platforms up to 3k is probably safe 02:12
(4k is asking for trouble if it causes multiple pages of stack segment to be allocated, and you need to be aware of anything else that also uses or allocates from stack) 02:13
samcv geekosaur, so you think i should lower it to 3000? 02:14
i have it set to 4096 right now. i'm fine with lowering it
MasterDuke .tell timotimo down to only 12 errors in stage parse, all exactly the same (gist updated with an example)
yoleaux MasterDuke: I'll pass your message to timotimo.
geekosaur if you occasionally see segfaults in the code that uses it, you're running into the problem 02:15
samcv but at 760 bytes we could just alloca the join buffer onto the stack since that's not very much and it'd be faster than FSA, since we don't need to keep it around
geekosaur (basically, whether something counts as a stack allocation or a bad pointer is determined by an OS heuristic that can guess wrong, so allocating too much on stack can cause the next use of the stack (function calls, another allocation, etc.) to be mistaken for a bad pointer) 02:16
MasterDuke samcv: for `pieces` in MVM_string_join?
samcv yeah
MasterDuke sure, give it a shot 02:17
geekosaur and this is determined by stack pages, so the only safe use is if it allocates exactly one more page to stack. page size is 4096 bytes on supported platforms
basically, you can try it; if you see occasional segfaults then back the size down and see if they go away
samcv well i haven't seen any segfaults on this, or freebsd or on alpine with musl 02:18
but i can back it down from 4096
02:19 zakharyas joined
Geth MoarVM: 474ab7cdd1 | (Samantha McVey)++ | src/strings/ops.c
In KMP index use malloc not FSA. Set max stack alloc to 3000

Reduce stack alloc from 4096 to 3000, as recommended by
  geekosaur++. Use malloc instead of FSA because FSA would just
malloc anyway since it's larger than the max FSA amount.
05:25
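(A minimal sketch of the stack-or-heap pattern that commit describes, assuming the 3000-byte cap geekosaur recommended; the function and buffer names are illustrative, not MoarVM's actual API.)

```c
/* Small scratch buffers go on the stack via alloca, anything over the cap
 * is heap-allocated and freed before returning. 3000 bytes stays safely
 * under a 4096-byte stack page, per the discussion above. */
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STACK_ALLOC 3000

void use_scratch_buffer(size_t num_pieces) {
    size_t bytes = num_pieces * sizeof(void *);
    void **pieces;
    int on_stack = bytes <= MAX_STACK_ALLOC;

    if (on_stack)
        pieces = alloca(bytes);   /* released automatically when we return */
    else
        pieces = malloc(bytes);   /* too big for the stack: heap-allocate */
    if (!pieces)
        return;

    memset(pieces, 0, bytes);
    /* ... use pieces for the duration of this call only ... */

    if (!on_stack)
        free(pieces);
}
```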
samcv MasterDuke, if i bump MVM_FSA_BINS by 2x we don't overflow it during core setting compilation, well before it said 6.2GB max allocated 05:35
after it showed 5.1
increased it by 4x and it gets 1.5GB of mallocs 05:41
seems to shave 2s off core setting compilation 05:46
06:21 domidumont joined 06:24 domidumont joined 06:31 domidumont joined 06:53 brrt joined 08:12 leont joined 08:29 robertle_ joined
jnthn samcv: What does it do to total memory use, though? Also, what about the memory use of perl6 -e '' 09:13
dogbert17 to get everyone started: dilbert.com/strip/2017-10-02 09:19
09:19 brrt joined
nwc10 nice punchline in the 3rd panel 09:19
09:24 ilbot3 joined
dogbert17 so the gains will be realized later 09:24
09:29 ilbot3 joined 09:56 brrt joined 10:10 lizmat joined
brrt jnthn: thanks for reviewing! 10:34
timotimo jnthn: we could have a point where we allocate fewer buckets in the same page, for the very big pages 10:54
yoleaux 02:14Z <MasterDuke> timotimo: down to only 12 errors in stage parse, all exactly the same (gist updated with an example)
jnthn timotimo: We could, yeah, I pondered that before. It's probably sensible. 10:55
brrt: Welcome :-) Thanks for writing us a new JIT ;)
timotimo oh, the review is done? awesome!
brrt++
jnthn Yeah, provided brrt is happy for it to do so, it can go in :) 10:56
timotimo i really like the sound of that
lizmat is looking forward to the final grant report :-) 10:57
jnthn I semi-pondered "it's the Star release this month" but...given how many far more risky things landed in Rakudo this month, I'd say expr JIT is some way down the risks list :)
timotimo yeah 10:59
i'll look into bigger bins getting smaller :)
jnthn I suspect if I work on anything else in Rakudo ahead of this month's release, it'll be hyper/race, which are so broken at the moment anyway I almost can't make it worse :)
timotimo i hear ya :|
jnthn But yeah, overall let's try and be a bit cautious about what we put in over the week leading up to the next bunch of releases 11:00
The Star ones do get wider use
timotimo we still have one and a half weeks or so, right? 11:01
jnthn Yeah, indeed 11:02
timotimo could even add logic to the fsa to make the first page for a given bin smaller than all the rest 11:03
to better handle cases like "bin 91 gets five items ever, but 90 and 92 get hundreds"
we have setup_bin and add_page, so the distinction is already there in code 11:04
oh, no, not quite
oh, no, it does add a page
a very first page 11:05
i made it limit page sizes to 32kbyte, which means a page for bin 95 holds 42 elements 11:18
and i might make the size limit for the very first page half or quarter that
jnthn OK 11:19
Let's try that :)
timotimo let's see, sam wanted to bump the count by 4 ... or really 4x?
oh wow 11:20
for perl6 -e i now get a good view of which sizes get how many pages allocated 11:21
11:21 domidumont joined
timotimo a whole lot of 'em only get the initial page, even though i quartered its size 11:21
hm, the maxresident difference is rather small it seems 11:23
hm, it looks like i might actually be using more memory
nope, there was some other difference in my measurements 11:24
only like 80k on -e '' 11:25
the difference is more pronounced for -e 'say "hi"' 11:27
jnthn Save 80k?
nwc10 at a guess, is that because -e '' doesn't even allocate some bucket sizes?
jnthn Yeah, probably that
timotimo m: say 75236 / 75458
camelia 0.997058
timotimo m: say 75236 - 75458
jnthn ooh, lunch time :)
camelia -222
timotimo about 200k saved in that scenario
let's do a real measurement: core setting compilation 11:29
11:43 AlexDaniel joined
AlexDaniel squashable6: status 11:45
squashable6 AlexDaniel, Next SQUASHathon in 2 days and ≈22 hours (2017-10-07 UTC-12 to UTC+14). See github.com/rakudo/rakudo/wiki/Mont...Squash-Day
11:50 zakharyas joined
timotimo looks like it actually gets worse from my changes 11:50
Geth MoarVM/fsa_tune_page_sizes: 15ba542eea | (Timo Paulssen)++ | src/core/fixedsizealloc.c
limit FSA pages to 32k (8k for very first page)

helps perl6 -e '' and -e 'say "hi"' a lot, but seems to actually increase memory usage in a core setting compilation.
11:52
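(A hedged sketch of the page-size tuning in that branch: cap the bytes per FSA page at 32 KB, 8 KB for a bin's very first page, and derive how many items fit; the minimum-items guard and all names/constants are illustrative assumptions, not the branch's actual code.)

```c
/* With 96 bins at an assumed 8-byte step, bin 95 holds 768-byte items,
 * so a 32 KB page holds 32768 / 768 = 42 of them - matching the "42
 * elements" figure above. The first page for a bin is kept smaller so
 * rarely-used bins waste less memory. */
#define PAGE_BYTES_CAP        32768
#define FIRST_PAGE_BYTES_CAP   8192
#define MIN_ITEMS_PER_PAGE        8   /* guard against pages of 0 or 1 items */

static unsigned items_per_page(unsigned item_bytes, int is_first_page) {
    unsigned cap   = is_first_page ? FIRST_PAGE_BYTES_CAP : PAGE_BYTES_CAP;
    unsigned items = cap / item_bytes;
    return items < MIN_ITEMS_PER_PAGE ? MIN_ITEMS_PER_PAGE : items;
}
```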
timotimo though of course the results will be different still with more stuff using the fsa 11:54
lizmat jnthn timotimo: could you give me a sanity check wrt BUILDALL?
if a class has an empty BUILDPLAN (like a mixin that doesn't add any attributes)
doesn't that imply I don't need to generate a BUILDALL for that class, as its first parent will have the correct BUILDALL already generated ? 11:55
the invocant signature might not be 100% correct, but still valid anyway
timotimo hm, that does sound sensible 11:57
maybe i want something a little bit faster than the core setting compilation for measuring this :| 11:58
jnthn lizmat: If it has no attributes, then yeah, that sounds reasonable 12:03
lizmat that would save quite a few generated methods, as it's quite common to only mixin methods, and not attributes 12:04
timotimo good point
12:10 AlexDaniel joined
timotimo 1342402 is the average before 12:17
1341298 is the average after
"1338332".."1343264" 12:18
that's the minmax of "after"
lizmat 1100 bytes difference ?
timotimo "1339700".."1343508" 12:19
that's the minmax of before
that's kbytes
lizmat ah, so ~1MB difference
or a bit more... 12:20
timotimo m: say 1338332 - 1343264; say 1339700 - 1343508
camelia -4932
-3808
timotimo that's how noisy the measurement is 12:21
the measurements for "with my patch" are further apart 12:22
i'm not sure if it makes sense to use these measurements, given the amount of noise is ~4x the difference
Geth MoarVM/even-moar-jit: dc3d40fb22 | (Bart Wiegmans)++ | 4 files
Improve the expression JIT documentation

Add a document describing its most important components (expression template processor / tree builder, tiler, and register allocator).
12:27
MoarVM/even-moar-jit: 6de7455e9a | (Bart Wiegmans)++ | 2 files
More documentation fixes

Some of the things in tiles.md were no longer true
MoarVM/even-moar-jit: 7f7ce9ca40 | (Bart Wiegmans)++ | 3 files
^cu_string - is lazy-loaded so use wrapper

The direct access of MVMCompUnit->body.strings was a legacy from simpler days when compunit strings were loaded eagerly. As they're now using lazy loading, that isn't really valid anymore.
Possible future development would be to force eager loading during JIT compilation and/or upgrading to second-generation memory.
12:28
brrt oh, oops, we're seeing a segv
12:28 AlexDaniel joined
Geth MoarVM/even-moar-jit: 33270f003d | (Bart Wiegmans)++ | src/jit/macro.expr
MVM_cu_string - second argument is *cu

Not idx, oops
12:31
timotimo i wonder what the main source of nondeterminism in core setting compilation is, the one that causes memory usage to vary so drastically 12:40
could it just be spesh? 12:41
brrt tbh i don't find spesh to be very nondeterministic 12:45
timotimo core setting compilation doesn't start any other threads, so it also won't do the logs it gets in different orders every time 12:46
lizmat random hash order ? 12:50
timotimo hm, how often do we iterate over hashes in compilation i wonder
lizmat I have no idea :-) 12:51
brrt i expect quite often 12:55
timotimo but we don't randomize hashes yet, do we? 12:59
lizmat timotimo: not sure 13:03
14:26 zakharyas joined 14:43 AlexDaniel joined
timotimo is idly playing around with systemtap 14:56
oh geez 14:57
tried to record stack traces for every fsa_alloc hit
it's filling up my disk good
i can't stop it o_O 14:58
now it did stop and the resulting file is apparently b0rken 14:59
timotimo frees up some disk space ... 15:02
15:06 brrt joined
brrt i'm investigating three spectest failures 15:28
t/spec/S29-os/system.rakudo.moar test 35 15:29
t/spec/S28-named-variables/init-instant.t
t/spec/S17-supply/watch-path.t
they are individually successful 15:30
so, any objections to me pushing the merge button? :-)
Zoffix \o/ \o/ \o/ 15:31
jnthn I've seen the first happen for a while
Not every time, but now and then
Others I dunno about
But...yeah, let's merge it
timotimo yeeeeaaahhhh
jnthn Sounds like timing or other issues if they work outside of harness
timotimo a sizable portion of allocations is from the nfa (in perl6 -e 'say "hi"') 15:33
almost a quarter 15:34
but that's only "how many times is the allocator called", it ignores the size of each allocation
and i think it skipped some events because I/O was too slow
jnthn I already reduced its allocations by a good bit before, I think
iirc the remaining one is the result 15:35
Which we need to hand back, and might live on for a while
timotimo yeah, it's not bad 15:36
the code, i mean
just an observation
another big chunk is from hash entries
brrt thank you everybody for your incredible patience 15:38
timotimo not sure i can get much sensible information out of this any more
but it was good to refresh my memory of how the systemtap portion of perf works 15:39
brrt also, geth is dead, maybe 15:40
jnthn Aww, no commit report
brrt++ though
Already built it here :)
brrt cool :-)
brrt hoping for the best
[Coke] did it merge to master? (do I need to build rakudo with nqpmaster and moarmaster?) 15:42
jnthn [Coke]: To MoarVM master, yes. No version bumps as yet 15:43
[Coke] building on win & mac.. 15:44
(src\profiler\heapsnapshot.c(823): warning C4293: '<<': shift count negative or too big, undefined behavior) 15:49
Zoffix wow 15:50
brrt++
\o/
brrt i'll take a look, although that is not part of the branch :-) 15:51
[Coke] brrt: that was not meant for you specifically. ;) 15:52
brrt [Coke] what platform are you on? 15:54
32 bit by any chance?
seems like your long is not long enough :-)
anyway, rather than 1l, probably better to write UINT64_C(1)
which i think is defined in stdint.h
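(As a hedged illustration of brrt's point: on LLP64 Windows `long` is 32 bits, so a literal like `1l` cannot be shifted past bit 31. The helper name and shift count below are made up, not the code in heapsnapshot.c.)

```c
#include <stdint.h>

/* `1l << n` with n >= 32 is undefined on 64-bit Windows, where long is only
 * 32 bits, and triggers warning C4293. UINT64_C(1) from <stdint.h> forces a
 * 64-bit constant so the shift is always well defined. */
static uint64_t nth_bit(unsigned n) {
    return UINT64_C(1) << n;   /* instead of 1l << n */
}
```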
[Coke] brrt: that is on a win64 (I think) win 10 vm.
brrt hmm, that's somewhat surprising 15:55
[Coke] 64-bit, x64
brrt i think i may have seen that on the JIT as well on windows
[Coke] using ms vs Community 2017 15:56
brrt anyway, can you pls check if this helps: gist.github.com/bdw/cb8ecce419ec0a...2e883b5dc9
i have a VM as well, i just don't feel like starting it up :-)
15:56 zakharyas joined
[Coke] brrt: let me finish the as-is build first, then will re-try that. 15:56
brrt also, i will *not* be online for the coming afternoon / night, so, if any problems arise, i can't respond; if trouble is severe, you know where the revert button is 15:57
i don't expect it very much, but just so you know :-)
[Coke] spectest clean on mac. 15:58
brrt \o/
[Coke] nativecall cpp tests still failing on win 10 `nmake test`, no change there. kicking off win10 spectest. 15:59
(as I recall, we have a lot of win failures atm so I'm not sure this will show anything :|) 16:01
if `perl6 -V | grep -i jit` has moar::jit_arch set to a value, does that mean I have a jit? 16:02
(or is there a better way to tell?) 16:06
bartolin it would be great if someone could take a look at github.com/MoarVM/MoarVM/pull/714 . currently opening a socket to a remote host is broken on freebsd.
jnthn hm, thought i already reviewed/merged that one... 16:10
Geth MoarVM: d04c8dccbc | usev6++ | src/io/syncsocket.c
Fix getaddrinfo failing with EAI_HINTS on FreeBSD

This fixes spectest failures (e.g. in S32-io/IO-Socket-INET.t) on FreeBSD.
According to the man page of getaddrinfo() '[a]ll other elements of the addrinfo structure passed via hints must be zero or the null pointer'
  (similar wording on Linux). This requirement is actually enforced on
FreeBSD:
   github.com/freebsd/freebsd/blob/89...nfo.c#L429
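(A minimal sketch of the pattern that commit describes: zero the whole hints struct before filling in the fields you need, since FreeBSD enforces that the unused members are zero. The particular family/socktype values are illustrative, not necessarily what syncsocket.c sets.)

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int resolve_host(const char *host, const char *port, struct addrinfo **res) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));   /* the crucial part: zero everything first */
    hints.ai_family   = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    return getaddrinfo(host, port, &hints, res);   /* 0 on success */
}
```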
MoarVM: 3da7cce276 | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/io/syncsocket.c
Merge pull request #714 from usev6/getaddrinfo_hints

Fix getaddrinfo failing with EAI_HINTS on FreeBSD
MoarVM: 23c16b3031 | (Patrick Sebastian Zimmermann)++ | 2 files
Probe for gcc -Werror=* support. This allows building MoarVM on older GCCs.

The -Werror=* probe has to run before setting the cc/ldmiscflags. The compiler_usability check only makes sense before the -Werror=* probe. Thus that one is also moved earlier.
16:11
MoarVM: 48f5efaaf9 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 2 files
Merge pull request #696 from patzim/master

Probe for gcc -Werror=* support
bartolin thanks anyway :-) 16:12
Geth MoarVM/master: 9 commits pushed by Mario++, M++, (Jonathan Worthington)++ 16:13
[Coke] looks like win64 spectest just hung with 3 remaining tests. 16:42
16:45 travis-ci joined
travis-ci MoarVM build canceled. Jonathan Worthington 'Merge pull request #696 from patzim/master 16:45
travis-ci.org/MoarVM/MoarVM/builds/282804503 github.com/MoarVM/MoarVM/compare/3...f5efaaf92a
16:45 travis-ci left 16:55 lizmat joined 17:20 domidumont joined
Zoffix stresstests and version bumps 17:47
"2017.09.1-553-ga4fef0b" that's quite a high number :D 17:50
18:03 travis-ci joined
travis-ci MoarVM build passed. Jonathan Worthington 'Merge pull request #692 from duke-m/patch-2 18:03
travis-ci.org/MoarVM/MoarVM/builds/282805138 github.com/MoarVM/MoarVM/compare/4...fef0bd36cc
18:03 travis-ci left 18:20 AlexDaniel joined 19:09 brrt joined 19:21 zakharyas joined
samcv jnthn, 4x is max 65.5MB of memory usage. compared to 56.7MB 19:30
timotimo samcv: is that actually going from ~90 bins to ~360? 19:34
samcv yeah
96*4
and with 4x we only have 1.2GB alloced with FSA instead of 6.2GB
timotimo wow 19:35
samcv for core setting compilation. doing 2x, we have 5.1GB malloced in FSA
sorry i should say malloced, not alloced
jnthn samcv: Remind what you're tweaking this for? :)
samcv also core setting is 2s faster
reducing the mallocs
that the FSA has to do
it seems to make many GB more than is really needed 19:36
optimally
jnthn Yeah, but I thought you had a particular use case that 4x covers? 19:37
samcv i was recording the setting compilation
jnthn Ah, OK
I thought it was something about a KMP table fitting in or something 19:38
samcv from 6.2GB malloced to 1.2GB
nah
jnthn ah, OK 19:39
But we gain 10MB extra memory use for doing nothing?
timotimo did i already push my tuning branch for the fsa page size?
it would probably want a minimum items per page, too
otherwise it'll go down to 0 items per page, or 1 19:40
also, perhaps we should make bigger steps after a given page size 19:42
item size i mean 19:45
jnthn Yeah, possibly that also 19:48
samcv jnthn, yep we gain 10MB extra memory doing nothing 19:49
timotimo samcv: check the fsa_tune_page_sizes branch 19:50
give it a minimum "effective_item_count"
then let's see if it helps at all
samcv how do i get a branch i don't have
timotimo it should be enough to, after "git fetch", just git checkout fsa_tune_page_sizes
it ought to set it up to track origin/that_branch_name for you 19:51
samcv that does not work
oh i got it now
jnthn samcv: OK, then further tweakery needed to try and have our cake and eat it, IMO
I suspect something along the lines of what timotimo has looked at may do it
samcv and timotimo's branch has 52.8MB peak. so down 4mb 19:52
will try the setting now
timotimo i tried measuring the core setting
and the difference between two runs was enormous 19:53
samcv you mean two runs of the same branch?
timotimo yes 19:54
samcv interesting...
timotimo so you might have to be extra careful in your measurements, too 19:55
samcv so i got 1.2GB peak. which is similar to having the bin 4x 19:57
so that seems good
err wait. no it's 1.6GB
i was looking at the peak. 1.6G is the amount allocated by the FSA in total
timotimo you're using heaptrack for this?
samcv err no. the amount allocated by malloc 19:59
my bad
but that's much closer to the 1.2Gb i got :)
yeah 20:00
i'm looking at the total amount that was malloc'd and comparing it. though i should also compare the peak usage as well
20:04 leont_ joined 20:40 lizmat joined
timotimo i wonder if we'd benefit from a "pre-size this hash for n elements" operation that we can call in the deserialization code 20:46
MasterDuke timotimo: did you try with MVM_SPESH_BLOCKING=1? did that reduce the variability? 20:49
timotimo i did not
MasterDuke oh, and how were you measuring? 21:03
timotimo just "time" on the commandline
MasterDuke nice and simple (and doesn't slow the build!) 21:05
samcv i'm going to case the switches for 8bit and ascii strings being joined (flat) 21:31
tests show 2.4x speed improvement joining very long 8bit strings 21:32
(that are flat)
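(A hedged sketch of what such a flat 8-bit fast path can look like in general: when every piece is a flat byte buffer, the result can be built with plain memcpy instead of walking a grapheme iterator per piece. The types here are invented for the example and are not MoarVM's string representation.)

```c
#include <stdlib.h>
#include <string.h>

struct Piece8 { const unsigned char *bytes; size_t len; };

/* Join n flat 8-bit pieces with a flat 8-bit separator by copying bytes. */
static unsigned char *join_flat_8bit(const struct Piece8 *pieces, size_t n,
                                     const struct Piece8 *sep, size_t *out_len) {
    size_t total = n > 1 ? (n - 1) * sep->len : 0;
    size_t i, pos = 0;
    unsigned char *out;

    for (i = 0; i < n; i++)
        total += pieces[i].len;

    *out_len = total;
    out = malloc(total ? total : 1);
    if (!out)
        return NULL;

    for (i = 0; i < n; i++) {
        if (i) { memcpy(out + pos, sep->bytes, sep->len); pos += sep->len; }
        memcpy(out + pos, pieces[i].bytes, pieces[i].len);
        pos += pieces[i].len;
    }
    return out;
}
```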
lizmat m: use experimental :collation; $*COLLATION.set(:!tertiary); dd "a" coll "A" # samcv: is that correct ? 21:33
camelia Order::More
21:33 zakharyas joined
samcv ah. quaternary needs to be disabled 21:33
that breaks ties by codepoint
m: use experimental :collation; $*COLLATION.set(:!tertiary, :!quaternary); dd "a" coll "A" 21:34
camelia Order::More
samcv err maybe i spelled it wrong
lizmat yeah :-)
samcv m: use experimental :collation; $*COLLATION.set(:!tertiary, :!quaternary); dd "a" coll "A"
camelia Order::More
samcv m: use experimental :collation; $*COLLATION.set(:!tertiary, :!quaternary); say $*COLLATION
camelia collation-level => 5, Country => International, Language => None, primary => 1, secondary => 1, tertiary => 0, quaternary => 0
samcv 5 21:35
that sounds right
lizmat 5?
samcv hm
lizmat but More should be Same, right?
samcv yeah 5 is right. but More should be same when quaternary is removed 21:36
looks like a bug. will look at it in about an hour when i get back
lizmat shall I rakudobug it ?
samcv yeah
lizmat oki
samcv: RT #132216 21:40
synopsebot RT#132216 [new]: rt.perl.org/Ticket/Display.html?id=132216 'a' coll 'A" not Same but More
Geth MoarVM: d5db8486bb | (Timo Paulssen)++ | src/spesh/stats.c
skip stats for frames beyond spesh max bytecode size

an example file with a gigantic mainline - a huge hash literal - spent more than 95% of its time inside by_offset for the benefit of a frame that was going to be ignored by the planner anyway. This makes it as fast as running without spesh.
21:59
jnthn Hm, though that happens on another thread... 22:00
Nice catch, though :-)
timotimo it does
jnthn Though I wonder 22:01
timotimo but the regular thread waits for spesh to reach its gc sync point
jnthn Ah
Maybe we should not log such huge frames at all..hmm
timotimo that'd be a check inside the instructions that do logging
jnthn Though we'd have to look at the bytecode size on frame entry when we're logging
timotimo yes
jnthn No, I was thinking of doing it in MVM_frame_invoke
If we don't give it an entry record or correlation ID
Then it won't log anything for it 22:02
And given we'll never spesh it, that's fine
timotimo oh, that would prevent all ops from logging with a check that's already in place anyway
jnthn Yeah
So then we'd not even write into the log
timotimo yes, that'd increase spesh efficiency, too
jnthn aye 22:03
May help some of the giant test files too :)
timotimo it could very well!
jnthn Was the giant file with a hash in actually constructing it in the module mainline?
If it's all constant data, sticking a BEGIN in front of it would mean we make it at compile time and serialize it, which'd be faster still :) 22:04
timotimo yup, it's just "our %systems = ( ... )" 22:05
but you're right
i hadn't thought of that
a bit shameful that we use about a gig maxrss to compile this beast 22:06
i'll put "constant" in front, that should have the same effect
jnthn Yeah
Then it should load faster still 22:07
timotimo i'll have numbers in a mo'
ah, snap 22:08
it then has to be %( ) rather than just ( )
wow, yeah 22:10
that's much, much faster
from about 0.88 down to about 0.26 22:11
now i'm checking if the association id thing works 22:16
yeah, it has the speed increase effect 22:17
i'll run "make test" and "make spectest" this time, though
MasterDuke association id?
timotimo yeah, in order to do spesh logging we give every frame an ID 22:18
if a frame has no ID, no logging can take place
MasterDuke ah
timotimo that's not quite accurate 22:20
it also interplays with the simulated stack and such 22:21
i.e. spesh recreates what it thinks the callstack looked like when the log was created, so we don't have to do too complicated computations when creating the individual log entries
MasterDuke and we can just skip all that if the frame size is too big? 22:25
timotimo aye
jnthn: you think this can have bad effects if a huge frame comes between two frames that get speshed?
Geth MoarVM: d0646fafb9 | (Timo Paulssen)++ | 2 files
don't even generate log entries for huge frames

not giving a frame a correlation ID prevents any logging from taking place; the logs are not filled with useless data, and there are fewer runs for the spesh worker to do.
22:27
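(A hedged sketch of the approach in that commit: decide at frame-invocation time whether the frame gets a correlation ID at all, so the existing "no ID, no logging" check covers huge frames for free. The threshold and names are placeholders, not MoarVM's actual identifiers.)

```c
#define SPESH_MAX_BYTECODE_SIZE 4096   /* assumed planner cutoff, not MoarVM's real value */

struct Frame { unsigned bytecode_size; unsigned correlation_id; };

static unsigned next_correlation_id(void) { static unsigned id = 0; return ++id; }

static void assign_correlation_id_sketch(struct Frame *frame) {
    if (frame->bytecode_size > SPESH_MAX_BYTECODE_SIZE)
        frame->correlation_id = 0;   /* no ID: the logging ops see this and skip the frame */
    else
        frame->correlation_id = next_correlation_id();
}
```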
timotimo wouldn't have been able to come up with this as fast if not for MasterDuke being my rubber duck :) 22:28
jnthn timotimo: It'll cope
The worst that'd happen is it infers something wrong from the stats and sticks a guard in that always fails, but that would be quite unlikely
MasterDuke "those who can't do, duck"? 22:29
timotimo haha
hm, lock-async seems to be hanging, but i think it also hangs for others 22:30
nope, it finished
just took a while
jnthn That one's odd; on my office machine it always completes fast. In my VM at home it often does, then occasionally goes super slow. 22:32
Always completes eventually though
22:40 bloatable6 joined 22:45 buggable joined 22:57 arnsholt joined