TimToady m: my $a = 0; my @a = eager ($a++,) Zxx 42; say $a; say @a; say $a 00:01
camelia 0
[(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41)]
42
TimToady it does seem a bit antisocial to ignore eagerness 00:03
m: my $a = 0; my @a = ($a++,) Zxx 42; say @a.is-lazy
camelia False
00:40 agentzh joined 02:49 ilbot3 joined 02:51 dogbert2 joined 02:53 mtj_ joined 03:34 geekosaur joined
Geth MoarVM: c9b3a35a51 | (Samantha McVey)++ | docs/ChangeLog
Add more details on speed improvement to changelog
04:47
05:20 agentzh joined 06:06 agentzh joined
samcv jnthn, putty probably doesn't use fallback fonts or something. i never liked putty because it wasn't very full features. noto fonts is not just one font because it covers so much of unicode it can't fit in one font 07:10
07:40 brrt joined 08:09 agentzh joined 08:50 domidumont joined
Geth MoarVM: 7d7a6b1881 | (Samantha McVey)++ | src/strings/ops.c
Fix bug in indexic_s if expanding cp is the last cp

There was a bug where if a codepoint such as 'st' which expands under foldcase was the last codepoint, it would return -1 instead of returning the correct index. Fixes the nqp tests that used to be todo'd.
10:23
jnthn m: my $a = 0; say (($a++,) Zxx 42).is-lazy 11:18
camelia False
jnthn Hmm 11:19
Ah, there was a revert
MasterDuke jnthn: i think i'm running into the bug you did this workaround for github.com/perl6/nqp/commit/79e10b...8c774ed165 12:14
which i think is related to this: 12:16
m: class :: { has uint64 $.x; }.new( x => 2**64-1 ).x.say
camelia -1
12:32 ilmari[m] joined
jnthn MasterDuke: Yeah, I think that's 'cus we had no getattr_u yet 13:00
13:00 geekosaur joined
MasterDuke yeah, pretty sure that's one i added in my `add a whole bunch of _u ops` branch 13:09
also trying to figure out this one now: 13:10
m: my uint64 $x = 2**64-1; say $x
camelia -1
MasterDuke i'm returning the right value out of mp_get_int64, but haven't yet figured out where it gets lost 13:11
14:11 agentzh joined 14:37 geekosaur joined
MasterDuke if i don't care about fixing attributes and the more general _u problem, i have a fix for github.com/MoarVM/MoarVM/issues/513, that make two todos now pass 14:37
however, this test github.com/perl6/roast/blob/master...ive.t#L167 `is class :: { has uint64 $.x; }.new( x => 2**64-1 ).x, 2**64-1, 'uint64 attributes don\'t get sign-extended';`, which is currently todoed, now dies 14:38
because attributes are all treated as signed, and since that value is too large for a signed native, my fix for #513 causes it to throw 14:39
my fix for #513 wouldn't cause it to throw if unsigned attributes worked correctly 14:42
so would modifying that test as a workaround be ok?
timotimo src\jit\emit_win32_x64.c(8) : warning C4129: 'j' : unrecognized character escape sequence 16:00
src\jit\emit_win32_x64.c(8) : warning C4129: 'e' : unrecognized character escape sequence
what could this be?
16:44 agentzh joined 17:30 zakharyas joined 17:32 agentzh joined 18:46 agentzh joined
MasterDuke timotimo: any idea why copying this github.com/MoarVM/MoarVM/blob/mast...1244-L1252 and changing 32s to 8s causes errors like `Unhandled exception: Error encoding UTF-8 string: could not encode codepoint 1836085614`? 19:26
19:34 Ven joined
MasterDuke .ask samcv parsing the json from a travis log with JSON::Fast spends 46% of its time here github.com/MoarVM/MoarVM/blob/mast...#L86-L139, which mentions it could be smarter. see anything you could do to optimize it? 19:46
yoleaux2 MasterDuke: I'll pass your message to samcv.
timotimo can you give me the actual code you used, MasterDuke?
MasterDuke gist.github.com/MasterDuke17/80bcd...915c563c8d 19:49
timotimo result->body.storage_type = MVM_STRING_GRAPHEME_32; 19:51
that's why it's erroring out
MasterDuke ah 19:53
timotimo i'd expect that re_nfg could be implemented to detect that no change is necessary at all 19:59
MasterDuke so should the MVM_STRING_GRAPHEME_8 case have the exact same body as the MVM_STRING_GRAPHEME_32 case?
timotimo alternatively it can use almost-all-of the previous string by refering to it from a rope and then a little piece in the middle that combines the end and beginning into a grapheme, and then most-of the second string 20:00
almost the same body. the pgraphs has a different meaning based on what the storage type is
so you have to copy pgraphs * sizeof(MVMGrapheme32) in one case and pgraehs * sizeof(MVMGrapheme8) 20:01
the heap analyzer ought to help here :)
MasterDuke same error 20:02
timotimo are you also setting the correct storage type for result?
MasterDuke nope
timotimo well, that's the problem, then 20:03
though of course both parts must have the same storage type 20:04
MasterDuke but i could only do that if i know that all the pieces and the separator are MVM_STRING_GRAPHEME_8 right?
timotimo right 20:10
otherwise you'll have to do a loop and assign
memcpy can't do upscaling like that
MasterDuke think the loop is faster than the current default case?
timotimo maybe there's some fast way to do it, and if you have a tight loop it could very much be that gcc gives you that for free 20:11
oh i didn't know about nfg_is_concat_stable
do we check that everywhere before we re_nfg? 20:12
yeah, we do
so it could be interesting to figure out why our json stuff thinks the concat isn't stable
is there any reason to assume that?
MasterDuke i hope that was a rhetorical question, i don't know this nfg stuff at all 20:15
timotimo i didn't look at the function yet, but i assume it checks the last grapheme of a and the first of b 20:17
and if the first of b is a combiner without its base, it'll complain 20:19
or if a ends in a pre-pendable combiner
MasterDuke i'm looking at it, and without knowing these things at all, the comments seem to suggest what you just said 20:20
timotimo OK, how about this: 20:22
we should have code to encode a substr of a string to utf8, right?
when we hit the case that concat isn't stable, we output the string in question
or maybe even the graphemes as integers
MasterDuke there is a MVM_string_utf8_encode_substr 20:23
timotimo fantastic
though maybe using the ints directly is more helpful
because the strings are supposed to have "magical" unicode chars in them
and they might turn invisible or hard-to-see when we just encode&print them
MasterDuke in moar or json::fast? 20:25
timotimo in moar
MasterDuke for debugging purposes? 20:29
timotimo yes, only for that 20:30
MasterDuke ha, i was a bit confused there at first 20:35
timotimo will you try doing that? 20:38
oh, interesting 20:43
look, it thinks as soon as there's a synthetic it wouldn't be stable
MasterDuke so if not nfg_is_concat_stable, print out the MVM_string_utf8_encode_substr of the piece and separator?
timotimo a newline is rather common, but it's a synthetic 20:44
maybe we'll want to special-case something there
(but first see what it's actually all about)
oh, also ... we'll want both the encoded and the pure graphemes, i think 20:45
MasterDuke why do we need the *_encode_substr function? 20:49
timotimo it's not strictly necessary, but if you see something like "-324" in the string, you don't necessarily know what's up
MasterDuke segv, did something wrong 20:59
timotimo perhaps, let me see your code?
MasterDuke fprintf(stderr, "pieces[i - 1]: %s, p: %s\n", MVM_string_utf8_encode(tc, pieces[i - 1], MVM_string_graphs(tc, pieces[i - 1]), 0), MVM_string_utf8_encode(tc, piece, MVM_string_graphs(tc, piece), 0)); 21:00
oh, i need to encode to C str? 21:01
geekosaur fprintf isn't going to understand lists of codepoints, much less lists of graphemes; you need to feed it UTF8 in C's (char []) 21:06
timotimo maybe we'll use only the graphemes for now 21:09
MasterDuke so that printed a whole lot 21:11
timotimo mhm
that's only if it's not stable, right?
can has a bit of example?
MasterDuke with pieces[i - 1] == `^M` a lot 21:12
timotimo which one is that again ...
MasterDuke `else if (!MVM_nfg_is_concat_stable(tc, pieces[i - 1], piece)) {`
timotimo that's a control character?
MasterDuke isn't that a \r that get left from bad dos <-> unix line endings? 21:14
here's the first couple lines of output gist.github.com/MasterDuke17/d75d9...747f1dddf0 21:15
timotimo maybe you're accessing one past the string? 21:16
MasterDuke i changed the format string to "pieces[i - 1]: |%s|, p: |%s|\n"
timotimo though it should have a bounds check
but yeah
those are all newlines
newline is a synthetic grapheme, because \r\n and \n are supposed to be the same thing
too many things at once, where do i get the json we have again? 21:17
MasterDuke api.travis-ci.org/jobs/212455149 21:18
timotimo oh, i know 21:19
that's because of how json_fast parses strings
it goes until it sees the first \*
and then it parses chunks and joins them
so you'll pretty much always have something like a newline where join is used 21:20
i don't see a "Done" in there at all
MasterDuke there's at least one line with "Done" according to grep 21:21
timotimo i must be doing something wrong 21:22
{"id":212455149,"number":"7436.3","config":{"language":"perl","os":"linux","sudo":false,"fast_finish":true,"install":"echo","dist":"trusty","addons":{"apt":{"sources":["ubuntu-toolchain-r-test"],"packages":["gcc-6","openjdk-7-jdk"]}},"script":["perl Configure.pl $RAKUDO_OPTIONS --moar-option=--cc=$(command -v gcc-6 >/dev/null 2>&1; if [ $? -eq 1 ]; then printf \"gcc\"; else printf \"gcc-6\"; fi)","make
test","make install"],"branches":{"only":["nom","/smoke-me/"]},"notifications":{"irc":{"channels":["irc.freenode.net#perl6-dev"],"on_success":"change","on_failure":"always","template":["Rakudo build %{result}. %{author} '%{commit_message}'","%{build_url} %{compare_url}"]}},"env":"RAKUDO_OPTIONS=\"--backends=moar --gen-nqp=master
--gen-moar=master\"",".result":"configured","group":"stable"},"repository_id":1242635,"build_id":212455146,"state":"finished","result":1,"status":1,"started_at":"2017-03-18T14:22:12Z","finished_at":"2017-03-18T14:29:07Z","log":"","commit":"dd3f22875c8f803ec97e0d4e39eef2c29ff49695","branch":"nom","message":"Log all commits to date\n\nDocuments commits:\ne1ebb50 f94a2c7 3de5fb2 79bb179 65b0040 10f5f74
9644fc3","committed_at":"2017-03-18T14:21:35Z","author_name":"Zoffix Znet","author_email":"cpan@zoffix.com","committer_name":"Zoffix Znet","committer_email":"cpan@zoffix.com","compare_url":"github.com/rakudo/rakudo/compare/3...ker":null}
this be all i get
MasterDuke added the file i have to my gist 21:24
timotimo thanks
MasterDuke and there's definitely one and only one "Done" in it
timotimo yeah that's like 100x as long as what i get 21:25
MasterDuke wait, nm, could be more
timotimo hm?
IOninja travis trimmed an old job log?
I see no log in browser in that url... nor Done
timotimo *shrug*, maybe i'm not logged in or whatever? 21:26
IOninja gonna try on my box in ~10m (on phone right now)
timotimo i'll see what i can do about nfg_concat_is_stable 21:27
because we definitely know that the \n grapheme is okay (as it's a control characterr
)
MasterDuke would it make sense for json::fast to also grab the next char if it sees a '\r'?
timotimo well, to do that, it'll have to concat, too 21:28
we wouldn't escape that problem
"escape" 21:29
badum-tss
MasterDuke but wouldn't "foo\r\nbar" get split into "foo", "\r", "\nbar" right now?
or "foo", "\r", "\n", "bar"?
timotimo yup 21:30
the latter
MasterDuke wouldn't you want "foo", "\r\n, "bar"? 21:31
"foo", "\r\n", "bar"
timotimo if you want to write the code that handles consecutive escape sequences like that, sure. 21:32
MasterDuke heh, i do not 21:35
timotimo so there :)
but we don't actually have \r\n, right?
in our file i mean 21:36
oh crap, we totally do 21:37
a whole fricking lot of 'em
in that case we always have to renormalize
okay, it might be worthwhile to special-case \r\n
i'll try implementing it
IOninja yeah, that link expired. tis empty on my box too 21:38
nearly empty
MasterDuke glad i saved it off then 21:39
timotimo could have just used a newer link 21:40
may want to put a fresher link into tickets that are about this
IOninja I just grabbed one from a new job; noms 2.3GB there too: gist.githubusercontent.com/zoffixz...tfile1.txt
I've updated my snippet to use ^ that permalink now: github.com/timo/json_fast/issues/2...-287570921 21:41
timotimo i forgot to use -Ilib
so i was using system-wide installed JSON::Fast %)
IOninja heh
timotimo stripped off half a gig 21:42
time perl6 -Ilib -e 'use JSON::Fast; from-json(slurp("/tmp/mem_leak.json"))'
2.83user 0.63system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 2166808maxresident)k
0inputs+0outputs (0major+537101minor)pagefaults 0swaps
timo@schmetterling ~/p/e/json_fast (master)> time perl6 -e 'use JSON::Fast; from-json(slurp("/tmp/mem_leak.json"))'
2.56user 0.79system 0:03.38elapsed 99%CPU (0avgtext+0avgdata 2620180maxresident)k
0inputs+0outputs (0major+650303minor)pagefaults 0swaps
MasterDuke nice 21:43
IOninja \o/ 21:44
timotimo not good enough, of course 21:48
but a bit better
# at t/06-control-characters.t line 42 21:49
# expected: '"\r\n"'
# got: '"
"' 21:50
welp :|
that isn't right
okay, fixed it
pushed it, you can try if it makes things better 21:51
it could be that it has been handling \r\n wrong for a while 21:52
yup, totally 21:54
sounds like a version bump is in order
MasterDuke seeing less mem usage also
re_nfg is still 49% according to perf though
timotimo right 21:55
not unexpected
we only reduced the number of concat operations
not gotten around the problem in general 21:56
MasterDuke yup
fwiw, here's heaptrack's summary now i.imgur.com/4bw1Bup.png 21:58
for just parsing it once
samcv MasterDuke, so it may be from \r \n it having to renormalize so much? 22:05
and we call re_nfg so much when we concatenate yes? 22:06
MasterDuke believe so 22:08
but timotimo's really the one who knows what's going on 22:09
samcv yeah that's pretty wasteful
i'm pretty sure i understand what's happening too
from the scrollback
MasterDuke heh, i was having the conversation and don't understand 22:10
samcv i'll look at it later today. doing stuff for the next few hours
:)
MasterDuke cool
timotimo i found it a little bit surprising that re_nfg allocated such a gigantic amount of data 22:14
samcv how much is gigantic?
MasterDuke 2g to parse a 200k json file 22:15
samcv lol.
ok that's gigantic
timotimo a little bit :3
MasterDuke i'm kind of surprised re_nfg hasn't cropped up in any other profiles 22:16
samcv maybe because we've had \n line endings? 22:17
timotimo it had lots of \r\n in it 22:20
i improved the memory usage by half a gig (a quarter in total) by special-casing \r\n to be done in one step
jnthn Huh, how on earth is re_nfg managing that... 22:25
yoleaux2 16:58Z <japhb> jnthn: 6guts.wordpress.com/2017/03/16/con...semantics/ keeps asserting incorrect arithmetic (6+5=12,9+5=15,9+5=16).
19:53Z <IOninja> jnthn: recall I kept mentioning buggable leaking? Seems it's just something with JSON::Fast (and it doesn't seem to be leaking but just using lots of RAM). Swapping to JSON::Tiny avoids the issue.
MasterDuke i think heaptrack said most of the allocations happened here github.com/MoarVM/MoarVM/blob/mast...ops.c#L100 `MVMGrapheme32 *out_buffer = MVM_malloc(bufsize * sizeof(MVMGrapheme32));`
and bufsize == in->body.num_graphs 22:26
jnthn Oh heh 22:27
Yeah, that's a pretty lazy implementation :)
Though I'm now curious how JSON::Fast triggers that case :) 22:28
MasterDuke i think it's lots of concats where !nfg_is_concat_stable 22:29
in MVM_string_join
timotimo it sees \r\n which is < 0 and so it never says is_stable
jnthn Ah
But that can never combine with anything else, so we could potentially special-case it. 22:30
But still, re_nfg could be done in a smarter way
MasterDuke most were here github.com/MoarVM/MoarVM/blob/mast...ps.c#L1237 `else if (!MVM_nfg_is_concat_stable(tc, pieces[i - 1], piece))`
jnthn has had a headache all evening, so will be pretty useless for thinking much about this :/ 22:32
timotimo yup 22:33
i had headache yesterday and was also rather useless 22:36
jnthn Had sore throat/ears the last couple of days, and somewhat today, and today gained a stuffy nose, so I guess I'm doing some kind of cold. 22:37
timotimo ugh, i hate that kind of thing
oh, it seems like i recurse in parse-string instead of looping somehow 22:38
that could explain why it joins often 22:39
yeah, if it doesn't see a " after the escape sequence, it'll recurse 22:41
that's Bad
but i have a fix 22:45
time perl6 -Ilib -e 'use JSON::Fast; from-json(slurp("/tmp/mem_leak.json"))' 22:48
1.39user 0.05system 0:01.46elapsed 98%CPU (0avgtext+0avgdata 92652maxresident)k
how do you like this?
MasterDuke hot damn! 22:49
timotimo now only has to be correct
jnthn Did it get any faster with all the reduction in allocations? :) 22:50
timotimo yeah, a bit 22:57
let me do proper measurements
samcv timotimo, can it be any json file with \r\n lineendings? 22:58
timotimo um ... huh.
no, it got significantly slower 22:59
somehow it also reached almost a gig again?!? 23:05
grr 23:08
i haven't a good clue what's going wrong 23:10
can't get it down any further :( 23:11
not sure why it was 1/10 in between
MasterDuke 1/20 even, right? 23:17
edit history you can undo through? 23:18
timotimo yeah, well
MasterDuke samcv: if you want the file we've been testing with you can find it here gist.github.com/MasterDuke17/d75d9...747f1dddf0 23:24
samcv mem_leak.json?
there's two there 23:25
MasterDuke yeah
samcv what's the profiling program yoeu're using MasterDuke 23:27
heh 2.4GB that's insane
MasterDuke the memory one? heaptrack 23:28
you're using arch right? it's in the aur
samcv yeah
MasterDuke you have to run moar directly though
i.e., the command that actually is exec'ed in perl6-m 23:29
samcv yeah i get what you mean
;)
why do we even have to re-nfg it. unless it's trying to concat two codepoints that may combine 23:36
anyway i've gotto go to dinner
renfging the whole string seems wasteful, but i don't think it should allocate that much space.. 23:37
so that needs to be fixed regardless i guess
MasterDuke well, the comment right above the allocate says `/* Create the output buffer. We used to believe it can't ever be bigger than the initial estimate, but utf8-c8 showed us otherwise. */` 23:38
but maybe that allocation can be delayed until we know more info and can be more precise about it? 23:39
samcv but that shouldn't account for all the 1.2GB though? 23:41
it's not that big of a file 23:42
MasterDuke but it's joining lots of pieces, potentially having to re_nfg each piece and the separator each time 23:43
samcv yeah 23:46
i mean we really should only renormalize the end of it 23:47
not the whole thing
MasterDuke what "it"? 23:49
samcv uh. string `in` 23:51
i'm not sure how strands work tbh. but we just need to go back along the string until we're sure it can't combine differently with the thing on the right 23:52
and then normalize that section
though i guess it's fed the string after it has already been concatted?
MasterDuke that certainly sounds faster
well, i think there are two fixes to be made. re_nfg can be made better/faster, but MVM_string_join also shouldn't be calling it in all the cases it does 23:53
samcv yeah
will be back in a few hours or something. bbl dinner 23:54
MasterDuke later...