00:01 agentzh joined 00:03 hoelzro joined 00:14 pyrimidine joined 00:32 lizmat joined 00:40 dalek joined 01:26 agentzh joined
timotimo for the record, i don't think adding the blocked & unblocked code for get_or_vivify_loop is wrong 01:57
02:11 pyrimidine joined 02:49 ilbot3 joined 03:10 pyrimidine joined 03:11 agentzh joined 03:20 pyrimidine joined 04:04 pyrimidine joined 05:08 pyrimidine joined 05:16 pyrimidine joined 06:04 pyrimidine joined 06:23 pyrimidine joined 07:05 pyrimidine joined 07:06 brrt joined
brrt good * #moarvm 07:06
.tell puddu we can have a reasoned argument about whether synchronicity 'sucks' or not. fwiw, it was just a realization on my part how limiting and arbitrary synchronicity really is 07:07
and it seems to me that clone() or fork() or what have you… are pretty heavyweight concurrency primitives
07:28 pyrimidine joined 07:57 domidumont joined 08:13 brrt joined 08:20 domidumont joined 08:32 zakharyas joined
jnthn morning o/ 09:50
brrt moarning 09:57
so, i only have a few thorny things to resolve and then arglist compilation is finished
jnthn :) 09:58
brrt but, it feels like there's something i've forgotten about that
jnthn Hope not overly thorny
brrt well, just, messy, i guess
part of the mess is: 09:59
the function that does this now weighs 719 - 585 = 134 LoC 10:00
and still needs to deal with: CALL nodes that can take register arguments 10:01
(i.e. dynamic dispatch)
and spilling-over-CALL
i.e. all values that 'survive' the CALL node are supposed to be spilled because they are not really in memory 10:02
eh, i said that all wrong
the call may overwrite all of them
10:04 brrt joined
jnthn Hopefully there'll be some way to break it into slightly smaller parts :) 10:05
dogbert11 morning, have all ENOCOFFEE errors been taken care of? 10:13
jnthn Just about :)
Fixed one thing from yesterday already; now looking at github.com/MoarVM/MoarVM/issues/543 10:14
dogbert11 that looks vaguely familiar :) 10:18
10:27 agentzh joined 10:28 Sufrostico[m] joined 10:31 brrt joined
jnthn Think I found it 10:37
And yeah, a program could only ever hit this once in its lifetime or never again :) 10:38
So unlikely...but, as you discovered, possible
dogbert11 jnthn++ you made quick work of that one :) 10:39
did it have something to do with starting the threads? 10:41
jnthn Waiting for 50 runs to complete
Yeah, we just forgot to tell the GC we were blocking when starting the event loop worker thread
So it couldn't work-steal if it GC'd at exactly the wrong time
Geth MoarVM: e3e3b4cae4 | (Jonathan Worthington)++ | src/io/eventloop.c
Missing GC block marking in event loop starting.

This could cause very occasional hangs if GC happened right when we were starting up the event loop worker thread. Fixes #543.
10:45
jnthn Yup, 50 runs completed fine 10:48
Geth MoarVM: ec99d41f59 | (Jonathan Worthington)++ | src/6model/reprs/CArray.c
Fix CArray marshalling of type objects.

We didn't detect them and passed junk, instead of the NULL we should have.
10:53
jnthn That's one I discovered yesterday 10:54
Passed total junk instead of a NULL
CStruct seems to have a similar bug 10:56
dogbert11 you're on a roll
there are three RTs concerning CStruct, i.e. 129798, 127237 and 127730. One of them might be relevant. 10:58
jnthn Sadly, it ain't one of those 11:01
dogbert11 :( 11:02
have you found anything else while working on the async stuff? 11:03
jnthn No, the patch I just pushed is the only Moar/Rakudo issue I ran into 11:06
Except the lack of support for function pointers in CStructs, which is a NYI rather than a bug
dogbert11 sneaks off for lunch 11:11
11:12 zakharyas joined
jnthn Oh, I was wrong, CStruct is correct 11:15
Ah well, at least now there's test coverage :)
11:31 travis-ci joined
travis-ci MoarVM build failed. Jonathan Worthington 'Missing GC block marking in event loop starting. 11:31
travis-ci.org/MoarVM/MoarVM/builds/206133785 github.com/MoarVM/MoarVM/compare/7...e3b4cae4a4
11:31 travis-ci left 11:40 ilmari[m] joined
jnthn Failure was a race with NQP_REVISION bump, fwiw 11:53
11:56 zakharyas joined 12:03 lizmat joined 12:08 ilmari[m] joined
dogbert11 is back, downloads the latest MoarVM ... 12:09
12:11 brrt left, brrt joined
lizmat just as a data point, but HARNESS_TYPE is still segfaulting occasionally 12:28
HARNESS_TYPE=6 :-)
brrt get thee a core dump 12:29
nwc10 I can't replicate this with valgrind
I can with ASAN
brrt or that
12:30 lizmat joined
dogbert11 lizmat: it would be nice if we could get rid of that error :-) 12:33
lizmat indeed :-) 12:34
dogbert11 question is if jnthn has enough info to be able to fix it, we'll see after lunch 12:35
brrt (or anyone else for that matter :-)) 12:36
dogbert11 indeed :)
I believe that nwc10 has ASAN output for the harness6 SEGV. Here's some gdb stuff gist.github.com/dogbert17/71bd542c...0bf44dae61 12:38
brrt hmm, can you get that again, with --optimize=0 12:39
(MoarVM compiled with that)
dogbert11 unfortunately harness6 is doubly broken, if TEST_JOBS=1 it runs out of file handles and if TEST_JOBS=1+ it often SEGVs
brrt: will fix ...
timotimo that's a very bogus pointer :) 12:40
(well, it's most probably an offset from a null pointer)
brrt thx dogbert11
dogbert11 with my luck it probably won't crash 12:41
lizmat ===( 0;216 50/51 0/? 95/97 834/842 134/134 0/? 112/112 1moar(49314,0x70000ce37000) malloc: *** error for object 0x7fdbbd4d6dc0: pointer being freed was not allocated 12:42
*** set a breakpoint in malloc_error_break to debug
dogbert11 brrt, lizmat: have it running now, gdb is attached 12:43
brrt bt full :-) 12:44
(i wonder if this would be a good use for hack)
dogbert11 I'm running it with TEST_JOBS=4, should be enough, we'll see 12:45
12:47 pyrimidine joined
dogbert11 brrt: gist.github.com/dogbert17/9dc96027...cd9eee8662 12:52
12:53 pyrimidine joined
brrt i see 12:55
that's pretty suspicious 12:56
dogbert11: still have an open session?
dogbert11 yes
timotimo there's no "if(cur_bytes)" in front of that? o_O
brrt cur_bytes is not NULL
timotimo i mean, how else would you reach 0x38?
brrt well, not using initialized memory 12:57
dogbert11: can you do: up 1, print *decoder
dogbert11 (gdb) print *decoder 12:58
$1 = {common = {header = {sc_forward_u = {forwarder = 0x4dc06ef8, sc = {sc_idx = 28408, idx = 19904}, sci = 0x4dc06ef8, st = 0x4dc06ef8}, owner = 3, flags = 16, size = 24}, st = 0xa6a1a68}, body = {
ds = 0x445f45a8, sep_spec = 0x4dd5a408}}
brrt doesn't look obviously corrupt... 13:00
timotimo well, what does the ds look like? 13:02
dogbert11 how do I get to that? 13:05
is this correct or nonsense? 13:07
(gdb) p *decoder.body.ds
$7 = {bytes_head = 0x38, bytes_tail = 0x0, chars_head = 0x0, chars_tail = 0x0, abs_byte_pos = 0, bytes_head_pos = 0, chars_head_pos = 0, encoding = 0, norm = {form = MVM_NORMALIZE_NFD, buffer = 0x0,
buffer_size = 0, buffer_start = 0, buffer_end = 0, buffer_norm_end = 0, first_significant = 0, quick_check_property = 0, translate_newlines = 0}, decoder_state = 0x50}
brrt it appears kind of uninitialized to me 13:08
13:08 Geth joined
dogbert11 question is whether the 'real' problem is visible in the gdb output 13:15
brrt it's pretty informative though :-) 13:17
dogbert11 added the result of a 'p MVM_dump_backtrace(tc)' to the end of the gist 13:20
Geth MoarVM: 5f9d6985a9 | (Jonathan Worthington)++ | 3 files
Provide a way to put Decoder in nl-translate mode.

So that we can use it in Proc::Async and have \r\n -> \n happen.
13:47
nwc10 jnthn++ # back from lunch with patches 13:48
nwc10 wonders what ilmari[m] will return with
jnthn Lunch featured mushrooms :)
nwc10 *I* like mushrooms. Some people don't 13:49
ilmari nwc10: a stomach full of burrito
nwc10 I forget the list of people I can "steal" unwanted mushrooms from
ilmari++
jnthn My wife is in the set of people I can "steal" mushrooms from :) 13:50
Meaning that I rarely cook them at home
nwc10 aha right now your comment makes more sense
jnthn But today was lunch out 13:51
And there was a mushroom-including dish to be had :)
dogbert11 do you pick mushrooms in the forests when it's season? 14:05
jnthn No; I'd have no idea which ones might kill me
14:06 Sufrostico[m] joined
dogbert11 there's one in particular, i.e. Amanita virosa 14:06
lizmat jnthn: also, if you pick mushrooms in that area, be sure to also take your geiger counter, as mushrooms are known concentrators of fallout 14:08
www.theverge.com/2017/2/24/14733094...-chernobyl 14:09
jnthn Going to the supermarket/restaurant feels like so much less hassle :P 14:11
dogbert11 jnthn, do you have any theories wrt lizmat's harness6 SEGV or is more debugging necessary? If so any suggestions on what I should do. 14:24
jnthn dogbert11: I need to take a look over the various bits of ASAN/GDB output and see if I can get a reproduction 14:26
Still working on the Proc::Async newline stuff at the moment though
dogbert11 cool, 'make [spectest|stresstest] HARNESS_TYPE=6 TEST_JOBS=4+' probably does the trick when you get to it 14:28
jnthn Just doing a Windows build to make sure the changes help there :) 14:29
dogbert11 hopefully all your changes will work on the first try (they never do for me though :-)) 14:38
jnthn Yeah, the test file now passes on Windows 14:41
14:51 hoelzro joined 15:08 Sufrostico[m] left
jnthn Looking at gist.github.com/dogbert17/eaba7dfc...b1748fbe23 I'm pondering putting a sanity check into the Decoder REPR impl to make sure we're never using it from multiple threads at the same time 15:14
To either rule that out, or perhaps to discover that is somehow bizarrely what's happening 15:15
dogbert11 I can try it (and probably lizmat as well) 15:16
jnthn is running harness6 at the moment
lizmat dogbert11: actually am in the middle of something else atm, so please go ahead :-)
jnthn So far up to S10 and going strong
dogbert11 it usually takes a while and as always it doesn't fail every time 15:17
jnthn hah, I got a SEGV
dogbert11 .oO
lizmat hmmm... I got a fail in t/spec/S17-supply/supplier-preserving.t (test 1) 15:18
in HARNESS_TYPE=5
dogbert11 uh oh
15:18 brrt joined
jnthn Aww it said core dumped but I can't actually find the core file 15:18
I thought they ended up in the cwd 15:19
dogbert11 did you start from the rakudo dir?
jnthn yeah 15:20
timotimo on modern systems with journald it might have put the core file into the journal for you
dogbert11 or did you forget 'ulimit -c unlimited'
jnthn This is just a stock ubuntu 16.04
dogbert11: oh...d'oh
I guess that doesn't stick between sessions :) 15:21
lizmat: Urgh, ran that 50 times without issue earlier today... 15:22
I assume you have the latest version of the test file?
lizmat think so
jnthn (I corrected one issue in it this morning)
Though that affected test 11 15:23
lizmat yeah, most recent
flapper :-( no pbs this time 15:24
[Coke] u: - 15:29
dogbert11 t/04-nativecall/06-struct.t .............. Failed 5/28 subtests # hmm 15:39
interesting, the test file passes if run with ./perl6 but not if they're run with make. looks like a fudgeup :) 15:47
Geth MoarVM: 296ece0d28 | (Jonathan Worthington)++ | 2 files
Ensure Decoder REPR never sees concurrent use.
16:00
jnthn Never tripped so far locally 16:01
Though a good sanity check to have in there
All the SEGVs I see are from gen2 gc of a p6bigint 16:02
dogbert11 so it's still a mystery? 16:05
timotimo the bigints again?! 16:06
or still?
dogbert11 are they a BIG problem :)
timotimo ugh :)
16:12 brrt joined
jnthn Interestingly, the latest SEGV was in add_guards_and_facts 16:16
dogbert11 does that give any clues? 16:18
timotimo oh, yikes. that's spesh, isn't it? 16:22
jnthn Yeah
Too early to say; so far it's apparently that the arg we are considering contains a Scalar container, which has a value that points to a mostly-but-not-quite zeroed bit of memory 16:23
16:25 brrt joined 16:30 agentzh joined
jnthn Also, sadly, I only have a core dump for that one rather than having caught it under the debugger, which makes analysis a bit harder 16:49
16:54 pyrimidine joined 16:58 timotimo joined 17:11 domidumont joined 17:12 zakharyas joined 17:15 pyrimidine joined 17:18 domidumont joined 17:21 pyrimidine joined
dogbert17 jnthn, are there any checks/asserts you can smuggle in somewhere, then we can test while you rest :-) 17:21
17:26 agentzh joined
jnthn Typical, after the latest one I've just put in it didn't crash 17:33
dogbert17 :(
dogbert17 is running as stresstest as well 17:34
jnthn Pushed another option for debugging; can turn it on by changing MVM_ARRAY_CONC_DEBUG 0 to a 1 17:41
Geth: y u no report commit?
17:41 ilmari[m] joined
jnthn No luck so far, anyway 17:42
dogbert17 cool
it's an elusive bug 17:43
jnthn Indeed
Best bet seems to be gradually ruling out things it isn't
dogbert17 yes indeed 17:44
jnthn It's interesting that in every crash I've had except the spesh one, it's been in gc_cleanup
dogbert17 checks his gists 17:45
jnthn I've never seen the decode stream one
But the thing I put in today to prevent concurrent operations on it will help us rule out the obvious failure mode there 17:46
dogbert17 did you release the first check, i.e. did it come with the latest MoarVM bump? 17:47
jnthn No, the latest two are at MoarVM HEAD 17:48
And the final one is just a debug feature
dogbert17 good to know
jnthn Seems to crash less with the latter debug check turned on 17:50
dogbert17 all my gists have MVM_string_decodestream_destroy in them. don't remember if nwc10's ASAN report was the same
jnthn But when it does, (a) same failure mode, (b) it didn't trip the check
dogbert17 uh oh
jnthn Yeah, I don't have that in any of mine. Bizarre
dogbert17 who knows, this one might turn out to be blogworthy 17:52
17:52 agentzh joined
jnthn :) 17:53
Think I'll break off for now
Though will keep it running to see if I get any different failure modes
dogbert17 I will try your checks in the meantime 17:55
18:17 zakharyas joined
[Coke] engineering.instagram.com/dismissi....wcdeivlvd - "Dismissing Python Garbage Collection at Instagram" 18:26
timotimo it feels like that was already mentioned here 18:28
in the context of "look, they do the 'let the OS clean house' thing, too!"
[Coke] ah, sorry 18:29
timotimo the "reading is a writing operation" point is quite interesting, too 18:31
it's good that we don't do refcounts
Geth MoarVM: 773711e114 | (Jonathan Worthington)++ | 2 files
Debug option to detect concurrent VMArray use.

We should make this memory-safe in the future; in the meantime, this can be turned on to see if it's to blame for problems. Doesn't yet catch all cases (for example, read while write isn't yet caught).
18:35
19:29 Ven joined 19:44 yoleaux2 joined 19:47 pyrimidine joined 19:52 agentzh joined
nwc10 jnthn++ 19:54
jnthn: paste.scsys.co.uk/555094 19:55
your new thing catches it
19:55 pyrimidine joined
nwc10 "Deocder" 19:55
I CAN HAZ VARIANT SPEELNG
dogbert17 nwc10++ nice catch 20:03
nwc10 I think it's far more doubleplussjnthn for having a hunch about what the crazy problem might be 20:04
I just built it and ran it in a loop
dogbert17 deocders are way above my paygrade :)
jnthn Interesting 20:09
Though the question remains why I got the SEGV *and* didn't trip warning...
Uh, error
nwc10 yours or mine?
jnthn I locally got a SEGV
nwc10 ah, OK
I have
jnthn After putting that decoder check in
nwc10 #define MVM_ARRAY_CONC_DEBUG 1
and
#define FSA_SIZE_DEBUG 1
jnthn So it's really interesting that it hit it 20:10
nwc10 and this is ASAN
jnthn But on the other hand...might not be our SEGV
nwc10 and somehow I could never make problems with valgrind
jnthn It seems very timing-sensitive
dogbert17 have run several times with 'MVM_ARRAY_CONC_DEBUG 1', no crash 20:11
jnthn Like, that MVM_ARRAY_CONC_DEBUG makes it a lot less likely to break
And that's much less of a slow-down than valgrind 20:12
Valgrind serializes everything onto one thread
All of this points fairly strongly towards a data race
nwc10 ohhhhhh. I didn't know that
jnthn The question is what in.
Might be worth me poring over helgrind output some 20:13
lizmat jnthn: I still suspect something to do with grammars, as that is what HARNESS_TYPE=6 is doing a lot, parsing TAP
jnthn Though I might need to work on suppressions and cleanup of various things
lizmat I've been tempted to speed up prove6 by not using grammars, but then the segfaults may go 20:14
jnthn Does prove6 need speeding up? :) 20:15
But yes, let's make it work first before worrying about that :P
jnthn got the SEGV again 20:16
Yet again, it's in mp_clear, when doing MVM_gc_collect_free_gen2_unmarked
It's especially odd that I've not yet seen the DecodeStream failure mode in a load of runs 20:17
We can likely put together something small that stresses the relevant code paths on that one, though
20:27 zakharyas joined
dogbert17 got a SEGV 20:45
#0 0x40024cb0 in ?? ()
#1 0x404f74ba in malloc_printerr (action=<optimized out>, str=0x405e9df4 "double free or corruption (out)", ptr=0x4a542578) at malloc.c:4996
#2 0x404f812d in _int_free (av=0x4062f420 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3840
had turned off MVM_ARRAY_CONC_DEBUG though 20:46
20:55 geekosaur joined
nwc10 got ASAN barfage, so the bug that the debug stuff traps is not the only one 21:25
paste.scsys.co.uk/555095 21:26
21:30 agentzh joined
dogbert17 and I got this bizarre output all of sudden, gist.github.com/dogbert17/33da77db...e83b1ae4f3 21:31
what a rat's nest 21:34
two or more bugs interacting, brrr 21:35
ok, got another SEGV (well, SIGABRT), this time with MVM_ARRAY_CONC_DEBUG = 1 21:43
jnthn nwc10: Yeah, that's the one I get nearly every time 21:46
dogbert17 could there be something wrong with the grammar in TAP.pm6? 21:48
jnthn What will be interesting is if we get any more ASAN or GDB output that points to decode streams, or if all those are now turned into the exception throws 21:50
I'm quite confident we can golf that one down
dogbert17: It could be, but...grammars aren't a particularly notable source of Int 21:51
And the SEGVs (aside from the decode stream ones, which it seems we mighta turned into an exception now) all point to GC of an Int 21:52
dogbert17 jnthn: did you see gist.github.com/dogbert17/33da77db...e83b1ae4f3 ?
jnthn oh, output-ruler 21:53
That's where we were in spesh at the point I got a core dump earlier 21:54
That's...an interesting coincidence
Off to rest o/ 22:01
dogbert17 night 22:03
jnthn 'night
22:45 pyrimidine joined 23:08 pyrimidine joined