00:01
agentzh joined
00:03
hoelzro joined
00:14
pyrimidine joined
00:32
lizmat joined
00:40
dalek joined
01:26
agentzh joined
timotimo | for the record, i don't think adding the blocked & unblocked code for get_or_vivify_loop is wrong | 01:57 | |
02:11
pyrimidine joined
02:49
ilbot3 joined
03:10
pyrimidine joined
03:11
agentzh joined
03:20
pyrimidine joined
04:04
pyrimidine joined
05:08
pyrimidine joined
05:16
pyrimidine joined
06:04
pyrimidine joined
06:23
pyrimidine joined
07:05
pyrimidine joined
07:06
brrt joined
brrt | good * #moarvm | 07:06 | |
.tell puddu we can have a reasoned argument about whether synchronicity 'sucks' or not. fwiw, it was just a realization on my part how limiting and arbitrary synchronicity really is | 07:07 | ||
and it seems to me that clone() or fork() or what have you… are pretty heavyweight concurrency primitives | |||
07:28
pyrimidine joined
07:57
domidumont joined
08:13
brrt joined
08:20
domidumont joined
08:32
zakharyas joined
jnthn | morning o/ | 09:50 | |
brrt | moarning | 09:57 | |
so, i only have a few thorny things to resolve and then arglist compilation is finished | |||
jnthn | :) | 09:58 | |
brrt | but, it feels like there's something i've forgotten about that | ||
jnthn | Hope not overly thorny | ||
brrt | well, just, messy, i guess | ||
part of the mess is: | 09:59 | ||
the function that does this now weighs 719 - 585 = 134 LoC | 10:00 | |||
and still needs to deal with: CALL nodes that can take register arguments | 10:01 | ||
(i.e. dynamic dispatch) | |||
and spilling-over-CALL | |||
i.e. all values that 'survive' the CALL node are supposed to be spilled because they are not really in memory | 10:02 | ||
eh, i said that all wrong | |||
the call may overwrite all of them | |||
10:04
brrt joined
jnthn | Hopefully there'll be some way to break it into slightly smaller parts :) | 10:05 | |
dogbert11 | morning, have all ENOCOFFEE errors been taken care of? | 10:13 | |
jnthn | Just about :) | ||
Fixed one thing from yesterday already; now looking at github.com/MoarVM/MoarVM/issues/543 | 10:14 | ||
dogbert11 | that looks vaguely familiar :) | 10:18 | |
10:27
agentzh joined
10:28
Sufrostico[m] joined
10:31
brrt joined
jnthn | Think I found it | 10:37 | |
And yeah, a program could only ever hit this once in its lifetime or never again :) | 10:38 | ||
So unlikely...but, as you discovered, possible | |||
dogbert11 | jnthn++ you made quick work of that one :) | 10:39 | |
did it have something to do with starting the threads? | 10:41 | ||
jnthn | Waiting for 50 runs to complete | ||
Yeah, we just forgot to tell the GC we were blocking when starting the event loop worker thread | |||
So it couldn't work-steal if it GC'd at exactly the wrong time | |||
Geth | MoarVM: e3e3b4cae4 | (Jonathan Worthington)++ | src/io/eventloop.c Missing GC block marking in event loop starting. This could cause very occasional hangs if GC happened right when we were starting up the event loop worker thread. Fixes #543. |
10:45 | |
jnthn | Yup, 50 runs completed fine | 10:48 | |
Geth | MoarVM: ec99d41f59 | (Jonathan Worthington)++ | src/6model/reprs/CArray.c Fix CArray marshalling of type objects. We didn't detect them and passed junk, instead of the NULL we should have. |
10:53 | |
jnthn | That's one I discovered yesterday | 10:54 | |
Passed total junk instead of a NULL | |||
CStruct seems to have a similar bug | 10:56 | ||
dogbert11 | you're on a roll | ||
there are three RT's related to CStruct, i.e. 129798, 127237 and 127730. One of them might be related. | 10:58 | ||
jnthn | Sadly, it ain't one of those | 11:01 | |
dogbert11 | :( | 11:02 | |
have you found anything else while working on the async stuff? | 11:03 | ||
jnthn | No, the patch I just pushed is the only Moar/Rakudo issue I ran into | 11:06 | |
Except the lack of support for function pointers in CStructs, which is a NYI rather than a bug | |||
dogbert11 sneaks off for lunch | 11:11 | ||
11:12
zakharyas joined
jnthn | Oh, I was wrong, CStruct is correct | 11:15 | |
Ah well, at least now there's test coverage :) | |||
11:31
travis-ci joined
travis-ci | MoarVM build failed. Jonathan Worthington 'Missing GC block marking in event loop starting. | 11:31 | |
travis-ci.org/MoarVM/MoarVM/builds/206133785 github.com/MoarVM/MoarVM/compare/7...e3b4cae4a4 | |||
11:31
travis-ci left
11:40
ilmari[m] joined
jnthn | Failure was a race with NQP_REVISION bump, fwiw | 11:53 | |
11:56
zakharyas joined
12:03
lizmat joined
12:08
ilmari[m] joined
dogbert11 is back, downloads the latest MoarVM ... | 12:09 | ||
12:11
brrt left,
brrt joined
lizmat | just as a data point, but HARNESS_TYPE is still segfaulting occasionally | 12:28 | |
HARNESS_TYPE=6 :-) | |||
brrt | get thee a core dump | 12:29 | |
nwc10 | I can't replicate this with valgrind | ||
I can with ASAN | |||
brrt | or that | ||
12:30
lizmat joined
dogbert11 | lizmat: it would be nice if we could get rid of that error :-) | 12:33 | |
lizmat | indeed :-) | 12:34 | |
dogbert11 | question is if jnthn has enough info to be able to fix it, we'll see after lunch | 12:35 | |
brrt | (or anyone else for that matter :-)) | 12:36 | |
dogbert11 | indeed :) | ||
I believe that nwc10 has ASAN output for the harness6 SEGV. Here's some gdb stuff gist.github.com/dogbert17/71bd542c...0bf44dae61 | 12:38 | ||
brrt | hmm, can you get that again, with --optimize=0 | 12:39 | |
(MoarVM compiled with that) | |||
dogbert11 | unfortunately harness6 is doubly broken, if TEST_JOBS=1 it runs out of file handles and if TEST_JOBS=1+ it often SEGV's | ||
brrt: will fix ... | |||
timotimo | that's a very bogus pointer :) | 12:40 | |
(well, it's most probably an offset from a null pointer) | |||
brrt | thx dogbert11 | ||
dogbert11 | with my luck it probably won't crash | 12:41 | |
lizmat | ===( 0;216 50/51 0/? 95/97 834/842 134/134 0/? 112/112 1moar(49314,0x70000ce37000) malloc: *** error for object 0x7fdbbd4d6dc0: pointer being freed was not allocated | 12:42 | |
*** set a breakpoint in malloc_error_break to debug | |||
dogbert11 | brrt, lizmat: have it running now, gdb is attached | 12:43 | |
brrt | bt full :-) | 12:44 | |
(i wonder if this would be a good use for hack) | |||
dogbert11 | I'm running it with TEST_JOBS=4, should be enough, we'll see | 12:45 | |
12:47
pyrimidine joined
dogbert11 | brrt: gist.github.com/dogbert17/9dc96027...cd9eee8662 | 12:52 | |
12:53
pyrimidine joined
brrt | i see | 12:55 | |
thats pretty suspicious | 12:56 | ||
dogbert11: still have an open session? | |||
dogbert11 | yes | ||
timotimo | there's no "if(cur_bytes)" in front of that? o_O | ||
brrt | cur_bytes is not NULL | ||
timotimo | i mean, how else would you reach 0x38? | ||
brrt | well, not using initialized memory | 12:57 | |
dogbert11: can you do: up 1, print *decoder | |||
dogbert11 | (gdb) print *decoder | 12:58 | |
$1 = {common = {header = {sc_forward_u = {forwarder = 0x4dc06ef8, sc = {sc_idx = 28408, idx = 19904}, sci = 0x4dc06ef8, st = 0x4dc06ef8}, owner = 3, flags = 16, size = 24}, st = 0xa6a1a68}, body = { | |||
ds = 0x445f45a8, sep_spec = 0x4dd5a408}} | |||
brrt | doesn't look obviously corrupt... | 13:00 | |
timotimo | well, what does the ds look like? | 13:02 | |
dogbert11 | how do I get to that? | 13:05 | |
is this correct or nonsense? | 13:07 | ||
(gdb) p *decoder.body.ds | |||
$7 = {bytes_head = 0x38, bytes_tail = 0x0, chars_head = 0x0, chars_tail = 0x0, abs_byte_pos = 0, bytes_head_pos = 0, chars_head_pos = 0, encoding = 0, norm = {form = MVM_NORMALIZE_NFD, buffer = 0x0, | |||
buffer_size = 0, buffer_start = 0, buffer_end = 0, buffer_norm_end = 0, first_significant = 0, quick_check_property = 0, translate_newlines = 0}, decoder_state = 0x50} | |||
brrt | it appears kind of uninitialized to me | 13:08 | |
13:08
Geth joined
dogbert11 | question is whether the 'real' problem is visible in the gdb output | 13:15 | |
brrt | it's pretty informative though :-) | 13:17 | |
dogbert11 | added the result of a 'p MVM_dump_backtrace(tc)' to the end of the gist | 13:20 | |
Geth | MoarVM: 5f9d6985a9 | (Jonathan Worthington)++ | 3 files Provide a way to put Decoder in nl-translate mode. So that we can use it in Proc::Async and have \r\n -> \n happen. |
13:47 | |
nwc10 | jnthn++ # back from lunch with patches | 13:48 | |
nwc10 wonders what ilmari[m] will return with | |||
jnthn | Lunch featured mushrooms :) | ||
nwc10 | *I* like mushrooms. Some people don' | 13:49 | |
't | |||
ilmari | nwc10: a stomach full of burrito | ||
nwc10 | I forget the list of people I can "steal" unwanted mushrooms from | ||
ilmari++ | |||
jnthn | My wife is in the set of people I can "steal" mushrooms from :) | 13:50 | |
Meaning that I rarely cook them at home | |||
nwc10 | aha right now your comment makes more sense | ||
jnthn | But today was lunch out | 13:51 | |
And there was a mushroom-including dish to be had :) | |||
dogbert11 | do you pick mushrooms in the forests when it's season? | 14:05 | |
jnthn | No; I'd have no idea which ones might kill me | ||
14:06
Sufrostico[m] joined
dogbert11 | there's one in particular, i.e. Amanita virosa | 14:06 |
lizmat | jnthn: also, if you pick mushrooms in that area, be sure to also take your geiger counter, as mushrooms are known concentrators of fallout | 14:08 | |
www.theverge.com/2017/2/24/14733094...-chernobyl | 14:09 | ||
jnthn | Going to the supermarket/restaurant feels like so much less hassle :P | 14:11 | |
dogbert11 | jnthn, do you have any theories wrt lizmat's harness6 SEGV or is more debugging necessary? If so, any suggestions on what I should do? | 14:24 |
jnthn | dogbert11: I need to take a look over the various bits of ASAN/GDB output and see if I can get a reproduction | 14:26 | |
Still working on the Proc::Async newline stuff at the moment though | |||
dogbert11 | cool, 'make [spectest|stresstest] HARNESS_TYPE=6 TEST_JOBS=4+' probably does the trick when you get to it | 14:28 | |
jnthn | Just doing a Windows build to make sure the changes help there :) | 14:29 | |
dogbert11 | hopefully all your changes will work on the first try (they never do for me, though :-)) | 14:38 |
jnthn | Yeah, the test file now passes on Windows | 14:41 | |
14:51
hoelzro joined
15:08
Sufrostico[m] left
jnthn | Looking at gist.github.com/dogbert17/eaba7dfc...b1748fbe23 I'm pondering putting a sanity check into the Decoder REPR impl to make sure we're never using it from multiple threads at the same time | 15:14 | |
To either rule that out, or perhaps to discover that is somehow bizarrely what's happening | 15:15 | ||
dogbert11 | I can try it (and probably lizmat as well) | 15:16 | |
jnthn is running harness6 at the moment | |||
lizmat | dogbert11: actually am in the middle of something else atm, so please go ahead :-) | ||
jnthn | So far up to S10 and going strong | ||
dogbert11 | it usually takes a while and as always it doesn't fail everytime | 15:17 | |
jnthn | have, I got a SEGV | ||
dogbert11 | .oO | ||
lizmat | hmmm... I got a fail in t/spec/S17-supply/supplier-preserving.t (test 1) | 15:18 | |
in HARNESS_TYPE=5 | |||
dogbert11 | uh oh | ||
15:18
brrt joined
jnthn | Aww it said core dumped but I can't actually find the core file | 15:18 | |
I thought they ended up in the cwd | 15:19 | ||
dogbert11 | did you start from the rakudo dir? | ||
jnthn | yeah | 15:20 | |
timotimo | on modern systems with journald it might have put the core file into the journal for you | ||
dogbert11 | or did you forget 'ulimit -c unlimited' | ||
jnthn | This is just a stock ubuntu 16.04 | ||
dogbert11: oh...d'oh | |||
I guess that doesn't stick between sessions :) | 15:21 | ||
lizmat: Urgh, ran that 50 times without issue earlier today... | 15:22 | ||
I assume you have the latest version of the test file? | |||
lizmat | think so | ||
jnthn | (I corrected one issue in it this morning) | ||
Though that affected test 11 | 15:23 | ||
lizmat | yeah, most recent | ||
flapper :-( no pbs this time | 15:24 | ||
[Coke] | u: - | 15:29 | |
dogbert11 | t/04-nativecall/06-struct.t .............. Failed 5/28 subtests # hmm | 15:39 | |
interesting, the test file passes if run with ./perl6 but not if they're run with make. looks like a fudgeup :) | 15:47 | ||
Geth | MoarVM: 296ece0d28 | (Jonathan Worthington)++ | 2 files Ensure Decoder REPR never sees concurrent use. |
16:00 | |
jnthn | Never tripped so far locally | 16:01 | |
Though a good sanity check to have in there | |||
All the SEGVs I see are from gen2 gc of a p6bigint | 16:02 | ||
dogbert11 | so it's still a mystery? | 16:05 | |
timotimo | the bigints again?! | 16:06 | |
or still? | |||
dogbert11 | are they a BIG problem :) | ||
timotimo | ugh :) | ||
16:12
brrt joined
jnthn | Interestingly, the latest SEGV was in add_guards_and_facts | 16:16 | |
dogbert11 | does that give any clues? | 16:18 | |
timotimo | oh, yikes. that's spesh, isn't it? | 16:22 | |
jnthn | Yeah | ||
Too early to say; so far it's apparently that the arg we are considering contains a Scalar container, which has a value that points to a mostly-but-not-quite zeroed bit of memory | 16:23 | ||
16:25
brrt joined
16:30
agentzh joined
jnthn | Also, sadly, I only have a core dump for that one rather than having caught it under the debugger, which makes analysis a bit harder | 16:49 | |
16:54
pyrimidine joined
16:58
timotimo joined
17:11
domidumont joined
17:12
zakharyas joined
17:15
pyrimidine joined
17:18
domidumont joined
17:21
pyrimidine joined
dogbert17 | jnthn, are there any checks/asserts you can smuggle in somewhere, then we can test while you rest :-) | 17:21 | |
17:26
agentzh joined
jnthn | Typical, after the latest one I've just put in it didn't crash | 17:33 | |
dogbert17 | :( | ||
dogbert17 is running as stresstest as well | 17:34 | ||
jnthn | Pushed another option for debugging; can turn it on by changing MVM_ARRAY_CONC_DEBUG 0 to a 1 | 17:41 | |
Geth: y u no report commit? | |||
17:41
ilmari[m] joined
jnthn | No luck so far, anyway | 17:42 | |
dogbert17 | cool | ||
it's an elusive bug | 17:43 | ||
jnthn | Indeed | ||
Best bet seems to be gradually ruling out things it isn't | |||
dogbert17 | yes indeed | 17:44 | |
jnthn | It's interesting that in every crash I've had except the spesh one, it's been in gc_cleanup | ||
dogbert17 checks his gists | 17:45 | ||
jnthn | I've never seen the decode stream one | ||
But the thing I put in today to prevent concurrent operations on it will help us rule out the obvious failure mode there | 17:46 | ||
dogbert17 | did you release the first check, i.e. did it come with the latest MoarVM bump? | 17:47 | |
jnthn | No, the latest two are at MoarVM HEAD | 17:48 | |
And the final one is just a debug feature | |||
dogbert17 | good to know | ||
jnthn | Seems to crash less with the latter debug check turned on | 17:50 | |
dogbert17 | all my gists have MVM_string_decodestream_destroy in them. don't remember if nwc10's ASAN report was the same | ||
jnthn | But when it does, (a) same failure mode, (b) it didn't trip the check | ||
dogbert17 | uh oh | ||
jnthn | Yeah, I don't have that in any of mine. Bizarre | ||
dogbert17 | who knows, this one might turn out to be blogworthy | 17:52 | |
17:52
agentzh joined
jnthn | :) | 17:53 | |
Think I'll break off for now | |||
Though will keep it running to see if I get any different failure modes | |||
dogbert17 | I will try your checks in the meantime | 17:55 | |
18:17
zakharyas joined
[Coke] | engineering.instagram.com/dismissi....wcdeivlvd - "Dismissing Python Garbage Collection at Instagram" | 18:26 | |
timotimo | it feels like that was already mentioned here | 18:28 | |
in the context of "look, they do the 'let the OS clean house' thing, too!" | |||
[Coke] | ah, sorry | 18:29 | |
timotimo | the "reading is a writing operation" point is quite interesting, too | 18:31 | |
it's good that we don't do refcounts | |||
Geth | MoarVM: 773711e114 | (Jonathan Worthington)++ | 2 files Debug option to detect concurrent VMArray use. We should make this memory-safe in the future; in the meantime, this can be turned on to see if it's to blame for problems. Doesn't yet catch all cases (for example, read while write isn't yet caught). |
18:35 | |
19:29
Ven joined
19:44
yoleaux2 joined
19:47
pyrimidine joined
19:52
agentzh joined
nwc10 | jnthn++ | 19:54 | |
jnthn: paste.scsys.co.uk/555094 | 19:55 | ||
your new thing catches it | |||
19:55
pyrimidine joined
nwc10 | "Deocder" | 19:55 | |
I CAN HAZ VARIANT SPEELNG | |||
dogbert17 | nwc10++ nice catch | 20:03 | |
nwc10 | I think it's far more doubleplussjnthn for having a hunch about what the crazy problem might be | 20:04 | |
I just built it and ran it in a loop | |||
dogbert17 | deocders are way above my paygrade :) | ||
jnthn | Interesting | 20:09 | |
Though the question remains why I got the SEGV *and* didn't trip warning... | |||
Uh, error | |||
nwc10 | yours or mine? | ||
jnthn | I locally got a SEGV | ||
nwc10 | ah, OK | |
I have | |||
jnthn | After putting that decoder check in | ||
nwc10 | #define MVM_ARRAY_CONC_DEBUG 1 | ||
and | |||
#define FSA_SIZE_DEBUG 1 | |
jnthn | So it's really interesting that it hit it | 20:10 | |
nwc10 | and this is ASAN | ||
jnthn | But on the other hand...might not be our SEGV | ||
nwc10 | and somehow I could never make problems with valgrind | ||
jnthn | It seems very timing-sensitive | ||
dogbert17 | have run several times with 'MVM_ARRAY_CONC_DEBUG 1', no crash | 20:11 | |
jnthn | Like, that MVM_ARRAY_CONC_DEBUG makes it a lot less likely to break | ||
And that's much less of a slow-down than valgrind | 20:12 | ||
Valgrind serializes everything onto one thread | |||
All of this points fairly strongly towards a data race | |||
nwc10 | ohhhhhh. I didn't know that | ||
jnthn | The question is what in. | ||
Might be worth me poring over helgrind output some | 20:13 | ||
lizmat | jnthn: I still suspect something to do with grammars, as that is what HARNESS_TYPE=6 is doing a lot, parsing TAP | ||
jnthn | Though I might need to work on suppressions and cleanup of various things | ||
lizmat | I've been tempted to speed up prove6 by not using grammars, but then the segfaults may go | 20:14 | |
jnthn | Does prove6 need speeding up? :) | 20:15 | |
But yes, let's make it work first before worrying about that :P | |||
jnthn got the SEGV again | 20:16 | ||
Yet again, it's in mp_clear, when doing MVM_gc_collect_free_gen2_unmarked | |||
It's especially odd that I've not yet seen the DecodeStream failure mode in a load of runs | 20:17 | ||
We can likely put together something small that stresses the relevant code paths on that one, though | |||
20:27
zakharyas joined
dogbert17 | got a SEGV | 20:45 | |
0 0x40024cb0 in ?? () | |||
#1 0x404f74ba in malloc_printerr (action=<optimized out>, str=0x405e9df4 "double free or corruption (out)", ptr=0x4a542578) at malloc.c:4996 | |||
#2 0x404f812d in _int_free (av=0x4062f420 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3840 | |||
had turned off MVM_ARRAY_CONC_DEBUG though | 20:46 | ||
20:55
geekosaur joined
nwc10 | got ASAN barfage, so the bug that the debug stuff traps is not the only one | 21:25 | |
paste.scsys.co.uk/555095 | 21:26 | ||
21:30
agentzh joined
dogbert17 | and I got this bizarre output all of a sudden, gist.github.com/dogbert17/33da77db...e83b1ae4f3 | 21:31 |
what a rat's nest | 21:34 | ||
two or more bugs interacting, brrr | 21:35 | ||
ok, got another SEGV, (well SIGABRT) this time with MVM_ARRAY_CONC_DEBUG = 1 | 21:43 | ||
jnthn | nwc10: Yeah, that's the one I get nearly every time | 21:46 | |
dogbert17 | could there be something wrong with the grammar in TAP.pm6? | 21:48 |
jnthn | What will be interesting is if we get any more ASAN or GDB output that points to decode streams, or if all those are now turned into the exception throws | 21:50 |
I'm quite confident we can golf that one down | |||
dogbert17: It could be, but...grammars aren't a particularly notable sources of Int | 21:51 | ||
*source | |||
And the SEGVs (aside from the decode stream ones, which it seems we mighta turned into an exception now) all point to GC of an Int | 21:52 | ||
dogbert17 | jnthn: did you see gist.github.com/dogbert17/33da77db...e83b1ae4f3 ? | ||
jnthn | oh, output-ruler | 21:53 | |
That's where we were in spesh at the point I got a core dump earlier | 21:54 | ||
That's...an interesting coincidence | |||
Off to rest o/ | 22:01 | ||
dogbert17 | night | 22:03 | |
jnthn | 'night | ||
22:45
pyrimidine joined
23:08
pyrimidine joined
|