|
00:01
agentzh joined
00:03
hoelzro joined
00:14
pyrimidine joined
00:32
lizmat joined
00:40
dalek joined
01:26
agentzh joined
|
|||
| timotimo | for the record, i don't think adding the blocked & unblocked code for get_or_vivify_loop is wrong | 01:57 | |
|
02:11
pyrimidine joined
02:49
ilbot3 joined
03:10
pyrimidine joined
03:11
agentzh joined
03:20
pyrimidine joined
04:04
pyrimidine joined
05:08
pyrimidine joined
05:16
pyrimidine joined
06:04
pyrimidine joined
06:23
pyrimidine joined
07:05
pyrimidine joined
07:06
brrt joined
|
|||
| brrt | good * #moarvm | 07:06 | |
| .tell puddu we can have a reasoned argument about whether synchronicity 'sucks' or not. fwiw, it was just a realization on my part how limiting and arbitrary synchronicity really is | 07:07 | ||
| and it seems to me that clone() or fork() or what have you… are pretty heavy weight concurrency primitives | |||
|
07:28
pyrimidine joined
07:57
domidumont joined
08:13
brrt joined
08:20
domidumont joined
08:32
zakharyas joined
|
|||
| jnthn | morning o/ | 09:50 | |
| brrt | moarning | 09:57 | |
| so, i only have a few thorny things to resolve and then arglist compilation is finished | |||
| jnthn | :) | 09:58 | |
| brrt | but, it feels like there's something i've forgotten about that | ||
| jnthn | Hope not overly thorny | ||
| brrt | well, just, messy, i guess | ||
| part of the mess is; | 09:59 | ||
| the function that does this now weighs 719 - 585 LoC | 10:00 | ||
| 134 | |||
| and still needs to deal with: CALL nodes that can take register arguments | 10:01 | ||
| (i.e. dynamic dispatch) | |||
| and spilling-over-CALL | |||
| i.e. all values that 'survive' the CALL node are supposed to be spilled because they are not really in memory | 10:02 | ||
| eh, i said that all wrong | |||
| the call may overwrite all of them | |||
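A minimal sketch of the spilling-over-CALL rule as brrt restates it: the call may overwrite every register, so any value that is still needed after the CALL has to be spilled to memory first. The live-range representation (name mapped to definition index and last-use index) and the function name `values_to_spill` are assumptions made for illustration; this is not the MoarVM JIT code.

```python
def values_to_spill(live_ranges, call_positions):
    """Return the names of values that are live across at least one call.

    live_ranges maps a value name to (def_index, last_use_index).
    A value is live across a call when it is defined strictly before the
    call and last used strictly after it, so the call could clobber it.
    """
    spilled = set()
    for name, (start, end) in live_ranges.items():
        for call in call_positions:
            if start < call < end:
                spilled.add(name)
                break
    return sorted(spilled)


ranges = {
    "a": (0, 5),  # defined before the call at index 3, used after it
    "b": (1, 3),  # last use is the call itself (an argument), no spill
    "c": (4, 6),  # defined after the call, no spill
}
print(values_to_spill(ranges, [3]))  # prints ['a']
```

Only `a` survives the call here, so only `a` needs a spill; arguments consumed by the call and values created afterwards are left alone.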
|
10:04
brrt joined
|
|||
| jnthn | Hopefully there'll be some way to break it into slightly smaller parts :) | 10:05 | |
| dogbert11 | morning, have all ENOCOFFEE errors been taken care of? | 10:13 | |
| jnthn | Just about :) | ||
| Fixed one thing from yesterday already; now looking at github.com/MoarVM/MoarVM/issues/543 | 10:14 | ||
| dogbert11 | that looks vaguely familiar :) | 10:18 | |
|
10:27
agentzh joined
10:28
Sufrostico[m] joined
10:31
brrt joined
|
|||
| jnthn | Think I found it | 10:37 | |
| And yeah, a program could only ever hit this once in its lifetime or never again :) | 10:38 | ||
| So unlikely...but, as you discovered, possible | |||
| dogbert11 | jnthn++ you made quick work of that one :) | 10:39 | |
| did it have something to do with starting the threads? | 10:41 | ||
| jnthn | Waiting for 50 runs to complete | ||
| Yeah, we just forgot to tell the GC we were blocking when starting the event loop worker thread | |||
| So it couldn't work-steal if it GC'd at exactly the wrong time | |||
| Geth | MoarVM: e3e3b4cae4 | (Jonathan Worthington)++ | src/io/eventloop.c Missing GC block marking in event loop starting. This could cause very occasional hangs if GC happened right when we were starting up the event loop worker thread. Fixes #543. |
10:45 | |
| jnthn | Yup, 50 runs completed fine | 10:48 | |
| Geth | MoarVM: ec99d41f59 | (Jonathan Worthington)++ | src/6model/reprs/CArray.c Fix CArray marshalling of type objects. We didn't detect them and passed junk, instead of the NULL we should have. |
10:53 | |
| jnthn | That's one I discovered yesterday | 10:54 | |
| Passed total junk instead of a NULL | |||
| CStruct seems to have a similar bug | 10:56 | ||
| dogbert11 | you're on a roll | ||
| there are three RT's related to CStruct, i.e. 129798, 127237 and 127730. One of them might be related. | 10:58 | ||
| jnthn | Sadly, it ain't one of those | 11:01 | |
| dogbert11 | :( | 11:02 | |
| have you found anything else while working on the async stuff? | 11:03 | ||
| jnthn | No, the patch I just pushed is the only Moar/Rakudo issue I ran into | 11:06 | |
| Except the lack of support for function pointers in CStructs, which is a NYI rather than a bug | |||
| dogbert11 sneaks off for lunch | 11:11 | ||
|
11:12
zakharyas joined
|
|||
| jnthn | Oh, I was wrong, CStruct is correct | 11:15 | |
| Ah well, at least now there's test coverage :) | |||
|
11:31
travis-ci joined
|
|||
| travis-ci | MoarVM build failed. Jonathan Worthington 'Missing GC block marking in event loop starting. | 11:31 | |
| travis-ci.org/MoarVM/MoarVM/builds/206133785 github.com/MoarVM/MoarVM/compare/7...e3b4cae4a4 | |||
|
11:31
travis-ci left
11:40
ilmari[m] joined
|
|||
| jnthn | Failure was a race with NQP_REVISION bump, fwiw | 11:53 | |
|
11:56
zakharyas joined
12:03
lizmat joined
12:08
ilmari[m] joined
|
|||
| dogbert11 is back, downloads the latest MoarVM ... | 12:09 | ||
|
12:11
brrt left,
brrt joined
|
|||
| lizmat | just as a data point, but HARNESS_TYPE is still segfaulting occasionally | 12:28 | |
| HARNESS_TYPE=6 :-) | |||
| brrt | get thee a core dump | 12:29 | |
| nwc10 | I can't replicate this with valgrind | ||
| I can with ASAN | |||
| brrt | or that | ||
|
12:30
lizmat joined
|
|||
| dogbert11 | lizmat: it would be nice if we could get rid of that error :-) | 12:33 | |
| lizmat | indeed :-) | 12:34 | |
| dogbert11 | question is if jnthn has enough info to be able to fix it, we'll see after lunch | 12:35 | |
| brrt | (or anyone else for that matter :-)) | 12:36 | |
| dogbert11 | indeed :) | ||
| I believe that nwc10 has ASAN output for the harness6 SEGV. Here's some gdb stuff gist.github.com/dogbert17/71bd542c...0bf44dae61 | 12:38 | ||
| brrt | hmm, can you get that again, with --optimize=0 | 12:39 | |
| (MoarVM compiled with that) | |||
| dogbert11 | unfortunately harness6 is doubly broken, if TEST_JOBS=1 it runs out of file handles and if TEST_JOBS=1+ it often SEGV's | ||
| brrt: will fix ... | |||
| timotimo | that's a very bogus pointer :) | 12:40 | |
| (well, it's most probably an offset from a null pointer) | |||
| brrt | thx dogbert11 | ||
| dogbert11 | with my luck it probably won't crash | 12:41 | |
| lizmat | ===( 0;216 50/51 0/? 95/97 834/842 134/134 0/? 112/112 1moar(49314,0x70000ce37000) malloc: *** error for object 0x7fdbbd4d6dc0: pointer being freed was not allocated | 12:42 | |
| *** set a breakpoint in malloc_error_break to debug | |||
| dogbert11 | brrt, lizmat: have it running now, gdb is attached | 12:43 | |
| brrt | bt full :-) | 12:44 | |
| (i wonder if this would be a good use for hack) | |||
| dogbert11 | I'm running it with TEST_JOBS=4, should be enough, we'll see | 12:45 | |
|
12:47
pyrimidine joined
|
|||
| dogbert11 | brrt: gist.github.com/dogbert17/9dc96027...cd9eee8662 | 12:52 | |
|
12:53
pyrimidine joined
|
|||
| brrt | i see | 12:55 | |
| thats pretty suspicious | 12:56 | ||
| dogbert11: still have an open session? | |||
| dogbert11 | yes | ||
| timotimo | there's no "if(cur_bytes)" in front of that? o_O | ||
| brrt | cur_bytes is not NULL | ||
| timotimo | i mean, how else would you reach 0x38? | ||
| brrt | well, not using initialized memory | 12:57 | |
| dogbert11: can you do: up 1, print *decoder | |||
| dogbert11 | (gdb) print *decoder | 12:58 | |
| $1 = {common = {header = {sc_forward_u = {forwarder = 0x4dc06ef8, sc = {sc_idx = 28408, idx = 19904}, sci = 0x4dc06ef8, st = 0x4dc06ef8}, owner = 3, flags = 16, size = 24}, st = 0xa6a1a68}, body = { | |||
| ds = 0x445f45a8, sep_spec = 0x4dd5a408}} | |||
| brrt | doesn't look obviously corrupt... | 13:00 | |
| timotimo | well, what does the ds look like? | 13:02 | |
| dogbert11 | how do I get to that? | 13:05 | |
| is this correct or nonsense? | 13:07 | ||
| (gdb) p *decoder.body.ds | |||
| $7 = {bytes_head = 0x38, bytes_tail = 0x0, chars_head = 0x0, chars_tail = 0x0, abs_byte_pos = 0, bytes_head_pos = 0, chars_head_pos = 0, encoding = 0, norm = {form = MVM_NORMALIZE_NFD, buffer = 0x0, | |||
| buffer_size = 0, buffer_start = 0, buffer_end = 0, buffer_norm_end = 0, first_significant = 0, quick_check_property = 0, translate_newlines = 0}, decoder_state = 0x50} | |||
| brrt | it appears kind of uninitialized to me | 13:08 | |
|
13:08
Geth joined
|
|||
| dogbert11 | question is whether the 'real' problem is visible in the gdb output | 13:15 | |
| brrt | it's pretty informative though :-) | 13:17 | |
| dogbert11 | added the result of a 'p MVM_dump_backtrace(tc)' to the end of the gist | 13:20 | |
| Geth | MoarVM: 5f9d6985a9 | (Jonathan Worthington)++ | 3 files Provide a way to put Decoder in nl-translate mode. So that we can use it in Proc::Async and have \r\n -> \n happen. |
13:47 | |
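A rough sketch of what a newline-translating mode has to handle when input arrives in chunks: a `\r` at the end of one chunk may or may not be followed by `\n` in the next. The class `NewlineTranslator` and its methods are hypothetical and work on Python strings for simplicity; the actual Decoder change in the commit above is not shown in this log.

```python
class NewlineTranslator:
    """Translate "\r\n" to "\n" across chunk boundaries (illustrative only)."""

    def __init__(self):
        # True when the previous chunk ended in "\r" that we are holding back.
        self._pending_cr = False

    def feed(self, chunk):
        # Re-attach a held-back "\r" so it can pair with a leading "\n".
        if self._pending_cr:
            chunk = "\r" + chunk
            self._pending_cr = False
        # Hold back a trailing "\r" until we see what follows it.
        if chunk.endswith("\r"):
            chunk = chunk[:-1]
            self._pending_cr = True
        return chunk.replace("\r\n", "\n")

    def finish(self):
        # At end of input a held-back "\r" was a lone carriage return.
        out = "\r" if self._pending_cr else ""
        self._pending_cr = False
        return out


t = NewlineTranslator()
pieces = [t.feed("line one\r"), t.feed("\nline two\r\n"), t.finish()]
print(repr("".join(pieces)))  # prints 'line one\nline two\n'
```

Holding back the trailing `\r` is the design choice that matters: without it, a `\r\n` pair split across two reads would leak through untranslated.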
| nwc10 | jnthn++ # back from lunch with patches | 13:48 | |
| nwc10 wonders what ilmari[m] will return with | |||
| jnthn | Lunch featured mushrooms :) | ||
| nwc10 | *I* like mushrooms. Some people don' | 13:49 | |
| 't | |||
| ilmari | nwc10: a stomach full of burrito | ||
| nwc10 | I forget the list of people I can "steal" unwanted mushrooms from | ||
| ilmari++ | |||
| jnthn | My wife is in the set of people I can "steal" mushrooms from :) | 13:50 | |
| Meaning that I rarely cook them at home | |||
| nwc10 | aha right now your comment makes more sense | ||
| jnthn | But today was lunch out | 13:51 | |
| And there was a mushroom-including dish to be had :) | |||
| dogbert11 | do you pick mushrooms in the forests when it's season? | 14:05 | |
| jnthn | No; I'd have no idea which ones might kill me | ||
|
14:06
Sufrostico[m] joined
|
|||
| dogbert11 | there's one in particular, i.e. Amanita Virosa | 14:06 | |
| lizmat | jnthn: also, if you pick mushrooms in that area, be sure to also take your geiger counter, as mushrooms are known concentrators of fallout | 14:08 | |
| www.theverge.com/2017/2/24/14733094...-chernobyl | 14:09 | ||
| jnthn | Going to the supermarket/restaurant feels like so much less hassle :P | 14:11 | |
| dogbert11 | jnthn, do you have any theories wrt lizmat's harness6 SEGV or is more debugging necessary? If so any suggestions on what I should do. | 14:24 | |
| jnthn | dogbert11: I need to take a look over the various bits of ASAN/GDB output and see if I can get a reproduction | 14:26 | |
| Still working on the Proc::Async newline stuff at the moment though | |||
| dogbert11 | cool, 'make [spectest|stresstest] HARNESS_TYPE=6 TEST_JOBS=4+' probably does the trick when you get to it | 14:28 | |
| jnthn | Just doing a Windows build to make sure the changes help there :) | 14:29 | |
| dogbert11 | hopefully all your changes will work on the first try (they never do for me though :-) | 14:38 | |
| jnthn | Yeah, the test file now passes on Windows | 14:41 | |
|
14:51
hoelzro joined
15:08
Sufrostico[m] left
|
|||
| jnthn | Looking at gist.github.com/dogbert17/eaba7dfc...b1748fbe23 I'm pondering putting a sanity check into the Decoder REPR impl to make sure we're never using it from multiple threads at the same time | 15:14 | |
| To either rule that out, or perhaps to discover that is somehow bizarrely what's happening | 15:15 | ||
| dogbert11 | I can try it (and probably lizmat as well) | 15:16 | |
| jnthn is running harness6 at the moment | |||
| lizmat | dogbert11: actually am in the middle of something else atm, so please go ahead :-) | ||
| jnthn | So far up to S10 and going strong | ||
| dogbert11 | it usually takes a while and as always it doesn't fail every time | 15:17 | |
| jnthn | have, I got a SEGV | ||
| dogbert11 | .oO | ||
| lizmat | hmmm... I got a fail in t/spec/S17-supply/supplier-preserving.t (test 1) | 15:18 | |
| in HARNESS_TYPE=5 | |||
| dogbert11 | uh oh | ||
|
15:18
brrt joined
|
|||
| jnthn | Aww it said core dumped but I can't actually find the core file | 15:18 | |
| I thought they ended up in the cwd | 15:19 | ||
| dogbert11 | did you start from the rakudo dir? | ||
| jnthn | yeah | 15:20 | |
| timotimo | on modern systems with journald it might have put the core file into the journal for you | ||
| dogbert11 | or did you forget 'ulimit -c unlimited' | ||
| jnthn | This is just a stock ubuntu 16.04 | ||
| dogbert11: oh...d'oh | |||
| I guess that doesn't stick between sessions :) | 15:21 | ||
| lizmat: Urgh, ran that 50 times without issue earlier today... | 15:22 | ||
| I assume you have the latest version of the test file? | |||
| lizmat | think so | ||
| jnthn | (I corrected one issue in it this morning) | ||
| Though that affected test 11 | 15:23 | ||
| lizmat | yeah, most recent | ||
| flapper :-( no pbs this time | 15:24 | ||
| [Coke] | u: - | 15:29 | |
| dogbert11 | t/04-nativecall/06-struct.t .............. Failed 5/28 subtests # hmm | 15:39 | |
| interesting, the test file passes if run with ./perl6 but not if they're run with make. looks like a fudgeup :) | 15:47 | ||
| Geth | MoarVM: 296ece0d28 | (Jonathan Worthington)++ | 2 files Ensure Decoder REPR never sees concurrent use. |
16:00 | |
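The kind of sanity check jnthn describes (fail loudly if an object is entered while another operation on it is still in progress) could be sketched as follows. `SingleUserGuard`, `ConcurrentUseError`, and the error text are hypothetical names chosen for this illustration; the commit's actual mechanism is not shown in this log.

```python
import threading
from contextlib import contextmanager


class ConcurrentUseError(RuntimeError):
    """Raised when an object is entered while already in use."""


class SingleUserGuard:
    """Detects overlapping use instead of waiting for the other user."""

    def __init__(self, what):
        self._what = what
        self._lock = threading.Lock()  # non-reentrant, so nesting also trips

    @contextmanager
    def entered(self):
        # Never block: a failed acquire means someone else is inside.
        if not self._lock.acquire(blocking=False):
            raise ConcurrentUseError(f"{self._what} is already in use")
        try:
            yield
        finally:
            self._lock.release()


guard = SingleUserGuard("decoder")
errors = []


def other_thread():
    try:
        with guard.entered():
            pass
    except ConcurrentUseError as exc:
        errors.append(str(exc))


with guard.entered():  # main thread is "inside" the object
    worker = threading.Thread(target=other_thread)
    worker.start()
    worker.join()

print(errors)  # prints ['decoder is already in use']
```

The point of raising rather than blocking is diagnostic: a lock would hide the overlap, while an error turns a silent memory-corrupting race into something that shows up in a test run.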
| jnthn | Never tripped so far locally | 16:01 | |
| Though a good sanity check to have in there | |||
| All the SEGVs I see are from gen2 gc of a p6bigint | 16:02 | ||
| dogbert11 | so it's still a mystery? | 16:05 | |
| timotimo | the bigints again?! | 16:06 | |
| or still? | |||
| dogbert11 | are they a BIG problem :) | ||
| timotimo | ugh :) | ||
|
16:12
brrt joined
|
|||
| jnthn | Interestingly, the latest SEGV was in add_guards_and_facts | 16:16 | |
| dogbert11 | does that give any clues? | 16:18 | |
| timotimo | oh, yikes. that's spesh, isn't it? | 16:22 | |
| jnthn | Yeah | ||
| Too early to say; so far it's apparently that the arg we are considering contains a Scalar container, which has a value that points to a mostly-but-not-quite zeroed bit of memory | 16:23 | ||
|
16:25
brrt joined
16:30
agentzh joined
|
|||
| jnthn | Also, sadly, I only have a core dump for that one rather than having caught it under the debugger, which makes analysis a bit harder | 16:49 | |
|
16:54
pyrimidine joined
16:58
timotimo joined
17:11
domidumont joined
17:12
zakharyas joined
17:15
pyrimidine joined
17:18
domidumont joined
17:21
pyrimidine joined
|
|||
| dogbert17 | jnthn, are there any checks/asserts you can smuggle in somewhere, then we can test while you rest :-) | 17:21 | |
|
17:26
agentzh joined
|
|||
| jnthn | Typical, after the latest one I've just put in it didn't crash | 17:33 | |
| dogbert17 | :( | ||
| dogbert17 is running a stresstest as well | 17:34 | ||
| jnthn | Pushed another option for debugging; can turn it on by changing MVM_ARRAY_CONC_DEBUG 0 to a 1 | 17:41 | |
| Geth: y u no report commit? | |||
|
17:41
ilmari[m] joined
|
|||
| jnthn | No luck so far, anyway | 17:42 | |
| dogbert17 | cool | ||
| it's an elusive bug | 17:43 | ||
| jnthn | Indeed | ||
| Best bet seems to be gradually ruling out things it isn't | |||
| dogbert17 | yes indeed | 17:44 | |
| jnthn | It's interesting that in every crash I've had except the spesh one, it's been in gc_cleanup | ||
| dogbert17 checks his gists | 17:45 | ||
| jnthn | I've never seen the decode stream one | ||
| But the thing I put in today to prevent concurrent operations on it will help us rule out the obvious failure mode there | 17:46 | ||
| dogbert17 | did you release the first check, i.e. did it come with the latest MoarVM bump? | 17:47 | |
| jnthn | No, the latest two are at MoarVM HEAD | 17:48 | |
| And the final one is just a debug feature | |||
| dogbert17 | good to know | ||
| jnthn | Seems to crash less with the latter debug check turned on | 17:50 | |
| dogbert17 | all my gists have MVM_string_decodestream_destroy in them. don't remember if nwc10's ASAN report was the same | ||
| jnthn | But when it does, (a) same failure mode, (b) it didn't trip the check | ||
| dogbert17 | uh oh | ||
| jnthn | Yeah, I don't have that in any of mine. Bizarre | ||
| dogbert17 | who knows, this one might turn out to be blogworthy | 17:52 | |
|
17:52
agentzh joined
|
|||
| jnthn | :) | 17:53 | |
| Think I'll break off for now | |||
| Though will keep it running to see if I get any different failure modes | |||
| dogbert17 | I will try your checks in the meantime | 17:55 | |
|
18:17
zakharyas joined
|
|||
| [Coke] | engineering.instagram.com/dismissi....wcdeivlvd - "Dismissing Python Garbage Collection at Instagram" | 18:26 | |
| timotimo | it feels like that was already mentioned here | 18:28 | |
| in the context of "look, they do the 'let the OS clean house' thing, too!" | |||
| [Coke] | ah, sorry | 18:29 | |
| timotimo | the "reading is a writing operation" point is quite interesting, too | 18:31 | |
| it's good that we don't do refcounts | |||
| Geth | MoarVM: 773711e114 | (Jonathan Worthington)++ | 2 files Debug option to detect concurrent VMArray use. We should make this memory-safe in the future; in the meantime, this can be turned on to see if it's to blame for problems. Doesn't yet catch all cases (for example, read while write isn't yet caught). |
18:35 | |
|
19:29
Ven joined
19:44
yoleaux2 joined
19:47
pyrimidine joined
19:52
agentzh joined
|
|||
| nwc10 | jnthn++ | 19:54 | |
| jnthn: paste.scsys.co.uk/555094 | 19:55 | ||
| your new thing catches it | |||
|
19:55
pyrimidine joined
|
|||
| nwc10 | "Deocder" | 19:55 | |
| I CAN HAZ VARIANT SPEELNG | |||
| dogbert17 | nwc10++ nice catch | 20:03 | |
| nwc10 | I think it's far more doubleplussjnthn for having a hunch about what the crazy problem might be | 20:04 | |
| I just built it and ran it in a loop | |||
| dogbert17 | deocders are way above my paygrade :) | ||
| jnthn | Interesting | 20:09 | |
| Though the question remains why I got the SEGV *and* didn't trip warning... | |||
| Uh, error | |||
| nwc10 | yours or mine? | ||
| jnthn | I locally got a SEGV | ||
| nwc10 | ahOK | ||
| I have | |||
| jnthn | After putting that decoder check in | ||
| nwc10 | #define MVM_ARRAY_CONC_DEBUG 1 | ||
| and | |||
| #define FSA_SIZE_DEBUG 1 | |||
| jnthn | So it's really interesting that it hit it | 20:10 | |
| nwc10 | and this is ASAN | ||
| jnthn | But on the other hand...might not be our SEGV | ||
| nwc10 | and somehow I could never make problems with valgrind | ||
| jnthn | It seems very timing-sensitive | ||
| dogbert17 | have run several times with 'MVM_ARRAY_CONC_DEBUG 1', no crash | 20:11 | |
| jnthn | Like, that MVM_ARRAY_CONC_DEBUG makes it a lot less likely to break | ||
| And that's much less of a slow-down than valgrind | 20:12 | ||
| Valgrind serializes everything onto one thread | |||
| All of this points fairly strongly towards a data race | |||
| nwc10 | ohhhhhh. I didn't know that | ||
| jnthn | The question is what in. | ||
| Might be worth me poring over helgrind output some | 20:13 | ||
| lizmat | jnthn: I still suspect something to do with grammars, as that is what HARNESS_TYPE=6 is doing a lot, parsing TAP | ||
| jnthn | Though I might need to work on suppressions and cleanup of various things | ||
| lizmat | I've been tempted to speed up prove6 by not using grammars, but then the segfaults may go | 20:14 | |
| jnthn | Does prove6 need speeding up? :) | 20:15 | |
| But yes, let's make it work first before worrying about that :P | |||
| jnthn got the SEGV again | 20:16 | ||
| Yet again, it's in mp_clear, when doing MVM_gc_collect_free_gen2_unmarked | |||
| It's especially odd that I've not yet seen the DecodeStream failure mode in a load of runs | 20:17 | ||
| We can likely put together something small that stresses the relevant code paths on that one, though | |||
|
20:27
zakharyas joined
|
|||
| dogbert17 | got a SEGV | 20:45 | |
| #0 0x40024cb0 in ?? () | |||
| #1 0x404f74ba in malloc_printerr (action=<optimized out>, str=0x405e9df4 "double free or corruption (out)", ptr=0x4a542578) at malloc.c:4996 | |||
| #2 0x404f812d in _int_free (av=0x4062f420 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3840 | |||
| had turned off MVM_ARRAY_CONC_DEBUG though | 20:46 | ||
|
20:55
geekosaur joined
|
|||
| nwc10 | got ASAN barfage, so the bug that the debug stuff traps is not the only one | 21:25 | |
| paste.scsys.co.uk/555095 | 21:26 | ||
|
21:30
agentzh joined
|
|||
| dogbert17 | and I got this bizarre output all of a sudden, gist.github.com/dogbert17/33da77db...e83b1ae4f3 | 21:31 | |
| what a rats nest | 21:34 | ||
| two or more bugs interacting, brrr | 21:35 | ||
| ok, got another SEGV, (well SIGABRT) this time with MVM_ARRAY_CONC_DEBUG = 1 | 21:43 | ||
| jnthn | nwc10: Yeah, that's the one I get nearly every time | 21:46 | |
| dogbert17 | could there be something wrong with the grammar in TAP.pm6 | 21:48 | |
| jnthn | What will be interesting is if we get any more ASAN or GDB output that points to decode streams, or if all those are now turned into the exception throws | 21:50 | |
| I'm quite confident we can golf that one down | |||
| dogbert17: It could be, but...grammars aren't a particularly notable sources of Int | 21:51 | ||
| *source | |||
| And the SEGVs (aside from the decode stream ones, which it seems we mighta turned into an exception now) all point to GC of an Int | 21:52 | ||
| dogbert17 | jnthn: did you see gist.github.com/dogbert17/33da77db...e83b1ae4f3 ? | ||
| jnthn | oh, output-ruler | 21:53 | |
| That's where we were in spesh at the point I got a core dump earlier | 21:54 | ||
| That's...an interesting coincidence | |||
| Off to rest o/ | 22:01 | ||
| dogbert17 | night | 22:03 | |
| jnthn | 'night | ||
|
22:45
pyrimidine joined
23:08
pyrimidine joined
|
|||