01:28
FROGGS joined
01:36
lue joined
02:11
colomon joined
nwc10 | mmmm, had to "fix" coffee machine to get second cup | 09:43 | |
(emptied waste, took grinder mechanism out, cleaned off escaped coffee, put it back together, everything happy first time) | 09:44 | ||
no idea why this worked | |||
have coffee. | |||
for now. | |||
jnthn | phew :) | 09:52 | |
nwc10 | and time to plan a hiding place: www.bbc.co.uk/news/entertainment-arts-27358560 | ||
jnthn | mine seems to clog itself increasingly often, needing a cleaning to stop it depositing coffee everywhere... | ||
nwc10 | the idea is coffee + water + heat => happy human + compost | 09:53 | |
jnthn: problem with git wasn't git - it was me failing to realise that the git 1.5.6 didn't have https support | |||
the error gets lost somewhere | |||
and we have some fun on sparc - the atomic ops seem to generate v9 assembler instructions | 09:54 | ||
gcc default is v9 | |||
er | |||
v7 | |||
so, sadly anyone with a CPU more than 19 years old is out of luck | |||
FROGGS | :/ | ||
I have an Amiga 2000 here with a 7.9MHz CPU I was hoping to use | 09:55 | ||
nwc10 | Can you fix the perl 5 port? The problem was that they were relying on vfork... | ||
FROGGS | nwc10: are you serious? :o) | 09:56 | |
nwc10 | not quite. I wouldn't expect anyone to do it with hardware that old | ||
but the serious bit is, given that Amiga is never going to die | |||
would be nice if they figured out how to fix it, and punted it back upstream | 09:57 | ||
it's an ASCII based platform, no cross compiling and not massively strange | |||
FROGGS | yeah, I don't even know if my system still boots | ||
nwc10 | so it's not a massive problem | ||
I wouldn't suggest bothering. Rakudo is more interesting | |||
FROGGS | it was a very nice thing back then | ||
nwc10 | jnthn: bad news - spesh_trace + lexopts + nom goes boom *boringly* with ASAN | 09:58 | |
Stage start : 0.000 | |||
Stage parse : Internal error: invalid thread ID in GC work pass | |||
jnthn: I can't get the sparc to link properly even with -mcpu=v9 and -DAO_REQUIRE_CAS | |||
./libmoar.so: undefined reference to `AO_fetch_compare_and_swap_full' | 09:59 | ||
that define is supposed to force emulation of everything else | |||
and it's not providing one, even in 3rdparty/libatomic_ops/src/libatomic_ops.a | |||
jnthn | Urgh | 10:00 | |
nwc10 | I will try to hammer it further after a bath | ||
I'm tempted to try to do it two ways | |||
jnthn | nwc10: On the invalid thread ID thing - the way to find that most efficiently is with the mprotected fromspace thingy | ||
nwc10 | 1) there seems to be a complete emulation layer | ||
(CPU independent) | |||
2) fix sparc better | |||
jnthn | But I'll use the info from retupmoca to see if I can hunt it down from that. | ||
nwc10 | as we're likely to need to offer the first to all of Debian's superfun platforms | 10:01 | |
jnthn: the mprotected fromspace stuff is a bit stale | |||
and there was at least 1 fix I didn't send (which might just be relevant) | |||
something turns on allocate to gen 2 to stop stuff moving | |||
it calls something else which also does this | 10:02 | ||
the inner thing turns off gen 2 | |||
10:02
vendethiel joined
nwc10 | the rest of the stuff is not gen2 | 10:02 | |
jnthn | ah | ||
nwc10 | the outer thing turns off gen 2 | ||
I forget where, but an assert should find it | |||
I didn't have an idea for a great fix | |||
possibly just a nesting count, which aborts if it overflows | |||
jnthn | Well, turn it into a counter rather than a boolean I guess | ||
nwc10 | Perl 5 has a big complex thing which can unwind stuff on scope exit | ||
but once you have the big complex thing, you can use it for stuff like this | 10:03 | ||
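The "counter rather than a boolean" idea discussed above can be sketched roughly as follows. This is only an illustration under assumptions: `ThreadCtx`, `gen2_depth`, `gen2_enter`, `gen2_leave` and `gen2_active` are hypothetical names, not MoarVM's actual fields or functions. The point is that an inner region leaving no longer switches gen 2 allocation off for the outer region, and an overflow or unbalanced leave aborts via an assert.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a per-thread context; not MoarVM's real struct. */
typedef struct {
    unsigned int gen2_depth; /* 0 = normal allocation, >0 = allocate in gen 2 */
} ThreadCtx;

/* Enter a region whose allocations must not move. Regions may nest. */
static void gen2_enter(ThreadCtx *tc) {
    assert(tc->gen2_depth < UINT_MAX); /* abort rather than wrap around */
    tc->gen2_depth++;
}

/* Leave a region. Only the outermost leave brings the depth back to 0. */
static void gen2_leave(ThreadCtx *tc) {
    assert(tc->gen2_depth > 0); /* catches an unbalanced leave */
    tc->gen2_depth--;
}

/* The allocator would consult this instead of a boolean flag. */
static int gen2_active(const ThreadCtx *tc) {
    return tc->gen2_depth > 0;
}
```

With a plain boolean, the inner leave would clear the flag while the outer region still expects gen 2 allocation, which is the bug described in the conversation.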
jnthn: the fromspace stuff is also hideously slow once you get to trying to do it for every allocation | 10:04 | ||
and it doesn't have enough memory to be perfect | |||
(as you found some that it missed) | |||
but it would be several days to get to the point of building the setting | |||
jnthn | nwc10: We can't get to that point normally and then throw it at the setting only? | 10:05 | |
nwc10 | I was thinking that | ||
however, I think it would take quite a while to clean the local patches up | |||
jnthn | OK | 10:06 | |
I'll try and find it using...other means. | |||
As a first resort. | |||
timotimo | the nintendo 3ds has 128mb ram, so it could just barely run an empty perl6-m (it's an arm9) | 10:20 | |
watching the directory where moarvm is outputting its spesh log using perl6-m is devastating. it gets so many change events that it takes ages to catch up with the actually current state | 10:25 | ||
10:29
brrt joined
brrt | hey, quick question | 10:29 | |
is the MIT license 'compatible' with the Artistic License? | |||
oh, i see, it is | 10:31 | ||
mit license is rather liberal | |||
jnthn | I think so, yes | 10:32 | |
tadzik | I like it a lot. I tend to think of it as "do whatever you want, just don't pretend that you wrote it" | 10:33 | |
jnthn | um, wtf | 10:34 | |
container_type_info doesn't even show up in my core.setting spesh log... | |||
jnthn downloads the one from retupmoca | 10:35 | ||
oh, duh | 10:37 | ||
was looking at wrong output | |||
timotimo | know that feeling >_> | ||
brrt is off for lunching | 10:38 | ||
10:38
brrt left
jnthn | invoke_o r56(13), r50(29) | 10:43 | |
[Annotation: INS Deopt One (idx 68 -> pc 7880)] | |||
sp_guardcontconc r56(13), liti16(4), liti16(5) | |||
That's the usage of the spesh slot 5 | |||
timotimo: Did you also get the SEGV? | 10:51 | ||
timotimo | i sometimes did | 10:53 | |
jnthn | OK. I found a couple of potential issues | ||
One of which could lead to reading junk memory | 10:54 | ||
timotimo | also, when an exception is thrown in a thread and caught by the default handler, it seems like the vm sometimes exits very uncleanly | ||
may be known. | 10:55 | ||
jnthn | Default handler in the VM? Hmm | 10:56 | |
timotimo: gist.github.com/jnthn/19e017bca18c42a10c04 | 10:58 | ||
Gonna push that one after local testing, though, 'cus even if it doesn't fix the SEGV it's needed. | 11:00 | ||
timotimo | would you be okay with me introducing a sp_nonnull opcode so that i can optimize isstr/ishash/isnum/islist/... into an sp_nonnull instead of creating multiple bytecodes with extra registers? | 11:02 | |
is there a policy? | |||
we only have 256 slots, right? | |||
no, wait, we write a 16bit integer for the opcode | 11:03 | ||
so we do have enough :) | |||
jnthn | What would sp_nonnull do? | ||
timotimo | the opposite of isnull | 11:04 | |
nwc10 | jnthn: left it running on x86_64 - crash is non-deterministic | ||
jnthn | You can't emit a not_i after it? | ||
timotimo | no, because then i have to allocate extra registers :) | 11:05 | |
nwc10 | this time it got through the setting, and gave a similar error in the next thing | ||
timotimo | remember we don't have a design for that yet? :) | ||
dalek | MoarVM/spesh_trace: bed1693 | jnthn++ | src/spesh/facts.c: Be more careful over concreteness in facts. Of note, we could try to look "inside" a container type's type object - meaning we looked past the end of it, leading to reading random memory. |
11:06 | |
jnthn | timotimo: But...isnull puts its result in an int register? | ||
timotimo | aye | ||
jnthn | And you know you're the writer of that register | 11:07 | |
So if you immediately follow it with a negation... | |||
timotimo | i can just do it? what about versions of registers? | ||
jnthn | Ah... | ||
timotimo | don't i have to be careful about that? | ||
jnthn | You'll get away with it...at the moment. | ||
I'm doing that trick with the invoke optimization at present | |||
But yeah, it's...hmm | |||
Well, we could have an isnonnull to go with isnull... | 11:08 | ||
I don't see why it has to be a spesh op | |||
timotimo | say ... do you think we'll ever get a perl6-m that only uses ~50 megabytes or less of ram when starting up empty? without partial loading of the setting? | ||
jnthn | We have eq and ne for all the other things... | ||
11:08
brrt joined
jnthn | I'd like to hope so, but need to work out where the memory is going. | 11:08 | |
timotimo | i wanted to make it a spesh because we don't have to emit it during normal bytecode building | 11:09 | |
yes, we do :\ | |||
jnthn | I'm saying that it could be useful for emitting in normal bytecode building though | ||
timotimo | aye, it could | ||
i don't feel like adding new bytecodes in two different places on two branches, so i'll wait for the spesh_log branch (more concretely: my spin-off-branch of it) to be merged | 11:10 | ||
jnthn | nwc10: bed1693 may fix the problems | ||
timotimo: If you could also try bed1693 that'd be great | |||
timotimo | will do | ||
reading junk memory could definitely explain the sporadicness | 11:13 | ||
seems fine so far | 11:14 | ||
nwc10 | jnthn: was already testing. now at Stage optimize | 11:15 | |
timotimo | seems like i can't get it to break any more. well done! | 11:16 | |
jnthn: i'm still convinced interning strings more aggressively would give us a ram usage benefit | |||
nwc10 | does the JVM have existing good tools that would help track down general memory usage? | 11:17 | |
timotimo | the regular heap analyzer will tell us "you've got one billion instances of SixModelObject using up all the rams!" | 11:18 | |
i think hoelzro prototyped something that'd inspect the SMO for what class it represents | |||
nwc10 | that's boringly unhelpful | ||
summon hoelzro | |||
timotimo | i cast "summon bigger fish" | ||
jnthn | nwc10, timotimo: yay that things seem better | 11:19 | |
jnthn wonders if that means "yes, we can merge spesh_trace" | 11:20 | ||
Certainly there's more bits I want to do but I don't think they need a branch... | |||
nwc10 | I'm in spectest | 11:21 | |
jnthn | k | ||
nwc10 | are you in a position to review my dancing bears? | ||
at this point? or post noms | |||
or post noms, walk, and other important things | |||
jnthn | Yeah, that's next on my hit list after getting spesh_trace merged | 11:22 | |
Just wanted to find/fix this SEGV first. | |||
nwc10 | totally understandable | ||
jnthn | And it looks like we have :) | ||
nwc10 | my fanclub is patiently waiting at my feet | ||
she doesn't care that it's raining | |||
spectest failed the usual S17 | |||
timotimo | yeah, can't get it to asplode during compilation any more | 11:24 | |
so i suppose i'm +1 on merging :) | |||
jnthn | yays | ||
timotimo | will that also include my small const i64 branch? :3 | 11:26 | |
jnthn | Well, guess that wants re-testing on top of latest spesh to make sure it really was the bug I just fixed that was the source of the crashes seen... | 11:27 | |
nwc10 | jnthn: there is something weird going on on sparc. Pre-processed source suggests that it defines an inline function | 11:28 | |
timotimo | i can get on that testing :) | ||
jnthn | o.O | ||
dalek | Heuristic branch merge: pushed 40 commits to MoarVM by jnthn | 11:29 | |
nwc10 | so, OK, why the missing symbol reference? What is trying to take its address, or whatever | ||
timotimo | wow, whoops: register operand index 34128 out of range 0..10 | 11:30 | |
jnthn | you don't say... | 11:31 | |
timotimo | i think my little merge broke stage0 loading :) | ||
jnthn | ah | ||
yeah | |||
be careful you put your new ops at the end | |||
timotimo | hm, but i added my opcodes *after* execname | ||
ah | |||
execname forgot the goto NEXT :) | |||
wrong | 11:32 | ||
i did. | |||
the merge made it wibbly wobbly :) | |||
that wasn't it :| | |||
jnthn | OK, merged all my various branches | 11:34 | |
nwc10 | drink! | ||
(coffee machine still working) | 11:35 | ||
(fret not - there is a backup coffee machine here too) | |||
timotimo | oh, lexopts is now in master? :D | ||
jnthn | yup :) | 11:38 | |
your CPU can read unaligned values for all of int32 int64 num64 | |||
Well, it gets that right :) | |||
timotimo | i don't understand why the bytecode verifier is complaining about my code :( | 11:40 | |
jnthn | nwc10: Is "nmake src\spesh\graph.i" meant to be dumping to stdout? | 11:41 | |
nwc10 | no | ||
this might mean that the Perl 5 Win32 makefiles *are* buggy :-) | 11:42 | ||
the *nix ones use '> ' as the output flag to capture it to a file | |||
seems that Win32 needs the same trick | |||
I'm surprised that no-one wanted those targets before | |||
jnthn | nmake src\spesh\graph.asm | ||
Is quite a fail too it seems :( | 11:43 | ||
nwc10 | I'm already using them to try to figure out sparc | ||
oh bother | |||
jnthn | It puts an object file into a file with .asm syntax | ||
nwc10 | *something* ought to work on win32 | ||
jnthn | And then tries to link it...with nothing else. | ||
nwc10 | so wrong output flag? | ||
I read something on the Internet and assumed that it was true :-( | |||
jnthn | oh... | 11:44 | |
Yeah | |||
/Fo<file> name object file | |||
*always* means object file | |||
nwc10 | OK | ||
seems to mean "output" file on gcc | 11:45 | ||
as it made it possible to generate src/core/interp.s not interp.s at the top level | |||
jnthn | /Fa[file] name assembly listing file | 11:46 | |
and | |||
/Fi[file] name preprocessed file | |||
look useful | |||
jnthn tries | |||
nwc10 | /Fi looks more useful than redirect stdout | ||
also, does /E work instead of -E ? | |||
jnthn | Unfortunately just adopting those alone doesn't help... | 11:47 | |
nwc10 | bother. You don't *need* those two commits for the third one to work. But they are very useful to verify that nothing changed in the generated code in the move from macros to inline functions | 11:48 | |
jnthn | Ah | 11:50 | |
brrt | brrt-to-the-future.blogspot.nl/2014...eriod.html blawg post about dynasm :-) | ||
jnthn | For preprocessed and output to a file we want /P, not -E | 11:51 | |
So, I fixed that one :) | |||
FROGGS | ahh, something to read, nice :o) | ||
brrt: "An example is included below"... should there be a code block in your blog? | 11:54 | ||
brrt | yes.... didn't work in blogger | 11:55 | |
:-( | |||
nwc10 | jnthn: at some point "my" machine is supposed to get a friend, running something Win32, with a build environment | 11:58 | |
planned to include VC6 :-) | |||
jnthn | *sigh* | 12:09 | |
So... You need "/c /FAs" to keep it from invoking the linker | |||
That gets rid of the error eruption | |||
BUT | |||
We also pass /GL in the C flags | |||
Which means "link time code generation" | |||
Which has the side-effect of making it decide to do absolutely nothing about generating an assembly file. | 12:10 | ||
nwc10 | OK :-( | ||
does /GL belong in the C flags? Should it be part of the codegen flags? (Sorry if I'm not quite using the correct terms) | 12:11 | ||
P | |||
Perl 5 messes things up. it conflates cpp flags and cc flags | |||
but conceptually there really should be a clean distinction between pre-processor flags (.c to .i) | |||
then .i to .s | |||
then .s to .o | |||
jnthn | Well, this is optimization flags I guess | ||
nwc10 | and then whatever happens next to .o | ||
bother. optimisation flags do belong at the .i to .s stage | 12:12 | ||
jnthn | Yeah | 12:13 | |
I'm not sure how to fix it | |||
Anyway, I now have a state where it works if you know to remove /GL in the makefile... | 12:14 | ||
jnthn can think of epic hacks, but... :) | |||
jnthn goes rebase -i'ing to get his improvements into the patches, anyways... | 12:19 | ||
12:21
brrt left
dalek | MoarVM: 47536df | nicholas++ | build/ (2 files): Add Makefile rules to generate pre-processed source. Someone awkwardly, @objflags@ can contain -DMVM_BUILD_SHARED, which affects the pre-processed source, so for correctness that needs to be in the CPP command
MoarVM: ce2a286 | nicholas++ | src/ (4 files): Ensure that MVMCompUnit can correctly free data_start. This gets slightly complicated because the memory may have been allocated by malloc or by mmap, depending on the origin of the bytecode.
MoarVM: 433d41d | (Timo Paulssen)++ | / (6 files): ops to for 32/16 bit 64 int literals and isnonnull |
12:29 | |
12:29
dalek joined
timotimo | someone? :) | 12:38 | |
nwc10 | oops, typo that neither of us spotted | ||
jnthn | haha | 12:39 | |
Totally missed that | |||
FROGGS | is there missing word? | ||
jnthn | I think s/Someone/Somewhat/ :) | ||
timotimo | i think it ought to be "somewhat" | ||
jnthn | OK, onto dancing bear #2... | 12:40 | |
timotimo | jnthn: do you have an idea for how to model things like "if we're in BB n, this fact about this register is true"? like when we branch into a block because something is null, for example? | ||
jnthn | timotimo: There's not really a good way to do that in the current model | 12:41 | |
timotimo | OK | ||
jnthn: i've redone a patch to introduce the small int literal ops, it still builds, now i'm going to try to figure out why the F adding the code to the compiler itself breaks | 12:45 | ||
may i push the change that introduces the ops already? and add isnonnull, too? | |||
jnthn | It doesn't break the build of anything? | 12:46 | |
timotimo | doesn't seem so. | ||
it's not used anywhere :) | |||
meh. no rakudo to run update-ops now :) | 12:47 | ||
nwc10: i'll want to mention your portability work in the weekly tomorrow; you've done powerpc and arm7 and arm6? something else? | 12:48 | ||
and these are limited to linux, aye? | |||
nwc10 | sparc thing seems to be structural confusion in libatomicops - if AO_HAVE_compare_and_swap exists, it refuses to do the emulation code for AO_HAVE_fetch_compare_and_swap | 12:49 | |
timotimo: not done arm7. would be useful for someone to test on ARMv7, and on ARMv5 | |||
ARMv7 should either build, or be a 3 line fix | |||
timotimo | right; i only have a raspberry pi myself, and a nintendo DS with a homebrew cartridge | ||
that'll hardly have enough ram to run moar :) | |||
nwc10 | DS isn't interesting, as it seems to also be ARMv6 | 12:50 | |
oh wow, not, it's ARMv4 | 12:51 | ||
Looks like it has less grunt than a 1998 RiscPC | 12:52 | ||
and that's impressive | |||
timotimo | it has two processors | ||
one for sound "only" | |||
nwc10 | wikipedia says one is ARM9, which now I've checked further is ARMv4 architecture | ||
I'd forgotten this | |||
timotimo | CPU: One 67.028 MHz ARM946E-S and one 33.514 MHz ARM7TDMI | ||
nwc10 | other, ARM7, is v3 | 12:53 | |
timotimo | ah! | ||
i didn't realize there's a difference between with and without v | |||
nwc10 | oh, quite a bit :-) | ||
which I think is why ARM11 is the last CPU with a number | |||
after that it's only architectures that keep numbers | |||
timotimo | 4mb of ram means *nope*, though | ||
nwc10 | CPUs get brand names | ||
timotimo: I'm failing to get sparc linux to work out | 12:54 | ||
timotimo | the inline thingie you mentioned above? | ||
nwc10 | it's not inline, I was getting confused | ||
timotimo | ah, ok | ||
but something is giving you trouble. noted :) | |||
nwc10 | between AO_fetch_compare_and_swap and AO_compare_and_swap | 12:55 | |
or something like that | |||
anyway, | |||
timotimo | so maybe it's trouble in libatomic_ops? | ||
nwc10 | done ARMv6, would welcome testers with ARMv7 and ARMv5 | ||
timotimo | or what it's called. | ||
nwc10 | it is trouble in it | ||
12:49 < nwc10> sparc thing seems to be structural confusion in libatomicops - if AO_HAVE_compare_and_swap exists, it refuses to do the emulation code for AO_HAVE_fetch_compare_and_swap | |||
OK, yes, to fully answer your question *properly* | 12:56 | ||
ARMv6 linux, PPC linux (but fails a nativecall test), failing to get sparc linux, have no other access | |||
in particular, it's likely that someone has access to a MIPS linux setup with native toolchain | |||
er, better expressed | |||
ARM linux (only ARMv6 tested so far), PPC linux (only ppc64 tested so far, and fails a nativecall test), failing to get sparc linux, have no other access | 12:57 | ||
but, we are working on big endian now | |||
timotimo | jnthn: my commit to introduce isnonnull and the const_i64_* opcodes are not breaking rakudo build so far | ||
so i'm free to push? | |||
nwc10 | and on things with more alignment constraints than x86 | ||
jnthn | timotimo: Yes, think that should be OK | ||
timotimo | thanks | ||
13:00
dalek joined
jnthn | nwc10++ # porting | 13:02 | |
timotimo | ah, with the changes applied afresh, it doesn't break the build any more to have these ops emitted by the mast compiler | 13:03 | |
weird, but i'll accept it. | |||
jnthn finally goes to read the brrt++ blog post :) | 13:04 | ||
dalek | MoarVM: eecf142 | (Timo Paulssen)++ | src/mast/compiler.c: use smaller const_i64_* ops sometimes. |
13:05 | |
MoarVM: c3c427f | (Timo Paulssen)++ | src/spesh/facts.c: introduce the const_i64_* ops to spesh. also, switch/case is a bit nicer to look at. |
13:07 | ||
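The idea behind "use smaller const_i64_* ops sometimes" is that a 64-bit integer literal whose value fits in 16 or 32 bits can be written with a narrower operand. A minimal sketch of that range check, assuming a signed encoding; the enum and the function name `pick_literal_width` are hypothetical and are not taken from src/mast/compiler.c:

```c
#include <stdint.h>

/* Hypothetical labels for the operand width chosen for a literal. */
typedef enum { LIT_I16, LIT_I32, LIT_I64 } LitWidth;

/* Pick the narrowest signed operand that can hold v without loss. */
static LitWidth pick_literal_width(int64_t v) {
    if (v >= INT16_MIN && v <= INT16_MAX)
        return LIT_I16; /* fits in a 16-bit operand */
    if (v >= INT32_MIN && v <= INT32_MAX)
        return LIT_I32; /* fits in a 32-bit operand */
    return LIT_I64;     /* needs the full 64-bit operand */
}
```

A compiler would call something like this once per literal and emit the matching op, falling back to the full-width form for everything else.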
jnthn | + /* Now we'll do a terrible thing */ | 13:09 | |
really? :P | |||
timotimo | oh! | 13:10 | |
well, the code does look terrible, doesn't it? | |||
jnthn | It's not so bad :) | ||
That switch/case is certainly an improvement. | 13:11 | ||
I think it may want to do an ISTYPE on the arg though, before blindly assuming it's a MAST::IVal | 13:13 | ||
(It always should be, of course...) | |||
timotimo | i'll have an opt for isstr/... soon, too | ||
if the result would be 0 due to type mismatch, it turns into a const + known value, otherwise it turns out into an isnonnull | 13:14 | ||
dalek | MoarVM: a6205ef | (Timo Paulssen)++ | src/spesh/optimize.c: try to optimize islist/isnum/... (not helpful yet) the problem is that islist and friends also include a null check on their argument. At best, we could - if we only know the type - turn these ops into a negated "isnull" op, but that doesn't exist. So we have to wait for spesh's ability to allocate new registers and add new operations on those. |
13:18 | |
MoarVM: 59053a4 | (Timo Paulssen)++ | src/spesh/optimize.c: Merge remote-tracking branch 'origin/optimize_isreprid' Conflicts: src/spesh/optimize.c
MoarVM: f34772b | (Timo Paulssen)++ | tools/spesh_diff.p6: teach spesh_diff.p6 about the new output of spesh dump |
|||
timotimo | ^ no build breakage detected | ||
afk | |||
FROGGS | nwc10++ # patches | 13:24 | |
jnthn | bbiab | 13:34 | |
dalek | MoarVM: 36366c0 | (Timo Paulssen)++ | tools/spesh_diff.p6: teach spesh_diff about Facts, put most common pattern first. |
14:03 | ||
14:05
zakharyas joined
timotimo | given the massive amount of sp_log instructions emitted, it's hard to find actual things that have been optimized | 14:11 | |
jnthn | timotimo: Well, they don't appear in the final output. | 14:14 | |
timotimo | oh | 14:15 | |
i didn't know that | |||
jnthn | There's 3 bits of output now | 14:16 | |
before, after inserting logging, and later on in the file the completed specialization. | |||
timotimo | oh! | 14:17 | |
so that's what was weird about the output in between | |||
so should i ignore stuff starting with "inserting logging for"? | |||
ah, yes, i see "finished specialization of" | 14:18 | ||
and i've been skipping that | |||
jnthn | Yes | ||
Now we | |||
* Hit the invocation count limit | 14:19 | ||
* Insert logging instructions and produce code | |||
* Wait for N runs (4 at present) | |||
* Use the logged info as part of generating specialized bytecode | |||
* Install that for future runs | |||
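The steps listed above can be pictured as a small per-frame state machine. This is a sketch under stated assumptions, not how spesh is actually structured: the type and function names are hypothetical, `HOT_THRESHOLD` is an invented value (the log does not state the invocation count limit), and only `LOG_RUNS` of 4 comes from the conversation.

```c
/* Hypothetical phases a piece of code moves through. */
typedef enum { PHASE_INTERPRET, PHASE_LOGGING, PHASE_SPECIALIZED } Phase;

typedef struct {
    Phase    phase;
    unsigned invocations; /* calls seen while still unspecialized */
    unsigned logged_runs; /* calls seen while logging is inserted */
} FrameState;

#define HOT_THRESHOLD 10 /* assumed value, for illustration only */
#define LOG_RUNS       4 /* "4 at present", per the discussion */

/* Called once per invocation; returns the phase after this call. */
static Phase on_invoke(FrameState *f) {
    switch (f->phase) {
    case PHASE_INTERPRET:
        if (++f->invocations >= HOT_THRESHOLD)
            f->phase = PHASE_LOGGING;     /* insert logging instructions */
        break;
    case PHASE_LOGGING:
        if (++f->logged_runs >= LOG_RUNS)
            f->phase = PHASE_SPECIALIZED; /* build and install specialized code */
        break;
    case PHASE_SPECIALIZED:
        break;                            /* future runs use the installed code */
    }
    return f->phase;
}
```

This also matches the three chunks of spesh log output mentioned earlier: before, after inserting logging, and the finished specialization.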
timotimo | aaah | 14:20 | |
trying a patch to spesh_diff now | |||
something's wrong. | 14:22 | ||
the tool now seems to be skipping everything | 14:23 | ||
oooh snap | 14:27 | ||
it's completely changed now | |||
there's no longer a Before: and After: | |||
jnthn | Well, the before/after correspond to before, then after we added the logging | ||
timotimo | instead, there is Before:, After: and "only something" in the "finished" thing | ||
aye, i need to hold on to the cuids and merge the stuff properly :\ | |||
jnthn | Yeah, it's a bit more work to correlate | 14:28 | |
timotimo | yes, and now having multiple spesh outputs for the same cuid is no longer easy to tell apart :\ | ||
well, except if they are spaced apart completely | |||
JimmyZ | Error while compiling op ifnull (source text: "nqp::ifnull($!nominal, 0)"): P6opaque: no such attribute '$!result_reg' | 14:29 | |
I got an error when building nqp .... | |||
timotimo | did you get back to master on moarvm and nqp? | ||
JimmyZ | yes | ||
jnthn | timotimo: uh... | 14:30 | |
Bytecode validation error at offset 0, instruction 0: | |||
invalid extension opcode 40728 - should be less than 1024 | |||
FROGGS | JimmyZ: did you made clean? | ||
jnthn | JimmyZ: That error means you don't have latest NQP | ||
timotimo | jnthn: did i cause that? huh? | ||
jnthn | Or didn't make clean or something | ||
nwc10 | invalid extension opcode 16864 - should be less than 1024 | ||
timotimo | but locally i can build it just fine :\ | ||
JimmyZ | argh | ||
timotimo | crappity crap, i guess | ||
JimmyZ | I missed updated bootstrap | 14:31 | |
jnthn | timotimo: uh | 14:34 | |
timotimo: unsigned char override_second_argument; | |||
That will have a junk value | |||
timotimo | oh! why did i forget that | 14:35 | |
feel free to commit that for me | |||
jnthn | for any op | ||
other than iconst_64 | |||
timotimo | aye | ||
i wonder how i got away with that | 14:36 | ||
only locally, of course | 14:37 | ||
dalek | MoarVM: bac3210 | jnthn++ | src/mast/compiler.c: Avoid using an uninitialized variable. |
14:38 | |
jnthn | That helps it here | 14:41 | |
timotimo | phew | 14:42 | |
what is blocking us from getting a filename and line number for warnings such as "use of uninitialized value of type *" on moarvm? | 14:51 | ||
nwc10 | jnthn: that gets me past it too. ASAN doesn't spot those | 14:52 | |
14:53
brrt joined
jnthn | timotimo: Probably nothing in MoarVM itself... | 14:57 | |
timotimo: Suspect something needs tweaking in Rakudo. | |||
nwc10: too bad.. | |||
FROGGS | jnthn: can I split up a bigger Q:PIR {} into two smaller ones to inject another pir block conditionally? | 14:58 | |
eww, wrong channel to ask that basically :o) | 14:59 | ||
nwc10 | jnthn: would need to run with valgrind to spot those, which is much slower | ||
jnthn | FROGGS: Probably | 15:02 | |
timotimo | i may have something workable for spesh_diff now | 15:05 | |
jnthn: is there a way to remove guards if we're not using the facts we've obtained through them? | 15:07 | ||
i.e. a guard on something that's dead perhaps? | 15:08 | ||
hmm. sounds tricky | |||
dalek | MoarVM: 6966670 | (Timo Paulssen)++ | tools/spesh_diff.p6: the layout of spesh logs has changed dramatically. |
15:09 | |
timotimo | + set r0(2), r0(1) - for real? :) | ||
this is even no-oppyer than other sets that have resulted from decont :D | |||
gist.github.com/timo/2e1f2f559931bc70f7ea - how come there's two spesh ops writing to the same register + version there? | 15:11 | ||
r2(4) in both cases | |||
jnthn | That's the invocant opt I mentioned before now | 15:21 | |
timotimo | oh, ok | ||
do we have something that counts how often guards fail? | |||
jnthn | Not yet | ||
Don't have "did we even use this guard" tracking yet either. | 15:22 | ||
timotimo | right. | ||
jnthn | Spesh has many wish-list items | 15:25 | |
JimmyZ | My friend asks where the MoarVM donate URL is ... | 15:35 | |
timotimo | jnthn: some simple things i could pick off of your wish list? | 15:39 | |
jnthn | timotimo: can and can_s still could use a look | 15:47 | |
timotimo: Also unbox_i/unbox_n/unbox_s On P6int/P6num/P6str may be an easy thing (part of repr spesh) | |||
timotimo | which repr opt should i use as inspiration for the unbox stuff? | 15:49 | |
jnthn | There's not a good place to look for get-y stuff yet I think | ||
I mean, look for spesh in src/6model/reprs/ | |||
But I think bindattr opt in P6opaque may be closest so far | 15:50 | ||
timotimo | would i generate a sp_fastunbox_i/s/n that takes an offset from the object base pointer to the contained integer/num/string object? or something like that? | 15:52 | |
jnthn | No | ||
There's much more general spesh ops for it | |||
Already in the oplist | |||
sp_get_i .s w(int64) r(obj) int16 :pure | 15:53 | ||
sp_get_n .s w(num64) r(obj) int16 :pure | |||
sp_get_s .s w(str) r(obj) int16 :pure | |||
timotimo | oh, i see | ||
jnthn | The int16 is a byte offset from the start of the object. | ||
timotimo | fair enough; i can just direct that at the correct slot index thingie? | ||
ah, byte offset | |||
gotcha | |||
jnthn | Not sure if they're implemented in interp.c yet, but should be easy | ||
And yeah, it's byte offset. | |||
timotimo | they are not yet implemented | 15:54 | |
jnthn | They are designed so that when brrt++ JITs them, it's a gonna be really cheap :D | ||
timotimo | i'll try my best :) | ||
and guards should become much cheaper when JIT is in; now they do a trip through the interpreter loop, for example | |||
jnthn | Right | ||
I think they'll just want to look like | 15:55 | ||
GET_REG(cur_op, o).i64 = *((MVMint64 *)((char *)GET_REG(cur_op, 2) + GET_UI16(cur_op, 4))); | |||
bbiab | 15:56 | ||
timotimo | ah, thanks | ||
i thought i'd have to do something with REAL_DATA | |||
jnthn | no, that's just for p6opaque | 15:57 | |
& | 15:58 | ||
timotimo | i think the 2 and 4 there want to be 4 and 6 | 16:00 | |
hm. | |||
nope | |||
registers are only 2 bytes | |||
jnthn | right. | 16:38 | |
though GET_REG(cur_op, o).i64 shoulda been GET_REG(cur_op, 0).i64 | 16:39 | ||
timotimo | yes | 16:46 | |
i fixed that | |||
i get pointer mismatch problems | |||
after trying it, i ate with friends and drove to the hackspace | |||
jnthn | lizmat: To give you an idea of the S17 bottleneck for me, I'm at 160s or so when the first S17 test starts, 315s or so when the last one ends. | 16:47 | |
lizmat | :-( | ||
jnthn | uh, meant that in #perl6 | ||
Anyway, now you can understand a bit why I'm curious about trying to make tests shorter or parallelize better or something. | 16:48 | ||
timotimo | a missing & was to blame for my troubles | 17:07 | |
dalek | MoarVM: a2afdb9 | (Timo Paulssen)++ | src/core/interp.c: implement sp_get_i/n/s |
17:08 | |
17:09
btyler joined
jnthn | (char *)&GET_REG(cur_op, 2) | 17:10 | |
I think more robust would be | |||
(char *)(GET_REG(cur_op, 2).o) | 17:11 | ||
timotimo | ah, sounds good | ||
jnthn | The other way might run us into trouble on 32-bit BE machines... | ||
hm, maybe not | |||
But still...something makes me think it'll be fragile | 17:12 | ||
timotimo | mhh | ||
jnthn | Otherwise look fine | ||
timotimo | how should i best write down the offset into the code? for p6int, p6num, p6str it's basically the size of the header, right? | ||
so sizeof(MVMObject)? | |||
jnthn | offsetof | 17:13 | |
timotimo | wow, that exists? | ||
jnthn | C built-in. | 17:14 | |
timotimo | nice! | ||
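A small standalone illustration of the byte-offset read discussed above, with `offsetof` supplying the offset. The struct layout here is hypothetical (`Header`, `BoxedInt` and `get_i64_at` are made-up names, not MoarVM's definitions), and the sketch uses `memcpy` instead of the pointer cast from the conversation so that it makes no alignment assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a common header followed by the boxed value. */
typedef struct { void *stable; uint32_t flags; } Header;
typedef struct { Header header; int64_t value; } BoxedInt;

/* Read an int64 located byte_offset bytes from the start of obj.
 * memcpy keeps the read well-defined even if the address is unaligned. */
static int64_t get_i64_at(const void *obj, uint16_t byte_offset) {
    int64_t out;
    memcpy(&out, (const char *)obj + byte_offset, sizeof out);
    return out;
}
```

A caller would pass `offsetof(BoxedInt, value)` as the offset. That value already accounts for any padding the compiler puts after the header, which is why it is safer than assuming the offset equals the header's size.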
gist.github.com/timo/52be9e43e985845f7201 - is this the correct approach? | 17:24 | ||
i don't see any sp_get_i being run | 17:25 | ||
during nqp build | |||
should probably also have to limit that to only 64bit integers | 17:27 | ||
signed | |||
jnthn | yes | 17:28 | |
Bit surprised we don't hit it... | |||
timotimo | all i get for it is getreprname | 17:29 | |
tadzik | you are 8 bit surprised? | ||
(sorry) | |||
timotimo | which i *could* optimize into a const_s :) | ||
nwc10 | Program received signal SIGBUS, Bus error. | 17:35 | |
(gdb) p &tgt_facts->value | |||
$1 = (union {...} *) 0x14056f2 | |||
why are they all misaligned? :-( | |||
timotimo | jnthn: if i am to specialize reprname, where should i look for the literal index for the right string? | ||
i suppose i'll build a few speshes for p6opaque | 17:40 | ||
tadzik | today's beer: March Smokes | ||
or something | |||
Rauchmarzen, actually | |||
nwc10 thinks that the problem is that MVM_spesh_alloc() is a very naughty piece of code. | 17:41 | ||
timotimo | isn't it just a bump-the-pointer thingie? | 17:46 | |
nwc10 | it would seem to be. and I infer that some of the things that it is asked to allocate are 2 * $n bytes long | ||
so the pointers returned after that are not suitably aligned for some of the other things that it is asked to allocate | 17:47 | ||
Sadly the PPC build also explodes. Not sure why | 17:48 | ||
jnthn | nwc10: spesh alloc exists so we can trivially throw away all the nodes in one go at the end, easing memory management of the thing. | ||
nwc10 | 2017 MVMObject *obj = REPR(type)->allocate(tc, STABLE(type)); | ||
type is NULL | |||
jnthn | timotimo: reprname really isn't hot enough to be worth it | ||
nwc10 | jnthn: I think that it needs to take an "alignment" parameter, and on platforms where alignment matters, bump the pointer until alignment is reached | ||
timotimo | jnthn: it seems like the p6opaque spesh function checks that the type_reg has a known type and the type is non-null | 17:49 | |
jnthn | nwc10: If it were to always make sure it was on an 8-byte boundary, would that do it? | ||
timotimo | but optimize_repr_op also checks that and only then will it even call spesh on the repr | ||
nwc10 | jnthn: yes, that would do it, but use more RAM | ||
that is the KISS solution | |||
I think you can also do that alignment conditionally on MVM_CAN_UNALIGNED_NUM64 and MVM_CAN_UNALIGNED_INT64 | 17:50 | ||
(they are defined in the inverse sense to be easy here) | |||
whilst MVM_CAN_UNALIGNED_INT32 exists, we aren't actually using it yet. Which is sort of misinformation | |||
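The "always land on an 8-byte boundary" idea discussed above might look roughly like this. It is a sketch under stated assumptions, not the actual MVM_spesh_alloc: DemoArena, demo_round_up and demo_arena_alloc are hypothetical names, the arena is a single fixed buffer, and the alignment is applied unconditionally rather than being made conditional on the MVM_CAN_UNALIGNED_* macros mentioned in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGN 8u  /* assumed alignment; must be a power of two */

/* Hypothetical bump-the-pointer arena over one fixed buffer. */
typedef struct {
    char  *buffer;
    size_t used;      /* bytes handed out so far, including padding */
    size_t capacity;  /* total bytes in buffer */
} DemoArena;

/* Round an address up to the next multiple of DEMO_ALIGN. */
static uintptr_t demo_round_up(uintptr_t addr) {
    return (addr + (DEMO_ALIGN - 1)) & ~(uintptr_t)(DEMO_ALIGN - 1);
}

/* Bump allocation where every returned pointer is DEMO_ALIGN-aligned,
 * even if an earlier request was for an odd number of bytes. */
static void *demo_arena_alloc(DemoArena *a, size_t bytes) {
    uintptr_t base, start;
    size_t offset;
    assert(a != NULL);
    base   = (uintptr_t)a->buffer;
    start  = demo_round_up(base + a->used);
    offset = (size_t)(start - base);
    if (offset > a->capacity || bytes > a->capacity - offset)
        return NULL;  /* not enough room left */
    a->used = offset + bytes;
    return a->buffer + offset;
}
```

The cost is the one raised in the discussion: a 2-byte request followed by an 8-byte request wastes 6 bytes of padding, so the arena uses somewhat more RAM in exchange for never returning a misaligned pointer.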
timotimo | wtf. why does reprname even get the spesh function called?! | 17:52 | |
jnthn | uh, yeah...that is odd | 17:53 | |
timotimo | haha | 17:54 | |
silly me | |||
i typo'd in the search box for the ops | |||
it actually *is* get_i | |||
it gets called quite often for box_i, but not unbox_i it seems | 17:55 | ||
unfortunately, box_i isn't as easy to speshify | 18:00 | ||
maybe an op like fastcreate + an offset to copy the *thing* to? | 18:01 | ||
jnthn | timotimo: That's my idea for box_i, yeah | 18:12 | |
timotimo: It turns one op into two, but they're two that can JIT nicely | 18:13 | ||
timotimo | any idea why we're hitting box_i often but never unbox_i? | ||
optimize.c does hit unbox_i a few times, maybe the type is never known? | 18:15 | |
jnthn | Well, partly because we always know the type in box_i | ||
timotimo | yes, we do | ||
i'll stash the box_i stuff away for now and maybe look for something else to do | 18:21 | ||
like can_s | |||
oh, objprimspec seems speshable, no? | 18:23 | ||
hm, but probably not hot | 18:24 | ||
oh dang, now i'm getting All positional args must appear first | 18:59 | ||
my can is probably wrong then :) | |||
nwc10 | jnthn: is the memory allocated by MVM_spesh_alloc transient? ie does it get thrown away (en masse) fairly soon after it's allocated? | 19:04 | |
jnthn | nwc10: Typically yes | 19:05 | |
nwc10: Code has to get hot enough to spesh | |||
nwc10: Once it does we allocate the graph and insert logging instructions | |||
dalek | arVM: facf41a | (Timo Paulssen)++ | src/6model/6model. (2 files): can_method_cache_only function for spesh purposes |
||
arVM: 8bce6b2 | (Timo Paulssen)++ | src/spesh/facts. (2 files): harvest strings in facts discovery process |
|||
jnthn | nwc10: Then a few runs later, we examine the recordings, make optimized bytecode, and throw the graph away (freeing the spesh_alloc'd memory) | 19:06 | |
timotimo | gist.github.com/timo/716656be19b892dda8ab - jnthn, can you tell what's wrong with this? | ||
jnthn | you did it wrong! | 19:11 | |
jnthn looks :) | |||
ins->operands[0].lit_i64 = can_result; | 19:13 | ||
1? | |||
timotimo | but that's supposed to be the result? | ||
jnthn | 0 is the result reg | ||
1 is the constant to put in there | |||
timotimo | oh! | ||
durrrr :) | |||
i should be setting that on the facts instead | |||
lizmat | .oO( an off by one error ) |
||
timotimo | thanks | ||
jnthn | np | ||
timotimo | yes, that does work much better :) | 19:15 | |
i think we can probably emit can instead of can_s in a bunch of cases | |||
we probably have many const_s + can_s | |||
which could be can instead | |||
now doing a spesh log with the setting >:) | 19:21 | ||
jnthn | 1.5 million lines coming your way! :P | 19:25 | |
timotimo | :) | ||
4149871 ../rakudo/test.txt | |||
i should probably not run that through a perl6 script to pick it apart %) | 19:26 | ||
r-m needs to get faster >_> | |||
nwc10 | this is a "faster" bootstrapping problem? :-) | ||
timotimo | heh. | ||
nah, looking at that is optional | |||
the script got a whole bunch faster when i put the when clause for the most common type of line first instead of last | 19:27 | ||
jnthn puts aside vacation plotting for a bit to see if he can do the rest of this SC quadratic elimination work... | 19:35 | ||
nwc10 | vacation plotting is some sort of Computer Science thing? Or where you surprise your friends by randomly visiting? | 19:36 | |
and killing the quadratic would be awesome | |||
jnthn | Neither. Plotting to take myself to some faraway place with ice or lava or other awesome :) | 19:37 | |
timotimo | my time estimate is 10 minutes for those lines | 19:39 | |
hm. the eta seems to be decreasing too slowly | 19:41 | ||
Failed to open pipe: 12 | 20:00 | ||
nnnnooooooooo :( | |||
and the estimate was off by about 2x | 20:01 | ||
well, at least i already wrote out the files :) | 20:03 | |
oh, it didn't | 20:04 | ||
what a surprise >_< | |||
this script ought to learn to be more robust. | |||
want to have spurtasync for moarvm please :) | 20:06 | ||
(no rush) | 20:14 | ||
nwc10 | jnthn: paste.scsys.co.uk/371471 -- running with this on the Pi and so far so good (NQP passes all tests) | 20:21 | |
jnthn | nice :) | 20:22 | |
nwc10 | jnthn: paste.scsys.co.uk/371476 -- 32074aa0566e breaks ppc64 -- First pass at turning some logs into fact+guard. | 20:24 | |
big backtrace, starting This representation (VMArray) does not support attribute storage | 20:25 | ||
dalek | arVM: 1a224d3 | (Timo Paulssen)++ | src/spesh/optimize.c: spesh can and can_s ops into const_i64 |
20:26 | |
arVM: 516207f | (Timo Paulssen)++ | tools/spesh_diff.p6: estimate run time of spesh_diff |
|||
arVM: 8efcab4 | (Timo Paulssen)++ | tools/spesh_diff.p6: write out results ASAP. |
|||
nwc10 | jnthn: I'm having trouble working out why | 20:27 | |
jnthn | nwc10: Also, that commit was done in a branch that broke stuff along the way | ||
timotimo | jnthn: are our parsing bytecode stuffs benefitting at all from spesh so far? | ||
nwc10 | jnthn: OK. | ||
that does make it hard to bisect | |||
jnthn | nwc10: That is, not all commits worked well here either. | ||
Yeah...such is bisect and branches. | 20:28 | ||
nwc10 | OK, so at bed169375f087dfb11939110fa18921536b1a2a7 -- Be more careful over concreteness in facts. | 20:31 | |
which I think was master at one point | |||
PPC64 is bust | |||
P6opaque: no such attribute '$!ast' | |||
spesh writes bytecode, which is then passed to the validator? | 20:32 | ||
jnthn | No | ||
spesh bytecode is assumed valid | |||
But it doesn't endian-transform as it writes. | |||
So the bytecode itself should be native endian. | |||
nwc10 | OK, something else weird then | ||
jnthn | Darn | 20:33 | |
nwc10 | you could say that | ||
timotimo | Failed to open pipe: 12 | ||
in sub QX at src/gen/m-CORE.setting:779 | |||
wow, that is helpful. | |||
jnthn | Also darn here. Got opt in, but it generates a busted setting :/ | 20:34 | |
timotimo | - can_s r8(1), r6(1), r7(1) | 20:42 | |
- unless_i r8(1), BB(3) | |||
- Successors: 3, 2 | |||
+ const_i64 r8(1), liti64(1) | |||
+ Successors: 2 | |||
yays :) | |||
it doesn't seem like i've decremented the usages of the constant when i turn the if into an unconditional jump | 20:43 | ||
jnthn | Was gonna say, why ain't it just gone... | ||
Also, did it not remove the thing setting r7(1) to a const string? | 20:44 | ||
OK, it seems it's a minor part of my opt that is bad. | 20:45 | ||
The overall thing seems to help. | |||
Gets rid of much of the quadratic. | 20:46 | ||
Stage mast goes from ~20s to ~13s here | |||
lizmat | wow! | ||
jnthn | Clean build of NQP came out OK | 20:48 | |
Rakudo in progress | |||
Then a spectest | |||
In theory I can get us a bit more too | |||
But that'll take more debuggering the bits of my patch that cause some kind of explosion... | |||
Anyway, 10s off NQP build and 8s or so (parse/optimize are a little faster too) this weekend, it seems :) | 20:49 | ||
dalek | arVM: bc1677d | jnthn++ | src/ (9 files): Change the way we store SCs in object headers. We used to store a pointer directly to the SC. Now we store a pair of 32-bit integers: an index into a new array where we can look up the SC, and the index the object lives at in the SC, if known. This second index is not being set up everywhere consistently yet; this patch does what's needed to switch over to the new header layout and be able to build NQP/Rakudo. |
20:57 | |
arVM: 375f5d7 | jnthn++ | src/6model/sc.c: Use SC index from object header when available. Falls back to the linear search for cases that don't (yet) have the index stashed. However, it is in the common cases, which makes the CORE.setting compilation complete in 90% of the time it used to. |
|||
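The two commits above describe storing an index in the object header and falling back to a linear search when it is not set. A rough sketch of that lookup, with hypothetical names throughout (DemoObject, DemoSC, DEMO_NO_IDX and demo_find_index are illustrations, not MoarVM's real header layout or API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_IDX UINT32_MAX  /* assumed marker for "index not stashed yet" */

/* Hypothetical object header fields: a pair of 32-bit integers. */
typedef struct {
    uint32_t sc_idx;     /* which SC the object belongs to (unused in this sketch) */
    uint32_t idx_in_sc;  /* slot inside that SC, or DEMO_NO_IDX if unknown */
} DemoObject;

/* Hypothetical serialization context: an array of object pointers. */
typedef struct {
    DemoObject **objects;
    size_t       num_objects;
} DemoSC;

/* Return the object's slot in the SC, or -1 if it is not there.
 * Fast path: trust the stashed index after verifying it.
 * Slow path: the linear scan that made serialization quadratic. */
static long demo_find_index(const DemoSC *sc, const DemoObject *obj) {
    size_t i;
    assert(sc != NULL && obj != NULL);
    if (obj->idx_in_sc != DEMO_NO_IDX
            && obj->idx_in_sc < sc->num_objects
            && sc->objects[obj->idx_in_sc] == obj)
        return (long)obj->idx_in_sc;
    for (i = 0; i < sc->num_objects; i++)
        if (sc->objects[i] == obj)
            return (long)i;
    return -1;
}
```

With only the slow path, looking up each of n objects costs up to n comparisons, so serializing all of them is O(n²). Once the common cases carry a valid stashed index, most lookups are constant time and the total is close to linear.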
timotimo | aaaw only 10% improvement? :( | 21:00 | |
does that improvement only happen in stage mast? | |||
or is it spread all over? | |||
jnthn | 10% improvement to the entire setting compilation | ||
Stage mast is just one part of it | |||
There it's working in 65% of the time it used to | 21:01 | ||
timotimo | that's definitely very good! | 21:02 | |
well, the biggest win is that the quadratic factor is now ... linear? | |||
jnthn | Right | ||
timotimo | so as we implement more stuff, the win will become bigger | ||
jnthn | Now I need to pick through my patches that didn't make the cut and work out which of the small tweaks to avoid linear search in a few more places bust stuff | 21:03 | |
timotimo | :) | ||
here's a const_s that's not disappearing properly :( | 21:04 | ||
haha, i'm silly | 21:08 | ||
dalek | arVM: 1ec728b | (Timo Paulssen)++ | src/spesh/optimize.c: no need to keep the flag alive when removing iffy |
21:13 | |
arVM: 5689569 | (Timo Paulssen)++ | src/spesh/optimize.c: check opcode before losing it in optimize_can_op |
|||
arVM: 4785424 | jnthn++ | src/6model/serialization.c: Mark objects/stables with index in deserialize. |
21:14 | ||
timotimo | scrolling through the spesh stuff i see a few funny constructions with sets cascading and stuff | 21:18 | |
but that's probably not a very big win to work with that | |||
jnthn | Yeah, in the C profile, serialize is way down now | 21:22 | |
timotimo | awesome :) | 21:23 | |
removing even a single BB early on in a frame that has many blocks causes the diff to become huge and noisy :( | 21:27 | ||
21:30
raiph joined
|
|||
jnthn suspects that, with some care, we can optimize many "for" loops in NQP to avoid invocations. | 21:30 | ||
Which'd probably also cut down on a lot of the takeclosure operations | 21:31 | ||
timotimo | oooh | ||
jnthn | GC in CORE.setting compilation is 20%, which is a bit higher than the typical profile I see. | ||
timotimo | at what level will that optimization reside? | ||
jnthn | NQP's optimizer | 21:32 | |
We already do most of the analysis needed for it, I *think*. | |||
timotimo | that would probably be pretty awesome | ||
jnthn | It's a bit messy to implement, but probably not too bad. | ||
Also will make the iterator object open to escape analysis, when we eventually have it... | 21:33 | ||
timotimo | it'll probably put quite a bit of distance between rakudo and nqp in the benchmarks again ;) | ||
jnthn | Aye, though Rakudo got a bit better this weekend also... :) | 21:34 | |
15%-20% off the array store benchmark, for example | |||
timotimo | concrete and null are orthogonal, aye? | 21:35 | |
were you able to figure out why the spectests suffered so heavily? just a case of not doing the same thing often enough for the spesh to pay off? | |||
i think i'll start a benchmark run | 21:39 | ||
jnthn: anything in particular i could try to look at for making our parsing stuff faster through spesh? | 21:41 | ||
jnthn | Well, getattribute in p6opaque, but it's really subtle and fiddly because of autoviv stuff | 21:47 | |
timotimo | mhh | 21:48 | |
jnthn | Yes, a benchmark run would be good. | ||
timotimo | already in the timing phase :) | ||
ah, i probably ought to bench nqp, too | 21:53 | ||
jnthn | :) | 21:56 | |
Gotta teach tomorrow, so probably shouldn't stay up too late :) | |||
'night | |||
timotimo | aaw | ||
so long and thanks for all the spesh :) | 21:57 | ||
lizmat | gnight jnthn | 21:59 | |
timotimo | doing the benchmarks for nqp-m now | 22:10 | |
oh. actually, i'm still using --optimize=1 for the benchmark'd moars | |||
Internal error: inconsistent bind result | 22:32 | ||
without spesh it doesn't occur | |||
this is in the forest fire benchmark | |||
also the forest fire results don't seem right | 22:33 | ||
it's wrong in perl6-p, too, though | |||
hoelzro | alright, I've finally answered the summons | 23:01 | |
unfortunately, I never made a lot of headway on that SMO thing =/ |