01:28 FROGGS joined
01:36 lue joined
02:11 colomon joined
nwc10 mmmm, had to "fix" coffee machine to get second cup 09:43
(emptied waste, took grinder mechanism out, cleaned off escaped coffee, put it back together, everything happy first time) 09:44
no idea why this worked
have coffee.
for now.
jnthn phew :) 09:52
nwc10 and time to plan a hiding place: www.bbc.co.uk/news/entertainment-arts-27358560
jnthn mine seems to clog itself increasingly often, needing a cleaning to stop it depositing coffee everywhere...
nwc10 the idea is coffee + water + heat => happy human + compost 09:53
jnthn: problem with git wasn't git - it was me failing to realise that git 1.5.6 didn't have https support
the error gets lost somewhere
and we have some fun on sparc - the atomic ops seem to generate v9 assembler instructions 09:54
gcc default is v9
er
v7
so, sadly anyone with a CPU more than 19 years old is out of luck
FROGGS :/
I have an Amiga 2000 here with a 7.9MHz CPU I was hoping to use 09:55
nwc10 Can you fix the perl 5 port? The problem was that they were relying on vfork...
FROGGS nwc10: are you serious? :o) 09:56
nwc10 not quite. I wouldn't expect anyone to do it with hardware that old
but the serious bit is, given that Amiga is never going to die
would be nice if they figured out how to fix it, and punted it back upstream 09:57
it's an ASCII based platform, no cross compiling and not massively strange
FROGGS yeah, I don't even know if my system still boots
nwc10 so it's not a massive problem
I wouldn't suggest bothering. Rakudo is more interesting
FROGGS it was a very nice thing back then
nwc10 jnthn: bad news - spesh_trace + lexopts + nom goes boom *boringly* with ASAN 09:58
Stage start : 0.000
Stage parse : Internal error: invalid thread ID in GC work pass
jnthn: I can't get the sparc to link properly even with -mcpu=v9 and -DAO_REQUIRE_CAS
./libmoar.so: undefined reference to `AO_fetch_compare_and_swap_full' 09:59
that define is supposed to force emulation of everything else
and it's not providing one, even in 3rdparty/libatomic_ops/src/libatomic_ops.a
jnthn Urgh 10:00
nwc10 I will try to hammer it further after a bath
I'm tempted to try to do it two ways
jnthn nwc10: On the invalid thread ID thing - the way to find that most efficiently is with the mprotected fromspace thingy
nwc10 1) there seems to be a complete emulation layer
(CPU independent)
2) fix sparc better
jnthn But I'll use the info from retupmoca to see if I can hunt it down from that.
nwc10 as we're likely to need to offer the first to all Debian's superfun platforms 10:01
jnthn: the mprotected fromspace stuff is a bit stale
and there was at least 1 fix I didn't send (which might just be relevant)
something turns on allocate to gen 2 to stop stuff moving
it calls something else which also does this 10:02
the inner thing turns off gen 2
10:02 vendethiel joined
nwc10 the rest of the stuff is not gen2 10:02
jnthn ah
nwc10 the outer thing turns off gen 2
I forget where, but an assert should find it
I didn't have an idea for a great fix
possibly just a nesting count, which aborts if it overflows
jnthn Well, turn it into a counter rather than a boolean I guess
nwc10 Perl 5 has a big complex thing which can unwind stuff on scope exit
but once you have the big complex thing, you can use it for stuff like this 10:03
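The "counter rather than a boolean" idea above can be sketched as follows. This is only an illustration under assumptions: `ToyThreadContext` and the `toy_gen2_*` helpers are hypothetical names, not MoarVM's actual structures or functions. The point is that nested enter/leave pairs only switch gen 2 allocation off again when the outermost caller leaves, and that overflow or underflow of the nesting count aborts.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-thread state; not MoarVM's real struct. */
typedef struct {
    unsigned int gen2_nesting; /* > 0 means "allocate directly in gen 2" */
} ToyThreadContext;

/* Enter a region that must allocate in gen 2; abort if the count would overflow. */
static void toy_gen2_enter(ToyThreadContext *tc) {
    if (tc->gen2_nesting == UINT_MAX)
        abort();
    tc->gen2_nesting++;
}

/* Leave such a region; an unbalanced leave aborts rather than wrapping around. */
static void toy_gen2_leave(ToyThreadContext *tc) {
    if (tc->gen2_nesting == 0)
        abort();
    tc->gen2_nesting--;
}

/* Gen 2 allocation stays on while any caller is still inside a region. */
static int toy_gen2_active(const ToyThreadContext *tc) {
    return tc->gen2_nesting > 0;
}
```

With a boolean, the inner leave would have switched the flag off while the outer caller still relied on it; with the counter, the flag is effectively off only once the count returns to zero.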
jnthn: the fromspace stuff is also hideously slow once you get to trying to do it for every allocation 10:04
and it doesn't have enough memory to be perfect
(as you found some that it missed)
but it would be several days to get to the point of building the setting
jnthn nwc10: We can't get to that point normally and then throw it at the setting only? 10:05
nwc10 I was thinking that
however, I think it would take quite a while to clean the local patches up
jnthn OK 10:06
I'll try and find it using...other means.
As a first resort.
timotimo the nintendo 3ds has 128mb ram, so it could just barely run an empty perl6-m (it's an arm9) 10:20
watching the directory where moarvm is outputting its spesh log using perl6-m is devastating. it gets so many change events that it takes ages to catch up with the actually current state 10:25
10:29 brrt joined
brrt hey, quick question 10:29
is the MIT license 'compatible' with the Artistic License?
oh, i see, it is 10:31
mit license is rather liberal
jnthn I think so, yes 10:32
tadzik I like it a lot. I tend to think of it as „do whatever you want, just don't pretend that you wrote it” 10:33
jnthn um, wtf 10:34
container_type_info doesn't even show up in my core.setting spesh log...
jnthn downloads the one from retupmoca 10:35
oh, duh 10:37
was looking at wrong output
timotimo know that feeling >_>
brrt is off for lunching 10:38
10:38 brrt left
jnthn invoke_o r56(13), r50(29) 10:43
[Annotation: INS Deopt One (idx 68 -> pc 7880)]
sp_guardcontconc r56(13), liti16(4), liti16(5)
That's the usage of the spesh slot 5
timotimo: Did you also get the SEGV? 10:51
timotimo i sometimes did 10:53
jnthn OK. I found a couple of potential issues
One of which could lead to reading junk memory 10:54
timotimo also, when an exception is thrown in a thread and caught by the default handler, it seems like the vm sometimes exits very uncleanly
may be known. 10:55
jnthn Default handler in the VM? Hmm 10:56
timotimo: gist.github.com/jnthn/19e017bca18c42a10c04 10:58
Gonna push that one after local testing, though, 'cus even if it doesn't fix the SEGV it's needed. 11:00
timotimo would you be okay with me introducing a sp_nonnull opcode so that i can optimize isstr/ishash/isnum/islist/... into an sp_nonnull instead of creating multiple bytecodes with extra registers? 11:02
is there a policy?
we only have 256 slots, right?
no, wait, we write a 16bit integer for the opcode 11:03
so we do have enough :)
jnthn What would sp_nonnull do?
timotimo the opposite of isnull 11:04
nwc10 jnthn: left it running on x86_64 - crash is non-deterministic
jnthn You can't emit a not_i after it?
timotimo no, because then i have to allocate extra registers :) 11:05
nwc10 this time it got through the setting, and gave a similar error in the next thing
timotimo remember we don't have a design for that yet? :)
dalek MoarVM/spesh_trace: bed1693 | jnthn++ | src/spesh/facts.c:
Be more careful over concreteness in facts.

Of note, we could try to look "inside" a container type's type object - meaning we looked past the end of it, leading to reading random memory.
11:06
jnthn timotimo: But...isnull puts its result in an int register?
timotimo aye
jnthn And you know you're the writer of that register 11:07
So if you immediately follow it with a negation...
timotimo i can just do it? what about versions of registers?
jnthn Ah...
timotimo don't i have to be careful about that?
jnthn You'll get away with it...at the moment.
I'm doing that trick with the invoke optimization at present
But yeah, it's...hmm
Well, we could have an isnonnull to go with isnull... 11:08
I don't see why it has to be a spesh op
timotimo say ... do you think we'll ever get a perl6-m that only uses ~50 megabytes or less of ram when starting up empty? without partial loading of the setting?
jnthn We have eq and ne for all the other things...
11:08 brrt joined
jnthn I'd like to hope so, but need to work out where the memory is going. 11:08
timotimo i wanted to make it a spesh because we don't have to emit it during normal bytecode building 11:09
yes, we do :\
jnthn I'm saying that it could be useful for emitting in normal bytecode building though
timotimo aye, it could
i don't feel like adding new bytecodes in two different places on two branches, so i'll wait for the spesh_log branch (more concretely: my spin-off-branch of it) to be merged 11:10
jnthn nwc10: bed1693 may fix the problems
timotimo: If you could also try bed1693 that'd be great
timotimo will do
reading junk memory could definitely explain the sporadicness 11:13
seems fine so far 11:14
nwc10 jnthn: was already testing. now at Stage optimize 11:15
timotimo seems like i can't get it to break any more. well done! 11:16
jnthn: i'm still convinced interning strings more aggressively would give us a ram usage benefit
nwc10 does the JVM have existing good tools that would help track down general memory usage? 11:17
timotimo the regular heap analyzer will tell us "you've got one billion instances of SixModelObject using up all the rams!" 11:18
i think hoelzro prototyped something that'd inspect the SMO for what class it represents
nwc10 that's boringly unhelpful
summon hoelzro
timotimo i cast "summon bigger fish"
jnthn nwc10, timotimo: yay that things seem better 11:19
jnthn wonders if that means "yes, we can merge spesh_trace" 11:20
Certainly there's more bits I want to do but I don't think they need a branch...
nwc10 I'm in spectest 11:21
jnthn k
nwc10 are you in a position to review my dancing bears?
at this point? or post noms
or post noms, walk, and other important things
jnthn Yeah, that's next on my hit list after getting spesh_trace merged 11:22
Just wanted to find/fix this SEGV first.
nwc10 totally understandable
jnthn And it looks like we have :)
nwc10 my fanclub is patiently waiting at my feet
she doesn't care that it's raining
spectest failed the usual S17
timotimo yeah, can't get it to asplode during compilation any more 11:24
so i suppose i'm +1 on merging :)
jnthn yays
timotimo will that also include my small const i64 branch? :3 11:26
jnthn Well, guess that wants re-testing on top of latest spesh to make sure it really was the bug I just fixed that was the source of the crashes seen... 11:27
nwc10 jnthn: there is something weird going on on sparc. Pre-processed source suggests that it defines an inline function 11:28
timotimo i can get on that testing :)
jnthn o.O
dalek Heuristic branch merge: pushed 40 commits to MoarVM by jnthn 11:29
nwc10 so, OK, why the missing symbol reference? What is trying to take its address, or whatever
timotimo wow, whoops: register operand index 34128 out of range 0..10 11:30
jnthn you don't say... 11:31
timotimo i think my little merge broke stage0 loading :)
jnthn ah
yeah
be careful you put your new ops at the end
timotimo hm, but i added my opcodes *after* execname
ah
execname forgot the goto NEXT :)
wrong 11:32
i did.
the merge made it wibbly wobbly :)
that wasn't it :|
jnthn OK, merged all my various branches 11:34
nwc10 drink!
(coffee machine still working) 11:35
(fret not - there is a backup coffee machine here too)
timotimo oh, lexopts is now in master? :D
jnthn yup :) 11:38
your CPU can read unaligned values for all of int32 int64 num64
Well, it gets that right :)
timotimo i don't understand why the bytecode verifier is complaining about my code :( 11:40
jnthn nwc10: Is "nmake src\spesh\graph.i" meant to be dumping to stdout? 11:41
nwc10 no
this might mean that the Perl 5 Win32 makefiles *are* buggy :-) 11:42
the *nix ones use '> ' as the output flag to capture it to a file
seems that Win32 needs the same trick
I'm surprised that no-one wanted those targets before
jnthn nmake src\spesh\graph.asm
Is quite a fail too it seems :( 11:43
nwc10 I'm already using them to try to figure out sparc
oh bother
jnthn It puts an object file into a file with .asm syntax
nwc10 *something* ought to work on win32
jnthn And then tries to link it...with nothing else.
nwc10 so wrong output flag?
I read something on the Internet and assumed that it was true :-(
jnthn oh... 11:44
Yeah
/Fo<file> name object file
*always* means object file
nwc10 OK
seems to mean "output" file on gcc 11:45
as it made it possible to generate src/core/interp.s not interp.s at the top level
jnthn /Fa[file] name assembly listing file 11:46
and
/Fi[file] name preprocessed file
look useful
jnthn tries
nwc10 /Fi looks more useful than redirect stdout
also, does /E work instead of -E ?
jnthn Unfortunately just adopting those alone doesn't help... 11:47
nwc10 bother. You don't *need* those two commits for the third one to work. But they are very useful to verify that nothing changed in the generated code in the move from macros to inline functions 11:48
jnthn Ah 11:50
brrt brrt-to-the-future.blogspot.nl/2014...eriod.html blawg post about dynasm :-)
jnthn For preprocessed and output to a file we want /P, not -E 11:51
So, I fixed that one :)
FROGGS ahh, something to read, nice :o)
brrt: "An example is included below"... should there be a code block in your blog? 11:54
brrt yes.... didn't work in blogger 11:55
:-(
nwc10 jnthn: at some point "my" machine is supposed to get a friend, running something Win32, with a build environment 11:58
planned to include VC6 :-)
jnthn *sigh* 12:09
So... You need "/c /FAs" to keep it from invoking the linker
That gets rid of the error eruption
BUT
We also pass /GL in the C flags
Which means "link time code generation"
Which has the side-effect of making it decide to do absolutely nothing about generating an assembly file. 12:10
nwc10 OK :-(
does /GL belong in the C flags? Should it be part of the codegen flags? (Sorry if I'm not quite using the correct terms) 12:11
P
Perl 5 messes things up. it conflates cpp flags and cc flags
but conceptually there really should be a clean distinction between pre-processor flags (.c to .i)
then .i to .s
then .s to .o
jnthn Well, this is optimization flags I guess
nwc10 and then whatever happens next to .o
bother. optimisation flags do belong at the .i to .s stage 12:12
jnthn Yeah 12:13
I'm not sure how to fix it
Anyway, I now have a state where it works if you know to remove /GL in the makefile... 12:14
jnthn can think of epic hacks, but... :)
jnthn goes rebase -i'ing to get his improvements into the patches, anyways... 12:19
12:21 brrt left
dalek MoarVM: 47536df | nicholas++ | build/ (2 files):
Add Makefile rules to generate pre-processed source.

Someone awkwardly, @objflags@ can contain -DMVM_BUILD_SHARED, which affects the pre-processed source, so for correctness that needs to be in the CPP command
MoarVM: ce2a286 | nicholas++ | src/ (4 files):
Ensure that MVMCompUnit can correctly free data_start.

This gets slightly complicated because the memory may have been allocated by malloc or by mmap, depending on the origin of the bytecode.
MoarVM: 433d41d | (Timo Paulssen)++ | / (6 files):
ops to for 32/16 bit 64 int literals and isnonnull
12:29
12:29 dalek joined
timotimo someone? :) 12:38
nwc10 oops, typo that neither of us spotted
jnthn haha 12:39
Totally missed that
FROGGS is there missing word?
jnthn I think s/Someone/Somewhat/ :)
timotimo i think it ought to be "somewhat"
jnthn OK, onto dancing bear #2... 12:40
timotimo jnthn: do you have an idea for how to model things like "if we're in BB n, this fact about this register is true"? like when we branch into a block because something is null, for example?
jnthn timotimo: There's not really a good way to do that in the current model 12:41
timotimo OK
jnthn: i've redone a patch to introduce the small int literal ops, it still builds, now i'm going to try to figure out why the F adding the code to the compiler itself breaks 12:45
may i push the change that introduces the ops already? and add isnonnull, too?
jnthn It doesn't break the build of anything? 12:46
timotimo doesn't seem so.
it's not used anywhere :)
meh. no rakudo to run update-ops now :) 12:47
nwc10: i'll want to mention your portability work in the weekly tomorrow; you've done powerpc and arm7 and arm6? something else? 12:48
and these are limited to linux, aye?
nwc10 sparc thing seems to be structural confusion in libatomicops - if AO_HAVE_compare_and_swap exists, it refuses to do the emulation code for AO_HAVE_fetch_compare_and_swap 12:49
timotimo: not done arm7. would be useful for someone to test on ARMv7, and on ARMv5
ARMv7 should either build, or be a 3 line fix
timotimo right; i only have a raspberry pi myself, and a nintendo DS with a homebrew cartridge
that'll hardly have enough ram to run moar :)
nwc10 DS isn't interesting, as it seems to also be ARMv6 12:50
oh wow, not, it's ARMv4 12:51
Looks like it has less grunt than a 1998 RiscPC 12:52
and that's impressive
timotimo it has two processors
one for sound "only"
nwc10 wikipedia says one is ARM9, which now I've checked further is ARMv4 architecture
I'd forgotten this
timotimo CPU: One 67.028 MHz ARM946E-S and one 33.514 MHz ARM7TDMI
nwc10 other, ARM7, is v3 12:53
timotimo ah!
i didn't realize there's a difference between with and without v
nwc10 oh, quite a bit :-)
which I think is why ARM11 is the last CPU with a number
after that it's only architectures that keep numbers
timotimo 4mb of ram means *nope*, though
nwc10 CPUs get brand names
timotimo: I'm failing to get sparc linux to work out 12:54
timotimo the inline thingie you mentioned above?
nwc10 it's not inline, I was getting confused
timotimo ah, ok
but something is giving you trouble. noted :)
nwc10 between AO_fetch_compare_and_swap and AO_compare_and_swap 12:55
or something like that
anyway,
timotimo so maybe it's trouble in libatomic_ops?
nwc10 done ARMv6, would welcome testers with ARMv7 and ARMv5
timotimo or what it's called.
nwc10 it is trouble in it
12:49 < nwc10> sparc thing seems to be structural confusion in libatomicops - if AO_HAVE_compare_and_swap exists, it refuses to do the emulation code for AO_HAVE_fetch_compare_and_swap
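The two primitives being confused here differ only in what they report back. As a rough sketch, using C11 `<stdatomic.h>` rather than libatomic_ops itself, the "fetch" flavour can be built from an ordinary compare-exchange. The `toy_*` names are hypothetical and this is not the emulation code libatomic_ops ships; it only illustrates why a library that has one form could in principle supply the other.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Boolean form: reports only whether the swap happened. */
static int toy_compare_and_swap(atomic_size_t *addr, size_t old_val, size_t new_val) {
    size_t expected = old_val;
    return atomic_compare_exchange_strong(addr, &expected, new_val);
}

/* Fetch form: returns the value that was stored at *addr before the call.
 * On success that is old_val; on failure the compare-exchange writes the
 * value it actually observed into `expected`, so returning it covers both. */
static size_t toy_fetch_compare_and_swap(atomic_size_t *addr, size_t old_val, size_t new_val) {
    size_t expected = old_val;
    atomic_compare_exchange_strong(addr, &expected, new_val);
    return expected;
}
```

A caller of the fetch form detects success by comparing the returned value with `old_val`.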
OK, yes, to fully answer your question *properly* 12:56
ARMv6 linux, PPC linux (but fails a nativecall test), failing to get sparc linux, have no other access
in particular, it's likely that someone has access to a MIPS linux setup with native toolchain
er, better expressed
ARM linux (only ARMv6 tested so far), PPC linux (only ppc64 tested so far, and fails a nativecall test), failing to get sparc linux, have no other access 12:57
but, we are working on big endian now
timotimo jnthn: my commit to introduce isnonnull and the const_i64_* opcodes are not breaking rakudo build so far
so i'm free to push?
nwc10 and on things with more alignment constraints than x86
jnthn timotimo: Yes, think that should be OK
timotimo thanks
13:00 dalek joined
jnthn nwc10++ # porting 13:02
timotimo ah, with the changes applied afresh, it doesn't break the build any more to have these ops emitted by the mast compiler 13:03
weird, but i'll accept it.
jnthn finally goes to read the brrt++ blog post :) 13:04
dalek MoarVM: eecf142 | (Timo Paulssen)++ | src/mast/compiler.c:
use smaller const_i64_* ops sometimes.
13:05
MoarVM: c3c427f | (Timo Paulssen)++ | src/spesh/facts.c:
introduce the const_i64_* ops to spesh.

also, switch/case is a bit nicer to look at.
13:07
jnthn + /* Now we'll do a terrible thing */ 13:09
really? :P
timotimo oh! 13:10
well, the code does look terrible, doesn't it?
jnthn It's not so bad :)
That switch/case is certainly an improvement. 13:11
I think it may want to do an ISTYPE on the arg though, before blindly assuming it's a MAST::IVal 13:13
(It always should be, of course...)
timotimo i'll have an opt for isstr/... soon, too
if the result would be 0 due to type mismatch, it turns into a const + known value, otherwise it turns out into an isnonnull 13:14
dalek MoarVM: a6205ef | (Timo Paulssen)++ | src/spesh/optimize.c:
try to optimize islist/isnum/... (not helpful yet)

the problem is that islist and friends also include a null check on their argument. At best, we could - if we only know the type - turn these ops into a negated "isnull" op, but that doesn't exist. So we have to wait for spesh's ability to allocate new registers and add new operations on those.
13:18
MoarVM: 59053a4 | (Timo Paulssen)++ | src/spesh/optimize.c:
Merge remote-tracking branch 'origin/optimize_isreprid'

Conflicts: src/spesh/optimize.c
MoarVM: f34772b | (Timo Paulssen)++ | tools/spesh_diff.p6:
teach spesh_diff.p6 about the new output of spesh dump
timotimo ^ no build breakage detected
afk
FROGGS nwc10++ # patches 13:24
jnthn bbiab 13:34
dalek MoarVM: 36366c0 | (Timo Paulssen)++ | tools/spesh_diff.p6:
teach spesh_diff about Facts, put most common pattern first.
14:03
14:05 zakharyas joined
timotimo given the massive amount of sp_log instructions emitted, it's hard to find actual things that have been optimized 14:11
jnthn timotimo: Well, they don't appear in the final output. 14:14
timotimo oh 14:15
i didn't know that
jnthn There's 3 bits of output now 14:16
before, after inserting logging, and later on in the file the completed specialization.
timotimo oh! 14:17
so that's what was weird about the output in between
so should i ignore stuff starting with "inserting logging for"?
ah, yes, i see "finished specialization of" 14:18
and i've been skipping that
jnthn Yes
Now we
* Hit the invocation count limit 14:19
* Insert logging instructions and produce code
* Wait for N runs (4 at present)
* Use the logged info as part of generating specialized bytecode
* Install that for future runs
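The steps listed above amount to a small per-frame state machine. The sketch below is a toy model only: the type and function names are invented, the invocation threshold of 10 is an assumed placeholder, and only the "4 logged runs" figure comes from the conversation. The real implementation lives in MoarVM's spesh code and will differ.

```c
/* Phases a piece of code moves through in this toy model. */
typedef enum { TOY_INTERPRETED, TOY_LOGGING, TOY_SPECIALIZED } ToyPhase;

typedef struct {
    ToyPhase phase;
    unsigned invocations;  /* calls seen while still plain bytecode */
    unsigned logged_runs;  /* calls seen while logging instructions are present */
} ToyFrame;

#define TOY_INVOCATION_LIMIT 10u /* assumed threshold, not MoarVM's real value */
#define TOY_LOG_RUNS 4u          /* "4 at present", per the log */

/* Record one invocation and return the phase the frame is in afterwards. */
static ToyPhase toy_on_invoke(ToyFrame *f) {
    switch (f->phase) {
    case TOY_INTERPRETED:
        /* Hitting the limit is where logging instructions would be inserted. */
        if (++f->invocations >= TOY_INVOCATION_LIMIT)
            f->phase = TOY_LOGGING;
        break;
    case TOY_LOGGING:
        /* After enough logged runs, specialized bytecode would be generated
         * from the recordings and installed for future runs. */
        if (++f->logged_runs >= TOY_LOG_RUNS)
            f->phase = TOY_SPECIALIZED;
        break;
    case TOY_SPECIALIZED:
        break;
    }
    return f->phase;
}
```

This also matches the three chunks of output in the spesh log: the code before logging, the code with logging inserted, and the finished specialization.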
timotimo aaah 14:20
trying a patch to spesh_diff now
something's wrong. 14:22
the tool now seems to be skipping everything 14:23
oooh snap 14:27
it's completely changed now
there's no longer a Before: and After:
jnthn Well, the before/after correspond to before, then after we added the logging
timotimo instead, there is Before:, After: and "only something" in the "finished" thing
aye, i need to hold on to the cuids and merge the stuff properly :\
jnthn Yeah, it's a bit more work to correlate 14:28
timotimo yes, and now having multiple spesh outputs for the same cuid is no longer easy to tell apart :\
well, except if they are spaced apart completely
JimmyZ Error while compiling op ifnull (source text: "nqp::ifnull($!nominal, 0)"): P6opaque: no such attribute '$!result_reg' 14:29
I got an error when building nqp ....
timotimo did you get back to master on moarvm and nqp?
JimmyZ yes
jnthn timotimo: uh... 14:30
Bytecode validation error at offset 0, instruction 0:
invalid extension opcode 40728 - should be less than 1024
FROGGS JimmyZ: did you make clean?
jnthn JimmyZ: That error means you don't have latest NQP
timotimo jnthn: did i cause that? huh?
jnthn Or didn't make clean or something
nwc10 invalid extension opcode 16864 - should be less than 1024
timotimo but locally i can build it just fine :\
JimmyZ argh
timotimo crappity crap, i guess
JimmyZ I missed updated bootstrap 14:31
jnthn timotimo: uh 14:34
timotimo: unsigned char override_second_argument;
That will have a junk value
timotimo oh! why did i forget that 14:35
feel free to commit that for me
jnthn for any op
other than iconst_64
timotimo aye
i wonder how i got away with that 14:36
only locally, of course 14:37
dalek MoarVM: bac3210 | jnthn++ | src/mast/compiler.c:
Avoid using an uninitialized variable.
14:38
jnthn That helps it here 14:41
timotimo phew 14:42
what is blocking us from getting a filename and line number for warnings such as "use of uninitialized value of type *" on moarvm? 14:51
nwc10 jnthn: that gets me past it too. ASAN doesn't spot those 14:52
14:53 brrt joined
jnthn timotimo: Probably nothing in MoarVM itself... 14:57
timotimo: Suspect something needs tweaking in Rakudo.
nwc10: too bad..
FROGGS jnthn: can I split up a bigger Q:PIR {} into two smaller ones to inject another pir block conditionally? 14:58
eww, wrong channel to ask that basically :o) 14:59
nwc10 jnthn: would need to run with valgrind to spot those, which is much slower
jnthn FROGGS: Probably 15:02
timotimo i may have something workable for spesh_diff now 15:05
jnthn: is there a way to remove guards if we're not using the facts we've obtained through them? 15:07
i.e. a guard on something that's dead perhaps? 15:08
hmm. sounds tricky
dalek MoarVM: 6966670 | (Timo Paulssen)++ | tools/spesh_diff.p6:
the layout of spesh logs has changed dramatically.
15:09
timotimo + set r0(2), r0(1) - for real? :)
this is even no-oppyer than other sets that have resulted from decont :D
gist.github.com/timo/2e1f2f559931bc70f7ea - how come there's two spesh ops writing to the same register + version there? 15:11
r2(4) in both cases
jnthn That's the invocant opt I mentioned before now 15:21
timotimo oh, ok
do we have something that counts how often guards fail?
jnthn Not yet
Don't have "did we even use this guard" tracking yet either. 15:22
timotimo right.
jnthn Spesh has many wish-list items 15:25
JimmyZ My friend asks where the MoarVM donate URL is ... 15:35
timotimo jnthn: some simple things i could pick off of your wish list? 15:39
jnthn timotimo: can and can_s still could use a look 15:47
timotimo: Also unbox_i/unbox_n/unbox_s On P6int/P6num/P6str may be an easy thing (part of repr spesh)
timotimo which repr opt should i use as inspiration for the unbox stuff? 15:49
jnthn There's not a good place to look for get-y stuff yet I think
I mean, look for spesh in src/6model/reprs/
But I think bindattr opt in P6opaque may be closest so far 15:50
timotimo would i generate a sp_fastunbox_i/s/n that takes an offset from the object base pointer to the contained integer/num/string object? or something like that? 15:52
jnthn No
There's much more general spesh ops for it
Already in the oplist
sp_get_i .s w(int64) r(obj) int16 :pure 15:53
sp_get_n .s w(num64) r(obj) int16 :pure
sp_get_s .s w(str) r(obj) int16 :pure
timotimo oh, i see
jnthn The int16 is a byte offset from the start of the object.
timotimo fair enough; i can just direct that at the correct slot index thingie?
ah, byte offset
gotcha
jnthn Not sure if they're implemented in interp.c yet, but should be easy
And yeah, it's byte offset.
timotimo they are not yet implemented 15:54
jnthn They are designed so that when brrt++ JITs them, it's a gonna be really cheap :D
timotimo i'll try my best :)
and guards should become much cheaper when JIT is in; now they do a trip through the interpreter loop, for example
jnthn Right
I think they'll just want to look like 15:55
GET_REG(cur_op, o).i64 = *((MVMint64 *)((char *)GET_REG(cur_op, 2) + GET_UI16(cur_op, 4)));
bbiab 15:56
timotimo ah, thanks
i thought i'd have to do something with REAL_DATA
jnthn no, that's just for p6opaque 15:57
& 15:58
timotimo i think the 2 and 4 there want to be 4 and 6 16:00
hm.
nope
registers are only 2 bytes
jnthn right. 16:38
though GET_REG(cur_op, o).i64 shoulda been GET_REG(cur_op, 0).i64 16:39
timotimo yes 16:46
i fixed that
i get pointer mismatch problems
after trying it, i ate with friends and drove to the hackspace
jnthn lizmat: To give you an idea of the S17 bottleneck for me, I'm at 160s or so when the first S17 test starts, 315s or so when the last one ends. 16:47
lizmat :-(
jnthn uh, meant that in #perl6
Anyway, now you can understand a bit why I'm curious about trying to make tests shorter or parallelize better or something. 16:48
timotimo a missing & was to blame for my troubles 17:07
dalek MoarVM: a2afdb9 | (Timo Paulssen)++ | src/core/interp.c:
implement sp_get_i/n/s
17:08
17:09 btyler joined
jnthn (char *)&GET_REG(cur_op, 2) 17:10
I think more robust would be
(char *)(GET_REG(cur_op, 2).o) 17:11
timotimo ah, sounds good
jnthn The other way might run us into trouble on 32-bit BE machines...
hm, maybe not
But still...something makes me think it'll be fragile 17:12
timotimo mhh
jnthn Otherwise look fine
timotimo how should i best write down the offset into the code? for p6int, p6num, p6str it's basically the size of the header, right?
so sizeof(MVMObject)?
jnthn offsetof 17:13
timotimo wow, that exists?
jnthn C built-in. 17:14
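To make the byte-offset idea concrete: `offsetof` (a standard macro from `<stddef.h>`) gives the distance from the start of a struct to one of its members, and a "get" operation can then read the value at that distance from the object pointer. The sketch below uses made-up types (`ToyHeader`, `ToyBoxedInt`) and a made-up helper, not MoarVM's `MVMObject` or its interpreter macros. It also copies with `memcpy` instead of dereferencing a cast pointer, which is a deliberate substitution to stay safe on platforms with strict alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object layout: a common header followed by the boxed value. */
typedef struct {
    void    *st;
    uint32_t flags;
} ToyHeader;

typedef struct {
    ToyHeader header;
    int64_t   value;
} ToyBoxedInt;

/* Read a 64-bit integer located byte_offset bytes past the start of obj. */
static int64_t toy_get_i(const void *obj, uint16_t byte_offset) {
    int64_t result;
    memcpy(&result, (const char *)obj + byte_offset, sizeof result);
    return result;
}
```

The offset for the boxed value would be computed once, at specialization time, as `offsetof(ToyBoxedInt, value)`. That can be larger than `sizeof(ToyHeader)` if the compiler inserts padding, which is why `offsetof` is the safer choice over `sizeof` of the header.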
timotimo nice!
gist.github.com/timo/52be9e43e985845f7201 - is this the correct approach? 17:24
i don't see any sp_get_i being run 17:25
during nqp build
should probably also have to limit that to only 64bit integers 17:27
signed
jnthn yes 17:28
Bit surprised we don't hit it...
timotimo all i get for it is getreprname 17:29
tadzik you are 8 bit surprised?
(sorry)
timotimo which i *could* optimize into a const_s :)
nwc10 Program received signal SIGBUS, Bus error. 17:35
(gdb) p &tgt_facts->value
$1 = (union {...} *) 0x14056f2
why are they all misaligned? :-(
timotimo jnthn: if i am to specialize reprname, where should i look for the literal index for the right string?
i suppose i'll build a few speshes for p6opaque 17:40
tadzik today's beer: March Smokes
or something
Rauchmarzen, actually
nwc10 thinks that the problem is that MVM_spesh_alloc() is a very naughty piece of code. 17:41
timotimo isn't it just a bump-the-pointer thingie? 17:46
nwc10 it would seem to be. and I infer that some of the things that it is asked to allocate are 2 * $n bytes long
so the pointers returned after that are not suitably aligned for some of the other things that it is asked to allocate 17:47
Sadly the PPC build also explodes. Not sure why 17:48
jnthn nwc10: spesh alloc exists so we can trivially throw away all the nodes in one go at the end, easing memory management of the thing.
nwc10 2017 MVMObject *obj = REPR(type)->allocate(tc, STABLE(type));
type is NULL
jnthn timotimo: reprname really isn't hot enough to be worth it
nwc10 jnthn: I think that it needs to take an "alignment" parameter, and on platforms where alignment matters, bump the pointer until alignment is reached
timotimo jnthn: it seems like the p6opaque spesh function checks that the type_reg has a known type and the type is non-null 17:49
jnthn nwc10: If it were to always make sure it was on an 8-byte boundary, would that do it?
timotimo but optimize_repr_op also checks that and only then will it even call spesh on the repr
nwc10 jnthn: yes, that would do it, but use more RAM
that is the KISS solution
I think you can also do that alignment conditionally on MVM_CAN_UNALIGNED_NUM64 and MVM_CAN_UNALIGNED_INT64 17:50
(they are defined in the inverse sense to be easy here)
whilst MVM_CAN_UNALIGNED_INT32 exists, we aren't actually using it yet. Which is sort of misinformation
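The KISS fix discussed here, always rounding up to an 8-byte boundary, looks roughly like this for a bump-the-pointer allocator. `ToyArena` and `toy_arena_alloc` are illustrative names with a fixed-size buffer; this is not the real `MVM_spesh_alloc`, which manages its own blocks.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ALIGN ((size_t)8)

typedef struct {
    _Alignas(8) unsigned char buf[256]; /* base of the arena is 8-byte aligned */
    size_t used;                        /* bytes handed out so far */
} ToyArena;

/* Bump allocator that rounds each allocation's start up to a multiple of 8,
 * so a 6-byte request cannot leave the next pointer misaligned. */
static void *toy_arena_alloc(ToyArena *a, size_t bytes) {
    size_t start = (a->used + (TOY_ALIGN - 1)) & ~(TOY_ALIGN - 1);
    if (start > sizeof a->buf || bytes > sizeof a->buf - start)
        return NULL; /* out of space in this toy arena */
    a->used = start + bytes;
    return a->buf + start;
}
```

The cost is the padding between allocations, which is the extra RAM mentioned above. Making the rounding conditional on the platform's unaligned-access capability would avoid that cost where misaligned reads are harmless.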
timotimo wtf. why does reprname even get the spesh function called?! 17:52
jnthn uh, yeah...that is odd 17:53
timotimo haha 17:54
silly me
i typo'd in the search box for the ops
it actually *is* get_i
it gets called quite often for box_i, but not unbox_i it seems 17:55
unfortunately, box_i isn't as easy to speshify 18:00
maybe an op like fastcreate + an offset to copy the *thing* to? 18:01
jnthn timotimo: That's my idea for box_i, yeah 18:12
timotimo: It turns one op into two, but they're two that can JIT nicely 18:13
timotimo an idea why we're hitting box_i often but never unbox_i?
optimize.c does hit unbox_i a few times, maybe the type is never known? 18:15
jnthn Well, partly because we always know the type in box_i
timotimo yes, we do
i'll stash the box_i stuff away for now and maybe look for something else to do 18:21
like can_s
oh, objprimspec seems speshable, no? 18:23
hm, but probably not hot 18:24
oh dang, now i'm getting All positional args must appear first 18:59
my can is probably wrong then :)
nwc10 jnthn: is the memory allocated by MVM_spesh_alloc transient? ie does it get thrown away (en masse) fairly soon after it's allocated? 19:04
jnthn nwc10: Typically yes 19:05
nwc10: Code has to get hot enough to spesh
nwc10: Once it does we allocate the graph and insert logging instructions
dalek MoarVM: facf41a | (Timo Paulssen)++ | src/6model/6model. (2 files):
can_method_cache_only function for spesh purposes
MoarVM: 8bce6b2 | (Timo Paulssen)++ | src/spesh/facts. (2 files):
harvest strings in facts discovery process
jnthn nwc10: Then a few runs later, we examine the recordings, make optimized bytecode, and throw the graph away (freeing the spesh_alloc'd memory) 19:06
timotimo gist.github.com/timo/716656be19b892dda8ab - jnthn, can you tell what's wrong with this?
jnthn you did it wrong! 19:11
jnthn looks :)
ins->operands[0].lit_i64 = can_result; 19:13
1?
timotimo but that's supposed to be the result?
jnthn 0 is the result reg
1 is the constant to put in there
timotimo oh!
durrrr :)
i should be setting that on the facts instead
lizmat
.oO( an off by one error )
timotimo thanks
jnthn np
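The off-by-one above comes down to operand layout: slot 0 names the register that receives the result, and slot 1 carries the literal. A toy version of rewriting a `can` into a constant load might look like the following. The `Toy*` types are invented for illustration and only loosely mirror the `ins->operands[n].lit_i64` shape quoted in the log; they are not MoarVM's real spesh structures.

```c
#include <stdint.h>

typedef enum { TOY_OP_CAN, TOY_OP_CONST_I64 } ToyOpcode;

/* An operand is either a register index or a literal, depending on the op. */
typedef union {
    uint16_t reg;
    int64_t  lit_i64;
} ToyOperand;

typedef struct {
    ToyOpcode  op;
    ToyOperand operands[3];
} ToyIns;

/* Replace "can dest, obj, name" with "const_i64 dest, can_result". */
static void toy_fold_can(ToyIns *ins, int64_t can_result) {
    ins->op = TOY_OP_CONST_I64;
    /* operands[0] is left alone: it still names the destination register. */
    ins->operands[1].lit_i64 = can_result;
}
```

Writing the literal into slot 0 instead would overwrite the destination register index, which fits the kind of breakage reported just before the fix.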
timotimo yes, that does work much better :) 19:15
i think we can probably emit can instead of can_s in a bunch of cases
we probably have many const_s + can_s
which could be can instead
now doing a spesh log with the setting >:) 19:21
jnthn 1.5 million lines coming your way! :P 19:25
timotimo :)
4149871 ../rakudo/test.txt
i should probably not run that through a perl6 script to pick it apart %) 19:26
r-m needs to get faster >_>
nwc10 this is a "faster" bootstrapping problem? :-)
timotimo heh.
nah, looking at that is optional
the script got a whole bunch faster when i put the when clause for the most common type of line first instead of last 19:27
jnthn puts aside vacation plotting for a bit to see if he can do the rest of this SC quadratic elimination work... 19:35
nwc10 vacation plotting is some sort of Computer Science thing? Or where you surprise your friends by randomly visiting? 19:36
and killing the quadratic would be awesome
jnthn Neither. Plotting to take myself to some faraway place with ice or lava or other awesome :) 19:37
timotimo my time estimate is 10 minutes for those lines 19:39
hm. the eta seems to be decreasing too slowly 19:41
Failed to open pipe: 12 20:00
nnnnooooooooo :(
and the estimate was off by about 2x 20:01
well, at least i already wrote out the files :) 20:03
oh, it didn't 20:04
what a surprise >_<
this script ought to learn to be more robust.
want to have spurtasync for moarvm please :) 20:06
(no rush) 20:14
nwc10 jnthn: paste.scsys.co.uk/371471 -- running with this on the Pi and so far so good (NQP passes all tests) 20:21
jnthn nice :) 20:22
nwc10 jnthn: paste.scsys.co.uk/371476 -- 32074aa0566e breaks ppc64 -- First pass at turning some logs into fact+guard. 20:24
big backtrace, starting This representation (VMArray) does not support attribute storage 20:25
dalek MoarVM: 1a224d3 | (Timo Paulssen)++ | src/spesh/optimize.c:
spesh can and can_s ops into const_i64
20:26
MoarVM: 516207f | (Timo Paulssen)++ | tools/spesh_diff.p6:
estimate run time of spesh_diff
MoarVM: 8efcab4 | (Timo Paulssen)++ | tools/spesh_diff.p6:
write out results ASAP.
nwc10 jnthn: I'm having trouble working out why 20:27
jnthn nwc10: Also, that commit was done in a branch that broke stuff along the way
timotimo jnthn: are our parsing bytecode stuffs benefitting at all from spesh so far?
nwc10 jnthn: OK.
that does make it hard to bisect
jnthn nwc10: That is, not all commits worked well here either.
Yeah...such is bisect and branches. 20:28
nwc10 OK, so at bed169375f087dfb11939110fa18921536b1a2a7 -- Be more careful over concreteness in facts. 20:31
which I think was master at one point
PPC64 is bust
P6opaque: no such attribute '$!ast'
spesh writes bytecode, which is then passed to the validator? 20:32
jnthn No
spesh bytecode is assumed valid
But it doesn't endian-transform as it writes.
So the bytecode itself should be native endian.
nwc10 OK, something else weird then
jnthn Darn 20:33
nwc10 you could say that
timotimo Failed to open pipe: 12
in sub QX at src/gen/m-CORE.setting:779
wow, that is helpful.
jnthn Also darn here. Got opt in, but it generates a busted setting :/ 20:34
timotimo - can_s r8(1), r6(1), r7(1) 20:42
- unless_i r8(1), BB(3)
- Successors: 3, 2
+ const_i64 r8(1), liti64(1)
+ Successors: 2
yays :)
it doesn't seem like i've decremented the usages of the constant when i turn the if into an unconditional jump 20:43
jnthn Was gonna say, why ain't it just gone...
Also, did it not remove the thing setting r7(1) to a const string? 20:44
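[editor's sketch] The point about usages can be illustrated with a small model: when a conditional branch on a known constant is folded away, the condition register loses a reader, and once its usage count reaches zero the instruction that wrote it is dead as well. This is a sketch under assumed names (Facts, fold_unless_i, writer_is_dead), not the code in optimize.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the optimizer knows about one register version. */
typedef struct {
    bool    known;   /* is the value known at specialization time? */
    int64_t value;   /* the known value, if any */
    int     usages;  /* number of instructions that read this register */
} Facts;

typedef enum { BRANCH_KEEP, BRANCH_ALWAYS, BRANCH_NEVER } BranchFold;

/* unless_i jumps when its condition is zero. If the condition is known,
 * the branch is decided statically and no longer reads the register,
 * so the usage count must go down by one. */
BranchFold fold_unless_i(Facts *cond) {
    if (!cond->known)
        return BRANCH_KEEP;
    assert(cond->usages > 0);
    cond->usages--;
    return cond->value == 0 ? BRANCH_ALWAYS : BRANCH_NEVER;
}

/* With no readers left, the instruction that set the register can go. */
bool writer_is_dead(const Facts *f) {
    return f->usages == 0;
}
```

In the diff above, r8(1) is known to be 1, so the unless_i is never taken and only successor 2 remains; if the count is decremented at that point, the const_i64 becomes removable too.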
OK, it seems it's a minor part of my opt that is bad. 20:45
The overall thing seems to help.
Gets rid of much of the quadratic. 20:46
Stage mast goes from ~20s to ~13s here
lizmat wow!
jnthn Clean build of NQP came out OK 20:48
Rakudo in progress
Then a spectest
In theory I can get us a bit more too
But that'll take more debuggering the bits of my patch that cause some kind of explosion...
Anyway, 10s off NQP build and 8s or so (parse/optimize are a little faster too) this weekend, it seems :) 20:49
dalek MoarVM: bc1677d | jnthn++ | src/ (9 files):
Change the way we store SCs in object headers.

We used to store a pointer directly to the SC. Now we store a pair of 32-bit integers: an index into a new array where we can look up the SC, and the index the object lives at in the SC, if known. This second index is not being set up everywhere consistently yet; this patch does what's needed to switch over to the new header layout and be able to build NQP/Rakudo.
20:57
MoarVM: 375f5d7 | jnthn++ | src/6model/sc.c:
Use SC index from object header when available.

Falls back to the linear search for cases that don't (yet) have the index stashed. However, it is in the common cases, which makes the CORE.setting compilation complete in 90% of the time it used to.
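[editor's sketch] The two commit messages above describe a header holding a pair of 32-bit indices, with a linear search as the fallback when the second index is not set. A minimal model of that lookup might look as follows; the Header and SC layouts, the NO_IDX sentinel, and sc_find_object_idx are assumptions made for illustration and do not reproduce the code in src/6model/sc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_IDX UINT32_MAX  /* hypothetical "index not stashed yet" marker */

/* Object header: which SC the object belongs to, and where it sits in it. */
typedef struct {
    uint32_t sc_idx;     /* index into a global table of SCs */
    uint32_t idx_in_sc;  /* slot inside that SC, or NO_IDX if unknown */
} Header;

/* A serialization context, reduced to its list of objects. */
typedef struct {
    Header  **objects;
    uint32_t  num_objects;
} SC;

/* Return the slot of obj in sc, or -1 if it is not there.
 * Fast path: trust the stashed index when it checks out (constant time).
 * Slow path: scan the whole list (linear time per lookup, which becomes
 * quadratic overall when every object is looked up this way). */
int64_t sc_find_object_idx(const SC *sc, const Header *obj) {
    assert(sc != NULL && obj != NULL);
    if (obj->idx_in_sc != NO_IDX
            && obj->idx_in_sc < sc->num_objects
            && sc->objects[obj->idx_in_sc] == obj)
        return obj->idx_in_sc;
    for (uint32_t i = 0; i < sc->num_objects; i++)
        if (sc->objects[i] == obj)
            return i;
    return -1;
}
```

Objects whose header already carries the slot skip the scan entirely, which is where the saving reported for the common cases would come from.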
timotimo aaaw only 10% improvement? :( 21:00
does that improvement only happen in stage mast?
or is it spread all over?
jnthn 10% improvement to the entire setting compilation
Stage mast is just one part of it
There it's working in 65% of the time it used to 21:01
timotimo that's definitely very good! 21:02
well, the biggest win is that the quadratic factor is now ... linear?
jnthn Right
timotimo so as we implement more stuff, the win will become bigger
jnthn Now I need to pick through my patches that didn't make the cut and work out which of the small tweaks to avoid linear search in a few more places bust stuff 21:03
timotimo :)
here's a const_s that's not disappearing properly :( 21:04
haha, i'm silly 21:08
dalek MoarVM: 1ec728b | (Timo Paulssen)++ | src/spesh/optimize.c:
no need to keep the flag alive when removing iffy
21:13
MoarVM: 5689569 | (Timo Paulssen)++ | src/spesh/optimize.c:
check opcode before losing it in optimize_can_op
MoarVM: 4785424 | jnthn++ | src/6model/serialization.c:
Mark objects/stables with index in deserialize.
21:14
timotimo scrolling through the spesh stuff i see a few funny constructions with sets cascading and stuff 21:18
but that's probably not a very big win to work with that
jnthn Yeah, in the C profile, serialize is way down now 21:22
timotimo awesome :) 21:23
removing even a single BB early on in a frame that has many blocks causes the diff to become huge and noisy :( 21:27
21:30 raiph joined
jnthn suspects that, with some care, we can optimize many "for" loops in NQP to avoid invocations. 21:30
Which'd probably also cut down on a lot of the takeclosure operations 21:31
timotimo oooh
jnthn GC in CORE.setting compilation is 20%, which is a bit higher than the typical profile I see.
timotimo at what level will that optimization reside?
jnthn NQP's optimizer 21:32
We already do most of the analysis needed for it, I *think*.
timotimo that would probably be pretty awesome
jnthn It's a bit messy to implement, but probably not too bad.
Also will make the iterator object open to escape analysis, when we eventually have it... 21:33
timotimo it'll probably put quite a bit of distance between rakudo and nqp in the benchmarks again ;)
jnthn Aye, though Rakudo got a bit better this weekend also... :) 21:34
15%-20% off the array store benchmark, for example
timotimo concrete and null are orthogonal, aye? 21:35
were you able to figure out why the spectests suffered so heavily? just a case of not doing the same thing often enough for the spesh to pay off?
i think i'll start a benchmark run 21:39
jnthn: anything in particular i could try to look at for making our parsing stuff faster through spesh? 21:41
jnthn Well, getattribute in p6opaque, but it's really subtle and fiddly because of autoviv stuff 21:47
timotimo mhh 21:48
jnthn Yes, a benchmark run would be good.
timotimo already in the timing phase :)
ah, i probably ought to bench nqp, too 21:53
jnthn :) 21:56
Gotta teach tomorrow, so probably shouldn't stay up too late :)
'night
timotimo aaw
so long and thanks for all the spesh :) 21:57
lizmat gnight jnthn 21:59
timotimo doing the benchmarks for nqp-m now 22:10
oh. actually, i'm still using --optimize=1 for the benchmark'd moars
Internal error: inconsistent bind result 22:32
without spesh it doesn't occur
this is in the forest fire benchmark
also the forest fire results don't seem right 22:33
it's wrong in perl6-p, too, though
hoelzro alright, I've finally answered the summons 23:01
unfortunately, I never made a lot of headway on that SMO thing =/