samcv now let me see how many spectests this breaks :) 00:00
jnthn samcv: (most codepoints) sounds fine 00:06
timotimo: No, I'm a bit confused by how that'd happen
samcv ok cool no spectests failed
jnthn timotimo: Will have to look, but it's decidedly bed time for me :) 00:07
timotimo well, it doesn't look like we ever root (or even mark) a ThreadContext 00:13
samcv cool. i'm adding back the power to configure primary/secondary/tertiary level sort :) 00:18
though it's going to have to work slightly differently. though it may give the same result. because all primary levels are higher than all secondary levels and all secondary are higher than all tertiary levels 00:19
it will only apply your settings if comparing between the same level. otherwise it will compare normally
so if you reverse primary level, your change only takes effect when comparing primary level vs primary level
and... it works! wow 00:23
wait why does MVM_string_codes return s->body.num_graphs and MVM_string_graphs also returns body.num_graphs? 00:49
timotimo that seems wrong 01:19
but we might not have that function used anywhere?
01:52 ilbot3 joined
samcv hmm 01:52
well there's nqp::codes_s but it's not used in nqp 01:53
what's the nqp op to get the number of codepoints?
or the MVM function?
o-oh it does self.NFC.codes 01:54
in rakudo
could that be slow though?
so MVM_string_codes is totally bogus 01:56
timotimo it could be slow, yes
it allocates, for one 01:57
samcv we don't just keep track of the number of codepoints?
that shouldn't be that hard to do i would think 01:58
timotimo how often do we need that info? 02:02
samcv idk
but it is an op. though it's not added to nqp 02:03
seems bad to have it be totally wrong
timotimo right, it is 02:12
can probably replace it with a NYI exception 02:13
bedtime for me 02:14
long day ahead of me
o/
04:15 deep-book-gk_ joined 04:18 deep-book-gk_ left 06:42 robertle joined 06:54 statisfiable6 joined 09:07 praisethemoon joined 11:12 colomon joined
jnthn codes_s should probably just grab a codepoint iter and loop 11:31
And increment for each
We could do an optimized path for some of the cases
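A minimal sketch of the "iterate and increment" idea for codes_s, not MoarVM's actual code. It assumes a simplified model in which a non-negative grapheme is a single codepoint and a negative grapheme indexes a table of synthetics; `SketchSynthetic` and `sketch_count_codes` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a synthetic grapheme: the codepoints
 * (base plus combiners) that one grapheme expands to. */
typedef struct {
    const int32_t *codes;
    size_t         num_codes;
} SketchSynthetic;

/* Count codepoints by walking the graphemes one by one.
 * Assumption of this sketch: g >= 0 is a plain codepoint, and g < 0
 * refers to synths[-(g + 1)]. */
size_t sketch_count_codes(const int32_t *graphs, size_t num_graphs,
                          const SketchSynthetic *synths, size_t num_synths) {
    size_t total = 0;
    for (size_t i = 0; i < num_graphs; i++) {
        int32_t g = graphs[i];
        if (g >= 0) {
            total += 1;  /* one grapheme, one codepoint */
        } else {
            size_t idx = (size_t)(-(g + 1));
            assert(idx < num_synths);
            total += synths[idx].num_codes;  /* expand the synthetic */
        }
    }
    return total;
}
```

A string with no negative graphemes is where an optimized path could simply return the grapheme count; the loop above is the general case.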
nine jnthn: you talked about aliasing and scalars yesterday. Did you mean scalars in general, like ones coming from outside the block spesh is looking at, or scalars created in that block? 11:33
11:34 vendethiel joined
jnthn nine: Ones coming from outside of the block we pretty much have to assume are aliased 11:36
But in a huge number of cases the first thing we do upon receiving them is decont
11:38 lizmat joined
nine jnthn: but if you meant the ones created in the block I don't understand how the "block it if we see a call" rule can be too weak, since I think there's no way to pass the scalar to a different thread without making a call? 11:40
It's easy to see how it's too strong though. 11:41
(for the "it's only deconted" reason)
jnthn nine: Ah, I was talking then about where we're heading, not what we have today. Today's one doesn't try to track if the thing might be aliased. 11:47
nine Is there any documentation about how spesh works? I do understand its job but would like to learn more about how it's implemented. 11:51
jnthn Not a great deal; I'm sure I had some slides with the basics, beyond that the key data structures involved are decently described in the header files. src/spesh/graph.h is the best place to start reading. 11:58
12:03 colomon joined
nine Ok, thanks! 12:06
12:10 colomon joined 12:46 colomon joined 12:52 dogbert2 joined
Geth MoarVM: abc38137b3 | (Samantha McVey)++ | src/strings/ops.c
Fix MVM_string_compare to support deterministic comparing of synthetics

Previously we compared naively by grapheme, and ended up comparing synthetic codepoints with non-synthetics. This would cause synthetics to be sorted incorrectly, in addition to it making comparing things non-deterministic; if the synthetics were added in a different order, you would get a different result with MVM_string_compare. ... (6 more lines)
13:40
13:46 brrt joined
nine Comparing the precomp file of NativeCall::Types with just a recompile, the files differ by 8 32 bit values and 2 64 bit values spread out between 0x0000ff70 and 0x00010150 which is about 73 % into the file. What could those be? 13:47
Don't look like strings or time stamps and it's not just a different order either. 13:48
jnthn If you valgrind it, does it warn about write getting uninitialized bytes? 13:49
I think there's still some case that wasn't yet tracked down where things are aligned in the output, but the padding bytes aren't zeroed, and the memory was malloc'd 13:50
It's fine in that we ignore them when reading
But maybe not so fine for what you're doing?
nine This is just a plain unmodified MoarVM. Wouldn't the uninitialized read have popped up time and again?
jnthn I see them in the occasional valgrind output alongside the actual things I've been hunting 13:51
nine I'm investigating reproducible builds as distros like Debian are pushing strongly into that direction.
jnthn But since I knew they were harmless I learned to disregard them.
Yeah, then they're not so harmless for that
It'll be somewhere in src/mast/compiler.c that'll want fixing, I'd expect
13:53 colomon joined
nine Ok, that explanation does fit with the data I'm seeing. Though that doesn't explain those 2 64 bit differences 13:53
jnthn No, those are a bit more odd 13:54
nine I do get a "Syscall param write(buf) points to uninitialised byte(s)" 14:12
Looks like it's in the SC data (surprise, surprise) 14:17
jnthn oh 14:18
Wasn't quite expecting it in SC data
Geth MoarVM/even-moar-jit: 23 commits pushed by (Jonathan Worthington)++, (Jimmy Zhuo)++, (Timo Paulssen)++, (Samantha McVey)++, (Bart Wiegmans)++
review: github.com/MoarVM/MoarVM/compare/5...328d2e1c74
14:19
nine Well vm->serialized starts at 0x0b7d8 and is size 0x0a6b8. The values in question are between 0x0ff70 and 0x10150 14:21
jnthn Seems guilty then
nine also 0x0b7d8+0x0a6b8 comes just a couple bytes short of the mbc file's size, so I guess the numbers make sense 14:23
And when I compile with all correct arguments, the size matches exactly even. 14:31
So....what's the story behind this? #define vm tc 14:34
jnthn Once upon a time, the MAST assembler was compiled both into MoarVM and into a Parrot dynops library 14:38
That's how we bootstrapped off NQP on Parrot.
nine Oh...that does sound kinda horrible 14:39
jnthn Yeah. Well, that code is a victim of its own correctness I guess. 14:40
It's required very few changes/fixes, those it did need were very localized, so there was never really an incentive to sink time into erasing this history :) 14:41
Not to mention that in the long term we should probably drop MAST altogether and just produce the bytes :)
nine Ok, narrowed it down to writer->root.stables_data 14:43
jnthn Hmmm
jnthn still has no good guesses 14:44
Though it's starting to sound a bit less like padding
nine Turning the MVM_mallocs in MVM_serialization_serialize into MVM_callocs seems to have improved things. Though I do see additional differences in these tests. 14:51
As opposed to my very first one 14:52
As the sizes are "Some guesses." it's not that surprising. Not all compilation units will fill those default buffers completely. 14:58
15:00 colomon joined 15:08 dogbert2 joined 15:17 praisethemoon joined
dogbert2 Created an issue for the problem uncovered yesterday. github.com/MoarVM/MoarVM/issues/620 15:23
15:48 colomon joined
nine One of the remaining bits seems to be in a method cache 15:59
Removing the 2 ^parameterize methods in the file seems to improve things. 16:36
The same is not true for the other methods
16:39 colomon joined 17:52 dogbert2 joined
nine No wonder this makes no sense! The sizes for the sections are calculated in a different order than the sections are written, so all positions were far off. 17:53
Now this makes more sense: it's the very last 4 bytes of writer->root.objects_data 18:00
18:03 statisfiable6 joined
nine And it iiiiis..... the padding between sections. Who'd have thought? We write writer->root.objects_data bytes but advance the offset by MVM_ALIGN_SECTION(writer->objects_data_offset) 18:15
Only one difference left, apparently in the closures table 18:19
And that can be fixed by initializing the full memory after reallocing the closures table. Now why that's necessary is an open question. 18:31
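A sketch of the padding problem described above, under stated assumptions: the section writer copies only the payload bytes but advances the offset by the aligned size, so the gap has to be zeroed explicitly for two builds to come out byte-identical. `SKETCH_ALIGN`, `sketch_align_up` and `sketch_write_section` are hypothetical names, and the 8-byte alignment is an assumption of this example, not a value taken from MoarVM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ALIGN 8  /* assumed alignment, for illustration only */

/* Round n up to the next multiple of SKETCH_ALIGN. */
static size_t sketch_align_up(size_t n) {
    return (n + SKETCH_ALIGN - 1) & ~(size_t)(SKETCH_ALIGN - 1);
}

/* Copy len bytes of section data to out + offset, zero-fill the padding
 * up to the next aligned boundary, and return the offset of the next
 * section. Without the memset, the padding keeps whatever malloc left. */
size_t sketch_write_section(unsigned char *out, size_t out_size,
                            size_t offset, const void *data, size_t len) {
    size_t padded = sketch_align_up(len);
    assert(offset <= out_size && padded <= out_size - offset);
    memcpy(out + offset, data, len);
    memset(out + offset + len, 0, padded - len);
    return offset + padded;
}
```

Allocating the whole output buffer with calloc would have the same effect; zeroing per section keeps the fix local to the place where the gap is created.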
18:55 zakharyas joined 18:58 zakharyas joined
timotimo cool of you to investigate 19:02
i was wondering if we should have a stage minus one 19:03
a slimmed down nqp without optimizer and repl and maybe some other things you could leave out
then verify it
with --dump
plus something that dumps the sc
19:05 zakharyas joined 19:08 greppable6 joined, committable6 joined
nine Now there's still the time_n() in alt_nfas getting baked into $ast.name(QAST::Node.unique('alt_nfa_') ~ '_' ~ ~nqp::time_n()); 19:17
Why would that be necessary when there's already a call to unique()? 19:18
jnthn I can't remember how unique those have to be but I think they may have to be unique *across* compilation units, not just per compilation unit 19:19
nine As we're talking about regexes here, would they have to be unique even between identical regexes? 19:23
jnthn A regex can have many alternations... iirc, but I may not, the issue is that they're cached in the meta-object by the name, and so if you subclass then re-used names between the super and child grammars if they're in different compilation units would be problematic 19:26
Gotta be afk for a bit now, though, so can't look in detail... Surely we can eliminate the timestamp though :)
bbiab
20:05 colomon joined 20:08 dogbert2 joined 21:16 Geth joined 22:18 dogbert2 joined
timotimo If the program really needs this behavior there is no really easy way out. One possibility is to create an anonymous file (just unlink it after creation), size the file using ftruncate, and then map the file in two places. In one place map it with MAP_SHARED and write permission but without execution. For the second mapping use execution permissions but no write permissions. This might be a bit confusing at first but can be handled. The program must be adjusted to write to one location and expect to execute code in another one. This is reasonably safe in case the two mappings are allowed to be randomized. The example code in the next section illustrates how this should work. 23:31
"Using this approach instead of one mapping which is writable and executable at the same time is safer because the attacker has to know two independently randomized addresses (this assumes mmap is allowed to perform the randomizations)." 23:32
though i thought we had a write-only mapping first and then turn it exec-only? 23:33
well, read-write
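A rough sketch of the two-mapping approach from the quoted passage, using only POSIX calls (mkstemp, unlink, ftruncate, mmap). It is not what MoarVM's JIT does; `SketchDualMap`, `sketch_dual_map`, `sketch_dual_unmap` and the /tmp path template are assumptions made for this example. The second mmap can legitimately fail, for instance on a noexec mount or under a policy that forbids executable mappings, so the caller has to handle -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *write_view;  /* PROT_READ | PROT_WRITE, never executable */
    void  *exec_view;   /* PROT_READ | PROT_EXEC, never writable */
    size_t size;
} SketchDualMap;

/* Map one unlinked file twice. Returns 0 on success, -1 on failure. */
int sketch_dual_map(SketchDualMap *m, size_t size) {
    char path[] = "/tmp/sketch-jit-XXXXXX";  /* hypothetical location */
    assert(m != NULL && size > 0);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file now lives only through fd and the mappings */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *w = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) {
        close(fd);
        return -1;
    }
    void *x = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);  /* the mappings keep the file alive */
    if (x == MAP_FAILED) {
        munmap(w, size);
        return -1;
    }
    m->write_view = w;
    m->exec_view  = x;
    m->size       = size;
    return 0;
}

/* Release both views. */
void sketch_dual_unmap(SketchDualMap *m) {
    munmap(m->write_view, m->size);
    munmap(m->exec_view, m->size);
}
```

Code would be emitted through write_view and called through exec_view; because both views share the same file pages, bytes written to one are visible through the other.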
Geth MoarVM: b07acdfd92 | (Timo Paulssen)++ | src/jit/compile.c
disable jit when we're not allowed to make memory executable
23:52
timotimo turn on "deny_execmem" and watch one program after the other crash 23:53