samcv | now let me see how many spectests this breaks :) | 00:00 | |
jnthn | samcv: (most codepoints) sounds fine | 00:06 | |
timotimo: No, I'm a bit confused by how that'd happen | |||
samcv | ok cool no spectests failed | ||
jnthn | timotimo: Will have to look, but it's decidedly bed time for me :) | 00:07 | |
timotimo | well, it doesn't look like we ever root (or even mark) a ThreadContext | 00:13 | |
samcv | cool. i'm adding back the power to configure primary/secondary/tertiary level sort :) | 00:18 | |
though it's going to have to work slightly differently. though it may give the same result. because all primary levels are higher than all secondary levels and all secondary are higher than all tertiary levels | 00:19 | ||
it will only apply your settings if comparing between the same level. otherwise it will compare normally | |||
so if you reverse primary level, your change will only take effect when comparing primary level vs primary level | |||
and... it works! wow | 00:23 | ||
wait why does MVM_string_codes return s->body.num_graphs and MVM_string_graphs also returns body.num_graphs? | 00:49 | ||
timotimo | that seems wrong | 01:19 | |
but we might not have that function used anywhere? | |||
01:52
ilbot3 joined
samcv | hmm | 01:52 | |
well there's nqp::codes_s but it's not used in nqp | 01:53 | ||
what's the nqp op to get the number of codepoints? | |||
or the MVM function? | |||
o-oh it does self.NFC.codes | 01:54 | ||
in rakudo | |||
could that be slow though? | |||
so MVM_string_codes is totally bogus | 01:56 | ||
timotimo | it could be slow, yes | ||
it allocates, for one | 01:57 | ||
samcv | we don't just keep track of the number of codepoints? | ||
that shouldn't be that hard to do i would think | 01:58 | ||
timotimo | how often do we need that info? | 02:02 | |
samcv | idk | ||
but it is an op. though it's not added to nqp | 02:03 | ||
seems bad to have it be totally wrong | |||
timotimo | right, it is | 02:12 | |
can probably replace it with a NYI exception | 02:13 | ||
bedtime for me | 02:14 | ||
long day ahead of me | |||
o/ | |||
04:15
deep-book-gk_ joined
04:18
deep-book-gk_ left
06:42
robertle joined
06:54
statisfiable6 joined
09:07
praisethemoon joined
11:12
colomon joined
jnthn | codes_s should probably just grab a codepoint iter and loop | 11:31 | |
And increment for each | |||
We could do an optimized path for some of the cases | |||
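A minimal sketch of the "grab a codepoint iter and loop" idea, not MoarVM's actual code. It assumes a simplified model in which a string is an array of graphemes, non-negative entries are plain codepoints, and a negative entry `g` points at entry `-g - 1` of a synthetics table. The names `SynthInfo` and `count_codes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a synthetic grapheme: the codepoints
 * (base plus combiners) that it represents. */
typedef struct {
    const int32_t *codes;
    size_t num_codes;
} SynthInfo;

/* Counts codepoints rather than graphemes. A plain grapheme contributes
 * one codepoint; a synthetic contributes however many it stands for. */
size_t count_codes(const int32_t *graphs, size_t num_graphs,
                   const SynthInfo *synth_table) {
    assert(graphs != NULL || num_graphs == 0);
    size_t total = 0;
    for (size_t i = 0; i < num_graphs; i++) {
        if (graphs[i] >= 0)
            total += 1;
        else
            total += synth_table[-graphs[i] - 1].num_codes;
    }
    return total;
}
```

In this model the "optimized path" would be a string known to contain no synthetics, where the codepoint count equals the grapheme count and no loop is needed.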
nine | jnthn: you talked about aliasing and scalars yesterday. Did you mean scalars in general, like ones coming from outside the block spesh is looking at, or scalars created in that block? | 11:33 | |
11:34
vendethiel joined
jnthn | nine: Ones coming outside of the block we pretty much have to assume are aliased | 11:36 | |
But in a huge number of cases the first thing we do upon receiving them is decont | |||
11:38
lizmat joined
nine | jnthn: but if you meant the ones created in the block I don't understand how the "block it if we see a call" rule can be too weak, since I think there's no way to pass the scalar to a different thread without making a call? | 11:40 | |
It's easy to see how it's too strong though. | 11:41 | ||
(for the "it's only deconted" reason) | |||
jnthn | nine: Ah, I was talking then about where we're heading, not what we have today. Today's one doesn't try to track if the thing might be aliased. | 11:47 | |
nine | Is there any documentation about how spesh works? I do understand its job but would like to learn more about how it's implemented. | 11:51 | |
jnthn | Not a great deal; I'm sure I had some slides with the basics, beyond that the key data structures involved are decently described in the header files. src/spesh/graph.h is the best place to start reading. | 11:58 | |
12:03
colomon joined
nine | Ok, thanks! | 12:06 | |
12:10
colomon joined
12:46
colomon joined
12:52
dogbert2 joined
Geth | MoarVM: abc38137b3 | (Samantha McVey)++ | src/strings/ops.c Fix MVM_string_compare to support deterministic comparing of synthetics Previously we compared naively by grapheme, and ended up comparing synthetic codepoints with non-synthetics. This would cause synthetics to be sorted incorrectly, in addition to it making comparing things non-deterministic; if the synthetics were added in a different order, you would get a different result with MVM_string_compare. ... (6 more lines) | 13:40 | |
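To illustrate why comparing synthetic IDs depends on insertion order, here is a hedged sketch (not the code from the commit) that compares two strings by the codepoints behind each grapheme instead. It assumes the same simplified model as before: negative grapheme values index a synthetics table. `SynthInfo`, `CodeCursor`, `next_code` and `compare_by_codes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const int32_t *codes;   /* codepoints behind one synthetic */
    size_t num_codes;
} SynthInfo;

typedef struct {
    const int32_t *graphs;
    size_t num_graphs;
    size_t graph_idx;       /* current grapheme */
    size_t code_idx;        /* position inside the current synthetic */
} CodeCursor;

/* Stores the next codepoint in *out; returns 0 at end of string. */
static int next_code(CodeCursor *c, const SynthInfo *table, int32_t *out) {
    if (c->graph_idx >= c->num_graphs)
        return 0;
    int32_t g = c->graphs[c->graph_idx];
    if (g >= 0) {                       /* plain codepoint */
        *out = g;
        c->graph_idx++;
        return 1;
    }
    const SynthInfo *s = &table[-g - 1];
    assert(s->num_codes > 0);
    *out = s->codes[c->code_idx++];
    if (c->code_idx == s->num_codes) {  /* synthetic exhausted */
        c->code_idx = 0;
        c->graph_idx++;
    }
    return 1;
}

/* Returns -1, 0 or 1. The result depends only on codepoints, never on
 * which ID a synthetic happened to receive. */
int compare_by_codes(const int32_t *a, size_t a_len,
                     const int32_t *b, size_t b_len,
                     const SynthInfo *table) {
    CodeCursor ca = { a, a_len, 0, 0 };
    CodeCursor cb = { b, b_len, 0, 0 };
    for (;;) {
        int32_t x, y;
        int has_a = next_code(&ca, table, &x);
        int has_b = next_code(&cb, table, &y);
        if (!has_a || !has_b)
            return has_a - has_b;       /* shorter string sorts first */
        if (x != y)
            return x < y ? -1 : 1;
    }
}
```

Registering the same two synthetics in the opposite order swaps their IDs but leaves the result of `compare_by_codes` unchanged, which is the property the commit message asks for.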
13:46
brrt joined
nine | Comparing the precomp file of NativeCall::Types with just a recompile, the files differ by 8 32 bit values and 2 64 bit values spread out between 0x0000ff70 and 0x00010150 which is about 73 % into the file. What could those be? | 13:47 | |
Don't look like strings or time stamps and it's not just a different order either. | 13:48 | ||
jnthn | If you valgrind it, does it warn about write getting uninitialized bytes? | 13:49 | |
I think there's still some case that wasn't yet tracked down where things are aligned in the output, but the padding bytes aren't zeroed, and the memory was malloc'd | 13:50 | ||
It's fine in that we ignore them when reading | |||
But maybe not so fine for what you're doing? | |||
nine | This is just a plain unmodified MoarVM. Wouldn't the uninitialized read have popped up time and again? | ||
jnthn | I see them in the occasional valgrind output alongside the actual things I've been hunting | 13:51 | |
nine | I'm investigating reproducible builds as distros like Debian are pushing strongly into that direction. | ||
jnthn | But since I knew they were harmless I learned to disregard them. | ||
Yeah, then they're not so harmless for that | |||
It'll be somewhere in src/mast/compiler.c that'll want fixing, I'd expect | |||
13:53
colomon joined
nine | Ok, that explanation does fit with the data I'm seeing. Though that doesn't explain those 2 64 bit differences | 13:53 | |
jnthn | No, those are a bit more odd | 13:54 | |
nine | I do get a "Syscall param write(buf) points to uninitialised byte(s)" | 14:12 | |
Looks like it's in the SC data (surprise, surprise) | 14:17 | ||
jnthn | oh | 14:18 | |
Wasn't quite expecting it in SC data | |||
Geth | MoarVM/even-moar-jit: 23 commits pushed by (Jonathan Worthington)++, (Jimmy Zhuo)++, (Timo Paulssen)++, (Samantha McVey)++, (Bart Wiegmans)++ review: github.com/MoarVM/MoarVM/compare/5...328d2e1c74 | 14:19 | |
nine | Well vm->serialized starts at 0x0b7d8 and is size 0x0a6b8. The values in question are between 0x0ff70 and 0x10150 | 14:21 | |
jnthn | Seems guilty then | ||
nine | also 0x0b7d8+0x0a6b8 comes just a couple bytes short of the mbc file's size, so I guess the numbers make sense | 14:23 | |
And when I compile with all correct arguments, the size matches exactly even. | 14:31 | ||
So....what's the story behind this? #define vm tc | 14:34 | ||
jnthn | Once upon a time, the MAST assembler was compiled both into MoarVM and into a Parrot dynops library | 14:38 | |
That's how we bootstrapped off NQP on Parrot. | |||
nine | Oh...that does sound kinda horrible | 14:39 | |
jnthn | Yeah. Well, that code is a victim of its own correctness I guess. | 14:40 | |
It's required very few changes/fixes, those it did need were very localized, so there was never really an incentive to sink time into erasing this history :) | 14:41 | ||
Not to mention that in the long term we should probably drop MAST altogether and just produce the bytes :) | |||
nine | Ok, narrowed it down to writer->root.stables_data | 14:43 | |
jnthn | Hmmm | ||
jnthn still has no good guesses | 14:44 | ||
Though it's starting to sound a bit less like padding | |||
nine | Turning the MVM_mallocs in MVM_serialization_serialize into MVM_callocs seems to have improved things. Though I do see additional differences in these tests. | 14:51 | |
As opposed to my very first one | 14:52 | ||
As the sizes are "Some guesses." it's not that surprising. Not all compilation units will fill those default buffers completely. | 14:58 | ||
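A small sketch of why switching from malloc to calloc helps here, under the assumption that a buffer is allocated at a guessed default size, only partly filled, and then written out in full. `DEFAULT_SECTION_SIZE` and `make_section` are hypothetical names, not MoarVM identifiers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SECTION_SIZE 64   /* hypothetical "guess" capacity */

/* Returns a buffer of DEFAULT_SECTION_SIZE bytes with `len` payload
 * bytes at the front. calloc guarantees the unused tail is zero, so
 * two runs with the same payload produce byte-identical buffers. */
unsigned char *make_section(const void *payload, size_t len) {
    assert(payload != NULL || len == 0);
    if (len > DEFAULT_SECTION_SIZE)
        return NULL;
    unsigned char *buf = calloc(DEFAULT_SECTION_SIZE, 1);
    if (!buf)
        return NULL;
    if (len > 0)
        memcpy(buf, payload, len);
    return buf;
}
```

With malloc in place of calloc the tail bytes would be indeterminate, which is harmless for a reader that ignores them but enough to make two otherwise identical output files differ.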
15:00
colomon joined
15:08
dogbert2 joined
15:17
praisethemoon joined
dogbert2 | Created an issue for the problem uncovered yesterday. github.com/MoarVM/MoarVM/issues/620 | 15:23 | |
15:48
colomon joined
nine | One of the remaining bits seems to be in a method cache | 15:59 | |
Removing the 2 ^parameterize methods in the file seems to improve things. | 16:36 | ||
The same is not true for the other methods | |||
16:39
colomon joined
17:52
dogbert2 joined
nine | No wonder this makes no sense! The sizes for the sections are calculated in a different order than the sections are written, so all positions were far off. | 17:53 | |
Now this makes more sense: it's the very last 4 bytes of writer->root.objects_data | 18:00 | ||
18:03
statisfiable6 joined
nine | And it iiiiis..... the padding between sections. Who'd have thought? We write writer->root.objects_data bytes but advance the offset by MVM_ALIGN_SECTION(writer->objects_data_offset) | 18:15 | |
Only one difference left, apparently in the closures table | 18:19 | ||
And that can be fixed by initializing the full memory after reallocing the closures table. Now why that's necessary is an open question. | 18:31 | ||
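For the realloc case, a minimal sketch of one possible fix: realloc preserves the old contents but leaves the newly added tail indeterminate, so the grown region is cleared by hand. `grow_zeroed` is a hypothetical helper and says nothing about why the closures table ends up with unwritten bytes in the first place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Grows `buf` from old_size to new_size bytes and zero-fills the new
 * tail. Returns NULL on failure, in which case `buf` is still valid. */
void *grow_zeroed(void *buf, size_t old_size, size_t new_size) {
    assert(new_size >= old_size);
    unsigned char *grown = realloc(buf, new_size);
    if (!grown)
        return NULL;
    memset(grown + old_size, 0, new_size - old_size);
    return grown;
}
```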
18:55
zakharyas joined
18:58
zakharyas joined
timotimo | cool of you to investigate | 19:02 | |
i was wondering if we should have a stage minus one | |||
a slimmed down nqp without optimizer and repl and maybe some other things you could leave out | 19:03 | ||
then verify it | |||
with --dump | |||
plus something that dumps the sc | |||
19:05
zakharyas joined
19:08
greppable6 joined,
committable6 joined
nine | Now there's still the time_n() in alt_nfas getting baked into $ast.name(QAST::Node.unique('alt_nfa_') ~ '_' ~ ~nqp::time_n()); | 19:17 | |
Why would that be necessary when there's already a call to unique()? | 19:18 | ||
jnthn | I can't remember how unique those have to be but I think they may have to be unique *across* compilation units, not just per compilation unit | 19:19 | |
nine | As we're talking about regexes here, would they have to be unique even between identical regexes? | 19:23 | |
jnthn | A regex can have many alternations... iirc, but I may not, the issue is that they're cached in the meta-object by the name, and so if you subclass then re-used names between the super and child grammars if they're in different compilation units would be problematic | 19:26 | |
Gotta be afk for a bit now, though, so can't look in detail... Surely we can eliminate the timestamp though :) | |||
bbiab | |||
20:05
colomon joined
20:08
dogbert2 joined
21:16
Geth joined
22:18
dogbert2 joined
timotimo | If the program really needs this behavior there is no really easy way out. One possibility is to create an anonymous file (just unlink it after creation), size the file using ftruncate, and then map the file in two places. In one place map it with MAP_SHARED and write permission but without execution. For the second mapping use execution permissions but no write permissions. This might be a bit confusing at | 23:31 | |
first but can be handled. The program must be adjusted to write to one location and expect to execute code in another one. This is reasonably safe in case the two mappings are allowed to be randomized. The example code in the next section illustrates how this should work. | |||
"Using this approach instead of one mapping which is writable and executable at the same time is safer because the attacker has to know two independently randomized addresses (this assumes mmap is allowed to perform the randomizations)." | 23:32 | ||
though i thought we had a write-only mapping first and then turn it exec-only? | 23:33 | ||
well, read-write | |||
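The two-mapping approach from the quoted text can be sketched roughly like this. It is a hedged illustration for a POSIX system, not what MoarVM's JIT does: `DualMapping`, `dual_map`, `dual_unmap` and the `/tmp/dualmap-XXXXXX` path template are made up for the example. It only sets up the mappings and does not execute anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    void  *write_view;   /* readable and writable, never executable */
    void  *exec_view;    /* readable and executable, never writable */
    size_t size;
} DualMapping;

/* Maps one unlinked temporary file at two addresses with different
 * permissions. Returns 0 on success, -1 on failure. */
int dual_map(DualMapping *dm, size_t size) {
    assert(dm != NULL);
    char path[] = "/tmp/dualmap-XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file is now anonymous */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *w = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *x = mmap(NULL, size, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
    close(fd);                             /* mappings keep the file alive */
    if (w == MAP_FAILED || x == MAP_FAILED) {
        if (w != MAP_FAILED) munmap(w, size);
        if (x != MAP_FAILED) munmap(x, size);
        return -1;
    }
    dm->write_view = w;
    dm->exec_view  = x;
    dm->size       = size;
    return 0;
}

void dual_unmap(DualMapping *dm) {
    munmap(dm->write_view, dm->size);
    munmap(dm->exec_view, dm->size);
}
```

A code generator would emit bytes through `write_view` and jump to the same offset in `exec_view`. The executable mapping may be refused if the temporary directory is mounted noexec, so a real implementation would need a fallback, such as disabling the JIT.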
Geth | MoarVM: b07acdfd92 | (Timo Paulssen)++ | src/jit/compile.c disable jit when we're not allowed to make memory executable | 23:52 | |
timotimo | turn on "deny_execmem" and watch one program after the other crash | 23:53 |