10:07 rnddim is now known as ShimmerFairy
Geth MoarVM: patrickbkr++ created pull request #2010:
fix gettid on older glibc's
10:14
timo should we put an `#ifndef gettid` or so? 11:34
ShimmerFairy timo: Taking a quick look, if we include unistd.h (which we do, looking at the PR changes) and define the _GNU_SOURCE macro (I don't know whether we do), then we'd already be pulling in a declaration of gettid, which might cause conflicts. gettid is a real function prototype on my system, though, not a macro, so #ifndef wouldn't catch it. 12:16
timo OK, sounds like merging the pull request as-is is fine, then? 12:17
ShimmerFairy I just manually applied the PR fix to my local copy, and it compiled fine (though I didn't bother re-running the Configure script first, on the very slim off chance that matters). So it at least builds, even if I don't care for how it unconditionally replaces the real gettid() function for people with new-enough glibcs. 12:22
(For reference, glibc 2.30 came out in August 2019, so you'd have to be running a pretty old system to not have gettid() at this point. Not so old as to be implausible, though.)
timo right, for building binary releases it's generally a good idea to build on the oldest OS you can think of, so you get maximum portability; at least, that's what I think the reasoning is 12:25
ShimmerFairy Makes sense, though I would suggest that nearly 7-year-old software is pushing it a bit (but I'm a Gentoo user, so I have an unusual perspective on using old versions of software). 12:27
In any case, if we had a build system that could test for the presence of a function and define a macro to let us conditionally put in a replacement when needed, that would be better. 12:28
timo we do have that in the build system, yes. I have recent-ish commits in my branch for musttail-based interpreter loop stuff 12:30
ShimmerFairy It's pretty much just a theoretical concern, I just have a philosophical issue with overriding the system-provided function when available (after all, what if someday gettid() isn't the same as that syscall?). It's probably just fine as-is, though. 12:34
timo that makes sense to me 12:35
I wouldn't be against saying something like "when you're making a release tarball, set -DMVM_REALLY_OLD_GLIBC and work off of that" 12:36
ShimmerFairy Looks like there are glibc version macros you could use to gate the gettid() #define, not quite sure yet if you're meant to use them in user code though. 12:39
timo: Just added a comment on the PR about a conditional check that I think should work, figured it'd be better explaining it there than in IRC. 12:49
timo does this work fine with musl libc too? I don't know if it defines the __GLIBC__ symbol and such 12:55
ShimmerFairy I wouldn't know, I'm not very practiced in writing tests for older systems. (The best solution is still probably just the build system testing and defining a HAVE_GETTID-style macro.) 12:58
Oh, looks like there's __GLIBC_PREREQ(2, 30) that makes it easier to test for a glibc version. 13:03
Also, a quick glance tells me that musl doesn't really define any macros you could use to test whether it's being used? 13:07
As of a few years ago, at least, the musl devs were utterly allergic to defining any kind of __MUSL__ macro, so having the config system test for the presence of the function is the only good solution for musl users. 13:16
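A minimal sketch of the conditional approach discussed above, assuming a Linux target: prefer a `HAVE_GETTID` macro that a build-system probe would define (a hypothetical name, not something MoarVM's Configure is known to provide), otherwise gate on glibc's `__GLIBC_PREREQ(2, 30)`, and otherwise fall back to `syscall(SYS_gettid)`, which also covers musl since it has no identifying macro. `current_tid` and `USE_LIBC_GETTID` are illustrative names. The version check is nested because `__GLIBC_PREREQ` only exists on glibc, and a bare `#ifndef gettid` would not help since glibc declares `gettid` as a function, not a macro.

```c
#define _GNU_SOURCE /* must precede every system header so glibc >= 2.30 declares gettid() */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* HAVE_GETTID is hypothetical: a build-system probe would define it after
 * successfully compiling and linking a tiny program that calls gettid(). */
#if defined(HAVE_GETTID)
#  define USE_LIBC_GETTID 1
#elif defined(__GLIBC__) && defined(__GLIBC_PREREQ)
/* Nested so __GLIBC_PREREQ(2, 30) is only evaluated when the macro exists. */
#  if __GLIBC_PREREQ(2, 30)
#    define USE_LIBC_GETTID 1
#  endif
#endif

/* Returns the kernel thread id of the calling thread without redefining
 * the libc-provided gettid() on systems that already have it. */
static pid_t current_tid(void) {
#ifdef USE_LIBC_GETTID
    return gettid();
#else
    /* glibc < 2.30, musl without a probe result, etc. */
    return (pid_t)syscall(SYS_gettid);
#endif
}
```

On Linux the main thread's id equals the process id, which gives an easy sanity check. Always calling `syscall(SYS_gettid)`, the option timo raises later, drops the preprocessor branching entirely at the cost of never using the libc wrapper.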
timo there is only a single mention of "gettid()" in the entire file. I think it's probably still correct to use `syscall(SYS_gettid)` even when glibc offers a gettid() function, and since this is for a very linux-specific feature, it might be no issue at all to just always use syscall here? 15:39
ShimmerFairy That sounds reasonable to me. I do wonder what the point of the gettid() function is in the first place, if there's a reason beyond "it'd be nice to not need syscall() directly". If for example gettid() existed on non-Linux systems, and if our linux-specific code could someday be applied to those other systems, then it'd make sense to use gettid() instead. 15:50
timo the code that uses it is to output the jitdump format, which currently is to my knowledge linux specific - at least the specification lives in the linux source tree 15:57
ShimmerFairy Sounds good to me then, not like it can't be changed later on anyway. If we want something unconditional and not involving the build system any more, then I think using the syscall() that always works would be better than the gettid() that sometimes doesn't. 16:14
All in all, though, I want to reiterate that my "objection" really isn't much of one. I just couldn't help but notice the PR was technically clobbering gettid() on systems that do have it, and wanted to at least point it out. 16:16
timo yeah it's fair
I pushed a commit but Geth isn't pointing it out. maybe it actually ended up in patrickb's own repository rather than the moarvm one, so there wasn't a call to the notification webhook 16:26
Geth MoarVM/main: 2766e8ef86 | (Patrick Böker)++ (committed using GitHub Web editor) | src/jit/compile.c
fix gettid on older glibc's (#2010)

  * fix gettid on older glibc's
Glibc < 2.30 does not define `gettid()`. The man page states:
   Glibc does not provide a wrapper for this system call; call it using
... (12 more lines)
timo this has the squash message which also has my message as part of it 16:27
I'm still not sure how we should address the nativecall issue on clang, where clang and gcc disagree about whether the caller clears the upper bits of a smaller-than-64-bit argument 16:29
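One common defensive pattern for this kind of mismatch is to normalize narrow integers at the FFI boundary, so the full 64-bit slot always holds a properly zero- or sign-extended value no matter which compiler built the other side. A minimal sketch, assuming the value arrives as a raw 64-bit word whose upper bits may be garbage; the helper names are hypothetical and not part of MoarVM's nativecall code.

```c
#include <stdint.h>

/* Zero-extend the low 8 bits; whatever the other side left above them is discarded. */
static uint64_t widen_u8(uint64_t raw) {
    return raw & 0xFFu;
}

/* Sign-extend the low 16 bits using only well-defined integer arithmetic. */
static int64_t widen_i16(uint64_t raw) {
    int64_t v = (int64_t)(raw & 0xFFFFu);
    if (v & 0x8000)
        v -= 0x10000;
    return v;
}

/* Same idea for 32-bit signed values. */
static int64_t widen_i32(uint64_t raw) {
    int64_t v = (int64_t)(raw & 0xFFFFFFFFu);
    if (v & 0x80000000)
        v -= 0x100000000LL;
    return v;
}
```

The subtraction form avoids relying on implementation-defined narrowing casts to signed types.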
that's what is breaking one of our nativecall tests in CI 16:31
anyway thanks patrickb++
[Coke] wasn't there a commit that fixed the Changelog? 16:48
timo it looks like there wasn't one yet, but you did ask for one, so I guess I'll write one? a bit later today though, gotta err a runnand 16:50
[Coke] thought I saw an email but I don't see it in the closed PR. :(
timo maybe it was done in the changelog wiki page instead of in the code repo? 16:51
[Coke] there is no changelog wiki page for moarvm. 16:57
that's for rakudo itself.
later today is perfectly fine, thanks 16:59
lizmat meanwhile I'll be bumping NQP and Rakudo :-) 17:01
timo what would the changelog entry look like? I imagine it'd go into a freshly-created "New in 2025.06" section? "+ Don't rely on `gettid` function for JITDUMP format" maybe? 18:41
[Coke]: opinions? 19:01
[Coke] New: is fine. Theoretically it's a 2025.06 but we don't know for sure yet. 19:27
or New in * 19:28
timo I think commonly changelogs have a section at the top with a "dummy name" essentially 19:30
[Coke] App::Mi6 has {{$NEXT}} 19:36
there's no automation on it in the moar release process, so any placeholder is fine. You can even use the next release as a placeholder.
patrickb that gettid fix is a direct copy of libuv code. That's why I was pretty confident there wouldn't be any strange side effects of preprocessor overriding the name. 19:44
[Coke] mentioned in #raku-dev, but we need to catch potential release breakers like this... before the release. Simple matter of adding all the binary release OSes to the azure pipeline yml? 19:45
timo possibly, yeah. we may have to go through docker images rather than just having azure give us the OS we want? 19:48
patrickb yeah. Azure doesn't give us such old distros. It's been quite fiddly to get the current setup working. (And I've had to fix stuff up once or twice as well because distros shut down their package mirrors...) 20:05
[Coke] I assume we want like a split, where we have new stuff in azure "standard", and specific stuff running in containers. 20:49
(or maybe we just move everything into containers)
Geth MoarVM/utf8_and_c8_decode_mark_thread_blocked: 43749e1ec1 | (Timo Paulssen)++ | 2 files
When decoding over 10k bytes of utf8 or utf8-c8 data, mark thread blocked

this allows GC runs to happen while a thread is busy going through a large buffer of bytes.
Smaller buffers of bytes don't really need us to go to the effort of blocking and unblocking the thread, as with a series of smaller decodes, the allocation of the resulting string would join in on GC runs waiting to happen in a timely manner, I expect.
21:23
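The pattern in this commit can be sketched in self-contained form. `mark_thread_blocked` and `mark_thread_unblocked` are stand-ins for MoarVM's `MVM_gc_mark_thread_blocked`/`MVM_gc_mark_thread_unblocked` (the real ones take a thread context), and the codepoint counter stands in for actual decoding. The constraint assumed here is that a blocked thread must not touch GC-managed objects, so the result string would only be allocated after unblocking.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DECODE_BLOCK_THRESHOLD 10000 /* "over 10k bytes", per the commit message */

/* Stand-in GC hooks that just record what happened. */
static int blocked_calls = 0;
static int thread_is_blocked = 0;

static void mark_thread_blocked(void)   { thread_is_blocked = 1; blocked_calls++; }
static void mark_thread_unblocked(void) { thread_is_blocked = 0; }

/* Counts codepoints in (assumed valid) UTF-8 by skipping continuation bytes.
 * Large inputs are processed while the thread is marked blocked so other
 * threads' GC runs need not wait for this one. */
static size_t decode_utf8_count(const uint8_t *bytes, size_t len) {
    int block = len > DECODE_BLOCK_THRESHOLD;
    size_t n = 0;

    if (block)
        mark_thread_blocked();
    for (size_t i = 0; i < len; i++)
        if ((bytes[i] & 0xC0) != 0x80)
            n++;
    if (block)
        mark_thread_unblocked();

    /* Allocating the result string (a GC-managed object) would happen here,
     * after unblocking. */
    return n;
}
```

Small inputs skip the block/unblock pair entirely, matching the commit's reasoning that their cost is not worth paying when the decode finishes quickly anyway.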
MoarVM: timo++ created pull request #2012:
When decoding over 10k bytes of utf8 or utf8-c8 data, mark thread blo…
21:24
timo lizmat: this one's for you
Geth MoarVM/utf8_and_c8_decode_mark_thread_blocked: 545d0337a6 | (Timo Paulssen)++ | docs/ChangeLog
Changelog entry for gc while decode
21:28
lizmat wow... I had completely forgotten about that... it all makes sense now :-)
timo what "that" in particular? 21:37
also, I can't tell you how i stumbled upon that gist, I think it was an open tab in one of my fifty million browser windows for uhhhhhhh a couple of months probably 21:38
anyway, decoding a gigabyte of utf8 in 12 seconds isn't *that* terrible right? 21:42
japhb timo: It's kinda terrible. :-| 21:44
timo decoding and normalizing i should say
actually 12 seconds is utf8-c8, utf8 is 9.2s 21:45
japhb Don't know what's normal for normalizing time in the broader world. I know that verification and initial decode of UTF-8 can be pretty dang fast these days -- within a factor of 2 of the raw memcopy rate as I recall 21:46
But our normalization is a lot more than that, so ... not sure
timo we do have a fast path in there in theory for situations where we don't need to do anything for normalization 21:47
japhb Maybe comparison with Swift might be enlightening? As I recall Swift has proper grapheme support (though not necessarily implemented the same as ours)
timo yeah I believe they have it put into any ops that do traversal in strings
instead of up-front 21:48
doing verification without normalization is almost trivial, but for normalization you have to hit the unicode database 21:51
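A sketch of the kind of check such a fast path can rest on, assuming the usual Unicode quick-check boundary: every codepoint below U+0300 has canonical combining class 0 and NFC quick-check "Yes", so a sequence made only of such codepoints is already NFC, and within that range only CR can start a multi-codepoint grapheme (CR LF). This illustrates the idea and is not MoarVM's actual normalizer logic; `can_skip_normalization` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when every codepoint is below U+0300 and none is CR, meaning the
 * sequence is already NFC and each codepoint is its own grapheme, so no
 * Unicode database lookups are needed. Returns 0 otherwise. */
static int can_skip_normalization(const uint32_t *cps, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (cps[i] >= 0x300 || cps[i] == '\r')
            return 0;
    }
    return 1;
}
```

In a streaming decoder the check has to consider the next codepoint as well, since a combining mark arriving later can still compose with the last character of a "clean" run.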
decoding the same stuff to latin1 is 0.94 seconds (this is the entire process lifetime, not just the decoding) 21:56
decoding a gigabyte of just zeroes is a pretty poor benchmark in any case; the big benefit is you can generate the full array of zeroes very quickly 21:58
japhb nodnod 22:01
timo anyway, oops i broke it :) 22:10
lizmat ugexe: re POPULATE, I think PRODUCE-META-ATTACHABLES is the wrong place for it, as it is also being called for roles 22:11
tellable6 lizmat, I'll pass your message to ugexe
lizmat oops, ww 22:12
Geth MoarVM/utf8_and_c8_decode_mark_thread_blocked: 0a3852be50 | (Timo Paulssen)++ | src/strings/utf8_c8.c
Don't b0rk the temp root stack in utf8-c8 decode
22:25
timo if only we had a macro that made this easier 22:29
anyone got opinions on attempting a fuzzing campaign against a fuzzing target that is "compile code with the legacy compiler, then with RakuAST, and if one but not the other gives an error, consider that a 'desired result'", so we find cases where they disagree? 22:33
disagree about something being wrong, that is
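The oracle described here reduces to an exclusive-or on the two compilers' error status. A minimal sketch, where `compile_fn` and the two toy compilers are hypothetical stand-ins; a real harness would have to invoke Rakudo with and without RakuAST, which is not modelled here.

```c
#include <stddef.h>

/* Hypothetical compiler entry point: returns 0 on success, nonzero on a compile error. */
typedef int (*compile_fn)(const char *src, size_t len);

/* The fuzzing oracle: an input is "interesting" exactly when one compiler
 * reports an error and the other does not. */
static int verdicts_disagree(compile_fn legacy, compile_fn rakuast,
                             const char *src, size_t len) {
    int legacy_failed = legacy(src, len) != 0;
    int rakuast_failed = rakuast(src, len) != 0;
    return legacy_failed != rakuast_failed;
}

/* Toy stand-ins for exercising the oracle; not real compilers. */
static int toy_accepts_everything(const char *src, size_t len) {
    (void)src;
    (void)len;
    return 0;
}

static int toy_rejects_empty(const char *src, size_t len) {
    (void)src;
    return len == 0;
}
```

In a coverage-guided fuzzer, the harness would typically call `abort()` when `verdicts_disagree` returns 1, so the fuzzer saves the input as a crash for later triage.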
japhb Seems valuable to me. :-) 22:48
22:59 apogee_ntv left
23:00 apogee_ntv joined