This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html
Set by lizmat on 8 June 2022.
03:46 discord-raku-bot left 03:47 discord-raku-bot joined 04:01 guifa left 06:15 frost joined 08:04 frost left 08:31 discord-raku-bot left, discord-raku-bot joined 08:37 discord-raku-bot left, discord-raku-bot joined 08:49 discord-raku-bot left, discord-raku-bot joined 10:23 guer joined
guer I have difficulties trying to call an (old) API which I suspect can't handle ut8-strings ("åäö", ...). I've tried lots more or less "creative" conversion attempts ut8 -> latin-1, but they all end up in a ny utf8-string. What is the proper way? 10:29
... in a new ...
Maybe I could rephrase my question as: 10:38
How do I get a string from a Buf as in: "'åäö'.encode('latin-1') ==> Buf"
10:40 discord-raku-bot left, discord-raku-bot joined
lizmat but you cannot convert utf8 into latin1, unless the actual string only has codepoints < 128 ? 10:49
or am I mistaking ASCII for Latin 1 here? 10:52
guer Yes, maybe. Latin-1 has/had representaatioins for those characters/strings. 10:53
lizmat ok, waking up a bit I guess 10:57
so you're saying that the old API is tripping on 'åäö' ? 10:59
m: say "åäö".ords # that seems to be correct values ?
camelia (229 228 246)
11:00 thundergnat joined
guer Can't remeber really, Lat1 was a long time ago, But they seems plusible. 11:00
plausible 11:01
thundergnat m: say 'åäö'.encode('Latin-1').decode('Latin-1'); # perhaps?
camelia åäö
lizmat m: .say for 'åäö'.encode('Latin-1')
camelia 229
228
246
lizmat the encoding into Latin-1 didn't change thos 11:02
e
guer thundergnat: Thats what I started out with. But the result seems to be a ut8 string. 11:03
thundergnat Well, utf8 upto code point 255 is identical to Latin-1.
lizmat right 11:04
so if that API borks, it's getting fed other codepoints ?
lizmat yawns again, trying to wake up
guer Hmm, maybe I have to dig deeper into Lat1 codes ... But if they where identical why would the api complain? 11:07
lizmat guer: that's my point: you would have to find out about what the API is complaining exactly :-) 11:08
guer thundergnat: Hard experiences from some decades ago teached me that the unicode points for these Swedish characters do not map Latin1 representations. 11:10
well, anyway: thanks for your help. I hoped that this way to test could sharpen my complaints to the API manainers, but I can't get any further rightt now. 11:14
lizmat m: .say for (229,228,246).Buf.decode("latin1").uninames 11:16
camelia No such method 'Buf' for invocant of type 'List'
in block <unit> at <tmp> line 1
lizmat m: .say for Buf.new(229,228,246).decode("latin1").uninames
camelia LATIN SMALL LETTER A WITH RING ABOVE
LATIN SMALL LETTER A WITH DIAERESIS
LATIN SMALL LETTER O WITH DIAERESIS
lizmat guer: looks like those 3 *are* latin-1 ?
guer Yes, I know. That was one of the reasons that I suspected the output was unicode, not Lat1. 11:17
thundergnat It's actually remarkably difficult to get Raku to NOT treat those characters as Latin-1 characters. Raku normalizes by default. 11:35
m: put 'åäö'.comb>>.NFD>>.list>>.chrs>>.uniname>>.join(', ').join: "\t"; 11:36
camelia LATIN SMALL LETTER A, COMBINING RING ABOVE LATIN SMALL LETTER A, COMBINING DIAERESIS LATIN SMALL LETTER O, COMBINING DIAERESIS
guer 1. I don't understand why the above shows that the string is Latin-1. To me  it is just two possible unicode ways to represent the string. 11:45
2. Maybe I wasn't clear enough from the beginning: The string I presented above *is* a utf8 string. So showing that this string witthout conversion is still a Unicode string confuses me. 11:49
Thanks again for your efforts! I hope I won't bother you again in this matter. 12:00
12:39 Nemokosch joined 13:33 discord-raku-bot left, discord-raku-bot joined
stevied so when testing a module, is there a good way to run unit tests on methods that are private? 15:32
I guess there's a big debate about whether it should even be done 15:33
15:35 discord-raku-bot left 15:36 discord-raku-bot joined 16:28 guer left 16:42 Nemokosch left
looking at raku.land/zef:tony-o/Data::Dump 22:44
says "all of these options can be overridden in the DATA_DUMP environment variable in the format: indent=4,color=false"
where is the variable and how do I change it? I don't see it in %*ENV 22:45
adding it to %*ENV doesn't make a difference 22:48
kjp Sorry to be rather late to the party, but that discussion 12 hours ago about utf8 and latin-1 seemed to be very confused. In particular, people seemed to be conflating "utf8" and "unicode code point". 23:49
utf8 is only identical to latin-1 (really ASCII) up to code point 127. Unicode code points are identical to latin-1 up to 255. 23:50
Thus 229, 228 and 246 are both latin-1 and unicode code points for the same characters, but they are not the utf8 representaion of anything. 23:52