This channel is intended for people just starting with the Raku Programming Language (raku.org). Logs are available at irclogs.raku.org/raku-beginner/live.html Set by lizmat on 8 June 2022. |
03:46
discord-raku-bot left
03:47
discord-raku-bot joined
04:01
guifa left
06:15
frost joined
08:04
frost left
08:31
discord-raku-bot left,
discord-raku-bot joined
08:37
discord-raku-bot left,
discord-raku-bot joined
08:49
discord-raku-bot left,
discord-raku-bot joined
10:23
guer joined
guer | I'm having difficulty calling an (old) API which I suspect can't handle utf8 strings ("åäö", ...). I've tried lots of more or less "creative" conversion attempts utf8 -> latin-1, but they all end up in a new utf8 string. What is the proper way? | 10:29 | |
Maybe I could rephrase my question as: | 10:38 | ||
How do I get a string from a Buf as in: "'åäö'.encode('latin-1') ==> Buf" | |||
10:40
discord-raku-bot left,
discord-raku-bot joined
lizmat | but you cannot convert utf8 into latin1, unless the actual string only has codepoints < 128 ? | 10:49 | |
or am I mistaking ASCII for Latin 1 here? | 10:52 | ||
guer | Yes, maybe. Latin-1 has/had representations for those characters/strings. | 10:53 | |
lizmat | ok, waking up a bit I guess | 10:57 | |
so you're saying that the old API is tripping on 'åäö' ? | 10:59 | ||
m: say "åäö".ords # that seems to be correct values ? | |||
camelia | (229 228 246) | ||
11:00
thundergnat joined
guer | Can't really remember, Lat1 was a long time ago. But they seem plausible. | 11:00 | |
thundergnat | m: say 'åäö'.encode('Latin-1').decode('Latin-1'); # perhaps? | ||
camelia | åäö | ||
lizmat | m: .say for 'åäö'.encode('Latin-1') | ||
camelia | 229 228 246 |
lizmat | the encoding into Latin-1 didn't change those | 11:02 | |
guer | thundergnat: That's what I started out with. But the result seems to be a utf8 string. | 11:03 | |
thundergnat | Well, utf8 up to code point 255 is identical to Latin-1. | ||
lizmat | right | 11:04 | |
so if that API borks, it's getting fed other codepoints ? | |||
lizmat yawns again, trying to wake up | |||
guer | Hmm, maybe I have to dig deeper into Lat1 codes ... But if they were identical, why would the API complain? | 11:07 | |
lizmat | guer: that's my point: you would have to find out exactly what the API is complaining about :-) | 11:08 | |
guer | thundergnat: Hard experience from some decades ago taught me that the Unicode code points for these Swedish characters do not map to Latin1 representations. | 11:10 | |
well, anyway: thanks for your help. I hoped that this way of testing could sharpen my complaints to the API maintainers, but I can't get any further right now. | 11:14 | ||
lizmat | m: .say for (229,228,246).Buf.decode("latin1").uninames | 11:16 | |
camelia | No such method 'Buf' for invocant of type 'List' in block <unit> at <tmp> line 1 |
lizmat | m: .say for Buf.new(229,228,246).decode("latin1").uninames | ||
camelia | LATIN SMALL LETTER A WITH RING ABOVE LATIN SMALL LETTER A WITH DIAERESIS LATIN SMALL LETTER O WITH DIAERESIS |
lizmat | guer: looks like those 3 *are* latin-1 ? | ||
guer | Yes, I know. That was one of the reasons that I suspected the output was unicode, not Lat1. | 11:17 | |
thundergnat | It's actually remarkably difficult to get Raku to NOT treat those characters as Latin-1 characters. Raku normalizes by default. | 11:35 | |
m: put 'åäö'.comb>>.NFD>>.list>>.chrs>>.uniname>>.join(', ').join: "\t"; | 11:36 | ||
camelia | LATIN SMALL LETTER A, COMBINING RING ABOVE LATIN SMALL LETTER A, COMBINING DIAERESIS LATIN SMALL LETTER O, COMBINING DIAERESIS | ||
guer | 1. I don't understand why the above shows that the string is Latin-1. To me it is just two possible unicode ways to represent the string. | 11:45 | |
2. Maybe I wasn't clear enough from the beginning: The string I presented above *is* a utf8 string. So showing that this string without conversion is still a Unicode string confuses me. | 11:49 | ||
Thanks again for your efforts! I hope I won't bother you again in this matter. | 12:00 | ||
12:39
Nemokosch joined
13:33
discord-raku-bot left,
discord-raku-bot joined
stevied | so when testing a module, is there a good way to run unit tests on methods that are private? | 15:32 | |
I guess there's a big debate about whether it should even be done | 15:33 | ||
15:35
discord-raku-bot left
15:36
discord-raku-bot joined
16:28
guer left
16:42
Nemokosch left
looking at raku.land/zef:tony-o/Data::Dump | 22:44 | ||
says "all of these options can be overridden in the DATA_DUMP environment variable in the format: indent=4,color=false" | |||
where is the variable and how do I change it? I don't see it in %*ENV | 22:45 | ||
adding it to %*ENV doesn't make a difference | 22:48 | ||
kjp | Sorry to be rather late to the party, but that discussion 12 hours ago about utf8 and latin-1 seemed to be very confused. In particular, people seemed to be conflating "utf8" and "unicode code point". | 23:49 | |
utf8 is only identical to latin-1 (really ASCII) up to code point 127. Unicode code points are identical to latin-1 up to 255. | 23:50 | ||
Thus 229, 228 and 246 are both latin-1 and unicode code points for the same characters, but they are not the utf8 representation of anything. | 23:52 | ||
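
A minimal Raku sketch of the distinction kjp describes, assuming the source file is saved as UTF-8 (the variable name `$s` is only illustrative). It prints the code points of the string, then the bytes each encoding produces:

```raku
my $s = 'åäö';

# Code points of the string: the same numbers in Unicode and in Latin-1.
say $s.ords;                    # (229 228 246)

# Latin-1 encoding: one byte per character, equal to the code point.
say $s.encode('latin-1').list;  # (229 228 246)

# UTF-8 encoding: two bytes per character for code points 128..255.
say $s.encode('utf8').list;     # (195 165 195 164 195 182)
```

If the old API really expects Latin-1, it is presumably the Blob from `.encode('latin-1')` that has to reach it as raw bytes. Decoding that Blob again yields an ordinary Raku Str, which is Unicode internally whatever encoding it came from, so there is no separate "Latin-1 string" to obtain.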