|
01:36
daxelrod joined
12:43
wknight8111 joined
13:05
wknight8111 joined
13:36
wknight8111_ joined
14:29
cognominal joined
15:39
cognominal joined
15:54
pmichaud joined
|
|||
| tewk_ | I'm not going to make the meeting, I'll backlog like I always do. | 16:38 | |
| REPORT | |||
| * Talked with japhb about ManagedStruct definitions generation. | 16:39 | ||
| That wasn't in my original plan, but It seems very useful. | |||
| I'm going to try to fit it in instead of STDCALL and porting to OSX. | |||
| STDCALL support is a lower priority item than ManagedStruct support. | |||
| * The Parrot_jit_build_call_func is completely out of date. It's pre PCC. | |||
| It still uses hardcoded register numbers for argument and object passing. | |||
| * I've rewritten most of the argument passing portion of the build_call function and | |||
| now am starting on handling return values from c functions. | |||
| * The old generated jit called string_to_cstring to convert STRINGS to char *, but never freed the cstrings. | |||
| So I'm working on generating the necessary free calls in the jit instruction stream. | |||
| EOR | |||
| Instead of saying pre PCC, it's probably better to say pre variable register counts, when certain registers had particular calling convention purposes. | 16:43 | ||
|
16:58
jhorwitz joined
17:43
barney joined
17:45
cotto_work joined
18:06
Auzon joined
18:12
wknight8111 joined
18:24
allison joined
18:28
NotFound joined
18:29
DietCoke joined
|
|||
| DietCoke | Hello, folks. | 18:30 | |
| allison | hello, coke | ||
| wknight8111 | hello | ||
|
18:30
chromatic joined
|
|||
| NotFound | Hola | 18:31 | |
| cotto_work | hi | ||
| chromatic | morning | ||
| pmichaud | good afternoon | 18:32 | |
| barney | hi | ||
| davidfetter | óla | ||
| DietCoke | My report basically consists of fighting with svn merge for some time this week, and learning enough to drop my from-scratch attempt at tcl on PCT. Why don't we go in "hello" order. Allison? | ||
| allison | - I've got the pdd25 branch down to 3 failing test files, and I've nearly finished debugging another. | 18:33 | |
| - Spent time talking to potential Parrot sponsors. | |||
| - (Spent lots of time on OSCON.) | |||
| EOR | |||
| wknight8111 | * my $real_job = I_got_a_new_job($me++) | 18:34 | |
| * Lots of GC debugging work, some nasty segfaults to track down | |||
| * Progress slow but steady | |||
| EOR | |||
| pmichaud | (steady progress)++ | ||
| NotFound | Fixing bugs, applying patches, and working on pdb | ||
| I have two questions | 18:35 | ||
| EOR | |||
| chromatic | Traveling; not a lot of time. | 18:36 | |
| Helped Allison debug one of the remaining pdd25cx problems. | |||
| Giving Andrew as much help as possible. | |||
| wknight8111 | chromatic++ | 18:37 | |
| chromatic | Working on some things that weren't appropriate to land before the release. | ||
| Will branch for the strings PDD shortly; going to pull in NotFound for that. | |||
| EOR | |||
|
18:37
jonathan joined
|
|||
| cotto_work | mostly PMC-related bugfixes and closed tickets | 18:37 | |
| queue 1 question | |||
| eor | 18:38 | ||
| pmichaud | I worked mostly on lexical issues this week, trying to understand them and coming up with a way that Parrot can handle them properly | ||
| I also tracked down the PGE bugs in the pdd25cx branch -- that appears to be a register alligator bug | |||
| did a little more work on getting HLL to work with PCT -- several of chromatic++'s fixes this past week help there | |||
| other than that, no formal report this week -- busy with $otherjob stuff and preparing for lots of trips | 18:39 | ||
| wknight8111 | (register alligator)++ | ||
| spinclad | (alligator bug)-- | ||
| barney | Simplified and extended the Pipp grammar. | ||
| Started support for $this in Pipp. | |||
| Better support for quoted strings in Pipp. | |||
| Fixed bug with constant table. | |||
| Added some languages to Perl::Critic testing. | |||
| Released Parrot 0.6.4. | |||
| Registered for YAPC::EU | 18:40 | ||
| .eor | |||
| DietCoke | davidfetter? If not, particle. | 18:42 | |
| davidfetter | particle | ||
| davidfetter slacker | |||
| particle | ~ tewk pasted his report earlier, see the logs. | ||
| ~ meetings to discuss smoking parrot with ms osl keep getting canceled. hope third time is a charm (post-oscon) | |||
| ~ parrot foundation setup continues (banking issues, website, donation software, charitable org listing, etc) | |||
| ~ talking to some potential parrot foundation donors tonight | |||
| ~ need to finalize plans for oscon travel, and quickly! | |||
| .end | |||
| DietCoke | Anyone else? | 18:43 | |
| jonathan | Mee! | ||
| This week... | |||
| * Nearly forgot Parrot Sketch! :-O | |||
| * Spent my time on my Rakudo day on Thursday mostly on implementing Perl 6 enums; the main bits are done now. | |||
| * Figured out along the way that anonymous classes should be relatively easy-ish | |||
| * Tried to contribute to the lexicals discussion a bit, though was struggling for brain cycles | |||
| * Other random odds and ends too; think I fixed a segfault due to an off-by-one...for some reason my brain is all hazy at the moment. | |||
| * Will do Rakudo day on Friday this week, since it fits best | |||
| EOR | |||
| jonathan needs to be afk for a bit now - sorry | |||
| DietCoke | heh. Anyone else? | 18:44 | |
| ok. I think there were 2 folks with questions. | 18:45 | ||
| NotFound? | |||
| NotFound | First question is simple: new name for pdb. Last proposal is pbc_debug | ||
| chromatic | +1 | ||
| allison | bytecode_debug | ||
| chromatic | Whose bytecode? | ||
| pmichaud | pbc_debug +1 | ||
| allison | parrot_bytecode_debug | 18:46 | |
| pbc is cryptic | |||
| pmichaud | parrot_bdb :-) | ||
| allison | parrot_debug | ||
| pmichaud | pbcdb | ||
| NotFound | parrot_debug++ | ||
| particle | parrot_debugger++ or parrot_debug++ | 18:47 | |
| barney | also: pdump -> pbc_dump ?? | ||
| allison | in theory, it can also debug pasm or pir, so parrot_ makes sense | ||
| NotFound | (I thinked it was simple) | ||
| allison | I like parrot_debugger | ||
| pmichaud | parrot_debugger +1 | 18:48 | |
| NotFound | parrot_debugger +1 | ||
| allison | parrot_debugger +2 | ||
| NotFound | parrot_debugger wins! | ||
| DietCoke | in that case, please update the other executable we just renamed to pbc_foo to parrot_foo. =-) | 18:49 | |
| allison | barney: pdump -> parrot_dump? | ||
| pmichaud | depends on the other executable. If it's only for bytecode files, then it should probably remain 'pbc' | ||
| DietCoke | pmichaud: the debugger is only for pbc. =-) | ||
| NotFound | Second question is about literal strings in pir. The spec says they can only contain ascii chars, but there are tests with iso-8859-1 and utf8. | ||
| allison | or parrot_ | ||
| barney agrees with pmichaud | |||
| allison | parrot_bytecode_, I meant to type | 18:50 | |
| DietCoke | NotFound: are those prefixed with unicode:'' or something similar? | ||
| allison | NotFound: that depends on the string metadata | ||
| barney | parrot_bytecode_ ++ | ||
| NotFound | And also, it is not clear if the charset and encoding prefix are intended for the string generated or the contents of the literal. | ||
| pmichaud | the literal strings in PIR are always specified using ASCII | 18:51 | |
| what they encode may be iso8859-1 or utf8 | |||
| unicode:"\\xbb" legal | |||
| NotFound | pmichaud: that's the way I understand the spec, but current test fails to meet it. | ||
| pmichaud | "«" not legal | 18:52 | |
| allison | NotFound: which test? | ||
| NotFound | Forgot my notes, one second... | ||
| DietCoke | pmichaud: not according to the PDD. | ||
| "docs/pdds/draft/pdd19_pir.pod" 1295 lines --14%-- 194,16 13% | |||
| allison | The PDD is more advanced than the current implementation | ||
| pmichaud | I'm speaking only of PDD, myself | 18:53 | |
| allison | so there are really two levels of answer here: what does it do now, and what should it do? | ||
| DietCoke | ok. the PDD shows a non-ascii example. | ||
| pmichaud | DietCoke: you're correct, I had not seen pdd19_pir.pod:194 | ||
|
18:53
wknight8111_ joined
|
|||
| allison | (and we may be dealing with an inconsistently updated spec, here) | 18:54 | |
| NotFound | t/op/stringu/t | 18:55 | |
| t/op/stringu.t | |||
| allison | NotFound: where does the spec say that literal strings can only be in ASCII? | ||
| pmichaud | line 129 | 18:57 | |
| NotFound | "Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding in the way below." | ||
| DietCoke | ... and then it goes on to show you below how to use something else. =-) | ||
| pmichaud | NotFound: which test in t/op/stringu.t contains a non-ascii char? | ||
| DietCoke | I'd add some verbiage like "unless you specify otherwise, as described below" | 18:58 | |
| allison | line 129 of PDD 28? | ||
| DietCoke | pdd 19 | ||
| pmichaud | line 129 of pdd19 | ||
| allison | oh, that's easy, PDD 19 is wrong | ||
| NotFound | DietCoke: the description below is how to escape it, not how to write it directly. | ||
| allison | it's still in draft, you can't take it as authoritative yet | ||
| DietCoke | NotFound: encoding != escaping. | 18:59 | |
| pmichaud | NotFound: which test in t/op/stringu.t contains a non-ascii char inside the "..." ? | ||
| allison | pmichaud: test named "UTF8 as malformed ascii" | ||
| pmichaud | well, yes, that's testing that it's in fact an error | ||
| barney | line 197 of pdd19 ? | ||
| allison | and "UTF8 literals" | 19:00 | |
| NotFound | "UTF8 literals" also | ||
| pmichaud | UTF8 literals I agree with, although that one matches the example given on pdd19:194 | ||
| allison | pmichaud: it's testing that it's an error when you specify ASCII encoding, but allowed as a string literal | ||
| spinclad | t/op/stringu.t:187ff | ||
| pmichaud | so, what's the question again? (I think the answer is simply that pdd19 needs clarification.) | 19:01 | |
| allison | remember, PDD 19 was pulled together from a pile of old documentation | ||
| NotFound | So whay is the intention of the spec? They must always be escaped or not? | ||
| what | |||
| DietCoke | no. | ||
| spinclad | :200ff disagrees: 'no escapes' | ||
| (disagrees with pdd) | 19:02 | ||
| pmichaud | if an encoding is given, no escaping. | ||
| barney | Tests and spec seem to be in line. Question is whether the spec is correct. | ||
| pmichaud | if no encoding is given, then the contents of the "..." must be 7-bit ascii | ||
| allison | hang on... editing the text now | 19:03 | |
| pmichaud | note that unicode: is not an encoding, so unicode:"«" is not valid, although utf8:unicode:"«" is. | ||
| DietCoke | Folks, I have to run. I do hope we can agree to name our executables in some sort of consistent fashion. cotto_work still has a question when this question is resolved. | 19:04 | |
| See folks next week. | |||
| cotto_work | bye | ||
| NotFound | And the second part is: when unicode: and no encoding is specified, the default utf8 is applicable to the generated string only, or to the escapes in the content also? | ||
| allison | edited result "The default encoding for a double-quoted string constant is 7-bit | ||
| ASCII, other character sets and encodings must be marked explicitly using a | |||
| charset or encoding flag." | 19:05 | ||
| pmichaud | I think we need to also make it clear that 7-bit ascii is required when the encoding is not | ||
| allison | well, it's not exactly "required" | 19:06 | |
| NotFound | I think 7-bit ascii is redundant. | ||
| "Ascii extended" is not ascii. | |||
| pmichaud | allison: how will the compiler know how to process the bytes in the "..." if the encoding isn't known? | ||
| allison | it throws an exception | 19:07 | |
| that was what the second test was checking | |||
| pmichaud | argggh. isn't "throw an exception" equivalent to "didn't meet the requirement?" | ||
| allison | but, you can enter escaped characters | ||
| pmichaud | ...because an encoding wasn't given? | 19:08 | |
| allison | if you enter characters that aren't ascii, it'll treat them as ascii | ||
| if an encoding isn't given, it's the same as if you specified an encoding of "ascii:" | |||
| exactly the same | |||
| pmichaud | ascii or fixed8? | ||
| allison | ascii | ||
| (at least, that's what it was) | 19:09 | ||
| pmichaud | so, we treat "ascii" as specifying both an encoding and a charset? | ||
| NotFound | I think that not allowing any non ascii char will be a cleaner way. | ||
| Compilers can explicitly say what they intend when generating pir. | |||
| pmichaud | ascii:"\\xab" throws an exception? | ||
| or is it "backslash, x, a, b" ? | 19:10 | ||
| allison | throws an exception | ||
| '\\xab' (single quotes) is backslash, x, a, b | 19:11 | ||
| pmichaud | ascii:"\\x0d" is a newline? | ||
| NotFound | pmichaud: no, is cr | ||
| pmichaud | sorry, cr | 19:12 | |
| allison | pmichaud: should be | ||
| spinclad | utf8:unicode:"\\x0d" is a newline? | ||
| pmichaud | ucs2:"\\x0d" is a newline? | ||
| (or do we decide not to support ucs2?) | |||
| allison | it's only characters outside the ASCII range that throw an exception when the string is ASCII | 19:13 | |
| spinclad | s/newline/cr/ | ||
| chromatic | By ASCII, you mean 7 bits? | ||
| pmichaud | my point is that for some encodings we can't always decide if a backslash is an escape or part of the character being encoded | ||
| allison | NotFound: (I'm leaving the 7-bit in the PDD, because people always have that question) | ||
| pmichaud | that's why lines 196-197 say that escapes are not honored when the encoding is specified | 19:14 | |
| allison | pmichaud: a backslash is always an escape in a double-quoted string | ||
| NotFound | allison: agree, but mentioning it one time in a note will be enough. | ||
| pmichaud | allison: fair enough; we then need to remove the mention that escape sequences are not honored when an encoding is specified. | 19:15 | |
| and we don't support encodings where backslash may be a valid byte | |||
| allison | in the current implementation, backslashes are honored | ||
| even when another encoding is specified | 19:16 | ||
| pmichaud | fair enough -- again, I've been restricting myself to spec. | ||
| allison | which is more useful? | ||
| pmichaud | good question. | ||
| allison | consistency is valuable | ||
| pmichaud | I think it's more useful to always restrict the "..." to ascii chars, personally -- with everything else escaped. | ||
| NotFound | I agree. | ||
| allison | that's excessive | ||
| that means you can never directly type a UTF 8 string | 19:17 | ||
| pmichaud | in PIR code? | ||
| NotFound | allison: if not, we must take into account a lot of things, and we complicate the parsing. | ||
| allison | can never pass a UTF 8 string in from an HLL parser | ||
| pmichaud | does it matter for PIR? | ||
| we pass UTF-8 strings in all the time -- they get encoded by PCT | |||
| allison | it's also an unnecessary restriction | ||
| the strings are just a series of bytes | |||
| NotFound | allison: yes, I think the HLL must be clear about his intention when generating pir. | 19:18 | |
| allison | there's no reason to restrict which bytes | ||
| pmichaud | besides, aren't we moving _away_ from UTF8? | ||
| (I know we'll always support it, but internally the strings will be something else...?) | |||
| allison | the restriction enters in from how you specify the encoding | ||
| NotFound | allison: that is not what the docs say about complete unicode support. | ||
| allison | pmichaud: internally strings will always be stored in whatever their natural encoding is | 19:19 | |
| strings are just a blob of data | |||
| pmichaud | so then an HLL will have to specify that encoding as part of the string constant anyway, yes? | ||
| sorry, string literal | |||
| allison | how you read that data depends on the encoding and character set | ||
| wknight8111 | ...except in my GC, where strings are apparently always stored as a segfault | ||
| allison | wknight8111: heh :) | ||
| string literals are fundamentally the same as regular strings, but not modifiable | 19:20 | ||
| pmichaud | if my HLL has a utf-8 string, it needs to either (1) indicate in the PIR that the string is encoded at utf8, or (2) escape the non-ASCII chars | ||
| s/at/as/ | |||
| it can't just stick the UTF-8 string inside of a pair of double quotes and expect PIR to know what to do with it | 19:21 | ||
| (unless PIR is specified as defaulting to utf8) | |||
| allison | pmichaud: no, you have to specify an encoding | ||
| wknight8111 | pir should understand utf8:"a utf8 string here" | ||
| pmichaud | what if my encoding has another meaning for backslash? | ||
| allison | if you specify no encoding or character set, parrot treats it as an ASCII string | ||
| pmichaud | anyway, I'll stop here -- no matter what Parrot does the HLL tools will be able to work with it. | 19:22 | |
| allison | pmichaud: that's a good point | ||
| pmichaud | it just seems inconsistent that we allow ambiguous bytes in the string | 19:23 | |
| NotFound | And if utf16 or ucs2 is specified the string can contain 16 or 32 bit encoded unicode chars inside a 8-bit encoded file? That can be a nightmare for text editors. | ||
| allison | for now, I'm not modifying PDD 19 where it says that specifying an encoding stops parrot from processing backslashes in strings | ||
| pmichaud | so unicode:"«" is an error? | ||
| allison | no | ||
| pmichaud | sorry, I said that wrong | ||
| fixed8:"\\x0a" is a 4-character string? | |||
| allison | but utf8:unicode:"\\uwhatever" is an error | 19:24 | |
| pmichaud | no, not an error -- it should be backslash+u+whatever | ||
| allison | (not an error, but the backslash isn't treated as special) | ||
| yes | |||
| NotFound | So the encoding part defines both the literal interpretation and the generated string? | 19:25 | |
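
Editor's sketch of the default rule as allison states it above: a literal with no encoding or charset flag is treated as ASCII, and only characters outside the 7-bit range raise an error. This is plain Python for illustration, not Parrot or IMCC code; the function name `check_default_literal` is hypothetical, and escape processing is assumed to have already happened before the check.

```python
def check_default_literal(body: str) -> bytes:
    """Return the bytes of an unflagged literal, or raise if it is not 7-bit ASCII.

    `body` is assumed to be the text between the double quotes after any
    escapes have been expanded (a hypothetical earlier step).
    """
    for index, char in enumerate(body):
        if ord(char) > 0x7F:
            raise ValueError(
                f"non-ASCII character {char!r} at offset {index}; "
                "an encoding or charset flag would be needed"
            )
    return body.encode("ascii")


# A carriage return (the \x0d case discussed above) is inside the 7-bit range.
cr_bytes = check_default_literal("\x0d")

# A character such as U+00AB is outside the range and is rejected.
try:
    check_default_literal("\u00ab")
    rejected = False
except ValueError:
    rejected = True
```

Under this reading, the "UTF8 as malformed ascii" test corresponds to the rejected branch. Whether a flagged literal should also process escapes is the part the discussion leaves open.
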
|
19:26
Auzon left
|
|||
| pmichaud | NotFound: that's the way I interpret it. | 19:26 | |
| it does mean that we can't specify, say, ucs2 literals | |||
| allison | NotFound: to be specific, the encoding and charset flags on a literal string specify the metadata on the literal string | ||
| anything that the literal string is assigned to, adopts that metadata from the literal string | 19:27 | ||
| NotFound | allison: that looks inconsistent to me. Utf8 must be literal but ucs2 must be always escaped. | 19:28 | |
| pmichaud | (not that it matters to me that we can't specify ucs2 literals :-) | ||
| allison | ? | ||
| ah, we just need to add a ucs2 encoding flag | 19:29 | ||
| if we intend it to be used on any regular basis | |||
| pmichaud | there's no way to encode ucs2 literals containing double quotes | ||
| NotFound | There is no sane way to encode 16 or 32 bit chars in an 8 bit text file, except escaping. | 19:30 | |
| pmichaud | (even with a ucs2 encoding flag.) | ||
| spinclad | pmichaud: ucs2:'"' | ||
| pmichaud | spinclad, okay, a string with both single and double quotes, then :-) | ||
| spinclad | ok | ||
| pmichaud | also I can't see your null byte in there. | 19:31 | |
| pmichaud looks carefully. | |||
| NotFound | And don't even talk about allowing ucs2 pir source files. | ||
| pmichaud | I'll assume it's there. :-) | ||
| spinclad | no null byte. counted string | ||
| allison | a) we would have to introduce a new quoting syntax, and b) presumably, if you're working with 16 or 32 bit chars, you aren't doing it in an 8 bit text file | ||
| pmichaud | spinclad: in ucs2, a double quote is \\x00\\x22 | ||
| allison | but, really, ucs2 is not a high priority | ||
| particle | in win32, all files are stored as ucs2 by the os | 19:32 | |
| allison | as long as there is some way to create ucs2 strings, we can call it good | ||
| pmichaud | I'm speaking particularly of ucs2 literals | ||
| allison | pmichaud: if we have a demand for ucs2 literals that can't use escapes, we can do the work to add them | 19:33 | |
| spinclad | ok, ucs2:'<box>' | ||
| pmichaud | allison: I'm saying that the spec should allow escapes | 19:34 | |
| allison | should allow escapes in literal strings that specify an encoding? | ||
| pmichaud | and not allow oddly-encoded strings in PIR source | ||
| NotFound | Then we must allow them in utf8, for consistency. | ||
| pmichaud | or we can choose to be explicitly inconsistent | ||
| I have no trouble with that, fwiw | 19:35 | ||
| allison | more accurately, when an encoding is specified, it should have the metadata to declare whether its strings process escapes | ||
| pmichaud | I have no problem with saying that utf8:"..." allows utf8 encoded stuff inside the quotes, *and* processes escapes. | ||
| NotFound | I think the clean way is to always escape any non ascii character. | ||
| pmichaud | the only reliable way to represent any generic ucs2 literal is if we allow escapes. | 19:36 | |
| or if we separate the PIR encoding from the resulting literal | |||
| allison | okay, the answer for now | 19:37 | |
| pmichaud | (which is effectively what escapes do, but escaping every character is a bit much, I agree.) | ||
| allison | we allow escapes and non-ascii characters in double-quoted strings | ||
| double-quoted strings are just blobs of data | |||
| NotFound | pmichaud: allowing mixing complicates the parsing for no real gain, IMO. | ||
| pmichaud | NotFound: I'm not worried about the parsing as much as I am the result | 19:38 | |
| I'd much rather be able to produce my constant string in the .pbc output directly than to have to have transcode operations at runtime because there wasn't a way to do it in the PIR originally. | |||
| the transcode operations produce extra GC-able elements, which is bad. | |||
| allison | the encoding and character set determine how the resulting data is treated | ||
| NotFound | pmichaud: but generating any encoding wanted is not a problem, if the specs clearly states what is. | 19:39 | |
| allison | I think every one in the conversation has switched between all three of the positions during this conversation, so we'll have to call that good | 19:40 | |
| done | |||
| pmichaud | NotFound: right now the encoding specifies both the interpretation of the double-quoted string and the encoding of the resulting string. But there are some encodings that we cannot represent in a double-quoted string without having an escaping mechanism. | ||
| NotFound | allison: there is a remaining problem: if unicode: is specified, how are the escapes interpreted? As 8-bit chars that form utf8, or as unicode code points? | ||
| pmichaud | unicode is not an encoding | ||
| so it's a normal double-quoted string, where escapes are honored. | 19:41 | ||
| NotFound | Not, but the spec says that default is utf8. | ||
| pmichaud | if there are any non-ASCII characters in the double quotes, they would need to be utf8 | ||
| barney | An easy question: Should \\" be added to line 186 of PDD19 | ||
| ? | |||
| NotFound | But the doubt is how to interpret the escaped ones. | 19:42 | |
| pmichaud | I have no doubt about how to interpret the escaped ones | ||
| (for utf8) | |||
| unicode:"\\xaa" | |||
| unicode:"«" | |||
| are the same. | |||
| barney: Is \\" processed as an escape? | 19:43 | ||
| allison | barney: yes, \\" works in double quoted strings | ||
| pmichaud | it doesn't in the current implementation | ||
| allison | (it's absolutely critical, otherwise you can't enter a quote in a double-quoted string) | ||
| pmichaud | yes, you can enter a quote in the double quoted string, it's \\x22 | ||
| allison | fair enough | 19:44 | |
| NotFound | pmichaud: is reasonably, but the spec is not clear enough about that, IMO. | ||
| allison | NotFound: pdd 19 or pdd 28? | ||
| pmichaud | NotFound: I don't disagree that the spec is unclear. I'm just saying that it's possible for us to have utf-8 encoding and escapes in a single string w/o it being ambiguous | ||
| NotFound | allison: 19 | 19:45 | |
| allison | pdd 19 is certainly not clear yet | ||
| pmichaud | allison: we're only talking about pdd19 here. I don't think pdd28 specifies anything about PIR representation of literals | ||
| barney | \\" Is speced in line 129 of PDD19 | ||
| pmichaud | barney: aha. okay. | ||
| the current implementation doesn't allow \\" | |||
| allison | barney: added to the spec | 19:46 | |
| pmichaud | sorry, I'm wrong, I typoed | ||
| ignore me. | |||
| pmichaud-- | |||
| barney | Is there a way to have a single quote in a single quoted string? | ||
| pmichaud | \\" works now. Yes, it should be added to 186 of pdd19. | ||
| NotFound | The problem I see with this approach is that a generated pir that contains both utf8 and iso-8859-1 unescaped characters is not good for the sanity of anyone using a text editor not written specifically to handle pir source. | ||
| pmichaud | (1) if someone is editing generated pir, they need to be able to handle it | 19:47 | |
| (2) all of PCT's string generation in PIR converts non-ASCII to escapes. But just because PCT does it that way doesn't mean that we always want it to do so that way | |||
| (3) If someone has string literals in a non-western language, I don't know that I want the generated PIR to always be a bunch of escape characters. It would make sense to allow the utf8 directly in the string literals. | 19:48 | ||
| (e.g., chinese) | |||
| allison | okay, the escapes are not persistent in the string | ||
| they're only a way of representing a character that can't otherwise be typed | 19:49 | ||
| pmichaud | ...or parsed by PIR. | ||
| chromatic | Don't we need some sort of BOM or encoding marker at the start of the PIR file then? | ||
| allison | as soon as that literal is read into anything, there is no difference between the escape and the utf8 character | ||
| pmichaud | ...isn't it "as soon as the literal is compiled, there is no difference..."? | ||
| NotFound | pmichaud: I also finds nice to be able to write my own name 'Julián' in pir, but not sure it pays the price of support all that. | ||
| spinclad | .oO { do we need a BOM at the start of a ucs: string? } | ||
| particle | .pragma encoding utf8 ?? | ||
| pmichaud | surely PIR doesn't store the escape sequences in the literals it produces. | 19:50 | |
| allison | pmichaud: basically, there's only a difference in the source file | ||
| pmichaud | right. | ||
| chromatic | How do we expect a random text editor to parse .pragma encoding utf8 ? | ||
| pmichaud | so unicode:"«" and unicode:"\\xab" would produce exactly the same result. | 19:51 | |
| even down to being the same .pbc output. | |||
| allison | pmichaud: exactly | ||
| particle | bom is also ball of mud | ||
| NotFound | So unicode:"\\xab" and utf8::unicode:"\\xab" is also the same result? | 19:52 | |
| So unicode:"\\xab" and utf8:unicode:"\\xab" is also the same result? | |||
| pmichaud | I don't see a problem with that for utf8 | ||
| NotFound | No problem, just wants to be clear about that. | ||
| allison | NotFound: yes | ||
| consistency++ | 19:53 | ||
| wknight8111 | consistency++ | ||
| pmichaud | we'll have to figure out something to do for ucs2 | ||
| and personally | |||
| NotFound | consistency++ | ||
| pmichaud | I'd prefer it if unicode:"..." accepted utf8 strings in the PIR text but produced Parrot's default internal representation for the constant | 19:54 | |
| (i.e., the one in pdd28) | |||
| wknight8111 | couldn't parrot just parse ucs2: as utf16:? | ||
| allison | parrot doesn't have a default internal representation | ||
| NotFound | I think ucs2 or utf816 literals must be forbidden, at least in 8 bit encoded source files. | ||
| chromatic | Agreed. | 19:55 | |
| allison | (the default internal representation was an idea from an earlier draft that didn't make it in the final cut) | ||
| NotFound | I mean utf16 | ||
| barney | What is the specific problem of ucs2 ? | ||
| s/of/with/ | 19:56 | ||
| pmichaud | we're not doing NFG? | ||
| NotFound | barney: the problem I see is that many people confuses it with utf16. | ||
| allison | not as a universal standard, no. NFG is just another additional encoding/charset | ||
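
Editor's sketch of two points from the exchange above, in plain Python rather than PIR. First, if a hex escape is read as a code point (the interpretation pmichaud and allison agree on for unicode: literals), the escaped and directly typed forms become the same string and the same UTF-8 bytes. Second, the two-byte encoding of a double quote contains both a zero byte and the 0x22 byte, which is why pmichaud says raw ucs2 literals are awkward in an 8-bit source file. The helper `expand_hex_escapes` is hypothetical, and Python's `utf-16-be` codec stands in for ucs2 here, which is only valid for characters in the Basic Multilingual Plane.

```python
import re

# Matches a backslash, 'x', and exactly two hex digits in source text.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def expand_hex_escapes(source_text: str) -> str:
    """Replace each two-digit hex escape with the code point it names."""
    return _HEX_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), source_text)


# The escaped form as it would appear in source: backslash, x, a, b.
escaped_form = expand_hex_escapes(r"\xab")
# The directly typed form: the single character U+00AB.
literal_form = "\u00ab"

same_string = escaped_form == literal_form
same_bytes = escaped_form.encode("utf-8") == literal_form.encode("utf-8")

# A double quote in a two-byte big-endian encoding: a zero byte, then 0x22.
ucs2_quote = '"'.encode("utf-16-be")
contains_raw_quote_byte = b'"' in ucs2_quote
```

A byte-oriented scanner looking for the closing quote would stop at that 0x22 byte, so such literals would need either escapes or a different quoting syntax, as discussed above.
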
|
19:56
cotto_work left
19:57
cotto_work joined
|
|||
| pmichaud | since (for speed reasons) I'm going to be converting a lot of things into NFG, there's no way for me to specify a NFG literal without escaping everything? | 19:57 | |
| allison | the thing about string data, is you want to avoid transforming it whenever you can | ||
|
19:57
coke joined
|
|||
| allison | escaping everything won't specify an NFG literal | 19:57 | |
| NFG is just a storage format | 19:58 | ||
| pmichaud | okay, how do I specify an NFG literal? | ||
| DietCoke | ... wow. haven't even gotten to cotto's question, have ya. =-) | ||
| cotto_work | no | ||
| pmichaud | or do my literals always get transcoded at runtime? | ||
| or...? | |||
| spinclad | 'ball of mud' | ||
| DietCoke | Ok. Don't forget cotto. heading back out. =-) | ||
| allison | (trying to decide if it's an encoding or charset flag) | 19:59 | |
| NotFound | DietCoke: sorry, I imagined this has to be a long discussion, but I think is important to clarify this issues. | ||
| allison | ... it's an encoding flag | ||
| pmichaud | pdd28 says that nfg is always unicode codepoints | ||
| allison | nfg: | ||
| yes, but they're stored differently (encoded differently) | 20:00 | ||
| particle | can we interrupt this endless discussion to give cotto his time, so he can get on with life? | ||
| allison | yes | ||
| NotFound | particle: no problem | ||
| pmichaud | cotto: still around? | ||
| cotto_work | yes | ||
| pmichaud | still have a question? | ||
| cotto_work | yes. It should be a quick one. | ||
| The Array PMC's freeze/thaw/visit functions are broken. Are they worth fixing or should that rt be rejected? | 20:01 | ||
| pmichaud | (suggestion for string encoding: allison is undoubtedly busy with oscon, and I don't think string parsing is a pressing issue. Can we save it until the post-oscon hackathon?) | ||
| allison | cotto_work: they are worth fixing | ||
| cotto_work | thanks. | ||
| particle | the one thing worth saving in Array pmc as far as i'm concerned is the sparse storage | ||
| NotFound | The urgent questions have been answered, the other can be delayed. | ||
| allison | pmichaud: also, a good bit will be worked out as we implement the strings PDD | 20:02 | |
| particle | if that can be rolled into fixed/resizable pmc variants, maybe Array can go away | ||
| pmichaud | okay. allison and I can review string literals and encodings wrt nfg at the oscon hackathon. and yes, string pdd implementation will add more useful information | ||
| cotto_work | particle, you mean sparseness? | ||
| particle | yes | ||
| pmichaud | I just want to put a hook in that it would be good to have a way to specify literals in PIR that go directly to NFG without requiring an explicit transcode step at runtime. | ||
| NotFound | That is the reason why I asked, we can't sanely work in strings without some clarity in this points. | 20:03 | |
| pmichaud | I will shut up now until cotto's question is finished. | ||
| NotFound | But as I said, the urgent ones had been cleared. | ||
| allison | I need to review the PIR PDD and launch it out of draft. That'll likely be my hackathon task (including some string conversation with pmichaud). | ||
| cotto: is your question answered? | 20:05 | ||
| NotFound | I win the price for the longer first question? ;) | ||
| cotto_work | if sparseness is the only thing worth preserving about the Array, would it be better to make the other Array types sparse? | 20:06 | |
| spinclad | NotFound: i give you an hour of my life as a prize. | ||
| allison | NotFound: you win the prize :) | ||
| NotFound | (No matter it really was the second) | ||
| allison | cotto_work: potentially, yes | ||
| cotto_work: though, it's still worth making freeze/thaw/visit work | 20:07 | ||
| cotto_work | meaning "if someone can find the tuits"? | ||
| barney | SparseResizablePMCArray | ||
| cotto_work | ok. I can see how freeze/thaw/visit would be a step in the right direction | 20:08 | |
| eoq | |||
| allison | cotto_work: yes, if someone has time. it's not wasted, because they'll have to work for whatever sparse Array results | ||
| okay, any other questions before we go? | 20:09 | ||
| chromatic | Where shall we have lunch? | ||
| pmichaud | I will miss parrotsketch next week. | ||
| chromatic | Technically, that wasn't a question. | ||
| pmichaud | (are we having parrotsketch next week?) | 20:10 | |
| allison | should we skip parrotsketch next week for OSCON? | ||
| chromatic | Probably. | ||
| allison | then yes, no parrotsketch next week | ||
| we'll resume on July 29th | |||
| thanks everybody! | 20:11 | ||
| EOPS | |||
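
Editor's sketch of what "sparse storage" means in the Array discussion above: only the assigned slots take memory, while the logical length still covers the gaps. This is an illustrative Python class, not the Array PMC or any proposed Parrot design; the name `SparseArray` and its methods are hypothetical.

```python
class SparseArray:
    """Resizable array that stores only the slots that were assigned."""

    def __init__(self):
        self._slots = {}   # index -> value, for assigned slots only
        self._length = 0   # logical length, including unassigned gaps

    def __len__(self):
        return self._length

    def __setitem__(self, index, value):
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        self._slots[index] = value
        # Assigning past the end grows the logical length without filling gaps.
        self._length = max(self._length, index + 1)

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError("index out of range")
        # Unassigned slots inside the logical length read as None.
        return self._slots.get(index)

    def stored_count(self):
        """Number of slots that actually hold a value."""
        return len(self._slots)


example = SparseArray()
example[1000] = "last"
```

After the single assignment, the array has a logical length of 1001 but holds one stored slot. A freeze/thaw step for such a structure would only need to visit the stored slots plus the length.
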
|
20:11
pmichaud left,
cotto_work left,
NotFound left
20:12
allison left
20:13
jonathan left,
chromatic left
|
|||