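There was some Unicode talk in this thread, and a related mailing-list message just came in and reminded me of it, so I'm pasting it here.
Not exposing names is 2ch etiquette, I guess? Doing that just in case.
For anyone interested, here you go.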
| 2021-04-04 01:53:23 replies: Anonymous
| proposal - EMIT and non-ASCII values
`------------------------------------------
I don't think this proposal works for extended characters. While `$a4 emit` works for `ä`, it explicitly doesn't work (if my understanding is correct) for Unicode code points that take more than one byte in UTF8. You have to know, at the point of emitting, what the expected encoding is.
If we define it as UTF8 then EMIT can know that the byte is part of a multi-byte character, and hold it until it gets the next byte before passing to the operating system, but at the moment I don't believe that Forth 2012 is defined as UTF8, so a conformant system would have to emit that first byte (which I think will have its top bit set) as a character.
For webForth in C (on Arduino) I feed the characters to Serial.write, which (I think) treats them as UTF8. For webForth in JavaScript I flip it around: the base primitive is TX!S, which puts out a string. TYPE calls this directly and EMIT passes a 1-character string; TX!S just passes it to the JavaScript side, which is string oriented, and I define the stream as UTF8 encoded at initialization. This is also a LOT faster than passing characters individually to a string-oriented system anyway.
I'm not suggesting what I've done is the right solution, but I think any proposal needs reviewing by someone with a better understanding (than me, or the proposer of this) of how Unicode and UTF8 work before changes are made.
,------------------------------------------
| see:
https://forth-standard.org/proposals/emit-and-non-ascii-values
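
For what it's worth, here is a rough sketch of the "EMIT holds the byte until the rest of the character arrives" idea from the mail. It is written in Python rather than Forth just to keep it short, and it is not taken from webForth or from the proposal: `BufferingEmit`, `utf8_length` and the `write` callback are all made-up names for illustration. It assumes the output stream has been declared to be UTF-8, which is exactly the thing the mail says Forth 2012 does not currently promise.

```python
def utf8_length(lead: int) -> int:
    """Length of the UTF-8 sequence a lead byte starts, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2              # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3              # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4              # 11110xxx
    return 0                  # continuation byte or invalid lead


class BufferingEmit:
    """Hypothetical EMIT that passes only whole characters to the output."""

    def __init__(self, write):
        self.write = write            # stand-in for the OS / terminal
        self.pending = bytearray()    # bytes of the character in progress
        self.expected = 0             # total bytes that character needs

    def emit(self, byte: int) -> None:
        byte &= 0xFF
        if not self.pending:
            self.expected = utf8_length(byte)
            if self.expected == 0:
                # A stray byte with the top bit set is not a character
                # by itself under UTF-8, so show U+FFFD instead.
                self.write("\ufffd")
                return
        self.pending.append(byte)
        if len(self.pending) == self.expected:
            # Malformed sequences decode to U+FFFD rather than raising.
            self.write(self.pending.decode("utf-8", errors="replace"))
            self.pending.clear()


if __name__ == "__main__":
    out = []
    term = BufferingEmit(out.append)
    for b in "ä".encode("utf-8"):     # two bytes: 0xC3 0xA4
        term.emit(b)
    term.emit(0x41)                   # "A"
    print(out)
```

Note that under UTF-8 `ä` is the two bytes `$c3 $a4`, so in this sketch a lone `$a4` is a continuation byte and comes out as the replacement character. That is the "you have to know the expected encoding at the point of emitting" problem in miniature.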