From: Rob Pike <robpike@gmail.com>
Date: Mon, 20 Mar 2023 07:27:34 +1100
To: Ralph Corderoy <ralph@inputplus.co.uk>
Message-ID-Hash: 4CIQUL4LMERBXU76GRUDZRM6KNK7CKIJ
CC: tuhs@tuhs.org
Subject: [TUHS] Re: Bell Foreign-Language UNIX Efforts
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/message/4CIQUL4LMERBXU76GRUDZRM6KNK7CKIJ/>

As my mail quoted in https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt says, Ken worked out a new packing that avoided all the problems with the existing ones. He didn't alter Prosser's encoding. UTF-8, as it was later called, was not based on anything but it was deeply informed by a couple of years of work coming to grips with the problem of programming with multibyte characters. What Prosser did do, and what we - all of us - are very grateful for, is start the conversation about replacing UTF with something practical.

(Speaking of design by committee, the multibyte stuff in C89 was atrocious, and I heard was done in committee to get someone, perhaps the Japanese, to sign off.)

Regarding Windows, Nathan Myhrvold visited Bell Labs around this time, and we tried to talk to him about this, but he wasn't interested, claiming they had it all worked out. We later learned what he meant, and lamented. Not the only time someone wasn't open to hear an idea that might be worth hearing, but an educational one.

It's important historically to understand how all the forces came together that day. The world was ready for a solution to international text, the proposed character set was acceptable to most but the ASCII compatibility issues were unbearable, the proposed solution to that was noxious, various committees were starting to solve the problem in committee, leading to technical briefs of varying quality, none right, and somehow a phone call was made one afternoon to a couple of people who had been thinking and working these issues for ages, one of whom was a genius. And it all worked out, which is truly unusual.

-rob
Wed Mar 22 21:04:05 EDT 2023
From: Rob Pike <robpike@gmail.com>
Date: Mon, 20 Mar 2023 20:22:52 +1100
To: arnold@skeeve.com
Message-ID-Hash: EZLOHBOMUAMC342PE5OEL7QFZ6VBDRV6
CC: tuhs@tuhs.org
Subject: [TUHS] Re: Bell Foreign-Language UNIX Efforts
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/message/EZLOHBOMUAMC342PE5OEL7QFZ6VBDRV6/>

Exactly the way we did it in Plan 9, and published in the paper cited earlier. In fact, it's possible the library work was done as early as 1989, but I'm not sure. Certainly by 1990.

-rob

On Mon, Mar 20, 2023 at 6:55 PM <arnold@skeeve.com> wrote:

> Hi Rob.
>
> Rob Pike <robpike@gmail.com> wrote:
>
> > (Speaking of design by committee, the multibyte stuff in C89 was atrocious,
> > and I heard was done in committee to get someone, perhaps the Japanese, to
> > sign off.)
>
> It's not lovely, but I wouldn't call it atrocious. It gets the job
> done; code using it can handle multibyte encodings while being totally
> character-set agnostic. I speak from experience, gawk does this.
> (I use the "restartable" routines - mbrlen() and so on.)
>
> I understand that Unicode + UTF-8 solve the issue completely. But I'd
> like to ask, in all seriousness and so that I can learn, given the world
> as it was in 1989, how would you solve the problem? If you had designed
> the C level routines, what would they have looked like?
>
> Thanks,
>
> Arnold
Wed Mar 22 21:03:28 EDT 2023
From: Rob Pike <robpike@gmail.com>
Date: Wed, 22 Mar 2023 23:02:33 +1100
To: Skip Tavakkolian <fariborz.t@gmail.com>
Message-ID-Hash: TQWQ4V4ZNSW4PDXPM3HHXZGAYYNNPTV2
CC: tuhs@tuhs.org
Subject: [TUHS] Re: Bell Foreign-Language UNIX Efforts
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/message/TQWQ4V4ZNSW4PDXPM3HHXZGAYYNNPTV2/>

The appendix version named it plain UTF, repurposing the extant name to the new encoding. The -8 came later, as it is in these linked documents, because some people wanted a UTF-7 and a UTF-16. Those people should be punished.

-rob

On Wed, Mar 22, 2023 at 9:09 PM Skip Tavakkolian <fariborz.t@gmail.com> wrote:

> Also here:
> https://github.com/0intro/plan9/tree/main/sys/doc
>
> On Wed, Mar 22, 2023, 3:02 AM Skip Tavakkolian <fariborz.t@gmail.com> wrote:
>
>> http://p9f.org/sys/doc/utf.ps
>>
>> On Wed, Mar 22, 2023, 12:41 AM <arnold@skeeve.com> wrote:
>>
>>> Thanks. Is there a link to postscript or pdf of the paper? I undoubtedly
>>> read it decades ago, but I doubt that I have it handy.
>>>
>>> Thanks,
>>>
>>> Arnold
>>>
>>> Rob Pike <robpike@gmail.com> wrote:
>>>
>>> > Pretty much, as it was the Plan 9 UTF man page at the time. This link will
>>> > be essentially the same.
>>> >
>>> > http://man.cat-v.org/plan_9/6/utf
>>> >
>>> > -rob
>>> >
>>> > On Wed, Mar 22, 2023 at 6:12 PM Mehdi Sadeghi <mehdi@mehdix.org> wrote:
>>> >
>>> > > It's a long shot but is that appendix around by any chance?
>>> > >
>>> > > Mehdi
>>> > >
>>> > > On 3/22/23 03:52, Rob Pike wrote:
>>> > >
>>> > > the paper had an appendix that described UTF-8's encoding rigorously, but
>>> > > that was dropped
Wed Mar 22 21:03:23 EDT 2023
From tuhs.org!tuhs-bounces Tue Mar 21 22:52:38 -0400 2023
From: Rob Pike <robpike@gmail.com>
Date: Wed, 22 Mar 2023 13:52:16 +1100
To: Larry McVoy <lm@mcvoy.com>
Message-ID-Hash: NI6WPNPUYHGUZ7BHZBZBAIW424VTVTGZ
CC: tuhs@tuhs.org
Subject: [TUHS] Re: Bell Foreign-Language UNIX Efforts
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/message/NI6WPNPUYHGUZ7BHZBZBAIW424VTVTGZ/>

Thanks for your support but C89 didn't specify an encoding. In classic committee fashion, it refused to take a stand about anything that might limit adoption. The problem was that the API it offered was clumsy and made encoding errors hard to ignore. (Grepping a file for a string, do you really care if there is an irrelevant binary blob in the middle that isn't kosher UTF-8?) Also, it provided no support for printing "wide" characters. This is all covered in the paper cited above.*

The original UTF was compatible with ASCII but not robust if there was an alignment problem, and also used printable ASCII characters in multibyte sequences. You could find a '/' inside a Cyrillic character encoding, which broke Unix badly. That's why FSS-UTF, File-safe UTF, was the name given to Prosser's variant.

It's wrong to give us credit for properties we didn't introduce. But UTF-8 is more regular, simpler to encode and decode, and more robust than its predecessors. Most important, it did introduce the self-synchronization property, which was the key that opened the door for us at X-Open.

-rob

* In a classic Usenix whoops, the paper had an appendix that described UTF-8's encoding rigorously, but that was dropped when it was published in the conference proceedings. Perhaps that's why the RFC got in the mix and started some of the confusion about its origin.

On Wed, Mar 22, 2023 at 1:25 PM Larry McVoy <lm@mcvoy.com> wrote:

> The brilliance of UTF-8 was to encode ASCII as is. That seems obvious in
> retrospect but as Rob says, the multibyte crud in C89 was just awful,
> and that was the answer at the time. Fitting ASCII in as is meant
> that all of the Unix utilities, sed, grep, awk, etc, had close to no
> performance hit if you were processing ASCII. That's pretty cool when
> you get that and you can process Japanese et al as well.
>
> I kind of cringe when I say it is brilliant to not break what exists
> already, to me, that's just part of what you do as an engineer. But
> history has shown that not breaking stuff, fitting the new into the
> old, is brilliant. So kudos to Rob and Ken for doing that (but truth
> be told, I'd be stunned if they didn't, they are great engineers).
>
> On Mon, Mar 20, 2023 at 07:27:34AM +1100, Rob Pike wrote:
> > As my mail quoted in
> > https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt says,
> > Ken worked out a new packing that avoided all the problems with the
> > existing ones. He didn't alter Prosser's encoding. UTF-8, as it was later
> > called, was not based on anything but it was deeply informed by a couple of
> > years of work coming to grips with the problem of programming with
> > multibyte characters. What Prosser did do, and what we - all of us - are
> > very grateful for, is start the conversation about replacing UTF with
> > something practical.
> >
> > (Speaking of design by committee, the multibyte stuff in C89 was atrocious,
> > and I heard was done in committee to get someone, perhaps the Japanese, to
> > sign off.)
> >
> > Regarding Windows, Nathan Myhrvold visited Bell Labs around this time, and
> > we tried to talk to him about this, but he wasn't interested, claiming they
> > had it all worked out. We later learned what he meant, and lamented. Not
> > the only time someone wasn't open to hear an idea that might be worth
> > hearing, but an educational one.
> >
> > It's important historically to understand how all the forces came together
> > that day. The world was ready for a solution to international text, the
> > proposed character set was acceptable to most but the ASCII compatibility
> > issues were unbearable, the proposed solution to that was noxious, various
> > committees were starting to solve the problem in committee, leading to
> > technical briefs of varying quality, none right, and somehow a phone call
> > was made one afternoon to a couple of people who had been thinking and
> > working these issues for ages, one of whom was a genius. And it all worked
> > out, which is truly unusual.
> >
> > -rob
>
> --
> ---
> Larry McVoy Retired to fishing
> http://www.mcvoy.com/lm/boat
Wed Mar 22 18:29:41 EDT 2023
Wed Mar 22 18:22:35 EDT 2023
#include <stdio.h>
#include <stdlib.h>

// Tokens produced by the lexer.
enum Token {
	LINVALID,
	LEND,
	LRIGHT,
	LLEFT,
	LINC,
	LDEC,
	LOUT,
	LIN,
	LLOOP,
	LENDL,
};

// Instructions stored in the parse tree.
enum Symbol {
	INVALID,
	END,
	RIGHT,
	LEFT,
	INC,
	DEC,
	OUT,
	IN,
	LOOP,
	ENDL,
};

typedef struct Sym Sym;
struct Sym {
	int inst;
	Sym *loop;	// first instruction of the loop body (LOOP only)
	Sym *next;	// next instruction at the same nesting level
};

typedef struct Bf {
	int *i;	// instructions
	size_t in;	// instructions number
	size_t is;	// instructions size
	unsigned char *d;	// data
	int *ip;	// instruction pointer
	unsigned char *dp;	// data pointer
} Bf;

// malloc that exits on failure.
void *
emalloc(size_t s)
{
	void *p;

	p = malloc(s);
	if(!p){
		fprintf(stderr, "Error: Out of memory.\n");
		exit(EXIT_FAILURE);
	}
	return p;
}

// realloc that exits on failure.
void *
erealloc(void *p, size_t s)
{
	p = realloc(p, s);
	if(!p){
		fprintf(stderr, "Error: Out of memory.\n");
		exit(EXIT_FAILURE);
	}
	return p;
}

// Return the next token from f, tracking line *l and column *b.
int
lex(FILE *f, size_t *l, size_t *b)
{
	int c;

	while((c = fgetc(f)) != EOF){
		*b += 1;	// column of the character just read
		switch(c){
		default:
			fprintf(stderr, "Error: Invalid character at line %zu column %zu: %c\n", *l, *b, c);
			exit(EXIT_FAILURE);
		case '>':
			return LRIGHT;
		case '<':
			return LLEFT;
		case '+':
			return LINC;
		case '-':
			return LDEC;
		case '.':
			return LOUT;
		case ',':
			return LIN;
		case '[':
			return LLOOP;
		case ']':
			return LENDL;
		case '\n':
			*l += 1;
			*b = 0;
			break;
		}
	}
	if(ferror(f)){
		fprintf(stderr, "Error: I/O error when reading file\n");
		exit(EXIT_FAILURE);
	}
	return LEND;
}

// Build the instruction tree: a loop body hangs off loop, siblings off next.
Sym *
parse(FILE *f)
{
	int t;
	size_t l, b;
	Sym **stk;	// stack of open LOOP nodes
	size_t stkn, stks;
	Sym *h;
	Sym *n;
	Sym **p;

	l = 1;
	b = 0;
	h = 0;	// empty input yields an empty program
	stkn = 0;
	stks = 1;	// stack capacity, in elements
	stk = emalloc(stks * sizeof(*stk));
	for(p = &h; (t = lex(f, &l, &b)) != LEND;){
		*p = emalloc(sizeof(**p));
		n = *p;
		n->inst = INVALID;
		n->loop = 0;
		n->next = 0;
		switch(t){
		default:
			fprintf(stderr, "Error: Invalid token at line %zu column %zu: %d\n", l, b, t);
			exit(EXIT_FAILURE);
		case LRIGHT:
			n->inst = RIGHT;
			p = &n->next;
			break;
		case LLEFT:
			n->inst = LEFT;
			p = &n->next;
			break;
		case LINC:
			n->inst = INC;
			p = &n->next;
			break;
		case LDEC:
			n->inst = DEC;
			p = &n->next;
			break;
		case LOUT:
			n->inst = OUT;
			p = &n->next;
			break;
		case LIN:
			n->inst = IN;
			p = &n->next;
			break;
		case LLOOP:
			n->inst = LOOP;
			p = &n->loop;
			if(stkn >= stks){
				stks *= 2;
				stk = erealloc(stk, stks * sizeof(*stk));
			}
			stk[stkn++] = n;
			break;
		case LENDL:
			n->inst = ENDL;
			if(stkn == 0){
				fprintf(stderr, "Error: Too many loop endings at line %zu column %zu.\n", l, b);
				exit(EXIT_FAILURE);
			}
			n = stk[--stkn];	// resume after the matching LOOP
			p = &n->next;
			break;
		}
	}
	if(stkn > 0){
		fprintf(stderr, "Error: Not enough loop endings.\n");
		exit(EXIT_FAILURE);
	}
	free(stk);
	return h;
}

void
emit(void)
{
}

void
run(void)
{
}

int
main(void)
{
	Sym *s;

	s = parse(stdin);
	(void)s;	// not used until emit and run are written
	printf("lol\n");
	return EXIT_SUCCESS;
}
Wed Mar 22 18:06:06 EDT 2023
Wed Mar 22 17:44:48 EDT 2023
Wed Mar 22 10:20:19 EDT 2023
But if enough people see beyond the veil, it’s not a veil anymore - it’s just lingerie the powerful wear when fucking us.
Wed Mar 22 10:09:20 EDT 2023
yes, a lack of working capital is holding me back. no, i'm not clicking that. thanks. bye.
Tue Mar 21 23:40:55 EDT 2023