Unicode, and character encoding in general, is a huge and complex subject, and I am only provisionally conversant with the possibilities it holds. I know that UTF-8 tends to show me the fewest untranslatable characters in the text I can actually read (i.e., mostly English), and that is as far as I have studied the subject. The tables below are an example, nowhere near a 'standard'.
Honestly, I just threw this generic table together because it illustrates how Instrumentation can subsume existing standards. These Unicode tables will provide a native way to enter text from multiple writing systems so that it can be used with Instrumentation's internal search functions. There are probably many more intuitive text-table formats than this direct import, but this one is usable, if awkward.
Including the Unicode tables will make Instrumentation independent of any keyboard other than the Instrumentation Chorded Keyboard for entering text in any character set covered by Unicode.
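The tables list characters by their standard names, so one plausible way to turn a chorded entry into text is to go from a name (or code point) to the character itself. Below is a minimal Python sketch, assuming the keyboard ultimately yields a Unicode character name; `char_from_name` and `describe` are hypothetical helpers, not part of Instrumentation, and only the standard `unicodedata` module is used. Control characters have no formal Unicode character name, so `describe` leaves their name blank (the control names in the tables come from the charmap instead).

```python
import unicodedata


def char_from_name(name: str) -> str:
    """Return the character with the given Unicode name (KeyError if unknown)."""
    return unicodedata.lookup(name)


def describe(ch: str) -> str:
    """Format a character like the table cells: '#XX - NAME'.

    Control characters have no formal Unicode name, so an empty name is used.
    """
    return f"#{ord(ch):02X} - {unicodedata.name(ch, '')}"


if __name__ == "__main__":
    ch = char_from_name("LATIN CAPITAL LETTER A WITH RING ABOVE")
    print(describe(ch))  # #C5 - LATIN CAPITAL LETTER A WITH RING ABOVE
```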
EBCDIC (Extended Binary Coded Decimal Interchange Code) is the character table IBM introduced with its System/360 mainframe family in the 1960s. Some System/360 configurations had as little as 8 kilobytes of main memory. That's right, 8K. If a job needed more space than that, its data had to be shuttled in and out from magnetic tape.
Those old movie images of (BIG SCARY COMPUTER!) reel-to-reel tapes spinning madly are actually showing the computer slogging through some COBOL ("COmmon Business Oriented Language") sort routine, streaming data to and from tape because it would never fit in a few kilobytes of core. Of course, this was when magnetic core memory was used and a 'bit' occupied a chunk of ferrite that you could see without your bifocals (and still, we were happy).
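EBCDIC orders its characters very differently from the ASCII-compatible order Unicode inherited. Python ships several EBCDIC codecs; this minimal sketch uses `cp037` (EBCDIC US/Canada) only because it is built in, and the table in this document may come from a different national EBCDIC variant. `ebcdic_to_unicode` and `show_mapping` are hypothetical helpers.

```python
def ebcdic_to_unicode(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Unicode string using a built-in codec."""
    return data.decode(codepage)


def show_mapping(text: str, codepage: str = "cp037") -> None:
    """Print each character's EBCDIC byte next to its Unicode code point."""
    for ch in text:
        (byte,) = ch.encode(codepage)  # single-byte code page: exactly one byte
        print(f"{ch!r}: EBCDIC #{byte:02X} -> U+{ord(ch):04X}")


if __name__ == "__main__":
    show_mapping("Aa0 ")
```

Running it shows, for example, that 'A' is EBCDIC #C1 but U+0041, and the digit 0 is EBCDIC #F0 but U+0030.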
This block is just one small part of the Unicode 3.1 (ISO/IEC 10646) standard: the first 256 code points, U+0000 to U+00FF. Instrumentation can encode 24 bits in a single area, so we should be able to import the entire useful part of the UCS address space with no modification. Why EBCDIC? No deep reason: there will be many other single- and double-byte character maps within this Area of the Instrumentation address space, and this just happens to be the one I used as an example.
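The 24-bit claim checks out with a little arithmetic: the highest Unicode code point is U+10FFFF, which needs only 21 bits, so any code point fits in a 24-bit area with three bits to spare. A minimal Python check; `AREA_BITS` simply restates the 24-bit figure from the text, and `fits_in_area` is a hypothetical helper.

```python
import sys

MAX_CODE_POINT = sys.maxunicode  # 0x10FFFF in Python 3.3 and later
AREA_BITS = 24                   # bits per Instrumentation area, per the text


def fits_in_area(code_point: int, bits: int = AREA_BITS) -> bool:
    """True if the code point can be stored unmodified in `bits` bits."""
    return 0 <= code_point < (1 << bits)


if __name__ == "__main__":
    print(MAX_CODE_POINT.bit_length())   # 21
    print(fits_in_area(MAX_CODE_POINT))  # True
```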
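This block doesn't actually use the Types at all. The number in each cell is the "Symbolic Name" from the Linux (glibc) "charmap" schema. Symbolic names there have the form <U0041>, which is just the Unicode code point, so the cells are laid out in Unicode order rather than in EBCDIC byte order. The "Character Encoding" number (the EBCDIC byte value) might have been a better choice, but I picked the one that was easiest to scan (the HTML tables below were created by a Java program that I wrote).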
This block would require the Employment lath, which would put it somewhere in the #02 Specialization area. Currently I'm planning to dedicate the entire #02 area to character translation by importing the low 24 bits (six hex digits) of the Unicode mapping standard directly into this area. There will probably be a block or two in the "User Interface Area" that provide ISO character set names as an external index into these tables (because the Unicode tables, sadly, do not follow the Instrumentation standard indexing plan).
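Such an external index is essentially a many-to-one map from character-set names and aliases to a single table. Python's codec registry already works this way, so in this minimal sketch it stands in for the planned User Interface Area blocks, assuming IANA-style names such as IBM037 are acceptable stand-ins for ISO character set names. `canonical_codec` and `build_index` are hypothetical helpers.

```python
import codecs
from typing import Dict, Iterable


def canonical_codec(charset_name: str) -> str:
    """Resolve a character-set name or alias to Python's canonical codec name."""
    return codecs.lookup(charset_name).name  # raises LookupError if unknown


def build_index(names: Iterable[str]) -> Dict[str, str]:
    """Map external character-set names to the table (codec) each one selects."""
    index: Dict[str, str] = {}
    for name in names:
        try:
            index[name] = canonical_codec(name)
        except LookupError:
            continue  # unknown names are skipped, not guessed at
    return index


if __name__ == "__main__":
    print(build_index(["IBM037", "cp037", "no-such-charset"]))
    # {'IBM037': 'cp037', 'cp037': 'cp037'}
```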
The terms are organized into four tables, each containing four sub-tables (or banks) of sixteen cells, which gives 4 × 4 × 16 = 256 cells: one for every value from #00 to #FF. Each cell contains a standard character name. The headings of the four tables below are totally generic, and so are the row and column labels (row1, row2, col1, col2, Both) within each sub-table.
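One reading of the generic labels is that each 8-bit value splits into four 2-bit fields (table, sub-table, row, column), with the row and column labels acting like bit flags: blank for neither bit, row1/col1 for the low bit, row2/col2 for the high bit, and Both for both. That reading matches the layout; for example, #0A sits in row2, col2 of the first sub-table. A minimal Python sketch, numbering the tables 0 to 3; `locate` and the label lists are hypothetical.

```python
ROW_LABELS = ["", "row1", "row2", "Both"]     # generic row labels, by 2-bit value
COLUMN_LABELS = ["", "col1", "col2", "Both"]  # generic column labels, by 2-bit value


def locate(byte: int):
    """Split a byte into (table, sub-table, row label, column label).

    Top two bits pick the table, the next two the sub-table,
    then two for the row and the lowest two for the column.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0x00-0xFF)")
    table = byte >> 6
    sub_table = (byte >> 4) & 0b11
    row = (byte >> 2) & 0b11
    column = byte & 0b11
    return table, f"sub-table{sub_table}", ROW_LABELS[row], COLUMN_LABELS[column]


if __name__ == "__main__":
    print(locate(0x41))  # (1, 'sub-table0', '', 'col1'): LATIN CAPITAL LETTER A
```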
I'm really only using this format out of habit. If there is any pattern to the character layout below, it is purely coincidental.
Table 0 (#00-#3F)

| sub-table0 | | col1 | col2 | Both |
| | #00 - NULL (NUL) | #01 - START OF HEADING (SOH) | #02 - START OF TEXT (STX) | #03 - END OF TEXT (ETX) |
| row1 | #04 - END OF TRANSMISSION (EOT) | #05 - ENQUIRY (ENQ) | #06 - ACKNOWLEDGE (ACK) | #07 - BELL (BEL) |
| row2 | #08 - BACKSPACE (BS) | #09 - CHARACTER TABULATION (HT) | #0A - LINE FEED (LF) | #0B - LINE TABULATION (VT) |
| Both | #0C - FORM FEED (FF) | #0D - CARRIAGE RETURN (CR) | #0E - SHIFT OUT (SO) | #0F - SHIFT IN (SI) |

| sub-table1 | | col1 | col2 | Both |
| | #10 - DATALINK ESCAPE (DLE) | #11 - DEVICE CONTROL ONE (DC1) | #12 - DEVICE CONTROL TWO (DC2) | #13 - DEVICE CONTROL THREE (DC3) |
| row1 | #14 - DEVICE CONTROL FOUR (DC4) | #15 - NEGATIVE ACKNOWLEDGE (NAK) | #16 - SYNCHRONOUS IDLE (SYN) | #17 - END OF TRANSMISSION BLOCK (ETB) |
| row2 | #18 - CANCEL (CAN) | #19 - END OF MEDIUM (EM) | #1A - SUBSTITUTE (SUB) | #1B - ESCAPE (ESC) |
| Both | #1C - FILE SEPARATOR (IS4) | #1D - GROUP SEPARATOR (IS3) | #1E - RECORD SEPARATOR (IS2) | #1F - UNIT SEPARATOR (IS1) |

| sub-table2 | | col1 | col2 | Both |
| | #20 - SPACE | #21 - | #22 - | #23 - |
| row1 | #24 - | #25 - PERCENT SIGN | #26 - AMPERSAND | #27 - APOSTROPHE |
| row2 | #28 - LEFT PARENTHESIS | #29 - RIGHT PARENTHESIS | #2A - ASTERISK | #2B - PLUS SIGN |
| Both | #2C - COMMA | #2D - HYPHEN-MINUS | #2E - FULL STOP | #2F - SOLIDUS |

| sub-table3 | | col1 | col2 | Both |
| | #30 - DIGIT ZERO | #31 - DIGIT ONE | #32 - DIGIT TWO | #33 - DIGIT THREE |
| row1 | #34 - DIGIT FOUR | #35 - DIGIT FIVE | #36 - DIGIT SIX | #37 - DIGIT SEVEN |
| row2 | #38 - DIGIT EIGHT | #39 - DIGIT NINE | #3A - COLON | #3B - SEMICOLON |
| Both | #3C - LESS-THAN SIGN | #3D - EQUALS SIGN | #3E - GREATER-THAN SIGN | #3F - QUESTION MARK |
Table 1 (#40-#7F)

| sub-table0 | | col1 | col2 | Both |
| | #40 - | #41 - LATIN CAPITAL LETTER A | #42 - LATIN CAPITAL LETTER B | #43 - LATIN CAPITAL LETTER C |
| row1 | #44 - LATIN CAPITAL LETTER D | #45 - LATIN CAPITAL LETTER E | #46 - LATIN CAPITAL LETTER F | #47 - LATIN CAPITAL LETTER G |
| row2 | #48 - LATIN CAPITAL LETTER H | #49 - LATIN CAPITAL LETTER I | #4A - LATIN CAPITAL LETTER J | #4B - LATIN CAPITAL LETTER K |
| Both | #4C - LATIN CAPITAL LETTER L | #4D - LATIN CAPITAL LETTER M | #4E - LATIN CAPITAL LETTER N | #4F - LATIN CAPITAL LETTER O |

| sub-table1 | | col1 | col2 | Both |
| | #50 - LATIN CAPITAL LETTER P | #51 - LATIN CAPITAL LETTER Q | #52 - LATIN CAPITAL LETTER R | #53 - LATIN CAPITAL LETTER S |
| row1 | #54 - LATIN CAPITAL LETTER T | #55 - LATIN CAPITAL LETTER U | #56 - LATIN CAPITAL LETTER V | #57 - LATIN CAPITAL LETTER W |
| row2 | #58 - LATIN CAPITAL LETTER X | #59 - LATIN CAPITAL LETTER Y | #5A - LATIN CAPITAL LETTER Z | #5B - |
| Both | #5C - | #5D - | #5E - | #5F - LOW LINE |

| sub-table2 | | col1 | col2 | Both |
| | #60 - | #61 - LATIN SMALL LETTER A | #62 - LATIN SMALL LETTER B | #63 - LATIN SMALL LETTER C |
| row1 | #64 - LATIN SMALL LETTER D | #65 - LATIN SMALL LETTER E | #66 - LATIN SMALL LETTER F | #67 - LATIN SMALL LETTER G |
| row2 | #68 - LATIN SMALL LETTER H | #69 - LATIN SMALL LETTER I | #6A - LATIN SMALL LETTER J | #6B - LATIN SMALL LETTER K |
| Both | #6C - LATIN SMALL LETTER L | #6D - LATIN SMALL LETTER M | #6E - LATIN SMALL LETTER N | #6F - LATIN SMALL LETTER O |

| sub-table3 | | col1 | col2 | Both |
| | #70 - LATIN SMALL LETTER P | #71 - LATIN SMALL LETTER Q | #72 - LATIN SMALL LETTER R | #73 - LATIN SMALL LETTER S |
| row1 | #74 - LATIN SMALL LETTER T | #75 - LATIN SMALL LETTER U | #76 - LATIN SMALL LETTER V | #77 - LATIN SMALL LETTER W |
| row2 | #78 - LATIN SMALL LETTER X | #79 - LATIN SMALL LETTER Y | #7A - LATIN SMALL LETTER Z | #7B - |
| Both | #7C - VERTICAL LINE | #7D - | #7E - | #7F - DELETE (DEL) |
Table 2 (#80-#BF)

| sub-table0 | | col1 | col2 | Both |
| | #80 - PADDING CHARACTER (PAD) | #81 - HIGH OCTET PRESET (HOP) | #82 - BREAK PERMITTED HERE (BPH) | #83 - NO BREAK HERE (NBH) |
| row1 | #84 - INDEX (IND) | #85 - NEXT LINE (NEL) | #86 - START OF SELECTED AREA (SSA) | #87 - END OF SELECTED AREA (ESA) |
| row2 | #88 - CHARACTER TABULATION SET (HTS) | #89 - CHARACTER TABULATION WITH JUSTIFICATION (HTJ) | #8A - LINE TABULATION SET (VTS) | #8B - PARTIAL LINE FORWARD (PLD) |
| Both | #8C - PARTIAL LINE BACKWARD (PLU) | #8D - REVERSE LINE FEED (RI) | #8E - SINGLE-SHIFT TWO (SS2) | #8F - SINGLE-SHIFT THREE (SS3) |

| sub-table1 | | col1 | col2 | Both |
| | #90 - DEVICE CONTROL STRING (DCS) | #91 - PRIVATE USE ONE (PU1) | #92 - PRIVATE USE TWO (PU2) | #93 - SET TRANSMIT STATE (STS) |
| row1 | #94 - CANCEL CHARACTER (CCH) | #95 - MESSAGE WAITING (MW) | #96 - START OF GUARDED AREA (SPA) | #97 - END OF GUARDED AREA (EPA) |
| row2 | #98 - START OF STRING (SOS) | #99 - SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI) | #9A - SINGLE CHARACTER INTRODUCER (SCI) | #9B - CONTROL SEQUENCE INTRODUCER (CSI) |
| Both | #9C - STRING TERMINATOR (ST) | #9D - OPERATING SYSTEM COMMAND (OSC) | #9E - PRIVACY MESSAGE (PM) | #9F - APPLICATION PROGRAM COMMAND (APC) |

| sub-table2 | | col1 | col2 | Both |
| | #A0 - | #A1 - | #A2 - | #A3 - |
| row1 | #A4 - | #A5 - | #A6 - BROKEN BAR | #A7 - |
| row2 | #A8 - | #A9 - | #AA - | #AB - |
| Both | #AC - NOT SIGN | #AD - | #AE - | #AF - |

| sub-table3 | | col1 | col2 | Both |
| | #B0 - | #B1 - | #B2 - | #B3 - |
| row1 | #B4 - | #B5 - | #B6 - | #B7 - |
| row2 | #B8 - | #B9 - | #BA - | #BB - |
| Both | #BC - | #BD - | #BE - | #BF - |
Table 3 (#C0-#FF)

| sub-table0 | | col1 | col2 | Both |
| | #C0 - | #C1 - | #C2 - | #C3 - |
| row1 | #C4 - | #C5 - LATIN CAPITAL LETTER A WITH RING ABOVE | #C6 - LATIN CAPITAL LETTER AE | #C7 - |
| row2 | #C8 - | #C9 - | #CA - | #CB - |
| Both | #CC - | #CD - | #CE - | #CF - |

| sub-table1 | | col1 | col2 | Both |
| | #D0 - | #D1 - | #D2 - | #D3 - |
| row1 | #D4 - | #D5 - | #D6 - | #D7 - |
| row2 | #D8 - LATIN CAPITAL LETTER O WITH STROKE | #D9 - | #DA - | #DB - |
| Both | #DC - | #DD - | #DE - | #DF - |

| sub-table2 | | col1 | col2 | Both |
| | #E0 - | #E1 - | #E2 - | #E3 - |
| row1 | #E4 - | #E5 - LATIN SMALL LETTER A WITH RING ABOVE | #E6 - LATIN SMALL LETTER AE | #E7 - |
| row2 | #E8 - | #E9 - | #EA - | #EB - |
| Both | #EC - | #ED - | #EE - | #EF - |

| sub-table3 | | col1 | col2 | Both |
| | #F0 - | #F1 - | #F2 - | #F3 - |
| row1 | #F4 - | #F5 - | #F6 - | #F7 - |
| row2 | #F8 - LATIN SMALL LETTER O WITH STROKE | #F9 - | #FA - | #FB - |
| Both | #FC - | #FD - | #FE - | #FF - |