Trond Trosterud, Tromsø
Replacing typewriters with computers has made it relatively easy to tailor one's own keyboard for localizing glyphs (letters, numbers, and other symbols), but at the same time it has also resulted in the creation of a large number of conflicting "standards", most of them made in a rather arbitrary way.
At the same time, the exchange of letters outside the basic A-Z Latin alphabet is becoming more and more secure, and with the introduction of the UCS, or ISO/IEC 10646 (a.k.a. Unicode), the number of glyphs accessible to the average user is exploding (55000 glyphs are fixed already today, and the number is growing). This situation has created an urgent need to facilitate occasional typing of a large number of glyphs. Users have a large body of glyphs that are accessible in principle, but the access remains passive: they can read the glyphs but not produce them.
For L1 use (production of the primary language of the user), some keyboard solutions exist, to the extent that any computer solution exists for the language in question. The issue here is rather to ensure that the body of glyphs that is provided by the UCS but is not part of the user's L1 becomes accessible to her. One basic method of producing UCS glyphs is already standardized (as ISO/IEC 14xxx). Its basic philosophy is that each glyph may be identified, via a glyph table, by a (decimal or hexadecimal) number, and that typing a code (telling the computer to read what follows as the number of a glyph) followed by that number will produce the intended glyph. In the present paper it will be argued that this solution is needed, but not as the only one. It has two major deficits: it is slow in use, and the numbers are hard to remember, making the user dependent upon large tables. Thus, ISO/IEC 14xxx should be supplemented by other methods, geared towards faster and easier-to-remember ways of occasionally producing a large set of glyphs. The solution outlined here will be of the utmost importance for users dealing with multilingual text or large sets of technical symbols, e.g. university librarians.
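The numeric method can be sketched in a few lines. The function name `glyph_from_code` is my own illustration and is not taken from any standard; the sketch only shows the lookup from a typed number to a UCS character, and it also makes the memory problem visible: nothing in "0161" suggests an s with a caron.

```python
def glyph_from_code(digits: str, base: int = 16) -> str:
    """Return the UCS character whose code point is written as `digits`.

    `base` is 16 for hexadecimal input and 10 for decimal input.
    """
    return chr(int(digits, base))


# Four hexadecimal digits select LATIN SMALL LETTER S WITH CARON (U+0161).
print(glyph_from_code("0161"))
# The same glyph addressed by its decimal number.
print(glyph_from_code("353", base=10))
```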
The key positions will be identified according to the keyboard grid in ISO standard 9995. Cf. Fig. 1, with invariant glyphs inserted for the ease of reference.
  | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
E |    | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 0  | +  |
D |    | q  | w  | e  | r  | t  | y  | u  | i  | o  | p  |    |
C |    | a  | s  | d  | f  | g  | h  | j  | k  | l  |    |    |
B |    | z  | x  | c  | v  | b  | n  | m  | ,  | .  |    |    |
A |    |    |    |    |    |    |    |    |    |    |    |    |
Thus, the key occupied by the letter "s" will be identified as position C02, "t" by D05, etc.
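The position identifiers can be handled mechanically. The following is a minimal sketch, assuming the invariant glyphs of Fig. 1; the names `GRID`, `glyph_at` and `position_of` are hypothetical helpers and not part of ISO/IEC 9995.

```python
# Invariant glyphs from Fig. 1, one list per row; the list index is the
# column number (00-12). None marks a position left open in the figure.
GRID = {
    "E": [None, "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "+", None],
    "D": [None, "q", "w", "e", "r", "t", "y", "u", "i", "o", "p", None, None],
    "C": [None, "a", "s", "d", "f", "g", "h", "j", "k", "l", None, None, None],
    "B": [None, "z", "x", "c", "v", "b", "n", "m", ",", ".", None, None, None],
    "A": [None] * 13,
}


def glyph_at(position: str):
    """Return the invariant glyph at a position such as 'C02', or None."""
    row, column = position[0], int(position[1:])
    return GRID[row][column]


def position_of(glyph: str) -> str:
    """Return the position identifier of an invariant glyph."""
    for row, cells in GRID.items():
        if glyph in cells:
            return f"{row}{cells.index(glyph):02d}"
    raise KeyError(glyph)


print(position_of("s"))  # C02
print(position_of("t"))  # D05
```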
In addition to that, the following concepts are used:
[Key definitions, group and level definitions as in ISO/IEC 9995.]
The principles behind the proposed solution are the following:
* Multilingual users should have access to several keyboard layout groups (in the ISO/IEC 9995 sense), one for each language. These are the language dependent standards, with a keyboard layout for the glyphs needed to type the language in question.
* In addition to that, there should be a common set of language independent groups. The proposal outlined below covers the extended Latin subset of the UCS, but similar solutions may be made for the other alphabetic scripts as well. The language independent standard thus contains a layout for glyphs other than the A-Z ones.
In the construction of the full keyboard, the two standards (the language dependent keyboard(s) and the language-independent set of groups) should be unified, giving priority to the language-specific standard.
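One way to read this unification is as a merge in which the language-specific assignment wins whenever both standards claim the same key. The sketch below assumes that a layout can be represented as a mapping from (position, level) to glyph; that representation, the function name `unify`, and the entries in `shared` are illustrative assumptions only.

```python
def unify(language_specific: dict, language_independent: dict) -> dict:
    """Merge two layouts, giving priority to the language-specific one."""
    unified = dict(language_independent)  # start from the common standard
    unified.update(language_specific)     # language-specific entries override
    return unified


# Norwegian places its ae letter on C11; the entries in `shared` are
# hypothetical placements, invented here only to show a conflict.
norwegian = {("C11", 1): "\u00e6"}
shared = {("C11", 1): "\u00e4", ("C11", 3): "\u0153"}

full = unify(norwegian, shared)
print(full[("C11", 1)])  # the Norwegian assignment is kept
print(full[("C11", 3)])  # untouched positions come from the common standard
```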
The language-specific standard should be arrived at by the following procedure:
* There should be a (language-independent) evaluation of key positions, from an ergonomic point of view.
* There should be a frequency study of the glyphs of the language in question. Ideally, the study should contain different genres, such as fiction, newspaper texts, technical texts, etc.
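The two studies could be combined in many ways. As a minimal sketch, and only as an assumption about how the procedure might be carried out, the most frequent glyph is given the best-ranked position, the second most frequent the next one, and so on. The function names and the example ranking of positions are hypothetical.

```python
from collections import Counter


def glyph_frequencies(texts):
    """Count the letters in a collection of texts (ideally several genres)."""
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text.lower() if ch.isalpha())
    return counts


def assign_positions(counts, ranked_positions):
    """Pair positions (best first) with glyphs (most frequent first)."""
    by_frequency = [glyph for glyph, _ in counts.most_common()]
    return dict(zip(ranked_positions, by_frequency))


corpus = ["aaa bb c"]                     # toy corpus, not real data
ergonomic_rank = ["C04", "C07", "D03"]    # hypothetical ergonomic ranking
layout = assign_positions(glyph_frequencies(corpus), ergonomic_rank)
print(layout)
```

A real study would of course weigh further factors, such as keeping familiar letters where users expect them.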
The language-independent standard will be treated in section 4 below.
In ISO/IEC 9995, the characters produced by the keyboard are organized in groups and levels. Even though 9995 is quite open, it requires that the maximal number of levels be 3 for each group.
This is an unfortunate decision, for two reasons:
* Glyphs come in capital/small pairs, so an uneven number of levels forces the small and capital glyphs of a pair onto different positions.
* In order to achieve three levels, one needs two modification keys, here called MK1 (often: SHIFT) and MK2 (often: OPTION, CTRL, etc.). By allowing a combination of MK1 and MK2 one gets a fourth level for free. In this way, the memory burden becomes drastically smaller: if, say, the user must remember that t-stroke is placed on E05-3 (level 3; press MK2 + E05), then T-stroke is naturally placed on E05-4 (press MK1 + MK2 + E05). (This 4-level system also has a solid testing period to its credit: it is, and has always been, implemented in Psion and Macintosh keyboard systems.)
What is needed is an ordering of the glyphs in small/capital pairs, so an uneven number of levels is bound to waste both key space and the user's memory. According to 9995, the Level 3 positions of SMALL LETTER T WITH STROKE and CAPITAL LETTER T WITH STROKE are different: C05 and XYY, respectively.
On levels 1/2, the case distinction is always kept within the same position. This should be the case on levels 3/4 as well.
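The four-level scheme amounts to a small piece of arithmetic on the two modification keys. The sketch below assumes the level numbering used above (MK1 alone gives level 2, MK2 alone level 3, both together level 4); the table `LEVEL3` and the function names are hypothetical, and the position E05 is simply the one used in the example in the text.

```python
def level(mk1: bool, mk2: bool) -> int:
    """Map the state of the two modification keys to a level (1-4)."""
    return 1 + int(mk1) + 2 * int(mk2)


# Only the small glyph has to be stored (and remembered) for level 3;
# the capital on level 4 follows from it.
LEVEL3 = {"E05": "\u0167"}  # LATIN SMALL LETTER T WITH STROKE


def extended_glyph(position: str, mk1: bool) -> str:
    """Glyph produced while MK2 is held; MK1 selects the capital."""
    small = LEVEL3[position]
    return small.upper() if mk1 else small


print(level(False, True), extended_glyph("E05", mk1=False))  # level 3
print(level(True, True), extended_glyph("E05", mk1=True))    # level 4
```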
We then move to the principles behind the layout of the language-independent set of groups.
The exact way of producing glyphs is of course by way of hexadecimal (or decimal) code, as described in ISO DIS 14xxx. This is dependent upon a reference list of glyphs, though, and it is in addition very time-consuming. What is needed is a user-friendly way of producing a large number of glyphs. We are talking about a large set that is used rarely, and the main priority should thus be to make it easy to remember. This is far from an academic exercise, and the need for a standard of this kind is urgent. Within the next few years, the 10646 character set will be introduced in libraries, and the typical users will deal with a large range of glyphs in order to type names and titles. What is needed is a procedure for typing a glyph without being dependent upon large reference tables. The procedure may require typing, say, 5 strokes (approximately the same number as with the hexadecimal method), but once introduced it will be easily remembered.
The new ISO standard for keyboards, ISO 9995, makes this possible. The architecture of the keyboard allows for an infinite number of groups, or modes. My proposal is to give each diacritic a group of its own and make it accessible via a ONE-SHIFT FUNCTION, or, equivalently, to have one dead key for each diacritic.
As an example, let group n be the CARON group. The caron group is made accessible by pressing GROUP SELECTION KEY + GROUP IDENTIFIER KEY. The next key pressed gives the desired character; e.g. s gives s-caron, S gives S-caron, etc.
The group/dead key selector may vary from platform to platform (or it can be standardized), but it is highly desirable to let the diacritic specifiers be standardized.
Each group has two levels, 1 and 2 (small and capital).
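The behaviour of such a diacritic group can be imitated with Python's standard `unicodedata` module. This is a sketch under stated assumptions: the group names in `COMBINING` and the function `type_in_group` are my own illustration, and Unicode normalization (NFC) is used here as a convenient stand-in for the table lookup a keyboard driver would perform.

```python
import unicodedata

# One combining mark per diacritic group (a small, illustrative selection).
COMBINING = {
    "acute": "\u0301",
    "grave": "\u0300",
    "caron": "\u030c",
}


def type_in_group(group: str, key: str) -> str:
    """Return the character produced by pressing `key` in a diacritic group.

    Level 1 (small) and level 2 (capital) are distinguished by the case
    of `key`, exactly as on the ordinary letter keys.
    """
    return unicodedata.normalize("NFC", key + COMBINING[group])


print(type_in_group("caron", "s"))  # s-caron, U+0161
print(type_in_group("caron", "S"))  # S-caron, U+0160
```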
Below follows a matrix sketching the Latin repertoire of 10646.
Note: This matrix is intended as an illustrative sketch only. The principle should be clear, though. The filled-in letters of the matrix refer to characters in 10646, and the open spaces (shown as dots) indicate that the combination in question does not exist.
Matrix:
Diacritic       | #  | abcdefghijklmnopqrstuvwxyz
1=acute         | 13 | a.c.e...i..lmno..rs.u.w.y.
2=grave         | 7  | a...e...i.....o.....u.w.y.
3=circumflex    | 12 | a.c.e.ghij....o...s.u.w.y.
4=diaeresis     | 7  | a...e...i.....o.....u.w.y.
5=tilde         | 3  | a............no...........
6=caron         | 9  | ..cde.g...kl.n...rst.....z
7=breve         | 3  | a.....g.............u.....
8=double acute  | 2  | ..............o.....u.....
9=ring above    | 2  | a...................u.....
10=dot above    | 5  | ..c.e.g.i................z
11=macron       | 5  | a...e...i.....o...........
...
Ligatures
Primitives
Use of multiple diacritic marks could be dealt with in separate groups (one group for macron and diaeresis, etc.), or one could operate with two GROUP SELECTION KEYs, one SINGLE GROUP SELECTION KEY and one DOUBLE GROUP SELECTION KEY (accessing the macron and diaeresis groups successively before pressing the relevant letter key).
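The second alternative, accessing two groups successively, can be sketched as follows. The function name `type_with_groups` and the group table are hypothetical, and the assumption that the group selected first sits closest to the letter is mine; Python's `unicodedata` is again used as a stand-in for the driver's lookup.

```python
import unicodedata

# Combining marks for the two groups used in the example.
COMBINING = {
    "diaeresis": "\u0308",
    "macron": "\u0304",
}


def type_with_groups(groups, key: str) -> str:
    """Apply several diacritic groups, in the order selected, to one key."""
    marks = "".join(COMBINING[group] for group in groups)
    return unicodedata.normalize("NFC", key + marks)


# Diaeresis first, then macron, then the letter key u.
result = type_with_groups(["diaeresis", "macron"], "u")
print(result, unicodedata.name(result))
```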
The ligature group is accessed by the GROUP SELECTION KEY + LIGATURE GROUP IDENTIFIER KEY, and then the keys for the two glyph components are typed one after the other; thus GROUP SELECTION KEY + LIGATURE GROUP IDENTIFIER KEY - a - e gives æ. By this I by no means imply that æ itself is a ligature: the resulting character is U+00E6 and not a composite. The point is that the typing procedure should be easy to remember. The Norwegians will have their national keyboard with their æ (C11) anyway. As for the primitives (i.e. the Latin characters other than a-z that cannot by any stretch of the imagination be seen as extensions or combinations of any of the a-z characters), they should be assigned to keys resembling their shape or sound value, and the residue arbitrarily assigned.
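The ligature group is essentially a lookup on pairs of keys. In the sketch below only the pair a - e is taken from the text; the pair o - e, the function name `type_ligature`, and the rule that a capital first component gives the capital character are assumptions added for illustration.

```python
# Pairs of component keys and the single character they produce.
LIGATURES = {
    ("a", "e"): "\u00e6",  # from the example in the text
    ("o", "e"): "\u0153",  # hypothetical further entry
}


def type_ligature(first: str, second: str) -> str:
    """Return the character selected by two component keys."""
    small = LIGATURES[(first.lower(), second.lower())]
    # Assumption: the case of the first component decides the case.
    return small.upper() if first.isupper() else small


print(type_ligature("a", "e"))  # U+00E6, one character and no composite
print(type_ligature("A", "e"))  # U+00C6
```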
Given that 10646 allows for combined glyphs consisting of multiple characters, there is really no reason to limit the matrix to the precomposed characters in 10646. Linguists and other users composing their own characters (p-acute, a-caron, etc.) could utilize the combining properties of 10646 (Table 7 of 10646-1) to get the glyphs missing from the matrix above. There could also be an option enabling or disabling composed characters, thereby using the repertoire of 10646 as a checking mechanism (the glyphs of 10646 are in use in written languages; the ones that are not there probably are not).
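The enabling/disabling option can be sketched as a check on whether a precomposed character exists. Note the assumptions: the function name `compose` and its flag are hypothetical, and Python's `unicodedata` reflects the Unicode version shipped with the interpreter, whose repertoire is larger than the edition of 10646 discussed here, so the set of "missing" combinations will differ from the matrix above.

```python
import unicodedata


def compose(base: str, mark: str, allow_sequences: bool):
    """Combine a letter with a combining mark.

    Returns the precomposed character if one exists. Otherwise returns
    the two-character combining sequence when `allow_sequences` is True,
    and None (input rejected) when it is False.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1:
        return composed          # a precomposed character exists
    return composed if allow_sequences else None


CARON = "\u030c"
print(compose("s", CARON, allow_sequences=False))  # precomposed s-caron
print(compose("q", CARON, allow_sequences=False))  # None: no such character
print(len(compose("q", CARON, allow_sequences=True)))  # 2: letter + mark
```

With the option disabled, the repertoire itself acts as the checking mechanism: a combination that is rejected is probably a typing error.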
Within computer keyboards, two schools of thought developed in the pre-10646 era. One (the PC school) limits the typable glyphs to a minimum, and offers nothing but [ctrl+decimal code] for access to letters not used in the language of the keyboard; the other (the Macintosh school) aims at making it possible to type every glyph of the 8-bit code table in question. In the 10646 era we should opt for both philosophies, and have a minimal but tailored solution for the primary language (L1), access to tailored solutions for any other language (e.g., offering me the possibility to change from a Norwegian to a Finnish keyboard whenever that is needed), and two standards accessing all the Latin glyphs of 10646, one being accurate but cumbersome and hard to remember (the hexadecimal or decimal value option), and the other being cumbersome but easy to remember (the multiple group option sketched in Section 4 above).