The Glyph Grid and the NORC Encoding

1 - The Glyph Grid

The Hershey glyphs are constructed of straight lines which connect coordinate positions on a grid. (These are implemented as zero or more "polylines," each of which connects a series of two or more coordinate positions.) Coordinates are decimal integers within a limited range. The grid is centered on the coordinate position (0,0), as are the glyphs themselves. On the X axis the grid runs from -49 to 49, increasing left to right. On the Y axis it also runs from -49 to 49, increasing top to bottom. Two special coordinate pairs, (-49,0) and (-49,-49), may not be used in any glyph (see the explanation of the NORC encoding below for the reason). Each glyph also has encoded with it a leftmost extent and a rightmost extent on the X axis; these positions are not necessarily the same as the leftmost and rightmost X coordinates of the glyph's strokes.
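The constraints in the paragraph above can be summarized as a small validity check. This is a minimal Python sketch; `is_legal_coordinate` and the constant names are hypothetical, chosen only for illustration.

```python
# Hypothetical helper: may (x, y) appear as a stroke coordinate
# on the Hershey glyph grid?
GRID_MIN, GRID_MAX = -49, 49

# Pairs that collide with the NORC end-of-polyline (5000) and
# end-of-character (5050) markers, so no glyph may use them.
RESERVED = {(-49, 0), (-49, -49)}


def is_legal_coordinate(x: int, y: int) -> bool:
    """True if (x, y) lies on the grid and is not a reserved pair."""
    in_range = GRID_MIN <= x <= GRID_MAX and GRID_MIN <= y <= GRID_MAX
    return in_range and (x, y) not in RESERVED


print(is_legal_coordinate(0, 0))    # True: the grid centre
print(is_legal_coordinate(-49, 0))  # False: reserved pair
print(is_legal_coordinate(50, 0))   # False: off the grid
```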

2 - The NORC Encoding and Nines Complement

I've never seen a tape, or the transcription of a tape, of the Hershey glyph repertories in their original "NORC" encoding (named after the computer on which they were first produced; see the earlier chapter on Dr. Hershey's paper "Calligraphy for Computers"). The presentation here is simply a reconstruction of certain aspects of that format from descriptions in Dr. Hershey's papers. This reconstruction may well contain serious errors. The value of this reconstruction, if indeed it has any, is that it allows an understanding of the interplay between the size of the Hershey glyph grid and its original encoding. Bear in mind that this encoding dates to at least 1967 (possibly 1960 or earlier), a time when computer storage was very expensive. This in turn created a generation of talented programmers who valued the efficient representation and processing of data.

2.1 - The Original NORC Encoding on Tape

In "Calligraphy for Computers," Dr. Hershey describes the "NORC encoding" (not there given this name) in this way:

It is important for a modern reader who may have been schooled in {GNU®/}Linux®-like operating systems and their derivatives to realize that this NORC encoding came from a record-oriented model of data processing, not from a stream- or sequential-file-oriented model.

In understanding this encoding, it may be best to start at the "outside." The encoding assumes the presence of a magnetic tape which can be divided physically (electronically/magnetically) into "blocks" separated by interblock gaps. Hardware and operating system support would have existed to read each block separately. This reliance upon the physical characteristics of magnetic tape differs from the later use of the "NORC format" in "Cartography and Typography with True BASIC" [Hershey 1995], where the blocks are, it would seem, gathered into one or more sequential files.

When Dr. Hershey says that "each block consists of 16 decimal digit words," he means that each block consists of some number of "words," each made up of 16 decimal digits (not that each block consists of 16 words made up of decimal digits - the number of words per block is unlimited). "Each word is divided into four fields of four digits each." Thus, a word would be:

    aaaabbbbccccdddd

where "a", "b", "c", and "d" represent decimal digits. Each four-digit series is a "field" within the word.
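To make the field structure concrete, here is a minimal Python sketch that splits one 16-digit word into its four fields. It assumes, hypothetically, that a word is available as a 16-character string of decimal digits; `split_word` is an illustrative name, not part of any Hershey software.

```python
DIGITS = "0123456789"


def split_word(word: str) -> list[str]:
    """Split a 16-digit NORC word into its four 4-digit fields."""
    if len(word) != 16 or any(c not in DIGITS for c in word):
        raise ValueError(f"expected 16 decimal digits, got {word!r}")
    # Fields start at digit offsets 0, 4, 8 and 12.
    return [word[i:i + 4] for i in range(0, 16, 4)]


print(split_word("1234567890123456"))
# ['1234', '5678', '9012', '3456']
```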

The details of the first and last words are not clear from this description. Dr. Hershey says that "the first word is a beginning-of-block word," but does not specify the value of this word. He also specifies that the "last word is an end-of-block word," but says only that the field (not word) with a value of 5050 "signifies the end of the character." In the absence of an actual NORC encoded tape, this probably doesn't matter that much.

Before going on to examine the format of the actual data, it might be of interest (well, it was of interest to me) to look at the way the NORC encoding was used and described 30 years later.

2.2 - The NORC Format in IBM Mainframe "Files"

In "Cartography and Typography with True BASIC" [Hershey 1995], Dr. Hershey describes the "NORC format" (as used on an IBM® mainframe) in a slightly different manner. The core of the encoding of the data has not changed, but the surrounding packaging has.

The "NORC format" here seems to describe the earlier NORC encoding (a record-based encoding employing physical blocks on tape) encapsulated in a sequential file (or multiple sequential files?). The format contains metadata which encode the length of a block, in a style comfortable to a mainframe programmer, but these metadata are redundant (and ignored by Dr. Hershey's software) because the blocks are scanned for an end-of-glyph identifier ("5050"), in a style comfortable to programmers trained in {GNU/}Linux-like and microcomputer environments.

2.3 - The NORC Data Encoding

The core of both descriptions is the same, however: data are encoded as two-decimal-digit subfields in nines complement. The first field contains two two-digit pairs which specify the left and right edges of the glyph. Each subsequent field contains two two-digit pairs which encode one coordinate position, X first and then Y. Polylines are drawn between successive coordinate positions; the special value 5000 ends the current polyline, and 5050 ends the character.

2.4 - A Puzzle

Dr. Hershey's comments in "Cartography and Typography with True BASIC" present one puzzle, however. It is a puzzle which would be solved immediately upon the examination of either an original NORC tape (if it could still be read; or a bit-for-bit transcription) or a mainframe tape for the "True BASIC" distribution, or preferably both - but I don't have such tapes.

In neither the original NORC nor the "True BASIC mainframe" description does Dr. Hershey specify whether a "digit" is a 4-bit binary-coded decimal (BCD) digit or an 8-bit printable digit in some character code (whatever code was native to NORC in that case, and EBCDIC for the IBM® System/370 mainframe world of the "True BASIC" distribution).

In the absence of better information, it would make sense to assume BCD digits for the original NORC encoding. It might also make sense to assume this for the later encoding, particularly since the System/370 machine architecture was 32-bit: four 4-bit BCD digits (one complete nines complement coordinate position) would fit exactly in a 16-bit halfword, and a 32-bit register would hold two coordinate positions.

However, when describing the microcomputer version of this distribution (p. 8), Dr. Hershey notes that its bias-64 encoding requires 2 bytes (16 bits), presumably per coordinate position, and thus constitutes "a two-fold compression of the data." Four 4-bit BCD digits would also occupy 16 bits, leaving nothing to compress, whereas four 8-bit characters would occupy 32 bits. This would suggest that each "digit" of the mainframe encoding was in fact being represented by an 8-bit character rather than a 4-bit BCD digit.
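The storage comparison can be made concrete with Python's standard `cp037` (EBCDIC) codec. The sketch below stores one coordinate position, (-4,0), three ways. The choice of code page, the packing of BCD digits two per byte, and the bias-64 layout (each coordinate stored as its value plus 64 in one byte) are all assumptions for illustration, not documented details of either distribution.

```python
def nines_pair(n: int) -> str:
    """Two-digit nines complement text for -49 <= n <= 49."""
    return f"{n:02d}" if n >= 0 else f"{99 + n:02d}"


x, y = -4, 0
norc_digits = nines_pair(x) + nines_pair(y)  # "9500"

# Hypothesis 1: each digit is an 8-bit EBCDIC character (code page 037).
as_ebcdic = norc_digits.encode("cp037")

# Hypothesis 2: digits are 4-bit BCD, packed two per byte.
as_bcd = bytes(int(norc_digits[i]) << 4 | int(norc_digits[i + 1])
               for i in range(0, len(norc_digits), 2))

# Assumed bias-64 layout: one byte per coordinate, storing value + 64.
as_bias64 = bytes([x + 64, y + 64])

print(len(as_ebcdic), len(as_bcd), len(as_bias64))  # 4 2 2
```

Only the character-digit hypothesis yields a two-to-one ratio against the two-byte form.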

As noted, this puzzle could be solved trivially by examining the distribution tapes. Moreover, in the absence of such a tape (and the need to process it), the question of whether a "digit" in the mainframe distribution of Dr. Hershey's "Cartography and Typography with True BASIC" is BCD or printable EBCDIC is not relevant to an understanding of the nines complement encoding of the data using these digits.

2.5 - Nines Complement

"Nines¹ complement" is a concept which may be unfamiliar to typographers. It refers to a computational practice in decimal arithmetic which allows both the efficient subtraction of numbers using only the (easier to implement) operations of complementing and addition (this advantage isn't necessarily relevant here) and (of direct relevance here) the space-efficient encoding of positive and negative decimal integers.

The "nines complement" of a single decimal digit is the number which must be added to the digit in order to equal nine. Thus, the nines complement of 2 is 7, because 2 + 7 = 9. The nines complement of a multidigit decimal integer is generated by taking the nines complement of each digit. Thus the nines complement of 123 is 876.
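The digit-by-digit rule can be written in a few lines of Python. `nines_complement` is a hypothetical helper that operates on a string of decimal digits, which sidesteps any question of how the digits were physically stored.

```python
def nines_complement(digits: str) -> str:
    """Replace each decimal digit d with 9 - d."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError(f"expected decimal digits, got {digits!r}")
    return "".join(str(9 - int(c)) for c in digits)


print(nines_complement("2"))    # 7
print(nines_complement("123"))  # 876
```

Because 9 - (9 - d) = d, complementing twice returns the original digits.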

As noted above, nines complement notation has two uses in computer programming. The first, easy-to-implement subtraction, isn't relevant here (Dr. Hershey indicates in "Calligraphy for Computers" that the decimal data on tape were converted to binary for use in the computer itself.) The second use is to permit the expression of negative numbers without the need to devote an entire digit position to the minus sign (or its absence).

The way this is done is rather clever. Non-negative integers are simply encoded as themselves: for the two decimal digits of the NORC encoding, the numbers 0 through 49 are encoded as "00", "01", "02", ... "49". Negative numbers, from -0 through -49, are encoded as the nines complement of their magnitude. Thus (negative) 0 is represented by "99", -1 by "98", -2 by "97", and so forth, counting down to -49, which is represented by "50". Note that nines complement notation has two representations for the number 0: ordinary "positive" 0 ("00") and a somewhat strange "negative" 0 ("99"). In summary:

Encoded	Value
00 - 49	0 through 49
50 - 99	-49 through -0
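A minimal Python sketch of this two-digit scheme follows. The function names are hypothetical; `encode_coordinate` always emits the positive form "00" for zero, and `decode_coordinate` maps both "00" and "99" to 0.

```python
DIGITS = "0123456789"


def decode_coordinate(pair: str) -> int:
    """Decode two nines complement digits to a signed integer."""
    if len(pair) != 2 or any(c not in DIGITS for c in pair):
        raise ValueError(f"expected two decimal digits, got {pair!r}")
    value = int(pair)
    # "00".."49" stand for themselves; "50".."99" complement 49..0.
    return value if value <= 49 else -(99 - value)


def encode_coordinate(n: int) -> str:
    """Encode -49 <= n <= 49; zero becomes "00", never "99"."""
    if not -49 <= n <= 49:
        raise ValueError(f"{n} is outside -49..49")
    return f"{n:02d}" if n >= 0 else f"{99 + n:02d}"


print(decode_coordinate("50"), decode_coordinate("98"), decode_coordinate("99"))
# -49 -1 0
print(encode_coordinate(-2))  # 97
```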

2.6 - The Significance of the NORC Encoding

The Hershey glyph grid size is thus exactly that permitted by two-decimal-digit nines complement encoding of coordinates. To enlarge his grid by even one unit, Dr. Hershey would have had to go to three decimal digits per coordinate (six per coordinate pair). This would have given him a 999x999 grid (-499 to 499 on each axis), which would have significantly exceeded the resolution of his output devices. More importantly, the resulting fifty percent increase in encoding size (from four to six digits per coordinate pair) might have been a serious consideration for the encoding of thousands of characters, given the cost of storage in the early 1960s.
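The same arithmetic can be stated for any number of digits n per coordinate. The following is a worked derivation, not taken from Dr. Hershey's papers: of the 10^n possible codes, half (those whose first digit is 0 through 4) represent non-negative values and the other half their nines complement negatives, with one code lost to the redundant negative zero.

```latex
\begin{aligned}
v_{\max} &= \tfrac{10^n}{2} - 1, \qquad -v_{\max} \le v \le v_{\max},\\
\text{distinct values per axis} &= 2\,v_{\max} + 1 = 10^n - 1,\\
n = 2:&\quad -49 \le v \le 49,\quad 99 \text{ values per axis},\\
n = 3:&\quad -499 \le v \le 499,\quad 999 \text{ values per axis},\\
\text{digits per coordinate pair} &= 2n:\quad \tfrac{6}{4} = 1.5 \;(\text{a } 50\% \text{ increase}).
\end{aligned}
```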

The special values which mark the end of each polyline (5000) and the end of the character data (5050) in the NORC encoding are thus seen to be composed of the leftmost X coordinate (-49, "50" in nines complement) paired either with itself (5050) or with 0 (5000). This means that two coordinate positions, (-49,0) and (-49,-49), may not be used in glyphs.²
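Putting the pieces together, the following Python sketch decodes one glyph. It assumes, hypothetically, that the glyph arrives as a list of four-digit field strings with the beginning-of-block word already removed, that the first field holds the left and right extents, and that every later field is either a coordinate position (X in digits 1-2, Y in digits 3-4) or one of the two markers. The function names and the sample glyph (a small plus sign) are invented for illustration; the sample is not Hershey data.

```python
DIGITS = "0123456789"
END_OF_POLYLINE = "5000"   # would decode as (-49, 0)
END_OF_CHARACTER = "5050"  # would decode as (-49, -49)


def decode_pair(pair: str) -> int:
    """Two-digit nines complement -> signed int ("99" gives 0)."""
    value = int(pair)
    return value if value <= 49 else -(99 - value)


def decode_field(field: str) -> tuple[int, int]:
    """Four-digit field -> (x, y); digits 1-2 are X, digits 3-4 are Y."""
    if len(field) != 4 or any(c not in DIGITS for c in field):
        raise ValueError(f"bad field: {field!r}")
    return decode_pair(field[:2]), decode_pair(field[2:])


def parse_glyph(fields: list[str]) -> tuple[int, int, list[list[tuple[int, int]]]]:
    """Return (left, right, polylines) for one glyph's fields."""
    it = iter(fields)
    first = next(it, None)
    if first is None:
        raise ValueError("empty glyph")
    left, right = decode_field(first)
    polylines: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    for field in it:
        # Markers are checked before decoding, so they never become points.
        if field in (END_OF_POLYLINE, END_OF_CHARACTER):
            if current:  # close the polyline in progress
                polylines.append(current)
                current = []
            if field == END_OF_CHARACTER:
                return left, right, polylines
        else:
            current.append(decode_field(field))
    raise ValueError("glyph data ended without an end-of-character marker")


plus_sign = ["9405", "9500", "0400", "5000", "0095", "0004", "5050"]
print(parse_glyph(plus_sign))
# (-5, 5, [[(-4, 0), (4, 0)], [(0, -4), (0, 4)]])
```

Because the two markers are tested before any field is decoded, a glyph that tried to use (-49,0) or (-49,-49) as a real point would be misread as the end of a polyline or of the whole character.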

The relationship of the glyph grid size to the original encoding of the glyphs is not apparent in, for example, the NBS/NTIS encoding (which permits a range from -99 to 99) or the Hurt encoding (which permits a range from -49 to 44).

¹ The word is a plural, not a possessive, and thus is written without an apostrophe.

² I find it curious that Dr. Hershey did not use the redundant "negative 0" ("99" in nines complement) in constructing these special values.

Dr. Hershey's reports "Calligraphy for Computers" and "Cartography and Typography with True BASIC" were produced in the service of the US federal government for the United States Navy. They are therefore in the public domain.
Important disclaimers of warranty and liability in the presentation of public domain material.

Permission is granted to copy, distribute and/or modify copyrighted portions of this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts and no Back-Cover Texts. A copy of the license is included in the file entitled "fdl.txt" (GNU Free Documentation License).

The computer programs here are also present as files (in the original distribution, at least) and are released, as indicated in those files, under the GNU General Public License Version 2 or (at your option) any later version. A copy of this license is included in the file entitled "gpl.txt" (GNU General Public License).

Note: Those portions of this document which are in the public domain, if any, may be copied freely. The distribution of these public domain portions is subject to all of the disclaimers of warranty and liability noted herein.

This work is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Free Documentation License for more details.

You should have received copies of the GNU Free Documentation License and the GNU General Public License along with this work; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

NOTICE OF DISCLAIMER OF WARRANTY AND LIABILITY:

This work is distributed "as-is," without any warranty of any kind, expressed or implied; without even the implied warranty of merchantability or fitness for a particular purpose.

In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for damages, including but not limited to any general, special, incidental or consequential damages arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such damages.

In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for any injury, death, disfigurement, or other personal damage arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such injury, death, disfigurement, or other personal damage.

CircuitousRoot & circuitousroot.com are service marks of David M. MacMillan & Rollande Krandall. Other trademark recognition.

Presented originally by CircuitousRoot℠.

Layout of a four-digit NORC coordinate field:

Digits	Interpretation
1 - 2	X-coordinate
3 - 4	Y-coordinate