The Holzmann USENET Distribution & Hurt Encoding (1986)

1 - The Holzmann USENET Distribution

In December of 1986, Peter Holzmann distributed a version of the Hershey glyph data to the (now former) USENET newsgroup "mod.sources". His version was derived from a US NTIS (National Technical Information Service) distribution of the US NBS version of the glyph data. (The NBS, the National Bureau of Standards, is now the NIST, the National Institute of Standards and Technology; it has no relation to the NTIS.) This distribution is archived in the comp.sources.unix archive, Vol. 4. It may be downloaded from various locations online. I have also duplicated these files at: http://www.lemur.com/dmm/culch/hershey/index.html (Note: the licensing terms of the Holzmann distribution may not be compatible with the Free Software Foundation's GNU Free Documentation License, under which the present Notebook is licensed. My redistribution of the Holzmann files is not a part of this present Notebook, and itself complies with the terms of the Holzmann distribution.)

The Holzmann distribution was a major step forward in the widespread use of Dr. Hershey's glyphs, and is memorable for that. It is also significant because it marked the introduction of James Hurt's "bias-82" encoding method. Hurt's method has continued in more modern distributions of the data, and a related (bias-64) form was used by Dr. Hershey in his "TrueBASIC" distribution. In my opinion, however, the Holzmann data is no longer the best basis for practical work with the Hershey glyphs. The GNU® Plotutils package is probably a better source.

The James Hurt Bias-82 (Bias-R) Encoding

For the Holzmann USENET distribution, James Hurt developed an extremely ingenious encoding scheme. It differs completely from the earlier nines complement NORC encoding and the printable character NBS/NTIS encoding. It allows each coordinate value to be represented by a single ASCII character, and uses only printable ASCII characters (this not only allows the encoded data to be printed, obviously, but also means that it is unlikely to be corrupted by systems trying to interpret nonprinting ASCII characters as control codes). On the negative side, however, it cannot represent the entire range of the original Hershey glyph grid (it can handle -49 to 44 only), and it uses for its special coordinate position a new value (-50) which differs from both the NORC encoding's value (-49) and the NBS/NTIS value (-64). Fortunately, these limitations do not affect any existing glyphs (at least through the GNU Plotutils distribution).

Another way of thinking about it is to see the Hurt encoding as a binary encoding which employs only (ASCII) printable values.

The Hurt encoding (or indeed any "bias-X" encoding) may be a bit confusing at first to nonprogrammers who are not used to thinking of characters as numbers. In a computer, though, each "character" is really just a number. ASCII characters are 7-bit numbers (numbers in the range 0 to 127). Of the ASCII characters, the first 32 (0 through 31) and the last (127) are special control codes of various sorts. The "printable" characters are those from 32 through 126.
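For readers who would like to see the "characters are numbers" idea in action, here is a minimal Python sketch. It uses only the built-in ord() and chr() functions; the variable name printable is simply an illustrative choice.

```python
# Each character is stored as a number: ord() shows the ASCII code,
# and chr() converts a code back into its character.
for ch in (" ", "!", "R", "~"):
    print(repr(ch), ord(ch))

# The printable ASCII characters run from 32 (space) through 126 (tilde).
printable = [chr(code) for code in range(32, 127)]
print(len(printable), "printable characters")
```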

The smallest value in the Hershey glyph grid is -49. Hurt's encoding actually starts one less than that, with a value of -50 which will be used to construct the special coordinate position used to signal pen-up. If one starts counting at -50 and assigns to -50 the ASCII character for a blank space (32, the first printable character), the result is the bias-82 coordinate encoding of the Hurt encoding:

 ASCII   ASCII  coordinate
decimal   char    value
   32   [space]    -50
   33      !       -49
   34      "       -48
   35      #       -47
   36      $       -46
   37      %       -45
   38      &       -44
   39      '       -43
   40      (       -42
   41      )       -41
   42      *       -40
   43      +       -39
   44      ,       -38
   45      -       -37
   46      .       -36
   47      /       -35
   48      0       -34
   49      1       -33
   50      2       -32
   51      3       -31
   52      4       -30
   53      5       -29
   54      6       -28
   55      7       -27
   56      8       -26
   57      9       -25
   58      :       -24
   59      ;       -23
   60      <       -22
   61      =       -21
   62      >       -20
   63      ?       -19
   64      @       -18
   65      A       -17
   66      B       -16
   67      C       -15
   68      D       -14
   69      E       -13
   70      F       -12
   71      G       -11
   72      H       -10
   73      I        -9
   74      J        -8
   75      K        -7
   76      L        -6
   77      M        -5
   78      N        -4
   79      O        -3
   80      P        -2
   81      Q        -1
   82      R         0
   83      S         1
   84      T         2
   85      U         3
   86      V         4
   87      W         5
   88      X         6
   89      Y         7
   90      Z         8
   91      [         9
   92      \        10
   93      ]        11
   94      ^        12
   95      _        13
   96      `        14
   97      a        15
   98      b        16
   99      c        17
  100      d        18
  101      e        19
  102      f        20
  103      g        21
  104      h        22
  105      i        23
  106      j        24
  107      k        25
  108      l        26
  109      m        27
  110      n        28
  111      o        29 
  112      p        30
  113      q        31
  114      r        32
  115      s        33
  116      t        34
  117      u        35
  118      v        36
  119      w        37
  120      x        38
  121      y        39
  122      z        40
  123      {        41
  124      |        42
  125      }        43
  126      ~        44

As it happens, one runs out of ASCII characters at 126, the encoding of Hershey coordinate position 44. Fortunately, no glyph exceeds this coordinate value.
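The whole table can be regenerated rather than typed by hand, since each coordinate value is just the ASCII code minus 82. The following is a small sketch along those lines; the names BIAS and hurt_table are my own and do not come from any distribution of the data.

```python
BIAS = 82  # the ASCII code of "R", which stands for coordinate 0

def hurt_table():
    """Return (ascii_code, character, coordinate) rows for codes 32..126."""
    return [(code, chr(code), code - BIAS) for code in range(32, 127)]

rows = hurt_table()

# Print the first three rows and the last row in the layout of the table above.
for code, ch, value in rows[:3] + rows[-1:]:
    label = "[space]" if ch == " " else ch
    print(f"{code:5d}   {label:^7}   {value:4d}")
```

Checking a hand-written table against a generated one like this is an easy way to catch transcription slips.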

Hurt's encoding uses the coordinate value -50 to construct the special coordinate position which encodes "pen up." This is represented by the grid coordinate (-50,0), and is therefore encoded, numerically, as (32,82), which, interpreted as ASCII characters, is ([space],R). The special coordinate position for "end of glyph" is omitted, as the Hurt encoding specifies the length of each glyph explicitly.

The Hurt encoding is termed a "bias-82" encoding because each coordinate value in the original grid may be encoded by adding 82 to it. 82 is also the ASCII capital R character, so this may be termed a "bias-R" encoding. Conversely, one can go from the Hurt encoding to the grid coordinate by subtracting 82. Each of the two numerical domains is biassed, or shifted, by 82 from the other.
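The add-82 and subtract-82 rule might be sketched in Python as follows. The function names encode_coord and decode_coord are hypothetical, and the range checks reflect the limits described on this page (-50 through 44, that is, ASCII 32 through 126) rather than any official specification.

```python
BIAS = ord("R")  # 82

def encode_coord(value: int) -> str:
    """Turn a grid coordinate into its single-character Hurt encoding."""
    if not -50 <= value <= 44:
        raise ValueError(f"{value} is outside the encodable range -50..44")
    return chr(value + BIAS)

def decode_coord(ch: str) -> int:
    """Turn one Hurt-encoded character back into a grid coordinate."""
    code = ord(ch)
    if not 32 <= code <= 126:
        raise ValueError(f"{ch!r} is not a printable ASCII character")
    return code - BIAS

print(encode_coord(0), decode_coord("M"), decode_coord("W"))
```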

In overall form, the Hurt encoding employs for each glyph a variable number of printable ASCII characters (usually represented by single bytes in electronic format, though one can write the encoding with pencil and paper if desired). The space character is significant. When multiple glyphs are represented in the same file, no newline or other termination characters separate them. Since each glyph's encoding encodes its own length, such characters would be redundant.

The first five characters encode the glyph number, right justified and padded with spaces. The next three characters encode the number of pairs of coordinates to follow, also right justified and padded with spaces. The first pair of coordinates is a special one. Its first character encodes the leftmost boundary of the glyph, and its second character encodes the rightmost boundary. (The smallest glyphs are the various blanks. These contain only left and right boundaries, with no point data.)
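As a rough illustration of this layout, the sketch below reads the fixed-width fields and uses the stored pair count to step through glyphs that have been run together with no separators. The names GlyphHeader, parse_header and split_records are hypothetical. The sample record is the glyph 1 data that this page uses as its example, and the two-glyph "stream" is merely that record repeated, not real file content. A file that does contain line breaks would need them stripped first.

```python
from typing import Iterator, NamedTuple

class GlyphHeader(NamedTuple):
    number: int      # characters 0-4, right justified
    pair_count: int  # characters 5-7, right justified
    left: int        # first character of the first pair
    right: int       # second character of the first pair

def parse_header(record: str) -> GlyphHeader:
    """Read the glyph number, the pair count and the left/right boundaries."""
    number = int(record[0:5])      # int() ignores the padding spaces
    pair_count = int(record[5:8])
    left = ord(record[8]) - 82
    right = ord(record[9]) - 82
    return GlyphHeader(number, pair_count, left, right)

def split_records(stream: str) -> Iterator[str]:
    """Cut a run of concatenated glyphs apart using each one's pair count."""
    pos = 0
    while pos < len(stream):
        pair_count = int(stream[pos + 5:pos + 8])
        end = pos + 8 + 2 * pair_count  # 8 header characters, 2 per pair
        yield stream[pos:end]
        pos = end

record = "    1  9MWRMNV RRMVV RPSTS"
print(parse_header(record))
print(len(list(split_records(record + record))), "records found")
```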

The remaining pairs encode coordinate data. The glyph is drawn by a sequence of zero or more polylines connecting successive coordinate points. The special coordinate (-50,0) (or ([space],R) in Hurt-encoded form) indicates that the current polyline should terminate. A new polyline begins with the next coordinate, if present.
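One way to turn the coordinate characters into polylines is sketched below. The function name decode_polylines is my own, and its input is assumed to be only the characters that follow the boundary pair. The sample string is the coordinate data of glyph 1, the example used on this page.

```python
PEN_UP = (-50, 0)  # encoded as a space followed by "R"

def decode_polylines(data):
    """Split Hurt-encoded coordinate characters into a list of polylines.

    Each polyline is a list of (x, y) grid points; a pen-up pair ends
    the current polyline and is not itself stored as a point.
    """
    polylines = []
    current = []
    for i in range(0, len(data) - 1, 2):
        point = (ord(data[i]) - 82, ord(data[i + 1]) - 82)
        if point == PEN_UP:
            if current:
                polylines.append(current)
            current = []
        else:
            current.append(point)
    if current:
        polylines.append(current)
    return polylines

for line in decode_polylines("RMNV RRMVV RPSTS"):
    print(line)
```

A blank glyph, which has no coordinate data after its boundaries, simply yields an empty list.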

As an example, the first Hershey glyph (a cartographic-sized capital A) is encoded as:

    1  9MWRMNV RRMVV RPSTS

Just to make sure that the spaces are apparent, this is:

[sp][sp][sp][sp]1[sp][sp]9MWRMNV[sp]RRMVV[sp]RPSTS

The first field, five characters wide, identifies this as glyph number 1. The next field, three characters wide, indicates that this glyph has 9 pairs of data characters (and indeed there are 18 ASCII characters following).

The left edge of this glyph is at "M" (ASCII 77), which by the table above, or by the subtraction of 82 or "R", can be seen to be -5 (that is, 77 - 82 = -5). The right edge is "W" (ASCII 87), which is +5 (87 - 82 = 5).

The first polyline goes from "RM" (0,-5) to "NV" (-4, 4). Then an "[sp]R" (-50, 0) indicates that the pen should be raised. (This "polyline" has in fact only one stroke.) The line so drawn is the left side of the "A" (remember, negative X coordinates are at the left, and negative Y coordinates are to the top).

The next polyline (also of one stroke; this is a simple glyph) goes from "RM" (0, -5) to "VV" (4, 4). This is the right side of the "A". Then a "[sp]R" indicates that the pen should be raised again.

Finally, a third line goes from "PS" (-2, 1) to "TS" (2, 1). This is the "crossbar" of the "A". With this the data line is done and the glyph is complete.
