The Hershey occidental glyphs are numbered in a sequence that runs, with gaps, from 1 to 3926 (or 3999 or 4000 to round it off). While their arrangement has considerable internal order, it is not related to the order of the ASCII character set.
Several terms are used by Wolcott & Hilsenrath, the Holzmann USENET distribution, and the GNU plotutils to describe the glyphs. (I have not yet had the opportunity to consult Hershey's original papers, particularly Calligraphy for Computers.) I've adopted some of these here, but have omitted some, changed some, and have introduced new terms. I've done so to try to increase clarity, but I'm sure that I've just introduced confusion.
The glyphs as displayed here are shown within the context of VARKON font cells (not the original Hershey cells). Their margins are therefore ignored and they are shown flush left on the VARKON font baseline. They are not, however, tweaked as they are in the VARKON fonts I've prepared from them. (For example, the uniplex normal lowercase "m" is not scaled down in the x direction, as it is in its use in the fonts.)
The basic division of the Hershey glyphs is stroke count: 1, 2, or 3. This division is even more basic than size, and can form the basis for an arrangement of the glyphs into a "periodic table" of sorts (see below).
The stroke count ranges are continuous: uniplex from 1 to 1000, duplex from 1001 to 3000, and triplex from 3001 to 4000.
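As a minimal sketch of how these ranges might be applied, the shell function below maps a glyph number to its nominal stroke count. The name stroke_count is hypothetical and is not part of any Hershey distribution. The result is only the nominal value for the range; individual glyphs can differ (glyph 214, discussed later, is one such case).

```shell
#!/bin/sh
# Nominal stroke count for a Hershey glyph number, using only the
# continuous ranges given in the text: 1-1000, 1001-3000, 3001-4000.
# stroke_count is a hypothetical helper name used for illustration.
stroke_count() {
    n=$1
    if [ "$n" -ge 1 ] && [ "$n" -le 1000 ]; then
        echo uniplex
    elif [ "$n" -ge 1001 ] && [ "$n" -le 3000 ]; then
        echo duplex
    elif [ "$n" -ge 3001 ] && [ "$n" -le 4000 ]; then
        echo triplex
    else
        echo unknown
    fi
}

stroke_count 2501   # prints "duplex"
```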
As a second descriptive term, the glyphs may be divided into three sets based on their capital height (or in the case of some symbols the capital height of the alphabetic characters best associated with them). Wolcott & Hilsenrath identify three heights: "cartographic," "indexical," and "normal."
The "cartographic" size, encompassing glyphs in the range 1 to 500 (all uniplex, therefore), is the smallest. It is also unusual as it contains only capital letters (in Roman and Greek), numerals, and a restricted set of symbols.
Although this range is termed "cartographic," it does not include a number of cartographic symbols included in other ranges.
The "indexical" size covers the range 1001 to 2000. All indexical glyphs are duplex, but the duplex range extends beyond into the "normal" size range.
The "normal" size appears in all three stroke ranges: uniplex (501-1000), duplex (2001-3000), and triplex (3001-4000). Only the triplex stroke range is exclusively normal size.
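The three size ranges can be sketched the same way. The function name size_class is hypothetical, and the classification is by range only: it says nothing about glyphs whose actual proportions depart from the rest of their range.

```shell
#!/bin/sh
# Nominal size class for a Hershey glyph number, by range only.
# size_class is a hypothetical helper name used for illustration.
size_class() {
    n=$1
    if [ "$n" -ge 1 ] && [ "$n" -le 500 ]; then
        echo cartographic
    elif [ "$n" -ge 501 ] && [ "$n" -le 1000 ]; then
        echo normal            # uniplex normal
    elif [ "$n" -ge 1001 ] && [ "$n" -le 2000 ]; then
        echo indexical
    elif [ "$n" -ge 2001 ] && [ "$n" -le 4000 ]; then
        echo normal            # duplex and triplex normal
    else
        echo unknown
    fi
}

size_class 1500   # prints "indexical"
```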
This division is more difficult than the earlier ones. Wolcott & Hilsenrath define "simplex," "complex," and "gothic" on the basis of the quantity of tapering done in the glyphs. They partially identify the ranges of glyphs corresponding to these types in their index (C-1).
I've changed these terms completely, for several reasons.
First, the "Fraktur" and "Blackletter" typefaces (of which there are examples in the Hershey glyphs in the 3301-3800 range) are often termed "Gothic." To confuse things further, in a completely unrelated usage, certain sans serif typefaces are (at least in traditional US usage) termed "gothic." I felt that the word had too many disparate meanings to be used for this particularly difficult division of these glyphs.
Second, "simplex" and "complex" are easy to conflate with "uniplex," "duplex," and "triplex" ("simplex" with "uniplex," especially).
It might be best to eliminate this division altogether, as it doesn't contribute as much as the others. However, this division is useful in distinguishing the duplex parallel-lined glyphs in the range 2501-2800 from duplex tapered glyphs.
I'll replace these terms with three newly coined ones: taper0, taper1, and taper2. These aren't very good terms, but they have the advantage of sticking out enough that it's unlikely they'll be conflated with other terms.
"Taper0" (W&H "simplex") glyphs don't taper at all. All uniplex glyphs are taper0. Some duplex glyphs are as well. It wouldn't be impossible to have a triplex taper0 glyph, but there are none in the Hershey repertory.
Most duplex glyphs are "taper1" (W&H "complex") except some parallel-line glyphs in the range 2501-2800 which are taper0. Some triplex glyphs are "taper1."
Many triplex glyphs are "taper2" (W&H "gothic").
The final division of the glyphs based on their visual forms (as opposed to further divisions based on the characters, numerals, and symbols they represent) is that of "typeface." Once again, there is considerable room for discussion.
The Uniplex faces are about as simple as writing can be: just single lines. They have no variation in width between letter components. Generally, they are "sans serif" in the sense that there are no fiddly bits sticking out to further define the glyph.
In the Holzmann USENET distribution these faces at the "cartographic" size are termed "plain," while at the normal size they're called "Roman" and "simplex." Wolcott & Hilsenrath do not further identify the cartographic range typeface, but call the normal range uniplex typeface "simplex." The GNU plotutils differ further (though theirs is the most extensive arrangement into modern fonts).
It doesn't seem right to call these faces "Roman" for two reasons. First, they're not really derived from the Roman letterforms. These letters originate in scratching the sand with a stick and have their modern echoes in the ballpoint pen. They share in the great utility of these two forms of writing, but have little to do with carved Roman capitals.
I'll sidestep the issue by introducing an invented term, "lineface," to designate this type of single-stroke typeface. For "regular" Latin letters and Greek letters I'll use simply "lineface." For the uniplex script in the ranges 551-576 and 651-676 I'll use "lineface script."
As used here, "Roman" designates a typeface, not a character set. (The character set employed by the Romans and much of the subsequent Western and non-Western world will be called "Latin.") In particular, some of the Greek and all of the Cyrillic letters are in a Roman typeface.
"Italic" is entirely different from the "Italian" typeface identified by Wolcott & Hilsenrath (and termed here "Lombardic," see below).
The typeface identified by Wolcott & Hilsenrath as "German" is a form of Gothic. The GNU plotutils call it "GothicGerman."
Hershey (Calligraphy for Computers, quoted in the GNU plotutils documentation) indicates that it was adapted from a Fraktur in Jan Tschichold's Treasury of Alphabets and Lettering. An examination of this work suggests that, of the several versions of Fraktur illustrated, the one intended is the "Walbaum Fraktur" on page 179. Tschichold identifies this as a "late" Fraktur (dating to the period 1803-1828) by J. G. Justus Erich Walbaum (1768-1839). His description of it is charming.
The term "Gothic" can rightly be applied to what I'm calling "Blackletter," but as indicated elsewhere "gothic" is an overloaded term that I wish to avoid. In particular, the "blackletter" typefaces are completely unrelated to the term "gothic" as employed by Wolcott & Hilsenrath to indicate what I'm here calling a "taper2" quantity of tapering.
The noted type designer Frederic W. Goudy (69) felt that "blackletter" faces were those gothic faces where the amount of black ink exceeded the amount of white paper. By that criterion, these probably don't qualify (especially the symbols).
Hershey (Calligraphy for Computers, quoted in the GNU plotutils documentation) says that the source of this typeface was "a Le Roy lettering set for Old English." I probably should simply call it "Old English," then. But to me "Old English" is a language, not a typeface (and a language not written in "blackletter" for that matter).
The GNU plotutils term this "GothicEnglish."
Wolcott & Hilsenrath term this face "Italian." While this is not incorrect, it is easily confusable (in name, not in appearance!) with "italic." The GNU plotutils call it "GothicItalian."
Hershey (Calligraphy for Computers, quoted in the GNU plotutils documentation) says of it that it
represents a large family of alphabets for which there does not seem to be a consistent nomenclature. Some writers refer to it as Gothic uncial while others call it Lombardic Gothic. It seems to have been developed in Lombardy while the best examples come from Spain. The present version is an adaptation of a font [the 'Missal Initials' font] of the American Type Founders Company.
(The [bracketed material] above is by the GNU plotutils editor.)
Since it resembles examples of "Lombardic" capitals given in Goudy, I've adopted that term for it.
A character set is the underlying communications entity separate from the form of the glyphs used to represent it.
The ordinary Western alphabet is generally termed "Latin" (e.g., the various ISO® "Latin" character set standards).
The decimal numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
The Hershey glyphs include the modern Greek characters, without additional markings, including some variant characters such as the terminal sigma.
I know nothing at all about Cyrillic.
The Hershey glyphs encode a wide range of symbols, but these do not necessarily correspond to the symbols and punctuation used in ASCII or any other character set. Certain common symbols are omitted. It is perhaps a sign of the decline of our society as we grasp more and more for less and less that only three decades ago Dr. Hershey did not find it necessary to encode the copyright and registered trademark symbols.
My first step was to go through and make MBS modules to draw each glyph. A short shell script, allhersheymbs.sh, was sufficient for this. The core of this shell script is:
    rm -f hershnums-cartographic hershnums-indexical hershnums-normal

    awk -f hershnums-cartographic.awk < hersh.occ > hershnums-cartographic
    awk -f hershnums-indexical.awk < hersh.occ > hershnums-indexical
    awk -f hershnums-normal.awk < hersh.occ > hershnums-normal

    # Make MBS for cartographic glyphs, baseline 4
    for h in `cat hershnums-cartographic`
    do
       awk -f varkhersh.awk < hersh.occ hnum=$h mbs=1 baseline=4 > $1/hg$h.MBS
    done

    # Make MBS for indexical glyphs, baseline 6
    for h in `cat hershnums-indexical`
    do
       awk -f varkhersh.awk < hersh.occ hnum=$h mbs=1 baseline=6 > $1/hg$h.MBS
    done

    # Make MBS for normal glyphs, baseline 9
    for h in `cat hershnums-normal`
    do
       awk -f varkhersh.awk < hersh.occ hnum=$h mbs=1 baseline=9 > $1/hg$h.MBS
    done
In turn it employs four Awk scripts (previously introduced):
hershnums-cartographic.awk
hershnums-indexical.awk
hershnums-normal.awk
varkhersh.awk
In use:
./allhersheymbs.sh hg < hersh.occ
This puts an MBS module to draw each Hershey glyph in a previously existing subdirectory named "hg". There will be 1597 modules.
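A quick way to check the result is to count the module files that were written. The sketch below assumes only the hg<number>.MBS naming used by the script; count_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Count the hg*.MBS files in a directory.
# count_modules is a hypothetical helper name used for illustration.
count_modules() {
    dir=$1
    count=0
    for f in "$dir"/hg*.MBS; do
        # If nothing matches, the pattern is left unexpanded and the
        # existence test below fails, so the count stays at 0.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running "count_modules hg" after the script has finished should report 1597 if every module was written.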
It is reasonable to copy these MBS modules into the "mbs" subdirectory of a VARKON job. It wouldn't be a good idea to compile all of them at the same time and put the *.MBO files in the job directory, though, as each of them would show up as a separate job, overwhelming the job selection window. It is perhaps better to compile smaller ranges as necessary. I modify a script, hershmbsc.sh, with varying minimum and maximum values as required to do this.
    rm -f hg*.MBO
    rm -f ../job/hg*.MBO

    i=1
    max=26
    while [ $i -le $max ]
    do
       if [ -e hg$i.MBS ]
       then
          mbsc hg$i.MBS
          mv hg$i.MBO ../job
       fi
       let i=$i+1
    done
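Rather than editing the minimum and maximum by hand each time, the range could be passed as arguments. The following is only a sketch of that idea, not the script I actually use: modules_in_range is a hypothetical helper that lists the module files present for a range of glyph numbers, skipping the gaps in the numbering, so that the list can then be fed to mbsc.

```shell
#!/bin/sh
# List the hg<number>.MBS files that exist for glyph numbers min..max.
# modules_in_range is a hypothetical helper name; the third argument
# (the directory to look in) defaults to the current directory.
modules_in_range() {
    min=$1
    max=$2
    dir=${3:-.}
    i=$min
    while [ "$i" -le "$max" ]; do
        if [ -e "$dir/hg$i.MBS" ]; then
            echo "hg$i.MBS"
        fi
        i=$((i + 1))
    done
}

# Possible use from the "mbs" directory (commented out here, since it
# needs mbsc and a VARKON job directory):
#
#   for m in $(modules_in_range 1 26); do
#       mbsc "$m" && mv "${m%.MBS}.MBO" ../job
#   done
```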
I found it useful to have an MBS module which drew the VARKON font template box, so that I could see how each Hershey glyph fit within it. Since the glyphs will be drawn in the "shifted coordinate system," this box must also draw within that system.
    global drawing module varkbox(
       float originx;
       float originy;
       int n;
       int color);

    float llx;
    float lly;
    float lrx;
    float lry;
    float urx;
    float ury;
    float ulx;
    float uly;
    float baseliney;
    float topliney;
    float rightlinex;
    float nx;
    float ny;

    beginmodule

    ! The hg*.MBS modules are written in the shifted coordinate system
    ! (not the scaled coordinate system).
    ! So the baseline is at Y=10, and the topline at Y=31

    llx := originx + 0;
    lly := originy + 0;
    lrx := originx + 21;
    lry := originy + 0;
    urx := originx + 21;
    ury := originy + 36;           ! 10 + 21 + 5
    ulx := originx + 0;
    uly := originy + 36;           ! 10 + 21 + 5
    baseliney := originy + 10;
    topliney := originy + 31;      ! 10 + 21
    rightlinex := originx + 12.6;  ! 21 * 0.6
    nx := ulx + 1;
    ny := uly - 3;

    ! the outline of the box
    lin_free(#1,vec(llx,lly),vec(lrx,lry): pen=color);
    lin_free(#2,vec(lrx,lry),vec(urx,ury): pen=color);
    lin_free(#3,vec(urx,ury),vec(ulx,uly): pen=color);
    lin_free(#4,vec(ulx,uly),vec(llx,lly): pen=color);

    ! the baseline
    lin_free(#5,vec(llx,baseliney),vec(lrx,baseliney): pen=color);

    ! the topline
    lin_free(#6,vec(ulx,topliney),vec(urx,topliney): pen=color);

    ! the glyph right line
    lin_free(#7,vec(rightlinex,lry),vec(rightlinex,ury): pen=color);

    ! n
    text (#8, vec(nx,ny), 0, str(n,-4,0): pen=color, tsize=2);

    endmodule
Remember that VARKON uses "!" to introduce comments, while bash uses "#". Using "#" to introduce a comment in an MBS module results in a core dump from mbsc.
With MBS modules for each glyph in hand, all I had to do was write MBS modules to display any range of glyphs I wished. Given the diverse nature of the glyphs, each was bound to be different. In the glyph presentations below, I link in each case to the MBS module which produced the display. That module may be invoked as a part from the Active Module.
A typical VARKON Active Module to call a glyph display module might be:
    BASIC DRAWING MODULE t2();

    BEGINMODULE

    clear_pm();
    clear_gm();

    part(#1,h0027_0050(0, 0, 2, 3, 0));

    ENDMODULE
Produced by MBS Module: h0001_0026.MBS
Produced by MBS Module: h0027_0050.MBS
At several points within the Hershey glyphs, three successive glyphs appear which have no coordinate points. These are, of course, invisible or unprintable; Wolcott & Hilsenrath identify them as "blank."
These glyphs do, however, have left and right margins. In the case of glyphs 197, 198, and 199 these are, respectively, "RR" (0 and 0), "PT" (-2 and 2), and "NV" (-4 and 4). I would guess that these are single (or null), double, and triple width spaces.
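The margin values can be worked out from the letter pairs directly. The sketch below assumes the usual encoding of the Holzmann distribution, in which each coordinate is a single character whose value is its ASCII code minus that of "R" (82). The function names hershey_offset and hershey_margins are hypothetical.

```shell
#!/bin/sh
# Decode one Hershey coordinate character: ASCII code minus the code of "R".
# hershey_offset and hershey_margins are hypothetical helper names.
hershey_offset() {
    # printf with a leading single quote in the argument yields the
    # numeric code of the character that follows it.
    code=$(printf '%d' "'$1")
    echo $((code - 82))        # 82 is the ASCII code of "R"
}

# Decode a two-character margin pair into "left right".
hershey_margins() {
    pair=$1
    left=$(hershey_offset "${pair%?}")    # first character of the pair
    right=$(hershey_offset "${pair#?}")   # second character of the pair
    echo "$left $right"
}

hershey_margins PT   # prints "-2 2"
```

Under this reading the three blank glyphs have total widths of 0, 4, and 8 Hershey units.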
Produced by MBS Module: h0200_0209.MBS
Produced by MBS Module: h0210_0235.MBS
Glyph 214 is of course the exclamation point; "bang" is, I believe, printer's slang for this. This glyph is the only duplex glyph in the "cartographic" range (probably because it must match the width of a minimal one-unit (two points) wide dot). W & H give "interrogation" for 215 ("question" here, because "interrogation" wouldn't fit) and "prime" for 216 (identified here as "minute" so as to fit with 217, second). The programmer's "slash" (220) is known more formally as the "solidus" (W & H) or "virgule." W & H term 224 ("minus") "difference," 225 ("plus") "sum," and 226 ("equals") "equality." 227, which they and I list as "cross," is probably "multiplication," given its context. The printer's jargon for "asterisk" is "splat," at least if old movies are to be believed :-) 234 ("and") is of course the "ampersand" (which wouldn't fit on the line without obscuring the "l" of "lozenge").
Glyph 233, "number," has many names. See the entry for "ASCII" in Eric S. Raymond's "The Jargon File" (http://www.catb.org/~esr/jargon, or (I haven't seen it) the print version, The New Hacker's Dictionary).
Wolcott & Hilsenrath (24-45) give a table of glyphs digitized at the NBS for their use. The data for these glyphs do not appear either in their book or in the Holzmann USENET distribution. The location of most (but not all) of these glyphs in the under 500 range is interesting, as they do not seem to be of the "cartographic" size.
These glyphs are not identified, and contain several symbols with which I am unfamiliar. The four symbols that would seem to be of the most general use are the British pound (currency) sign (272), the registered trademark symbol (273), the copyright symbol (274), and the cent (currency) sign (outside the present range at 910).
Produced by MBS Module: h0501_0526.MBS
Produced by MBS Module: h0527_0550.MBS
Produced by MBS Module: h0551_0576.MBS
These glyphs don't fit particularly well into the VARKON font limits. I (559), J (560), Y (575), and Z (576) are capitals with descenders which are of such proportion as to extend below the VARKON descender space. The real solution here would be to rework the glyphs, but here I've simply shifted the entire set upward by two Hershey units.
Other glyphs, particularly M (564) and W (573), extend beyond the right of the VARKON font cell. These could be scaled back, but are shown here in their original proportions.
Produced by MBS Module: h0583.MBS
This symbol is identified by Wolcott & Hilsenrath as the "nabla." A quick online search reveals that the nabla is the vector calculus differential operator, introduced in this orientation by William Rowan Hamilton in 1837 (see "Earliest Uses of Symbols of Calculus," by Jeff Miller). Miller indicates that this symbol is also referenced in the Oxford English Dictionary, 2nd edition (I verified that it is not referenced in the 1st edition) and Florian Cajori's A History of Mathematical Notations. Miller gives "del" and "atled" as alternate names.
Produced by MBS Module: h0601_0626.MBS
Produced by MBS Module: h0627_0650.MBS
Produced by MBS Module: h0651_0677.MBS
This range of glyphs, like the corresponding capitals in 551-576, contains descenders that exceed the space available in VARKON's font cell. I've shifted the entire glyph set up two Hershey units.
That glyph 677 is an alternate "l" is my presumption only.
I'm not sure what this symbol is. It appears in the same relative position as glyph 583 (nabla), so perhaps it is intended as a differentiation operator? I've displayed it with the Greek alternate lowercase letters, below.
Produced by MBS Module: h0684_0687.MBS
For 683, see above.
I'm guessing that 684 is epsilon, 685 an alternative form of uppercase theta, 686 an alternative form of lowercase phi, and 687 is the final sigma, but my knowledge of Greek characters is quite limited and I may be wrong.
Produced by MBS Module: h0700_0709.MBS
Produced by MBS Module: h0710_0746.MBS
See the Cartographic Symbol section for a discussion of some of these symbols.
Produced by MBS Module: h0750_0768.MBS
Some of the symbol names didn't fit on the display above. These are, in full:
(From Wolcott & Hilsenrath.)
Produced by MBS Module: h0796_0827.MBS
Glyph 798 (the long vertical line) extends below the VARKON Y=0 line; it has been shifted up by 1 Hershey unit.
The Wolcott & Hilsenrath names have been changed or abbreviated. Here's the correspondence:
Produced by MBS Module: h0828_0834.MBS
Produced by MBS Module: h0840_0847.MBS
Produced by MBS Module: h0850_0857.MBS
Produced by MBS Module: h0860_0874.MBS
"anch." = anchorage, "aero." = aerodrome, and "ltship" = lightship.
Produced by MBS Module: h0899_0907.MBS
In the untransformed glyphs these circles are, of course, centered on the origin (the center) of the Hershey glyph cell.
Produced by MBS Module: h0908_0909.MBS
Note for international readers: In the United States, the "Interstate" Highway System is a separate system of limited access highways, begun in the 1950s.
Produced by MBS Module: h3301_3326.MBS
Produced by MBS Module: h3401_3429.MBS
Note the three letters after z.
Produced by MBS Module: h3501_3526.MBS
Produced by MBS Module: h3601_3626.MBS
Produced by MBS Module: h3700_3709.MBS
Produced by MBS Module: h3710_3729.MBS
See the notes for the Cartographic symbols for variations in names.
Produced by MBS Module: h3801_3826.MBS
Produced by MBS Module: h3901_3926.MBS
The Hershey glyphs lend themselves to an arrangement which is both numerically sequential and divisible into a table not unlike the periodic table of the elements. It's a pretty big table, of course, so the image below is just an icon - it's not intended to be legible at this size. Click on it for a larger image, or see below for the full table in various formats.
A "Periodic Table" of the Hershey Glyphs
Here's the (full-size) table in various formats:
Here's the MBS module which drew it:
    global drawing module pertab();

    constant float space_x = 25; ! width of one glyph and its varkbox
    float ix;
    float iy;
    constant int titlecolor = 2;
    constant int boxcolor = 3;
    constant int boxblank = 1;

    beginmodule

    ix := 0;
    iy := 0;

    clear_pm();
    clear_gm();

    part(#1, h0001_0026(ix, iy, titlecolor, boxcolor, boxblank));
    ix := ix + (space_x * 11);
    part(#27, h0027_0050(ix, iy, titlecolor, boxcolor, boxblank));

    endmodule
Goudy, Frederic W. The Alphabet and Elements of Lettering. (1918, 1942) NY: Dover Publications, 1963.
Hershey, A. V. Calligraphy for Computers. Technical Report No. 2101 (1 August 1967). U.S. Naval Weapons Laboratory, Dahlgren, Virginia.
Raymond, Eric S. The New Hacker's Dictionary. [Printed version of "The Jargon File."]
Tschichold, Jan. Treasury of Alphabets and Lettering. (Meisterbuch der Schrift, 1952) NY: Reinhold Publishing Corporation, 1966 [edition Hershey used?]. NY: Design Press, 1992 [reprint I consulted].
Wolcott, Norman M. and Joseph Hilsenrath. A Contribution to Computer Typesetting Techniques: Tables of Coordinates for Hershey's Repertory of Occidental Type Fonts and Graphic Symbols. Washington, D. C.: Office of Standard Reference Data, National Bureau of Standards, U.S. Department of Commerce, April 1976. NBS Special Publication 424. National Technical Information Service (NTIS) Order Number PB251845.
The data, files, text, and programs of the Holzmann USENET Hershey Glyph Distribution may be redistributed and used freely under their original terms as specified in the Holzmann USENET Hershey Glyph Distribution Cover Statement. The distribution here complies with these terms. The data of the Hershey Glyphs as transformed for use with VARKON may be redistributed and used freely under these same terms. I assert no additional rights or conditions on the use of the transformed data. Some of the text and programs in the Holzmann USENET Hershey Font Distribution may be Copyright 1986 by Peter Holzmann and/or James Hurt. Their own terms either allow or require their redistribution with the Hershey data. The distribution of these texts, files, data, and programs here is subject to all of the disclaimers of warranty and liability noted herein.
The quotation from Hershey is from a paper prepared for the US Navy, which as a US government official publication is in the public domain. The four words within that quote added by the GNU plotutils editor are licensed under the GPL and so should be quotably compatible with this present GFDL document (and within "fair use.")
The text of this document itself and of any linked program files insofar as their text is separable from any Hershey Glyph data they may contain are copyright © 2003 by David M. MacMillan.
Permission is granted to copy, distribute and/or modify copyrighted portions of this document (other than the portions the copyright of which is owned by Peter Holzmann and/or James Hurt, which are freely redistributable under their own terms) under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License."
Note: Those portions of this document which are in the public domain, if any, may be copied freely. The distribution of these public domain portions is subject to all of the disclaimers of warranty and liability noted herein.
This work is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Free Documentation License for more details.
You should have received a copy of the GNU Free Documentation License along with this work; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
This work is distributed "as-is," without any warranty of any kind, expressed or implied; without even the implied warranty of merchantability or fitness for a particular purpose.
In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for damages, including but not limited to any general, special, incidental or consequential damages arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such damages.
In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for any injury, death, disfigurement, or other personal damage arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such injury, death, disfigurement, or other personal damage.
GNU is a registered trademark of the Free Software Foundation.
ISO is a registered trademark of the International Organization for Standardization.
VARKON is or was a trademark of Microform AB (Sweden).