Preparing Hershey Data

Topics

  1. Obtaining the Holzmann USENET Distribution
  2. Unpacking It
  3. Preparing It
  4. Bibliography
  5. GNU® Free Documentation License (separate file)
  6. GNU General Public License (separate file)
  7. Legal

1 - Obtaining the Holzmann USENET Distribution

The Peter Holzmann USENET Distribution of the Hershey Glyph data was posted in December 1986 to "mod.sources", a former USENET newsgroup. It is archived in Volume 4 of the "comp.sources.unix" archive, which is available from several sites online; at the time of writing these included the Internet Software Consortium (http://sources.isc.org/) and uu.net (ftp://ftp.uu.net/usenet/comp.sources.unix/).

Note that the Holzmann USENET Hershey Glyph Distribution Cover Statement in this distribution is slightly shorter than the version distributed with the Hershey Fonts in Ghostscript®. I do not know why.

2 - Unpacking It

The distribution is packaged in five files, each compressed with the older "compress" program. Modern Linux® systems can still unpack this format.

To unpack them, I first uncompressed each one:

uncompress part1.Z
uncompress part2.Z
uncompress part3.Z
uncompress part4.Z
uncompress part5.Z
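The same step can be wrapped in a small shell function that takes the part names as arguments. This is a sketch under one assumption that goes beyond the text above: when uncompress is not installed, it falls back to gzip -d, which can also read files made by compress. The name unpack_parts is hypothetical.

```shell
# unpack_parts (hypothetical helper): decompress each named .Z file in place.
# Uses uncompress when it is available; otherwise gzip -d, which also reads
# the old compress (.Z) format. Stops at the first failure.
unpack_parts() {
    for f in "$@"; do
        if command -v uncompress >/dev/null 2>&1; then
            uncompress "$f" || return 1
        else
            gzip -d "$f" || return 1
        fi
    done
}

# Example: unpack_parts part1.Z part2.Z part3.Z part4.Z part5.Z
```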

The five resulting files are "shell archives" (shar files), a self-extracting format. As the files themselves explain:

# This is a shell archive, meaning:
# 1. Remove everything above the #!/bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
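Those three steps can be automated with sed and a temporary file. This minimal sketch assumes that each archive begins at a line starting exactly with "#!/bin/sh". The helper names strip_shar_header and extract_shar are hypothetical.

```shell
# strip_shar_header (hypothetical): print the file from the first line that
# begins with "#!/bin/sh" through the last line, dropping the header above it.
strip_shar_header() {
    sed -n '/^#!\/bin\/sh/,$p' "$1"
}

# extract_shar (hypothetical): save the stripped text to a temporary file
# and run it with /bin/sh (not csh) in the current directory.
extract_shar() {
    tmp=$(mktemp) || return 1
    strip_shar_header "$1" > "$tmp"
    if [ ! -s "$tmp" ]; then
        echo "extract_shar: no #!/bin/sh line in $1" >&2
        rm -f "$tmp"
        return 1
    fi
    /bin/sh "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example: for p in part1 part2 part3 part4 part5; do extract_shar "$p"; done
```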

Doing this for each part produced many files. However, an error message appeared for each of the eight glyph data files (hersh.oc1 through hersh.or4). I have not yet determined why:

$ sh part2
shar: extracting 'hersh.oc1' (26458 characters)
part2: test:   26458 hersh.oc1: integer expression expected
shar: extracting 'hersh.oc2' (26561 characters)
part2: test:   26561 hersh.oc2: integer expression expected
$ sh part3
shar: extracting 'hersh.oc3' (28258 characters)
part3: test:   28258 hersh.oc3: integer expression expected
shar: extracting 'hersh.oc4' (28235 characters)
part3: test:   28235 hersh.oc4: integer expression expected
$ sh part4
shar: extracting 'hersh.or1' (26757 characters)
part4: test:   26757 hersh.or1: integer expression expected
shar: extracting 'hersh.or2' (27782 characters)
part4: test:   27782 hersh.or2: integer expression expected
$ sh part5
shar: extracting 'hersh.or3' (28972 characters)
part5: test:   28972 hersh.or3: integer expression expected
shar: extracting 'hersh.or4' (25031 characters)
part5: test:   25031 hersh.or4: integer expression expected
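The argument test complains about (for example "26458 hersh.oc1") appears to be a count followed by a file name rather than a bare integer. If so, the archive's own size check never ran. One way to check the extracted files anyway is to compare each file's byte count with the character count shar printed. This sketch assumes that, for this plain ASCII data, characters and bytes are the same. The helper name check_size is hypothetical.

```shell
# check_size (hypothetical): compare a file's byte count with the character
# count reported by shar. Reading from stdin ("wc -c < file") makes wc print
# only the number; the arithmetic expansion discards any padding blanks.
check_size() {
    actual=$(( $(wc -c < "$1") ))
    if [ "$actual" -eq "$2" ]; then
        echo "$1: ok ($actual bytes)"
    else
        echo "$1: expected $2 bytes, found $actual" >&2
        return 1
    fi
}

# Counts taken from the shar messages above:
# check_size hersh.oc1 26458; check_size hersh.oc2 26561
# check_size hersh.oc3 28258; check_size hersh.oc4 28235
# check_size hersh.or1 26757; check_size hersh.or2 27782
# check_size hersh.or3 28972; check_size hersh.or4 25031
```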
$ 

Apart from these messages, the extraction appeared to succeed, and I ended up with these files:

cyrilc.hmp
gothgbt.hmp
gothgrt.hmp
gothitt.hmp
greekc.hmp
greekcs.hmp
greekp.hmp
greeks.hmp
hershey.c
hershey.doc
hershey.f77
hersh.oc1
hersh.oc2
hersh.oc3
hersh.oc4
hersh.or1
hersh.or2
hersh.or3
hersh.or4
italicc.hmp
italiccs.hmp
italict.hmp
README
romanc.hmp
romancs.hmp
romand.hmp
romanp.hmp
romans.hmp
romant.hmp
scriptc.hmp
scripts.hmp

The "README" file contains Peter Holzmann's cover letter for the distribution. The *.hmp files are suggested mappings from Hershey glyph numbers to ASCII fonts, one file per font. The "hershey.doc" file describes the encoding format, and hershey.c and hershey.f77 are sample programs (in C and Fortran 77) for manipulating the data. Finally, hersh.oc* and hersh.or* contain the occidental and oriental glyph data, respectively.

3 - Preparing It

As unpacked, the occidental glyph data is split across four files. Within each file, any glyph definition longer than 72 characters is broken into lines of 72 characters. I found it convenient to concatenate these files and rejoin the broken lines. The result is a single file that contains all of the occidental glyph data, with one glyph definition per line.
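The joining script below relies on a particular layout: a 5-column glyph number, a 3-column pair count, then two characters per coordinate pair, on physical lines 72 characters long. Under that layout, the first line of a glyph holds 32 pairs (columns 9 to 72) and each continuation line holds 36. The hypothetical helper lines_for_pairs computes how many physical lines a glyph with a given pair count occupies:

```shell
# lines_for_pairs (hypothetical): number of 72-column lines used by a glyph
# whose header gives $1 coordinate pairs (margin pair included).
# Line 1: 8 header columns + 64 data columns = 32 pairs.
# Each continuation line: 72 data columns = 36 pairs.
lines_for_pairs() {
    n=$1
    if [ "$n" -le 32 ]; then
        echo 1
    else
        # the first line, plus the remaining pairs divided by 36, rounded up
        echo $(( 1 + (n - 32 + 35) / 36 ))
    fi
}
```

For example, a glyph with 33 pairs needs two lines, and one with 69 pairs needs three. This agrees with the script's test for reaching column 71 before reading another line.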

To do this, I first edited each data file (using vi, my favorite text editor) to remove any leading or trailing blank lines. (Because I later rewrote my "joinhersh.awk" script to skip blank lines, this step is probably now redundant.) Then I concatenated the files:

cat hersh.oc1 hersh.oc2 hersh.oc3 hersh.oc4 > hersh.cat
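Instead of editing the files by hand, you could drop the blank lines during concatenation. This sketch treats only completely empty lines as blank, which matches the newline-only lines the joining script skips. The helper name concat_hershey is hypothetical.

```shell
# concat_hershey (hypothetical): concatenate the named files in order,
# deleting empty lines. sed reads its file arguments as one stream.
concat_hershey() {
    sed '/^$/d' "$@"
}

# Example: concat_hershey hersh.oc1 hersh.oc2 hersh.oc3 hersh.oc4 > hersh.cat
```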

Next, I wrote a small Awk script to rejoin the lines. The script assumes familiarity with the James Hurt encoding of the Hershey data, which is discussed in the next chapter ("Translating the Hershey Data"). Here is the script:

{
   # assume the current line is start of a glyph's encoding

   # Several blank lines (newline only) appear in the USENET data
   # distribution.  Check for these and skip as necessary.

   if (length($0) == 0) {
      next
   }

   # write out the glyph number
   printf ("%s", substr($0,1,5))

   # write out the data count
   printf ("%s", substr($0,6,3))

   # get the number of data pairs, including left/right margin data
   datapairs = substr($0,6,3) + 0

   # write out the data pairs, getting new lines of input as necessary
   pointpos = 9
   for (i = 1; i <= datapairs; i++) {
      a = substr($0,pointpos,1)
      b = substr($0,pointpos + 1,1)
      printf ("%c%c", a, b)
      # if at the end of the line, then...
      if (pointpos == 71) {
         if (i == datapairs) {    # either also at end of glyph
            printf("\n")
            next
         } else {                 # or have more glyph data to read
            getline
            pointpos = 1
         }
      } else {
         pointpos = pointpos + 2
      }
   }

   printf ("\n")
}

The full text of this script is available as joinhersh.awk.

I ran the script as follows:

awk -f joinhersh.awk < hersh.cat > hersh.occ
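You can run a quick consistency check on the output, assuming the same layout. Every line of hersh.occ should be exactly 8 characters long (glyph number and pair count) plus 2 characters per pair. The hypothetical helper check_joined reports any line that is not, and returns non-zero if it finds one:

```shell
# check_joined (hypothetical): flag lines of the joined file whose length is
# not 8 header columns (5-column glyph number + 3-column pair count) plus
# 2 characters per coordinate pair. Returns 1 if any line fails.
check_joined() {
    awk '{
        want = 8 + 2 * (substr($0, 6, 3) + 0)
        if (length($0) != want) {
            printf("line %d (glyph %s): length %d, expected %d\n",
                   NR, substr($0, 1, 5), length($0), want)
            bad++
        }
    }
    END { if (bad) exit 1 }' "$@"
}

# Example: check_joined hersh.occ
```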

The resulting file, hersh.occ, contains a version of the Hershey Glyph data suited to my further processing. Interestingly, the repertory appears to contain 1597 glyphs (perhaps 1570, if 27 blank glyphs are excluded), rather than the 1377 cited by Wolcott and Hilsenrath (1976).
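Because the joined file holds one glyph per line, you can read the glyph count directly from it. Counting blank glyphs requires a definition. This sketch assumes, without confirmation from the source, that a blank glyph is one whose pair count is 1 (only the left/right margin pair, with no strokes). The helper names are hypothetical.

```shell
# count_glyphs (hypothetical): one glyph definition per line, so the
# number of lines is the number of glyphs.
count_glyphs() {
    awk 'END { print NR }' "$1"
}

# count_blank_glyphs (hypothetical): ASSUMES a blank glyph is one whose
# pair count (columns 6-8) is 1, i.e. only the left/right margin pair.
count_blank_glyphs() {
    awk 'substr($0, 6, 3) + 0 == 1 { n++ } END { print n + 0 }' "$1"
}

# Example: count_glyphs hersh.occ; count_blank_glyphs hersh.occ
```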

4 - Bibliography

Wolcott, Norman M. and Joseph Hilsenrath. A Contribution to Computer Typesetting Techniques: Tables of Coordinates for Hershey's Repertory of Occidental Type Fonts and Graphic Symbols. Washington, D. C.: Office of Standard Reference Data, National Bureau of Standards, U.S. Department of Commerce, April 1976. NBS Special Publication 424. National Technical Information Service (NTIS) Order Number PB251845.


Legal

Copyright

The data, files, text, and programs of the Holzmann USENET Hershey Glyph Distribution may be redistributed and used freely under their original terms as specified in the Holzmann USENET Hershey Glyph Distribution Cover Statement. The distribution here complies with these terms. The data of the Hershey Glyphs as transformed for use with VARKON® may be redistributed and used freely under these same terms. I assert no additional rights or conditions on the use of the transformed data. Some of the text and programs in the Holzmann USENET Hershey Font Distribution may be Copyright 1986 by Peter Holzmann and/or James Hurt. Their own terms either allow or require their redistribution with the Hershey data. The distribution of these texts, files, data, and programs here is subject to all of the disclaimers of warranty and liability noted herein.

The text of this document itself and of any linked program files insofar as their text is separable from any Hershey Glyph data they may contain are copyright © 2003 by David M. MacMillan.

License

Permission is granted to copy, distribute and/or modify copyrighted portions of this document (other than the portions the copyright of which is owned by Peter Holzmann and/or James Hurt, which are freely redistributable under their own terms) under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License."

Note: Those portions of this document which are in the public domain, if any, may be copied freely. The distribution of these public domain portions is subject to all of the disclaimers of warranty and liability noted herein.

This work is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Free Documentation License for more details.

You should have received a copy of the GNU Free Documentation License along with this work; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

NOTICE OF DISCLAIMER OF WARRANTY AND LIABILITY:

This work is distributed "as-is," without any warranty of any kind, expressed or implied; without even the implied warranty of merchantability or fitness for a particular purpose.

In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for damages, including but not limited to any general, special, incidental or consequential damages arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such damages.

In no event will the author(s), editor(s), or publisher(s) of this work be liable to you or to any other party for any injury, death, disfigurement, or other personal damage arising out of your use of or inability to use this work or the information contained in it, even if you have been advised of the possibility of such injury, death, disfigurement, or other personal damage.

Trademarks

Ghostscript is a registered trademark of artofcode LLC.
GNU is a registered trademark of the Free Software Foundation.
Linux is a registered trademark of Linus Torvalds.
VARKON is or was a trademark of Microform AB (Sweden).

