JUSTIFY.EXE Version 1.5 -- By Tom Almy

Copyright 1994 by Tom Almy

May be freely used and distributed as long as no charge is made beyond
distribution costs and program is not modified.


DESCRIPTION

Justify will reformat already formatted text.  It will ignore titles
and other header information and reformat paragraphs to any desired
style.  JUSTIFY is run from the command line.  Source code is
provided.


PREPARATION

The input text must be stripped of all tab characters.  JUSTIFY must
be able to determine what constitutes a paragraph.  It is important
that the input text be consistently formatted.  JUSTIFY cannot
reformat tables.  In some cases it may be desirable to break the
input file into pieces, and use JUSTIFY with different settings on
each piece or just some pieces.

Running JUSTIFY with no arguments gives the following usage
information: 

 justify columns [bflditohsrweq] [indent] [body] <source >dest
   b - input file paragraph is hanging indented
   f - input file paragraph is fully indented
   l - input file paragraphs are single lines
   d - delete blank line after paragraph read
   i - insert blank line after paragraph read
   t - indent first paragraph line by indent spaces
   o - indent other paragraph lines by body spaces
   h - remove hyphens across line boundaries
   s - double space after . ? ! ." ?" or !"
   m - process m-dash adjacent to words
   w - output for word processors
   r - ragged right margin (otherwise full justification)
   e - EMAIL input -- don't format quotes or headers
   q - EMAIL output -- add '>' to non-blank lines


The "b", "f", "l", "h", "m" and "e" options (and to a certain extent
the "d" and "i" options) deal with parsing the input text. 

When the paragraphs have "hanging indents" -- the first line is
flush against the left margin and all remaining paragraph lines are
indented -- then you need to use the "b" option.

When the paragraphs are set off from the left margin, and title
information is flush against the left margin, then you need to use
the "f" option.

When paragraphs are single long lines, the "l" option needs to be
used. This is "Generic Word Processor" format.

No option needs to be specified for block paragraphs flush against
the left margin, or paragraphs with first line indents.  If headers
and paragraphs are both indented, then leading spaces must be removed
so that JUSTIFY can differentiate between the two.  The author's
program UNOFFSET can be used to do this.

A single line of text between two blank lines is assumed to be a
header line unless text starts at the left margin, the f option is
not used, and the line is longer than the output columns setting. In
this case the line is formatted like a paragraph. The "e" and "q"
options have additional effects on handling header lines.
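
The header rule above can be sketched in Python.  This is only an
illustration of the rule as written here, not the program's actual
source; the function name is_header_line and its parameters are
hypothetical, and the extra handling done by the "e" and "q" options
is left out.

```python
def is_header_line(prev_line, line, next_line, columns, f_option=False):
    """Apply the documented header rule to one line and its neighbours."""
    # A header candidate is a single non-blank line between two blank lines.
    isolated = (line.strip() != ""
                and prev_line.strip() == ""
                and next_line.strip() == "")
    if not isolated:
        return False
    starts_at_margin = not line.startswith(" ")
    too_long = len(line.rstrip()) > columns
    # Exception: a long line flush against the left margin, with the
    # "f" option not in use, is formatted like a paragraph instead.
    if starts_at_margin and not f_option and too_long:
        return False
    return True
```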

If the "e" option is used, all lines starting with a right angle
bracket, ">", or starting with a word which ends with a colon are
considered to be header or quoted text, and are not formatted.
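
A minimal Python sketch of the "e" test described above.  The name
is_email_header_or_quote is hypothetical, and the sketch assumes that
leading spaces before the first word are ignored, which this document
does not state.

```python
def is_email_header_or_quote(line):
    """True if the line would be left unformatted under the "e" option."""
    # Quoted text: the line starts with a right angle bracket.
    if line.startswith(">"):
        return True
    # Header: the first word ends with a colon, e.g. "Subject:".
    # Assumption: leading whitespace before the first word is skipped.
    words = line.split()
    return bool(words) and words[0].endswith(":")
```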

If the input text uses hyphenation across line boundaries (common in
scanned text), this hyphenation must be removed.  Use the "h" option
to do this. The output text should be spell checked because sometimes
line breaks split normally hyphenated words into two pieces, and
JUSTIFY will remove hyphens from those as well! If the "h" option is
not specified, words crossing line boundaries are assumed to be
hyphenated, and the hyphen won't be removed (example: brother-
in-law.) Hyphenated output is never created since this causes problems
doing searches for text.
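
The effect of the "h" option can be illustrated with a short Python
sketch.  It is an assumption about the behavior described above, not
the program's code; join_paragraph_words and remove_hyphens are
hypothetical names.  A line ending in two hyphens is left alone here
on the assumption that it is an m-dash rather than a hyphen.

```python
def join_paragraph_words(lines, remove_hyphens=False):
    """Collect a paragraph's words, rejoining words split across lines."""
    words = []
    pending = None  # word ending in "-" at the end of the previous line
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if pending is not None:
            # With "h" the hyphen is dropped, otherwise it is kept.
            head = pending[:-1] if remove_hyphens else pending
            parts[0] = head + parts[0]
            pending = None
        last = parts[-1]
        if last.endswith("-") and len(last) > 1 and not last.endswith("--"):
            pending = parts.pop()
        words.extend(parts)
    if pending is not None:
        words.append(pending)  # nothing followed, so keep it unchanged
    return words
```

With remove_hyphens set, ["the hyphen-", "ation is"] yields the word
"hyphenation", but ["my brother-", "in-law said"] yields
"brotherin-law", which is why the output should be spell checked.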

Some text files use two sequential hyphens to represent an m-dash,
and do not have any spacing between the m-dash and adjacent text.
For text--like this--use the "m" option. This will handle m-dashes at
line ends in input as well as in output.


SELECTING THE OUTPUT FORMAT

The "columns" argument must be specified, and is the number of
columns of text to output.  The indent options, "t" and "o", will
reduce that number, but the right hand margin will stay the same.
Title and header lines are not processed, so they can be longer than
the columns figure.

The values given for columns, indent, and body should be reasonable!

To format with indented paragraphs (first line indented, remainder of
the lines flush with the left margin), specify the "t" option, and
provide the indent argument for the number of columns to indent the
first line of the paragraph.  Headers and titles are not indented.

For a hanging indent, specify the "o" option, and provide the body
indent argument.

For block paragraphs, nothing need be done, but if it is desired to
indent entire paragraphs, specify both the "t" and "o" options, and
specify indent and body values that are identical.

Block paragraphs are identifiable by there being a blank line between
the paragraphs.  Indented paragraphs often do not have blank lines
between paragraphs.  When a text file with indented paragraphs is
converted to one with block paragraphs, often it is necessary to
insert a blank line between the paragraphs.  The "i" option will
insert a blank line after each paragraph.  Conversely, when
converting from block paragraphs to indented paragraphs, it is often
desirable to delete the blank line between paragraphs.  The "d"
option will delete any blank line after a paragraph.

JUSTIFY does full justification (even left and right margins) by
default; however, the "r" option can be used to output ragged right
margins.  With full justification, spaces are inserted from the left
and from the right on alternate lines. 
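
The alternating fill described above might look like the following
Python sketch.  This document does not say how the extra spaces are
shared among the gaps, so the sketch assumes one extra space per gap,
handed out in turn starting from the chosen side; it also assumes the
last line of a paragraph is left unjustified.  All names are
hypothetical.

```python
def justify_line(words, width, from_left=True):
    """Pad the gaps between words so the line is exactly width wide."""
    if len(words) < 2:
        return " ".join(words)
    gaps = [1] * (len(words) - 1)
    extra = width - (sum(len(w) for w in words) + len(gaps))
    order = list(range(len(gaps)))
    if not from_left:
        order.reverse()  # start padding at the right-hand gap
    for i in range(max(extra, 0)):
        gaps[order[i % len(order)]] += 1
    line = words[0]
    for gap, word in zip(gaps, words[1:]):
        line += " " * gap + word
    return line


def justify_paragraph(lines_of_words, width):
    """Justify every line but the last, alternating the fill side."""
    out = []
    for n, words in enumerate(lines_of_words):
        if n == len(lines_of_words) - 1:
            out.append(" ".join(words))  # assumption: last line is ragged
        else:
            out.append(justify_line(words, width, from_left=(n % 2 == 0)))
    return out
```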

JUSTIFY removes spaces from input text.  The "s" option will ensure
two spaces after each sentence.  This is a common practice when
ragged right margins are used.
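
A Python sketch of the "s" spacing rule, using the sentence endings
listed in the usage text.  The function name is hypothetical, and the
sketch makes no attempt to tell abbreviations such as "Mr." from real
sentence ends; whether JUSTIFY does is not stated here.

```python
# Endings taken from the usage line: . ? ! ." ?" or !"
SENTENCE_ENDINGS = (".", "?", "!", '."', '?"', '!"')


def join_with_sentence_spacing(words):
    """Join words with one space, or two after a sentence ending."""
    pieces = []
    for i, word in enumerate(words):
        pieces.append(word)
        if i < len(words) - 1:
            pieces.append("  " if word.endswith(SENTENCE_ENDINGS) else " ")
    return "".join(pieces)
```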

It is possible to produce output intended for word processor
programs. When the option "w" is specified, the columns value is
ignored, as well as the "t", "o", "q", and "r" options. Each
paragraph is output as a single long line. Most word processor
programs, when fed text this way (as an "ASCII Import"), will
reformat the text as a paragraph. 
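
The shape of "w" output can be sketched in Python as follows.  This
is a simplified illustration with a hypothetical function name: it
treats every run of non-blank lines as one paragraph and ignores the
header detection that JUSTIFY performs.

```python
def paragraphs_to_long_lines(text):
    """Emit each blank-line-separated paragraph as one long line."""
    out = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Collapse runs of spaces inside the line as well.
            current.append(" ".join(line.split()))
        else:
            if current:
                out.append(" ".join(current))
                current = []
            out.append("")  # keep the blank separator line
    if current:
        out.append(" ".join(current))
    return "\n".join(out)
```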

If the "q" option is used, the text is output "quoted" for email.
This means that all non-blank lines will start with a right angle
bracket. The space taken by the bracket does not affect the right
margin. Indentation occurs after the bracket. Lines not in a
paragraph and not already quoted are also indented by the smaller of
the first line or body indentation.
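
A rough Python sketch of the quoting step for lines outside a
paragraph.  The names are hypothetical.  This document does not say
whether an already quoted line receives a second bracket; the sketch
assumes it does, and only skips the indentation for such lines.

```python
def quote_lines(lines, indent=0):
    """Prefix non-blank lines with '>', indenting unquoted ones."""
    quoted = []
    for line in lines:
        if not line.strip():
            quoted.append(line)  # blank lines are left alone
        elif line.startswith(">"):
            # Assumption: already quoted text gets another bracket
            # but no extra indentation.
            quoted.append(">" + line)
        else:
            quoted.append(">" + " " * indent + line)
    return quoted
```

With indent set to 1 this gives the "space after the quote bracket"
layout used in the examples under RUNNING THE PROGRAM.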


RUNNING THE PROGRAM

This file, justify.doc, makes a suitable source file for
reformatting.  It has block paragraphs, ragged right formatted with
70 columns.

Example to reformat to 50 columns, with full justification:

  justify 50 <justify.doc >justify.txt

When you do this, you will note that the "usage" table does not get
reformatted.  This is luck, caused by the table not being recognized
by the program.  On the other hand, the line above starting "Example"
will be treated like a header line and won't be reformatted although
it should be.  Output text will typically need to be cleaned up, but
the time spent doing this should be very small for books.

Example to reformat to 75 columns, first line indented 5 columns, no
space between paragraphs, ragged right with two spaces after sentence
ends:

   justify 75 dtsr 5 <justify.doc >justify.txt

When you do this, you will see that deleting lines does not always
work.  Some additional cleanup is necessary.  Before doing so,
convert this new file to 70 columns, paragraphs indented 5 columns
(blocked), reinsert blank lines between paragraphs:

   justify 70 ito 5 5 <justify.txt >justify.tx2

Reformat to the first 50 column format.  We now need to specify that
the file has fully indented paragraphs.

   justify 50 f <justify.tx2 >justify.tx3

Pretty good, but the usage table is messed up because JUSTIFY could
not tell it from a paragraph.

Mark the justify file as quoted text, double spacing after sentences:

   justify 70 qs <justify.doc >justify.quo

With a space after the quote bracket:

   justify 70 qsto 1 1 <justify.doc >justify.quo

Add some new text to the justify.quo file, then format for email:

   justify 70 es <justify.quo >justify.eml

Or format for email as newly formatted text:

   justify 70 esqto 1 1 <justify.quo >justify.eml


CONCLUSION

I hope you find this program useful. I have!

Tom Almy
Internet: tom_almy@ieee.org

12/17/93
 revised 1/5/94
 revised 9/1/94
 revised 2/96
 revised 7/97

