                 INTRAN: an INteger formula TRANslator

                              J.V. Noble
               Institute of Nuclear and Particle Physics
                        University of Virginia
                    Charlottesville, Virginia 22901

                               Abstract
    INTRAN is a simple recursive integer expression  parser  that
    permits formulas to be embedded in Forth  words.  The  parser
    directly represents the Backus-Naur statement of its  grammar
    as (recursive) Forth code. The resulting compiled  code  adds
    about 1Kb to the Forth kernel.


    In a letter to Forth Dimensions (Sept./Oct. '92) Peter Roeser pre-
sented a wish list for Forths of the future. Some wishes would require
consensus from the Forth community, while others (like  the  C  inter-
face) are already available. One wish--that Forth be broadened to  en-
compass arithmetic expressions (including parentheses) a  la  FORTRAN,
Pascal, BASIC or C--is within the  capacity  of  individual  Forthniks
(such as the author) to grant. This note presents an  INteger  expres-
sion TRANslator (INTRAN) in a little over a kilobyte of code.

    INTRAN came about because I needed to  translate  integer  expres-
sions appearing as indices in FORTRAN DO loops. That is, FORTRAN  per-
mits the construction

         DO 100 I=(J-K)/2, (J+K)/2+M, K/4
         ...stuff...
    100  CONTINUE


The FORmula TRANslator described in my book Scientific FORTH: a modern
language for scientific computing (available from  FIG)  parses  mixed
(single-  and  double-precision, real and complex) floating point  ex-
pressions. Unfortunately, for technical reasons (having to do with how
I implemented operator overloading) the  FORmula  TRANslator  was  un-
suited to simple integer expressions. This implied a  distinct  parser
for integer expressions. The one I wrote compiles  to  702  bytes  and
surely could be squeezed further. Terseness, however,  subverts  clar-
ity, so here I present a lengthier but more pedagogical version.

    Since the INTRAN is intended to  be  production-quality  code,  we
must address the following programming issues:

         . Simple user interface
         . Ease of porting, maintenance, and extension
         . Bullet-proof error handling

And since this note is  intended  to  illustrate  general principles,
I have striven to make INTRAN both clear and well-documented.


    We begin with the user interface: what we want INTRAN to do. Prim-
arily, we want INTRAN to embed infix  expressions  into  word  defini-
tions. Just as HS/FORTH, e.g., allows assembler to coexist  with  high
level Forth through locutions like

         : HI-LEVEL   words ... [%" assembler words " %] ...words...;

we would like to insert formulae into words, with  their  translations
compiled immediately as Forth. My scientific FORmula  TRANslator  uses
F"..." as in

         : FORMULA      F" a = (b+c)/(c+d)"   ;

to do precisely this. The notation has proven so convenient and trans-
parent that I chose a similar one for INTRAN. If we define a word

         : EXPR1     i" (I+J)/2 + K"  ;

(I+J)/2 + K should be translated and compiled. When we decompile EXPR1
--in HS/Forth, e.g., using the word SHOW-- we will see

         SHOW EXPR1     I J + 2 /  K +  ;  ok.


    Experience shows the value of dry runs: during  testing  we  might
want INTRAN to translate interactively, without  compiling.  That  is,
when we type

         i" (I+J)/2+K"

INTRAN should send the Forth translation

         I J + 2 /  K +  ok

to the screen. To make all these things  happen,  i"  must  be  state-
smart. These abilities alone would  make  INTRAN  the  answer  to  Mr.
Roeser's prayers.

    However, experimenting with early versions of  INTRAN  has  proven
the worth of several other features. Sometimes I need expression frag-
ments like (I+J)/2+K, that leave a result  on  the  stack,  but  other
times I wish to embed formulas in assignment form,

         B = (I+J) / 2 + K

that translate to something like

         I J  +  2  /  K  +  B  !

Fortunately it is easy to do both with little extra code.


    Here is another problem of interface design: what do  the  symbols
I, J, K and B represent? Many Forths define words  I,  J  and  K  that
fetch the current values of indices (from nested loops) to the  stack.
For these we must omit explicit fetches (@'s).  However,  our  expres-
sions might well incorporate Forth VARIABLEs. But a VARIABLE  must  be
followed by @ for the translation to work correctly.

    What about formulas including CONSTANTs or  VARs  (multiple  code-
field words--QUANs in MMSForth and VALs in other dialects): in  either
case @ would be incorrect. While Forth makes it possible to determine,
during compilation, the type of a data structure (e.g. we  could  look
up its CFA) the simplest course is to require the programmer  to  keep
track of VARIABLEs. That is, at all costs we  should  avoid  decisions
and keep the program simple, especially since good Forth  style  tends
to eschew variables (that is, formulas  including  variables  will  be
rare). The programmer will indicate a VARIABLE with a @ at the end  of
its name; this @ will be translated (with appropriate spaces) to  make
the resulting Forth correct:

         : EXPR1     i" (I0@+J0@)/2 + K1"  ; ok
         SHOW EXPR1     I0 @ J0 @ + 2 /  K1 +  ; ok


    In an assignment like B=(I+J)/2+K we need to know  what  B  refers
to. A VARIABLE is stored to by the phrase B ! while  a  VAR  uses  the
locution IS  B. Although good programming practice frowns  on  it,  we
can even modify CONSTANTs with the phrase ' B !  .  Since  the  latter
phrase works equally well for CONSTANTs, VARIABLEs and VARs, it solves
our problem and requires no decisions.

    Finally, we want the parser to report bad  input,  and  to  ignore
white space inserted for readability. With these provisos we have com-
pletely specified the user interface.

    To make porting, maintaining and modifying easier, I have tried to
keep definitions dialect-invariant, to name the key actions telegraph-
ically, to comment thoroughly, and to make the  program  structure  as
transparent as possible. In particular, the self-referential nature of
the grammatical rules embodied in a formula  translator  can  be  most
easily and clearly expressed through recursion.

    The context-free grammar for INTRAN (see Scientific FORTH and ref-
erences contained therein) can be expressed as rules:

    <assignment>   ->   <id> = <expression>
    <expression>   ->   <term> | <term> & <expression>
    <term>         ->   <factor> | <factor> % <term>
    <factor>       ->   ( <expression> ) | <id> | <literal>
    <literal>      ->   digit {digit}
    <id>           ->   letter {letter | digit}  { | @}

The notation for these rules contains some short hand: & stands for  +
or -, % for * or /, and | means "or". A superscript   means  "zero  or
more". The Greek letter  means "void",  hence   { | @}  stands   for
"nothing or the symbol @".

    The preceding grammar lends itself naturally to a  recursive  pro-
gramming style. In pseudo-Forth (also in reverse order of definition!)

    : <assignment>      find "="   found?
                        if   push { id   "!" }
                             push { null " ' " }
                             push { expression  BL }
                             <expression>  print  print
                        else push { expression  BL }
                             <expression>
                        then  ;

    : <expression>      find "&"   found?
                        if     rearrange   <expression>
                        else   arrange
                        then   <term>  ;

    : <term>            find "%"   found?
                        if     rearrange   <term>
                        else   arrange
                        then   <factor>  ;

    : <factor>     literal or id?  if  print  exit  then
                   (<expr>)?
                   if    remove()  <expression>
                   else  error.mss  then  ;


    To implement these ideas as working code we need to  define  words
that can recognize "+", "-", "*", "/" and "=", as  well  as  parenthe-
sized expression fragments and id's.  A  further  restriction  on  the
words that find numeric operators is that they must never be  enclosed
within parentheses. That is, we want "*" in (a+b)*(c-d)  to  be  found
before "+" or "-".

    Then we must choose how to represent the  input  string  (and  the
substrings obtained as we decompose it into terms, factors and  paren-
thesized expressions); as well as the operators corresponding to  each
step in the decomposition.

    To minimize crashes during development we begin  with  the  purely
interpretive function of i". When that works  properly  we  can  worry
about its compiling function. As usual, we begin  with  the  low-level
words. We need a word to find either of a pair of characters. Since we
need to do this with "+" and "-"; "*" and "/"; and also  "="  and  "="
(this last so we can use the same word to find "="); there are  enough
cases to justify the memory overhead of a defining word

    : a|b     CREATE  D,  DOES>  D@  ROT  UNDER  =  -ROT  =  OR  ;

used as

    ASCII +  ASCII -  a|b  +|-
    ASCII *  ASCII /  a|b  *|/
    ASCII =  ASCII =  a|b  =|=

At run time the words +|-, *|/ and =|= compare the TOS (top of  stack)
with two built-in numbers; if either matches, the  words  return  TRUE
(-1). The plan is to write a generic  character-finding  routine  that
will search a string byte by byte, and with each byte EXECUTE  one  of
the above words. That is, the search word will take as input the loca-
tion of the string and the CFA ("code-field address") of the word cor-
responding to the particular character(s) being sought.


    The generic character-finding routine mentioned  above  is  find2.
The first moderately complex word in INTRAN, it embodies a  not  alto-
gether necessary trick, a sort of "hack". The problem is this: we need
to go through the input string one byte at a time, looking for  either
of two characters representing operators. The CFA  of  the  word  that
tests the input is one argument, the others being  the  beginning  and
end of the string. Only "exposed" operators--those not  hidden  within
parentheses--can be "found". So find2 must also check that the  paren-
thesis-level is also 0 (recall the old trick for balancing parentheses
in long FORTRAN expressions: counting from the left, start  at  0  and
add 1 for each "(", subtract 1 for each ")"; if the last ")" coincides
with a 0 count, the parentheses are balanced).

    All four variables --parenthesis count, string pointers and  CFA--
are temporary, so we are tempted to keep them on the stack while find2
searches, and to drop them thereafter. By using a DO...LOOP  with  the
string pointers as limits we automatically puts two of  the  arguments
on the rstack. This leaves only the parenthesis count and CFA  on  the
stack; however as the DO loop executes, both must be  accessed.  While
not impossible to an accomplished stack gymnast, the necessary  manip-
ulations tend to hide what is going on. They are also prone to  error.
Worse, the search must generate a "semi-flag": the address of a  found
operator, or a zero to indicate failure. This is tricky  to  program--
one must LEAVE the loop if the operator is found, but it is then  hard
to determine whether or not the loop terminated early.

    Perhaps DO...LOOP is not the appropriate looping mechanism.  So  I
tried BEGIN...WHILE...REPEAT. Now one may comfortably put the  tempor-
ary variables on the rstack (difficult in a DO...LOOP where the limits
must be accessible on the rstack) but the termination condition(s) are
no longer automatic. Endless loops are easy to produce  inadvertantly.

    What about looping by recursion? This is even  more  horrible  be-
cause the CFA, needed at each iteration, would be buried if placed  on
the rstack, but would be  relatively  inaccessible  on  the  parameter
stack as well. It would have to be off-loaded to a VARIABLE, which  is
what we were trying to avoid. So in the end, compromised:  I  returned
to DO...LOOP, and I off-loaded the parenthesis count to its own  named
variable, ()level.

    But I began this section by advertising a  cute  trick  in  find2.
Where is it? In order to leave the result ( -- adr|0 ), I  first  copy
the CFA (of the comparison routine x|y) on the rstack with DUP>R, then
put it below the loop limits using -ROT. DO now moves  the  limits  to
the rstack and begins executing. The CFA is on the TOS. IF an  exposed
operator is found, we LEAVE the loop early, replacing the CFA  by  the
current index I. Finally we retrieve the saved CFA and compare  it  to
TOS with the phrase DUP  R>  <>. If the loop terminated normally,  TOS
is still the CFA and the comparison yields FALSE.  Whereas,  when  the
loop terminates early (because it has found what it was looking  for),
the TOS is the desired address, which cannot be the CFA, so  the  com-
parison yields TRUE. The final ANDing together  of  flag  and  address
then leaves the desired semi-flag.


    Another look at the INTRAN listing reveals that  after  each  word
was tested I followed it with a comment line "\ OK  time  date".  This
discipline helped me  to  be  both  thorough  and  systematic  in  the
endless quest for bug-free code. An  editor  with  a  time/date  stamp
lessens the tedium of this necessary chore.

    The next major issue is how to represent the input string, as well
as the substrings we are going to find as we decompose it.  The  first
time I ever wrote a FORmula TRANslator, I actually defined a stack  to
hold strings, and placed each string and substring in a separate stack
location, with the operators on a parallel stack. Thus a formula  such
as A=B+C would lead to

      STAGE 0      STAGE 1        STAGE 2
      -------      -------        -------
    A=B+C   nop    A      !       A      !
                   B+C  nop       C      +
                                  B    nop


    Some sort of stack is manifestly demanded by recursion. Eventually
I realized a string stack was excessive: since each  fragment  appears
once and only once, it is enough to store pointers marking the  begin-
ning and end of each fragment, rather than copying the fragments them-
selves. We still need a stack to pass the arguments  during  recursive
calls, however, so my next try was a stack 3 cells wide, to  hold  the
pointers and a token for the operator. For technical reasons  this  is
still the method I use in my scientific FORmula TRANslator. But sever-
al attempts to write INTRAN made clear that the parameter stack  would
suffice to store the pointers and operator. So the words <expression>,
<term>, <factor> will expect the input stack picture

         ( beg end .op )

where .op stands for an operator token, represented as the ASCII codes
32d, 42d, 43d, 45d and 47d (BL, "*", "+", "-" and "/").


    The words to determine whether a string  is  an  identifier  or  a
literal integer are straightforward. They could as well have been pro-
grammed as state machines, which would have made it simple to  enforce
length rules (i.e., how long valid numbers or id's can  be).  I  chose
not to include such elaborations because my Forth allows id's up to 32
characters long, and because I am unlikely  to  write  an  excessively
large literal integer.


    It is now time to implement the Backus-Naur grammatical  rules  in
real, rather than pseudo-, code. Consider <expression>, whose rule is

         <expression> -> <term> | <term> "&" <expression>

and whose code is

    : <expression>   ( beg end .op -- )  -bl-
                     >R DDUP  CFA' +|- find2      \ find&
                     R>  OVER                     \ found?
                     IF    rearrange    RECURSE
                     ELSE  PLUCK   THEN   <term>  ;
    \ OK 17:36:42  3/2/1993


With the exception of the word -bl- (discussed below)  I  have  trans-
lated the rule directly into Forth. One crucial phrase is

    CFA' +|- find2

which locates an exposed "+" or "-". The other is the IF...ELSE...THEN
that executes <expression> <term> when there is  an  exposed  conjunc-
tion, but only <term> when there is none. The words >R, R>, rearrange,
DDUP and PLUCK are mere "glue" that do not express the  algorithm  but
are nonetheless necessary to its actual performance.


    Of course recursion is not absolutely necessary: by  theorem,  re-
cursion can always be removed from a  recursive  program  (sadly,  the
theorem does not give specifics). However, recursion offers a decisive
advantage over non-recursive indefinite loops. The  latter  demand  an
explicit stopping condition such as "test the stack depth  to  see  if
there are no more arguments"; whereas recursion implicitly keeps track
of execution. To guarantee this with a  multiply  recursive  algorithm
such as INTRAN, we must ensure first, that each parsing word takes the
same number of arguments; and second, that each leaves nothing on  the
stack. Thus, to make a recursive call to <expression>  followed  by  a
call to <term>, we must put two sets of arguments on the stack,  which
the two calls will consume. This is precisely what rearrange does.  On
the other hand, the second branch (that only  calls  <term>)  requires
but one set of arguments, which PLUCK takes care of.


    Note that <factor> has to call <expression> before the latter  has
been defined. Some dialects permit forward vectoring using a word like
DEFER, and in fact HS/FORTH offers several forward-vectoring  methods.
But all Forths permit the simple vectoring method I  used,  namely  to
define a CONSTANT (or a VARIABLE) to hold the  code-field  address  of
the word to be executed. In this program I defined  'expression  as  a
CONSTANT, initially holding the CFA of NEXT (so if it is  accidentally
EXECUTEd no harm will be done).  The  phrase  'expression  EXECUTE  in
<factor> will execute whatever word has its CFA stored in 'expression.
Finally, executing the phrase

         CFA' <expression>   '  'expression !

immediately after defining <expression> fulfills  the  forward  refer-
ence. This reference achieves indirect recursion  (<expression>  calls
<term> which calls <factor> which calls <expression>) in  addition  to
the direct recursion found in <expression> and <term>.


    With all the advantages of brevity, directness and simplicity that
recursion brings to a program like INTRAN, one wonders  whether  there
are countervailing disadvantages. The chief one  is  the  ever-present
danger of mistreating the stack in such a way as to produce an endless
loop that overwrites vital things.  Debugging  tools  like  HS/FORTH's
TRACE and SSTRACE are vital to making sure recursive  definitions  be-
have themselves.


    The reader will find the word -bl-  sprinkled  through  the  code.
What does it do, and why is it there? I wanted to  allow  formulas  to
contain optional white space for clarity.  Unfortunately,  there  does
not seem to be any simple method, short of redefining  it  completely,
to set my Forth's version of WORD to ignore blanks. (WORD is  the  key
component of TEXT, that reads input terminated with a given character,
to the scratchpad (PAD).) Since using TEXT was easier than  redefining
it from scratch, I was left with a string (possibly) full  of  blanks.
How to rid the input of blanks? A priori it seems  most  efficient  to
deblank the entire input string before parsing.  The  following  (two)
words accomplish this:

: skip   ( end beg char cfa -- end beg' )  D>R
         1- BEGIN 1+  DDUP =   OVER C@   DR@ EXECUTE  OR  UNTIL  DRDROP  ;

: -BL  ( end beg -- end' beg)               \ strip blanks out of a string
       UNDER  - 1+  -TRAILING               \ strip trailing blanks
       OVER + 1-  SWAP                      ( end" beg)
       DUP>R  BEGIN   BL CFA' =   skip      \ -> 1st BL
                      DDUP                  ( end adr end adr)
                      BL CFA' <>  skip      \ -> non-blank
                      PLUCK                 ( end adr adr+n-1)
                      DDUP  <               \ not at the end yet
              WHILE   OVER  DDUP  -               ( end adr src dst #bl)
                      -ROT  D>R  -ROT             ( #bl $end adr)
                      DDUP - 1+  DR>  ROT  CMOVE  ( #bl $end adr)
                      -ROT  SWAP  -  SWAP
              REPEAT  DDROP  R>  ;

The algorithm is simple: having first used the system  word  -TRAILING
to eliminate trailing blanks, search (from the  left,  rightward)  for
the first blank. Save the address where  this  occurs.  Then  continue
rightward to the first non-blank character. Compute how many  charact-
ers are in the tail of the string. Slide  the  tail  leftward  to  the
first blank, and adjust the end-of-string pointer.  Repeat  until  all
blanks have been eliminated.

    Oddly, the above procedure is neither so compact nor efficient  as
my "afterthought" method (when I realized blanks should  be  allowed).
Whereas the preceding code adds 135 bytes, the "afterthought" adds 57.
How does it work? We decompose as though the blanks were not  present.
The substring pointers point to the ends of (sub)strings with (possib-
ly) leading and trailing blanks. It is much easier to advance the  be-
ginning pointer rightward past the leading blanks and the end  pointer
leftward past the trailing ones,  than  to  actually  compose  a  new,
blankless, string from one with blanks sprinkled  through  it.  So  we
compose a word that moves the pointers as noted:

    : -bl-      ( beg end .op -- beg' end' .op)
            >R     1+  BEGIN  1-  DUPC@  BL <>  UNTIL   \ skip leading
            SWAP   1-  BEGIN  1+  DUPC@  BL <>  UNTIL   \ skip trailing
            SWAP   R>  ;

and then apply it whenever a new substring has been dissected out  and
its pointers placed on the stack. The definition -bl- (the  name  sug-
gests that blanks are removed at both ends) requires 47 bytes  (in  an
indirect-threaded system), and the five subsequent  references  to  it
add another 10 bytes.


    We are nearing the end of the  exposition.  The  final  definition
(the initial one, were we so foolish as to  design  top-down)  is  i".
Before we add the compiling abilities described above, we only require
that it acquire the text, place pointers to the ends of the string, as
well as a do-nothing operator token, on  the  stack,  and  invoke  the
first parsing word, <assign>:

    : i"    ASCII " TEXT    ()_ok      \ check for balanced parens
            PAD  adr>ends   BL  CR  <assign>  ;
    OK 17:27:16  3/2/1993

We check for balanced parentheses before starting to parse, since this
obviates multiple tests of the parenthesis level during execution.


    Having tested the word i" on many different cases and having tried
(unsuccessfully, one hopes!) to make it fail, we are now ready to  try
out the most dangerous part of the development: granting i" the  power
to compile expressions. Many Forths (and the new ANS standard) include
a word like EVAL that will compile directly from  a  string.  If  your
Forth can do this, all that is needed is to redirect the  output  from
the screen to a buffer, convert that to a counted string, and EVALuate
the string. HS/FORTH has an equivalent form of redirection, namely the
ability to load from a buffer (the text in the buffer must be  termin-
ated by double-0). The (non-standard) word that  does  this  is  MLOAD
which expects a segment descriptor and offset on the stack. The defin-
itions needed to implement expression compilation are then

256 SEGMENT intran      \ make a named segment to hold output from i"  "

: i"->MEM     intran @  0  256 0 FILLL         \ initialize buffer
              intran  OPEN-MEM  >MEM   ;       \ output -> buffer

: MEM->    intran @  0  MLOAD                  \ load from intran
           CLOSE-MEM  ;

: i"    ASCII " TEXT    ()_ok                     \ input, check ()
        PAD  adr>ends   BL  CR                    \ set stack
        STATE @                                   \ compiling?
        IF  i"->MEM                               \ vector to intran
            <assign>                              \ output the FORTH code
            CRT                                   \ return output to CRT
            MEM->                                 \ load from intran
        ELSE  <assign>   THEN   ;   IMMEDIATE     \ otherwise -> display
\ OK 12:21:28  3/3/1993

The only major changes from the non-compiling version of  i"  are  the
state-dependent IF...ELSE...THEN and making i" IMMEDIATE, so it can do
its work within a word being defined.



                               Appendix
\ Mini expression parser
\ version of 11:50:06  3/6/1993
\ compiles to 1239 + 256 bytes
\ C J.V. Noble, 1993.
\ Placed in the public domain 3/5/93 for non-commercial use only.

TASK INTRAN

: a|b     CREATE  D,  DOES>  D@  ROT  UNDER  =  -ROT  =  OR  ;
\ OK 20:31:59  3/1/1993

VARIABLE ()level    0 ()level !     \ holds current parens level

: inc()    ( c --)   ASCII )  OVER  =   SWAP   ASCII ( =   -   ()level +!  ;
\ increment/decrement ()level
\ OK 20:44:29  3/1/1993

: adr>ends  ( $adr -- $beg $end)  COUNT  OVER +  1-  ;
\ OK 20:34:18  3/1/1993

: find2     ( beg end cfa -- adr|0 )
            ()level 0!   DUP>R                            \ initialize
            -ROT  DO  I C@   DUP    inc()                 \ adjust ()level
                      OVER   EXECUTE    ( -- cfa f)       \ test input
                      ()level @  0=                       \ exposed?
                      AND                                 \ found?
                      IF  DROP  I  LEAVE   THEN           ( -- adr)
            -1 +LOOP
            R>  OVER  <>  AND  ;                          \ ( adr | 0 )
\ OK 22:03:34  3/1/1993

ASCII +  ASCII -  a|b  +|-
ASCII *  ASCII /  a|b  *|/
\ OK 22:10:12  3/1/1993

: print   ( beg end .op -- )   -ROT              \ move op token
          DUPC@  >R                              \ save last char on rstack
          OVER - TYPE                            \ type all but last char
          R>    DUP  ASCII @ =                   \ last char = @ ?
          IF  SPACE  THEN                        \ emit space
          EMIT   SPACE                           \ type last char
          SPACE  EMIT  SPACE  ;                  \ type operator
\ OK 15:17:49  3/2/1993

: rearrange      ( beg end adr .op -- adr+1 end .op'  beg adr-1 .op )
                 >R   DUP>R                   ( -- beg end adr)
                 1+ SWAP  ROT  R@ C@          ( -- adr+1 end beg .op')
                 SWAP  R>  1-   R>   ;
\ OK 17:05:24  3/2/1993

CFA' NEXT  CONSTANT  'expression
\ for forward reference, initialized for safety

: (<expr>)?   ( beg end -- f)  C@ ASCII ) =   SWAP   C@ ASCII ( =   AND  ;
\ OK 19:53:31  3/2/1993

: WITHIN      ( k m n --f)   ROT  UNDER  MIN  -ROT  MAX  =  ;
\ OK 21:44:15  3/2/1993

: digit?      ( c -- f)          ASCII 0  ASCII 9  WITHIN ;
: letter?     ( c -- f)   32 OR  ASCII a  ASCII z  WITHIN ;

: <id>      ( beg end -- f)      \ <id> -> letter {letter|digit}* { |@}
            DUPC@  ASCII @ =  +  \ ignore trailing @
            SWAP  DUPC@  letter?  -ROT
            1+ SWAP
            DO  I C@  DUP
                letter?   SWAP   digit?   OR   AND
            -1 +LOOP  ;
\ OK 22:37:51  3/2/1993

: <#>       ( beg end -- f)          \ <#> -> {digit}+
            1+  SWAP                 ( end+1 beg)
            -1 -ROT                  ( -1 end+1 beg)
            DO  I C@  digit?   AND   LOOP  ;
\ OK 12:55:49  3/2/1993

: simple?    ( beg end -- f)  DDUP  <#>  -ROT  <id>  OR  ;

VARIABLE BLBL   8224 BLBL !
: NULL  ' BLBL   DUP 1+  ;

: 3SWAP   ( a b c d e f -- d e f a b c)
          6 ROLL 6 ROLL 6 ROLL  ;

: -bl-      ( beg end .op -- beg' end' .op)
            >R    1+  BEGIN  1-  DUPC@  BL <>  UNTIL  \ end' <- end
            SWAP  1-  BEGIN  1+  DUPC@  BL <>  UNTIL  \ beg  -> beg'
            SWAP   R>  ;
\ OK 20:07:32  3/2/1993

: <factor>  ( beg end .op -- )   -bl-     \ fctr -> <#> | <id> | (<expr>)
            >R
            DDUP  (<expr>)?               \ enclosed?
            IF  1-  SWAP 1+ SWAP          \ remove ()
                R>  'expression  EXECUTE  \ <expression>
                EXIT
            THEN
            DDUP  simple?                 \ <id> or <literal>
            IF    R>  print
            ELSE  RDROP  CRT ." INCORRECT EXPRESSION"  ABORT  THEN  ;
\ OK 11:46:25  3/6/1993

: <term>  ( beg end .op -- )     -bl-     \ trm -> fctr | fctr % trm
          DUP>R  BL <>
          IF  BL NULL R> 3SWAP  RECURSE   print   EXIT  THEN
          DDUP  CFA' *|/  find2               \ find%
          R>  OVER                            \ found?
          IF    rearrange    RECURSE
          ELSE  PLUCK   THEN   <factor>  ;
\ OK 11:46:31  3/6/1993

: <expression>   ( beg end .op -- )  -bl-     \ expr -> term | term & expr
                 >R DDUP  CFA' +|- find2      \ find&
                 R>  OVER                     \ found?
                 IF    rearrange    RECURSE
                 ELSE  PLUCK   THEN   <term>  ;
\ OK 17:36:42  3/2/1993

CFA' <expression>  '  'expression !        \ make forward reference

: put0    ( beg end -- beg end+1)
          DUP>R  OVER -  1+  >R  DUP DUP 1+ R>  <CMOVE
          ASCII 0  OVER  C!   R>  1+  ;

: fix-    ( beg end -- beg end f)      \ leading "-" -> leading "0-"
          OVER C@  ASCII - =           \ leading - ?
          IF  put0  THEN   ;
\ OK 11:43:23  3/3/1993

ASCII = DUP  a|b =|=                       \ to find =

: <assign>  ( beg end .op -- )   -bl-           \ eliminate spaces
      >R  DDUP   CFA' =|= find2                 \ find=
      ?DUP                                      \ found?
      IF    ( -- beg end adr )
            DUP>R  1-  SWAP   ASCII !  SWAP     ( -- beg adr-1 "!" end )
            >R   NULL  ASCII '     ( -- beg adr-1 "!" beg' end' "'")
            R> R> 1+  SWAP  ( -- adr+1 end)
            R> -bl- >R                          \ eliminate spaces
            fix-                                \ "-" -> "0-"
            R>   <expression>   print   print
      ELSE  fix-   R>  <expression>   THEN   ;
\ OK 11:43:34  3/3/1993

: ()_ok   ()level 0!    PAD  COUNT  OVER +  SWAP
           DO   I C@  inc()  LOOP   ()level @
           IF CRT ." Unbalanced parentheses!"  ABORT  THEN  ;
\ OK 17:52:30  3/2/1993

\ : i"    ASCII " TEXT    ()_ok
\         PAD  adr>ends   BL  CR  <assign>  ;
\ OK 17:27:16  3/2/1993

256 SEGMENT intran      \ make a segment to hold output from i"  "

: i"->MEM     intran @  0  256 0 FILLL         \ initialize buffer
              intran  OPEN-MEM  >MEM   ;       \ output -> buffer

: MEM->    intran @  0  MLOAD                  \ load from intran
           CLOSE-MEM  ;

: i"    ASCII " TEXT    ()_ok                     \ balanced () ?
        PAD  adr>ends   BL  CR                    \ set initial stack
        STATE @                                   \ compiling?
        IF  i"->MEM                               \ vector to intran
            <assign>                              \ output the FORTH code
            CRT                                   \ return output to CRT
            MEM->                                 \ load from intran
        ELSE  <assign>   THEN   ;   IMMEDIATE
\ OK 11:48:06  3/6/1993
\ ******************************************************************



