Forth

Forth is a programming language and programming environment. It was initially developed by Chuck Moore at the US National Radio Astronomy Observatory (NRAO) during the 1960s, formalized as a programming language in 1977, and standardized by ANSI in 1994. It features both interactive execution of commands (making it suitable as a shell for systems that lack a more formal operating system), as well as the ability to compile sequences of commands into threaded code for later execution.

The language is so named because Moore considered it appropriate for fourth-generation computers (i.e. microcomputers), but the system on which he developed it was limited to five-letter filenames. Since the name is not an acronym, it is typically not spelled in all capital letters.

Overview

Forth offers a standalone programming environment consisting of a stack oriented interactive incremental interpreter/compiler. Programming is done by extending the language with 'words' (the term used for Forth subroutines), which become part of the language once defined. Forth is usually implemented with an inner interpreter tracing indirectly threaded machine code, which yields extremely compact and fast high-level code that can be compiled rapidly. A character-oriented screen/block mechanism and standard editor written in Forth, provide a file mechanism for creating and storing Forth source code. A typical Forth package will consist of a pre-compiled kernel of the core words, which the programmer uses to define new words for the application. The application, once complete, can be saved as an image, with all new words already compiled. Generally, programmers will extend the initial core with words that are useful to the sorts of applications that they do, and save this as their working foundation.

Forth has been popular for developing embedded systems and instrument controls because it is easy to add small machine code definitions to the language and use those in an interactive high-level programming environment.

The structure of the inner interpreter is compatible with modern RISC processors, and processors that use Forth as machine language have been produced. The modular extensible nature of Forth permits many high-level applications, such as CAD systems to be written in Forth.

Forth is used in the OpenFirmware boot ROMs used by Apple and Sun. It is also used by FreeBSD as the first stage boot controller.

Forth is both blessed and cursed by its ease of implementation. It is without doubt the standardized programming system that is quickest to implement, and that can be ported at the lowest cost. A skilled programmer with good tools can port Forth to a new computer architecture in as little as two weeks. Most of that time goes to writing the assembler, and coding the basic words. Porting a familiar computer architecture to a new computer is often much faster, because only a few drivers are needed. Even programmers unfamiliar with Forth can use low-quality tools and write a Forth system from scratch in a few months of part-time effort.

However, the result has been a proliferation of nonstandard Forth systems by hobbyists, often of indifferent quality, and with poor documentation. Often newcomers to the language download a free version from the net, have problems and dismiss the language as garbage, mistaking a poor implementation for a fundamentally flawed language.

In contrast, professional implementations are rigorously tested (usually with proprietary test suites), conform to published standards, have manuals and telephone support, and often have more than 90% of the core words written in assembler for the highest possible speed. High-end professional Forth systems are usually delivered with all source code required to recompile themselves. There are no mysteries in such a system. Code written by a professional Forth programmer with years of experience is often just better than that written by an inexperienced amateur, no matter how great the love.

Forth from a programmer's perspective

Forth relies heavily on explicit use of the stack data structure and reverse Polish notation (or RPN, also used on advanced calculators from Hewlett-Packard). This notation is also called postfix notation because the operator comes after its operands, as opposed to the more common infix notation where the operator comes between its operands. The rationale for postfix notation is that it is closer to the machine language the computer will eventually use, and should therefore be faster to execute. For example, one could get the result of a mathematical expression this way:

25 10 * 50 + .
300

This command line first puts the numbers 25 and 10 on the implied stack; the "*" command multiplies the two numbers on the top of the stack and replaces them with their product; then the number 50 is placed on the stack, and the "+" command adds it to the previous product; finally, the "." command prints the result to the user's terminal. Even the language's structural features are stack-based. For example:

: FLOOR5 DUP 5 < IF DROP 5 ELSE 1 - THEN ;

This code defines a new word (again 'word' is the term used for a subroutine) called "FLOOR5" using the following commands: "DUP" simply duplicates the number on the stack; "<" compares the two numbers on the stack and replaces them with a true-or-false value; "IF" takes a true-or-false value and chooses to execute commands immediately after it or to skip to the "ELSE"; "DROP" discards the value on the stack; and "THEN" ends the conditional. The net result is a function that performs similarly to this function (written in the C programming language):

int floor5(int v) { if (v < 5) return 5; else return v - 1; }

A terser Forth definition of FLOOR5 that gives the same result:

: FLOOR5 1 - 5 MAX ;

Forth became very popular in the 1980s because it was well suited to the small microcomputers of that time: very efficient in its use of memory and easily implemented on a new machine. At least one home computer, the British Jupiter ACE, had Forth in its ROM-resident OS. The language is still used in many small computerized devices (called embedded systems) today for three main reasons: efficient memory use, shortened development time, and fast execution speed.

Forth is also infamous as being one of the first and simplest extensible languages. That is, programmers can easily extend the language with new commands appropriate to the primary programming problem in the particular application area. Unfortunately, extensibility also helps poor programmers to write incomprehensible code. The language never achieved wide commercial use, perhaps because it acquired a reputation as a "write-only" language after several companies had product failures caused when a crucial programmer left. In addition, the ease of implementing Forth on a given processor meant that the barrier to self-development of a Forth system was quite low, so that commercial suppliers were, in effect, competing head-to-head with hobbyists, many of whom supported the idea that software should be free.

Facilities of a Forth system

Assembler

Most Forth systems include a specialized assembler that produces executable words. Assembly language words usually end in a macro called "NEXT" which indexes the address interpreter to the next word, and executes it. Forth assemblers usually use a reverse-polish syntax in which the parameters of an instruction precede the instruction. The usual design of a Forth assembler is to construct the instruction on the stack, then copy it into memory as the last step. Registers are often either numbered (0..n, as used in the actual operation code) or named for their purpose in the Forth system: e.g. "s" for the register used as a stack pointer.

Operating System

Classic Forth systems use no operating system. Instead of storing code in files, they store it as source-code in disk blocks written to physical disk addresses. This is more convenient than it sounds, because the numbers come to be familiar. Also, Forth programmers come to be intimately familiar with their disks' data structures, just by editing the disk. Forth systems use a single word "BLOCK" to translate the number of a 1K block of disk space into the address of a buffer containing the data. The Forth system automatically manages the buffers.

File System

Some classic commercial Forth systems have implemented contiguous disk files, using the Forth operating system's disk access, and placing the files at fixed disk block ranges. Usually the system implements records as fixed-length binary data, with an integer number of records per disk block. Quick searching is achieved by hashed access on key data.

Multitasking

Classic Forth systems multitask. They use a special word, "PAUSE" to save all the important registers to the current stack, locate the next task, and restore all the registers.

Tasks are organized as a scratchpad, an area for variables used by the task, and the stacks for that task. The customary way to search for an executable task is to jump to the schedule, which is a linked list consisting of jump instructions. When a software interrupt instruction replaces the jump, the task begins to run. This system is remarkably efficient. In a Forth programming class, ten users have been supported on an 8MHz PDP-11, with each user operating out of less than 4K of RAM and sharing a single floppy disk. In a telephone system, a thousand tasks (one per phone) were supported on a small NOVA minicomputer.

Self and cross compilation

A full-featured Forth system with all source code will compile itself. The usual method is to redefine the handful of words that place compiled bits into memory. The compiler's words therefore use specially-named versions of fetch and store that can be redirected to fetch and store to a buffer area in memory. The buffer area simulates or accesses a memory area beginning at a different address than the actual buffer address. Such compilers define words to access both the target computer's memory, and the host (compiling) computer's memory.

After the fetch and store versions are redefined, the compiler, assembler, etc. are recompiled using the new definitions of fetch and store. This effectively reuses all the code of the compiler and interpreter. Then, the Forth system's code is compiled, but this version is stored in the buffer. The buffer in memory is written to disk, and ways are provided to load it temporarily into memory for testing. When the new version appears to work, it is written over the previous version.

There are numerous variations of such compilers for different environments. For embedded systems, the code may instead be written to another computer over a serial port or even a single TTL bit, while keeping the word names and other non-executing parts of the dictionary in the original compiling computer. The minimum definitions to "remote" a forth compiler are the words that fetch and store a byte, and the word that commands a forth word to be executed. Often the most time-consuming part of a remote port is to construct the initial program to implement fetch, store and execute. Many modern microprocessors have integrated debugging features (such as the Motorola CPU32) that eliminate even this task.

Structure of the language

The basic data structure of Forth is the "dictionary" which maps "words" to executable code or other named data structures. The dictionary is laid out in memory as a single-linked-list with the links proceeding from the latest (most recently) defined word to oldest, until a sentinel, usually a NULL pointer, is found.

A defined word generally consists of two parts: the head and the body. The head consist of two fields, the name field (NF) and the link field (LF); and, the body consists of the code field (CF) and the parameter field. (PF)

Described as a C language structure, the word structure would look this way:

 struct forthword {
       byte _flag; 
       char name[];  
       struct forthword *previous; 
       struct forthword *codeword; 
       byte parameterfield[]; 
     };

The NF starts with a byte-prefix: 5-bit length count of the word's name for a maximum length of 32-bytes, and 3-bits for flags. The character representation of the word's name then follows the prefix. Depending on the particular implementation of the Forth, there may or may not be a final NULL ('\0') byte for alignment.

The LF contains a pointer to the previously defined word. The pointer maybe a relative displacement or an absolute address that points to the next oldest sibling.

The CF will be either the address of the word which will execute the code or data in the PF or the beginning of machine code that the processor will execute directly. For colon defined words, the CF points to the word that will save the current Forth instruction pointer (IP) on the return stack, and load the IP with the new address from which to continue execution of words. This is the same as what a processor's call/return instructions does.

The flag byte in the NF distinguishes words with "compile time" behavior. Most simple words execute the same code whether they are typed on a command line, or embedded in code. Compile-time words have special meanings inside Forth code. The classic examples of compile time words are the control-structures. All of Forth's control structures, and almost all of its compiler are implemented as compile-time words.

When a word is a variable, or other data structure, the CF points to the runtime of the defining word. In this way, a runtime action can be defined specifically for the class of data defined, and thus a Type can be appointed to such words because they all share the same defining word. The runtime action of a defining word is programmer-implemented and completely up to him.

The compiler itself consists of various commands within the language itself. This gives the programmer considerable control of the compiler: not only is its action clear and simple, the compiler commands can be redefined for special purposes.

Computer programs in Forth

Words written in Forth usually compile into lists of addresses of other words, which saves very large amounts of space. The code executed by these words is an "address interpreter." The address interpreter does just enough work to be able to execute the lowest level of words, which are written in assembly language. Generally speaking, a Forth program is saved as the memory image of the compiled program with a single command (e.g., RUN) that is executed when the compiled version is loaded.

Implementation of a Forth System

Forth uses two stacks for each executing task. The stacks are the same width as the index register of the computer, so that they can be used to fetch and store addresses. One stack is the parameter or data stack, used to pass data to words. The other stack is the linkage stack, known as the Return Stack, used to nest words, and store local variables. There are standard words to move data between the stacks, and access variables.

The Forth interpreter looks up words one at a time in the dictionary, and executes their code. The basic algorithm is to search a line of characters for a non-blank, non-control-character string. If this string is in the dictionary, and it is not a compile-time word (marked in the flag byte), the code is executed. If it is not in the dictionary, it may be a number. If it converts to a number, the number is pushed onto the parameter stack. If it does not convert, then the interpreter prints the string, followed by a question mark, and throws away the rest of the line of text.

A Forth compiler produces dictionary entries. Other than that, it tries to simulate the same effect that would be produced by typing the text into the interpreter.

The great secret to implementing Forth is natively compiling it, so that it compiles itself. The basic scheme is to have the compiler defined in terms of a few words that access a code area. Then, one definition of the words compiles to the normal area of memory. Another definition compiles to disk, or to some special memory area. How does one adapt the compiler? One recompiles it with the new definitions. Some systems have even defined the low-level words to communicate with a debugger on a different computer, building up the Forth system in a different computer.

One mainstream use of the Forth programming language is in Open Firmware, a bootloader framework developed by Sun Microsystems.

Hello World

 : HELLO ." Hello World!" ;

Try it out yourself

There's a Forth implementation in JavaScript online, which you can use to try out Forth. The interactive terminal emulation works best with browsers like Firefox, Mozilla, Netscape. Tutorial material is provided there as well:

http://forthfreak.net/jsforth.html

As this interpreter has been written in JavaScript, you browser needs to support JavaScript to run it, of course. Text mode browser elinks does not work, as support for forms (which are used for terminal emulation) is not yet sufficient.

External links

Freely available Forth implementations

MVP Forth with source code available from Mountain View Press

Commercial Forth implementations

Forth communities

History of Forth

This article is licensed under the GNU Free Documentation License.
It uses material from the Wikipedia article "Forth programming language".