Diff: Forth - Waikato Linux Users Group

Differences between version 28 and predecessor to the previous major change of Forth.

Other diffs: Previous Revision, Previous Author, or view the Annotated Edit History

Newer page:	version 28	Last edited on Tuesday, March 20, 2007 5:21:38 pm	by SamuelFalvo	Revert
Older page:	version 26	Last edited on Wednesday, February 14, 2007 8:29:16 pm	by LawrenceDoliveiro	Revert

@@ -36,21 +36,23 @@

A [Forth] machine has two stacks. When subroutines (called ''words'' in [Forth] lingo) are called they pop arguments off the ''parameter stack'' and push back return values. The ''return stack'' holds the return addresses of word calls.

Finally, the input buffer for the [Forth] parser forms an integral part of a [Forth] system.

-[Forth] programs are written in ReversePolish notation. [Forth] words access the ''parameter stack'' directly to fetch arguments and leave return values. The stack is also used for temporary storage, as traditionally there are no local variables in [Forth]. There are words for flipping around values on the top of the stacks to help you with this. As you program in [Forth] you must keep careful track of what is supposed to be on the stack at every point. This is all part of ChuckMoore's philosophy of brutal simplicity in software engineering~~, and the most indigestible aspect of [Forth]~~ .

+[Forth] programs are written in ReversePolish notation. [Forth] words access the ''parameter stack'' directly to fetch arguments and leave return values. The stack is also used for temporary storage, as traditionally there are no local variables in [Forth]. There are words for flipping around values on the top of the stacks to help you with this. As you program in [Forth] you must keep careful track of what is supposed to be on the stack at every point, though this is much easier than it first seems (beginners tend to make this more complicated than necessary, thus giving Forth a bad reputation as a "write-only" language) . This is all part of ChuckMoore's philosophy of brutal simplicity in software engineering.

-The ''~~user~~ dictionary'' ~~looks a bit~~ like a stack ~~and~~ is ~~organised~~ in ~~sets called~~ ''~~vocabularies~~ ''. The first vocabulary is a tiny kernel of word definitions written in MachineCode, then there is a standard library vocabulary defined in [Forth] ~~(compiled to ByteCode)~~ , and then optional vocabularies that define a simple text editor, assembler and disk OperatingSystem. (A modern [Forth] system might have a [TCP/IP] stack on there somewhere too.)

+The ''dictionary'' is structured also like a stack; there is a ''here'' pointer, where everything behind it is considered ''allocated'' and part of the runtime system, and everything in front is considered temporary storage at best. As words are defined, ''here '' is advanced. When you wish to recycle memory, you ''forget'' a word (usually by executing a word defined by '''MARKER'''; see the ANSI Forth specification for details), which resets ''here'' to that word's definition point.

+

+Some Forth's permit multiple vocabularies to exist concurrently, analogous to C++ namespaces . The first vocabulary is a tiny kernel of word definitions written in MachineCode, then there is a standard library vocabulary defined in [Forth], and then optional vocabularies that define a simple text editor, assembler and disk OperatingSystem. (A modern [Forth] system might have a [TCP/IP] stack on there somewhere too.)

There is no syntax in [Forth], just words for manipulating other words, the input buffer and the user dictionary.

The word <tt>:</tt> tells [Forth] to begin compiling the subsequent words found on the input buffer.

-Some words have flags that tell [Forth] to execute them immediately while compiling: these create control structures in the ~~ByteCode of a~~ word. Ordinary words ~~are~~ compiled into <tt>call</tt> statements. Words that look like integers ~~are~~ compiled into <tt>push</tt> statements. ~~The word~~ <tt>;</tt> pushes the new defintion onto the dictionary and ~~ends compilation~~ . ~~(This is~~ the ~~gist.)~~

+Some words have flags that tell [Forth] to execute them immediately while compiling: these create control structures in the word. Ordinary words may be compiled into <tt>call</tt> statements, or they may be expanded to native machine language instructions . Words that look like integers may be compiled into the equivalent of <tt>push</tt> statements, or may be embedded into immediate operands of machine instructions . Note that the definition is ''usually'' built piece-meal in the dictionary, so that when <tt>;</tt> is executed, the definition is already finished. Highly optimizing compilers, however, will wait to put anything at all in the dictionary until <tt>;</tt> is encountered, where <tt>;</tt> pushes the new defintion onto the dictionary after proper lexical and parsed analysis . To help generate efficient code, some highly optimizing compilers will even ''infer types'' based on words you call in the definition!

-You use <tt>:</tt> to add new words to the [Forth] system until you've created a high-level [Forth] vocabulary that you can easily express your problem in~~… then you~~ just keep going until you've defined a single word that, when called, executes your entire program. [Forth] is very much a BottomUp language. :-)

+You use <tt>:</tt> to add new words to the [Forth] system until you've created a high-level [Forth] vocabulary that you can easily express your problem in. You just keep going until you've defined a single word that, when called, executes your entire program. [Forth] is very much a BottomUp language. :-) However, this is absolutely no different from programming in any other language; in C, all programs ''ultimately'' function from a single <tt>main()</tt> function. Likewise, in Haskell as well.

-MachineCode statements can be used as [Forth] ByteCode~s so the machine can execute them directly. ~~This~~ is called ''subroutine threading''. It is one of a number of approaches to reduce the overhead of interpretation (which spends a lot of time executing call statements).

+MachineCode statements can be used as [Forth] ByteCode~s so the machine can execute them directly. One form of this is called ''subroutine threading''. It is one of a number of approaches to reduce the overhead of interpretation (which spends a lot of time executing call statements).

Real [Forth] geeks write their own [Forth] kernels from the metal up. (I wrote my own, on paper, when I was 14. (I don't think I can do that anymore.) I was working from a [Forth] book and a Z80 opcode table; I didn't have a computer. I still have pieces of it. God knows if it would have worked.) ChuckMoore creates his own [Forth] machines from silicon. It's not ''very'' hard once you've absorbed the [Forth] philosophy. There is shockingly little holding [Forth] up.

!!! Examples

@@ -81,25 +83,48 @@

drop \ discard buffer address

;

</verbatim>

-Yet another example, a simple text editor (from [Retro Native Forth | http://retro.tunes.org]):

+Notice the style of code is predominantly vertical, leading to a program structure that is intrinsically very hard to read. The while loop's condition check, for example, consists of the overwhelming bulk of the code. This, generally, is considered very poor programming style in Forth.

+

+ Yet another example, a simple text editor (from [Retro Native Forth | http://retro.tunes.org]), is much easier to read, and demonstrates the preferred programming method that is more horizontal. Just as words in a college dictionary are written in only a few lines, and are predominantly horizontal, so too is the code below. Another way of thinking about this is in terms of functional programming, where most definitions are, despite being defined at the global scope, really private to the main word :

0 variable lastBlock

asmbase $1024 + constant blockBase

-: block 512 * blockBase + ;

-: select dup lastBlock ! ;

-: vl dup 40 type cr 40 + ;

-: view lastBlock @ block vl vl vl vl vl vl vl vl vl vl vl vl ;

-: edit select block s drop drop view ;

-: delete select block 512 0 fill view ;

-: edit-line 40 * lastBlock @ block + s drop drop view ;

-: delete-line 40 * lastBlock @ block + 37 0 fill view ;

+

+ : block 512 * blockBase + ;

+: select dup lastBlock ! ;

+: vl dup 40 type cr 40 + ;

+: view lastBlock @ block vl vl vl vl vl vl vl vl vl vl vl vl ;

+: edit select block s drop drop view ;

+: delete select block 512 0 fill view ;

+: edit-line 40 * lastBlock @ block + s drop drop view ;

+: delete-line 40 * lastBlock @ block + 37 0 fill view ;

</verbatim>

-As you can see, [ Forth~~] appeals most~~ to AssemblyLanguage ~~types~~ .

+For example, the definition of "block" can be likened to the more traditional functional style:

+

+<verbatim>

+let block b = blockBase + (512*b)

+</verbatim>

+

+In fact, there are many similarities and dualities between Forth and functional programming languages.

+

+ * Haskell compiles software to a tree of ''closures,'' basically packets of code and data that, taken together, form a ''function'' in the mathematical sense. In Forth, because parameters are placed on the stack, there is no need to store free-variables or bound-variables in the program structure. Hence, a poiner to a closure consists ''only'' of its code pointer; therefore, typical Forth "threaded" code maps 1:1 to Haskell's closure-based execution environment. (The fact that [bigFORTH | http://bigforth.sf.net/] and [GHC Haskell Compiler | http://www.haskell.org] produce binaries that are about equally performant is no coincidence.)

+

+ * Forth has issues using local variables, but can maintain global state with ease. Haskell has issues using global state, but can maintain local variables with ease. To address these issues, ANSI Forth introduces a portable locals wordset, while Haskell introduced monads to deal with global state. Both constructs are fully expressable in the core language, and both address (essentially) different sides of the same problem.

+

+ * Forth allows the compiler to be extended by fundamentally altering the compiler in terms of words written in the base language itself. Haskell utilizes monads for this same purpose.

+

+ * Haskell uses the <tt>>>=</tt> operator to compose the ''right-hand's'' monadic function onto the results returned by the ''left-hand'' computation. This is, save for the threading of state, ''normal function composition.'' In Forth, which is described mathematically as a purely combinator-based language, composition is performed by concatenation -- e.g., simply listing one word after another. Hence, both Forth and Haskell code can be expressed via function composition, and therefore, reasoned about algebraically.

+

+ * Forth allows one to freely alter the return stack. Return stack items are (usually) return addresses, and therefore, (partial) continuations. Freely altering the return stack is dangerous and can produce unstable software. However, there are nonetheless distinct *patterns* of return stack manipulations you can use to produce _extremely_ compact representations of fairly high-level control flow structures. This introduces the concept of "return conventions," as distinct from "calling conventions." It turns out that Haskell uses return conventions as well, under the hood, to optimize pattern-matching in case and function evaluation.

+

+ As you can see, Forth shares many traits with functional languages, and therefore doesn't always apply to the AssemblyLanguage programmer. In fact, besides embedded devices, ''music'' is one of Forth's application domains, as it is used in some of UCSD's music courses. Forth has successfully been used in artificial intelligence research, and is the language of choice for NASA on many satellites and planetary probes. In these environments, the functional correctness and compactness of Forth weighs more heavily than the use of assembly language .

+

+-----

[Forth humour at UserFriendly|http://ars.userfriendly.org/cartoons/?id=20070213&mode=classic]

-----

CategoryProgrammingLanguages, CategoryMachineOrientedProgrammingLanguages, CategorySystemsProgrammingLanguages