BASIC interpreters are of historical importance. Microsoft's first product for sale was a BASIC interpreter (Altair BASIC), which paved the way for the company's success. Before Altair BASIC, microcomputers were sold as kits that needed to be programmed in machine code (for instance, the Apple I). During the Altair period, BASIC interpreters were sold separately, becoming the first software sold to individuals rather than to organizations; Apple BASIC was Apple's first software product. After the MITS Altair 8800, microcomputers were expected to ship bundled with BASIC interpreters of their own (e.g., the Apple II, which had multiple implementations of BASIC). A backlash against the price of Microsoft's Altair BASIC also led to early collaborative software development, for Tiny BASIC implementations in general and Palo Alto Tiny BASIC specifically.
BASIC interpreters fell from use as computers grew in power and their associated programs grew too long for typing them in to be a reasonable distribution format. Software increasingly came pre-compiled and transmitted on floppy disk or via bulletin board systems, making the need for source listings less important. Additionally, increasingly sophisticated command shells like MS-DOS and the Mac GUI became the primary user interface, and the need for BASIC to act as the shell disappeared. The use of BASIC interpreters as the primary language and interface to systems had largely disappeared by the mid-1980s.
History
BASIC helped jumpstart the time-sharing era, became mainstream in the microcomputer era, then faded to become just another application in the DOS and GUI era, and today survives in a few niches related to game development, retrocomputing, and teaching.
Time-sharing era
First implemented as a compile-and-go system rather than an interpreter, BASIC emerged as part of a wider movement towards time-sharing systems. General Electric, having worked on the Dartmouth Time-Sharing System and its associated Dartmouth BASIC, wrote their own underlying operating system and launched an online time-sharing system known as Mark I featuring a BASIC compiler (not an interpreter) as one of its primary selling points. Other companies in the emerging field quickly followed suit. By the early 1970s, BASIC was largely universal on general-purpose mainframe computers.[1]
BASIC, as a streamlined language designed with integrated line editing in mind, was naturally suited to porting to the minicomputer market, which was emerging at the same time as the time-sharing services. These machines had very small main memory, perhaps as little as 4 KB in modern terminology, and lacked the high-performance storage like hard drives that make compilers practical. In contrast, an interpreter would take fewer computing resources, at the expense of performance. In 1968, Hewlett Packard introduced the HP 2000, a system that was based around its HP Time-Shared BASIC interpreter.[2] In 1969, Dan Paymar and Ira Baxter wrote another early BASIC interpreter for the Data General Nova.[3]
One holdout was Digital Equipment Corporation (DEC), the leading minicomputer vendor. They had released a new language known as FOCAL, based on the earlier JOSS developed on a DEC machine at the Stanford Research Institute in the early 1960s. JOSS was similar to BASIC in many respects, and FOCAL was a version designed to run in very small memory systems, notably the PDP-8, which often shipped with 4 KB of main memory. By the late 1960s, DEC salesmen, especially in the educational sales department, found that their potential customers were not interested in FOCAL and were looking elsewhere for their systems. This prompted David H. Ahl to hire a programmer to produce a BASIC for the PDP-8 and other DEC machines. Within the year, all interest in alternatives like JOSS and FOCAL had disappeared.[4]
Microcomputer era
The introduction of the first microcomputers in the mid-1970s continued the explosive growth of BASIC, which had the advantage that it was fairly well known to the young designers and computer hobbyists who took an interest in microcomputers, many of whom had seen BASIC on minis or mainframes. BASIC was one of the few languages that was both high-level enough to be usable by those without training and small enough to fit into the microcomputers of the day. In 1972, HP introduced the HP 9830A programmable desktop calculator with a BASIC Plus interpreter in read-only memory (ROM).[5]
In June 1974, Alfred Weaver, Michael Tindall, and Ronald Danielson of the University of Illinois at Urbana-Champaign proved it was possible to produce "A BASIC Language Interpreter for the Intel 8008 Microprocessor," in their paper of the same name, though their application was deployed to an 8008 simulator for the IBM 360/75 and required 16 KB.[6]
In January 1975, the Altair 8800 was announced and sparked the microcomputer revolution. One of the first microcomputer versions of BASIC was co-written by Bill Gates, Paul Allen, and Monte Davidoff for their newly formed company, Micro-Soft. This was released by MITS in punch tape format for the Altair 8800 shortly after the machine itself,[7] showcasing BASIC as the primary language for early microcomputers.
In March 1975, Steve Wozniak attended the first meeting of the Homebrew Computer Club and began formulating the design of his own computer. Club members were excited by Altair BASIC.[8] Wozniak concluded that his machine would have to have a BASIC of its own. At the time he was working at Hewlett Packard and used their minicomputer dialect, HP Time-Shared BASIC, as the basis for his own version. Integer BASIC was released on cassette for the Apple I, and was supplied in ROM when the Apple II shipped in the summer of 1977.[9]
Other members of the Homebrew Computer Club began circulating copies of Altair BASIC on paper tape, causing Gates to write his Open Letter to Hobbyists, complaining about this early example of software piracy. Partially in response to Gates's letter, and partially to make an even smaller BASIC that would run usefully on 4 KB machines,[a] Bob Albrecht urged Dennis Allison to write their own variation of the language. Allison covered how to design and implement a stripped-down version of an interpreter for the BASIC language in articles in the first three quarterly issues of the People's Computer Company newsletter published in 1975, and implementations with source code were published in Dr. Dobb's Journal of Tiny BASIC Calisthenics & Orthodontia: Running Light Without Overbyte. This led to a wide variety of Tiny BASICs with added features or other improvements, with well-known versions from Tom Pittman and Li-Chen Wang, both members of the Homebrew Computer Club.[10] Tiny BASIC was published openly and Wang coined the term "copyleft" to encourage others to copy his source code. Hobbyists and professionals created their own implementations, making Tiny BASIC an example of a free software project that existed before the free software movement.
Many firms developed BASIC interpreters. In 1976, SCELBI introduced SCELBAL for the 8008[11] and the University of Idaho and Lawrence Livermore Laboratory announced that they would be publishing to the public domain LLL BASIC, which included floating-point support.[12] In 1977, the Apple II and TRS-80 Model I each had two versions of BASIC, a smaller version introduced with the initial releases of the machines and a licensed Microsoft version introduced later as interest in the platforms increased.
Microsoft ported its interpreter to the MOS 6502, which quickly became one of the most popular microprocessors of the 8-bit era. When new microcomputers began to appear, such as the Commodore PET, their manufacturers licensed a Microsoft BASIC, customized to the hardware capabilities. By 1978, MS BASIC was a de facto standard and practically every home computer of the 1980s included it in ROM. In 1980, as part of a larger licensing deal that included other languages and PC DOS, IBM rejected an overture from Atari and instead licensed MS-BASIC over its own implementation, eventually releasing four versions of IBM BASIC, each much larger than prior interpreters (for instance, Cartridge BASIC took 40 KB).[13] Don Estridge, leader of the IBM PC team, said, "IBM has an excellent BASIC--it's well received, runs fast on mainframe computers, and it's a lot more functional than micro-computer BASICs... But [its] number of users were infinitesimal compared to the number of Microsoft BASIC users. Microsoft BASIC had hundreds of thousands of users around the world. How are you going to argue with that?"[14] (See Microsoft BASIC for the subsequent history of these different implementations.)
In 1978, David Lien published the first edition of The BASIC Handbook: An Encyclopedia of the BASIC Computer Language, documenting keywords across over 78 different computers. By 1981, the second edition documented keywords from over 250 different computers, showcasing the explosive growth of the microcomputer era.[23]
Interpreters as applications
With the rise of disk operating systems and later graphical user interfaces, BASIC interpreters became just one application among many, rather than providing the first prompt a user might see when turning on a computer.
In 1983, the TRS-80 Model 100 portable computer debuted, with its Microsoft BASIC implementation noteworthy for two reasons. First, programs were edited using the simple text editor, TEXT, rather than typed in line by line (but line numbers were still required).[24] Second, this was the last Microsoft product that Bill Gates developed personally.[25][26]
Also in 1983, Microsoft began bundling GW-BASIC with DOS. Functionally identical to IBM BASICA, its BASIC interpreter was a fully self-contained executable and did not need the Cassette BASIC ROM found in the original IBM PC. According to Mark Jones Lorenzo, given the scope of the language, "GW-BASIC is arguably the ne plus ultra of Microsoft's family of line-numbered BASICs stretching back to the Altair--and perhaps even of line-numbered BASIC in general."[27] With the release of MS-DOS 5.0, GW-BASIC's place was taken by QBasic.
MacBASIC featured a fully interactive development environment for the original Macintosh computer and was developed by Donn Denman,[28] Marianne Hsiung, Larry Kenyon, and Bryan Stearns.[29] MacBASIC was released as beta software in 1985 and was adopted for use in places such as the Dartmouth College computer science department, for use in an introductory programming course. It was doomed to be the second Apple-developed BASIC killed in favor of a Microsoft BASIC. In November 1985, Apple abruptly ended the project as part of a deal with Microsoft to extend the license for BASIC on the Apple II.[30][31]
BASIC interpreters were not just an American/British development. In 1984, Hudson Soft released Family BASIC in the Japanese market for Nintendo's Family Computer video game console, an integer-only implementation designed for game programming, based on Hudson Soft BASIC for the Sharp MZ80 (with English keywords).[32] Turbo-Basic XL is a compatible superset of Atari BASIC, developed by Frank Ostrowski and published in the December 1985 issue of German computer magazine Happy Computer, making it one of the last interpreters published as a type-in program. The language included a compiler in addition to the interpreter and featured structured programming commands. Several modified versions working with different DOS systems were released by other authors. In France, François Lionet and Constantin Sotiropoulos developed two BASIC interpreters with a focus on multimedia: STOS BASIC for the Atari ST, in 1988,[33] and AMOS BASIC for the Amiga, in 1990.
In 1993, Microsoft released Visual Basic for Applications, a scripting language for Microsoft Office applications, which supersedes and expands on the abilities of earlier application-specific macro programming languages such as Word's WordBASIC (which had been introduced in 1989).
In 1999, Benoît Minisini released Gambas as an alternative for Visual Basic developers who had decided to migrate to Linux.[36]
In 2000, Lee Bamber and Richard Vanner released DarkBASIC, a game creation system for Microsoft Windows, with accompanying IDE and development tools.[37]
In 2002, Emmanuel Chailloux, Pascal Manoury and Bruno Pagano published a Tiny BASIC as an example of developing applications with Objective Caml.[39]
In 2011, Microsoft released Small Basic (distinct from SmallBASIC), together with a teaching curriculum[40] and an introductory guide,[41] designed to help students who have learnt visual programming languages such as Scratch learn text-based programming.[42] The associated IDE provides a simplified programming environment with functionality such as syntax highlighting, intelligent code completion, and in-editor documentation access.[43] The language has only 14 keywords.[44] In 2019, Microsoft announced Small Basic Online (SBO), allowing students to run programs from a web browser.[45][46]
In 2014, Robin H. Edwards released Arduino BASIC for the Arduino, now a widely forked implementation.[47] Another implementation using the same name was adapted from Palo Alto Tiny BASIC in 1984 by Gordon Brandly for his 68000 Tiny BASIC, later ported to C by Mike Field.[48]
Many BASIC interpreters are now available for smartphones and tablets via the Apple App Store, or Google Play store for Android.
Today, coding BASIC interpreters has become part of the retrocomputing hobby. Higher level programming languages on systems with extensive RAM have simplified implementing BASIC interpreters. For instance, line management is simple if your implementation language supports sparse matrixes, variable management is simple with associative arrays, and program execution is easy with eval functions. As examples, see the open-source project Vintage BASIC, written in Haskell[49] or the OCaml Tiny BASIC.
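As a rough illustration of how little code this takes in a modern language, the following Python sketch uses a dict keyed by line number in place of a sparse array, a second dict as the associative array of variables, and the built-in eval for expressions. The function name run, the step limit, and the five supported statements are assumptions made for this example, and expressions are evaluated with Python syntax rather than BASIC syntax. It is not taken from Vintage BASIC or the OCaml Tiny BASIC.

```python
def run(program_text, max_steps=10_000):
    """Run a small line-numbered program and return the values it printed."""
    # Line management: a dict keyed by line number stands in for a sparse array.
    lines = {}
    for raw in program_text.strip().splitlines():
        number, _, statement = raw.strip().partition(" ")
        lines[int(number)] = statement.strip()
    order = sorted(lines)          # execution order is numeric order

    variables = {}                 # variable management: an associative array
    safe_globals = {"__builtins__": {}}
    output = []
    pc = 0                         # index into `order`, not a line number
    for _ in range(max_steps):     # guard against runaway loops
        if pc >= len(order):
            break
        keyword, _, rest = lines[order[pc]].partition(" ")
        pc += 1
        if keyword == "LET":
            name, _, expr = rest.partition("=")
            variables[name.strip()] = eval(expr.strip(), safe_globals, variables)
        elif keyword == "PRINT":
            output.append(eval(rest, safe_globals, variables))
        elif keyword == "GOTO":
            pc = order.index(eval(rest, safe_globals, variables))
        elif keyword == "IF":
            condition, _, target = rest.partition(" THEN ")
            if eval(condition, safe_globals, variables):
                pc = order.index(int(target))
        elif keyword == "END":
            break
        else:
            raise SyntaxError(f"unknown statement: {keyword}")
    return output


SQUARES = """
10 LET I = 1
20 PRINT I * I
30 LET I = I + 1
40 IF I <= 3 THEN 20
50 END
"""
```

Calling run(SQUARES) collects the squares of 1 to 3. Emptying the built-ins passed to eval keeps the example tidy but does not make it a security sandbox.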
Sales and distribution
Initially, interpreters were either bundled with computer hardware or developed as a custom service, before an industry producing independently packaged software for organizations came about in the late 1960s.[50] BASIC interpreters were first sold separately from microcomputers, then built-in, before becoming sold as applications again in the DOS era.
As the market shifted to ROMs, ROM size came to dominate decisions about how large a BASIC interpreter could be. Because RAM was sold in 4 KB chips, Altair BASIC was initially packaged in separate editions for 4K, 8K, and 12K; this carried over to ROM chips, as manufacturers would decide how many ROM chips they could fit in their design, given price goals and other constraints.
Compilers vs. interpreters
Compilers vs. Interpreters
Aspect            | Compiler                   | Interpreter
Optimized for     | Performance                | Memory usage
Execution speed   | Faster                     | Slower
Memory usage      | Higher                     | Lower
Secondary storage | Required                   | Optional
Error checking    | Before execution           | During execution
Source code       | Not embedded in executable | Required to execute
The first implementation of BASIC, Dartmouth BASIC, was a compiler. Generally, compilers examine the entire program in a multi-step process and produce a second file that is directly executable in the host computer's underlying machine language without reference to the source code. This code is often made up of calls to pre-written routines in the language's runtime system. The executable will normally be smaller than the source code that created it.
The main disadvantage of compilers, at least in the historical context, is that they require large amounts of temporary memory. As the compiler works, it is producing an ever-growing output file that is being held in memory along with the original source code. Additional memory for temporary lookups, notably line numbers in the case of BASIC, adds to the memory requirement. Computers of the era had very small amounts of memory; in modern terms a typical mainframe might have on the order of 64 KB. On a timesharing system, the case for most 1960s BASICs, that memory was shared among many users.
In order to make a compiler work, the systems had to have some form of high-performance secondary storage, typically a hard drive. Program editing took place in a dedicated environment that wrote the user's source code to a temporary file. When the user ran the program, the editor exited and ran the compiler, which read that file and produced the executable code, and then finally the compiler would exit and run the resulting program. Splitting the task up in this fashion reduced the amount of memory needed by any one of the parts of the overall BASIC system; at any given time, only the editor, compiler, or runtime had to be loaded, while the rest was on storage.
While mainframes had small amounts of memory, minicomputers had even smaller amounts: 4 and 8 KB systems were typical in the 1960s. But far more importantly, minicomputers tended to lack any form of high-performance storage; most early designs used punch tape as a primary storage system, and magnetic tape systems were for the high end of the market. In this environment, a system that wrote out the source, compiled it, and then ran the result would have taken minutes. Because of these constraints, interpreters proliferated.
Interpreters ultimately perform the same basic tasks as compilers, reading the source code and converting that into executable instructions calling runtime functions. The primary difference is when they perform the various tasks. In the case of a compiler, the entire source code is converted during what appears to the user as a single operation, whereas an interpreter converts and runs the source one statement at a time. The resulting machine code is executed, rather than output, and then discarded, and the process repeats with the next statement. This dispenses with the need for some form of secondary storage while an executable is being built. The primary disadvantage is that the different parts of the overall process can no longer be split apart: the code needed to convert the source into machine operations has to be loaded into memory along with the runtime needed to perform it, and in most cases, the source code editor as well.
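The difference in timing can be sketched in a few lines of Python. In this illustration, which is an assumption-laden toy rather than a model of any historical system, translate turns one statement into a callable, compile_then_run translates the whole program before executing anything, and interpret translates, executes, and discards one statement at a time. All three names are hypothetical, and only a PRINT statement with an integer argument is understood.

```python
def translate(statement):
    """Convert one source statement into an executable callable."""
    keyword, _, argument = statement.partition(" ")
    if keyword == "PRINT":
        value = int(argument)
        return lambda out: out.append(value)
    raise SyntaxError(f"cannot translate {statement!r}")


def compile_then_run(source, out):
    # Compiler style: every statement is translated first, so a bad
    # statement is reported before any part of the program has run.
    code = [translate(statement) for statement in source]
    for operation in code:
        operation(out)


def interpret(source, out):
    # Interpreter style: translate one statement, execute it, then drop
    # the translated form and move on to the next statement.
    for statement in source:
        operation = translate(statement)
        operation(out)
```

Given the program PRINT 1, PRINT 2, PLOT 3, the interpreter-style loop produces two values before failing on the third statement, while the compiler-style function fails without producing any output.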
Producing a language with all of these components that can fit into a small amount of memory and still has room for the user's source code is a major challenge, but it eliminates the need for secondary storage and was the only practical solution for early minicomputers and most of the history of the home computer revolution.
Development
Language design
Language design for the first interpreters often simply involved referencing other implementations. For instance, Wozniak's references for BASIC were an HP BASIC manual and a copy of 101 BASIC Computer Games. Based on these sources, Wozniak began sketching out a syntax chart for the language.[51] He did not know that HP's BASIC was very different from the DEC BASIC variety used in 101 Games. The two languages differed principally in terms of string handling and control structures.[52] Data General Business Basic, an integer-only implementation, was the inspiration for Atari BASIC.[53]
In contrast, Dennis Allison, a member of the Computer Science faculty at Stanford University, wrote a specification for a simple version of the language.[54] Allison was urged to create the standard by Bob Albrecht of the Homebrew Computer Club, who had seen BASIC on minicomputers and felt it would be the perfect match for new machines like the Altair. Allison's proposed design only used integer arithmetic and did not support arrays or string manipulation. The goal was for the program to fit in 2 to 3 kilobytes of memory. The overall design for Tiny BASIC was published in the September 1975 issue of the People's Computer Company (PCC) newsletter.
The grammar is listed below in Backus–Naur form.[55] In the listing, an asterisk ("*") denotes zero or more of the object to its left — except for the first asterisk in the definition of "term", which is the multiplication operator; parentheses group objects; and an epsilon ("ε") signifies the empty set. As is common in computer language grammar notation, the vertical bar ("|") distinguishes alternatives, as does being listed on separate lines. The symbol "CR" denotes a carriage return.
This syntax, as simple as it was, added one innovation: GOTO and GOSUB could take an expression rather than a line number, providing an assigned GOTO[56] rather than the switch statement of the ON-GOTO/GOSUB structure more typical of BASIC.
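A grammar of this kind maps almost directly onto a recursive-descent parser, with one function per rule. The Python sketch below evaluates integer expressions under the assumption that the rules take their usual Tiny BASIC form: expression ::= (+|-|ε) term ((+|-) term)*, term ::= factor ((*|/) factor)*, and factor ::= var | number | (expression). The names tokenize, Parser, and evaluate are hypothetical, unset variables are assumed to read as 0, and division is assumed to truncate toward zero.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Z]|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # expression ::= (+|-|ε) term ((+|-) term)*
        sign = 1
        if self.peek() in ("+", "-"):
            sign = -1 if self.take() == "-" else 1
        value = sign * self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term ::= factor ((*|/) factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                divisor = self.factor()
                quotient = abs(value) // abs(divisor)
                value = quotient if (value < 0) == (divisor < 0) else -quotient
        return value

    def factor(self):
        # factor ::= var | number | (expression)
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("missing closing parenthesis")
            return value
        if token.isalpha():
            return self.variables.get(token, 0)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text, variables=None):
    parser = Parser(tokenize(text), variables or {})
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because GOTO and GOSUB accept an expression, the same evaluator could compute a jump target: evaluate("100+10*N", {"N": 2}) yields the line number 120.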
Sinclair BASIC used as its language definition the 1978 American National Standards Institute (ANSI) Minimal BASIC standard, but was itself an incomplete implementation with integer arithmetic only.[57] The ANSI standard was published after the design of the first generation of interpreters for microcomputers.
Early microcomputers lacked development tools, and programmers either developed their code on minicomputers or by hand. For instance, Dick Whipple and John Arnold wrote Tiny BASIC Extended directly in machine code, using octal.[59] Robert Uiterwyk handwrote MICRO BASIC for the SWTPC (a 6800 system) on a legal pad.[60] Steve Wozniak wrote the code to Integer BASIC by hand, translating the assembler code instructions into their machine code equivalents and then uploading the result to his computer.[61] (Because of this, the program was very hard to change, and Wozniak was not able to modify it quickly enough for Steve Jobs, who subsequently licensed BASIC from Microsoft.[62])
Gates and Allen did not have an Altair system on which to develop and test their interpreter. However, Allen had written an Intel 8008 emulator for their previous venture, Traf-O-Data, that ran on a PDP-10 time-sharing computer. Allen adapted this emulator based on the Altair programmer guide, and they developed and tested the interpreter on Harvard's PDP-10.[63] When Harvard stopped their use of this system, Gates and Allen bought computer time from a timesharing service in Boston to complete their BASIC program debugging. Gates claimed, in his Open Letter to Hobbyists in 1976, the value of the computer time for the first year of software development was $40,000.[64]
Allen was nonetheless able to hand-code in machine language. While on final approach into the Albuquerque airport on a trip to demonstrate the interpreter, Allen realized he had forgotten to write a bootstrap program to read the tape into memory. Writing in 8080 machine language, Allen finished the program before the plane landed. Only when he loaded the program onto an Altair and saw a prompt asking for the system's memory size did he know that the interpreter worked on the Altair hardware.[65][66]
One of the most popular of the many versions of Tiny BASIC was Palo Alto Tiny BASIC, or PATB for short. PATB first appeared in the May 1976 edition of Dr. Dobb's, written in a custom assembler language with non-standard mnemonics. Li-Chen Wang had coded his interpreter on a time-share system with a generic assembler.
One exception to the use of assembly was the use of ALGOL 60 for the Paisley XBASIC interpreter for Burroughs large systems.[67] Another exception, and type-in program, was Classic BASIC, written by Lennart Benschop in Forth and published in the Dutch Forth magazine Vijgeblad (issue #42, 1993).[68]
The source code of interpreters was often open source (as with Tiny BASIC) or published later by the authors. The complete annotated source code and design specifications of Atari BASIC were published as The Atari BASIC Source Book in 1983.[69]
While virtual machines had been used in compile and go systems such as BASIC-PLUS, these were only for executing BASIC code, not parsing it.[70] Tiny BASIC, in contrast, was designed to be implemented as a virtual machine that parsed and executed (interpreted) BASIC statements; in such an implementation, the Tiny BASIC interpreter is itself run on a virtual machine interpreter.[71] The length of the whole interpreter program was only 120 virtual machine operations, consisting of 32 commands.[72] Thus the choice of a virtual machine approach economized on memory space and implementation effort, although the BASIC programs run thereon were executed somewhat slowly. (See Tiny BASIC: Implementation in a virtual machine for an excerpt and sample commands.) While the design intent was for Tiny BASIC to use a virtual machine, not every implementation did so; those that did included Tiny BASIC Extended, 6800 Tiny BASIC,[73] and NIBL.
For its TI-99/4 and TI-99/4A computers, Texas Instruments designed a virtual machine with a language called GPL, for "Graphic Programming Language".[74] (Although the virtual machine is widely blamed for the slow performance of TI-BASIC, part of the problem was that it was stored in graphics ROM, which had a slow 8-bit interface.)[75]
A misunderstanding of the Apple II ROMs led some to believe that Integer BASIC used a virtual machine, a custom assembler language contained in the Apple ROMs and known as SWEET16. SWEET16 is based on bytecodes that run within a simple 16-bit virtual machine, so memory could be addressed via indirect 16-bit pointers and 16-bit math functions calculated without the need to translate those to the underlying multi-instruction 8-bit 6502 code.[76] However, SWEET16 was not used by the core BASIC code, although it was later used to implement several utilities, such as a line renumbering routine.[77]
Program editing and storage
Program editing
Most BASIC implementations of the era acted as both the language interpreter and the line editor. When BASIC was running, a > command prompt was displayed where the user could enter statements.[78] This was known as "direct mode". Upon boot, a BASIC interpreter defaulted to direct mode.
Statements that were entered with leading numbers were placed in program storage for "deferred execution",[79] either as new lines or replacing any that might have had the same number previously.[80] Statements that were entered without a line number were referred to as commands, and ran immediately. Line numbers without statements (i.e., followed by a carriage return) deleted a previously stored line.
When a program was present in memory and the user typed the RUN command, the system entered "indirect mode". In this mode, a pointer was set to point to the first line of the program, for instance, line 10. The original text for that line was then retrieved from the store and run as if the user had just typed it in direct mode. The pointer then advanced to the next line and the process continued.
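The editing rules described above (numbered lines are stored or replaced, a bare line number deletes, anything else runs immediately) can be sketched as follows. This Python example is a minimal illustration, not the code of any particular interpreter; the names enter and listing are hypothetical, and a dict keyed by line number stands in for program storage.

```python
import re

# Optional spaces, a line number, optional spaces, then the statement text.
LINE = re.compile(r"\s*(\d+)\s*(.*)")


def enter(program, text):
    """Process one line typed at the prompt.

    Returns the statement to execute immediately (direct mode), or None
    when the line was stored or deleted (deferred execution).
    """
    match = LINE.fullmatch(text.strip())
    if match is None:
        return text.strip()              # no line number: run it now
    number = int(match.group(1))
    statement = match.group(2).strip()
    if statement:
        program[number] = statement      # new line, or replaces an old one
    else:
        program.pop(number, None)        # bare line number deletes the line
    return None


def listing(program):
    """Return the stored program in line-number order, as LIST would."""
    return [f"{number} {program[number]}" for number in sorted(program)]
```

Entering lines 20 and 10 in that order still lists them as 10 then 20, since order is determined by line number rather than by the order of entry.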
Different implementations offered other program-editing capabilities. Altair BASIC 8K had an EDIT command to shift into an editing mode for one line. Integer BASIC also included the AUTO command to automatically enter line numbers at a given starting number like AUTO 100, adding 10 to the last number with every new line. AUTO 300,5 would begin numbering at line 300 by fives: 300, 305, etc. Automatic numbering was turned off by entering MAN.[81] Some interpreters offered line-renumbering commands or utilities.
Tokenizing and encoding lines
To save RAM, and speed execution, all BASIC interpreters would encode some ASCII characters of lines into other representations. For instance, line numbers were converted into integers stored as bytes or words, and keywords might be assigned single-byte tokens (for instance, storing PRINT as the byte value 145, in MS-BASIC). These representations would then be converted back to readable text when LISTing the program.
As an alternative to tokenization, to save RAM, early Tiny BASIC implementations like Extended Tiny BASIC,[82] Denver Tiny BASIC[83] and MINOL[84] truncated keywords: PR for PRINT, IN for INPUT, RET for RETURN. The full, traditional keywords were not accepted.
In contrast, Palo Alto Tiny BASIC accepted traditional keywords but allowed any keyword to be abbreviated to its minimal unique string, with a trailing period. For instance, PRINT could be typed P., although PR. and other variations also worked. This system was retained in Level I BASIC for the TRS-80, which used PATB, and was also found in Atari BASIC and the BASIC of various Sharp Pocket Computers.[85]
To expand an abbreviation, the Atari BASIC tokenizer searches through its list of reserved words to find the first that matches the portion supplied. More commonly used commands occur first in the list of reserved words, with REM at the beginning (it can be typed as .). When the program is later LISTed it will typically write out the full words. MS BASICs also allowed ? as a short-form for PRINT, but did expand it when listing, treating it as an abbreviation, not a synonym.
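The first-match rule can be sketched in Python as below. The short RESERVED list is a hypothetical stand-in for the interpreter's full table; only the placement of REM at the front comes from the description above, and the function name expand_abbreviation is an assumption.

```python
# Hypothetical, shortened reserved-word list. Order matters: the first
# word that starts with the typed prefix wins, so REM comes first.
RESERVED = ["REM", "DATA", "INPUT", "LIST", "LET"]


def expand_abbreviation(word):
    """Expand an abbreviation such as 'L.' to the first matching keyword."""
    if not word.endswith("."):
        return word                      # not abbreviated; leave unchanged
    prefix = word[:-1]
    for keyword in RESERVED:
        if keyword.startswith(prefix):
            return keyword
    raise SyntaxError(f"no reserved word matches {word!r}")
```

With this ordering a lone period expands to REM because the empty prefix matches the first entry, L. expands to LIST, and the longer LE. is needed to reach LET.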
Tokenization
Most BASIC interpreters perform at least some conversion from the original text form into various platform-specific formats. Tiny BASIC was on the simple end: it only converted the line number from its decimal format into binary. For instance, the line number "100" became a single byte value, $64, making it smaller to store in memory as well as easier to look up in machine code (a few designs of Tiny BASIC permitted line numbers from only 1 to 254 or 255, although most used double byte values and line numbers of at least 1 to 999). The rest of the line was left in its original text format.[86] In fact, Dennis Allison argued that, given memory constraints, tokenization would take more code to implement than it would save.[87]
MS-BASICs went slightly further, converting the line number into a two-byte value and also converting keywords, like FOR or PRINT, into a single-byte value, the "token".[88] The token value had the high bit set to allow them to be easily distinguished at runtime. Everything else on a line was left in its original format, so for instance, the line:
10 FOR I=1 TO 10
would be tokenized as:
$000A$81 I$B21 $A4 10
Note that the space between FOR and I remains in the tokenized line, and the variable names and constants are not tokenized. The code that performed this tokenization, known as "the chunker", simply copied anything it did not recognize as a token back into the output, preserving spaces as-is. This meant that PRINTA was stored in two bytes, while PRINT A was stored in three bytes, and removing spaces was a common way to improve memory use.[89] Sinclair BASIC modified this slightly, removing spaces from the stored code and inserting them in code during a LIST, such that PRINTA would appear as PRINT A yet not take up the extra byte in memory.
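The behaviour described here can be approximated in a few lines of Python. In this sketch the token values for FOR ($81), GOTO ($89), TO ($A4), and = ($B2) are the ones used in this article's examples, and PRINT uses the value 145 mentioned earlier. The function names crunch and detokenize, the longest-keyword-first matching, and the rule that text inside quotation marks is left alone are assumptions made for the illustration, and line numbers are not handled.

```python
# Token values as quoted in the surrounding text; every token has its
# high bit set, so it cannot be confused with a plain ASCII character.
TOKENS = {"FOR": 0x81, "GOTO": 0x89, "PRINT": 0x91, "TO": 0xA4, "=": 0xB2}
KEYWORDS = sorted(TOKENS, key=len, reverse=True)   # try GOTO before TO
NAMES = {value: word for word, value in TOKENS.items()}


def crunch(statement):
    """Replace keywords with one-byte tokens and copy everything else."""
    out = bytearray()
    i, in_string = 0, False
    while i < len(statement):
        if statement[i] == '"':
            in_string = not in_string
        word = None
        if not in_string:
            word = next((k for k in KEYWORDS if statement.startswith(k, i)), None)
        if word:
            out.append(TOKENS[word])
            i += len(word)
        else:
            out.append(ord(statement[i]))   # unrecognized: copied as-is
            i += 1
    return bytes(out)


def detokenize(crunched):
    """Expand tokens back into keywords, as LIST does."""
    return "".join(NAMES.get(byte, chr(byte)) for byte in crunched)
```

Here crunch("PRINTA") yields two bytes and crunch("PRINT A") yields three, which is the saving the paragraph above describes, and detokenize restores the original text in both cases.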
In contrast, Integer BASIC would convert the line 10 GOTO 100 entirely into tokens that could be immediately read and performed. In MS-BASIC, the line would produce $000A $89 100, and at runtime the "100" would have to be converted to 16-bit format every time it was encountered. Integer BASIC instead also tokenized numeric constants, avoiding this conversion and speeding up execution. The resulting two-byte value was inserted into the tokenized code along with a prefix byte indicating that a number followed. The prefix was a value between $B0 and $B9, the last nibble of the value being the first decimal digit in the original value. String literals, like "HELLO WORLD", were instead encoded by setting the high bit of each character so that A was stored as $C1. Variable names were converted in the same fashion, with the letters encoded to have their high bit turned on, and any digits in the name represented by the corresponding $B0 through $B9, so that the variable A5 would be encoded as $C1B5 (not reduced to a token).[90] There were numerous other optimizations; where Microsoft BASIC had one token for the keyword PRINT, Integer BASIC had three tokens: one if the keyword was followed by no arguments, one if followed by an arithmetic expression, and one if followed by a string literal.[91]
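The encodings described for Integer BASIC can be written out as a short Python sketch. The function names encode_constant and encode_text are hypothetical, and storing the two-byte value low byte first is an assumption of this example, not something stated above.

```python
def encode_constant(value):
    """Encode a numeric constant as a prefix byte plus a two-byte value.

    The prefix is $B0 plus the first decimal digit of the number. The
    byte order of the value (low byte first) is assumed for illustration.
    """
    if not 0 <= value <= 32767:
        raise ValueError("constant out of range")
    prefix = 0xB0 + int(str(value)[0])
    return bytes([prefix]) + value.to_bytes(2, "little")


def encode_text(text):
    """Set the high bit of every character, as for names and string text."""
    return bytes(ord(character) | 0x80 for character in text)
```

Under these assumptions the constant 100 becomes the bytes $B1 $64 $00, and the variable name A5 becomes $C1 $B5, since setting the high bit of the ASCII digit 5 ($35) gives $B5.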
Carrying this even further, Atari BASIC's tokenizer parses the entire line when it is entered or modified. Numeric constants are parsed into their 48-bit internal form and then placed in the line in that format, while strings are left in their original format, but prefixed with a byte describing their length. Variables have storage set aside as they are encountered, instead of at runtime, and their name is replaced with a pointer to their storage location in memory. Shepardson referred to this early-tokenizing concept as a "pre-compiling interpreter"; statements with syntax errors could not actually be stored, and the user was immediately prompted to correct them.[92]
Some interpreters, such as the Sinclair systems, basically had the user do the tokenization by providing special keystrokes to enter reserved words. The most common commands need one keystroke only; for example, pressing only P at the start of a line on a Spectrum produces the full command PRINT. Less frequent commands require more complex key sequences.[93] As every line starts with a keyword, LET is not optional; after a keyword is typed, the system drops back to accepting text character-by-character. One upside to this approach is that the tokenizer cannot confuse strings with keywords. For instance, it allows a variable to be named PRINT and output its value with PRINT PRINT.
Many "pocket computers" similarly use one keystroke (sometimes preceded by various kinds of shift keys) to produce one byte (the keyword token) that represents an entire BASIC keyword, such as EXP, SQR, IF, or PEEK; examples include the Sharp pocket computer character sets and TI-BASIC. The BASIC expansion for the Bally Astrocade uses this as well.
Valid line numbers varied from implementation to implementation, but were typically from 1 to 32767.
Most of the memory used by BASIC interpreters was to store the program listing itself. Numbered statements were stored in sequential order in a sparse array implemented as a linear collection (technically not a list as no line number could occur more than once).
Many Tiny BASIC implementations stored lines as follows:
Binary equivalent of line number (one or two bytes, depending on range of valid line numbers supported)
ASCII source statement (variable length)
Carriage return (one byte, set to 13)
Microsoft BASIC, starting with Altair BASIC, stored lines as follows:[94]
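Pointer to the next line (two bytes)
Binary equivalent of line number (two bytes, unsigned)
Tokenized source statement (variable length)
Null (one byte, set to 0)
LLL BASIC stored lines as follows:
Binary equivalent of line number (two bytes)
Forward pointer to next sequential line (two bytes)
Length of ASCII source statement (one byte)
ASCII source statement (variable length)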
The maximum length of a line varied: 64 characters in Palo Alto Tiny BASIC, including the decimal representation of the line number; 120 characters in Atari BASIC; 128 characters in Integer BASIC;[96] and 255 characters in MS-BASIC (not including the line number).
Interpreters would search the program a line at a time, looking at each line number. If it were lower than the new line number, the later lines would be moved in memory to make room for the space required for the new line. If it were the same line number, and not the exact same length, subsequent lines would need to be moved forward or backward.[97] (Because sequential order was always maintained in memory, these were not linked lists.)
In Tiny BASIC, these searches required checking every byte in a line: the pointer would be incremented again and again until a carriage return was encountered, to find the byte before the next line. In Altair BASIC and LLL BASIC, on the other hand, the pointer would instead be set to the start of the next sequential line; this was much faster, but required two bytes per line. Given that Tiny BASIC programs were presumed to be 4 KB or less in size, this was in keeping with Tiny BASIC's general design philosophy of trading off performance in favor of minimizing memory usage.
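The difference between the two ways of finding the next line can be sketched in Python. Both layouts below are simplified models rather than exact memory images of any interpreter: the helper names are hypothetical, the "pointer" is an offset into a byte string, and the little-endian byte order is an assumption.

```python
import struct

def pack_cr_delimited(lines):
    """Tiny BASIC style: 2-byte line number, ASCII text, carriage return (13)."""
    out = bytearray()
    for number, text in lines:
        out += struct.pack("<H", number) + text.encode("ascii") + b"\r"
    return bytes(out)

def find_cr_delimited(mem, target):
    """Return (offset, bytes examined while skipping); scans each line for its CR."""
    pos = steps = 0
    while pos < len(mem):
        (number,) = struct.unpack_from("<H", mem, pos)
        if number == target:
            return pos, steps
        pos += 2
        while mem[pos] != 13:      # check every byte until the carriage return
            pos += 1
            steps += 1
        pos += 1                   # step over the carriage return itself
    return None, steps

def pack_pointer_linked(lines):
    """Pointer style: 2-byte offset of the next line, 2-byte line number, text."""
    out = bytearray()
    for number, text in lines:
        body = text.encode("ascii")
        next_pos = len(out) + 4 + len(body)
        out += struct.pack("<HH", next_pos, number) + body
    return bytes(out)

def find_pointer_linked(mem, target):
    """Return (offset, pointer hops); jumps straight to each next line."""
    pos = steps = 0
    while pos < len(mem):
        next_pos, number = struct.unpack_from("<HH", mem, pos)
        if number == target:
            return pos, steps
        pos = next_pos
        steps += 1
    return None, steps
```

For a three-line program, reaching the last line costs one step per skipped character in the first layout but only one hop per skipped line in the second, at the price of the extra pointer bytes.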
When the user typed LIST into the command line, the system would loop over the array of lines, using one of these methods, convert the line number back to decimal format, and then print out the rest of the text in the line, decoding any tokens or other encoded representations.
Dartmouth BASIC and HP-BASIC limited variable names to at most two characters (either a single letter or a letter followed by one digit; e.g., A to Z9). MS-BASIC allowed variable names of a letter followed by an optional letter or digit (e.g., A to ZZ) but ignored subsequent characters: thus it was possible to inadvertently write a program with variables "LOSS" and "LOAN", which would be treated as being the same; assigning a value to "LOAN" would silently overwrite the value intended as "LOSS".
Integer BASIC was unusual in supporting any length variable name (e.g., SUM, GAMEPOINTS, PLAYER2), provided it did not contain a reserved word.[98] Keywords could not be used in variables in many early BASICs; "SCORE" would be interpreted as "SC" OR "E", where OR was a keyword.
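Both naming pitfalls are easy to reproduce in a short Python sketch. The function names and the small keyword list are hypothetical and chosen only for illustration; real interpreters had much longer keyword tables.

```python
# Small illustrative subset of reserved words (an assumption for this sketch).
KEYWORDS = ("OR", "TO", "IF", "ON")

def significant_name(name: str) -> str:
    """MS-BASIC style: only the first two characters identify a variable."""
    return name[:2]

def embedded_keywords(name: str) -> list[str]:
    """Reserved words that a keyword-matching tokenizer would find inside a name."""
    return [kw for kw in KEYWORDS if kw in name]
```

Here `significant_name("LOSS")` and `significant_name("LOAN")` both return "LO", so the two names share one variable, and `embedded_keywords("SCORE")` reports the OR that splits the name into SC and E.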
String variables are distinguished in many microcomputer dialects of BASIC by having $ suffixed to their name, and values are often identified as strings by being delimited by "double quotation marks". Later implementations would use other punctuation to specify the type of a variable: A% for integer, A! for single precision, and A# for double precision.
With the exception of arrays and (in some implementations) strings, and unlike Pascal and other more structured programming languages, BASIC does not require a variable to be declared before it is referenced. Values will typically default to 0 (of the appropriate precision) or the null string.
Symbol table
Because Tiny BASIC only used 26 single-letter variables, variables could be stored as an array without storing their corresponding names, using a formula based on the ASCII value of the letter as the index. Palo Alto Tiny BASIC took this a step further: variables' two-byte values were located in RAM within the program, from bytes 130 (ASCII 65, 'A', times two) to 181 (ASCII 90, 'Z', times two, plus one for the second byte).[85]
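The Palo Alto Tiny BASIC address calculation described above amounts to doubling the ASCII code of the letter. A minimal Python sketch, with a hypothetical function name, is:

```python
def patb_variable_address(letter: str) -> tuple[int, int]:
    """Return the offsets of the two bytes holding a single-letter variable."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("Tiny BASIC variables are the single letters A to Z")
    low = ord(letter) * 2      # ASCII value times two
    return low, low + 1        # second byte of the two-byte value follows
```

With this formula 'A' maps to bytes 130 and 131 and 'Z' to bytes 180 and 181, matching the range given in the text.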
Most BASICs provided for the ability to have far more than 26 variables and so needed symbol tables, which would set aside storage capacity for only those variables used.
In LLL BASIC, each entry in the symbol table was stored as follows:[99]
Variable name (byte 1: ASCII letter; byte 2: 0-9 ASCII or binary 0)
Forward pointer (2 bytes)
Value (4 bytes per element, 1 element if a scalar variable, otherwise as many elements as DIMensioned for an array)
Unlike most BASIC interpreters, UIUC BASIC had a hash function, hashing by the letter of the variable/function/array name, then conducting a linear search from there. In UIUC BASIC, a symbol table entry was:[58]
Flag (bit 0: entry in use; bit 6: user-defined function; bit 7: array)
Variable name (byte 1: ASCII letter; byte 2: 0-9 ASCII, " ", or "(") or function name (byte 1: ASCII letter or token 154 for FN; byte 2: ASCII letter)
Value (5 bytes):
Floating-point value for a scalar
Array definition (last 3 bytes: upper dimension of first, second, third dimension, all assumed to start at 0)
User function (first 2 bytes with the address of the function; byte 3 is symbol table displacement to the dummy variable parameter in function definition).
In Atari BASIC, a set of pointers (addresses) indicated various data: variable names were stored in the variable name table (pointed to at VNTP, $82 and $83) and their values were stored in the variable value table (pointed to at VVTP, $86 and $87). By indirecting the variable names in this way, a reference to a variable needed only one byte to address its entry in the appropriate table. String variables had their own area.
One BBC BASIC performance optimization included using multiple linked lists for variable lookup rather than a single long list, as in Microsoft BASIC.
Memory management
Because of the small RAM capacity of most systems originally used to run BASIC interpreters, clever memory management techniques had to be employed. Altair BASIC let users reclaim the space for trigonometry functions if those weren't being used during a session. PATB placed the start of the most common subroutines at the front of the program for use by the 1-byte RST 8080 opcode instead of the 3-byte CALL opcode. In LLL BASIC, some variables occupied the same memory locations, in cases where the different variables were used only in command mode or only at runtime.[100]
Video was often memory addressable, and certain esoteric functions were available by manipulating values at specific memory addresses. For instance, addresses 32 to 35 contained the dimensions of the text window (as opposed to the graphics window) in Applesoft BASIC. The POKE command and the PEEK function (adapted from machine code monitors such as the DECsystem-10 monitor[101]) provided direct memory access, for a variety of purposes,[102] especially for modifying special memory-mapped hardware registers to control particular functions of the computer such as the input/output peripherals. "Memory maps" (in the archaic sense of lists of memory addresses and their functions) were popular for use with PEEK and POKE, with one of the best known memory maps being the book Mapping the Atari, written by Ian Chadwick.
Some implementations of the Microsoft interpreter, for example those running on the TRS-80 Models I/III, required the user to specify the amount of memory to be used by the interpreter. This was to permit a region of memory to be reserved for the installation of machine language subroutines that could be called by the interpreted program, for greater speed of execution. When the Models I/III are powered up, the user is greeted with the prompt "Memory size?" for this purpose.
Mathematics
Integer BASIC, as its name implies, uses integers as the basis for its math package. These were stored internally as a 16-bit number, little-endian (as is the 6502). This allowed a maximum value for any calculation between −32767 and 32767. Calculations that resulted in values outside that range produced an error.[103]
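The range check and storage format described above can be modelled in Python. This is a sketch of the behaviour, not Integer BASIC's actual routine; the function names and the error text are hypothetical.

```python
import struct

LIMIT = 32767  # largest magnitude permitted, per the range given above

def checked(value: int) -> int:
    """Return the value, or raise if it falls outside -32767..32767."""
    if not -LIMIT <= value <= LIMIT:
        raise OverflowError("value outside the 16-bit integer range")
    return value

def store(value: int) -> bytes:
    """Pack a checked value as a 16-bit little-endian signed integer."""
    return struct.pack("<h", checked(value))
```

For instance, `checked(30000 + 2767)` succeeds, while `checked(30000 + 2768)` raises an error, mirroring how an out-of-range result stopped the program.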
Most Tiny BASIC interpreters (as well as Sinclair BASIC 4K) supported mathematics using integers only, lacking floating-point support. Using integers allowed numbers to be stored in a much more compact 16-bit format that could be more rapidly read and processed than the 32- or 40-bit floating-point formats found in most BASICs of the era. However, this limited its applicability as a general-purpose language.
Business BASIC implementations, such as Data General Business Basic, were also integer-only, but typically at a higher precision: "double precision", i.e. 32-bit (plus or minus 2,147,483,648) and "triple precision" (plus or minus 1.4x10^14).
One story encapsulates why floating point was considered so important. The original prototype of the TRS-80 Model I ran Li-Chen Wang's public domain version of Tiny BASIC. This required only 2 KB of memory for the interpreter, leaving an average of another 2 KB free for user programs in common 4 KB memory layouts of early machines. During a demonstration to executives, Tandy Corporation's then-President Charles Tandy tried to enter his salary but was unable to do so. This was because Tiny BASIC used 2-byte signed integers with a maximum value of 32,767. The result was a request for floating-point math for the production version.[105] This led to the replacement of the existing 16-bit integer code with a version using 32-bit single-precision floating-point numbers by Tandy-employee Steve Leininger.[106]
SCELBAL used floating point routines published by Wadsworth in 1975 in Machine Language Programming for the 8008 based on a 32-bit (four byte) format for numeric calculations, with a 23-bit mantissa, 1-bit sign for the mantissa, a 7-bit exponent, and 1-bit sign for the exponent. These were organized in reverse order, with the least significant byte of the mantissa in the first byte, followed by the middle and then most significant byte with the sign in the high bit. The exponent came last, again with the sign in the high bit.[107] The manual provides well-documented assembly code for the entire math package, including entry points and usage notes.[108]
Consultants were typically brought in to handle floating-point arithmetic, a specialist domain well studied and developed for the scientific and commercial applications that had characterized mainframes. When Allen and Gates were developing Altair BASIC, fellow Harvard student Monte Davidoff convinced them to switch from integer arithmetic. They hired Davidoff to write a floating-point package that could still fit within the 4 KB memory limits. Steve Wozniak turned to Roy Rankin of Stanford University for implementing the transcendental functions LOG, LOG10, and EXP;[109] however, Wozniak never finished adding floating-point support to Integer BASIC. LLL BASIC, developed at the University of Idaho by John Dickenson, Jerry Barber, and John Teeter, turned to David Mead, Hal Brand, and Frank Olken for their floating-point support.[110] For UIUC BASIC, a Datapoint 2200 floating-point package was licensed.[111]
In contrast, time-shared systems had often relied on hardware. For instance, the GE-235 was chosen for implementing the first version of Dartmouth BASIC specifically because it featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations.[112][113]
[Figure: the value 0.15625 as stored in a 32-bit floating-point format]
While 32-bit formats were common in this era, later versions of BASIC, starting with Microsoft BASIC for the MOS 6502, generally adopted a 40-bit (five byte) format for added precision.[114]
Operators and functions
Infix operators typically included + (addition), - (subtraction), * (multiplication), / (division), and exponentiation using the ^ character. Relational operators included the standard set of =, >, <, >=, <=, and for "not equal" either <> or the HP-TSB-inspired #.[115] Logical operators, such as AND, OR and NOT, were not in every implementation, and where they were present, some implementations did Boolean algebra and some did not.
Dartmouth BASIC's initial edition included the following functions: ABS (absolute value), ATN (arctangent), COS (cosine), EXP (e raised to the power), INT (truncate any fractional value, returning an integer), LOG (logarithm), RND (pseudorandom number generator), SIN (sine), SQR (square root), and TAN (tangent). It also included the DEF FN statement to declare one-line functions, which would then be referred to as FNA(), FNB(), etc.
The RND function was the most widespread function to be supported in early BASICs, though implementations varied:
Dartmouth's RND ignored the parameter and always returned a new pseudorandom number between 0 and 1.
Altair BASIC and later Microsoft BASICs used the sign of the parameter: For RND(X), "X<0 starts a new sequence of random numbers using X. Calling RND with the same X starts the same random number sequence. X=0 gives the last random number generated."[116]
Being unable to return a decimal, integer-only BASICs instead used the value of the parameter, typically to specify an upper bound for the randomization; for example, in Integer BASIC itself, RND(6)+1 would simulate a die roll, returning values from 1 to 6.
In contrast, in some TRS-80 BASICs, the parameter was the upper bound that could be returned; for instance, RND(6) would return a value from 1 to 6, and RND(1) would always return 1.[117]
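The parameter conventions listed above can be compared side by side in Python. The function names are hypothetical and Python's `random` module stands in for each interpreter's own generator, so only the ranges, not the sequences, reflect the originals.

```python
import random

def rnd_dartmouth(_x, rng: random.Random) -> float:
    """Parameter ignored; returns a pseudorandom number from 0 up to 1."""
    return rng.random()

def rnd_integer_basic(n: int, rng: random.Random) -> int:
    """Returns an integer from 0 to n-1, so RND(6)+1 simulates a die roll."""
    return rng.randrange(n)

def rnd_trs80(n: int, rng: random.Random) -> int:
    """Returns an integer from 1 to n, so RND(1) is always 1."""
    return rng.randint(1, n)
```

A die roll is therefore `rnd_integer_basic(6, rng) + 1` in the Integer BASIC convention but simply `rnd_trs80(6, rng)` in the TRS-80 convention, which is why programs using RND needed adjusting when ported.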
Arrays
The second version of Dartmouth BASIC supported matrices and matrix operations, useful for the solution of sets of simultaneous linear algebraic equations; MAT matrix operations such as assignment, addition, multiplication (of compatible matrix types) and evaluation of a determinant were supported.
In contrast, Tiny BASIC as initially designed did not have any arrays, due to the limited main memory available on early microcomputers, often 4 KB, which had to include both the interpreter and the BASIC program. Palo Alto Tiny BASIC added a single variable-length array of integers, A(), the size of which did not have to be dimensioned; it used whatever RAM was not occupied by the interpreter or the program listing.
SCELBAL supported multiple arrays, but taken together these arrays could have no more than 64 items. Integer BASIC supported arrays of a single dimension, limited in size only by the available memory.[118] Tiny BASIC Extended supported two-dimensional arrays of up to 255 by 255. Altair BASIC 4K supported only arrays (one dimension) while the 8K version supported matrices of up to 34 dimensions.[119]
Many implementations supported the Dartmouth BASIC practice of not requiring an array to be dimensioned, in which case it was assumed to have 11 elements (0 to 10); the first reference to an element of such an array would create the 11-element array as a side effect.
The dope vector of arrays varied from implementation to implementation. For instance, the dope vector of an Altair BASIC 4K array:[94]
Variable name (2 bytes)
Size of the array elements in bytes (2 bytes, so 4 times the number of elements, which was the upper bound plus one)
Then the array values themselves:
Element 0 value (4 bytes)
Element 1 value (4 bytes)
...
Element N value (4 bytes)
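The layout above can be sketched with Python's `struct` module. The function name is hypothetical, the little-endian order of the size field and the padding of one-letter names with a zero byte are assumptions, and the element values are simply zero-filled rather than encoded as real floating-point numbers.

```python
import struct

def array_block(name: str, upper_bound: int) -> bytes:
    """Build a zero-filled array block: 2-byte name, 2-byte size, 4-byte elements."""
    count = upper_bound + 1                 # elements 0..upper_bound
    size = 4 * count                        # size of the element area in bytes
    header = name.encode("ascii").ljust(2, b"\0")[:2] + struct.pack("<H", size)
    return header + bytes(size)
```

An array with an upper bound of 10 therefore occupies 4 header bytes plus 44 bytes for its 11 elements.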
Implementations that supported matrices had to record the number of dimensions and the upper bound of each dimension. As some interpreters had only one data type (either floating point or integer), their dope vectors needed to record nothing more than this, whereas interpreters with multiple data types also had to record the data type of the array.
Even though Microsoft and other BASICs did support matrices, matrix operations were not built in but had to be programmed explicitly on array elements.
The original Dartmouth BASIC, some of its immediate descendants, and Tiny BASIC implementations lacked string handling. Two competing schools of string-handling evolved, pioneered by HP and DEC, although other approaches came later. These required different strategies for implementation.
The simplest string handling copied HP Time-Shared BASIC and defined string variables as arrays of characters that had to be DIMensioned prior to use. Strings in HP TSB are treated as an array of characters, up to 72 in total, rather than a single multi-character object. By default, they are allocated one character in memory, and if a string of longer length is needed, they have to be declared. For instance, DIM A$[10] will set up a string that can hold a maximum of 10 characters.[120]
Substrings within strings are accessed using a "slicing" notation: A$(L,R) or A$[L,R], where the substring begins with the leftmost character specified by the index L and continues to the rightmost character specified by the index R, or the A$[L] form where the substring starts at the leftmost character specified by the index L and continues to the end of the string. TSB accepts () or [] interchangeably. Array and substring indices start with 1.
This is in sharp contrast to BASICs following the DEC pattern that use functions such as LEFT$(), MID$(), and RIGHT$() to access substrings. Later adopted by ANSI BASIC, HP's notation can also be used on the destination side of a LET or INPUT statement to modify part of an existing string value, for example 100 A$[3,5]="XYZ" or 120 B$[3]="CHANGE ALL BUT FIRST TWO CHARS", which cannot be done with early implementations of LEFT$/MID$/RIGHT$.
Later versions of Dartmouth BASIC did include string variables. However, they did not use the LEFT$/MID$/RIGHT$ functions for manipulating strings, but instead used the CHANGE command which converted the string to and from equivalent ASCII values. (Later adopted as is by DEC and adapted by HP, which changed the keyword to CONVERT.[120]) Additionally, one could use the single-quote to convert a numeric constant to an ASCII character, allowing one to build up a string in parts; A$='23 '64 '49 "DEF" produced the string "ABCDEF", without the need for the CHR$() function.[120] Dartmouth BASIC Sixth Edition supported SEG$ (for MID$) and POS (for INSTR).
Some of the Tiny BASIC implementations supported one or more predefined integer arrays, which could be used to store character codes, provided the language had functionality to input and output character codes (e.g., Astro BASIC had KP and TV for this purpose).
Having strings use a fixed amount of memory regardless of the number of characters used within them, up to a maximum of 255 characters, may have wasted memory[124] but had the advantage of avoiding the need for implementing garbage collection of the heap, a form of automatic memory management used to reclaim memory occupied by strings that are no longer in use. Short strings that were released might be stored in the middle of other strings, preventing that memory from being used when a longer string was needed.
On early microcomputers, with their limited memory and slow processors, BASIC garbage collection could often cause apparently random, inexplicable pauses in the midst of program operation. Some BASIC interpreters, such as Applesoft BASIC on the Apple II family, repeatedly scanned the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in O(n²) performance, which could introduce minutes-long pauses in the execution of string-intensive programs. Garbage collection was notoriously slow or even broken in other versions of Microsoft BASIC.[125] Some operating systems that supported interrupt-driven background tasks, such as TRSDOS/LS-DOS 6.x on the TRS-80 Model 4, exploited periods of user inactivity (such as the milliseconds-long periods between keystrokes and periods following video screen refresh) to process garbage collection during BASIC program runs.
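The quadratic cost of this style of collector can be seen in a simplified Python model. It is not the Applesoft routine: the function name, the descriptor format (a list of `[address, length]` pairs) and the scan counter are assumptions made for illustration.

```python
def compact(heap: bytearray, descriptors: list[list[int]]) -> tuple[int, int]:
    """Move live strings toward the top of the heap, highest address first.

    Returns (new start of string space, number of descriptor comparisons).
    """
    top = len(heap)
    scans = 0
    remaining = set(range(len(descriptors)))
    while remaining:
        best = None
        for i in remaining:                 # rescan the descriptors on every pass
            scans += 1
            if best is None or descriptors[i][0] > descriptors[best][0]:
                best = i
        addr, length = descriptors[best]
        top -= length
        heap[top:top + length] = heap[addr:addr + length]   # copy the string upward
        descriptors[best][0] = top                          # repoint its descriptor
        remaining.remove(best)
    return top, scans
```

Each pass moves one string but examines every descriptor still outstanding, so n strings cost on the order of n² comparisons, which is the source of the long pauses described above.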
Other functionality
Graphics and sound
Most BASIC interpreters differed widely in graphics and sound, which varied dramatically from microcomputer to microcomputer. Altair BASIC lacked any graphics or sound commands, as did the Tiny BASIC implementations, while Integer BASIC provided a rich set.
Level I BASIC for the TRS-80 had as minimal a set as possible: CLS, for CLear Screen; SET(X,Y), which lit a location on the display; RESET(X,Y), which turned it off; and POINT(X,Y), which returned 1 if a location was lit, 0 if it was not. The coordinates could be any expression and ranged from 0 to 127 for the X-axis and 0 to 47 for the Y-axis. Only black-and-white display was supported.[126]
In contrast, Integer BASIC supported color graphics, simple sound, and game controllers. Graphics mode was turned on with the GR statement and off with TEXT.[127] Drawing was modal and normally started by issuing a command to change the color, which was accomplished by setting a pseudo-variable; COLOR=12 would set the drawing color to 12, light green. One could then PLOT 10,10 to produce a single spot of that color,[128] HLIN 0,39 AT 20 to draw a horizontal line at row 20 that spanned the screen, or VLIN 5,15 AT 7 to draw a shorter vertical line down column 7.[129] A=SCRN(X,Y) returned the color of the screen at X,Y.[130][b]
Hardware manufacturers often included proprietary support for semigraphics, simple shapes and icons treated as special characters. Examples included the block graphics of the ZX-81, and the card symbols of ♠, ♣, ♥ and ♦ in the Commodore International PETSCII character set. BASIC could generate these symbols using PRINT CHR$();.
Microsoft added many graphics commands to IBM BASIC: LINE, PSET (Pixel SET), PRESET (Pixel RESET), GET (stores a rectangle of the screen to an array), PUT (displays a stored rectangular segment), LOCATE (to move the text cursor), and DRAW, which sketches shapes using a LOGO-like syntax. Bill Gates and Neil Konzen wrote DONKEY.BAS, a bundled game, to demonstrate the interpreter's color graphics and sound.[131]
Input/output
Another area where implementations diverged was in keywords for dealing with media (cassettes and floppy disks), keyboard input, and game controllers (if any).
Since ROM-based BASIC interpreters often functioned as shells for loading in other applications, implementations added commands related to cassette tapes (e.g., CLOAD and CSAVE), binary disk files (e.g., BLOAD, BSAVE, and BRUN), and BASIC programs on disk (e.g., LOAD, SAVE, and CATALOG). Business BASIC implementations added commands for random-access files. (Even ROM-based BASIC interpreters were not designed or intended to be used as operating systems, and smaller microcomputers simply lacked any OS at all.[132])
Dartmouth BASIC lacked a command for getting input from the keyboard without pausing the program. To support videogames, BASICs added proprietary commands for doing so: INKEY$ was a function in Microsoft BASIC that would return an empty string if no key was pressed or otherwise a single character; KP (for KeyPress) returned the ASCII value of the input in Astro BASIC.
Palo Alto Tiny BASIC lacked strings but would allow users to enter mathematical expressions as the answer to INPUT statements; by setting variables, such as Y=1; N=0, the user could answer "Y" or "1" or even "3*2-5" at a yes/no prompt.
Some systems supported game controllers. Astro BASIC supported JX() (specified joystick's horizontal position), JY() (joystick vertical position), KN() (knob status), and TR() (trigger status). Integer BASIC supported a game controller, a paddle controller, which had two controllers on a single connector. The position of the controller could be read using the PDL function, passing in the controller number, 0 or 1, like A=PDL(0):PRINT A, returning a value between 0 and 255.[133][c]
Integer BASIC lacked any custom input/output commands, and also lacked the DATA statement and the associated READ. To get data into and out of a program, the input/output functionality was redirected to a selected card slot with the PR#x and IN#x, which redirected output or input (respectively) to the numbered slot. From then on, data could be sent to the card using conventional PRINT commands and read from it using INPUT.[130] Producing sounds was accomplished by PEEKing the memory-mapped location of a simple "beeper", −16336.[d]
Structured programming
While structured programming, through the examples of ALGOL 58 and ALGOL 60, was known to Kemeny and Kurtz when they designed BASIC, they adapted only the for-loop, ignoring the else-statement, while-loop, repeat loop, named procedures, parameter passing, and local variables. As a result, subsequent dialects often differed dramatically in the wording used for structured techniques. For instance, loops were written as WHILE...WEND (in Microsoft BASIC), WHILE...ENDWHILE (in Turbo-Basic XL), and DO...LOOP WHILE or even WHILE clauses (both in BASIC-PLUS).
Of the Tiny BASIC implementations, only National Industrial Basic Language (NIBL) offered a loop command of any sort, DO/UNTIL.[135] This was despite the inventor of Tiny BASIC, Dennis Allison, publicly lamenting the state of BASIC.[136]
BBC BASIC was one of the first microcomputer interpreters to offer structured BASIC programming, with named DEF PROC/DEF FN procedures and functions, REPEAT UNTIL loops, and IF THEN ELSE structures inspired by COMAL. Second-generation BASICs, such as SBASIC (1976), BBC BASIC (1981), True BASIC (1983), Beta BASIC (1983), QuickBASIC (1985), and AmigaBASIC (1986), introduced a number of features into the language, primarily related to structured and procedure-oriented programming. Usually, line numbering is omitted from the language and replaced with labels (for GOTO) and procedures to encourage easier and more flexible design.[137] In addition, keywords and structures to support repetition, selection and procedures with local variables were introduced.
The following example is in Microsoft QBASIC, Microsoft's third implementation of a structured BASIC (following Macintosh BASIC in 1984 and Amiga BASIC in 1985).[138]
REM QBASIC example
REM Forward declaration - allows the main code to call a
REM subroutine that is defined later in the source code
DECLARE SUB PrintSomeStars (StarCount!)
REM Main program follows
DO
   INPUT "How many stars do you want? (0 to quit) ", NumStars
   CALL PrintSomeStars(NumStars)
LOOP WHILE NumStars > 0
END
REM subroutine definition
SUB PrintSomeStars (StarCount)
   REM This procedure uses a local variable called Stars$
   Stars$ = STRING$(StarCount, "*")
   PRINT Stars$
END SUB
One of the unique features of BBC BASIC was the inline assembler, allowing users to write assembly language programs for the 6502 and, later, the Zilog Z80, NS32016 and ARM. The assembler was fully integrated into the BASIC interpreter and shared variables with it; assembly code could be included between the [ and ] characters, saved via *SAVE and *LOAD, and called via the CALL or USR commands. This allowed developers to write not just assembly language code, but also BASIC code to emit assembly language, making it possible to use code-generation techniques and even write simple compilers in BASIC.
Execution
Debugging
As in most BASICs, programs were started with the RUN command, and as was common, could be directed at a particular line number like RUN 300.[141] Execution could be stopped at any time using Ctrl+C[142] (or BREAK such as on the TRS-80) and then restarted with CONTinue (CON in Integer BASIC).[143] Taking advantage of the unique capabilities of interpreted programs (code is processed in realtime one statement at a time, in contrast to compilers), the user at the console could examine variable data using the PRINT statement, and change such data on-the-fly, then resume program execution.
For step-by-step execution, the TRON or TRACE instruction could be used at the command prompt or placed within the program itself. When it was turned on, line numbers were printed out for each line the program visited. The feature could be turned off again with TROFF or NOTRACE.[144]
Some implementations such as the Microsoft interpreters for the various TRS-80 models included the command ON ERROR GOSUB. This would redirect program execution to a specified line number for special error handling.
Unlike most BASICs, Atari BASIC scanned the just-entered program line and reported syntax errors immediately. If an error was found, the editor re-displayed the line, highlighting the text near the error in inverse video.
In many interpreters, including Atari BASIC, errors are displayed as numeric codes, with the descriptions printed in the manual.[145] Many versions of MS-BASIC used two-character abbreviations (e.g., SN for SYNTAX ERROR). Palo Alto Tiny BASIC and Level I BASIC used three words for error messages: "WHAT?" for syntax errors, "HOW?" for run-time errors like GOTOs to a line that didn't exist or numeric overflows, and "SORRY" for out-of-memory problems.
Parsing
While the BASIC language has a simple syntax, mathematical expressions do not, supporting different precedence rules for parentheses and different mathematical operators. To support such expressions requires implementing a recursive descent parser.[146]
This parser can be implemented in a number of ways:
As a virtual machine, as discussed above for many Tiny BASIC implementations. The value of the Tiny BASIC initiative was in specifying an implementation of a parser.
Directly in code, as in Palo Alto Tiny BASIC and Integer BASIC. In Integer BASIC, the runtime interpreter used two stacks for execution: one for statement keywords and the other for evaluating the parameters. Each statement was given two priorities: one that stated where it should occur in a multi-step operation, like a string of mathematical operations to provide order of operations, and another that suggested when evaluation should occur, for instance, calculating internal values of a parenthesis formula. When variables were encountered, their name was parsed and then looked up in the symbol table. If it was not found, it was added to the end of the list. The address of the variable's storage, perhaps freshly created, was then placed on the evaluation stack.[90]
Performance
The range of design decisions that went into programming a BASIC interpreter was often revealed through performance differences.
Line-management implementations often affected performance and typically used linear search. Delimiting each line with a CR meant that a GOTO or GOSUB to a later line would take longer, as the program would need to iterate over all the lines to find the target line number. In some implementations, such as Atari BASIC, the length of each line was recorded and stored after the line number, so that the program did not have to scan each character of the line to find the next carriage return. Many implementations would always search for a line number to branch to from the start of the program; MS-BASIC would search from the current line, if the destination line number was greater. Pittman added a patch to his 6800 Tiny BASIC to use a binary search.[148]
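The three branch-lookup strategies can be compared with a Python sketch that operates on a sorted list of line numbers. The function names are hypothetical, the list stands in for the in-memory program, and the comparison counts are only a rough proxy for the work a real interpreter would do.

```python
import bisect

def linear_search(numbers, target, start=0):
    """Return (index, comparisons), scanning forward from the given index."""
    comparisons = 0
    for index in range(start, len(numbers)):
        comparisons += 1
        if numbers[index] == target:
            return index, comparisons
        if numbers[index] > target:         # passed the spot: the line does not exist
            break
    return None, comparisons

def goto_from_start(numbers, target):
    """Always search from the first line of the program."""
    return linear_search(numbers, target)

def goto_from_current(numbers, current, target):
    """Search from the current line when the destination lies ahead of it."""
    start = current if target > numbers[current] else 0
    return linear_search(numbers, target, start)

def goto_binary(numbers, target):
    """Binary search over the sorted line numbers."""
    index = bisect.bisect_left(numbers, target)
    return index if index < len(numbers) and numbers[index] == target else None

numbers = list(range(10, 1010, 10))         # a 100-line program numbered 10, 20, ...
```

Jumping from line 800 to line 900 in this example takes 90 comparisons when searching from the start but only 11 when searching from the current line; a backward jump gains nothing from the second strategy.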
Working solely with integer math provides another major boost in speed. As many computer benchmarks of the era were small and often performed simple math that did not require floating-point, Integer BASIC trounced most other BASICs.[e] On one of the earliest known microcomputer benchmarks, the Rugg/Feldman benchmarks, Integer BASIC was well over twice as fast as Applesoft BASIC on the same machine.[150] In the Byte Sieve, where math was less important but array access and looping performance dominated, Integer BASIC took 166 seconds while Applesoft took 200.[151] Integer BASIC did not appear in the Creative Computing Benchmark, which was first published in 1983, by which time Integer BASIC was no longer supplied by default.[152] The following test series, taken from both of the original Rugg/Feldman articles,[150][149] shows Integer's performance relative to the MS-derived BASIC on the same platform.
In theory, Atari BASIC should have run faster than contemporary BASICs based on the Microsoft pattern. Because the source code is fully tokenized when it is entered, the entire tokenization and parsing steps are already complete. Even complex mathematical operations are ready-to-run, with any numerical constants already converted to their internal 48-bit format, and variable values are looked up by address rather than having to be searched for. In spite of these theoretical advantages, in practice Atari BASIC is slower than other home computer BASICs, often by a large amount.[153] On two widely used benchmarks of the era, Byte magazine's Sieve of Eratosthenes and the Creative Computing benchmark test written by David H. Ahl, the Atari finished near the end of the list in terms of performance, and was much slower than the contemporary Apple II or Commodore PET,[154] despite having the same CPU and running it at roughly twice the speed of either. It finished behind relatively slow machines like the Sinclair ZX81 and even some programmable calculators.[155]
Most of the language's slowness stemmed from three problems.[153] The first is that the floating-point math routines were poorly optimized. In the Ahl benchmark, a single exponent operation, which internally loops over the slow multiplication function, was responsible for much of the machine's poor showing.[153] Second, the conversion between the internal floating-point format and the 16-bit integers used in certain parts of the language was relatively slow. Internally, these integers were used for line numbers and array indexing, along with a few other tasks, but numbers in the tokenized program were always stored in binary-coded decimal (BCD) format.[156] Whenever one of these is encountered, for instance, the line number in GOTO 100, the tokenized BCD value has to be converted to an integer, an operation that can take as long as 3500 microseconds.[157] Other BASICs avoided this delay by special-casing the conversion of numbers that could only possibly be integers, like the line number following a GOTO, switching to special ASCII-to-integer code to improve performance. The third problem was how Atari BASIC implemented branches and FOR loops. To perform a branch in a GOTO or GOSUB, the interpreter searches through the entire program for the matching line number it needs.[158] One minor improvement found in most Microsoft-derived BASICs is to compare the target line number to the current line number, and search forward from that point if it is greater, or start from the top if less. This improvement was missing in Atari BASIC.[153] Almost all other BASICs pushed a pointer to the location of the FOR on a stack, so that on reaching the NEXT they could return to the FOR in a single branch operation; Atari BASIC pushed the line number instead. This meant that every time a NEXT was encountered, the system had to search through the entire program to find the corresponding FOR line. As a result, any loops in an Atari BASIC program cause a large loss of performance relative to other BASICs.[153]
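The cost of saving a line number rather than a pointer can be illustrated with a toy Python model. It is a simplified sketch, not Atari BASIC's actual stack layout: the function next_cost and its parameters are invented for the example, a list index stands in for a memory pointer, and the only cost counted is the number of line-number comparisons needed to get from each NEXT back to its FOR.

```python
def next_cost(line_numbers, for_index, iterations, save_pointer):
    """Count line-number comparisons spent returning from NEXT to FOR.

    line_numbers -- sorted line numbers of a hypothetical program
    for_index    -- position of the line holding the FOR statement
    save_pointer -- True:  FOR saves a direct position (pointer-style)
                    False: FOR saves only its line number
    """
    comparisons = 0
    saved = for_index if save_pointer else line_numbers[for_index]
    for _ in range(iterations):
        if save_pointer:
            position = saved              # single branch, no search
        else:
            position = 0                  # search from the top of the program
            while True:
                comparisons += 1
                if line_numbers[position] == saved:
                    break
                position += 1
        assert position == for_index      # both strategies reach the FOR
    return comparisons

# Hypothetical program: lines 10..1000, with the FOR on line 500.
numbers = list(range(10, 1010, 10))

if __name__ == "__main__":
    print(next_cost(numbers, 49, 100, save_pointer=True))
    print(next_cost(numbers, 49, 100, save_pointer=False))
```

In this model the pointer-style loop spends no comparisons at all. The line-number version pays one comparison per line preceding the FOR on every iteration, so the penalty grows with both the loop count and how far the loop sits from the start of the program.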
^Microsoft BASIC left 780 bytes free for user program code and variable values on a 4 KB machine, and that was running a cut-down version lacking string variables and other functionality.
^Note the odd syntax of the SCRN, which is technically a function because it returns a value, but does not use function-like syntax which would be A=SCRN(X,Y).
^The manual suggests, but does not outright state, that the actual range of values is less than 0 to 255.[133]
^The negative number is a side-effect of the integers being stored in signed format, so any memory location over 32767 appeared as a negative value in BASIC.[134]
^Bill Gates complained about this, stating that it was unfair to compare Integer BASIC to a "real" BASIC like MS.[149]
^A BASIC Language Interpreter for the Intel 8008 Microprocessor. Department of Computer Science, University of Illinois at Urbana-Champaign (published 1974). June 1974.
^Gates, Bill. "Bill Gates Interview". National Museum of American History, Smithsonian Institution (Interview). Interviewed by David Allison. Retrieved April 10, 2013.
^Manes, Stephen; Andrews, Paul (21 January 1994). Gates: How Microsoft's Mogul Reinvented an Industry and Made Himself the Richest Man in America. Touchstone. ISBN 0671880748.
^Allison, Dennis (July 1976). "Design notes for TINY BASIC". SIGPLAN Notices. 11 (7). ACM: 25–33. doi:10.1145/987491.987494. S2CID 18819472. The ACM Special Interest Group on Programming Languages (SIGPLAN) reprinted the Tiny Basic design notes from the January 1976 Tiny BASIC Journal.
^ a b A BASIC Language Interpreter for the Intel 8008 Microprocessor. Department of Computer Science, University of Illinois at Urbana-Champaign (published 1974). June 1974. pp. 16–19.
^Weyhrich 2001, The [Integer] BASIC, which we shipped with the first Apple II's, was never assembled — ever. There was one handwritten copy, all handwritten, all hand-assembled..
^Altair 8800 BASIC Reference_Manual 1975, Page 68 of PDF, "Using the PEEK function and OUT statement of 8K BASIC, the user can write a binary dump program in BASIC. Using INP and POKE it is possible to write a binary loader. PEEK and POKE can be used to store byte oriented information. When you initialize BASIC, answer the MEMORY SIZE? question with the amount of memory in your ALTAIR minus the amount of memory you wish to use as storage for byte formatted data."
^A BASIC Language Interpreter for the Intel 8008 Microprocessor. Department of Computer Science, University of Illinois at Urbana-Champaign (published 1974). June 1974. p. 20.
^A BASIC Language Interpreter for the Intel 8008 Microprocessor. Department of Computer Science, University of Illinois at Urbana-Champaign (published 1974). June 1974. pp. 24–36.
^Pittman, Tom (1981). "The First Book of Tiny BASIC Programs". Retrotechnology.com. Itty Bitty Computers. Retrieved August 5, 2020. Because TA is so large (19,703 bytes), I found that execution became excruciatingly slow, simply due to the memory scan for GOTOs, GOSUBs, and RETURNs. A simple patch to the interpreter converts it to a binary search algorithm, for about an order of magnitude speedup in execution time. The necessary changes are listed in the Appendix.