Compiler Design and Construction
Introduction
Programming languages are notations for describing computations to people and to machines. The world as we know it depends on programming languages, because all the software running on all the computers was written in some programming language. But, before a program can be run, it first must be translated into a form in which it can be executed by a computer. The software systems that do this translation are called compilers. This book is about how to design and implement compilers. We shall discover that a few basic ideas can be used to construct translators for a wide variety of languages and machines. Besides compilers, the principles and techniques for compiler design are applicable to so many other domains that they are likely to be reused many times in the career of a computer scientist. The study of compiler writing touches upon programming languages, machine architecture, language theory, algorithms, and software engineering.
In this preliminary chapter, we introduce the different forms of language translators, give a high level overview of the structure of a typical compiler, and discuss the trends in programming languages and machine architecture that are shaping compilers. We include some observations on the relationship between compiler design and computer-science theory and an outline of the applications of compiler technology that go beyond compilation. We end with a brief outline of key programming-language concepts that will be needed for our study of compilers.
Names, Identifiers, and Variables
Although the terms "name" and "variable," often refer to the same thing, we use them carefully to distinguish between compile-time names and the run-time locations denoted by names.
An identifier is a string of characters, typically letters or digits, that refers to (identifies) an entity, such as a data object, a procedure, a class, or a type. All identifiers are names, but not all names are identifiers. Names can also be expressions. For example, the name x.y might denote the field y of a structure denoted by x. Here, x and y are identifiers, while x.y is a name, but not an identifier. Composite names like x.y are called qualified names.
A variable refers to a particular location of the store. It is common for the same identifier to be declared more than once; each such declaration introduces a new variable. Even if each identifier is declared just once, an identifier local to a recursive procedure will refer to different locations of the store at different times.
binds the name ARRAYSIZE to the value 1000 statically. We can determine this binding by looking at the statement, and we know that it is impossible for this binding to change when the program executes.
1.6.3 Static Scope and Block Structure
Most languages, including C and its family, use static scope. The scope rules for C are based on program structure; the scope of a declaration is determined implicitly by where the declaration appears in the program. Later languages, such as C++, Java, and C#, also provide explicit control over scopes through the use of keywords like public, private, and protected.
In this section we consider static-scope rules for a language with blocks, where a block is a grouping of declarations and statements. C uses braces I and ) to delimit a block; the alternative use of begin and end for the same purpose dates back to Algol.
Example 1.5 : To a first approximation, the C static-scope policy is as follows:
1. A C program consists of a sequence of top-level declarations of variables and functions.
2. Functions may have variable declarations within them, where variables include local variables and parameters. The scope of each such declaration is restricted to the function in which it appears.
1.6. PROGRAMMING LANGUAGE BASICS 29
Procedures, Functions, and Methods
To avoid saying "procedures, functions, or methods," each time we want to talk about a subprogram that may be called, we shall usually refer to all of them as "procedures." The exception is that when talking explicitly of programs in languages like C that have only functions, we shall refer to them as "functions." Or, if we are discussing a language like Java that has only methods, we shall use that term instead.
A function generally returns a value of some type (the "return type"), while a procedure does not return any value. C and similar languages, which have only functions, treat procedures as functions that have a special return type "void," to signify no return value. Object-oriented languages like Java and C++ use the term "methods." These can behave like either functions or procedures, but are associated with a particular class. 3. The scope of a top-level declaration of a name x consists of the entire program that follows, with the exception of those statements that lie within a function that also has a declaration of x.
The additional detail regarding the C static-scope policy deals with variable declarations within statements. We examine such declarations next and in
Example 1.6.
In C, the syntax of blocks is given by
1. One type of statement is a block. Blocks can appear anywhere that other types of statements, such as assignment statements, can appear.
2. A block is a sequence of declarations followed by a sequence of statements, all surrounded by braces.
Note that this syntax allows blocks to be nested inside each other. This nesting property is referred to as block structure. The C family of languages has block structure, except that a function may not be defined inside another function.
We say that a declaration D "belongs" to a block B if B is the most closely nested block containing D; that is, D is located within B, but not within any block that is nested within B.
The static-scope rule for variable declarations in a block-structured languages is as follows. If declaration D of name x belongs to block B, then the scope of D is all of B, except for any blocks B' nested to any depth within B, in which x is redeclared. Here, x is redeclared in B' if some other declaration D' of the same name x belongs to B'.
An equivalent way to express this rule is to focus on a use of a name x. Let B1, B2, . . . , Bk be all the blocks that surround this use of x, with Bk the smallest, nested within Bk-1, which is nested within Bk-2, and so on. Search for the largest i such that there is a declaration of x belonging to Bi. This use of x refers to the declaration in Bi. Alternatively, this use of x is within the scope of the declaration in Bi.