Compiler Design and Construction
Programming languages are notations for describing computations to people
and to machines. The world as we know it depends on programming languages,
because all the software running on all the computers was written in some
programming language. But, before a program can be run, it first must be
translated into a form in which it can be executed by a computer.
The software systems that do this translation are called compilers.
This book is about how to design and implement compilers. We shall discover that a few basic ideas can be used to construct translators for a wide
variety of languages and machines. Besides compilers, the principles and techniques for compiler design are applicable to so many other domains that they
are likely to be reused many times in the career of a computer scientist. The
study of compiler writing touches upon programming languages, machine architecture, language theory, algorithms, and software engineering.
In this preliminary chapter, we introduce the different forms of language
translators, give a high level overview of the structure of a typical compiler,
and discuss the trends in programming languages and machine architecture
that are shaping compilers. We include some observations on the relationship
between compiler design and computer-science theory and an outline of the
applications of compiler technology that go beyond compilation. We end with
a brief outline of key programming-language concepts that will be needed for
our study of compilers.
Names, Identifiers, and Variables
Although the terms "name" and "variable," often refer to the same thing,
we use them carefully to distinguish between compile-time names and the
run-time locations denoted by names.
An identifier is a string of characters, typically letters or digits, that
refers to (identifies) an entity, such as a data object, a procedure, a class,
or a type. All identifiers are names, but not all names are identifiers.
Names can also be expressions. For example, the name x.y might denote
the field y of a structure denoted by x. Here, x and y are identifiers, while
x.y is a name, but not an identifier. Composite names like x.y are called
A variable refers to a particular location of the store. It is common for
the same identifier to be declared more than once; each such declaration
introduces a new variable. Even if each identifier is declared just once, an
identifier local to a recursive procedure will refer to different locations of
the store at different times.
binds the name ARRAYSIZE to the value 1000 statically. We can determine
this binding by looking at the statement, and we know that it is impossible
for this binding to change when the program executes.
1.6.3 Static Scope and Block Structure
Most languages, including C and its family, use static scope. The scope rules
for C are based on program structure; the scope of a declaration is determined
implicitly by where the declaration appears in the program. Later languages,
such as C++, Java, and C#, also provide explicit control over scopes through
the use of keywords like public, private, and protected.
In this section we consider static-scope rules for a language with blocks,
where a block is a grouping of declarations and statements. C uses braces I and
) to delimit a block; the alternative use of begin and end for the same purpose
dates back to Algol.
Example 1.5 : To a first approximation, the C static-scope policy is as follows:
1. A C program consists of a sequence of top-level declarations of variables
2. Functions may have variable declarations within them, where variables
include local variables and parameters. The scope of each such declaration
is restricted to the function in which it appears.
1.6. PROGRAMMING LANGUAGE BASICS 29
Procedures, Functions, and Methods
To avoid saying "procedures, functions, or methods," each time we want
to talk about a subprogram that may be called, we shall usually refer to
all of them as "procedures." The exception is that when talking explicitly
of programs in languages like C that have only functions, we shall refer
to them as "functions." Or, if we are discussing a language like Java that
has only methods, we shall use that term instead.
A function generally returns a value of some type (the "return type"),
while a procedure does not return any value. C and similar languages,
which have only functions, treat procedures as functions that have a special
return type "void," to signify no return value. Object-oriented languages
like Java and C++ use the term "methods." These can behave like either
functions or procedures, but are associated with a particular class.
3. The scope of a top-level declaration of a name x consists of the entire
program that follows, with the exception of those statements that lie
within a function that also has a declaration of x.
The additional detail regarding the C static-scope policy deals with variable
declarations within statements. We examine such declarations next and in
In C, the syntax of blocks is given by
1. One type of statement is a block. Blocks can appear anywhere that other
types of statements, such as assignment statements, can appear.
2. A block is a sequence of declarations followed by a sequence of statements,
all surrounded by braces.
Note that this syntax allows blocks to be nested inside each other. This
nesting property is referred to as block structure. The C family of languages
has block structure, except that a function may not be defined inside another
We say that a declaration D "belongs" to a block B if B is the most closely
nested block containing D; that is, D is located within B, but not within any
block that is nested within B.
The static-scope rule for variable declarations in a block-structured languages is as follows. If declaration D of name x belongs to block B, then the
scope of D is all of B, except for any blocks B' nested to any depth within B,
in which x is redeclared. Here, x is redeclared in B' if some other declaration
D' of the same name x belongs to B'.
An equivalent way to express this rule is to focus on a use of a name x.
Let B1, B2, . . . , Bk be all the blocks that surround this use of x, with Bk the
smallest, nested within Bk-1, which is nested within Bk-2, and so on. Search
for the largest i such that there is a declaration of x belonging to Bi. This use
of x refers to the declaration in Bi. Alternatively, this use of x is within the
scope of the declaration in Bi.
Download the pdf file to get all the unit wise notes.
Try to save your valueable time!