Lesson 2.1 Programming
What is Programming?
Before we dive into the Java programming language, let's take a look at where Java came from. In this lesson, you'll learn a little about the history of computers, and computer programming.
Modern computer systems [in general] consist of three pieces:
- A processing unit [CPU] that performs simple arithmetic and comparisons.
- Memory that can store information used by the processing unit.
- Input/Output devices that allow you to store information in memory and see the results of your calculations.
The instructions that tell the processing unit which calculations to perform, and which information to use in those calculations, are called a program. Programming is the art [or science, or even black magic, if you prefer] of creating programs.
The First Generations
Modern digital computers were developed in the late 1940s. These first computers had a processing unit, memory, and input/output devices, just like the computers we use today, but, surprisingly, they didn't have programs. Instead, these machines were "wired" to perform a specific function, just like the inexpensive 4-function calculator you can purchase at K-Mart. These computers' memory didn't hold programming instructions; it was used only to hold data.
The first breakthrough in programming, [as we know it], came when John von Neumann realized that memory could be used to store computer instructions along with data, the information processed by the computer.
Machine Language
The key to understanding these early stored-program computers--as well as the computers we still use today--is to realize that every CPU understands only one language. This language is called machine language.
Here is an Intel Pentium machine language program. The rows of numbers on the left are the locations (addresses) in memory where each instruction is stored. The columns of numbers and letters on the right are the actual machine language instructions. The instructions stored in memory are actually binary numbers, but here they are displayed in hexadecimal [base 16] notation to make them easier to read.

As you can see, machine language is not very much like human language.
- First, all machine language is a numeric language, because the memory inside your computer can only store numeric data. Even when you work with text [such as viewing this web page], the computer is working with binary numbers. Because of this, writing machine language programs is very slow, tedious, and error-prone.
- Second, every different CPU family uses a different machine language. The machine language for the Intel Pentium is entirely different from the machine language used on the Macintosh or the Sun SPARC.
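To see the "everything is a number" idea for yourself, here is a minimal Java sketch. The class and method names are made up for this illustration; it simply prints the numeric code stored for each character of a word, in decimal, binary, and hexadecimal.

```java
// Illustrative sketch only: TextAsNumbers, toBinary, and toHex are names
// invented for this example.
class TextAsNumbers {
    // Returns the numeric code of a character written in binary (base 2).
    static String toBinary(char c) {
        return Integer.toBinaryString(c);
    }

    // Returns the numeric code of a character written in hexadecimal (base 16).
    static String toHex(char c) {
        return Integer.toHexString(c).toUpperCase();
    }

    public static void main(String[] args) {
        String text = "Java";
        for (char c : text.toCharArray()) {
            // (int) c is the number the computer stores for this character.
            System.out.println(c + " = " + (int) c
                    + " = binary " + toBinary(c)
                    + " = hex " + toHex(c));
        }
    }
}
```

The first line printed is `J = 74 = binary 1001010 = hex 4A`. The hexadecimal form is shorter and easier to read than the binary form, which is why machine language listings are usually shown that way.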
The Second Generation
Despite the difficulties of using machine language, programming expanded exponentially in the late 1940s and early 1950s. Machine-language programming might be difficult, but it is easier, and certainly less expensive, than building new "hard-wired" computing machines.
These early "lean-and-mean" machine language programs quickly became too large to easily understand. If something went wrong, machine language programmers had to create a printout that showed the values in each memory cell when the error occurred [this is called a "core dump"]. Core dump in hand, the programmer then had to translate these values stored in memory into the basic instructions that the computer could perform: adding two numbers, perhaps, or storing a value at a particular location. Only after the raw machine values were translated into their corresponding computer instructions was the programmer ready to unravel the problem.
Assembly Language
Assembly language was the first big step up from machine language, but it wasn't really all that big of a step. Assembly language is simply a mnemonic replacement for machine language. Instead of entering the numbers 54 24 66 9C FE C2 84 92 into memory, the assembly language programmer could write LDX 24, [669C].
Here's a portion of the same machine language program shown earlier, this time in assembly language, as well as machine language. As you can see, the location where each instruction is stored is still displayed on the left. The machine language instructions are now displayed one instruction per line. [Note that some instructions are very short--a single byte--while others can take several bytes].
The third column from the left displays the assembly language mnemonic instructions that the assembly language programmer uses in place of the machine language instructions. To subtract 1 from the CPU register named CX, for instance, the programmer can write DEC CX, instead of the actual numeric code [49] understood by the CPU.
Here are some things you should remember about assembly language.
- Even though programmers were more productive, they still had to write one line of assembly language code for every machine instruction.
- The computer didn't understand assembly language at all, only machine language. After writing an assembly language program, the programmer had to convert it into machine code before the computer could run it. This was done using a program called an assembler.
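The job of an assembler can be sketched in a few lines of Java. This is a toy under stated assumptions, not a real assembler: the class `ToyAssembler` is hypothetical, it knows only three one-byte instructions, and it ignores operands, addresses, and multi-byte instructions. The code for DEC CX [49] comes from the text above; the other two table entries are added just to give the table something to look up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy assembler (hypothetical): looks up each mnemonic and emits its numeric code.
class ToyAssembler {
    private static final Map<String, Integer> OPCODES = new HashMap<>();
    static {
        OPCODES.put("DEC CX", 0x49); // the example from the lesson text
        OPCODES.put("INC CX", 0x41); // added for illustration
        OPCODES.put("NOP", 0x90);    // added for illustration
    }

    // One line of assembly in, one machine instruction out.
    static List<Integer> assemble(List<String> sourceLines) {
        List<Integer> machineCode = new ArrayList<>();
        for (String line : sourceLines) {
            Integer opcode = OPCODES.get(line.trim().toUpperCase());
            if (opcode == null) {
                throw new IllegalArgumentException("Unknown instruction: " + line);
            }
            machineCode.add(opcode);
        }
        return machineCode;
    }

    public static void main(String[] args) {
        List<Integer> code = assemble(List.of("INC CX", "DEC CX", "NOP"));
        for (int b : code) {
            System.out.printf("%02X%n", b); // print each code in hexadecimal
        }
    }
}
```

Notice that the translation is a simple one-for-one lookup, which is why assembly language made programs easier to read without making them any shorter.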
Interpreters
Computers completely lack intuition and common sense. They follow their programmed instructions in a literal--in fact, mechanical--way. You can't just tell your computer to "print the budget report"; you have to explain every single step.
In machine and assembly language, things are even worse. Something as simple as printing a sentence on the screen can take half a page of code; and often, it's the same half page of code that you've written a dozen times already.
To lessen the burden of repetition, and to increase productivity, programmers started to create libraries of code that performed common tasks. Along with these libraries, they also started inventing "higher-level" versions of assembly language. In these higher level languages you could:
- Combine many lines of assembly code into a single instruction, called a macro. Instead of writing 50 lines of code to print a single line of output, you'd use only one.
- Avoid having to translate your program into assembly or machine language.
Even with these "high-level" assembly languages, the computer still only understood machine language. Instead of using an assembler, however, these systems used a second program running on the computer to read each "pseudo instruction" and produce machine code. This second program, called an interpreter, only generated machine code when the program actually ran.
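Here is a rough Java sketch of the interpreter idea. The three pseudo instructions [SET, ADD, and PRINT] and the class `ToyInterpreter` are invented for this example; they are not taken from any historical system. Rather than emitting real machine code, the sketch simply carries out each instruction as soon as it reads it, which is the essential behavior of an interpreter.

```java
import java.util.List;

// Toy interpreter (hypothetical): reads pseudo instructions one at a time
// and performs each one immediately.
class ToyInterpreter {
    static String run(List<String> program) {
        StringBuilder output = new StringBuilder();
        int accumulator = 0; // the single value this toy machine works on
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            switch (parts[0]) {
                case "SET":
                    accumulator = Integer.parseInt(parts[1]);
                    break;
                case "ADD":
                    accumulator += Integer.parseInt(parts[1]);
                    break;
                case "PRINT":
                    output.append(accumulator).append('\n');
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + line);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Prints 8
        System.out.print(run(List.of("SET 5", "ADD 3", "PRINT")));
    }
}
```

Every time you call `run`, each line is split apart and examined again. Keep that repeated work in mind; it matters later in this lesson.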
Virtual Machine Languages
These high-level interpreters not only made programmers more productive, but they also addressed another problem that afflicted machine and assembly-language programs: portability.
Early computers were very expensive and individually built. Because of this, when a new computer was developed, companies found that the programs they'd written for their previous machines would not run on new models.
The interpreter provided a clever solution to this problem: create an "ideal" machine language and then simply write an interpreter for the ideal language when a new machine was released. The most popular of these virtual machine languages were Speedcode and Shortcode.
Today, Java uses the same concept. Its virtual machine language is called byte code, and the interpreter you use to run it is called a Java Virtual Machine or JVM.
Compilers
By the end of the 1950s, both computers and interpreters had become firmly entrenched in the business community. Interpreters and virtual assembly languages, such as Speedcode and Shortcode, allowed programmers to become much more productive. These much more productive programmers did what all productive people do--they produced more stuff; in the programmers' case, they produced bigger and better programs.
Well, bigger anyway.
As programs got bigger, the weaknesses of the interpreter approach became obvious. Because so much of the interpreter's time was spent translating from "pseudo instructions" into machine language, interpreted programs ran much slower than hand-written machine language programs. And, in those days, people were cheap, but computers were expensive.
A programmer named Grace Hopper is credited with an insight that seems obvious in retrospect. Instead of translating Speedcode into machine code every time you run the program, just do the translation once. Save the translated code on disk or tape and reuse it every time you need to run your program. This invention was called the compiler. Today, most programming languages use some form of compiler.
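The "translate once, run many times" idea can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `ToyCompiler` and its two pseudo instructions [ADD and MUL] are invented here, and the "translated code" is a list of Java functions held in memory, not real machine code saved on disk or tape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Toy compiler (hypothetical): translates the whole program once, up front.
class ToyCompiler {
    static List<IntUnaryOperator> compile(List<String> program) {
        List<IntUnaryOperator> steps = new ArrayList<>();
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            int operand = Integer.parseInt(parts[1]);
            switch (parts[0]) {
                case "ADD":
                    steps.add(x -> x + operand);
                    break;
                case "MUL":
                    steps.add(x -> x * operand);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + line);
            }
        }
        return steps;
    }

    // Runs the already-translated steps; no text is examined here.
    static int run(List<IntUnaryOperator> compiled, int input) {
        int value = input;
        for (IntUnaryOperator step : compiled) {
            value = step.applyAsInt(value);
        }
        return value;
    }

    public static void main(String[] args) {
        List<IntUnaryOperator> compiled = compile(List.of("ADD 2", "MUL 10"));
        System.out.println(run(compiled, 1)); // prints 30
        System.out.println(run(compiled, 5)); // prints 70
    }
}
```

The program text is examined only inside `compile`. Each later call to `run` reuses the translated steps, so the cost of translation is paid once no matter how many times the program runs.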
Hopper's language, called Flowmatic, was the last, and greatest, of the second-generation languages.
The Third Generation
Once people realized that computers could translate virtual assembly languages like Speedcode and Flowmatic into machine code, they began to wonder if, perhaps, computers could do the same for more "natural" languages. [Natural for human beings, that is.] This marked the beginning of the third generation of computer languages: high-level languages.
High-level Languages
The basic idea behind a high-level language is straightforward. Instead of writing a computer program in terms that the computer uses, write it in terms of the problem to be solved. Since different people want to solve different kinds of problems, different languages were developed.
FORTRAN
Developed by John Backus at IBM in the mid-to-late 1950s, the FORmula TRANslator language let engineers write programs using familiar notation. Beginning in 1954, FORTRAN set the standard for estimating the length of a programming project, when Backus predicted it would be finished in six months. The first version was delivered in 1957.
COBOL
In the same way that FORTRAN turned engineers into programmers, COBOL, the Common Business Oriented Language, attempted to recruit accountants and other business professionals. And it was wildly successful. More programs have been written in COBOL [in the last 40 years] than in any other language.
COBOL was created by a committee called the Conference on Data Systems Languages (CODASYL). It was led by Joe Wegstein of NBS (now NIST), who was an early computer pioneer.
Algol
In the 1950s, if you wanted to program in FORTRAN, you had to purchase an IBM mainframe computer, and, out of the goodness of their hearts, IBM threw in a FORTRAN compiler for free. [Well, maybe not for free.] If you wanted to run FORTRAN on another system, however, you were out of luck.
Like COBOL, Algol (the Algorithmic Language) was the product of a committee intent on producing a common numeric and scientific programming language that would not be tied to a particular vendor like FORTRAN. While not as successful as COBOL, it was, nonetheless, a primary influence on the structured programming languages that would follow it in the 1960s and 1970s.
LISP
The last of the "Big-4" high-level languages begun during the 1950s was LISP. Started by John McCarthy at MIT in 1958, LISP (the LISt Processing language) is quite a bit different from the other three languages and requires a bit of mathematical sophistication to learn.
The Process of Programming
When programming with a high-level language, there are certain steps that must be followed. This mechanical process is called the edit-compile-run cycle of programming. Learning this process is not the same as learning how to program; you must master the process--that is necessary--but mastering the process doesn't mean you'll write good, or even acceptable, programs.
Here are the steps in the process:
Edit
When you write a program in a high-level language, you write your instructions--in the form of programming language statements--using a text editor. The document you produce at this stage is called source code.
Compile
Once you've written your program, you compile it by using a software program called a compiler. The compiler turns your source code into machine language and produces another document called object code. If your program fails to compile (that is, if you have "grammatical" or syntax errors in your source code), then you must re-edit the code until it compiles correctly.
Once a program compiles without errors, it is syntactically correct. It may still contain runtime errors or logic errors, however.
Run
Once your program compiles, you run it. All high-level languages require a significant amount of additional machine-language code--in addition to the code you've written--before your program will actually run. How this additional code is added to your program varies from language to language and from system to system. Generically, this is called linking. Today, this often happens behind the scenes, and you don't really have to worry about it.
When you run your program, the first thing you'll want to do is to verify that it runs as you expect. This process is called testing. It is not uncommon to find that your program has some logic errors that can only be discovered when the program runs. These errors may even make your program "crash". These kinds of errors are called runtime errors.
To fix a runtime error, you go back to the very first step--your source code--and make changes there, repeating the edit-compile-run cycle until the program runs correctly.
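As a concrete walk through the cycle, here is a small Java sketch. The file name `Average.java` and the method `average` are made up for this example, and the commands in the comments assume that the standard `javac` and `java` tools are installed. With Java, the compile step produces byte code for the JVM rather than native machine language.

```java
// Edit:    type this into a text editor and save it as Average.java
// Compile: javac Average.java   (produces the file Average.class)
// Run:     java Average
class Average {
    // A logic-error version of this method would be: return (a + b) / 2;
    // That version compiles cleanly, but integer division silently drops
    // the fraction, so average(3, 4) would give 3.0 instead of 3.5.
    static double average(int a, int b) {
        return (a + b) / 2.0; // dividing by 2.0 keeps the fractional part
    }

    public static void main(String[] args) {
        System.out.println(average(3, 4)); // prints 3.5
    }
}
```

The mistake described in the comment is the kind that only testing reveals: the compiler is perfectly happy with it, but the answer is wrong. To fix it, you would go back to the editor, change the source code, and compile and run again.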
Structured Programming
By the late 1960s, high-level languages were in full flower, and there were a lot of them. Programmers had become monumentally more productive; at least, they were churning out a whole bunch of code. A decade earlier, a 20,000-line program was huge. Now programmers were thinking in terms of hundreds of thousands, and even millions, of lines of code.
What is Structured Programming?
Despite the increased productivity, however, there was a small problem: the portions were huge but the quality left a bit to be desired. The Cold War and the Vietnam War were both then in full force, and the government couldn't help but notice that when it ordered a new software system, the system:
- Cost a whole lot more than promised.
- Failed to arrive when promised.
- Failed to work if it did arrive.
The crisis in software quality made everyone sit up and take note, and in the late 1960s, the NATO countries sponsored a conference which gave rise to the term Software Engineering.
The questions they asked are simple:
- Why does software cost more than is planned?
- Why doesn't software work the way it was designed?
- Why aren't software projects completed as planned?
In other words--to paraphrase Professor Henry Higgins' famous line from My Fair Lady:
Why isn't a word processor more like a truck?
In many ways, Software Engineering was kind of a bust: software is still late, bloated and buggy. Despite that, some real good came out of the conference. A diverse group of people, around the world, now began to look at how to build good software. Before this, everyone was just kind of amazed that it could be done at all.
These efforts and their results are often called structured programming.
Principles of Structured Programming
The first principle in structured programming is that little programs are easier to write, understand, and maintain, than are big programs. This principle is called "divide and conquer." Programmers were encouraged to organize their programs as a hierarchy of small sub-programs [often called procedures or functions] that each performed a specific task like this:
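In Java, such a hierarchy of small sub-programs might be sketched as follows. The class `GradeReport` and its methods are hypothetical, chosen only to show one procedure calling another; each method does one small job and can be understood on its own.

```java
// Hypothetical example: a tiny program organized as a hierarchy of procedures.
// main calls format and average; average calls total.
class GradeReport {
    // Lowest level: add up the scores.
    static int total(int[] scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum;
    }

    // Middle level: uses total to compute the average.
    static double average(int[] scores) {
        return (double) total(scores) / scores.length;
    }

    // Separate task: turn the result into a line of output.
    static String format(double average) {
        return "Average score: " + average;
    }

    // Top level: coordinates the smaller procedures.
    public static void main(String[] args) {
        int[] scores = {80, 90, 100};
        System.out.println(format(average(scores))); // prints Average score: 90.0
    }
}
```

If the report prints the wrong number, you only have to check three short methods, one at a time, instead of reading one long program from top to bottom.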
Controlled Branches
The second principle is that actions should be as local as possible. Even after programmers began to use procedures, they would often "leave" the procedure to reuse a piece of code in a different procedure.
Researchers found that these uncontrolled links [branches] between different sections of a program led to a style of programming known as spaghetti code--so named because those maintaining the code would draw lines connecting the related portions; over time, these lines began to resemble a bowl of spaghetti like this:
These researchers found that you could write any program using a simple combination of three control structures:
- Sequence: actions that directly follow each other should appear next to each other in the source code.
- Selection: actions that are conditionally executed should be arranged in a group [block] and that whole block should be placed together as a selection statement.
- Iteration: actions that are to be repeated should also be placed in a block and placed together as an iteration or looping statement.
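The three control structures listed above can all be seen in one short Java method. This is just a sketch; the class `ControlStructures` and the method `sumOfEvens` are names invented for this example.

```java
// Hypothetical example showing sequence, selection, and iteration together.
class ControlStructures {
    // Adds up the even numbers from 1 to limit.
    static int sumOfEvens(int limit) {
        int sum = 0;                         // sequence: statements run one after another
        for (int n = 1; n <= limit; n++) {   // iteration: repeat the block for each n
            if (n % 2 == 0) {                // selection: run the block only when n is even
                sum += n;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(10)); // prints 30
    }
}
```

There are no jumps to other parts of the program here. You can read the method from top to bottom and know exactly which statements can run and in what order.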
Information Hiding
A second cause of programming mistakes is almost as pernicious as spaghetti code--the use of global variables by unrelated pieces of code. If procedure A uses a variable that procedure B also uses, then it's quite possible for an error to occur.
Because of this, a new kind of variable was created, the local variable. Using local variables, procedure A and procedure B no longer share the same variable, but each procedure has its own. If the two procedures need to share information, procedure A will pass the information to procedure B. In addition, procedure B can return information to other procedures which call it.
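Here is a minimal Java sketch of two procedures that each keep their own local variable. The names `LocalVariables`, `sumOfSquares`, and `square` are made up for this illustration; `sumOfSquares` plays the role of procedure A and `square` plays the role of procedure B.

```java
// Hypothetical example: each method has its own local variable named result.
class LocalVariables {
    // Procedure B: receives information as a parameter and returns an answer.
    static int square(int value) {
        int result = value * value; // this result exists only inside square
        return result;
    }

    // Procedure A: its result is a different variable from the one in square.
    static int sumOfSquares(int x, int y) {
        int result = square(x);  // pass x to square and receive what it returns
        result += square(y);     // calling square again cannot disturb this result
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3, 4)); // prints 25
    }
}
```

Both methods use the name `result`, but neither can change the other's copy. The only information that moves between them is what is passed in as a parameter and what is handed back by `return`.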
This style of programming required a new type of language as well. These languages, which use procedures and hidden or local variables, are called block structured languages. The two most widely used block-structured languages are Pascal and C.
Object Oriented Programming
While structured programming--and the languages that grew out of structured programming, such as C and Pascal--improved the quality of programs, it wasn't a panacea.
Structured programs are organized around a hierarchy of procedures. This is a very logical and helpful way to organize programs that work like assembly lines; you put data in one end and the finished product comes out the other.
On the other hand, more and more programs failed to fit that mold; instead, interactive programs and simulations became more and more common at the end of the 1960s. In these programs, there was no sequential path through the program and so the procedural, hierarchical organization was much less appropriate.
A group of programmers working on naval simulations in Norway discovered that a better organization was to combine some data, along with the operations on that data, into relatively self-contained units that obeyed a set of commands. The language that they developed to accomplish this, Simula, formed the basis for many of the object-oriented languages that would follow from the late 1970s until now.
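To give you a first taste of this organization, here is a small Java sketch. The `Ship` class is hypothetical and is not taken from Simula or from any real simulation; it simply bundles some data [a name and a position] together with the commands that operate on that data.

```java
// Hypothetical example: a self-contained unit combining data and operations.
class Ship {
    // The data is private, so only the Ship's own methods can change it.
    private final String name;
    private int position;

    Ship(String name, int startPosition) {
        this.name = name;
        this.position = startPosition;
    }

    // A command the ship obeys: move forward by the given distance.
    void sail(int distance) {
        position += distance;
    }

    int getPosition() {
        return position;
    }

    String getName() {
        return name;
    }

    public static void main(String[] args) {
        Ship ship = new Ship("Explorer", 0);
        ship.sail(5);
        ship.sail(3);
        System.out.println(ship.getName() + " is at " + ship.getPosition()); // prints Explorer is at 8
    }
}
```

A simulation could create many `Ship` objects, each keeping track of its own position, and send them commands in any order. That fits interactive programs much better than a single assembly line of procedures.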
In the remaining sections of this lesson, you'll learn more about object-oriented programming and get started writing your own Java programs.
Please continue to the next section of this lesson.