Lesson 1.2 Basic HTML
As you learned in Lesson 1.1, "The Web", HyperText Markup Language documents are plain-text files that are stored on a server. When a client program [Web browser] requests a particular document, it is delivered to the client by an HTTP or Web server. When it reaches the client, it is interpreted and displayed.
In this lesson, you'll take a look at text editors and then learn the basic syntax of HTML, the language used to encode Web pages so that they can be interpreted by your browser.
Using Your Text Editor
When you write Java programs or HTML pages, you'll use a text editor. A text editor is a little bit like a word processor--say Microsoft Word or WordPad--but without all the formatting options. While this might not seem like an important distinction, it really is.
Text Editor or Word Processor?
When you create a document with a word processor, the word processor saves special formatting characters in your file, along with the content of your document. Often, you can tell your word processor to save only the text and no formatting information, but it's easy to forget to do that. Let's take a look at some of those formatting characters.

Here are the steps to follow (Windows Platform):
- Open WordPad from your Start button. It's under the Accessories folder.
- Type in a little text; "Here's a line of text from WordPad", for instance. Then save your file using the file name TEXTTEST.DOC.
- Now, open an MS-DOS window and display your file using the DOS command TYPE TEXTTEST.DOC. [In Windows 2000 and Windows ME, an MS-DOS window is called a "Command Window". You can open a new one from the Accessories menu. You'll learn more about using the command line in section 1.3.]
You should see something that looks very much like this. Not very recognizable, is it? Your Java compiler will have the same problem if you attempt to feed it source code written using a word processor instead of a text editor.
- If you are still determined to use Word or WordPad, then make sure you select Save As Text each time you save your document, like this:

Which Editor?
Your textbook assumes that you will be using the MS-DOS Edit program, which comes free with Windows, as your text editor. You can also use the Notepad program, but you'll find that it has a tendency to rename its files, appending a superfluous ".txt" to whatever filename you specify. To avoid this, put quotes around your file name whenever you save a file using Notepad.
| Got Windows NT or 2K? | If you are using Windows NT, you should NOT use the EDIT program. The NT version of EDIT is an older 16-bit application that cannot use long filenames, which means that it cannot be used for Java programs, which all have a 4-character extension. Furthermore, even if you are running a 32-bit command shell, running the EDIT program drops you back into a 16-bit shell, and you can no longer even see your long file names. [This bug still exists in Windows 2000 as well, even though W2K comes with the improved version of Edit.] If you are using Windows NT or Windows 2000, either use the Notepad program, or download another programmer's editor. |
You can spend a lot of money on a text editor. If you don't believe me, then look through the ads in programmer's magazines like Dr. Dobb's Journal. You can also get several extremely powerful editors for free. In this course, you are asked to use the free SciTE text editor [see next section], but you can really use any editor you like.
I'm also somewhat fond of the Vim [vi improved] editor, for instance, because vi is the editor I use in Unix and I can get versions of Vim for Windows, Unix, and the Mac. On the other hand, vi is an acquired taste; many an undergraduate CS student has decided to leave the field entirely, rather than spend another minute working with vi.
For those of you who are interested, vi was written by Bill Joy, one of the co-founders of Sun Microsystems, while he was an undergraduate CS major at UC Berkeley. Of course, in his spare time he also rewrote the UNIX operating system, [BSD], so I guess a little editor is no big deal.
If you're programming on the Mac, you can use the SimpleText application that came with your operating system.
The SciTE Editor
My favorite editor today is the Scintilla Text Editor [SciTE]. This is an open-source, free text editor that includes syntax highlighting (for Java, HTML, and a variety of other languages) and the ability to run your compiler from within the text editor.
Let's go ahead and set up a working environment, using SciTE, so that you can experiment with HTML as you work through this lesson. We'll use the single-file executable version of SciTE, called Sc1.exe, which is quite small and does not require any additional DLLs.
Here are the steps to follow:
Step 1 : Create a Home Directory
Use Windows Explorer to create a directory (folder) to hold your homework and experiments. I'll call this your home directory.
Name your home directory something like JavaProjects or MyWebPages. Make it something easy to remember.
- You can create this inside your "My Documents" folder, but you'll have an easier time if you create it so that it is accessible directly from the C: or D: drive or, in the Computing Center, the U: drive. [You'll find that having spaces in the path makes things more difficult.]
Here's my home directory [named MyWebPages] on my C: drive:
Step 2 : Download Sc1.exe
Use your Web browser to download the file Sc1.exe [about 200K] from the Scintilla Web site. Place the file in your C:\Windows directory, or the C:\WINNT directory with Windows NT/2K when you download it.
| Watch Out! | You do have a current virus scanner installed, don't you? That's important whenever you download and run executable programs from the Internet, whether you get them from an open-source site like Scintilla, or from a commercial site like Microsoft or Sun. SciTE may even be a little safer than software from MS and Sun, because, if you like, you can download the source code and build the executable yourself. |
Using Windows Explorer, locate the file in your Windows directory, and right-drag it to the Desktop to create a shortcut, as shown here: [Hold down the right mouse button while dragging, instead of the left. When you drop the icon on the desktop, you will be given the opportunity to create a shortcut, instead of moving or copying the file.]
Once the shortcut is on your desktop, you can right-click and rename it if you like. When you first create it, the title says "Shortcut to Sc1.exe", which is rather verbose. I renamed my copy so that it simply displays SciTE, as shown here:
When you start the editor, you want to make sure that it opens its files in your new MyWebPages directory. Right-click the SciTE shortcut like this:
When the Properties dialog appears, change the Start in directory to reflect the new directory you created in Step 1, like this:
Step 3 : Configure SciTE
The single-file version of SciTE comes "preconfigured". To customize it to meet your own needs, you make entries in a properties file, which is a simple text configuration file. To make your own custom configuration, choose "Open Global Options File" from the Options menu like this:
Since you don't yet have a configuration file, SciTE will beep at you like this:
Don't be discouraged, though. SciTE has gone ahead and created the file; all you have to do is fill in the entries. Here are the entries I suggest to start with. We'll make more later when we start creating and running Java programs.
Once you've saved this file [use File | Save on the menu], close SciTE and re-open it by double-clicking the icon. If all went well, your SciTE editor should now look like this:
For a complete list of options that can go into a SciTE properties file, check the online documentation at http://www.scintilla.org/SciTEDoc.html
Editor Basics
Whichever editor you choose, you need to know how to do some basic tasks. I'm going to assume that you already know how to do each of the following tasks.
Go through this checklist, which assumes you are using the SciTE program as your editor. If you don't know how to do something, then be sure to use the discussion area to see if anyone can help you out.
Necessary Skills
- Start your editor
- Add new text
- Replace some existing text with new text
- Delete text
- Cut, Copy, and Paste text
- Save your file
- Save your file using a different name
Useful skills
- Opening multiple files at once. [SciTE, for instance, allows you to open up to ten files at once, using the Buffers menu. Notepad does not.]
- Viewing multiple files. [In SciTE, you switch between different files using the Buffers menu. If you want to look at two files "side-by-side", open up another copy of SciTE, and you can look at two files simultaneously.]
Basic HTML
Many programmers get very offended when you call HTML a programming language; it has no variables, no loops, no selection statements; it has none of the programming language features that mark FORTRAN, COBOL, C, and Java. How can it be a language?
The answer is quite simple: it says so! The name of the language is HyperText Markup Language. And, let's face it, if Political Science and Computer Science can get away with calling themselves sciences--no matter what those stuffy folks in Physics and Biology say--then we're duty bound to honor HTML's modest claims.
As the name says, HTML is a markup language, the most prolific of a family of languages that includes SGML [Standard Generalized Markup Language] and XML [eXtensible Markup Language] as well as XHTML. (XML is becoming the "next big thing." I'd start acquainting myself with XML now!)
Tags
An HTML document is a plain-text file that has been "marked-up" by the inclusion of tags. Tags are special commands that tell your Web browser how to display your HTML document. Tags are the basic elements of HTML.
As Sgt. Friday would say, "Here's the facts, Ma'am":
- HTML tags are enclosed in angle brackets, like this <tag>.
- The actual tag names are case-insensitive. For instance, you can write <html>, <HTML>, or even <hTmL>. All of these are treated identically. (Note, however, that in XHTML [the W3C-anointed successor to HTML 4], all tags must be in lowercase.)
Delimited Tags
Many tags come in pairs: an opening tag and a closing tag.
- When tags are paired, they act as delimiters, or containers, applying a property to all of the text they enclose.
- The closing tag is the same as the opening tag, but preceded by a forward slash [/].
- For some tags, the closing tag is optional. [Not in XHTML, however.]
Example: The tags <h1> and </h1> are the "level-1 head" tags. All of the text that appears between these two tags will be interpreted by your browser as a level-1 [largest] headline, and displayed appropriately. Your browser will not display the tags themselves.
| Works in IE? | There is often quite a difference between how different browsers interpret a given piece of HTML. One of the most pernicious [evil] differences is that Internet Explorer often ignores required closing tags, while Netscape does not. If you use IE as your primary browser, make sure you double-check each of your closing tags. |
Tag Attributes
Many tags require additional pieces of information. These additional pieces of information are called attributes. Attributes can be either mandatory or optional.
- Every attribute is represented by a keyword. Like tag names, the attribute name is not case-sensitive.
- Attribute names are followed by an equals sign and a value.
- The value may be placed in quotes [even if it is a numeric value].
- The value may be case sensitive.
When you supply a value to the code attribute in a Java <applet> tag, it is always case sensitive. On the other hand, when you supply a URL as a value to an href attribute inside an <a> tag, it is sometimes case sensitive. [It will be case sensitive when you place your page on a UNIX server, but not when you examine the link on your local Windows or Mac machine.]
The best advice is to assume that all values are case sensitive, and to double check your pages and links once you've published them.
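As a rough illustration of the attribute rules above, here is a short sketch. The file names are made up for the example, and the <img> tag it borrows is covered later in this lesson.

```html
<!-- Attribute names are not case sensitive: these two links behave the same. -->
<a href="fun.html">Fun Stuff</a>
<A HREF="fun.html">Fun Stuff</A>

<!-- Values may be placed in quotes, even when they are numeric. -->
<img src="mypic.gif" width="100" height="50">

<!-- Values may be case sensitive: on a UNIX server these two links
     can point to two different files. -->
<a href="Fun.html">One file</a>
<a href="fun.html">Possibly a different file</a>
```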
Now, let's take a look at some specific tags.
Structural Tags
The basic HTML tags form the "sections" of your document. These tags are:
| <html></html> | Placed at the beginning and end of your HTML document. Inside these tags, your document is divided into two parts, a header and a body. |
| <head></head> | Text inside this section does not appear on your page, with the exception of text appearing between <title> tags [which you'll meet in the next section]. |
| <body></body> | This section follows the head section. Text that appears between these tags will appear on your Web page. |
Make sure that you don't "overlap" beginning and ending tags. The following structure would be incorrect, because the </head> and <body> tags overlap:
<html>
<head>
<body>
</head>
</body>
</html>
The correct structure would look like this:
<html>
<head>
</head>
<body>
</body>
</html>
Headlines, Paragraphs, and Rules
Once you've written the basic structure, you're ready to add some elements. Text that does not appear between any of the tags here will be displayed in the default browser "Normal" text style.
Here are the basic tags you'll use:
| <title></title> | These tags can only appear inside the header section of your document. Text delimited by the <title> tags is used as the document's title and may appear in the title bar of your browser. |
| <h1></h1> | Text delimited by these tags is considered a "level-1" headline. The actual size of the font used to display the text will depend upon the user's browser settings. In addition to the <h1> tag [which is the largest] there are also h2-h6 tags, each successively smaller. |
| <p></p> | This tag is inserted into text to create a new paragraph. HTML ignores the line breaks and spaces you place in your HTML source code; any run of line breaks and spaces is replaced with a single space on output. Thus, to create paragraphs, you need to insert the <p> tag. The closing tag </p> is optional and was seldom used in the past. In XHTML, however, all closing tags are required, and it doesn't hurt to put them in. |
| <br> | Like the paragraph tag, this tag causes your browser to start a new line. The main difference between <p> and <br> is that <p> will generally insert extra space to signify a paragraph break. There is no closing </br> tag. |
| <hr> | This tag will insert a horizontal rule. The rule will generally appear on a line by itself. |
| <pre></pre> | This tag is used to delimit "preformatted" text. What this means is that text placed between these two tags will appear exactly as it does in the source document, including all spacing and line breaks. You'll use <pre> when you want to include programming code in your Web pages. |
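To see how the structural tags and the tags in this table fit together, here is a minimal sketch of a complete page. The title and text are invented for illustration; type it into your editor, save it with an .html extension, and open it in your browser.

```html
<html>
<head>
<title>My First Page</title>
</head>
<body>
<h1>My First Page</h1>
<p>This is the first paragraph. The line breaks
and    extra spaces in the source are collapsed
into single spaces by the browser.</p>
<p>This paragraph contains a forced<br>
line break.</p>
<hr>
<pre>
public class Hello {
    // Spacing and line breaks inside pre are kept as typed.
}
</pre>
</body>
</html>
```

The first paragraph should appear as one continuous run of text, while the lines inside the <pre> tags should keep their indentation.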
Appearance Tags
Often, you'll want individual pieces of text to take on a different appearance. To do that, you can use several appearance tags.
HTML programming purists often disdain the use of appearance tags, pointing out, correctly, that HTML is supposed to describe the structure of your document, not its appearance, since it is up to the client to actually render the page.
Even if you use the newer appearance tags, like <font> for instance, it's never a good idea to specify a specific font, because it may not be available on your client's browser, and what actually gets rendered may appear illegible.
Here are the basic tags which are usually safe:
| <b></b> <strong></strong> | Text between these tags will appear strong or bold. |
| <i></i> <em></em> | Text between these tags appears emphasized or italic. |
| <u></u> | Text between these tags may appear as underlined, provided the browser supports it. |
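A short sketch of these appearance tags in use (the sentences are only sample text):

```html
<p>This word is <b>bold</b> and this one is <strong>strong</strong>.</p>
<p>This word is <i>italic</i> and this one is <em>emphasized</em>.</p>
<p>This word may be <u>underlined</u>, if the browser supports it.</p>
```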
This last tag brings up a good point: what happens if you use one of the new appearance tags like this:
<font face="Candy" size="+10">Abazaba</font>
and the user's browser doesn't support the tag?
The answer is, thankfully, nothing. HTML is a very forgiving language. If you insert a tag that the browser doesn't understand, the browser simply ignores the tag. For instance, if you use your browser to view the source code for this page, you'll find that this line is enclosed in a set of <flub></flub> tags. However, since no browser currently supports the <flub> tag, the text appears just like normal.
Checking Your Code
One of the frustrating things about HTML is that not all Web browsers display your pages in exactly the same way; many browsers have bugs that cause them to display perfectly correct HTML in an incorrect manner.
It's even more frustrating, however, when you write incorrect HTML code and your browser doesn't let you know about it. That's because your pages may look fine when viewed under one browser, but not be visible at all under some other browser. Microsoft's Internet Explorer is probably the worst offender in this regard; it is extremely easy to create "IE only" pages that are invisible to everyone else on the Internet. That kind of defeats the purpose of using HTML, eh?
The W3C Validator
One way around this problem is to test your program against every single version of every single browser, which is a difficult task. A better solution is to use a program that checks your HTML to see if it contains errors. HTML validator programs don't display your Web pages, like a Web browser does; instead, they compare your HTML code against the rules for correct HTML and let you know if there is a problem, so that you can correct it.
To use the validator, point your browser at http://validator.w3.org as shown here:
Enter the URL for your page, the Character Encoding and Document Type as shown above, and then press the Validate button to check your page. If you haven't posted your page yet, you can submit a file instead, using the File Upload link [not shown].
Your assignments should display correctly in any browser, whether IE, Netscape Communicator, Mozilla, or Netscape 6.x, on a Windows, Solaris, or Mac computer. It's up to you to see that your code works in all of them. The best way to do that is to use the W3C HTML validator.
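Instead of choosing the Character Encoding and Document Type on the validator form, a page can declare them itself. The sketch below assumes HTML 4.01 Transitional and the ISO-8859-1 character set; the lesson does not say which settings to use, so treat both as placeholders and substitute whatever your instructor specifies.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- Assumed character encoding; change it to match how the file was saved. -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Validator Test Page</title>
</head>
<body>
<h1>Validator Test Page</h1>
<p>If this page validates, the basic structure is correct.</p>
</body>
</html>
```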
URLs, Images, & Links
Now that you know how to create a basic Web page, it's time to put the HyperText in HTML, by learning how to create links. The tag you'll use is generally called the anchor tag, but it's actually just an a. Here's how it works.
The Anchor Tag
The anchor tag uses the delimited pair <a> </a>. All text appearing between the tags will be highlighted, and when the user clicks on any portion of the text, he or she will be transported to...
Well, where?
The anchor tag is the first tag you've used which requires an attribute. The attribute is called href, and its value is the location where you want to go--the URL.
Thus, a basic link would look like this:
<a href="Go Here">Text to Display</a>
Note that the attribute appears inside the opening tag. The words "Text to Display" will appear highlighted, and when the user clicks on any part, the current page will be replaced with the location represented by the words "Go Here".
The words "Go Here" must represent a valid URL [Uniform Resource Locator]. Let's take a quick look.
URLs
URLs consist of three parts:
- How we want to communicate, called the protocol.
- Where the machine is, called the host name.
- What we want, called the file specifier or path.
The Protocol
The protocol can be any of the common Internet services: http, ftp, gopher, smtp, telnet, etc. When you specify a protocol, you are specifying which server program should respond to your request. If the host you are trying to contact is not running that service, your request will fail.
The protocol [how] is separated from the host name by a colon and two forward slashes [://]. This is not part of the protocol name; it's just a separator, like the comma or the semicolon in this sentence.
The Host
The host name is composed of two parts. One part is the domain name. Here at Orange Coast College (OCC), for instance, the domain name is occ.cccd.edu. This name represents all of the servers at OCC.
To retrieve a document, however, you need to know more than the neighborhood. You need to know what server the document is stored on. Each server has a unique address, and many of them have a unique name as well. Here are some examples of different servers at OCC:
- The host csjava.occ.cccd.edu is the Computer Science/Computer Information Systems Sun machine where the course author's home page is stored.
- The server jumbo.occ.cccd.edu is another Unix machine used in the Computer Science department.
- The server sgilbert.occ.cccd.edu is the Windows 2000 machine sitting on the author's desk.
| What are Ports? | The port specifier allows a single machine to run two servers of the same type. If you use the URL http://www.occ.cccd.edu, you are requesting information from the Netscape Enterprise Web Server hosting the main OCC Web site. By adding :8900 (called a port specifier) to the end of the host name, as in http://www.occ.cccd.edu:8900, you are asking to use a different "channel". |
The File Specifier
Once you've made contact with the Web server [by using http as the protocol] that this course uses, you still have to tell the server what file you want.
File specifiers begin with a forward slash like this: /. This is called the root directory. However, this root directory is not the physical root directory used by the host server's file system. Instead, it is the Web server's root directory. The actual files you want will be located in a subdirectory below that root directory.
Here's how you'd make a link to the page fun.html located in the default directory of the user 1831-100 on the csweb server at OCC:
<A HREF="http://csweb.occ.cccd.edu/1831/100/fun.html">
Fun Stuff Here
</A>
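Breaking the URL in the link above into the three parts described in this section gives the following. The breakdown is written as an HTML comment, so it can sit in a page without being displayed.

```html
<!--
  http://csweb.occ.cccd.edu/1831/100/fun.html

  protocol:        http
  separator:       ://
  host name:       csweb.occ.cccd.edu  (server csweb in the domain occ.cccd.edu)
  file specifier:  /1831/100/fun.html  (relative to the Web server's root directory)
-->
<a href="http://csweb.occ.cccd.edu/1831/100/fun.html">Fun Stuff Here</a>
```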
Relative URLs
As you might guess, fully specifying a URL for every link can quickly become tedious. More to the point, it can make your Web pages much less portable. If you use the fully specified URL [called an absolute URL] to refer to links between pages stored on your site, then all of the links need to be rewritten when you move your pages to a new location.
To avoid this problem, you should always use relative URLs in any links to other pages on your own site. A relative URL does not have the protocol or host portion that an absolute URL has.
When your Web browser encounters a link that lacks a protocol and a host, it fills in the missing parts from the address of the current page: the same protocol, the same host name, and, unless the link begins with a slash, the same directory.
Here are some interesting relative URLs:
| fun.html | Refers to a file in the same directory as the file which contains the link. |
| stuff/fun.html | Refers to a file in a subdirectory named stuff which is below the directory that contains the link. |
| ../fun.html | Refers to a file in the parent directory, one level above the directory that contains the file with the link. |
| /fun.html | Refers to a file in the Web server's root directory. |
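To make the table concrete, suppose (purely as an assumption for this sketch) that the links below appear in a page stored at http://csweb.occ.cccd.edu/1831/100/index.html. Each comment shows the absolute URL the browser would request.

```html
<!-- Hypothetical location of this page:
     http://csweb.occ.cccd.edu/1831/100/index.html -->

<!-- Requests http://csweb.occ.cccd.edu/1831/100/fun.html -->
<a href="fun.html">Same directory</a>

<!-- Requests http://csweb.occ.cccd.edu/1831/100/stuff/fun.html -->
<a href="stuff/fun.html">Subdirectory named stuff</a>

<!-- Requests http://csweb.occ.cccd.edu/1831/fun.html -->
<a href="../fun.html">Parent directory</a>

<!-- Requests http://csweb.occ.cccd.edu/fun.html -->
<a href="/fun.html">Web server's root directory</a>
```

If the whole 1831 directory later moves to a different server, the first three links keep working without any changes, which is the portability benefit described above.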
Images
Web pages can, as I'm sure you've noticed, contain images. Those images must be in GIF or JPG format. Like the anchor tag, the img tag, used for images, also requires an additional attribute, named src. The src attribute contains the URL of the image to display. The img tag does not have a closing partner.
In addition to the required src attribute, the img tag has some optional attributes that you might find useful:
- height and width allow you to display an image at a size other than its natural size. [Don't use this for "thumbnails" however. The browser still downloads the entire image from the Web server.]
- align allows you to align the image to the left or right so that text will flow around it.
There are actually 21 different attributes that you can use to fine-tune the appearance of your images, but this should keep you busy for now.
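Here is a small sketch that combines these optional attributes. The pixel sizes are arbitrary, and the alt attribute is an addition not covered in this lesson; it is included only because HTML 4 validators expect it.

```html
<!-- Display mypic.gif at 100 by 50 pixels, aligned to the right
     so that the paragraph text flows along its left side. -->
<img src="mypic.gif" width="100" height="50" align="right" alt="My picture">
<p>This text flows around the left side of the image.</p>
```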
Something to Talk About
Here are some example img tags:
<img src="mypic.gif">
<img src="../images/mypic.gif" ALIGN=RIGHT>
<img src="images/mypic.gif" ALIGN=LEFT>
If all three of these links are located in a page named Unit1b.html, and if Unit1b.html is located in a subdirectory [folder] named unit1, which is located inside your personal Web page directory [the one you are using for this course], then what is the absolute URL where the file mypic.gif will be found? [In other words, what would you type into your browser's location bar to display just the picture?]
Please continue to the next section of this lesson.