Chapter 1
Introduction to HTML and XHTML
Presented by Thomas Powell
Slides adapted from
HTML & XHTML: The Complete Reference, 4th Edition
2003 Thomas A. Powell
Markup Quickstart
An HTML document is a structured text document composed of
elements, entities, and text fragments:
<b>This is important text! &copy; 2002</b>
Markup elements are made up of a start tag (e.g. <strong>) and
might include an end tag that contains a closing slash character
(e.g. </strong>).
The browser applies the meaning of the element to the enclosed
content.
Under traditional HTML some elements are empty: they enclose no
content and thus have no close tag (e.g. <hr>). In XHTML all
tags must close, so we use <hr></hr> or, more appropriately, <hr />.
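As a small illustration of the empty-element rule, the fragment below writes the same line break, rule, and image in both styles. The file name logo.gif and the alt text are made up for the example.

```html
<!-- Traditional HTML: empty elements have no close tag -->
<p>First line<br>Second line</p>
<hr>
<img src="logo.gif" alt="Company logo">

<!-- XHTML: every element is closed; note the space before the slash -->
<p>First line<br />Second line</p>
<hr />
<img src="logo.gif" alt="Company logo" />
```

The space before the slash is a compatibility habit for older HTML browsers rather than an XML requirement.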
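<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>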
<head>
<title>First HTML Example</title>
</head>
<body>
<h1>Welcome to the World of HTML</h1>
<hr>
<p>HTML <b>really</b> isn't so hard!</p>
<p>You can put in lots of text if you want to. In
fact, you could keep on typing and make up more
sentences and continue on and on.</p>
</body>
</html>
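<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en">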
<head>
<title>First XHTML Example</title>
</head>
<body>
<h1>Welcome to the World of XHTML</h1>
<hr />
<p>XHTML <b>really</b> isn't so hard!</p>
<p>You can put in lots of text if you want to. In
fact, you could keep on typing and make up more
sentences and continue on and on.</p>
</body>
</html>
Example Overview
The preceding examples use some of the most common elements found in (X)HTML
documents:
The <!DOCTYPE> statement indicates the particular version of HTML or XHTML
being used in the document. In the first example, the transitional 4.01
specification was used, while in the second the transitional XHTML 1.0
specification was employed.
The <html>, <head>, and <body> tag pairs are used to specify the general
structure of the document. Notice that under XHTML the <html> tag must carry a
little more information about the language you are using, here the xmlns and
lang attributes.
The <title> and </title> tag pair specifies the title of the document, which
generally appears in the title bar of the Web browser.
The <h1> and </h1> heading tag pair creates a headline indicating some
important information.
The <hr> tag inserts a horizontal rule, or bar, across the screen. It has no
end tag, so under XHTML it is written <hr />.
The <p> and </p> paragraph tag pair indicates a paragraph of text.
Example Wrap-up
From the previous example you might surmise that learning
(X)HTML is merely a matter of learning the multitude of markup
tags, such as <b>, <i>, <p>, and so on, that specify the format
and/or structure of documents to browsers.
This is partially true, but knowing the tags no more makes one a Web
author than knowing how Microsoft Word commands work makes one a writer.
It should be obvious from the preceding example that creating
(X)HTML in such a manual fashion is not appropriate.
We'll study tools to produce markup in a bit, but regardless of the
tool being used to create a page, we should know how markup
works.
HTML has a very well-defined syntax and all HTML documents should
follow a formal structure.
The World Wide Web Consortium (www.w3.org) defines the HTML and
XHTML standards.
In 1999 the definition of HTML was rewritten using XML (Extensible Markup
Language) and renamed XHTML.
In XML you may also use a DTD, but an emerging grammar form called
a schema can be used as well.
In the DTD fragment that defines the root element html, we see that html
encloses a head element followed by a body element. The html element
has an xmlns attribute as well as something called %i18n, which is just
a macro that expands to further attributes such as lang and dir. These
specify aspects of the language in use.
HTML Structure
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Document Title Goes Here</title>
...Head information describing the document and providing
supplementary information goes here....
</head>
<body>
...Document content and markup go here....
</body>
</html>
XHTML Structure
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
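Only the DOCTYPE line of the XHTML template appears here. By analogy with the HTML 4.01 template, a complete XHTML 1.0 transitional skeleton would plausibly look like the sketch below. The html start tag is copied from the first XHTML example, and the comments are illustrative placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>Document Title Goes Here</title>
<!-- Head information describing the document goes here -->
</head>
<body>
<!-- Document content and markup go here -->
</body>
</html>
```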
Document Types
All documents should begin with a <!DOCTYPE> declaration.
In the basic sense it identifies the HTML dialect used in a document by
referencing an external DTD.
A DTD defines the actual elements, attributes, and element
relationships that are valid in documents.
A declaration may give only the public identifier of the DTD, or it may
also include the DTD's URL. The latter form is more appropriate because it
provides the actual URL of the DTD in question.
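To make the contrast concrete, here are the two forms of the declaration, using HTML 4.01 Strict as the example. The comments are annotations only; in a real document the declaration would be the very first line.

```html
<!-- Public identifier only -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">

<!-- Public identifier plus the URL of the DTD (the more appropriate form) -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
```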
!DOCTYPE Declarations by Version
HTML: 2.0, 3.2, 4.0 Transitional, 4.0 Frameset, 4.0 Strict, 4.01 Transitional,
4.01 Frameset, 4.01 Strict
XHTML: 1.1
HTML Version Descriptions
2.0
3.0: The proposed replacement for HTML 2.0 that was never widely adopted, most
likely due to the heavy use of browser-specific markup.
3.2: A version of HTML finalized by the W3C in early 1997 that standardized
most of the HTML features introduced in browsers such as Netscape 3. This
version of HTML supports many presentation elements, such as fonts, as well
as early support for some scripting features.
4.0 Transitional: The 4.0 transitional form, finalized by the W3C in December
of 1997, preserves most of the presentation elements of HTML 3.2. It provides
a basis for transition to CSS as well as a base set of elements and attributes
for multiple language support, accessibility, and scripting.
4.0 Strict: The strict version of HTML 4.0 removes most of the presentation
elements from the HTML specification, such as fonts, in favor of using
Cascading Style Sheets (CSS) for page formatting.
4.0 Frameset
4.01 Transitional/Strict/Frameset: A minor update to the 4.0 standard that
corrects some of the errors in the original specification.
XHTML Version Descriptions
1.0 Transitional
1.0 Strict
1.1
2.0
<html> tag
Looking deeper at the document we see the <html> tag
delimits the beginning and the end of an HTML document.
Given that <html> is the common ancestor of every element in an HTML
document, it is often called the root element, as it is the root of
an inverted tree structure containing the tags and content of a
document.
The <html> tag, however, directly contains only the <head>
tag and either the <body> tag or, in a frameset document, the
<frameset> tag in its place.
Interestingly, the <html> tag itself is not required under standard HTML.
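As a sketch of that last point, the document below leaves out the html start and end tags and should still validate as HTML 4.01 Strict, where both tags are optional. The title and paragraph text are invented for the example. The same document would not be acceptable as XHTML.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<head>
<title>No html Tag</title>
</head>
<body>
<p>This document omits the html start and end tags.</p>
</body>
```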
The <body> of a document contains the actual content and appropriate markup
to render the page.
There should be only one head section (<head>) and one body section
(<body>) in a document.
Under traditional HTML, both the <head> and <body> tags are actually optional.
Further structures like lists (<ul>), images (<img>), scripts (<script>) and
multimedia objects (<object>) are also found in the <body> but may fall
outside the hierarchy you might expect.
The concept of tags enclosing only certain types of other tags is dubbed the
content model.
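A small illustration of a content model, assuming the HTML 4.01 rule that a <ul> may directly contain only <li> elements. The list text is invented for the example.

```html
<!-- Valid: ul directly contains only li elements -->
<ul>
<li>First item</li>
<li>Second item <b>with inline markup</b></li>
</ul>

<!-- Invalid: a p element is not allowed as a direct child of ul -->
<ul>
<p>Stray paragraph</p>
<li>Item</li>
</ul>
```

A browser will most likely display both lists, but only the first passes validation.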
Within some elements, such as <pre> or <textarea>, the whitespace rules are different.
Lack of whitespace understanding can create visual problems and result in
wasted bandwidth.
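A short sketch of the usual rule, assuming the standard browser behavior of collapsing runs of spaces, tabs, and line breaks into a single space everywhere except inside elements such as <pre>:

```html
<!-- These two paragraphs render identically -->
<p>This is a test</p>
<p>This
   is     a
   test</p>

<!-- Inside pre, spacing and line breaks are preserved -->
<pre>This
   is     a
   test</pre>
```

The extra spaces in the second paragraph change nothing on screen but are still sent over the network, which is where the wasted bandwidth comes from.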
Despite all these rules, you will find that browsers render just about
anything.
Beware: tag soup HTML, common or not, does not lend itself to
maintenance and is not future-proof!
With the rise of XHTML we do actually need to know what is going on!
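To show what is at stake, the first fragment below is typical tag soup that browsers tend to render anyway, and the second is a cleaned-up XHTML version of the same content. The file name logo.gif and the alt text are hypothetical.

```html
<!-- Tag soup: crossed tags, unclosed elements, unquoted attribute value -->
<P><B><I>Welcome</B></I>
<IMG SRC=logo.gif>
<P>Some closing text

<!-- Cleaned up as XHTML: lowercase names, proper nesting, everything closed -->
<p><b><i>Welcome</i></b></p>
<p><img src="logo.gif" alt="Logo" /></p>
<p>Some closing text</p>
```

An XML parser would reject the first fragment outright, which is why XHTML leaves less room for guessing.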
Major Themes
Logical and Physical markup
Logical markup says what something means, physical markup
describes how something looks.
<b> is physical markup and <strong> is logical markup.
What are <p>, <h1>, <head>, and <body>?
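For instance, the two lines below typically look the same in a visual browser, yet only the second states why the text stands out. The sentence itself is invented for the example.

```html
<!-- Physical markup: says how the text looks -->
<p><b>Warning:</b> back up your files <i>before</i> upgrading.</p>

<!-- Logical markup: says what the text means -->
<p><strong>Warning:</strong> back up your files <em>before</em> upgrading.</p>
```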
Standards vs. Practice
Question: How do most people think about HTML?
Answer: Physically
Consider a WYSIWYG editor: does it encourage logical markup?
Consider the value of logical markup but be pragmatic!