"...Decent surfing value..."
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
Shakespeare.
this is overlapping bold italic text
Much less is known about what counts as HTML.
There's a rather vague relationship between HTML (Hypertext Markup Language) as (almost) standardized by the Internet Engineering Task Force and whatever is called HTML as it is implemented by WEB browsers. To add to the confusion, there are several levels and revisions(?) of HTML. For example, there's HTML-2.0 Level-1, and HTML-3.0. HTML-2.0 is supposed to be some sort of standard that most browsers are trying to support.
To make things really obfuscated, it should be noted that HTML is a markup language. Markup means that it marks the different elements of your document, but how this document will be seen by WEB-wandering individuals is left entirely to the miscellaneous WEB browsers running on a multitude of various operating systems.
What's interesting, some of the browsers are rushing to support the yet-to-be-defined HTML-3.0 while ignoring some basic features of the (almost) standard HTML-2.0. Looks like HTML profanation is getting really profound.
A sudden paroxysm of critical paranoia struck me and I wrote this page.
Tags are the base of HTML. Tags are what differentiates HTML from simple dull dumb boring plain vanilla ASCII text. Sometimes it may seem that HTML is just a loose collection of various tags. Unfortunately, HTML is also a language -- in its own right. Language is what the last letter in HTML stands for. But the story so far will be about tags.
Very little attention is paid to what exactly a tag is. Common sense says that a tag is some word (called the tag identifier) surrounded by angle brackets. For example, <H1> declares the beginning of a heading of level ONE, and a law-abiding browser should display the aforementioned heading in a rather big font. The left angle bracket, a.k.a. the less-than sign (<), is called the start-tag open symbol, and the right angle bracket, a.k.a. the greater-than sign (>), is called the tag close symbol.
Most tags should be balanced when they belong to an element with some content, like
<H1>This is heading number one</H1>
-- where the opening tag is followed by the closing tag. Note that the end-tag open symbol is a less-than sign followed by a slash, </. For some HTML elements the open tag, the close tag, or even both can be omitted. For example, the paragraph close tag </P> can be omitted.
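The balancing rule can be sketched in a few lines of Python. This is only a rough illustration, not a validator: the function name unbalanced and the sets OPTIONAL_END and EMPTY are made up for the sketch, and they hold just a small illustrative subset of elements, not the full HTML-2.0 DTD.

```python
import re

# Matches <NAME ...> and </NAME>; attributes are skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

# Illustrative subsets only (assumption), not the full HTML-2.0 DTD.
OPTIONAL_END = {"P", "LI", "DT", "DD"}   # end tag may be omitted
EMPTY = {"BR", "HR", "IMG"}              # no content, no end tag


def unbalanced(markup):
    """Return the names of elements left open or closed without being opened."""
    problems, stack = [], []
    for slash, name in TAG_RE.findall(markup):
        name = name.upper()
        if name in EMPTY:
            continue
        if not slash:
            stack.append(name)
        elif name in stack:
            # Pop down to the matching start tag; anything skipped on the
            # way is a problem unless its end tag is optional.
            while True:
                top = stack.pop()
                if top == name:
                    break
                if top not in OPTIONAL_END:
                    problems.append(top)
        else:
            problems.append("/" + name)
    # Whatever is still open at the end must allow an omitted end tag.
    problems.extend(n for n in stack if n not in OPTIONAL_END)
    return problems
```

Under these assumptions the heading example above comes out clean, an unclosed <H1> is reported, and a run of <P> tags without </P> is accepted.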
Being a lousy typist, I have severe trouble typing markup. For example, to get angle brackets you have to keep pressing and releasing the SHIFT key while tapping the less-than and greater-than keys. Also, typing proper closing tags is quite boring, especially when they are nested.
So I was quite excited when I found that the HTML-2.0 language definition allows tag minimization: buried deep inside the dark mess of the HTML-2.0 SGML declaration (don't confuse it with the DTD - Document Type Definition) was the magick word SHORTTAG in the FEATURES section, and it was set to YES.
I rushed to my keyboard and typed this:
<H1/First minimized HTML tag ever typed by humanity/
Nothing happened.
Netscape (which I, like millions of other people, am evaluating for 90 days to decide whether to purchase an ongoing license to the Software or not) just ignored this thing as if it weren't there. Mosaic for Windows didn't go much further either.
Slightly puzzled whether my knowledge was wrong or the browsers were screwed, I went to the HTML Validation Service (went in the cyber sense, you know; as a matter of fact I just did a search on Yahoo) and found that minimized tags are perfectly legal even under the strong arm of the Strict HTML law.
Now I'm entertaining myself by hounding various web browsers with several test pages shown below.
Here they are -- perfectly HTML-compliant and utterly useless, tragically invisible and infernally hostile to any existing HTML rendering device ...
Minimization TAGS
<UL> <LI> this is the first item of the list <> this is second one -- implied identifier is LI </>
which is rendered as:
Some <B>bold text with empty end tag </> -- right here.
Now check out how your browser will chew up a page with such tags. Doesn't it look like this one?
This text is <b<i> bold and italic at once </b</i>.
Take a look at the page filled with such tags. Obviously it should look like this, eh?
<H1/Header with null-end tag/
A null-end tag consists of the start-tag open symbol followed by the tag identifier, with the textual data enclosed between two null-end tag symbols (slashes).
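For those who would rather let a script do the typing, here is a small Python sketch that expands two of these minimized forms, the null-end tag and the empty end tag, into ordinary tags. It is an illustration only, not an SGML parser: the function names are invented, tag identifiers are assumed to be plain letters and digits without attributes, and the empty start tag <> and the unclosed tags are left alone.

```python
import re

# Assumption for this sketch: tag identifiers are plain letters and digits.
NAME = r"[A-Za-z][A-Za-z0-9]*"
NET_RE = re.compile(rf"<({NAME})/([^/<]*)/")
TAG_RE = re.compile(rf"</>|</({NAME})>|<({NAME})>")


def expand_null_end_tags(text):
    """Rewrite <NAME/data/ as <NAME>data</NAME>."""
    return NET_RE.sub(r"<\1>\2</\1>", text)


def expand_empty_end_tags(text):
    """Replace each empty end tag </> with the end tag of the innermost open element."""
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        if m.group(0) == "</>":
            # Close whatever was opened last; leave </> alone if nothing is open.
            out.append(f"</{stack.pop()}>" if stack else "</>")
        elif m.group(1):
            # Explicit end tag: forget the matching element and anything nested in it.
            if m.group(1) in stack:
                while stack.pop() != m.group(1):
                    pass
            out.append(m.group(0))
        else:
            # Ordinary start tag: remember it as the innermost open element.
            stack.append(m.group(2))
            out.append(m.group(0))
    out.append(text[pos:])
    return "".join(out)
```

Feeding it the header sample gives <H1>Header with null-end tag</H1>, and the bold sample gets its </> turned into </B>. Tag names are compared case-sensitively here, which is a simplification.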
So far I've stressed the following browsers:
Needless to say, none of them was capable of handling minimized tags.
I have been told that the Harmony Hyper-G Text Viewer can cope with MINIMIZED tags, but since it cannot work from behind a firewall, I was unable to verify its capabilities.
The experience with the Arena browser was the most inspiring. This browser is supposed to be a testbed for the upcoming HTML-3.0 standard and has a little indicator telling you whether an HTML document is bad (i.e. incorrect) or not. For all the sample pages shown above that use minimized tags, it undoubtedly flashed the "Bad HTML" sign. But these pages were checked against Strict HTML-3.0! What HTML are we talking about after all?
Oh, yes - if you think this is all a joke - go to the HTML Validation Service and check it for yourself.
During HTML validation, I found some things that contradicted something I'd heard just before. Nothing serious, just another little splash of critical paranoia:
"Bad"
<h1>What not to do</h1> <p>This is like bad or something...
"Good"
<h1>What to do</h1> This is like good <p>or something...<p>
Without much hesitation, I fed both examples to the HTML Validation Service and slammed them against strict HTML-2.0. As I expected, the results were exactly the reverse of the names of the examples: the "Bad" example passed the test and the "Good" example caused the wrath of the compiler.
After consulting the HTML-2.0 spec, I found that the Validation Service was 100% right (it would be surprising if it weren't). In case someone doesn't know: in HTML-2.0 a paragraph is a non-empty element with a mandatory start tag and an optional end tag (<P> and </P> respectively). Therefore a paragraph element in HTML-2.0 can contain an arbitrary number of subelements - text data, anchors, emphasis, etc. In HTML-1.0 a paragraph was an EMPTY element, which actually represented not a paragraph but rather a paragraph break, and had only a start tag, <P> -- similar to the line break <BR>. I failed to find the HTML-1 DTD, but I think things are pretty close to what I've described.
Note that wedging paragraph tags within <h1>...</h1> is an error, so don't try to catch me on this.
Another note: non-strict HTML-2.0 is more relaxed, so both examples would be ok.
After testing there was one question left - where did such an interesting HTML specification page come from? My guess (and I think I'm right with probability about 0.99): these were the remnants of the HTML-1.0 spec, safely decomposing in one of the dark corners of the W3 Consortium. I failed to find the head or TOC of this document. Interestingly enough, the link to the HTML-1.0 spec on the W3 page was hopelessly broken too.
Looking at different HTML tutorials I found a surprising multitude of opinions on what should be considered a minimal HTML document. Id est, what is the minimal amount of tags you should put in to make your ASCII text look like a valid HTML document? In other words, where is the thin metaphysical boundary beyond which plain ASCII text becomes HYPER? To cut off the fuzziness of the word "valid", I decided valid would mean fully conforming to the HTML-2.0 (strict) specification (or DTD -- for those who behold). Since I wasn't sure myself what the minimal valid document is, I dug up the HTML-2.0 spec and stared at it for a moment.
I found the following amazing (or maybe not) facts:
Summing up all of the above, a minimal document would look like:
<HTML> <HEAD> <TITLE>Minimal HTML Document</TITLE> </HEAD> <BODY> </BODY> </HTML>
But... all the elements except TITLE happen to have optional start and end tags! So unless you are a typing maniac, the minimal HTML document would be:
<TITLE>Minimal HTML Document</TITLE>
If an HTML document is to convey any sort of information, the minimal strict HTML-2.0 conforming document would be (note the <P> tag!):
<TITLE>Minimal HTML Document</TITLE> <P>Some text without any spark of sense.
Oh, yes, if we want to treat our minimal HTML document as an SGML one, the document type declaration should precede everything, like:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <TITLE>Minimal HTML Document</TITLE>
-- but now we are starting to play by the SGML rules, and no browser can stand where SGML reigns...
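The "TITLE is all you really have to type" observation can be poked at with Python's standard html.parser module. This is a rough sketch and by no means a validator against the HTML-2.0 DTD: the class TitleFinder and the function minimal_title are names invented here, and the only thing checked is whether a TITLE element shows up at all.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text of the TITLE element, if there is one."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None  # stays None when no TITLE start tag is seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag names in lower case.
        if tag == "title":
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def minimal_title(doc):
    """Return the TITLE text of doc, or None if TITLE is missing."""
    finder = TitleFinder()
    finder.feed(doc)
    finder.close()
    return finder.title
```

Each of the minimal documents shown above yields its title, with or without the DOCTYPE line in front, while a bare <P>Some text gives None.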
The non-breaking space can be used whenever you want to protect your precious spaces from an all-spaces-in-one jamming browser.
The non-breaking space has the value 160; its symbol is &nbsp; and its code is &#160;. The code &#160; is a part of HTML-2.0, by the way.
Some browsers, like any version of X-Mosaic, ignore both &nbsp; and &#160;, or, like Arena, only one of them, while Mosaic for Windows and Netscape (for everything) can cope with both.
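To see that the symbol &nbsp; and the code &#160; name the same character, here is a short Python sketch using the standard html module. The helper protect_spaces is a made-up name for this illustration; it simply swaps every ordinary space for the numeric code.

```python
import html

# Character number 160 is the non-breaking space.
NBSP = chr(160)


def protect_spaces(text):
    """Replace every ordinary space with the numeric reference &#160;."""
    return text.replace(" ", "&#160;")


def decode(markup):
    """Turn character references such as &nbsp; or &#160; into characters."""
    return html.unescape(markup)
```

Both decode("&nbsp;") and decode("&#160;") give the same single character with value 160, and protect_spaces("a  b") gives a&#160;&#160;b, so the two spaces survive a browser that would otherwise jam them into one.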
It should be noted that a proportional font (the default in many browsers) usually has a pretty narrow space character, so it is advisable to switch to a fixed font before using the non-breaking space.
Here's an example:
<dl><dd> <tt> </tt>Look, here's some paragraph with indent,<br> whoa -- check it out. </dl>
will look like
A useful side effect of the non-breaking space is that browsers do not consider it a space at all, so it can be used whenever you want to protect words from breaking apart.
Now that I've had enough of the subject, it is just the right time to outline what I've tried to tell and what would be the best approach to coping with HTML:
HAIL to folks at WebTechs (formerly HAL) for the pretty useful HTML Validation Service referred to throughout this manuscript. It saved me quite some time on running SP manually.
HAIL to James Clark @ jclark.com, creator of the most profound SGML parser so far. One of the previous versions of this parser is used in the HTML Validation Service.
Beyond the dark side: loads of incredibly odd information about tables in TABLEMAQUIA.