Thursday, September 23, 2004

WYSIWYG Considered Harmful

The Productivity Promise


Growing up in the '80s, I was able not just to witness the rise of the personal computer, but to completely immerse myself in the mind-boggling growth of the technology. In those days the immense curiosity surrounding PCs was often quickly followed by the question "what will it do for me?" The answer typically included a promise that a personal computer would make you more productive.

One of the most highly touted productivity tools was the word processor. With it you could create finished documents in a fraction of the time that it would take using the old method of typing, editing, and retyping. It worked ... for a while. Early PC word processors were little more than text editors. You typed in paragraphs of text, edited the content until it said what you wanted to say without spelling or grammatical errors, and then you saved or printed your document.

For casual correspondence and personal or company documents that was usually enough. For material destined for publication, worrying about making that material aesthetically pleasing was the job of a typesetter. The author (and their word processing software) simply generated and organized the content. When the author wanted to start a new section they would annotate the text accordingly. A new page starting with the line "Section - How to be a Typesetter" was all that was required. The author didn't worry about the fact that the section title needed to be formatted with a bold typeface in a larger font size with the correct section number, margins, etc. And since only the printing presses at a publishing company could produce text with those visual effects, and since publishing companies kept professional typesetters on staff, that approach worked reasonably well.

How to be a Typesetter in 10 Easy Lessons


In the mid-'80s everything changed. The graphical user interface and the laser printer (two technologies mainstreamed by Apple Computer) suddenly brought publishing technology to the desktop. With desktop publishing came a new type of word processing software -- "What You See Is What You Get" (WYSIWYG). Now your computer display could show your document as you edited it with the exact same appearance it would have when printed. Users could use proportional typefaces, switch fonts to emphasize chapter and section headings, adjust margins, use bold and italic faces for emphasis, and more. It seemed like a good idea at the time.

The problem is that most users didn't (and still don't) understand the "rules" of typesetting. Early WYSIWYG documents were often an eye-straining jumble of random fonts and bad page layout. Users were forced to begin practicing the typesetter's art without any education or training. Years of bad documents have led many writers to reinvent some of the basic rules of typesetting (e.g. no more than two typefaces in a document), but it's been a long, hard road.

The more insidious problem brought on by the proliferation of WYSIWYG is that authors have to be typesetters. And the interactive nature of WYSIWYG encourages most to try to wear both hats at the same time. The process of getting thoughts and ideas captured in the word processor is constantly interrupted by the distraction of managing the content's appearance. The net result is a sharp decline in productivity as every writer now takes on an additional part-time job as a typesetter.

Imagine, if you will, Twain or Poe spending hours playing with font settings trying to get the chapter headings to look just right. Most would see that as a colossal waste of talent. So why is it that authors today are so willing to waste their considerable talent worrying about how their document will be typeset? The simple fact of the matter is that most authors would be far better off abandoning their WYSIWYG word processors and creating their documents in nothing more than a text editor. Without the distraction of typesetting, ideas can be captured efficiently and without interruption. The writer remains totally focused on content because there is nothing else. When the time comes to produce a typeset version of the document, we go back to the old way of letting a skilled typesetter deal with that task. Of course, nothing says that your "skilled typesetter" needs to be human.

Will Typeset for Electrons


There are a variety of computer programs that are designed to take plain text and typeset it for the screen or for the printed page. The most common example is your web browser. Now, in order for a software typesetter to work, it has to be able to identify the various parts of your document. You need to be able to tell it which lines are chapter and section headings and which are body text. You need to let it know which sections of text should be handled as bulleted or enumerated lists. You need to be able to identify words or phrases that should be typeset with emphasis such as italics or bold face.

The typesetter in your web browser relies on finding special symbols (also referred to as "tags" or "mark-up") embedded in the text. Those tags communicate the semantic nature of the text that follows. A first level section heading, for example, gets bracketed with the "tags" <H1> and </H1>. The HyperText Mark-up Language defines a rich set of such tags and authors who create documents specifically for the web often work by creating plain text documents marked up with HTML tags.

Unfortunately, HyperText Mark-up Language (HTML) was designed with an emphasis on the hypertext aspect of web pages with little thought given to the actual typesetting (especially in print output). New versions of HTML provide better support for typesetting, but the resulting tags are complicated, messy, and often distract from the task of writing just as much as WYSIWYG editing does.

Another popular software typesetter is LaTeX. LaTeX's claim to fame is its ability to produce virtually pixel-perfect print output. Like HTML, LaTeX defines a "language" for annotating your text such that the software will know what to do with it. Some may argue that LaTeX mark-up is less intrusive than HTML, but the fact remains that both are designed to be "computer friendly" and that places a substantial burden on the writer. The writer no longer has to deal with the intricacies of typesetting, but he or she is still required to interrupt the flow of ideas and bounce between natural language and an artificial computer mark-up language.

Computers are Supposed to be Smart


What if I just open up my text editor of choice and create something like this:


Document Requirements
=====================

For engineering documentation to be *really* useful, it needs
to be produced, maintained, and stored in electronic formats.
Further, a few requirements must be imposed upon it:

* All developers need to be able to create, read, and modify
  the documents.
* There must be a way to track document changes and recover
  older versions.
* The files need to be usable for many years.


That's perfectly readable and I never once had to think about typesetting or switch between English and some computer language. It's fairly obvious that "Document Requirements" is a section title and the "*" characters represent list bullets. I didn't have to know that to get a proper heading I should select the "Document Requirements" line and then click on the "Styles" pull-down in the toolbar and select "Heading 1". Likewise, I didn't need to worry about changing the font size and weight of the heading text to make it stand out (in a text editor I couldn't even if I wanted to). I didn't have to remember that bracketing the text with <H1> and </H1> produces a first level heading, nor does my brain have to filter those cryptic tags when I go back and read the text. Now that's efficiency! The fact is, people have been writing like that in email for years (at least until recently, when WYSIWYG started getting its ugly little fingers into email clients).

If the structure of the text in the example above can be intuitively obvious to you and me, it seems that with all the processing power in today's personal computers someone should be able to create software that can recognize that text structure as well.
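
To show that the idea is not far-fetched, here is a toy sketch in Python. It only guesses at the three conventions used in the example above (a title underlined with "=", "*" bullets, and *emphasis*) and turns them into HTML tags. The function names `to_html` and `inline` are made up for illustration, and real text would need many more rules than this.

```python
import html
import re


def inline(text: str) -> str:
    """Escape HTML-special characters, then turn *word* into emphasis."""
    return re.sub(r"\*(\S.*?)\*", r"<em>\1</em>", html.escape(text))


def to_html(source: str) -> str:
    """Guess the structure of plain text from its shape and emit HTML."""
    out = []
    # Blank lines separate blocks: headings, lists, and paragraphs.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # A heading is a line of text underlined by "=" at least as long.
        underlined = (
            len(lines) == 2
            and set(lines[1]) == {"="}
            and len(lines[1]) >= len(lines[0])
        )
        if underlined:
            out.append(f"<h1>{inline(lines[0])}</h1>")
        elif lines[0].startswith("* "):
            items = []
            for line in lines:
                if line.startswith("* "):
                    items.append(line[2:])    # a new bullet starts here
                else:
                    items[-1] += " " + line   # continuation of the previous bullet
            out.append("<ul>")
            out.extend(f"<li>{inline(item)}</li>" for item in items)
            out.append("</ul>")
        else:
            # Anything else is body text; rejoin the wrapped lines.
            out.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(out)


sample = """\
Document Requirements
=====================

For engineering documentation to be *really* useful, it needs
to be produced, maintained, and stored in electronic formats.

* All developers need to be able to create, read, and modify
  the documents.
* The files need to be usable for many years.
"""

print(to_html(sample))
```

A few dozen lines are enough to pick out the heading, the paragraph, and the list without the writer typing a single tag. The hard part is covering everything else a real document contains.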

reStructuredText


It just so happens that someone did! There is a program (actually a collection of programs) going under the nondescript name Docutils. With Docutils you can create plain text documents that follow a few simple natural (human friendly) formatting rules and then Docutils can "magically" figure out what's what and add in all the computer friendly typesetting mark-up after the fact. Pass Docutils output through LaTeX or a web browser or some other typesetting tool and you'll have a perfectly typeset document without ever thinking about typesetting details.

Those simple natural formatting rules are defined in a standard called reStructuredText. The example above follows the reStructuredText standard and gives you a glimpse of reStructuredText's most basic formatting rules. In addition to the headings and bullets the example shows, reStructuredText lets you create tables, numbered lists, footnotes, citations, bold, italics, hyperlinks, and even inline graphics ... all without any complex mark-up language.

reStructuredText won't let you arbitrarily change the font for a chunk of text to 28pt bold italic. Nor does it contain extensive features for embedding graphics other than ordinary bitmap (JPEG and GIF) images. If the task at hand is mostly visual (for example creating a marketing brochure), reStructuredText is not the way to do it. But if your focus is content and your goal is to put information and ideas into written words, then reStructuredText is perfect.

This entire document was written as reStructuredText. To get a better feel for what it's like to write and edit reStructuredText, check out the raw version and compare that to the PDF version. I challenge anyone to provide convincing evidence that creating the same article properly structured and typeset using a WYSIWYG word processor would be faster or easier. I'm not even going to venture into the discussion of storage efficiency, portability, longevity, and economics associated with using reStructuredText instead of native WYSIWYG word processor formats, but suffice it to say that efficient writing is not the only potential benefit of reStructuredText.

My next article will dig a little deeper into the rules of reStructuredText and the tools in the Docutils package. However, the impatient need not wait. The Docutils website is loaded with information on both. And if you liked this article, [Cottrell] has another article on the evils of WYSIWYG that you may find interesting.





[Cottrell] Allin Cottrell, "Word Processors: Stupid and Inefficient."
http://www.ecn.wfu.edu/~cottrell/wp.html