On Word and HTML

I won’t heap any more scorn on MS Word as an HTML generator; the flaws of Word in that regard are fairly well documented.

If I have a Word document I want to render in HTML, I save it as a plain-text file, which strips out every little formatting quirk, and then build the HTML markup around the plain text in an HTML editor. (I use two different editors, depending on what I need to do: Arachnophilia for quickie jobs, and Hot Dog Professional for heavy-duty jobs that require tables, forms, that sort of thing.) If it’s a Trek story, re-italicizing a ship name isn’t difficult; a search-and-replace puts the italics tags back around the name of the ship. I don’t do a lot of italicizing outside of that, nor do I do much in the way of boldface or underlining.
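For anyone who would rather script that search-and-replace than do it by hand in an editor, here is a minimal Python sketch. It assumes the story has already been saved as plain text with blank lines between paragraphs, and that you supply the list of ship names yourself; the function name `plain_text_to_html` is made up for illustration and isn’t part of either editor.

```python
import html
import re


def plain_text_to_html(text, ship_names):
    """Turn a plain-text story into simple HTML paragraphs.

    Paragraphs are assumed to be separated by blank lines, and
    `ship_names` is a caller-supplied list; nothing here detects
    ship names on its own.
    """
    pattern = None
    if ship_names:
        # Longest names first, so "Enterprise-D" wins over "Enterprise".
        names = sorted(ship_names, key=len, reverse=True)
        alternatives = "|".join(re.escape(html.escape(n, quote=False)) for n in names)
        # \b keeps "Enterprise" from matching inside "Enterprises".
        pattern = re.compile(r"\b(?:" + alternatives + r")\b")

    paragraphs = []
    # Normalize Windows line endings before splitting on blank lines.
    for chunk in text.replace("\r\n", "\n").split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Escape &, < and > first so the story text can't break the markup.
        escaped = html.escape(chunk, quote=False)
        if pattern is not None:
            escaped = pattern.sub(lambda m: "<i>" + m.group(0) + "</i>", escaped)
        paragraphs.append("<p>" + escaped + "</p>")
    return "\n".join(paragraphs)


story = "The Enterprise left orbit.\n\nThe Defiant stayed behind."
print(plain_text_to_html(story, ["Enterprise", "Defiant"]))
# <p>The <i>Enterprise</i> left orbit.</p>
# <p>The <i>Defiant</i> stayed behind.</p>
```

Matching all the names in a single pass, longest first, means a name that contains another one (say, a ship and its refit) is wrapped once rather than getting nested italics tags.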

Style sheets. (I said I wasn’t going to heap more scorn on Word; well, I’ve changed my mind.) The one thing Word doesn’t do is style sheets: the layout of a Word HTML document is embedded in the document itself. One of the truly sucky jobs I did a few years ago was taking a very large HTML document that had been produced in Word and reworking it to use a style sheet, because the document’s maintainer wanted to be able to update it easily in the future. The problem was that the signal-to-noise ratio in the document was extremely low (Word puts way too many formatting tags into a document, sometimes in the middle of words), and I spent a week going line by line, stripping every one of those tags out. Generating a style sheet to handle the formatting took all of about fifteen minutes, and the finished document was 70 percent smaller than the original.
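Part of that cleanup can be automated. What follows is a rough Python sketch using the standard library’s `html.parser`; it isn’t the process described above, which was done by hand. It keeps a small whitelist of structural tags (the `KEEP_TAGS` set is an assumption, so adjust it to taste), throws away `<font>`, `<span>`, Word’s `<o:p>` and every `style` or `class` attribute while keeping the text inside them, and links the result to an external style sheet. The class `WordCleaner`, the function `clean_word_html` and the file name `site.css` are all hypothetical.

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist: structural tags worth keeping. Everything else
# (<font>, <span>, Word's <o:p>, ...) is dropped, but its text is kept.
KEEP_TAGS = {"h1", "h2", "h3", "p", "br", "i", "em", "b", "strong",
             "ul", "ol", "li", "table", "tr", "th", "td", "a"}
# Attributes worth keeping per tag; style, class, lang etc. all go.
KEEP_ATTRS = {"a": {"href"}}
# Containers whose contents are discarded entirely.
SKIP_TAGS = {"head", "style", "script", "xml"}


class WordCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS container

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and not self.skip_depth:
            allowed = KEEP_ATTRS.get(tag, set())
            kept = "".join(
                f' {name}="{html.escape(value or "")}"'
                for name, value in attrs if name in allowed
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def clean_word_html(source, title, stylesheet="site.css"):
    """Strip Word's inline formatting and link an external style sheet."""
    cleaner = WordCleaner()
    cleaner.feed(source)
    cleaner.close()
    body = "".join(cleaner.out).strip()
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f'<link rel="stylesheet" href="{html.escape(stylesheet)}">\n'
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

Once the inline clutter is gone, the formatting Word repeated on every paragraph can be written once, as ordinary CSS rules in the linked file. A tag soup like `<p class=MsoNormal><span style="font-size:12pt"><font face="Times">Hel</font>lo</span><o:p></o:p></p>` comes out as `<p>Hello</p>`.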

Don’t get me wrong: there’s a reason I like writing in Word, and it’s that Word does a lot. But there’s also a reason I convert my documents to another format (plain text or WordPerfect 5.1) when I need to do something else with them: Word produces really bloated documents when it doesn’t need to.
