Penguin

A "word processor" that is part of the MicrosoftOffice suite.


Some people use it to write emails or web pages via Word's "save as html" feature. This is a bad idea, because the feature produces horrible HTML: deprecated font tags, redundant inline styles, and Word-specific class names on nearly every element. This is (most likely) deliberate, designed to make the code render better in the InternetExplorer browser than in other browsers. Some example generated HTML:

<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 10 (filtered)">
...
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=1 color=navy face=Verdana><span
  style='font-size:9.0pt;font-family:Verdana;color:navy'>Hello,</span></font></p>
<p class=MsoNormal><font size=1 color=navy face=Verdana><span
  style='font-size:9.0pt;font-family:Verdana;color:navy'>&nbsp;</span></font></p>
...
<p class=MsoNormal><font size=1 color=navy face=Verdana><span
  style='font-size:9.0pt;font-family:Verdana;color:navy'>&nbsp;</span></font></p>

<table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0
 style='border-collapse:collapse;border:none'>
 <tr height=14 style='height:10.7pt'>
  <td width=476 colspan=2 height=14 valign=top style='width:357.1pt;
   border: solid windowtext 1.0pt;padding:0in 5.4pt 0in 5.4pt;height:10.7pt'>
  <p class=MsoNormal><b><font size=2 face=Verdana><span
   style='font-size:10.0pt;font-family:Verdana;font-weight:bold'>August </span></font></b></p>
  </td>
 </tr>
 <tr height=57 style='height:42.9pt'>
  <td width=69 height=57 valign=top style='width:51.4pt;border:solid windowtext 1.0pt;
    border-top:none;padding:0in 5.4pt 0in 5.4pt;height:42.9pt'>
  <p class=MsoNormal><font size=2 face=Verdana><span style='font-size:10.0pt;
    font-family:Verdana'>8-12</span></font></p>
  </td>
...

There is a tool that cleans up this mess; it is aptly named demoroniser. Another, more general-purpose one is HTML Tidy.
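
To give a rough idea of what such a cleanup involves, here is a minimal Python sketch that strips some of the noise seen in the sample above. It is not how demoroniser or HTML Tidy work internally; the function name strip_word_markup is made up for this illustration, and regular expressions are a fragile way to process HTML (nested spans, for instance, are not handled), so treat it as a toy rather than a replacement for those tools.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove some presentational noise from Word HTML (illustrative only)."""
    # Drop <font ...> and </font> tags, keeping whatever they wrap.
    html = re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop inline style attributes, whether quoted with ' or ".
    html = re.sub(r"""\s+style=(?:'[^']*'|"[^"]*")""", "", html,
                  flags=re.IGNORECASE)
    # Drop Word's Mso* class attributes (unquoted in the sample above).
    html = re.sub(r"\s+class=[\"']?Mso\w+[\"']?", "", html,
                  flags=re.IGNORECASE)
    # Unwrap <span> elements that are left without any attributes.
    html = re.sub(r"<span\s*>(.*?)</span>", r"\1", html,
                  flags=re.IGNORECASE | re.DOTALL)
    return html


# First paragraph of the sample above.
sample = (
    "<p class=MsoNormal><font size=1 color=navy face=Verdana><span\n"
    "  style='font-size:9.0pt;font-family:Verdana;color:navy'>Hello,</span></font></p>"
)
print(strip_word_markup(sample))  # prints <p>Hello,</p>
```

Run on the first paragraph of the sample, this reduces five nested layers of markup to a plain paragraph element, which is all the original text needed.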


Apparently the latest versions of MS Word have a feature to produce cleaner HTML.