Better HTML from Word 2007
Wednesday, November 14, 2007
One thing I find myself doing often is copying text from Word into a web page (a really common example is when a client sends over the content for a privacy policy page). Most of you will know that Word sucks when it comes to outputting HTML and gives you much more markup than you need. There is a good reason for this: Word outputs Word-specific markup so that when you reopen a file it saved, no data or formatting is lost. None of that is needed for the web, though.
For example, if I were to write 'Hello world' and make it bold and italic, Word would create this markup:
<p class="MsoNormal"> <b style='mso-bidi-font-weight: normal'><i style='mso-bidi-font-style: normal'> Hello world<o:p></o:p> </i></b> </p>
There is a slightly better way of getting HTML out of Word, however, and although the result is still not as clean as markup you'd write by hand, it's a big improvement. Simply click the 'Office Button' in the top left corner of Word and select 'Save As'. In the 'Save as type' drop-down you should see an option for 'Web Page, Filtered'. This will produce much friendlier markup. This is what the filtered HTML looks like:
<p class=MsoNormal><b><i>Hello world</i></b></p>
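The filtered output still carries a Word-specific class name. If you want to strip that too, something like the following Python snippet might do it. This is only a rough sketch, not anything Word provides: the function name `strip_word_residue` is made up for illustration, and it assumes the only leftovers are `class` attributes starting with `Mso` and empty `<o:p>` tags. Regular expressions are not a real HTML parser, so check the result on anything more complex than simple paragraphs.

```python
import re


def strip_word_residue(html: str) -> str:
    """Remove leftover Word-specific bits from HTML (hypothetical helper)."""
    # Drop class attributes whose value starts with "Mso", quoted or unquoted.
    html = re.sub(r'\s+class=(["\']?)Mso\w*\1', '', html)
    # Drop empty Office namespace tags such as <o:p></o:p>.
    html = re.sub(r'<o:p>\s*</o:p>', '', html)
    return html


filtered = '<p class=MsoNormal><b><i>Hello world</i></b></p>'
print(strip_word_residue(filtered))
# prints: <p><b><i>Hello world</i></b></p>
```

Any inline `style` attributes are left alone on purpose, since you may want to keep some of them or move them into your own stylesheet.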







