XHTML to HTML WordPress plugin
This tiny plugin filters WordPress’s output to produce HTML instead of XHTML. It is simple to use and gives your WordPress blog at least a fighting chance of being properly standards compliant.
Background
The WordPress platform has been based, with the best intentions and from boots to brow, on XHTML for as long as I’ve been using it. That is a shame, really, because that single fact may be preventing your blog from being properly standards compliant.
Put your hands up if you are running a blog that serves documents using a MIME type of “application/xhtml+xml”. What’s that deathly silence I hear – what, nobody is doing that? In that case nobody (regardless of which doctype you are using) is serving proper XHTML and, worse, no one is running a standards compliant website. Perhaps worse still, you probably shouldn’t even try to use an XHTML MIME type on your website. Now I bet you didn’t know that.
The vast, vast majority of people whose blogs are served as MIME type “text/html” should be using the HTML 4.01 doctype rather than XHTML. The issues surrounding this problem are, unfortunately, rather technical, so I will endeavour to write a lay person’s guide to the subject soon, covering the issues a bit more simply than they are covered in most places I know of.
In the meantime there is a very good article by WebDevout about this which is worth reading.
What The Plugin Does
XHTML to HTML is a simple output filter that translates XHTML documents into valid HTML 4.01.
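As a rough illustration of what an output filter of this kind does, here is a minimal sketch in Python. It is not the plugin's actual PHP code: the function name `xhtml_to_html` and the regular expression are assumptions made for this example, and a real filter would also have to deal with the doctype and with XHTML-only attributes.

```python
import re

# Assumed pattern for illustration: optional whitespace, then the XHTML "/>" closer.
SELF_CLOSING = re.compile(r"\s*/>")


def xhtml_to_html(page: str) -> str:
    """Rewrite XHTML-style self-closing tags (<br />) as HTML ones (<br>)."""
    return SELF_CLOSING.sub(">", page)


if __name__ == "__main__":
    print(xhtml_to_html('<p>Hello<br /><img src="a.png" alt="" /></p>'))
```

The whole page is treated as one string and every `/>` is rewritten, which keeps the filter tiny but also means it cannot tell markup apart from anything else that happens to contain those two characters.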
Do I need to write in HTML now?
No, you don’t have to change a thing. All your carefully coded XHTML will translate into pristine HTML 4.01 seamlessly. Remember, WordPress is an XHTML platform – all headers, plugins, themes and filters assume you will be using XHTML as the output doctype. Whilst this is, technically, wrong (unless you use the right MIME type), you can continue to write in XHTML if you want to. Alternatively, you could switch to writing in HTML 4.01. That does not mean writing sloppy, hard-to-read code: keep your tags lowercase if you like, close your tags à la XHTML, use a DOCTYPE and so on, because that’s all valid HTML too!
Installation
- Download the plugin using the link above and extract the ZIP archive onto your computer somewhere.
- Copy the folder “XHTML-to-HTML” to your /wp-content/plugins/ folder.
- Activate the plugin in WordPress.
- That’s it!
Development
The plugin really does very little, which is not a bad thing for an output filter. It has no bugs I’m aware of, but suggestions for improvements are always welcome.
1. Microsoft browsers do not support XML (XHTML is a type of XML). IE7 has limited support but lower versions have none whatsoever. So, unless you’d like to banish that audience from your blog, you can’t even consider trying to use XHTML properly – that is, by using the correct MIME type.

October 16th, 2008 at 2:41 pm
[...] HTML is also valid. The template code validates as HTML, but WordPress sometimes adds XHTML code, which stops it from validating. To fix this you can install this simple plugin [...] (pingback, originally in Spanish)
October 16th, 2008 at 3:05 pm
This plugin seemed to be a quick solution to the “XHTML vs HTML” issue, but instead I had a week of nightmares trying to get the WordPress built-in WYSIWYG editor (as well as TinyMCE Advanced) to work. None of the solutions from the WordPress support forum helped. The only cause of the problem was the “XHTML to HTML” plugin. I was able to get WYSIWYG to work as soon as this plugin was disabled.
You can find the issue and several suggested solutions to the WYSIWYG problem here.
http://wordpress.org/support/topic/164990
BTW, this plugin also changes the look of the “Ozh’ Admin Drop Down Menu” plugin. It is not as serious as the previous issue, but it can be a problem for somebody with a small screen resolution. White space appears between menu entries, and this enlarges the menus considerably.
If you ask me how to fix these issues (and many others that I didn’t experience, but somebody probably did), I would suggest making this plugin work only on the front end of WordPress and disabling it on the backend. That way the admin area is left untouched, which should minimize plugin incompatibilities.
Thank you in advance. Waiting for your answer.
October 16th, 2008 at 3:44 pm
Hi Edward,
I’m sorry you’ve been having trouble with the plugin. This is the first problem that’s been reported, so I’d like to get to the bottom of it.
Please let me know what version of WordPress you’re using. Also, please can you email me the source code of your dashboard page (or whichever page the plugin is messing up) so I can see what’s going on.
I’ll get on to this as soon as possible and hopefully get it fixed for you.
By the way, you make an interesting point about enabling the plugin solely for ‘frontend’ pages. I’ll look at that too.
John.
October 16th, 2008 at 3:49 pm
Dear Milbits,
I’m sorry but my language skills don’t extend to Spanish (I assume you’re Spanish) - I’ll try to get your message translated. Anyway, thanks for the pingback.
October 16th, 2008 at 5:23 pm
Unfortunately, I don’t know your email to send you the source code of my dashboard, but you know mine. Could you send me an email so I’ll reply with attachments?
Thank you for your swift reply.
December 5th, 2008 at 5:49 pm
John,
Nice plugin. I’ve converted it to work with Textpattern, and I’ve added a couple of lines to get rid of the xml:lang=”foo” bits and some erroneous spaces at the end of tags in the header. Here’s what I ended up with on lines 13 and on:
December 5th, 2008 at 5:51 pm
Haha! The plugin works for comments, too, I guess.
December 5th, 2008 at 8:33 pm
Good work Jonathan.
I’ve yet to test your changes but, out of interest, I’m not sure what your $xhtml[7] pattern changes that isn’t already changed by $xhtml[3]. $xhtml[7] just strips out spaces before the closing angle bracket, which isn’t erroneous in either XHTML or HTML. Unless I’ve missed something there.
Glad you found the plugin useful.
John.
December 5th, 2008 at 8:47 pm
Just thought I’d write a quick update regarding Edward’s query.
Turns out the Ozh plugin wasn’t well-formed and was thus being mangled by XHTML-to-HTML (XTH). However, there is a bug in XTH and its interactions with certain TinyMCE functions. This affects the Admin interface only, not the frontend.
In short, XTH needs to ignore [CDATA], in particular, Javascript regex, which obviously use the forward slash extensively. Thus unfortunately at the moment XTH clobbers JS regex.
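To make the failure concrete, here is a small Python sketch of how a blanket `/>` replacement can damage a JavaScript regex literal. The pattern `NAIVE` is an assumed stand-in for illustration, not the plugin's real expression, and the script snippet is made up.

```python
import re

# Assumed stand-in for a naive filter: rewrite every "/>" as ">".
NAIVE = re.compile(r"\s*/>")

# A made-up inline script whose JavaScript regex happens to contain "/>".
script = r"<script>var closer = /\s*\/>/;</script>"

broken = NAIVE.sub(">", script)

if __name__ == "__main__":
    print(script)
    print(broken)  # the escaped slash inside the JavaScript regex is gone
```

After filtering, the JavaScript pattern no longer contains the slash it was written to match, so the script still parses but quietly does something different.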
I’ve simply (and with apologies) not had time to sit down and address this, but I plan to do so as soon as I can. In the meantime it’s still safe to use XTH with your blog’s frontend, though it might cause bugs with the dashboard if you use the Visual editor (I don’t use it, which is why I never noticed the backend bug).
If anyone gets time to address this issue before I do, please post your solution here.
December 5th, 2008 at 10:20 pm
I noticed that there were some tags that still had a space before the bracket. Adding that line removed them. It seems that [3] was cleaning the slash, but not the space.
Then again, [4] should do that for these particular tags. I added [4] after [7]. Some tags are formatted as , others as , and still others as . So [2], [3], and [4] may take care of it without the need of [7].
December 5th, 2008 at 10:21 pm
blah. That should have been
, , and .
December 5th, 2008 at 10:22 pm
Let’s try it with HTML entities:
<foo/>, <foo/ >, and <foo />
December 6th, 2008 at 5:28 pm
Hi Jonathan,
I’m not really a PHP programmer, my main language is Perl. But as far as I know PHP regex are modelled quite closely on Perl regex, and in Perl, $xhtml[3] would clear any forward slash followed by one or more spaces followed by an angle bracket.
If a tag is closed a la HTML, without a slash, it won’t remove trailing space. This is by design, as trailing space is a matter of coding style/preference and is perfectly valid in HTML.
The objective of XTH is to create valid HTML, not stylistically and subjectively pleasing HTML. Seeing as the effects of [4] and [7] are simply stylistic (they don’t affect validity) I’m ambivalent about including them though of course people are free to hack the plugin around to suit their preferred coding styles.
If you’re sure [4] and [7] address validity rather than style, then I would be interested to see an example of an XHTML closing tag that is fixed by $xhtml[7] but not by $xhtml[3], as this might show an important difference between Perl and PHP regex engines.
Cheers again,
John.
December 15th, 2008 at 5:47 am
Hey John,
As you pointed out, [7] is simply for removing space at the end. I didn’t know it wasn’t erroneous. I thought HTML 4.01 Strict was a bit less lenient in this regard, as with using lower-case tags. But again, that’s a matter of which DOCTYPE is chosen. So you’re right in saying something like that should be added by the end user.
[4] isn’t stylistic. I’ve seen tags closed three ways: <foo/>, <foo />, and <foo/ >. [4] just takes care of the last one. But three lines could possibly be reduced to one with something like:
If that’s even possible in Perl/PHP regex style. That’s more like .htaccess style! I don’t know how it would be formed in this case.
December 15th, 2008 at 10:13 am
Hi Jonathan,
I’m trying politely to tell you that you are being seduced by a red herring. As I said, the bit of code you added [4] and [7], whilst you are free to add it, actually creates no improvement in validity over the original.
Please read the HTML and XHTML specifications and try running code through the W3C validators and you’ll see that this is true. I understand you want your code to look a certain way - that’s fine, you can do that, however please understand that it IS a stylistic and not a functional improvement.
For reference you should know that the ONLY valid XHTML self-closing tag is ‘/>’. Space around those two characters is utterly irrelevant and space between them is utterly invalid. XTH already fixes ALL valid XHTML self-closing tags.
BTW, all versions of HTML are case insensitive. HTML Strict and Transitional differ mainly in their support for deprecated tags. If you want to learn about the differences there is some good info on the W3C website.
happy reading,
John.
February 17th, 2009 at 8:09 am
[...] CSS is valid. HTML is also valid but WordPress sometimes adds code in XHTML, so it doesn’t validate. To solve this trouble, you could install this plugin [...]
February 20th, 2009 at 10:06 pm
This is a brilliant and very informative plugin. I am facing an RSS issue though, which occurs when I have your plugin activated: the feeds are not working for me no matter what.
Would you have any suggestion about this?
February 21st, 2009 at 4:20 pm
Hi Calliope,
Unfortunately the plugin is still young and unruly and it interferes with things like JavaScript CDATA, which it shouldn’t. Also, bearing in mind that your RSS feeds are XML, I’m not utterly surprised you’re having a problem.
Can you give me more information about what the set up is (other plugins, how you are doing the RSS) and any error messages you get, to help me target the problem better?
It’s all talk at the moment, but we’re about to start work on a new version of XTH which will only filter WordPress frontend pages and posts (so no backend or RSS filtering). We’ll clean up the parser too. Hope to have this out before Easter.
John.
March 11th, 2009 at 6:32 pm
Thank you for this! Saved me a lot of frustration when a quick Google search led me here.
May 20th, 2009 at 7:38 am
Hi,
Sorry, I’m not that skilled in this topic, so maybe you can enlighten me.
After despairing over my RSS2 feed I found out that your plugin causes some problems with RSS2 validation.
2 examples:
- a proper enclosure tag should look like:
-> with your plugin it looks like:
- including the atom link should look like this:
-> this one doesn’t work:
I guess that because RSS is XML, it insists on being validated as XML and not as HTML? Either we need to remove whitespace (I didn’t test it) or stop the plugin from filtering feeds.
So, again, I don’t really have an idea what I’m talking about - any help would be appreciated.
Regards,
Chris
May 20th, 2009 at 7:39 am
ok.. no code - anyway - this one: text /> ends up this way: text >
May 20th, 2009 at 12:05 pm
Hi Chris,
RE your 2nd message.
That behaviour is what XTH is supposed to do: remove xhtml end tags, leaving you with a valid html end tag.
There is a problem with XTH and certain sorts of RSS feed and also with some JavaScript. I haven’t had the time to make it fully compatible with either yet - I was supposed to do this at Easter but am so busy I haven’t had time.
I’m afraid as it stands you might have to disable XTH if you decide that your RSS feed is more important. I will release a new version as soon as I can.
best wishes,
John.
May 25th, 2009 at 12:16 pm
Hi John - yes, I know that’s what it is supposed to do, but it seems this is not so good for the RSS pages. I extended your plugin (with my limited knowledge, so there is probably a more elegant way) to check the current URL and exclude all RSS URLs.
check url: http://www.bradino.com/php/get-current-url/
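The idea of skipping feed URLs can be sketched as follows, again in Python rather than the plugin's PHP. The helper names `looks_like_feed` and `filter_page` are hypothetical, and the URL test assumes the common WordPress feed addresses (a `feed` path segment such as `/feed/`, or a `?feed=rss2` query string); a site with custom feed URLs would need a different check.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed stand-in for the plugin's replacement of "/>" with ">".
SELF_CLOSING = re.compile(r"\s*/>")


def looks_like_feed(url: str) -> bool:
    """Guess whether a URL points at a feed (hypothetical helper)."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return "feed" in segments or "feed" in parse_qs(parts.query)


def filter_page(url: str, body: str) -> str:
    """Leave feeds (XML) untouched; filter everything else."""
    if looks_like_feed(url):
        return body
    return SELF_CLOSING.sub(">", body)
```

A post whose slug is literally "feed" would be skipped too, so inside WordPress itself the built-in `is_feed()` conditional is probably a safer test than inspecting the URL.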
June 11th, 2009 at 9:42 am
Hi John, nice plugin. The reason I like your idea so much is that, while I’m seriously bending toward moving “back” to HTML, I’m addicted to the W3C validator’s precision in checking my markup which I’ll lose if I code in native HTML4, even /Strict. I love having my cake and eating it too.
The criticism that your logic tramples CDATA is significant. You could lessen the likelihood that your plugin would interfere with non-HTML elements by being more specific about those empty element closures: Instead of replacing ‘/\/>/’ with ” you could replace ‘@(@’ with ‘$1>’.
Or specify all of the empty XHTML elements:
‘@(@’
You could also improve on .* by excluding ; and other characters that might occur in JavaScript and more to the point aren’t permitted in valid XHTML.
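The patterns in this comment did not survive posting, so the following is only a sketch of the general idea and not a reconstruction of them: limit the replacement to the elements that HTML 4.01 defines as empty. It is written in Python, the names `EMPTY_TAG` and `close_empty_elements` are made up, and it uses `[^<>]*?` for the attribute text instead of `.*`.

```python
import re

# The empty elements of HTML 4.01.
EMPTY = "area|base|basefont|br|col|frame|hr|img|input|isindex|link|meta|param"

# "<name", attribute text containing no angle brackets, optional space, "/>".
EMPTY_TAG = re.compile(r"<(%s)\b([^<>]*?)\s*/>" % EMPTY, re.IGNORECASE)


def close_empty_elements(page: str) -> str:
    """Rewrite <br /> as <br>, but only for known empty elements."""
    return EMPTY_TAG.sub(r"<\1\2>", page)
```

Because the match must start at a known tag name, a `/>` inside a JavaScript regex or on an RSS element such as `<enclosure ... />` is left alone, although an inline script that contains one of these tags as a string would still be rewritten.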
To be rigorous, though, you should really smarten your code enough to exclude inline script, CDATA, and HTML comments. In PHP there’s a handy fourth argument ‘flags’ to preg_split() that lets you capture the delimiters that split a string into an array. Using this you can splinter a page of HTML with several patterns at once such as:
‘((<!--)|(<\/)|(<))’ to split on <!--, </, and <
Then as you walk the output array it’s easy to see which delimiting strings precede each chunk of content, and perform operations on only the ones you want. Applying this technique to <script, <style, and XHTML’s funky CDATA escapes would be easy.
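A variant of this technique can be sketched in Python, where `re.split` keeps the delimiters when the pattern contains a capturing group, much like the PHP flag mentioned above. This sketch differs from the description in one respect: it captures whole protected regions (scripts, styles, comments, CDATA sections) rather than only their opening delimiters. All names and patterns here are assumptions for illustration.

```python
import re

# One capturing group, so re.split() returns the protected regions as well.
PROTECTED = re.compile(
    r"(<script\b.*?</script>|<style\b.*?</style>|<!--.*?-->|<!\[CDATA\[.*?\]\]>)",
    re.DOTALL | re.IGNORECASE,
)
# Assumed stand-in for the plugin's replacement of "/>" with ">".
SELF_CLOSING = re.compile(r"\s*/>")


def filter_outside_protected(page: str) -> str:
    """Apply the filter only to chunks that are not script/style/comment/CDATA."""
    chunks = PROTECTED.split(page)
    # Even indexes hold ordinary markup; odd indexes hold the captured regions.
    for i in range(0, len(chunks), 2):
        chunks[i] = SELF_CLOSING.sub(">", chunks[i])
    return "".join(chunks)
```

Walking the list by index keeps the original order, so joining the chunks reproduces the page with only the ordinary markup changed.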
Finally, a minor coding note: using slashes as your regexp delimiters means having to escape the slashes that occur within your patterns. Since those slashes are integral to the task at hand, why not simply use another delimiter (I chose @ above) to make your code easier for us hyoomins to read?
Regards,
Paul
June 11th, 2009 at 10:28 am
Hey Paul,
>> I’m seriously bending toward moving “back” to HTML
I like to see it more as a preparatory move forwards to the next markup standard, HTML 5.
Thanks for your comments on the XTH plugin. We’ve been aware for many months that it dorks on CDATA and some RSS feeds. It was never meant as more than a ready-roll 5-minute monkey wrench to get the job done for now. Trouble is ‘for now’ has lasted eight months and there’s no obvious sign when there’ll be enough time to come back to it. Though it’ll happen at some point.
If we’d intended to be ‘rigorous’ there are a number of things we’d have done, including choosing a wholly different way of accomplishing this task.
But as far as your difficulty reading the delimiters is concerned: I’m sorry if you find it hard to read, my background is in Perl rather than PHP and I’ve noticed generally that Perl programmers seem to have fewer probs reading escapes than do people who use PHP.
Basically, feel free to change the code as you like! If you get your ideas working please send us a copy.
best wishes,
John.
June 28th, 2009 at 4:09 am
P.S. Watch out for those pesky quotes/double quotes. I neglected to encode them before posting.
P.P.S. I love this plugin — thank you for promoting forward movement in html standards. Do you accept donations?