ExSponge

A real-time HTML filter and Rich Text / WYSIWYG / Microsoft Word cleanup plugin for ExpressionEngine

This plugin cleans up the mess your clients leave behind!

Whether your markup was entered via WYSIWYG (Rich Text) editors (such as TinyMCE, CKEditor, FCKEditor, Expresso, Wyvern, Wygwam, Blogger's online editor, and ExpressionEngine's own built-in Rich Text Editor), pasted in from Microsoft Word or Adobe InDesign, or bulk-imported from XML or another CMS, ExSponge leaves it properly formatted and free of layout-breaking cruft.

It will also optionally remove all tags, or keep only the tags you want. And you can even trim the fully filtered, cruft-free content down to a specified number of paragraphs.

This plugin is for developers who want neatly formatted paragraphs with minimal, semantic styling, and who do not want the proprietary tags and unnecessary parameters inserted by word processors (or the "tag soup" unwittingly generated by clients) compromising their layout.

Although undoubtedly less comprehensive than HTML TIDY or HTML Purifier, it is more efficient, easier to set up, and focused on the specific problems you are likely to encounter if you give your clients a WYSIWYG field with which to edit their channel entries. In my worst-case test (a Microsoft Word document exported to HTML and pasted into an EE Rich Text field), ExSponge reduced the data size by 97% without any loss of content.

Some of what is removed by default:

In addition, special HTML entities are converted to their ASCII equivalents, special Word characters are converted to UTF-8, unterminated tags are closed, and non-breaking spaces (&nbsp;) are converted to normal spaces. If table tags are to be removed, table text is reformatted first. Paragraph formatting is given special attention, and missing paragraph start and end tags are inserted where needed.

The final output will be compact, tidy, and ready to use in your layout.

Demo

A live demonstration of ExSponge is available here:

http://fcgrx.com/sponge

Installation

Place the "ex_sponge" folder in your "system/expressionengine/third_party" folder.

Parameters

All parameters are optional:

allow_tags - Controls which HTML tags survive filtering. "no" removes all HTML tags from the markup and leaves only raw, unformatted text. "safe" removes most tags but keeps the most useful and safe ones (the equivalent of "<a><i><em><strong><cite><code><ul><ol><li><dl><dt><dd><img><h1><h2><h3><h4><h5><h6><br><p><b><blockquote>"). "minimal" strips everything except a minimal set (the equivalent of "<p><br><b><a><i><em><strong><ul><ol><li><img><h1><h2><h3><h4><h5><h6><blockquote>"). Alternatively, list the tags you want to keep and all others will be stripped. Tip: if you set this parameter to "<p>", text will be reduced to paragraphs only. Note that out-of-scope tags (html, head, link, header, footer, etc.) will be removed regardless. (Default = "safe")
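
As a rough illustration, the three kinds of value (a preset, "no", and a custom list) could be used like this. The placeholder content and the particular custom list are only examples; {!-- --} is ExpressionEngine's template comment syntax.

{!-- preset: keep only the "minimal" tag set --}
{exp:ex_sponge allow_tags="minimal"}
    ( your mess goes here )
{/exp:ex_sponge}

{!-- strip every tag, leaving unformatted text --}
{exp:ex_sponge allow_tags="no"}
    ( your mess goes here )
{/exp:ex_sponge}

{!-- custom list: keep paragraphs, links and blockquotes only --}
{exp:ex_sponge allow_tags="<p><a><blockquote>"}
    ( your mess goes here )
{/exp:ex_sponge}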

allow_breaks - Allow <br> tags to remain as-is ("yes"), or convert double-breaks (<br><br>) to paragraphs while leaving single breaks alone ("single"), or consolidate all breaks into paragraphs ("no"). (Default = "no")
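
A small sketch of the "single" setting, using made-up sample markup. Going by the description above, the double break after the first sentence should become a paragraph boundary, while the single break inside the address should be left alone. The exact output markup is deliberately not shown, since it depends on the other defaults.

{exp:ex_sponge allow_breaks="single"}
    First sentence of the article.<br><br>
    123 Example Street<br>
    Springfield
{/exp:ex_sponge}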

allow_parameters - Allow tag parameters to remain ("yes"), strip all but the most necessary ("no", which is the equivalent of "alt|href|src|title"), or strip all parameters except the ones you list. (Default = "no")
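
For example, to keep class names in addition to the default set, a custom list might look like the sketch below. It assumes a custom list is written in the same pipe-delimited form as the default equivalent shown above; the sample markup is invented for illustration. Under that assumption, class would be kept while style and target would be stripped.

{exp:ex_sponge allow_parameters="href|src|alt|title|class"}
    <p class="intro" style="margin:0"><a href="/about" target="_blank">About us</a></p>
{/exp:ex_sponge}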

convert_tags - Convert the presentational tags <i>, <b>, <s>, and <strike> to the semantic tags <em>, <strong>, <del>, and <ins> ("yes"), or leave them as-is ("no"). (Default = "yes")

paragraphs - Clip the text after a specified number of paragraphs. Any positive number ("1", "4", "9999") will cause the text to be trimmed. "-1" will not clip the text at all. (Default = "-1")
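
One possible use is a short teaser: combining paragraphs with the allow_tags tip above should reduce the content to plain paragraphs and keep only the first one. This is a sketch built from the parameter descriptions, not a tested recipe.

{exp:ex_sponge allow_tags="<p>" paragraphs="1"}
    ( your mess goes here )
{/exp:ex_sponge}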

Usage

To use this plugin, simply wrap the text you want processed in this tag pair:

{exp:ex_sponge}
    ( your mess goes here )
{/exp:ex_sponge}

In my templates, I typically wrap the above tag (with no parameters) around the output of any Rich Text or WYSIWYG field the client is allowed to edit.
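
In practice that might look like the sketch below, where the channel name "news" and the custom field {body} are hypothetical and stand in for your own. As with other plugins nested inside a channel entries loop, the field tag should be replaced with the entry's content before ExSponge processes it.

{exp:channel:entries channel="news" limit="1"}
    <h1>{title}</h1>
    {exp:ex_sponge}
        {body}
    {/exp:ex_sponge}
{/exp:channel:entries}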

A more complex example, which reduces the markup down to the basics, keeps only the first four paragraphs, and takes advantage of EE's built-in tag caching:

{exp:ex_sponge allow_tags="<p><strong><em><ul><li>" 
 allow_parameters="href|src|alt|title"
 paragraphs="4" cache="yes" refresh="1440"}
    ( your mess goes here )
{/exp:ex_sponge}

License

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License http://creativecommons.org/licenses/by-sa/3.0/us/

Contact

Support / Feature Requests

This project is an active part of all my ExpressionEngine installations, and I'd like to keep it as fast, full-featured and bulletproof as possible.

Have a bug? Feature request? Please create an issue on GitHub at https://github.com/fcgrx/ex_sponge/issues