My admin allows editing of some HTML fields using TinyMCE, so I end up with horrible code that contains lots of nested <p>, <div>, and <span> tags, plus style properties that destroy my layout and consistency.
This tag, based on lxml, tries to kill as many unneeded tags and style properties as possible. Elements with no text inside are dropped, and the style properties that get removed can be customized by adapting the regex to your needs.
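As a rough illustration of adapting the regex, here is a minimal sketch using only the standard library. The helper names `build_css_cleanup_regex` and `clean_style` are hypothetical and not part of the snippet. The generated pattern is also slightly stricter than the original: it assumes you want to match a final declaration that has no trailing semicolon, and to avoid matching inside longer property names such as `scroll-padding`.

```python
import re

def build_css_cleanup_regex(properties):
    """Hypothetical helper: build a pattern that removes each listed CSS
    property and its hyphenated variants (font, font-size, font-family, ...)."""
    names = "|".join(re.escape(name) for name in properties)
    return re.compile(
        # (?<![\w-]) keeps "padding" from matching inside "scroll-padding".
        # (?:;|$) also accepts a last declaration without a trailing ';'.
        r"(?<![\w-])(?:%s)(?:-[\w-]+)?\s*:\s*[^;]+(?:;|$)\s*" % names
    )

def clean_style(style, regex):
    """Hypothetical helper: strip the matched declarations from a style string."""
    return regex.sub("", style).strip()

# Roughly the same property families as the snippet's default regex.
default_regex = build_css_cleanup_regex(["font", "padding", "margin", "line-height"])
# A customized set: remove fonts and colors, leave spacing alone.
custom_regex = build_css_cleanup_regex(["font", "color", "background"])

style = "font-size: 12px; color: red; margin: 0"
print(clean_style(style, default_regex))  # -> color: red;
print(clean_style(style, custom_regex))   # -> margin: 0
```

Building the pattern from a list keeps the set of removed properties in one obvious place, which may be easier to maintain than editing the regex by hand.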
from django.template import Library
from lxml import html, etree
import re

register = Library()

# Matches font*, padding*, margin* and line-height declarations in a style attribute.
css_cleanup_regex = re.compile(r'((font|padding|margin)(-[^:]+)?|line-height):\s*[^;]+;')

def _cleanup_elements(elem):
    """
    Removes empty elements from HTML (i.e. those without text inside).
    If the tag has a 'style' attribute, we remove the css attributes we don't want.
    """
    if elem.text_content().strip() == '':
        elem.drop_tree()
    else:
        if 'style' in elem.attrib:
            elem.attrib['style'] = css_cleanup_regex.sub('', elem.attrib['style'])
        # Iterate over a copy, since drop_tree() removes children from elem.
        for sub in list(elem):
            _cleanup_elements(sub)
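
@register.simple_tag
def cleanup_html(string):
    """
    Makes generated HTML (i.e. output from the WYSIWYG) look almost decent.
    """
    try:
        elem = html.fromstring(string)
        _cleanup_elements(elem)
        # encoding='unicode' returns str rather than bytes on Python 3.
        html_string = html.tostring(elem, encoding='unicode')
        # Drop blank lines and trailing whitespace.
        lines = []
        for line in html_string.splitlines():
            line = line.rstrip()
            if line != '':
                lines.append(line)
        # Note: Django 1.9+ autoescapes simple_tag output; wrap the result
        # in mark_safe() if the markup should render as HTML.
        return '\n'.join(lines)
    except (etree.XMLSyntaxError, etree.ParserError):
        # Unparseable or empty input: return it untouched.
        return string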
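One consequence of dropping every element without text is that void elements such as <img> or <br> are removed too, along with any wrapper that contains nothing else. If that is not what you want, a variant along the following lines might help. This is only a sketch: `KEEP_TAGS`, `has_kept_content` and `drop_empty_elements` are hypothetical names, not part of the snippet, and the style cleanup is left out to keep the example short.

```python
from lxml import html

# Hypothetical allow-list: elements that carry meaning without any text.
KEEP_TAGS = {'img', 'br', 'hr'}

def has_kept_content(elem):
    """True if elem is, or contains, an element listed in KEEP_TAGS."""
    return elem.tag in KEEP_TAGS or any(
        child.tag in KEEP_TAGS for child in elem.iterdescendants()
    )

def drop_empty_elements(elem):
    """Variant of the pruning step that spares KEEP_TAGS elements."""
    if elem.text_content().strip() == '' and not has_kept_content(elem):
        elem.drop_tree()
        return
    # Copy the child list: drop_tree() changes it while we walk it.
    for sub in list(elem):
        drop_empty_elements(sub)

sample = '<div><p>Text</p><p><img src="a.png"></p><p><span> </span></p></div>'
root = html.fromstring(sample)
drop_empty_elements(root)
result = html.tostring(root, encoding='unicode')
print(result)
```

With this sample, the paragraph holding the image survives while the paragraph that only wraps an empty <span> is dropped.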
Comments
Why not just use TinyMCE's 'valid_elements' option to control which tags it allows?