keeptags: strip all HTML tags from output except a specified list of elements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
def keeptags(value, tags):
    """
    Strips all [X]HTML tags except the space seperated list of tags 
    from the output.
    
    Usage: keeptags:"strong em ul li"
    """
    import re
    from django.utils.html import strip_tags, escape
    tags = [re.escape(tag) for tag in tags.split()]
    tags_re = '(%s)' % '|'.join(tags)
    singletag_re = re.compile(r'<(%s\s*/?)>' % tags_re)
    starttag_re = re.compile(r'<(%s)(\s+[^>]+)>' % tags_re)
    endtag_re = re.compile(r'<(/%s)>' % tags_re)
    value = singletag_re.sub('##~~~\g<1>~~~##', value)
    value = starttag_re.sub('##~~~\g<1>\g<3>~~~##', value)
    value = endtag_re.sub('##~~~\g<1>~~~##', value)
    value = strip_tags(value)
    value = escape(value)
    recreate_re = re.compile('##~~~([^~]+)~~~##')
    value = recreate_re.sub('<\g<1>>', value)
    return value

Comments

chrominance (on July 30, 2007):

Yeah, so upon further thought I think this approach is ultimately too fragile; the escape(value) breaks things, for example, and the regexes aren't exactly the nicest things in the world. http://www.djangosnippets.org/snippets/205/ is probably a better way of doing this; it's trivial to modify it to allow template authors to specify the allowed tags.

#

(Forgotten your password?)

You may use Markdown syntax here, but raw HTML will be removed.