Sanitize HTML filter with tag/attribute whitelist and XSS protection

Author: harrym
Posted: July 27, 2009
Language: Python
Version: 1.0
Score: 0 (after 2 ratings)

Reworked version of this snippet that now accepts an argument, so the user can specify which tags to allow and which attributes are allowed for each tag. The argument should be in the form tag1:attr1:attr2 tag2:attr1 tag3, where the tags are the allowed HTML tags and the attrs are the allowed attributes for that tag. A tag listed without attributes is allowed, but all of its attributes are stripped.
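As a rough illustration of the argument format, the sketch below parses a spec string into a tag-to-attributes mapping the same way the filter does internally. The helper name `parse_allowed_tags` and the sample spec are assumptions made for this example only; they are not part of the snippet.

```python
def parse_allowed_tags(spec):
    """Turn 'tag1:attr1:attr2 tag2' into {'tag1': ['attr1', 'attr2'], 'tag2': []}."""
    # Whitespace separates tags; colons separate a tag from its attributes.
    parts = [item.split(':') for item in spec.split()]
    # The first element is the tag name, the rest are its allowed attributes.
    return dict((item[0], item[1:]) for item in parts)


allowed = parse_allowed_tags('a:href:title img:src b')
print(allowed)
```

Here `a` keeps only `href` and `title`, `img` keeps only `src`, and `b` is allowed with no attributes at all.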

It also uses code from this post on Stack Overflow to add XSS protection.

from django import template
from BeautifulSoup import BeautifulSoup, Comment
import re

register = template.Library()

def sanitize(value, allowed_tags):
    """Argument should be in form 'tag1:attr1:attr2 tag2:attr1 tag3', where tags
    are allowed HTML tags, and attrs are the allowed attributes for that tag.
    """
    # Matches "javascript" even when whitespace or hex entities (&#x...) are
    # interleaved between its letters.
    js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))
    # 'a:href b' -> [['a', 'href'], ['b']] -> {'a': ['href'], 'b': []}
    allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
    allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)

    soup = BeautifulSoup(value)
    # Remove HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()

    for tag in soup.findAll(True):
        if tag.name not in allowed_tags:
            # Hide the tag itself; its contents are still rendered.
            tag.hidden = True
        else:
            # Keep only whitelisted attributes and strip "javascript" from values.
            tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs
                         if attr in allowed_tags[tag.name]]

    return soup.renderContents().decode('utf8')

register.filter(sanitize)
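To see what the `js_regex` pattern in the filter matches, the sketch below rebuilds the same expression with only the standard library and applies it to a few attribute values. The sample strings are assumptions chosen for illustration; note that the pattern as written is case-sensitive and only looks for hex entities between letters.

```python
import re

# Same construction as in the filter: the separator pattern is placed
# between every pair of adjacent letters of "javascript".
js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))

samples = [
    'javascript:alert(1)',        # plain scheme
    'java\nscript:alert(1)',      # whitespace between letters
    'jav&#x0A;ascript:alert(1)',  # hex entity between letters
    'JavaScript:alert(1)',        # mixed case: not matched by this pattern
]

for value in samples:
    print(repr(value), '->', repr(js_regex.sub('', value)))
```

The first three values are reduced to `:alert(1)`, while the mixed-case value passes through unchanged, so this pattern alone should not be treated as complete protection.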

Comments

ronnie (on May 20, 2011):

This script does not protect against XSS attacks.

Try the following string: <script><script type="text/javascript">alert("ok");<</script>/script>

It results in: <script type="text/javascript">alert("ok");</script>
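The behaviour reported above can be reproduced without BeautifulSoup. The sketch below uses a regex-based stand-in for a single sanitizing pass (an assumption for illustration, not the filter's actual code) that unwraps `script` elements while keeping their raw contents, much as a hidden tag still renders its text. It then shows one possible mitigation: repeating the pass until the output stops changing. The names `strip_script_once` and `sanitize_until_stable` are hypothetical.

```python
import re

# Stand-in for ONE sanitizing pass (not the BeautifulSoup filter): unwrap
# <script> elements but keep their raw contents.
SCRIPT_RE = re.compile(r'<script[^>]*>(.*?)</script>', re.IGNORECASE | re.DOTALL)


def strip_script_once(html):
    return SCRIPT_RE.sub(lambda match: match.group(1), html)


def sanitize_until_stable(html, sanitize_once, max_passes=10):
    """Re-run a single-pass sanitizer until its output no longer changes."""
    for _ in range(max_passes):
        cleaned = sanitize_once(html)
        if cleaned == html:
            return cleaned
        html = cleaned
    raise ValueError('input did not stabilise after %d passes' % max_passes)


payload = '<script><script type="text/javascript">alert("ok");<</script>/script>'
once = strip_script_once(payload)
stable = sanitize_until_stable(payload, strip_script_once)
print(once)    # a complete script element reassembled from the leftovers
print(stable)  # no script tags remain
```

A single pass leaves a working script element behind, whereas the repeated version stops only once nothing more is removed. This is a sketch of the idea only; escaping the text that remains after hiding tags would be another way to approach the same problem.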
