Sanitize HTML filter with tag/attribute whitelist and XSS protection

Author: harrym
Posted: July 27, 2009
Language: Python
Version: 1.0
Score: 0 (after 2 ratings)

Reworked version of this snippet that now accepts an argument so the user can specify which tags to allow and which attributes to allow for each tag. The argument should be in the form tag1:attr1:attr2 tag2:attr1 tag3, where the tags are the allowed HTML tags and the attrs are the attributes allowed for that tag.
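As a minimal sketch of how that argument format maps to a whitelist, the parsing step can be tried on its own. The helper name parse_allowed_tags and the sample tag list are illustrative only and are not part of the snippet; the two parsing lines mirror what the filter does internally.

```python
def parse_allowed_tags(spec):
    """Split 'tag:attr:attr tag ...' into {tag: [allowed attributes]}."""
    # Each whitespace-separated item is one tag followed by its attributes.
    items = [item.split(':') for item in spec.split()]
    # The first element is the tag name; the rest are its allowed attributes.
    return dict((parts[0], parts[1:]) for parts in items)


allowed = parse_allowed_tags('a:href:title img:src:alt b')
print(allowed)
```

A tag listed without attributes, such as b here, is allowed but has every attribute stripped. In a template the filter would presumably be applied with Django's usual argument syntax, for example {{ body|sanitize:"a:href:title b" }}, where body is a hypothetical context variable.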

It also uses code from this post on Stack Overflow to add XSS protection.
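The XSS protection is a regular expression that removes the word "javascript" from attribute values, even when it is broken up by whitespace or hex entities. The sketch below rebuilds the same pattern outside the filter so its behaviour can be checked in isolation; the sample values are made up for illustration.

```python
import re

# Same pattern as in the snippet: between every letter of "javascript",
# allow optional whitespace and an optional "&#x..." hex entity.
js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))

samples = [
    'javascript:alert(1)',
    'java\tscript:alert(1)',
    'jav&#x09;ascript:alert(1)',
    'JavaScript:alert(1)',
]
for value in samples:
    # Print each value after the pattern has been stripped out.
    print(repr(js_regex.sub('', value)))
```

The pattern is compiled without a case-insensitive flag, so a mixed-case value such as the last sample passes through unchanged. Anyone relying on this check may want to account for that.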

from django import template
from BeautifulSoup import BeautifulSoup, Comment
import re

register = template.Library()

def sanitize(value, allowed_tags):
    """Argument should be in form 'tag1:attr1:attr2 tag2:attr1 tag3', where tags
    are allowed HTML tags, and attrs are the allowed attributes for that tag.
    """
    # Matches "javascript" even when split up by whitespace or hex entities.
    js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))
    # Build {tag: [allowed attributes]} from the filter argument.
    allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
    allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)

    soup = BeautifulSoup(value)
    # Drop HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()

    for tag in soup.findAll(True):
        if tag.name not in allowed_tags:
            # Hide the tag itself; its contents are still rendered.
            tag.hidden = True
        else:
            # Keep only whitelisted attributes, stripping "javascript" from values.
            tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs
                         if attr in allowed_tags[tag.name]]

    return soup.renderContents().decode('utf8')

register.filter(sanitize)

Comments

ronnie (on May 20, 2011):

This script does not protect against XSS attacks.

Try the following string: <script><script type="text/javascript">alert("ok");<</script>/script>

It results in: <script type="text/javascript">alert("ok");</script>
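One plausible reading of this result, assuming BeautifulSoup 3 treats everything inside a script tag as raw text, is that hiding the outer tag leaves two text fragments that join back into working markup. The sketch below only simulates that with plain strings (it does not run BeautifulSoup, and it uses Python 3's html.escape rather than anything in the snippet) and shows that escaping the fragments before output would neutralise them.

```python
from html import escape

# Assumed text fragments left behind once the outer <script> tag is hidden.
fragments = ['<script type="text/javascript">alert("ok");<', '/script>']

# Emitting the fragments as-is reassembles a complete script element.
rendered = ''.join(fragments)
# Escaping each fragment first turns the markup into inert text.
escaped = ''.join(escape(fragment) for fragment in fragments)

print(rendered)
print(escaped)
```

This is a sketch of one possible mitigation, not a fix taken from the snippet or from the comment above.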
