Sanitize text field HTML (here from the Dojo Toolkit Editor2 widget)

Author: akaihola
Posted: April 10, 2007
Language: Python
Version: .96
Score: 2 (after 2 ratings)

When a JavaScript WYSIWYG editor widget supplies the content of a text area, the resulting HTML should be sanitized on the server so that no disallowed HTML tags (especially script tags) remain.

The BeautifulSoup library handles the HTML processing in the solution presented below, so make sure it is on your Python path.

The snippet also assumes that you have the Dojo Toolkit and its Editor2 widget loaded on your page.

Note: this snippet was originally written for use with Dojo Toolkit 0.4, and it hasn't been updated for 0.9 or 1.0.

from django import newforms as forms
from BeautifulSoup import BeautifulSoup, Comment

class Editor2Field(forms.CharField):

    widget = forms.widgets.Textarea(attrs={'dojoType': 'Editor2'})

    # Whitelists: only these tags and attributes survive cleaning.
    valid_tags = 'p i strong b u a h1 h2 h3 pre br img'.split()
    valid_attrs = 'href src'.split()

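    def clean(self, value):
        """
        Strips disallowed HTML tags, attributes and comments from the input.
        """
        value = super(Editor2Field, self).clean(value)
        soup = BeautifulSoup(value)
        # Remove HTML comments entirely.
        for comment in soup.findAll(
            text=lambda text: isinstance(text, Comment)):
            comment.extract()
        for tag in soup.findAll(True):
            # Hide disallowed tags: the tag markup is dropped from the
            # output, but its contents are still rendered.
            if tag.name not in self.valid_tags:
                tag.hidden = True
            # Keep only whitelisted attributes; attribute values are not
            # inspected.
            tag.attrs = [(attr, val) for attr, val in tag.attrs
                         if attr in self.valid_attrs]
        return soup.renderContents().decode('utf8')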


class TestForm(forms.Form):
    title = forms.CharField()
    content = Editor2Field()
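
To see the effect of this kind of whitelist filtering without installing Django 0.96 or BeautifulSoup, here is a minimal sketch of the same idea using only the Python 3 standard library. It is not the snippet's implementation: html.parser is swapped in for BeautifulSoup, and the names WhitelistSanitizer, sanitize and VOID_TAGS are hypothetical. Like the field above, it drops disallowed tags but keeps their text, drops comments, and does not inspect attribute values.

```python
from html import escape
from html.parser import HTMLParser

# Same whitelists as the Editor2Field above.
VALID_TAGS = set('p i strong b u a h1 h2 h3 pre br img'.split())
VALID_ATTRS = set('href src'.split())
# Assumption: tags that never take a closing tag.
VOID_TAGS = {'br', 'img'}


class WhitelistSanitizer(HTMLParser):
    """Hypothetical helper: re-emits only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            return  # drop the tag markup; its text content is still emitted
        kept = ''.join(
            ' %s="%s"' % (name, escape(value or '', quote=True))
            for name, value in attrs if name in VALID_ATTRS)
        self.out.append('<%s%s>' % (tag, kept))

    def handle_endtag(self, tag):
        if tag in VALID_TAGS and tag not in VOID_TAGS:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Text is re-escaped so it cannot introduce new markup.
        self.out.append(escape(data, quote=False))

    # Comments are dropped because handle_comment is not overridden.


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>'
               '<!-- c --><b>there</b></p>'))
```

Note that the script tag's contents survive as plain text, which mirrors how a hidden tag behaves in the field above, and that a javascript: URL in an href would still pass through unchanged.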

Comments

guettli (on November 16, 2007):

Nice snippet!

marcink (on February 10, 2008):

This is nice, but you should also look into href attributes to make sure they don't contain javascript code.

akaihola (on April 21, 2008):

marcink: Thanks for the heads up. It's obviously a fatal mistake to have left out that check.
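
The check marcink suggests could look roughly like the sketch below. It is not part of the original snippet: the function name is_safe_url and the SAFE_SCHEMES whitelist are assumptions, and it is written for Python 3 with the standard library only. The idea is to accept relative URLs and a short list of known schemes, and to reject everything else (javascript:, data:, and so on), after undoing entity encoding and stripping the whitespace and control characters that browsers tolerate inside a scheme.

```python
import re
from html import unescape

# Assumption: the schemes an editor's links and images legitimately need.
SAFE_SCHEMES = {'http', 'https', 'mailto'}


def is_safe_url(value):
    """Hypothetical helper: True for relative URLs and whitelisted schemes."""
    # Decode entities such as &#106; in case the value is still encoded.
    value = unescape(value)
    # Drop spaces and control characters, which browsers may ignore.
    cleaned = ''.join(ch for ch in value if ch > ' ')
    # A scheme can only appear before the first '/', '?' or '#'.
    head = re.split(r'[/?#]', cleaned, maxsplit=1)[0]
    if ':' not in head:
        return True  # no scheme: a relative or protocol-relative URL
    scheme = head.split(':', 1)[0].lower()
    return scheme in SAFE_SCHEMES
```

Inside the attribute filter of clean(), such a helper could be used to drop any href or src value for which it returns False.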
