Login

strip_tags like php one

Author:
homm
Posted:
April 29, 2010
Language:
Python
Version:
1.1
Score:
0 (after 0 ratings)

Usage:

clean_html = strip_tags(html, {'a': ['href'], 'p': ['class']})

Based on another snippet.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def strip_tags(text, valid_tags={}):
    from BeautifulSoup import BeautifulSoup, Comment
    
    soup = BeautifulSoup(text)
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name in valid_tags:
            valid_attrs = valid_tags[tag.name]
            tag.attrs = [(attr, val.replace('javascript:', '')) 
                for attr, val in tag.attrs if attr in valid_attrs]
        else:
            tag.hidden = True
    return soup.renderContents().decode('utf8')

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 2 months, 2 weeks ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 2 months, 3 weeks ago
  3. Serializer factory with Django Rest Framework by julio 9 months, 2 weeks ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 10 months, 1 week ago
  5. Help text hyperlinks by sa2812 11 months ago

Comments

simon (on May 4, 2010):

This snippet is not enough to protect against malicious input from users - for example, a URL with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases - I'm sure there are plenty of other holes in the above code.

#

homm (on May 4, 2010):

No one my browser run «javascript : alert('evil')» as javascript. All of they try open file with same name.

#

Please login first before commenting.