Mathematical Captcha

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Copyright (c) 2007, Dima Dogadaylo (www.mysoftparade.com)

import re
import sha
import pickle
import base64
import time
from random import randint
from django import newforms as forms
from django.conf import settings

class MathCaptchaForm(forms.Form):
    """Lightweight mathematical captcha that asks a human to solve
    a simple calculation such as 3+5=?. It doesn't use the database
    and doesn't require external libraries.

    A hash is built from the concatenation of settings.SITE_URL,
    settings.SECRET_KEY, the question, the answer and the expiry time,
    and is validated on each form submission. This prevents a valid
    captcha submission from being "recorded" and "replayed" later:
    once the token expires (after settings.CAPTCHA_EXPIRES_SECONDS,
    one hour by default), the form no longer validates.

    For more info see:
    http://www.mysoftparade.com/blog/improved-mathematical-captcha/
    """
    A_RE = re.compile(r"^(\d+)$")
    
    captcha_answer = forms.CharField(max_length = 2, required=True,
        widget = forms.TextInput(attrs={'size':'2'}))
    captcha_token = forms.CharField(max_length=200, required=True,
        widget=forms.HiddenInput())
    
    def __init__(self, *args, **kwargs):
        """Initialise the captcha question and token for unbound forms."""
        super(MathCaptchaForm, self).__init__(*args, **kwargs)
        # reset captcha for unbound forms
        if not self.data:
            self.reset_captcha()

    def reset_captcha(self):
        """Generate a new question and a valid token for it, and reset
        the previous answer, if any."""
        q, a = self._generate_captcha()
        expires = time.time() +\
        getattr(settings, 'CAPTCHA_EXPIRES_SECONDS', 60*60)
        token = self._make_token(q, a, expires)
        self.initial['captcha_token'] = token
        self._plain_question = q
        # reset captcha fields for bound form
        if self.data:
            def _reset():
                self.data['captcha_token'] = token
                self.data['captcha_answer'] = ''
            if hasattr(self.data, '_mutable') and not self.data._mutable:
                self.data._mutable = True
                _reset()
                self.data._mutable = False
            else:
                _reset()

    def _generate_captcha(self):
        """Generate question and return it along with correct answer."""
        a, b = randint(1,9), randint(1,9)
        return ("%s+%s" % (a,b), a+b)

    def _make_token(self, q, a, expires):
        data = base64.urlsafe_b64encode(\
            pickle.dumps({'q': q, 'expires': expires}))
        return self._sign(q, a, expires) + data
    
    def _sign(self, q, a, expires):
        plain = [getattr(settings, 'SITE_URL', ''), settings.SECRET_KEY,\
                 q, a, expires]
        plain = "".join([str(p) for p in plain])
        return sha.new(plain).hexdigest()
    
    @property
    def plain_question(self):
        return self._plain_question
    
    @property
    def knotty_question(self):
        """Wrap plain_question in markup that is invisible to humans, using
        random non-existent CSS classes. This makes life a bit harder for
        spambots because the form of the question varies from request to
        request."""
        digits = self._plain_question.split('+')
        return "+".join(['<span class="captcha-random-%s">%s</span>' %\
                         (randint(1,9), d) for d in digits])

    def clean_captcha_token(self):
        t = self._parse_token(self.cleaned_data['captcha_token'])
        if time.time() > t['expires']:
            raise forms.ValidationError("Captcha is expired.")
        self._plain_question = t['q']
        return t
        
    def _parse_token(self, t):
        try:
            sign, data = t[:40], t[40:]
            data = pickle.loads(base64.urlsafe_b64decode(str(data)))
            return {'q': data['q'],
                    'expires': float(data['expires']),
                    'sign': sign} 
        except Exception, e:
            import sys
            sys.stderr.write("Captcha error: %r\n" % e)
            raise forms.ValidationError("Invalid captcha!")
        
    def clean_captcha_answer(self):
        a = self.A_RE.match(self.cleaned_data.get('captcha_answer'))
        if not a:
            raise forms.ValidationError("Number is expected!")
        return int(a.group(0))
        
    def clean(self):
        """Check captcha answer."""
        cd = self.cleaned_data
        # don't check captcha if no answer
        if 'captcha_answer' not in cd:
            return cd

        t = cd.get('captcha_token')
        if t:
            form_sign = self._sign(t['q'], cd['captcha_answer'],
                                   t['expires'])
            if form_sign != t['sign']:
                self._errors['captcha_answer'] = ["Are you human?"]
        else:
            self.reset_captcha()
        return super(MathCaptchaForm, self).clean()
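The key idea is that the token carries only a hash of the answer, never the answer itself. The server can therefore check a submission without storing any per-form state. The sketch below reproduces the _sign and clean logic outside Django so the flow can be tried in isolation. It is written for Python 3 and uses hashlib.sha1, the same SHA-1 digest the old sha module produced. The secret and site URL are placeholder values, and sign / check_answer are hypothetical helper names, not part of the snippet.

```python
import hashlib
import time


def sign(secret_key, site_url, question, answer, expires):
    """Mirror MathCaptchaForm._sign: SHA-1 hex digest of the concatenated values."""
    plain = "".join(str(p) for p in (site_url, secret_key, question, answer, expires))
    return hashlib.sha1(plain.encode("utf-8")).hexdigest()


def check_answer(secret_key, site_url, question, answer, expires, token_sign, now=None):
    """Return True only if the token is unexpired and the answer reproduces its hash."""
    if now is None:
        now = time.time()
    if now > expires:
        return False  # expired: a recorded submission stops validating here
    return sign(secret_key, site_url, question, answer, expires) == token_sign


# Usage with placeholder values (hypothetical secret and site URL).
secret, site = "not-a-real-secret", "http://example.com"
expires = time.time() + 60 * 60  # same one-hour default as CAPTCHA_EXPIRES_SECONDS
token_sign = sign(secret, site, "3+5", 8, expires)
print(check_answer(secret, site, "3+5", 8, expires, token_sign))  # True
print(check_answer(secret, site, "3+5", 9, expires, token_sign))  # False
print(check_answer(secret, site, "3+5", 8, expires, token_sign, now=expires + 1))  # False
```

Within the expiry window, the same question/answer/token triple still validates. The expiry bounds how long a recorded submission stays useful; it does not make each token single-use.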

Comments

mointrigue (on December 11, 2007):

I mentioned the same thing on your blog, but you should probably do some checking to make sure that the captcha question has not been changed during the post.

You could solve the problem by adding a hash of the values concatenated with your settings.SECRET_KEY (maybe with additional data as well), which is similar to how the comments app already works. Just put the hash in a hidden field and compare it against the submitted form to confirm that the question stayed the same between display and post.

Should be relatively easy to implement.
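A minimal sketch of this suggestion, assuming Python 3 and only the standard library. It swaps in HMAC-SHA256 from the hmac module instead of hashing the secret concatenated with the values, and compares digests in constant time. SECRET_KEY is a placeholder, and make_question_hash / question_unchanged are hypothetical names.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"not-a-real-secret"  # placeholder; a real site would use settings.SECRET_KEY


def make_question_hash(question, expires):
    """HMAC-SHA256 over the question and an integer expiry timestamp."""
    msg = ("%s|%d" % (question, expires)).encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def question_unchanged(question, expires, submitted_hash):
    """True if the posted question and expiry match the hash from the hidden field."""
    expected = make_question_hash(question, expires)
    return hmac.compare_digest(expected, submitted_hash)  # constant-time comparison


expires = int(time.time()) + 60 * 60
hidden_field = make_question_hash("3+5", expires)
print(question_unchanged("3+5", expires, hidden_field))  # True
print(question_unchanged("1+1", expires, hidden_field))  # False
```

HMAC is the standard construction for keyed hashes. A plain hash of a secret concatenated with data can be weaker with SHA-1 and SHA-2 because of length-extension attacks.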

mointrigue (on December 11, 2007):

As we discussed on your blog, what I'm suggesting doesn't solve the problem of replay attacks, but at least means that each attack has to be customized for a particular site.

dogada (on December 20, 2007):

The issue with replay attacks was addressed in the Improved Mathematical Captcha.

tehmaze (on December 21, 2007):

Using unpickling here is extremely insecure. Quoting http://www.python.org/doc/2.2.3/lib/pickle-sec.html:

"However, for unpickling, it is never a good idea to unpickle an untrusted string whose origins are dubious, for example, strings read from a socket. This is because unpickling can create unexpected objects and even potentially run methods of those objects, such as their class constructor or destructor."
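The token payload only needs to hold a string and a number, so pickle can be replaced with JSON. The sketch below shows that swapped-in technique, assuming Python 3. It keeps the snippet's layout of a 40-character SHA-1 hex signature followed by URL-safe base64 data, mirroring the t[:40] split in _parse_token. encode_payload, parse_token and SIGN_LEN are hypothetical names, and inside the form a ValueError here would become a forms.ValidationError.

```python
import base64
import json

SIGN_LEN = 40  # length of a SHA-1 hex digest, matching t[:40] in _parse_token


def encode_payload(question, expires):
    """Serialise only plain data (str, float) as URL-safe base64 JSON."""
    raw = json.dumps({"q": question, "expires": expires}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def parse_token(token):
    """Split a token into (sign, question, expires); raise ValueError if malformed.

    json.loads only builds dicts, lists, strings, numbers, booleans and None,
    so a forged token cannot make the server construct arbitrary objects.
    """
    sign, data = token[:SIGN_LEN], token[SIGN_LEN:]
    try:
        payload = json.loads(base64.urlsafe_b64decode(data.encode("ascii")))
        question = payload["q"]
        expires = float(payload["expires"])
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("Invalid captcha token: %r" % exc) from exc
    if not isinstance(question, str):
        raise ValueError("Invalid captcha token: question must be a string")
    return sign, question, expires


fake_sign = "0" * SIGN_LEN  # stand-in for the real SHA-1 hex signature
token = fake_sign + encode_payload("3+5", 1200000000.0)
print(parse_token(token)[1:])  # ('3+5', 1200000000.0)
```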

trbs (on December 24, 2007):

  1. I don't think we even need pickle to store these numbers...
  2. I'd like it to be 2 + ? = 3, because 2+3=? could be evaluated with answer = eval(document.getElementById('captcha')); in JavaScript (or at least with some more JavaScript trickery). When the question mark is in the middle, it is much harder to script, or at least the script kiddie has to write a math reversal where 3 - 2 = answer instead of just running the question through eval.
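The missing-operand form suggested in point 2 could replace the question generator roughly as follows. This is a hypothetical drop-in for _generate_captcha, written for Python 3. Because _sign hashes whatever question and answer it is given, the token logic would not need to change. knotty_question splits the question on '+', though, so it would need adjusting for this format.

```python
from random import randint


def generate_missing_operand_captcha():
    """Return (question, answer) for a question such as '2+?=7'.

    Drawing both operands from 1..9, as _generate_captcha does, keeps the
    answer in 1..9, so it still fits the form's two-character answer field.
    """
    a, b = randint(1, 9), randint(1, 9)
    return ("%d+?=%d" % (a, a + b), b)


question, answer = generate_missing_operand_captcha()
print(question, answer)  # random on every call
```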
