Fake File Uploads

Author: rfk
Posted: January 27, 2009
Language: Python
Version: 1.0
Score: 1 (after 1 ratings)

In-browser testing frameworks (I'm using Windmill) have trouble testing file uploads because JavaScript's security policy prevents them from setting the value of file input fields. Instead, the tests must issue some sort of "fake" file upload request, but implementing this on an ad-hoc basis quickly gets ugly.

This middleware is designed to support fake file uploads as transparently and as thoroughly as possible. For example, it is careful to properly trigger any file upload handlers so that things like upload progress reporting will work correctly. It can also simulate a slow file upload by sleeping between reads from the file.
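
The slow-upload idea can be sketched on its own, outside Django. The `SlowReader` class and `read_all_chunks` helper below are hypothetical names used only for illustration; they are not part of the middleware, which implements the same behaviour inside its own file-like classes. The sketch assumes a file-like object whose `read` sleeps and then returns at most one chunk, regardless of the size requested.

```python
import io
import time


class SlowReader(object):
    """Hypothetical wrapper that throttles reads from a file-like object."""

    def __init__(self, fileobj, chunk_size, sleep_time):
        self.fileobj = fileobj
        self.chunk_size = chunk_size
        self.sleep_time = sleep_time

    def read(self, size=-1):
        # Ignore the requested size: pause, then hand back at most one chunk.
        time.sleep(self.sleep_time)
        return self.fileobj.read(self.chunk_size)


def read_all_chunks(reader):
    """Collect chunks until the reader signals end-of-file with an empty read."""
    chunks = []
    while True:
        chunk = reader.read()
        if not chunk:
            return chunks
        chunks.append(chunk)


# A 19-byte payload read 5 bytes at a time arrives as four chunks.
reader = SlowReader(io.BytesIO(b"I am a testing file"), chunk_size=5, sleep_time=0.01)
chunks = read_all_chunks(reader)
print(chunks)
```

Because the caller cannot rely on getting the number of bytes it asked for, whatever consumes such a reader needs its own buffering.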

From the client-side point of view, each input field of type "file" has a similarly-named hidden field automatically prepended. Test scripts can simply set the value of this hidden field to trigger a fake upload, rather than having to set the value of the file input field itself.
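
As a rough illustration of that rewriting step, the following standalone sketch prepends a hidden field to every file input in an HTML string. The function name `add_fake_fields` and the two simplified regular expressions are assumptions made for this example; the middleware uses its own patterns and reads the prefix from the settings.

```python
import re

FIELD_PREFIX = "fakefile_"  # assumed default prefix

# Simplified patterns for illustration only; not the middleware's own regexes.
FILE_INPUT_RE = re.compile(r"<input\b[^>]*\btype=['\"]?file['\"]?[^>]*>", re.IGNORECASE)
NAME_RE = re.compile(r"\bname=['\"]?(?P<name>[^'\"\s>]+)", re.IGNORECASE)


def add_fake_fields(html):
    """Return html with a hidden fakefile_<name> input before each file input."""

    def prepend_hidden(match):
        field = match.group()
        m = NAME_RE.search(field)
        if not m:
            # A file input without a name cannot be faked; leave it alone.
            return field
        hidden = "<input type='hidden' name='%s%s' />" % (FIELD_PREFIX, m.group("name"))
        return hidden + field

    return FILE_INPUT_RE.sub(prepend_hidden, html)


html = "<form><input type='file' name='myfile' /></form>"
print(add_fake_fields(html))
```

A test script would then set the value of `fakefile_myfile` to one of the configured fake file ids and submit the form as usual.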

#
#  Copyright 2009, Ryan Kelly.
#  Redistributable under the BSD license, just like Django itself.
#  

from StringIO import StringIO
import time
import re

from django.conf import settings
from django.utils.datastructures import MergeDict
from django.http.multipartparser import MultiPartParser


class FakeFileUploadMiddleware:
    """Middleware to fake the upload of some files by the client.

    This middleware can be used to simulate a file upload by the client.
    You might use this, for example, to work around JavaScript security
    restrictions on file uploads when performing in-browser testing.

    The details of the fake files available for upload must be specified
    in the Django setting FAKEUPLOAD_FILE_SPEC, which is a mapping from
    string ids to dictionaries of file properties.  The supported properties
    are as follows:

        * filename:     the name of the file, for display purposes
        * contents:     the contents of the file as a string
        * file:         the server-side file to use for contents
        * chunk_size:   maximum chunk size at which to read from the file
        * sleep_time:   time to sleep between successive reads of the file

    The properties allow you to simulate a variety of upload conditions,
    such as large files and slow uploads.  For example, the following setting
    would make "test1" a fake file with contents "I am a testing file"
    that will take 8 seconds to read:

        FAKEUPLOAD_FILE_SPEC = {
          "test1": { "filename"     : "test1.txt",
                     "contents"     : "I am a testing file",
                     "chunk_size"   : 5,    # four chunks in total
                     "sleep_time"   : 2  }  # eight seconds slept in total
        }


    All incoming POST requests are searched for fields with names like
    fakefile_<name>, which correspond to a fake upload of a file in the
    field <name>.  The field value must contain a fake file id as described
    above.  Any such fields are interpreted and sent through the standard
    file upload machinery of Django, including triggering of upload handlers.
    The modified request is (in theory) indistinguishable from one in which
    a genuine file upload was performed.

    All outgoing responses are searched for file upload fields.  Each file
    field <name> will be augmented with a hidden input field fakefile_<name>
    which can be used to fake an upload of that file.  For example, this
    simple upload form:

        <form method='POST' enctype='multipart/form-data'>
          <input type='file' name='myfile' />
          <input type='submit' name='upload' value='upload' />
        </form>

    would come out of this middleware looking like this:

        <form method='POST' enctype='multipart/form-data'>
          <input type='hidden' name='fakefile_myfile' />
          <input type='file' name='myfile' />
          <input type='submit' name='upload' value='upload' />
        </form>

    Ordinarily users would fill in the 'myfile' field, but test scripts can
    instead fill in the 'fakefile_myfile' field.  After passing through
    FakeFileUploadMiddleware, the two requests should be indistinguishable.

    The following additional settings can be specified:

        * FAKEUPLOAD_FIELD_NAME:        field name prefix; default "fakefile"
        * FAKEUPLOAD_REWRITE_RESPONSE:  whether to rewrite response bodies
        * FAKEUPLOAD_MIME_BOUNDARY:     boundary to use when encoding the files

    """

    def __init__(self):
        self.file_spec = settings.FAKEUPLOAD_FILE_SPEC
        try:
            self.field_name = settings.FAKEUPLOAD_FIELD_NAME + "_"
        except AttributeError:
            self.field_name = "fakefile_"
        try:
            self.rewrite_response = settings.FAKEUPLOAD_REWRITE_RESPONSE
        except AttributeError:
            self.rewrite_response = True
        if self.rewrite_response:
            # Yeah yeah, "now I have two problems" etc...
            # It's not worth firing up a HTML parser for this.
            self.file_field_re = re.compile(r'<input\W[^>]*\btype=(\'|"|)file(\'|"|)\b[^>]*>',re.IGNORECASE)
            self.field_name_re = re.compile(r'\bname=(\'|"|)(?P<name>.+?)(\'|"|)\b',re.IGNORECASE)

    def process_request(self,req):
        """Interpret POST variables that indicate fake file uploads."""
        #  Bail out if any real files were uploaded
        if len(req.FILES) > 0:
            return None
        #  Find any post variables named like "fakefile_*".
        #  These contain the fake files that are to be uploaded.
        fakefiles = []
        for (k,v) in req.POST.iteritems():
            if k.startswith(self.field_name):
                if v == "": continue
                fakefiles.append((k[len(self.field_name):],self.file_spec[v]))
        if not fakefiles:
            return None
        #  Remove the fakefile keys from POST
        for f in fakefiles:
            del req.POST[self.field_name + f[0]]
        #  Construct a fake request body and META object
        fake_data = FakeFilesData(fakefiles)
        fake_meta = MergeDict(fake_data.META,req.META)
        #  Re-parse the fake data, triggering upload handlers etc.
        parser = MultiPartParser(fake_meta,fake_data,req.upload_handlers,req.encoding)
        (_, req._files) = parser.parse()

    def process_response(self,req,resp):
        """Augment file upload fields with a fakefile hidden field."""
        if not self.rewrite_response:
            return resp
        if resp.status_code != 200:
            return resp
        ct = resp["Content-Type"].lower()
        if not ct.startswith("text") and not "html" in ct:
            return resp
        resp.content = self.file_field_re.sub(self._add_fakefile,resp.content)
        return resp

    def _add_fakefile(self,match):
        """Insert hidden fakefile field in front of matched file field."""
        field = match.group()
        m = self.field_name_re.search(field)
        if not m:
            return field
        name = self.field_name + m.group("name")
        return "<input type='hidden' name='%s' />%s" % (name,field)


class FakeFilesData:
    """Class representing fake file upload data.

    This class provides a readable file-like representation of the fake
    upload files, encoded in multipart/form-data format.  It also provides
    the attribute 'META' which provides the necessary HTTP headers for the
    fake request.
    """

    def __init__(self,files):
        """FakeFilesData constructor.

        This constructor expects a single argument, a sequence of (name,spec)
        pairs specifying the fake upload files to be encoded.
        """
        #  Determine the MIME encoding boundary
        try:
            boundary = settings.FAKEUPLOAD_MIME_BOUNDARY
        except AttributeError:
            boundary = "----------thisisthemimeboundary"
        #  Construct each encoded file
        self._files = [FakeFileData(f[0],f[1],boundary) for f in files]
        #  Add the end-of-request footer
        footer = StringIO("--%s--\r\n" % (boundary,))
        footer.size = len(footer.getvalue())
        self._files.append(footer)
        #  Construct the request headers
        size = sum([f.size for f in self._files])
        type = "multipart/form-data; boundary=%s" % (boundary,)
        self.META = {"HTTP_CONTENT_LENGTH":size,"HTTP_CONTENT_TYPE":type}
        #  Internal read-ahead buffer
        self._buffer = ""

    def read(self,size=-1):
        """Read 'size' bytes from the encoded file data."""
        # This method does internal read-ahead buffering, so that the
        # individual encoded files are free to return more or less data than
        # was actually requested.  This makes the implementation of streaming,
        # sleeps etc much easier.
        data = [self._buffer]
        if size < 0:
            #  We want all the remaining data
            for f in self._files:
                ln = f.read()
                while ln != "":
                    data.append(ln)
                    ln = f.read()
            self._files = []
            return "".join(data)
        else:
            #  We want a specific amount
            count = len(data[0])
            while count < size and self._files:
                ln = self._files[0].read(size-count)
                if ln == "":
                    self._files.pop(0)
                else:
                    data.append(ln)
                    count += len(ln)
            data = "".join(data)
            self._buffer = data[size:]
            return data[:size]


class FakeFileData:
    """Class representing a single fake file upload.

    This class provides a readable file-like interface to a single fake
    upload file, encoded in multipart/form-data format.  However, the 'read'
    method of this object is not guaranteed to return the requested number
    of bytes; either more or fewer bytes could potentially be returned.
    """

    def __init__(self,name,spec,boundary):
        """FakeFileData constructor.

        This constructor expects the name of the file field, the spec dict
        specifying the fake file info, and the MIME boundary string.
        """
        self._spec = self._normalize_spec(spec)
        disp = 'Content-Disposition: form-data; name="%s"' % (name,)
        if spec.has_key("filename"):
            disp = disp + '; filename="%s"' % (spec["filename"],)
        self._header = "\r\n".join(["--"+boundary,disp,"\r\n"])
        self.size = self._spec['size'] + len(self._header)

    def read(self,size=-1):
        """Read approximately 'size' bytes of encoded data.

        To make it easier to implement streaming, sleeping, etc, this method
        may not return precisely the specified number of bytes; either more
        or fewer bytes can potentially be returned.
        """
        if self._header:
            header = self._header
            self._header = ""
            return header
        if self._spec.has_key("sleep_time"):
            time.sleep(self._spec["sleep_time"])
        if self._spec.has_key("chunk_size"):
            return self._spec["fileobj"].read(self._spec["chunk_size"])
        return self._spec["fileobj"].read(size)

    def _normalize_spec(self,spec):
        """Create a normalised copy of the given fake file spec.

        This copies the given spec so that it can be freely modified, then
        ensures that it has the following keys:

            * fileobj:  the file object to read data from
            * size:     the size of the file, in bytes

        """
        spec = spec.copy()
        #  Contents provided as a string
        if spec.has_key("contents"):
            spec["fileobj"] = StringIO(spec["contents"])
            spec["size"] = len(spec["contents"])
        #  Contents provided as a file
        elif spec.has_key("file"):
            f = open(spec["file"],"rb")
            f.seek(0,2)
            spec["size"] = f.tell()
            f.seek(0,0)
            spec["fileobj"] = f
        else:
            raise ValueError("Invalid file spec: " + repr(spec))
        return spec

Comments

toutanc (on August 3, 2009):

Hi,

I ran into trouble when I use non-ASCII characters:

If I use

FAKEUPLOAD_FILE_SPEC = {
    "NewPackage": { "filename": "test1.txt",  "contents": "I am a small text file."},
}

it works fine. However, if I use a non-ASCII character like

FAKEUPLOAD_FILE_SPEC = {
    "NewPackage": { "filename": "test1.txt",  "contents": "€I am a small text file."},
}

I get this message:

File "~/Netvoyager/dev/volcano/dep/fakeupload/FakeFileUploadMiddleware.py", line 206, in read
    data = "".join(data)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

This also happens using a binary file if one byte is not ascii.

Obviously, this is a charset error. Has anybody got an idea on how to solve this?
