Validating multipart/form-data request with requests-mock

2020/05/03 8:11pm

When writing tests with Python requests-mock, I often want to test requests which contain data in multipart/form-data format.

multipart/form-data

To upload multipart-encoded files with requests, you can write code like this.

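import requests

url = 'http://httpbin.org/post'
# Each key becomes the name="..." of one part in the request body.
files = {'settings': ('settings.json', open('settings.json', 'rb'), 'application/json'),
         'archive': ('archive.zip', open('archive.zip', 'rb'), 'application/zip')}

r = requests.post(url, files=files)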

To test that the expected request has been sent, let’s examine the body of the above request using the request history of requests-mock.

history = requests_mock.request_history

print(history[0].body)

A multipart-encoded request sends its body as a MIME data stream. The output looks like this (line breaks added where appropriate for readability).

--c93780ea776047cd945c189bad1d081c
Content-Disposition: form-data; name="settings"; filename="settings.json"
Content-Type: application/json

{"foo": 1, "bar": 2}
--c93780ea776047cd945c189bad1d081c
Content-Disposition: form-data; name="archive"; filename="archive.zip"
Content-Type: application/zip

...
--c93780ea776047cd945c189bad1d081c--

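You’ll notice a few things: the parts are separated by a boundary string, which differs from request to request, and each part carries its own Content-Disposition and Content-Type headers.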

Of course, it’s difficult to test a raw data stream like this, so I’d like to convert it to a more convenient data structure.
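One reason comparing the raw body is awkward is that the boundary changes between requests. The following is only a minimal sketch of that effect, using the standard library alone. build_multipart() is a hypothetical helper written for this illustration, not part of requests or requests-mock, and it assumes a random 32-character hex boundary like the one in the dump above.

```python
import uuid


def build_multipart(parts):
    """Assemble a multipart/form-data body (hypothetical helper, for illustration).

    parts is a list of (name, filename, content_type, payload) tuples.
    Returns the boundary string and the encoded body.
    """
    boundary = uuid.uuid4().hex  # assumption: random 32-character hex boundary
    delimiter = b'--' + boundary.encode('ascii')
    lines = []
    for name, filename, content_type, payload in parts:
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode('utf-8'))
        lines.append(f'Content-Type: {content_type}'.encode('ascii'))
        lines.append(b'')  # blank line separates part headers from the payload
        lines.append(payload)
    lines.append(delimiter + b'--')  # closing delimiter
    lines.append(b'')
    return boundary, b'\r\n'.join(lines)


parts = [('settings', 'settings.json', 'application/json', b'{"foo": 1, "bar": 2}')]
boundary_a, body_a = build_multipart(parts)
boundary_b, body_b = build_multipart(parts)

# Same content, but the raw bytes differ because of the boundary.
print(body_a == body_b)
# Once the boundary is stripped out, the two bodies are identical.
print(body_a.replace(boundary_a.encode(), b'') == body_b.replace(boundary_b.encode(), b''))
```

So an assertion against the raw bytes would have to know the boundary in advance, which is why parsing the body first is more practical.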

cgi.parse_multipart()

To parse a multipart/form-data request, the cgi module provides the function parse_multipart(). You can use it as follows.

import cgi
import json
from io import BytesIO
...

history = requests_mock.request_history

# Extract content type and its parameters.
c_type, c_data = cgi.parse_header(history[0].headers['Content-Type'])
assert c_type == 'multipart/form-data'

# cgi.parse_multipart() expects the boundary as bytes, hence encode().
form_data = cgi.parse_multipart(BytesIO(history[0].body), {'boundary': c_data['boundary'].encode()})
settings = json.loads(form_data['settings'][0].decode('utf-8'))

assert form_data['archive'][0] == zip_content
assert settings['foo'] == 1
assert settings['bar'] == 2

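Note: Since Python 3.7, the values of non-file fields (parts without a filename) are returned as str instead of bytes, so they don’t need decode(). Parts with a filename, like the ones above, are still returned as bytes, and the boundary must still be passed as bytes.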
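The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so cgi.parse_header() is not available on newer interpreters. As a rough sketch of one alternative, the Content-Type header can be split with email.message.Message from the standard library. parse_content_type() is a hypothetical helper name used only for this example.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, params) for a Content-Type header value.

    Hypothetical helper that mimics the shape of cgi.parse_header().
    """
    msg = Message()
    msg['Content-Type'] = header_value
    # get_params() yields (name, value) pairs; the first pair is the media type itself.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


# In a real test this value would come from history[0].headers['Content-Type'].
c_type, c_data = parse_content_type(
    'multipart/form-data; boundary=c93780ea776047cd945c189bad1d081c')

print(c_type)
print(c_data['boundary'])
```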

However, cgi.parse_multipart() has a drawback: it returns only the field values, so you cannot check the filename or the Content-Type of each part.

cgi.FieldStorage

Because of these drawbacks of cgi.parse_multipart(), I recommend using cgi.FieldStorage for more detailed control. Since Python 3.7, parse_multipart() itself uses FieldStorage internally.

import cgi
import json
from io import BytesIO
...
history = requests_mock.request_history

fs = cgi.FieldStorage(
        fp=BytesIO(history[0].body),
        headers=history[0].headers,
        # FieldStorage works only for POST request!
        environ={'REQUEST_METHOD': 'POST'})

item = fs['settings']
settings = json.loads(item.value.decode('utf-8'))

assert item.filename == 'settings.json'
assert item.headers['Content-Type'] == 'application/json'
assert settings['foo'] == 1
assert settings['bar'] == 2

item = fs['archive']
assert item.filename == 'archive.zip'
assert item.headers['Content-Type'] == 'application/zip'
assert item.value == zip_content

Having to pass environ={'REQUEST_METHOD': 'POST'} is a little tricky, but otherwise FieldStorage is straightforward to use.
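On Python 3.13 and later, where the cgi module no longer exists, similar per-part checks can be sketched with the standard email package instead. This is a swapped-in technique, not the FieldStorage approach shown above. parse_multipart_body() is a hypothetical helper, and the body here is built by hand to keep the example self-contained. It has only been thought through for text payloads, so binary parts may need extra care.

```python
import email
import json


def parse_multipart_body(content_type, body):
    """Map each part's name to its parsed message object (hypothetical helper)."""
    # Prepend the Content-Type header so the parser sees a complete MIME message.
    raw = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + body
    message = email.message_from_bytes(raw)
    parts = {}
    for part in message.get_payload():
        name = part.get_param('name', header='content-disposition')
        parts[name] = part
    return parts


# In a real test these would be history[0].headers['Content-Type'] and history[0].body.
boundary = 'c93780ea776047cd945c189bad1d081c'
content_type = f'multipart/form-data; boundary={boundary}'
body = (
    f'--{boundary}\r\n'
    'Content-Disposition: form-data; name="settings"; filename="settings.json"\r\n'
    'Content-Type: application/json\r\n'
    '\r\n'
    '{"foo": 1, "bar": 2}\r\n'
    f'--{boundary}--\r\n'
).encode('utf-8')

parts = parse_multipart_body(content_type, body)
item = parts['settings']
settings = json.loads(item.get_payload(decode=True))

assert item.get_filename() == 'settings.json'
assert item.get_content_type() == 'application/json'
assert settings['foo'] == 1
assert settings['bar'] == 2
```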