Compressing request body with python-requests?

(This question is not about transparent decompression of gzip-encoded responses from a web server; I know that requests handles that automatically.)

Problem

I'm trying to POST a file to a RESTful web service. Obviously, requests makes this pretty easy to do:

files = dict(data=(fn, file))
response = session.post(endpoint_url, files=files)

In this case, my file is in a really highly-compressible format (yep, XML) so I'd like to make sure that the request body is compressed.

The server claims to accept gzip encoding (Accept-Encoding: gzip in response headers), so I should be able to gzip the whole request body, right?

Attempted solution

Here's my attempt to make this work: I first construct the request and prepare it, then I go into the PreparedRequest object, yank out the body, run it through gzip, and put it back. (Oh, and don't forget to update the Content-Length and Content-Encoding headers.)

files = dict(data=(fn, file))
request = requests.Request('POST', endpoint_url, files=files)

prepped = session.prepare_request(request)
with NamedTemporaryFile(delete=True) as gzfile:
    # Close the GzipFile first so the gzip trailer is written before tell()
    with gzip.GzipFile(fileobj=gzfile, mode="wb") as gz:
        gz.write(prepped.body)
    prepped.headers['Content-Length'] = str(gzfile.tell())
    prepped.headers['Content-Encoding'] = 'gzip'
    gzfile.seek(0, 0)
    prepped.body = gzfile.read()
    response = session.send(prepped)

Unfortunately, the server is not cooperating and returns 500 Internal Server Error. Perhaps it doesn't really accept gzip-encoded requests?

Or perhaps there is a mistake in my approach? It seems rather convoluted. Is there an easier way to do request body compression with python-requests?
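
Before blaming the server, it may be worth checking locally that the bytes being sent form a complete gzip stream. The following is only a sketch using the standard library: the helper name gzip_sizes and the sample XML payload are made up for illustration. It shows that measuring the underlying file before the GzipFile is closed reports fewer bytes than the finished stream, because the trailer (CRC32 and length) is only written on close.

```python
import gzip
import io


def gzip_sizes(raw):
    """Return (size before close, size after close, complete gzip bytes)."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb")
    gz.write(raw)
    size_before_close = buf.tell()  # trailer (and possibly data) not written yet
    gz.close()                      # flushes remaining data plus the gzip trailer
    size_after_close = buf.tell()   # buf stays open; only the GzipFile is closed
    return size_before_close, size_after_close, buf.getvalue()


# Hypothetical payload standing in for the real XML file
sample = b"<root>" + b"<item>hello</item>" * 1000 + b"</root>"
before, after, blob = gzip_sizes(sample)
print("bytes visible before close:", before)
print("bytes in the finished stream:", after)
```

If a Content-Length header is computed from the smaller number, the body and the header disagree, which some servers may reject.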

EDIT: Fixed (3) and (5) from @sigmavirus24's answer (these were basically just artifacts I'd overlooked in simplifying the code to post it here).


ANSWERS:


"Or perhaps there is a mistake in my approach?"

I'm unsure how you arrived at your approach, frankly, but there's certainly a simpler way of doing this.

First, a few things:

  1. The files parameter constructs a multipart/form-data body. So you're compressing something that the server potentially has no clue about.
  2. Content-Encoding and Transfer-Encoding are two very different things. You want Transfer-Encoding here.
  3. You don't need to set a suffix on your NamedTemporaryFile.
  4. Since you didn't explicitly mention that you're trying to compress a multipart/form-data request, I'm going to assume that you don't actually want to do that.
  5. Your call to session.Request (which I assume should be requests.Request) is missing a method; i.e., it should be: requests.Request('POST', endpoint_url, ...)

With those out of the way, here's how I would do this:

# Assuming `file` is a file-like obj opened in binary mode
with NamedTemporaryFile(delete=True) as gzfile:
    # Close the GzipFile before tell() so the gzip trailer is included
    with gzip.GzipFile(fileobj=gzfile, mode="wb") as gz:
        gz.write(file.read())
    headers = {'Content-Length': str(gzfile.tell()),
               'Transfer-Encoding': 'gzip'}
    gzfile.seek(0, 0)
    response = session.post(endpoint_url, data=gzfile,
                            headers=headers)

Assuming that file holds the XML content and all you meant to do was compress it, this should work for you. You probably want to set a Content-Type header as well, for example:

 headers = {'Content-Length': str(gzfile.tell()),  # header values should be strings
            'Content-Type': 'application/xml',  # or 'text/xml'
            'Transfer-Encoding': 'gzip'}

The Transfer-Encoding tells the server that the request is being compressed only in transit and it should uncompress it. The Content-Type tells the server how to handle the content once the Transfer-Encoding has been handled.
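
For payloads that fit in memory, the temporary file can probably be avoided altogether. The sketch below is an illustration rather than a definitive recipe: build_gzip_request and its encoding_header parameter are hypothetical names, and the default header simply mirrors the one used above. Which header a given server actually honours for compressed request bodies is something to confirm against that server, so the header name is left configurable.

```python
import gzip


def build_gzip_request(raw, content_type="application/xml",
                       encoding_header="Transfer-Encoding"):
    """Compress `raw` (bytes) in memory and return (body, headers)."""
    body = gzip.compress(raw)  # complete gzip stream, trailer included
    headers = {
        "Content-Type": content_type,      # how to treat the decompressed content
        encoding_header: "gzip",           # assumption: server honours this header
        "Content-Length": str(len(body)),  # length of the compressed bytes
    }
    return body, headers


# Hypothetical payload standing in for the real XML file
xml_bytes = b"<root>" + b"<item>hello</item>" * 1000 + b"</root>"
body, headers = build_gzip_request(xml_bytes)

# With requests available, the result could then be sent along the lines of:
#   response = session.post(endpoint_url, data=body, headers=headers)
```

Because gzip.compress returns a finished stream, the Content-Length is taken from the final byte count rather than from a file position.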


I had a question that was marked as an exact duplicate. I was concerned with both ends of the transaction.

The code from sigmavirus24 wasn't a direct cut and paste fix, but it was the inspiration for this version.

Here's how my solution ended up looking:

Sending from the Python end

import gzip
import io
import json
import requests

url = "http://localhost:3000"
headers = {"Content-Type": "application/octet-stream"}
data = [{"key": 1, "otherKey": "2"},
        {"key": 3, "otherKey": "4"}]

payload = json.dumps(data)

# Gzip the JSON payload into an in-memory buffer (Python 3: bytes, not str)
out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb") as f:
    f.write(payload.encode("utf-8"))

r = requests.post(url + "/zipped", data=out.getvalue(), headers=headers)

Receiving at the Express end

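var express = require("express");
var bodyParser = require("body-parser");
var zlib = require("zlib");

var app = express();
// Accept any content type as a raw Buffer so the gzip bytes arrive untouched
var rawParser = bodyParser.raw({type: '*/*'});

app.post('/zipped', rawParser, function(req, res) {

    zlib.gunzip(req.body, function(err, buf) {
        if(err){
            console.log("err:", err );
        } else{
            console.log("in the inflate callback:",
                        buf,
                        "to string:", buf.toString("utf8") );
        }
    });

    res.status(200).send("I'm in ur zipped route");
});

// Port matches the URL used on the Python side
app.listen(3000);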

There's a gist here with more verbose logging included. This version doesn't have any safety or checking built in either.
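
As one possible direction for the missing safety checks, here is a rough sketch of a bounded decompression step. It is written in Python only to stay in one language for the examples; it is not what the Express route above does. The name bounded_gunzip and the 1,000,000-byte default limit are assumptions chosen for illustration. The idea is to cap how much output a request body may expand to and to reject streams that end early.

```python
import gzip
import json
import zlib


def bounded_gunzip(blob, max_bytes=1_000_000):
    """Decompress gzip bytes, refusing oversized or truncated input."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Ask for at most one byte more than the limit so overflow is detectable
    out = d.decompress(blob, max_bytes + 1)
    if len(out) > max_bytes:
        raise ValueError("decompressed body exceeds limit")
    if not d.eof:
        raise ValueError("truncated or incomplete gzip stream")
    return out


# Round trip with the same sample data as the client code above
data = [{"key": 1, "otherKey": "2"}, {"key": 3, "otherKey": "4"}]
blob = gzip.compress(json.dumps(data).encode("utf-8"))
decoded = json.loads(bounded_gunzip(blob).decode("utf-8"))
print(decoded)
```

Corrupt input still surfaces as zlib.error from the decompress call, so a caller would likely want to catch that alongside ValueError.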


