Android default charset when sending http post/put - Problems with special characters

I have configured the Apache HttpClient like so:

    HttpProtocolParams.setContentCharset(httpParameters, "UTF-8");
    HttpProtocolParams.setHttpElementCharset(httpParameters, "UTF-8");

I also include the HTTP header "Content-Type: application/json; charset=UTF-8" in all HTTP POST and PUT requests.

I am trying to send HTTP POST/PUT requests with a JSON body that contains special characters (e.g. Chinese characters typed with the Google Pinyin keyboard, symbols, etc.). The characters appear as gibberish in the logs, but I think this is because DDMS does not support UTF-8, as described in this issue.
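Side note on the logging problem: if the log viewer can't render UTF-8, one workaround is to log the string with every non-ASCII character escaped, so you can at least see which code points are really in it. A minimal sketch (the class and method names are made up for illustration; on Android you'd pass the result to Log.d instead of System.out):

```java
public class LogEscape {
    /**
     * Replaces every char outside printable ASCII with a Java-style escape
     * (backslash, 'u', four uppercase hex digits), so the log is readable
     * even when the viewer mangles UTF-8.
     */
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c); // printable ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04X", (int) c)); // pad to four hex digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The JSON value holds the two characters U+4E2D U+6587
        System.out.println(escapeNonAscii("{\"name\":\"\u4E2D\u6587\"}"));
    }
}
```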

The problem is that when the server receives the request, it sometimes doesn't see the characters at all (especially the Chinese characters), or they turn into meaningless garbage when we retrieve them through a GET request.

I also tried putting 250 non-ASCII characters in a single field, because that particular field should accept up to 250 characters. However, it fails validation on the server side, which claims that the 250-character limit has been exceeded. 250 ASCII characters work just fine.
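One possible explanation for the length error (an assumption, since the server's validation code isn't shown): most Chinese characters take three bytes in UTF-8, so 250 of them are 750 bytes. A server that counts bytes, or that decodes the UTF-8 body with a single-byte charset such as ISO-8859-1 (turning each byte into one character), would see 750 and reject the field. A quick way to see the difference in plain Java (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 250; i++) {
            sb.append('\u4E2D'); // U+4E2D, a common CJK character
        }
        String field = sb.toString();
        // Prints "250 chars, 750 UTF-8 bytes"
        System.out.println(field.length() + " chars, " + utf8Length(field) + " UTF-8 bytes");
    }
}
```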

The server dudes claim that they support UTF-8. They even simulated a POST request containing Chinese characters, and the server received the data just fine. However, the guy who tested it (a Chinese guy) is using a Windows computer with the Chinese language pack installed (I think, because he can type Chinese characters on his keyboard).

I'm guessing that the charsets used by the Android client and the server (made by Chinese guys, btw) are not aligned. But I do not know which one is at fault, since the server dudes claim that they support UTF-8, and our REST client is configured for UTF-8.

This got me wondering what charset Android uses by default for all text input, and whether it can be changed programmatically. I tried to find resources on how to do this for input widgets, but I did not find anything useful.
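For what it's worth: text from an input widget (e.g. EditText.getText().toString()) is just a Java String, which is UTF-16 internally and carries no charset of its own, so as far as I know there is nothing to configure per widget. A charset only comes into play when the String is turned into bytes, which is what the HTTP entity does. Charset.defaultCharset() is documented to always be UTF-8 on Android, but it is safer not to rely on the default at all. A small sketch (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {
    /** Encodes text for a request body with an explicit charset instead of the platform default. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 on Android; on a desktop JVM this may be something else.
        System.out.println("default charset: " + Charset.defaultCharset());

        // What a widget hands you: a String, with no encoding attached yet.
        String typed = "\u4E2D\u6587";
        byte[] body = toUtf8(typed);
        System.out.println(Arrays.toString(body)); // three bytes per character

        // Decoding with the same charset restores the original text.
        System.out.println(new String(body, StandardCharsets.UTF_8).equals(typed)); // true
    }
}
```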

Is there a way to set the charset for all input widgets in Android? Or maybe I missed something in the rest client configuration? Or maybe, just maybe, the server dudes are not using UTF-8 at their servers and used Windows charsets instead?


ANSWERS:


Apparently, I forgot to set the StringEntity's charset to UTF-8. These lines did the trick:

    httpPut.setEntity(new StringEntity(body, HTTP.UTF_8));
    httpPost.setEntity(new StringEntity(body, HTTP.UTF_8));

So there are at least two levels at which to set the charset in the Android client when sending an HTTP POST with non-ASCII characters:

  1. The REST client itself
  2. The StringEntity
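Why the Content-Type header alone wasn't enough: in HttpClient 4.x the one-argument StringEntity(String) constructor encodes the body as ISO-8859-1 (the HTTP default), whatever header you add; the header only tells the server how to decode the bytes it receives. ISO-8859-1 cannot represent Chinese characters, so they become '?' before they ever leave the device, which would match "the server doesn't see the characters at all". The plain-JDK sketch below simulates both cases without HttpClient (class and method names are made up):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntityCharsetDemo {
    /** Encodes with the client's charset, then decodes the way a UTF-8 server would. */
    static String roundTrip(String body, Charset clientCharset) {
        // String.getBytes(Charset) replaces unmappable chars with the charset's
        // replacement byte, which is '?' for ISO-8859-1.
        byte[] wire = body.getBytes(clientCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"\u4E2D\u6587\"}";
        // Roughly what new StringEntity(json) sends: prints {"name":"??"}
        System.out.println(roundTrip(json, StandardCharsets.ISO_8859_1));
        // Roughly what new StringEntity(json, HTTP.UTF_8) sends: prints true
        System.out.println(roundTrip(json, StandardCharsets.UTF_8).equals(json));
    }
}
```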

UPDATE: As Samuel pointed out in the comments, the modern way to do it is to use a ContentType, like so:

    final StringEntity se = new StringEntity(body, ContentType.APPLICATION_JSON);
    httpPut.setEntity(se);
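Apache HttpClient was deprecated on Android in API 22 and removed from the SDK in API 23 (Android 6.0), so on newer versions the same idea with java.net.HttpURLConnection looks roughly like this. It is a sketch, not the poster's code: the class and method names are hypothetical, and error handling and response reading are left out. The key point is the same: encode the body to UTF-8 bytes yourself and declare that same charset in the header.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Utf8JsonPost {
    /** Sets up (but does not yet connect) a POST that will carry the given bytes. */
    static HttpURLConnection preparePost(URL url, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Declare the same charset that was used to produce the bytes.
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setFixedLengthStreamingMode(body.length); // length in bytes, not chars
        return conn;
    }

    /** Sends json as UTF-8 and returns the HTTP status code. */
    static int send(URL url, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = preparePost(url, body);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```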

I know this post is a bit old but nevertheless here is a solution:

Here is my code for posting UTF-8 strings (it doesn't matter whether they are XML SOAP or JSON) to a server. I tried it with Cyrillic, hash values and some other special characters, and it works like a charm. It is a compilation of many solutions I found on the forums.

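    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.http.HttpResponse;
    import org.apache.http.HttpVersion;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.params.BasicHttpParams;
    import org.apache.http.params.HttpParams;
    import org.apache.http.params.HttpProtocolParams;
    import org.apache.http.protocol.HTTP;

    // Client-level defaults: UTF-8 for the content body and for header elements
    HttpParams httpParameters = new BasicHttpParams();
    HttpProtocolParams.setContentCharset(httpParameters, HTTP.UTF_8);
    HttpProtocolParams.setHttpElementCharset(httpParameters, HTTP.UTF_8);

    HttpClient client = new DefaultHttpClient(httpParameters);
    client.getParams().setParameter("http.protocol.version", HttpVersion.HTTP_1_1);
    client.getParams().setParameter("http.socket.timeout", Integer.valueOf(2000));
    client.getParams().setParameter("http.protocol.content-charset", HTTP.UTF_8);
    httpParameters.setBooleanParameter("http.protocol.expect-continue", false);
    HttpPost request = new HttpPost("http://www.server.com/some_script.php?sid=" + String.valueOf(Math.random()));
    request.getParams().setParameter("http.socket.timeout", Integer.valueOf(5000));

    List<NameValuePair> postParameters = new ArrayList<NameValuePair>();
    // you get this later in php with $_POST['value_name']
    postParameters.add(new BasicNameValuePair("value_name", "value_val"));

    // This charset is what actually controls how the form body is encoded
    UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(postParameters, HTTP.UTF_8);
    request.setEntity(formEntity);
    HttpResponse response = client.execute(request);

    // Read the response explicitly as UTF-8 instead of relying on the platform default
    BufferedReader in = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
    StringBuilder sb = new StringBuilder();
    String line;
    String lineSeparator = System.getProperty("line.separator");
    while ((line = in.readLine()) != null) {
        sb.append(line);
        sb.append(lineSeparator);
    }
    in.close();
    String result = sb.toString();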

I hope that someone will find this code helpful. :)
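Why the form version works: UrlEncodedFormEntity with HTTP.UTF_8 percent-encodes the UTF-8 bytes of each name and value, so the body on the wire is plain ASCII and PHP decodes it back into $_POST. The JDK's URLEncoder does a very similar encoding and is handy for checking what should go over the wire (roughly; the escaping of a few punctuation characters may differ from HttpClient's). The helper below is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormBody {
    /** Builds one name=value pair of an application/x-www-form-urlencoded body, as UTF-8. */
    static String encodePair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic "privet": each letter is two UTF-8 bytes, i.e. two %XX escapes.
        // Prints value_name=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
        System.out.println(encodePair("value_name", "\u043F\u0440\u0438\u0432\u0435\u0442"));
    }
}
```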


You should set charset of your string entity to UTF-8:

    StringEntity stringEntity = new StringEntity(urlParameters, HTTP.UTF_8);

You can rule out the server as the problem by using curl to send the same data. If it works with curl, use --trace to see exactly what was sent.

Ensure you are sending the content body as UTF-8-encoded bytes, then compare the HTTP request from Android with the output of the successful curl request.
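To make that comparison concrete, you can dump the body bytes on the device in hex and line them up with the hex that curl --trace prints. If a Chinese character shows up as a three-byte sequence like e4 b8 ad, the client is sending UTF-8; a single 3f ('?') per character points to an ISO-8859-1 entity. A minimal sketch (the class and method names are made up):

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    /** Formats bytes as space-separated, two-digit lowercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF)); // mask to print 0..255, not negative values
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "{\"n\":\"\u4E2D\"}";
        // Prints 7b 22 6e 22 3a 22 e4 b8 ad 22 7d
        System.out.println(toHex(body.getBytes(StandardCharsets.UTF_8)));
    }
}
```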


