Character encoding of GET request parameter

Hello fellow Stackoverflowers.

I have an issue that I need some help with:

We're making an HTTP GET web service call from a smartphone app to a Java/Spring MVC application. It runs on a Tomcat application server that is fronted by an Apache server with a mod_proxy proxy setup.

One of the parameters embedded in the URL is the organization name, "Männen". The app makes a jQuery Ajax GET request, and the parameter leaves the app as "M%E4nnen", which to my understanding means the "ä" has been properly URL-encoded. When it arrives at the Spring controller, however, it has been distorted to "MÃ¤nnen".
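
For reference, "M%E4nnen" is the percent-encoding of "Männen" in ISO-8859-1 (or Windows-1252), where "ä" is the single byte 0xE4. In UTF-8 the same name would travel as "M%C3%A4nnen". A minimal sketch, assuming Java 10+ for the URLEncoder.encode(String, Charset) overload; the class and method names are illustrative only:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {
    // Percent-encodes a value with the given charset, as a client would before sending it
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String name = "M\u00e4nnen"; // "Männen"
        // ISO-8859-1 / Windows-1252: "ä" is the single byte 0xE4
        System.out.println(encode(name, StandardCharsets.ISO_8859_1)); // M%E4nnen
        // UTF-8: "ä" is the two bytes 0xC3 0xA4
        System.out.println(encode(name, StandardCharsets.UTF_8));      // M%C3%A4nnen
    }
}
```

Note that URLEncoder produces application/x-www-form-urlencoded output (a space becomes "+"), which suits query strings but not URL path segments.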

I have googled and found quite a few threads on this, and they all recommend modifying Tomcat's server.xml by adding URIEncoding="UTF-8" to all connectors. Of course, I tried this. It changed the result but did not solve the issue: the string now comes through as "M�nnen". Another thread suggested adding "nocanon" to the ProxyPass directive in the Apache proxy configuration. I tried that as well, but it made no difference.
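
The "M�nnen" result is what you would expect if "M%E4nnen" is decoded as UTF-8: a lone 0xE4 byte is not valid UTF-8, so the decoder substitutes the replacement character U+FFFD. The sketch below reproduces this with java.net.URLDecoder rather than Tomcat's own decoder, assuming Java 10+; the class name is illustrative:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {
    // Percent-decodes a raw query value with the given charset
    static String decode(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        String raw = "M%E4nnen"; // what the app actually sends
        // ISO-8859-1 maps byte 0xE4 straight back to U+00E4 ("ä")
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).equals("M\u00e4nnen")); // true
        // A lone 0xE4 is malformed UTF-8, so it is replaced with U+FFFD
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals("M\uFFFDnnen"));      // true
    }
}
```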

Using the logs, I can follow the request:

  1. In the Apache access log, the parameter is logged as "M%E4nnen"
  2. In the Apache proxy log, the parameter is logged as "M%E4nnen"
  3. In the Tomcat localhost_access log, the parameter is logged as "M%E4nnen"
  4. In the Spring controller that receives the request, the parameter is logged as "M�nnen"
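
The trail suggests the raw "M%E4nnen" survives Apache, mod_proxy and Tomcat's access log unchanged, so the bytes appear to be decoded with the wrong charset afterwards, most likely when the container turns the query string into request parameters. One possible workaround (an assumption, not something proposed in the answers) is to decode the raw, undecoded query string yourself, for example the value of HttpServletRequest.getQueryString(), with the charset the client really uses. RawQueryParser.parse is a hypothetical helper and a minimal sketch, assuming Java 10+:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawQueryParser {
    // Hypothetical helper: splits a raw query string such as "org=M%E4nnen&id=42"
    // and percent-decodes each name and value with the supplied charset.
    // In this sketch a repeated parameter name keeps only its last value.
    static Map<String, String> parse(String rawQuery, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("org=M%E4nnen&id=42", StandardCharsets.ISO_8859_1);
        System.out.println(p.get("org").equals("M\u00e4nnen")); // true
        System.out.println(p.get("id"));                        // 42
    }
}
```

This only sidesteps the container's decoding; making client and server agree on one charset (ideally UTF-8) is the cleaner fix.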

My Spring application also has a character encoding filter, but as far as I understand, it only applies to the request body. It is configured as shown below:

<filter>
  <filter-name>CharacterEncodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>CharacterEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

I really don't know what else to try or where else to look. If anyone could guide me in the right direction, it would be highly appreciated.


ANSWERS:


If the HTML is in Windows-1252 (or its "subset" ISO-8859-1), then %E4 is okay. If, however, the HTML is in Unicode (UTF-8), it is not: in UTF-8, "ä" is the two bytes C3 A4, so it should be sent as %C3%A4.

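import java.net.URLDecoder;
import java.net.URLEncoder;
String auml = "\u00e4";                                // "ä"
String aumlPerc = URLEncoder.encode(auml, "UTF-8");    // "%C3%A4", not "%E4"
String back = URLDecoder.decode(aumlPerc, "UTF-8");    // "ä" again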

Besides declaring charset UTF-8 on the HTML page, you can also use <form accept-charset="UTF-8" ...>.

It seems the page erroneously sends %E4, which is accepted as ISO-8859-1 (the default) and converted to a multi-byte UTF-8 sequence; that sequence is then wrongly interpreted as ISO-8859-1 again.
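
The double conversion described above can be reproduced in a few lines: encode "ä" as UTF-8 (bytes C3 A4), then read those bytes as ISO-8859-1, which yields "Ã¤". The class and method names are illustrative. The repair step only works when every byte survived the misinterpretation, so treat it as a diagnostic aid rather than a substitute for configuring the charset consistently:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8 bytes, then wrongly reads those bytes as ISO-8859-1
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses misread(): recovers the original bytes and decodes them as UTF-8
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misread("M\u00e4nnen");
        System.out.println(garbled.equals("M\u00c3\u00a4nnen"));      // true, i.e. "MÃ¤nnen"
        System.out.println(repair(garbled).equals("M\u00e4nnen"));    // true, i.e. "Männen"
    }
}
```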

There are several knobs for setting the encoding, such as request.setCharacterEncoding, but with the limited information given I cannot say which one to adjust. Maybe this information is enough to go on.