How can I post data with overlong encoding to test for vulnerabilities?

I recently learned that overlong encodings pose a security risk when they are not properly validated. From the answer in the previously mentioned post:

For example the character < is usually represented as byte 0x3C, but could also be represented using the overlong UTF-8 sequence 0xC0 0xBC (or even more redundant 3- or 4-byte sequences).

And:

If you take this input and handle it in a Unicode-oblivious byte-based tool, then any character processing step being used in that tool may be evaded.

This means that if I run htmlspecialchars on a string that uses overlong encodings, the output could still contain tags. I also assume that other characters (like " or ;) could be posted the same way and used for SQL injection.
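To make the concern concrete, here is a minimal Python sketch of the 2-byte overlong form quoted above and of how a purely byte-based check misses it. The helper name overlong_2byte and the sample payload are made up for illustration. This only shows the encoding, not how any particular PHP function behaves.

```python
def overlong_2byte(ch: str) -> bytes:
    """Encode an ASCII character as an (invalid) 2-byte overlong UTF-8 sequence."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("only ASCII characters have a 2-byte overlong form")
    # Lead byte 110xxxxx carries the top bits, continuation byte 10xxxxxx the low six.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


# "<" becomes 0xC0 0xBC, ">" becomes 0xC0 0xBE.
payload = overlong_2byte("<") + b"script" + overlong_2byte(">")

# A byte-oriented filter that searches for the literal byte 0x3C finds nothing.
contains_lt = b"<" in payload

# A strict UTF-8 decoder refuses the sequence instead of turning it into "<".
try:
    payload.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(payload.hex(" "), contains_lt, strict_ok)
```

The risk described in the quoted answer arises when a lenient decoder later in the pipeline accepts what the byte-level check let through.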

Perhaps it is just me, but I believe this is a security risk that relatively few people take into account or even know about. I've been coding for years and am only now finding this out.

Anyway, my question is: what tools can I use to send data with overlong encodings? People who are familiar with this risk: how do you perform tests on websites? I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.

In my situation I mostly use PHP and MySQL, but what I really want to know about are testing tools, so I guess the back-end situation does not matter much.


ANSWERS:


To test whether your site is vulnerable, use curl to send a POST request to your page whose body contains overlong-encoded UTF-8 data. You could prepare that data in a text editor, provided the editor can save text in an overlong UTF-8 encoding, so that both the text you post with curl and the PHP file are in that encoding.
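If no editor with such an encoding is at hand, the raw bytes can be written out directly. The following is a rough sketch in Python using only the standard library. The URL http://localhost/test.php, the field name comment and the helper build_overlong_post are placeholders, not anything from the question.

```python
import urllib.parse
import urllib.request


def build_overlong_post(url: str, field: str, raw_value: bytes) -> urllib.request.Request:
    """Build a form POST whose field value is an arbitrary raw byte string."""
    # Percent-encode every non-alphanumeric byte so the overlong sequence
    # reaches the server unchanged.
    encoded_value = urllib.parse.quote_from_bytes(raw_value, safe="")
    body = f"{urllib.parse.quote(field)}={encoded_value}".encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# 0xC0 0xBC is the overlong form of "<" quoted in the question.
req = build_overlong_post("http://localhost/test.php", "comment", b"\xc0\xbcfoo")
print(req.data)

# To actually send it (only against a site you own), something like:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read()[:200])
```

With curl, the same body could be sent as curl --data 'comment=%C0%BCfoo' http://localhost/test.php, since the percent-encoded form carries the raw bytes through the form encoding.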


I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.

Apart from testing it with manual request tools like curl, a simple workaround for in-browser testing is to override the encoding of the form submission. Using e.g. Firebug or the Chrome debugger, alter the form you're testing to add the attribute:

accept-charset="iso-8859-1"

You can now type characters that, when encoded as Windows code page 1252 (the encoding browsers actually use for the iso-8859-1 label), become the UTF-8 overlong byte sequence you want.

For example, enter cafÃ© into the form and you will get the byte sequence c a f 0xC3 0xA9 so the application will think you typed café. Enter À¼foo and the sequence 0xC0 0xBC f o o will be submitted, which could be interpreted as <foo. Note that you won't see <foo in any output page source because modern browsers don't parse overlong UTF-8 sequences in web pages, but you might get a �foo or other indication something isn't right.
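To work out which characters to type for a given byte sequence, the mapping can be checked offline. This is a small sketch in Python. It assumes Python's cp1252 codec matches what the browser submits for these particular characters, and the variable names are illustrative only.

```python
# What the browser would submit for the typed text when the form is
# encoded as Windows-1252 (each character becomes exactly one byte).
typed = "\u00c0\u00bcfoo"              # À¼foo
submitted = typed.encode("cp1252")     # bytes 0xC0 0xBC f o o

cafe_typed = "caf\u00c3\u00a9"         # cafÃ©
cafe_bytes = cafe_typed.encode("cp1252")

# A UTF-8 application reads the second example back as "café".
print(cafe_bytes.decode("utf-8"))

# A strict decoder does not turn 0xC0 0xBC into "<"; with errors="replace"
# the invalid bytes show up as U+FFFD replacement characters instead.
lenient = submitted.decode("utf-8", errors="replace")
print(submitted.hex(" "), "<" in lenient, "\ufffd" in lenient)

# Going the other way: find the characters to type for a desired byte sequence.
to_type = b"\xc0\xbc".decode("cp1252")
print(to_type)
```

The last step is the useful one in practice: decode the target bytes as cp1252 and paste the result into the form field.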

For more in-depth control over the input and inspection of a web app's output, see dedicated security tools like Burp.


