The "default" you refer to is probably the "platform default", which is used when no other encoding information is available, and only when converting between bytes and characters as data moves into or out of the JVM. Once inside the JVM, all characters are represented in UTF-16. The encoding you mentioned is probably Cp1252. Chinese characters cannot be represented in that encoding, so that's not what's happening. You'd have to be more specific about your setup, but the XML parser you're using is probably detecting the correct encoding and thus not garbling the text.
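To see which charset the JVM treats as the platform default, and to confirm that Cp1252 has no mapping for Chinese characters, a small check along these lines may help. It is only a sketch: the class name DefaultCharsetCheck and its helper are made up for illustration, and it assumes the JRE ships the windows-1252 charset (Java's canonical name for Cp1252).

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // True if every character of text has a mapping in the named charset.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is 你好, written as escapes so the source file's own
        // encoding cannot interfere with the experiment.
        String hello = "\u4f60\u597d";

        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("windows-1252 can encode it: "
                + canEncode("windows-1252", hello));
        System.out.println("UTF-8 can encode it: " + canEncode("UTF-8", hello));
    }
}
```

The windows-1252 line should report false and the UTF-8 line true, whatever the platform default turns out to be.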
Assuming everything is working, this is the flow:
Your XML parser decodes the XML and converts it to Java's internal representation (effectively UTF-16; a Java char is actually a UTF-16 code unit, not a "character").
When you render a JSP, the page is encoded according to your Servlet container configuration. The HTTP headers probably include the encoding being used, so your browser can decode it correctly.
Here's where it becomes unclear whether things really are working. What ends up in System.out depends on how you're writing to it. You say "printed", so I'm guessing you're using the print methods, which means the platform's default character encoding is being used. If this encoding really is Cp1252 (the only one I can think of that sounds like Cp1522) and the result looks "right", then something is actually wrong.
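As a rough illustration of why output that looks "right" through Cp1252 is suspicious: a PrintStream converts chars to bytes using its own encoding, and a correctly decoded 你好 cannot survive a Cp1252 stream. The sketch below captures the bytes in memory instead of writing to the real System.out. The class and method names are hypothetical, and windows-1252 is assumed to be available in the JRE.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintEncodingDemo {

    // Prints text through a PrintStream that uses the given encoding and
    // returns the raw bytes that would have reached the console.
    static byte[] printedBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String hello = "\u4f60\u597d"; // 你好, correctly decoded (2 chars)

        byte[] utf8 = printedBytes(hello, "UTF-8");
        byte[] cp1252 = printedBytes(hello, "windows-1252");

        System.out.println("UTF-8 stream wrote " + utf8.length + " bytes");
        System.out.println("Cp1252 stream wrote " + cp1252.length + " bytes");
        // Each unmappable character is replaced by a single '?' byte.
        System.out.println("first Cp1252 byte is '?': " + (cp1252[0] == '?'));
    }
}
```

If the real string were two proper chars, a Cp1252 console would show question marks. Seeing readable Chinese instead hints that the chars being printed are really raw bytes in disguise.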
Cp1252 is close to Latin-1 (ISO-8859-1), which is sometimes abused as a "bytes == chars" encoding. That would suggest that each of your multi-byte Chinese characters is actually being decoded into multiple Java chars. One character becoming several chars is correct only for characters outside the BMP (plane 0), and in that case the character should become a surrogate pair.
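The effect can be reproduced in isolation. The following is a minimal sketch, not a diagnosis of your actual setup: it uses ISO-8859-1 as a stand-in for the "bytes == chars" behavior (Cp1252 differs from it in a few byte values), and the class name MisdecodeDemo is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MisdecodeDemo {

    // Decodes UTF-8 bytes the wrong way: one char per byte (Latin-1).
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String hello = "\u4f60\u597d"; // 你好
        byte[] utf8 = hello.getBytes(StandardCharsets.UTF_8); // 3 bytes per character

        String right = new String(utf8, StandardCharsets.UTF_8);
        String wrong = decodeAsLatin1(utf8);
        System.out.println("decoded as UTF-8:   " + right.length() + " chars");
        System.out.println("decoded as Latin-1: " + wrong.length() + " chars");

        // Re-encoding the wrong string the same wrong way restores the original
        // bytes, which is why the output can still look fine downstream.
        byte[] roundTrip = wrong.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("round trip restores bytes: " + Arrays.equals(utf8, roundTrip));

        // The legitimate "one character, two chars" case: a non-BMP code point.
        String nonBmp = new String(Character.toChars(0x20000));
        System.out.println("non-BMP: " + nonBmp.length() + " chars, "
                + nonBmp.codePointCount(0, nonBmp.length()) + " code point, surrogate pair: "
                + Character.isSurrogatePair(nonBmp.charAt(0), nonBmp.charAt(1)));
    }
}
```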
To test what's going on, try putting the two characters 你好 into your XML and checking the length of the parsed String. The length should be 2 (both are BMP characters). If the length is larger (probably 6, one char per UTF-8 byte), then you're decoding incorrectly and things only seem to work because you're re-encoding the same (wrong) way.
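One possible shape for that test is sketched below. It assumes the JAXP DocumentBuilder that ships with the JDK, which may not be the parser you are using, and a made-up root element named greeting. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParsedLengthCheck {

    // Parses raw XML bytes with the JDK's DOM parser and returns the root
    // element's text. The parser works out the encoding from the bytes.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<greeting>\u4f60\u597d</greeting>";
        String text = rootText(xml.getBytes(StandardCharsets.UTF_8));

        System.out.println("length = " + text.length());
        // Dumping the code units avoids relying on the console's encoding.
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Printing the code units as hex sidesteps the console entirely, so the result does not depend on how System.out happens to be encoded.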
I would recommend setting your IDE's default workspace encoding to "UTF-8". Otherwise the IDE may change the encoding when you modify the XML files.
Anyway, you seem to be more interested in how DOMParser works. DOMParser can decide its encoding on its own, and it probably falls back to its own default when nothing else is specified. You can step into it with a debugger and see which encoding it is using.
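Short of stepping through the parser, the DOM API itself can report what was detected. The sketch below assumes the JAXP DocumentBuilder bundled with the JDK rather than whichever DOMParser class you are actually using, and the class name XmlEncodingProbe is invented for illustration. Document.getXmlEncoding() returns the encoding named in the XML declaration, and Document.getInputEncoding() the one used while parsing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingProbe {

    // Parses raw bytes; no encoding is passed in, so the parser must detect it.
    static Document parse(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static void report(String label, byte[] xmlBytes) {
        Document doc = parse(xmlBytes);
        System.out.println(label
                + ": declared=" + doc.getXmlEncoding()
                + ", detected=" + doc.getInputEncoding()
                + ", text length=" + doc.getDocumentElement().getTextContent().length());
    }

    public static void main(String[] args) {
        String body = "<greeting>\u4f60\u597d</greeting>"; // 你好
        report("UTF-8 bytes", ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8));
        // Java's UTF-16 encoder writes a byte order mark the parser can sniff.
        report("UTF-16 bytes", ("<?xml version=\"1.0\" encoding=\"UTF-16\"?>" + body)
                .getBytes(StandardCharsets.UTF_16));
    }
}
```

Because the parser is handed bytes rather than a Reader, the platform default charset never enters the picture, and both inputs should yield the same two-char text.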