UnicodeEncodeError when using the compile function

Using Python 3.2 on Windows 7, I get the following in IDLE:

>>> compile('pass', r'c:\temp\工具\module1.py', 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

Can anybody explain why the compile() function tries to encode the Unicode filename using mbcs? I know that sys.getfilesystemencoding() returns 'mbcs' on Windows, but I thought it was not used when Unicode filenames are provided.
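A minimal sketch for checking whether a given filename survives a strict encode with the filesystem encoding, assuming (as the 'mbcs' in the traceback suggests) that this is the conversion compile() performs on its filename argument. The helper name can_encode_filename is hypothetical.

```python
import sys


def can_encode_filename(filename, encoding=None):
    """Return True if filename can be encoded with the given codec.

    encoding defaults to sys.getfilesystemencoding(), which the question
    reports as 'mbcs' on Windows under Python 3.2.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        filename.encode(encoding)  # str.encode uses 'strict' errors by default
    except UnicodeEncodeError:
        return False
    return True


fname = r'c:\temp\工具\module1.py'
print(sys.getfilesystemencoding())
print(can_encode_filename(fname))
```

On a system whose filesystem encoding cannot represent 工具 this prints False, which would match the compile() failure; opening the file still works because open() does not need that conversion on Windows.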

For example, the following works:

f = open(r'c:\temp\工具\module1.py')

For a more complete test, save the following in a UTF-8-encoded file and run it with the standard python.exe, version 3.2:

# -*- coding: utf8 -*-
fname = r'c:\temp\工具\module1.py'
# I do have a file named fname, but you can comment out the following two lines
f = open(fname)
print('ok')
cmp = compile('pass', fname, 'exec')
print(cmp)

Output:

ok
Traceback (most recent call last):
  File "module8.py", line 6, in <module>
    cmp = compile('pass', fname, 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character


ANSWERS:


From Python issue 10114, it seems that the rationale is that every filename used by Python should be valid on the platform where it is used. The filename is encoded with the filesystem encoding so that Python's C internals can use it.

I agree that it probably shouldn't raise an error on Windows, because any Unicode filename is valid there. You may wish to file a bug report with Python about this, but be aware that the necessary changes might not be trivial, because any C code that uses the filename has to decide what to do if it can't be encoded.
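One possible workaround in the meantime, sketched under the assumption that you only need the filename for diagnostics: the compile() documentation describes filename as the file the code was read from, and it ends up in the code object's co_filename and in tracebacks, so it does not have to be the exact path. The hypothetical helpers below retry with an ASCII-only version of the name if the real one cannot be encoded.

```python
def ascii_safe_filename(filename):
    """Turn non-ASCII characters into backslash escapes (hypothetical helper)."""
    return filename.encode('ascii', 'backslashreplace').decode('ascii')


def compile_with_fallback(source, filename, mode='exec'):
    """Compile source; if the filename cannot be encoded, retry with an ASCII label."""
    try:
        return compile(source, filename, mode)
    except UnicodeEncodeError:
        # The escaped name still identifies the file in tracebacks,
        # but it is no longer a path you can pass to open().
        return compile(source, ascii_safe_filename(filename), mode)


fname = r'c:\temp\工具\module1.py'
code = compile_with_fallback('pass', fname, 'exec')
print(code.co_filename == fname)  # False only if the fallback name was used
```

The trade-off is that tools relying on co_filename to locate the source (for example, to show source lines in tracebacks) will not find the file under the escaped name.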


Here is a solution that worked for me, from Issue 427: UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-6: ordinal not in range (128):

If you look at the PyScripter help file under the topic "Encoded Python Source Files" (last paragraph), it tells you how to configure Python to support other encodings by modifying the site.py file. This file is in the Lib subdirectory of the Python installation directory. Find the function setencoding and make sure that support for locale-aware default string encodings is turned on (see below).

import sys  # site.py imports sys at module level; added so the excerpt stands alone

def setencoding():
    """Set the string encoding used by the Unicode implementation.  The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "ascii"  # Default value set by _PyUnicode_Init()
    if 0:  # <<<--- set this to 1
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]
    if 0:
        # Enable to switch off string to Unicode coercion and implicit
        # Unicode to string conversion.
        encoding = "undefined"
    if encoding != "ascii":
        # On Non-Unicode builds this will raise an AttributeError...
        sys.setdefaultencoding(encoding)  # Needs Python Unicode build!
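A quick check worth running before editing site.py: on Python 3 (including the 3.2 in the question) the default string encoding is fixed at UTF-8 and sys does not expose setdefaultencoding(), so the tweak above appears to be a Python 2 technique. It also changes the default string encoding rather than the filesystem encoding ('mbcs') named in the error.

```python
import sys

# On Python 3 this is always 'utf-8'; it is not the filesystem encoding.
print(sys.getdefaultencoding())
# The Python 2 hook used by setencoding() is not available here.
print(hasattr(sys, 'setdefaultencoding'))
print(sys.getfilesystemencoding())
```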

You could try changing the backslashes ("\") in the file path to forward slashes ("/"), i.e. changing

compile('pass', r'c:\temp\工具\module1.py', 'exec')

to

compile('pass', r'c:/temp/工具/module1.py', 'exec')

I ran into the same problem as you, and this change solved it for me. I hope it works for you too.
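A sketch of the substitution described above, using plain str.replace so it runs on any Python 3. Whether forward slashes actually avoid the mbcs error is the answerer's experience rather than documented behavior, and the helper name to_forward_slashes is hypothetical.

```python
def to_forward_slashes(path):
    # Most Windows file APIs accept '/' as a separator, so only the
    # separator characters change; the non-ASCII parts are untouched.
    return path.replace('\\', '/')


fname = r'c:\temp\工具\module1.py'
posix_style = to_forward_slashes(fname)  # 'c:/temp/工具/module1.py'
cmp = compile('pass', posix_style, 'exec')
```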


