Python source "encoding not declared"

This is the top of my source file:

# -*- coding: UTF-16 -*-
import re
import unittest

Despite including the declaration at the top of the file, I get an error when I run this: "Non-ASCII character '\xff' in file ScrapeEdgar/parsers/parser8kex101_tests.py on line 1, but no encoding declared; see for details".

Here's the full output:

usau810195:ScrapeEdgar reedn$ python -m unittest ScrapeEdgar.parsers.parser8kex101_tests
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/__main__.py", line 12, in <module>
    main(module=None)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 128, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "ScrapeEdgar/parsers/parser8kex101_tests.py", line 1
SyntaxError: Non-ASCII character '\xff' in file ScrapeEdgar/parsers/parser8kex101_tests.py on line 1, but no encoding declared; see  for details


ANSWERS:


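Python source files don't support the UTF-16 encoding. The '\xff' in the error message is the first byte of the UTF-16 byte order mark (FF FE) at the start of your file, so the parser fails before it can even read the coding declaration. Save the file as UTF-8 and change the declaration to match.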

Any encoding which allows processing the first two lines in the way indicated above is allowed as source code encoding, this includes ASCII compatible encodings as well as certain multi-byte encodings such as Shift_JIS. It does not include encodings which use two or more bytes for all characters like e.g. UTF-16. The reason for this is to keep the encoding detection algorithm in the tokenizer simple.
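To see why the declaration is never found, here is a rough sketch that applies the coding-line pattern given in PEP 0263 to the raw bytes of the first line. The helper name declared_encoding is made up for illustration, and this is not the actual tokenizer code, only an approximation of the check it performs:

```python
import re

# Pattern given in PEP 0263 for recognising a coding declaration.
# It is applied to raw bytes, before any decoding has happened.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

line = u"# -*- coding: UTF-16 -*-\n"

as_utf8 = line.encode("utf-8")    # ASCII-compatible bytes
as_utf16 = line.encode("utf-16")  # starts with a BOM, then two bytes per character


def declared_encoding(first_line_bytes):
    """Return the declared encoding name, or None if no declaration is visible."""
    match = CODING_RE.match(first_line_bytes)
    return match.group(1).decode("ascii") if match else None


print(declared_encoding(as_utf8))
print(declared_encoding(as_utf16))
```

The first call finds the declaration; the second returns None, because in UTF-16 the bytes of "coding" are interleaved with zero bytes and the line begins with the byte order mark rather than "#".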

The paragraph above is quoted from PEP 0263 (Defining Python Source Code Encodings).
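If the file really was saved as UTF-16, one way out is to re-save it as UTF-8 and update the coding line. A minimal sketch, assuming the declaration is written exactly as "coding: UTF-16"; the function name convert_utf16_source_to_utf8 is only for illustration, and it overwrites the file in place, so back it up first:

```python
import codecs
import io


def convert_utf16_source_to_utf8(path):
    """Rewrite a UTF-16 source file as UTF-8 and update its coding line.

    Returns True if the file was converted, False if it had no UTF-16 BOM.
    """
    with io.open(path, "rb") as f:
        raw = f.read()

    # Only touch files that start with a UTF-16 byte order mark.
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False

    # The "utf-16" codec uses the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")

    # Assumption: the declaration is spelled exactly like this.
    text = text.replace("coding: UTF-16", "coding: utf-8", 1)

    # Binary mode keeps the original line endings untouched.
    with io.open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True
```

Calling it with the path from the traceback, e.g. convert_utf16_source_to_utf8("ScrapeEdgar/parsers/parser8kex101_tests.py"), should rewrite the file once and return False on any later call. Most editors can do the same thing through a "save with encoding" option.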

Hope this helps.


