I want to test if a string of bytes that I'm extracting from a file results in valid ISO-8859-15 encoded text. The first thing I came across is this similar case about UTF-8 validation:
So based on that, I thought I was being clever by doing something similar for ISO-8859-15. See the following demo code:
#! /usr/bin/env python # def isValidISO885915(bytes): # Test if bytes result in valid ISO-8859-15 try: bytes.decode('iso-8859-15', 'strict') return(True) except UnicodeDecodeError: return(False) def main(): # Test bytes (byte x95 is not defined in ISO-8859-15!) bytes = b'\x4A\x70\x79\x6C\x79\x7A\x65\x72\x20\x64\x95\x6D\x6F\xFF' isValidLatin = isValidISO885915(bytes) print(isValidLatin) main()
However, running this returns True, even though x95 is not a valid code point in ISO-8859-15! Am I overlooking something really obvious here? (BTW I tried this with Python 2.7.4 and 3.3, results are identical in both cases).