I'm having some issues with character encoding, in this particular case with Polish characters.
I need to replace all non-windows-1252 characters with a windows-1252 equivalent. I had this working until I needed to handle Polish characters. How can I replace these characters?
For example, é is a windows-1252 character and must stay as it is. But ł is not a windows-1252 character and must be replaced with its equivalent (or stripped if it has no equivalent).
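The distinction between the two kinds of characters can be checked with a minimal sketch like the one below. The helper name `in_cp1252` is hypothetical and only for illustration; it assumes Python's built-in `cp1252` codec, which raises `UnicodeEncodeError` for characters outside the code page.

```python
def in_cp1252(ch: str) -> bool:
    """Return True if the character can be encoded as windows-1252.

    Hypothetical helper, not part of any library.
    """
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# é and ó are part of windows-1252, ł is not.
for ch in "éół":
    print(f"U+{ord(ch):04X}", in_cp1252(ch))
```

So é and ó should pass through untouched, and only ł needs a replacement.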
I tried this:
    import unicodedata

    text = "Racławicka Rógé"
    tmp = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    print(tmp.decode("utf-8"))
But now the ł and the é are both affected: the result is ASCII only, so the é is not preserved.
How can I get this right?