I have the following Unicode text:
(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))
How do I make this readable by converting the '\u0___' escape codes into the characters they represent? I'm using Python 2.7.
I obtained that output with the following code in NLTK 3.0, where each tree is an nltk.tree.Tree:
for tree in treelist1:
    print unicode(str(tree))
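For the escaped text this loop prints, one way to get the characters back is Python's unicode_escape codec, which interprets \uXXXX sequences the way a string literal would. This is a minimal sketch, assuming the escaped text is pure ASCII (as in the output above); unescape is a hypothetical helper, not part of NLTK:

```python
def unescape(text):
    """Turn literal \\uXXXX escape sequences into the characters they name.

    Assumes `text` is pure ASCII, as the escaped NLTK output is. Other
    backslash sequences such as \\n or \\\\ are interpreted as well, just as
    they would be in a Python string literal.
    """
    # encode("ascii") gives a byte string (and rejects non-ASCII input);
    # the unicode_escape codec then decodes \uXXXX into real characters.
    return text.encode("ascii").decode("unicode_escape")


# The escaped string from the question, with doubled backslashes so the
# backslashes are part of the text, exactly as in the printed output.
escaped = "(S (NP (N \\u0db6\\u0dbd\\u0dbd\\u0dcf)) (VP (V \\u0db6\\u0dbb\\u0dc0\\u0dcf)))"
readable = unescape(escaped)

if __name__ == "__main__":
    # Still needs a terminal whose encoding (e.g. UTF-8) and font can show Sinhala.
    print(readable)
```

In the loop above, that would mean printing unescape(str(tree)) instead of unicode(str(tree)). This only changes how the text is displayed; the tree itself is untouched.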
What I need is something like print(TreePrettyPrinter(tree).text()), which does give readable Unicode output, but it draws the tree as a multi-line diagram, which I don't want. Is there a method in NLTK that gives equally readable output in the flat, bracketed text format?
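Failing a built-in option, a hand-rolled formatter is another route: walk the tree and write the leaves as text instead of through repr(). This is a minimal sketch, assuming the nltk.Tree interface of NLTK 3 (a label() method plus list-like children) and unicode leaves; to_bracketed and StandInTree are hypothetical names, and StandInTree exists only so the example runs without NLTK:

```python
try:
    text_type = unicode  # Python 2.7
except NameError:
    text_type = str  # Python 3


def to_bracketed(tree):
    """Render a tree on one line in bracketed form, e.g. (S (NP ...) (VP ...)).

    Assumes nltk.Tree-style nodes: a label() method plus iteration over the
    children. Leaves are converted with text_type() rather than repr(), so
    non-ASCII words stay as real characters instead of \\u escapes.
    """
    if not hasattr(tree, "label"):
        return text_type(tree)  # a leaf, i.e. a word
    children = " ".join(to_bracketed(child) for child in tree)
    return "(%s %s)" % (text_type(tree.label()), children)


class StandInTree(list):
    """Minimal stand-in for nltk.Tree so this sketch runs without NLTK."""

    def __init__(self, label, children):
        list.__init__(self, children)
        self._label = label

    def label(self):
        return self._label


example = StandInTree("S", [
    StandInTree("NP", [StandInTree("N", [u"\u0db6\u0dbd\u0dbd\u0dcf"])]),
    StandInTree("VP", [StandInTree("V", [u"\u0db6\u0dbb\u0dc0\u0dcf"])]),
])

if __name__ == "__main__":
    print(to_bracketed(example))
```

With NLTK installed, the loop would become: for tree in treelist1: print(to_bracketed(tree)).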
I have the same issue with the output from
for rule in grammar1.productions():
    print(rule.unicode_repr())
where grammar1 is an nltk.grammar.CFG. The output is as follows:
VP -> V
VP -> NP V
N -> '\u0db6\u0dbd\u0dca\u0dbd\u0dcf'
N -> '\u0db8\u0dd2\u0db1\u0dd2\u0dc3\u0dcf'
N -> '\u0db8\u0dda\u0dc3\u0dba'
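For the grammar, one option is to build each rule's text from its parts rather than from unicode_repr(). This is a minimal sketch, assuming NLTK's Production.lhs() and rhs() accessors, with terminals as plain strings and non-terminals as Nonterminal objects whose text form is their symbol; format_production and StandInNonterminal are hypothetical names, the latter only there so the example runs without NLTK:

```python
try:
    text_type = unicode  # Python 2.7
except NameError:
    text_type = str  # Python 3

try:
    string_types = basestring  # Python 2.7: both str and unicode
except NameError:
    string_types = str  # Python 3


def format_production(lhs, rhs):
    """Render one CFG rule on one line, e.g. N -> 'word'.

    lhs and rhs are what a production's lhs() and rhs() return: terminals are
    plain strings, non-terminals are objects whose text form is their symbol.
    Terminals are quoted by hand instead of through repr(), so non-ASCII
    words are not turned into \\u escapes.
    """
    parts = []
    for symbol in rhs:
        if isinstance(symbol, string_types):
            parts.append("'%s'" % symbol)  # terminal: quote it verbatim
        else:
            parts.append(text_type(symbol))  # non-terminal, e.g. NP
    return "%s -> %s" % (text_type(lhs), " ".join(parts))


class StandInNonterminal(object):
    """Minimal stand-in for nltk.grammar.Nonterminal so this runs without NLTK."""

    def __init__(self, symbol):
        self._symbol = symbol

    def __str__(self):
        return self._symbol


N, V, NP, VP = (StandInNonterminal(s) for s in ("N", "V", "NP", "VP"))
rules = [
    (VP, (V,)),
    (VP, (NP, V)),
    (N, (u"\u0db6\u0dbd\u0dca\u0dbd\u0dcf",)),
]

if __name__ == "__main__":
    for lhs, rhs in rules:
        print(format_production(lhs, rhs))
```

Against the real grammar, the call would be format_production(rule.lhs(), rule.rhs()) for each rule in grammar1.productions(). Because terminals are quoted by hand, a word that itself contains a single quote is not escaped the way repr() would escape it.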
The final results are correct; the only problem is how the output is displayed.