I want to know how many characters there are in a Unicode (Tamil) string, and then check the first and second characters for particular occurrences.
I am able to split the word into characters, but I do not know how to traverse through them character by character using the word length.
Example: word = "எஃகு".
It should return the number of characters as 3, and I should be able to print word[0] as 'எ', word[1] as 'ஃ' and word[2] as 'கு'.
I want to check like:
    if word[0] is a vowel:
        if word[1] is "ஃ":
            print word[0] + word[1] + word[2]   (as எஃகு)
        else:
            print word[0]
I want to traverse using the number of characters: if the number of characters is 3, then i=0 should let me process 'எ'.
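To make the goal concrete, here is a minimal sketch of the kind of traversal I mean, assuming Python 3 and only the standard library. It treats a Tamil "character" as one base code point plus any combining marks that follow it (Unicode categories starting with "M"), which is a simplification and not full grapheme segmentation. The names `split_letters`, `check_word`, `TAMIL_VOWELS` and `AYTHAM` are hypothetical, made up for this illustration.

```python
import unicodedata

# Hypothetical helper names; this is a sketch, not a library API.
AYTHAM = "\u0b83"  # the sign 'ஃ'

# Independent Tamil vowels, built from their code point ranges:
# U+0B85..U+0B8A, U+0B8E..U+0B90, U+0B92..U+0B94.
TAMIL_VOWELS = {
    chr(cp)
    for cp in (*range(0x0B85, 0x0B8B), *range(0x0B8E, 0x0B91), *range(0x0B92, 0x0B95))
}


def split_letters(word):
    """Group each base code point with the combining marks that follow it."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch).startswith("M"):
            # Vowel signs and the pulli attach to the previous letter.
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def check_word(letters):
    """Assumed reading of the check: vowel first, then aytham second."""
    if letters and letters[0] in TAMIL_VOWELS:
        if len(letters) > 2 and letters[1] == AYTHAM:
            return letters[0] + letters[1] + letters[2]
        return letters[0]
    return None


letters = split_letters("எஃகு")
print(len(letters))
for i in range(len(letters)):
    # Index-based traversal: i = 0 gives the first letter, and so on.
    print(i, letters[i])
print(check_word(letters))
```

With this grouping the length is the number of letters rather than the number of code points or bytes, so indexing by `i` lines up with what a reader sees.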
I saw many questions related to Unicode character processing and length processing, but they all either return the byte length or give varying results, so I am confused.
Code that I use for splitting them character-wise:
    for line in f.readlines():
        letters = utf8.get_letters(line)
        for letter in letters:
            ff.write(unicode(letter))
            ff.write(' ')
Sample Input File:
Sample Output File:
அ ன் று
அ தா வ து
அ ஃ தா ன் று
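For comparison, output in the same shape can be produced without the `utf8` helper. This is only a sketch, assuming Python 3, and assuming the input words are the output lines above with the spaces removed (an assumption, since the actual input file is not shown here). `split_letters` and `space_letters` are hypothetical names, and the grouping rule (base code point plus following combining marks) is a simplification.

```python
import unicodedata


def split_letters(word):
    """Group each base code point with the combining marks that follow it."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def space_letters(lines):
    """Return each line with its letters separated by single spaces."""
    return [" ".join(split_letters(line.strip())) for line in lines]


# The sample output lines shown above.
expected = ["அ ன் று", "அ தா வ து", "அ ஃ தா ன் று"]

# Assumed input: the same lines with the spaces removed.
assumed_input = [line.replace(" ", "") + "\n" for line in expected]

for spaced in space_letters(assumed_input):
    print(spaced)
```

Under that assumption the spaced lines match the sample output, and `len(split_letters(word))` gives the per-word letter count.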