Meaning of "data_coding" field in SMPP

What is the meaning of "data_coding" field in the SMPP protocol?

I searched for this but couldn't find any helpful resource.


In short, data_coding tells the receiver how the text in an SMPP SubmitSM message (i.e. a typical SMS message) is encoded. The SubmitSM packet carries the message body as raw bytes, and data_coding names the character encoding that was used to turn the text into those bytes.

The most important values are:

  • 00000000 (0) - the SMSC default alphabet: usually GSM7 (the default 7-bit encoding for messages, in which a few characters such as € and [ are encoded as an escape code plus a second code), but technically it could be something else
  • 00000011 (3) for standard ISO-8859-1
  • 00001000 (8) for UCS2, the universal character set (ISO/IEC 10646); de facto UTF-16, normally big-endian

Other possible values (rarely used):

  • 00000001 (1) - IA5 (CCITT T.50) / ASCII (ANSI X3.4)
  • 00000101 (5) - JIS (X 0208-1990)
  • 00000110 (6) - Cyrillic (ISO-8859-5)
  • 00000111 (7) - Latin/Hebrew (ISO-8859-8)
  • 00001010 (10) - ISO-2022-JP (Music Codes)
  • 00001101 (13) - Extended Kanji JIS (X 0212-1990)
  • 00001110 (14) - KS C 5601

And two reserved for special uses:

  • 00001011 (11) - RESERVED #1
  • 00001100 (12) - RESERVED #2
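
When reading an incoming message, the values above can be turned into a lookup from data_coding to a text codec. Here is a minimal Python sketch, not a complete implementation. The names `CODECS` and `decode_body` are made up for illustration, and only a handful of values are covered. Treating 8 as big-endian UTF-16 is an assumption that should hold for most SMSCs. Value 0 is left out on purpose because it depends on the SMSC's default alphabet.

```python
# Hypothetical lookup from SMPP data_coding values to Python codec names.
CODECS = {
    1: "ascii",      # IA5 / ASCII, treated as plain ASCII here
    3: "latin-1",    # ISO-8859-1
    6: "iso8859_5",  # Cyrillic
    7: "iso8859_8",  # Latin/Hebrew
    8: "utf-16-be",  # UCS2, assumed to be big-endian UTF-16
}


def decode_body(data_coding: int, body: bytes) -> str:
    """Decode a message body according to its data_coding value."""
    try:
        codec = CODECS[data_coding]
    except KeyError:
        # 0 (SMSC default) and the rarer values are not handled in this sketch.
        raise ValueError(f"unsupported data_coding: {data_coding}") from None
    return body.decode(codec)
```

For example, `decode_body(3, b"caf\xe9")` gives "café", because 0xE9 is é in ISO-8859-1.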

To sum up: if your binary body is Unicode (UTF-16), set data_coding to 8. If your message is stored as GSM7, data_coding will (usually) be 0.
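
The choice between 0 and 8 can be sketched in Python as follows. This is only an illustration under some assumptions: that the SMSC's default alphabet really is GSM7, that it expects unpacked septets (one 7-bit code per byte, which is common over SMPP but not guaranteed), and that 8 means big-endian UTF-16. The names `GSM7_BASIC`, `GSM7_EXT`, `encode_gsm7` and `choose_encoding` are hypothetical and do not come from any SMPP library.

```python
from typing import Tuple

# GSM 03.38 default alphabet; the index of each character is its 7-bit code.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as escape (0x1B) plus the code below.
GSM7_EXT = {"^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F, "[": 0x3C,
            "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65}


def encode_gsm7(text: str) -> bytes:
    """Encode text as unpacked GSM7; raise ValueError if a character is missing."""
    out = bytearray()
    for ch in text:
        if ch in GSM7_EXT:
            out += bytes([0x1B, GSM7_EXT[ch]])  # two bytes for extension characters
        elif ch != "\x1b" and ch in GSM7_BASIC:
            out.append(GSM7_BASIC.index(ch))
        else:
            raise ValueError(f"{ch!r} is not in the GSM7 alphabet")
    return bytes(out)


def choose_encoding(text: str) -> Tuple[int, bytes]:
    """Return (data_coding, body): GSM7 when the text fits, otherwise UCS2."""
    try:
        return 0, encode_gsm7(text)
    except ValueError:
        return 8, text.encode("utf-16-be")
```

With this sketch, `choose_encoding("Hi{")` returns `(0, b"Hi\x1b(")`, while a Cyrillic string such as "Привет" falls back to data_coding 8 with a 12-byte body.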

It specifies how text is converted into bytes: SMPP is a binary protocol, but applications typically deal with text strings. The first hit on Google for 'smpp data coding' explains it well in section 2.2.2.

This should definitely help: ETSI GSM 03.38 Specification

