Friday, February 26, 2021

A schema for digitising alphanumeric data

The objective of this method is to digitise alphanumeric data into the shortest possible string of decimal digits, so fixed-width codes such as ASCII are not suitable.  For the sake of compression, the method described below limits itself to an input set containing the digits 0-9 and the English alphabet in both upper and lower case.  Although the absence of punctuation and spaces creates a challenge for the recipient of a message, the method is suitable for general communication and will also handle technical information such as cryptocurrency hashes.

The following table is used to encode data:

  0 1 2 3 4 5 6 7 8 9
      E S T # O N I A
0 B C D F G ^ H J K L
1 M P Q R U V W X Y Z

The table shows all letters as upper case for clarity only.  Before encoding can take place, a piece of plaintext must be modified, as in this example:

Please send 5000 pounds to Mr X.

becomes

pleasesend#5#0#0#0poundsto^mr^x

and is encoded as

11,09,2,9,3,2,3,2,7,02,5,5,5,0,5,0,5,0,11,6,14,7,02,3,4,6,05,10,13,05,17

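Note that the characters in the first row of the table require only one digit each, while all other characters require two.  Note also the convention for digits and upper case characters: # precedes each digit, which is then written unchanged, and ^ precedes each upper case letter.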
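The preparation and encoding steps can be sketched in Python.  This is only a minimal illustration of the table and the worked example above, not a definitive implementation; the names prepare, encode and sep are hypothetical, and marking every capital other than the first character with ^ is an assumption drawn from the example.

```python
import string

# The table from the post: one-digit codes 2-9, then rows 0 and 1.
TOP = "EST#ONIA"        # codes 2..9
ROW0 = "BCDFG^HJKL"     # codes 00..09
ROW1 = "MPQRUVWXYZ"     # codes 10..19

CODES = {ch: str(i + 2) for i, ch in enumerate(TOP)}
CODES.update({ch: "0" + str(i) for i, ch in enumerate(ROW0)})
CODES.update({ch: "1" + str(i) for i, ch in enumerate(ROW1)})


def prepare(plaintext):
    """Drop spaces and punctuation, mark digits with '#' and capitals with '^'."""
    out = []
    for ch in plaintext:
        if ch in string.digits:
            out.append("#" + ch)
        elif ch in string.ascii_letters:
            # Assumption: the first character kept is never marked as a capital.
            if ch.isupper() and out:
                out.append("^")
            out.append(ch.lower())
    return "".join(out)


def encode(plaintext, sep=""):
    """Encode plaintext as digits; pass sep="," to show the groups."""
    groups = []
    literal = False  # True when the previous symbol was '#'
    for ch in prepare(plaintext):
        if literal:
            groups.append(ch)          # a digit after '#' is written unchanged
            literal = False
        else:
            groups.append(CODES[ch.upper()])
            literal = ch == "#"
    return sep.join(groups)


print(prepare("Please send 5000 pounds to Mr X."))
print(encode("Please send 5000 pounds to Mr X.", sep=","))
```

Called with the default sep="", encode returns the digits run together, as they would appear in a transmitted message.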

Notes:

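Commas are shown only for the sake of clarity and will be omitted from the encoded message.  The message can still be read without them, because a code that begins with 0 or 1 is always two digits long, any other code is one digit long, and the digit following # (5) is always taken literally.  Spaces and punctuation are not encoded.  The first character of the message is never capitalised.  ESTONIA is an easily remembered anagram of seven of the most commonly used letters in written English.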
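As a rough check that a message without commas can still be read, the following Python sketch decodes a digit string by reading from left to right.  The function name decode is hypothetical, and the sketch assumes a well-formed message; it does not handle truncated or invalid input.

```python
# The table from the post: one-digit codes 2-9, then rows 0 and 1.
TOP = "EST#ONIA"        # codes 2..9
ROW0 = "BCDFG^HJKL"     # codes 00..09
ROW1 = "MPQRUVWXYZ"     # codes 10..19


def decode(digits):
    """Decode a comma-free digit string back to letters and digits."""
    out = []
    i = 0
    upper_next = False  # True when the previous symbol was '^'
    while i < len(digits):
        d = digits[i]
        if d in "01":
            # A leading 0 or 1 selects a row; the next digit selects the column.
            row = ROW0 if d == "0" else ROW1
            sym = row[int(digits[i + 1])]
            i += 2
        else:
            sym = TOP[int(d) - 2]
            i += 1
        if sym == "#":
            out.append(digits[i])      # the digit after '#' is literal
            i += 1
        elif sym == "^":
            upper_next = True
        else:
            out.append(sym if upper_next else sym.lower())
            upper_next = False
    return "".join(out)


message = "11,09,2,9,3,2,3,2,7,02,5,5,5,0,5,0,5,0,11,6,14,7,02,3,4,6,05,10,13,05,17"
print(decode(message.replace(",", "")))
```

Spaces and punctuation are not recovered, since they were never encoded; the recipient has to restore them by reading.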
