This is the description of the default semester project. All graduate students enrolled in CSI660T are required to do the project. Undergraduates in CSI445T may also do the project for extra credit.
The project assignment is to write a program to aid in the cracking of polyalphabetic ciphers. For this project, the cipher used will be a straight Vigenere cipher, as described in class.
Since a polyalphabetic cipher uses multiple key substitutions, it cannot be cracked by looking at the letter distribution like a mono-alphabetic (simple) substitution cipher. There are two ways to attack a polyalphabetic cipher.
The Vigenere Cipher is a polyalphabetic cipher. It uses the Vigenere square:
a b c d e f g h I j k l m n o p q r s t u v w x y z
B C D E F G H I J K L M N O P Q R S T
U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
To encrypt with the Vigenere cipher, first repeat the keyword above the plaintext. Then the encryption is done as follows for each plaintext letter:
| Keyword: | unscrambleunscrambleunscramble |
| Plaintext: | therearetwentythreeguardstoday |
| Ciphertext: | nuwtvadfeayalakhdfpkonjfjtaelc |
Polyalphabetic ciphers do not show the same distribution of letter frequencies that makes a monoalphabetic substitution cipher so easy to crack. They are however, subject to attack. The weakness is the key. For example here's the example Vigenere encryption:
| Keyword: | unscramble | unscramble | unscramble |
| Plaintext: | therearetw | entythreeg | uardstoday |
| Ciphertext: | nuwtvadfea | yalakhdfpk | onjfjtaelc |
The key is used repeatedly. All positions that use the same key letter can be treated as a monoalphabetic substitution cipher. In this example, there are only three characters for each component simple substitution cipher. This is probably too little to solve them. But, if the message was longer, or if there multiple messages using the same key, there would be enough data to cryptanalyze the component ciphers.
However, a common technique to defend polyalphabetic ciphers against this attack is to keep individual messages short and to shift the key between uses. That is, if the key is UNSCRAMBLE, use NSCRAMBLEU one time, CRAMBLEUNS the next, and so on. This can be done randomly or according to a schedule. Shifting the starting point of the key prevents the cryptanalyst from aligning several messages to combine them to get enough data to attack the cipher. You don't know where the alignments go.
But, before WW II, Friedman and Kullback devised a method to allow you to find the alignments, even if the encrypter shifts the key for each message. This technique is known as putting the messages in depth. It is a statistical technique. Suppose you have two messages encoded using a polyalphabetic cipher using the same key, but the key was shifted differently between the two. Assuming the cipher was good, the probability of finding a given letter, e.g. 'w' is 1/26. If you randomly aligned the two messages, since the odds of finding a single 'w' are 1/26, the odds of finding one in each message in the aligned position are 1/26 x 1/26. If an alignment between two messages is incorrect (that is, they do reflect the alignment that uses the same key letter in the aligned positions of each message), then the probabililty of a pair of a specific letter is indeed (1/26)2. That is, about 0.0000568. The probability of any pair of letters occurring is 26 x (1/26)2. That is, about 0.038
But, if the two messages are properly aligned the probabilities of pairs occurring jump way up. Each plaintext letter will be enciphered with the same ciphertext letter. This allows the letter distribution statistics to show through. So, now instead of 26 x (1/26)2, the probability of any pair occurring is 0.1272 + 0.09062 + ... + 0.0072 for 'e', 't', ... , 'z'. This sums to a probability of seeing any pair of 0.067. This is about twice the random probability of 0.038.
Using this method, an automatic technique can be created to align two messages.
The Graduate Project is to write a program to automate the decryption of messages encrypted using polyalphabetic ciphers. You may assume that a Vigenere cipher is used.
Your program should guess
Your program may allow manual intervention to allow you to replace an incorrect automatic guess of a Vigenere row substitution. For example, it might read in the task file(s), use in-depth to guess the alignment(s) and key length, and then analyze the component simple substitution ciphers. When it shows its results, the user can alter its guesses if s/he wants.
This is a looser, more open-ended assignment than the others in this course. Part of this has to do with the nature of cryptanalysis. It is looser than the implementation of established cryptographic algorithms. Given this, different implementations of this project may well look very different from one-another. Don't get spooked just because one of your fellow students is doing something different.
I will put sample files in the directory
~berg/cs445t/project/
Each separate cryptanalysis task will be in a separate subdirectory. Most files will contain only one or more ciphertext files. In a given subdirectory, all the ciphertext files will have been encrypted using a Vigenere cipher using the same key. The key may be shifted by different amounts for each encryption.
As an aid to development and debugging, some of the subdirectories will contain plaintext and key files for the ciphertext files.
All files will contain only alphabetic characters (i.e. 'a' to 'z').
Some of the sample files will be easy. Others have varying degrees of difficulty. Others will be close to impossible.
The project described here is the default project. If you have an idea for an alternate project, please let me know. If you wish to pursue an alternate project, please contact Prof. Berg with your idea by April 15th. All custom projects must be approved by the professor.
The following needs to be submitted
The programs must be adequately documented so that it is clear what is occurring at each point in the program, as well as necessary documentation for overall algorithms, assumptions, etc.
Programs should be tested so that they can be compiled (if necessary) and run on the academic computing center Unix cluster.
To submit the programs, you can either send them as attachments to berg@cs.albany.edu, or they can be encoded and mailed directly:
shar file1 file2 ... | Mail -s "crypto projecct" berg@cs.albany.edu
The bulk of the grade will be on successful implementation of the in-depth alignment algorithm, and the associated routines to do the counts, etc. and the other mechanical aspects of the program. Also, is the overall program a credible approach to in-depth alignment based cryptanalysis. This will be 80% of the grad.
The remainder of the grade (20%) will be awarded on how many of the task you solve.
Q:
Version 04030301 (04/03/03)
Copyright 2003 by George Berg