Cryptology

Spring 2003

 

Default Graduate Project

CSI660T

Due Thursday, May 15th, 2003

Assigned 04/02/03

Value: 40 points.


This is the description of the default semester project. All graduate students enrolled in CSI660T are required to do the project. Undergraduates in CSI445T may also do the project for extra credit.

The project assignment is to write a program to aid in the cracking of polyalphabetic ciphers. For this project, the cipher used will be a straight Vigenere cipher, as described in class.

Since a polyalphabetic cipher uses multiple key substitutions, it cannot be cracked by looking at the letter distribution like a mono-alphabetic (simple) substitution cipher. There are two ways to attack a polyalphabetic cipher.

  1. If you have a long enough message, guess different key lengths. For a guess of length n, assume every n-th letter is encrypted with the same substitution. For each of these n groups, use the simple substitution cracking technique. That is, assume 'e' is the plaintext for the most common ciphertext letter, etc. Work out a trial subsitution table for each of the n groups.
    1. Your program should allow you to modify the substitutions for each group. This will allow you to tinker with the substitutions, in case the letters are not in the stereotypical order in the actual distribution of the message.
    2. If individual messages are too short for this method (e.g. Budansky, in his book "Battle of Wits: The Complete Story of Codebreaking in World War II" says it takes fifty letter or more of text) multiple messages encoded with the same key can be used. If all messages use the key identically, this method will work just fine.
  2. However, code clerks often make an adjustment to defeat this approach. Assume the key is COYOTE. One message may be encrypted using COYOTE, one with YOTECO, the next with TECOYO, OYOTEC, etc., effectively doing a shift on the key. If this is the case, a more involved approach will be necessary, which is known as alignment, or placing messages in depth.

The Vigenere Cipher is a polyalphabetic cipher. It uses the Vigenere square:

a b c d e f g h I j k l m n o p q r s t u v w x y z

B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

To encrypt with the Vigenere cipher, first repeat the keyword above the plaintext. Then the encryption is done as follows for each plaintext letter:

 Keyword:  unscrambleunscrambleunscramble
 Plaintext:  therearetwentythreeguardstoday
 Ciphertext:  nuwtvadfeayalakhdfpkonjfjtaelc

Polyalphabetic ciphers do not show the same distribution of letter frequencies that makes a monoalphabetic substitution cipher so easy to crack. They are however, subject to attack. The weakness is the key. For example here's the example Vigenere encryption:

 Keyword:  unscramble  unscramble  unscramble
 Plaintext:  therearetw  entythreeg  uardstoday
 Ciphertext:  nuwtvadfea  yalakhdfpk  onjfjtaelc

The key is used repeatedly. All positions that use the same key letter can be treated as a monoalphabetic substitution cipher. In this example, there are only three characters for each component simple substitution cipher. This is probably too little to solve them. But, if the message was longer, or if there multiple messages using the same key, there would be enough data to cryptanalyze the component ciphers.

However, a common technique to defend polyalphabetic ciphers against this attack is to keep individual messages short and to shift the key between uses. That is, if the key is UNSCRAMBLE, use NSCRAMBLEU one time, CRAMBLEUNS the next, and so on. This can be done randomly or according to a schedule. Shifting the starting point of the key prevents the cryptanalyst from aligning several messages to combine them to get enough data to attack the cipher. You don't know where the alignments go.

But, before WW II, Friedman and Kullback devised a method to allow you to find the alignments, even if the encrypter shifts the key for each message. This technique is known as putting the messages in depth. It is a statistical technique. Suppose you have two messages encoded using a polyalphabetic cipher using the same key, but the key was shifted differently between the two. Assuming the cipher was good, the probability of finding a given letter, e.g. 'w' is 1/26. If you randomly aligned the two messages, since the odds of finding a single 'w' are 1/26, the odds of finding one in each message in the aligned position are 1/26 x 1/26. If an alignment between two messages is incorrect (that is, they do reflect the alignment that uses the same key letter in the aligned positions of each message), then the probabililty of a pair of a specific letter is indeed (1/26)2. That is, about 0.0000568. The probability of any pair of letters occurring is 26 x (1/26)2. That is, about 0.038

But, if the two messages are properly aligned the probabilities of pairs occurring jump way up. Each plaintext letter will be enciphered with the same ciphertext letter. This allows the letter distribution statistics to show through. So, now instead of 26 x (1/26)2, the probability of any pair occurring is 0.1272 + 0.09062 + ... + 0.0072 for 'e', 't', ... , 'z'. This sums to a probability of seeing any pair of 0.067. This is about twice the random probability of 0.038.

Using this method, an automatic technique can be created to align two messages.

Details

The Graduate Project is to write a program to automate the decryption of messages encrypted using polyalphabetic ciphers. You may assume that a Vigenere cipher is used.

Your program should guess

Your program may allow manual intervention to allow you to replace an incorrect automatic guess of a Vigenere row substitution. For example, it might read in the task file(s), use in-depth to guess the alignment(s) and key length, and then analyze the component simple substitution ciphers. When it shows its results, the user can alter its guesses if s/he wants.

This is a looser, more open-ended assignment than the others in this course. Part of this has to do with the nature of cryptanalysis. It is looser than the implementation of established cryptographic algorithms. Given this, different implementations of this project may well look very different from one-another. Don't get spooked just because one of your fellow students is doing something different.

Files

I will put sample files in the directory

~berg/cs445t/project/

Each separate cryptanalysis task will be in a separate subdirectory. Most files will contain only one or more ciphertext files. In a given subdirectory, all the ciphertext files will have been encrypted using a Vigenere cipher using the same key. The key may be shifted by different amounts for each encryption.

As an aid to development and debugging, some of the subdirectories will contain plaintext and key files for the ciphertext files.

All files will contain only alphabetic characters (i.e. 'a' to 'z').

Some of the sample files will be easy. Others have varying degrees of difficulty. Others will be close to impossible.

Custom Projects

The project described here is the default project. If you have an idea for an alternate project, please let me know. If you wish to pursue an alternate project, please contact Prof. Berg with your idea by April 15th. All custom projects must be approved by the professor.

Submission

The following needs to be submitted

  1. Source file(s)
  2. A README file containing instructions necessary to compile (if necessary) and run the program. This may be extensive, and look more like a users manual. This may be text, WORD, pdf, etc.
  3. Output dumps or traces of your solution of each task. Basically, I'm asking you to "show your work" to verify the task solution.

The programs must be adequately documented so that it is clear what is occurring at each point in the program, as well as necessary documentation for overall algorithms, assumptions, etc.

Programs should be tested so that they can be compiled (if necessary) and run on the academic computing center Unix cluster.

To submit the programs, you can either send them as attachments to berg@cs.albany.edu, or they can be encoded and mailed directly:

shar file1 file2 ... | Mail -s "crypto projecct" berg@cs.albany.edu

Grading

The bulk of the grade will be on successful implementation of the in-depth alignment algorithm, and the associated routines to do the counts, etc. and the other mechanical aspects of the program. Also, is the overall program a credible approach to in-depth alignment based cryptanalysis. This will be 80% of the grad.

The remainder of the grade (20%) will be awarded on how many of the task you solve.


Frequently Asked Questions

Q:


Version 04030301 (04/03/03)

Copyright 2003 by George Berg