3.33

From The Algorithm Design Manual Solution Wiki
Jump to navigation Jump to search

Idea: Because a Caesar cipher just rotates the alphabet, we can try every one of the 26 shifts, score each tentative plaintext with a simple English–frequency χ² test, and pick the shift whose score is smallest (i.e. closest to normal English statistics). For texts longer than a few lines this almost always singles out the correct key.

Algorithm (χ²-based search)
1. Strip the text to A–Z and convert to uppercase.
2. For k from 0 to 25 do
  a. Rotate every letter left by k (A→A, B→A is k=1, etc.).
  b. Count the 26 letter frequencies fi in the result.
  c. Compute
    chi2 = Σ((fi – pi·N)² / (pi·N))
    where pi is the expected English probability for that letter and N is text length.
3. Return the shift whose χ² is minimum and its plaintext.

Python 3 reference implementation (≈30 lines) — no external files needed:

import string, sys ENGLISH_FREQ = [ # ACEFHILN... ordered A-Z 8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074] def rotate(text, k): res = [] for c in text: if c.isalpha(): base = ord('A') res.append(chr(base + (ord(c.upper()) - base - k) % 26)) else: res.append(c) return ''.join(res) def chi2_score(freq, N): return sum((freq[i] - ENGLISH_FREQ[i]*N/100)**2 / (ENGLISH_FREQ[i]*N/100) for i in range(26)) def decrypt_caesar(cipher): best_chi, best_shift, best_plain = 10**9, None, None cleaned = ''.join(c for c in cipher if c.isalpha()).upper() for k in range(26): plain = rotate(cipher, k) freq = [0]*26 for c in cleaned: idx = (ord(c) - ord('A') - k) % 26 freq[idx] += 1 chi = chi2_score(freq, len(cleaned)) if chi < best_chi: best_chi, best_shift, best_plain = chi, k, plain return best_shift, best_plain if __name__ == '__main__': ciphertext = sys.stdin.read() shift, plaintext = decrypt_caesar(ciphertext) print(f"Shift used = {shift}") print(plaintext)
Complexity: 26 passes over the text ⇒ O(26·N) ≈ O(N) time and O(1) additional memory.

Run the script, paste or pipe the ciphertext, and it prints the most likely key and the decrypted message.


Back to Chapter 3