Shannon Entropy

Entropy is a concept borrowed from thermodynamics and statistical mechanics, but it’s widely used in various fields, including information theory and cryptography. In simple terms, entropy refers to the measure of randomness or uncertainty in a system.  

In the context of cryptography, Shannon entropy is important, because it helps in understanding the randomness of data. A very practical use of this is to discern between encrypted and plaintext data. The higher the entropy, the more unpredictable the data is, and consequently it’s harder to predict the next element in the sequence.

$$H(X) = -\sum_{x \in X} P(x) \log_2(P(x))$$

fn calculate_shannon_entropy(symbol_frequencies: &HashMap<char, f64>) -> f64 {  
   let mut entropy = 0.0;  
  
   for (&symbol, &frequency) in symbol_frequencies {  
       entropy -= frequency * frequency.log2();  
   }  
  
   entropy  
}  

Entropy of Language #

Languages have characteristic entropy levels based on the predictability of letters and words. Entropy analysis can help distinguish between encrypted text and random noise or other forms of data. If the entropy of a text is similar to that of natural language, it suggests that the text might be encrypted rather than random noise.

Calculating the entropy of the English language #

For this, we first need to count the occurrences of each letter in the input text, and then calculate the rate or frequency at which each letter appears. Luckily, I already have a function for that from the cryptopals challenges series:

fn count_characters(input: &str) -> HashMap<char, u32> {  
   let mut char_count: HashMap<char, u32> = HashMap::new();  
  
   for c in input.chars() {  
       if c.is_alphabetic() {  
           *char_count.entry(c.to_ascii_lowercase()).or_insert(0) += 1;  
       }  
   }  
  
   char_count  
}  

Let’s now try the code with a small piece of text. Notice how char_frequencies_percentages is calculated for further use in the formula.   

fn main() {  
   let input = "  
       Cryptography is the ultimate form of non-violent direct action.    
       While nuclear weapons states can exert unlimited violence over even millions of individuals,    
       strong cryptography means that a state, even by exercising unlimited violence,    
       cannot violate the intent of individuals to keep secrets from them.    
   ";  
   let char_frequencies = count_characters(&input);  
  
   let total_count: u32 = char_frequencies.values().sum();  
  
   let char_frequencies_percentages: HashMap<char, f64> = char_frequencies  
       .iter()  
       .map(|(&c, &count)| (c, count as f64 / total_count as f64))  
       .collect();  
  
   for (ch, freq) in &char_frequencies_percentages {  
       println!("Character '{}': {:.4}%", ch, freq);  
   }  
   println!("{} bits",calculate_shannon_entropy(&char_frequencies_percentages));  
}  

Results #

Of course, this was calculated excluding punctuation and case sensitivity. Though, a rough estimate, based on much larger texts, should be that the entropy of the English language is around 4.15 bits. More on this topic - arxiv.org

> ./shannonentropy  
> Character 'i': 0.0945%  
 Character 'u': 0.0236%  
 Character 'p': 0.0236%  
 Character 'g': 0.0157%  
 Character 's': 0.0512%  
 Character 'b': 0.0039%  
 Character 'e': 0.1299%  
 Character 'l': 0.0512%  
 Character 'k': 0.0039%  
 Character 'a': 0.0630%  
 Character 'm': 0.0315%  
 Character 'w': 0.0079%  
 Character 'x': 0.0079%  
 Character 'n': 0.0906%  
 Character 'd': 0.0276%  
 Character 'y': 0.0197%  
 Character 'h': 0.0276%  
 Character 'v': 0.0354%  
 Character 't': 0.1024%  
 Character 'c': 0.0433%  
 Character 'f': 0.0197%  
 Character 'r': 0.0512%  
 Character 'o': 0.0748%  
 4.099274812502984 bits  

Cryptanalysis #

Entropy helps cryptanalysts in determining the effectiveness of cryptographic techniques.

For instance, if a cryptographic algorithm produces ciphertext with low entropy, it suggests potential vulnerabilities that an attacker could exploit. Cryptanalysts can use entropy measurements to identify weaknesses in encryption systems and develop strategies to break them. I will try to further use this in one of the next challenges.


Get Involved #

I think knowledge should be shared and discussions encouraged. So, don’t hesitate to ask questions, or suggest topics you’d like me to cover in future posts.

Stay Connected #

You can contact me on ion.miron@tutanota.com