Showing posts with label Pseudorandom. Show all posts
Showing posts with label Pseudorandom. Show all posts

Wednesday, September 21, 2016

RC4 stream cipher variants and visualization of table permutation state



Here, I present some work I have been doing on two RC4 stream cipher variants. The first variant, as seen below, I wrote to help me visualize and understand what the RC4 tables was doing, and help me understand its properties.

Identity Permutation

The class that contains it is called SimpleTable and is exactly that; The simplest R4C implementation possible. It is notable in the fact that it does not use key scheduling at all, and its starting state is that of the Identity Permutation. The identity permutation is where the value at index zero equals zero, the value at index one is one, and so on. An easy way to remember what the Identity Permutation is, just recall the notion of a Multiplicative Identity (which is 1), where by multiplying a number N by the Multiplicative Identity gives you back the value of N, also known as the identity. Similarly, the Identity Permutation of an array A just gives you A. This is the trivial permutation. That is, there is no permuting of the array at all!

Anyways, this is done to see the perfectly ordered state, and how each round effects that state. In this way, we can visually check for the avalanche effect. In order to visualize the table, i just assign each value 0 to 255 a different shade of grey (I also have a rainbow-colored option that might be easier to tell apart similar values). At each step I create a Bitmap by looping through the table. Below, you can find an animated GIF of the first 100 steps of this cipher being applied to the identity permutation:


Notice how it takes a while to get going, and the first several values don't move much at all. After 256 steps, or one round, the cursor arrives back at index zero. Because the location of the first several values have not moved much or at all, we can clearly see that a mere 256 steps is insufficient at permuting the state enough to avoid leaking the first part of your key. Therefore it it is vital to permute the table for several rounds (256 steps per round) before you start using the stream.

Wired Equivalent Protocol

As some of you may know, WEP used RC4 with a weak key schedule. The key is spread out over 256 bytes using the following approach:


j = (j + table[i] + key[i mod keylength]) % 256;
SwapValues(table[i], table[j]);


and then it began streaming bytes from the table. Typically a nonce is concatenated to the key. Every time the table is set up and/or the nonce changes, some information about the key is leaked. Obviously a more secure procedure would use a hash of the key and the nonce, instead of the plain-text key, and to toss away the first 1024 bytes or so.

Cycle Length

Because each step in an RC4 cipher is a permutation, there is a limit to the number of unique bytes that can be produced before it begins repeating. This is called the permutation cycle. The length of the permutation cycle depends on the exact starting state, but we can get an upper bound.

Since there are 256 elements in the array, and two indices into the array (i and j), there is a maximum of
256! * 256^2 = 5.62 * 10^512 = 2^1700
possible states. That's 4.6 * 10^488 yottabytes!

This is the maximum possible states, however, and other starting states could have less. If the RC4 algorithm performed as a random permutation (which it does not, it performs worse), the cycle length would be half of the theoretical maximum above. Luckily the number above is so vast, that even some faction of it is still so many bytes that all of humanity has never and likely will never have that much total storage.

One thing to watch out for, however is something called Finney States. If an RC4 is started in one of these Finney States, the length of the cycle is much, much reduced. The chance of randomly generating one of these starting states, however, is VERY, very low.

Strengthening RC4

As stated, and visualized, above, it is vital to permute the table for several rounds (at 256 steps per round) after the key schedule, discarding the bytes, before you start using the stream. Also, it would be foolish to use the actual bytes of the key for permuting the starting state. It would be instead better to use a hash of the key + nonce or a key derivation function from the key instead the actual value of the key itself.

Another idea is, after shuffling the table enough rounds to hide the key, scramble the table an additional number of rounds, that value being some function of the key. This increases the possible starting states by whatever your range is.

In the classic RC4, each step would return one byte. The number of steps taken before returning each byte is configurable in my implementation.

Memory hardening

Check out the experimental branch for a memory hardened version. It stores the key class in memory, with the key XORed with a one-time pad, and then is protected in memory from access with the System.Security.Cryptography.ProtectedMemory class.

Other uses

The pseudo-random byte stream from the RC4 table is deterministic. Therefore if two remote computers with a shared secret, both computers can independently set up an RC4 table with exactly the same starting state and will get the same sequence of bytes which would be difficult to guess, given just the stream of bytes. If the plain text is XORed by the pseudorandom byte stream, then it can be decrypted by XORing it by the same byte steam.

The project includes 2 variants: 1) A simple table with a method to visualize the permutation state of the table and the avalanche effect as a bitmap 2) A more serious attempt at a secure implementation.


NOTE: THIS HAS NOT BEEN CRYPTO-ANALYZED AND PROBABLY NOT ACTUALLY SECURE, SO DO NOT TRUST IT!

Screenshots




Source code

Here is the GitHub page to the project (master branch).
Or just directly download the Zip file (experimental branch).




Tuesday, October 6, 2015

Mixed Radix Numeral System class and Counter


Mixed Radix Calculator

   My 'Mixed Radix Calculator' creates a counting system of radices (plural of radix), such as base 12 or mixed radices such as Minutes/Hours/Days/Years: 365:24:60:60. I choose the left side to be the most significant side. This is merely a personal preference, and my MixedRadixSystem class supports displaying both alignments.

   Of course you dont have to choose a mixed radix numeral system, you can count in an N-base numeral system, such as base 7 or a more familiar base 16. Another feature lies in my RadixNumeral class. Each numeral, or place value, supports having its own dictionary of symbols.


Screenshot of Mixed Radix Calculator
      (Project released under Creative Commons)

-  52:7:24:60:60:1000  -


  A numeral system (or system of numeration) is a writing system for expressing numbers.


  The most familiar one is of course the decimal numeral system. This is a 10-base numbering system. Computers use a binary numeral system. The base is sometimes called the radix or scale.

  Not all numbering systems have just one base. Take for example, how we currently divide time: There are 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 365 days in a year. This is called a mixed radix numeral system, and one might express the above mixed radix system like: 365:24:60:60.

  https://en.wikipedia.org/wiki/Mixed_radix
  http://mathworld.wolfram.com/Base.html

Uses:
  I haven't found a lot of use cases for it yet, but it is interesting. I originally built this because I wanted to experiment with numeral systems that uses increasing consecutive prime numbers for each radix, as well as experiment with some off-bases, such as base 3 or base 7.

  In a single base, say base 7, then 'round numbers' with only one place value having a 1 and the rest having zeros, such as 1:0:0:0:0 (in base 7), such numbers are powers of 7, and ever other number except for the 1's place value is a multiple of 7.

  A mixed radix numeral system can represent a polynomial, and possibly provide for a simpler way to visualize and reason about them.

  Yet another possible use is to make a numeral system with a base that is larger than and co-prime to some other target number (say 256) to make a bijective map from every value in a byte to some other value exactly once by repeatedly adding the value of the co-prime, modulus 256. This can appear rather random (or sometimes not at all) but the mapping is easily determined given the co-prime. I have talked about this notion before on my blog
  https://csharpcodewhisperer.blogspot.com/search/label/Coprime

  If you like this project you would probably like my project EquationFinder, it finds equations given constraints
  https://github.com/AdamWhiteHat/EquationFinder


Wednesday, April 9, 2014

An Implementation of a One-Way Hashing Algorithm using Substitution and Permutation




        This program is a one-way hashing algorithm that can be used for hashing passwords or verifying a file's integrity. Like other ciphers, it has rounds. While this particular implementation was designed to overwrite some of the information required to reverse the rounds operations to be asymmetric, the same core 'substitution' algorithm could be used to make a symmetric stream cipher by taking a different approach.
        Each round proceeds as follows: First a scrambling or mangling (sounds violent) of a substitution table is performed before substituting the plain-text (or last round's output) with values from the table chosen by the index corresponding to the plain-text character's equivalent byte value. The substituted-text is then converted into a List of bits (bool) and the list then split in half into two lists of bits. One of the lists of bits is optionally flipped (reversed) based on the value of the first bit of that list. The value of the middle bit of the other list is used to determine which list is to lead during the (next step) bit shuffling. Starting with the selected list, the bits of the two lists are interleaved (shuffled) making one list again. The bits are converted back into bytes via an extension method and used as the input for the next round.
        The substitution/cipher block step uses a variation of the RC4 cipher with the key-scheduling algorithm dropped in favor of a pseudo-random-appearing, but evenly distributed table with each number from 0 to 256 appearing exactly once. This table is deterministically populated by an initial value, IV, which is some number that is both greater than 256 and co-prime to it. Starting with zero, set each byte in the table to the value plus IV, modulus 256, in order. Then, to 'prime' the table/block, iterate over the table several thousand times with the RC4 algorithm, discarding the byte stream (in the case of RC4). In the case of my particular RC4 variant implementation, the result byte is put into the table at an index that congruent to 256 modulus the current iteration count multiplied by some co-prime of 256, in this case, 331.
        While many people believe that RC4 is well on its way to being broken, it is my hope that by using a smarter key-scheduling function with a different (larger) block size and maybe some bit country line-dancing (shuffling) will make a variant of RC4 that we all can have confidence in again. Hmm, there is already an RC5 and RC6. Maybe a good name would be RC4.1 or RC4.Awesome. RC4 and seven thirteenths? R2C2 perhaps? No, I got it: The stream cipher formally known as RC4. RC4Realz or RCL33t? Anyways, this article is not about symmetric algorithms, but rather asymmetric ones, so read on...

        Cryptography is one of my interests. I love the math behind it and I think the mechanics behind RSA are just so elegant. I often daydream that one day I will be able to come out with something incredibly clever and useful. For the past several months, I have been working on implementing several different concepts of cryptography in C#. I have wrote several classes that do one specific type of job, such as transposition, permutation or substitution. I have been toying with combinations these different methods into 'rounds', with each round getting its input from the output of the previous round.  In this post, I am proposing a one-way hashing algorithm that uses a combination of bit-shuffling (permutation) and a substitution table with logic that mangle the table based on the current state of the table. It is possible to make an asymmetric algorithm from this idea, but that is not the aim of this particular tool.

        This algorithm still has some areas that need improving. First of all, its rather slow, so hashing large files is not practical. The number of rounds has a large baring on its hash time. A smaller number of rounds is recommended for anything beyond a paragraph. The current default is 18. This is rather arbitrary, as I have not done any testing to find the optimal number of rounds. Anyone who uses less than 4 rounds will start to see some repeating characters, not to mention make their implementation more susceptible to cryptanalysis. Sometimes a lengthy hashing time can be desirable, such as a proof of work concept, or to make it more difficult to brute force. For a password-length input at 1024 rounds, it takes almost a full second.


        Another area for improvement is the permutation step; is basically just shuffles the bits by dividing the bit array in half and interleaving the bits (think a deck of cards). While there is some variation in the order, based off the input, a much better bit shuffling would probably be an expansion or compression style of permutation. See this link to Wikipedia for an visual depiction.


        Well that about sums it up, so now on to the code! Take note at how I used extension methods to take care of converting a BitArray and a Byte array into a string, and back again. This makes the code very readable and not take up nearly as many lines by reusing code. The code is pretty heavily commented, so I will just let it explaining itself:





public class AJRHash
{
 byte[] table;
 // Unsure the optimum setting for rounds.
 // No less that 4. 1024 takes around 1 second for password-length hashes.
 // Use lower setting for large hashes
 static int rounds = 18;
 static int minimumLength = 10;
 static int tableSize = 256; // Because we are using bytes
 // By choosing a co-prime to 256, we ensure we get an evenly distributed table
 static int[] coprimesOf256 = { 4489, 2209, 1369 }; // Changing the co-primes will change the hash
 
 public AJRHash()
 {
  table = new byte[tableSize];
  InitializeTable();
 }
 
 public string Hash(string Input)
 {
  // Pad the input to minimum length
  if(Input.Length<minimumLength)
  {
   List<byte> padding = new List<byte>();
   
   // Copy the input first
   if(Input.Length>0)
    padding.AddRange(Input.GetBytes());
   
   // Make up the diffrence with nulls
   int diffrence = minimumLength - Input.Length;
   while(diffrence>0)
   {
    padding.Add(0);
    diffrence--;
   }
   Input = padding.ToArray().GetString();
  }
  
  byte[] permutation = Input.GetBytes();
  byte[] substitution = new byte[Input.Length+1];
  
  int counter = rounds;
  while(counter>0)
  {
   substitution = Substitution(permutation);
   permutation  = Permutation_Interleave(substitution);
   counter--;
  }

  return Convert.ToBase64String(permutation);
 }
 
 void InitializeTable()
 {
  int prime = coprimesOf256[0];
  
  byte index = 0;
  for (int i = 0; i < tableSize; i++)
  {
   table[index] = (byte)i;
   
   unchecked // The large prime will just roll over. Essentially just modular arithmetic
   { // By choosing a co-prime to 256, we ensure we get an evenly distributed table
    index += (byte)prime;
   }
  }
  
  // Finish initializing the table by shuffling it
  ScrambleTable(prime);
 }
 
 void ScrambleTable(int iterations)
 {
  byte i = 0;
  byte j = 0;
  byte k = 0;
  byte l = 0;
  
  byte iteration = 0;
  int counter = iterations;
  
  // Just roll over on overflow
  unchecked
  {
   // Some initial values
   j=table[table.Length/2];
   i=table[j];

   // Use the current state of the table to determine how it is changed
   while(counter>0)
   {
    i++;
    l=(byte)(table[i]+1);
    j=(byte)(j+l+1);
    
    table[i] = (byte)(j + 3);
    table[j] = (byte)(l - 1);
    
    k = (byte)( (table[i] + table[j]));
    l = (byte)(k % 255);
    
    table[k] = (byte)(table[k]+1);
    
    k = table[l];
    l = table[iteration];
    
    table[(byte)(iteration*331)] = (byte)(k ^ l);
    
    counter--;
    iteration++;
   }
  }
 }
 
 byte[] Substitution(byte[] input)
 {
  // Shuffle the table by an ammount unique to the input
  ScrambleTable(input.Length*input[0]);
  
  // Apply input against the substitution table
  List<byte> result = new List<byte>();
  foreach(byte b in input)
  {
   result.Add( table[(int)b] );
  }
  
  return result.ToArray();
 }
 
 // TODO: Add more robust bit shuffling/permutation such as compression or expansion
 byte[] Permutation_Interleave(byte[] input)
 {
  // Convert input into an array of bits
  BitArray temp = new BitArray(input);
  List<bool> bits = temp.GetList();
  
  // Ensure we have an even number of bits
  if(bits.Count%2 != 0)
   bits.Add(false);
  
  // Split the input up into two arrays of bits
  List<bool[]> split = bits.Split();
  
  // Make sure we have at least two arrays to interleave
  if(split.Count<2)
   throw new Exception("Input is too short.");
  
  bool[] one = split[0];
  bool[] two = split[1];
  
  // Ensure both arrays are the same length
  if(one.Length != two.Length)
   throw new Exception("Lengths must match.");
  
  // If the first bit of two is 1, reverse it.
  //   This makes the deterministic shuffling more difficult to reverse 
  if(two[0])
   Array.Reverse(two);
  
  // If the 'middle' bit of one is 1, take from two first.
  //   This makes the deterministic shuffling more difficult to reverse
  bool SecondFirst = false;
  if(one[one.Length/2])
   SecondFirst = true;
  
  // Interleave the two arrays
  int counter = 0;
  List<bool> result = new List<bool>();
  while(counter<one.Length)
  {
   // Two possible ways to order (decided above)
   if(SecondFirst)
   {
    result.Add(two[counter]);
    result.Add(one[counter]);
   }
   else
   {
    result.Add(one[counter]);
    result.Add(two[counter]);
   }
   counter++;
  }
  
  return new BitArray(result.ToArray()).GetBytes();
 }
}



I also wrote a few extension methods to help with all the tedious converting of data types. Extension methods can help keep you code tidy, readable. Don't forget to add these extension methods, or the above code wont compile:




public static class ExtentionMethods
{
 public static byte[] GetBytes(this string Input)
 {
  if(Input == null) return new byte[0];   
  return Encoding.UTF8.GetBytes(Input);
 }
 
 public static string GetString(this byte[] Input)
 {
  if(Input == null) return string.Empty;   
  return Encoding.UTF8.GetString(Input);
 }
 
 public static string GetString(this BitArray Input)
 {
  if(Input == null) return string.Empty;   
  return Encoding.UTF8.GetString(Input.GetBytes());
 }
 
 public static byte[] GetBytes(this BitArray Input)
 {
  if(Input == null) return new byte[0];   
  int lengthInBytes = (Input.Length%8==0) ? Input.Length/8 : (Input.Length/8)+1;
  
  byte[] result = new byte[lengthInBytes];
  Input.CopyTo(result,0);
  return result;
 }
 
 public static bool[] GetArray(this BitArray Input)
 {
  if(Input == null) return new bool[0];   
  return Input.GetList().ToArray();
 }
 
 public static List<bool> GetList(this BitArray Input)
 {
  if(Input == null) return new List<bool>();
  
  List<bool> result = new List<bool>();
  foreach(bool b in Input)
  {
   result.Add(b);
  }   
  return result;
 }
 
 public static List<bool[]> Split(this List<bool> Input)
 {
  if(Input == null) return new List<bool[]>();
  
  int midpoint = (Input.Count/2);
  List<bool> result1 = new List<bool>();
  List<bool> result2 = new List<bool>();
  
  int counter = 0;
  foreach(bool b in Input)
  {
   if(counter<midpoint)
   {
    result1.Add(b);
   }
   else
   {
    result2.Add(b);
   }    
   counter++;
  }
  
  List<bool[]> result = new List<bool[]>();
  result.Add(result1.ToArray());
  result.Add(result2.ToArray());
  return result;   
 }
}

Wednesday, August 7, 2013

Pseudo 'random' even distribution table




In my last post, I discussed what a co-prime is and showed you to find them.

So, what's so special about relatively prime numbers? Well, then can be used to create an one-for-one distribution table that is seemingly random, but is deterministically calculated (created).

To understand what I mean, picture the face of a clock...


It has the hours 1 through 12, and if you and an hour to 12, you get 1. This can also be thought of as a single digit in a base 12 number system. Now we need a co-prime to 12. 7 is relatively prime to 12, so lets choose 7.

Starting at hour 1, if we add 7 hours, it will be 8. If we add 7 more hours, we will get 3. 7 more, 10. If we keep adding 7 hours to our clock, the hour hand will land on each of the different numbers exactly once before repeating itself, 12 steps later. Intrigued yet?

If, say, we find a co-prime to the largest number that can be represented by a byte (8-bits, 256 [also expressed as 2^8=256 or 8=Log2(256)]), we can create an array of bytes with a length of 256, containing each of the 256 different possible bytes, distributed in a seemingly random order. The discrete order, or sequence, in which each each number is visited it completely dependent on the value of the co-prime that was selected.

This table is now essentially a one-to-one, bijective mapping of one byte to another. To express this mapping to another party, say to map a stream of bytes back to their original values (decrypt), the entire table need not be exchanged, only the co-prime.

This provides a foundation for an encryption scheme who's technical requirements are similar to handling a cipher-block-chain (CBC) and its changing IV (initialization vector).

Now, it it easy to jump to the conclusion that such an encryption scheme is less secure than a CBC, but this is not necessarily the case. While this approach may be conceptually more simple, the difficulty of discovering the sequence can be made arbitrarily hard.

First of all, the number of relatively prime numbers to 256 is probably infinite. A co-prime to 256 does not have to be less than 256. Indeed, it may be several thousand time greater than 256. Additionally, any prime greater than 256 is, by definition, co-prime to 256, and likely will have a seemingly more 'random' distribution/appearance.

There is, however, a limit here. It does not have to do with the number of co-primes, but is instead limited by the number of possible sequences that can be represented by our array of 256 bytes; eventually, two different co-primes are going to map to the same unique sequence. The order matters, and we don't allow repetition to exist in our sequence. This is called a permutation without repetition, and can be expressed as 256! or 256 factorial and is instructing one to calculate the product of 256 * 255 * 254 * 253 * [...] * 6 * 5 * 4 * 3 * 2 * 1, which equals exactly this number:

857817775342842654119082271681232625157781520279485619859655650377269452553147589377440291360451408450375885342336584306157196834693696475322289288497426025679637332563368786442675207626794560187968867971521143307702077526646451464709187326100832876325702818980773671781454170250523018608495319068138257481070252817559459476987034665712738139286205234756808218860701203611083152093501947437109101726968262861606263662435022840944191408424615936000000000000000000000000000000000000000000000000000000000000000

Yeah, that's right, that number has exactly 63 zeros on the end and is 507 digits long. (As an aside, the reason there is so many zeros on the end of this number is, well for one it is highly composite, but more specifically, its prime factorization includes 2^255 and 5^63 and so 63 fives multiply with 63 of those twos to make 63 tens, and hence that many zeros.)

Above I said arbitrarily hard. So far we have only considered one table, but try and fathom the complexity of many tables. I present three different ways to use multiple tables; Nested, sequentially, and mangled.
Furthermore, the distribution tables can be discarded and replaced.

I will explain what those mean and finish this post tomorrow.