
Wednesday, August 7, 2013

Pseudo 'random' even distribution table




In my last post, I discussed what a co-prime is and showed you how to find them.

So, what's so special about relatively prime numbers? Well, they can be used to create a one-to-one distribution table that is seemingly random, but is deterministically calculated (created).

To understand what I mean, picture the face of a clock...


It has the hours 1 through 12, and if you add an hour to 12, you get 1. This can also be thought of as a single digit in a base-12 number system. Now we need a co-prime to 12. 7 is relatively prime to 12, so let's choose 7.

Starting at hour 1, if we add 7 hours, it will be 8. If we add 7 more hours, we will get 3. 7 more, 10. If we keep adding 7 hours to our clock, the hour hand will land on each of the different numbers exactly once before repeating itself, 12 steps later. Intrigued yet?
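
Here is a minimal sketch of that clock walk (the variable names are mine, purely for illustration):

int hours = 12; // size of the 'clock' (the modulus)
int step = 7;   // our co-prime step
int hand = 1;   // starting hour

for (int i = 0; i < hours; i++)
{
    Console.Write(hand + " ");              // visits each hour exactly once
    hand = ((hand - 1 + step) % hours) + 1; // add 7 hours, wrapping 12 back to 1
}
// Output: 1 8 3 10 5 12 7 2 9 4 11 6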

If, say, we find a co-prime to the number of values that can be represented by a byte (8 bits, 256 values [also expressed as 2^8 = 256, or 8 = Log2(256)]), we can create an array of bytes with a length of 256, containing each of the 256 different possible bytes, distributed in a seemingly random order. The discrete order, or sequence, in which each number is visited is completely dependent on the value of the co-prime that was selected.
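
Building such a table takes only a few lines. Here is a minimal sketch (the names are mine; any odd number works for coprime, since every odd number is co-prime to 256):

int tableSize = 256;
int coprime = 101; // must be co-prime to 256
byte[] table = new byte[tableSize];

int position = 0;
for (int i = 0; i < tableSize; i++)
{
    table[i] = (byte)position; // each byte value appears exactly once
    position = (position + coprime) % tableSize;
}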

This table is now essentially a one-to-one, bijective mapping of one byte to another. To express this mapping to another party, say to map a stream of bytes back to their original values (decrypt), the entire table need not be exchanged, only the co-prime.

This provides a foundation for an encryption scheme whose technical requirements are similar to handling a cipher-block chain (CBC) and its changing IV (initialization vector).

Now, it is easy to jump to the conclusion that such an encryption scheme is less secure than a CBC, but this is not necessarily the case. While this approach may be conceptually simpler, the difficulty of discovering the sequence can be made arbitrarily hard.

First of all, the number of integers relatively prime to 256 is infinite. A co-prime to 256 does not have to be less than 256. Indeed, it may be several thousand times greater than 256. Additionally, any prime greater than 256 is, by definition, co-prime to 256, and will likely have a seemingly more 'random' distribution/appearance.

There is, however, a limit here. It does not have to do with the number of co-primes; it is instead limited by the number of possible sequences that can be represented by our array of 256 bytes; eventually, two different co-primes are going to map to the same sequence. The order matters, and we do not allow repetition in our sequence. This is called a permutation without repetition, and the count of them is expressed as 256!, or 256 factorial: the product 256 * 255 * 254 * 253 * [...] * 4 * 3 * 2 * 1, which equals exactly this number:

857817775342842654119082271681232625157781520279485619859655650377269452553147589377440291360451408450375885342336584306157196834693696475322289288497426025679637332563368786442675207626794560187968867971521143307702077526646451464709187326100832876325702818980773671781454170250523018608495319068138257481070252817559459476987034665712738139286205234756808218860701203611083152093501947437109101726968262861606263662435022840944191408424615936000000000000000000000000000000000000000000000000000000000000000

Yeah, that's right: that number has exactly 63 zeros on the end and is 507 digits long. (As an aside, the reason there are so many zeros on the end of this number is, for one, that it is highly composite, but more specifically, its prime factorization includes 2^255 and 5^63, and so 63 fives multiply with 63 of those twos to make 63 tens, and hence that many zeros.)

Above I said arbitrarily hard. So far we have only considered one table, but try to fathom the complexity of many tables. I present three different ways to use multiple tables: nested, sequential, and mangled.
Furthermore, the distribution tables can be discarded and replaced.

I will explain what those mean and finish this post tomorrow.


Thursday, August 1, 2013

Greatest common divisor & Co-primeness



Below are three different methods for finding the greatest common divisor of two numbers. I provide a method using modulus, one using Euclid's method (repeated subtraction), and one that uses the Stein method (binary GCD).

Believe it or not, the modulus method has the best performance. I would have guessed the Stein method. Credit goes to blackwasp for the Stein method.


public static uint FindGCDModulus(uint value1, uint value2)
{
        while(value1 != 0 && value2 != 0)
        {
                if (value1 > value2)
                {
                        value1 %= value2;
                }
                else
                {
                        value2 %= value1;
                }
        }
        return Math.Max(value1, value2);
}

public static uint FindGCDEuclid(uint value1, uint value2)
{
        while(value1 != 0 && value2 != 0)
        {
                if (value1 > value2)
                {
                        value1 -= value2;
                }
                else
                {
                        value2 -= value1;
                }
        }
        return Math.Max(value1, value2);
}

public static uint FindGCDStein(uint value1, uint value2)
{
        if (value1 == 0) return value2;
        if (value2 == 0) return value1;
        if (value1 == value2) return value1;

        bool value1IsEven = (value1 & 1u) == 0;
        bool value2IsEven = (value2 & 1u) == 0;
                       
        if (value1IsEven && value2IsEven)
        {
                return FindGCDStein(value1 >> 1, value2 >> 1) << 1;
        }
        else if (value1IsEven && !value2IsEven)
        {
                return FindGCDStein(value1 >> 1, value2);
        }
        else if (value2IsEven)
        {
                return FindGCDStein(value1, value2 >> 1);
        }
        else if (value1 > value2)
        {
                return FindGCDStein((value1 - value2) >> 1, value2);
        }
        else
        {
                return FindGCDStein(value1, (value2 - value1) >> 1);
        }
}

Here are the results of my benchmark tests:

Stein: 36.4657 milliseconds.
Euclid: 31.2369 milliseconds.
Modulus: 7.5199 milliseconds.



Yeah, I couldn't believe it either, the modulus approach is the fastest!

Here is the benchmark code that I created:

public class CodeStopwatch
{
        //int m_pointer;
        Stopwatch m_elapsed;
        
        public CodeStopwatch()
        {
                Reset();
        }

        public double Stop()
        {
                m_elapsed.Stop();
                return m_elapsed.Elapsed.TotalMilliseconds;
        }
        
        public void Reset()
        {
                GC.Collect(GC.MaxGeneration);
                m_elapsed = new Stopwatch();
                m_elapsed.Start();
        }
}

public class CoprimeBenchmark
{
        public delegate bool IsCoprimeDelegate(uint value1, uint value2);
        public List<uint> FindCoprimesDelegate(uint quantity,IsCoprimeDelegate isCoprime)
        {
                List<uint> results = new List<uint>();
                if(quantity<2) {
                        return results;
                }
                
                for (uint count = 2; count < quantity; count++)
                {
                        if(isCoprime(count,quantity))
                        {
                                results.Add(count);
                        }
                }
                return results;
        }
        
        public static bool IsCoprimeModulus(uint value1, uint value2)
        {
                return FindGCDModulus(value1, value2) == 1;
        }
        
        public static bool IsCoprimeEuclid(uint value1, uint value2)
        {
                return FindGCDEuclid(value1, value2) == 1;
        }
        
        public static bool IsCoprimeStein(uint value1, uint value2)
        {
                return FindGCDStein(value1, value2) == 1;
        }

        // [ Please refer to methods above. Omitted for brevity. ]
        public static uint FindGCDModulus(uint value1, uint value2) { ... }
        public static uint FindGCDEuclid(uint value1, uint value2) { ... }
        public static uint FindGCDStein(uint value1, uint value2) { ... }
}

Usage of the above classes:

// Button OnClick event
void ButtonBenchmarkClick(object sender, EventArgs e)
{
        uint number = 50000;
        List<uint> resultsStein = new List<uint>();
        List<uint> resultsEuclid = new List<uint>();
        List<uint> resultsModulus = new List<uint>();
        
        CoprimeBenchmark cop = new CoprimeBenchmark();
        
        CodeStopwatch time = new CodeStopwatch();
        resultsStein = cop.FindCoprimesDelegate(number,CoprimeBenchmark.IsCoprimeStein);
        string stein = time.Stop().ToString();
        
        time = new CodeStopwatch();
        resultsEuclid = cop.FindCoprimesDelegate(number,CoprimeBenchmark.IsCoprimeEuclid);
        string euclid = time.Stop().ToString();
        
        time = new CodeStopwatch();
        resultsModulus = cop.FindCoprimesDelegate(number,CoprimeBenchmark.IsCoprimeModulus);
        string modulus = time.Stop().ToString();
        
        textBoxOutput.Text = "Stein:\t" + stein + " milliseconds." + Environment.NewLine;
        textBoxOutput.Text += "Euclid:\t" + euclid + " milliseconds." + Environment.NewLine;
        textBoxOutput.Text += "Modulus:\t" + modulus + " milliseconds." + Environment.NewLine;
        
        textBoxOutput.Text += Environment.NewLine + Environment.NewLine;
        foreach(uint a in resultsStein)
        {
                textBoxOutput.Text += a.ToString() + Environment.NewLine;
        }
        
        textBoxOutput.Text += Environment.NewLine + Environment.NewLine;
        foreach(uint b in resultsEuclid)
        {
                textBoxOutput.Text += b.ToString() + Environment.NewLine;
        }
        
        textBoxOutput.Text += Environment.NewLine + Environment.NewLine;
        foreach(uint c in resultsModulus)
        {
                textBoxOutput.Text += c.ToString() + Environment.NewLine;
        }
}


--


Now, I will take the fastest algorithm and show you the code for testing co-primeness and for finding a list of co-primes for a given number.

Testing for co-primeness is simple: two numbers are relatively prime to each other if their greatest common divisor is equal to one.

My function FindCoprimes returns a List<uint> of co-primes given a number and a range:

public static class Coprimes
{
        public static List<uint> FindCoprimes(uint number,uint min,uint max)
        {
                if(max<2 || min<2 || max<=min || number<1) {    return null;    }
                
                List<uint> results = new List<uint>();
                for(uint counter = min; counter<max; counter++) {
                        if(IsCoprime(number,counter)) {
                                results.Add(counter);
                        }
                }
                return results;
        }
       
        public static bool IsCoprime(uint value1, uint value2)
        {
                return FindGCDModulus(value1, value2) == 1;
        }
}
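
For example, a quick usage sketch that lists every number under 100 that is co-prime to 256:

List<uint> coprimes = Coprimes.FindCoprimes(256, 2, 100);
foreach (uint value in coprimes)
{
        Console.WriteLine(value); // prints 3, 5, 7, ... every odd number in the range
}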

Pretty easy!

What is so special about co-primes? Well, one thing I use co-primes for is for making an evenly distributed table. For an explanation and the code, see my post on Evenly Distributed Tables.



Captcha: Drawing Text along a Bézier Spline



The following post/code will achieve something to the effect of:


All this code, and this article, were inspired by planetclegg.com/projects/WarpingTextToSplines.html.
An explanation of the math and why it works will come shortly. In the meantime, please see the above link for a detailed explanation.
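
In short (my summary of the linked article's math, not a full derivation): a cubic Bézier curve with control points P_0 through P_3 is

B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) t^2 \, P_2 + t^3 P_3, \quad t \in [0,1]

Collecting powers of t gives S(t) = A t^3 + B t^2 + C t + D, where A = P_3 - 3P_2 + 3P_1 - P_0, B = 3P_2 - 6P_1 + 3P_0, C = 3P_1 - 3P_0, and D = P_0. The code below computes these coefficients separately for x (A through D) and y (E through H), and uses the derivative S'(t) = 3A t^2 + 2B t + C as the tangent vector.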

This code assumes you have already drawn some text on your GraphicsPath. Here is the code to transform your GraphicsPath text to follow a cubic Bézier Spline:

GraphicsPath BezierWarp(GraphicsPath text,Size size)
{
 // Control points for a cubic Bézier spline
 PointF P0 = new PointF();
 PointF P1 = new PointF();
 PointF P2 = new PointF();
 PointF P3 = new PointF();
 
 float shrink = 20;
 float shift = 0;
 
 P0.X = shrink;
 P0.Y = shrink+shift;
 P1.X = size.Width-shrink;
 P1.Y = shrink;
 P2.X = shrink;
 P2.Y = size.Height-shrink;
 P3.X = size.Width-shrink;
 P3.Y = size.Height-shrink-shift;

 // Calculate coefficients A thru H from the control points
 float A = P3.X - 3 * P2.X + 3 * P1.X - P0.X;
 float B = 3 * P2.X - 6 * P1.X + 3 * P0.X;
 float C = 3 * P1.X - 3 * P0.X;
 float D = P0.X;

 float E = P3.Y - 3 * P2.Y + 3 * P1.Y - P0.Y;
 float F = 3 * P2.Y - 6 * P1.Y + 3 * P0.Y;
 float G = 3 * P1.Y - 3 * P0.Y;
 float H = P0.Y;

 PointF[] pathPoints = text.PathPoints;
 RectangleF textBounds = text.GetBounds();
 
 for (int i = 0; i < pathPoints.Length; i++)
 {
  PointF pt = pathPoints[i];
  float textX = pt.X;
  float textY = pt.Y;
  
  // Normalize the x coordinate into the parameterized
  // value with a domain between 0 and 1.
  float t  =  textX / textBounds.Width;
  float t2 = (t * t);
  float t3 = (t * t * t);
  
  // Calculate spline point for parameter t
  float Sx = A * t3 + B * t2 + C * t + D;
  float Sy = E * t3 + F * t2 + G * t + H;
  
  // Calculate the tangent vector for the point
  float Tx = 3 * A * t2 + 2 * B * t + C;
  float Ty = 3 * E * t2 + 2 * F * t + G;

  // Rotate 90 or 270 degrees to make it a perpendicular
  float Px = - Ty;
  float Py =   Tx;
  
  // Normalize the perpendicular into a unit vector
  float magnitude = (float)Math.Sqrt((Px*Px) + (Py*Py));
  Px /= magnitude;
  Py /= magnitude;
  
  // Assume that input text point y coord is the "height" or
  // distance from the spline.  Multiply the perpendicular
  // vector with y. it becomes the new magnitude of the vector
  Px *= textY;
  Py *= textY;
  
  // Translate the spline point using the resultant vector
  float finalX = Px + Sx;
  float finalY = Py + Sy;
  
  pathPoints[i] = new PointF(finalX, finalY);
 }
 
 return new GraphicsPath(pathPoints,text.PathTypes);
}

Saturday, July 27, 2013

Information entropy and data compression



In my last post, I talked about Shannon data entropy and showed a class to calculate it. Let's take it one step further and actually compress some data based off the data entropy we calculated.

To do this, first we calculate how many bits are needed to compress each byte of our data. Theoretically, this is the data entropy, rounded up to the next whole number (Math.Ceiling). But this is not always enough: the number of unique symbols in our data may be too large to be represented in that many bits. We calculate the number of bits needed to represent the number of unique symbols by taking its base-2 logarithm. This returns a decimal (double), so we use Math.Ceiling to round it up to the nearest whole number as well. We set entropy_Ceiling to whichever number is larger. If entropy_Ceiling is 8, then we should return immediately, as we cannot compress the data any further.
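
In code, that calculation is only a few lines (this restates the constructor logic of the class shown further down):

// Bits needed to enumerate every unique symbol
int unique  = (int)Math.Ceiling(Math.Log((double)fileEntropy.UniqueSymbols, 2));
// Bits suggested by the Shannon entropy of the data
int entropy = (int)Math.Ceiling(fileEntropy.Entropy);
// Take whichever is larger; 8 means the data cannot be compressed this way
entropy_Ceiling = Math.Max(unique, entropy);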

We start by making a compression and decompression dictionary. We make these by taking the sorted distribution dictionary (DataEntropyUTF8.GetSortedDistribution) and assigning an X-bit-length value to each entry in it, with X being entropy_Ceiling. The compression dictionary has a byte as the key and an array of bool (bool[]) as the value, while the decompression dictionary has an array of bool as the key and a byte as the value. You'll notice that in the decompression dictionary we store the array of bool as a string, because using an actual array as a key will not work: the dictionary's EqualityComparer will not assign the same hash code for two arrays of the same values.

Then, compression is as easy as reading each byte, getting the value from the compression dictionary for that byte, and adding it to a list of bool (List<bool>), then converting that list of bool to an array of bytes.

Decompression consists of converting the compressed array of bytes into an array of bool, then reading in X bools at a time and getting the byte value from the decompression library, again with X being entropy_Ceiling.




But first, to make this process easier, and to make our code more manageable and readable, I define several extension methods to help us out, since .NET provides almost no support for working with data at the bit level besides the BitArray class. Here are the extension methods that make working with bits easier:

public static class BitExtentionMethods
{
    //
    // List<bool> extention methods
    //
    public static List<bool> ToBitList(this byte source)
    {
        List<bool> temp = ( new BitArray(source.ToArray()) ).ToList();
        temp.Reverse();
        return temp;
    }
    
    public static List<bool> ToBitList(this byte source,int startIndex)
    {
        if(startIndex<0 || startIndex>7) {
            return new List<bool>();
        }
        return source.ToBitList().GetRange(startIndex,(8-startIndex));
    }
    
    //
    // bool[] extention methods
    //
    public static string GetString(this bool[] source)
    {
        string result = string.Empty;
        foreach(bool b in source)
        {
            if(b) {
                result += "1";
            } else {
                result += "0";
            }
        }
        return result;
    }
    
    public static bool[] ToBitArray(this byte source,int MaxLength)
    {
        List<bool> temp = source.ToBitList(8-MaxLength);
        return temp.ToArray();
    }
    
    public static bool[] ToBitArray(this byte source)
    {
        return source.ToBitList().ToArray();
    }

    //
    // BYTE extention methods
    //
    public static byte[] ToArray(this byte source)
    {
        List<byte> result = new List<byte>();
        result.Add(source);
        return result.ToArray();
    }
    
    //
    // BITARRAY extention methods
    //
    public static List<bool> ToList(this BitArray source)
    {
        List<bool> result = new List<bool>();
        foreach(bool bit in source)
        {
            result.Add(bit);
        }
        return result;
    }
    
    public static bool[] ToArray(this BitArray source)
    {
        return ToList(source).ToArray();
    }
}
Remember, extension methods need to be defined in a top-level static class directly inside a namespace, not in a nested class.


Now, we are free to write our compression/decompression class:

public class BitCompression
{
    // Data to encode
    byte[] data;
    // Compressed data
    byte[] encodeData;
    // # of bits needed to represent data
    int encodeLength_Bits;
    // Original size before padding. Decompressed data will be truncated to this length.
    int decodeLength_Bits;
    // Bits needed to represent each byte (entropy rounded up to nearist whole number)
    int entropy_Ceiling;
    // Data entropy class
    DataEntropyUTF8 fileEntropy;
    // Stores the compressed symbol table
    Dictionary<byte,bool[]> compressionLibrary;
    Dictionary<string,byte> decompressionLibrary;
    
    void GenerateLibrary()
    {
        byte[] distTable = new byte[fileEntropy.Distribution.Keys.Count];
        fileEntropy.Distribution.Keys.CopyTo(distTable,0);
        
        byte bitSymbol = 0x0;
        bool[] bitBuffer = new bool[entropy_Ceiling];
        foreach(byte symbol in distTable)
        {
            bitBuffer = bitSymbol.ToBitArray(entropy_Ceiling);
            compressionLibrary.Add(symbol,bitBuffer);
            decompressionLibrary.Add(bitBuffer.GetString(),symbol);
            bitSymbol++;
        }
    }
    
    public byte[] Compress()
    {
        // Error checking
        if(entropy_Ceiling>7 || entropy_Ceiling<1) {
            return data;
        }

        // Compress data using compressionLibrary
        List<bool> compressedBits = new List<bool>();
        foreach(byte bite in data) {    // Take each byte, find the matching bit array in the dictionary
            compressedBits.AddRange(compressionLibrary[bite]);
        }
        decodeLength_Bits = compressedBits.Count;
        
        // Pad to fill last byte
        while(compressedBits.Count % 8 != 0) {
            compressedBits.Add(false);  // Pad to the nearest byte
        }
        encodeLength_Bits = compressedBits.Count;
        
        // Convert from array of bits to array of bytes
        List<byte> result = new List<byte>();
        int count = 0;
        int shift = 0;
        int offset= 0;
        int stop  = 0;
        byte current = 0;
        do
        {
            stop = encodeLength_Bits - count;
            stop = 8 - stop;
            if(stop<0) {
                stop = 0;
            }
            if(stop<8)
            {
                shift = 7;
                offset = count;
                current = 0;
                
                while(shift>=stop)
                {
                    current |= (byte)(Convert.ToByte(compressedBits[offset]) << shift);
                    shift--;
                    offset++;
                }
                
                result.Add(current);
                count += 8;
            }
        } while(count < encodeLength_Bits);
        
        encodeData = result.ToArray();
        return encodeData;
    }
    
    public byte[] Decompress(byte[] compressedData)
    {
        // Error check
        if(compressedData.Length<1) {
            return null;
        }
        
        // Convert to bit array for decompressing
        List<bool> bitArray = new List<bool>();
        foreach(byte bite in compressedData) {
            bitArray.AddRange(bite.ToBitList());
        }
        
        // Truncate to original size, removes padding for byte array
        int diff = bitArray.Count-decodeLength_Bits;
        if(diff>0) {
            bitArray.RemoveRange(decodeLength_Bits,diff);
        }

        // Decompress
        List<byte> result = new List<byte>();
        int count = 0;
        do
        {
            bool[] word = bitArray.GetRange(count,entropy_Ceiling).ToArray();
            result.Add(decompressionLibrary[word.GetString()]);
            
            count+=entropy_Ceiling;
        } while(count < bitArray.Count);
        
        return result.ToArray();
    }
    
    public BitCompression(string filename)
    {
        compressionLibrary  = new Dictionary<byte, bool[]>();
        decompressionLibrary = new Dictionary<string, byte>();
        
        if(!File.Exists(filename))  {
            return;
        }
        
        data = File.ReadAllBytes(filename);
        fileEntropy = new DataEntropyUTF8();
        fileEntropy.ExamineChunk(data);
        
        int unique  = (int)Math.Ceiling(Math.Log((double)fileEntropy.UniqueSymbols,2f));
        int entropy = (int)Math.Ceiling(fileEntropy.Entropy);
        
        entropy_Ceiling = Math.Max(unique,entropy);
        encodeLength_Bits   = data.Length * entropy_Ceiling;
        
        GenerateLibrary();
    }
}
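
Here is a minimal usage sketch (the file name is hypothetical). Note that Decompress depends on state captured during Compress (decodeLength_Bits), so both calls must go through the same instance:

BitCompression compressor = new BitCompression("sample.txt"); // hypothetical input file
byte[] compressed = compressor.Compress();
byte[] restored = compressor.Decompress(compressed);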


Please feel free to comment with ideas, suggestions or corrections.

Monday, July 22, 2013

Information Shannon Entropy



Shannon/data entropy is a measurement of uncertainty. Entropy can be used as a measure of randomness. Data entropy is typically expressed as the number of bits needed to encode or represent data. In the example below, we are working with bytes, so the max entropy for a stream of bytes is 8.
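
Formally (the standard definition, and what the GetEntropy method below computes), the entropy of a source whose symbols appear with probabilities p_i is

H = \sum_i p_i \log_2(1/p_i) = -\sum_i p_i \log_2 p_i

measured in bits per symbol.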

A file with high entropy means that each symbol is more-or-less equally likely to appear next. If a file or file stream has high entropy, it is probably either compressed, encrypted, or random. This can be used to detect packed executables, cipher streams on a network, or a breakdown of encrypted communication on a network that is expected to always be encrypted.

A text file will have low entropy. If a file has low data entropy, it means that the file will compress well.

This post and code were inspired by Mike Schiffman's excellent explanation of data entropy on his Cisco Security Blog.

Here is what I wrote:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace DataEntropy
{
    public class DataEntropyUTF8
    {
        // Stores the number of times each symbol appears
        SortedList<byte,int>        distributionDict;
        // Stores the entropy for each character
        SortedList<byte,double> probabilityDict;
        // Stores the last calculated entropy
        double overalEntropy;
        // Used for preventing unnecessary processing
        bool isDirty;
        // Bytes of data processed
        int dataSize;
        
        public int DataSampleSize
        {
            get { return dataSize; }
            private set { dataSize = value; }
        }
        
        public int UniqueSymbols
        {
            get { return distributionDict.Count; }
        }
        
        public double Entropy
        {
            get { return GetEntropy(); }
        }
        
        public Dictionary<byte,int> Distribution
        {
            get { return GetSortedDistribution(); }
        }
        
        public Dictionary<byte,double> Probability
        {
            get { return GetSortedProbability(); }
        }
        
        public byte GetGreatestDistribution()
        {
            // distributionDict is sorted by symbol value, not by count,
            // so pull the first key from the count-sorted dictionary
            return GetSortedDistribution().Keys.First();
        }
        
        public byte GetGreatestProbability()
        {
            return GetSortedProbability().Keys.First();
        }
        
        public double GetSymbolDistribution(byte symbol)
        {
            return distributionDict[symbol];
        }
        
        public double GetSymbolEntropy(byte symbol)
        {
            return probabilityDict[symbol];
        }
        
        Dictionary<byte,int> GetSortedDistribution()
        {
            List<Tuple<int,byte>> entryList = new List<Tuple<int, byte>>();
            foreach(KeyValuePair<byte,int> entry in distributionDict)
            {
                entryList.Add(new Tuple<int,byte>(entry.Value,entry.Key));
            }
            entryList.Sort();
            entryList.Reverse();
            
            Dictionary<byte,int> result = new Dictionary<byte, int>();
            foreach(Tuple<int,byte> entry in entryList)
            {
                result.Add(entry.Item2,entry.Item1);
            }
            return result;
        }
        
        Dictionary<byte,double> GetSortedProbability()
        {
            List<Tuple<double,byte>> entryList = new List<Tuple<double,byte>>();
            foreach(KeyValuePair<byte,double> entry in probabilityDict)
            {
                entryList.Add(new Tuple<double,byte>(entry.Value,entry.Key));
            }
            entryList.Sort();
            entryList.Reverse();
            
            Dictionary<byte,double> result = new Dictionary<byte,double>();
            foreach(Tuple<double,byte> entry in entryList)
            {
                result.Add(entry.Item2,entry.Item1);
            }
            return result;
        }
        
        double GetEntropy()
        {
            // If nothing has changed, don't recalculate
            if(!isDirty) {
                return overalEntropy;
            }
            // Reset values
            overalEntropy = 0;
            probabilityDict = new SortedList<byte,double>();
            
            foreach(KeyValuePair<byte,int> entry in distributionDict)
            {
                // Probability = Freq of symbol / # symbols examined thus far
                probabilityDict.Add(
                    entry.Key,
                    (double)distributionDict[entry.Key] / (double)dataSize
                );
            }
            
            foreach(KeyValuePair<byte,double> entry in probabilityDict)
            {
                // Entropy = probability * Log2(1/probability)
                overalEntropy += entry.Value * Math.Log((1/entry.Value),2);
            }
            
            isDirty = false;
            return overalEntropy;
        }
        
        public void ExamineChunk(byte[] chunk)
        {
            if(chunk==null || chunk.Length<1) {
                return;
            }
            
            isDirty = true;
            dataSize += chunk.Length;
            
            foreach(byte bite in chunk)
            {
                if(!distributionDict.ContainsKey(bite))
                {
                    distributionDict.Add(bite,1);
                    continue;
                }
                distributionDict[bite]++;
            }
        }
        
        public void ExamineChunk(string chunk)
        {
            ExamineChunk(StringToByteArray(chunk));
        }
        
        byte[] StringToByteArray(string inputString)
        {
            // Casting each char with Cast<byte>() throws at runtime;
            // encode the string properly instead
            return Encoding.UTF8.GetBytes(inputString);
        }
        
        void Clear()
        {
            isDirty = true;
            overalEntropy = 0;
            dataSize = 0;
            distributionDict = new SortedList<byte, int>();
            probabilityDict = new SortedList<byte, double>();
        }
        
        public DataEntropyUTF8(string fileName)
        {
            this.Clear();
            if(File.Exists(fileName))
            {
                ExamineChunk(  File.ReadAllBytes(fileName) );
                GetEntropy();
                GetSortedDistribution();
            }
        }
        
        public DataEntropyUTF8()
        {
            this.Clear();
        }
    }
}
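
A quick usage sketch (the file path is hypothetical):

DataEntropyUTF8 entropy = new DataEntropyUTF8("sample.bin"); // hypothetical file
Console.WriteLine("Unique symbols: " + entropy.UniqueSymbols);
Console.WriteLine("Entropy: " + entropy.Entropy + " bits per byte");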

Tuesday, July 9, 2013

Procedural generation of cave-like maps for rogue-like games



This post is about procedural content generation of cave-like dungeons/maps for rogue-like games using what is known as the Cellular Automata method.

To understand what I mean by the cellular automata method, imagine Conway's Game of Life. Many algorithms use what is called the '4-5 method', which means a tile becomes a wall if it is a wall and 4 or more of its eight neighbors are walls, or if it is not a wall and 5 or more neighbors are walls. I start by filling the map randomly with walls or space, then visit each x/y position iteratively and apply the 4-5 rule. Usually this is preceded by 'seeding' the map: randomly filling each cell with a wall or space, based on some weight (say, 40% of the time it chooses to place a wall). Then the automata step is applied multiple times over the entire map, precipitating walls and subsequently smoothing them. About 3 rounds is all that is required, with 4-5 rounds being pretty typical amongst most implementations. Perhaps a picture of the output will help you understand what I mean.

Using the automata-based method for procedural generation of levels will produce something similar to this:

Sure, the dungeon-generation-by-linking-rooms approach has its place, but I really like the 'natural' look of the automata-inspired method. I originally discovered this technique on a website called Roguebasin. It is a great resource for information concerning the different problems involved with programming a rogue-like game, such as Nethack or Angband.

One of the major problems developers run into while employing this technique is the formation of isolated caves. Instead of one big cave-like room, you may get sections of the map that are inaccessible without digging through walls. Isolated caves can trap key items or (worse) stairs leading to the next level, preventing further progress in the game. I have seen many different approaches proposed to solve this problem: discarding maps that have isolated caves, filling in the isolated sections, or finely tweaking the variables/rules to reduce occurrences of such maps. None of these are ideal (in my mind), and most require a way to detect isolated sections, which is another non-trivial problem in itself.

Despite this, I consider the problem solved, because I have discovered a solution that is dead simple and almost** never fails, as the rules of the automata generation itself dictate. I call my method 'Horizontal Blanking', because you can probably guess how it works just from hearing the name. This step comes after the random filling of the map (initialization), but before the cellular automata iterations. After the map is 'seeded' with a random fill of walls, a horizontal strip in the middle of the map is cleared of all walls. The horizontal strip is about 3 or 4 blocks tall (depending on rules). Clearing a horizontal strip of sufficient width will prevent a continuous vertical wall from being created and forming isolated caves in your maps. After horizontal blanking, you can begin applying the cellular automata method to your map.

** I say 'almost' because although it is not possible to get whole rooms that are disconnected from each other, it is possible to get tiny squares of blank space in the northern or southern walls that consist of about 4-5 blocks in total area. Often, these little holes will resolve themselves during the rounds of automata rules, but there still exists the possibility that one may persist. My answer to this edge case would be to use some rules around the placement of stairwells (and other must-find items), dictating that such objects must have a 2-3 block radius clear of walls to be placed.


See below for the code that produced the above screenshot, or click here to download the entire project with source code that you can compile yourself.

public class MapHandler
{
 Random rand = new Random();
 
 public int[,] Map;
 
 public int MapWidth   { get; set; }
 public int MapHeight  { get; set; }
 public int PercentAreWalls { get; set; }
 
 public MapHandler()
 {
  MapWidth = 40;
  MapHeight = 21;
  PercentAreWalls = 40;
  
  RandomFillMap();
 }

 public void MakeCaverns()
 {
  // By initializing column in the outer loop, it's only created ONCE
  for(int column=0, row=0; row <= MapHeight-1; row++)
  {
   for(column = 0; column <= MapWidth-1; column++)
   {
    Map[column,row] = PlaceWallLogic(column,row);
   }
  }
 }
 
 public int PlaceWallLogic(int x,int y)
 {
  int numWalls = GetAdjacentWalls(x,y,1,1);

  
  if(Map[x,y]==1)
  {
   if( numWalls >= 4 )
   {
    return 1;
   }
   return 0;   
  }
  else
  {
   if(numWalls>=5)
   {
    return 1;
   }
  }
  return 0;
 }
 
 public int GetAdjacentWalls(int x,int y,int scopeX,int scopeY)
 {
  int startX = x - scopeX;
  int startY = y - scopeY;
  int endX = x + scopeX;
  int endY = y + scopeY;
  
  int iX = startX;
  int iY = startY;
  
  int wallCounter = 0;
  
  for(iY = startY; iY <= endY; iY++) {
   for(iX = startX; iX <= endX; iX++)
   {
    if(!(iX==x && iY==y))
    {
     if(IsWall(iX,iY))
     {
      wallCounter += 1;
     }
    }
   }
  }
  return wallCounter;
 }
 
 bool IsWall(int x,int y)
 {
  // Consider out-of-bound a wall
  if( IsOutOfBounds(x,y) )
  {
   return true;
  }
  
  if( Map[x,y]==1  )
  {
   return true;
  }
  
  if( Map[x,y]==0  )
  {
   return false;
  }
  return false;
 }
 
 bool IsOutOfBounds(int x, int y)
 {
  if( x<0 || y<0 || x>MapWidth-1 || y>MapHeight-1 )
  {
   return true;
  }
  return false;
 }
Above is the main core of the logic.


Here is the rest of the program, such as filling, printing and blanking:
 
 public void PrintMap()
 {
  Console.Clear();
  Console.Write(MapToString());
 }
 
 string MapToString()
 {
  string returnString = string.Join(" ", // Separator between each element
                                    "Width:",
                                    MapWidth.ToString(),
                                    "\tHeight:",
                                    MapHeight.ToString(),
                                    "\t% Walls:",
                                    PercentAreWalls.ToString(),
                                    Environment.NewLine
                                   );
  
  List<string> mapSymbols = new List<string>();
  mapSymbols.Add(".");
  mapSymbols.Add("#");
  mapSymbols.Add("+");
  
  for(int column=0,row=0; row < MapHeight; row++ ) {
   for( column = 0; column < MapWidth; column++ )
   {
    returnString += mapSymbols[Map[column,row]];
   }
   returnString += Environment.NewLine;
  }
  return returnString;
 }
 
 public void BlankMap()
 {
  for(int column=0,row=0; row < MapHeight; row++) {
   for(column = 0; column < MapWidth; column++) {
    Map[column,row] = 0;
   }
  }
 }
 
 public void RandomFillMap()
 {
  // New, empty map
  Map = new int[MapWidth,MapHeight];
  
  int mapMiddle = 0; // Temp variable
  for(int column=0,row=0; row < MapHeight; row++) {
   for(column = 0; column < MapWidth; column++)
   {
    // If coordinates lie on the edge of the map
    // (creates a border)
    if(column == 0)
    {
     Map[column,row] = 1;
    }
    else if (row == 0)
    {
     Map[column,row] = 1;
    }
    else if (column == MapWidth-1)
    {
     Map[column,row] = 1;
    }
    else if (row == MapHeight-1)
    {
     Map[column,row] = 1;
    }
    // Else, fill with a wall a random percent of the time
    else
    {
     mapMiddle = (MapHeight / 2);
     
     if(row == mapMiddle)
     {
      Map[column,row] = 0;
     }
     else
     {
      Map[column,row] = RandomPercent(PercentAreWalls);
     }
    }
   }
  }
 }

 int RandomPercent(int percent)
 {
  if(percent>=rand.Next(1,101))
  {
   return 1;
  }
  return 0;
 }
 
 public MapHandler(int mapWidth, int mapHeight, int[,] map, int percentWalls=40)
 {
  this.MapWidth = mapWidth;
  this.MapHeight = mapHeight;
  this.PercentAreWalls = percentWalls;
  this.Map = map;
 }
}


And of course, the main function:
 
public static void Main(string[] args)
{
 char key  = new Char();
 MapHandler Map = new MapHandler();
 
 string instructions =
  "[Q]uit [N]ew [+][-]Percent walls [R]andom [B]lank" + Environment.NewLine +
  "Press any other key to smooth/step";

 Map.MakeCaverns();
 Map.PrintMap();
 Console.WriteLine(instructions);
 
 key = Char.ToUpper(Console.ReadKey(true).KeyChar);
 while(!key.Equals('Q'))
 {
  if(key.Equals('+')) {
   Map.PercentAreWalls+=1;
   Map.RandomFillMap();
   Map.MakeCaverns();
   Map.PrintMap();
  } else if(key.Equals('-')) {
   Map.PercentAreWalls-=1;
   Map.RandomFillMap();
   Map.MakeCaverns();
   Map.PrintMap();
  } else if(key.Equals('R')) {
   Map.RandomFillMap();
   Map.PrintMap();
  } else if(key.Equals('N')) {
   Map.RandomFillMap();
   Map.MakeCaverns();
   Map.PrintMap();
  } else if(key.Equals('B')) {
   Map.BlankMap();
   Map.PrintMap();
  } else if(key.Equals('D')) {
   // I set a breakpoint here...
  } else {
   Map.MakeCaverns();
   Map.PrintMap();
  }
  Console.WriteLine(instructions);
  key = Char.ToUpper(Console.ReadKey(true).KeyChar);
 }
 Console.Clear();
 Console.Write(" Thank you for playing!");
 Console.ReadKey(true);
}


See also: Roguebasin - Cellular Automata Method for Generating Random Cave-Like Levels


Thursday, June 27, 2013

Fake/Random Identity Generator




Inspiration


During my research on RSA cryptography and the importance of a truly random number for having a large key-space, I stumbled onto FakeNameGenerator.com. I thought the concept could be really useful for certain applications, and I could easily envision how to implement it in C# and make it extensible/customizable.

Take a look at these:

<?xml version="1.0" standalone="yes"?>
<DocumentElement>
  <Order>
    <Date>3/18/2005</Date>
    <TrackingNumber>1Z 8A8 238 01 9398 182 1</TrackingNumber>
    <FirstName>Keaton </FirstName>
    <LastName>Day</LastName>
    <StreetAddress>4828 Cherry St.</StreetAddress>
    <City>Nanticoke</City>
    <State>SC</State>
    <Zip>89130</Zip>
    <Email>HaleHale8026@mail.com</Email>
    <Phone>425-765-4520</Phone>
  </Order>

  <Payroll>
    <PhoneNumber>971-258-5703</PhoneNumber>
    <AltPhoneNumber>501-769-1331</AltPhoneNumber>
    <FirstName>Xyla </FirstName>
    <LastName>Hoover</LastName>
    <EmployeeID>499</EmployeeID>
    <HireDate>5/28/2011</HireDate>
    <Birthdate>5/28/1990</Birthdate>
    <SSN>520-52-4275</SSN>
    <AccountNumber>5696618825</AccountNumber>
    <RoutingNumber>575159859</RoutingNumber>
    <Address>8348 Court Ave.</Address>
    <City>Pittsburgh,</City>
    <State>PA.</State>
    <Zip>15201</Zip>
  </Payroll>
</DocumentElement>

 CREATE TABLE ReservationData (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `UniqueID` MEDIUMINT default NULL,
  `TripDate` varchar(50) default NULL,
  `FirstName` varchar(255) default NULL,
  `LastName` varchar(255) default NULL,
  `Phone` varchar(100) default NULL,
  `AltPhone` varchar(100) default NULL,
  `Email` varchar(255) default NULL,
  `StreetAddress` varchar(255) default NULL,
  `City` varchar(50) default NULL,
  `State` varchar(50) default NULL,
  `Zip` varchar(10) default NULL,
  `Country` varchar(255) default NULL,
  `DayOfYear` varchar(50) default NULL,
  `TotalCost` varchar(50) default NULL,
  `Balance` varchar(10) default NULL,
  `CCard` varchar(18) default NULL,
  `Expires` varchar(5) default NULL,
  `CVC2` varchar(3) default NULL,
  PRIMARY KEY (`id`)
) TYPE=MyISAM AUTO_INCREMENT=1;

This would make great honey for a honeypot: just fill an SQL database with this random, realistic-looking data, serve it, and log any and all attempts to access, query, or dump the database. This can be done on a VM, and you have an easily deployed, high-interaction honeypot!

Aside from being able to see their IP address, I think the most useful data that can be attained is their behavior: what injection attacks are they using to drop the database? Write rules to protect your honey against trivial attempts such as the ' AND 1=(SELECT attacks, and see what they come up with next. Rule writing is inherently a cat-and-mouse game; honeypots like this clearly give the white-hats the upper hand.



Implementation



A quick, fast and dirty solution is to simply read a random line from a text file (e.g. Name_First.txt and Address_Street.txt).

This way, you can choose from names that are common, or customize your lists for different nationalities.

One could read the whole file into a string, split it into an array of strings, then randomly select an index, but this would be unacceptable for very large files. Instead, we can set the file pointer to a random position that is less than the file's size, skip ahead to the next new line, and call ReadLine.




public string ReturnRandomLine(string FileName)
{
 string sReturn = string.Empty;
 
 using(FileStream myFile = new FileStream(FileName,FileMode.Open,FileAccess.Read))
 {
  using(StreamReader myStream = new StreamReader(myFile))
  {
   // Seek file stream pointer to a rand position...
   myStream.BaseStream.Seek(rand.Next(1,myFile.Length),SeekOrigin.Begin);
   
   // Read the rest of that line.
   myStream.ReadLine();
   
   // Return the next, full line...
   sReturn = myStream.ReadLine();
  }
 }
 
 // If our random file position was too close to the end of the file, it will return an empty string
 // I avoided a while loop in the case that the file is empty or contains only one line
 if(System.String.IsNullOrWhiteSpace(sReturn)) {
  sReturn = ReturnRandomLine(FileName);
 }
 
 return sReturn;
}

Example use:


public string GenerateFirstName()
{
 return ReturnRandomLine("Name_First.txt") + " ";
}

public string GenerateLastName()
{
 return ReturnRandomLine("Name_Last.txt");
}

public string GenerateFullName()
{
 return GenerateFirstName() + GenerateLastName();
}

public string GenerateGender()
{
 if(ReturnPercent(84)) {
  return "Male";
 } else {
  return "Female";
 }
}

public string GenerateStreetNumber()
{
 return rand.Next(1,9999).ToString();
}

public string GenerateStreetName()
{
 return ReturnRandomLine("Address_Street.txt");
}

One limitation is where the data is relational, such as in the case of generating a random zip code along with the city and state that it exists in.

A quick work-around would be CityZipState.txt
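
For instance, here is a minimal sketch of that work-around, assuming a hypothetical pipe-delimited format of City|State|Zip per line:

public string GenerateCityStateZip()
{
 // Hypothetical file format: "Nanticoke|SC|89130"
 string[] parts = ReturnRandomLine("CityZipState.txt").Split('|');
 return parts[0] + ", " + parts[1] + " " + parts[2];
}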



Other types of data that can be generated that would not make sense to put in a text file:


public bool ReturnPercent(int Percent) // Return true Percent times out of 100, randomly
{
 int iTemp = rand.Next(1,101);
 
 if(iTemp<=Percent) {
  return true;
 } else {
  return false;
 }
}

public string GenerateDate(int YearFrom,int YearTo)
{
 int Month = rand.Next(1,13);
 int Day  = rand.Next(1,32);
 int Year = GenerateYear(YearFrom,YearTo);
 
 return Month.ToString() + "/" + Day.ToString() + "/" + Year.ToString();
}

public string GenerateYear(int YearFrom,int YearTo)
{
 return rand.Next(YearFrom,YearTo+1).ToString();
}

public string GeneratePhoneNumber()
{
 return GeneratePhoneNumber(ReturnRandomLine("PhoneNumber_Prefix.txt"));
}

public string GeneratePhoneNumber(string Prefix)
{
 int iThree = rand.Next(192,999);
 int iFour = rand.Next(1000,9999);
 
 return Prefix + iThree.ToString() + "-" + iFour.ToString();
}

public string GenerateSSN()
{
 int iThree = rand.Next(132,921);
 int iTwo = rand.Next(12,83);
 int iFour = rand.Next(1423,9211);
 return iThree.ToString() + "-" + iTwo.ToString() + "-" + iFour.ToString();
}

Obviously, these methods can be improved to conform to the standards of a real social security number, national identification number, credit card number, etc.


public string GenerateCCNum()
{
 // Generate a random 16 digit number
 int iTemp1 = rand.Next(10000000,99999999);
 int iTemp2 = rand.Next(10000000,99999999);
 string sTemp = iTemp1.ToString() + iTemp2.ToString();
 
 // Keep generating until the number passes validation
 while(!IsValidNumber(sTemp))
 {
  iTemp1 = rand.Next(10000000,99999999);
  iTemp2 = rand.Next(10000000,99999999);
  sTemp = iTemp1.ToString() + iTemp2.ToString();
 }
 
 return sTemp;
}

The implementation of IsValidNumber() is left as an exercise for the reader.
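
For the curious, here is a minimal sketch of what it might look like using the standard Luhn (mod 10) checksum, which is my assumption of what 'valid' means for a credit card number:

public bool IsValidNumber(string number) // Luhn (mod 10) checksum, my assumption
{
 int sum = 0;
 bool doubleIt = false;
 
 // Walk the digits right-to-left, doubling every second digit
 for(int i = number.Length - 1; i >= 0; i--)
 {
  int digit = number[i] - '0';
  if(doubleIt)
  {
   digit *= 2;
   if(digit > 9) { digit -= 9; }
  }
  sum += digit;
  doubleIt = !doubleIt;
 }
 return (sum % 10) == 0;
}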

The serialization of your data is a trivial matter. Please see my post on an XML Serializable Dictionary, Tuple, and Object for the code to serialize an object (such as a list or a class).