Thursday, August 1, 2013

Captcha: Drawing Text along a Bézier Spline



The following post/code will achieve something to the effect of:


All this code, and this article was inspired by planetclegg.com/projects/WarpingTextToSplines.html.
Explanation of the math and why it works will come shortly. In the meantime, please see the above link for a detailed explanation.

This code assumes you have already drawn some text on your GraphicsPath. Here is the code to transform your GraphicsPath text to follow a cubic Bézier Spline:

GraphicsPath BezierWarp(GraphicsPath text,Size size)
{
 // Control points for a cubic Bézier spline
 PointF P0 = new PointF();
 PointF P1 = new PointF();
 PointF P2 = new PointF();
 PointF P3 = new PointF();
 
 float shrink = 20;
 float shift = 0;
 
 P0.X = shrink;
 P0.Y = shrink+shift;
 P1.X = size.Width-shrink;
 P1.Y = shrink;
 P2.X = shrink;
 P2.Y = size.Height-shrink;
 P3.X = size.Width-shrink;
 P3.Y = size.Height-shrink-shift;

 // Calculate coefficients A thru H from the control points
 float A = P3.X - 3 * P2.X + 3 * P1.X - P0.X;
 float B = 3 * P2.X - 6 * P1.X + 3 * P0.X;
 float C = 3 * P1.X - 3 * P0.X;
 float D = P0.X;

 float E = P3.Y - 3 * P2.Y + 3 * P1.Y - P0.Y;
 float F = 3 * P2.Y - 6 * P1.Y + 3 * P0.Y;
 float G = 3 * P1.Y - 3 * P0.Y;
 float H = P0.Y;

 PointF[] pathPoints = text.PathPoints;
 RectangleF textBounds = text.GetBounds();
 
 for (int i =0; i  < pathPoints.Length; i++)
 {
  PointF pt = pathPoints[i];
  float textX = pt.X;
  float textY = pt.Y;
  
  // Normalize the x coordinate into the parameterized
  // value with a domain between 0 and 1.
  float t  =  textX / textBounds.Width;
  float t2 = (t * t);
  float t3 = (t * t * t);
  
  // Calculate spline point for parameter t
  float Sx = A * t3 + B * t2 + C * t + D;
  float Sy = E * t3 + F * t2 + G * t + H;
  
  // Calculate the tangent vector for the point
  float Tx = 3 * A * t2 + 2 * B * t + C;
  float Ty = 3 * E * t2 + 2 * F * t + G;

  // Rotate 90 or 270 degrees to make it a perpendicular
  float Px = - Ty;
  float Py =   Tx;
  
  // Normalize the perpendicular into a unit vector
  float magnitude = (float)Math.Sqrt((Px*Px) + (Py*Py));
  Px /= magnitude;
  Py /= magnitude;
  
  // Assume that input text point y coord is the "height" or
  // distance from the spline.  Multiply the perpendicular
  // vector with y. it becomes the new magnitude of the vector
  Px *= textY;
  Py *= textY;
  
  // Translate the spline point using the resultant vector
  float finalX = Px + Sx;
  float finalY = Py + Sy;
  
  pathPoints[i] = new PointF(finalX, finalY);
 }
 
 return new GraphicsPath(pathPoints,text.PathTypes);
}

Wednesday, July 31, 2013

Humorous Software License Agreement

Software Usage License Agreement
Copyright (C) [year] [copyright holders]

This software, it's code, or any fan fiction resulting from its enormous popularity, is provided "AS IS". That means that the software included with with this license is provided without warranty, regardless of any previous verbal contract you may have swindled out of the author with your fast words on the phone...
To the maximum extent permitted by law, the author of this software disclaims all liability for any damages, lost profits, lost socks, lost puppies, hair loss, or architectural fanaticism and cults that may occur as a result of using this software. Under no circumstances shall the author of this software, nor his pet cat 'Mr.Whiskers', be held legally, financially or morally liable for any claim of damages or liability resulting from the use of, abstinence from, ritualistic worshiping of, or illegal pirating of this software. This includes, but is not limited to, lost profits, stolen data, bankruptcy, autonomous homicidal cyborgs or man-eating miniature pony attacks.
Use at your own risk! This software comes with no guarantees of fitness for any purpose, so do not use it for the back-end of your fortune 500 business without at least hiring me first.

Saturday, July 27, 2013

Information entropy and data compression



In my last post, I talked about Shannon data entropy and showed a class to calculate that. Lets take it one step further and actually compress some data based off the data entropy we calculated.

To do this, first we calculate how many bits are needed to compress each byte of our data. Theoretically, this is the data entropy, rounded up to the next whole number (Math.Ceiling). But this is not always the case, and the number of unique symbols in our data may be a number that is too large to be represented in that many number of bits. We calculate the number of bits needed to represent the number of unique symbols by getting its Base2 logarithm. This returns a decimal (double), so we use Math.Ceiling to round to up to the nearest whole number as well. We set entropy_Ceiling to which ever number is larger. If the entropy_Ceiling is 8, then we should immediately return, as we cannot compress the data any further.

We start by making a compression and decompression dictionary. We make these by taking the sorted distribution dictionary (DataEntropyUTF8.GetSortedDistribution) and start assigning X-bit-length value to each entry in the sorted distribution dictionary, with X being entropy_Ceiling. The compression dictionary has a byte as the key and an array of bool (bool[]) as the value, while the decompression dictionary has an array of bool as the key, and a byte as a value. You'll notice in the decompression dictionary we store the array of bool as a string, as using an actual array as a key will not work, as the dictionary's EqualityComparer will not assign the same hash code for two arrays of the same values.

Then, compression is as easy as reading each byte, and getting the value from the compression dictionary for that byte and adding it to a list of bool (List), then converting that array of bool to an array of bytes.

Decompression consists of converting the compressed array of bytes into an array of bool, then reading in X bools at a time and getting the byte value from the decompression library, again with X being entropy_Ceiling.




But first, to make this process easier, and to make our code more manageable and readable, I define several extension methods to help us out, since .NET provides almost no support for working with data on the bit level, besides the BitArray class. Here are the extension methods that to make working with bits easier:

public static class BitExtentionMethods
{
    //
    // List<bool> extention methods
    //
    public static List<bool> ToBitList(this byte source)
    {
        List<bool> temp = ( new BitArray(source.ToArray()) ).ToList();
        temp.Reverse();
        return temp;
    }
    
    public static List<bool> ToBitList(this byte source,int startIndex)
    {
        if(startIndex<0 || startIndex>7) {
            return new List<bool>();
        }
        return source.ToBitList().GetRange(startIndex,(8-startIndex));
    }
    
    //
    // bool[] extention methods
    //
    public static string GetString(this bool[] source)
    {
        string result = string.Empty;
        foreach(bool b in source)
        {
            if(b) {
                result += "1";
            } else {
                result += "0";
            }
        }
        return result;
    }
    
    public static bool[] ToBitArray(this byte source,int MaxLength)
    {
        List<bool> temp = source.ToBitList(8-MaxLength);
        return temp.ToArray();
    }
    
    public static bool[] ToBitArray(this byte source)
    {
        return source.ToBitList().ToArray();
    }

    //
    // BYTE extention methods
    //
    public static byte[] ToArray(this byte source)
    {
        List<byte> result = new List<byte>();
        result.Add(source);
        return result.ToArray();
    }
    
    //
    // BITARRAY extention methods
    //
    public static List<bool> ToList(this BitArray source)
    {
        List<bool> result = new List<bool>();
        foreach(bool bit in source)
        {
            result.Add(bit);
        }
        return result;
    }
    
    public static bool[] ToArray(this BitArray source)
    {
        return ToList(source).ToArray();
    }
}
Remember, these need to be the base class in a namespace, not in a nested class.


Now, we are free to write our compression/decompression class:

public class BitCompression
{
    // Data to encode
    byte[] data;
    // Compressed data
    byte[] encodeData;
    // # of bits needed to represent data
    int encodeLength_Bits;
    // Original size before padding. Decompressed data will be truncated to this length.
    int decodeLength_Bits;
    // Bits needed to represent each byte (entropy rounded up to nearist whole number)
    int entropy_Ceiling;
    // Data entropy class
    DataEntropyUTF8 fileEntropy;
    // Stores the compressed symbol table
    Dictionary<byte,bool[]> compressionLibrary;
    Dictionary<string,byte> decompressionLibrary;
    
    void GenerateLibrary()
    {
        byte[] distTable = new byte[fileEntropy.Distribution.Keys.Count];
        fileEntropy.Distribution.Keys.CopyTo(distTable,0);
        
        byte bitSymbol = 0x0;
        bool[] bitBuffer = new bool[entropy_Ceiling];
        foreach(byte symbol in distTable)
        {
            bitBuffer = bitSymbol.ToBitArray(entropy_Ceiling);
            compressionLibrary.Add(symbol,bitBuffer);
            decompressionLibrary.Add(bitBuffer.GetString(),symbol);
            bitSymbol++;
        }
    }
    
    public byte[] Compress()
    {
        // Error checking
        if(entropy_Ceiling>7 || entropy_Ceiling<1) {
            return data;
        }

        // Compress data using compressionLibrar
        List<bool> compressedBits = new List<bool>();
        foreach(byte bite in data) {    // Take each byte, find the matching bit array in the dictionary
            compressedBits.AddRange(compressionLibrary[bite]);
        }
        decodeLength_Bits = compressedBits.Count;
        
        // Pad to fill last byte
        while(compressedBits.Count % 8 != 0) {
            compressedBits.Add(false);  // Pad to the nearest byte
        }
        encodeLength_Bits = compressedBits.Count;
        
        // Convert from array of bits to array of bytes
        List<byte> result = new List<byte>();
        int count = 0;
        int shift = 0;
        int offset= 0;
        int stop  = 0;
        byte current = 0;
        do
        {
            stop = encodeLength_Bits - count;
            stop = 8 - stop;
            if(stop<0) {
                stop = 0;
            }
            if(stop<8)
            {
                shift = 7;
                offset = count;
                current = 0;
                
                while(shift>=stop)
                {
                    current |= (byte)(Convert.ToByte(compressedBits[offset]) << shift);
                    shift--;
                    offset++;
                }
                
                result.Add(current);
                count += 8;
            }
        } while(count < encodeLength_Bits);
        
        encodeData = result.ToArray();
        return encodeData;
    }
    
    public byte[] Decompress(byte[] compressedData)
    {
        // Error check
        if(compressedData.Length<1) {
            return null;
        }
        
        // Convert to bit array for decompressing
        List<bool> bitArray = new List<bool>();
        foreach(byte bite in compressedData) {
            bitArray.AddRange(bite.ToBitList());
        }
        
        // Truncate to original size, removes padding for byte array
        int diff = bitArray.Count-decodeLength_Bits;
        if(diff>0) {
            bitArray.RemoveRange(decodeLength_Bits-1,diff);
        }

        // Decompress
        List<byte> result = new List<byte>();
        int count = 0;
        do
        {
            bool[] word = bitArray.GetRange(count,entropy_Ceiling).ToArray();
            result.Add(decompressionLibrary[word.GetString()]);
            
            count+=entropy_Ceiling;
        } while(count < bitArray.Count);
        
        return result.ToArray();
    }
    
    public BitCompression(string filename)
    {
        compressionLibrary  = new Dictionary<byte, bool[]>();
        decompressionLibrary = new Dictionary<string, byte>();
        
        if(!File.Exists(filename))  {
            return;
        }
        
        data = File.ReadAllBytes(filename);
        fileEntropy = new DataEntropyUTF8();
        fileEntropy.ExamineChunk(data);
        
        int unique  = (int)Math.Ceiling(Math.Log((double)fileEntropy.UniqueSymbols,2f));
        int entropy = (int)Math.Ceiling(fileEntropy.Entropy);
        
        entropy_Ceiling = Math.Max(unique,entropy);
        encodeLength_Bits   = data.Length * entropy_Ceiling;
        
        GenerateLibrary();
    }
}


Please feel free to comment with ideas, suggestions or corrections.

Monday, July 22, 2013

Information Shannon Entropy



Shannon/data entropy is a measurement of uncertainty. Entropy can be used as a measure of randomness. Data entropy is typically expressed as the number of bits needed to encode or represent data. In the example below, we are working with bytes, so the max entropy for a stream of bytes is 8.

A file with high entropy means that each symbol is more-or-less equally as likely to appear next. If a file or file stream has high entropy, it is either probably compressed, encrypted or random. This can be used to detect packed executables, cipher streams on a network, or a breakdown of encrypted communication on a network that is expected to be always encrypted.

A text file will have low entropy. If a file has low data entropy, it mean that the file will compress well.

This post and code was inspired by Mike Schiffman's excelent explaination of data entropy on his Cisco Security Blog.

Here is what I wrote:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace DataEntropy
{
    public class DataEntropyUTF8
    {
        // Stores the number of times each symbol appears
        SortedList<byte,int>        distributionDict;
        // Stores the entropy for each character
        SortedList<byte,double> probabilityDict;
        // Stores the last calculated entropy
        double overalEntropy;
        // Used for preventing unnecessary processing
        bool isDirty;
        // Bytes of data processed
        int dataSize;
        
        public int DataSampleSize
        {
            get { return dataSize; }
            private set { dataSize = value; }
        }
        
        public int UniqueSymbols
        {
            get { return distributionDict.Count; }
        }
        
        public double Entropy
        {
            get { return GetEntropy(); }
        }
        
        public Dictionary<byte,int> Distribution
        {
            get { return GetSortedDistribution(); }
        }
        
        public Dictionary<byte,double> Probability
        {
            get { return GetSortedProbability(); }
        }
        
        public byte GetGreatestDistribution()
        {
            return distributionDict.Keys[0];
        }
        
        public byte GetGreatestProbability()
        {
            return probabilityDict.Keys[0];
        }
        
        public double GetSymbolDistribution(byte symbol)
        {
            return distributionDict[symbol];
        }
        
        public double GetSymbolEntropy(byte symbol)
        {
            return probabilityDict[symbol];
        }
        
        Dictionary<byte,int> GetSortedDistribution()
        {
            List<Tuple<int,byte>> entryList = new List<Tuple<int, byte>>();
            foreach(KeyValuePair<byte,int> entry in distributionDict)
            {
                entryList.Add(new Tuple<int,byte>(entry.Value,entry.Key));
            }
            entryList.Sort();
            entryList.Reverse();
            
            Dictionary<byte,int> result = new Dictionary<byte, int>();
            foreach(Tuple<int,byte> entry in entryList)
            {
                result.Add(entry.Item2,entry.Item1);
            }
            return result;
        }
        
        Dictionary<byte,double> GetSortedProbability()
        {
            List<Tuple<double,byte>> entryList = new List<Tuple<double,byte>>();
            foreach(KeyValuePair<byte,double> entry in probabilityDict)
            {
                entryList.Add(new Tuple<double,byte>(entry.Value,entry.Key));
            }
            entryList.Sort();
            entryList.Reverse();
            
            Dictionary<byte,double> result = new Dictionary<byte,double>();
            foreach(Tuple<double,byte> entry in entryList)
            {
                result.Add(entry.Item2,entry.Item1);
            }
            return result;
        }
        
        double GetEntropy()
        {
            // If nothing has changed, dont recalculate
            if(!isDirty) {
                return overalEntropy;
            }
            // Reset values
            overalEntropy = 0;
            probabilityDict = new SortedList<byte,double>();
            
            foreach(KeyValuePair<byte,int> entry in distributionDict)
            {
                // Probability = Freq of symbol / # symbols examined thus far
                probabilityDict.Add(
                    entry.Key,
                    (double)distributionDict[entry.Key] / (double)dataSize
                );
            }
            
            foreach(KeyValuePair<byte,double> entry in probabilityDict)
            {
                // Entropy = probability * Log2(1/probability)
                overalEntropy += entry.Value * Math.Log((1/entry.Value),2);
            }
            
            isDirty = false;
            return overalEntropy;
        }
        
        public void ExamineChunk(byte[] chunk)
        {
            if(chunk.Length<1 || chunk==null) {
                return;
            }
            
            isDirty = true;
            dataSize += chunk.Length;
            
            foreach(byte bite in chunk)
            {
                if(!distributionDict.ContainsKey(bite))
                {
                    distributionDict.Add(bite,1);
                    continue;
                }
                distributionDict[bite]++;
            }
        }
        
        public void ExamineChunk(string chunk)
        {
            ExamineChunk(StringToByteArray(chunk));
        }
        
        byte[] StringToByteArray(string inputString)
        {
            char[] c = inputString.ToCharArray();
            IEnumerable<byte> b = c.Cast<byte>();
            return b.ToArray();
        }
        
        void Clear()
        {
            isDirty = true;
            overalEntropy = 0;
            dataSize = 0;
            distributionDict = new SortedList<byte, int>();
            probabilityDict = new SortedList<byte, double>();
        }
        
        public DataEntropyUTF8(string fileName)
        {
            this.Clear();
            if(File.Exists(fileName))
            {
                ExamineChunk(  File.ReadAllBytes(fileName) );
                GetEntropy();
                GetSortedDistribution();
            }
        }
        
        public DataEntropyUTF8()
        {
            this.Clear();
        }
    }
}

Sunday, July 21, 2013

C# developer humor - Chuck Norris.

These jokes were inspired by this link. I have modified them a bit to apply to the C or C# language.



  • Chuck Nor­ris can make a class that is both abstract and constant.
  • Chuck Nor­ris seri­al­izes objects straight into human skulls.
  • Chuck Nor­ris doesn’t deploy web appli­ca­tions, he round­house kicks them into the server.
  • Chuck Nor­ris always uses his own design pat­terns, and his favorite is the Round­house Kick.
  • Chuck Nor­ris always programs using unsafe code.
  • Chuck Nor­ris only enumerates roundhouse kicks to the face.
  • Chuck Nor­ris demon­strated the mean­ing of float.PositiveInfinity by count­ing to it, twice.
  • A lock statement doesn’t pro­tect against Chuck Nor­ris, if he wants the object, he takes it.
  • Chuck Nor­ris doesn’t use VisualStudio, he codes .NET by using a hex edi­tor on the MSIL.
  • When some­one attempts to use one of Chuck Nor­ris’ dep­re­cated meth­ods, they auto­mat­i­cally get a round­house kick to the face at com­pile time.
  • Chuck Nor­ris never has a bug in his code, without exception!
  • Chuck Nor­ris doesn’t write code. He stares at a com­puter screen until he gets the progam he wants.
  • Code runs faster when Chuck Nor­ris watches it.
  • Chuck Nor­ris meth­ods don't catch excep­tions because no one has the guts to throw any at them.
  • Chuck Nor­ris will cast a value to any type, just by star­ing at it.
  • If you catch { } a Chuc­kNor­ri­sEx­cep­tion, you’ll prob­a­bly die.
  • Chuck Norris’s code can round­house kick all other classes' privates.
  • C#'s vis­i­bil­ity lev­els are pub­lic, private, pro­tected, and “pro­tected by Chuck Nor­ris”. Don’t try to access a field with this last modifier!
  • Chuck Nor­ris can divide by 0!
  • The garbage col­lec­tor only runs on Chuck Nor­ris code to col­lect the bodies.
  • Chuck Nor­ris can exe­cute 64bit length instruc­tions in a 32bit CPU.
  • To Chuck Nor­ris, all other classes are IDisposable.
  • Chuck Nor­ris can do mul­ti­ple inher­i­tance in C#.
  • MSBuild never throws excep­tions to Chuck Nor­ris, not any­more. 753 killed Microsoft engi­neers is enough.
  • Chuck Nor­ris doesn’t need unit tests, because his code always work. ALWAYS.
  • Chuck Nor­ris has been coding in gener­ics since 1.1.
  • Chuck Nor­ris’ classes can’t be decom­piled... don’t bother trying.

Here are some originals:
  • If you try derive from a Chuck Norris Interface, you'll only get an IRoundhouseKick in-the-face.
  • Chuck Nor­ris can serialize a dictionary to XML without implementing IXMLSerializable.
  • Chuck Norris can decompile your assembly by only reading the MSIL.