Showing posts with label Project Euler. Show all posts
Showing posts with label Project Euler. Show all posts

Monday, 14 June 2010

Project Euler 59: Cracking an XOR code

Code breaking is about as glamorous as it gets for a Geek. So when it comes to Project Euler Problem 59, think 24, with Jack Bauer unscathed amidst a hail of bullets, memory stick in one hand, terrorist’s neck throttled in the other, threatening terrible retribution to the bad guy’s family if the secret goes to the grave with him. Terrorist rasps out,

“There’s a file … cipher1.txt … ASCII format … urrrh … XOR … 3 … letter … password … lower case … all English”

What would Chloe O’Brian make of that when she received the file over a secure socket from Jack’s Smart Phone? Being the bright girl she is, I’ve no doubt that she’d fire up MonoDevelop (only the dumb FBI would use Visual Studio and Windoze) and write the following:

var encryptedBytes = File.ReadAllText(dataPath)
                        .Split(',')
                        .Select(byte.Parse)
                        .ToArray();

That would get her the encrypted data from the file as a sequence of bytes. Now to generate all possible passwords:

const byte FirstChar = (byte)'a';
const byte LastChar = (byte)'z';

var allPossiblePasswords = from a in FirstChar.To(LastChar)
                           from b in FirstChar.To(LastChar)
                           from c in FirstChar.To(LastChar)
                           select new[] { a, b, c };

XOR encryption requires that the password be repeated cyclically, abcabcabc.... So given a sequence containing the characters of password, how about a Repeat extension method that repeats the sequence ad infinitum? (if you look at the following code and hear the Infinite Loop alarms go off, remember that iterators are lazily evaluated):

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> sequence)
{
    while (true)
    {
        foreach (var item in sequence)
        {
            yield return item;
        }
    }
}

While she’s at it, she’ll need to Xor two sequences together:

public static IEnumerable<byte> Xor(this IEnumerable<byte> sequenceA, IEnumerable<byte> sequenceB)
{
    return sequenceA.Zip(sequenceB, (left, right) => (byte)(left ^ right));
}

This uses the Zip method, new in .Net 4.0, which merges two sequences together, allowing elements from left and right sequences to be processed in pairs (if you’re surprised, as I was, that the (byte) cast is necessary, see this question on Stack Overflow).

Now she can try out all the passwords, seeing what they give her if she uses them to decrypt the text. Note that Chloe has already adopted .Net 4.0 and the PLINQ extensions, so she’s using AsParallel() to max out her 128 Core workstation. (ConvertAsciiBytesToString is another extension method that simply delegates to ASCIIEncoding.GetString).

var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                          let repeatedPassword = trialPassword.Repeat()
                          select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

But she’s hardly going to spend the next 24 hours trawling through the possible decryptions to pick out the one in recognisable English from the 17575 in gobledygook. So she needs some means of assessing them automatically. This is where Chloe is forever in my debt. She read last weeks episode of Functional Fun, so she knows about my LanguageAnalyser that uses frequency analysis to determine how likely a sample of text is to be written in a particular language.

Rather than using the analyser to determine which of several languages the sample is written in, Chloe can use her prior knowledge that the coded message is written in English, and determine which of the possible decryptions has letter frequencies that most closely match those in English.

var languageAnalyser = new LanguageAnalyser();

var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                       select new
                                                  {
                                                      DecryptedString = decryptedString,
                                                      Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                  };

var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

And that's it. Thanks to Linguistic Geometry and LINQ, Chloe can report the decrypted (and surprisingly biblical!) message to Jack, so that he can get on with saving the world.

Ahem. Before we got carried away we were trying to solve Project Euler Problem 59, and that asks for the sum of the Ascii values of the decrypted message. So we need one more step:

var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                        .ConvertStringToAsciiBytes()
                        .Aggregate(0, (runningTotal, next) => runningTotal += next);

The full code is shown below, and is also available on MSDN Code Gallery.

namespace ProjectEuler
{
    [EulerProblem(59, Title="Using a brute force attack, can you decrypt the cipher using XOR encryption?")]
    public class Problem59
    {
        public long Solve()
        {
            const byte FirstChar = 97;
            const byte LastChar = 122;

            var dataPath = @"ProblemData\Problem59.txt";

            var encryptedBytes = File.ReadAllText(dataPath)
                                    .Split(',')
                                    .Select(byte.Parse)
                                    .ToArray();

            var allPossiblePasswords = from a in FirstChar.To(LastChar)
                                       from b in FirstChar.To(LastChar)
                                       from c in FirstChar.To(LastChar)
                                       select new[] { a, b, c };

            var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                                      let repeatedPassword = trialPassword.Repeat()
                                      select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

            var languageAnalyser = new LanguageAnalyser();

            var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                                   select new
                                                              {
                                                                  DecryptedString = decryptedString,
                                                                  Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                              };

            var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

            var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                .ConvertStringToAsciiBytes()
                .Aggregate(0, (runningTotal, next) => runningTotal += next);

            return sumOfAsciiValues;
        }
    }
}

Thursday, 10 June 2010

Practical Linq #5: Guessing the language of a piece of text

Google Chrome is magic: getting ready for my summer holiday to Brugge, I hit the city’s homepage. A little bar popped up at the top of the page: “This page is in Dutch. Would you like to translate it?”

image

How does Chrome know that? I suppose the domain name “.be” is one clue, but they speak French as well as Dutch in Belgium, so that by itself is not enough. Chrome does the same trick for pages written in a whole raft of languages, from Afrikaans to Yiddish.

I don’t know what Google’s secret sauce is, but I can show you one simple and surprisingly effective technique with an elegant implementation using LINQ in C#.

Studies in Linguistic Geometry

It is built around the observation that in each sufficiently large sample of text in a given language, letters of the alphabet will all occur with roughly the same frequencies. In English for example, the letter ‘e’ tops the popularity charts, making up about 13% of the average body of text. ‘t’ and ‘a’ are next, accounting for 9% and 8%. By contrast, in Portuguese, ‘a’ is the most frequent, at 15%, with ‘e’ and ‘o’ close behind.

So letter frequencies can be used as a kind of signature, or DNA profile, for a language. But given a sample of text, how do we decide which language signature it matches most closely? This is where we switch from linguistics to geometry.

In 2-dimensional space, we calculate the distance between two points using Pythagoras’ theorem. image The same thing works for two points in the real, 3-d, world. The distance from (x1, y1, z1) to (x2, y2, z2) is

image

But why stop there? What about the distance between two points in 26-dimensional space? You guessed it!

image

26-d space? Why bring that into the equation? Well think about those language signatures, giving the frequency of occurrence of each letter. You can reinterpret those signatures as points in 26-d space: the point representing the signature for English would be at 0.13 along the ‘e’ axis, 0.09 along the ‘t’ axis, 0.08 along the ‘a’ axis, and so on.

This then gives us our straightforward method of determining which language our sample text belongs to. We do a frequency analysis on its letters, then imagine the result of that analysis as a point in 26-d space, and work out which language signature it lies closest to using Pythagoras. Easy!

So how does it look in LINQ?

Linq to Linguistics

We start off with the frequency analysis (taking care to normalise the text to lower-case):

IEnumerable<CharacterFrequency> CalculateCharacterFrequencies(string sample)
{
    var characterFrequencies = sample
        .Select(char.ToLower)
        .GroupBy(c => c)
        .Select(group => new CharacterFrequency
        {
            Character = group.Key,
            Frequency = (double)group.Count() / sample.Length
        });

    return characterFrequencies;
}

Next we introduce a LanguageSignature class to hold the signature of each language:

[DebuggerDisplay("Language = { Language }")]
public class LanguageSignature
{
    private IDictionary<char, double> _frequencies;

    public LanguageSignature(string language, IDictionary<char, double> characterFrequencies)
    {
        Language = language;
        _frequencies = characterFrequencies;
    }

    public string Language { get; protected set; }

    public double GetFrequency(char character)
    {
        double frequency;
        return _frequencies.TryGetValue(character, out frequency) ? frequency : 0;
    }
}

Here are some snippets of sample signatures, data courtesy of Wikipedia

public static class LanguageSignatures
{
    public static LanguageSignature English =
        new LanguageSignature(
            "English",
            new Dictionary<char, double>
                {
                    {'a', 0.08167},
                    {'b', 0.01492},
                    {'c', 0.02782},
                    /// etc.
                });

    public static LanguageSignature French =
        new LanguageSignature(
            "French",
            new Dictionary<char, double>
                {
                    {'a', 0.07636},
                    {'b', 0.00901},
                    // etc.
                });

    
}

Now we do our sums in 26-d space, ignoring any letters that we don’t have frequency data for (note that Sqrt is a little extension method I created on the Double class that simply calls Math.Sqrt):

double CalculateDistanceFromSignature(LanguageSignature signature, IEnumerable<CharacterFrequency> characterFrequencies)
{
    var distance = characterFrequencies
        .Where(characterFrequency => signature.GetFrequency(characterFrequency.Character) > 0)
        .Select(characterFrequency
            => Math.Pow(characterFrequency.Frequency - signature.GetFrequency(characterFrequency.Character), 2))
        .Sum()
        .Sqrt();
    return distance;
}

Now we’re ready to determine which language our sample is closest to:

public LanguageSignature DetermineMostLikelyLanguage(string sample, IEnumerable<LanguageSignature> signatures)
{
    var characterFrequencies = CalculateCharacterFrequencies(sample);

    var closestLanguage = signatures.Select(signature => new
    {
        Language = signature,
        Distance = CalculateDistanceFromSignature(signature, characterFrequencies)
    })
    .MinItem(languageDistance => languageDistance.Distance);

    return closestLanguage.Language;
}

MinItem is an extension method I knocked up that returns the item from a sequence that yields the smallest value from the expression you supply (in this case, the smallest distance):

public static T MinItem<T, TCompare>(this IEnumerable<T> sequence, Func<T, TCompare> comparatorSelector) where TCompare : IComparable<TCompare>
{
    var minItem = sequence.Aggregate(
        sequence.First(), 
        (current, min) => comparatorSelector(current).CompareTo(comparatorSelector(min)) < 0 ? current : min);

    return minItem;
}

And we’re done.

Testing, Testing, Un, deux, trois

To prove it, I created a little test that analysed samples of texts in several languages taken from Project Gutenberg (the test uses MbUnit’s very handy XmlData feature to read the different test cases from my xml file).

public class LanguageAnalyserTests
{
    [Test]
    [XmlData("//Sample", FilePath = "ProblemData\\LanguageSamples.xml")]
    public void TestAnalysis([Bind("Text")]string text, [Bind("@Language")]string expectedLanguage)
    {
        var analyser = new LanguageAnalyser();
        var determinedLanguage = analyser.DetermineMostLikelyLanguage(
            text
            new[] { LanguageSignatures.English, LanguageSignatures.French, LanguageSignatures.German, LanguageSignatures.Dutch, LanguageSignatures.Spanish, LanguageSignatures.Italian });

        Assert.AreEqual(expectedLanguage, determinedLanguage.Language);
    }
}

In my test I’ve got 9 samples in German, English, French, Dutch, Spanish and Italian, extracted at random from the Gutenberg archives and each sample is correctly identified by my analyser. I also did a little experiment to see how much text it needed to correctly distinguish the samples. Limiting the samples to 600 characters caused some confusion between Spanish and English, but 700 characters was sufficient in my tests to enable the analyser to pick the right language for all samples.

A Challenge

I actually developed this LanguageAnalyser to help me solve a problem on Project Euler. Which one was it? Answers in the comments please, or by email, and bonus marks to anybody who can guess what my solution looks like. Check back Monday for the answer, and for full source code to today’s post.

Update: The answer's here, and the source code is now on MSDN Code Gallery

Friday, 27 March 2009

Project Euler 21: Getting friendly with the Amicable numbers

I’ve remarked before that to Mathematicians, numbers aren’t, … well, they’re not just numbers. Numbers have personalities. Whilst some are deficient, there are others which are aspiring, and a few who are actually perfect;  half of them are rather odd, some downright weird. Of more concern are the odious and evil numbers, especially those that consider themselves untouchable1.

But today we’ll look at a more friendly subject: the amicable numbers. What is an amicable number? Let’s get mathematical and define a function d(n) to be the sum of all proper divisors of n (a proper divisor of n is any divisor of n less than n). Then an amicable number a is one which has a friend b where d(a) = b and d(b) = a (note that we don’t allow a to be narcissistic and count itself as a friend). If you do the math (or these days, more realistically, the google search) you’ll find that the first pair of amicable numbers is 220 and 284.

Amicable numbers have been studied since Greek times: the Pythagoreans in particular credited these special numbers with mystical powers. They’ve even by tried out as a kind of mathematical aphrodisiac:

El Madshriti, an Arab of the 11th century, experimented with the erotic effects of amicable numbers by giving a beautiful woman the smaller number 220 to eat in the form of a cookie, and himself eating the larger 284!

My sudden interest in amicable numbers was driven by strictly mathematical desire  - to solve Project Euler 212. The problem simply asks us to find the sum of all amicable numbers less than 10000. And in fact, the solution to the problem pretty much falls out of its definition.

First, I’ll borrow a function from a few problems back that lists all the divisors of a number:

public static class NumericExtensions
{
    public static IEnumerable<int> Divisors(this int number)
    {
        return (from factor in 1.To((int)Math.Sqrt(number))
                where number.IsDivisibleBy(factor)
                select new [] { factor, number / factor }).Concat();
    }

}

With that, the answer is straightforward:

[EulerProblem(21, Title = "Evaluate the sum of all amicable pairs under 10000.")]
public class Problem21
{
    public long Solve()
    {
        Func<int, int> d = n => n.Divisors().Sum() - n;

        return 	(
				from a in 1.To(10000)
                let b = d(a)
                where a != b && d(b) == a
                select a
				)
                .Sum();
    }
}

As always, full code is available on the Project Euler Code Gallery page.

Footnotes
  1. If you want to learn more about number personalities, see this catalogue.
  2. Thanks to Bela Istok who sent me a nice functional solution to the problem, which prompted me to think about how I would solve it myself.

Tuesday, 6 January 2009

Project Euler 89: Converting to and from Roman Numerals

It would not surprise me at all to learn that the Roman Numeral system was invented by a Stone Mason with one eye on his retirement fund. As he charged per character chiseled he could expect a substantial fee whenever called upon to inscribe numbers; dates on tombstones were even better, because on the whole he could expect the length of the inscription and thus the fee to increase as the years progressed.

Everybody knows how the Roman Numeral system works, though the only time we usually get to show off our skills is when the end-credits roll on the the big or small screen and the date of film or programme production flashes up. It consists of 7 symbols, each standing for a value:

Numeral Value
I 1
V 5
X 10
L 50
C 100
D 500
M 1000

A number in Roman Numerals is just a string of these numerals written in descending order (e.g. M's first, followed by D's, etc.). To convert from Roman Numerals to decimal we just work through the string, adding up the values of the numerals. So MMVIIII represents 2 x 1000 + 1 x 5 + 4 x 1 = 2009.

The only complication is the subtractive rule. This rule makes it possible for some numbers to be written in a more compact form (I'm sure archaeologists will one day unearth a document recording the protests made by the Mason's Union on its introduction!). It allows a numeral of a lower value to be placed before one of a higher value to indicate that the lesser should be subtracted from the greater. Thus IV means 5 - 1 = 4, and XC = 100 - 10 = 90. Unfortunately, it's not quite that simple. According to PPI MMDCL1, only I, X and C can be used subtractively, and then only in front of numerals up to ten times as big. So I can only be used in front of V and X, X in front of L and C; and C in front of D and M.

Converting Numerals to Decimal Programmatically

It's the subtractive rule that stumped me at first when I tried to figure out an algorithm to parse roman numerals. It certainly makes things complicated if you think about parsing the string from left to right. But as soon as you start thinking backwards, that is working from right to left, the difficulty evaporates:

  1. Split the string into characters
  2. Convert each character into the value it represents
  3. Start from the right - the lowest value numeral
  4. Keep a running total, and a record of the maximum numeral encountered so far
  5. Take characters one by one:
    1. If character is greater than the maximum, add it to the running total and update the maximum
    2. If character is less than the maximum, subtract it from the running total

This converts straight-forwardly into LINQ:

public static int ConvertToDecimal(this string numerals)
{
    var result = numerals
        .ToCharArray()
        .Reverse()
        .Select(c => _numeralToNumberLookup[c.ToString()])
        .Aggregate(
            new { MaxValue = 0, RunningTotal = 0 },
            (state, item) => new
            {
                MaxValue = Math.Max(state.MaxValue, item),
                RunningTotal = item >= state.MaxValue ? state.RunningTotal + item 
                                                               : state.RunningTotal - item
            }, 
            aggregate => aggregate.RunningTotal);

    return result;
}

You'll note that I'm using an anonymous type in my call to Aggregate to allow me to record both pieces of state. Notice that I have to seed the call to Aggregate with an empty instance of the anonymous type (line 8). The C# compiler makes this possible by ensuring that any anonymous type that has properties with the same name and type and in the same order as another anonymous type actually become the same type. This overload of Aggregate also requires a lambda expression that extracts the overall result from the final aggregate object.

Converting Numbers to Numerals (with added recurserators!)

So that's Parsing taken care of. What about turning a number into numerals?

The algorithm isn't much more difficult, though the implementation I've chosen is a little unusual. The algorithm is recursive. We start off with the number we want to convert, and a stack containing the values of all the numerals that are available to us, greatest on the top. This stack includes the values of "composite numerals" - pairs of numerals that follow the subtractive rule, for example, 4 (IV), 9 (IX) and 40 (XL). In each iteration we peek at the numeral value on the top of the stack and test to see whether the number is divisible by it. If it is we've got us some numerals - we produce part of the final numeral string by looking up the appropriate numeral(s) and repeating them as many times as the number is divisible by the numeral value. If not, we recurse anyway. To recurse, we pop the top numeral off the stack, and pass the reduced list of numerals through to the next iteration, along with the remainder of the number after dividing by the numeral value. We terminate the recursion when we are asked to convert number 0 - which can't be done in roman numerals. If you understood all that, you have permission to skip the next paragraph; otherwise read on for a worked run-through of the algorithm.

For example, suppose we want to convert 2051. We start with the complete list of numerals: { 1000, 900, 500, 400, ... 50, 40, 10, 9, 1}. Peek at the first numeral value, and discover that it's 1000. 2051 is 2 x 1000, with remainder 51. Looking up 1000 in our numeral dictionary yields "M", so the first part of the numeral string is "MM". We're done with that iteration, so we remove 1000 from our list of numerals, and pass the smaller list, and the remainder 51 to the next iteration. The next few numeral values {900, 500, etc.} don't divide into 51, so we keep popping numerals, passing remainders and recursing till we get to a numeral list starting {50, 40, ...}.  51 is divisible by 50 (so we deduce that the next part of the numeral string is "L"). We then pop, pass and recurse until the number 1 is given for conversion, and the numeral list just contains {1}. At this point we add numeral "I" to the result, recursing hits the terminating condition, so we stop.

Now for the implementation. I could have used a straight-forward recursive function: I did this with my solution to Problem 31 (Counting Coin Combinations) which had a similar algorithm. But my Lazy Trampoline hasn't seen much action recently, so I decided to give it a bounce. The idea of the Lazy Trampoline is to create a recursive function that is able to yield a result after each iteration - a recurserator, you could call it.

It looks like this2:

public static string ConvertToNumerals(this int number)
{
    Func<int, IStack<int>, IEnumerable<string>> numeralsEnumerator = Trampoline.MakeLazyTrampoline((int numberToConvert, IStack<int> availableNumerals) =>
    {
        if (numberToConvert == 0)
        {
            return Trampoline.YieldBreak<int, IStack<int>, string>();
        }

        int currentNumeral = availableNumerals.Peek();

        int quotient = numberToConvert / currentNumeral;
        int remainder = numberToConvert % currentNumeral;

        string numberAsNumerals = _numberToNumeralLookup[currentNumeral].Repeat(quotient);

        return Trampoline.YieldAndRecurse<int, IStack<int>, string>(
            numberAsNumerals, 
            remainder, availableNumerals.Pop());
    }
    );

    var result = numeralsEnumerator(number, _availableNumerals)
             .Aggregate(
                new StringBuilder(),
                (sb, numerals) => sb.Append(numerals),
                sb => sb.ToString());

    return result;
}

The call to MakeLazyTrampoline (line 3) generates an iterator (numeralsEnumerator). Each item in the sequence generated by the iterator is a string consisting of one type of numeral. These are all combined by the query starting line 23 to create the complete numeral.

Within the recurserator, note that I return special values generated by Trampoline.YieldBreak and Trampoline.YieldAndRecurse. These control the behaviour of the recurserator. YieldBreak indicates that the recursion should stop now; YieldAndRecurse indicates (as its first parameter) which value should be returned in the sequence, and (in the second and third parameters) which values should be passed through to the next iteration.

It may have slipped under you radar, but in Line 15 I'm making a call to the Repeat method which doesn't exist on the String class. Here's the missing extension method definition:

public static string Repeat(this string value, int count)
{
    if (count == 0)
    {
        return string.Empty;
    }
    
    if (count == 1)
    {
        return value;
    }

    return Enumerable.Repeat(value, count)
        .Aggregate(new StringBuilder(), 
        (aggregate, item) => aggregate.Append(item), 
        aggregate => aggregate.ToString());
}

Solving Project Euler Problem 89

Project Euler Problem 89 gives us a file containing a load of numbers in Roman Numeral form, and asks us to calculate how many characters can be saved if the numbers are compacted using the subtractive rule. Given the two extension methods we've just implemented, the solution is a simple pair of LINQ queries:

[EulerProblem(89, Title = "Develop a method to express Roman numerals in minimal form.")]
class Problem89
{
    public long Solve()
    {
        var lines = File.ReadAllLines("ProblemData\\Problem89Data.txt");

        var numberOfCharacters = lines.Select(line => line.Length).Sum();

        var minimisedNumberOfCharacters = lines
            .Select(line => line.ConvertToDecimal())
            .Select(number => number.ConvertToNumerals())
            .Select(numerals => numerals.Length)
            .Sum();

        return numberOfCharacters - minimisedNumberOfCharacters;
    }
}

The full code is available, as always, on the MSDN Code Gallery page. Hope you like it Saevar.

Footnotes
  1. PPI - Prex pro Ineo. ie Request For Comment! Thanks, InterTran!
  2. Thanks again to Eric Lippert for his Immutable Stack code.

Thursday, 16 October 2008

Project Euler Code Clearance

Roll up! Roll up! Get them while they're hot, don't leave them 'till they're not. Get your genuine, authentic Functional Fun Project Euler solutions here. Bargain prices, can't beat 'em. Visual Studio solution thrown in free.

I've been pretty busy over the last week or so, and it looks like I'll continue to be busy until I depart for LA at the end of next week. And you know what that means: .Net 4, C# 4, Oslo, Cloud Computing, Windows 7. That gang of acronyms and code names should keep me in blog posts for a good long while. Meanwhile, I have several Project Euler solutions languishing in the basement of my hard-disk (fourth platter, third sector, last block on the right, I think), and I can't see myself getting the time to do them justice in individual blog posts.

So today I'm going to do them an injustice and launch them into the world with barely a scant mention each. As usual, you can get the code from MSDN Code Gallery.

Problem 20: Summing the digits of a Factorial

Problem 20 gave me a chance to reuse my code for summing large numbers again. That's because calculating a factorial can be done iteratively. Imagine the sequence 1!, 2!, 3!, etc. You get the (n + 1)th term by multiplying the nth term by itself n + 1 times (that's just the definition of factorial). And the multiplication boils down to adding up n + 1 copies of the nth term. Wrap that simple idea inside an application of Unfold, and you have the solution.

Problem 22: Number crunching names

Problem 22 is just a file processing problem. Nothing more to it than that. It has a nice, one line solution with LINQ though:

return File.ReadAllText(problemDataFileName)
           .Split(new[] {','})
           .Select(name => name.Trim(new[] {'\"'}))
           .OrderBy(name => name)
           .Select((name, index) => (index + 1) * name.ToCharArray()
													  .Select(letter => (letter - 'A') +1)
                                                      .Sum())
           .Sum();

Problem 25: Big Fibonacci numbers

In Problem 25, Euler wants to know the first term in the Fibonacci sequence that has 1000 digits, which is of course his way of getting us to find a way of computing with big numbers. No problem to us though: we've got the Unfold method to calculate the Fibonacci sequence - and did I mention the code I wrote before that we can use to sum the big numbers we'll find?

Problem 31: Counting Coin Combinations

I'm quite pleased with my solution to Problem 31, wherein Euler asks us to count the number of ways in which £2 can be given using the British coinage system. I came up with a recursive algorithm that starts with the value of change you wish to give and a list of available coin types. Then it removes the biggest coin from the list, works out how many times that coin could be used, and for each possibility calculates the number of combinations for the reduced amount, and the reduced coin set by calling itself recursively.

For example, if we needed to make £1 using 10p and 2p, and 1p coins, we'd start by seeing that we could use between zero and ten 10p coins, so there are eleven possibilities we need to investigate. For each possibility we calculate how much remains once we've put down the appropriate number of 10 pences, then use the same algorithm again, but considering just 2p and 1p coins.

Problem 36: Binary and Decimal Palindromes

We last encountered numeric palindromes in Problem 4: they're numbers that read the same in both directions. In Problem 4 we were only interested in numbers that are palindromic when written in base 10. Problem 36 asks us to count the numbers that are palindromic when written in binary and in decimal. The most interesting part about this problem was finding a number's representation in binary. I could probably have used the BitArray class to do this, but instead chose to use this LINQ query:

private static IEnumerable<bool> GetNumberBits(uint number)
{
    if (number == 0)
    {
        return new[] {false};
    }

    // iterate through the bit positions, checking whether that
    // bit of the the number is set;
    // do it in reverse, so that we can ignore leading zeros
    return 31.To(0)
        .Select(bit => (number & (1 << bit)) == (1 << bit))
        .SkipWhile(bit => !bit);
}

Problem 112: Finding the density of Bouncy numbers

Mathematicians don't try to hide their obsession with numbers, do they? They make it plain as day that they count numbers amongst their closest friends, by giving them all names. It's the Bouncy numbers that are the heroes of Problem 112. These are numbers that, considering their digits as a sequence have a neither increasing nor decreasing sequence. My solution to this problem puts the Aggregate and LazilyAggregate methods to a rather unconventional use.

Friday, 5 September 2008

Project Euler Problem 19: A Century of Sundays

There's a hard way and an easy way to solve Problem 19 in our ongoing struggle with Project Euler, where we're asked to count how many Sundays fell on the first of the month during the twentieth century - being software developers and thus virtuously lazy by nature we wouldn't consider the tedious way, getting out all those old diaries (wherein entries invariably finish around January 5th) and actually counting whilst flicking over the pages.

The hard way is to use the information given in the problem definition and devise an algorithm for calculating days of the week that each date falls on. The easy way uses with gratitude the brainpower the .Net Framework developers invested in their work. It's Friday: I'll take the easy way.

Thanks to C#3.0 and LINQ, the actual code to the solution is a one-expressioner:

[EulerProblem(19, Title = "How many Sundays fell on the first of the month during the twentieth century?")]
public class Problem19
{
    public long Solve()
    {
        DateTime endDate = new DateTime(2000, 12, 31);
        DateTime startDate = new DateTime(1901, 1, 1);

        return startDate
            .Unfold(date => date.AddMonths(1))
            .TakeWhile(date => date <= endDate)
            .Count(date => date.DayOfWeek == DayOfWeek.Sunday);
    }
}

If it's Friday for you as well then I'll let you read the following explanatory notes: otherwise, work it out for yourself!

Remember my Unfold extension method? The one that takes a seed value (startDate in this case) and generates a sequence by repeatedly applying an expression to previous value in the sequence to create the next. That combined with the TakeWhile method represent the functional-programming equivalent of the for loop. The Count overload that I'm using here counts every item in a sequence matching a particular criterion. So the code I have above is equivalent to this:

DateTime endDate = new DateTime(2000, 12, 31);
DateTime startDate = new DateTime(1901, 1, 1);

int count = 0;
for (DateTime date = startDate; date < endDate; date = date.AddMonths(1))
{
    if (date.DayOfWeek == DayOfWeek.Sunday)
    {
        count++;
    } 
}

return count;

Thursday, 28 August 2008

I am now a Tetrahedron

In my lunch hour yesterday I solved my 25th Project Euler problem. As Euler kindly informed me, that puts me in the top 78% of Project Euler members, and makes me a Tetrahedron according to the newly introduced scoring system. The levels in this system are named after the Platonic Solids: when I've solved another 25 problems, I'll become a cube; after another 50, an octahedron, and so on, evolving step by step to the mathematical perfection of a sphere when I've tucked 250 problems under my belt.

But don't let me get carried away. I started solving problems in Mid March. It's now the end of August. By my reckoning it's taken 27 weeks to solve those 25 problems, averaging about 0.9 problems a week. Extrapolating this, I'm looking at another 3 years to solve the 200 or so current problems, or 4 years to reach Spherehood. And that's looking on the rosy side: the problems get harder as you get further on, so my rate is bound to drop.

With that in mind, I'll probably change my current tack of blogging about the problems one by one, and start picking out those that I think will make for the most interesting posts. I've got my eye on number 96 which is all about Sudoku ...

If you want to follow my progress, you can have a look at my Project Euler profile.

Project Euler Problems 18 and 67: Finding the Maximal Route through a Triangle

If this doesn't pique the interest of my fellow developers then I don't know what will: a problem that will take twenty billion years to solve (at an extremely optimistic estimate), unless we find an efficient algorithm. The problem in question is Project Euler Problem 67. We're given a triangle made up of one hundred rows of numbers, and asked to consider all routes from the top to the bottom, made by moving from one row to the next via adjacent cells. Over all such routes, we have to find the maximum sum. Can't picture it? Let me help:

TriangleRoute

For the route that I've drawn, the sum is 5 + 15 + 29 + 45 = [train your brain and work it out yourself!].

If we only wanted to tackle the baby brother of this problem (Problem 18 - the same problem for a triangle with just 15 rows), we could get away with checking each of the routes. But it's not feasible to do that here because there are 299 possibilities: hence the need for an efficient algorithm. Any ideas?

You could take a bath and wait for the Eureka moment. Or if you want to avoid the ensuing streaking, read on.

An algorithm

Suppose that for each cell in a row we know the maximum sum over all routes ending at that cell. From that it's easy to work out the same for each of the cells in the next row. Here's how you do it. If you've got a cell with value c, somewhere in the middle of the row, then you can only reach it from either of the two cells adjacent to it in the row above. 

CalculatingCellMaxima

We're assuming that we already know the maximum sum for routes ending at these cells: suppose it is a and b respectively. So the biggest possible sum we can get for routes ending at c is Max(c + a, c + b). For the two cells at either end of the row the calculation is even simpler: they can only be reached from the cell at the appropriate end of the row above them, so we just sum the cell value with the known maximum sum to the cell above. In this way, we can crunch through the rows, 'flattening' the triangle until we reach the last row, at which point we will know the maximum possible sum over all routes ending at each of the cells in that row.

That's the middle of an algorithm. To solve the overall problem we just need to add a beginning and an end. The middle part starts by assuming that we already know the maximum sum for all routes ending at a particular row. So to get the algorithm rolling we need a row for which we do have that information, and the first row of the triangle is the obvious candidate: all routes start there, so the maximum up to that point is just the value of the cell in that row. How about the end of the algorithm? To solve the problem, we need to know the maximum sum from top to bottom through the triangle; we've calculated the maximum sum ending at each of the cells in the last row, so we just need to take the maximum over all these cells.

Some code

To code this, we'll reach once again into our Functional Programming toolbox. I think the Aggregate method is just the hammer to crack this particular nut. Do you see how?

Ignore the input part for now, and assume we've got a sequence of List<int>, one List for each row of the triangle. Like this:

{75}
{95, 64}
{17, 47, 82}
{18, 35, 87, 10}
...
In this layout, it's a little tricky to visualise which cells are adjacent to which: you need to look at the cell directly above, and the one above but to the left. For example, the 35 in row four is adjacent to 47 and 17 in the row above.

We'll use Aggregate to 'flatten' the triangle represented by this sequence of Lists using the algorithm I described above: the aggregate that we'll be calculating is a List containing the maximum sum over all routes to each cell up to the last row we've processed. Remember that Aggregate needs a function that combines the aggregate so far with the next item in the series. We'll supply a function that takes the maximums so far and combines them with the cell values for the next row. The code looks like this:

return rows.Aggregate(
    new List<int> {0},
    (currentMaxima, nextRow) =>
        {
            var rowLength = nextRow.Count();
            return nextRow
                .Select((cell, index) =>
                        index == 0 ? cell + currentMaxima[index]
                        : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                        : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                .ToList();
        }
    )
    .Max();

Notice that at line 7 we're processing the sequence nextRow (containing the cell values) using an overload of the Select method that gives us the index of each cell value in the sequence; we use this index to find the appropriate entries in the list currentMaxima containing the maximum sums to the cells in the row above.

And with that, I believe we've solved the problem. It takes 2 milliseconds to solve the 100-row triangle problem on my computer - a satisfying improvement over 20 billion years.

Here's the complete code. Note that in the source code project I have included two files containing the data for these problems.

namespace ProjectEuler
{
    [EulerProblem(18, Title = "Find the maximum sum travelling from the top of the triangle to the base.")]
    public class Problem18
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem18Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    [EulerProblem(67, Title = "Using an efficient algorithm find the maximal sum in the triangle?")]
    public class Problem67
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem67Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    public static class MaximumSumThroughTriangleSolver
    {
        public static int SolveFromFile(string dataFilePath)
        {
            string problemData = File.ReadAllText(dataFilePath);

            return Solve(problemData);
        }

        public static int Solve(string problemData)
        {
            var rows = problemData
                .Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
                .Select(line =>
                        line.Split(' ')
                            .Select(token => int.Parse(token))
                            .ToList()
                )
                .ToList();

            return rows.Aggregate(
                new List<int> {0},
                (currentMaxima, nextRow) =>
                    {
                        var rowLength = nextRow.Count();
                        return nextRow
                            .Select((cell, index) =>
                                    index == 0 ? cell + currentMaxima[index]
                                    : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                                    : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                            .ToList();
                    }
                )
                .Max();
        }
    }
}

Friday, 8 August 2008

Project Euler Problem 17: Converting numbers to words

Zimbabwe Prices: Picture Credit, SokwaneleFancy earning a Lotto prize of 1.2 quadrillion dollars? Then you should move to Zimbabwe! Don't be in too much of a hurry, though. By the time you'd brought it back home, your Zimbabwe-dollar haul would have shrunk to less than 4,000 US Dollars.

A couple of weeks back the BBC published an article talking about the problem of big numbers in Zimbabwe. Due to daily expenses measured in the trillions of dollars, with hyperinflation of 2,200,000% pushing that figure ever higher, the citizens of Zimbabwe have been googling to find the names for the next big numbers after trillions. Coincidentally, I was doing the same thing the day before I read the article as I went about solving Project Euler Problem 17.

The problem wants to know how many letters are needed to write out the numbers 1 to 1000 in words. Clearly this calls for an algorithm to convert numbers to words; since I always like to give a little extra to my loyal readers, I wanted to create an algorithm to handle numbers bigger than 1000. I ended up with one that can convert anything up to long.MaxValue: in words, nine quintillion, two hundred and twenty-three quadrillion, three hundred and seventy-two trillion, thirty-six billion, eight hundred and fifty-four million, seven hundred and seventy-five thousand, eight hundred and seven! Plenty powerful enough to write out Bill Gate's paychecks, and sufficient to keep the Zimbabweans going for the next few months!

So how does the algorithm work? A basic ingredient is a dictionary mapping the basic numbers to words. This covers the numbers 1 to 20, then every multiple of ten up to 100, then each larger number that has a special name. The key ingredient is a recursive function (functional programming had to come into it somewhere) that starts with the largest part of a number and repeatedly breaks it down into its components.

In Pseudo-code:

  1. Let n be the number to process
  2. For numbers smaller than 0, combine "Negative" with the result of calling the algorithm on the absolute value of n
  3. For n between 0 and 20, look up the result in the word dictionary
  4. For n between 20 and 100, calculate the number of 10s and number of units; look up the name for the number of tens; if there are any units, look up the appropriate name and combine the two results with a hyphen
  5. For n between 100 and 1000, calculate the number of 100s and the remainder; look up the name for the number of hundreds; if there is any remainder, recurse to get its wordy representation, then combine it with result for the hundreds using "and" to join the two parts
  6. For n bigger than 1000: decide which is the biggest named number (million, billion, etc.) that divides into n. Calculate the number of that base unit and the remainder. Recurse to convert the number of base units into words, and recurse to convert the remainder into words. Combine the two parts using ","

In C#:

public static class ConvertToWordsExtension
{
 private static Dictionary<long, string> _wordDictionary;
 private const int OneThousand = 1000;
 private const long OneMillion = 1000000;
 private const long OneBillion = 1000000000;
 private const long OneTrillion = 1000000000000;
 private const long OneQuadrillion = 1000000000000000;
 private const long OneQuintillion = 1000000000000000000;

 /// <summary>
 /// Converts a number to its English representation in words.
 /// </summary>
 /// <remarks>Uses the Short Scale for large numbers. See http://en.wikipedia.org/wiki/Long_and_short_scales for more details</remarks>
 public static string ConvertToWords(this long number)
 {
     EnsureWordDictionaryInitialised();

     if (number == long.MinValue)
     {
        throw new ArgumentOutOfRangeException();
     }

     return ConvertToWordsCore(number);
 }

 /// <summary>
 /// Converts a number to its English representation in words
 /// </summary>
 public static string ConvertToWords(this int number)
 {
     return ConvertToWords((long)number);
 }

 private static Dictionary<long, string> CreateWordDictionary()
 {
     return new Dictionary<long, string>
                {
                    {0, "zero"},
                    {1, "one"},
                    {2, "two"},
                    {3, "three"},
                    {4, "four"},
                    {5, "five"},
                    {6, "six"},
                    {7, "seven"},
                    {8, "eight"},
                    {9, "nine"},
                    {10, "ten"},
                    {11, "eleven"},
                    {12, "twelve"},
                    {13, "thirteen"},
                    {14, "fourteen"},
                    {15, "fifteen"},
                    {16, "sixteen"},
                    {17, "seventeen"},
                    {18, "eighteen"},
                    {19, "nineteen"},
                    {20, "twenty"},
                    {30, "thirty"},
                    {40, "forty"},
                    {50, "fifty"},
                    {60, "sixty"},
                    {70, "seventy"},
                    {80, "eighty"},
                    {90, "ninety"},
                    {100, "hundred"},
                    {OneThousand, "thousand"},
                    {OneMillion, "million"},
                    {OneBillion, "billion"},
                    {OneTrillion, "trillion"},
                    {OneQuadrillion, "quadrillion"},
                    {OneQuintillion, "quintillion"}
                };
 }

 private static void EnsureWordDictionaryInitialised()
 {
     // ensure thread-safety when caching our word dictionary
     // note: this doesn't prevent two copies of the word dictionary
     // being initialised - but that doesn't matter; only one would be
     // cached, the other garbage collected.
     if (_wordDictionary == null)
     {
         var dictionary = CreateWordDictionary();
         Interlocked.CompareExchange(ref _wordDictionary, dictionary, null);
     }
 }

 private static string ConvertToWordsCore(long number)
 {
     if (number < 0)
     {
         return "negative " + ConvertToWordsCore(Math.Abs(number));
     }

     // if between 1 and 19, convert to word
     if (0 <= number && number < 20)
     {
         return _wordDictionary[number];
     }

     // if between 20 and 99, convert tens to word then recurse for units
     if (20 <= number && number < 100)
     {
         return ProcessTens(number, _wordDictionary);
     }

     // if between 100 and 999, convert hundreds to word then recurse for tens
     if (100 <= number && number < OneThousand)
     {
         return ProcessHundreds(number, _wordDictionary);
     }

     if (OneThousand <= number && number < OneMillion)
     {
         return ProcessLargeNumber(number, OneThousand, _wordDictionary);
     }

     if (OneMillion <= number && number < OneBillion)
     {
         return ProcessLargeNumber(number, OneMillion, _wordDictionary);
     }

     if (OneBillion <= number && number < OneTrillion)
     {
         return ProcessLargeNumber(number, OneBillion, _wordDictionary);
     }

     if (OneTrillion <= number && number < OneQuadrillion)
     {
         return ProcessLargeNumber(number, OneTrillion, _wordDictionary);
     }

     if (OneQuadrillion <= number && number < OneQuintillion)
     {
         return ProcessLargeNumber(number, OneQuadrillion, _wordDictionary);
     }
     else
     {
         return ProcessLargeNumber(number, OneQuintillion, _wordDictionary);
     }
 }

 private static string ProcessLargeNumber(long number, long baseUnit, Dictionary<long, string> wordDictionary)
 {
     // split the number into number of baseUnits (thousands, millions, etc.)
     // and the remainder
     var numberOfBaseUnits = number / baseUnit;
     var remainder = number % baseUnit;
     // apply ConvertToWordsCore to represent the number of baseUnits as words
     string conversion = ConvertToWordsCore(numberOfBaseUnits) + " " + wordDictionary[baseUnit];
     // recurse for any remainder
                 conversion += remainder <= 0 ? ""
                            : (remainder < 100 ? " and " : ", ") + ConvertToWordsCore(remainder);
     return conversion;
 }

 private static string ProcessHundreds(long number, Dictionary<long, string> wordDictionary)
 {
     var hundreds = number / 100;
     var remainder = number % 100;
     string conversion = wordDictionary[hundreds] + " " + wordDictionary[100];
     conversion += remainder > 0 ? " and " + ConvertToWordsCore(remainder) : "";
     return conversion;
 }

 private static string ProcessTens(long number, Dictionary<long, string> wordDictionary)
 {
     Debug.Assert(0 <= number && number < 100);

     // split the number into the number of tens and the number of units,
     // so that words for both can be looked up independantly
     var tens = (number / 10) * 10;
     var units = number % 10;
     string conversion = wordDictionary[tens];
     conversion += units > 0 ? "-" + wordDictionary[units] : "";
     return conversion;
 }
}

Having coded the algorithm like that, I was casting around to find a way of making the ConvertToWordsCore method more compact. That was when I came across Igor Ostrovsky's post on the ternary conditional operation ("?") and was reminded of a neat trick. I re-wrote the method like this:

private static string ConvertToWordsCore(long number)
{
 return
     number < 0 ? "negative " + ConvertToWordsCore(Math.Abs(number))
         : 0 <= number && number < 20 ? _wordDictionary[number]
         : 20 <= number && number < 100 ? ProcessTens(number, _wordDictionary)
         : 100 <= number && number < OneThousand ? ProcessHundreds(number, _wordDictionary)
         : OneThousand <= number && number < OneMillion ? ProcessLargeNumber(number, OneThousand, _wordDictionary)
         : OneMillion <= number && number < OneBillion ? ProcessLargeNumber(number, OneMillion, _wordDictionary)
         : OneBillion <= number && number < OneTrillion ? ProcessLargeNumber(number, OneBillion, _wordDictionary)
         : OneTrillion <= number && number < OneQuadrillion ? ProcessLargeNumber(number, OneTrillion, _wordDictionary)
         : OneQuadrillion <= number && number < OneQuintillion ? ProcessLargeNumber(number, OneQuadrillion, _wordDictionary)
         : ProcessLargeNumber(number, OneQuintillion, _wordDictionary); // long.Max value is just over nine quintillion
}

That's much more compact, and to my eyes, a lot more elegant. The only downside I can see is that debugging might be more difficult. Which form do you prefer?

With this algorithm in place, solving Project Euler 17 is trivial:

[EulerProblem(17, Title="How many letters would be needed to write all the numbers in words from 1 to 1000?")]
public class Problem17
{
 public long Solve()
 {
     return
         1.To(1000)
             .Select(
                 number => number
                             .ConvertToWords()
                             .ToCharArray()
                             .Count(character => char.IsLetter(character))
             )
             .Sum();
 }
}

As always, full code (including some unit tests for the ConvertToWords method!) is available on my Project Euler code gallery page. You can also try out the algorithm in my handy online utility to convert numbers to words.

Update 27/5/09:Fixed bug where 'and' was missed out in numbers like 1006

Monday, 28 July 2008

Project Euler Problem 16: Calculating Large Powers of Two

I always feel like I've got good value-for-brain-cycles when I can reuse a solution to a problem. I had that satisfaction when I came to look at Problem 16 in the Project Euler series: calculating the sum of the digits of the number 21000. Why? Because I realised that calculating 21000 (which we clearly need to do to calculate the sum of its digits) is just the same as summing a list of very big integers, which we did to solve Problem 13.

How is that? On the face of it calculating an exponent looks quite different to calculating a sum. But take a look at this line of thought:

2n = 2 x 2n-1
   = 2n-1 + 2n-1
   = (2 x 2n-2) + (2 x 2n-2)
   = (2n-2 + 2n-2) + (2n-2 + 2n-2)
   = ...

Bottom up duck. Picture credit: Glisglis, http://flickr.com/photos/glisglis/365003674/Or if you prefer bottom-up thinking (you are a duck, perhaps):

21 = 2
22 = 2 x 2 = 2 + 2
23 = 2 x (2 x 2) = (2 + 2) + (2 + 2)
24 = 2 x [2 x ( 2 x 2)] = [(2 + 2) + (2 + 2)] + [(2 + 2) + (2 + 2)]
25 = you get the idea

Thus we have a recursive definition for calculating a power of two in terms of the addition operation: just add together two of the previous power of two. Putting my Functional Programming Thinking cap on immediately brings to mind the Unfold method, which generates a sequence by applying a transformation to the previous element. Used in combination with my code from an earlier solution to sum up two numbers represented as digits, this should give us what we need.

In order to know how many terms of the sequence to Unfold to get us up to 21000 we'll use the trick of smuggling some additional information into the terms of the sequence using an anonymous type: in the following code the Power property in the anonymous type indicates which power of two is held in the Digits sequence. Digits is a number represented as a sequence of digits, least significant digit first.

The full code, as always, is available on Code Gallery.

[EulerProblem(16, Title = "What is the sum of the digits of the number 2^1000?")]
public class Problem16
{
    public long Solve()
    {  
        return new {Power = 1, Digits = (IEnumerable<int>) new int[] {2}}
            .Unfold((previous) => new { Power = previous.Power + 1, Digits = SumNumbers(previous.Digits, previous.Digits) })
            .SkipWhile(item => item.Power < 1000)
            .First() // this will be the term containing 2^1000
            .Digits // take the Digits property of the anonymous type
            .Sum(); // sum the digits of 2^1000
    }

    /// <summary>
    /// Calculate the sum of two numbers represented as a sequence of digits. Numbers should be given
    /// with least significant digits first
    /// </summary>
    /// <param name="number1"></param>
    /// <param name="number2"></param>
    /// <returns>The sum of the two numbers as a sequence of digits</returns>
    private IEnumerable<int> SumNumbers(IEnumerable<int> number1, IEnumerable<int> number2)
    {
        // put the numbers together in an array so that we can zip them easily
        var numbers = new[] {number1, number2};

        // process the entire list of numbers column by column
        // ie, process least significant digits of all numbers first, then the next column of digits, etc.
        return numbers
        .Zip()
            // work through each column, calculating which digit represents the sum for that column,
            // and what the carryover is
        .Aggregate(
            // initialise an annoymous data structure to hold our workings out as we
            // move column-by-column.
            // Digits holds the digits of the sum that we've calculated so far
            // CarryOver holds whatever has been carried over from the previous column
            new { Digits = Stack<int>.Empty, CarryOver = 0 },
            (previousResult, columnDigits) =>
            {
                var columnSum = columnDigits.Sum() + previousResult.CarryOver;
                var nextDigit = columnSum % 10;
                var carryOver = columnSum / 10;
                return new { Digits = previousResult.Digits.Push(nextDigit), CarryOver = carryOver };
            },
            // to complete the aggregation, we need to add the digits of the final carry-over
            // to the digits of the sum;
            // note that because completeSum.Digits is a stack, when it is enumerated
            // we get the items back in reverse order - ie most significant digit first; hence the call to Reverse()
            // to make the order of the digits in the sum we return consistent with the input digits
            (completeSum) => completeSum.CarryOver == 0 ? 
                completeSum.Digits.Reverse() 
                : completeSum.CarryOver.Digits().Concat(completeSum.Digits).Reverse());
    }
}

I also had to update the Zip method to fix a bug. Here's the updated version:

public static IEnumerable<IEnumerable<T>> Zip<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    // get the enumerators for each sequence
    var enumerators = sequences.Select(sequence => sequence.GetEnumerator()).ToList();

    var activeEnumerators = new List<IEnumerator<T>>();
    var items = new List<T>();

    while (enumerators.Count > 0)
    {
        items.Clear();
        activeEnumerators.Clear();

        foreach (var enumerator in enumerators)
        {
            if (enumerator.MoveNext())
            {
                items.Add(enumerator.Current);
                activeEnumerators.Add(enumerator);
            }
        }

        if (items.Count > 0)
        {
            yield return items;
        }

        enumerators.Clear();
        enumerators.AddRange(activeEnumerators);
    }
}

Tuesday, 15 July 2008

Project Euler Problem 15: City Grids and Pascal's Triangle

Problem 15 in the Project Euler series asks us to count the number of routes through a 20x20 grid starting from the top left and travelling to bottom right, always moving either rightwards or downwards, never backtracking. I can give you the solution in one line of code; explaining why it is the solution will take me many more lines of prose. The solution is:

return PascalsTriangle.GetEntry(2*20, 20);

(OK, you caught me: PascalsTriangle isn't a C# primitive type, so I guess I'll have to give some code for that as well. But all in good time.)

So what does Pascal's triangle have to do with City Grids? This is one of the things that I love about mathematics: seemingly unrelated problems often have subtle and beautiful connections. The connections are completely hidden on the surface, but once you uncover them, link at a time, you wonder that you never spotted them immediately.

How should we go about counting the number of routes through the grid? Here, try with this smaller grid - take my copy and scribble on it all you like:

A 6x6 city grid

My first insight was to start counting, not at the start of the grid, but at the end. Rather than try to work out all the possible routes from the starting point, let's count how many routes there are from points near the end, then work backwards.

Let me show you.

How many routes are there from the point F6 (marked "End") to the same point, F6? Stupid question, obvious answer: one - just follow the complicated directions: "Stand Still". From F5 to F6 there's one route: "Go South one block". From E6 to F6, one route: "Go East one block". But if I'm stood at E5 and want to get to End I have two choices, I can either go "East one block, then south one block" or "South one block, then East one block".

In fact, from anywhere along column F there's only one route to the end ("Go South x blocks"), because we're not allowed to turn back west; and from anywhere along row 6 there's only one route ("Go East x blocks") because we're not allowed to turn up north. On the diagram I've drawn what we've got so far - blue shows the number of routes from each point:

CityGridv3

Now lets consider how to find the number of routes from other spots on the grid. Take E4, for example. If we go east to F4 then we already know that there's only one way we can go on from there. If, however, we go south one block to E5 then we know that there are two routes we can take from there. So this makes a total of three possible routes from E4. By the same argument, we can see that there are 3 possible routes from D5. Then we can apply the same reasoning to cell D4: three possible routes if we go east to E4 plus three possible routes if we go south to D5 make for a total of 6 ways to get from D4 to the end. Another diagram to summarise:

CityGridv4

You see the way this is going now, don't you? To work out the total number of routes from each intersection we add up the total number of routes from the intersections one block to the east, and one block to the south.

Now lets spin the grid about its centre to put the point marked End at the top, drop the gridlines, and write out the numbers by themselves. Like this:

PascalsTriangle

What have we got? Pascal's Triangle - a fascinating mathematical object d'art that has been rediscovered many times throughout history in many different contexts. (Wikipedia will be delighted to tell you more about it- there's more to it than meets the eye). You construct it by writing 1s down the diagonal edges of the triangle, then forming each cell by summing the two neighbours in the row above.

I've highlighted the cells in red, because they are the ones that particularly concern us. Imagine spinning the triangle back so that it aligns with the grid. Then you see that "2" is the number of routes through a 1x1 grid, and "6" is the number of routes through a 2x2 grid.

Let's label the cells of the grid. Call the apex of the triangle row 0, and call the first "1" on each row column 0. Thus the number of routes through a 1x1 grid appears in row 2, column 1; the number of routes through a 2x2 grid appears at row 4 column 2. From there we can take a flying un-mathematical leap to a conclusion and say that the number of routes through an n x n grid will appear in row 2n, column n of Pascal's Triangle. That explains my solution: obvious now?

But how to calculate the number in that cell? My first thought was to write code that builds the triangle row by row. But there's a quicker way. Using Proof by Wikipedia, we can show that P(r,c) = ((r + 1 - c) / c) * P(r, c - 1), where P(r, c) is the value of Pascal's triangle at row r, column c. In other words, to calculate the value of a cell at column, c, in a particular row, r, we multiply the value of the cell to the left by the fraction (r + 1 - c) / c.

A sequence defined recursively like that makes me think of the Unfold method for generating it. Remember that after you have given it a seed value, Unfold generates a sequence for you by applying the lambda you give it to the previous term in the sequence. So our mathematical definition becomes:

public class PascalsTriangle
{
    public static long GetEntry(int row, int column)
    {
        // the L suffix on "Entry = 1L" is to force Entry to have a long type
        return Functional.Unfold(new {Entry = 1L, Column = 1},
                                 previous =>
                                 new
                                     {
                                         Entry = (previous.Entry*(row + 1 - previous.Column))/previous.Column,
                                         Column = previous.Column + 1
                                     })
            .SkipWhile(item => item.Column <= column)
            .First()
            .Entry;
    }
}

Note the use of the anonymous type to pass an additional piece of information through to the next iteration: Entry holds the cell value, and Column holds an incrementing count of the number of columns that we've calculated.

For reference, here's equivalent code that uses looping rather than functional concepts:

public static long GetEntry(int row, int column)
{
    long current = 1;

    for (int i = 1; i <= column; i++ )
    {
        current = (current * (row + 1 - i)) / i;
    }

    return current;
}

The complete code is shown below, and can also be downloaded from my Project Euler on the MSDN Code Gallery.

[EulerProblem(15, Title = "Starting in the top left corner in a 20 by 20 grid, how many routes are there to the bottom right corner?")]
public class Problem15
{
    public long Solve()
    {
        return PascalsTriangle.GetEntry(2*20, 20);
    }
}

public class PascalsTriangle
{

    public static long GetEntry(int row, int column)
    {
        // the L suffix on "Entry = 1L" is to force Entry to have a long type
        return Functional.Unfold(new { Entry = 1L, Column = 1 },
                                 previous =>
                                 new
                                     {
                                         Entry = (previous.Entry * (row + 1 - previous.Column)) / previous.Column,
                                         Column = previous.Column + 1
                                     })
            .SkipWhile(item => item.Column <= column)
            .First()
            .Entry;
    }
}

public static class Functional
{
    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> generator)
    {
        // include seed in the sequence
        yield return seed;

        T current = seed;

        // now continue the sequence
        while (true)
        {
            current = generator(current);
            yield return current;
        }
    }
}

Thursday, 3 July 2008

Project Euler Problem 14: Hailstone Numbers

Today we're going to talk about hailstones. No, not the sort that, last week, caused tens of millions of pounds of damage to 30,000 Volkswagen cars in an outside storage area in Germany. Rather the hailstones that are the subject of Project Euler Problem 14: hailstone numbers.

Hailstone numbers are just the kind of thing that fascinate mathematicians. You make them by picking any whole starting number, n, and then repeatedly applying one of the following two rules:

  • if n is even, the next number is n/2
  • if n is odd, the next number is 3n + 1

Try it; Start with 6. Then you get 6, 3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, ... . Try it with another number. I bet you find yourself eventually writing 4, 2, 1, 4, 2, 1, ... all over again. If you don't, then, congratulations: you've just solved the Collatz problem, and won yourself at least a £1000 pound reward! You must also have picked a pretty big staring number. Using computers (obviously!) mathematicians have showed that if you kick off the sequence with any number smaller than about 2.7x1016 you'll eventually get down to the 4,2,1 repeating pattern.

You or I might be inclined to think, based on that evidence, that the sequence will always end in 4,2,1 for any starting number; but no quantity of successful tests would satisfy a pure-blooded mathematician: what if the next generation of computers tested even bigger numbers and found a sequence which didn't terminate like that? Hence the reward, which is being offered for a proof that for every whole number, the sequence generated by these rules eventually reaches 1; or, alternatively, for a demonstration of a number which begins a sequence that doesn't reach 1. This problem has taxed the minds of some of the brightest mathematicians for over 70 years.

But why are they called Hailstone numbers? Have a look at this chart, where I've plotted a few sequences starting with different numbers: Hailstone Numbers

See how the sequences jump up and down? Apparently real hailstones do that when they're growing in the clouds: rising currents of air lift dust particles up into the clouds where they get smothered in ice; this makes them heavy, so they sink. Then the ice melts a little which makes them light enough to be pushed back up again- and so the little hailstones bounce up and down in the clouds getting fatter and fatter, until eventually they get so heavy that they fall to the ground, bouncing there until they come to a complete stop.

Now that you know the what and the why of Hailstone numbers, let's get to Euler's problem, which is to do with the length of sequences of hailstone numbers. We define the length of the sequence as being the number of terms up to the first occurrence of the number 1 (the example above starting with 6 has length 9, for example). Euler asks which starting number under 1 million produces the longest sequence?

A recursive solution

How to solve it? We could do it using an iterator or the Unfold method to generate the sequence, then counting the number of elements in it: but we're not actually interested in the numbers in the sequence, just the length of the sequence.

We can find the length of the sequence which starts at a particular number by finding the next term in the sequence, then adding one to the length of the sequence that starts at that term. In other words we can tackle this using a recursive function; and to make things interesting, we'll make it a recursive lambda function (this isn't gratuitous lambdaisation - you'll see why I've done it in a minute).

Func<long, long> findSequenceLength = null;
findSequenceLength = (long n) =>
{
 if (n == 1) { return 1; }
 var nextTerm = n.IsEven() ? n / 2 : 3* n + 1;
 return findSequenceLength(nextTerm) + 1;
};

The trick to recursive lambda functions, as Eric Lippert pointed out, is to do what I've done in line 1 in the snippet above and assign null to the lambda before giving its actual definition; otherwise the compiler complains that the variable holding the lambda is being used before it is assigned.

Actually I'm misleading you a bit here, because we've not really created a recursive function at all. Thanks to the magic of closures, the lambda that we've assigned to findSequenceLength is really just calling the delegate stored in the variable findSequenceLength -that's why we need to be careful to assign that variable first (Wes Dyer explains more about this, and shows how to do real anonymous recursion in C#). But that very fact creates an opportunity for us: by changing the delegate that is stored in findSequenceLength we can subtly but usefully change the way findSequenceLength works.

Memoization

If you try out a couple of hailstone sequences, you'll soon discover that there are many that start off differently, but converge onto the same (bumpy) glide path down to the (4,2,1) ending. If you start with 24, for example, the third term in the sequence will be 6, and then will continue exactly the same as the sequence starting at 6. This means that once we have encountered one of these 'glide paths' and calculated its length, we really shouldn't need to do the same calculations all over again.

Memoization is a neat trick you can apply to give your lambdas a memory of what they've calculated before (thanks, again, to Wes Dyer for his post on memoization). Have a look at this code:

public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> function)
{
  var cache = new Dictionary<T, TResult>();

  return (T arg) =>
  {
      TResult result;
      if (cache.TryGetValue(arg, out result))
      {
          return result;
      }
      else
      {
          result = function(arg);
          cache.Add(arg, result);
          return result;
      }
  };
}

When you pass a delegate to Memoize, it will return a new delegate that only ever calls your original one if it's called with an argument it has not seen before; otherwise it supplies the answer from its cache of previous results. Obviously you'll only want to use this on delegates whose results are wholly dependent on the input parameters: if you've got a function that uses the current time, for example, caching results will get you stale answers.

So we can do this:

Func<long, long> findSequenceLength = null;
findSequenceLength = (long n) =>
{
 if (n == 1) { return 1; }
 var nextTerm = n.IsEven() ? n / 2 : 3* n + 1;
 return findSequenceLength(nextTerm) + 1;
};
findSequenceLength = findSequenceLength.Memoize();

Now, because the original lambda stored in findSequenceLength is defined to call whatever delegate is stored in the findSequenceLength variable, when we store the memoized version of the lambda back into findSequenceLength the result is a recursive function that can short-cut any calculations for parameters that it has seen before.

I'll let you untwist your brain before I carry on.

Recovered? Now the final answer is just a step away.

Using our findSequenceLength lambda we can calculate the sequence length for any number we like. But to get the final answer to the problem we don't actually want to know the sequence length: we need to know which number generates the longest sequence. So we can't just generate all the sequence lengths and find the maximum over that set. Instead, we'll generate objects containing both the starting number and the sequence length, and use my MaxItem extension method which finds the maximum object in a sequence, using which ever property of the object you grab with a lambda expression to make the comparison. Like so:

return 1.To(1000000)
          .Select(n => new { n, Length = findSequenceLength(n) })
          .MaxItem(info => info.Length)
          .n;
The full code is shown below, together with the code for MaxItem. Or you can download it from my Project Euler page on MSDN Code Gallery.

Oh, and what difference in performance does Memoization make? 0.85 seconds to run the code on my machine compared with 4.4 seconds when I take out the call to Memoize().

[EulerProblem(14, Title = "Find the longest sequence using a starting number under one million.")]
public class Problem14
{
  public long Solve()
  {
      Func<long, long> findSequenceLength = null;
      findSequenceLength = (long n) =>
      {
          if (n == 1) { return 1; }
          var nextTerm = n.IsEven() ? n / 2 : 3 * n + 1;
          return findSequenceLength(nextTerm) + 1;
      };

      findSequenceLength = findSequenceLength.Memoize();

      return 1.To(1000000)
          .Select(n => new { n, Length = findSequenceLength(n) })
          .MaxItem(info => info.Length)
          .n;
  }
}
public static T MaxItem<T, TCompare>(this IEnumerable<T> sequence, Func<T, TCompare> comparatorSelector) where TCompare : IComparable<TCompare>
{
  T max = sequence.First();
  TCompare maxComparator = comparatorSelector(max);

  foreach (var item in sequence)
  {
      var comparator = comparatorSelector(item);

      if (comparator.CompareTo(maxComparator) > 0)
      {
          max = item;
          maxComparator = comparator;
      }
  }

  return max;

}