Showing posts with label LINQ. Show all posts
Showing posts with label LINQ. Show all posts

Thursday, 21 October 2010

Microsoft PDC 2010 Speculations

We’re now less than a week away from Microsoft PDC 2010. The session list has just been published so I think it’s time to indulge in some speculation as to this years announcements.

C# 5

For me the highlight of recent PDCs has been discovering what Anders and his sous-chefs have been cooking in the C# kitchen. Of all the changes likely to be announced, it is changes to the C# language that are likely to make the most day-to-day difference to my work. In 2005 it was LINQ, in 2008 it was dynamic programming. What will it be in 2010? Well, Anders has been given a slot, and we already have some clues as to what he might say. At his presentation on the future of C# and VB.Net at PDC 2009, Luca Bolognese trailed some of the likely features, including:

  • Compiler as a Service – the C# compiler completely rewritten in C#, and available to use programmatically, allowing all kinds of interesting meta-programming, including the ability to write refactorings.
  • Better support for immutability built into the language
  • Resumable methods.

It was this last one that interested me most. Luca showed an example involving asynchronous programming. Today we do this by calling a Begin* method, and passing in a callback that will be executed when the asynchronous invocation has finished (and we never, ever, forget to call the corresponding End* method!). This can get quite difficult to wrap your head around if you have a whole chain of methods that need to be invoked asynchronously – say in the callback of the second one you kick off another asynchronous method, passing in a callback that runs a third – and so on.

This style of programming actually has a name – Continuation Passing Style (or CPS) – and, from recent developments on Eric Lippert’s blog it seems they’re planning to add support for CPS to the C# language to make it easier to write this kind of code without blowing a fuse in your brain (in typical Eric Lippert style, he is neither confirming or denying what hypothetical features might or might not hypothetically be added to a hypothetical future version of C#, hypothetically speaking).

No doubt Anders will reveal all.

LINQ

Since it was released in 2007, LINQ has colonised .Net code bases at an incredible pace. Parallel LINQ was another great innovation in .Net 4. And Microsoft aren’t done yet. The Reactive Framework (aka Linq-to-Events) is coming on nicely. And this year Bart de Smet has a session tantalisingly entitled “Linq, Take Two: Realizing the LINQ to Everything Dream”. Linq-to-Everything sounds pretty ambitious, so I’m looking forward to hearing what he has to say.

Silverlight

It looks like we can expect some big announcements for Silverlight this year – check the session list for proof: there’s a note against Joe Stegman’s session on Silverlight saying that the abstract is embargoed until after the PDC keynote.

What would I like to see? Well, one of the biggest pain-points I’ve found in my Silverlight programming is that Continuation-Passing-Style programming is thrust upon you whenever you want to show a dialog, or make a call to the server: because Silverlight runs in a browser, you aren’t allowed to block the UI thread, so every potential blocking operation has to be done asynchronously. It would be really nice if C# 5 does simplify CPS, and if that also applied to Silverlight.

Azure

I’ve already made my PDC prediction for Windows Azure: I’m guessing that Microsoft will announce free, enthusiast-scale hosting for Web Applications in Azure. This is looking even more likely now that Amazon have announced a free entry-level version of their EC2 cloud hosting platform.

Thursday 28th October will reveal all. Stay tuned – once again my company has graciously given me time to follow along with the PDC sessions (and this year they’re being transmitted live, as well as being available for download within 24 hours). As on previous occasions, I’m hoping to find time to share with you all what I learn.

Wednesday, 4 August 2010

Practical Linq #6: Anonymous types as Parameter Dictionaries (with applications to Ado.Net and string formatting)

I recently had cause to venture out from the cosy comforts of Linq-to-Sql into the scary wilds of raw ADO.Net. But I’m pleased to tell you I found a few tools in my C# 3 backpack that made the experience more tolerable than I feared.

Take a look at this:

public bool Exists(string tableName)
{
    return ExecuteScalar<int>(
        "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @tableName",
        new { tableName }) > 0;
}

The details of the query are irrelevant. What’s interesting is the way I’m using an anonymous type to pass in its parameters. I’m sad to say I’m by no means the first to think of the anonymous-type-as-parameter-dictionary idea – ASP.Net MVC was probably the first place I saw it: but this application is particularly neat in the way the anonymous type appears to capture the local variable on behalf of the query.

So how does it work?

Simple. It uses Reflection to grab the names and values of each property on the anonymous type, and then adds parameters to a SqlCommand accordingly.

Hey. Don’t run away: it only uses Reflection the first time round. Then it uses Expression-tree magic to compile delegates for each of the property accesses so that subsequent calls cost barely anything.

Intrigued?

All aboard the Reflection Express

The crux of the matter is this method here:

private static Func<object, object> CreatePropertyGetter(PropertyInfo propertyInfo)
{
    var instanceParameter = Expression.Parameter(typeof(object), "instance");

    var expression = Expression.Lambda<Func<object, object>>(
        Expression.ConvertChecked(
            Expression.MakeMemberAccess(
                Expression.ConvertChecked(instanceParameter, propertyInfo.DeclaringType),
                propertyInfo),
            typeof(object)),
        instanceParameter);

    var compiledExpression = expression.Compile();

    return compiledExpression;
}

Given a PropertyInfo representing the property we’re interested in, this method creates a delegate that retrieves the value of the property from an instance of the type. If, for example, MyType has AProperty, then all that Expression malarkey amounts to this:

Func<object, object> compiledExpression = instance => (object)(((MyType)instance).AProperty)

(If you’ve not encountered an Expression tree before, think of it as CodeDOM on a diet. Expression trees are data structures representing a fragment of a program. The LambdaExpression is the most interesting specimen, because, as you see, it allows you to compile the expression into a delegate. And whereas CodeDOM fires up a full-blown compiler and creates an Assembly (with all the overhead that entails), LambdaExpressions are compiled into DynamicMethods using Reflection.Emit. All in all, Expression trees allow you to create little programs on the fly in a very light-weight and performant fashion.)

With that in place, we can now take a Type and create a list of assessors for all its public properties:

var type = instance.GetType();

var propertyDetails = (from propertyInfo in type.GetProperties()
                   where propertyInfo.CanRead
                         && propertyInfo.GetGetMethod().GetParameters().Length == 0
                   select new PropertyDetails
                              {
                                  Name = propertyInfo.Name,
                                  GetValue = CreatePropertyGetter(propertyInfo)
                              })
.ToList();

(Before I put lines 4 and 5 in there I was getting rather funky VerificationExceptions when I tried this on certain objects: .Net slapped my wrist and told me the “Operation could destabilize the runtime”. I apologized and sat humbly on the naughty step for a minute until I remembered that Indexers are properties too, and they take parameters, which the delegates made by CreatePropertyGetter don’t allow for.)

Since Types in .Net don’t grow new properties during the execution of a program, we only need to do this reflection once, and we can then cache the result. That’s exactly what I’ve done in the version of the code shown at the bottom of the post, where I’ve packaged all of this into a handy little extension method, GetPropertyValues.

Finally it’s just a case of running over the object, using each of the getters in turn:

IEnumerable<KeyValuePair<string, object>> EnumeratePropertyValues(object instance, IList<PropertyDetails> propertyDetails)
{
    foreach (var property in propertyDetails)
    {
        yield return new KeyValuePair<string, object>(
                                     property.Name, 
                                     property.GetValue(instance));
    }
}

Applications

Returning to my introductory Ado.Net example, here’s how you make use of this:

public TResult ExecuteScalar<TResult>(string commandText, object parametersObject)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        connection.Open();
        var command = new SqlCommand(commandText, connection);

        foreach (var propertyValue in parametersObject.GetPropertyValues())
        {
            command.Parameters.AddWithValue("@" + propertyValue.Key, propertyValue.Value);
        }

        return (TResult)command.ExecuteScalar();
    }
}

Another example. How often do you get in a muddle with string formatting, trying to keep your values matched up with the {0}s, {1}s and {2}s in the format string? Wouldn’t this be simpler:

var name = "Bob Smith";
var age = 37;
var town = "Bognor Regis";

var result = "{name} is {age} years old and lives in {town}. He likes reading {book}."
             .FormatWithParameters(new { age, name, book = “Winnie-the-Pooh”, town });

As you see FormatWithParameters doesn’t care what order the parameters arrive in.

With GetPropertyValues to hand, it’s this easy:

public static string FormatWithParameters(this string value, object parametersObject)
{
    var formattedString = value;

    foreach (var propertyValue in parametersObject.GetPropertyValues())
    {
        formattedString = formattedString.Replace(
            "{" + propertyValue.Key + "}",
            propertyValue.Value.ToString());
    }

    return formattedString;
}

Ready to Copy-and-Paste

Here’s the full code to the GetPropertyValues extension method:

public static class ObjectExtensions
{
    private static readonly Dictionary<Type, IList<PropertyDetails>> PropertyDetailsCache = new Dictionary<Type, IList<PropertyDetails>>();

    public static IEnumerable<KeyValuePair<string, object>> GetPropertyValues(this object instance)
    {
        if (instance == null)
        {
            throw new ArgumentNullException("instance");
        }

        var propertyDetails = GetPropertyDetails(instance.GetType());
        return EnumeratePropertyValues(instance, propertyDetails);
    }

    private static IEnumerable<KeyValuePair<string, object>> EnumeratePropertyValues(object instance, IList<PropertyDetails> propertyDetails)
    {
        foreach (var property in propertyDetails)
        {
            yield return new KeyValuePair<string, object>(property.Name, property.GetValue(instance));
        }
    }

    private static IList<PropertyDetails> GetPropertyDetails(Type type)
    {
        lock (PropertyDetailsCache)
        {
            if (!PropertyDetailsCache.ContainsKey(type))
            {
                var details = CreatePropertyDetailsList(type);
                PropertyDetailsCache.Add(type, details);
            }

            return PropertyDetailsCache[type];
        }
    }

    private static List<PropertyDetails> CreatePropertyDetailsList(Type type)
    {
        var propertyDetails = (from propertyInfo in type.GetProperties()
                               where propertyInfo.CanRead
                                     && propertyInfo.GetGetMethod().GetParameters().Length == 0
                               select new PropertyDetails
                                          {
                                              Name = propertyInfo.Name,
                                              GetValue = CreatePropertyGetter(propertyInfo)
                                          })
            .ToList();

        return propertyDetails;
    }

    private static Func<object, object> CreatePropertyGetter(PropertyInfo propertyInfo)
    {
        var instanceParameter = Expression.Parameter(typeof(object), "instance");

        var expression = Expression.Lambda<Func<object, object>>(
            Expression.ConvertChecked(
                Expression.MakeMemberAccess(
                    Expression.ConvertChecked(instanceParameter, propertyInfo.DeclaringType),
                    propertyInfo),
                typeof(object)),
            instanceParameter);

        var compiledExpression = expression.Compile();

        return compiledExpression;
    }

    private class PropertyDetails
    {
        public string Name { get; set; }
        public Func<object, object> GetValue { get; set; }
    }
}

Monday, 14 June 2010

Project Euler 59: Cracking an XOR code

Code breaking is about as glamorous as it gets for a Geek. So when it comes to Project Euler Problem 59, think 24, with Jack Bauer unscathed amidst a hail of bullets, memory stick in one hand, terrorist’s neck throttled in the other, threatening terrible retribution to the bad guy’s family if the secret goes to the grave with him. Terrorist rasps out,

“There’s a file … cipher1.txt … ASCII format … urrrh … XOR … 3 … letter … password … lower case … all English”

What would Chloe O’Brian make of that when she received the file over a secure socket from Jack’s Smart Phone? Being the bright girl she is, I’ve no doubt that she’d fire up MonoDevelop (only the dumb FBI would use Visual Studio and Windoze) and write the following:

var encryptedBytes = File.ReadAllText(dataPath)
                        .Split(',')
                        .Select(byte.Parse)
                        .ToArray();

That would get her the encrypted data from the file as a sequence of bytes. Now to generate all possible passwords:

const byte FirstChar = (byte)'a';
const byte LastChar = (byte)'z';

var allPossiblePasswords = from a in FirstChar.To(LastChar)
                           from b in FirstChar.To(LastChar)
                           from c in FirstChar.To(LastChar)
                           select new[] { a, b, c };

XOR encryption requires that the password be repeated cyclically, abcabcabc.... So given a sequence containing the characters of password, how about a Repeat extension method that repeats the sequence ad infinitum? (if you look at the following code and hear the Infinite Loop alarms go off, remember that iterators are lazily evaluated):

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> sequence)
{
    while (true)
    {
        foreach (var item in sequence)
        {
            yield return item;
        }
    }
}

While she’s at it, she’ll need to Xor two sequences together:

public static IEnumerable<byte> Xor(this IEnumerable<byte> sequenceA, IEnumerable<byte> sequenceB)
{
    return sequenceA.Zip(sequenceB, (left, right) => (byte)(left ^ right));
}

This uses the Zip method, new in .Net 4.0, which merges two sequences together, allowing elements from left and right sequences to be processed in pairs (if you’re surprised, as I was, that the (byte) cast is necessary, see this question on Stack Overflow).

Now she can try out all the passwords, seeing what they give her if she uses them to decrypt the text. Note that Chloe has already adopted .Net 4.0 and the PLINQ extensions, so she’s using AsParallel() to max out her 128 Core workstation. (ConvertAsciiBytesToString is another extension method that simply delegates to ASCIIEncoding.GetString).

var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                          let repeatedPassword = trialPassword.Repeat()
                          select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

But she’s hardly going to spend the next 24 hours trawling through the possible decryptions to pick out the one in recognisable English from the 17575 in gobledygook. So she needs some means of assessing them automatically. This is where Chloe is forever in my debt. She read last weeks episode of Functional Fun, so she knows about my LanguageAnalyser that uses frequency analysis to determine how likely a sample of text is to be written in a particular language.

Rather than using the analyser to determine which of several languages the sample is written in, Chloe can use her prior knowledge that the coded message is written in English, and determine which of the possible decryptions has letter frequencies that most closely match those in English.

var languageAnalyser = new LanguageAnalyser();

var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                       select new
                                                  {
                                                      DecryptedString = decryptedString,
                                                      Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                  };

var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

And that's it. Thanks to Linguistic Geometry and LINQ, Chloe can report the decrypted (and surprisingly biblical!) message to Jack, so that he can get on with saving the world.

Ahem. Before we got carried away we were trying to solve Project Euler Problem 59, and that asks for the sum of the Ascii values of the decrypted message. So we need one more step:

var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                        .ConvertStringToAsciiBytes()
                        .Aggregate(0, (runningTotal, next) => runningTotal += next);

The full code is shown below, and is also available on MSDN Code Gallery.

namespace ProjectEuler
{
    [EulerProblem(59, Title="Using a brute force attack, can you decrypt the cipher using XOR encryption?")]
    public class Problem59
    {
        public long Solve()
        {
            const byte FirstChar = 97;
            const byte LastChar = 122;

            var dataPath = @"ProblemData\Problem59.txt";

            var encryptedBytes = File.ReadAllText(dataPath)
                                    .Split(',')
                                    .Select(byte.Parse)
                                    .ToArray();

            var allPossiblePasswords = from a in FirstChar.To(LastChar)
                                       from b in FirstChar.To(LastChar)
                                       from c in FirstChar.To(LastChar)
                                       select new[] { a, b, c };

            var possibleDecryptions = from trialPassword in allPossiblePasswords.AsParallel()
                                      let repeatedPassword = trialPassword.Repeat()
                                      select encryptedBytes
                                          .Xor(repeatedPassword)
                                          .ConvertAsciiBytesToString();

            var languageAnalyser = new LanguageAnalyser();

            var decryptionsWithDistanceToEnglish = from decryptedString in possibleDecryptions
                                                   select new
                                                              {
                                                                  DecryptedString = decryptedString,
                                                                  Distance = languageAnalyser.DetermineCloseness(decryptedString, LanguageProfiles.English)
                                                              };

            var mostLikelyDecryption = decryptionsWithDistanceToEnglish.MinItem(p => p.Distance);

            var sumOfAsciiValues = mostLikelyDecryption.DecryptedString
                .ConvertStringToAsciiBytes()
                .Aggregate(0, (runningTotal, next) => runningTotal += next);

            return sumOfAsciiValues;
        }
    }
}

Thursday, 10 June 2010

Practical Linq #5: Guessing the language of a piece of text

Google Chrome is magic: getting ready for my summer holiday to Brugge, I hit the city’s homepage. A little bar popped up at the top of the page: “This page is in Dutch. Would you like to translate it?”

image

How does Chrome know that? I suppose the domain name “.be” is one clue, but they speak French as well as Dutch in Belgium, so that by itself is not enough. Chrome does the same trick for pages written in a whole raft of languages, from Afrikaans to Yiddish.

I don’t know what Google’s secret sauce is, but I can show you one simple and surprisingly effective technique with an elegant implementation using LINQ in C#.

Studies in Linguistic Geometry

It is built around the observation that in each sufficiently large sample of text in a given language, letters of the alphabet will all occur with roughly the same frequencies. In English for example, the letter ‘e’ tops the popularity charts, making up about 13% of the average body of text. ‘t’ and ‘a’ are next, accounting for 9% and 8%. By contrast, in Portuguese, ‘a’ is the most frequent, at 15%, with ‘e’ and ‘o’ close behind.

So letter frequencies can be used as a kind of signature, or DNA profile, for a language. But given a sample of text, how do we decide which language signature it matches most closely? This is where we switch from linguistics to geometry.

In 2-dimensional space, we calculate the distance between two points using Pythagoras’ theorem. image The same thing works for two points in the real, 3-d, world. The distance from (x1, y1, z1) to (x2, y2, z2) is

image

But why stop there? What about the distance between two points in 26-dimensional space? You guessed it!

image

26-d space? Why bring that into the equation? Well think about those language signatures, giving the frequency of occurrence of each letter. You can reinterpret those signatures as points in 26-d space: the point representing the signature for English would be at 0.13 along the ‘e’ axis, 0.09 along the ‘t’ axis, 0.08 along the ‘a’ axis, and so on.

This then gives us our straightforward method of determining which language our sample text belongs to. We do a frequency analysis on its letters, then imagine the result of that analysis as a point in 26-d space, and work out which language signature it lies closest to using Pythagoras. Easy!

So how does it look in LINQ?

Linq to Linguistics

We start off with the frequency analysis (taking care to normalise the text to lower-case):

IEnumerable<CharacterFrequency> CalculateCharacterFrequencies(string sample)
{
    var characterFrequencies = sample
        .Select(char.ToLower)
        .GroupBy(c => c)
        .Select(group => new CharacterFrequency
        {
            Character = group.Key,
            Frequency = (double)group.Count() / sample.Length
        });

    return characterFrequencies;
}

Next we introduce a LanguageSignature class to hold the signature of each language:

[DebuggerDisplay("Language = { Language }")]
public class LanguageSignature
{
    private IDictionary<char, double> _frequencies;

    public LanguageSignature(string language, IDictionary<char, double> characterFrequencies)
    {
        Language = language;
        _frequencies = characterFrequencies;
    }

    public string Language { get; protected set; }

    public double GetFrequency(char character)
    {
        double frequency;
        return _frequencies.TryGetValue(character, out frequency) ? frequency : 0;
    }
}

Here are some snippets of sample signatures, data courtesy of Wikipedia

public static class LanguageSignatures
{
    public static LanguageSignature English =
        new LanguageSignature(
            "English",
            new Dictionary<char, double>
                {
                    {'a', 0.08167},
                    {'b', 0.01492},
                    {'c', 0.02782},
                    /// etc.
                });

    public static LanguageSignature French =
        new LanguageSignature(
            "French",
            new Dictionary<char, double>
                {
                    {'a', 0.07636},
                    {'b', 0.00901},
                    // etc.
                });

    
}

Now we do our sums in 26-d space, ignoring any letters that we don’t have frequency data for (note that Sqrt is a little extension method I created on the Double class that simply calls Math.Sqrt):

double CalculateDistanceFromSignature(LanguageSignature signature, IEnumerable<CharacterFrequency> characterFrequencies)
{
    var distance = characterFrequencies
        .Where(characterFrequency => signature.GetFrequency(characterFrequency.Character) > 0)
        .Select(characterFrequency
            => Math.Pow(characterFrequency.Frequency - signature.GetFrequency(characterFrequency.Character), 2))
        .Sum()
        .Sqrt();
    return distance;
}

Now we’re ready to determine which language our sample is closest to:

public LanguageSignature DetermineMostLikelyLanguage(string sample, IEnumerable<LanguageSignature> signatures)
{
    var characterFrequencies = CalculateCharacterFrequencies(sample);

    var closestLanguage = signatures.Select(signature => new
    {
        Language = signature,
        Distance = CalculateDistanceFromSignature(signature, characterFrequencies)
    })
    .MinItem(languageDistance => languageDistance.Distance);

    return closestLanguage.Language;
}

MinItem is an extension method I knocked up that returns the item from a sequence that yields the smallest value from the expression you supply (in this case, the smallest distance):

public static T MinItem<T, TCompare>(this IEnumerable<T> sequence, Func<T, TCompare> comparatorSelector) where TCompare : IComparable<TCompare>
{
    var minItem = sequence.Aggregate(
        sequence.First(), 
        (current, min) => comparatorSelector(current).CompareTo(comparatorSelector(min)) < 0 ? current : min);

    return minItem;
}

And we’re done.

Testing, Testing, Un, deux, trois

To prove it, I created a little test that analysed samples of texts in several languages taken from Project Gutenberg (the test uses MbUnit’s very handy XmlData feature to read the different test cases from my xml file).

public class LanguageAnalyserTests
{
    [Test]
    [XmlData("//Sample", FilePath = "ProblemData\\LanguageSamples.xml")]
    public void TestAnalysis([Bind("Text")]string text, [Bind("@Language")]string expectedLanguage)
    {
        var analyser = new LanguageAnalyser();
        var determinedLanguage = analyser.DetermineMostLikelyLanguage(
            text
            new[] { LanguageSignatures.English, LanguageSignatures.French, LanguageSignatures.German, LanguageSignatures.Dutch, LanguageSignatures.Spanish, LanguageSignatures.Italian });

        Assert.AreEqual(expectedLanguage, determinedLanguage.Language);
    }
}

In my test I’ve got 9 samples in German, English, French, Dutch, Spanish and Italian, extracted at random from the Gutenberg archives and each sample is correctly identified by my analyser. I also did a little experiment to see how much text it needed to correctly distinguish the samples. Limiting the samples to 600 characters caused some confusion between Spanish and English, but 700 characters was sufficient in my tests to enable the analyser to pick the right language for all samples.

A Challenge

I actually developed this LanguageAnalyser to help me solve a problem on Project Euler. Which one was it? Answers in the comments please, or by email, and bonus marks to anybody who can guess what my solution looks like. Check back Monday for the answer, and for full source code to today’s post.

Update: The answer's here, and the source code is now on MSDN Code Gallery

Wednesday, 21 April 2010

Quick C# Tip: You can use Collection Initializers without creating a Collection

Thanks to some code I saw on Ayende's blog, I’ve just learnt something about C# 3.0 that I never knew before: you don’t have to be creating a brand new collection in order to use a collection initializer.

Here’s what I mean. Collection initializers let you populate a collection without the noise of repeated “Add(…)” calls. Like this:

var myStrings = new List<string> { "A", "B", "C" };

Or if you are fleshing out a collection that is part of another object, you can do this

var instance = new MyType 
               { 
                    MyStrings = new List<string> { "A", "B", "C" }
               };

What I have only just realised is that if the collection property you are assigning to is read only or already has a value, you can still use a collection initializer - you just miss out the construction of the collection type:

var instance = new MyType { MyStrings = { "A", "B", "C" } };

When I first saw this, I didn’t think it was real C# until I checked it myself in LinqPad. But it’s right there in the C#3.0 Specification.

Thanks Ayende!

Saturday, 23 January 2010

Simulating Snakes and Ladders with a PLINQ Turbo boost

“Daddy, will you play a game with me?”

I’d been left in charge, whilst my wife had some well-earned time to herself. Baby brother had been bounced to sleep, and now big sister was wanting some attention.

“Sure! What shall we play?”

My daughter fished around in the cupboard, then thumped a box on the table. My heart sank. “Snakes and Ladders”, she announced.

Snakes and Ladders

For the benefit of any reader who has the good fortune never to have thus occupied their children, Snakes and Ladders is a game played on 10 x 10 board wherein players take it in turns to move across the board boustrophedonically (that is, left to right then right to left – I’ve been itching to work that word into a blog post), on each go moving the number of spaces indicated by the throw of a die. The winner is the first person to reach the last space on the board. The only modicum of excitement to be found in the game is provided by the eponymous snakes and ladders, which connect certain squares on the board. Land on a square at the foot of a ladder, and you zoom ahead to the square at its top. But beware the squares where snakes lie in wait: land here, and you must slide back down the board to the snake’s tail.

You see then, why I wasn’t thrilled at my daughter’s choice: I don’t expect much from a children’s game, but it is nice to be sure that it will finish before one’s brain goes into power-save mode. And no mathematician would be prepared to give such a guarantee about this game: it is perfectly possible to find yourself within a throw of winning, but the next moment unceremoniously snaking your way back to the beginning, to start all over again.

So as I fixed my grin in place and started rolling the die, I got to wondering: how long till we can play something else? How many turns does a game of Snakes and Ladders last, on average? By the time I’d slid down my first snake, I knew how I could find out: I would set my computer playing snakes and ladders, over and over, and instruct it to count how many turns each game took. Apply a little high-school Maths and Stats to the results, and I’d have my answer.

I will confess the game board received little of my attention from then on, as code began to assemble in the editor in my mind (which has syntax highlighting, but is lacking Intellisense - CommonSense too, my wife would tell you). When I realised that I could use PLINQ to parallelise the code and speed up the simulation, my fingers began itching to get at the keyboard.

Here’s how the code turned out, once I fired up Visual Studio.

Modelling the Game

First, a fluent interface for defining the game board:

var board = new BoardBuilder()
            .Length(100)
            .Ladder().From(1).To(38)
            .Ladder().From(4).To(14)
            .Ladder().From(9).To(31)
            .Snake().From(16).To(6)
            .Ladder().From(21).To(42)
            .Ladder().From(28).To(84)
            .Ladder().From(36).To(44)
            .Snake().From(47).To(26)
            .Snake().From(49).To(11)
   //...
   .Build()

Then the game logic. The GameBoard built by the BoardBuilder consists of a number of Places. Each Place has a reference to the Place following it on the board, and if it is at the foot of a ladder or the head of a snake then it has a reference to the Place at the other end of the link. Most of the logic of the game simulation is contained in the MoveOn method. This method is defined recursively. It is passed the numberOfPlaces that the player should move, and it returns a reference to the Place they end up at, taking into account any snakes or ladders leaving the final square. For simplicity, I’ve decided that players cannot move beyond the last space on the board – they simply get stuck there, whatever they throw.

    class Place
    {
        private Place _ladderTo;
        private Place _chuteTo;
        private Place _nextPlace;

  // ...

        public Place MoveOn(int numberOfPlaces)
        {
            Contract.Requires(numberOfPlaces >= 0, "numberOfPlaces must be positive");

            if (numberOfPlaces > 0)
            {
                if (IsLastSpace)
                {
                    return this;
                }
                else
                {
                    return _nextPlace.MoveOn(numberOfPlaces - 1);
                }
            }

            if (_ladderTo != null)
            {
                return _ladderTo;
            }
            else if (_chuteTo != null)
            {
                return _chuteTo;
            }
            else
            {
                return this;
            }
        }

        // ...
    }
}

Running the Simulation

At last, the simulation:

var gameLengths
    = 1.To(numberOfTrials)
    .Select(trial =>
        {
            var die = new Die();
            return board.
           FirstPlace
           .Unfold(place => place.MoveOn(die.Roll()))
           .TakeWhile(place => !place.IsLastPlace)
           .Count();
        })
    .ToList();

Not much to look at, is it? I start by generating a simple sequence, 1.To(numberOfTrials), and for each item in that sequence I instruct the computer to play a LINQified game of Snakes and ladders, and to count how many moves it takes to get to the end of the board. Each game starts by creating a new Die (would be handy if we could do that in real life – we’re always losing ours). Then, using the Unfold method, seeded with the first place on the board, we generate a sequence of all the spaces landed on during that game. We pull items out of this sequence until we hit the last place on the board, at which point we count how many moves the game took.

Analysing the Results

Of course, we want to do some analysis on the results. LINQ makes it trivial to find the average length of a game:

var averageLength = games.Average();

This turns out to be 35.8 moves.

More interesting is the frequency distribution of game lengths which tells us how often we can expect to play games of a given length.

var frequencyDistribution =
    gameLengths
   .GroupBy(gameLength => gameLength)
   .OrderBy(gameLengthGroup => gameLengthGroup.Key)
   .Select(gameLengthGroup => gameLengthGroup.Key + "," + gameLengthGroup.Count() + "," + gameLengthGroup.Count() / (double)numberOfTrials)
   .ToList();
System.IO.File.WriteAllLines("SnakesAndLadderResult.csv", frequencyDistribution);

gameLengths is the sequence containing the length of each simulated game. This query is counting how many games of each length were encountered in the simulation by grouping together all games of the same length. After ordering the groups, shortest game length first, it creates a string for each group, listing the length, the absolute number of games of that length and the frequency of that length of game. File.WriteAllLines is a neat method that (from .Net 4.0 onwards) takes an IEnumerable<string> and writes each string as a new line to a file, giving us an instant CSV file from which Excel can draw a pretty chart: SnakesAndLadders_FrequenciesOfGameLengths

As you can see at a glance, the shortest game I can hope for takes 7 moves, but that’s pretty unlikely to happen – about once in a thousand games. The most common length of game encountered in the simulation was 19. Now let’s transform this into a cumulative frequency chart (which shows how often games take a given number of moves or fewer): imageThis shows that in at least half of all games I’ll need to throw the dice 28 times or more before getting to the end. Worse, there’s a 10% chance that it’ll be 68 moves or more before I beat those slippery snakes.

The PLINQ Part

Where does PLINQ come into all this? And what is PLINQ anyway? PLINQ is an extension to LINQ, part of the Parallel Tasks library that is shipping as part of .Net 4.0. You know how you splashed out for that quad-core, hyper-threading enabled processor and geeked out over all those little processor utilisation charts in Task Manager – then noticed that your CPU rarely hits more than 12.5% total utilisation? Well PLINQ helps you put those lazy cores to work. MultiCoreTaskManagerWatch closely:

var gamesLengths
    = 1.To(numberOfTrials)
    .AsParallel()
    .Select(trial =>
        {
            var die = new Die();
            return board.
        FirstPlace
        .Unfold(place => place.MoveOn(die.Roll()))
        .TakeWhile(place => !place.IsLastPlace)
        .Count();
        })
    .ToList();

Spot it? Check out line 3. AsParallel(), that’s all you need to add. It takes a standard Linq-to-Objects query, and splits the work across multiple threads. On my bog-standard dual-core home laptop, that change took the time to run 1 million iterations down from 58 seconds to 36. That’s a 1.5x speed-up – by changing one line of code! And the best thing is, the speed-up increases as the number of cores in your machine goes up. On my Core i7 Quad core box at work, the simulation took 7 seconds in its parallelised form – and 9 seconds without the AsParallel magic! Still a 1.3x speed up – but overshadowed by the sheer awesomeness of the Nehalem architecture!

I should point out one subtlety here. You might have been wondering why it was necessary to create a new Die for every game? Why not share a die between games, like you do in the real world? The reason is the multiple threads introduced by the AsParallel() call. The Roll() method on Die calls through to Random.Next() under the covers, and the Next method isn’t thread-safe.

The easiest fix then, is to give each game its own Die. But there’s another thing: with all those threads beavering away simultaneously there’s a good chance that several Die will be created at the same time, each initializing their own instance of Random at the same time. Since Random by default seeds itself from the system clock, this would result in identical games being played – not exactly representative of the real world.

So what I do is have a static instance of Random which I use to seed all the others, taking care to take a lock on it of course:

class Die
{
    private Random _random;
    private static Random _seedGenerator = new Random();

    public Die()
    {
        lock (_seedGenerator)
        {
            _random = new Random(_seedGenerator.Value.Next());
        }
    }

    public int Roll()
    {
        return _random.Next(1, 7);
    }
}

Try it for yourself

As always the complete source code is available for download from MSDN Code Gallery. Try it out, and let me know what you think.

Thursday, 22 October 2009

Getting the MethodInfo of a generic method using Lambda expressions

Getting hold of the MethodInfo of a generic method via Reflection (so you can invoke it dynamically, for example) can be a bit of a pain. So this afternoon I formulated a pain-killer, SymbolExtensions.GetMethodInfo. It’s not fussy: it works for non-generic methods too. You use it like this:

internal class SymbolExtensionsTests
{
    [Test]
    public void GetMethodInfo_should_return_method_info()
    {
        var methodInfo = SymbolExtensions.GetMethodInfo<TestClass>(c => c.AMethod());
        methodInfo.Name.ShouldEqual("AMethod");
    }

    [Test]
    public void GetMethodInfo_should_return_method_info_for_generic_method()
    {
        var methodInfo = SymbolExtensions.GetMethodInfo<TestClass>(c => c.AGenericMethod(default(int)));

        methodInfo.Name.ShouldEqual("AGenericMethod");
        methodInfo.GetParameters().First().ParameterType.ShouldEqual(typeof(int));
    }

    [Test]
    public void GetMethodInfo_should_return_method_info_for_static_method_on_static_class()
    {
        var methodInfo = SymbolExtensions.GetMethodInfo(() => StaticTestClass.StaticTestMethod());

        methodInfo.Name.ShouldEqual("StaticTestMethod");
        methodInfo.IsStatic.ShouldBeTrue();
    }
}

The active ingredient, as you can see, is Lambda expressions:

using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System;

public static class SymbolExtensions
{
    /// <summary>
    /// Given a lambda expression that calls a method, returns the method info.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="expression">The expression.</param>
    /// <returns></returns>
    public static MethodInfo GetMethodInfo(Expression<Action> expression)
    {
        return GetMethodInfo((LambdaExpression)expression);
    }

    /// <summary>
    /// Given a lambda expression that calls a method, returns the method info.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="expression">The expression.</param>
    /// <returns></returns>
    public static MethodInfo GetMethodInfo<T>(Expression<Action<T>> expression)
    {
        return GetMethodInfo((LambdaExpression)expression);
    }

    /// <summary>
    /// Given a lambda expression that calls a method, returns the method info.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="expression">The expression.</param>
    /// <returns></returns>
    public static MethodInfo GetMethodInfo<T, TResult>(Expression<Func<T, TResult>> expression)
    {
        return GetMethodInfo((LambdaExpression)expression);
    }

    /// <summary>
    /// Given a lambda expression that calls a method, returns the method info.
    /// </summary>
    /// <param name="expression">The expression.</param>
    /// <returns></returns>
    public static MethodInfo GetMethodInfo(LambdaExpression expression)
    {
        MethodCallExpression outermostExpression = expression.Body as MethodCallExpression;

        if (outermostExpression == null)
        {
            throw new ArgumentException("Invalid Expression. Expression should consist of a Method call only.");
        }

        return outermostExpression.Method;
    }
}

Saturday, 10 October 2009

Practical LINQ #4: Finding a descendant in a tree

Trees are everywhere. I don’t mean the green, woody variety. I mean the peculiar ones invented by programmers that have their root somewhere in the clouds and branch downwards. As inevitably as apples fall from trees on the heads of meditating physicists, so coders find themselves writing code to traverse tree structures.

In an earlier post we explored the tree created by the UI Automation API representing all the widgets on the Windows desktop. One very common tree-traversal operation is finding a particular child several nodes down from a root starting point. I gave the example of locating the Paint.Net drawing canvas within the main window of the application:

// find the Paint.Net drawing Canvas
 var canvas = mainWindow.FindDescendentByIdPath(new[] {
     "appWorkspace",
       "workspacePanel",
         "DocumentView",
           "panel",
             "surfaceBox" });

What we have here is an extension method, FindDescendentByIdPath, that starts from a parent element and works its way down through the tree, at each level picking out the child with the given Automation Id. In my last post, I skipped over my implementation of this method, but it deserves a closer look because of the way it uses Functional techniques and LINQ to traverse the hierarchy.

So here it is:

public static AutomationElement FindDescendentByIdPath(this AutomationElement element, IEnumerable<string> idPath)
{
    var conditionPath = CreateConditionPathForPropertyValues(AutomationElement.AutomationIdProperty, idPath.Cast<object>());

    return FindDescendentByConditionPath(element, conditionPath);
}

public static IEnumerable<Condition> CreateConditionPathForPropertyValues(AutomationProperty property, IEnumerable<object> values)
{
    var conditions = values.Select(value => new PropertyCondition(property, value));

    return conditions.Cast<Condition>();
}

public static AutomationElement FindDescendentByConditionPath(this AutomationElement element, IEnumerable<Condition> conditionPath)
{
    if (!conditionPath.Any())
    {
        return element;
    }

    var result = conditionPath.Aggregate(
        element,
        (parentElement, nextCondition) => parentElement == null
                                              ? null
                                              : parentElement.FindChildByCondition(nextCondition));

    return result;
}

The first thing we do is convert the sequence of Automation Id strings into a sequence of Conditions (actually PropertyConditions) that the UI Automation API can use to pick out children of parent elements. This is handled for us by the CreateConditionPathForPropertyValues method in line 8.

The actual method of interest is FindDescendentByConditionPath, starting in line 15. Here we put the Enumerable.Aggregate method to a slightly unconventional use. Considered in the abstract, the Aggregate method takes elements from a sequence one by one and performs an function (of your choice) to combine the element with the previous result; the very first element is combined with a seed value.

In this case the seed value is the parent element at the top of tree, the sequence of elements is the list of conditions that we use to pick out the child at each level of the tree. And we provide a lambda function that, each time it is called, takes the element it found in the previous iteration, together with the next condition from the list and uses an extension method that I demonstrated in the earlier blog post, FindChildByCondition, to find the appropriate child.

I’ve found this method a great help when monkeying around in UI Automation trees. If you think you might, look for it in the the source code from last time.

Friday, 1 May 2009

Practical LINQ #3: Compacting ranges of numbers

Here’s a problem that has popped up several times in one guise or another during the six years of my career:

Given a list of numbers, present it to the user in its most compact form.

So if you have 1, 3, 4, 5, 9, 11, 12, 13 it should be presented as 1, 3 – 5, 9, 11 – 13. It’s what your brain does automatically when you fill out the “Selected Pages” box in the Print dialog of Microsoft Word.

I’m sure you can imagine for yourself the kind of for-loop I crafted the last time I solved this problem. This time I wanted to make life more interesting.

The first thing is to make sure the numbers are marshalled in ascending order, with no duplicates in the ranks:

var preparedList = numbers.Distinct().OrderBy(x => x);

For the next step, I imagined into being a little Scanner class, with a nice fluent interface. I wrote down the following code, which was how I wanted the Scanner to work; then, with Resharper pointing out my omissions in glaring red text, I implemented the concepts to make it work:

public IEnumerable<Range> Compact(IEnumerable<int> numbers)
{
    var preparedList = numbers.Distinct().OrderBy(x => x);

    var scanner = new Scanner<int, Range>();

    scanner.ConfigurePattern()
        .StartMatchingWhenAny()
        .ContinueMatchingWhile((nextItem, matchedItems) => nextItem == matchedItems.Last() + 1)
        .Output(details => new Range(
                                    details.MatchedItems.First(),
                                    details.MatchedItems.Last()));

    return scanner.Scan(preparedList);
}

The idea is for the Scanner to walk a sequence of objects (ints in this case), producing from it a sequence of tokens (here, Range objects – simple structs with First and Last properties). The Scanner produces its output by matching patterns of contiguous input elements. The patterns are configured using a fluent interface:

  • the condition that must be met in order to start matching this pattern (any integer is accepted at the start of this simple pattern)
  • which items are considered part of the pattern: here an integer is accepted if it is only one bigger than the previous matched integer
  • how to generate a token from the matched part of the sequence

The internal workings of the scanner aren’t terribly interesting. See the download link below if you’re desperate for a look.

Now don’t laugh: String.Join is my discovery of the week. It concatenates an array of strings, putting a separator of your choice between them; leaving you free to fry bigger fish than how to put a comma after every item except the last.

So my final answer looks like this:

var numbers = new[] {1, 3, 4, 5, 9, 11, 12, 13};

var compacter = new RangeCompacter();
var ranges = compacter.Compact(numbers);

// create a pretty string, each range or number separated by a comma
var display = string.Join(
    ", ", 
    ranges
    .Select(range => range.ToString())
    .ToArray());

Assert.AreEqual("1, 3 - 5, 9, 11 - 13", display);

I’ve put all the code, including my Scanner class, on Code Gallery; there’s even a sprinkling of unit tests! If you find any other uses for it, do let me know.

Friday, 27 March 2009

Project Euler 21: Getting friendly with the Amicable numbers

I’ve remarked before that to Mathematicians, numbers aren’t, … well, they’re not just numbers. Numbers have personalities. Whilst some are deficient, there are others which are aspiring, and a few who are actually perfect;  half of them are rather odd, some downright weird. Of more concern are the odious and evil numbers, especially those that consider themselves untouchable1.

But today we’ll look at a more friendly subject: the amicable numbers. What is an amicable number? Let’s get mathematical and define a function d(n) to be the sum of all proper divisors of n (a proper divisor of n is any divisor of n less than n). Then an amicable number a is one which has a friend b where d(a) = b and d(b) = a (note that we don’t allow a to be narcissistic and count itself as a friend). If you do the math (or these days, more realistically, the google search) you’ll find that the first pair of amicable numbers is 220 and 284.

Amicable numbers have been studied since Greek times: the Pythagoreans in particular credited these special numbers with mystical powers. They’ve even by tried out as a kind of mathematical aphrodisiac:

El Madshriti, an Arab of the 11th century, experimented with the erotic effects of amicable numbers by giving a beautiful woman the smaller number 220 to eat in the form of a cookie, and himself eating the larger 284!

My sudden interest in amicable numbers was driven by strictly mathematical desire  - to solve Project Euler 212. The problem simply asks us to find the sum of all amicable numbers less than 10000. And in fact, the solution to the problem pretty much falls out of its definition.

First, I’ll borrow a function from a few problems back that lists all the divisors of a number:

public static class NumericExtensions
{
    public static IEnumerable<int> Divisors(this int number)
    {
        return (from factor in 1.To((int)Math.Sqrt(number))
                where number.IsDivisibleBy(factor)
                select new [] { factor, number / factor }).Concat();
    }

}

With that, the answer is straightforward:

[EulerProblem(21, Title = "Evaluate the sum of all amicable pairs under 10000.")]
public class Problem21
{
    public long Solve()
    {
        Func<int, int> d = n => n.Divisors().Sum() - n;

        return 	(
				from a in 1.To(10000)
                let b = d(a)
                where a != b && d(b) == a
                select a
				)
                .Sum();
    }
}

As always, full code is available on the Project Euler Code Gallery page.

Footnotes
  1. If you want to learn more about number personalities, see this catalogue.
  2. Thanks to Bela Istok who sent me a nice functional solution to the problem, which prompted me to think about how I would solve it myself.

Wednesday, 4 March 2009

Practical LINQ #2: Reporting duplicate names

So my Widget editor can helpfully suggest default names, but it also lets users change the names of widgets. Ohh! Danger ahead: what if they give two (or more) widgets the same name? We can’t have that! The poor dears might confuse themselves. So we better make sure we warn them; that means we have to write code to detect duplicate names. And that gives me another opportunity to show some elegant LINQy C# for solving this real-life problem.

My requirements are simple: I want to generate an error for all widgets with a shared name, except for the first one with that name – no point in overburdening a user with error messages. I also want to ignore Widgets without a name – I’ve got another rule that deals with them.

The code is straightforward: group the widgets by name, and ignore any groups containing just one widget. Then for each group generate error messages for all items except the first. Translating that into proper English:

private static IEnumerable<TMessage> DetectDuplicates<T,TMessage>(IEnumerable<T> instances, Func<T, string> nameSelector, Func<T, TMessage> messageGenerator)
{
   var groupedDuplicates = instances
       .GroupBy(nameSelector)
       .Where(group => group.Count() > 1 && !string.IsNullOrEmpty(group.Key));

   var errors = groupedDuplicates
       .SelectMany(group => group.
                                Skip(1)
                                .Select(messageGenerator));

   return errors;
}

All the angle brackets in the method signature can mean only one thing: I’ve made the method generic, so it doesn’t just work with my Widgets; it will work with anything that has a property vaguely resembling a name. Along with your list of instances, you pass in a function that can extract a name from each of your objects, and a function that will generate the appropriate message for each duplicate. In fact it doesn’t just have to a message: you could return a rich error object if you wanted to be really sophisticated.

Here’s an example1:

static void Main()
{
   var widgets = new[]
                     {
                         new Widget {Name = "Super"},
                         new Widget {Name = "Competent"},
                         new Widget {Name = "Competent"}
                     };

   var errors = DetectDuplicates(
       widgets,
       w => w.Name,
       w => "There's already a widget called " + w.Name + ". Be more original!");

   foreach (var error in errors)
   {
       Console.WriteLine(error);
   }

   Console.ReadLine();
}

Footnotes

  1. The name of one of the widgets here was inspired by a brand of AEG oven that I spotted the other day: they named them “Competence”. I thought brand names were supposed to be aspirational!

Thursday, 19 February 2009

Practical LINQ: Generating the next default name

Pleasurable as it is, constructing LINQ queries to solve abstruse Mathematical problems, a developer’s real purpose in life, as his boss reminds him often enough, is creating business value. Happily I’ve found that that can be done using LINQ too.

Here’s an example that came up recently.

I was writing a Widget editor dialog box with functionality to create new Widgets. Being the helpful soul I am, I wanted to give the new widgets default names, “Widget1”, “Widget2” and so on (I can be helpful or imaginative, but not both!). But the names also needed to be unique: if “Widget1” and “Widget2” were already in the list, the next new widget should be called “Widget3”.

This is the method I came up with:

public static class NamingExtensions
{
    public static string GetNextDefaultName(this IEnumerable<string> existingNames, string defaultNamePrefix)
    {
        if (existingNames == null)
        {
            throw new ArgumentNullException("existingNames");
        }

        defaultNamePrefix = defaultNamePrefix ?? string.Empty;
        string matchNumberInDefaultNamePattern = "^" + defaultNamePrefix + @"(\d+)$";

        const RegexOptions regexOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        int nextNumber = existingNames.Select(name => Regex.Match(name, matchNumberInDefaultNamePattern, regexOptions) )
            .Where(match => match.Success)
            .Select(match => match.Groups[1].Value)
            .Select(matchValue => int.Parse(matchValue, CultureInfo.InvariantCulture))
            .Aggregate(0, Math.Max, max => max + 1);

        return defaultNamePrefix + nextNumber.ToString(CultureInfo.InvariantCulture);
    }
}

Since I wrote it as an extension method, I can use it like this:

var existingNames = new[] { "Widget1", "Widget2" };
var nextName = existingNames.GetNextDefaultName("Widget");

You might wonder why, in line 19, I use Aggregate instead of a straightforward Max()? That’s because Max() throws an exception if you try using it on an empty sequence, which might happen if none of the existing names match the “Widget[n]” pattern. Also, Aggregate has the nice overload that allows you to apply a result selector function to the final aggregate: as you see, I take advantage of this to return the next number in the sequence, rather than the max itself.

Isn’t LINQ lovely?

Friday, 6 February 2009

functional => prizes #1: We have a winner

TrophyThe closing date of the the inaugural Functional Fun programming competition has passed, and I’m relieved to say I actually had some judging to do. Further to my post on Monday, two bits are now required to hold the number of submissions I have to choose between - though not used to full capacity.

The Winner

Our winner is Jamie, a High School Student from Massachusetts, whose code surpasses in elegance my attempt at solving my own problem. Jamie gets the prize of £25 in Amazon gift vouchers. Honourable mention goes to Dan Dumitru who was the first to submit a solution.

Congratulations to Jamie, and thanks to all 10 of you who entered!

The Solution

The puzzle that Jamie and Dan solved was as follows;

Which Mathematicians have found themselves prime positions in the grid below?

HiddenMathematiciansImage

So who were the coy Mathematicians, and how did they conceal themselves? Well, given the maths connection, “prime positions” must surely have something to do with prime numbers? So what happens if we count along the cells of the grid (wrapping over from the end of one row to the beginning of the following row) and note down the characters we find at indices which are prime-numbered?

Interesting! We get the string

tinyurl#teyuir#gqrtyh#hyirtq##

Teyuir? What theorem did he prove? must be Slovakian with a name like that. Now tinyurl – that rings a bell. Hey! I wonder what tinyurl.com/ teyuir looks like?

WikipediaRestApiResult

Huh! XML?

But wait: Leonhard Euler. There’s a name I recognise. Didn’t he dabble in mathematics, and prove a lemma or two?

And having reasoned thus, all that remains is to craft a handful of LINQ queries to winkle all three mathematicians out of the hiding places: Euler, Euclid and Cantor.

Here’s Jamie’s solution in its entirety. The code speaks for itself.

// Gets all the nonwhitespace characters
var rawChars = from ch in File.ReadAllText("HiddenMathematicians.txt") 
               where !char.IsWhiteSpace(ch)
               select ch;

// Gets only the chars at prime indices (char[0] is at index 1)
var primeChars = from num in Enumerable.Range(2, rawChars.Count() - 1) // creates all indices from 1 to numRawChars
                 let divisors = Enumerable.Range(2, num - 2) // all possible divisors, 2 to num - 1
                 let isPrime = !divisors.Any(div => num % div == 0)
                 where isPrime
                 select rawChars.ElementAt(num - 1);

// All the words, including the beginning hint
var words = from str in primeChars.Aggregate(string.Empty, (str, ch) => str += ch).Split('#')
            where !string.IsNullOrEmpty(str)
            select str;

var names = from word in words.Skip(1)
            let url = "https://" + words.First() + ".com/" + word
            let xmlData = XDocument.Load(url)
            let element = xmlData.XPathSelectElement("./api/query/pages/page")
            let attribute = element.Attribute("title")
            select attribute.Value;

foreach (var name in names)
    Console.WriteLine(name);

Console.ReadLine();

Thursday, 16 October 2008

Project Euler Code Clearance

Roll up! Roll up! Get them while they're hot, don't leave them 'till they're not. Get your genuine, authentic Functional Fun Project Euler solutions here. Bargain prices, can't beat 'em. Visual Studio solution thrown in free.

I've been pretty busy over the last week or so, and it looks like I'll continue to be busy until I depart for LA at the end of next week. And you know what that means: .Net 4, C# 4, Oslo, Cloud Computing, Windows 7. That gang of acronyms and code names should keep me in blog posts for a good long while. Meanwhile, I have several Project Euler solutions languishing in the basement of my hard-disk (fourth platter, third sector, last block on the right, I think), and I can't see myself getting the time to do them justice in individual blog posts.

So today I'm going to do them an injustice and launch them into the world with barely a scant mention each. As usual, you can get the code from MSDN Code Gallery.

Problem 20: Summing the digits of a Factorial

Problem 20 gave me a chance to reuse my code for summing large numbers again. That's because calculating a factorial can be done iteratively. Imagine the sequence 1!, 2!, 3!, etc. You get the (n + 1)th term by multiplying the nth term by itself n + 1 times (that's just the definition of factorial). And the multiplication boils down to adding up n + 1 copies of the nth term. Wrap that simple idea inside an application of Unfold, and you have the solution.

Problem 22: Number crunching names

Problem 22 is just a file processing problem. Nothing more to it than that. It has a nice, one line solution with LINQ though:

return File.ReadAllText(problemDataFileName)
           .Split(new[] {','})
           .Select(name => name.Trim(new[] {'\"'}))
           .OrderBy(name => name)
           .Select((name, index) => (index + 1) * name.ToCharArray()
													  .Select(letter => (letter - 'A') +1)
                                                      .Sum())
           .Sum();

Problem 25: Big Fibonacci numbers

In Problem 25, Euler wants to know the first term in the Fibonacci sequence that has 1000 digits, which is of course his way of getting us to find a way of computing with big numbers. No problem to us though: we've got the Unfold method to calculate the Fibonacci sequence - and did I mention the code I wrote before that we can use to sum the big numbers we'll find?

Problem 31: Counting Coin Combinations

I'm quite pleased with my solution to Problem 31, wherein Euler asks us to count the number of ways in which £2 can be given using the British coinage system. I came up with a recursive algorithm that starts with the value of change you wish to give and a list of available coin types. Then it removes the biggest coin from the list, works out how many times that coin could be used, and for each possibility calculates the number of combinations for the reduced amount, and the reduced coin set by calling itself recursively.

For example, if we needed to make £1 using 10p and 2p, and 1p coins, we'd start by seeing that we could use between zero and ten 10p coins, so there are eleven possibilities we need to investigate. For each possibility we calculate how much remains once we've put down the appropriate number of 10 pences, then use the same algorithm again, but considering just 2p and 1p coins.

Problem 36: Binary and Decimal Palindromes

We last encountered numeric palindromes in Problem 4: they're numbers that read the same in both directions. In Problem 4 we were only interested in numbers that are palindromic when written in base 10. Problem 36 asks us to count the numbers that are palindromic when written in binary and in decimal. The most interesting part about this problem was finding a number's representation in binary. I could probably have used the BitArray class to do this, but instead chose to use this LINQ query:

private static IEnumerable<bool> GetNumberBits(uint number)
{
    if (number == 0)
    {
        return new[] {false};
    }

    // iterate through the bit positions, checking whether that
    // bit of the the number is set;
    // do it in reverse, so that we can ignore leading zeros
    return 31.To(0)
        .Select(bit => (number & (1 << bit)) == (1 << bit))
        .SkipWhile(bit => !bit);
}

Problem 112: Finding the density of Bouncy numbers

Mathematicians don't try to hide their obsession with numbers, do they? They make it plain as day that they count numbers amongst their closest friends, by giving them all names. It's the Bouncy numbers that are the heroes of Problem 112. These are numbers that, considering their digits as a sequence have a neither increasing nor decreasing sequence. My solution to this problem puts the Aggregate and LazilyAggregate methods to a rather unconventional use.

Friday, 5 September 2008

Project Euler Problem 19: A Century of Sundays

There's a hard way and an easy way to solve Problem 19 in our ongoing struggle with Project Euler, where we're asked to count how many Sundays fell on the first of the month during the twentieth century - being software developers and thus virtuously lazy by nature we wouldn't consider the tedious way, getting out all those old diaries (wherein entries invariably finish around January 5th) and actually counting whilst flicking over the pages.

The hard way is to use the information given in the problem definition and devise an algorithm for calculating days of the week that each date falls on. The easy way uses with gratitude the brainpower the .Net Framework developers invested in their work. It's Friday: I'll take the easy way.

Thanks to C#3.0 and LINQ, the actual code to the solution is a one-expressioner:

[EulerProblem(19, Title = "How many Sundays fell on the first of the month during the twentieth century?")]
public class Problem19
{
    public long Solve()
    {
        DateTime endDate = new DateTime(2000, 12, 31);
        DateTime startDate = new DateTime(1901, 1, 1);

        return startDate
            .Unfold(date => date.AddMonths(1))
            .TakeWhile(date => date <= endDate)
            .Count(date => date.DayOfWeek == DayOfWeek.Sunday);
    }
}

If it's Friday for you as well then I'll let you read the following explanatory notes: otherwise, work it out for yourself!

Remember my Unfold extension method? The one that takes a seed value (startDate in this case) and generates a sequence by repeatedly applying an expression to previous value in the sequence to create the next. That combined with the TakeWhile method represent the functional-programming equivalent of the for loop. The Count overload that I'm using here counts every item in a sequence matching a particular criterion. So the code I have above is equivalent to this:

DateTime endDate = new DateTime(2000, 12, 31);
DateTime startDate = new DateTime(1901, 1, 1);

int count = 0;
for (DateTime date = startDate; date < endDate; date = date.AddMonths(1))
{
    if (date.DayOfWeek == DayOfWeek.Sunday)
    {
        count++;
    } 
}

return count;

Thursday, 28 August 2008

Project Euler Problems 18 and 67: Finding the Maximal Route through a Triangle

If this doesn't pique the interest of my fellow developers then I don't know what will: a problem that will take twenty billion years to solve (at an extremely optimistic estimate), unless we find an efficient algorithm. The problem in question is Project Euler Problem 67. We're given a triangle made up of one hundred rows of numbers, and asked to consider all routes from the top to the bottom, made by moving from one row to the next via adjacent cells. Over all such routes, we have to find the maximum sum. Can't picture it? Let me help:

TriangleRoute

For the route that I've drawn, the sum is 5 + 15 + 29 + 45 = [train your brain and work it out yourself!].

If we only wanted to tackle the baby brother of this problem (Problem 18 - the same problem for a triangle with just 15 rows), we could get away with checking each of the routes. But it's not feasible to do that here because there are 299 possibilities: hence the need for an efficient algorithm. Any ideas?

You could take a bath and wait for the Eureka moment. Or if you want to avoid the ensuing streaking, read on.

An algorithm

Suppose that for each cell in a row we know the maximum sum over all routes ending at that cell. From that it's easy to work out the same for each of the cells in the next row. Here's how you do it. If you've got a cell with value c, somewhere in the middle of the row, then you can only reach it from either of the two cells adjacent to it in the row above. 

CalculatingCellMaxima

We're assuming that we already know the maximum sum for routes ending at these cells: suppose it is a and b respectively. So the biggest possible sum we can get for routes ending at c is Max(c + a, c + b). For the two cells at either end of the row the calculation is even simpler: they can only be reached from the cell at the appropriate end of the row above them, so we just sum the cell value with the known maximum sum to the cell above. In this way, we can crunch through the rows, 'flattening' the triangle until we reach the last row, at which point we will know the maximum possible sum over all routes ending at each of the cells in that row.

That's the middle of an algorithm. To solve the overall problem we just need to add a beginning and an end. The middle part starts by assuming that we already know the maximum sum for all routes ending at a particular row. So to get the algorithm rolling we need a row for which we do have that information, and the first row of the triangle is the obvious candidate: all routes start there, so the maximum up to that point is just the value of the cell in that row. How about the end of the algorithm? To solve the problem, we need to know the maximum sum from top to bottom through the triangle; we've calculated the maximum sum ending at each of the cells in the last row, so we just need to take the maximum over all these cells.

Some code

To code this, we'll reach once again into our Functional Programming toolbox. I think the Aggregate method is just the hammer to crack this particular nut. Do you see how?

Ignore the input part for now, and assume we've got a sequence of List<int>, one List for each row of the triangle. Like this:

{75}
{95, 64}
{17, 47, 82}
{18, 35, 87, 10}
...
In this layout, it's a little tricky to visualise which cells are adjacent to which: you need to look at the cell directly above, and the one above but to the left. For example, the 35 in row four is adjacent to 47 and 17 in the row above.

We'll use Aggregate to 'flatten' the triangle represented by this sequence of Lists using the algorithm I described above: the aggregate that we'll be calculating is a List containing the maximum sum over all routes to each cell up to the last row we've processed. Remember that Aggregate needs a function that combines the aggregate so far with the next item in the series. We'll supply a function that takes the maximums so far and combines them with the cell values for the next row. The code looks like this:

return rows.Aggregate(
    new List<int> {0},
    (currentMaxima, nextRow) =>
        {
            var rowLength = nextRow.Count();
            return nextRow
                .Select((cell, index) =>
                        index == 0 ? cell + currentMaxima[index]
                        : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                        : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                .ToList();
        }
    )
    .Max();

Notice that at line 7 we're processing the sequence nextRow (containing the cell values) using an overload of the Select method that gives us the index of each cell value in the sequence; we use this index to find the appropriate entries in the list currentMaxima containing the maximum sums to the cells in the row above.

And with that, I believe we've solved the problem. It takes 2 milliseconds to solve the 100-row triangle problem on my computer - a satisfying improvement over 20 billion years.

Here's the complete code. Note that in the source code project I have included two files containing the data for these problems.

namespace ProjectEuler
{
    [EulerProblem(18, Title = "Find the maximum sum travelling from the top of the triangle to the base.")]
    public class Problem18
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem18Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    [EulerProblem(67, Title = "Using an efficient algorithm find the maximal sum in the triangle?")]
    public class Problem67
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem67Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    public static class MaximumSumThroughTriangleSolver
    {
        public static int SolveFromFile(string dataFilePath)
        {
            string problemData = File.ReadAllText(dataFilePath);

            return Solve(problemData);
        }

        public static int Solve(string problemData)
        {
            var rows = problemData
                .Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
                .Select(line =>
                        line.Split(' ')
                            .Select(token => int.Parse(token))
                            .ToList()
                )
                .ToList();

            return rows.Aggregate(
                new List<int> {0},
                (currentMaxima, nextRow) =>
                    {
                        var rowLength = nextRow.Count();
                        return nextRow
                            .Select((cell, index) =>
                                    index == 0 ? cell + currentMaxima[index]
                                    : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                                    : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                            .ToList();
                    }
                )
                .Max();
        }
    }
}