A functional solution to interfacitis?

/ˈɪntəfeɪsʌɪtəs/
noun
noun: interfacitis
inflammation of a software project, most commonly from overuse of interfaces and other abstractions but also from… well… actually it's mostly just interfaces.

An illness of tedium

Over the years, experience has shown me that unnecessary abstractions cause some of the most significant overhead and inertia in software projects. Specifically, I want to talk about one of the more tedious and time-consuming areas of maintaining abstracted code: the overzealous use of interfaces (C#/Java).

Neither C# nor Java is a particularly terse language. Compared to F# with its Hindley-Milner type inference, working in these high-level OO languages often feels like filling out forms in triplicate. All too often I have seen the already verbose syntax of these languages amplified by dozens of lengthy interfaces, each only there to repeat the exact signature of its single implementation. I'm sure you've all been there. In my experience this is one of the more painful areas of maintenance, causing slowdowns, distraction and lack of focus. I've been thinking for some time now that we'd probably be better off using interfaces (or even thin abstract classes) only when absolutely necessary.

What is necessary?

I like to apply a simple yardstick here: if you have a piece of application functionality that genuinely needs to call multiple different implementations of a component, then you probably require an interface. This means situations such as plugins or provider-based architectures would use an interface (of course!), but your CustomerRegistrationService that is called only by your CustomerRegistrationController will not. The message is simple: don't introduce bureaucracy for the sake of it. A quick sketch of the distinction follows.
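
Here is a minimal sketch of that yardstick in C# (all names are hypothetical, purely for illustration): the payment provider has multiple interchangeable implementations, so an interface earns its keep; the registration service has exactly one implementation and one caller, so a plain class suffices.

// Multiple interchangeable implementations selected at runtime:
// this is where an interface is justified.
public interface IPaymentProvider
{
    PaymentResult Charge(decimal amount);
}

public class PaymentResult { }

public class StripeProvider : IPaymentProvider
{
    public PaymentResult Charge(decimal amount) => new PaymentResult();
}

public class PayPalProvider : IPaymentProvider
{
    public PaymentResult Charge(decimal amount) => new PaymentResult();
}

// One implementation with one caller: a plain class will do.
public class CustomerRegistrationService
{
    public void Register(string customerName) { /* persist the registration */ }
}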

There are, I admit, cases where you might feel abstraction is required. What about a component that calls out to a third-party system on the network? Surely you want to be able to isolate this behind an interface? So I put it to you: why do you need an interface here? Why not use a function? After all, C# is now very well equipped with numerous elegant functional features, and many popular DI frameworks support delegate injection. Furthermore, if you are following the SOLID practice of interface segregation then chances are your interface contains only one or two method definitions anyway.

An example

So, for those times when you absolutely must abstract a single implementation, here is a simple example of an MVC controller using ‘functional’ IoC:

public class RegistrationController : Controller
{
    // The injected delegate stands in for what would otherwise be a
    // single-method IRegistrationDetailsQuery interface.
    private readonly Func<string, RegistrationDetails> _registrationDetailsQuery;

    public RegistrationController(Func<string, RegistrationDetails> registrationDetailsQuery)
    {
        _registrationDetailsQuery = registrationDetailsQuery;
    }

    public ActionResult Index()
    {
        // Invoke the delegate just as you would a method on an injected service.
        var currentRegistration = _registrationDetailsQuery(User.Identity.Name);

        var viewModel = ViewModelMapper.Instance
            .Map<RegistrationDetails, RegistrationDetailsViewModel>(currentRegistration);

        return View(viewModel);
    }
}
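
For completeness, here is a minimal sketch of how such a delegate might be registered in a composition root. I'm assuming the Microsoft.Extensions.DependencyInjection container and a hypothetical RegistrationDetailsQuery class; most popular containers offer an equivalent delegate registration.

using System;
using Microsoft.Extensions.DependencyInjection;

public class RegistrationDetails { } // as used in the controller above

public class RegistrationDetailsQuery
{
    // Hypothetical concrete query; real data access would live here.
    public RegistrationDetails Execute(string userName) => new RegistrationDetails();
}

public static class CompositionRoot
{
    public static void ConfigureServices(IServiceCollection services)
    {
        services.AddTransient<RegistrationDetailsQuery>();

        // The controller depends only on the delegate, never on the class.
        services.AddTransient<Func<string, RegistrationDetails>>(sp =>
            userName => sp.GetRequiredService<RegistrationDetailsQuery>().Execute(userName));
    }
}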


Addendum

13-March-2018: It has been pointed out to me that a further benefit of this approach is that static methods can also supply IoC dependencies, whereas interface-based IoC always requires an instance. What are your thoughts on this approach?
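
To illustrate the addendum's point: because a static method group converts straight to a delegate, a static provider can satisfy the dependency with no instance at all. Another hedged sketch with hypothetical names:

using System;
using Microsoft.Extensions.DependencyInjection;

public static class RegistrationQueries
{
    // A static provider: nothing needs constructing to supply this dependency.
    public static RegistrationDetails GetByUserName(string userName) => new RegistrationDetails();
}

public static class StaticCompositionRoot
{
    public static void ConfigureServices(IServiceCollection services)
    {
        // The method group converts directly to the injected delegate type.
        services.AddSingleton<Func<string, RegistrationDetails>>(_ => RegistrationQueries.GetByUserName);
    }
}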


Demystifying AI – The AI explosion

This is an article I had originally written as part of a stream of work that has now been put on hold indefinitely. I thought it a shame for it to languish in OneNote.

What’s with all this attention to Artificial Intelligence then?

Well that is a very good question. To be perfectly frank, not that much has changed of late in the world of Artificial Intelligence (AI) as a whole that should justify all the current excitement. That’s not to say that there isn’t cool stuff going on; there really is great progress being made… in the world of Machine Learning. And if we are to begin the process of ‘Demystifying AI’ then this is a very good place to start.


AI is a very broad area of technology encompassing research from robotics to computing emotions (affective computing) and everything in between, including Machine Learning or 'ML'. As alluded to a moment ago, it is within ML specifically that we are seeing the greatest progress. Think of a modern 'AI' technology that is gaining a lot of attention and you can place a safe bet on it using ML techniques: Natural Language Processing? That's ML. Image classification? That's ML. Sentiment analysis? Also ML. The recent news of Go players being defeated by a computer? You guessed it… ML.


What is ML?

ML is an approach to analysing data that is based on training statistical models to predict outcomes. You may well have come across statistical (or linear) regression back in your school days; this is possibly the best-known example from the range of techniques that make up the world of ML. To put it simply, an ML model learns from past data to make better decisions in the future.
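
To make that 'learn from the past, predict the future' idea concrete, here is a tiny self-contained C# sketch of the linear regression just mentioned: fit a straight line to known (x, y) observations using the classic least-squares formulas, then predict a value we have never observed. The data is made up purely for illustration.

using System;
using System.Linq;

public static class TinyRegression
{
    public static void Main()
    {
        // Past observations (entirely made-up numbers).
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2.1, 4.0, 6.2, 7.9, 10.1 };

        double meanX = x.Average();
        double meanY = y.Average();

        // Least-squares slope = covariance(x, y) / variance(x).
        double slope = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum()
                     / x.Sum(xi => (xi - meanX) * (xi - meanX));
        double intercept = meanY - slope * meanX;

        // "Learning" done; now predict an outcome we have never seen.
        double prediction = slope * 6 + intercept;
        Console.WriteLine($"Predicted y at x = 6: {prediction:F2}");
    }
}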

[Figure: nested circles showing Deep Learning inside Machine Learning inside AI]

Now it's time to introduce what is arguably the beating heart of the AI frenzy: Deep Learning. While there is no trendy acronym for Deep Learning, it is fair to say that Deep Learning has become a bit of a buzzword itself. Deep Learning takes its name from the concept of Deep Neural Networks (or DNNs, there's your acronym!). The useful details of what DNNs are and how they function cannot be easily summarised; suffice it to say that DNNs are an ML technology that borrows heavily from the structure of the brain, hence the 'neural' part of the name. [N.B. These details are already planned for a follow-up piece.] To recap: Deep Learning is a subset of ML, which is in turn a subset of AI, and it is Deep Learning that drives the current hype.


What is Deep Learning?

Say you wanted to build some software to identify objects in an image; the usual non-Deep-Learning approach would involve manually writing rules into the software to recognise the details you're looking for. If you wanted to identify whether a picture was of a bird or a cat, you would manually write rules to identify features such as whiskers or ears or wings and so on. This is complicated, time-consuming and error-prone. Deep Learning takes a different approach. Instead, you would create a Deep Learning model and then supply it with a bunch of pictures. For each picture you supply, you would tell the model whether it was of a bird or a cat. As you supply each image and its corresponding label, the model learns. Once enough data has been supplied you can then supply an image without a label and the model will give an accurate indication of whether it is a bird or a cat.
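
To make that label-and-learn loop concrete, here is a heavily simplified C# sketch: a single artificial neuron (the building block that DNNs stack into layers) trained by gradient descent on made-up 'whiskers' and 'wings' feature scores. Real Deep Learning uses millions of such units and learns the features themselves from raw pixels; everything here, from the feature names to the numbers, is illustrative only.

using System;

public static class TinyNeuron
{
    // One neuron: weighted sum of features squashed into a 0..1 'cat-ness' score.
    static double w1, w2, bias;

    static double Predict(double whiskers, double wings) =>
        1.0 / (1.0 + Math.Exp(-(w1 * whiskers + w2 * wings + bias)));

    public static void Main()
    {
        // Labelled training data: (whiskers, wings, label) where 1 = cat, 0 = bird.
        var examples = new (double Whiskers, double Wings, double Label)[]
        {
            (0.9, 0.1, 1), (0.8, 0.0, 1), (0.1, 0.9, 0), (0.0, 0.8, 0)
        };

        // Training: nudge the weights a little towards each correct answer.
        const double learningRate = 0.5;
        for (int epoch = 0; epoch < 1000; epoch++)
        {
            foreach (var (whiskers, wings, label) in examples)
            {
                double error = Predict(whiskers, wings) - label;
                w1 -= learningRate * error * whiskers;
                w2 -= learningRate * error * wings;
                bias -= learningRate * error;
            }
        }

        // An unlabelled picture: lots of whiskers, no wings...
        Console.WriteLine($"Cat-ness of unseen example: {Predict(0.95, 0.05):F2}");
    }
}

After a thousand passes over the four labelled examples, the unseen whiskery example scores close to 1, i.e. 'cat'.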


So what is the ‘explosion’ all about?

Continuing the bird/cat example, the more labelled example pictures you supply to the model, the better the results will be. This seems simple and even somewhat obvious, but it strikes at the heart of the current 'AI boom'. Deep Learning has been around for a while now, evolving over a period of 30 years more or less, and one of the key reasons it has never been as commercially successful as it is now is that there just hasn't been enough readily available data. To give you some idea of why this has been an issue: if you want to reach a high level of accuracy when classifying complex pictures then you're going to need thousands or even millions of examples, depending on the complexity. Well, we now have data, lots and lots and lots of it, and it has never been easier to get our hands on. Do a quick Google image search for 'Cat'; there is a rough cut of half your 'training' set (*ahem* copyright issues aside) and I'm sure you can figure out how to get the other half.


So we have data, but that isn't all we need. The other side of the current explosion is raw computing power. Building a statistical model that can accurately identify cats and birds in pictures is very heavy work for a computer, but thankfully, with the advent of cloud-scale computing resources, computing power is now abundant enough and cheap enough to make running this sort of model both practical and cost-effective. It's cheap enough that Google can even give this stuff away as an educational toy (https://teachablemachine.withgoogle.com/).


So it's all about pictures of cats and birds?

Beyond the abundance of data and computing power, probably the most significant factor in the commercial success of Deep Learning is its versatility. This is especially true when considering the success of Deep Learning against other ML techniques which have not gained the same level of attention. If you have enough data, regardless of its form, Deep Learning can be trained to extract knowledge from it. This has sent businesses, scientists and engineers into a global flurry of R&D to find all the amazing ways in which this technology can enhance our lives.

For years now the financial services industry has been at the forefront of applying ML techniques to everything from fraud prevention and risk management to investments and savings predictions; there are few – if any – areas of the industry that have yet to see the benefits of AI.

Manufacturing is seeing growing uptake in the application of ML to improve efficiency through waste reduction and better predictive analysis of production demands and infrastructure maintenance.

More recently, utilities are beginning to get into the ML game, with the UK National Grid striking up discussions with Google to investigate applying the famed DeepMind AI to maximise National Grid's use of renewables and to more efficiently balance supply and demand across its nationwide infrastructure.

Across all sectors, businesses now find themselves in a position to use ML to better understand and engage with their customers. From utilities gaining greater knowledge of their customers' consumption habits through to retailers and service providers more effectively capturing sales conversion opportunities, the possibilities are as varied as your data.


Would you like some knowledge with that?

So that concludes this effort to clear away some of the fog and hyperbole from the current AI phenomenon (ahem! It’s all ML, remember!?). In a nutshell, if you have a ton of data and you need to get knowledge from it then Deep Learning could well be your go-to tool.


FileFormatException: Format error in package

OK, so we're all completely clear on what this error means and what must be done to resolve it, right? I mean, with a meaningful error like that, how can anyone be mistaken? Oh? What's that? You still don't know? Let's be a bit more specific: System.IO.FileFormatException: Format error in package. Better? Didn't think so. That's because it isn't really an error message at all. I'll tell you what it is, though: it's stupid, and even more stupid when you find out what causes it.

I came across this delightfully wishy-washy error while configuring an Umbraco 7 deployment pipeline in TeamCity and Octopus Deploy. The Umbraco .csproj MSBuild file referenced a bunch of files, as you might expect, but I also needed to add a .nuspec file which referenced a bunch of other files. Long story short, the error came about because the files specified by the .csproj overlapped with the files specified by the .nuspec. There were 1000-odd generated files that the NuGet packaging components, in their infinite wisdom, added to the .nupkg archive as many times as they were referenced. NuGet did this silly thing without complaint, and inspecting the confused package in NuGet Package Explorer, 7-Zip or Windows' built-in zip viewer gave no indication of any issue whatsoever. It was not until Octopus called on NuGet to unpack the archive for deployment that we got the above error.

Stupid, right? Stupid!

FYI: I was able to get to the bottom of this issue, after two freaking days of pain, when I eventually used JetBrains dotPeek to step through the NuGet.Core and System.IO.Packaging components to see what on earth was going on. In the end it was this piece of code in System.IO.Packaging.Package that was causing the issue:

public PackagePartCollection GetParts()
{
    ...
    PackagePart[] partsCore = this.GetPartsCore();
    Dictionary<PackUriHelper.ValidatedPartUri, PackagePart> dictionary =
        new Dictionary<PackUriHelper.ValidatedPartUri, PackagePart>(partsCore.Length);
    for (int index = 0; index < partsCore.Length; ++index)
    {
        PackUriHelper.ValidatedPartUri uri = (PackUriHelper.ValidatedPartUri) partsCore[index].Uri;
        // A duplicate part URI in the archive lands here.
        if (dictionary.ContainsKey(uri))
            throw new FileFormatException(MS.Internal.WindowsBase.SR.Get("BadPackageFormat"));
        dictionary.Add(uri, partsCore[index]);
        ...
    }
    ...
}

I mean, why would anyone consuming such a core piece of functionality as this API ever want to know anything about the conditions that led to the corruption of a 30MB package containing thousands of files? It's not like System.IO.Packaging was ever intended to be reused all across the globe, right?
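
Incidentally, you don't need dotPeek to check a suspect package for this. A .nupkg is just a zip archive, and System.IO.Compression (unlike the zip viewers mentioned above) will happily expose duplicate entries. A quick sketch, hedged in that it matches entries by name rather than by the normalised part URIs that GetParts() compares:

using System;
using System.IO.Compression;
using System.Linq;

public static class NupkgDuplicateCheck
{
    public static void Main(string[] args)
    {
        // args[0]: path to the suspect .nupkg.
        using (var archive = ZipFile.OpenRead(args[0]))
        {
            // Any entry name appearing more than once is exactly the condition
            // that makes Package.GetParts() throw FileFormatException.
            var duplicates = archive.Entries
                .GroupBy(entry => entry.FullName, StringComparer.OrdinalIgnoreCase)
                .Where(group => group.Count() > 1);

            foreach (var group in duplicates)
                Console.WriteLine($"{group.Key} appears {group.Count()} times");
        }
    }
}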

Anyway, here's the error log to help others searching for this error:

[14:21:27]Step 1/1: Create Octopus Release
[14:21:27][Step 1/1] Step 1/1: Create Octopus release (OctopusDeploy: Create release)
[14:21:27][Step 1/1] Octopus Deploy
[14:21:27][Octopus Deploy] Running command:   octo.exe create-release --server https://octopus.url --apikey SECRET --project client-co-uk --enableservicemessages --channel Client Release --deployto Client CI --progress --packagesFolder=packagesFolder
[14:21:27][Octopus Deploy] Creating Octopus Deploy release
[14:21:27][Octopus Deploy] Octopus Deploy Command Line Tool, version 3.3.8+Branch.master.Sha.f8a34fc6097785d7d382ddfaa9a7f009f29bc5fb
[14:21:27][Octopus Deploy] 
[14:21:27][Octopus Deploy] Build environment is NoneOrUnknown
[14:21:27][Octopus Deploy] Using package versions from folder: packagesFolder
[14:21:27][Octopus Deploy] Package file: packagesFolder\Client.0.1.0-unstable0047.nupkg
[14:21:28][Octopus Deploy] System.IO.FileFormatException: Format error in package.
[14:21:28][Octopus Deploy]    at System.IO.Packaging.Package.GetParts()
[14:21:28][Octopus Deploy]    at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess, Boolean streaming)
[14:21:28][Octopus Deploy]    at System.IO.Packaging.Package.Open(Stream stream)
[14:21:28][Octopus Deploy]    at NuGet.ZipPackage.GetManifestStreamFromPackage(Stream packageStream)
[14:21:28][Octopus Deploy]    at NuGet.ZipPackage.c__DisplayClassa.b__5()
[14:21:28][Octopus Deploy]    at NuGet.ZipPackage.EnsureManifest(Func`1 manifestStreamFactory)
[14:21:28][Octopus Deploy]    at NuGet.ZipPackage..ctor(String filePath, Boolean enableCaching)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.PackageVersionResolver.AddFolder(String folderPath)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.CreateReleaseCommand.c__DisplayClass1_0.b__5(String v)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.c__DisplayClass15_0.b__0(OptionValueCollection v)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.ActionOption.OnParseComplete(OptionContext c)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.Option.Invoke(OptionContext c)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.ParseValue(String option, OptionContext c)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.Parse(String argument, OptionContext c)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.c__DisplayClass26_0.b__0(String argument)
[14:21:28][Octopus Deploy]    at System.Linq.Enumerable.WhereArrayIterator`1.MoveNext()
[14:21:28][Octopus Deploy]    at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
[14:21:28][Octopus Deploy]    at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.OptionSet.Parse(IEnumerable`1 arguments)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.Options.Parse(IEnumerable`1 arguments)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Commands.ApiCommand.Execute(String[] commandLineArguments)
[14:21:28][Octopus Deploy]    at Octopus.Cli.Program.Main(String[] args)
[14:21:28][Octopus Deploy] Exit code: -3
[14:21:28][Octopus Deploy] Octo.exe exit code: -3
[14:21:28][Step 1/1] Unable to create or deploy release. Please check the build log for details on the error.
[14:21:28][Step 1/1] Step Create Octopus release (OctopusDeploy: Create release) failed

Update – 19th September 2018

I have just this morning helped a colleague through a permutation of this issue. We have recently upgraded TeamCity, and it seems this has pushed the issue further down the pipeline. Where the above error would appear in TeamCity during packaging, the components seem to have been updated to no longer throw the obscure error. My colleague found this issue now manifests when attempting to deploy through Octopus, throwing up the error: "Unable to download package: Item has already been added. Key in dictionary: …"

As before, here is a slightly redacted log to help with searching:

Acquiring packages
Making a list of packages to download
Downloading package CLIENT_NAME.Web version 6.0.0-beta0000 from feed: 'https://teamcity.CLIENT_NAME.com/httpAuth/app/nuget/v1/FeedService.svc/'
Unable to download package: 
Item has already been added. Key in dictionary: 'assets/fonts/fsmatthew-light-webfont.svg'  Key being added: 'assets/fonts/fsmatthew-light-webfont.svg'
System.ArgumentException: Item has already been added. Key in dictionary: 'assets/fonts/fsmatthew-light-webfont.svg'  Key being added: 'assets/fonts/fsmatthew-light-webfont.svg'
FileFormatException: Format error in package