Run! The Distributed Systems are coming!

You can avoid the fate of those that fell before you

This is a letter to my fellow engineers, specifically those who operate in the front-end world. This letter is sent with love, but carries a message of deep foreboding. It is a warning and yet it is a message of hope, that there is a chance to maintain peace and happiness in your work life.

This letter is inspired by a question recently posed by the talented Andy Pimlett, with whom I had the pleasure of serving, alongside the lovely people of Mando. See Andy Pimlett on LinkedIn: JavaScript Front-End Architecture. Therein Andy poses a question that has been keeping my mind in constant churn: ‘what does the line between the UI and the backend platform look like now?’. More important to me, however, is raising awareness of the frankly ridiculous levels of complexity that hide behind the innocent term ‘Microservices’, as I shall lay out here…

Lurking in your Jam

The devil doesn’t come dressed in a red cape and pointy horns. He comes as everything you’ve ever wished for.

Tucker Max – Apparently

Jamstack is exactly the sort of revolution the front-end world has desperately needed for too long. It promises to reduce the long-standing pain of the complexity and bloat that have emerged from the constant over-engineering introduced by well-intentioned technologies such as npm, React and Angular.

I’ll not cover the ‘Client’ end definition of Jamstack as there are plenty of perfectly good resources for that already including the Jamstack website itself: https://jamstack.org/. To be absolutely clear on my position here, I have been blown away by the power of Jamstack. This really is the tech that the web needs now. You want to put your money into new web technology? I’m telling you it is Jamstack and its global community; Web3 can get in the bin (really, forget Web3, it’s another last-ditch Ponzi-scheme from the crypto bros to try and keep their awful blockchain investments on life-support for another year).

That said, the threat to your peace and happiness that I wish to warn you about lies, unbeknownst to many, innocently nestled away in a corner of a diagram featured in many of the Jamstack articles, videos and websites that talk about this otherwise wonderful new technology:

Buried in these tempting, simple-looking diagrams lies a terrible, terrible beast that wears a red coat and charges through your peaceful villages shooting anyone who fails to kneel before the might of the King.

(Let’s forget for a moment that all the diagram shows is that the app server on down has been moved into a small little innocent tidy-looking box alongside CDN)

A Microservice by any other name will be just as difficult

Scenario: You’re building an application… You are writing some logic that calls a method on a class in another area of the application.

Question: Would calling the method be much improved by sending it on a trip along a network?

The only sensible answer, surely, would be: “are you mad?”. And yet here we are, some 10 years on from when the term ‘Microservice’ was first bandied about, with half the planet decomposing their applications into separate ‘Microservices’ and having them directly call each other synchronously across a network. This antipattern now even has its own name, ‘distributed monolith’, and it is known to give all the pain of a monolith plus the pain of a distributed system and basically none of the benefits.

Why would anyone willingly triple the complexity of an application for no tangible benefit? Well… first of all… I guess… we’re social creatures, if our peers and those we look up to are doing something then we naturally take an interest. If everyone in your industry is talking about a thing then it becomes so well known that you may even be getting pressure from your boss or clients to do that thing.

Secondly… we’re techies, nerds, geeks, we like the challenging things. Hey, if Netflix do it and it gives them great results then the complexity must be worth it and it will feel great to wield complexity and win fortune for the business!

Finally, I think the name has a lot to do with it. How much harm can come from something called a ‘microservice’? Well, would you willingly embark on the path of a Service Oriented Architecture for your UI tier? Because that is what you are doing, whether you realise it or not. Microservices are a slant on the classic ‘Distributed System Architecture’. They have all, if not more, of the complexity of your classic Distributed System, but the name ‘microservice’ doesn’t really give you that feeling. Surely a Microservice must be an easy service, easy to deploy, easy to build, easy to integrate? No. No. No. Read on…

Distributed Systems destroyed my technical career

If ever there was a clear way to demonstrate how complicated Distributed Systems really are, it lies in my career trajectory. During the first half of my career I always imagined I was destined to become a technical architect. That is until I encountered technical architecture in Distributed Systems design. Now I am a hands-off manager. I find that managing teams of people and managing projects with all of the pressures and complexities that come with those responsibilities is far easier and more rewarding than programming Distributed Systems. But of course some people love this stuff! I’m not saying it is bad or a terrible choice, just that I do not have the technical chops to survive a career in this area and I don’t consider myself to be very dim. I certainly wouldn’t want to wander into this quagmire by accident!

So, what, exactly, are you trying to protect me from?

I get it, by now you’re probably sick of hearing me describe how difficult I find Distributed Systems. So let me tell you what I know that engineers must know to be able to deliver effective Distributed Systems, or Microservices, or SOA, or Cloud Native Architecture, or MACH or whatever you want to call it – they all have the same fundamental concerns.

  1. Loosely coupled architecture – You need it. It’s numero uno for any distributed system. Without it you have a distributed monolith, which isn’t a distributed system at all. Loosely coupled means embracing asynchrony, and with that comes eventual consistency, which your application must also embrace. Asynchrony is different from ‘async’ calls. Instead your tools in this space include message queues, service buses and brokers – more specifically, for cloud native you will include events and streams.
  2. SRE – Distributed Systems are so complex that the industry consensus is that you should assume your system is in a failure state and go from there. I’m not even kidding. See Site Reliability Engineering; you will likely need someone skilled in this on your engineering and operational teams. You need to make sure your system is consistent, available and durable. This really is a career path in its own right!
  3. Deployment Strategies – You need to be able to safely update your microservices, components and so on without disrupting service. See the Google DORA research in this area for the different strategies you will need to familiarise yourself with: https://cloud.google.com/architecture/application-deployment-and-testing-strategies
  4. Observability – Modern cloud native systems will often have hundreds of components interacting with each other to achieve common business goals. How do you know if the system is healthy or not? How do you know the lifecycle of your customer’s data through your system for compliance with data protection standards? Enter Application Performance Monitoring and Distributed Tracing. See: https://opentracing.io/docs/overview/what-is-tracing/
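To make the first point a little more concrete, here is a minimal sketch of the difference between calling another component directly and handing it a message. It assumes an in-process Python queue as a stand-in for a real broker, and every name in it (`order_events`, `place_order`, `run_stock_consumer`, the `OrderPlaced` event) is hypothetical and purely for illustration. Note how the caller gets its answer before the stock level has changed: that gap is the eventual consistency your application has to live with.

```python
import queue

# Hypothetical in-process stand-in for a message broker. A real system would
# use a durable queue, service bus or event stream instead.
order_events = queue.Queue()

# Data owned by the (hypothetical) stock service.
stock_levels = {"widget": 10}


def place_order(item, quantity):
    """Publish an event and return immediately, without waiting on the stock service."""
    order_events.put({"type": "OrderPlaced", "item": item, "quantity": quantity})
    return "accepted"


def run_stock_consumer():
    """Drain any pending events and apply them; returns how many were handled."""
    handled = 0
    while True:
        try:
            event = order_events.get_nowait()
        except queue.Empty:
            return handled
        stock_levels[event["item"]] -= event["quantity"]
        handled += 1


status = place_order("widget", 3)
stock_before_consumer = stock_levels["widget"]  # the event has not been applied yet
handled_count = run_stock_consumer()
stock_after_consumer = stock_levels["widget"]   # now the stock service has caught up
```

Even this toy version raises the questions the rest of the list is about: what happens if the consumer is down, if an event is delivered twice, or if you need to trace one order across both sides.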

Hey, I would always default to a single application until such time as the technical demands necessitate decoupling. ‘Monolith’ is NOT a dirty word; monoliths have many benefits in terms of maintainability. A helpful rule here is thus: if your components need to call each other directly to achieve their goals then keep them together in a monolith and save yourself a world of pain. Most UIs that call down into a distributed architecture will probably be fine with a single UI aggregation tier or at most BFFs (Sam Newman: https://samnewman.io/patterns/architectural/bff/). Introducing loosely coupled architectures into your UI strikes me as a one-way trip to pain.

Your options have been laid before you…

It is too late for many of us in backend/platform engineering but I am worried for those of you responsible for client tier architecture. The concept of ‘Microservices’ is a creeping menace and I hope that these lessons serve to protect your sanity and stress levels.

So don’t feel pressured into a Distributed Architecture just because it’s a name you’ve heard (Microservices, MACH) or part of a wider architecture that you already get some benefits from (e.g. Jamstack). Just be aware that those little boxes on those diagrams belie decades of computer science that will likely never become as simple as the tempting diagrams want you to believe.

Understand whether you truly need microservices and avoid them if at all possible! But! But just because I struggle to imagine many scenarios where a UI tier would warrant a distributed architecture does not mean you don’t need it. Netflix and Amazon need it but so do some smaller businesses. Hell, my current employer has carefully considered its technical strategy, weighed the pros and cons and has decided that the increase in complexity of a distributed system is worth it for their commercial strategy. Know that it is not a technical decision, it is a business decision and often closely tied to your business architecture.

If you wish to stand and fight then be prepared to wield your weapons of Loosely Coupled Architecture, SRE, Deployment Strategy and Observability. But be sure, no one will blame you should you choose to consolidate your efforts in something far simpler; something that protects your daily life, bringing joy to you and your customers.

Improvement Sprints

or Shuhari Sprints

Nothing worth doing is easy and anything worth doing is worth doing well 

… or so the sayings go. My personal experience certainly proves to me that these are more than simple soundbites, and I am fairly confident that they ring true for most IT professionals. So then why do we so often see innovation and professional development limited to the occasional day or Friday afternoon? Well, read on…

What’s in a name? 

Let’s start with some background to the subject. Dan Pink recently popularised, in his ground-breaking book ‘Drive’, the idea that there are three key pillars to employee motivation, namely autonomy, mastery and purpose. This follows on closely from the RAMP model defined in Self-determination Theory (Deci and Ryan 1985). If you’ve not read either book then you should! I assume it is from here that we get the name I have been using to describe this subject: ‘Mastery Sprints’. For what it’s worth, if I recall correctly I first heard this term when speaking to some of the engineers from ING (of the famous Agile transformation story: https://www.mckinsey.com/industries/financial-services/our-insights/ings-agile-transformation).

I have been using ‘Mastery Sprint’ for some time now, but I have been reconsidering the term ‘mastery’ as I have grown increasingly aware of its connotations with white supremacy and slavery. For some reason I had incorrectly thought that ‘mastery’ had a different root and meaning from ‘master’, which is famously now excluded from most areas of technology terminology. I will be avoiding the term ‘mastery’ from now on; its roots are beside the point, as the term has severely negative connotations and that is enough. I’m instead opting for ‘Improvement Sprints’, as that does just as effective a job of revealing the intention of the activity.

I would love to hear your thoughts on the naming of this. It’s a conversation worth having!

Update 1st February, 2021

I have already had some feedback on the naming, with the term Shuhari Sprints suggested by one of my GBG colleagues.

One of the issues I faced when dropping ‘Mastery’ from my vocabulary is that there didn’t seem to be an equivalent term in English for the specific use case. Perhaps unsurprisingly there is a perfect term in Japanese and I will be proposing the use of Shuhari Sprints from here on. I dare say that this term is even more suitable than ‘Mastery’ for describing what we are trying to achieve and seems to have many further related uses. Google Shuhari, I’m sure you’ll agree!

Nothing worth doing is easy 

When I first heard the idea my initial thought was along the lines of ‘that’s nice and all but I can’t see it being that much benefit to justify the disruption’. Then I started to hear more about the idea from people I trust and my curiosity grew. From there I found myself frequently identifying common organisational issues that would be addressed by Improvement Sprints, so I started making a list from conversations discussing related issues:

“We allow engineers 1/2 a day every week of training time, and this can be rolled up if needs be but they never seem to take it” 

“There’s always too much planned or reactive work to take training time” 

“I’ve tried using training time but it always takes too much effort to switch my environment and my focus then someone calls me and the whole thing goes out the window” 

“I’ve done this research but then I don’t have the time to present it” 

“I’ve done this research and the results are incredible but now there’s no time to implement it in our main codebase” 

“Damnit this part of the codebase is always such a pain to work with we really need to find time to rewrite this part of it using this new tool that wasn’t available!” 

“These innovation days are great but we always need so much longer to really make a dent in this problem” 

Ask any developer about their experience of trying to fit training and innovation into the odd day or Friday afternoon and you’ll hear the same story of struggling to use the time effectively.

Dedicated Improvement Sprints present an antidote to these problems. By taking an entire week or two and spreading the overheads across a number of people – ideally bringing independent squads and teams together for collaboration – you completely shift the overhead-to-productivity ratio as demonstrated in this very sciencey diagram: 

[Diagram showing ratio of overheads in an activity]

If an organisation allows or even encourages the use of ‘company time’ and budgets for training, innovation and improvements then it stands to reason that they would wish to see the most effective use of that time. 

Benefits 

The benefits beyond the drastic increase in efficient use of training/improvements time are many. 

Engagement 

I fully subscribe to the idea that the most effective teams are made of individuals whose personal and professional objectives align closely with those of the business at a given time. It’s classic synergy. And if GeePaw Hill is on the same side then it’s fact in my opinion.

This is reinforced by Dan Pink’s work in Drive, which makes clear how much more effective and engaged people are when they are progressing meaningfully in their careers; businesses that can leverage this by providing real opportunities for ‘mastery’ (sic, Dan Pink) will benefit from more engaged and productive team members.  

Improvements 

As the name suggests Improvement Sprints are how you make things better. You will be aware of the many issues your development teams complain about on a regular basis and this is a chance to address those using the latest and greatest. 

Innovation 

ING’s engineers indicated to me that as many as 1 in 5 developer suggestions make it into production. This isn’t just playing around (well, it is really), but it is proven to result in actual usable value for your business.

A scientifically proven competitive advantage 

The DORA DevOps research program (https://www.devops-research.com/research.html) has highlighted a number of key cultural behaviours of elite performing technology organisations that are clear differentiators for their competitive advantage. These include encouraging learning, experimentation, team collaboration and job satisfaction, all of which can be improved upon by leveraging dedicated improvements time.

Step 1: Plan, Step 2: Collaborate, Step 3: Profit! 

In terms of how you schedule the Improvement Sprints, I would suggest starting by taking the time already agreed for training and rolling it all up. So if you have, say, ½ a day each week per engineer, roll this up into a full week, giving you 1 week in 10, or 1 week after every 5 two-week sprints. You get the idea: use the maths that works for you. You’ll be starting with a NET benefit. Then the only thing you need to sell to the business is the planned disruption, as the costs should level out and, as discussed, the benefits make it a solid sell.
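If you want to plug in your own numbers, the roll-up arithmetic can be sketched as below. This is only an illustration: it assumes a 5-day working week and a one-week Improvement Sprint, and the function and variable names are made up for this example.

```python
from fractions import Fraction


def weeks_to_accrue(training_days_per_week, improvement_sprint_days):
    """Weeks of normal work needed to bank enough training time for one Improvement Sprint."""
    return Fraction(improvement_sprint_days) / Fraction(training_days_per_week)


half_day = Fraction(1, 2)      # agreed training time per engineer per week
working_days_per_week = 5      # assumption: a standard 5-day week

weeks_needed = weeks_to_accrue(half_day, improvement_sprint_days=5)
two_week_sprints = weeks_needed / 2
share_of_time = half_day / working_days_per_week
```

With half a day a week this gives 10 weeks, or 5 two-week sprints, per one-week Improvement Sprint, which is the same 10% of working time the business had already agreed to.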

You have two options for organising your teams here. If you are fortunate to have no reactive or operational demands on your engineers then just book the time in and tell the business you are on a training break and unreachable. If you must maintain operational support availability then look at splitting your team in half and run the Improvement Sprints in sequence, one half at a time. Just make sure to mix up the split next time around. 

Then plan the work. Team members should be allowed to use this time how they see fit (autonomy), but one way of helping guide their efforts would be to have a brainstorming session on current burning issues or popular ideas and take a vote on each for inclusion in the Improvement Sprint.

Measure and report the outcomes. It may be that some work will end up being fed into further production sprints but for the most part getting a team together to focus on a common goal should yield plenty of great outputs so be sure to shout about them! 

Some final notes

Overrun still happens 

Sure, you can do a lot more in a week or two with a whole team than a few people can in one day, but sometimes things worth doing are even harder than expected.

Technical Debt need not apply 

Technical debt is a project decision, and every time it is taken on it must be accompanied by a paydown plan and included in formal project planning. This has nothing to do with innovation or improvement and should be kept out of these discussions.

Thank you and good luck!

I am poised to embark on this journey myself and the above captures where I am up to with my thinking and planning. I’ve found scant details on this on the Internet, so I would love to hear your stories and experiences. I will report back here as my story unfolds.

LinkedIn Error “There was a problem sharing your update. Please try again”.

Obscure Error

I was trying to reply to a comment on an article I posted to LinkedIn the other day and kept hitting the error “There was a problem sharing your update. Please try again”. Just a note to help anyone who might come across this error when attempting to post an update to LinkedIn: there is an unadvertised comment limit of 800 characters.
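If you draft comments elsewhere before pasting them in, a quick length check like the sketch below can save a few failed attempts. The 800 figure is the one LinkedIn support gave me; exactly how LinkedIn counts characters is not something I know, so using Python’s `len` here is an assumption, and the function name is made up for this example.

```python
LINKEDIN_COMMENT_LIMIT = 800  # figure reported by LinkedIn support, not documented publicly


def chars_over_limit(text, limit=LINKEDIN_COMMENT_LIMIT):
    """Return how many characters need trimming (0 if the text fits)."""
    # Assumption: LinkedIn counts characters the same way len() does.
    return max(len(text) - limit, 0)


fits = chars_over_limit("a" * 800)
too_long_by = chars_over_limit("a" * 815)
```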

A little help?

It would be great if this was made obvious somewhere such as in the error itself or at least somewhere on the site but even searching the Internet for “There was a problem sharing your update. Please try again” didn’t turn up much for me. It wasn’t until I opened a support ticket that I was given this info.

I hope posting this here will at some point in the future save someone from wasting the time I did.
