Posts

Showing posts with the label continuous delivery

The Truth About Testing

/via https://www.monkeyuser.com/2016/testing-hammering-nails/ This totally nails it when it comes to Manual Testing, doesn’t it? Oh, I agree, there is a place for manual testing — smoke tests to build confidence, complex tests that simply can’t be automated, and so forth. However, what I do see all too often is Manual Testing being used as the default approach to testing. This, typically, in organizations where QA consists of “those folks over there”, and developers consider testing a four-letter word. My advice — if you’re working at one of those places, and if you have agency, leave. It’s just not worth the candle…

Climbing The Environment Ladder

/via http://www.commitstrip.com/en/2017/02/10/proud-or-worried/ Think about all the different environments that your product runs in. You probably ran through some variation of the following, no?
1. One Environment To Rule Them All. It was just you, all the code was on your laptop (and GitHub), and “deploy” pretty much translated to “Runs on my Laptop”. Come demo time for that sweet sweet investor money, you just spun the stuff up on AWS, and that was about it. This is really pretty much the default when you’re starting off.
2. Environment As Coordinator. You’ve got environments deployed so that multiple teams can coordinate their efforts. Alice’s team deploys the latest version of Component A to the development environment on AWS, and a wee bit later Bob’s team complains that something broke. This is quite common for “Integration Testing”, but equally common when folks are starting out with micro-services and/or distributed systems. When you don’t qui...
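(To make the “ladder” a bit more concrete, here is a minimal sketch, in Python, of keeping per-environment settings explicit rather than in somebody’s head. The environment names and settings are purely illustrative assumptions, not anything from the post itself.)

```python
# environments.py -- illustrative sketch of explicit per-environment config.
# The environment names and settings below are assumptions for the example.
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    api_url: str
    replicas: int
    debug: bool


ENVIRONMENTS = {
    "local": Environment("local", "http://localhost:8080", replicas=1, debug=True),
    "development": Environment("development", "https://dev.example.com", replicas=2, debug=True),
    "staging": Environment("staging", "https://staging.example.com", replicas=3, debug=False),
    "production": Environment("production", "https://api.example.com", replicas=10, debug=False),
}


def load(name: str) -> Environment:
    """Fail loudly if someone asks for an environment that doesn't exist."""
    try:
        return ENVIRONMENTS[name]
    except KeyError:
        raise ValueError(f"Unknown environment {name!r}; expected one of {sorted(ENVIRONMENTS)}")
```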

Tests, and Bug Fixes

“Bug fixes must include a test that exercises the bug, and the fix” This really shouldn’t be controversial, y’know? I mean, after all:
1. There is a bug. We all know there is a bug. There is clearly something bad happening (“Why did the service restart? I didn’t ask it to do so!”), and bad is not good.
2. If we’re lucky, the bug even comes with a test case that exercises the bug (“Send in an int instead of a string, and watch the fun!”)
3. If we’re very lucky, the bug report includes code (“To dream the impossible dream…”)
Regardless of where one is in the spectrum above, once you admit to yourself that there is a bug — and this can be an awfully hard admission to make sometimes — then you’re going to have to fix the damn thing. And that is going to involve some level of process where you’ll be doing something to make sure that there is a bug, right? After all, taking the above in...
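As a minimal sketch of the “test that exercises the bug, and the fix” idea, here is what a regression test might look like for the hypothetical “send in an int instead of a string” bug above. The handle_request function and its behavior are assumptions made up for the example:

```python
# test_regression_int_payload.py -- sketch of a regression test for the
# hypothetical "send an int instead of a string" bug described above.
import pytest


def handle_request(payload):
    """Hypothetical handler: the original bug was a crash on non-string input.

    The fix: validate the type and reject bad input instead of blowing up.
    """
    if not isinstance(payload, str):
        raise ValueError(f"payload must be a string, got {type(payload).__name__}")
    return payload.strip().lower()


def test_int_payload_is_rejected_not_crashing():
    # Exercises the bug: an int used to take the service down;
    # after the fix it must fail cleanly with a ValueError.
    with pytest.raises(ValueError):
        handle_request(42)


def test_string_payload_still_works():
    # Guards against the fix breaking the happy path.
    assert handle_request("  HELLO  ") == "hello"
```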

Continuous Testing — Some Semantics

If you’re doing any form of Continuous Testing (or heck, even just automating your tests), you’ve probably already run into the Semantics Gap, where what you mean by XXX isn’t what others mean by it. This can be quite the killer, in ways both subtle and gross. I recall a post-mortem in the past that boiled down to the QA and Release teams having different assumptions about what “The Smoke-tests passed” meant. The resulting chaos — both between the teams, and for the customer — was epic, and something that still makes me shudder reflexively when I look back at it. And that, my friends, is just about when I put together the following terminology. Mind you, far be it from me to tell you to adopt this terminology. Heck, you may very well vehemently disagree with it — and that’s ok. The thing is, whatever terminology you use needs to be agreed upon by everybody! (And, you’ve probably got all the same stuff below, just broken ...
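One way to keep everyone honest about whatever terminology you do agree on is to encode it in the test suite itself. Here is a minimal pytest-flavored sketch; the category names (smoke, regression, integration, acceptance) are assumptions for illustration, not the post’s own terminology:

```python
# conftest.py -- sketch: register the agreed-upon test categories so that
# an unlabeled test fails loudly instead of silently meaning different
# things to different teams. Category names are illustrative assumptions.
import pytest

AGREED_CATEGORIES = {"smoke", "regression", "integration", "acceptance"}


def pytest_configure(config):
    # Declare the markers so pytest doesn't warn about unknown ones.
    for name in AGREED_CATEGORIES:
        config.addinivalue_line("markers", f"{name}: agreed-upon test category")


def pytest_collection_modifyitems(config, items):
    # Every test must carry exactly the vocabulary the teams agreed on.
    for item in items:
        labels = {m.name for m in item.iter_markers()} & AGREED_CATEGORIES
        if not labels:
            raise pytest.UsageError(f"{item.nodeid} has no agreed category marker")
```

With something like this in place, `pytest -m smoke` runs exactly what the team has agreed “smoke tests” means, and a test with no label fails collection instead of quietly meaning different things to the QA and Release teams.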

Ignorance Is NOT Bliss — Flaky Tests Edition

We’ve all dealt with flaky tests, right? Tests that don’t consistently pass or fail, but are, instead, nondeterministic. They’re deeply annoying, end up being a huge time-suck, and inevitably end up occupying the bulk of your “productive” time. I’ve written about the common reasons for flakiness before, viz., External Components, Infrastructure, Setup/Teardown, and Complexity (read this post for the details — it’s a very short read!). The big takeaway from that article though is that flakiness never goes away, it just moves up the food-chain! For example, as you clean up your Infrastructure issues, you’ll start running into issues with Setup/Teardown. Fix those, and you’re now dealing with Complexity. And there is, of course, a special place of pain involved with anything involving distributed systems, what with consensus, transactions, and whatnot. There are many, many ways of dealing with flakiness...
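As a toy illustration of making flakiness visible rather than ignoring it, here is a sketch of a probe that reruns a test many times and reports whether it disagrees with itself. The function names and the 10% failure rate are made up for the example:

```python
# flake_probe.py -- sketch: rerun a candidate test many times and report
# whether it behaves deterministically. Names here are illustrative.
import traceback


def probe(test_fn, runs: int = 50):
    """Run test_fn `runs` times and tally passes vs. failures."""
    passes, failures = 0, []
    for _ in range(runs):
        try:
            test_fn()
            passes += 1
        except Exception:
            failures.append(traceback.format_exc())
    return passes, failures


if __name__ == "__main__":
    import random

    def sometimes_fails():
        # Stand-in for a flaky test: fails roughly 10% of the time.
        assert random.random() > 0.1

    passes, failures = probe(sometimes_fails)
    total = passes + len(failures)
    if 0 < passes < total:
        print(f"FLAKY: passed {passes}/{total} runs")
    else:
        print("Deterministic (at least over this sample)")
```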

Canary Release, and Experimentation

Canary Release (also called phased, or incremental, deployment) is when you deploy your new release to a subset of servers, check it out there against real live production traffic, and take it from there — rolling it out, or rolling it back, depending on what happened. You’re basically doing the Canary In A Coal Mine thing, using a few servers and live traffic to ensure that at worst, you only affect a subset of your users. It’s not a bad approach at all, and depending on how you do this, can be quite efficient resource-wise (you don’t need an entire second environment à la Blue-Green releases). Mind you, the flip side to this is that you need to be really careful about compatibility. You’ve got multiple releases running at the same time, so things like data versioning, persistence formats, process flows, transaction recovery, etc. need to either be forward/backward compatible, or very (!!!) carefully partitioned/rollback-able. The tri...
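The routing side of a canary can be as simple as deterministically bucketing a small slice of users onto the new release (real setups usually do this at the load balancer or service mesh). A minimal sketch, with made-up pool names and an assumed 5% slice:

```python
# canary_router.py -- sketch of routing a small, adjustable slice of traffic
# to the canary. Pool names and the 5% weight are illustrative assumptions.
import hashlib

STABLE = "stable-pool"
CANARY = "canary-pool"
CANARY_FRACTION = 0.05  # start small, widen as confidence grows


def route(user_id: str) -> str:
    """Deterministically map a user to the stable or canary pool.

    Hashing the user id (rather than flipping a coin per request) keeps a
    given user on one side, so they don't bounce between versions.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return CANARY if bucket < CANARY_FRACTION else STABLE


if __name__ == "__main__":
    sample = [f"user-{i}" for i in range(10_000)]
    canary_share = sum(route(u) == CANARY for u in sample) / len(sample)
    print(f"~{canary_share:.1%} of users hit the canary")
```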

Blue-Green Deployments … and Persistence-Hell

Blue-Green deployments are what you’re most likely doing in the early stages of your product (and, quite possibly, later on too!). Basically, you have two “production” environments — call them “Blue” and “Green” — with only one of them live at any point in time (say, “Blue”). You test out your new release on the “Green” environment, and when you’re good to go, switch over the traffic in one shot. There are about a jillion variations of this, and names too (heck, Netflix calls it Red/Black), but the underlying philosophy is just about the same. (•) The thing is, this is remarkably simple and straightforward if each deployment is a “change the world” scenario — you’re shoving out the latest documentation, or deploying a self-contained app, or other suchlike. But what if you do need to remember the past? Deal with Persistence as it were? The answer, as with most things, is “it depends”.
1. Is there any change in the way the new version wor...
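The “switch over the traffic in one shot” step is conceptually just flipping a pointer from one environment to the other, gated on a health check. A minimal sketch; the environment URLs and the /healthz endpoint are assumptions, and a real switch would happen at DNS or the load balancer rather than in a Python variable:

```python
# blue_green_switch.py -- sketch of the "flip the live pointer" step.
# Environment URLs and the health-check endpoint are illustrative assumptions.
import urllib.request

ENVIRONMENTS = {
    "blue": "https://blue.example.com",
    "green": "https://green.example.com",
}
live = "blue"  # the side currently taking production traffic


def healthy(base_url: str) -> bool:
    """Basic readiness probe against the candidate environment."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


def switch_to(candidate: str) -> str:
    """Point production at `candidate`, but only if it looks healthy."""
    global live
    if candidate == live:
        return live
    if not healthy(ENVIRONMENTS[candidate]):
        raise RuntimeError(f"{candidate} failed its health check; staying on {live}")
    live = candidate  # in real life: update DNS / the load balancer target here
    return live
```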

Smooth is Fast when it comes to Deployments

“Are your deployments predictable?” The answer to the above really, really depends on the context. After all, if all you want to do is spin the latest build up on AWS, then nothing could be simpler, right? Unless, of course, by “predictable” you mean “It takes Alice 7 minutes to spin up the build”, and Alice is on vacation. Or your pipeline is automated once you log in, and your laptop crashed. Or hey, your laptop is just fine, but you ran up against an AWS limit (“too many EC2 instances”). And that presupposes that the build actually exists. After all, where did that come from? And is it a unitary thing, or does “build” involve some kind of complex orchestration, involving getting artifacts from here, doing a docker thing there, and so forth. Is that process predictable? Or are you also subject to the vagaries of Alice, laptops, DockerHub, and whatnot? The point behind all this is that you need to know th...
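One way to make “Is that process predictable?” answerable is to turn the unspoken assumptions (the build exists, docker is available, credentials are present) into explicit preflight checks that run before anyone hits deploy. A sketch, with hypothetical checks, paths, and environment variables:

```python
# preflight.py -- sketch: make deployment assumptions explicit and checkable.
# The specific checks, artifact path, and env vars are hypothetical.
import os
import shutil


def check_artifact_exists() -> bool:
    # "And that presupposes that the build actually exists."
    return os.path.exists(os.environ.get("BUILD_ARTIFACT", "dist/app.tar.gz"))


def check_docker_available() -> bool:
    # Don't discover mid-deploy that the docker step has nothing to run on.
    return shutil.which("docker") is not None


def check_credentials_present() -> bool:
    # Doesn't depend on Alice being logged in on her laptop.
    return bool(os.environ.get("DEPLOY_TOKEN"))


CHECKS = [check_artifact_exists, check_docker_available, check_credentials_present]


def preflight() -> bool:
    ok = True
    for check in CHECKS:
        passed = check()
        print(f"[{'ok' if passed else 'FAIL'}] {check.__name__}")
        ok = ok and passed
    return ok


if __name__ == "__main__":
    raise SystemExit(0 if preflight() else 1)
```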

Flaky Tests — The Bane of Existence

We’ve all had to deal with flaky tests — tests that don’t consistently pass or fail, but are, instead, nondeterministic. They’re deeply annoying, end up being a huge time-suck, and inevitably end up occupying the bulk of your “productive” time. There are many, many reasons for flakiness, but in my experience, the vast majority of them can be boiled down to some combination of the following:
1. External Components: When the code relies on something that isn’t under its control, and makes assumptions about it. I’ve seen people validate internet access by retrieving http://google.com (“because Google is always up”, conveniently ignoring the path from the test environment to Google), assume that there is a GPU present (“because it’s Bob’s code, and he always runs it on his desktop”), and so forth. The thing is, these assumptions get made even after stubbing — our assumptions about the environment we’re in can frequently lead us to places wh...
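To make the External Components point concrete: rather than proving internet access by fetching http://google.com, the external dependency can be injected and stubbed, so the test only exercises your own code. A minimal sketch with made-up function names:

```python
# test_connectivity_check.py -- sketch: stub the external dependency instead
# of assuming Google (or a GPU) is reachable from the test environment.
# The function under test and its contract are made up for the example.
from unittest import mock


def service_is_reachable(fetch) -> bool:
    """Hypothetical code under test: `fetch` is injected, not hard-coded."""
    try:
        return fetch("https://example.com/healthz") == 200
    except ConnectionError:
        return False


def test_reports_reachable_when_fetch_succeeds():
    # No real network involved: the stub stands in for the outside world.
    fake_fetch = mock.Mock(return_value=200)
    assert service_is_reachable(fake_fetch)


def test_reports_unreachable_when_fetch_raises():
    fake_fetch = mock.Mock(side_effect=ConnectionError("no route"))
    assert not service_is_reachable(fake_fetch)
```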