Automated testing in Yahoo Mail
Background:
We blogged about the evolution of Yahoo Mail to React + Flux and Node.js. When you are building a new platform, it is important to focus on building a strong foundation, and a robust test infrastructure is a big part of that foundation. Yahoo Mail today relies on automated testing in our Continuous Integration pipeline before we deploy changes to production. We run Cucumber and Watir-WebDriver based functional tests across IE, Chrome and Firefox using Selenium to certify our builds. Building this infrastructure gave us a lot of insight into the challenges of doing automated testing at the scale of Yahoo Mail.
Our requirements for a robust automated test infrastructure are as follows:
- Comprehensive
- Fast and consistent
- Easy to maintain
- Short learning curve
All engineers are accountable for the quality of the product and for maintaining the infrastructure, so the infrastructure should be easy to understand. We want the tests to be comprehensive enough to give us the confidence to push code every day without human intervention, and we need the ability to run the tests multiple times a day.
There are different levels and capabilities of tests we could have invested in. Based on the above requirements, we chose to focus on the following types of tests:
Unit Tests:
We have used multiple unit testing infrastructures in Mail in the past, so going by our experience, we arrived fairly quickly at our decision to use Mocha as our test framework, Karma as our test runner and Chai for assertions. We also decided to use Sinon to stub out external methods.
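To illustrate how these pieces fit together, here is a minimal sketch of a unit test in this style. The ConversationStore and api modules are hypothetical stand-ins, not actual Yahoo Mail code, and the exact module loading depends on your Karma setup.

// conversation-store.spec.js -- a minimal Mocha/Chai/Sinon sketch
var chai = require('chai');
var sinon = require('sinon');
var expect = chai.expect;

// Hypothetical modules, for illustration only
var api = require('./api');
var ConversationStore = require('./conversation-store');

describe('ConversationStore', function() {
    beforeEach(function() {
        // Stub the external API call so the unit test never hits the network
        sinon.stub(api, 'fetchConversations').returns([{ id: 1, read: false }]);
    });

    afterEach(function() {
        // Restore the original method after each test
        api.fetchConversations.restore();
    });

    it('loads conversations from the API', function() {
        var store = new ConversationStore();
        store.load();
        expect(api.fetchConversations.calledOnce).to.equal(true);
        expect(store.getConversation(1).read).to.equal(false);
    });
});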
Functional Tests:
Now comes the interesting part. We knew that we needed to test the UI thoroughly, since things can break once code components start interacting with each other. No one can rely on unit tests alone to determine whether the code will work as expected.
On the other hand, we had to be careful about what we actually end up testing as part of the functional test suite. In an application like Mail, executing our tests on actual mail data would mean that the functional tests are really executing as integration tests. It was important for us to call out this difference: functional tests should just test the functionality of the code, agnostic to the actual data. Working with actual data brings in dependencies to set up the account to a given initial state and to go over the network to all our sub-systems every time. This can be time consuming and can potentially trigger false alarms.
We divided our functional tests into two categories:
- Tests at Component Level
- Tests at App Level
Component level tests focused on functionally testing a component in isolation. We would pass different props and check that the component behaved as expected. React TestUtils with Karma and PhantomJS seemed like the way to go, as it was fast and the tests were easy to write. We earlier blogged about our experience with using React TestUtils for this.
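As a rough illustration of this style, a component level test might look like the sketch below. The ConversationItem component and its props are hypothetical, and the exact TestUtils require path depends on your React version.

// conversation-item.spec.js -- a sketch of a component level test
var React = require('react');
var TestUtils = require('react-addons-test-utils');
var expect = require('chai').expect;

// Hypothetical component, for illustration only
var ConversationItem = require('./conversation-item');

describe('ConversationItem', function() {
    it('renders an unread conversation with the unread class', function() {
        var item = TestUtils.renderIntoDocument(
            React.createElement(ConversationItem, { subject: 'Hello', read: false })
        );
        // Throws if no element with the class is found
        var node = TestUtils.findRenderedDOMComponentWithClass(item, 'unread');
        expect(node).to.exist;
    });
});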
App level tests focused on launching the entire app in the browser and testing real user scenarios, for example marking a mail as read. The Mail backend and APIs have their own robust automated tests, so our focus was to test the functionality of the client. We decided to stub our data request layer using Sinon and return responses that exercise all the code paths for the given functionality. This means our test runs are very fast, and the tests are reliable and predictable.
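Conceptually, the stubbing looks like the sketch below. The requestLayer module, its send(apiName, params, callback) signature and the fixture file are all hypothetical placeholders, not our actual request layer.

// A sketch of stubbing the data request layer with Sinon.
// `requestLayer` and the fixture file are hypothetical placeholders.
var sinon = require('sinon');
var requestLayer = require('./request-layer');
var readSuccess = require('./fixtures/read_success.json');

// Calls to the ReadConversations API get a canned response, so the test
// exercises the client code paths without ever touching the network
sinon.stub(requestLayer, 'send')
    .withArgs('ReadConversations')
    .yields(null, readSuccess);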
Now for the choice of framework, we narrowed it down to two options. The first option was to use the already familiar Watir-WebDriver with Cucumber. We loved this because we could write true BDD style tests with Cucumber, and we had well integrated tooling around it, like screen capture for failing tests and running tests in parallel. On the downside, not everyone was comfortable with the relaxed syntax of Ruby, and plenty of hacks were needed to make the tests run consistently on Chrome, Firefox and IE.
The second option was to use Protractor. The biggest advantage Protractor brought to the table was that we would be doing development and tests in the same language, JavaScript. This would eliminate the learning curve for writing and debugging tests. Protractor also speeds up tests by avoiding the need for a lot of "sleeps" and "waits" in tests. We used chai-as-promised for asserting the promises returned by Protractor. Even though Protractor was built for testing AngularJS applications, in reality it can be used to test any web application. It had everything you would expect from a UI testing framework.
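Wiring chai-as-promised into Protractor is a small amount of setup, typically done in the config's onPrepare hook. Here is a sketch of our approach rather than our exact configuration:

// In protractor.conf.js onPrepare: teach chai to resolve WebDriver promises
var chai = require('chai');
var chaiAsPromised = require('chai-as-promised');

chai.use(chaiAsPromised);
global.expect = chai.expect;

// Assertions can now resolve promises directly, e.g.
// expect(element(by.css('.subject')).getText()).to.eventually.equal('Hello');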
Based on the benefits we saw, and the fact that other teams in Yahoo were also going with Protractor, we chose the second option. Since we had good experience with BDD, we could easily layer it on top of Protractor in the future if needed.
Organizing Protractor tests:
A typical Protractor test starts to get messy very soon with promise chains. We wanted our code to be readable and maintainable, so we started creating Page Objects for the different components of our app. These eliminated duplication of code and made our test files read more like business expressions. With page objects and support for adding stubs for the API, our Protractor tests started looking like this:
it('marks a conversation as read', function() {
    var conversation;
    testHelper.stub('ReadConversations', 'read_success');
    conversation = page.conversationList.getConversation(1);
    expect(conversation.isRead()).to.eventually.equal(false);
    conversation.read();
    expect(conversation.isRead()).to.eventually.equal(true);
});
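For context, the page object behind that test might look something like the sketch below. The selectors, method bodies and the way page is wired up are hypothetical, not the actual Yahoo Mail page objects.

// conversation-list.page.js -- a hypothetical page object sketch
function Conversation(row) {
    this.row = row;
}

Conversation.prototype.isRead = function() {
    // Resolves to true once the row no longer carries the "unread" class
    return this.row.getAttribute('class').then(function(classes) {
        return classes.indexOf('unread') === -1;
    });
};

Conversation.prototype.read = function() {
    // Opening a conversation marks it as read
    this.row.click();
};

function ConversationList() {}

ConversationList.prototype.getConversation = function(index) {
    return new Conversation(element.all(by.css('.conversation-row')).get(index));
};

module.exports = new ConversationList();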
Smoke and Integration Tests:
We chose to use the Protractor setup for writing smoke and integration tests as well. The only difference is that smoke and integration tests interact with actual mail data instead of using stubs. Smoke tests comprise a collection of tests that just make sure the application is able to launch and the core flows work. A core flow could be something as simple as whether clicking on compose and sending a message works.
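A smoke test is simply another Protractor spec run without the stubs. A rough sketch of that compose flow, with all selectors hypothetical:

// A sketch of a smoke test for the compose flow; selectors are made up
it('composes and sends a message', function() {
    element(by.css('.compose-button')).click();
    element(by.css('.to-field')).sendKeys('selftest@example.com');
    element(by.css('.subject-field')).sendKeys('smoke test');
    element(by.css('.send-button')).click();
    expect(element(by.css('.sent-confirmation')).isDisplayed())
        .to.eventually.equal(true);
});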
Integration tests are the meanest, most intensive tests we run on actual data before the code goes to production. Using Protractor for functional, smoke and integration tests meant reusing the same infrastructure and the same code (e.g. page objects and specs) for running all the tests.
Running tests in the pipeline:
Having a clear separation between the various tests meant we could configure the various stages in the automation pipeline to run the corresponding test suites. The unit tests are the lowest level tests we run to make sure the code units are all working as expected; we want these to complete within 5 minutes. The functional tests make sure that the components work well together. Smoke is a quick sanity check that nothing major is broken, and the integration tests are the true gatekeepers for end to end quality.
On every pull request, we run all unit tests, functional tests and smoke tests. That gives us high confidence that whenever code is merged to master we are keeping the quality bar high, while still having the checks finish quickly.
Every 3 hours, we build a production candidate from master. We put this build through all the tests, including integration tests, to make sure that the package is completely certified end to end.
The setup also allows for deploying individual components separately, because we can pick and choose the test suite we want to run: either the entire functional test suite or just the functional tests for the given component. We also have strict code coverage thresholds, application metric checks, payload size checks, code style (lint) checks, etc. on every build before it is deployed to production.
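Protractor's suites option is one way to get this pick-and-choose behavior. The sketch below shows the idea, with a hypothetical file layout rather than our actual one:

// protractor.conf.js -- a sketch of suite-based configuration
// The pipeline can run a single suite with: protractor --suite smoke
exports.config = {
    suites: {
        functional: 'specs/functional/**/*.spec.js',
        smoke: 'specs/smoke/**/*.spec.js',
        integration: 'specs/integration/**/*.spec.js',
        // Per-component runs, e.g. just the conversation list tests
        conversation: 'specs/functional/conversation/*.spec.js'
    },
    capabilities: {
        browserName: 'chrome'
    }
};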
Overall, we are excited to see things fall into place, and we have many more challenges to overcome. We will do a separate blog post to deep dive into our Protractor setup, where we will talk about page objects, writing synchronous code, stubbing, and failing tests on JS exceptions.
Ankit Shah (@ankitshah2787) - Yahoo Mail Engineer

