Instrumentation Test Batching Guide

What is Test Batching?

Outside of Chromium, it is most common to run all tests of a test suite using a single adb shell am instrument command (a single execution / OS process). However, Chromium's test runner runs each test using a separate command, which means tests cannot interfere with one another, but also that tests take much longer to run. Test batching is a way to make our tests run faster by not restarting the process between every test.

All on-device tests would ideally be annotated with one of:

  • @Batch(Batch.UNIT_TESTS): For tests that do not rely on global application state.
  • @Batch(Batch.PER_CLASS): For test classes where the process does not need to be restarted between @Tests within the class, but should be restarted before and after the suite runs.
  • @DoNotBatch(reason = "..."): For test classes that require the process to be restarted for each test or are infeasible to batch.

Tests that are not annotated are treated as @DoNotBatch and are assumed to have not yet been assessed.
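
As a rough illustration of the three choices above, the following sketch declares stand-in annotations so that it compiles on its own; it does not use Chromium's real annotation classes. The test class names and the policyOf helper are hypothetical, and the value given for PER_CLASS is an assumption. The helper only mirrors the rule that an unannotated class is treated as @DoNotBatch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class BatchPolicySketch {
    // Stand-ins for Chromium's real annotations, declared here only so the sketch is self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Batch {
        String UNIT_TESTS = "UnitTests"; // matches the -A Batch=UnitTests filter used in this guide
        String PER_CLASS = "PerClass"; // assumed value, for illustration only
        String value();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DoNotBatch {
        String reason();
    }

    // Hypothetical test classes, one per annotation choice.
    @Batch(Batch.UNIT_TESTS)
    static class UrlParsingTest {}

    @Batch(Batch.PER_CLASS)
    static class SettingsFlowTest {}

    @DoNotBatch(reason = "Tests process startup.")
    static class ColdStartTest {}

    static class NotYetAssessedTest {} // no annotation: treated like @DoNotBatch

    /** Describes how a test class would be treated, based only on its annotations. */
    static String policyOf(Class<?> testClass) {
        Batch batch = testClass.getAnnotation(Batch.class);
        if (batch != null) return "batched:" + batch.value();
        DoNotBatch doNotBatch = testClass.getAnnotation(DoNotBatch.class);
        if (doNotBatch != null) return "not batched (" + doNotBatch.reason() + ")";
        return "not batched (not yet assessed)";
    }

    public static void main(String[] args) {
        Class<?>[] classes = {
            UrlParsingTest.class, SettingsFlowTest.class, ColdStartTest.class, NotYetAssessedTest.class
        };
        for (Class<?> c : classes) {
            System.out.println(c.getSimpleName() + " -> " + policyOf(c));
        }
    }
}
```

In a real test, the annotation simply goes on the test class; the helper exists only to make the default for unannotated classes visible.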

Is the Activity kept between test cases?

  • The browser process is always kept.
  • The Android Activities are finished between test cases in Activity-restarted batched tests.
  • The Android Activities are kept between test cases in Activity-reused batched tests.
    • This means the developer needs to get the app state back to where the next test in the batch assumes it starts.

How to tell Activity-restarted from Activity-reused batched tests

A batched test is Activity-restarted if the ActivityTestRule is a non-static @Rule. It is the ActivityTestRule which stops activities after each test case.

Activity-restarted Public Transit tests use the FreshCtaTransitRule.

A batched test is Activity-reused if either:

  • The ActivityTestRule is a static @ClassRule
  • setFinishActivity(false) is called on the non-static ActivityTestRule. BlankCTATabInitialStateRule does this.

In either batched test type, @ClassRules are applied only once per batch, while @Rules are applied once per test case. Likewise, @BeforeClass and @AfterClass run only once per batch. This contrasts with non-batched tests, where both @ClassRules and @Rules are applied for each test case, and both @BeforeClass and @Before are run for each test case (same for @AfterClass / @After).
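
The difference in how often class-level and per-test setup run can be modelled with a small simulation. This is a simplified sketch, not the test runner's implementation: the trace helper and the event names are made up for illustration, and @ClassRule/@BeforeClass (and @Rule/@Before) are collapsed into a single event each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RuleLifecycleSketch {
    /** Returns the order of setup and teardown events for the given test cases. */
    static List<String> trace(List<String> testCases, boolean batched) {
        List<String> events = new ArrayList<>();
        // In a batch, class-level setup happens once, before the first test case.
        if (batched) events.add("ClassRule/BeforeClass");
        for (String testCase : testCases) {
            // Without batching, each test case runs on its own, so class-level setup repeats.
            if (!batched) events.add("ClassRule/BeforeClass");
            events.add("Rule/Before");
            events.add(testCase);
            events.add("Rule/After");
            if (!batched) events.add("ClassRule/AfterClass");
        }
        if (batched) events.add("ClassRule/AfterClass");
        return events;
    }

    /** Counts how many times an event occurs in a trace. */
    static long count(List<String> events, String event) {
        return events.stream().filter(event::equals).count();
    }

    public static void main(String[] args) {
        List<String> tests = Arrays.asList("testA", "testB", "testC");
        System.out.println("batched:     " + trace(tests, true));
        System.out.println("non-batched: " + trace(tests, false));
    }
}
```

Any expensive work placed in class-level setup is therefore paid once per batch when batching, but once per test case otherwise.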

Activity-reused Public Transit tests use the ReusedCtaTransitRule or the AutoResetCtaTransitRule.

Performance

In terms of performance:

Activity-reused batched tests > Activity-restarted batched tests > non-batched tests

Activity-restarted batched tests are faster than non-batched tests. This difference is huge in Debug (>50%) and significant in Release (~25%).

Activity-reused batched tests are even faster than Activity-restarted batched tests. This difference is significant in Debug and huge in Release.

How to Batch a Test

Add the @Batch annotation to the test class, and ensure that each test within the chosen batch doesn't leave behind state that could cause other tests in the batch to fail.

For some tests, batching won’t be as useful (tests that test Activity startup, for example), and tests that test process startup shouldn’t be batched at all.

If a few tests within a larger batched suite cannot be batched (e.g. they test process initialization), you may add the @RequiresRestart annotation to those test methods to exclude them from the batch.
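
To make the effect of @RequiresRestart concrete, here is a minimal sketch that splits a class's test methods into those that can share a process and those that need their own. The annotations are local stand-ins (not the real JUnit or Chromium classes), and the test class, its method names, and the select helper are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RequiresRestartSketch {
    // Stand-ins for @Test and @RequiresRestart so the sketch compiles without other libraries.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Test {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RequiresRestart {}

    // Hypothetical batched suite with one method that cannot be batched.
    static class StartupAndTabsTest {
        @Test public void testOpenTab() {}
        @Test public void testCloseTab() {}
        @Test @RequiresRestart public void testProcessInitialization() {}
    }

    /** Returns the sorted names of test methods that do (or do not) carry @RequiresRestart. */
    static List<String> select(Class<?> testClass, boolean wantsRestart) {
        List<String> names = new ArrayList<>();
        for (Method method : testClass.getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Test.class)) continue; // skip non-test methods
            if (method.isAnnotationPresent(RequiresRestart.class) == wantsRestart) {
                names.add(method.getName());
            }
        }
        Collections.sort(names); // reflection does not guarantee declaration order
        return names;
    }

    public static void main(String[] args) {
        System.out.println("share a process: " + select(StartupAndTabsTest.class, false));
        System.out.println("own process:     " + select(StartupAndTabsTest.class, true));
    }
}
```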

Types of Batched tests

UNIT_TESTS

Tests that belong in this category are tests that are effectively unit tests. They may be written as instrumentation tests rather than junit tests for a variety of reasons such as needing to use real Android APIs, or needing to use the native library.

Batching Unit Test style tests is usually fairly simple (example). It requires adding the @Batch(Batch.UNIT_TESTS) annotation, and ensuring no global state, like test overrides, persists across tests. Unit Tests should also not start the browser process, but may load the native library. Note that even with Batched tests, the test fixture (the class) is recreated for each test.

Note that since the browser isn't initialized for unit tests, if you would like to take advantage of feature annotations in your test you will have to use Features.JUnitProcessor instead of Features.InstrumentationProcessor.

PER_CLASS

This batching type is typically for larger and more complex test suites, and will run the suite in its own batch. This will limit side-effects and reduce the complexity of managing state from these tests as you only have to think about tests within the suite.

Tests with different @Features annotations (@EnableFeatures and @DisableFeatures) or @CommandLineFlags will be run in separate batches.
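
One way to picture this splitting is as grouping tests by their effective flag set. The sketch below is only a model under that assumption; the splitByFlags helper, the test names, and the flag strings are hypothetical, and the real runner's grouping key may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlagBatchingSketch {
    /** Groups test names by their flag set, ignoring the order in which flags were listed. */
    static Map<String, List<String>> splitByFlags(Map<String, List<String>> flagsByTest) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : flagsByTest.entrySet()) {
            List<String> sortedFlags = new ArrayList<>(entry.getValue());
            Collections.sort(sortedFlags);
            String key = String.join(",", sortedFlags); // identical flag sets share a key
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(entry.getKey());
        }
        return batches;
    }

    /** Hypothetical suite: four tests, three distinct flag sets. */
    static Map<String, List<String>> exampleInput() {
        Map<String, List<String>> flagsByTest = new LinkedHashMap<>();
        flagsByTest.put("testPlain", new ArrayList<String>());
        flagsByTest.put("testOnlyA", Arrays.asList("--hypothetical-flag-a"));
        flagsByTest.put("testBothAB", Arrays.asList("--hypothetical-flag-a", "--hypothetical-flag-b"));
        flagsByTest.put("testBothBA", Arrays.asList("--hypothetical-flag-b", "--hypothetical-flag-a"));
        return flagsByTest;
    }

    public static void main(String[] args) {
        splitByFlags(exampleInput())
                .forEach((flags, tests) -> System.out.println("[" + flags + "] -> " + tests));
    }
}
```

The practical consequence is that a suite with many distinct flag combinations gets less benefit from PER_CLASS batching, since each combination pays its own process startup.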

Custom

This batching type is best for smaller and less complex test suites that require browser initialization, or something else that prevents them from being unit tests. Custom batches allow you to pay the process startup cost once per batch instead of once per test suite. To put multiple test suites into the same batch, you will have to use a shared custom batch name (example). When batching across suites you’ll want to use something like BlankCTATabInitialStateRule to persist static state (like the Activity) between test suites and perform any necessary state cleanup between tests.
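
A shared custom batch name might look like the sketch below. The Batch annotation here is a local stand-in, the classes are empty placeholders (OtherDeviceDialogTest and UnrelatedTest are hypothetical names), and groupByBatchName is an illustrative helper rather than part of the test runner. The batch name device_dialog is the one used in the command-line example in this guide.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CustomBatchSketch {
    // Stand-in for Chromium's @Batch annotation so the sketch compiles on its own.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Batch {
        String value();
    }

    // A shared constant keeps the batch name identical across suites.
    static final String DEVICE_DIALOG_BATCH = "device_dialog";

    @Batch(DEVICE_DIALOG_BATCH)
    static class BluetoothChooserDialogTest {} // placeholder body

    @Batch(DEVICE_DIALOG_BATCH)
    static class OtherDeviceDialogTest {} // hypothetical second suite in the same batch

    @Batch("another_batch")
    static class UnrelatedTest {} // hypothetical suite in a different batch

    /** Groups test classes by the name given in their @Batch annotation. */
    static Map<String, List<String>> groupByBatchName(Class<?>... testClasses) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (Class<?> testClass : testClasses) {
            Batch batch = testClass.getAnnotation(Batch.class);
            if (batch == null) continue; // unannotated classes are not batched
            batches.computeIfAbsent(batch.value(), k -> new ArrayList<>())
                    .add(testClass.getSimpleName());
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(groupByBatchName(
                BluetoothChooserDialogTest.class, OtherDeviceDialogTest.class, UnrelatedTest.class));
    }
}
```

Defining the name as a constant, rather than repeating the string literal, avoids a typo silently putting a suite into a batch of its own.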

Note that there is an inherent tradeoff here between batch size and debuggability - the larger your batch, the harder it will be to diagnose one test causing a different test to fail/flake. I would recommend grouping tests semantically to make it easier to understand relationships between the tests and which shared state is relevant.

Running Test Batches

Run all tests annotated with @Batch(Batch.UNIT_TESTS):

out/<dir>/bin/run_chrome_public_unit_test_apk -A Batch=UnitTests

out/<dir>/bin/run_chrome_public_test_apk -A Batch=UnitTests

Run all tests in a custom batch:

./tools/autotest.py -C out/Debug BluetoothChooserDialogTest \
--gtest_filter="*" -A Batch=device_dialog

Things worth noting

  • Important for Activity-reused tests: @ClassRule and @BeforeClass/@AfterClass run during test listing, so don’t do any heavy work in them. They also run twice for parameterized tests. See issue 1090043.
  • Sometimes it can be very difficult to figure out which test in a batch is causing another test to fail. A good first step is to lower _TEST_BATCH_MAX_GROUP_SIZE to reduce the number of tests within the batch while still reproducing the failure. Then, you can use multiple gtest filter patterns to control which tests run together. For example:
    ./tools/autotest.py -C out/Debug ExternalNavigationHandlerTest \
    --gtest_filter="*#testOrdinaryIncognitoUri:*#testChromeReferrer"
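
To see why the filter above selects exactly two tests, the sketch below models a gtest-style filter as a colon-separated list of glob patterns, where * matches any run of characters. This is an approximation written for illustration, not the runner's actual matching code: negative patterns (those after a "-") are not handled, and the org.example package in the usage lines is made up.

```java
import java.util.regex.Pattern;

final class GtestFilterSketch {
    /** Converts a glob ('*' = any characters, '?' = one character) into a regex. */
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // match everything else literally
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the test name matches any of the colon-separated patterns. */
    static boolean matches(String filter, String fullTestName) {
        for (String glob : filter.split(":")) {
            if (globToRegex(glob).matcher(fullTestName).matches()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String filter = "*#testOrdinaryIncognitoUri:*#testChromeReferrer";
        String prefix = "org.example.ExternalNavigationHandlerTest#"; // hypothetical package
        System.out.println(matches(filter, prefix + "testOrdinaryIncognitoUri"));
        System.out.println(matches(filter, prefix + "testChromeReferrer"));
        System.out.println(matches(filter, prefix + "testSomethingElse"));
    }
}
```

Adding or removing one pattern at a time is a simple way to bisect which pair of tests interferes with each other.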