Shrinking an Android app through end-to-end testing
Let’s finally shrink the app! Many developers are familiar with ProGuard and R8, the best-known tools for reducing the size of an Android app. Both statically analyse intermediate bytecode at compile time to shrink, minify and optimise it. Nowadays, R8 has fully replaced ProGuard.
While static-analysis tools already do a great job, our approach is dynamic in nature: it is based on end-to-end tests. The key idea is to identify and keep only the code that actually executes, and to safely remove everything else. Basically, run everything and remove what is unused. To achieve this, we run end-to-end tests on our preprocessed app, generate an instruction coverage report, and finally remove the instructions reported as not executed. As a result, bloated code (such as the Firebase dependency) shrinks, and the app becomes much lighter.
Test-Driven Shrinking: Run everything. Remove unused.
The following scheme gives an overview of our approach in two phases: instruction coverage measurement and shrinking.
Curious readers can find more technical details, experiments and a description of the scheme in the paper “Don’t Trust Me, Test Me: 100% Code Coverage for a 3rd-party Android App” (APSEC 2020).
Yet code removal is a very sensitive operation whose success depends heavily on the quality of the tests. After all, we almost blindly remove a bunch of code from a released app! On the other hand, unexpected code can never run, because that code is no longer there 😏 Moreover, we benefit from the size reduction and hence improve install and launch time.
Indeed, instruction removal contributes a lot to size reduction. Instructions are the smallest executable units; they reference plenty of methods and therefore contribute most to how tightly the code is tied together. When we remove instructions, we also remove the method calls inside methods that were not (fully) executed, which removes plenty of edges from the app call graph. Thus, instruction removal reduces the dependency surface for class hierarchy optimisation and simplifies the further removal of class fields, methods and unreferenced resources. However, we leave class hierarchy and resource dependency analyses for another time.
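The run-then-remove idea can be illustrated with a toy model. The sketch below is not the ACV tool itself: the `Instruction` class, the `shrink_method` helper and the sample offsets are all hypothetical, and real bytecode rewriting also has to repair branch targets and exception tables, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    offset: int   # position of the instruction inside its method
    opcode: str   # e.g. "invoke-static", "return-void"


def shrink_method(instructions, executed_offsets):
    """Keep only the instructions that the coverage report marks as executed."""
    return [ins for ins in instructions if ins.offset in executed_offsets]


# A made-up method body and a made-up coverage report for it.
method = [
    Instruction(0, "const/4"),
    Instruction(1, "if-eqz"),
    Instruction(2, "invoke-static"),   # never reached while the tests ran
    Instruction(3, "return-void"),
]
coverage = {0, 1, 3}  # offsets recorded as executed during the end-to-end tests

shrunk = shrink_method(method, coverage)
print([ins.opcode for ins in shrunk])
```

The quality of `coverage` decides everything here: an offset that the tests never hit is dropped, whether or not a real user could reach it.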
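To see why removing instructions cuts edges from the call graph, consider a minimal sketch. The graph, the method names and the `reachable_methods` helper are invented for illustration; they are not taken from the app or from the tool.

```python
from collections import deque


def reachable_methods(call_edges, entry_points):
    """Breadth-first walk over a call graph given as {caller: set_of_callees}."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_edges.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical graph before shrinking: every invoke instruction is an edge.
full_graph = {
    "MainActivity.onCreate": {"WebView.loadUrl", "Analytics.init"},
    "Analytics.init": {"Analytics.upload", "Crypto.sign"},
    "Analytics.upload": {"Http.post"},
}
# The tests never executed the call to Analytics.init, so that invoke is
# removed, and the never-executed bodies behind it lose their invokes too.
shrunk_graph = {
    "MainActivity.onCreate": {"WebView.loadUrl"},
}

entry = ["MainActivity.onCreate"]
before = reachable_methods(full_graph, entry)
after = reachable_methods(shrunk_graph, entry)
unreferenced = sorted(before - after)
print(unreferenced)
```

Dropping a single invoke makes four methods unreferenced in this toy graph, which is the effect that later field, method and resource removal can build on.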
In this section, let’s see how our app size changes when optimised by R8 and by ACV shrinking. But first, the experiment design.
We took the single-activity, WebView-based Android app that we discussed earlier. The app has a single Firebase dependency that relies on AndroidX classes. The APK is compiled in the default release mode with no optimisations. The app can only open this website and receive background notifications. The APK weighs 1.1MB.
The major functionality we test:
app launching, navigating through pages
switching between foreground and background
background notifications for both background and foreground states
launching without internet connection
In this experiment we want to see which code actually executes, how much size we can save, and how this compares to R8 optimisation in terms of size reduction. The general outline of our experiment is as follows:
Test the app and generate instruction coverage for the non-optimised APK.
Test the app and generate instruction coverage for the R8-optimised APK.
Based on instruction coverage, shrink the non-optimised APK.
Based on instruction coverage, shrink the R8-optimised APK.
Compare and draw conclusions.
Non-optimised release APK
This is what we expect to see: 97% instruction coverage on the ACV-shrunk app (657KB), though the app is still not optimised enough.
Thus, we got the following insights:
only ~8% of all instructions were useful
app size decreased by 41%, from 1.1MB to 657KB
app size is still bigger than that of the R8-optimised APK (see below)
static analysis (such as R8 performs) would help to improve our results
Shrunk APK (R8 + ACV Shrinking)
Comparison & Results
The table below summarises the results of R8 and ACV shrinking combined. We first measured the R8-optimised APKs and then applied ACV shrinking. This way we achieved a minimum app size of just 277KB, which is 24% smaller than the R8-optimised APK and 75% smaller in total than the non-optimised app. Meanwhile, the DEX size decreased by 91% in total.
Automated low-level code manipulations are complicated. We have to preserve the execution flow, maintain bytecode correctness and, moreover, respect Android Verifier requirements. The example below shows the evolution of a single method during optimisation. First, R8 removes a useless annotation. Second, the instructions that were not executed (highlighted in red) get removed. However, Android Verifier forces us to keep non-executed monitor-exit instructions for safety in synchronised methods. That is why we get ~97% coverage, but never 100%.
Further, because ACV shrinking does not yet implement static analysis as clever as R8's, for now we keep stubs: definitions of methods that are no longer called. We can't simply remove them yet because most classes participate in complicated inheritance relationships. As you can see, our report highlights stubbed methods in grey. We plan to remove them in the next version of our tool with the help of class hierarchy analysis.
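The percentages can be re-derived from the sizes quoted in this article. The sketch below assumes that 1.1MB means 1100KB, which is itself a rounded value, so the ACV-only figure lands slightly below the quoted 41%. The R8-only size is not stated in the text; the value computed here is only what the 24% figure implies.

```python
def reduction_percent(before_kb, after_kb):
    """Relative size reduction, in percent of the original size."""
    return (before_kb - after_kb) / before_kb * 100


ORIGINAL_KB = 1100      # assumption: "1.1MB" read as 1100KB (rounded)
ACV_ONLY_KB = 657       # non-optimised APK after ACV shrinking
R8_PLUS_ACV_KB = 277    # R8-optimised APK after ACV shrinking

acv_only = reduction_percent(ORIGINAL_KB, ACV_ONLY_KB)
combined = reduction_percent(ORIGINAL_KB, R8_PLUS_ACV_KB)

# If 277KB is 24% smaller than the R8-optimised APK, that APK is implied
# to weigh about 277 / (1 - 0.24) KB. This is an inference, not a measurement.
implied_r8_kb = R8_PLUS_ACV_KB / (1 - 0.24)

print(round(acv_only, 1), round(combined, 1), round(implied_r8_kb))
```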
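The effect of the monitor-exit rule on coverage can be shown with a small sketch. The `KEEP_ALWAYS` set, the helpers and the sample method are hypothetical; the sketch only models "keep what ran, plus what the verifier insists on" and does not model exception tables or real bytecode verification.

```python
# Assumption: opcodes that must stay even when the tests never executed them.
KEEP_ALWAYS = {"monitor-exit"}


def shrink(instructions, executed):
    """instructions: list of (offset, opcode); executed: set of offsets."""
    return [(off, op) for off, op in instructions
            if off in executed or op in KEEP_ALWAYS]


def coverage_percent(instructions, executed):
    """Share of the given instructions that the tests executed."""
    hit = sum(1 for off, _ in instructions if off in executed)
    return 100 * hit / len(instructions)


# A made-up synchronised method: normal path plus an exception path.
method = [
    (0, "monitor-enter"),
    (1, "invoke-virtual"),
    (2, "monitor-exit"),     # normal path, executed
    (3, "return-void"),
    (4, "move-exception"),   # exception path, never executed
    (5, "monitor-exit"),     # never executed, but kept for the verifier
    (6, "throw"),
]
executed = {0, 1, 2, 3}

shrunk = shrink(method, executed)
print(coverage_percent(method, executed), coverage_percent(shrunk, executed))
```

In this toy method one retained monitor-exit is enough to keep coverage of the shrunk code below 100%, which mirrors the gap described above.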
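Stubbing can be pictured as keeping every method definition while emptying the bodies that never ran. The sketch below is a simplified assumption: the article does not say what a stub body contains, so `STUB_BODY`, the `stub_unexecuted` helper and the method names are purely illustrative, and partially executed methods are not handled here.

```python
# Assumption: a stub keeps the signature and a minimal placeholder body.
STUB_BODY = ["return-void"]


def stub_unexecuted(methods, executed_methods):
    """methods: {signature: list_of_opcodes}. Returns (shrunk_methods, stubbed)."""
    shrunk, stubbed = {}, set()
    for signature, body in methods.items():
        if signature in executed_methods:
            shrunk[signature] = body
        else:
            # The definition survives, so subclasses and overrides still resolve.
            shrunk[signature] = list(STUB_BODY)
            stubbed.add(signature)
    return shrunk, stubbed


methods = {
    "Base.render": ["invoke-virtual", "return-void"],
    "Child.render": ["const/4", "invoke-static", "invoke-virtual", "return-void"],
    "Child.debugDump": ["new-instance", "invoke-direct", "invoke-virtual", "return-void"],
}
executed_methods = {"Base.render", "Child.render"}

shrunk, stubbed = stub_unexecuted(methods, executed_methods)
print(sorted(stubbed))
```

Deleting the stubbed signatures outright would need to know whether any class in the hierarchy still overrides or inherits them, which is the job of class hierarchy analysis.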
To conclude, measuring instruction coverage can significantly improve the results of static analysers in terms of app size shrinking. In our experiment, test-driven shrinking removed 24% of the R8-optimised APK, while its DEX size shrank by 78.5%. Altogether, the APK became 75% smaller (DEX decreased by 91%) compared to the initial app size. But we will do even better, stay tuned!