Telemetry-Driven Tuning with Build Scan: Trace-Level Performance Diagnostics
A Build Scan serves as the ultimate observability dashboard for a Gradle build. It synthesizes the host environment, the task execution timeline, dependency resolution logs, cache hit ratios, configuration phase metrics, and exact failure causes into a deeply searchable, interactive diagnostic interface.
Its value is not merely "generating a shareable link"; its true value lies in forcing performance engineering to transition from intuition to hard, reviewable evidence. Without a Build Scan, performance discussions inevitably degrade into baseless speculation: "I feel like Kotlin is compiling slower," "Maybe R8 is acting up," or "Let's just clear the caches and see."
Generating a Build Scan
```bash
./gradlew :app:assembleDebug --scan
```
Executing this command instructs Gradle to collect detailed local telemetry. Upon user confirmation, Gradle publishes the data to its hosted service and yields a private diagnostic URL. Crucially, the `--scan` flag injects no additional heavy execution logic into the build, nor does it alter the generated build artifacts.
For enterprise environments prohibited from publishing telemetry externally, internal instances like Develocity (formerly Gradle Enterprise) are mandatory infrastructure. The physical hosting location is irrelevant; maintaining the diagnostic dimensions is paramount.
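As a sketch of that internal-publishing setup, assuming the Develocity Gradle plugin and a recent Gradle with Kotlin DSL property assignment; the plugin version and server URL below are placeholders:

```kotlin
// settings.gradle.kts — a minimal sketch; plugin version and server URL are placeholders
plugins {
    id("com.gradle.develocity") version "3.17.5" // assumption: adjust to your environment
}

develocity {
    server = "https://develocity.internal.example.com" // hypothetical host; scans never leave the network
    buildScan {
        publishing.onlyIf { true } // publish every build instead of prompting per run
    }
}
```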
The Diagnostic Sequence
When confronted with a sluggish build, analyze the scan in this strict sequence:
- Summary: Assess absolute total duration, explicit failure causes, and the Gradle/JVM/OS environmental footprint.
- Timeline: Identify the longest executing tasks and evaluate the degree of concurrent execution.
- Performance: Scrutinize the Configuration phase duration, task creation overhead, and dependency resolution latency.
- Build Cache: Analyze which tasks were eligible for caching, which hit, and—critically—the exact reasons for misses.
- Dependencies: Investigate network latency during dependency resolution and remote repository access times.
- Tests: Review the execution duration and failure distribution of test task suites.
Do not instantly fixate on the single longest task in the execution graph. If the Configuration phase accounts for an absurd percentage of the total build time, optimizing the longest execution task is attacking the wrong bottleneck entirely.
Reading the Timeline Waterfall
The Timeline visualizes the task execution flow as a chronological waterfall:
```
time ─────────────────────────────────────────>
:core:compileKotlin   ███████
:app:mergeResources   ████
:app:dexBuilderDebug         █████
:app:packageDebug                 ██
```
You must extract three critical signals from this chart:
- The Critical Path: Is a single, monolithic task dominating the entire temporal critical path?
- Serialization Anomalies: Are dozens of fast, independent tasks executing serially instead of concurrently?
- Graph Bottlenecks: Is the overall concurrency artificially constrained by poor dependency graph modeling (e.g., an unnecessary `dependsOn` forcing synchronization)?
If a massive task blocks the critical path, optimizing it yields massive ROI. If latency is scattered across hundreds of micro-tasks executing serially, the structural solution is either reducing the raw task count or radically improving caching and configuration avoidance.
Analyzing the Performance Dashboard
The Performance tab reveals exactly which tasks were realized (configured and instantiated) during the configuration phase. This is the ultimate tool for verifying Task Configuration Avoidance migrations.
The Typical Anti-Pattern:
```kotlin
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

// BAD: eagerly creates and configures all matching tasks immediately
tasks.withType<KotlinCompile> {
    compilerOptions.jvmTarget.set(JvmTarget.JVM_17)
}
```
The Architectural Fix:
```kotlin
// GOOD: lazily configures only the tasks that enter the execution graph
tasks.withType<KotlinCompile>().configureEach {
    compilerOptions.jvmTarget.set(JvmTarget.JVM_17)
}
```
The Scan will definitively prove whether running a simple `./gradlew help` or executing a single, isolated task is still parasitically triggering the configuration of hundreds of entirely irrelevant tasks.
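The same eager-versus-lazy distinction applies to ad-hoc task registration. A minimal sketch, using a hypothetical `generateBuildInfo` task (the two variants are alternatives, not meant to coexist in one script):

```kotlin
// BAD: tasks.create realizes the task during configuration, even for `./gradlew help`
tasks.create("generateBuildInfo") {
    doLast { println("build info generated") }
}

// GOOD: tasks.register defers creation and configuration until the task is scheduled
tasks.register("generateBuildInfo") {
    doLast { println("build info generated") }
}
```

Run `./gradlew help --scan` before and after such a migration; the realized-task count in the Performance tab should drop accordingly.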
Using the Cache Dashboard to Fix Misses
The Build Cache dashboard forces you to confront the exact mechanical reason a task failed to hit the cache:
- Not cacheable: The task type inherently lacks the `@CacheableTask` annotation.
- No outputs: The task executed, but produced zero output files.
- Overlapping outputs: A lethal concurrency violation where multiple distinct tasks attempted to write to the exact same output directory.
- Input property changed: A specific input (e.g., a file hash, an argument, a source file) mutated since the last run.
- Not worth caching: The task executes so rapidly that downloading the cached output from the network is slower than local execution.
When optimizing, do not pursue a maniacal goal of forcing every task to report `FROM-CACHE`. Tasks with side effects (executing tests, uploading artifacts, signing bundles, deploying to devices) are fundamentally uncacheable. The engineering objective is to guarantee that computationally expensive, pure-function tasks (Kotlin compilation, AAPT2 processing, code generation) hit the cache with near-perfect reliability.
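To make the "pure-function task" goal concrete, here is a minimal sketch of a cache-friendly custom task; the class, property, and file names are hypothetical, and Gradle Kotlin DSL scripts import the Gradle API types by default:

```kotlin
// A pure-function task: declared inputs, relative path sensitivity, a single declared output
@CacheableTask
abstract class StampVersionFile : DefaultTask() {
    @get:Input
    abstract val version: Property<String>

    @get:InputFile
    @get:PathSensitive(PathSensitivity.RELATIVE) // prevents absolute-path cache misses
    abstract val template: RegularFileProperty

    @get:OutputFile
    abstract val outputFile: RegularFileProperty

    @TaskAction
    fun stamp() {
        val stamped = template.get().asFile.readText().replace("%VERSION%", version.get())
        outputFile.get().asFile.writeText(stamped)
    }
}

tasks.register<StampVersionFile>("stampVersionFile") {
    version.set("1.4.2")
    template.set(layout.projectDirectory.file("version.txt.in"))
    outputFile.set(layout.buildDirectory.file("generated/version.txt"))
}
```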
Diagnosing Dependency Resolution Bottlenecks
Sluggish dependency resolution is typically triggered by structural misconfigurations:
- Dynamic Versions: Utilizing `1.+` or `latest.release` forces Gradle to constantly poll the network.
- Volatile SNAPSHOTs: The caching TTL for changing modules is configured too aggressively.
- Suboptimal Repository Ordering: Gradle queries the slowest or highest-latency repository first.
- Redundant Declarations: Every subproject independently re-declares identical repository blocks.
- Classpath Pollution: Mixing buildscript repositories with project dependency repositories.
The Build Scan exposes the exact network latency for every repository access. The structural fixes involve centralizing repository declarations in `settings.gradle.kts`, ruthlessly eliminating dynamic versions, and leveraging Dependency Locking.
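A sketch of the centralization fix, assuming a conventional Android setup (the repository choices are illustrative):

```kotlin
// settings.gradle.kts — declare repositories once; fail fast if a subproject re-declares its own
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()        // serves AndroidX/AGP artifacts; query it first
        mavenCentral()
    }
}
```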
Establishing A/B Contrast Experiments
A legitimate performance optimization must be verifiable via a strict A/B contrast:
```
baseline scan:  assembleDebug 92s
mutation:       migrate Room annotation processor from KAPT to KSP
after scan:     assembleDebug 68s
evidence:       'kaptGenerateStubs' task completely vanished; 'compileKotlin' latency reduced
```
If an optimization is performed without generating an accompanying baseline and post-mutation Build Scan URL (or local profile), the actual ROI is impossible to review, and defending the build against future performance regressions becomes an exercise in guesswork.
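For context, the KAPT-to-KSP mutation in that experiment reduces to a plugin and configuration swap in the module's build script, roughly like this (versions are placeholders; verify the Room/KSP versions for your toolchain):

```kotlin
plugins {
    // id("org.jetbrains.kotlin.kapt")                   // before: KAPT
    id("com.google.devtools.ksp") version "2.0.0-1.0.22" // after: KSP (placeholder version)
}

dependencies {
    // kapt("androidx.room:room-compiler:2.6.1")         // before
    ksp("androidx.room:room-compiler:2.6.1")             // after: same artifact, KSP configuration
}
```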
Engineering Risks and Observability Checklist
Once Build Scan profiling logic enters a live Android monorepo, the paramount risk is not a trivial API typo; it is the catastrophic loss of build explainability. A minuscule change might trigger a massive recompilation storm, CI might time out without warning, cache hits might yield untrustworthy artifacts, or a shattered variant pipeline might only be discovered post-release.
Therefore, mastering this domain requires constructing two distinct mental models: one explaining the underlying mechanics, and another defining the engineering risks, observability signals, rollback strategies, and audit boundaries. The former explains why the system behaves this way; the latter proves that it is behaving exactly as anticipated in production.
Key Risk Matrix
| Risk Vector | Trigger Condition | Direct Consequence | Observability Strategy | Mitigation Strategy |
|---|---|---|---|---|
| Missing Input Declarations | Build logic reads undeclared files or env vars. | False `UP-TO-DATE` flags or corrupted cache hits. | Audit input drift via `--info` and Build Scans. | Model all state impacting output as `@Input` or `Provider`. |
| Absolute Path Leakage | Task keys incorporate local machine paths. | Cache misses across CI and disparate developer machines. | Diff cache keys across distinct environments. | Enforce relative path sensitivity and path normalization. |
| Configuration Phase Side Effects | Build scripts execute I/O, Git, or network requests. | Unrelated commands lag; the configuration cache is constantly invalidated. | Profile configuration latency via `help --scan`. | Isolate side effects inside task actions with explicit inputs/outputs. |
| Variant Pollution | Heavy tasks registered indiscriminately across all variants. | Debug builds are crippled by release-tier logic. | Inspect realized tasks and task timelines. | Utilize precise selectors to target exact variants. |
| Privilege Escalation | Scripts arbitrarily access CI secrets or user home directories. | Builds lose reproducibility; severe supply chain vulnerability. | Audit build logs and environment variable access. | Enforce principle of least privilege; use explicit secret injection. |
| Concurrency Race Conditions | Overlapping tasks write to identical output directories. | Mutually corrupted artifacts or sporadic build failures. | Scrutinize overlapping-outputs reports. | Guarantee independent, isolated output directories per task. |
| Cache Contamination | Untrusted branches push poisoned artifacts to the remote cache. | The entire team consumes corrupted artifacts. | Monitor remote cache push origins. | Restrict cache write permissions exclusively to trusted CI branches. |
| Rollback Paralysis | Build logic mutations are intertwined with business code changes. | Rapid triangulation is impossible during release failures. | Correlate change audits with Build Scan diffs. | Isolate build logic in independent, atomic commits. |
| Downgrade Chasms | No fallback strategy for novel Gradle/AGP APIs. | A failed upgrade paralyzes the entire engineering floor. | Maintain strict compatibility matrices and failure logs. | Preserve rollback versions and deploy feature flags. |
| Resource Leakage | Custom tasks abandon open file handles or orphaned processes. | Deletion failures or locked files on Windows/CI. | Monitor daemon logs and file-lock exceptions. | Enforce the Worker API or rigorous try/finally resource cleanup. |
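As an illustration of the first row's mitigation, here is a sketch that models an environment variable as a declared task input via a `Provider` instead of reading it silently; all names are hypothetical:

```kotlin
// A task whose environment dependence is explicit instead of hidden
abstract class WriteBuildStamp : DefaultTask() {
    @get:Input
    abstract val buildNumber: Property<String> // participates in up-to-date checks and cache keys

    @get:OutputFile
    abstract val stampFile: RegularFileProperty

    @TaskAction
    fun write() = stampFile.get().asFile.writeText("build=${buildNumber.get()}")
}

tasks.register<WriteBuildStamp>("writeBuildStamp") {
    // Provider-based read: lazy, declared, and visible to Gradle's input tracking
    buildNumber.set(providers.environmentVariable("BUILD_NUMBER").orElse("local"))
    stampFile.set(layout.buildDirectory.file("stamp/build.txt"))
}
```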
Metrics Requiring Continuous Observation
- Does configuration phase latency scale linearly or supra-linearly with module count?
- What is the critical path task for a single local debug build?
- What is the latency delta between a CI clean build and an incremental build?
- Remote Build Cache: Hit rate, specific miss reasons, and download latency.
- Configuration Cache: Hit rate and exact invalidation triggers.
- Are Kotlin/Java compilation tasks repeatedly triggered by unrelated resource or dependency mutations?
- Do resource merging, DEX, R8, or packaging tasks completely rerun after a trivial code change?
- Do custom plugins eagerly realize tasks that will never be executed?
- Do build logs exhibit undeclared inputs, overlapping outputs, or loud deprecated-API warnings?
- Can a published artifact be unambiguously traced back to a singular source commit, dependency lock, and build scan?
- Is a failure deterministically reproducible, or does it randomly strike specific machines under high concurrency?
- Does a specific mutation impact development builds, test builds, and release builds simultaneously?
Rollback and Downgrade Strategies
- Isolate build logic commits from business code to enable merciless binary search (git bisect) during triaging.
- Upgrading Gradle, AGP, Kotlin, or the JDK demands a pre-verified compatibility matrix and an immediate rollback version.
- Quarantine new plugin capabilities to a single, low-risk module before unleashing them globally.
- Configure remote caches as pull-only initially; only authorize CI writes after the artifacts are proven reproducible (a sketch follows this list).
- Novel bytecode instrumentation, code generation, or resource processing logic must be guarded by a toggle switch.
- When a release build detonates, rollback the build logic version immediately rather than nuking all caches and praying.
- Segment logs for CI timeouts to ruthlessly isolate whether the hang occurred during configuration, dependency resolution, or task execution.
- Document meticulous migration steps for irreversible build artifact mutations to prevent local developer state from decaying.
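A sketch of that pull-only cache posture in `settings.gradle.kts`; the cache URL and the trusted-branch check are placeholders for your CI's conventions:

```kotlin
// settings.gradle.kts — everyone pulls; only a trusted CI branch pushes
val isTrustedCi = System.getenv("CI") == "true" && System.getenv("BRANCH_NAME") == "main"

buildCache {
    remote<HttpBuildCache> {
        setUrl("https://cache.internal.example.com/cache/") // hypothetical internal cache
        isPush = isTrustedCi // pull-only for local machines and untrusted branches
    }
}
```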
Minimum Verification Matrix
| Verification Scenario | Command or Action | Expected Signal |
|---|---|---|
| Empty Task Configuration Cost | `./gradlew help --scan` | Configuration phase is devoid of irrelevant heavy tasks. |
| Local Incremental Build | Execute the identical `assemble` task twice in a row. | The second execution overwhelmingly reports `UP-TO-DATE`. |
| Cache Utilization | Wipe outputs, then enable the build cache. | Cacheable tasks report `FROM-CACHE`. |
| Variant Isolation | Build debug and release independently. | Only tasks affiliated with the targeted variant are realized. |
| CI Reproducibility | Execute a release build in a sterile workspace. | The build survives without relying on hidden local machine files. |
| Dependency Stability | Execute `dependencyInsight`. | Version selections are fully explainable; zero dynamic drift. |
| Configuration Cache | Run with `--configuration-cache` twice in a row. | The second run immediately reuses the configuration cache. |
| Release Auditing | Archive the scan, mapping file, and cryptographic signatures. | The artifact is fully traceable and can be rolled back. |
Audit Questions
- Does this specific block of build logic possess a named, accountable owner, or is it scattered randomly across dozens of module scripts?
- Does it silently read undeclared files, environment variables, or system properties?
- Does it brazenly execute heavy logic during the configuration phase that belongs in a task action?
- Does it blindly infect all variants, or is it surgically scoped to specific variants?
- Will it survive execution in a sterile CI environment devoid of network access and local IDE state?
- Does it commit raw credentials, API keys, or keystore paths into the repository?
- Does it shatter concurrency guarantees, for instance, by forcing multiple tasks to write to the exact same directory?
- When it fails, does it emit sufficient logging context to instantly isolate the root cause?
- Can it be instantaneously downgraded via a toggle switch to prevent it from paralyzing the entire project build?
- Is it defended by a minimal reproducible example, TestKit, or integration tests?
- Does it forcefully inflict unnecessary dependencies or task latency upon downstream modules?
- Will it survive an upgrade to the next major Gradle/AGP version, or is it parasitically hooked into volatile internal APIs?
Anti-pattern Checklist
- Weaponizing `clean` to mask input/output declaration blunders.
- Hacking `afterEvaluate` to patch dependency graphs that should have been elegantly modeled with `Provider` (see the contrast sketch after this list).
- Injecting dynamic versions to sidestep dependency conflicts, thereby annihilating build reproducibility.
- Dumping the entire project's public configuration into a single, monolithic, bloated convention plugin.
- Accidentally enabling release-tier, heavy optimizations during default debug builds.
- Reading `project` state or global `configuration` directly within a task execution action.
- Forcing multiple distinct tasks to share a single temporary directory.
- Blindly restarting CI when cache hit rates plummet, rather than surgically analyzing the miss reason.
- Treating Build Scan URLs as optional trivia rather than hard evidence for performance regressions.
- Proclaiming that because "it ran successfully in the local IDE," the CI release pipeline is guaranteed to be safe.
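The contrast sketch referenced above: the `afterEvaluate` hack versus lazy `Provider` wiring, with hypothetical task names (the two halves are alternatives, not meant to coexist in one script):

```kotlin
// BAD: ordering hack — reaches into another task's outputs after project evaluation
afterEvaluate {
    tasks.named("consumeReport").configure {
        inputs.file(tasks.getByName("generateReport").outputs.files.singleFile)
    }
}

// GOOD: lazy Provider wiring; no afterEvaluate, and the task dependency comes for free
abstract class GenerateReport : DefaultTask() {
    @get:OutputFile
    abstract val report: RegularFileProperty

    @TaskAction
    fun generate() = report.get().asFile.writeText("ok")
}

val generateReport = tasks.register<GenerateReport>("generateReport") {
    report.set(layout.buildDirectory.file("reports/summary.txt"))
}

tasks.register("consumeReport") {
    val reportFile = generateReport.flatMap { it.report }
    inputs.file(reportFile) // carries the dependency on generateReport implicitly
    doLast { println(reportFile.get().asFile.readText()) }
}
```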
Minimum Practical Scripts
```bash
./gradlew help --scan
./gradlew :app:assembleDebug --scan --info
./gradlew :app:assembleDebug --build-cache --info
./gradlew :app:assembleDebug --configuration-cache
./gradlew :app:dependencies --configuration debugRuntimeClasspath
./gradlew :app:dependencyInsight --dependency <module> --configuration debugRuntimeClasspath
```
This matrix of commands blankets the configuration phase, execution phase, caching, configuration caching, and dependency resolution. Any architectural mutation related to "Build Profiling" must be capable of explaining its behavioral impact using at least one of these commands.