Enhancing CI/CD: A Deep Dive into Building a Resilient Maestro Test Pipeline

In the realm of mobile application development, ensuring a seamless user experience is paramount. UI testing frameworks like Maestro have become essential for validating application behavior. However, integrating these tests into a Continuous Integration (CI) pipeline, especially one that spans multiple repositories and cloud device services, presents a unique set of challenges. This article details the journey of evolving a CI pipeline for running Maestro tests on DeviceCloud, highlighting the problems encountered, the solutions implemented, and the robust system that emerged.

The Problem Statement: The Need for Reliable Automated UI Testing

For the Android application, a process was required to automate Maestro UI tests on real devices to catch regressions and bugs before they reached users. The initial goal was to create a GitHub Actions workflow that could:

  1. Trigger a new build of the Android application in its own repository.
  2. Wait for the build to complete and retrieve the resulting APK file.
  3. Upload the APK to DeviceCloud and execute the suite of Maestro tests.
  4. Report the results to the development team.

An initial version of this pipeline was created, but it soon proved unreliable and its reporting inaccurate, which undermined its effectiveness.

The Initial Implementation and Its Shortcomings

The first iteration of the workflow used existing GitHub Actions to orchestrate the process. A third-party action was employed to send a repository_dispatch event to the Android application repository, triggering an APK build. Another action, specifically for DeviceCloud, was used to run the tests. While functional, this approach had several weaknesses:

  • Flaky Build Detection: The mechanism for finding the triggered build was not precise. It simply looked for the latest run, which could lead to race conditions or picking an unrelated, older build if there were delays.
  • Brittle Artifact Handling: The process of waiting for the build and downloading the artifact was not resilient. It would often fail due to transient network issues or GitHub API rate limits.
  • Inaccurate Test Reporting: A significant issue was found in how test results were counted. DeviceCloud has a retry mechanism for failed tests. The initial pipeline parsed raw logs to count passed and failed tests. If a test failed on its first attempt but passed on a retry, it was incorrectly counted as one failure and one pass, skewing the final metrics.
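
To make the miscounting concrete, here is a minimal sketch. The log format below is invented for illustration (DeviceCloud's real log output differs), but it shows how counting raw status lines double-counts a flow that fails once and then passes on retry:

```bash
# Illustrative only: this log format is invented, not DeviceCloud's real
# output. It shows how counting raw status lines double-counts retried flows.
cat > sample.log <<'EOF'
login_flow    FAILED (attempt 1)
login_flow    PASSED (attempt 2)
checkout_flow PASSED (attempt 1)
EOF

# Naive counting reports 1 failure and 2 passes, even though both
# flows ultimately succeeded after retries.
echo "failed: $(grep -c FAILED sample.log)"   # -> failed: 1
echo "passed: $(grep -c PASSED sample.log)"   # -> passed: 2
```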

The Rework: Engineering for Reliability and Accuracy

To address these shortcomings, a significant overhaul of the CI workflow was undertaken. The focus was on building a more robust and intelligent system.

Challenge 1: Reliably Triggering and Tracking Cross-Repository Builds

The first problem to be solved was the unreliable method of triggering and tracking the APK build. Instead of using a third-party action, a custom script using the official GitHub CLI (gh) was implemented.

A key improvement was the introduction of a timestamp. Before a build is triggered, the current time is recorded. The workflow then polls for a new workflow run in the Android repository that was created after this start time. This ensures the correct build is being tracked.
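
The sketch below shows the shape of this logic using the official gh CLI. The repository, workflow file, and event_type names are placeholders, not the project's actual values:

```bash
# Sketch of the timestamp-based trigger-and-track logic; repo, workflow file,
# and event_type are placeholder values.
REPO="my-org/android-app"

# Record the current time before triggering the build
START_TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Send a repository_dispatch event to kick off the APK build
gh api "repos/$REPO/dispatches" -f event_type=build-apk

# Poll for a workflow run created after our start timestamp
# (ISO 8601 timestamps compare correctly as strings)
RUN_ID=""
for _ in $(seq 1 30); do
  RUN_ID=$(gh run list -R "$REPO" --workflow build.yml \
    --json databaseId,createdAt \
    --jq ".[] | select(.createdAt > \"$START_TS\") | .databaseId" | head -n1)
  [ -n "$RUN_ID" ] && break
  sleep 20
done
echo "Tracking run: $RUN_ID"
```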

Furthermore, a comprehensive retry mechanism was built around the entire process. If a new workflow run is not found after a certain period, or if the artifact download fails, the entire process of triggering a build and waiting for the artifact is attempted again. This multi-layered resilience was crucial for overcoming the transient failures that previously plagued the pipeline.
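
A minimal sketch of that outer retry loop, assuming the trigger-and-poll logic above is wrapped in a hypothetical trigger_and_find_run function:

```bash
# Sketch of the outer retry loop: if finding the run, waiting for it, or
# downloading the artifact fails, the whole trigger-and-wait cycle restarts.
# trigger_and_find_run is a hypothetical wrapper around the polling above.
MAX_ATTEMPTS=3
for attempt in $(seq 1 "$MAX_ATTEMPTS"); do
  echo "Attempt $attempt of $MAX_ATTEMPTS"
  if RUN_ID=$(trigger_and_find_run); then
    # Block until the run finishes; --exit-status makes a failed run non-zero.
    # The artifact name pattern is an assumption for illustration.
    if gh run watch "$RUN_ID" -R "$REPO" --exit-status &&
       gh run download "$RUN_ID" -R "$REPO" --pattern "*debug*"; then
      echo "APK artifact downloaded"
      break
    fi
  fi
  sleep 30
done
```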

Challenge 2: Achieving Accurate Test Result Parsing

To solve the issue of inaccurate test reporting, the implementation was shifted from parsing raw text logs to using the DeviceCloud Status API. After a test run is initiated, the workflow now calls dcd status --json to fetch structured JSON data containing the results of every test flow, including all retry attempts. This structured data is then processed using jq: the tests are grouped by name, and a final status is determined for each. A test is considered “PASSED” if any of its attempts succeeded. This logic ensures that the final counts for passed, failed, and skipped tests are accurate, providing a true reflection of the application’s state. As a fallback, if the API call fails, the workflow can still parse the raw logs, ensuring that some results are always available.
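
A sketch of that grouping logic follows. The JSON shape (a top-level flows array of per-attempt objects with name and status fields) is an assumption made for illustration; the actual schema returned by dcd status --json may differ:

```bash
# Assumed schema: {"flows": [{"name": "...", "status": "PASSED|FAILED|SKIPPED"}, ...]}
# with one entry per attempt. Project/run identifiers are omitted here.
dcd status --json > results.json

jq -r '
  [ .flows
    | group_by(.name)[]
    | { name: .[0].name,
        # a flow counts as PASSED if any of its attempts passed
        status: (if any(.[]; .status == "PASSED") then "PASSED"
                 elif any(.[]; .status == "FAILED") then "FAILED"
                 else "SKIPPED" end) } ]
  | "passed:  \([.[] | select(.status == "PASSED")] | length)",
    "failed:  \([.[] | select(.status == "FAILED")] | length)",
    "skipped: \([.[] | select(.status == "SKIPPED")] | length)"
' results.json
```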

Challenge 3: Moving to a CLI-First Approach

A conscious decision was made to move away from a dedicated “all-in-one” GitHub Action for DeviceCloud and instead use the dcd command-line interface (CLI) directly. This change provided finer-grained control over the execution, improved logging, and made it easier to capture critical information, like the DeviceCloud console URL, even if the test execution step itself failed.
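
A rough sketch of this step might look as follows. The --api-key and --app flag names are placeholders (the real dcd cloud options should be taken from the DeviceCloud documentation); the point is the pattern of teeing output to a log so the console URL can be recovered even from a failed run:

```bash
# Sketch of the CLI-first execution step; flag names below are placeholders.
set -o pipefail

dcd cloud --api-key "$DCD_API_KEY" --app app-debug.apk 2>&1 | tee dcd-output.log
DCD_EXIT=$?

# Pull the DeviceCloud console URL out of the log even if the run failed,
# so it can still be linked from the Slack notification (URL shape assumed).
CONSOLE_URL=$(grep -oEm1 'https://[^[:space:]]+' dcd-output.log || true)
echo "dcd exit code: $DCD_EXIT, console: $CONSOLE_URL"
```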

The Final Implementation: A Walkthrough

The resulting workflow incorporates all of these improvements.

[Image: A workflow run of the implementation]

Here are the key stages of the final pipeline:

  1. Setup: The workflow runner is prepared by installing node, jq, bc (for calculations), and the GitHub CLI.
  2. Trigger and Download APK: The robust, timestamp-based script is executed. It triggers the APK build in the separate Android application repository, waits for its completion, and downloads the *-debug.apk artifact, with retries at every stage.
  3. Run DeviceCloud Tests: The dcd CLI is installed. The dcd cloud command is then used to upload the APK and run the Maestro flows defined in the workspace config file. The output is logged to a file.
  4. Parse Test Results: This step first attempts to fetch and parse the structured JSON results from the DeviceCloud API. If that fails, it falls back to parsing the log file. Accurate counts for passed, failed, and skipped tests are generated.
  5. Send Slack Notification: A detailed summary of the test run is compiled and sent to a Slack channel. The message includes the pass rate, counts for each test status, and direct links to the build log and the DeviceCloud console for easy debugging.
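
As a sketch of the final notification step, a Slack incoming webhook can be called with curl; the PASSED/FAILED/SKIPPED counts and the URLs are assumed to have been exported by the earlier steps:

```bash
# Sketch of the Slack notification step. bc handles the pass-rate arithmetic
# (assumes at least one test ran, so the divisor is non-zero).
PASS_RATE=$(echo "scale=1; 100 * $PASSED / ($PASSED + $FAILED)" | bc)

# Build the JSON payload safely with jq rather than string concatenation
PAYLOAD=$(jq -n --arg text "Maestro run: ${PASS_RATE}% pass rate
Passed: $PASSED  Failed: $FAILED  Skipped: $SKIPPED
Build log: $BUILD_LOG_URL
Console: $CONSOLE_URL" '{text: $text}')

curl -sf -X POST -H 'Content-type: application/json' \
  --data "$PAYLOAD" "$SLACK_WEBHOOK_URL"
```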

[Image: Slack message notification on completion]

[Image: A DeviceCloud test run report]

Benefits and Future Improvements

This enhanced pipeline has yielded significant benefits. It is far more reliable, with a much lower rate of spurious failures. The test result reports are now accurate, giving the team confidence in the metrics. Finally, the improved logging and direct links have made the pipeline more maintainable and easier to debug.

While the current system is a major step forward, further improvements can be considered:

  • Parallelization: Test execution could be parallelized across a wider range of devices and Android API levels in DeviceCloud.
  • Dynamic Test Selection: A more advanced implementation could dynamically select which Maestro flows to run based on the specific code changes in a pull request.
  • Optimizing Build Times: The APK build process itself could be analyzed for potential optimizations.

In conclusion, by systematically identifying and addressing the weaknesses of an initial CI implementation, a highly reliable and accurate automated UI testing pipeline was constructed. This demonstrates that for complex CI workflows, investing in custom scripting, robust error handling, and the use of structured data APIs can pay significant dividends.
