Why Webtrekk replaced massive releases with continuous deployment
A post from Webtrekk Software Developer Tony Horst
This is part of the Dev Blog series, posts written in the language of programmers.
Many companies treat a new software release the way a young child treats Christmas: it’s a rare event, packed with equal parts excitement and angst. How good have we been?
Until that one special day, legions of employees develop, test and pre-sell software. Since even a partial failure is usually unacceptable, releases understandably cause some sweaty palms.
This post will examine how we generate happiness on release days. Maybe not Christmas happiness, but at least a mini holiday for our developers.
Release Process at Webtrekk
Webtrekk used to be like that anxious child on Christmas morning. Releases were planned only a few times per year, normally to coincide with important fairs or events, and included a huge set of features and/or bug fixes.
This rigid schedule has been softened by introducing release trains as a step towards continuous deployment. As a result, technical releases can happen more frequently and are disconnected from full-scale product releases.
These release trains depart biweekly and contain a set of features that is first tested on an “integration stage” to ensure it interacts correctly with existing components. During this phase there is a deadline, after which no further features can be added to the deployed set. Developers then usually have about a week to add bug fixes and test how their features interact with any dependent components.
Afterwards, this set of features is deployed onto a “quality assurance stage” for a final test. This is where Test Engineers normally perform their more specific tests, including regression tests. If this final test is positive, the set of features and the corresponding state of all software components can go live.
As soon as the release train reaches the quality assurance stage, the software’s state is final and unchangeable for most components. If a malfunction in any of the train’s features and changes results in a component’s failure, the entire release train is stopped and the release is either skipped or postponed.
It’s usually impossible to subsequently deploy a feature set with a corresponding fix or a removal of the malfunctioning feature.
Feature Toggles vs. Business Toggles
The danger (and annoyance) of stopping an entire release train can be avoided by using so-called “toggles” to switch certain functionalities on or off – another aspect of our continuous deployment approach. This is increasingly done for certain functions of the Webtrekk Digital Intelligence Suite.
For most teams, development happens in separate branches, following Git-Flow principles, with the master branch representing the current live system.
If the live system needs a hotfix (a quick fix that addresses a small, very specific issue), a hotfix branch is created from the respective marker on the master branch and deployed to the live system after successful quality assurance. A toggle, by contrast, is basically an IF-THEN-ELSE condition in the code that can be controlled externally.
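As a minimal sketch of such an externally controlled condition, the following plain Java class picks a code path based on a toggle that is re-evaluated on every call. The class name ReportRenderer, the two layouts and the system property feature.newLayout are hypothetical and only serve as illustration; they are not taken from Webtrekk's code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical example class: chooses between two code paths via a toggle.
final class ReportRenderer {
    // Evaluated on every call, so a change made outside the code
    // takes effect without a new deployment.
    private final BooleanSupplier newLayoutEnabled;

    ReportRenderer(BooleanSupplier newLayoutEnabled) {
        this.newLayoutEnabled = newLayoutEnabled;
    }

    String render() {
        if (newLayoutEnabled.getAsBoolean()) {
            return "new layout";   // feature switched on
        } else {
            return "old layout";   // existing behaviour
        }
    }

    // One possible external control (an assumption): a JVM system property.
    // Boolean.getBoolean returns true only if the property is set to "true".
    static ReportRenderer fromSystemProperty() {
        return new ReportRenderer(() -> Boolean.getBoolean("feature.newLayout"));
    }
}
```

Because the condition is passed in as a supplier, the same class could just as well be driven by a configuration file or a database value.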
In general, toggles can be divided into “feature toggles” and “business toggles”. Feature toggles only live until a feature’s development is complete, while business toggles exist for an indeterminate period and usually hide functionalities that need to be requested and paid for by customers.
By using toggles instead of branches, the so-called “merge hell” – in other words, a situation where integrating upgrades takes longer than implementing those upgrades – can be avoided. Because merging branches is unnecessary, this eliminates any problems arising from functional code or test code being overwritten, or from merge conflicts being misinterpreted by colleagues.
Even though this theoretically makes branches superfluous, self-contained features with their own independent toggles can also be developed in separate branches. These are later merged into the master branch and tested with all other toggle combinations.
This way the system’s correct function isn’t hindered by incomplete features that may have been added to the master branch for testing, since these features can be deactivated any time.
Using feature toggles, every developer, tester or product owner can configure their environment by purposefully activating and deactivating features. A tester can then unlock a feature in the live system and test it with live data without customers noticing anything. And product owners can use the quality assurance stage or any other environment to assemble the features that will be included in the next product release.
However, just like branches, feature toggles should be removed as soon as a feature’s development is complete. Toggles create technical debt for developers, and each one doubles the testing effort needed to ensure correct functionality (at least if testers check all possible toggle states).
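To make the testing cost concrete: if every toggle can be on or off independently, n toggles yield 2^n states. The following sketch, assuming nothing beyond plain Java, enumerates them; the class name ToggleStates and the toggle names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists every on/off combination of a set of toggles.
final class ToggleStates {
    // For n toggle names this returns 2^n maps, one per combination.
    static List<Map<String, Boolean>> allCombinations(List<String> toggles) {
        List<Map<String, Boolean>> result = new ArrayList<>();
        int n = toggles.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            Map<String, Boolean> state = new LinkedHashMap<>();
            for (int i = 0; i < n; i++) {
                // Bit i of the mask decides whether toggle i is on.
                state.put(toggles.get(i), (mask & (1 << i)) != 0);
            }
            result.add(state);
        }
        return result;
    }
}
```

Calling allCombinations with two names such as "PREDEFINED_DASHBOARDS" and "ELEMENT_VALIDATION" returns four states, and a third toggle raises that to eight, which is one reason to remove finished feature toggles promptly.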
Example “Business Toggle”
Predictions, as described in an earlier blog entry, can be activated for Analytics by using their business toggle. Here a boolean flag decides whether the corresponding dimensions and metrics are enabled for specific accounts.
This fairly old business toggle affects several components. It is managed by a central application and stored in a specific account database. As a result, features using predictions have to connect to that account database and retrieve the value of a specific key.
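A lookup of this kind could be sketched as follows. This is an illustration under assumptions, not Webtrekk's actual implementation: the AccountStore interface stands in for the account database, the in-memory implementation replaces a real query, and the key name predictions.enabled is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the account database (hypothetical interface).
interface AccountStore {
    // Returns the stored value, or null if the key is not set for the account.
    String lookup(long accountId, String key);
}

// In-memory replacement for a real database query, for illustration only.
final class InMemoryAccountStore implements AccountStore {
    private final Map<String, String> values = new HashMap<>();

    void put(long accountId, String key, String value) {
        values.put(accountId + ":" + key, value);
    }

    @Override
    public String lookup(long accountId, String key) {
        return values.get(accountId + ":" + key);
    }
}

// Business toggle: a boolean flag stored per account.
final class PredictionsToggle {
    static final String KEY = "predictions.enabled"; // hypothetical key name

    private final AccountStore store;

    PredictionsToggle(AccountStore store) {
        this.store = store;
    }

    boolean isEnabledFor(long accountId) {
        // Boolean.parseBoolean(null) is false, so accounts without the key stay disabled.
        return Boolean.parseBoolean(store.lookup(accountId, KEY));
    }
}
```

Defaulting to "disabled" when the key is missing fits a toggle that hides functionality customers have to request first.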
Example “Feature Toggle”
Feature toggles have been particularly helpful for testing relatively new Webtrekk features, such as Predefined Dashboards, and for validating report elements.
For performance reasons, database storage was skipped, giving the corresponding services full sovereignty over their feature toggles.
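A service-local toggle of this kind, kept in memory instead of in a database, might look like the following plain Java sketch. The enum name ServiceFeature and its constants are hypothetical, and the code deliberately uses no library.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical feature toggles owned by a single service.
enum ServiceFeature {
    PREDEFINED_DASHBOARDS,
    ELEMENT_VALIDATION;

    // State lives only in this service's memory: fast to read,
    // but lost on restart and not shared with other services.
    private static final Set<ServiceFeature> ACTIVE = ConcurrentHashMap.newKeySet();

    boolean isActive() {
        return ACTIVE.contains(this);
    }

    void setActive(boolean active) {
        if (active) {
            ACTIVE.add(this);
        } else {
            ACTIVE.remove(this);
        }
    }
}
```

Calling code would then guard the new behaviour with a check such as ServiceFeature.PREDEFINED_DASHBOARDS.isActive(). The trade-off of this design is that each service instance holds its own state.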
Even though toggles can be implemented without any libraries, these feature toggles currently use the external Java library Togglz.
This library was chosen for its Java MyBatis DB integration, its ease of use in code and its ability to manage all toggles for the different stages via a web interface.
In the web interface, a traffic-light status indicator shows the state of each toggle. In our example, the toggle for Predefined Dashboards is active and the toggle for validating elements has been deactivated.
Access to the web interface is restricted to internal connections and can be configured and removed as required via the source code’s TogglzFactory.
A sensible, high-performing way of persisting feature toggles is a necessary next step.
Webtrekk is currently considering a dedicated central management for feature toggles used by different services, similar to the existing one for business toggles.
As in many other companies, the use of toggles is still contested and is handled differently by Webtrekk’s teams. The common conflict “feature branches vs. feature toggles”, which will ring a bell for more than a few programmers, is ongoing at Webtrekk.
As is often the case, the golden mean between branches and toggles may prove successful. Or the future might bring a new possibility, leading to both branches and toggles vanishing.
For now, Webtrekk will continue to use and improve feature toggles on the path towards continuous deployment.