Reproducible Builds
In September 2017, many users were infected with malware when they downloaded a CCleaner update. Attackers had managed to hide the Floxif Trojan inside the update's installer.
CCleaner is a utility program used to clean potentially unwanted files and invalid Windows Registry entries from a computer. Their own website has a chart that shows what it can do.
But, due to the hack, the product suddenly contained a new feature:
Malware!
How was this possible? And more importantly, how could it have been prevented?
When you download something from the internet, you always have to be careful. Download only from trusted sources, use a secure connection, and check the signature [1] if one is available.
So had the users been careless? Not at all. The infected download package was hosted on the official website and had been signed with a valid certificate. Users downloading and installing this file could not have known that they were actually getting infected with malware.
This was possible because the attackers got their hands on the product before it was signed and offered for download. Every check a user could perform was therefore futile.
But could this have been prevented?
Reproducible Builds to the rescue!
Reproducible builds are a set of software development practices that create a verifiable path from human-readable source code to the binary code used by computers. In practice, this means that multiple parties can each build the binaries from the same source code and confirm that they get exactly the same result, bit for bit. That is exactly what is needed here: if several independent parties build the software and compare their output, a tampered binary stands out because it no longer matches the others. To go unnoticed, the attackers would have had to compromise each and every party.
This of course assumes that the source code itself has also been verified, but since source code is human-readable, that is much easier to do. If I were to manipulate the source code in a malicious way and check it into a version control system, my commit would be visible to anyone working with the source code.
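To make the idea of comparing results across parties concrete, here is a minimal sketch in Python. It assumes each builder publishes the SHA-256 digest of the binary it produced. The builder names and the function names are made up for illustration and do not come from any real tool.

```python
from typing import Dict, List, Mapping


def group_by_digest(reports: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group builder names by the digest each one reported."""
    groups: Dict[str, List[str]] = {}
    for builder, digest in sorted(reports.items()):
        # Normalise case so "ABC..." and "abc..." count as the same digest.
        groups.setdefault(digest.lower(), []).append(builder)
    return groups


def builds_agree(reports: Mapping[str, str]) -> bool:
    """True only if at least one report exists and all digests are identical."""
    return len(group_by_digest(reports)) == 1
```

If `builds_agree` returns False, `group_by_digest` shows which builders disagree, which is the starting point for finding out whether one of them was compromised or the build is simply not deterministic yet.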
How does it work?
First: the build system needs to be made entirely deterministic. This means that compiling the same source code must always produce the same output. This has consequences for the code and the build process: the build cannot, for example, embed randomly generated values in the binary, and the same goes for any other input that varies between runs, such as the current time.
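As an illustration of what "deterministic" means for a build step, the sketch below packs files into a tar archive using Python's standard `tarfile` module. It is only an example under simple assumptions (file contents are held in memory, and the function name `build_archive` is made up): the entries are written in sorted order and the metadata that normally varies between machines and runs is pinned to fixed values.

```python
import io
import tarfile
from typing import Mapping


def build_archive(files: Mapping[str, bytes]) -> bytes:
    """Pack files into a tar archive whose bytes depend only on names and contents."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # fixed order, independent of how the input was assembled
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # no timestamp from the build machine's clock
            info.mode = 0o644  # fixed permissions
            info.uid = info.gid = 0  # no user or group ids from the build machine
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Calling `build_archive` twice with the same names and contents returns identical bytes, regardless of when or where it runs. Had the real modification times been stored instead, two otherwise identical builds would already differ.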
Second: the build environment should either be recorded or pre-defined. For example, compilers and programming languages exist in many versions, and a function may behave differently from one version to the next. The versions used are one of the aspects that need to be exactly the same if the build is to generate the same output.
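Recording the environment can start very small. The following sketch captures a few facts about the Python interpreter and the platform it runs on, and reports which of them differ between the recorded and the current environment. The choice of fields and the function names are assumptions made for this example; a real project would record far more, such as the exact versions of the compiler and of every dependency.

```python
import platform
from typing import Dict, List, Mapping


def record_environment() -> Dict[str, str]:
    """Capture a few facts about the machine and interpreter running the build."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


def differing_fields(recorded: Mapping[str, str], current: Mapping[str, str]) -> List[str]:
    """Return the sorted names of the fields that differ between two environments."""
    keys = sorted(set(recorded) | set(current))
    return [key for key in keys if recorded.get(key) != current.get(key)]
```

The recorded dictionary can be stored next to the release, for example as JSON. Someone rebuilding the software later can call `differing_fields` with the stored record and their own `record_environment()` to see what needs to change before the outputs can be expected to match.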
Third: users should be given a way to recreate a close enough build environment, perform the build process, and verify that the output matches the original build.
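The final verification step can be as simple as comparing checksums. The sketch below computes the SHA-256 digest of a locally rebuilt file and compares it with the digest published for the original build. The function names are made up for this example, and it assumes the publisher makes such a digest available.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_rebuild(rebuilt_path: str, published_digest: str) -> bool:
    """True if the rebuilt file has exactly the digest published for the original build."""
    return sha256_of_file(rebuilt_path) == published_digest.strip().lower()
```

A match means the rebuilt file is byte-for-byte identical to the published one. A mismatch does not prove an attack by itself, since the build environment may simply not have been close enough, but it is a reason to investigate.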
More information
Curious? This website has a lot of information. (In fact, it has been a big source of information for this post)
[1] A digital signature allows you to verify that the contents have not been altered since they were signed, because altered content no longer matches the signature.