Earlier this year I took over a project at my new company. It had existed for many years and had been growing continuously. My first impression: it hadn’t received much love recently. The repository was cluttered with many files that could be assumed to be dead code. Unfortunately, you never know. Although I felt an urgent need to remove stuff, I managed to keep myself from blindly deleting files and breaking everything ;). The mission was clear: clean up the project without breaking things.
We started by browsing through the code and removing everything that we collectively agreed was dead. But sometimes you simply cannot tell whether a piece of code is still in use. Even worse, good old human error can make you delete something that is still needed. So I had to find a better way to clean up the project. Thankfully, there are tools out there that help you find dead code. phpdcd is one of them, and PhpStorm has built-in code analysis including dead code detection. Both do a fairly decent job, but pure static code analysis has its limitations, especially in dynamic languages such as PHP.
First, these tools can’t follow dynamic code paths, which usually results in code being wrongly reported as dead. Detecting dynamic calls programmatically can be hard, sometimes even impossible, when you do dirty things like $object->$method() or use reflection. The second problem with static code analysis is that it only checks for references to methods and functions. It can’t tell whether a code path is actually executed when the software runs. Often, a piece of code is still referenced somewhere, but the feature is no longer in use, so those lines will never be executed. This is dead code, too, and it should be removed because it adds unnecessary complexity to the project.
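To make the first problem a bit more tangible, here’s a small sketch. The class and method names are made up for illustration; the point is that the called method only exists as a string at runtime, so a static analyzer has no literal call it could link to the method.

```php
<?php
// Hypothetical class, used only to illustrate the problem.
class ReportController
{
    public function exportCsv(): string
    {
        return 'csv';
    }

    public function exportPdf(): string
    {
        return 'pdf';
    }
}

// The method name is assembled at runtime, e.g. from a request parameter.
// There is no literal call to exportCsv() or exportPdf() anywhere, so a
// purely static analysis may report both methods as unused.
function export(ReportController $controller, string $format): string
{
    $method = 'export' . ucfirst($format);
    return $controller->$method();
}

// Reflection has the same effect: the target method is just a string.
function exportViaReflection(ReportController $controller, string $format): string
{
    $reflection = new ReflectionMethod($controller, 'export' . ucfirst($format));
    return $reflection->invoke($controller);
}
```

Both methods are very much alive, yet neither is referenced by name, which is exactly the kind of false positive described above.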
So, what else could I do? I searched the web and came across the interesting concept of tombstones. If you haven’t heard of tombstones yet, I highly recommend this article and the video of David Schnepper’s Ignite talk. A tombstone is basically an executable marker in your code (in the PHP world: a function call), which is placed in fragments of code that you assume to be dead. Then everything is deployed to production, and whenever a tombstone is invoked, it writes some data to a log. After a while, the logs enable you to tell dead code apart from undead code (called “Vampires”) in your project.
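To illustrate the idea, here’s a minimal, hypothetical tombstone function. It is not the API of any existing library; the function name, its parameters and the JSON log format are assumptions made for this sketch. It uses debug_backtrace() to record where the marker sits, which function contains it, and which function called that one.

```php
<?php
/**
 * Hypothetical minimal tombstone, NOT the scheb/tombstone API.
 * Every invocation appends one JSON line to a log file.
 */
function tombstone(string $date, string $author, ?string $logFile = null): void
{
    $logFile = $logFile ?? sys_get_temp_dir() . '/tombstones.log';

    // Frame 0 is the call to tombstone() itself: its file/line is the marker's location.
    // Frame 1 (if any) is the function containing the marker,
    // frame 2 (if any) is the function that called it, i.e. the invoker.
    $trace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 3);

    $entry = [
        'date' => $date,      // when the tombstone was placed
        'author' => $author,  // who placed it
        'file' => $trace[0]['file'] ?? 'unknown',
        'line' => $trace[0]['line'] ?? 0,
        'function' => $trace[1]['function'] ?? null,
        'invoker' => $trace[2]['function'] ?? null,
        'loggedAt' => date(DATE_ATOM),
    ];

    // LOCK_EX avoids interleaved lines when several requests log at once.
    file_put_contents($logFile, json_encode($entry) . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Usage: mark a function you believe to be dead (date and author are placeholders).
function legacyExport(): string
{
    tombstone('2015-08-18', 'jdoe');
    return 'legacy';
}
```

Once legacyExport() shows up in the log, it’s a vampire. If it never does during a long enough observation period, it’s a good candidate for removal.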
I wanted to try out the concept, but to my surprise I didn’t find a ready-to-use implementation for PHP. So I started implementing a tombstone library on my own, and here it is: Tombstones for PHP. It is split into two packages, scheb/tombstone and scheb/tombstone-analyzer. The first, lightweight one provides the basic functionality required for placing tombstones in the code and logging their invocations. The second one adds report generation on top of it. I recommend adding that one to the require-dev section of composer.json, since it is most probably not needed in a production environment.
- scheb/tombstone is basically stable. I may do some refactoring and eventually move some code to or from the analyzer, but the public interface is stable.
- scheb/tombstone-analyzer is a work in progress. It’s still a little bit dirty and hardly covered by tests. That said, the console command is working and will stay as it is.
The tombstone-analyzer is a command-line tool that aggregates the log files and creates a textual overview of all tombstones, their invokers (if any), their age and other useful information. It can also create an HTML report, which in my opinion is the more useful one, similar to what PHPUnit does for code coverage (the template is pretty much the same, thanks to Sebastian Bergmann, who allowed me to reuse it). In the HTML report you can browse the source code and see the tombstones highlighted as dead or undead.
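The core idea of such an analysis can be sketched in a few lines: compare the tombstones placed in the code with the ones that show up in the logs. This is a hypothetical simplification, not how scheb/tombstone-analyzer is implemented. It assumes the tombstone locations have already been collected as "file:line" strings and that each log line is a JSON object with file, line and, optionally, invoker keys.

```php
<?php
/**
 * Hypothetical sketch of the analysis idea, NOT scheb/tombstone-analyzer's code.
 * $placed:   tombstone locations found in the source, as "file:line" strings.
 * $logLines: JSON log lines with "file", "line" and optionally "invoker" keys.
 * Returns ['dead' => [locations], 'undead' => [location => [invokers]]].
 */
function classifyTombstones(array $placed, array $logLines): array
{
    // Collect the distinct invokers per logged tombstone location.
    $invocations = [];
    foreach ($logLines as $line) {
        $entry = json_decode($line, true);
        if (!is_array($entry) || !isset($entry['file'], $entry['line'])) {
            continue; // skip malformed log lines
        }
        $location = $entry['file'] . ':' . $entry['line'];
        $invoker = $entry['invoker'] ?? 'unknown';
        $invocations[$location][$invoker] = true;
    }

    $result = ['dead' => [], 'undead' => []];
    foreach ($placed as $location) {
        if (isset($invocations[$location])) {
            // Invoked at least once: a vampire, the code is still in use.
            $result['undead'][$location] = array_keys($invocations[$location]);
        } else {
            // Never logged during the observation period: candidate for removal.
            $result['dead'][] = $location;
        }
    }
    return $result;
}
```

Note that "dead" here only means "not invoked while the logs were collected", so the result is only as good as the observation period.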
As always, whether the concept of tombstones fits your project highly depends on the circumstances. Tombstones are great for cleaning up software with a lot of legacy code, where no one can tell which parts are still in use and static code analysis reaches its limits. But you have to be aware of the overhead tombstones create. Logging takes at least some time, so they may not be the best choice for a high-performance system. And every tombstone adds another potential point of failure, in case logging fails for some reason. If you’re fine with these disadvantages, give it a try.
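One way to reduce the risk of logging failures is to make the tombstone fail-safe, so that an error while logging never breaks the feature it marks. The sketch below is a hypothetical wrapper, not part of the library; the $writer callable stands in for whatever actually writes the log entry.

```php
<?php
/**
 * Hypothetical fail-safe wrapper: a tombstone must never break the feature
 * it marks, so any error thrown while logging is swallowed and reported.
 * Returns true if the entry was written, false if logging failed.
 */
function safeTombstone(callable $writer, string $date, string $author): bool
{
    try {
        $writer($date, $author);
        return true;
    } catch (Throwable $e) {
        // Don't rethrow: losing one log entry is better than failing the request.
        error_log('tombstone logging failed: ' . $e->getMessage());
        return false;
    }
}
```

Keep in mind that functions like file_put_contents() report failure with a warning and a false return value rather than an exception, so a real writer would have to check the return value (or convert warnings into exceptions) for the catch block to kick in.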
Feedback on the tombstones library is welcome, either on GitHub or simply down in the comments. Happy dead code removal and vampire hunting!