Scripts to help guide cleanup of #include lines in a codebase, using clangd
- `apply_include_changes.py` - Apply include changes to files in the source tree
- `filter_include_changes.py` - Filter include changes output
- `post_process_compilation_db.py` - Post-process the clang compilation database for analysis
- `set_edge_weights.py` - Set edge weights in include changes output
- `suggest_include_changes.py` - Suggest includes to add and remove
To use these scripts, you'll need:

- A release of `clangd` which has "IncludeCleaner" with support for missing includes (17.0.0+)
- The full output of `//tools/clang/scripts/analyze_includes.py`; see the discussion on the mailing list for how to generate it
- A compilation database for `clangd` to use, which can be generated with `gn gen . --export-compile-commands` in the Chromium output directory
  - The generated `compile_commands.json` should be post-processed with the `post_process_compilation_db.py` script for best results
Install the Python dependencies for these scripts:

```shell
$ pip install -r ~/chromium-include-cleanup/requirements.txt
```
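To confirm that the `clangd` on your `$PATH` meets the 17.0.0+ requirement, a small check like the one below can help. It's a sketch: `clangd_is_new_enough` is a hypothetical helper name, and it assumes `clangd --version` prints a line containing `clangd version X.Y.Z` (distribution builds may add a prefix such as "Ubuntu", which the pattern tolerates). It only looks at the version number, so it can't tell whether the `clangd_patches` have been applied.

```shell
# Hypothetical helper: succeeds if the given `clangd --version` output reports
# major version 17 or newer. Tolerates prefixes like "Ubuntu clangd version".
clangd_is_new_enough() {
  local major
  # Pull the major version out of the first "clangd version N." line, if any.
  major=$(printf '%s\n' "$1" |
    sed -n 's/.*clangd version \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
  [ -n "$major" ] && [ "$major" -ge 17 ]
}

# Check the clangd found on $PATH, if there is one.
if command -v clangd > /dev/null 2>&1; then
  if clangd_is_new_enough "$(clangd --version)"; then
    echo "clangd looks new enough"
  else
    echo "clangd is older than 17.0.0, or its version couldn't be parsed" >&2
  fi
else
  echo "clangd not found on \$PATH" >&2
fi
```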
To get suggestions for includes to add, and other tweaks, `clangd` needs to be patched with the patches in `clangd_patches` and built from source.
You need to enable the `MissingIncludes` and `UnusedIncludes` diagnostics in a `clangd` config file:

```yaml
Diagnostics:
  MissingIncludes: Strict
  UnusedIncludes: Strict
```
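On Linux, one way to do this is to add the settings to clangd's user config file at `$XDG_CONFIG_HOME/clangd/config.yaml` (usually `~/.config/clangd/config.yaml`); a `.clangd` file in the project directory also works. The sketch below appends rather than overwrites, relying on clangd accepting several config fragments in one file separated by `---` lines, so anything already in the file stays there. It's written for Linux only (other platforms keep the user config elsewhere) and is meant to be run once.

```shell
# clangd's user config on Linux; XDG_CONFIG_HOME defaults to ~/.config.
config_file="${XDG_CONFIG_HOME:-$HOME/.config}/clangd/config.yaml"
mkdir -p "$(dirname "$config_file")"

# Append a new fragment instead of overwriting the file: clangd treats
# "---"-separated sections of one file as separate config fragments.
cat >> "$config_file" << 'EOF'
---
Diagnostics:
  MissingIncludes: Strict
  UnusedIncludes: Strict
EOF

echo "Updated $config_file"
```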
These instructions assume you've already built Chromium and processed the build log with `//tools/clang/scripts/analyze_includes.py`; if you haven't, see the mailing list discussion mentioned in the prerequisites above. They assume the output is at `~/include-analysis.js`, so adjust to taste.

They also assume you have `clangd` on your `$PATH`.
```shell
$ cd ~/chromium/src/out/Default
$ gn gen . --export-compile-commands
$ python3 ~/chromium-include-cleanup/post_process_compilation_db.py compile_commands.json > compile_commands-fixed.json
$ mv compile_commands-fixed.json compile_commands.json
$ cd ../../
$ python3 ~/chromium-include-cleanup/suggest_include_changes.py --compile-commands-dir=out/Default ~/include-analysis.js > ~/unused-edges.csv
$ python3 ~/chromium-include-cleanup/set_edge_weights.py ~/unused-edges.csv ~/include-analysis.js --config ~/chromium-include-cleanup/configs/chromium.json > ~/weighted-unused-edges.csv
```
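The `mv` step above replaces `compile_commands.json` unconditionally, so if post-processing fails partway through you can end up with an empty or truncated compilation database. A slightly more defensive variant, sketched here with a hypothetical `replace_if_valid_json` helper, only swaps the file in when the post-processed copy parses as JSON (checked with Python's built-in `json.tool` module). Parsing successfully doesn't prove the post-processing was correct, only that the output isn't obviously broken.

```shell
# Hypothetical helper: move NEW over DEST only if NEW parses as JSON,
# so a failed post-processing run can't clobber the original database.
replace_if_valid_json() {
  local new="$1" dest="$2"
  if python3 -m json.tool "$new" > /dev/null 2>&1; then
    mv "$new" "$dest"
  else
    echo "error: $new is not valid JSON, keeping $dest" >&2
    return 1
  fi
}
```

In `out/Default`, run the post-processing command as before, then use `replace_if_valid_json compile_commands-fixed.json compile_commands.json` in place of the `mv`.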
Another useful option is `--filename-filter=^base/`, which limits the files that are analyzed. Restricting the run to a subset of the codebase this way can speed things up considerably.
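To get a rough feel for how much a filter narrows things before committing hours to a run, you can count the tracked files in the Chromium source checkout whose paths match the same pattern. This is only an approximation: it assumes the filter is a regular expression matched against source-root-relative paths such as `base/files/file_path.h`, and `grep -E` only agrees with Python's `re` module for simple patterns like `^base/`. It also counts every tracked file rather than just the ones the script analyzes, and skips dependencies that are checked out as separate git repositories. `count_filtered_files` is a hypothetical helper name.

```shell
# Hypothetical helper: count files tracked by git in repository $1 whose
# repository-relative path matches the extended regex $2.
count_filtered_files() {
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
  # the printed count while ignoring that exit status.
  git -C "$1" ls-files | grep -cE -- "$2" || true
}
```

For example, `count_filtered_files ~/chromium/src '^base/'` gives a ballpark figure to compare against the whole tree.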
Edge weights are set in a separate script to allow quick iteration, since `suggest_include_changes.py` takes many hours to run. The default metric for edge weights is the "Added Size" metric from the include analysis output. This means new weights can easily be applied to the output of `suggest_include_changes.py` by downloading the latest hosted include analysis output at https://commondatastorage.googleapis.com/chromium-browser-clang/include-analysis.js, but your mileage may vary, since you're combining output from your local build with output from the hosted build.
For a full codebase run, the `suggest_include_changes.py` script takes 7 hours on Ubuntu on a 4 core, 8 thread machine. `clangd` is highly parallel, though, and the script is configured to use all available logical CPUs, so it will scale well on beefier machines.
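As a back-of-the-envelope sketch, you can estimate a full run on your own machine by assuming runtime scales linearly with logical CPUs from the 7 hour, 8 thread figure above. Real scaling won't be perfectly linear, and disk speed and other load on the machine matter too, so treat the result as a ballpark only. `nproc` reports the number of processing units available to the current process; `estimate_full_run_hours` is a hypothetical helper.

```shell
# Hypothetical helper: rough full-run estimate in hours for $1 logical CPUs,
# assuming linear scaling from 7 hours on an 8 thread machine.
estimate_full_run_hours() {
  awk -v cpus="$1" 'BEGIN { printf "%.1f\n", 7 * 8 / cpus }'
}

# Estimate for this machine.
estimate_full_run_hours "$(nproc)"
```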
Currently the `suggest_include_changes.py` script has problems suggesting includes to remove when the filename in the `#include` line does not match the filename in the include analysis output. This can happen for includes inside third-party code that includes headers relative to itself rather than relative to the source root.
When suggesting includes to add, `clangd` will sometimes suggest headers which are internal to the standard library, like `<__hash_table>`, rather than the public header. Unfortunately these cases can't be disambiguated by this script, since there's not enough information to work from.
These scripts rely on `clangd`, and specifically its "IncludeCleaner" feature, to determine which includes are unused and which headers need to be added. With the Chromium codebase, there are many places where `clangd` will return false positives, reporting an include as unused when it actually is used. As such, the output is more of a guide than something which can be used as-is in an automated situation.
Known situations in Chromium where `clangd` will produce false positives:

- When an include is only used for a `friend class` declaration
- When the code using an include is inside an `#ifdef` not used on the system which built the codebase
- Macros in general are often a struggle point
- Umbrella headers
- Certain forward declarations seem to be flagged incorrectly as the canonical location for a symbol, such as `base/callback_forward.h`
- Forward declarations in the file being analyzed
  - `clangd` won't consider an include unused even if forward declarations exist which make it unnecessary
  - `clangd` will still suggest an include even if a forward declaration makes it unnecessary