Quite some time ago I wrote about linting PHP files in the shell / build.
Even though parallel linting is fine, as the code-base grows larger and larger, the build becomes slower and slower. A slow build sucks.
Git to the Rescue
One easy way to speed things up again is to reduce the number of files. So instead of linting all PHP files in the directories, linting only the files that recently changed is a power-up.
For example, when working with topic branches (e.g. bug-fixes or features) and a main branch into which changes are normally merged (e.g. `develop` or `master`), the list of files to lint can be generated by listing all file-names changed (modified, added, deleted, …) between `HEAD` and the target branch (e.g. `develop` or `master`).

Let's consider working on a topic-branch that is currently checked out (`HEAD`) while the target branch is `develop`.
```shell
git log --no-decorate --pretty=format: --abbrev-commit \
    --no-merges --first-parent --name-only develop..
```
The output is not yet totally useful. First of all it contains empty lines, but also deleted files. And it contains all changed files: when the change-log was edited, it will be listed next to the PHP files that are of interest for linting. And as the range can easily span multiple commits, files can appear more than once.
The answer to that is a set of filters in the shell.
Streams all the Way Down
The first filter reduces the list by file extension. Here is a filter with `sed` that lets only `.php` and `.phtml` files pass:

```shell
sed -n '/\.ph\(p\|tml\)$/p'
```
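To see the filter in action, here is a quick check against a small file list (the file names are made up for illustration):

```shell
# only the .php and .phtml entries survive the filter
printf '%s\n' CHANGELOG.md index.php tpl/home.phtml lint.sh \
    | sed -n '/\.ph\(p\|tml\)$/p'
# → index.php
# → tpl/home.phtml
```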
Next is to remove the duplicates (if any); `sort` is suitable here:

```shell
sort -u
```
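A quick illustration of the deduplication, again with made-up file names; `sort -u` also sorts the list as a side effect, which does not matter for linting:

```shell
printf '%s\n' src/b.php src/a.php src/b.php | sort -u
# → src/a.php
# → src/b.php
```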
Last but not least, only existing files (not the deleted ones) must be passed to PHP for linting, as otherwise it would rightfully error out.
I had to crack at this one a bit, as in my mindset there is a lot of `find` when it is about finding files, but this time there is no find. What I came up with is `ls`, as it produces no output (just errors) when a file (or directory) does not exist. With `xargs` it is easy to provide multiple arguments to `ls` at once, so there is not so much to spawn:

```shell
xargs ls -f1 -- 2>/dev/null
```
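A small sketch of the behaviour in an empty scratch directory (file names made up for illustration): one file is created, the other stands in for a deleted one and silently falls out of the list:

```shell
cd "$(mktemp -d)"        # scratch directory for the demo
touch exists.php         # this file exists, deleted.php does not

printf '%s\n' exists.php deleted.php \
    | xargs ls -f1 -- 2>/dev/null
# → exists.php
```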
This ensures that only existing files are listed, one per line, and the error output goes to `/dev/null`. So the full filter first removes any files of an unfitting type (`sed`), then any duplicates (`sort`) and finally the non-existing files (`xargs`, `ls`):

```shell
sed -n '/\.ph\(p\|tml\)$/p' \
    | sort -u | xargs ls -f1 -- 2>/dev/null
```
This then only needs to be wired up to the existing parallel lint command, which uses `xargs` for the parallelism, `php` for the linting and `grep` to detect any error (see as well the older blog post).
But before taking a look at the command as a whole (as this is not so much about having it run far away on Travis CI like last time), let's consider that not all changes are staged or committed yet. So next to the files changed since branching off `develop`, there are also all the files currently being worked on:

```shell
git diff --cached --name-only
git diff --name-only
```
To wrap this all up, a shell compound-list in curly brackets is handy to execute it all in-process:
```shell
{
    git log --no-decorate --pretty=format: --abbrev-commit \
        --no-merges --first-parent --name-only develop..
    git diff --cached --name-only
    git diff --name-only
} \
    | sed -n '/\.ph\(p\|tml\)$/p' \
    | sort -u | xargs ls -f1 -- 2>/dev/null \
    | 2>/dev/null xargs -n1 -P8 php -n -d short_open_tag=0 -l \
    | grep -v '^No syntax errors detected'
```
Exit status `1` means success, any other status a lint failure (this requires `set +e` as in the earlier post).
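Turning that exit status into a build pass/fail can look like the following sketch. The `printf` line stands in for the lint pipeline above and is made up for illustration; `grep -v` exits with `1` exactly when it filtered every line away:

```shell
set +e
printf 'No syntax errors detected in a.php\n' \
    | grep -v '^No syntax errors detected'   # stand-in for the lint pipeline
status=$?
set -e

if [ "$status" -eq 1 ]; then
    echo "lint OK"        # grep removed every line: no syntax errors
else
    echo "lint FAILED" >&2
    exit 1
fi
```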
Regardless of how large the overall code-base is, this keeps the linting fast. Especially for the local build.