Adding a timeout for your CI jobs at ci.centos.org

By | October 26, 2016

The typical workflow for most ci.centos.org ( cico ) jobs is :

* Call Duffy's API endpoint with node/get and grab some machines
* Setup the machines environment for the ci job to come
* Push content to nodes
* Run the tests
* Clear out / tear down
* Call Duffy's API end point with node/done to return the machines
* Report status via Jenkins

Machines handed out in this manner to the CI Jobs are available for upto 6 hours at a time, at which point they are reaped back into the available pool for other jobs to consume. What this also means is that if for any reason, the job gets stuck, it could be upto six hours before the developer/user gets any feedback about the tests failing.

The usual way to resolve this situation is to setup a timeout in the jenkins job. That would allow Jenkins to watch for the run, on timeout, kill the job and report failure. However, if your job is setup with a single build step that also includes requesting the machines and returning them when done, Jenkins killing the job will mean your machines wont get returned for upto 6 hrs. Given that most projects are setup with a quota of 10 deployed machines; not returning them when done, would mean your jobs get put into a queue that isnt clearing out in a rush.

One way to work around this would be to split the machine request and machine return functions into a pre-build and post-build step, and then pass over the session-id for the deployed machines via the build steps. That way, you could trap and report specific conditions. A varioation of this would be to setup conditional build steps, and have them execute different functions as needed.

An easy and simple workaround however, is to just wrap the test commands in a /usr/bin/timeout call. timeout is delivered as a binary from the coreutils package on CentOS Linux 7 and would be available on all machines, including the jenkins worker instances. Take a look at https://github.com/almighty/almighty-jobs/blob/master/devtools-ci-index.yaml#L64 for a quick example of how this would work in a JJB template. This way we can timeout on the job, and yet be able to return nodes or handle any other content we need, in the same ci job script. A script that then does not have or need any Jenkins specific content, making it possible to run from developer laptops or as child jobs on its own.

/usr/bin/timeout ( man 1 timeout ) also allows you to preserve the sub commands exit status, if you need to track and report different status from your ci jobs. And ofcourse, there are many other uses for /usr/bin/timeout as well!

– KB

Leave a Reply

Your email address will not be published. Required fields are marked *