What was the issue:
There was an invalid rpm ( eclipse-ecj ) in the x86_64/os tree. Premature EOF in rpm payload.
Report:
Early morning on the 7th James Hogarth, BSkyB Entertainment reported to the QA team that there was an issue with the eclipse-ecj package shipped in the CentOS-5.6/x86_64 os tree. Stephen Walsh and Manuel Wolfshant tracked the issue back to the primary seed machine and confirmed the issue was present in not only the os tree, but also on CD3 and DVD1 of the x86_64 distribution.
Reason for this issue:
The CentOS distribution is composed on the buildservices, but then transferred over to the distro-build machines where the installer is built. There are automated tests that run at both of these locations. The rpm content tests ( rpm -K ) for md5's as well as gpg key. The distro had passed this test at both locations. The output from the distro-build machines is the actual package tree, the installer code, isos for cd's and dvd. This is transferred using rsync to the staging machine from where we start the release process ( initially to QAMachines when in qa mode; or into the mirror.centos.org network when in release mode ). There are no tests done at the staging machine. Packages are transferred one at a time, rather than as a whole tree ( mostly for legacy reasons ). It seems that the transfer for eclipse-ecj did not complete ( driven by the fact that its OK on one side and not on the other ).
Impact:
We had to rebuild the torrents, DVD isos, CD isos and update the CentOS-5.6/x86_64 distribution. There was no impact to CentOS-5.6/i386.
What did we do to fix things:
To address this issue, we had to issue a new set of ISOs, and since the package content was changing, rebuild metadata. Which in turn needed a complete rebuild of the ISOS ( but not the install tree ). Over the course of the morning, Fabian and Manuel were able to test the new tree, and our automated tests ran through for the ISOs
There was also a lot of rollback work that needed to be done, including handling the torrent tracker, issuing new torrents, making sure the mirror network etc. Much of which is done manually; and the main reason things took almost 18 hrs to resolve.
Steps taken to ensure this problem does not happen again:
I've now moved a large number of tests to the staging machine as well, including the rpm tests. This adds an additional 3 hours to the process, but its a worthwhile safeguard.
I hope this helps clear up things. Also, md5sum and sha1sum for ISOs as well as torrent files are published along with the torrent files and available on all mirrors. They will also be mentioned in the actual CentOS-5.6 Release Announcement. Everyone should check to make sure they get the right ones.
If you get the new .torrent files and drop in into the same place as the older ones did, you should see most of your data be reused ( 30% on the DVD and 86% on the CD's ).