Release Notes - Hadoop Chukwa - Version 0.5

Overall Status

This is the third public release of Chukwa, a log analysis framework on top of Hadoop and HBase. Chukwa has been tested at scale and used in some production settings, and is reasonably robust and well behaved. For instructions on setting up Chukwa, see the administration guide and the rest of the Chukwa documentation.

The collection components of Chukwa (adaptors, agents, and collectors) have been tested fairly aggressively and can be counted on to perform properly and recover from failures.

The demux pipeline has been cleaned up somewhat, and is now documented. See the programming guide for a discussion of how to customize demux for your purposes.

HICC, the visualization component, is still "beta" quality. It has been used successfully at multiple sites, but work is ongoing.

Important Changes Since Last Release

  • Chukwa can store data on HBase for improved random read/write performance.
  • A new SocketAdaptor supports streaming Log4JSocketAppender traffic.
  • There have been a number of bug fixes and code cleanups since the last release; check the changelog and JIRA for details.


Chukwa relies on Java 1.6 and requires Maven 3.0.3 to build. The back-end processing requires Hadoop, HBase 0.90.4+, and Pig 0.9.1+.

Known Limitations

  • HICC assumes by default that data is in UTC; if your machines run on local time, HICC graphs will not display properly until you change the HICC timezone. You can do this by clicking the small "gear" icon on the time selection tool.
  • As mentioned in the administration guide, the Pig aggregation script must be scheduled externally, for example with cron or Jenkins.
  • Salsa Finite State Machine has not been ported forward to store data on HBase.
  • There is currently no downsampling script for data stored on HBase. Long-term trend visualization might not work for large-scale data.
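
As a rough illustration of the external scheduling mentioned in the limitations above, a crontab entry along the following lines could run the Pig aggregation periodically. This is only a sketch: the installation path, the script name aggregation.pig, the log location, and the five-minute interval are all assumptions made for illustration and are not taken from the Chukwa distribution. Consult the administration guide for the actual script and any required classpath settings.

```
# Hypothetical crontab entry: run a Pig aggregation script every 5 minutes.
# /opt/chukwa, aggregation.pig, and the log path are placeholders, not paths
# shipped with Chukwa; substitute the values from the administration guide.
# cron runs with a minimal PATH, so use an absolute path to pig if needed.
*/5 * * * * pig -f /opt/chukwa/script/pig/aggregation.pig >> /var/log/chukwa/pig-aggregation.log 2>&1
```

Appending both standard output and standard error to a log file makes failed runs easier to diagnose, since cron itself provides no retry or monitoring.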