Purpose
Apache Chukwa is a system for large-scale, reliable log collection and processing with Hadoop. The Apache Chukwa design overview discusses the overall architecture of Apache Chukwa; you should read that document before this one. The purpose of this document is to help you install and configure Apache Chukwa.
Pre-requisites
Apache Chukwa should work on any POSIX platform, but GNU/Linux is the only production platform that has been tested extensively. Apache Chukwa has also been used successfully on Mac OS X, which several members of the Apache Chukwa team use for development.
Software requirements are Java 1.6 or later, ZooKeeper 3.4.5, HBase 1.2.0, and Hadoop 2.7.2.
Apache Chukwa cluster management scripts rely on ssh; these scripts, however, are not required if you have some alternate mechanism for starting and stopping daemons.
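As a quick check of the prerequisites above, the following sketch reports which commands are missing from PATH. The function name missing_commands is hypothetical and not part of Apache Chukwa; the check only confirms that a command exists, not that its version matches the requirements.

```shell
# Print the name of every given command that is not found on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example use: java is required; ssh is needed only by the
# cluster management scripts.
# missing_commands java ssh
```

An empty result means every listed command was found. Version checks (for example, java -version) still have to be done by hand.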
Installing Apache Chukwa
A minimal Apache Chukwa deployment has five components:
- An Apache Hadoop and Apache HBase cluster on which Apache Chukwa will process data (referred to as the Chukwa cluster).
- One or more agent processes that send monitoring data to Apache HBase. The nodes with active agent processes are referred to as the monitored source nodes.
- A Solr Cloud cluster in which Apache Chukwa will store indexed log files.
- A data analytics script that summarizes Hadoop cluster health.
- HICC, the Apache Chukwa visualization tool.
First Steps
- Obtain a copy of Apache Chukwa. You can find the latest release on the Apache Chukwa release page (or alternatively check the source code out from SCM).
- Un-tar the release, via tar xzf.
- Make sure a copy of Apache Chukwa is available on each node being monitored.
- We refer to the directory containing Apache Chukwa as CHUKWA_HOME. It may be useful to set CHUKWA_HOME explicitly in your environment for ease of use.
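The steps above can be followed by a small sanity check on CHUKWA_HOME. This is only a sketch: the function name check_chukwa_home and the /opt/chukwa path are placeholders, and the check looks at just two paths that this guide refers to later (etc/chukwa and sbin/chukwa-daemon.sh).

```shell
# Return 0 if the directory looks like an unpacked Chukwa release,
# judging only by two paths used later in this guide.
check_chukwa_home() {
    dir=$1
    [ -d "$dir/etc/chukwa" ] && [ -f "$dir/sbin/chukwa-daemon.sh" ]
}

# Example use (the /opt/chukwa path is a placeholder):
# export CHUKWA_HOME=/opt/chukwa
# check_chukwa_home "$CHUKWA_HOME" || echo "CHUKWA_HOME looks wrong" >&2
```

Running the same check on each monitored node is one way to confirm that a copy of Apache Chukwa is available there.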
Setting Up Apache Chukwa Cluster
Configure Hadoop and HBase
- Copy Apache Chukwa files to Hadoop and HBase directories:
cp $CHUKWA_HOME/etc/chukwa/hadoop-log4j.properties $HADOOP_CONF_DIR/log4j.properties
cp $CHUKWA_HOME/etc/chukwa/hadoop-metrics2.properties $HADOOP_CONF_DIR/hadoop-metrics2.properties
cp $CHUKWA_HOME/share/chukwa/chukwa-0.8.0-client.jar $HADOOP_HOME/share/hadoop/common/lib
cp $CHUKWA_HOME/share/chukwa/lib/json-simple-1.1.jar $HADOOP_HOME/share/hadoop/common/lib
cp $CHUKWA_HOME/etc/chukwa/hbase-log4j.properties $HBASE_CONF_DIR/log4j.properties
cp $CHUKWA_HOME/etc/chukwa/hadoop-metrics2-hbase.properties $HBASE_CONF_DIR/hadoop-metrics2-hbase.properties
cp $CHUKWA_HOME/share/chukwa/chukwa-0.8.0-client.jar $HBASE_HOME/lib
cp $CHUKWA_HOME/share/chukwa/lib/json-simple-1.1.jar $HBASE_HOME/lib
- Restart your Hadoop cluster. General Hadoop configuration is available at: Hadoop Configuration. N.B. You may see some additional log messages at this stage that look like errors. They appear because the log4j socket appender writes warning messages to stderr when it is unable to stream logs to a log4j socket server. If the Apache Chukwa agent is started with socket adaptors before Hadoop and HBase, these messages will not show up. For the time being, do not worry about them; they will disappear once the Apache Chukwa agent is started with socket adaptors.
- Make sure HBase is started. General HBase configuration is available at: HBase Configuration
- After Hadoop and HBase are started, run the following from the HBase installation directory:
bin/hbase shell < $CHUKWA_HOME/etc/chukwa/hbase.schema
This procedure initializes the default Apache Chukwa HBase schema.
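The copy commands in this section depend on five environment variables: CHUKWA_HOME, HADOOP_CONF_DIR, HADOOP_HOME, HBASE_CONF_DIR, and HBASE_HOME. If one of them is empty, cp would silently target a path such as /lib. The sketch below lists any that are unset before you copy; the helper name unset_vars is hypothetical and not part of Apache Chukwa.

```shell
# Print the name of every listed variable that is unset or empty.
# eval is used for indirect lookup because POSIX sh has no ${!name};
# pass only plain variable names, never untrusted input.
unset_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Example use: stop before copying if anything is missing.
# missing=$(unset_vars CHUKWA_HOME HADOOP_CONF_DIR HADOOP_HOME \
#                      HBASE_CONF_DIR HBASE_HOME)
# [ -z "$missing" ] || { echo "Unset: $missing" >&2; exit 1; }
```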
Configuring And Starting Apache Chukwa Agent
- Edit CHUKWA_HOME/etc/chukwa/chukwa-env.sh. Make sure that JAVA_HOME, HADOOP_CONF_DIR, and HBASE_CONF_DIR are set correctly.
- Edit CHUKWA_HOME/etc/chukwa/chukwa-agent-conf.xml. Make sure that solr.cloud.address is set correctly.
- In CHUKWA_HOME, run:
sbin/chukwa-daemon.sh start agent
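If agents run on several monitored nodes and you are not using the bundled cluster management scripts, the start command can be repeated over ssh. The following is a minimal sketch under stated assumptions: the function name agent_start_commands, the hosts file format (one hostname per line), and an identical CHUKWA_HOME path on every node are all hypothetical. It only prints the commands so they can be reviewed first.

```shell
# Print one ssh command per host in a file. Blank lines and lines
# starting with '#' are skipped. Nothing is executed here.
# Assumes hostnames and the Chukwa path contain no spaces or quotes.
agent_start_commands() {
    hosts_file=$1
    chukwa_home=$2
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "ssh $host 'cd $chukwa_home && sbin/chukwa-daemon.sh start agent'"
    done < "$hosts_file"
}

# Example use (file name and path are placeholders):
# agent_start_commands agents.txt /opt/chukwa
```

Once the printed commands look right, piping the output to sh runs them one host at a time.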
Set Up Solr to Index Service Log Files
- Start Solr 5.5.0 with Apache Chukwa Solr configuration:
bin/solr start -cloud -z localhost:2181
./bin/solr create_collection -c chukwa -n chukwa