Azure HDInsight Kafka best practice

This practice uses HDInsight Kafka to capture streaming data and save it into Azure DW. Even though Kafka has been added to Azure HDInsight, there are still not many resources to refer to unless you already have good prior knowledge of Apache Kafka.

The practice requires two components that are not installed by default on Windows: the Java JDK 7 or 8 (which can be downloaded from the Oracle website) and Apache Maven (from the Apache Maven project, which can be downloaded here: http://maven.apache.org/).

  1. In Windows OS -> Environment Variables, add a JAVA_HOME system variable pointing to the JDK installation folder, for example: c:\Program Files\Java\jdk1.8...
  2. In Environment Variables, add the JAVA_HOME bin folder to the Path variable, for example: %JAVA_HOME%\bin;...
  3. In Environment Variables, add the Maven bin folder to the Path as well, for example: c:\maven\bin;...
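    To confirm that both tools are on the Path, open a new command prompt and check the reported versions:
    • java -version
    • mvn -version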
  4. Download the sample project from here: https://github.com/Azure-Samples/hdinsight-kafka-java-get-started. This sample shows how to use Kafka with a Java producer and consumer (a minimal producer sketch in Java is also included after these steps).
  5. At a Windows command prompt, build the sample: 
    • cd producer-Consumer
    • mvn clean package
  6. In Azure, create a VNET. This VNET will be used to host the Kafka cluster and possibly other clusters later (Storm or Spark in the future).
  7. In Azure, create a Storage Account. This storage account will be used to store the Kafka stream data. (Downloading and installing Microsoft Azure Storage Explorer is a big help for accessing the Storage Account.)
  8. In Azure, create the Kafka HDInsight cluster, specifying the storage account from step 7 and the VNET from step 6.
  9. In Azure, open Cloud Shell (use Bash, since the sample is built on Apache Kafka and Java).
  10. Copy the compiled jar file to the Kafka cluster. (I couldn't reproduce the command line mentioned in the sample, but WinSCP can be downloaded instead, and it is much easier than the command line.) Log in with WinSCP as sshuser@kafkaclustername-ssh.azurehdinsight.net and copy the jar file from the project's target folder to the sshuser home folder on the cluster.
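    (For reference, an scp command along these lines should also work from the project folder; the jar file name below is an assumption, so check what the build actually produced in the target folder:)
    scp ./target/kafka-producer-consumer-1.0-SNAPSHOT.jar sshuser@kafkaclustername-ssh.azurehdinsight.net:kafka-producer-consumer.jar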
  11. In Azure Cloud Shell, use SSH to connect to the cluster: ssh sshuser@kafkaclustername-ssh.azurehdinsight.net
    1. sudo apt -y install jq (use lower case). 
    2. Run step 7 from the project webpage (those two statements have been verified).
    3. Run step 8 from the project webpage (those two statements have been verified) to verify the Kafka ZooKeeper hosts (KAFKAZKHOSTS) and Kafka brokers (KAFKABROKERS).
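    For reference, the commands behind those two steps look roughly like the sketch below. It is reconstructed from memory of the sample, so the admin user name, password handling, and ports are assumptions; verify against the project webpage before running:
    read -p "Kafka cluster name: " CLUSTERNAME
    read -s -p "Cluster admin password: " PASSWORD
    export KAFKAZKHOSTS=$(curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2)
    export KAFKABROKERS=$(curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2)
    echo "$KAFKAZKHOSTS"
    echo "$KAFKABROKERS"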
  12. Create the 'test' topic: 
    /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
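
Once the topic exists, the compiled sample can be run from the SSH session. The jar name and the producer/consumer arguments below are assumptions based on the sample project, so adjust them to whatever the build actually produced:

    java -jar kafka-producer-consumer.jar producer test $KAFKABROKERS
    java -jar kafka-producer-consumer.jar consumer test $KAFKABROKERS

For anyone who wants to see what the sample does under the hood, here is a minimal Java producer sketch against the standard kafka-clients API. It is not the sample's exact code (the broker list and topic name are placeholders), but it shows the basic pattern of writing records to the 'test' topic:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            // Broker list, e.g. the value of $KAFKABROKERS (placeholder below)
            String brokers = "wn0-kafka:9092,wn1-kafka:9092";
            String topicName = "test";

            // Minimal producer configuration: brokers plus key/value serializers
            Properties props = new Properties();
            props.put("bootstrap.servers", brokers);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Send a few test messages, keyed by the message number
                for (int i = 0; i < 10; i++) {
                    producer.send(new ProducerRecord<>(topicName, Integer.toString(i), "test message " + i));
                }
                // Make sure buffered records are delivered before the program exits
                producer.flush();
            }
        }
    }

A consumer follows the same pattern with KafkaConsumer, a group.id, and the matching key/value deserializers.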
