Friday, April 20, 2018

Hadoop on Mac (on VirtualBox)



(1) Download VirtualBox and the Cloudera QuickStart VM
     VirtualBox: https://www.virtualbox.org/
     Cloudera VM: https://link.zhihu.com/?target=https%3A//downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.4.2-0-virtualbox.zip
  • username: cloudera
  • password: cloudera
(2) Import
     In VirtualBox, choose File -> Import Appliance and select cloudera-quickstart-vm-5.4.2-0-virtualbox/cloudera-quickstart-vm-5.4.2-0-virtualbox.ovf
(3) Commands
- Command to upload a file to HDFS:

- Command to view a file on HDFS:

- Command to list the commands Hadoop supports, and its output:

- Command to delete a file in HDFS:

- Command to run the WordCount job, and its output:

- Command to export a file from HDFS to the local file system:

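As a rough sketch of the steps in (3), the snippet below builds one plausible `hadoop fs` / `hadoop jar` invocation per step and prints it as a shell command. The file names, the `/user/cloudera` paths, the `part-r-00000` output name, and the location of the examples jar are all assumptions for illustration (check them on your own VM); they are not necessarily the exact commands used in this post.

```python
import shlex
import subprocess

# Assumed names for illustration only; adjust to your own files and paths.
LOCAL_FILE = "words.txt"
HDFS_DIR = "/user/cloudera"
HDFS_FILE = HDFS_DIR + "/words.txt"
HDFS_OUT = HDFS_DIR + "/wordcount_out"
# Assumed location of the MapReduce examples jar on the VM; verify before use.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"

# One argument list per step. "delete" is listed last because it removes
# the uploaded input file that the WordCount step reads.
COMMANDS = {
    "upload": ["hadoop", "fs", "-put", LOCAL_FILE, HDFS_DIR],
    "list": ["hadoop", "fs", "-ls", HDFS_DIR],
    "view": ["hadoop", "fs", "-cat", HDFS_FILE],
    "help": ["hadoop", "fs", "-help"],
    "wordcount": ["hadoop", "jar", EXAMPLES_JAR, "wordcount", HDFS_FILE, HDFS_OUT],
    "export": ["hadoop", "fs", "-get", HDFS_OUT + "/part-r-00000", "wordcount_result.txt"],
    "delete": ["hadoop", "fs", "-rm", HDFS_FILE],
}


def as_shell(step):
    """Return the command for a step as a single shell-quoted string."""
    return " ".join(shlex.quote(arg) for arg in COMMANDS[step])


def run(step):
    """Execute a step; only works inside the VM, where `hadoop` is on PATH."""
    return subprocess.run(COMMANDS[step], check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for step in COMMANDS:
        print(as_shell(step))
```

Printing the commands first makes it easy to copy them into the VM terminal one at a time; calling `run("upload")` and so on only makes sense inside the VM.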
(4) Wordcount result
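
To give a feel for what a WordCount job produces, here is a small local sketch of the same map and reduce idea in plain Python. It assumes whitespace tokenization and case-sensitive words, and the sample lines are made up; it is not the Hadoop implementation and not the actual result from this run.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def reduce_phase(pairs):
    """Sum the counts per word and return them sorted by word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(sorted(counts.items()))


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(pairs)


if __name__ == "__main__":
    sample = ["hello hadoop", "hello hdfs"]  # made-up input
    for word, n in word_count(sample).items():
        # One tab-separated "word<TAB>count" line per distinct word.
        print(f"{word}\t{n}")
```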

Other ideas:
- How does performance compare between Hadoop and Lucene?
Overall, Lucene performs better, but it needs more pre-processing to build the inverted index.
     Inverted index: for a given keyword, the inverted index shows which files contain it.
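
A minimal sketch of an inverted index, assuming lowercase whitespace tokenization and a made-up pair of documents. The function names are hypothetical, and this only illustrates the idea; it is not how Lucene implements it.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def lookup(index, keyword):
    """Return the sorted names of the documents that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


if __name__ == "__main__":
    # Made-up documents for illustration.
    docs = {
        "a.txt": "Hadoop stores files",
        "b.txt": "Lucene indexes files",
    }
    index = build_inverted_index(docs)
    print(lookup(index, "files"))
    print(lookup(index, "hadoop"))
```

The pre-processing cost mentioned above is the `build_inverted_index` pass over every document; once it is built, each lookup is a single dictionary access.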

References:
[1] Search engine project for UCR CS242: Information Retrieval
https://github.com/IHSIENHUANG/CS242-Information-Retrieval-Web-Search/blob/master/CS242%20Project%20Report%20Part%20B.pdf
[2] Cloudera VM
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quickstart_vm_administrative_information.html
[3] Hadoop on Mac (local setup)
https://macmetric.com/how-to-install-hadoop-on-mac/
