(1) Download VirtualBox and the Cloudera VM
VirtualBox: https://www.virtualbox.org/
Cloudera VM: https://link.zhihu.com/?target=https%3A//downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.4.2-0-virtualbox.zip
- username: cloudera
- password: cloudera
(2) Import the VM into VirtualBox
Select File -> Import Appliance and choose cloudera-quickstart-vm-5.4.2-0-virtualbox/cloudera-quickstart-vm-5.4.2-0-virtualbox.ovf
(3) Commands
- Command to upload a file to HDFS:
- Command to view a file on HDFS:
- Command to list the commands Hadoop supports, and its output:
- Command to delete a file in HDFS:
- Command to run the WordCount job, and its output:
- Command to export a file from HDFS to the local file system:
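The steps listed above can be sketched with the standard `hadoop fs` subcommands. This is only a sketch under assumptions: the file name, the HDFS directory, and the location of the examples jar are hypothetical and may differ on your VM. The delete step is placed last so that the WordCount job still has its input. The small `run` helper prints each command and executes it only when the `hadoop` CLI is installed.

```shell
# Sketch only: every name and path below is an assumption, not taken from the post.
LOCAL_FILE="input.txt"                 # hypothetical local text file
HDFS_DIR="/user/cloudera/wordcount"    # hypothetical HDFS working directory
# Assumed location of the MapReduce examples jar on the QuickStart VM.
EXAMPLES_JAR="/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"

# Print a command, then execute it only if the hadoop CLI is on PATH.
run() {
  echo "+ $*"
  if command -v hadoop >/dev/null 2>&1; then
    "$@"
  fi
}

run hadoop fs -mkdir -p "$HDFS_DIR/input"
run hadoop fs -put "$LOCAL_FILE" "$HDFS_DIR/input/"      # upload a file to HDFS
run hadoop fs -ls "$HDFS_DIR/input"                      # list files on HDFS
run hadoop fs -cat "$HDFS_DIR/input/$LOCAL_FILE"         # view file content
run hadoop fs -help                                      # list supported commands
# Run the bundled WordCount example; the output directory must not exist yet.
run hadoop jar "$EXAMPLES_JAR" wordcount "$HDFS_DIR/input" "$HDFS_DIR/output"
# Export the reducer output from HDFS to the local file system.
run hadoop fs -get "$HDFS_DIR/output/part-r-00000" ./wordcount_result.txt
run hadoop fs -rm "$HDFS_DIR/input/$LOCAL_FILE"          # delete the file in HDFS
```

On a machine without Hadoop the script only prints the commands, which makes it easy to review them before running them inside the VM.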
(4) Wordcount result
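For reference, the WordCount example writes one line per distinct word in the form word<TAB>count, sorted by word when a single reducer is used. A minimal local sketch of the same computation with standard Unix tools follows; the function name `wordcount_local` and the two-line input are hypothetical and are not the data used in this post.

```shell
# Count whitespace-separated words read from stdin; print word<TAB>count.
wordcount_local() {
  tr -s '[:space:]' '\n' |        # put one token on each line
    sed '/^$/d' |                 # drop empty lines left by leading whitespace
    LC_ALL=C sort |               # group identical words, byte order
    uniq -c |                     # count each group
    awk '{ print $2 "\t" $1 }'    # reorder to word<TAB>count
}

# Hypothetical input, for illustration only.
printf 'hello hadoop\nhello hdfs\n' | wordcount_local
```

For this input the sketch prints hadoop 1, hdfs 1, and hello 2, one pair per line.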
Other idea:
- How does performance compare between Hadoop and Lucene?
Overall, Lucene performs better, but it needs more preprocessing to build the inverted index.
Inverted index: for a given keyword, the inverted index lists the files that contain it.
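To make the idea concrete, here is a small sketch of an inverted index built with awk. It is not how Lucene implements its index; the function name `build_index` and the file names in the usage note are hypothetical.

```shell
# Build a keyword -> files mapping from the text files given as arguments.
# Prints one line per keyword: keyword<TAB>space-separated list of files.
build_index() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        key = $i SUBSEP FILENAME
        if (!(key in seen)) {            # record each (word, file) pair once
          seen[key] = 1
          if ($i in postings) {
            postings[$i] = postings[$i] " " FILENAME
          } else {
            postings[$i] = FILENAME
          }
        }
      }
    }
    END {
      for (w in postings) print w "\t" postings[w]
    }
  ' "$@" | LC_ALL=C sort
}
```

For example, if a.txt contains "hadoop hdfs" and b.txt contains "hadoop lucene", then `build_index a.txt b.txt` lists hadoop under both files, hdfs under a.txt only, and lucene under b.txt only. Looking up a keyword is then a single match on the first column instead of a scan over every file, which is the preprocessing cost paid up front.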
Reference:
[1] Search engine project of UCR CS242: Information Retrieval
https://github.com/IHSIENHUANG/CS242-Information-Retrieval-Web-Search/blob/master/CS242%20Project%20Report%20Part%20B.pdf
[2] Cloudera VM
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quickstart_vm_administrative_information.html
[3] Hadoop on Mac (local setup)
https://macmetric.com/how-to-install-hadoop-on-mac/

