Writing the WordCount Example
Requirement: given a set of text files, count and output the total number of occurrences of each word.
Prepare the sample data as follows:
hadoop,hive,hbase
hive,storm
hive,hbase,kafka
spark,flume,kafka,storm
hbase,hadoop,hbase
hive,spark,storm
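For reference, assuming each comma-separated token counts as one word, the job's final output for this data would look like the following (TextOutputFormat writes one tab-separated word/count pair per line, sorted by key):

flume	1
hadoop	2
hbase	4
hive	4
kafka	2
spark	2
storm	3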
Define the main method
JobMain.java

package com.qst.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobMain extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Create the job and bind it to the jar that contains this class
        Job job = Job.getInstance(super.getConf(), "wordcount");
        job.setJarByClass(JobMain.class);

        // Step 1: read the input files line by line
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("file:///F:\\input"));

        // Step 2: map each word to a <word, 1> pair
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Step 3: reduce by summing the counts for each word
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Step 4: write the <word, count> pairs as text
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("file:///F:\\output"));

        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // ToolRunner parses generic Hadoop options, then calls run()
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}
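The job above wires in WordCountMapper and WordCountReducer, which this section does not define. A minimal sketch of what they could look like, assuming each input line is a comma-separated list of words (the package and class names simply follow the job configuration above; the actual course implementations may differ):

WordCountMapper.java

package com.qst.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: assumes each line holds comma-separated words
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String s : value.toString().split(",")) {
            word.set(s);
            context.write(word, ONE);  // emit <word, 1> per occurrence
        }
    }
}

WordCountReducer.java

package com.qst.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: sums the 1s emitted by the mapper for each word
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);  // final <word, total count>
    }
}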
Reminder: once the job runs successfully in local mode, you can package the code into a jar and run it on the server. In real-world work, the code is always packaged as a jar, with a main method developed as the program entry point, and then submitted to the cluster to run.
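For example, after changing the hard-coded file:/// paths above to HDFS paths (or passing paths in via args), a cluster submission could look like the following; the jar name here is only illustrative:

hadoop jar wordcount.jar com.qst.wordcount.JobMain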