- MapReduce programming examples

5. Multi-table join

A multi-table join is similar to a single-table join: it also processes the raw data to extract the relationship we care about. Let's go through the example.

5.1 Example description

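The input consists of two files. One is the factory table, with a factory-name column and an address-ID column; the other is the address table, with an address-ID column and an address-name column. The task is to find, from the input data, which address each factory belongs to and output a table that pairs each factory name with its address name.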

Sample input:

1) factory:

    factoryname                  addressID
    Beijing Red Star             1
    Shenzhen Thunder             3
    Guangzhou Honda              2
    Beijing Rising               1
    Guangzhou Development Bank   2
    Tencent                      3
    Back of Beijing              1

2) address:

    addressID   addressname
    1           Beijing
    2           Guangzhou
    3           Shenzhen
    4           Xian

Sample output:

    factoryname                  addressname
    Back of Beijing              Beijing
    Beijing Red Star             Beijing
    Beijing Rising               Beijing
    Guangzhou Development Bank   Guangzhou
    Guangzhou Honda              Guangzhou
    Shenzhen Thunder             Shenzhen
    Tencent                      Shenzhen

5.2 Design approach

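A multi-table join is similar to a single-table join; both resemble a natural join in a database. Compared with the single-table join, the left table, the right table, and the join column are more clearly defined here: the factory table is the left table, the address table is the right table, and the address ID is the join column. So the same approach as in the single-table join applies. Once the map has identified which table an input line comes from, it splits the line, puts the join-column value in the key, puts the other column together with a left/right-table flag in the value, and emits the pair. When the reduce receives all values that share a join key, it parses each value, separates left-table and right-table records according to the flag, computes their Cartesian product, and outputs it directly.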
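To make the data flow concrete, here is a minimal local sketch of this reduce-side join in plain Java, with no Hadoop dependency. It assumes whitespace-separated lines as in the sample files. The class `ReduceSideJoinSketch` and its methods `tag` and `join` are hypothetical names used only for illustration, and a `TreeMap` stands in for Hadoop's shuffle phase, which groups values by key and sorts the keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local simulation of a reduce-side join (illustrative sketch, not Hadoop code).
public class ReduceSideJoinSketch {

    // "Map" step: one input line -> {joinKey, "<flag>+<name>"}, or null when the
    // line has no numeric token (e.g. a header line).
    // Flag "1" = factory table (left), "2" = address table (right).
    public static String[] tag(String line) {
        StringTokenizer itr = new StringTokenizer(line);
        String key = null;
        String flag = null;
        StringBuilder name = new StringBuilder();
        int wordsSeen = 0;
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (Character.isDigit(token.charAt(0))) {
                key = token;
                // ID after the name -> factory line; ID first -> address line
                flag = (wordsSeen > 0) ? "1" : "2";
            } else {
                name.append(token).append(' ');
                wordsSeen++;
            }
        }
        if (key == null) {
            return null;
        }
        return new String[] { key, flag + "+" + name.toString().trim() };
    }

    // "Shuffle" + "reduce" steps: group tagged values by key (TreeMap keeps the
    // keys sorted), then emit the Cartesian product of factory and address names.
    public static List<String> join(List<String> factoryLines, List<String> addressLines) {
        Map<String, List<String>> groups = new TreeMap<>();
        List<String> allLines = new ArrayList<>(factoryLines);
        allLines.addAll(addressLines);
        for (String line : allLines) {
            String[] kv = tag(line);
            if (kv != null) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }

        List<String> output = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> factories = new ArrayList<>();
            List<String> addresses = new ArrayList<>();
            for (String v : values) {
                if (v.charAt(0) == '1') {
                    factories.add(v.substring(2)); // strip the "1+" prefix
                } else {
                    addresses.add(v.substring(2)); // strip the "2+" prefix
                }
            }
            for (String f : factories) {
                for (String a : addresses) {
                    output.add(f + "\t" + a);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> factory = Arrays.asList(
                "factoryname addressID",
                "Beijing Red Star 1", "Shenzhen Thunder 3", "Guangzhou Honda 2",
                "Beijing Rising 1", "Guangzhou Development Bank 2",
                "Tencent 3", "Back of Beijing 1");
        List<String> address = Arrays.asList(
                "addressID addressname",
                "1 Beijing", "2 Guangzhou", "3 Shenzhen", "4 Xian");
        for (String row : join(factory, address)) {
            System.out.println(row);
        }
    }
}
```

Running `main` on the sample data yields the same seven factory/address pairs as the sample output. Within one key the rows follow input order here, so they may be ordered differently; Hadoop does not sort the values inside a group either. "Xian" has no matching factory and produces no row, as in an inner join.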

For a detailed analysis of this approach, see the single-table join example. The code is given below.

5.3 Program code

The complete program is as follows:

```java
package com.hebut.mr;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MTjoin {

    // Used to write the header line only once.
    // Note: this only works with a single reducer running in one JVM.
    public static int time = 0;

    /*
     * The map first determines whether an input line belongs to the left table
     * (factory) or the right table (address), then splits the line. The join
     * column goes into the key; the remaining column, prefixed with the
     * left/right table flag, goes into the value.
     */
    public static class Map extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString(); // one line of an input file
            String relationtype = "";       // left/right table flag

            // Skip the header line of each input file
            if (line.contains("factoryname") || line.contains("addressID")) {
                return;
            }

            StringTokenizer itr = new StringTokenizer(line);
            String mapkey = "";
            StringBuilder mapvalue = new StringBuilder();
            int i = 0;
            while (itr.hasMoreTokens()) {
                String token = itr.nextToken();
                // A token starting with a digit is the address ID (join column)
                if (token.charAt(0) >= '0' && token.charAt(0) <= '9') {
                    mapkey = token;
                    // ID after the name -> factory table ("1", left);
                    // ID first -> address table ("2", right)
                    relationtype = (i > 0) ? "1" : "2";
                    continue;
                }
                // Accumulate the (possibly multi-word) factory or address name
                mapvalue.append(token).append(' ');
                i++;
            }

            // Skip blank or malformed lines that carry no join key
            if (mapkey.isEmpty()) {
                return;
            }
            // Emit: key = address ID, value = "<flag>+<name>"
            context.write(new Text(mapkey),
                    new Text(relationtype + "+" + mapvalue.toString().trim()));
        }
    }

    /*
     * The reduce parses the map output, stores the values separately by
     * left/right table, then computes and emits their Cartesian product.
     */
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            // Write the header line once
            if (0 == time) {
                context.write(new Text("factoryname"), new Text("addressname"));
                time++;
            }

            // Lists instead of fixed-size String[10] arrays, so that more
            // than 10 records per key cannot overflow
            List<String> factory = new ArrayList<String>();
            List<String> address = new ArrayList<String>();

            for (Text val : values) {
                String record = val.toString();
                if (record.length() < 2) {
                    continue;
                }
                char relationtype = record.charAt(0); // left/right table flag
                String name = record.substring(2);    // skip "<flag>+"
                if ('1' == relationtype) {            // left table: factory
                    factory.add(name);
                } else if ('2' == relationtype) {     // right table: address
                    address.add(name);
                }
            }

            // Cartesian product, computed after all values have been read
            for (String f : factory) {
                for (String a : address) {
                    context.write(new Text(f), new Text(a));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Key setting: JobTracker address of the author's Hadoop 1.x cluster
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        // Input/output directories are hard-coded here
        String[] ioArgs = new String[] { "MTjoin_in", "MTjoin_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Multiple Table Join <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Multiple Table Join");
        job.setJarByClass(MTjoin.class);

        // Set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // Set the output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

5.4 Results

1) Prepare the test data

In Eclipse, use "DFS Locations" to create the input folder "MTjoin_in" under "/user/hadoop" (note: "MTjoin_out" does not need to be created; the job creates it). As shown in Figure 5.4-1, the folder has been created successfully.


　　然后在本地建立兩個(gè)txt文件，通過Eclipse上傳到“/user/hadoop/MTjoin_in”文件夾中，兩個(gè)txt文件的內(nèi)容如“實(shí)例描述”那兩個(gè)文件一樣。成功上傳之后，從SecureCRT遠(yuǎn)處查看“Master.Hadoop”的也能證實(shí)我們上傳的兩個(gè)文件。

2) View the results

After the job has run, right-click the "/user/hadoop" folder under "DFS Locations" in Eclipse and refresh it. A new "MTjoin_out" folder appears, containing three files. Double-click the "part-r-00000" file, and its contents are displayed in the Eclipse editor.

6. Inverted index

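The inverted index is the most commonly used data structure in document retrieval systems and is widely used in full-text search engines. It stores a mapping from a word (or phrase) to the locations where it appears in a document or a set of documents, which makes it possible to find documents by their content. Instead of mapping a document to the content it contains, it maps content to the documents that contain it; this reversed direction is why it is called an inverted index.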
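As a rough illustration of the data structure itself (not a MapReduce implementation), the hypothetical class `InvertedIndexSketch` below builds an in-memory index over a few made-up documents, mapping each lowercase word to the sorted set of document names that contain it. It assumes simple whitespace tokenization; a real search engine would typically also record positions or frequencies for each occurrence.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal in-memory inverted index: word -> sorted set of document names.
public class InvertedIndexSketch {

    // Words are split on whitespace and lowercased so lookups are case-insensitive.
    public static Map<String, TreeSet<String>> build(Map<String, String> docs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) { // split yields "" when text starts with spaces
                    index.computeIfAbsent(word, w -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical documents, for illustration only
        Map<String, String> docs = new TreeMap<>();
        docs.put("file1.txt", "MapReduce is simple");
        docs.put("file2.txt", "MapReduce is powerful and simple");
        docs.put("file3.txt", "Hello MapReduce");

        Map<String, TreeSet<String>> index = build(docs);
        // Look up documents by content instead of content by document
        System.out.println(index.get("simple"));    // [file1.txt, file2.txt]
        System.out.println(index.get("mapreduce")); // [file1.txt, file2.txt, file3.txt]
    }
}
```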
