How to Use Spark to Find the Two LACs Where Each Phone Stayed Longest
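This article walks through a Spark job that finds, for each phone number, the two locations (LACs) where it stayed longest, and then joins those locations with their coordinates. The full program, the sample input and the resulting output are given below for reference.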
package hgs.spark.othertest

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object FindTheTop2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FindTheTop2").setMaster("local[3]")
    val sc = new SparkContext(conf)
    val rdd1 = sc.textFile("D:\\bs_log")

    // Map every line to a (K, V) pair of ("phone lac", signed timestamp), e.g.
    //   (18688888888 16030401EAFB68F1E3CDF819735E1C66,-20160327082400)
    //   (18611132889 16030401EAFB68F1E3CDF819735E1C66,-20160327082500)
    // A record with flag 1 contributes a negative timestamp, a record with flag 0 a positive one.
    val rdd_phone_lac_time = rdd1.map(x => {
      val list = x.split(",")
      if (Integer.parseInt(list(3)) == 1)
        (list(0) + " " + list(2), -list(1).toLong)
      else {
        (list(0) + " " + list(2), list(1).toLong)
      }
    })

    // Reduce by key: add up the values of all records that share the same "phone lac" key.
    // Note that the sum is taken over the raw yyyyMMddHHmmss numbers, so it is a
    // ranking score rather than a number of elapsed seconds.
    val rdd_reduce_phone_lackey = rdd_phone_lac_time.reduceByKey((x, y) => x + y)

    // Group by phone number, e.g.
    //   (18688888888,CompactBuffer((18688888888 16030401EAFB68F1E3CDF819735E1C66,87600),
    //                              (18688888888 9F36407EAD0629FC166F14DDE7970F68,51200),
    //                              (18688888888 CC0710CC94ECC657A8561DE549D940E0,1300)))
    val rdd_reduce_phone_lackey_groupyed = rdd_reduce_phone_lackey.groupBy(x => x._1.split(" ")(0))

    // Take the top 2. mapValues only transforms the values: the result is still (K, V),
    // where K is the original key and V is the transformed value.
    val rdd_top2 = rdd_reduce_phone_lackey_groupyed.mapValues(x => {
      x.toList.sortBy(_._2).reverse.take(2)
    })

    // The result has to be joined with another pair RDD on the LAC field, for example
    // 16030401EAFB68F1E3CDF819735E1C66. So split "18688888888 16030401EAFB68F1E3CDF819735E1C66"
    // and use the second part as the new key, e.g.
    //   (16030401EAFB68F1E3CDF819735E1C66,(18688888888,16030401EAFB68F1E3CDF819735E1C66,87600))
    val rdd_result = rdd_top2.flatMap(x => {
      x._2.map(y => {
        val li = y._1.split(" ")
        (li(1), (li(0), li(1), y._2))
      })
    })

    // This file holds the coordinates that are joined with the result above.
    val lati_longti = sc.textFile("D:\\lac_info", 1)

    // Map every line to a pair keyed by LAC, e.g.
    //   (9F36407EAD0629FC166F14DDE7970F68,(9F36407EAD0629FC166F14DDE7970F68,116.304864,40.050645))
    val rdd_coordinate = lati_longti.map(f => {
      val li = f.split(",")
      (li(0), (li(0), li(1), li(2)))
    })

    // Join. The key types of rdd_coordinate and rdd_result must match, otherwise the join does not compile.
    val join_resultWithcoordinate = rdd_coordinate.join(rdd_result)
    // println(rdd_result.collect().length)

    // Save the result to a file.
    join_resultWithcoordinate.saveAsTextFile("d:\\dest")
    sc.stop()
  }
}
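The job above adds and subtracts the raw yyyyMMddHHmmss numbers, so a value such as 87600 is a difference of digits, not a duration. If real elapsed seconds are wanted, one option is to convert each timestamp to epoch seconds before applying the sign. The following is only a minimal sketch under stated assumptions: the names StayDuration, toEpochSeconds and signedSeconds are hypothetical and not part of the original program, all timestamps are assumed to share one time zone (UTC is used here), and flag 1 is assumed to mark entering a location and flag 0 leaving it.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the original job.
object StayDuration {
  // Pattern matching the timestamps in the sample log, e.g. 20160327082400.
  private val fmt = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")

  // Assumption: every timestamp is in the same time zone; UTC is chosen arbitrarily.
  def toEpochSeconds(ts: String): Long =
    LocalDateTime.parse(ts, fmt).toEpochSecond(ZoneOffset.UTC)

  // Same sign convention as the job: flag 1 is negated, any other flag stays positive.
  def signedSeconds(ts: String, flag: Int): Long = {
    val seconds = toEpochSeconds(ts)
    if (flag == 1) -seconds else seconds
  }

  def main(args: Array[String]): Unit = {
    // One visit from the sample log: flag 1 at 08:24:00, flag 0 at 17:00:00.
    val total = signedSeconds("20160327082400", 1) + signedSeconds("20160327170000", 0)
    println(total) // 8 h 36 min = 30960 seconds
  }
}
```

In the Spark job, signedSeconds(list(1), list(3).toInt) could stand in for the two branches of the map step; the reduceByKey step would then sum real seconds per phone and LAC.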
Sample data
D:\\bs_log
18688888888,20160327082400,16030401EAFB68F1E3CDF819735E1C66,1
18611132889,20160327082500,16030401EAFB68F1E3CDF819735E1C66,1
18688888888,20160327170000,16030401EAFB68F1E3CDF819735E1C66,0
18611132889,20160327075000,9F36407EAD0629FC166F14DDE7970F68,1
18688888888,20160327075100,9F36407EAD0629FC166F14DDE7970F68,1
18611132889,20160327081000,9F36407EAD0629FC166F14DDE7970F68,0
18688888888,20160327081300,9F36407EAD0629FC166F14DDE7970F68,0
18688888888,20160327175000,9F36407EAD0629FC166F14DDE7970F68,1
18611132889,20160327182000,9F36407EAD0629FC166F14DDE7970F68,1
18688888888,20160327220000,9F36407EAD0629FC166F14DDE7970F68,0
18611132889,20160327230000,9F36407EAD0629FC166F14DDE7970F68,0
18611132889,20160327180000,16030401EAFB68F1E3CDF819735E1C66,0
18611132889,20160327081100,CC0710CC94ECC657A8561DE549D940E0,1
18688888888,20160327081200,CC0710CC94ECC657A8561DE549D940E0,1
18688888888,20160327081900,CC0710CC94ECC657A8561DE549D940E0,0
18611132889,20160327082000,CC0710CC94ECC657A8561DE549D940E0,0
18688888888,20160327171000,CC0710CC94ECC657A8561DE549D940E0,1
18688888888,20160327171600,CC0710CC94ECC657A8561DE549D940E0,0
18611132889,20160327180500,CC0710CC94ECC657A8561DE549D940E0,1
18611132889,20160327181500,CC0710CC94ECC657A8561DE549D940E0,0

D:\\lac_info
9F36407EAD0629FC166F14DDE7970F68,116.304864,40.050645,6
CC0710CC94ECC657A8561DE549D940E0,116.303955,40.041935,6
16030401EAFB68F1E3CDF819735E1C66,116.296302,40.032296,6

Results:
(16030401EAFB68F1E3CDF819735E1C66,((16030401EAFB68F1E3CDF819735E1C66,116.296302,40.032296),(18688888888,16030401EAFB68F1E3CDF819735E1C66,87600)))
(16030401EAFB68F1E3CDF819735E1C66,((16030401EAFB68F1E3CDF819735E1C66,116.296302,40.032296),(18611132889,16030401EAFB68F1E3CDF819735E1C66,97500)))
(9F36407EAD0629FC166F14DDE7970F68,((9F36407EAD0629FC166F14DDE7970F68,116.304864,40.050645),(18688888888,9F36407EAD0629FC166F14DDE7970F68,51200)))
(9F36407EAD0629FC166F14DDE7970F68,((9F36407EAD0629FC166F14DDE7970F68,116.304864,40.050645),(18611132889,9F36407EAD0629FC166F14DDE7970F68,54000)))
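The signed-sum and top-2 steps can also be checked by hand without a Spark installation. The sketch below repeats the same logic on ordinary Scala collections, using the ten sample records for phone 18688888888. The names Top2Check, sample and top2PerPhone are hypothetical and exist only for this illustration; the coordinate join is left out.

```scala
// Hypothetical cross-check, not part of the original job.
object Top2Check {
  // The ten bs_log records for phone 18688888888 from the sample data.
  val sample: Seq[String] = Seq(
    "18688888888,20160327082400,16030401EAFB68F1E3CDF819735E1C66,1",
    "18688888888,20160327170000,16030401EAFB68F1E3CDF819735E1C66,0",
    "18688888888,20160327075100,9F36407EAD0629FC166F14DDE7970F68,1",
    "18688888888,20160327081300,9F36407EAD0629FC166F14DDE7970F68,0",
    "18688888888,20160327175000,9F36407EAD0629FC166F14DDE7970F68,1",
    "18688888888,20160327220000,9F36407EAD0629FC166F14DDE7970F68,0",
    "18688888888,20160327081200,CC0710CC94ECC657A8561DE549D940E0,1",
    "18688888888,20160327081900,CC0710CC94ECC657A8561DE549D940E0,0",
    "18688888888,20160327171000,CC0710CC94ECC657A8561DE549D940E0,1",
    "18688888888,20160327171600,CC0710CC94ECC657A8561DE549D940E0,0"
  )

  // Returns, per phone, the two LACs with the largest signed timestamp sum.
  def top2PerPhone(lines: Seq[String]): Map[String, List[(String, Long)]] = {
    // Same sign rule as the job: flag 1 negates the raw timestamp.
    val signed = lines.map { line =>
      val f = line.split(",")
      val ts = f(1).toLong
      ((f(0), f(2)), if (f(3).toInt == 1) -ts else ts)
    }
    // Equivalent of reduceByKey: one total per (phone, lac).
    val totals = signed.groupBy(_._1).map { case (key, vs) => key -> vs.map(_._2).sum }
    // Equivalent of groupBy(phone) followed by sort descending and take(2).
    totals.toList
      .groupBy { case ((phone, _), _) => phone }
      .map { case (phone, entries) =>
        phone -> entries.map { case ((_, lac), total) => (lac, total) }.sortBy(-_._2).take(2)
      }
  }

  def main(args: Array[String]): Unit =
    top2PerPhone(sample).foreach(println)
}
```

For this phone the per-LAC totals work out to 87600, 51200 and 1300, so the two entries kept are 16030401EAFB68F1E3CDF819735E1C66 and 9F36407EAD0629FC166F14DDE7970F68, which agrees with the job output listed for 18688888888.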
That concludes this walkthrough of using Spark to find the two LACs where each phone stayed longest. Hopefully it is a useful reference; if you found it helpful, feel free to share it.