
Bookmarks for April 25, 2017 (1 item)

  • Write to multiple outputs by key Spark - one Spark job

    How can you write to multiple outputs dependent on the key using Spark in a single job? Related: Write to multiple outputs by key Scalding Hadoop, one MapReduce Job. E.g. sc.makeRDD(Seq((1, "a"), (1, "b"), (2, "c"))).writeAsMultiple(prefix, compressionCodecOption) would ensure that cat prefix/1 is "a b" and cat prefix/2 is "c". EDIT: I've recently added a new answer that includes full imports, pimp an…
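    The behavior the question asks for can be sketched outside Spark with plain Python: group records by key and write one file per key under a prefix directory. Note that write_as_multiple here is a hypothetical helper mirroring the writeAsMultiple name from the question, not a real Spark method; in Spark itself this is typically achieved by subclassing Hadoop's MultipleTextOutputFormat and calling saveAsHadoopFile, or by partitioning the output with Spark SQL.

    ```python
    import os
    from collections import defaultdict

    def write_as_multiple(records, prefix):
        """Hypothetical helper: write each (key, value) pair to the file
        prefix/<key>, one value per line, grouping all values by key."""
        groups = defaultdict(list)
        for key, value in records:
            groups[key].append(value)
        os.makedirs(prefix, exist_ok=True)
        for key, values in groups.items():
            # One output file per distinct key, named after the key.
            with open(os.path.join(prefix, str(key)), "w") as f:
                f.write("\n".join(values) + "\n")

    # The example data from the question: two records share key 1.
    records = [(1, "a"), (1, "b"), (2, "c")]
    write_as_multiple(records, "prefix")
    print(open("prefix/1").read())  # a and b, one per line
    print(open("prefix/2").read())  # c
    ```

    This reproduces the semantics in the question (cat prefix/1 yields "a" and "b", cat prefix/2 yields "c") but materializes each key group in driver memory; the point of doing it inside a single Spark job is to avoid exactly that.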

    kimutansk 2017/04/25
    To split output files by key (or key hash) when running a Spark job, the options seem to be either adding a MultiTextOutput-style output format or partitioning the data with Spark SQL. And what if you turned this into a stream...?