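This post introduces the Sensitive Data Redaction feature, available since CDH 5.4. What it does: it masks arbitrary sensitive data that appears in Hadoop cluster log files and in Hive/Impala queries. Requirements: CDH 5.4 / Cloudera Manager 5.4. Steps: 1. Log in to Cloudera Manager and select the HDFS service. 2. On the HDFS configuration screen, search for "redaction". 3. By default, masking templates are provided for credit card information, social security numbers, host names, and email addresses. You can also define custom masking rules. Here we mask credit card information. 4. Within the configuration screen, you can test how the masking behaves.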
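The masking described in step 3 is pattern based, so its behavior can be approximated outside Cloudera Manager. The following is a minimal sketch of regex-based redaction in Python, not the product's implementation: the names `CARD_PATTERN`, `REPLACEMENT`, and `redact` are hypothetical, and the pattern is an illustrative assumption rather than the credit card template that ships with Cloudera Manager.

```python
import re

# Hypothetical pattern: four groups of four digits, optionally separated by a
# space or hyphen. An illustrative assumption, not Cloudera Manager's template.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

# Hypothetical replacement string written in place of each match.
REPLACEMENT = "XXXX-XXXX-XXXX-XXXX"


def redact(line: str) -> str:
    """Replace every match of CARD_PATTERN in a log line or query string."""
    return CARD_PATTERN.sub(REPLACEMENT, line)


if __name__ == "__main__":
    sample = "SELECT * FROM payments WHERE card = '1234-5678-9012-3456'"
    print(redact(sample))
```

Trying a rule against sample strings like this before deploying it mirrors the test field mentioned in step 4.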
Are you still building data pipelines with Java and Python? Are you curious about the current buzz in the Big Data community surrounding Scala as a data processing environment? In this talk I'll discuss how Spotify migrated its music recommendations pipeline from Python to Scala. I'll dive into the language specific features that make Scala the ideal candidate for big data processing as well as hi
Open Sourcing Cubert: A High Performance Computation Engine for Complex Big Data Analytics Authors: Maneesh Varshney, Srinivas Vemuri What do you do when your Hadoop ETL script is mercilessly killed because it is hogging too many resources on the cluster, or if it starts missing completion deadlines by hours? We encountered this exact same problem more than a year ago while building the computatio
Calling our Presto community speakers – we want to hear from you! Fill out our community call for papers to speak at upcoming meetups and conferences. What is Presto? Presto is an open source SQL query engine that’s fast, reliable, and efficient at scale. Use Presto to run interactive/ad hoc queries at sub-second performance for your high volume apps.
The ongoing progress in Artificial Intelligence is constantly expanding the realms of possibility, revolutionizing industries and societies on a global scale. The release of LLMs surged by 136% in 2023 compared to 2022, and this upward trend is projected to continue in 2024. Today, 44% of organizations are experimenting with generative AI, with 10% having […]
I visited Cloudera Friday for, among other things, a chat about Impala with Marcel Kornacker and colleagues. Highlights included: Impala is meant to someday be a competitive MPP (Massively Parallel Processing) analytic RDBMS. At the moment, it is not one. For example, Impala lacks any meaningful form of workload management or query optimization. While Impala will run against any HDFS (Hadoop Distr
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
Riding the wave of the generative AI revolution, third party large language model (LLM) services like ChatGPT and Bard have swiftly emerged as the talk of the town, converting AI skeptics to evangelists and transforming the way we interact with technology. For proof of this megatrend look no further than the instant success of ChatGPT, […]
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […]
Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers. This is the day 6 entry of Hadoop Advent Calendar 2012 (#hadoopAC12jp). The previous entry covered automatic failover for NameNode HA, introduced in CDH4.1. This entry covers the fencing feature used during automatic failover
As the day 4 entry of Hadoop Advent Calendar 2012 (#hadoopAC12jp), this post covers automatic failover for the High Availability (HA) NameNode, introduced in CDH4.1. Introduction C