Search results for "stream processing" - Hatena Bookmark

Results 41 - 80 of 87


The Future of Data Engineering
- 4 users
- cnr.sh
- Technology
- 2019/09/15
The Future of Data Engineering Chris Riccomini on July 29, 2019 I have been thinking lately about where we’ve come in data engineering over the past few years, and about what the future holds for work in this area. Most of this thought has been framed in the context of what some of our teams are doing at WePay, but I believe the framework below applies more broadly, and is worth sharing. I present
- data
- development
- blog
Data-Oriented Design
- 4 users
- www.dataorienteddesign.com
- Technology
- 2020/04/30
Online release of Data-Oriented Design : This is the free, online, reduced version. Some inessential chapters are excluded from this version, but in the spirit of this being an education resource, the essentials are present for anyone wanting to learn about data-oriented design. Expect some odd formatting and some broken images and listings as this is auto generated and the Latex to html converter
- architecture
Our First Netflix Data Engineering Summit
- 4 users
- netflixtechblog.com
- Technology
- 2023/12/15
Introduction: Earlier this summer Netflix held our first-ever Data Engineering Forum. Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community! You can find each of the talks below with a short descri
DataHub: A generalized metadata search & discovery tool
- 4 users
- www.linkedin.com
- Technology
- 2019/08/16
Authored by Mars Lan, Co-Founder & CTO at Metaphor | Co-creator of DataHub August 14, 2019 Co-authors: Mars Lan, Seyi Adebajo, Shirshanka Das Editor’s note: Since publishing this blog post, the team open sourced DataHub in February 2020. You can read more on the journey of open sourcing the platform here. As the operator of the world’s largest professional network and the Economic Graph, LinkedIn’s
Data-Oriented Design
- 4 users
- dataorienteddesign.com
- Technology
- 2022/07/17
Online release of Data-Oriented Design : This is the free, online, reduced version. Some inessential chapters are excluded from this version, but in the spirit of this being an education resource, the essentials are present for anyone wanting to learn about data-oriented design. Expect some odd formatting and some broken images and listings as this is auto generated and the Latex to html converter
Optimizing batch processing with custom checkpoints in AWS Lambda | Amazon Web Services
- 4 users
- aws.amazon.com
- Technology
- 2020/12/16
AWS Compute Blog Optimizing batch processing with custom checkpoints in AWS Lambda AWS Lambda can process batches of messages from sources like Amazon Kinesis Data Streams or Amazon DynamoDB Streams. In normal operation, the processing function moves from one batch to the next to consume messages from the stream. However, when an error occurs in one of the items in the batch, this can result in re
- aws
How LinkedIn customizes Apache Kafka for 7 trillion messages per day
- 4 users
- www.linkedin.com
- Technology
- 2019/10/10
Open Source How LinkedIn customizes Apache Kafka for 7 trillion messages per day Co-authors: Jon Lee and Wesley Wu Apache Kafka is a core part of our infrastructure at LinkedIn. It was originally developed in-house as a stream processing platform and was subsequently open sourced, with a large external adoption rate today. While many other companies and projects leverage Kafka, few—if any—do so at
- kafka
- architecture
GitHub - ArroyoSystems/arroyo: Distributed stream processing engine in Rust
- 4 users
- github.com/ArroyoSystems
- Technology
- 2023/06/07
Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala | Amazon Web Services
- 4 users
- aws.amazon.com
- Technology
- 2021/05/28
AWS News Blog Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala The best way to get timely insights and react quickly to new information you receive from your business and your applications is to analyze streaming data. This is data that must usually be processed sequentially and incrementally on a record-by-record basis or over sli
- read later
A look at "the most popular OSS data projects" (ranks 1-10)
- 4 users
- zenn.dev/notrogue
- Technology
- 2021/04/11
Note: the exact survey questions are unclear? (See this article.) I briefly looked into the top 20 products listed above. Coverage is thin for products I don't know well (such as Flyte) and for products everyone probably knows (such as Spark). For reference, my own familiarity is roughly as follows. Know: Apache Airflow, Trino, Prefect, Apache Spark, Amundsen, Apache Flink, Apache Kafka, Apache Druid, pandas. Know by name only: dbt, Apache Pinot, Apache Superset, Great Expectations, Dask, Apache Arrow, Apache Gobblin. Don't know: Dagster, Flyte, RudderStack, Ray. Table of contents: dbt, Apache Airflow, Apache Superset
- mlops
- oss
- python
- data
- reference
- summary
- read later
Hello, Redis Stack - Redis
- 4 users
- redis.com
- Technology
- 2022/03/24
Today we’re thrilled to announce Redis Stack. Redis Stack consolidates the capabilities of the leading Redis modules into a single product, making it easy for developers to build modern, real-time applications with the speed and stability of Redis. Prologue At Redis, we’re building a real-time data layer to meet the universal demand for responsive, low-latency applications and services. To build a
- redis
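Functionality required of stream processing systems, and how Apache Flink addresses it
- 4 users
- soonraah.github.io
- Technology
- 2021/02/12
Introduction: This post takes topics from a survey paper on stream processing and illustrates them with examples from Apache Flink. Paper overview: Fragkoulis, M., Carbone, P., Kalavri, V., & Katsifodimos, A. (2020). A Survey on the Evolution of Stream Processing Systems. A paper from 2020. It surveys stream processing frameworks from roughly the past 30 years and discusses how they have evolved. For several kinds of functionality characteristically required of stream processing, it lists ways of realizing each and contrasts relatively old frameworks with recent ones. Scope of this post: This post covers the functionality required of stream processing systems described above and why it is needed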
Scribe: Transporting petabytes per hour via a distributed, buffered queueing system
- 4 users
- engineering.fb.com
- Technology
- 2019/10/08
Scribe: Transporting petabytes per hour via a distributed, buffered queueing system Our hardware infrastructure comprises millions of machines, all of which generate logs that we need to process, store, and serve. The total size of these logs is several petabytes every hour. The outputs are generally processed somewhere other than where they were generated: They can be relevant to a variety of dow
- logging
- facebook
Project Flogo
- 4 users
- www.flogo.io
- Technology
- 2019/12/17
Project Flogo Ecosystem Scroll through the action elements to read more about what you can build on the core! Project Flogo is a resource efficient, Go-based open source ecosystem for building event-driven apps. Event-driven, you say? Yup, the notion of triggers and actions are leveraged to process incoming events. An action, a common interface, exposes key capabilities such as application integra
Spring Batch on Kubernetes: Efficient batch processing at scale
- 4 users
- spring.io
- Technology
- 2021/01/28
Spring Batch on Kubernetes: Efficient batch processing at scale Introduction Batch processing has been a challenging area of computer science since its inception in the early days of punch cards and magnetic tapes. Nowadays, the modern cloud computing era comes with a whole new set of challenges for how to develop and operate batch workload efficiently in a cloud environment. In this blog post, I
- tech
GitHub - gazette/core: Build platforms that flexibly mix SQL, batch, and stream processing paradigms
- 4 users
- github.com/gazette
- Technology
- 2022/07/16
Gazette makes it easy to build platforms that flexibly mix SQL, batch, and millisecond-latency streaming processing paradigms. It enables teams, applications, and analysts to work from a common catalog of data in the way that's most convenient to them. Gazette's core abstraction is a "journal" -- a streaming append log that's represented using regular files in a BLOB store (i.e., S3). The magic of
- golang
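How Dataflow works: About Dataflow's techniques | Google Cloud Official Blog
- 3 users
- cloud.google.com
- Technology
- 2020/09/02
Note: This post is an abridged translation of a post published on the Google Cloud blog on August 22, 2020 (US time). Editor's note: This is the second post in a three-part blog series that digs into the internal Google history that led to the development of Dataflow, Dataflow's capabilities as a Google Cloud service, and how it compares and contrasts with other products on the market. Please see the first post, "How Dataflow works: The origin story". The first part of this series covered the background of Dataflow's development inside Google and explained how it compares with the Lambda Architecture. This time, let's look in a bit more detail at some of the key systems that power Dataflow. As mentioned in part one, Dataflow makes use of many technologies that were built for earlier systems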
Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing
- 3 users
- eng.uber.com
- Technology
- 2020/01/24
Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process data at a massive scale in real time with exactly-once semantics,
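We adopted Fluent Bit: how to run and verify it locally, and pitfalls we hit along the way - Uzabase for Engineers
- 3 users
- tech.uzabase.com
- Technology
- 2022/05/16
I'm Oba (大場), an engineer working at both AlphaDrive and NewsPicks. Lately I have been developing the new platform for NewsPicks Web. The new platform is developed with Next.js and built on AWS Fargate, and we adopted Fluent Bit to send the logs collected on Fargate to S3 and New Relic. In this post I'll cover how to run and verify it locally, and the problems we ran into during adoption! Contents: What is Fluent Bit; How to run and verify locally; Choosing an image; Preparing the configuration file; Adding debug settings; Verifying operation; Expanding ltsv-format logs; Using the Stream Processor; Other settings; Pitfalls when adopting Fluent Bit; The S3 plugin fixes Content-Encoding: gzip when gzip compression is used; The data inside S3 objects accurately
Scalable real-time analytics made more accessible than ever with Pub/Sub | Google Cloud Official Blog
- 3 users
- cloud.google.com
- Technology
- 2020/12/17
Note: This post is an abridged translation of a post published on the Google Cloud blog on December 8, 2020 (US time). Real-time analytics has recently become indispensable to business. Automated real-time decision-making based on the latest data is no longer just for advanced, technology-first companies. It is becoming a basic way of doing business. According to IDC, more than a quarter of the data created will be real-time data within the next five years. One factor that appears to be driving this growth is the competitive pressure to improve the quality of services and user experiences. Another factor is the consumerization of many traditional businesses. Many functions that used to be performed by agents are now performed by consumers themselves. Today, banks, retailers, and service providers each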
Download free O'Reilly books · GitHub
- 3 users
- gist.github.com/pavel-popov
- Technology
- 2019/09/13
books.md From theme: Programming Microservices for Java Developers: A Hands-On Introduction to Frameworks and Containers http://www.oreilly.com/programming/free/files/microservices-for-java-developers.pdf http://www.oreilly.com/programming/free/files/microservices-for-java-developers.epub http://www.oreilly.com/programming/free/files/microservices-for-java-developers.mobi Modern Java EE Design Pat
- pdf
- book
- free
- github
- books
- read later
Lessons Learned: The Journey to Real-Time Machine Learning at Instacart
- 3 users
- tech.instacart.com
- Technology
- 2022/09/22
Figure 1: How ML models support shopping journey at Instacart. Instacart incorporates machine learning extensively to improve the quality of experience for all actors in our “four-sided marketplace”: customers who place orders on Instacart apps to get deliveries in as fast as 30 minutes, shoppers who can go online at any time to fulfill customer orders, retailers that sell their products and can mak
Announcing Message DB: Event Store and Message Store for PostgreSQL
- 3 users
- blog.eventide-project.org
- Technology
- 2019/12/17
The Eventide Project team is excited to announce Message DB: A fully-featured event store and message store implemented in PostgreSQL for pub/sub, event sourcing, and evented microservices applications. For more specifics, visit Message DB on GitHub: https://github.com/message-db/message-db Message DB was distilled from the Eventide Project to make it easier for users to write clients in the langu
Data Engineer: Interview Questions
- 3 users
- www.ejable.com
- Technology
- 2024/03/06
Here is a list of common data engineering interview questions, with answers, which you may encounter for an interview as a data engineer. The questions during an interview for a data engineer aim to check not only the grasp of data systems and architectures but also a keen understanding of your technical prowess and problem-solving skills. This article lists essential interview questions and answe
New AWS Lambda controls for stream processing and asynchronous invocations | Amazon Web Services
- 3 users
- aws.amazon.com
- Technology
- 2019/11/28
AWS Compute Blog New AWS Lambda controls for stream processing and asynchronous invocations Today AWS Lambda is introducing new controls for asynchronous and stream processing invocations. These new features allow you to customize responses to Lambda function errors and build more resilient event-driven and stream-processing applications. Stream processing function invocations When processing data
Change Data Capture for Microservices
- 3 users
- www.infoq.com
- Technology
- 2023/10/05
Transcript Morling: Welcome to this talk about Change Data Capture for microservices. Let me set the scene a little bit with a maybe blunt statement and an observation. The world around us, this is happening in real time. People buy stuff in an online store, maybe they do some payment transactions. Maybe you have machinery or IoT devices, which send over measurements or all kinds of sensor data. N
- techfeed
SE Radio 393: Jay Kreps on Enterprise Integration Architecture with a Kafka Event Log – Software Engineering Radio
- 3 users
- se-radio.net
- Technology
- 2019/12/19
SE Radio 393: Jay Kreps on Enterprise Integration Architecture with a Kafka Event Log Jay Kreps, CEO of Confluent discusses an enterprise integration architecture organized around an event log. Robert Blumen spoke with Jay about the N-squared problem of data integration; how LinkedIn tried and failed to solve the integration problem; the nature of events; the enterprise event schema; schema defin
- Kafka
Machine Learning Design Patterns - higepon blog
- 3 users
- higepon.hatenablog.com
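- Technology
- 2021/12/19
Scaling: Min-max & clipping suits uniform distributions; Z-score suits normal distributions. Depending on the input data, a non-linear transformation is more appropriate, for example Wikipedia page views. Honestly, I had not been conscious of this. I tried it from this perspective on the data from the pressure competition (02-01-scaling.ipynb). Categorical: I had never even considered the case where the input is an array of categoricals. I now understand the difference between dummy and one-hot encoding. Design Pattern 1: Hashed Feature: A pattern I have not encountered on Kaggle. It is nice that it can handle new IDs and cold start. What do you do when an airport is built that is not in the training data? That was an easy-to-understand example. Intuitively, the hash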
Real-time machine learning: challenges and solutions
- 3 users
- huyenchip.com
- Technology
- 2022/01/03
[Twitter discussion, LinkedIn] Updates Jan 3, 2023: Update the online features section to differentiate between real-time features and near real-time features. If you’re interested in this topic, my book Designing Machine Learning Systems (O’Reilly, June 2022) covers online prediction and continual learning in much more detail. Real-time machine learning is the approach of using real-time data to
The Day of a new Command-Line Interface: Shell
- 3 users
- arcan-fe.com
- Technology
- 2022/04/04
This article continues the long-lost series on how to migrate away from terminal protocols as the main building block for command-line and text-dominant user interfaces. The previous ones (Chasing the dream of a terminal-free CLI (frustration/idea, 2016) and Dawn of a new Command-Line Interface (design, 2017)) might be worth an extra read afterwards, but they are not prerequisites to understanding
Going Reactive with Spring, Coroutines and Kotlin Flow
- 3 users
- spring.io
- Technology
- 2020/02/05
Going Reactive with Spring, Coroutines and Kotlin Flow Since we announced Spring Framework official support for Kotlin in January 2017, a lot of things happened. Kotlin was announced as an official Android development language at Google I/O 2017, we continued to improve the Kotlin support across Spring portfolio and Kotlin itself has continued to evolve with key new features like coroutines. I wou
- read later
Data engineering at Meta: High-Level Overview of the internal tech stack
- 3 users
- medium.com/@AnalyticsAtMeta
- Technology
- 2023/10/13
Data engineering at Meta: High-Level Overview of the internal tech stack This article provides an overview of the internal tech stack that we use on a daily basis as data engineers at Meta. The idea is to shed some light on the work we do, and how the tools and frameworks contribute to making our day-to-day data engineering work more efficient, and to share some of the design decisions and technic
Lessons learned from combining SQS and Lambda in a data project - Solita Data
- 3 users
- data.solita.fi
- Technology
- 2021/02/09
In June 2018, AWS Lambda added Amazon Simple Queue Service (SQS) to supported event sources, removing a lot of heavy lifting of running a polling service or creating extra SQS to SNS mappings. In a recent project we utilized this functionality and configured our data pipelines to use AWS Lambda functions for processing the incoming data items and SQS queues for buffering them. The built-in functio
- aws
- read later
GitHub - infinyon/fluvio: Lean and mean distributed stream processing system written in rust and web assembly.
- 3 users
- github.com/infinyon
- Technology
- 2021/07/11
Decoding protobuf messages using AWS Lambda | Amazon Web Services
- 3 users
- aws.amazon.com
- Technology
- 2022/03/08
AWS Compute Blog Decoding protobuf messages using AWS Lambda This post is written by Ennio Pastore, Data Lab Architect. Protobuf is short for protocol buffers, which are language- and platform-neutral mechanisms for serializing structured data. Compared to XML or JSON the size of the messages is smaller, so the network transfer is faster, reducing latency in the interactions between applications.
Rapid Event Notification System at Netflix
- 3 users
- netflixtechblog.com
- Entertainment
- 2022/02/19
By: Ankush Gulati, David Gevorkyan Additional credits: Michael Clark, Gokhan Ozer Intro: Netflix has more than 220 million active members who perform a variety of actions throughout each session, ranging from renaming a profile to watching a title. Reacting to these actions in near real-time to keep the experience consistent across devices is critical for ensuring an optimal member experience. This
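Data lake-related OSS - Delta Lake, Apache Hudi, Apache Kudu
- 3 users
- soonraah.github.io
- Technology
- 2021/07/30
Introduction: In the previous post I looked into what a data lake is. This time I want to look at which OSS projects are attracting attention in the data lake context. Below is a presentation by NTT Data, which lists three "OSS storage layer software products that have appeared in recent years and can be used for real-time analytics": Delta Lake, Apache Hudi, Apache Kudu. All of these serve as a logical storage layer. There may be nothing to add to that presentation, but the aim of this post is to summarize what I researched and understood myself from the data lake perspective. Naturally, Hadoop, Hive, Spark, and others are also extremely important in the data lake context, but they were widespread before the term "data lake" became common, so I will not cover them in this post. Del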
GitHub - puresec/sas-top-10: Serverless Architectures Security Top 10 Guide
- 3 users
- github.com/puresec
- Technology
- 2019/10/30
The Ten Most Critical Risks for Serverless Applications v1.0 Preface The “Serverless architectures Security Top 10” document is meant to serve as a security awareness and education guide. The document is curated and maintained by top industry practitioners and security researchers with vast experience in application security, cloud and serverless architectures. As many organizations are still expl
- security
- read later
RabbitMQ vs Kafka: Which Platform Should You Choose in 2023?
- 3 users
- eranstiller.com
- Technology
- 2020/10/28
Have you ever found yourself standing at a crossroads, trying to decide between RabbitMQ vs Kafka for your Microservices-based system? Have you ever wondered which of these messaging platforms is most suitable for your use case? RabbitMQ and Apache Kafka are well-known solutions in the asynchronous messaging domain, but despite popular belief, they aren’t one-size-fits-all solutions. As a software
Structured Streaming Programming Guide - Spark 3.5.1 Documentation
- 3 users
- spark.apache.org
- Technology
- 2020/07/20
Structured Streaming Programming Guide Overview Quick Example Programming Model Basic Concepts Handling Event-time and Late Data Fault Tolerance Semantics API using Datasets and DataFrames Creating streaming DataFrames and streaming Datasets Input Sources Schema inference and partition of streaming DataFrames/Datasets Operations on streaming DataFrames/Datasets Basic Operations - Selection, Projec