
Search results for “apache spark source code github”: 1 - 28 of 28

  • GitHub - modelcontextprotocol/servers: Model Context Protocol Servers

    Official integrations are maintained by companies building production ready MCP servers for their platforms. 21st.dev Magic - Create crafted UI components inspired by the best 21st.dev design engineers. 2slides - An MCP server that provides tools to convert content into slides/PPT/presentation or generate slides/PPT/presentation with user intention. ActionKit by Paragon - Connect to 130+ SaaS inte…

  • awesome-scalability

    The Patterns of Scalable, Reliable, and Performant Large-Scale Systems. An updated and organized reading list for illustrating the patterns of scalable, reliable, and performant large-scale systems. Concepts are explained in the articles of prominent engineers and credible references. Case studies are taken from battle-tested systems that serve millions to…

  • Things we learned about LLMs in 2024

    31st December 2024. A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments. This is a sequel to my review of 2023. In this article: The GPT-4 barrier was comprehensively broken; Some of those GPT-4 models run on my laptop; LLM pri…

  • Update for Apache Log4j2 Issue (CVE-2021-44228)

    AWS is aware of the recently disclosed issues relating to the open-source Apache “Log4j2” utility (CVE-2021-44228 and CVE-2021-45046). Responding to security issues such as this one shows the value of having multiple layers of defensive technologies, which is so important to maintaining the security of our customers’ data and workloads. We've taken this issue very seriously, and our world-class te…

  • The inside story on Mountpoint for Amazon S3, a high-performance open source file client | Amazon Web Services

    AWS Storage Blog. UPDATE (8/9/2023): Mountpoint for Amazon S3 is now generally available. For details, please read the What’s New post. Amazon S3 is the best place to build data lakes because of its durability, availability, scalability, and security. Hundreds of thousands of data lakes are built on S3, storing…

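  • Thinking About Vendor Lock-in - Qiita

    Revision history: 2021/6/17 - Added material on “cloud-style vendor lock-in” and “Cloud Native DB”; 2021/6/16 - Added material on “OSS”; 2021/6/14 - Expanded the note on “industry standards”. Introduction: IT companies = a mass of vendor lock-in; platformers = a mass of vendor lock-in. Unfortunately, many people hold that view. In an era when software did not work as correctly as expected, or, more to the point, was not yet mature, that may have been the only option. The IT industry has long been said to move in “dog years”, and now cloud is at its peak. Where vendors make their money has changed dramatically. Vendor lock-in is a “customer-enclosure strategy”, and vendors know that its downsides outweigh its benefits. Definition: Let me take the definition of vendor lock-in from Wikipedia. W…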

  • Introduction to Zig

    Welcome! This is the initial page for the “Open Access” HTML version of the book “Introduction to Zig: a project-based book”, written by Pedro Duarte Faria. This is an open book that provides an introduction to the Zig programming language, a new, general-purpose, low-level language for building robust and optimal software. Support the project! If you like this project, and you…

  • Netflix System Design - Backend Architecture

    Netflix accounts for about 15% of the world's internet bandwidth traffic, serving over 6 billion hours of content per month to nearly every country in the world. Building a robust, highly scalable, reliable, and efficient backend system is no small engineering feat, but the ambitious team at Netflix has proven that problems exist to be solved. This artic…

  • Tech Solvency: The Story So Far: CVE-2021-44228 (Log4Shell log4j vulnerability).

    Log4Shell log4j vulnerability (CVE-2021-44228 / CVE-2021-45046) - cheat-sheet reference guide. Last updated: 2022/02/08 23:26:16 UTC. Best effort; validate all for your environment/model before use; unofficial sources may be wrong. By @TychoTithonus (Royce Williams), standing on the shoulders of many giants. Send updates or suggestions (please include category / context / public (or support…

  • Why We Use Julia, 10 Years Later

    Exactly ten years ago today, we published "Why We Created Julia", introducing the Julia project to the world. At this point, we have moved well past the ambitious goals set out in the original blog post. Julia is now used by hundreds of thousands of people. It is taught at hundreds of universities and entire companies are being formed that build their software stacks on Julia. From personalized me…

  • Databases in 2021: A Year in Review

    It was a wild year for the database industry, with newcomers overtaking the old guard, vendors fighting over benchmark numbers, and eye-popping funding rounds. We also had to say goodbye to some of our database friends through acquisitions, bankruptcies, or retractions. As the end of the year draws near, it’s worth reflecting and taking stock as we move into 2022. Here are some of the highlights a…

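  • Databricks Article Index Page (Part 1) - Qiita

    Databricks events: Databricks seminar and hands-on index page; Introducing the Databricks Data + AI Summit 2024 virtual sessions; Announcing the Databricks annual event “DATA + AI WORLD TOUR JAPAN 2022”; Announcing DATA + AI Summit 2022; What’s happening at Data + AI Summit: open source, technical keynotes, and more!; New Databricks features announced at Data + AI Summit 2021; Top 10 major announcements from Data + AI Summit; A look back at the Databricks Lakehouse Platform announcements at Data & AI Summit 2022; Databricks SQL highlights at Data & AI Summit; JEDAI study group session 2: End-to-end reco…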

  • Applied-ML Papers

    Curated papers, articles, and blogs on machine learning in production. Designing your ML system? Learn how other organizations did it. Table of Contents: Data Quality, Data Engineering, Data Discovery, Feature Stores, Classification, Regression, Forecasting, Recommendation, Search & Ranking, Embeddings, Natural Language Processing, Sequence Modelling, Computer Vision, Reinforcement Learning, Anomaly Detection, Graph, Optimiz…

  • Apache Airflow: 10 rules to make it work (scale) | Towards Data Science

    Airflow is by default very permissive, and without strict rules you are likely to create a chaotic code base that is impossible to scale and administrate. If you are not careful, your shortcuts will cost you a lot afterwards. Airflow's permissive approach will let you schedule any custom code (jobs), but you will create a spaghetti stack if you do not follow very strict SEPARATION OF CONCERN design betw…

  • Speeding up Rust semver-checking by over 2000x

    This post describes work in progress: how cargo-semver-checks will benefit from the upcoming query optimization API in the Trustfall query engine. Read on to learn how a modern linter works under the hood, and how ideas from the world of databases can improve its performance. Today, cargo-semver-checks is good enough to prevent real-world semver violations, and fast enough to earn a spot in the re…

  • Don’t call it a comeback: Why Java is still champ

    No matter what ranking system you look at, whether the TIOBE Index, the Popularity of Programming Language Index, RedMonk’s bi-annual language rankings, or GitHub’s yearly State of the Octoverse, Java has been sitting among the top three languages since shortly after its launch in 1995. To listen to the general scuttlebutt of the developer crowd over time, however, you might think that Java was in…

  • Introduction - PyO3 user guide

    The PyO3 user guide. Welcome to the PyO3 user guide! This book is a companion to PyO3's API docs. It contains examples and documentation to explain all of PyO3's use cases in detail. The rough order of material in this user guide is as follows: Getting started, Wrapping…

  • Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless | Amazon Web Services

    AWS Big Data Blog. Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. A solution to this problem is to use AWS Database Migration Service (AWS DMS) for migrating hi…

  • Dive deep into AWS Glue 4.0 for Apache Spark | Amazon Web Services

    AWS Big Data Blog. Jul 2023: This post was reviewed and updated with Glue 4.0 support in AWS Glue Studio notebook and interactive sessions. Deriving insight from data is hard. It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data…

  • Code versioning using AWS Glue Studio and GitHub | Amazon Web Services

    AWS Big Data Blog. AWS Glue now offers integration with Git, an open-source version control system widely used across the developer community. Thanks to this integration, you can incorporate your existing DevOps practices on AWS Glue jobs. AWS Glue is a serverless data integration service that helps you create jobs based on Apache Spark or Python to…

  • Argo Workflows - The workflow engine for Kubernetes

    What is Argo Workflows? Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition). Define workflows where each step is a container. Model multi-ste…

  • Track Awesome List Updates Daily

    We track over 500 awesome list updates, and you can also subscribe to daily or weekly updates via RSS or newsletter. This repo is generated by trackawesomelist-source; visit it online or on GitHub.

  • Migrating data from Google BigQuery to Amazon S3 using AWS Glue custom connectors | Amazon Web Services

    AWS Big Data Blog. July 2022: This post was reviewed and updated to include a new data point on the effective runtime with the latest version, explaining Glue 3.0 and autoscaling. October 2024: In Glue 4.0 we have introduced a native and managed connector for Google BigQuery. You can follow the instruction in the bl…

  • Rill | The Open Table Format Revolution: Why Hyperscalers Are Betting on Managed Iceberg

    Wondering why open table formats are suddenly booming? Why is AWS investing heavily in making Iceberg tables on S3, and why did Databricks pay a reported $2B to acquire Tabular? The answers might change how we think about data architecture. Historically, object storage like Amazon S3 or R2 was used as inexpensive, scalable storage for unstructured files, while structured data typically went to dat…

  • The Easiest Way to Find CVEs at the Moment? GitHub Dorks!

    In this article, I will demonstrate how I used GitHub dorks to find 24 vulnerabilities in popular open-source projects in just a few weeks while only spending time in the evenings and the weekends (see https://github.com/dub-flow/vulnerability-research for information on all my CVEs). Before starting this journey, I had already found one CVE: A stored XSS vulnerability in Apache Spark. Around last…

  • Azure Updates (2022.10.13 / Microsoft Ignite 2022)

    Ignite-related updates, plus various others. Ignite-related articles: the go-to official document is the Microsoft Ignite 2022 Book of News. How Microsoft Azure helps drive agility and optimization for your business; Microsoft Ignite: A showcase of products to help customers be more efficient and productive; 5 cybersecurity capabilities announced at Microsoft Ignite 2022; What’s new in Azure Network Security at Microsoft Ignite 2022; Modernize with Microsoft Clo…

  • A non-beginner Data Engineering Roadmap — 2025 Edition

    Me after years using python. Before starting this post, I want to acknowledge that soft and hard skills are equally important. Data people exist to deliver business value, or more broadly read facts from a pool of ever-growing data. But, even with a bunch of posts talking about soft skills, at the end of the day, we're being paid for the technical skills we have, and the ability we have to deliver…

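  • Contributions to Distributed-Processing OSS in 2023 - おくみん公式ブログ (okumin's official blog)

    Contributions to Apache Hive. This is a summary of my contributions to distributed-processing OSS in 2023. The Apache Hive community became more active this year, so many of my contributions were to Hive and Tez. This article was written as day 24 of “Distributed computing (Apache Spark, Hadoop, Kafka, ...) calendar | Advent Calendar 2023 - Qiita”. My apologies for posting it slightly late. Fixing data inconsistencies: data loss when materializing nested CTEs; a bug fix for LIMIT OFFSET pushdown. Performance improvements: improving Auto Reduce Parallelism; developing Fair Routing; adding generic AM- or task-level hooks to Tez; to the output of UDTFs…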
