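  • A roundup of introductory books on machine learning

    Hello, this is S.S from GMO AD Marketing (GMOアドマーケティング). In this post I introduce books that may be useful when you want to learn a little about machine learning. For many of the books introduced here, the table of contents or a draft has been made available courtesy of the author or publisher, so you can check the contents before buying. Machine learning in general: Even once you decide to learn machine learning, the required background knowledge and the range of topics covered are so broad that reading a thick, brick-like book cover to cover right away seems exhausting, so the book often said to be a good first read for grasping the big picture is "Hajipata" (はじパタ). 『はじめてのパターン認識』 平井 有三 https://www.morikita.co.jp/books/book/2235 Once you have an overview from Hajipata or a similar book, I next introduce books that are useful when you want to study the main machine learning methods in a bit more detail. The following 3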

    • Happy New Year: GPT in 500 lines of SQL - EXPLAIN EXTENDED

      This year, the talk of the town was AI and how it can do everything for you. I like it when someone or something does everything for me. To this end, I decided to ask ChatGPT to write my New Year's post: "Hey ChatGPT. Can you implement a large language model in SQL?" "No, SQL is not suitable for implementing large language models. SQL is a language for managing and querying d

      • The security of customer-chosen banking PINs

        A birthday present every eleven wallets? The security of customer-chosen banking PINs Joseph Bonneau, Sören Preibusch, Ross Anderson Computer Laboratory University of Cambridge {jcb82,sdp36,rja14}@cl.cam.ac.uk Abstract. We provide the first published estimates of the difficulty of guessing a human-chosen 4-digit PIN. We begin with two large sets of 4-digit sequences chosen outside banking for onl

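        • They are reportedly building a "completely new computer," so what is it like? We asked Extropic AI

          2024.08.07 19:00, 山田ちとら. [Image: Prototype of the Thermo AI Hardware chip. Josephson junctions are placed on a superconducting aluminum substrate. Image: Extropic AI] Smartphones and PCs see their specs improve every year as a matter of course. However, it is said that the fundamental technology underpinning these digital devices is approaching a limit beyond which it can no longer be made more efficient. I was stunned to hear that a startup sees this as an opportunity and is developing a computer system with a fundamentally different architecture. Its name is Extropic AI. The thermodynamics-based computer system they are developing, "Thermo AI Hardw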

          • How Transformers work in deep learning and NLP: an intuitive introduction | AI Summer

            The famous paper “Attention is all you need” in 2017 changed the way we were thinking about attention. With enough data, matrix multiplications, linear layers, and layer normalization we can perform state-of-the-art machine translation. Nonetheless, 2020 was definitely the year of transformers! From natural language, they have now moved into computer vision tasks. How did we go from attention to self-atte

            • Dataflowr - Deep Learning DIY

              Deep Learning Do It Yourself! This site collects resources to learn Deep Learning in the form of Modules available through the sidebar on the left. As a student, you can walk through the modules at your own pace and interact with others thanks to the associated Discord server. You don’t need any special hardware or software. Practical deep learning course The main goal of the course is to allow st

              • Building a large-scale distributed storage system based on Raft

                Guest post by Edward Huang, Co-founder & CTO of PingCAP In recent years, building a large-scale distributed storage system has become a hot topic. Distributed consensus algorithms like Paxos and Raft are the focus of many technical articles. But those articles tend to be introductory, describing the basics of the algorithm and log replication. They seldom cover how to build a large-scale distribut

                • Naïve Bayes for Machine Learning – From Zero to Hero

                  Before I dive into the topic, let us ask a question – what is machine learning all about and why has it suddenly become a buzzword? Machine learning fundamentally is the “art of prediction”. It is all about predicting the future, based on the past. The reason it is a buzzword is actually not about data, technology, computing power or any of that stuff. It’s just about human psychology! Yes, we hum

                  • Regression and Other Stories

                    Regression and Other Stories Andrew Gelman, Jennifer Hill, Aki Vehtari Page updated: 2024-07-30 Home page for the book Regression and Other Stories by Andrew Gelman, Jennifer Hill, and Aki Vehtari, including the code and data for the examples. Published by Cambridge University Press in 2020. © Copyright by Andrew Gelman, Jennifer Hill, and Aki Vehtari 2020. Back cover text: Many textbooks on regre

                    • A Graduate Course in Applied Cryptography

                      Part I: Secret key cryptography 1: Introduction 2: Encryption 3: Stream ciphers 4: Block ciphers 5: Chosen plaintext attacks 6: Message integrity 7: Message integrity from universal hashing 8: Message integrity from collision resistant hashing 9: Authenticated encryption Part II: Public key cryptography 10: Public key tools 11: Public key encryption 12: Chosen ciphertext secure public-key encrypti

                      • Annotated history of modern AI and deep neural networks

                        For a while, DanNet enjoyed a monopoly. From 2011 to 2012 it won every contest it entered, winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] In particular, at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest. DanNet was also the first deep CNN to win

                        • The Cloud Conundrum: S3 Encryption

                          [Image: River Landscape with Cows, 1645/1650, by Aelbert Cuyp] 👋 Dear reader: Hope you’re staying safe, and going strong with your new year resolutions. This is the first part of a series of posts I wish to write on peculiar cloud security challenges. In this post, I will cover: encryption at rest in the cloud; Amazon S3 and its encryption options; how the cloud’s server-side encryption can give a false sense of security

                          • How machine learning powers Facebook’s News Feed ranking algorithm

                            Designing a personalized ranking system for more than 2 billion people (all with different interests) and a plethora of content to select from presents significant, complex challenges. This is something we tackle every day with News Feed ranking. Without machine learning (ML), people’s News Feeds could be flooded with content they

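                            • A guide to reading causal inference articles: the two frameworks of causal inference - Qiita

                              Introduction: This is an explainer on "the two frameworks of causal inference," the part I got most stuck on in the several months since I started studying causal inference. Many excellent articles already explain the methods and theory of causal inference, so I do not go into much detail here. In this article I hope to offer some insight into what perspective to take when reading articles about causal inference. Note: this may contain the author's personal views or misunderstandings. What is causal inference in the first place? Wikipedia describes (statistical) causal inference as follows. "Statistical causal inference (Causal inference in statistics) is the statistical estimation of the causal effects of events based on incomplete information obtained from experimental or observational data." The "incomplete information" here presumably refers to things like "the effect if a patient who was treated had not been treated" or "a patient who was not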

                              • How to generate text: using different decoding methods for language generation with Transformers

                                Note: Edited in July 2023 with up-to-date references and examples. Introduction: In recent years, there has been an increasing interest in open-ended language generation thanks to the rise of large transformer-based language models trained on millions of webpages, including OpenAI's ChatGPT and Meta's L

                                • Mathematics for the adventurous self-learner | Neil Sainsbury

                                  For over six years now, I've been studying mathematics on my own in my spare time - working my way through books, exercises, and online courses. In this post I'll share what books and resources I've worked through and recommend and also tips for anyone who wants to go on a similar adventure. Self-studying mathematics is hard - it's an emotional journey as much as an intellectual one and it's the k

                                  • Generalizing Automatic Differentiation to Automatic Sparsity, Uncertainty, Stability, and Parallelism - Stochastic Lifestyle

                                    Automatic differentiation is a “compiler trick” whereby a code that calculates f(x) is transformed into a code that calculates f'(x). This trick and its two forms, forward and reverse mode automatic differentiation, have become the pervasive backbone behind all of the machine learning libraries. If you ask what PyTorch or Flux.jl is doing that’s special, the answer is really that it’s doing automa

                                    • Understanding Convolutions on Graphs

                                      Many systems and interactions - social networks, molecules, organizations, citations, physical models, transactions - can be represented quite naturally as graphs. How can we reason about and make predictions within these systems? One idea is to look at tools that have worked well in other domains: neural networks have shown immense predictive power in a variety of learning tasks. However, neural

                                      • Understanding Deep Learning (Still) Requires Rethinking Generalization – Communications of the ACM

                                        Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small gen

                                        • Trademarks in Open Source

                                          Contents: Introduction; Cases; Unmanaged Trademarks: Naked Licensing; FreecycleSunnyvale v. Freecycle Network; Discussion; Common Law Trademarks; Planetary Motion, Inc. v. Techsplosion, Inc.; Discussion; Fair Use Defense to Trademark Infringement: Nominative Use; Playboy Enters. v. Welles; Discussion; License Terms’ Bearing on Trademark Use; MIT; Discussion; BSD-3-Clause; Discussion; PHP-3.0; Discu

                                          • What Putin Fears Most | Journal of Democracy

                                            Forget his excuses. Russia’s autocrat doesn’t worry about NATO. What terrifies him is the prospect of a flourishing Ukrainian democracy. 22 February 2022 By Robert Person and Michael McFaul Russia’s invasion of Ukraine has begun. Russian president Vladimir Putin wants you to believe that it’s NATO’s fault. He frequently has claimed (including again in an address to the nation as this invasion comm

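                                            • Books on the history of statistics that can be read in Japanese (in progress: works on the history of statistics in Japan are omitted) - Tarotanのブログ

                                              For now, I will briefly introduce some books on the history of statistics that can be read in Japanese. I include not only specialist books written by historians of statistics but also lighter popular-science books and biographies. I list only books and do not cover papers. I also include translations and out-of-print books. As of now (January 31), I have not listed books on the history of statistics in Japan. I have had no formal training in history, including the history of statistics, and this does not go beyond a hobby, so the list is probably far from comprehensive. Also, only books written in Japanese are covered below. If you have books to recommend, please let me know at the Twitter account @BluesNoNo. Light reading for those who do not know much about statistics itself: Salsburg, D.S. [author], 竹内惠行・熊谷悦生 [translators] (2006: translation) 『統計学を拓いた異才たち』 日本経済新聞出版 (paperback edition in 2010, original title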

                                              • What We’ve Learned From A Year of Building with LLMs – Applied LLMs

                                                A practical guide to building successful LLM products, covering the tactical, operational, and strategic. Also published on O’Reilly Media in three parts: Tactical, Operational, Strategic (podcast). Also translated to Japanese (by Kazuya Kanno) It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. And they’

                                                • PyTorch Tutorial: How to Develop Deep Learning Models with Python - MachineLearningMastery.com

                                                  Predictive modeling with deep learning is a skill that modern developers need to know. PyTorch is the premier open-source deep learning framework developed and maintained by Facebook. At its core, PyTorch is a mathematical library that allows you to perform efficient computation and automatic differentiation on graph-based models. Achieving this directly is challenging, although thankfully, the mo

                                                  • China and Russia's domestic insanity - 4th letter from the FSB analyst

                                                    My translation of the 4th letter in the series from an active FSB analyst to Vladimir Osechkin. Written March 9th. As consequential as the 1st translated letter. Buckle up

                                                    • AI Timelines via Cumulative Optimization Power: Less Long, More Short — LessWrong

                                                      The general trend is clear: larger lifetime compute enables systems of greater generality and capability. Generality and performance are both independently expensive, as an efficient general system often ends up requiring combinations of many specialist subnetworks. BNNs and ANNs both implement effective approximations of Bayesian learning[29]. Net training compute then measures the total intra-li

                                                      • Growing Neural Cellular Automata

                                                        Growing models were trained to generate patterns, but don't know how to persist them. Some patterns explode, some decay, but some happen to be almost stable or even regenerate parts! [experiment 1] Persistent models are trained to make the pattern stay for a prolonged period of time. Interestingly, they often develop some regenerative capabilities without being explicitly instructed to do so [exper

                                                        • Committees Paper

                                                          Author's note 42 years after publication: Perhaps this paper's most remarkable feature is that it made it to publication with its thesis statement in the third-last paragraph. To save you the trouble of wading through 45 paragraphs to find the thesis, I'll give an informal version of it to you now: Any organization that designs a system (defined more broadly here than just information systems) wil

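                                                          • For those left uneasy by 『確率思考の戦略論』: the NBD model

                                                            Purpose of this article: If you are reading this article, you have presumably read this book. If you are a data scientist, data analyst, or machine learning engineer, you will naturally be curious about what hypotheses the data analysis was carried out under. When the authors boldly declare that this probabilistic structure is "the essence of market structure," one cannot help wondering what exactly it is. I could not help wondering myself, so I read the book, but something did not sit right with me, and that unease had already set in by Chapter 1. I spent quite a lot of time resolving it, so I am retracing my thinking here in the hope that it helps future readers. Chapter 1, Section 4: "The essence of market structure is always the same." Summary of the authors' claim: Now, the first claim is that "the essence of market structure is always the same"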

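                                                            • Audio of the celebrated lectures behind 『ファインマン物理学』 (The Feynman Lectures on Physics) has been released - とね日記

                                                              Richard Feynman, The Feynman Lectures on Physics. If asked who the most famous physicist of the 20th century is, the answer is of course Einstein. But if asked who the most popular physicist of the 20th century was, most physics students would answer "Richard Feynman!" If you do not know Feynman, please read the Wikipedia article introducing him. The Feynman Lectures on Physics, compiled from the celebrated two-year lecture course that Feynman gave to first- and second-year students at the California Institute of Technology from 1961 to 1963, is known as a highly distinctive textbook and has been translated not only into Japanese but into many other languages. At the end of 2013, the English New Millennium Edition was made freely available online. (Related article: "Feyn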

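                                                              • Will Japan once again lose national interest because of rumors about the novel pneumonia?

                                                                On February 25, 2020, the Japanese government announced its "Basic Policies for Novel Coronavirus Disease Control" and is working to prevent infection, but criticism of the decision not to ban entry by Chinese nationals seems to persist. However, no quantitative indication has been given of how much infection-prevention effect could actually be expected from banning entry by Chinese nationals. So I would like to make a rough estimate of how large that effect would be, based on data from the [COVID-19 special website] of Johns Hopkins University in the US. Spatial distribution of confirmed cases: The following table shows data on the novel coronavirus infection situation by region, as released by Johns Hopkins University at 10:40 pm on February 27, 2020. The spatial distribution of confirmed cases in mainland China spreads outward roughly evenly in all directions from Wuhan in Hubei Province, which is considered the source of infection. As with the diffusion of a substance, for provinces closer to Wuhan, the confirmed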

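                                                                • Quantitative evaluation of name matching (nayose) and the Group Sequential Test - Sansan Tech Blog

                                                                  Hello, I am 上田, a backend engineer in the Nayose group of the Sansan Engineering Unit, Technical Division. I usually develop a data name-matching (nayose) service. As described on this page, name matching at Sansan means grouping together records that exist as separate data but refer to the same company or person. As covered in the article below, last time I introduced SPRT (Sequential Probability Ratio Test), a sequential testing method that addresses the problems of fixed-sample-size tests in the statistical hypothesis testing used to quantitatively evaluate name-matching algorithms. Because SPRT has problems of its own, this time I introduce a sequential test called the Group Sequential Test, which has properties that are valuable in practice. buildersbox.corp-sansan.com This article's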

                                                                  • How Recent Google Updates Punish Good SEO: 50-Site Case Study - Zyppy SEO Consulting

                                                                    SEOs need to rethink “over-optimization.” Are recent Google updates now targeting SEO practices to demote informational sites that are “too optimized?” Using metrics provided by Ahrefs (thank you, Patrick Stox!) and collecting thousands of data points across impacted sites, I conducted a 50-site case study to look for answers. To begin w

                                                                    • Untitled/unsorted collection of math notes

                                                                      Dennis Yurichev, May 18, 2023. Contents: 1 Unsorted parts; 1.1 Fencepost error / off-by-one error; 1.2 GCD and LCM; 1.2.1

                                                                      • FermiNet: Quantum physics and chemistry from first principles

                                                                        Published 22 August 2024. Authors: David Pfau and James Spencer. Using deep learning to solve fundamental problems in computational quantum chemistry and explore how matter interacts with light. Note: This blog was first published on 19 October 2020. Following the publication of our breakthrough work on excited states in Science on

                                                                        • ZX Spectrum Raytracer - Gabriel Gambetta

                                                                          I love raytracers; in fact I’ve written half a book about them. Probably less known is my love for the ZX Spectrum, the 1982 home computer I grew up with, and which started my interest in graphics and programming. This machine is so ridiculously underpowered by today’s standards (and even by 1980s standards) that the inevitable question is, to what extent could I port the Computer Graphics from Scra

                                                                          • Safer Usage Of C++

                                                                            This document is PUBLIC. Chromium committers can comment on the original doc. If you want to comment but can’t, ping palmer@. Thanks for reading! Google-internal short link: go/safer-cpp Authors/Editors: adetaylor, palmer Contributors: ajgo, danakj, davidben, dcheng, dmitrig, enh, jannh, jdoerrie, joenotcharles, kcc, markbrand, mmoroz, mpdenton, pkasting, rsesek, tsepez, awhalle

                                                                            • A critical review of Marketing Mix Modeling — From hype to reality

                                                                              Context: Most companies spend large chunks of their budget on marketing, often without knowing the return on that investment. Marketing Mix Modeling has been promoted as the one method to shed light on the effect of marketing. Not quite coincidentally, this is mainly supported by people that have a self-serving interest to advocate MMM. Opposing standpoints are few and far between. In this post, I

                                                                              • Reconsidering evidence of moral contagion in online social networks - Nature Human Behaviour

                                                                                The digitalization of society raises many substantive questions (see, for example, refs. 1,2,3). At the same time, however, it provides unmistakable methodological opportunities for social science research. For all of the interactions that take place online, such as communications between social media users, digital data traces are left behind. Not only do these data traces capture naturalistic be

                                                                                • How bad are search results? Let's compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT

                                                                                  Marginalia does relatively well by sometimes providing decent but not great answers and then providing no answers or very obviously irrelevant answers to the questions it can't answer, with a relatively low rate of scams, lower than any other search engine (although, for these queries, ChatGPT returns zero scams and Marginalia returns some). Interestingly, Mwmbl lets users directly edit search res