www.benfrederickson.com
One of the cool new features in py-spy is the ability to profile native Python extensions written in languages like C, C++ or Cython. Almost all other Python profilers[1] only show program activity that is in pure Python code, and native code will instead show up as spending time in the line of Python that calls the native function. Using native profiling tools like perf can get you a sense of wha
One challenge that recommender systems face is in quickly generating a list of the best recommendations to show for the user. These days many libraries can quickly train models that can handle millions of users and millions of items, but the naive solution for evaluating these models involves ranking every single item for every single user which can be extremely expensive. As an example, my implic
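The naive evaluation described in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes a factorization-style model where a user's score for an item is the dot product of two small vectors, and the names `recommend_naive`, `user_factors` and `item_factors` are hypothetical, not taken from any library.

```python
import heapq


def recommend_naive(user_factors, item_factors, k=2):
    """Score every item for one user and return the ids of the k best.

    Assumption: the score is a plain dot product between the user's
    vector and each item's vector.
    """
    scores = (
        (sum(u * v for u, v in zip(user_factors, item)), item_id)
        for item_id, item in enumerate(item_factors)
    )
    # nlargest keeps only k candidates, but every item is still scored once,
    # which is the expensive part when there are millions of items.
    return [item_id for _, item_id in heapq.nlargest(k, scores)]


# Hypothetical toy data: four items and one user in a 2-dimensional space.
items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
user = [1.0, 0.2]
top = recommend_naive(user, items, k=2)
```

Repeating this for every user costs time proportional to users times items, which is why the brute-force approach becomes expensive at scale.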
I’ve been digging into GitHub data recently, and I thought it would be fun to use that data to figure out exactly where the world’s software developers live and then to visualize the results interactively using D3. In a previous post, I wrote about how an individual’s GitHub profile is a noisy and unreliable indicator of programming talent. For this post though I’m aggregating GitHub profiles toge
One of the things I’m working on right now is a project that’s aggregating data found in developers’ GitHub profiles. Since there are a couple of problems with using GitHub profiles as a data source like this, I wanted to first list out some of the issues I have with trying to assess developers by looking only at their GitHub contributions. One common misuse of GitHub profile data is in trying to f
I’ve recently become obsessed with the sheer amount of development activity happening on sites like GitHub. As a first project on working with this data, I thought it would be fun to rank all the programming languages by counting how many people on GitHub use each language. I’m using the GitHub Archive and GHTorrent projects as data sources for this analysis. The GitHub Archive provides a record o
A site’s robots.txt file advises the web crawlers of the world which files they can and can’t download. It acts as the first gatekeeper of the internet: unlike blocking the response, it lets you stop requests to your site before they happen. The interesting thing about these files is that they lay out how webmasters intend automated processes to access their websites. While it’s easy for a bot
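As a small sketch of how a well-behaved crawler might consult these rules, Python's standard library ships `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent and the example.com URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite bot checks each URL before requesting it.
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")
```

Nothing enforces the answer; the file is advisory, so honoring it is up to the crawler.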
Numerical Optimization is one of the central techniques in Machine Learning. For many problems it is hard to figure out the best solution directly, but it is relatively easy to set up a loss function that measures how good a solution is - and then minimize that function over its parameters to find the solution. I ended up writing a bunch of numerical optimization routines back when I was first tryin
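To make the "set up a loss function, then minimize it" idea concrete, here is a minimal sketch using plain gradient descent. Gradient descent is chosen here only as a simple illustration; the excerpt does not say which routines the author wrote, and the loss function, step size and iteration count are assumptions.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from `start`."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def loss(p):
    # Hypothetical loss: squared distance from the point (3, -1).
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


def loss_grad(p):
    # Analytic gradient of the loss above.
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]


solution = gradient_descent(loss_grad, [0.0, 0.0])
```

For this convex toy loss the iterates settle near (3, -1); harder losses generally need more careful step-size control than a fixed learning rate.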
In a previous post I wrote about how to build a ‘People Who Like This Also Like …’ feature for displaying lists of similar musicians. My goal was to show how simple Information Retrieval techniques can do a good job calculating lists of related artists. For instance, using BM25 distance on The Beatles shows the most similar artists being John Lennon and Paul McCartney. One interesting technique I
While packed full of information, this doesn’t really provide any context for the exact relations between these cities. A much more useful visualization would be a map of these points. The challenge then is to produce coordinates for each item that best approximate the distances in the table. This type of problem is well solved by a set of techniques called Multidimensional Scaling (MDS). There is
I haven’t done any real work on learning JavaScript and D3.js since my last attempt a couple months back. To keep at it, I thought I’d try using D3.js to visualize a simple algorithm: finding the largest couple of items in a list. This problem comes up all the time when doing search and recommendation type tasks. Every time you query a search engine, it has to find the couple best scored results i
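One common way to find the largest few items in a list is a bounded min-heap, sketched below in Python. This is an illustrative sketch rather than the approach visualized in the post, and the function name `largest_k` is hypothetical.

```python
import heapq


def largest_k(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # The heap root is the smallest of the current best k; replace it.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


top3 = largest_k([5, 1, 9, 3, 7, 8, 2], 3)
```

The heap never holds more than k items, so the work is roughly n log k instead of the n log n needed to sort the whole list.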
The hard part here isn’t actually displaying the diagram with D3 - it’s calculating the positions of each set such that the areas of each region are proportional to the size of the set intersections. The problem is that even for only 3 sets, it’s not always possible to position everything so that everything is area proportional to the set sizes. Try changing A=B=C=8, AB=AC=4 and BC=0 in the above e
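For just two sets the positioning problem is always solvable, and a rough sketch looks like this: compute the overlap area of two circles from the distance between their centres, then bisect on that distance until the overlap matches the target intersection size. This is an assumption-laden illustration, not the post's actual layout code, and the function names are hypothetical.

```python
import math


def circle_overlap(r1, r2, d):
    """Area of the intersection of two circles whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0  # circles do not touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle lies inside the larger
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt(
        (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    )
    return a1 + a2 - tri


def distance_for_overlap(r1, r2, target, tol=1e-9):
    """Bisect on the centre distance until the overlap area matches target."""
    lo, hi = abs(r1 - r2), r1 + r2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if circle_overlap(r1, r2, mid) > target:
            lo = mid  # too much overlap: move the circles apart
        else:
            hi = mid
    return (lo + hi) / 2


# Two sets of size 8 whose intersection has size 4 (areas stand in for sizes).
radius = math.sqrt(8 / math.pi)
d = distance_for_overlap(radius, radius, 4.0)
```

With three or more sets the pairwise distances found this way can conflict with one another, which is the difficulty the excerpt describes.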