mlwave.com
Model ensembling is a very powerful technique to increase accuracy on a variety of ML tasks. In this article I will share my ensembling approaches for Kaggle Competitions. For the first part we look at creating ensembles from submission files. The second part will look at creating ensembles through stacked generalization/blending. I answer why ensembling reduces the generalization error. Finally I
Let’s take a look at the perceptron: the simplest artificial neuron. This article goes from a concept devised in 1943 to a Kaggle competition in 2015. It shows that a single artificial neuron can get 0.95 AUC on an NLP sentiment analysis task (predicting if a movie review is positive or negative). In logic there are no morals. Everyone is at liberty to build up his own logic, i.e., his own form of
A little-known feature of Vowpal Wabbit is Online Latent Dirichlet Allocation. This allows you to do topic modelling on millions of documents in under an hour. Under construction. Come back soon. Latent Dirichlet Allocation Topic modeling Probabilistic topic models are useful for uncovering the underlying semantic structure of a collection of documents. Latent Dirichlet Allocation (LDA) is a hie
There are new and exciting commercial opportunities in the data science space. We take a look at the data science start-ups from the latest Y Combinator batch. Niches Customer analytics Framed Data. Framed Data is a good example of a new data science company: it uses machine learning to predict customer churn, an attractive problem to attack, since no company wants to lose customers. We wish them muc
I recently competed in a CrowdAnalytix competition to predict worsening symptoms of COPD. Our team (Marios Michailidis, Phil Culliton, Vivant Shen and me) finished in the money. Here is how we managed this. COPD COPD (Chronic Obstructive Pulmonary Disease) is a lung disease that makes it hard to breathe. People with COPD experience exacerbation: A sudden worsening of the symptoms. Symptoms of COPD
It’s been a year since I joined Kaggle for my first competition. Back then I didn’t know what an Area Under the Curve was. How did I manage to predict my way to Kaggle Master? Early start Toying with datasets and tools I was already downloading datasets from Kaggle purely for my own entertainment and study before I started competing. Kaggle is one of the few places on the internet where you can ge
Wisdom of the crowds and ensemble machine learning techniques are similar in principle. Could insights into group learning provide insights into machine learning and vice versa? In this article we will touch upon a variety of more (or less) related concepts and try to build an ensemble view of our own. “Of all the offspring of Time, Error is the most ancient, and is so old and familiar an acquaintance
Good clicklog datasets are hard to come by. Luckily CriteoLabs released a week’s worth of data — a whopping ~11GB! — for a new Kaggle contest. The task is to predict the click-through-rate for ads. We will use online machine learning with Vowpal Wabbit to beat the logistic regression benchmark and get a nr. 1 position on the leaderboard. Updates Demo with data from this contest added to Vowpal Wab
Kaggle is hosting a contest where the task is to predict visual stimuli from magnetoencephalography (MEG) recordings of human brain activity. A subject is presented with a stimulus (a human face or a distorted face) and the concurrent brain activity is recorded. The relation between the recorded signal and the stimulus may provide insights into the underlying mental process. We use Vowpal Wabbit to beat
Another Kaggle contest means another chance to try out Vowpal Wabbit. This time on a data set of nearly 350 million rows. We will discuss feature engineering for the latest Kaggle contest and how to get a top 3 public leaderboard score (~0.59347 AUC). A short competition description The competition is to predict repeat buyers (those who redeem a coupon and purchase that product afterwards). For th
Kaggle is hosting another cool knowledge contest, this time on sentiment analysis of the Rotten Tomatoes Movie Reviews data set. We are going to use Vowpal Wabbit to test the waters and get our first top 10 leaderboard score. Contest Description Data The Rotten Tomatoes movie review data set is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee [pdf]. In