wesmckinney.com
About the Open Edition: The 3rd edition of Python for Data Analysis is now available as an “Open Access” HTML version on this site (https://wesmckinney.com/book), in addition to the usual print and e-book formats. This edition was initially published in August 2022 and will have errata fixed periodically over the coming months and years. If you encounter any errata, please report them here. In general …
Announcing Ursa Labs: an innovation lab for open source data science. Funding open source software development is a complicated subject. I’m excited to announce that I’ve founded Ursa Labs (https://ursalabs.org), an independent development lab with the mission of innovation in data science tooling. I am initially partnering with RStudio and Two Sigma to assist me in growing and maintaining the lab …
The 2nd Edition of my book was released digitally on September 25, 2017, with print copies shipping a few weeks later. The 1st Edition was published in October 2012. Where to buy? 2nd Edition resources: book data and code notebooks at https://github.com/wesm/pydata-book. What's new in the 2nd Edition? Updated for Python 3.6; updated for the latest pandas (0.20.3); revamped intro chapters, including abridged …
This post is the first of many to come on Apache Arrow, pandas, pandas2, and the general trajectory of my work in recent times and into the foreseeable future. It is a bit of a read and fairly technical overall, but if you're interested I encourage you to take the time to work through it. In this post I hope to explain as concisely as I can some of the key problems with pandas’s internals and how I’ve …
Over the last year, I have been working with the Apache Parquet community to build out parquet-cpp, a first-class C++ Parquet file reader/writer implementation suitable for use in Python and other data applications. Uwe Korn and I have built the Python interface and integration with pandas within the Python codebase (pyarrow) in Apache Arrow. This blog is a follow-up to my 2017 Roadmap post. Design …
There have been many Python libraries developed for interacting with the Hadoop File System, HDFS, via its WebHDFS gateway as well as its native Protocol Buffers-based RPC interface. I'll give you an overview of what's out there and show some engineering I've been doing to offer a high-performance HDFS interface within the developing Arrow ecosystem. This blog is a follow-up to my 2017 Roadmap post. …
2017 is shaping up to be an exciting year in Python data development. In this post I’ll give you a flavor of what to expect from my end. In follow-up blog posts, I plan to go into more depth about how all the pieces fit together. I have been a bit delinquent in blogging in 2016, since my hands have been quite full doing development and working on the 2nd edition of Python for Data Analysis. I am g…
In this post I discuss some recent work in Apache Arrow to accelerate converting to pandas objects from general Arrow columnar memory. Challenges constructing pandas DataFrame objects quickly: one of the difficulties in fast construction of pandas DataFrame objects is that the “native” internal memory structure is more complex than a dictionary or list of one-dimensional NumPy arrays. I won’t go into …
I’m super excited to be involved in the new open source Apache Arrow community initiative. For Python (and R, too!), it will help enable: substantially improved data access speeds; closer-to-native-performance Python extensions for big data systems like Apache Spark; and new in-memory analytics functionality for nested / JSON-like data. There are plenty of places you can learn more about Arrow, but this post …
After some unanticipated media leaks (here and here), I was very excited to finally share that my team and I are joining Cloudera. You can find all the concrete details in those articles, but I wanted to give a bit more intimate perspective on the move and what we see in the future inside Cloudera Engineering. Chang She and I conceived DataPad in 2012 while we were building out pandas and hel…
TL;DR: I’ve finally gotten around to building the high-performance parser engine that pandas deserves. It hasn’t been released yet (it’s in a branch on GitHub), but it will be after I give it a month or so for any remaining buglets to shake out. A project I’ve put off for a long time is building a high-performance, memory-efficient file parser for pandas. The existing code up through and including the imm…
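For context, the compiled parser that grew out of this work is what pandas exposes today through read_csv's C engine; a minimal usage sketch (the CSV data below is invented):

```python
import io
import pandas as pd

csv_data = io.StringIO("a,b,c\n1,2.5,x\n3,4.5,y\n")

# engine="c" selects the compiled, memory-efficient parser
# (it is also the default when the options used permit it).
df = pd.read_csv(csv_data, engine="c")
assert df.shape == (2, 3)
```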
Bio: I am an entrepreneur and open source software developer focusing on analytical computing. I am currently a Principal Architect at Posit PBC. I co-founded Voltron Data and now serve on its advisory board. I created or co-created the pandas, Apache Arrow, and Ibis projects. I am a Member of The ASF, and I have authored three editions of Python for Data Analysis. In the past, I was with Ursa Compu…
Do you know how fast your code is? Is it faster than it was last week? Or a month ago? How do you know if you accidentally made a function slower through changes elsewhere? Unintentional performance regressions are extremely common in my experience: it’s hard to unit test the performance of your code. Over time I have gotten tired of playing the game of “performance whack-a-mole”. Thus, I started hacking …
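The core idea, timing a fixed workload so that slowdowns surface when the number is compared across commits, can be sketched with the standard library alone (the function below is a made-up example, not code from the post):

```python
import timeit

def build_string(n):
    # Deliberately naive string concatenation -- the kind of code whose
    # performance can quietly regress as a codebase evolves.
    s = ""
    for i in range(n):
        s += str(i)
    return s

# Time a fixed workload; recording this number per commit is the essence
# of catching unintentional performance regressions automatically.
elapsed = timeit.timeit(lambda: build_string(10_000), number=5)
print(f"build_string(10_000) x5: {elapsed:.4f} s")
```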
Discussion thread on Hacker News. So, this post is a bit of a brain dump on rich data structures in Python and what needs to happen in the very near future. I care about them for statistical computing (I want to build a statistical computing environment that trounces R) and financial data analysis (all evidence leads me to believe that Python is the best all-around tool for the finance space). Other …