This talk will introduce TeraCache, a new scalable cache for Spark that avoids both garbage collection (GC) and serialization overheads. Existing Spark caching options incur either significant GC overheads for large managed heaps over persistent memory or significant serialization overheads to place objects off-heap on large storage devices. Our analysis shows that: (1) serialization increases exe