Release Highlights

Easier Application Porting
- Share GPUs across multiple threads
- Use all GPUs in the system concurrently from a single host thread
- No-copy pinning of system memory, a faster alternative to cudaMallocHost()
- C++ new/delete and support for virtual functions
- Support for inline PTX assembly
- Thrust library of templated performance primitives such as sort, reduce, etc.
- NVIDIA Performance