Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors

Part I: The OpenCL 1.1 Language and API

Chapter 1: An Introduction to OpenCL
  What Is OpenCL, or . . . Why You Need This Book
  Our Many-Core Future: Heterogeneous Platforms
  Software in a Many-Core World
  Conceptual Foundations of OpenCL
  OpenCL and Graphics
  The Contents of OpenCL
  The Embedded Profile
  Learning OpenCL

Chapter 2: HelloWorld: An OpenCL Example
  Building the Examples
  HelloWorld Example
  Checking for Errors in OpenCL

Chapter 3: Platforms, Contexts, and Devices
  OpenCL Platforms
  OpenCL Devices
  OpenCL Contexts

Chapter 4: Programming with OpenCL C
  Writing a Data-Parallel Kernel Using OpenCL C
  Scalar Data Types
  Vector Data Types
  Other Data Types
  Derived Types
  Implicit Type Conversions
  Explicit Casts
  Explicit Conversions
  Reinterpreting Data as Another Type
  Vector Operators
  Qualifiers
  Keywords
  Preprocessor Directives and Macros
  Restrictions

Chapter 5: OpenCL C Built-In Functions
  Work-Item Functions
  Math Functions
  Integer Functions
  Common Functions
  Geometric Functions
  Relational Functions
  Vector Data Load and Store Functions
  Synchronization Functions
  Async Copy and Prefetch Functions
  Atomic Functions
  Miscellaneous Vector Functions
  Image Read and Write Functions

Chapter 6: Programs and Kernels
  Program and Kernel Object Overview
  Program Objects
  Kernel Objects

Chapter 7: Buffers and Sub-Buffers
  Memory Objects, Buffers, and Sub-Buffers Overview
  Creating Buffers and Sub-Buffers
  Querying Buffers and Sub-Buffers
  Reading, Writing, and Copying Buffers and Sub-Buffers
  Mapping Buffers and Sub-Buffers

Chapter 8: Images and Samplers
  Image and Sampler Object Overview
  Creating Image Objects
  Creating Sampler Objects
  OpenCL C Functions for Working with Images
  Transferring Image Objects

Chapter 9: Events
  Commands, Queues, and Events Overview
  Events and Command-Queues
  Event Objects
  Generating Events on the Host
  Events Impacting Execution on the Host
  Using Events for Profiling
  Events Inside Kernels
  Events from Outside OpenCL

Chapter 10: Interoperability with OpenGL
  OpenCL/OpenGL Sharing Overview
  Querying for the OpenGL Sharing Extension
  Initializing an OpenCL Context for OpenGL Interoperability
  Creating OpenCL Buffers from OpenGL Buffers
  Creating OpenCL Image Objects from OpenGL Textures
  Querying Information about OpenGL Objects
  Synchronization between OpenGL and OpenCL

Chapter 11: Interoperability with Direct3D
  Direct3D/OpenCL Sharing Overview
  Initializing an OpenCL Context for Direct3D Interoperability
  Creating OpenCL Memory Objects from Direct3D Buffers and Textures
  Acquiring and Releasing Direct3D Objects in OpenCL
  Processing a Direct3D Texture in OpenCL
  Processing D3D Vertex Data in OpenCL

Chapter 12: C++ Wrapper API
  C++ Wrapper API Overview
  C++ Wrapper API Exceptions
  Vector Add Example Using the C++ Wrapper API

Chapter 13: OpenCL Embedded Profile
  OpenCL Profile Overview
  64-Bit Integers
  Images
  Built-In Atomic Functions
  Mandated Minimum Single-Precision Floating-Point Capabilities
  Determining the Profile Supported by a Device in an OpenCL C Program

Part II: OpenCL 1.1 Case Studies

Chapter 14: Image Histogram
  Computing an Image Histogram
  Parallelizing the Image Histogram
  Additional Optimizations to the Parallel Image Histogram
  Computing Histograms with Half-Float or Float Values for Each Channel

Chapter 15: Sobel Edge Detection Filter
  What Is a Sobel Edge Detection Filter?
  Implementing the Sobel Filter as an OpenCL Kernel

Chapter 16: Parallelizing Dijkstra's Single-Source Shortest-Path Graph Algorithm
  Graph Data Structures
  Kernels
  Leveraging Multiple Compute Devices

Chapter 17: Cloth Simulation in the Bullet Physics SDK
  An Introduction to Cloth Simulation
  Simulating the Soft Body
  Executing the Simulation on the CPU
  Changes Necessary for Basic GPU Execution
  Two-Layered Batching
  Optimizing for SIMD Computation and Local Memory
  Adding OpenGL Interoperation

Chapter 18: Simulating the Ocean with Fast Fourier Transform
  An Overview of the Ocean Application
  Phillips Spectrum Generation
  An OpenCL Discrete Fourier Transform
  A Closer Look at the FFT Kernel
  A Closer Look at the Transpose Kernel

Chapter 19: Optical Flow
  Optical Flow Problem Overview
  Sub-Pixel Accuracy with Hardware Linear Interpolation
  Application of the Texture Cache
  Using Local Memory
  Early Exit and Hardware Scheduling
  Efficient Visualization with OpenGL Interop
  Performance

Chapter 20: Using OpenCL with PyOpenCL
  Introducing PyOpenCL
  Running the PyImageFilter2D Example
  PyImageFilter2D Code
  Context and Command-Queue Creation
  Loading to an Image Object
  Creating and Building a Program
  Setting Kernel Arguments and Executing a Kernel
  Reading the Results

Chapter 21: Matrix Multiplication with OpenCL
  The Basic Matrix Multiplication Algorithm
  A Direct Translation into OpenCL
  Increasing the Amount of Work per Kernel
  Optimizing Memory Movement: Local Memory
  Performance Results and Optimizing the Original CPU Code

Chapter 22: Sparse Matrix-Vector Multiplication
  Sparse Matrix-Vector Multiplication (SpMV) Algorithm
  Description of This Implementation
  Tiled and Packetized Sparse Matrix Representation
  Header Structure
  Tiled and Packetized Sparse Matrix Design Considerations
  Optional Team Information
  Tested Hardware Devices and Results
  Additional Areas of Optimization

Appendix: Summary of OpenCL 1.1
  The OpenCL Platform Layer
  The OpenCL Runtime
  Buffer Objects
  Program Objects
  Kernel and Event Objects
  Supported Data Types
  Vector Component Addressing
  Preprocessor Directives and Macros
  Specify Type Attributes
  Math Constants
  Work-Item Built-In Functions
  Integer Built-In Functions
  Common Built-In Functions
  Math Built-In Functions
  Geometric Built-In Functions
  Relational Built-In Functions
  Vector Data Load/Store Functions
  Atomic Functions
  Async Copies and Prefetch Functions
  Synchronization, Explicit Memory Fence
  Miscellaneous Vector Built-In Functions
  Image Read and Write Built-In Functions
  Image Objects
  Image Formats
  Access Qualifiers
  Sampler Objects
  Sampler Declaration Fields
  OpenCL Device Architecture Diagram
  OpenCL/OpenGL Sharing APIs
  OpenCL/Direct3D 10 Sharing APIs

Index
Preface
Acknowledgments
About the Authors
Chapter 1: Introduction to OpenCL
Chapter 2: Hello World: An OpenCL Example
Chapter 3: Platforms, Contexts, and Devices
Chapter 4: Programming with OpenCL C
Chapter 5: OpenCL C Built In Functions
Chapter 6: Programs and Kernels
Chapter 7: Buffers and sub-buffers
Chapter 8: Images and Samplers
Chapter 9: Events
Chapter 10: Interoperability with OpenGL
Chapter 11: Interoperability with DirectX
Chapter 12: OpenCL C++ API Bindings
Chapter 13: OpenCL ES Profile
Chapter 14: Image Histogram
Chapter 15: Sobel Edge Detection Filter
Chapter 16: Parallelizing Dijkstra's Single Source Shortest Path Graph Algorithm
Chapter 17: Cloth Simulation in the Bullet Physics SDK
Chapter 18: Simulating the Ocean with Fast Fourier Transform
Chapter 19: Optical Flow
Chapter 20: Using OpenCL with PyOpenCL
Chapter 21: Matrix Multiplication with OpenCL
Chapter 22: Sparse Matrix Multiplication
Appendix: Summary of OpenCL 1
Index
Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.
Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL's design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.
Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.
James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU computer vision and mediated reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.
Dan Ginsburg currently works at Children's Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL for accelerating neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and hand-held 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.
"Welcome to the new world of heterogeneous parallel programming with this authoritative and accessible guide to the complete OpenCL Programming Model."
-Professor Pat Hanrahan, Stanford University