Posts Tagged ‘embarrassingly parallel’

Batching, a basis for optimization

It’s interesting that in 3 distinct domains I’ve run across the same underlying basis for optimization:

  • Graphics: Modern GPUs depend heavily on batching primitives, typically triangles. Instead of rendering triangles individually, you get much better performance by batching primitives together in a list, sending it to the GPU via a single call, then letting the GPU pipelines to do their thing. Even before modern GPUs existed, graphics cards supported techniques like BitBlt which, essentially, performed operations on batched blocks of pixels, to take advantage of the embarrassingly parallel nature of computer graphics.
  • Relational Databases: Issuing lots of small queries can kill performance. A better strategy is, usually, to issue fewer queries, joining and returning as much data as possible with each query. Even if these queries becomes complex and costly, the cost of a complex query will usually still be less than the aggregate cost of numerous simpler queries.
  • Networking: The speed of light sucks… server and packet switching latencies make things worse. I usually assume ~50ms baseline latency to send a request packet + get a reply packet back from an internet server (I use the term “packet” loosely, referring to programmer-defined, application-level “packets” or messages, or whatever you like to call them, not necessarily TCP/IP packets). Note that this baseline is regardless of the amount of information in a packet and is bound by the travel time between server and client. So, to optimize communication and bandwidth, a good strategy is to transfer as much as possible per-packet instead of depending upon numerous requests/responses to/from a server, which would mean lots of packets and lots of wasted time.