On the topic of batching, here's a section from a GLIntercept log (highly recommended, by the way):
```
glBindTexture(GL_TEXTURE_2D,39)
glDrawArrays(GL_TRIANGLE_FAN,19200,4) VP=36 FP=39 Time= 3us
glDrawArrays(GL_TRIANGLE_FAN,19254,3) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19248,3) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19290,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19350,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19306,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19326,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19318,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19366,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19370,6) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19151,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19145,6) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19139,6) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19136,3) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19130,6) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19108,3) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19103,5) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19096,7) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,20014,5) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19989,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19998,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,20002,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,20010,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,20006,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19993,5) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19904,4) VP=36 FP=39
glDrawArrays(GL_TRIANGLE_FAN,19916,3) VP=36 FP=39
```
Clearly some extra work is being done in the first glDrawArrays call, while the rest take next to no time. Each subsequent call certainly has some small cost, but it's too low to register in the log; they're not free. That extra work could be lazily applied state changes, the driver switching into some internal "drawing stuff now" mode, or anything else.
The upshot is that you want to draw as many consecutive primitives with the same state as possible. Drawing things out of state order is a bad idea, because individual draw calls with state changes between them each pay the same kind of setup cost:
```
glActiveTextureARB(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D,79)
glActiveTextureARB(GL_TEXTURE1)
glBindTexture(GL_TEXTURE_2D,184)
glDrawArrays(GL_TRIANGLE_FAN,23964,4) VP=36 FP=39 Time= 3us
glActiveTextureARB(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D,52)
glDrawArrays(GL_TRIANGLE_FAN,19300,6) VP=36 FP=39 Time= 3us
glBindTexture(GL_TEXTURE_2D,51)
glDrawArrays(GL_TRIANGLE_FAN,20441,4) VP=36 FP=39 Time= 4us
glDrawArrays(GL_TRIANGLE_FAN,20417,4) VP=36 FP=39
glBindTexture(GL_TEXTURE_2D,49)
glDrawArrays(GL_TRIANGLE_FAN,21515,6) VP=36 FP=39 Time= 2us
glDrawArrays(GL_TRIANGLE_FAN,21505,6) VP=36 FP=39
glBindTexture(GL_TEXTURE_2D,48)
glDrawArrays(GL_TRIANGLE_FAN,18255,4) VP=36 FP=39 Time= 2us
```
So in the first example the 27 draw calls shown cost about 3us in total (there were actually about three times that many calls in this section of the log, but it would be silly to paste them all). In the second, 7 draw calls cost about 14us (3 + 3 + 4 + 2 + 2), or roughly 2us per call against a small fraction of that in the first example. Note also that the second glDrawArrays in each pair sharing a texture didn't incur a logged setup cost.
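One way to get draws into state order is to collect them first and sort by state before submitting. The sketch below is a minimal illustration, assuming a hypothetical `draw_cmd` record holding a texture name plus the `first`/`count` arguments for glDrawArrays; the names are made up for this example and texture is the only state considered. It also counts how many glBindTexture calls a list would need if you only bind when the texture changes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-surface draw record: texture name plus the
   glDrawArrays(GL_TRIANGLE_FAN, first, count) arguments. */
typedef struct {
    unsigned int texture;
    int first;
    int count;
} draw_cmd;

static int compare_by_texture(const void *a, const void *b)
{
    const draw_cmd *x = a;
    const draw_cmd *y = b;
    if (x->texture < y->texture) return -1;
    if (x->texture > y->texture) return 1;
    return 0;
}

/* Sort so that all draws sharing a texture become consecutive.
   qsort is not stable, so order within a texture group may change. */
void sort_draws_by_texture(draw_cmd *cmds, size_t n)
{
    qsort(cmds, n, sizeof *cmds, compare_by_texture);
}

/* Number of glBindTexture calls needed to submit the list in its
   current order, binding only when the texture changes. */
size_t count_texture_binds(const draw_cmd *cmds, size_t n)
{
    size_t binds = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || cmds[i].texture != cmds[i - 1].texture)
            binds++;
    }
    return binds;
}
```

At submit time you'd walk the sorted list, call glBindTexture only when `texture` differs from the previous entry, and issue glDrawArrays for each entry, so every texture pays the setup cost once instead of once per interleaving. If draw order matters (blended surfaces, for example), a stable sort or a separate pass for those surfaces would be needed.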
glDrawElements is interesting: I'm getting far more triangles into a single draw call (1041 indices drawn as GL_TRIANGLES is 347 triangles), but the setup cost is much higher too:
```
glDrawElements(GL_TRIANGLES,1041,GL_UNSIGNED_SHORT,0x35f4) VP=15 FP=16 Time= 23us
```
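Setting up a glDrawElements call from a run of triangle fans means building a GL_TRIANGLES index list. A fan of n vertices v0..v(n-1) is the triangles (v0, vi, vi+1) for i = 1..n-2, which keeps the fan's winding. The sketch below assumes all fans live in one shared vertex array (as the offsets in the first log suggest) and that every vertex index fits in 16 bits, matching the GL_UNSIGNED_SHORT in the log; `fan_range` and `fans_to_triangle_indices` are hypothetical names, and `unsigned short` stands in for GLushort so it compiles without GL headers.

```c
#include <stddef.h>

/* Hypothetical record for one glDrawArrays(GL_TRIANGLE_FAN, first, count). */
typedef struct {
    int first;
    int count;
} fan_range;

/* Writes GL_TRIANGLES indices for a run of fans into out[].
   Each fan of n >= 3 vertices contributes 3 * (n - 2) indices.
   Returns the number of indices written, or 0 if out_capacity
   is too small to hold them all. */
size_t fans_to_triangle_indices(const fan_range *fans, size_t nfans,
                                unsigned short *out, size_t out_capacity)
{
    size_t needed = 0;
    for (size_t f = 0; f < nfans; f++)
        if (fans[f].count >= 3)
            needed += 3 * (size_t)(fans[f].count - 2);
    if (needed > out_capacity)
        return 0;

    size_t k = 0;
    for (size_t f = 0; f < nfans; f++) {
        int v0 = fans[f].first;
        /* Triangle (v0, v0+i, v0+i+1) for i = 1..count-2. */
        for (int i = 1; i + 1 < fans[f].count; i++) {
            out[k++] = (unsigned short)v0;
            out[k++] = (unsigned short)(v0 + i);
            out[k++] = (unsigned short)(v0 + i + 1);
        }
    }
    return k;
}
```

The result could then be drawn in one call, for example `glDrawElements(GL_TRIANGLES, (GLsizei)n, GL_UNSIGNED_SHORT, indices)` from a client-side array. Building the list is CPU work your program pays every time the set of fans changes, which is exactly the cost weighed in the next paragraph.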
You'll probably need to evaluate each case individually and see where gains can be made or lost. Sometimes a whole bunch of glDrawArrays calls comes out ahead of a single glDrawElements call, particularly for something like brush surfaces, where you don't really know at creation time how many of them a given scene will draw or in what order. You could set things up for a nice glDrawElements call only to find that you've got a relatively small number of surfaces, and the cost of building the index list in your program plus the higher setup cost of glDrawElements actually makes it run slower.
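The trade-off can be written down as a deliberately crude cost model and evaluated per batch. This is a hypothetical sketch, not a measured rule: every field in `draw_costs` is an assumption you would fill in from your own timings (GLIntercept logs, for instance), and the model ignores things like glDrawElements cost growing with index count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost model; all figures are user-measured estimates. */
typedef struct {
    double arrays_batch_setup_us; /* first glDrawArrays after a state change */
    double arrays_call_us;        /* each further glDrawArrays in the batch */
    double elements_setup_us;     /* one glDrawElements call */
    double build_index_us;        /* CPU cost to build indices, per surface */
} draw_costs;

/* Estimated cost of drawing nsurf surfaces as separate glDrawArrays calls. */
static double cost_draw_arrays(const draw_costs *c, size_t nsurf)
{
    if (nsurf == 0)
        return 0.0;
    return c->arrays_batch_setup_us + (double)(nsurf - 1) * c->arrays_call_us;
}

/* Estimated cost of building an index list and issuing one glDrawElements. */
static double cost_draw_elements(const draw_costs *c, size_t nsurf)
{
    if (nsurf == 0)
        return 0.0;
    return c->elements_setup_us + (double)nsurf * c->build_index_us;
}

/* True when the model predicts a single glDrawElements is cheaper. */
bool prefer_draw_elements(const draw_costs *c, size_t nsurf)
{
    return cost_draw_elements(c, nsurf) < cost_draw_arrays(c, nsurf);
}
```

With the setup times from the logs above (about 3us for a glDrawArrays batch and 23us for the glDrawElements call) and placeholder values of 0.1us per extra glDrawArrays and 0.05us per surface for index building, this model picks glDrawArrays for 10 surfaces and glDrawElements for 1000. The placeholders are guesses; the crossover point moves entirely with what you actually measure.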