"Creating dozens of light sources simultaneously on screen at once is basically not doable unless you have Mantle or DirectX 12. Guess how many light sources most engines support right now? 20? 10? Try 4. Which is fine for a relatively static scene."

"For my Master's degree project at uni I had a demo written in OpenGL with over 500 dynamic lights, running at 60fps on a GTX 580. You could probably add a couple thousand and it would be fine too."

"Every time I hear someone say “but X allows you to get close to the hardware” I want to shake them. None of this has to do with getting close to the hardware."

I work with console devkits every single day, and the reason we can squeeze so much performance out of relatively low-end hardware is that we get to make calls which you can't make on PC. A DirectX call to switch a texture takes a few thousand clock cycles. A low-level hardware call available on the PlayStation platform will do the same texture switch in a few dozen instructions. The numbers are against DirectX, and that's why Microsoft is slowly letting devs access the GPU on the Xbox One without the DirectX overhead.

Moreover, the actual GPU doesn't work like that either. GPUs do not have the capability to run more than one work-unit-thing at a time. They have thousands of cores, yes, but much more in a SIMD-style fashion than as a bunch of parallel threads. They cannot split those cores up into logical chunks that can then individually do independent things. The whole post isn't just oversimplified, it's just wrong.

The point of Mantle, of Metal, and of DX12 is to expose more of the low-level guts. The key thing is that those low-level guts aren't that low level. The threading improvements come because you can build the GPU objects on different threads, not because you can talk to a bunch of GPU cores from different threads. The majority of CPU time these days in OpenGL/DirectX is spent validating and building state objects. DX12 and the others now let you take lifecycle control of those objects: re-use them across frames, build them on multiple threads, etc. Then talking to the GPU is a simple matter of handing over an already-validated, immutable object.

You still have to be aware of it when optimizing the shaders and workloads, though. On consoles, where the hardware is fixed, this is easily profiled. The GPU is creating threads and tasks internally, and it's not always easy to balance this workload so that no part of the GPU becomes saturated while later parts of the chip's pipeline sit idle waiting for work. The PowerVR chips we're working with have dozens and dozens of different profile metrics corresponding to the different areas of their pipeline, each one being a potential bottleneck. You could do something as silly as rendering a ball with 12k vertices instead of 24, expecting the vertex processing to be much slower, but after profiling you find out it's the fragment part lagging way behind because the data sequencer is overloaded trying to generate fragment tasks. In both cases you're rendering about the same number of pixels.