
Optimisation/Performance of T3D with many objects


JackStone


I am not sure if this is the right forum for this, but I am working on a project which will, eventually, allow users to create complex objects such as buildings, vehicles, or machinery, from many smaller objects. I am hesitant to compare my work to other projects for fear of making it look like I am making a knock-off, but if you imagine "Kerbal Space Program", then that is the general type of concept that I am going for.


However, I just conducted some tests in T3D regarding this, and, as I suspected, there is a problem.


As I discussed on my blog, I noticed that even with a similar polycount, the FPS is vastly reduced as object count increases.


For my tests, I did this:


I created some spheres in a modelling tool, with a known number of polys. One high res sphere, with 10082 polys, one medium resolution at 1058, and one low res at 105.


I then wrote three drawing functions, which I executed separately:


Test 1: Draw 512 of the high res spheres (512*10082 = 5,161,984 polys)


Test 2: Draw 4,913 of the medium res spheres (4913*1058 = 5,197,954)


Test 3: Draw 46,656 of the low res spheres (46,656*105 = 4,898,880)


Since the polycount is quite similar in all cases, the FPS should be similar too, but it isn't. With nothing at all rendering, the FPS is about 200. After Test 1 (512 spheres) the FPS is about 50, with all polys in view. However, Test 2, in the same conditions, produces an FPS of 9.3, and Test 3 wasn't really conclusive due to issues drawing all of those objects, but it was giving me about 5-9 FPS, depending on the angle from which I looked at the objects.


To go from 50 FPS down to 9.3 with a similar polycount indicates a serious problem with my concept.
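
As a rough check of the numbers above, here is a minimal Python sketch. It assumes one draw call per object (no batching, no instancing, one material per sphere), which is a simplification and not a measurement of what T3D actually issues. It shows that the polygon totals differ by only a few percent while the object count grows by a factor of about 91.

```python
# Sphere tests from the post: (label, object count, polys per object).
tests = [
    ("Test 1 (high res)", 512, 10082),
    ("Test 2 (medium res)", 4913, 1058),
    ("Test 3 (low res)", 46656, 105),
]


def summarize(tests):
    """Return (label, total_polys, draw_calls), assuming one draw call per object."""
    return [(label, count * polys, count) for label, count, polys in tests]


summary = summarize(tests)
for label, total_polys, draw_calls in summary:
    print(f"{label}: {total_polys:,} polys, ~{draw_calls:,} draw calls")

# Ratio between the largest and smallest value across the three tests.
poly_spread = max(t[1] for t in summary) / min(t[1] for t in summary)
call_spread = max(t[2] for t in summary) / min(t[2] for t in summary)
print(f"poly spread: {poly_spread:.2f}x, draw call spread: {call_spread:.1f}x")
```

If the per-object cost dominates, the frame rate would be expected to follow the second ratio rather than the first.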


Can anyone shed any light on how I would conceptually solve this? I am not planning to work on it in any major way at the moment, but I will need to in the future.


I am just creating TSStatic objects in script at the moment. Is there a way to combine the distinct object meshes into one logical object?

So, I might have 10 meshes drawing, but they would be logically grouped into one object?

I think I got some advice before on this, and I was told to look into the vehicle code (since it allows mounting of objects into a single, driveable, object). Can anyone see any major problems, or solutions, to connecting many objects together like this?


Something tells me this is going to be very hard to do, performance-wise.


Thanks for any advice!


That occurs because T3D doesn't currently have a static batching solution for separate objects.


Nils brings up a good point about adjusting the instancing limit (any mesh under that size will try to use instancing to reduce API overhead, which is good for lots of simple shapes).


Otherwise, when you have 46000 spheres, you're going to be seeing a lot of draw calls. The render bins render them by material groups, which reduces state switches to the graphics card, but you're still going to be seeing a ton of API calls which will hit performance.


A similar subject came up last week, so I started doing a bit of R&D into an automagic static batch render bin, but that's not there yet. Once it does anything, I'll toss it up on github for sure though.


Part of it boils down to "do you need your objects to move"? If they have to move, it gets a lot trickier, because you can't assume each object in a batch is going to move the same way, so you have to look into different means.


The easiest setup is that the objects never move. These can be tossed into large static batches and handled in a few draw calls at once, and you get maximum performance.


If the objects move, but are attached together, like, say, parts of a vehicle or the like, you could rig up a scheme to have each group of objects fold into a static buffer and be drawn that way.


If the objects move and could potentially all be individual and dynamic, such as in Kerbal when a rocket explodes and all the components go flying, that gets tricky if you want to just flat draw all the pieces moving around.


So the best approach for how to optimize would depend on how you need your objects to behave.


Thank you for these replies. I had never heard of the maxinstancingverts variable; I will definitely play around with that. So far, increasing it doesn't seem to do anything, so I will probably have to dig a bit deeper to figure out exactly what it's doing.


JeffR's reply is exactly what I was looking for. This is a great insight into exactly what the engine is doing when drawing large numbers of objects.


At the moment, I would be happy to limit the system to static shapes, but I would like to have some vehicle or moving machinery, etc, support eventually. This means that using groups of objects folding into a static buffer would likely be the best option.


I don't need to model each object individually, such as in an explosion in KSP. If there are any pieces that fall off or detach from the main object, I could just draw those individually. This would be rare enough that it wouldn't affect performance.


The problem now is "folding each group of objects into the static buffer". This is more low-level rendering type stuff, which I don't have a lot of experience with. Is this extremely complex, in general?


It's trickier than flatly rendering each object, but not excessively so.


I have an example of cramming geometry data into as few buffers as possible for rendering here:


https://github.com/Areloch/Torque3D/blob/BrushEditorToolTesting/Engine/source/T3D/BrushObject.cpp#L656-L725


That loads an arbitrary number of brush objects from a file, sorts them by material, and then stuffs them into buffer sets (vertex and prim buffers). If it would overrun the max buffer size, it splits off a new one.
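
The sort-then-pack idea described above can be sketched in a few lines of Python. This is not the code from the linked file, only an illustration of the same general approach. The `pack_into_buffers` function, the record layout, and the `MAX_VERTS_PER_BUFFER` cap are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict

MAX_VERTS_PER_BUFFER = 65535  # hypothetical cap, not a T3D constant


def pack_into_buffers(parts, max_verts=MAX_VERTS_PER_BUFFER):
    """Group parts by material, then fill buffers up to max_verts each.

    parts: iterable of (material, vertex_count) tuples.
    Returns a list of {"material", "vert_count", "parts"} records.
    """
    by_material = defaultdict(list)
    for material, vert_count in parts:
        by_material[material].append(vert_count)

    buffers = []
    for material in sorted(by_material):
        current = None
        for vert_count in by_material[material]:
            if vert_count > max_verts:
                raise ValueError("a single part exceeds the buffer cap")
            # Split off a new buffer if this part would overrun the cap.
            if current is None or current["vert_count"] + vert_count > max_verts:
                current = {"material": material, "vert_count": 0, "parts": 0}
                buffers.append(current)
            current["vert_count"] += vert_count
            current["parts"] += 1
    return buffers


# 1,000 cubes (24 verts each) in two materials, with a small cap to force a split.
parts = [("stone", 24)] * 600 + [("wood", 24)] * 400
buffers = pack_into_buffers(parts, max_verts=10000)
for b in buffers:
    print(b)
```

With these made-up numbers, 1,000 separate parts end up in three buffers, so the draw count follows the number of buffers rather than the number of parts.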


You may not use this exact approach, but you'd probably have something in the same ballpark.


You'd have an object to act as your 'master', which would act as the rendering object in this case. So in the Kerbal example, you'd have a 'space ship' object class, and then when you add a part, you'd submit the part's geometry to the spaceShip class. It regenerates its buffers by running through the list of parts, sorting by like materials and packing everything into as few buffers as possible. Then at render time, you just run through the list of buffers and push the render instances.
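
A minimal Python sketch of that 'master' object follows. The `Ship` class and its methods are hypothetical and only illustrate the bookkeeping: each part's position is baked into its vertices, and its indices are shifted by the number of vertices already in the merged buffer for that material. A real engine class would also handle buffer size limits, normals, UVs, and GPU uploads.

```python
from collections import defaultdict


class Ship:
    """Hypothetical 'master' object that owns the merged render buffers."""

    def __init__(self):
        self.parts = []    # (material, vertices, indices) per submitted part
        self.buffers = {}  # material -> {"vertices": [...], "indices": [...]}

    def add_part(self, material, vertices, indices, offset=(0.0, 0.0, 0.0)):
        # Bake the part's position into its vertices so the batch
        # needs no per-part transform at draw time.
        moved = [(x + offset[0], y + offset[1], z + offset[2])
                 for x, y, z in vertices]
        self.parts.append((material, moved, list(indices)))
        self.rebuild()

    def rebuild(self):
        merged = defaultdict(lambda: {"vertices": [], "indices": []})
        for material, vertices, indices in self.parts:
            buf = merged[material]
            base = len(buf["vertices"])  # index offset for this part
            buf["vertices"].extend(vertices)
            buf["indices"].extend(base + i for i in indices)
        self.buffers = dict(merged)

    def draw_calls(self):
        # One draw per material buffer in this simplified model.
        return len(self.buffers)


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ship = Ship()
for i in range(10):
    ship.add_part("hull", triangle, [0, 1, 2], offset=(float(i), 0.0, 0.0))
for i in range(2):
    ship.add_part("engine", triangle, [0, 1, 2], offset=(0.0, float(i), 0.0))
print(ship.draw_calls(), "draw calls for", len(ship.parts), "parts")
```

Rebuilding on every `add_part` keeps the sketch short. For large part counts, it would probably be better to mark the object dirty and rebuild once before rendering.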


That'd be the high-level idea, anyway. There are a few ways to approach this, of course, but that approach should be simple enough to start with, and you can refine it and make it more robust as you go.


Quote: "I had never heard of the maxinstancingverts variable; I will definitely play around with that. So far, increasing it doesn't seem to do anything, so I will probably have to dig a bit deeper to figure out exactly what it's doing."

That's because T3D's instancing isn't really that well made IMO.

In any case, you'll need to see whether the scene you're working with is GPU bound or CPU bound.


Advanced lighting (with dynamic shadows etc.) is also a big factor; you can't simply count the total polys in the scene and compare.

As Jeff mentioned, the draw calls are pretty important to measure as well.


If you want to get more out of T3D instancing and your scene has a lot of statics that don't need to interact with the world, then I would suggest experimenting with the forest tool. These objects are handled in groups, which will give you huge performance improvements (when you have a lot of the same object), but it has a downside as well (they're as static as can be ;-) ), and if you would like to script it you may need to include editor functions and do a little work on that (not impossible, I think).


Edit: Is > Isn't :oops: :lol:

Edited by Nils

This is interesting... I ran into this problem with the last game I worked on, which was a really cool concept but failed when I scaled it up.


Basically I was making a Minecraft-like game, where the player can place blocks to build things like castles or houses, etc. They are all TSStatic objects to start, so they should be really low overhead, and they are all simple cubes with only one low-res material on them...


What happened was the FPS would drop significantly when I added more than a few hundred blocks. This is a problem, as any decent size castle will have 1000+ blocks in it.


I have never heard of this instancing variable either. Is there any more information on how to use it effectively? I really loved that game and had all kinds of neat concepts in there that I was working on, but I had to put it on hold because it just didn't scale to the point I needed it to.


It would seem to me that the engine should be able to handle thousands of small cubes without issue. I was quite surprised and disappointed when it slowed to a crawl before I even really got a lot in the scene.


Any suggestions?


Thanks!

P


$pref::TS::maxInstancingVerts affects all static geometry that uses the TSMesh class internally, i.e. all TS-based static geometry (not skinned meshes). Each mesh that has $pref::TS::maxInstancingVerts vertices or fewer will be rendered using instancing. This in itself can be costly if it's picking up meshes that are not rendered multiple times, so be aware of that.


Where instancing is a big win is when you have, say, 5000 cubes to draw: without instancing you have to make that draw call to the graphics card 5000 times, while with instancing you could make it once. There is more going on than this; with instancing you have to update the instance buffer for each mesh to send to the GPU, so it's not as simple as I make out above ;) but that is the general idea. As you can see, all that geometry still has to be rendered/processed by the GPU; it's just that on the CPU side you are not making 5000 draw calls to do it. So if the draw calls themselves are not your bottleneck, then instancing may not help you at all. With T3D there is still heaps of other stuff going on with that mesh, like all the collision container stuff, scene graph sorting, and so on. It's possible the bottleneck could be elsewhere, but it's certainly worth experimenting with.
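
To make the threshold idea concrete, here is a small Python model of draw call counts. It is a sketch under stated assumptions, not T3D's actual render path: every object has one material, each instanced mesh costs exactly one draw call, and instance buffer size limits are ignored. The function `count_draw_calls` and the scene contents are made up for the example.

```python
from collections import Counter


def count_draw_calls(objects, max_instancing_verts):
    """Estimate draw calls for a scene.

    objects: iterable of (mesh_name, vert_count) tuples, one per placed object.
    Meshes at or under the vertex threshold are drawn once via instancing;
    all other meshes cost one draw call per placed object.
    """
    counts = Counter()
    verts = {}
    for mesh, vert_count in objects:
        counts[mesh] += 1
        verts[mesh] = vert_count

    calls = 0
    for mesh, placed in counts.items():
        if verts[mesh] <= max_instancing_verts:
            calls += 1        # one instanced draw for every copy
        else:
            calls += placed   # one draw per copy
    return calls


# 5000 cubes (24 verts each) plus 3 large statues (5000 verts each).
scene = [("cube", 24)] * 5000 + [("statue", 5000)] * 3
print(count_draw_calls(scene, max_instancing_verts=0))    # instancing off
print(count_draw_calls(scene, max_instancing_verts=200))  # cubes instanced
```

The model only counts draw calls. It says nothing about the cost of filling the instance buffer, or about a mesh that is placed once and still takes the instanced path, which is the overhead the post warns about.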


Personally I think it would work better if you had to manually mark a mesh for instancing rather than just blindly picking up anything with $pref::TS::maxInstancingVerts vertices or fewer. I'm sure art departments would most likely hate me for that one because it wouldn't be automatic :lol: :lol: but sometimes fine tuning by hand is simply the best method. Anyway, have a play around; there is no magic number to use and it is very specific to the level you are rendering.


*Edit:


When I say static, it doesn't have to be static in the sense that it can never move, just static as in not a skinned mesh. For example, a physics object can be rendered using instancing.


If I remember correctly, it's an NVIDIA rule that if you want instancing to be effective, the object needs to have more than a certain number of polys (I thought it was 300 but could be wrong); otherwise you'd be stuck with overhead. I can't find the document anymore where they described it all in detail, though.


This can be a very long and involved path to travel down, especially if this is planned to be a multiplayer world. Personally I pursued this very goal off and on for a couple of years. I spent a month here and there discovering exactly why Torque seemed to choke on a world filled with geometry that the rendering code has no problem chewing up and spitting out.


I did finally find a couple solid solutions, neither of which I consider 'easy' to implement. As all things Torque, a solution for any given project is going to require taking it by the horns, wrestling it down, and beating it to submission.


The first thing you're going to find is the biggest cause of performance drop is the network ghosts. Run your mission, push 'N' and then watch your active ghost count go up when you add the statics. As it goes up, your performance goes down. This is because Torque has a client and a server active on your machine and actually ghosts client versions of the data you change in the mission.


You are correct to assume an object to hold many sub-objects can be helpful - but only to a certain degree, depending on the ultimate goal. I ended up creating a class from scratch that rendered out several mesh cubes within one object by building all the vertexBuffer data manually in engine functions. In my case the whole thing was cubes, so I could specify the grid size, had the whole she-bang using multiple indexed materials (for different faces and extended player class footstep sounds), much neighbor detection code, added EngineFunctions, Callbacks... altogether the code for that class is spread out across about 9 or so new engine files =)


On top of all of that, I haven't mentioned the most important part: back to networking. Even in a singleplayer mission, the way that Torque is going to handle adding and removing objects from the scene is going to impact performance if you aren't careful about how you handle the ghosts. Exactly how to handle the ghosts is a huge can of worms that can be dissected in several ways (again, each solution is going to be project-specific, of course). I wrote NetEvent code for my 'object of many objects' that would allow users to send and receive data only about the 'sub-object' within the main object when adding/removing blocks.


After all that's ironed out, you'll want to consider how all of those objects are going to be ghosted / unghosted for each client depending on its distance from that object. I can't stress enough how many different ways this can be approached, but more recently I've found that a system similar to the existing TerrainCell or ForestCell engine code (don't quote me on exact names!) can be extremely effective if you're willing to fight the dragon there. He's a bit scary, but a solid attempt at making custom functions that hook into those indexes could be a good solution.
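
The cell-based idea can be illustrated with a plain 2D grid in Python. This is only a sketch of the general technique and does not reflect how TerrainCell, ForestCell, or Torque's ghosting actually work. `CELL_SIZE`, `build_grid`, and `objects_in_scope` are hypothetical names for the example.

```python
import math
from collections import defaultdict

CELL_SIZE = 32.0  # hypothetical cell edge length in world units


def cell_of(pos, cell_size=CELL_SIZE):
    """Map a world-space (x, y) position to integer cell coordinates."""
    return (math.floor(pos[0] / cell_size), math.floor(pos[1] / cell_size))


def build_grid(objects, cell_size=CELL_SIZE):
    """objects: dict of object id -> (x, y). Returns cell -> list of ids."""
    grid = defaultdict(list)
    for obj_id, pos in objects.items():
        grid[cell_of(pos, cell_size)].append(obj_id)
    return grid


def objects_in_scope(grid, viewer_pos, radius_cells=1, cell_size=CELL_SIZE):
    """Return ids in the viewer's cell and its neighbours within radius_cells."""
    cx, cy = cell_of(viewer_pos, cell_size)
    in_scope = []
    for dx in range(-radius_cells, radius_cells + 1):
        for dy in range(-radius_cells, radius_cells + 1):
            # .get avoids creating empty cells in the defaultdict.
            in_scope.extend(grid.get((cx + dx, cy + dy), []))
    return sorted(in_scope)


objects = {"a": (5.0, 5.0), "b": (40.0, 10.0), "c": (200.0, 200.0), "d": (-10.0, 3.0)}
grid = build_grid(objects)
print(objects_in_scope(grid, viewer_pos=(0.0, 0.0)))
```

Only the cells around the viewer are visited, so the cost of deciding what to send depends on local density rather than on the total object count.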


About mounting, depending on how many objects are being attached to build things it could be required the mount code be extended a bit. There are resources out there on this iirc. A bit of work moreso with models and adding the proper mount0, mount1 nodes etc. I hadn't done much testing with the effect of mounted objects on ghosting and so on but it could be worth a look =) Anyhow, perhaps some information here is helpful, I wish you luck with Torque!


Wow, so much to think about... Thanks all for this info.


Doesn't really get me started on a solution or anything, it all seems a bit over my head, but it's educational nonetheless.


So, what we're basically saying here is that Torque is inherently limited in the number of objects that can be placed in a world, at least with out-of-the-box T3D code?


That's kind of a bummer, and a huge limitation. Funny that this has never been a priority for anyone to tackle and put a fix into the stock T3D engine code.


Do people only make T3D games with fewer than a couple hundred objects in a scene or something?


I really liked my game where we could build a castle out of blocks and then destroy it into all its pieces. I had it set up so everything was a TSStatic, and when you shot them they turned into physics objects and flew all over the place. I have lots of great working code for that... The problem has always been that any decent castle has at least 500-1000 1x1 cubes, so you build two castles and the frame rate drops below 10 FPS. It always puzzled me, since my blocks are super simple objects with just one simple low-res material on them. You'd think you could load thousands of them without a problem. Sadly this is not the case for me :(


Quote: "That's kind of a bummer, and a huge limitation. Funny that this has never been a priority for anyone to tackle and put a fix into the stock T3D engine code."

It's a difficult thing to do because there isn't really a one-size-fits-all solution; what may work for scenario A could make scenario B worse.
