
Multithreading the engine


LukasPJ


Slightly off topic, but one of the biggest things I have seen really holding T3D back is just how CPU-bound it is. Using T3D now on a decent-sized level, it quickly becomes apparent how underutilized the GPU is. It needs multithreading.

 

I think Timmy is right, and I just wanted to start a discussion on which parts of the engine it would make sense to multithread.

The ParticleSystem is one, which I can probably handle in a meaningful way :)


Any other ideas?


Yeah, Az is definitely right: it is a real spider web right now. As it stands, networking and sound are the only things that are multithreaded, and they both use the thread pool functionality already present in T3D.


The render system is the most obvious place to start. It doesn't need to actually support rendering API calls from multiple threads; the point is to break down the work the render system does so it can be distributed across the thread pool. Things like traversing the scene graph, culling, and so on.


It would be a massive undertaking to do it but very beneficial.


I think the easiest place to multithread that will have a noticeable impact and isn't a crazy project would be the asset loading. I started the research on it one day and it didn't look too bad. The key thing is creating some kind of state an asset can be in that's essentially "Loading". It won't be displayed but will still exist as an object in the world. Once that's in place you can just offload the asset loading to another thread and then mark the asset as ready/loaded once it's done processing. This should alleviate the hangs you experience when you quickly rotate the camera and do other things that cause hiccups.
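To make the "Loading" state idea concrete, here is a minimal sketch using only the C++ standard library. The `Asset` struct, the `AssetState` enum, and the fake "load" step are all hypothetical names made up for illustration; none of this is T3D's actual resource code, and a real version would presumably go through the engine's thread pool rather than `std::async`.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <string>

// Hypothetical asset states; not Torque3D's actual types.
enum class AssetState { Unloaded, Loading, Ready };

struct Asset {
    std::string path;
    std::atomic<AssetState> state{AssetState::Unloaded};
    std::string data;           // stands in for decoded mesh/texture data
    std::future<void> pending;  // keeps the background task alive

    // Kick off the load on another thread and return immediately.
    void beginLoad() {
        assert(state.load() == AssetState::Unloaded);
        state.store(AssetState::Loading);
        pending = std::async(std::launch::async, [this] {
            data = "contents of " + path;    // placeholder for disk I/O and parsing
            state.store(AssetState::Ready);  // publish only after data is complete
        });
    }

    // The object exists in the world, but the renderer skips it until Ready.
    bool isRenderable() const { return state.load() == AssetState::Ready; }
};
```

The main thread only ever reads `state`, so it never blocks on the load; it just keeps skipping the object until the flag flips to `Ready`.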


The man who multithreads that renderer deserves a lifetime supply of whiskey. He'll also need it for the PTSD he'll surely have after the project is complete. I've looked it up and down 3 or 4 times now and I keep coming to the same conclusion: it would make more sense to gut it, build it proper, and then go through all the existing code and update it to use a new threaded render system. It's what I started concluding when doing BGFX. In some cases you can fix a house by replacing one wall at a time, but I think in this case the house should be torn down and rebuilt. Just my two cents anyway.


Even with multithreaded asset loading, I think you would still get the dreaded hangs and pauses. Disk I/O and any CPU-intensive operations can happen on another thread, but things like sending the vertex/index buffers, uploading textures, and compiling shaders still have to happen on the main thread, causing delays. D3D9Ex does support sharing resources, though (https://msdn.microsoft.com/en-us/library/windows/desktop/bb219800(v=vs.85).aspx#Sharing_Resources), and OpenGL is capable of this too.
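One common way to soften this (an assumption on my part, not something T3D is known to do) is to have worker threads hand the final GPU step back to the main thread as small jobs, and let the main thread run only a few of them per frame. The `MainThreadQueue` class below is a hypothetical sketch; the jobs would wrap the actual buffer or texture upload calls.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical queue of GPU-side jobs that must run on the main thread.
class MainThreadQueue {
public:
    // Safe to call from any worker thread.
    void push(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mMutex);
        mJobs.push_back(std::move(job));
    }

    // Called once per frame on the main thread; runs at most maxJobs jobs
    // and returns how many actually ran.
    std::size_t drain(std::size_t maxJobs) {
        std::size_t ran = 0;
        while (ran < maxJobs) {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (mJobs.empty()) break;
                job = std::move(mJobs.front());
                mJobs.pop_front();
            }
            job();  // run outside the lock so workers can keep pushing
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mMutex;
    std::deque<std::function<void()>> mJobs;
};
```

This doesn't make the uploads any cheaper; it only spreads them over several frames so that no single frame eats the whole cost.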


http://i.imgur.com/yj8MWI2.gif


Good point. I've got nothing on that one.


Lockable resources (textures with D3DUSAGE_DYNAMIC, vertex buffers and index buffers, for instance) can experience poor performance when shared. Lockable rendertargets will fail to be shared on some hardware.

 

That part also doesn't sound very nice with resource sharing


Asynchronous resource allocation in DirectX is not done with shared resources between devices (a bad idea!), but by creating the device with the D3DCREATE_MULTITHREADED flag, which allows thread-safe API calls. The downside in D3D9 is that the critical section is global over all API calls. D3D11 is the first API to separate resource allocation and render calls.

Edited by Haladrin

From my experience, multithreading well is pretty hard. There are so many things that can go wrong.


IMO any additional multithreading should meet the following criteria:


- Core systems should be thread-safe. (e.g. the logging, which currently isn't; also console execution, which isn't 100% safe)

- Memory allocations should be kept to a minimum, otherwise you'll increase the chances of memory fragmentation (e.g. if one thread is allocating blocks of different sizes than the other threads on a temporary basis)

- Mutexes should be kept to a minimum, otherwise you have to deal with the overhead of a mutex lock too often (e.g. the current scenario of having 2 mutexes per simset is a bit OTT)

- Any threaded operation should be as isolated as possible from the system as a whole (debugging random timing-related crashes because thread X accessed something in the main thread without a lock is no fun)

- Any threaded operation should be designed so that it can easily be cancelled upon shutdown

- It should actually offer a performance advantage (threading just because it sounds cool is not good enough)
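As a rough illustration of the "isolated" and "easily cancelled upon shutdown" points, here is a minimal sketch in standard C++. `CancellableWorker` is a hypothetical class, not anything that exists in T3D: the worker touches only its own members, checks a stop flag on every iteration, and is joined before it is destroyed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker; not an existing Torque3D class.
class CancellableWorker {
public:
    CancellableWorker() : mThread([this] { run(); }) {}
    ~CancellableWorker() { stop(); }

    // Ask the worker to finish and wait for it. Safe to call more than once.
    void stop() {
        mStop.store(true);
        if (mThread.joinable()) mThread.join();
    }

    int iterations() const { return mIterations.load(); }

private:
    void run() {
        // The loop only touches this object's own atomics, so no locks are
        // needed and nothing on the main thread is accessed behind its back.
        while (!mStop.load()) {
            mIterations.fetch_add(1);  // placeholder for one unit of real work
            std::this_thread::yield();
        }
    }

    std::atomic<bool> mStop{false};
    std::atomic<int> mIterations{0};
    std::thread mThread;  // declared last so the flags exist before the thread starts
};
```

Because the destructor calls `stop()`, shutting down is just a matter of destroying the object; the thread cannot outlive the data it works on.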


Would it be possible to load several COLLADA shapes in parallel, even if we couldn't load new resources in the background? Might speed up that initial level load time.


Skinning has always been mentioned as a candidate for parallelism, though @MangoFusion's GPU skinning may make that obsolete.

 

it doesn't need to actually support rendering API calls from multiple threads but breaking down the work the render system does to be distributed amongst the thread pools

This seems like it could be a good way to go for many types of problem. Taking discrete pieces of an algorithm that involves a lot of computation, and parallelising it within a single discrete time slice in order to make it go faster. Not long-running systems which interact with the main thread over any length of time.
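A minimal sketch of that fork-join pattern, applied to culling: split the object list into chunks, test each chunk on its own task, and join everything before the function returns. The `Sphere` type and the toy `isVisible` test are made up for illustration, and `std::async` stands in for T3D's thread pool, whose API isn't shown in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

struct Sphere { float x, y, z, radius; };

// Toy visibility test: is the sphere at least partly in front of the plane z = 0?
inline bool isVisible(const Sphere& s) { return s.z + s.radius > 0.0f; }

// Culls contiguous chunks of `objects` on separate tasks and joins them all
// before returning, so no work outlives the call. Returns visible indices
// in ascending order.
std::vector<std::size_t> cullParallel(const std::vector<Sphere>& objects,
                                      std::size_t taskCount) {
    taskCount = std::max<std::size_t>(1, taskCount);
    const std::size_t chunk = (objects.size() + taskCount - 1) / taskCount;

    std::vector<std::future<std::vector<std::size_t>>> tasks;
    for (std::size_t begin = 0; begin < objects.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, objects.size());
        tasks.push_back(std::async(std::launch::async, [&objects, begin, end] {
            std::vector<std::size_t> visible;  // each task writes only its own vector
            for (std::size_t i = begin; i < end; ++i)
                if (isVisible(objects[i])) visible.push_back(i);
            return visible;
        }));
    }

    std::vector<std::size_t> result;
    for (auto& task : tasks) {
        const auto part = task.get();  // join point
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}
```

The input is read-only and every task has private output, so no mutexes are needed; the only synchronisation is the join at the end of the time slice.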


As always, Mango has it right. Definitely all good things to keep in mind.


This seems like it could be a good way to go for many types of problem. Taking discrete pieces of an algorithm that involves a lot of computation, and parallelising it within a single discrete time slice in order to make it go faster. Not long-running systems which interact with the main thread over any length of time.

 

Yeah, this seems to be the direction most of the "big boys" have taken with their engines. Intel has a really great GDC video from a few years back explaining the concept; I'll post the link if I find it.


Multithreaded particle system with 25,000 particles on screen vs. non-multithreaded: 31.25 MSPF vs. 33 MSPF.

There is a change, but it's a small one :P That's only the simulation though, which is pretty light. I'll try to see if I can get more of the system multithreaded.


I quite like the idea of getting scene traversal to use a thread pool, though I think that would involve the container system (IIRC), which is nowhere near threadsafe. I did do some exploration in that direction when working on recast/walkabout, so I could multithread container queries to build geometry. It wasn't super pretty.


  • 7 months later...
  • 2 weeks later...

I've heard rumours about someone trying that. I agree, I think it might be a decent idea, though IIRC @MangoFusion was close to getting GPU skinning working, so it might be moot.


It might honestly be easiest to just add more documentation about T3D's threading features and add more examples of their use (maybe in the navmesh/pathfinding code?) so that game devs can decide how they want to use parallelism for their own problems. Sounds like the Life is Feudal team were having problems with parallelising their game; I wonder if better docs and guidance around what parts of the engine can and should be subject to it would have helped.


The biggest, biggest thing is obviously the resource manager but that'd be a huge amount of effort I reckon.




I think we need multithreading. The documentation is good, but we still need multithreading. It's 2015.


A ponderance:


One of the first things I'll PR for 3.9 is the Taml/Asset/Module stuff pulled from T2D.


Assets are interesting because they're auto-managed via references. If something references an asset, it does its initializing/loading work, and then cleans up when nothing references it any more.


So rather than trying to wade into the resource code itself, would it possibly make more sense to thread the asset load/unload step?


So when an asset is referenced for the first time, it'd do its setup/loading work in a secondary thread. It wouldn't be as low level as redoing the entire resource system, but it seems like that'd help at least a little on load times and streaming.
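Roughly what "load on first reference, in a secondary thread" could look like, as a standard C++ sketch. `AssetCache`, `LoadedAsset`, and `AssetHandle` are hypothetical names, not the T2D asset system's real API, and the unload-when-unreferenced half is left out to keep it short.

```cpp
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct LoadedAsset { std::string data; };

// A handle that any number of referencing objects can hold and wait on.
using AssetHandle = std::shared_future<std::shared_ptr<LoadedAsset>>;

// Hypothetical cache: the first acquire() of an id starts a background load,
// later acquires of the same id share that load instead of starting another.
class AssetCache {
public:
    AssetHandle acquire(const std::string& id) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mLoads.find(id);
        if (it != mLoads.end()) return it->second;

        AssetHandle handle = std::async(std::launch::async, [id] {
            auto asset = std::make_shared<LoadedAsset>();
            asset->data = "loaded:" + id;  // placeholder for real setup/loading work
            return asset;
        }).share();
        mLoads.emplace(id, handle);
        ++mLoadCount;
        return handle;
    }

    // Number of background loads started so far.
    int loadCount() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mLoadCount;
    }

private:
    mutable std::mutex mMutex;
    std::map<std::string, AssetHandle> mLoads;
    int mLoadCount = 0;
};
```

Callers that need the asset right away can block on `handle.get()`; everything else can poll the handle and carry on, which is where the streaming benefit would come from.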


That will be a nice start. I hope this will help the load time on the Pacific demo
