Work Blog - JeffR

196 posts Page 19 of 20
Posts: 19
Joined: Fri Apr 24, 2015 3:25 pm
by Skipper » Fri Dec 08, 2017 12:25 am
Cheers Jeff, keep up the good work.
Steering Committee
Posts: 776
Joined: Tue Feb 03, 2015 9:49 pm
by JeffR » Tue Dec 26, 2017 5:51 pm
Hey everyone, time for a Christmas update!

Firstly, hope everyone's holiday season has been going well so far!

Now, lots of very interesting developments, so this may get a bit long-winded, but I'm a huge nerd and this stuff is cool :P

Ok, so. Hard to say how familiar everyone is with threading. Torque's broadly always been a single-threaded engine. There are exceptions (some sound stuff, the physics thread, etc.), but by and large all the work Torque does through its normal run and sim has been single-threaded.

Why does this matter? Well, single-threaded performance for CPUs - that is, how much of a workload a CPU can get through, and how fast, on a single thread - has been plateauing for a while now. While efficiency improvements are obviously continuous, ultimately there's only so much computation you can push through in a given timeframe. Which is why, of course, CPUs started adding multiple cores, hyperthreading, etc., so workloads can be divvied up across cores and threads, allowing multiple workloads to happen at once.

Game engines have been slow to multithread, simply because multithreading isn't easy even in simpler programs: you have to make sure different threads never read and write the same memory at the same time, or you get corruption and crashes. And in something as intertwined as a game engine, it's many times more challenging.

That said, ultimately, even game engines need to suck it up and properly utilize all that CPU power that's just sitting there begging to be used. Over the past few years most major engines have shifted over to it, but Torque's been lagging behind on it, partially because none of us on core development were especially familiar with it. We've had brushes, but hadn't familiarized ourselves with the how-to-do enough to really rework the engine to make it happen.

The good news is, that's changing now.

We ended up getting into contact with the Life Is Feudal guys and they had been working on their own branch since before the engine went MIT, but offered to pass along a good chunk of the work they'd done for us to use and integrate where possible. They've *seriously* worked hard, and what they've passed along so far has been a treasure trove, so make sure to send them some thanks and maybe check out their game.

But one of the things they passed along that particularly piqued our interest was texture streaming. As in, loading in textures progressively, lower-to-higher mips. So you can quickly load in low-resolution versions of all the textures your rendering requires, and then over the course of a few frames load the higher-quality mips of the same textures to get the proper, full detail. Obviously, this is a workload best done in a threaded way, and sure enough they'd reworked a good chunk of the threading system in Torque in order to make that happen.
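As a rough illustration of the lower-to-higher mip idea (not the LiF or Torque implementation), the sketch below only tracks which mip level is resident and steps up one level per call. The `StreamingTexture` class and its members are hypothetical names, and it assumes a square, power-of-two texture that starts at the 1x1 mip.

```cpp
#include <cassert>

// Illustrative only: tracks which mip level of a square, power-of-two
// texture is currently resident. Mip 0 is full resolution.
class StreamingTexture {
public:
    explicit StreamingTexture(int fullSize) : mFullSize(fullSize) {
        assert(fullSize > 0 && (fullSize & (fullSize - 1)) == 0);
        for (int size = fullSize; size > 0; size >>= 1) ++mMipCount;
        mResidentMip = mMipCount - 1;  // start with the smallest (1x1) mip
    }

    // Edge length, in texels, of the mip that is currently usable.
    int residentSize() const { return mFullSize >> mResidentMip; }
    bool fullyLoaded() const { return mResidentMip == 0; }

    // Call once per frame: brings in the next-larger mip, if any is left.
    // Returns false once the full-resolution mip is already resident.
    bool streamNextMip() {
        if (fullyLoaded()) return false;
        --mResidentMip;  // a real streamer would upload the mip data here
        return true;
    }

private:
    int mFullSize;
    int mMipCount = 0;
    int mResidentMip = 0;
};
```

In a real streamer the body of `streamNextMip` would be the threaded part: the disk read and decode happen on a worker, and the renderer keeps using the smaller mip until the larger one arrives.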

[user mention] has been the major spearhead in figuring out the port work to get the threading management improvements in, and I've helped along with @[user mention], working with our boy at LiF to puzzle out the integration. There's some bugs to hammer out yet, but the basic integration of the improvements is looking solid so far. Once that's been finalized, we can move on to the texture streaming, and other delicious opportunities this provides.

Opportunities, you say? Indeed I do say. Let me explain.

The core of the improvements mainly updates the thread handling to be more modern and standard (using std::thread, etc.) and adds some features modern thread handling provides, such as conditional tracking. This lets us understand the status of a thread/work item much better. So, for example, when we go to process an image we're loading in, we can spool it up into a work item through the ThreadPool, and the improvements make it *much* easier to track if it's done or not, letting us actually use the texture when the loading work is concluded. In the old setup, you had to do some manual setup work for tracking stuff like that, so using threads was always kind of a pain.

So, it's easier to manage threads, which means they're easier to use. This means that we can and *will* use them. So, outside of texture streaming, what are we looking at threading?

So, so many things. Two biggies are the next major points, but we plan on sprinkling threaded behavior around in a LOT of spots. GroundCover item placement? Thread that sucker so it goes faster. Loading a model off the disk and doing the setup work to prep it in memory/load the animations? Thread it up. Plan is to thread as many things as can reasonably be converted into a threaded workload so we're using as many cores of the CPU as we can, as often as we can. This will lead to lower load times, less load stutter, and a much higher baseline of functional performance.
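To make the "is it done yet?" tracking concrete, here is a minimal sketch using only the C++ standard library (`std::async` and `std::future`) rather than Torque's ThreadPool, whose actual interface isn't shown in this post. `DecodedImage`, `decodeImage`, `beginImageLoad` and `isLoadFinished` are made-up names for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a decoded image; not a Torque type.
struct DecodedImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;
};

// Pretend decode step: fills a solid grey RGBA image of the requested size.
inline DecodedImage decodeImage(int width, int height) {
    assert(width > 0 && height > 0);
    DecodedImage img;
    img.width = width;
    img.height = height;
    img.pixels.assign(static_cast<std::size_t>(width) * height * 4, 128);
    return img;
}

// Kicks the decode off on another thread and hands back a handle to poll.
inline std::future<DecodedImage> beginImageLoad(int width, int height) {
    return std::async(std::launch::async, decodeImage, width, height);
}

// Non-blocking completion check, cheap enough to call once per frame.
inline bool isLoadFinished(const std::future<DecodedImage>& pending) {
    return pending.valid() &&
           pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The main loop would call `isLoadFinished` each frame and only call `get()` on the future once it reports ready, so the frame never blocks on the load.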

Not to be confused with Entity-Component systems (hah, just kidding, feel free to be confused, naming conventions are hard!). The distinction is relatively straightforward, but an important one.

The current way I have entities and components set up in the engine is as an Entity/Component configuration. You have an Entity object you throw in the scene, and then add Component objects to it to apply functionality and behavior to said Entity (like adding a MeshComponent to get it to render a mesh). The Components contained data AND they also implemented that data, so a MeshComponent had what model to render, scale, etc., but also had the code logic to actually DO the rendering of the model. When render-time came, we iterated over all our mesh components (we actually used a globally accessible interface list, but that's not important yet) and told them to do their work. We similarly iterated over all the regular components that did tick updates and so on.

While working with the LiF guys on the thread stuffs, they mentioned they, too, had begun shifting to components and the like, so we got talking about that, and it turned out they used a modified Entity/Component/Systems setup. I'd looked at this approach back when I started working on E/C stuff, but there were issues with script integration and other problems so I abandoned it. But as we talked, ideas began to form, so I leapt into action and quickly drafted a prototype.

One of the things you want to avoid at all costs (as hard as it is) in software is Cache Thrashing, or loss of cache locality. Basically, when you go to process a chunk of memory in the CPU, you pull it out of RAM and toss it into the CPU cache. As fast as RAM is, the cache is way faster. But it's also waaaay smaller. So when you're processing that chunk of memory, if you have to refer to a chunk of memory that isn't in the cache, you pretty much have to dump what's cached, get the newly relevant memory chunk, cache it, and then keep working.

When dealing with pointered objects, this is hard to avoid. Because Torque uses a central list of objects via the SimDictionary, referencing into that is likely going to kill cache locality, which leads to slowness. Making cache locality happen with Entity/Components is, thus, hard.

But with the discussions, a new notion was formed. We had a convenient system already in place via the component interfaces - a system where when a component is created, it automatically adds itself to a global list that makes it cheap and easy to access them without having to go do a lookup through the SimDictionary and iterate through all loaded objects.
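A self-registering list like that can be sketched with a small template. This is an assumption about the general shape, not Torque's actual component interface code; `SelfRegistering`, `RenderInterface` and `renderAll` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: every live instance of T is tracked in one static
// list, so callers can iterate them without a global object lookup.
template <typename T>
class SelfRegistering {
public:
    SelfRegistering() { instances().push_back(static_cast<T*>(this)); }
    ~SelfRegistering() {
        std::vector<T*>& list = instances();
        list.erase(std::remove(list.begin(), list.end(), static_cast<T*>(this)),
                   list.end());
    }
    // Copying would register a second pointer for the same role; forbid it.
    SelfRegistering(const SelfRegistering&) = delete;
    SelfRegistering& operator=(const SelfRegistering&) = delete;

    static std::vector<T*>& instances() {
        static std::vector<T*> list;
        return list;
    }
};

// Example interface: anything that wants to be rendered derives from this.
struct RenderInterface : SelfRegistering<RenderInterface> {
    int drawCount = 0;
    void render() { ++drawCount; }
};

// Visits every live RenderInterface straight from the list.
inline void renderAll() {
    for (RenderInterface* r : RenderInterface::instances()) {
        assert(r != nullptr);
        r->render();
    }
}
```

Note the list still holds pointers, so it makes the objects cheap to find but does not by itself make their data contiguous in memory.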

So I repurposed this system to make...uh, Systems.

So calling back to our MeshComponent example, the old way was that the MeshComponent contains the data of what to render, and then when the time comes, it also did the rendering. In the new setup, the implementation is separated out and moved into a MeshRenderSystem.

This does the actual work of making rendering happen. We then have a data interface that is a list inside our system, which contains the actual data-to-use. Model, scale, transform, etc.

The component itself basically acts as our 'real' object, which can be manipulated, changed via script, etc. So the structure is like so:


-- MeshComponent
----Pointer to a unique instance of MeshRenderSystemData

When a MeshComponent is made, it allocates a MeshRenderSystemData which is stored into a list in the system. The data contains everything relevant to making the rendering of a mesh happen. The component merely gets/sets the data.

When it comes time to render, the MeshRenderSystem basically gets told "do rendering" and it iterates over its local list of MeshRenderSystemData and renders them very quickly. Because the data is locally contained in that array, we maintain cache locality, which helps performance even more.

A bonus with the system feeding off a general data container like our MeshRenderSystemData is that it doesn't care where that data came from, as long as it's formatted correctly. This means that we can use one system, like our MeshRenderSystem, to render anything as long as the data is formatted right. Player model, terrain, particle effects, etc. Doesn't matter, we can just crunch the data and go. Which means that we have fewer paths in/out, and it makes the code cleaner and easier to maintain.
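The component/system split described above might look roughly like this. The class names follow the post, but every member and method is an assumption made for illustration. One deliberate swap: the component holds an index into the system's array rather than a pointer, since a raw pointer into a growing `std::vector` would dangle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain data the system needs in order to draw one mesh.
struct MeshRenderSystemData {
    std::string model;
    float scale = 1.0f;
    bool visible = true;
};

class MeshRenderSystem {
public:
    // Allocates a data slot and returns its index as a stable handle.
    std::size_t allocate() {
        mData.emplace_back();
        return mData.size() - 1;
    }
    MeshRenderSystemData& get(std::size_t handle) {
        assert(handle < mData.size());
        return mData[handle];
    }
    // Walks the contiguous array front to back; returns how many were drawn.
    std::size_t render() const {
        std::size_t drawn = 0;
        for (const MeshRenderSystemData& d : mData)
            if (d.visible) ++drawn;  // a real system would submit a draw here
        return drawn;
    }

private:
    std::vector<MeshRenderSystemData> mData;
};

// The component is the scriptable "real" object; it only gets/sets the data.
class MeshComponent {
public:
    explicit MeshComponent(MeshRenderSystem& system)
        : mSystem(system), mHandle(system.allocate()) {}
    void setModel(const std::string& model) { mSystem.get(mHandle).model = model; }
    void setVisible(bool visible) { mSystem.get(mHandle).visible = visible; }
    const std::string& model() const { return mSystem.get(mHandle).model; }

private:
    MeshRenderSystem& mSystem;
    std::size_t mHandle;
};
```

Because `render()` never touches a `MeshComponent`, the hot loop reads one tightly packed array, which is where the locality benefit comes from.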

From here, we get to do something really neat. Because the data is all self-contained in the MeshRenderSystemData, we can then implement the MeshRenderSystem to process that data in a threaded way.

So when we go to render our objects, we can chop up our list into chunks, and then assign a thread to each chunk and process that data as fast as is physically possible, in a largely asynchronous way. This will DRASTICALLY reduce overhead time when we go to render. Likewise, this is also faaaaaar cleaner a setup code-wise, as we have less jumping and hopping around through a dozen files to get from 'the engine wants to render a frame' to 'time for our object being rendered to submit to the API for actual drawing'.
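Chopping the list into chunks and giving each chunk its own thread can be sketched with plain `std::thread`. `RenderItem` and `processInChunks` are hypothetical, and the per-item work is a placeholder multiply; a production version would reuse pooled threads instead of spawning new ones every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct RenderItem {
    float scale = 1.0f;
    float scaledRadius = 0.0f;  // output written by the worker
};

// Splits `items` into contiguous chunks and processes each chunk on its own
// thread. Chunks never overlap, so no locking is needed.
inline void processInChunks(std::vector<RenderItem>& items, unsigned threadCount) {
    assert(threadCount > 0);
    const std::size_t total = items.size();
    const std::size_t chunk = (total + threadCount - 1) / threadCount;  // round up
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < total; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, total);
        workers.emplace_back([&items, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                items[i].scaledRadius = items[i].scale * 2.0f;
        });
    }
    // Wait for every chunk before the caller touches the results.
    for (std::thread& w : workers) w.join();
}
```

This only works this simply because each item is self-contained: no worker reads or writes anything outside its own slice.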

This makes the code easier to understand and maintain, and also more performant, again. One such way is how updates will happen for all this stuff.

The current way, we have a global list of 'tickable' items, they do processTick, then advanceTime if appropriate. Then at some point later during the update, they'll render as well.

With the systems-focused setup, the flow is a little more linear and cohesive. Namely, when we go to do our main update, we'll walk through our systems in order, doing our update systems first (physics, animation updates, etc.).

And then, building off that data, we can do our render systems. Because we can do these linearly, the code'll be a little more comprehensible in how a given update goes. But inside each system, we can spool up a number of threads equal to the CPU cores available, and crunch our data as quickly as possible before moving on to the next system. This lets us thread stuff aggressively, but keeps the actual order of tasks sane and easier to debug/expand. It's not as theoretically fast as a work-stealing fiber-task system where threads just grab whatever work is available, but it IS easier to maintain and debug.
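The "systems run in order, threads fan out inside each system" flow can be sketched as below. `System` and `SystemScheduler` are invented for this example; only `std::thread::hardware_concurrency` is a real standard-library call, and it is allowed to return 0 when the core count is unknown.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A system here is just a name plus the work it does each frame.
struct System {
    std::string name;
    std::function<void(unsigned workerCount)> run;
};

class SystemScheduler {
public:
    void addUpdateSystem(System s) {
        assert(static_cast<bool>(s.run));
        mUpdate.push_back(std::move(s));
    }
    void addRenderSystem(System s) {
        assert(static_cast<bool>(s.run));
        mRender.push_back(std::move(s));
    }

    // Runs every update system, then every render system, strictly in order.
    // Each system may fan out across `workers` threads internally, but it
    // finishes before the next one starts. Returns the execution order.
    std::vector<std::string> runFrame() const {
        unsigned workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 1;  // 0 means "unknown" per the standard
        std::vector<std::string> order;
        for (const System& s : mUpdate) { s.run(workers); order.push_back(s.name); }
        for (const System& s : mRender) { s.run(workers); order.push_back(s.name); }
        return order;
    }

private:
    std::vector<System> mUpdate;
    std::vector<System> mRender;
};
```

The barrier between systems is what keeps debugging tractable: at any point only one system's data is being mutated.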

I've already got this implemented in the R&D with the mesh component, so the new setup definitely works. I just need to get the other components converted over.

This ties into the next nifty bit of work:

So, the GFX Render API wrapper system is a good system, but it's showing its age. It was written for D3D9, upgraded to D3D11, and then had OpenGL support added onto it. But ultimately, it was still designed around the D3D9 ethos, which is not how modern render APIs behave, meaning we're not nearly as performant as we could and should be when doing rendering.

We've been looking at options for updating it for a good while now, but it's a bit of a 'thing' because it touches a lot of places, so a good approach wasn't concrete. @[user mention] and @[user mention] had looked into this sort of thing before as a side project in their free time, and @[user mention] recently started messing with it in a fuller way.

The idea is a clean, new implementation of GFX. He's started out by learning how Vulkan works, then using those design notions to design an OpenGL wrapper. So you structure the data in a Vulkan-like way, but the backend interprets the data for OpenGL and renders it very efficiently. The good news is that modern OpenGL is also absurdly efficient when you structure things correctly, so starting with OpenGL keeps platform flexibility, but will still render really fast.

Afterwards, we can look at implementing Vulkan properly for more forward-looking prospects on newer hardware.

But for the meantime, this lends itself to some neat advantages. Namely, we can do some stuff like tying back into the Systems mentioned above, and very quickly, cleanly processing our render data directly into uniform buffer objects for drawing.

This is important, because the actual overhead cost of that is minuscule. We can basically just throw our data into that buffer en masse when we do our render updates, and it propagates to the GPU with minimal driver overhead. Which means the actual draw calls are very cheap because the data they care about is all efficiently pushed via the buffer objects. We also only ever need to update stuff that changes, so if a model hasn't changed since the last frame, we can leave the data and cut down on more processing.
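The "only update stuff that changes" part can be illustrated with a CPU-side staging array and per-slot dirty flags. This is a sketch under assumptions: `UniformStaging` and `ObjectUniforms` are hypothetical, and a plain `std::vector` stands in for the GPU buffer so the example needs no graphics API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Per-object data destined for the GPU (just a 4x4 transform here).
struct ObjectUniforms {
    float transform[16];
};

// CPU-side mirror of a GPU buffer that remembers which slots changed.
class UniformStaging {
public:
    explicit UniformStaging(std::size_t count)
        : mData(count), mDirty(count, true) {}  // everything needs a first upload

    // Stores the value and marks the slot dirty only if the bytes differ.
    void set(std::size_t index, const ObjectUniforms& value) {
        assert(index < mData.size());
        if (std::memcmp(&mData[index], &value, sizeof(ObjectUniforms)) == 0) return;
        mData[index] = value;
        mDirty[index] = true;
    }

    // Copies only dirty slots into `gpuBuffer` and returns how many it wrote.
    std::size_t flush(std::vector<ObjectUniforms>& gpuBuffer) {
        gpuBuffer.resize(mData.size());
        std::size_t written = 0;
        for (std::size_t i = 0; i < mData.size(); ++i) {
            if (!mDirty[i]) continue;
            gpuBuffer[i] = mData[i];  // a real backend would do a sub-range upload
            mDirty[i] = false;
            ++written;
        }
        return written;
    }

private:
    std::vector<ObjectUniforms> mData;
    std::vector<bool> mDirty;
};
```

A static scene therefore costs zero uploads per frame after the first flush, which matches the "leave the data" behavior described above.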

Because we skip so much extra fluff by going system-to-GFX3 in that way, we further compound the efficiency gains. This means that rendering any given object should have FAR less CPU requirements and it'd mostly be about being able to actually just crunch the workload on the GPU. Which leads to much more free CPU for other things, which we can then utilize in all this new threaded work.

One convenient thing with all this ties back into the 'updating GFX is a huge pain in the butt', in that between my prior workblog mention of refactoring the render path to be camera-oriented, and the new Systems based entity/component stuff, most all our core render path for a lot of objects is essentially parallel to the old code. Meaning we can actually conveniently sandbox the new stuff out and test it without having to go all-in on a total strip-out and replace until everything is actually ready. This makes the work faster and easier to keep everyone sane :P

I don't believe this will be done for 4.0, but we *might* have an experimental-tag version of it in that for prototyping/testing. If we do, I would suggest fully expecting things to break horribly :P

A much better bet is to expect it for 4.1.

Component Networking
Another recent improvement was rewriting how components networked. Originally, I had components inherit off of NetObject, and were ghosted down if the component needed to do stuff on the client, such as rendering. However, this became pretty complicated when you had components that had dependencies(the animation component requires a mesh component to exist and be added/loaded to work), and relying on delay-detection ghosting voodoo was a bad time.

So I reworked it all. Now, components don't ghost down at all. Instead, entities are in charge of managing a component's network behavior. If a component is flagged for being networked, the entity will network across the type of component to be created, and will spool up an instance on the client side and do the adding automatically. Then for any networked components, the entity will get the network data from the components and integrate it into the entity's network stream.

This keeps components having a full netmask so we only have to send the data we actually updated, but is cleaner and more straightforward than relying on them being ghosted along like regular NetObjects. This removes the uncertainty and has helped stabilize things a lot. From here, we can further refine how much data we need to network across to make things even more efficient, but this is a nice boost. Currently, if nothing is happening (nothing is moving/updating), an entity and its components consume 0 bandwidth, which is pretty excellent.
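As a simplified picture of an entity folding its components' masked updates into one stream (this is not Torque's BitStream or NetObject API), the sketch below uses a vector of integers as the "stream". `NetComponent`, `NetEntity` and the two mask bits are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical networked component with two fields, each guarded by a mask bit.
class NetComponent {
public:
    enum : std::uint32_t { PositionMask = 1u << 0, HealthMask = 1u << 1 };

    void setPosition(std::int32_t x) { if (x != mX) { mX = x; mDirty |= PositionMask; } }
    void setHealth(std::int32_t h) { if (h != mHealth) { mHealth = h; mDirty |= HealthMask; } }
    bool isDirty() const { return mDirty != 0; }

    // Appends the mask, then only the fields whose bits are set.
    void pack(std::vector<std::int32_t>& stream) {
        assert(isDirty());
        stream.push_back(static_cast<std::int32_t>(mDirty));
        if (mDirty & PositionMask) stream.push_back(mX);
        if (mDirty & HealthMask) stream.push_back(mHealth);
        mDirty = 0;
    }

private:
    std::int32_t mX = 0;
    std::int32_t mHealth = 100;
    std::uint32_t mDirty = 0;
};

// The entity owns the stream and folds each dirty component into it.
class NetEntity {
public:
    // std::deque keeps earlier references valid as components are added.
    NetComponent& addComponent() { mComponents.emplace_back(); return mComponents.back(); }

    // Layout per dirty component: [component index][mask][dirty fields...].
    std::vector<std::int32_t> packUpdate() {
        std::vector<std::int32_t> stream;
        for (std::size_t i = 0; i < mComponents.size(); ++i) {
            if (!mComponents[i].isDirty()) continue;
            stream.push_back(static_cast<std::int32_t>(i));
            mComponents[i].pack(stream);
        }
        return stream;
    }

private:
    std::deque<NetComponent> mComponents;
};
```

When nothing has changed, `packUpdate` returns an empty stream, which is the zero-bandwidth idle case mentioned above.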

Asset Browser
I've been working on this for a good while now, but the initial pass is almost done. Was hoping to get the last main issues wrapped this weekend, but the holidays predictably slowed everything down. My current goal is to get this wrapped and PR'd by the end of the week.

Beyond all that, there's a good bit of stuff we haven't really even bitten into from the LiF guys regarding other improvements, some examples of shaders they wrote we can port some math from, etc., etc. I'd also mentioned it before, but I'm hashing out a new, proper demo/map that we can use for testing stuff, which we should be able to see more of in the coming weeks. @[user mention] also got reflection probes working in OpenGL for the PBR stuff as well, which is fantastic.

So yeah, whooooole crap-ton of really interesting work that's really going to give T3D a hearty boost performance and functionality wise while making stuff easier to maintain. Good times :D

Peace out, and have an excellent New Year!

(Also, if you think all this development is pretty neat, maybe consider tossing a bit of support our way via my Patreon)
Posts: 209
Joined: Tue Feb 03, 2015 10:30 pm
by Steve_Yorkshire » Tue Dec 26, 2017 9:40 pm
Wew, lad that's a lot of stuff! ∑(´0ω0`*)


Hope everyone managed to find time to gorge on dead ground bird whilst they were beavering away!

This will lead to lower load times, less load stutter

Hurray for less load stutter! I know that AFX code has an option to load ALL the datablocks, but that does seem a little overkill if you really do have a few gorrilian datablocks.
Posts: 200
Joined: Fri Feb 13, 2015 2:51 am
by Jason Campbell » Tue Dec 26, 2017 10:50 pm
Thank you for the update Jeff, sounds amazing. Thank you to the whole team!
Posts: 388
Joined: Tue Feb 03, 2015 9:50 pm
by Azaezel » Tue Dec 26, 2017 10:54 pm
Steve_Yorkshire wrote:Wew, lad that's a lot of stuff! ∑(´0ω0`*)


Hope everyone managed to find time to gorge on dead ground bird whilst they were beavering away!

This will lead to lower load times, less load stutter

Hurray for less load stutter! I know that AFX code has an option to load ALL the datablocks, but that does seem a little overkill if you really do have a few gorrilian datablocks.

Ironically, more 'just load the file' vs 'ok, so transmit all this stuff over teh intarwebs' (with some validation-checking) is pretty much the deal we're looking at where it'd be sane to do so to save some of that load time.
Posts: 29
Joined: Thu Jun 23, 2016 12:02 pm
by damik » Wed Dec 27, 2017 10:40 am
Steering Committee
Posts: 776
Joined: Tue Feb 03, 2015 9:49 pm
by JeffR » Wed Dec 27, 2017 5:39 pm
Azaezel wrote:
Steve_Yorkshire wrote:Wew, lad that's a lot of stuff! ∑(´0ω0`*)


Hope everyone managed to find time to gorge on dead ground bird whilst they were beavering away!

This will lead to lower load times, less load stutter

Hurray for less load stutter! I know that AFX code has an option to load ALL the datablocks, but that does seem a little overkill if you really do have a few gorrilian datablocks.

Ironically, more 'just load the file' vs 'ok, so transmit all this stuff over teh intarwebs' (with some validation-checking) is pretty much the deal we're looking at where it'd be sane to do so to save some of that load time.

To expand on this: when we go to load a level (principally, when connecting to a server), the server has a list of datablocks it has loaded, which is then transmitted down to the client. Especially if you figure on the AFX ability to preload everything to save time loading off the disk, this can mean that while the actual on-the-fly load times are lessened, you can see a big jump in the level load times during the 'Transmitting Datablocks' portion.

With assets, we're looking to have a simpler approach. When a client connects to the server(either the single player or multiplayer) the server throws along a hashed list of assets that it expects the client to have for everything to work. The client can then compare that to the assets it has and ensure they are loaded. This is a lot leaner network-load wise as well as being quicker. If we can expect and require the client to have all the required assets at the time of connection, we don't have to always re-send all that data when they go to load. Thus faster.
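A hashed asset list comparison could look something like the following. The post doesn't say which hash or wire format is planned, so this sketch assumes FNV-1a over the asset contents and a simple id-to-hash map; `AssetManifest`, `buildManifest` and `findMismatches` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// FNV-1a, 64-bit: a simple, deterministic, non-cryptographic hash.
inline std::uint64_t fnv1a(const std::string& bytes) {
    std::uint64_t hash = 0xcbf29ce484222325ull;  // FNV offset basis
    for (unsigned char c : bytes) {
        hash ^= c;
        hash *= 0x100000001b3ull;                // FNV prime
    }
    return hash;
}

// Asset id -> content hash. This is all the server needs to send.
using AssetManifest = std::map<std::string, std::uint64_t>;

inline AssetManifest buildManifest(const std::map<std::string, std::string>& assetContents) {
    AssetManifest manifest;
    for (const auto& entry : assetContents) {
        assert(!entry.first.empty());  // asset ids must be non-empty
        manifest[entry.first] = fnv1a(entry.second);
    }
    return manifest;
}

// Returns the ids the client is missing or holds with different content.
inline std::vector<std::string> findMismatches(const AssetManifest& server,
                                               const AssetManifest& client) {
    std::vector<std::string> problems;
    for (const auto& entry : server) {
        auto it = client.find(entry.first);
        if (it == client.end() || it->second != entry.second)
            problems.push_back(entry.first);
    }
    return problems;
}
```

An empty result means the client already has everything, so nothing beyond the small manifest has to cross the network. A shipping version would likely want a stronger hash if the check is also meant to catch tampering.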

Assets are internally refcounted, so as long as they're in use, they're loaded and ready, which is where the lessened load stutter comes in as well. If the client - on connection as per above - has all the correct assets already and then loads them on command because the server says "hey, this is what all we're using for this level", that means we can load during the loading screen (if the assets aren't already loaded) and thus further skip a lot of that 'last second load' that leads to the camera hitches and the like.
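The refcounting behavior can be approximated with `std::shared_ptr` and `std::weak_ptr`. This is only an analogy for how the asset system might behave, not its real implementation; `Asset` and `AssetCache` are hypothetical, and a counter stands in for the actual disk load.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Asset {
    std::string id;
};

// Hands out shared handles; an asset stays loaded while any handle is alive.
class AssetCache {
public:
    std::shared_ptr<Asset> acquire(const std::string& id) {
        assert(!id.empty());
        std::weak_ptr<Asset>& slot = mLoaded[id];
        if (std::shared_ptr<Asset> existing = slot.lock())
            return existing;  // already resident: no disk hit
        ++mLoadCount;         // stands in for the real disk load
        std::shared_ptr<Asset> fresh = std::make_shared<Asset>();
        fresh->id = id;
        slot = fresh;
        return fresh;
    }

    bool isLoaded(const std::string& id) const {
        auto it = mLoaded.find(id);
        return it != mLoaded.end() && !it->second.expired();
    }

    // How many times an asset actually had to be loaded from scratch.
    int loadCount() const { return mLoadCount; }

private:
    std::map<std::string, std::weak_ptr<Asset>> mLoaded;
    int mLoadCount = 0;
};
```

If the loading screen acquires every asset on the server's list and holds the handles for the level's lifetime, any later request during gameplay hits the cache instead of the disk.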

So overall, it should be a pretty neat bit. Lessened loadtimes, lessened network impact, and yet still only bothering with the stuff that's actually in-use.
Posts: 26
Joined: Tue Feb 10, 2015 8:12 pm
by andi_s » Fri Dec 29, 2017 6:21 pm
JeffR and team, really cool progress!
So with 4.1, will you move away from DirectX and basically go for OpenGL/Vulkan only? And when do you expect 4.0 and 4.1 to be done? Is the PBR renderer already done or will this be 4.0 also?
I played around with lots of other engines in the past months, years, but somehow I get drawn back to Torque again and again, and it's great to see that it's still alive and progressing nicely.

Keep it going, really cool!!
Steering Committee
Posts: 776
Joined: Tue Feb 03, 2015 9:49 pm
by JeffR » Mon Jan 01, 2018 8:35 am
We're currently looking primarily at the GL/Vulkan, but DirectX has some value and may be worth tying into.

That said, the APIs have 99% feature parity, and if/when we add UWP support later, it actually will properly support OGL/Vulkan mode, so we don't even need DirectX for that, either.

So, in short, potentially, but it's not the focus yet, as the OGL/V target supports the widest range of hardware and OSes and doesn't lose any feature functionality. About the only real potential upheaval is that the shader emphasis we have currently is on HLSL, so we'd have to figure out what to do about that (as HLSL is a touch nicer to operate with than GLSL due to a few reasons).

We'll see though :)

PBR is largely done. Timmy's been on vacation, but he'll be back after the new year and we're gunna wrap up optimizations to the PBR stuff with cleanup, trimming down the gBuffer and fixing up bugs, but the bulk work is done.
It should go in in the near future.

I'll be putting up a post discussing the major points of development to hit yet for 4.0 as we haven't really bothered with a roadmap for a while because you guys just read my workblogs and have an idea of where things are.
For a release date, I'm unsure just yet. I'd hazard Q2 of 2018? PBR is close to being done, as is the E/C and Asset stuff. Those are the really big bullet points.

When those are in, it's mostly just porting the game classes to components, fixes, cleanup with a side of updated tools.

Once 4.0 drops, the plan is to adopt a more aggressive update schedule. Since the huge upheavals will be largely concluded(outside of GFX3), we can emphasize smaller updates and thus faster turnaround times.
Posts: 70
Joined: Sat Feb 07, 2015 1:29 am
by HeadClot » Fri Jan 05, 2018 6:16 am
I have a bunch of questions about the upcoming releases of T3D 4.0 and beyond.

1. Is it safe to assume that with the Vulkan wrapper that we would be able to target mobile platforms such as Android and possibly iOS?
2. Are there any plans for any networking improvements slated for T3D? I ask this as I am looking at making some form of large scale persistent online game with T3D.
3. Are there any plans for Visual Scripting? (Similar to UE4 Blueprints or Lumberyard Script Canvas)
4. Will there be any form of node based material or Shader editor?
5. Are there any pre-compiled experimental branches that I could check out?

If you cannot tell I am excited for 4.0 and beyond. :)