What's that guy, Lukas, doing? ParticleSystem Rewrite, C# and TS IDE's

14 posts Page 1 of 2
LukasPJ
Site Admin
Posts: 388
Joined: Tue Feb 03, 2015 7:25 pm
 
by LukasPJ » Tue Jul 07, 2015 8:44 pm

ParticleSystem Rewrite

Az has been poking me a lot lately about the ParticleSystem "Refactor" (we decided to rename it to Rewrite, as that is what it actually is).

So we worked a bit on it, and I have added a lot more info to the wiki page. Now you can find information on what it is, the different components, and how to port your old datablocks to the new ones, including these extensive maps.

If you're interested in the ParticleSystem Rewrite (we really need an abbreviation for that), you should totally check out the wiki page! It should give you all the information you need.

Torque6 C# scripting language

I've also been working on scripting for Torque6 with C#. This has included manually writing a C-style interface for all the console methods and functions.
The resulting C# scripts look somewhat like this:
[ConsoleClass]
internal class AnimatedMeshExample
{
   public static void create()
   {
      // Create some dwarfs!
      SceneEntity entity1 = new SceneEntity
      {
         Template = "^AnimatedMeshExample/entities/bigDwarfRedDwarf.taml",
         Position = Point3F.Zero(),
         Rotation = Point3F.Zero()
      };

      // Register the object! (Similar to how you do it in C++)
      entity1.RegisterObject();

      // Add it to the scene!
      Scene.AddEntity(entity1, "Dwarf Meshes");

      // Let there be lights!
      SceneEntity light1 = new SceneEntity
      {
         Template = "^AnimatedMeshExample/entities/lightTest2.taml",
         Position = Point3F.Zero()
      };
      light1.RegisterObject();
      Scene.AddEntity(light1, "Lights");

      // More lights I guess
      Scene.SetDirectionalLight(new Point3F(1, 1, -1), new Color(0.8f, 0.8f, 0.8f), new Color(0.1f, 0.1f, 0.1f));
   }

   public static void destroy()
   {
      Console.Print("DESTROYED AnimatedMeshExample MODULE");
   }
}
Which is equivalent to this TorqueScript file.

I think it's pretty cool, and the C interface could be re-used for other languages as well, making it easier to integrate your own scripting language on top of TorqueScript.
I'm still working on how to make it work as a module and such, but it's nearing completion!

TorqueScript IDE

As @andrewmac also noted in his latest post, I've also been working on writing a lexer for syntax highlighting for his embedded scripting IDE for Torque6:
Image
I've even added some basic syntax-checking capabilities (although since it's not complete, I'd rather call it syntax-assisting capabilities).
Image
It's not completely complete, but the fundamentals are there.

Thanks for reading

Been a while since I've written a post, thought I might just bring y'all up to speed with what I'm doing. So thanks for reading my summary-type post, and I do hope you check out the ParticleSystem Rewrite and the wiki pages and give me some feedback!
buckmaster
Steering Committee
Posts: 321
Joined: Thu Feb 05, 2015 1:02 am
by buckmaster » Wed Jul 08, 2015 2:33 pm
I saw all the wiki activity on those pages - and wow, those maps are really nice.

This is something I never really had the wherewithal to ask you back when you first started all this work, but: how does the particle rewrite affect performance? Are you being nice to the cache when going through all these virtualised interfaces? Doing lots of tight iterations over packed arrays and so on?
by LukasPJ » Wed Jul 08, 2015 4:12 pm
Maybe @Azaezel can come with some input here? He's a bit more clever on the cache miss thing etc.
It should be possible to reduce the number of virtual calls, but it would muddy up the code a bit. Any idea on how to benchmark it? I could try spawning a thousand emitters or something like that, but my experience tells me that T3D is a PITA to benchmark.
Tbh I'm more used to the managed world, where virtualization is more a matter of course than it is in the C++ world. So I don't have any experience with how virtualization affects performance.
Azaezel
Posts: 410
Joined: Tue Feb 03, 2015 9:50 pm
 
by Azaezel » Wed Jul 08, 2015 4:52 pm
General rule of thumb is to group as much as humanly possible of everything relevant to a particular step into one class, so it's not darting around looking into several classes. Be that the aspects you've got now, or a proxy, like the ObjectRenderInst subsystem. Generally speaking you seem to be on the right track there with the Renderer = for render lookup, and ParticleSystem/behaviours for physics. Will need to dig around a bit more on the particlepool to be 100% sure, but that too seems to be the right general direction.

Edit: I can say for stuff like TSSkinMesh* sMesh = dynamic_cast<TSSkinMesh*>(mesh); you'll want to make those one-time deals as much as possible, though that's later down the line.
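
For anyone following along, here is a minimal sketch of what making the cast a "one-time deal" could look like: resolve the type once when the mesh is assigned, then reuse the cached pointer in the per-particle path. The types Mesh, SkinMesh and MeshEmitterSketch are hypothetical stand-ins, not Torque's actual classes, and the real MeshEmitter may be organised quite differently.

```cpp
#include <cassert>

// Hypothetical stand-ins for the engine's mesh types (not Torque's real classes).
struct Mesh { virtual ~Mesh() = default; };
struct SkinMesh : Mesh { int boneCount = 4; };

// Sketch of an emitter that resolves the mesh type once instead of per particle.
class MeshEmitterSketch {
public:
    // The dynamic_cast runs here, once, whenever the mesh changes.
    void setMesh(Mesh* mesh) {
        assert(mesh != nullptr);
        mSkinMesh = dynamic_cast<SkinMesh*>(mesh);  // nullptr if the mesh is not skinned
        ++mCastCount;
    }

    // Called many times per frame; it only reads the cached pointer.
    int bonesForParticle() const {
        return mSkinMesh ? mSkinMesh->boneCount : 0;
    }

    // How many casts have been performed so far (for illustration only).
    int castCount() const { return mCastCount; }

private:
    SkinMesh* mSkinMesh = nullptr;
    int mCastCount = 0;
};
```

The hot path still has a null check, but the RTTI lookup itself no longer happens per particle.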
andrewmac
Posts: 295
Joined: Tue Feb 03, 2015 9:45 pm
 
by andrewmac » Wed Jul 08, 2015 5:12 pm
I haven't looked too deep into the particle system, but at first glance it looks like it's using a linked list? That's the least cache-friendly approach I can think of :P. Either use a vector or preallocate an array of particles to utilize. Iteration will be much faster and avoid cache-misses.

If you do use a vector though, consider preallocating with that as well. Torque's vector class has a pretty weak resizing algorithm:

https://github.com/GarageGames/Torque3D ... #L618-L622

https://github.com/GarageGames/Torque3D ... #L329-L336

You could argue it's optimized to use less memory, but I'd argue that it's resizing and relocating the set for every item you add, so if you populate it with 500 particles it's going to allocate a larger chunk and copy all the items there every time you push_back. I believe std::vector grows geometrically (commonly doubling its capacity) to reduce the number of allocations/copies. This can be avoided by just reserving space in the vector before adding items.
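
To make the preallocation point concrete, here is a small sketch that counts how often the storage moves while filling a vector, with and without reserving up front. It uses std::vector rather than Torque's Vector class, so it only illustrates the general idea; the Particle struct and the countReallocations helper are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle layout, only here to give the vector something to store.
struct Particle { float pos[3]; float vel[3]; float age; };

// Counts how often the vector's capacity changed while adding `count` particles.
// If `reserveFirst` is true, the capacity is requested once up front.
std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::vector<Particle> particles;
    if (reserveFirst) {
        particles.reserve(count);
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = particles.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        particles.push_back(Particle{});
        if (particles.capacity() != lastCapacity) {
            ++reallocations;                 // storage was reallocated and copied
            lastCapacity = particles.capacity();
        }
    }
    assert(particles.size() == count);
    return reallocations;
}
```

With reserveFirst set, the loop never reallocates. Without it, the count depends on the standard library's growth factor, which is implementation-defined.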
by LukasPJ » Wed Jul 08, 2015 8:26 pm
@andrewmac
Actually, it already uses a pre-allocated vector for the particle pool, but the particles inside that vector are traversed using a linked list. I'll try and change that to a vector instead. I wonder how big an impact that has on performance :P
by LukasPJ » Sat Jul 11, 2015 5:59 pm
Azaezel wrote:
General rule of thumb is to group as much as humanly possible of everything relevant to a particular step into one class, so it's not darting around looking into several classes. Be that the aspects you've got now, or a proxy, like the ObjectRenderInst subsystem. Generally speaking you seem to be on the right track there with the Renderer = for render lookup, and ParticleSystem/behaviours for physics. Will need to dig around a bit more on the particlepool to be 100% sure, but that too seems to be the right general direction.

Edit: I can say for stuff like TSSkinMesh* sMesh = dynamic_cast<TSSkinMesh*>(mesh); you'll want to make those one-time deals as much as possible, though that's later down the line.

I've removed all but one dynamic_cast in MeshEmitter, so that shouldn't be an issue anymore :P

About the linked-list stuff, we had a long discussion about cache-misses vs. branch prediction failure and came up with a couple of alternative solutions to traversing the particle list.

One was to pack the particles in memory, so they would always be beside each other in memory.
- This could get costly because you'd potentially have to do it each frame.

Another was to simply run over the whole array from start to finish.
- This is where the branch prediction fail vs cache miss comes in.
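
The two options above can be sketched roughly as follows. This is only an illustration under assumptions: the Particle fields are invented, and the packed variant uses swap-with-last removal, which is a common trick for keeping the packing cheap rather than something that was settled on in the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical particle; the real one carries far more data.
struct Particle {
    float age = 0.0f;
    float lifetime = 1.0f;
    bool alive = true;
};

// Option 1: keep live particles packed at the front of the pool.
// A dead particle is swapped with the last live one, so removal is O(1)
// and ordering is not preserved. Returns the new live count.
std::size_t updatePacked(std::vector<Particle>& pool, std::size_t liveCount, float dt) {
    assert(liveCount <= pool.size());
    std::size_t i = 0;
    while (i < liveCount) {
        pool[i].age += dt;
        if (pool[i].age >= pool[i].lifetime) {
            pool[i].alive = false;
            std::swap(pool[i], pool[liveCount - 1]);
            --liveCount;  // slot i now holds a not-yet-updated particle, so revisit it
        } else {
            ++i;
        }
    }
    return liveCount;
}

// Option 2: walk the whole pool and skip dead slots.
// Nothing is moved, but every slot costs a branch. Returns the live count.
std::size_t updateFullScan(std::vector<Particle>& pool, float dt) {
    std::size_t liveCount = 0;
    for (Particle& p : pool) {
        if (!p.alive) continue;
        p.age += dt;
        if (p.age >= p.lifetime) {
            p.alive = false;
        } else {
            ++liveCount;
        }
    }
    return liveCount;
}
```

Which one wins depends on how many dead slots there are and how predictable the alive check is, so it really needs to be measured.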

I never got around to actually implementing any of them, because my profiler began failing me, and I haven't got it running since ^^
by buckmaster » Sun Jul 12, 2015 7:15 am
Here's some useful background info on why I asked the question. You may be able to sidestep the cache miss versus branch fail question if you partition your data. For example, instead of looping over a bunch of particles and then doing an if() on some condition, store two lists of particles, one for each branch of the if(), and just blaze through each of them with the appropriate function. Obviously this doesn't apply to all situations, but it's a hint at the kind of architecture that performs best. Don't cause a cache miss OR a branch.

(Note that this applies to inner loops. Obviously, in that case, you might be doing the branch when you insert the item, to figure out which list to put it in, instead of doing the branch every time you loop.)
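
A rough sketch of the partitioning idea, assuming the condition is something like "is this particle affected by gravity". The class and field names are hypothetical and only there to show the shape of it: the branch happens once in add(), and each update loop runs a single code path over one packed array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle with just a height and a vertical velocity.
struct Particle {
    float y = 0.0f;
    float vy = 0.0f;
};

// Sketch of a pool that decides the branch once, at insert time.
class PartitionedParticles {
public:
    void add(const Particle& p, bool affectedByGravity) {
        (affectedByGravity ? mFalling : mFloating).push_back(p);
    }

    // No per-particle if() in either loop.
    void update(float dt) {
        assert(dt >= 0.0f);
        for (Particle& p : mFalling) {
            p.vy -= 9.81f * dt;
            p.y += p.vy * dt;
        }
        for (Particle& p : mFloating) {
            p.y += p.vy * dt;
        }
    }

    const std::vector<Particle>& falling() const { return mFalling; }
    const std::vector<Particle>& floating() const { return mFloating; }

private:
    std::vector<Particle> mFalling;
    std::vector<Particle> mFloating;
};
```

If particles can change category during their lifetime, moving them between the lists adds its own cost, so this fits best when the condition is fixed at spawn.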

If you're in for lots of confusion and being raged at unpleasantly, this presentation is good at raising questions to look into.

I was going to recommend using the built-in profiler, but then #1349 happened D:
by LukasPJ » Sun Jul 12, 2015 5:12 pm
Alright, did a bunch of profiling and got some interesting results.
The most intensive function is copyToVB with 83% (100 emitters, emitting 1000 particles per second with 1000ms lifetime ~ 100,000 particles).
Digging deeper in there, you get that setupBillboard is 80%, and a single lerp function inside is 45%. (function is here).

That lerp function uses getParticleColor which is 33%, so it's not everything that's caused by this function. But a lot of performance goes into this one.
Looking closer at getParticleColor (see it here), we see that the actual interpolation takes only about 6% of the performance while the rest of the function, loop and fetching of data takes the remaining 26% with the outer loop alone taking up 10% of the performance.

Haha in a strange way I love this! It's so interesting! I'll try and bring those numbers down, so more time is spent on more reasonable things. I introduced getParticleColor as an alternative to subclassing the "Particle" class with billboard-specific data. This could quite possibly be a culprit for a lot of cache misses and branch prediction fails.

After that I'll look at the impact of the linked list and the cache misses there and optimize that, using that article you linked, @buckmaster. What he did there seems very much like the thing I had in mind already.

Edit: oh btw, kudos to @GuyA for the fantastic graph representation!

Attachment: standardRewrite.png
by buckmaster » Mon Jul 13, 2015 2:57 am
Haha cool stuff! I wish I had the time and effort to dig into stuff like this these days. I think I need to take a holiday sometime soon and get my C++ back on :p.