Profiling Multithreaded Unity3d Applications

Profiler

The Profiler class has the following public method interface that your application threads can access.

using System.Threading;
 
using UnityEngine;
 
namespace Amesgames.Behaviours
{
	public class UnityProfilerExceptionBehaviour : MonoBehaviour
	{
		void Start()
		{
			new Thread(new ThreadStart(Work)).Start();
		}
 
		void Work()
		{
			UnityEngine.Profiler.BeginSample("test");
			UnityEngine.Profiler.EndSample();
		}
	}
}

Each thread can and should set its unique name using SetThreadName(). The main thread should do this, as well. In the demonstration code, I am using “main” for the name of the main thread. This is set from a MonoBehaviour Start() method. The profiled worker thread class will set the name of its thread when the thread starts.

The main thread is the only thread that should ever invoke DemarcateFrame(). This signals the profiler that it is done capturing data for the previous frame and is starting the next frame. This is called after the main thread has waited for every other thread to signal that they are done with their frame and just before signaling them to start the next frame. This is the line of demarcation between frames.

Each thread is also responsible for invoking BeginFrameThread(), WaitFrameThread(), and EndFrameThread(). The first is called at the beginning of the thread’s frame; the second is called at the time which the thread is done with its own work and must wait on the other threads to finish; the third is called in order to finish clocking it’s frame’s timing information.

Finally, any thread, including the main thread, can invoke BeginSample() and EndSample() to clock portions of code. These calls can be nested as much as is needed.

This is what our new thread class looks like that makes use of this new Profiler.

using System;
using System.Threading;
 
namespace Amesgames.Threading
{
	public class ProfiledWorkerThread
	{
		public ProfiledWorkerThread(Action step, string name = "")
		{
			this.name = name;
			this.step = step;
 
			thread = new Thread(new ThreadStart(Work));
			thread.Start();
		}
 
		public void Wait()
		{
			ready.WaitOne();
		}
 
		public void Run()
		{
			go.Set();
		}
 
		public void Done()
		{
			done.WaitOne();
		}
 
		public void Finish()
		{
			wait.Set();
		}
 
		string name;
		Action step;
		Thread thread;
 
		// Signals main thread that we are ready to run the next step
		AutoResetEvent ready = new AutoResetEvent(/*initialState=*/false);
 
		// Used by main thread to signal us to run next step
		AutoResetEvent go = new AutoResetEvent(/*initialState=*/false);
 
		// Signals main thread that we are done with last step
		AutoResetEvent done = new AutoResetEvent(/*initialState=*/false);
 
		// Used by main thread to signal us that we can stop waiting for rest
		// of threads.
		AutoResetEvent wait = new AutoResetEvent(/*initialState=*/false);
 
		void Work()
		{
			Profiler.SetThreadName(name);
 
			ready.Set();
 
			go.WaitOne();
 
			while(true)
			{
				Profiler.BeginFrameThread();
 
				try
				{
					step();
				}
				catch(Exception e)
				{
					UnityEngine.Debug.LogException(e);
				}
 
				Profiler.WaitFrameThread();
 
				done.Set();
 
				wait.WaitOne();
 
				Profiler.EndFrameThread();
 
				ready.Set();
 
				go.WaitOne();
			}
		}
	}
}

In this new implementation, we make four calls to Profiler methods to setup the thread’s name and scope each frame’s execution.

Notice that this new thread class has twice as many event handles as did the previous thread class. This is an important detail. This is necessary because each thread is using its own timer for clocking its run times. There is no thread-safe high frequency timer available in the C# System library that we can use to clock the start and end of all threads together. As a result, each thread must submit their end and wait times separately.

Also notice we wrap the call to the step() delegate inside an exception handler.

Following is a truncated version of the MonoBehaviour that handles the main thread’s responsibilities.

namespace Amesgames.Behaviours
{
	public class ProfiledWorkerThreadBehaviour : MonoBehaviour
	{
		public void Start()
		{
			threads.Add(new ProfiledWorkerThread(ShortWork));
			threads.Add(new ProfiledWorkerThread(LongWork));
 
			Profiler.SetThreadName("main");
 
			foreach(ProfiledWorkerThread thread in threads)
				thread.Wait();
		}
 
		public void Update()
		{
			// Clean up the previous frame of the main thread at the top of our
			// Update. This ensures that we capture all of the main thread
			// scripts as run time. Make sure this unity script's execution
			// order is earlier than the others.
			if(profilerInitialized)
			{
				Profiler.WaitFrameThread();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Done();
 
				Profiler.EndFrameThread();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Finish();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Wait();
			}
 
			profilerInitialized = true;
 
			Profiler.DemarcateFrame();
 
			foreach(ProfiledWorkerThread thread in threads)
				thread.Run();
 
			Profiler.BeginFrameThread();
		}
 
		readonly List<ProfiledWorkerThread> threads =
			new List<ProfiledWorkerThread>();
		bool profilerInitialized;
 
		void ShortWork()
		{
		}
 
		void LongWork()
		{
		}
	}
}

The MonoBehaviour script that handles the frame synchronization should have its script execution order earlier than all other scripts. This is to ensure that we are timing all of the main thread work in one frame capture, instead of having it split between two different frames.

Now, we need a mechanism to consume each frame of data in a manner that can be stored and iterated over by a profiler UI. I came up with the following classes to describe each frame’s data. In a production solution, you will find that memory efficient structs or value type arrays are desirable. You will also want to pool them in some fashion. However, for code simplicity and illustration purposes I chose a heavier approach with objects and generic container types. As a result of these choices, this design will suffer from poor performance and memory fragmentation in a medium- to large-sized project.

The following classes and methods are added to the Profiler module to handle frame data consumption.

namespace Amesgames
{
	public interface IProfilerConsumer
	{
		void ReceiveFrame(ProfilerFrame frame);
	}
 
	public class ProfilerSamples
	{
		public readonly List<ProfilerSample> samples = new List<ProfilerSample>();
	}
 
	public class ProfilerSample : ProfilerSamples
	{
		public string name;
		public long beginTicks;
		public long endTicks;
 
		public ProfilerSample(string name, long ticks)
		{
			this.name = name;
			beginTicks = ticks;
		}
	}
 
	public class ProfilerThread : ProfilerSamples
	{
		public string name;
		public long startTicks;
		public long waitTicks;
		public long endTicks;
 
		public ProfilerThread(string name)
		{
			this.name = name;
		}
	}
 
	public class ProfilerFrame
	{
		public readonly List<ProfilerThread> threads = new List<ProfilerThread>();
	}
 
	public class Profiler
	{
		public static void AddConsumer(IProfilerConsumer consumer)
		{
			if(!consumers.Contains(consumer))
				consumers.Add(consumer);
		}
 
		public static void RemoveConsumer(IProfilerConsumer consumer)
		{
			if(consumers.Contains(consumer))
				consumers.Remove(consumer);
		}
 
		// Rest of Profiler class omitted
	}
}

We now have a IProfilerConsumer interface we can implement in order to capture frames of data from the Profiler class. The ReceiveFrame() method of that interface is given a ProfilerFrame instance which contains a list of ProfilerThread instances, which contain a tree of ProfilerSample instances.

Each ProfilerThread has the time the thread spent running as well as the time the thread spent waiting on the frame to finish. These values are stored in the fields for startTicks, waitTicks, and endTicks, respectively. Each ProfilerSample has its running time as well as the running times of the sample children. Each sample in the tree consists of beginTicks and endTicks.

Putting all of this together, we have the following complete Profiler and ProfiledWorkerThreadBehaviour modules.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
 
namespace Amesgames
{
	public interface IProfilerConsumer
	{
		void ReceiveFrame(ProfilerFrame frame);
	}
 
	public class ProfilerSamples
	{
		public readonly List<ProfilerSample> samples = new List<ProfilerSample>();
	}
 
	public class ProfilerSample : ProfilerSamples
	{
		public string name;
		public long beginTicks;
		public long endTicks;
 
		public ProfilerSample(string name, long ticks)
		{
			this.name = name;
			beginTicks = ticks;
		}
 
		public float runMs {
			get {
				if(endTicks > beginTicks)
					return (endTicks - beginTicks) / (float)Stopwatch.Frequency;
				return 0.0f;
			}
		}
	}
 
	public class ProfilerThread : ProfilerSamples
	{
		public string name;
		public long startTicks;
		public long waitTicks;
		public long endTicks;
 
		public ProfilerThread(string name, long ticks)
		{
			this.name = name;
			this.startTicks = ticks;
		}
 
		public float runMs {
			get {
				if(waitTicks > startTicks)
					return (waitTicks - startTicks) / (float)Stopwatch.Frequency;
				return 0.0f;
			}
		}
 
		public float waitMs {
			get {
				if(endTicks > waitTicks)
					return (endTicks - waitTicks) / (float)Stopwatch.Frequency;
				return 0.0f;
			}
		}
 
		public float totalMs {
			get {
				if(endTicks > startTicks)
					return (endTicks - startTicks) / (float)Stopwatch.Frequency;
				return 0.0f;
			}
		}
	}
 
	public class ProfilerFrame
	{
		public readonly List<ProfilerThread> threads = new List<ProfilerThread>();
	}
 
	public class Profiler
	{
		public static void AddConsumer(IProfilerConsumer consumer)
		{
			if(!consumers.Contains(consumer))
				consumers.Add(consumer);
		}
 
		public static void RemoveConsumer(IProfilerConsumer consumer)
		{
			if(consumers.Contains(consumer))
				consumers.Remove(consumer);
		}
 
		public static void SetThreadName(string name)
		{
			if(name != null && name.Length > 0)
				Name = name;
		}
 
		public static void DemarcateFrame()
		{
			if(frame != null)
			{
				lock(samplesByThreadId)
				{
					foreach(ProfilerThread thread in samplesByThreadId.Values)
						frame.threads.Add(thread);
 
					samplesByThreadId.Clear();
				}
 
				foreach(IProfilerConsumer consumer in consumers)
					consumer.ReceiveFrame(frame);
			}
 
			frame = new ProfilerFrame();
		}
 
		public static void BeginFrameThread()
		{
			ProfilerThread thread;
			lock(samplesByThreadId)
			{
				thread = new ProfilerThread(Name, Stopwatch.ElapsedTicks);
				samplesByThreadId.Add(Thread.CurrentThread.ManagedThreadId, thread);
			}
			Samples.Push(thread);
		}
 
		public static void BeginSample(string name)
		{
			Samples.Push(new ProfilerSample(name, Stopwatch.ElapsedTicks));
		}
 
		public static void EndSample()
		{
			ProfilerSample profilerSample = (ProfilerSample)Samples.Pop();
			profilerSample.endTicks = Stopwatch.ElapsedTicks;
			Samples.Peek().samples.Add(profilerSample);
		}
 
		public static void WaitFrameThread()
		{
			lock(samplesByThreadId)
			{
				int threadId = Thread.CurrentThread.ManagedThreadId;
				ProfilerThread thread = samplesByThreadId[threadId];
				thread.waitTicks = Stopwatch.ElapsedTicks;
			}
		}
 
		public static void EndFrameThread()
		{
			lock(samplesByThreadId)
			{
				int threadId = Thread.CurrentThread.ManagedThreadId;
				ProfilerThread thread = samplesByThreadId[threadId];
				thread.endTicks = Stopwatch.ElapsedTicks;
			}
			Samples.Pop();
		}
 
		static ProfilerFrame frame;
		static readonly Dictionary<int, ProfilerThread> samplesByThreadId =
			new Dictionary<int, ProfilerThread>();
 
		[ThreadStatic]
		static Stack<ProfilerSamples> _samples;
 
		static Stack<ProfilerSamples> Samples {
			get {
				if(_samples == null)
					_samples = new Stack<ProfilerSamples>();
				return _samples;
			}
		}
 
		static Stopwatch Stopwatch {
			get {
				if(_stopwatch == null)
				{
					_stopwatch = new Stopwatch();
					_stopwatch.Start();
				}
				return _stopwatch;
			}
		}
 
		[ThreadStatic]
		static Stopwatch _stopwatch;
 
		static string Name {
			get {
				if(_name == null)
					_name = string.Format("thread {0}",
						Thread.CurrentThread.ManagedThreadId);
				return _name;
			}
			set {
				_name = value;
			}
		}
 
		[ThreadStatic]
		static string _name;
 
		static readonly List<IProfilerConsumer> consumers = new List<IProfilerConsumer>();
	}
}
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
 
using UnityEngine;
using UnityEngine.UI;
 
using Amesgames.Threading;
 
namespace Amesgames.Behaviours
{
	public class ProfiledWorkerThreadBehaviour : MonoBehaviour, IProfilerConsumer
	{
		public void Start()
		{
			threads.Add(new ProfiledWorkerThread(ShortWork, "short"));
			threads.Add(new ProfiledWorkerThread(LongWork, "long"));
 
			text = GameObject.FindObjectOfType<Text>();
 
			Profiler.AddConsumer(this);
			Profiler.AddConsumer(graphic);
 
			Profiler.SetThreadName("main");
 
			foreach(ProfiledWorkerThread thread in threads)
				thread.Wait();
		}
 
		public void Update()
		{
			// Clean up the previous frame of the main thread at the top of our
			// Update. This ensures that we capture all of the main thread
			// scripts as run time. Make sure this unity script's execution
			// order is earlier than the others.
			if(profilerInitialized)
			{
				Profiler.WaitFrameThread();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Done();
 
				Profiler.EndFrameThread();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Finish();
 
				foreach(ProfiledWorkerThread thread in threads)
					thread.Wait();
			}
 
			profilerInitialized = true;
 
			Profiler.DemarcateFrame();
 
			foreach(ProfiledWorkerThread thread in threads)
				thread.Run();
 
			Profiler.BeginFrameThread();
 
			Profiler.BeginSample("update");
 
			lock(stepCountByThreadId)
			{
				text.text += "\n";
				foreach(KeyValuePair<int, int> kv in stepCountByThreadId)
				{
					Profiler.BeginSample("text");
 
					text.text += "thread " + kv.Key + " count " + kv.Value + "\n";
 
					Profiler.EndSample();
				}
			}
 
			CountAndSleep(25);
 
			Profiler.EndSample();
		}
 
		public void ReceiveFrame(ProfilerFrame frame)
		{
			text.text = "";
			ProfilerThread[] threads = frame.threads.ToArray();
			Array.Sort(threads, (x, y) => x.name.CompareTo(y.name));
			foreach(ProfilerThread thread in threads)
			{
				text.text += thread.name + ": run: " +
					thread.runMs.ToString() + " ms wait: " +
					thread.waitMs.ToString() + " ms\n";
 
				ReceiveSamples(thread, 2);
			}
		}
 
		readonly Dictionary<int, int> stepCountByThreadId =
			new Dictionary<int, int>();
		Text text;
		readonly List<ProfiledWorkerThread> threads =
			new List<ProfiledWorkerThread>();
		bool profilerInitialized = false;
 
		void ReceiveSamples(ProfilerSamples samples, int indent)
		{
			foreach(ProfilerSample sample in samples.samples)
				ReceiveSample(sample, indent + 2);
		}
 
		void ReceiveSample(ProfilerSample sample, int indent)
		{
			for(int i = 0; i < indent; i++)
				text.text += " ";
 
			text.text += sample.name + ": " + sample.runMs + " ms\n";
 
			ReceiveSamples(sample, indent);
		}
 
		void ShortWork()
		{
			CountAndSleep(10);
		}
 
		void LongWork()
		{
			CountAndSleep(50);
		}
 
		void CountAndSleep(int milliseconds)
		{
			Profiler.BeginSample("count and sleep");
 
			Profiler.BeginSample("count");
 
			lock(stepCountByThreadId)
			{
				int threadId = Thread.CurrentThread.ManagedThreadId;
				int count;
				if(stepCountByThreadId.TryGetValue(threadId, out count))
					stepCountByThreadId[threadId] = count + 1;
				else
					stepCountByThreadId.Add(threadId, 1);
			}
 
			Profiler.EndSample();
 
			Profiler.BeginSample("sleep");
 
			Thread.Sleep(milliseconds);
 
			Profiler.EndSample();
 
			Profiler.EndSample();
		}
	}
}

The following screenshot shows the current state of the demonstration.

MT-profiling-unity-prof-threads

If you study the timings carefully, you will notice the short running thread is reporting extra time waiting on the main and long running threads to finish; on the other hand, the main and long running threads are spending nearly zero time waiting.

Conclusion

Here are the entire set of Unity assets I used for this post: https://bitbucket.org/amesgames/multithreaded-profiling-in-unity. I used Unity 5.3.5 to develop and test the sample scenes and code.

Feel free to use this code as-is in your Unity3d applications. You will certainly want to improve on the display of the timing information, especially when you have more threads doing much more work than you see in this demonstration. With the addition of network services to your application, you can send the frame data over a network connection to a standalone desktop application. A standalone application will do better with the heavy lifting of sorting, filtering, recording and rendering the frame timing data and not slow down your game or simulation quite as much.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.