After three years at Unity I’m leaving the company, and today is my last day. I’m immensely proud of the work I did to make Burst and HPC# so powerful for our users both within and outwith Unity, and to have worked with some really great people along the way.

I fulfilled a lifelong ambition to work closely with some heroes of mine like Andreas Fredriksson, Daniel Collin, Cort Stratton, and many others, and I met some truly amazing people at Unity who I didn’t previously know (Lee Hammerton, Sebastian Schöner, Alex Thibodeau, Tim Jones, Alexandre Mutel, among many others).

Achievements of my Time at Unity

It’s hard to convey how much of a game changer Burst and HPC# have been for Unity and its users. An 8x performance win was considered low with Burst - I frequently remember people being amazed to see 40x performance gains once they had refactored their code for the constraints of HPC#.

When I was interviewing for Unity, the pitch for Burst was that maybe a hundred jobs would be written in HPC# and compiled with Burst: the absolute core logic of the game that needed to run as fast as possible. Burst was so successful, and provided so much performance, that I’ve seen shipping titles with over six thousand Bursted methods - games that could never have shipped without Burst.

Having shipped six versions of Burst during my tenure, I can break my achievements at Unity down into three areas: performance, compile time, and startup time.

Performance

My main job at Unity was to extract every iota of performance out of Burst’s LLVM codepath. I have eleven years of experience with LLVM, and I brought all of that knowledge to Burst.

  • I added a custom alias analysis that used the knowledge from Unity’s job system to make code run faster
  • I completely reworked our LLVM pass pipeline to make it 2x faster
  • I reworked how we vectorize so that vectorization succeeded in many more cases than with stock LLVM
  • I added over 15 custom LLVM passes for missing optimizations specific to Burst
  • I reduced the final game executable size by 43% on average

I also added lots of ways for users to tell the compiler about performance-sensitive things:

  • [NoAlias] / [AssumeRange] attributes
  • Assume / Likely / Unlikely / Pause / Prefetch / umul128 / InterlockedAnd / InterlockedOr intrinsics
  • helper intrinsics like IsConstantExpression / ExpectAliased / ExpectNotAliased to query whether the compiler performed certain optimizations
  • DisableSafetyChecks = true to let users bless certain Burst jobs as safe, with a global option to ForceOn safety checks even for these jobs
  • OptimizeFor to let jobs state their optimization target (size, performance, or just compile it fast)
  • per-assembly [BurstCompile] attributes that let you specify the default options for an entire assembly
  • F16C and FMA x86 intrinsics

Compile Time

When I started, we thought 100 jobs might be Bursted, and as I said before, today I’ve seen titles with over 6000 Bursted methods. That’s a 60x increase in the content Burst has to compile, so compile time became a real focus during my time here.

  • cumulatively I improved Burst’s compile time by 26x during my time here, so while the amount of content we deal with has grown significantly, compile time hasn’t grown linearly with it
  • I spent a lot of time with big users of Burst making sure their code compiled fast and was highly optimized. For instance, for NetCode I added Burst compiler optimizations that made its compile time 39x faster

Startup Time

One often overlooked aspect of optimization is that how a product feels is often more important than its raw power. Over the years Burst has required a lot of domain-reload hooks to compile code, set up shared statics, and make direct call work, and all of these incur a cost when the editor starts and whenever the user makes code changes.

  • I made direct call 33x faster to process during domain reload and startup
  • I made shared statics 13.3x faster when entering playmode
  • I made Burst load 60x faster during initial startup
  • I made handling code changes 2x faster during domain reload

Overall, these changes didn’t make the core work of Burst any faster (compilers still gotta compile), but the editor feeling usable while Burst compiles in the background is so important for the productivity of Unity’s users.

Highlights of my Time at Unity

Working with the Burst team, and with many people across DOTS more broadly, was a highlight. I learned a lot from a lot of people, and I hope I helped others too.

Being able to give people what they needed from the compiler was so powerful. Hashing is slow? Here, have a umul128 intrinsic to make it fast. You don’t understand what the compiler is doing? Here, have an ExpectAliased intrinsic so you can enforce at compile time the things that you, the user, know, but want to be sure the compiler knows too.

But my biggest highlight was how closely we got to work with Unity’s customers and help them succeed. I was routinely told that Burst was amazing to deal with from a customer perspective because we were super responsive on the forums and on bugs, and we fixed our users’ issues quickly.

Why Am I Moving On?

It was time for a change, for a new challenge.

A great talk by Brian Karis had a line that I’ll badly paraphrase as:

What are the biggest challenges in your field, and why are you not working on them?

This really hit home for me. I made Burst significantly better during my tenure at Unity, but I no longer felt that what I was doing day in, day out was really tackling the big compiler-focused challenges of the next ten years.

What Now?

I’ve got some time off before my next gig begins, so I can breathe out and relax for a bit.

I’m super excited about what comes next - stay tuned for the details!