
Mastering C# Optimization: Tips and Best Practices

Mar 4, 2022 | .NET, C#

What are the first things that come to mind when you think of C#? If you’re like most people, the following phrases might pop into your head: object-oriented, portable code, and .NET Framework.

But if you’re looking to use C# in performance-critical applications, then the following phrases should enter your mind as well: performance, speed, and memory usage.

If you’re building an application whose primary concern is performance and code optimization, then these tips will help you increase the performance of your C# application to meet or exceed your expectations!

Best C# Optimization Tips

If you are asking how to optimize C# code, these C# optimization techniques and tricks are simple to implement and will give you better runtime performance without much effort!

Avoid the Garbage Collector

C# runs on .NET, whose managed heap is cleaned up by a garbage collector. That is convenient and usually fast enough, but every heap allocation (every object) eventually has to be collected, and collections cost time. You can do better by allocating fewer short-lived objects in hot code paths, for example by preferring value types and by reusing buffers. Putting less pressure on the garbage collector helps keep performance high and memory usage low.
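
As a minimal sketch of buffer reuse, the example below rents a scratch array from the shared ArrayPool instead of allocating a new one on every call. The class and method names (PooledBufferExample, UpperMedian) are made up for illustration; ArrayPool<T>.Shared, Rent and Return are the real API from System.Buffers.

```csharp
using System;
using System.Buffers;

public static class PooledBufferExample
{
    // Returns the middle element of the sorted input (the upper one for even lengths).
    // Sorting needs a writable copy, so we rent one instead of allocating it.
    public static int UpperMedian(ReadOnlySpan<int> values)
    {
        if (values.IsEmpty)
            throw new ArgumentException("values must not be empty", nameof(values));

        // Rent may hand back an array that is longer than requested.
        int[] scratch = ArrayPool<int>.Shared.Rent(values.Length);
        try
        {
            values.CopyTo(scratch);
            // Sort only the part of the rented array that holds our data.
            Array.Sort(scratch, 0, values.Length);
            return scratch[values.Length / 2];
        }
        finally
        {
            // Returning the array makes it available to the next caller.
            ArrayPool<int>.Shared.Return(scratch);
        }
    }
}
```

Called in a loop, this version should produce far fewer garbage arrays than one that does `values.ToArray()` each time, at the price of having to remember the Return call.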

Use StackAllocators (C# Recycling Allocator)

If your application does a lot of memory allocation, you may get better performance by avoiding the cost of allocating and freeing heap memory over and over. A stack-based allocator does just that. It hands out memory from a contiguous region (a stack) by moving a pointer forward, and it frees memory by moving that pointer back, so there is no search for an unused block and nothing left for the garbage collector to clean up. In C#, the built-in form of this is the stackalloc keyword, which places a small buffer on the call stack.
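
Here is a small sketch of stack allocation in plain C#, using stackalloc with Span<int> so that no unsafe code is needed. The class and method names are hypothetical, and the digit-sum task is only an excuse to need a short temporary buffer.

```csharp
using System;

public static class StackAllocExample
{
    public static int SumOfDigits(int value)
    {
        // An int has at most 10 decimal digits, so a small fixed-size buffer is enough.
        // The buffer lives on the call stack and disappears when the method returns.
        Span<int> digits = stackalloc int[10];
        int count = 0;

        // Widening to long keeps Math.Abs from overflowing on int.MinValue.
        long remaining = Math.Abs((long)value);
        do
        {
            digits[count++] = (int)(remaining % 10);
            remaining /= 10;
        } while (remaining > 0);

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += digits[i];
        }
        return sum;
    }
}
```

Keep stackalloc buffers small and of bounded size; the stack is limited, and a large or data-dependent size risks a stack overflow.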

Don’t Leak Object References

This is a very common mistake among beginning C# programmers: instead of releasing an object when it’s no longer needed, they keep a reference to it alive (in a static field, a long-lived collection, or an event subscription), so the garbage collector can never reclaim it and memory use keeps growing.

This can be especially dangerous in server applications where many programs share a server; one program’s leaked references can starve every other program on that machine of memory. To avoid leaks, drop references when you’re done with them: remove objects from long-lived collections, unsubscribe event handlers, dispose objects that implement IDisposable, and set long-lived fields to null. Local variables usually don’t need to be set to null, and in general you should only create as many references as necessary.
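
One way to make the cleanup hard to forget is to tie it to IDisposable. The sketch below uses two made-up types, Publisher and Subscriber: the subscriber hooks an event in its constructor and unhooks it in Dispose, so the publisher stops holding a reference to it.

```csharp
#nullable enable
using System;

public sealed class Publisher
{
    public event EventHandler? Tick;

    public void RaiseTick() => Tick?.Invoke(this, EventArgs.Empty);

    // Number of handlers currently attached; each one keeps its target object alive.
    public int SubscriberCount => Tick?.GetInvocationList().Length ?? 0;
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public int TicksSeen { get; private set; }

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        // From here on the publisher references this object through the delegate.
        _publisher.Tick += OnTick;
    }

    private void OnTick(object? sender, EventArgs e) => TicksSeen++;

    // Unsubscribing removes the publisher's reference, so this object can be collected.
    public void Dispose() => _publisher.Tick -= OnTick;
}
```

With `using (var subscriber = new Subscriber(publisher)) { ... }`, the handler is removed as soon as the block ends, even if an exception is thrown inside it.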

Use Memory Mapping (for big files/filesystems)

One pitfall when using C#’s File.ReadAllText() or File.ReadAllLines() methods is that they load the entire file into memory at once. This can cause performance bottlenecks if you’re dealing with very large files and/or slow storage subsystems, like hard drives. One way around it is a memory-mapped file, which maps the file into your process’s virtual address space so that it can be read like memory while the operating system pages data in only as it is touched.
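
A minimal sketch of reading a file through a memory map, using the MemoryMappedFile class from System.IO.MemoryMappedFiles. The class and method names (MappedFileExample, CountLineFeeds) are hypothetical, and reading byte by byte is chosen for clarity, not for speed.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedFileExample
{
    // Counts '\n' bytes without ever loading the whole file into a managed array.
    public static long CountLineFeeds(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
        {
            return 0; // an empty file cannot be mapped
        }

        // Capacity 0 means "use the current size of the file".
        using var file = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        // The view is limited to the real file length; views can otherwise be page-aligned.
        using var view = file.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);

        long count = 0;
        for (long offset = 0; offset < length; offset++)
        {
            if (view.ReadByte(offset) == (byte)'\n')
            {
                count++;
            }
        }
        return count;
    }
}
```

If you only need to walk a text file line by line, File.ReadLines() is often the simpler choice, since it streams lines lazily instead of reading everything first.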

Avoid Unnecessary Serialization/Deserialization

The .NET Framework provides object serialization and deserialization capabilities out-of-the-box. The same holds true for several core .NET languages such as C#, F#, and Visual Basic .NET. However, under certain circumstances these functions can cause significant performance issues in your application. This is particularly true when a non-trivial amount of data needs to be serialized or deserialized.

If you are using these features on a regular basis it’s important that you understand how they work so that you can select an appropriate strategy for processing your data.

JIT away unused methods

The Just-In-Time (JIT) compiler reads CIL and translates it into native code at run time. Methods are compiled the first time they are called, so every method your app touches during startup adds to its startup time. The less code there is to JIT on the way up, the faster your app starts!

There are several ways to make startup faster. Note that reflection is not one of them: System.Reflection.Emit generates new code at run time and cannot remove methods from existing types. Instead, keep rarely used code off the startup path, and consider precompiling your assemblies ahead of time (ReadyToRun) or trimming unused code when you publish.

Measure in Production Mode instead of Debug Mode

Many platforms, .NET included, let you build and run applications in a debug configuration that allows developers to do things like set breakpoints and print debug messages. This is a great way to test your code while you’re developing it, and during development it’s essential. But once you’re done coding, switch to a Release build before you measure performance and before your app goes live.

A Release build turns on compiler and JIT optimizations and drops code that is essential only during development, so timings taken in Debug mode tell you little about how the app will perform in production.

Best C# Performance Tips

All of us developers know that performance optimization in .NET is important, and at this point you may be wondering how to improve application performance in C#. Well, with these tips you can tune your C# code and get high performance! Let’s check them out.

These C# performance optimization techniques are transcribed from Bartosz Adamczewski’s video on LevelUp’s YouTube channel, 5 (Extreme) Performance Tips in C#.

You will learn 5 different performance tricks that you can do in C#.

They are called EXTREME because, personally, I have not found information like this on the Internet. Yes, there are different performance tricks, but not the ones I’m going to show you here.

Let’s start with a very simple example: a sum of odd elements. We take each array element and check whether it’s divisible by two. If it’s not, the element is odd, so we add it. At the end we return the result.
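
The starting function itself isn’t listed here, so this is a reconstruction from the description above rather than the exact code from the video. The wrapper class name SumOddBaseline is made up so that the snippet compiles on its own.

```csharp
public static class SumOddBaseline
{
    // Baseline version: a modulus test and a branch for every element.
    public static int SumOdd(int[] array)
    {
        int counter = 0;
        for (int i = 0; i < array.Length; i++)
        {
            var element = array[i];
            // "!= 0" instead of "== 1" so that negative odd numbers are counted too.
            if (element % 2 != 0)
            {
                counter += element;
            }
        }
        return counter;
    }
}
```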


If we run it in our simple measuring procedure, it takes quite a while because we compute it over 40 million elements; the average result is around 240–250 milliseconds.
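
The measuring procedure isn’t shown either. A simple one could look like the sketch below, which assumes a Stopwatch-based average over a few runs; the names Measure and AverageMilliseconds and the default of five runs are assumptions, not the setup used for the numbers in this post.

```csharp
using System;
using System.Diagnostics;

public static class Measure
{
    // Calls the function several times and returns the average time per call in milliseconds.
    public static double AverageMilliseconds(Func<int[], int> function, int[] data, int runs = 5)
    {
        if (runs <= 0)
            throw new ArgumentOutOfRangeException(nameof(runs), "runs must be positive");

        // Warm-up call, so that JIT compilation is not part of the measurement.
        function(data);

        var stopwatch = Stopwatch.StartNew();
        for (int run = 0; run < runs; run++)
        {
            function(data);
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }
}
```

Run it from a Release build without a debugger attached; for anything more serious than a quick comparison, a dedicated benchmarking library is the safer choice.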

Testing C# function

If we look at this function, is there something we can do to actually optimize it? 🤔

Is there a way to have a faster version of this function? It turns out that there is.

Bit Tricks

We can replace more expensive operations with less expensive ones, and in this case I’m talking about the modulus operation. It turns out that the modulus operation can be extremely expensive. The good news is that the JIT compiler automatically replaces a modulus by two with cheaper shift-based instructions, so we already get a better implementation. But we can still do better, because we know what we actually want to test and the compiler doesn’t, and there is a way to make it even faster.

sum = SumOdd_Bit(array);

What we can do is an AND operation with one. That effectively tests whether our element is odd: if it is, the result is reduced to the single value one, otherwise it is zero.

private static int SumOdd_Bit(int[] array)
{
  int counter = 0;
  for (int i = 0; i < array.Length; i++)
  {
    var element = array[i];
    // The lowest bit is 1 for odd values and 0 for even values.
    if ((element & 1) == 1)
      counter += element;
  }
  return counter;
}

Now we can check if this got faster 👇

Testing Bit Tricks C# function

Now it’s slightly faster: 217 milliseconds, down from 240 milliseconds, so we got an improvement.

Branch Elimination

We can do a branch-free version of that SumOdd function. We can do it because we already have the operation we need: element & 1 returns one if the element is odd and zero otherwise. That allows us to eliminate the branch and multiply the result by the element instead.

sum = SumOdd_BranchFree(array);

If the result is one, the element is odd, and multiplying by one gives us the element. Otherwise we multiply by zero and get zero. Branch elimination is interesting for two reasons:

  • First of all, there are certain data sets and workloads where the data is so irregular that the branch predictor cannot do a good enough job.
  • The second reason is that you want stable performance. As I said, branch prediction depends on the data: a function can be super fast on predictable data and noticeably slower on other data. That’s why you might want to consider it.

Of course, keep in mind that branch elimination is not free: the things you have to do to eliminate the branch, like bit-hacking tricks, can themselves be expensive.

private static int SumOdd_BranchFree(int[] array)
{
  int counter = 0;
  for (int i = 0; i < array.Length; i++)
  {
    var element = array[i];
    // odd is 1 for odd elements and 0 for even ones, so even elements add zero.
    var odd = element & 1;
    counter += (odd * element);
  }
  return counter;
}

Let’s check the performance 👇

Testing Branch Elimination C# function

It took 43 milliseconds, which is a big improvement over 217 milliseconds.

Instruction Parallelism

Since we did branch elimination already, we can now apply instruction-level parallelism. Instruction-level parallelism means that modern CPUs can usually do multiple things at the same time, provided that there are no data hazards between the instructions and that the instructions can be executed on different ports.

sum = SumOdd_BranchFree_Parallel(array);

We effectively duplicated our counter to avoid data hazards, and now certain operations can happen at the same time; for example, the AND operation can be done four times per CPU cycle. Whether you benefit from these improvements depends on what your CPU can execute in parallel.

Let’s test the performance of this version 👇

Testing Instruction Parallelism C# function

Now it’s 39 milliseconds. It’s faster, but only slightly, and the reason might be that we have a multiplication here.

Bounds Checking

Tip number four is to eliminate the bounds checks, because the previous method did a lot of bounds checking on the array. We almost have the loop shape for which the JIT can skip bounds checks, but because we changed the step of i and read two elements of the array per iteration, the checks stay.

sum = SumOdd_BranchFree_Parallel_NoChecks(array);

There are a couple of ways of eliminating them. The simplest one (not the best one, mind you) is to take a fixed pointer to the array, treat it as an int pointer, and access the elements through that pointer.

For reference, this is the parallel version from the previous tip, which still has the bounds checks:

private static int SumOdd_BranchFree_Parallel(int[] array)
{
  int counterA = 0;
  int counterB = 0;
  // Two independent counters, so the two additions do not wait on each other.
  // Assumes array.Length is even; array[i + 1] would go out of range otherwise.
  for (int i = 0; i < array.Length; i += 2)
  {
    var elementA = array[i];
    var elementB = array[i + 1];
    var oddA = elementA & 1;
    var oddB = elementB & 1;
    counterA += (oddA * elementA);
    counterB += (oddB * elementB);
  }
  return counterA + counterB;
}
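
The pointer-based SumOdd_BranchFree_Parallel_NoChecks listing is not included here. A minimal sketch of what it could look like, assuming it keeps the two counters and only swaps array indexing for a fixed pointer, is shown below. The wrapper class SumOddNoChecks and the handling of a leftover element are additions for this sketch.

```csharp
public static class SumOddNoChecks
{
    // Requires unsafe code to be enabled (AllowUnsafeBlocks in the project file).
    public static unsafe int SumOdd_BranchFree_Parallel_NoChecks(int[] array)
    {
        int counterA = 0;
        int counterB = 0;

        // Pinning the array gives us a raw pointer; pointer reads are not bounds-checked,
        // so staying inside the array is now our own responsibility.
        fixed (int* data = array)
        {
            int i = 0;
            // Two elements per iteration; stop before a partial pair.
            for (; i + 2 <= array.Length; i += 2)
            {
                counterA += (data[i] & 1) * data[i];
                counterB += (data[i + 1] & 1) * data[i + 1];
            }

            // One leftover element when the length is odd.
            if (i < array.Length)
            {
                counterA += (data[i] & 1) * data[i];
            }
        }

        return counterA + counterB;
    }
}
```

The trade-off is real: without bounds checks, an off-by-one in the loop condition reads memory outside the array instead of throwing an exception.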

Let’s measure the performance of this version 👇

Testing Bounds Checking C# function

Better: 32 milliseconds, down from 39 milliseconds. We got an improvement.

Maximize Ports

With all of this in mind, we can do a better job with ports. We can take a second pointer to our data; the first pointer is going to be loaded into registers but the second won’t be. That effectively removes the limit of a single multiplication per operation.

sum = SumOdd_BranchFree_Parallel_NoChecks_BetterPorts(array);

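We’re still constrained by loads, though, since we can do only two loads per cycle and we have eight loads here. Still, the multiplication would be the biggest factor in performance degradation, and we can check if this is really true.

// Requires unsafe code to be enabled (AllowUnsafeBlocks in the project file).
private static unsafe int SumOdd_BranchFree_Parallel_NoChecks_BetterPorts(int[] array)
{
  int counterA = 0;
  int counterB = 0;
  int counterC = 0;
  int counterD = 0;
  // Pinning the array lets us read it through raw pointers, without bounds checks.
  fixed (int* data = array)
  {
    var p = data;
    var n = data;
    var i = 0;
    // Four elements per iteration; stop before a partial group of four.
    for (; i + 4 <= array.Length; i += 4)
    {
      counterA += (n[0] & 1) * p[0];
      counterB += (n[1] & 1) * p[1];
      counterC += (n[2] & 1) * p[2];
      counterD += (n[3] & 1) * p[3];
      p += 4;
      n += 4;
    }
    // Leftover elements when the length is not a multiple of four.
    for (; i < array.Length; i++)
      counterA += (data[i] & 1) * data[i];
  }
  return counterA + counterB + counterC + counterD;
}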

Let’s check it out and run this version 👇

Testing Maximize Ports C# function

It took 25.6 milliseconds, which makes it the fastest version.

Just to show how it looks in a nice graph: you can see the ports version is slightly faster than the no-checks version, and from the first tip to the last we gained almost a factor of 10 in performance.

Comparison of all C# tips
