How strings work in C#
If you have been writing C# for a while, you have almost certainly used String (string and String can be used interchangeably because string is just an alias of System.String) and the StringBuilder class. However, underneath the simple ‘+’ operator lies a surprising amount of work. It was designed with good intentions, but it can degrade performance drastically if used naively. The StringBuilder class was crafted to overcome one of those problems. Do you know what these problems are and how they were resolved? Let’s find out in this post!

C# primitive types and String
C#, as an object-oriented language, does a very good job of encapsulating implementation details, especially memory management, which leaves hardly any traces visible from the user's perspective.
According to Microsoft's documentation on C# built-in types, string is a reference type, along with object and dynamic.
The sizeof operator
If you run Console.WriteLine(sizeof(int)), the output should be 4, since an int occupies 4 bytes in memory. But if you do the same for string, you will get compile error CS0233: 'string' does not have a predefined size, therefore sizeof can only be used in an unsafe context.
Since string is a reference type, we cannot know the actual size of its instances in advance. A string could contain zero characters (an empty string) or hundreds of characters (a paragraph or a whole magazine). What we can inspect is the size of its reference, and only if we run our code in an unsafe block.
Figure 1: Running Console.WriteLine(sizeof(string)) in an unsafe block
This is because a string variable holds only a reference to a memory location on the heap. The code above should therefore output 8 in a 64-bit process and 4 in a 32-bit process.
Figure 2: Visualization of string’s representation in memory (simplified)
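If you want to see the reference size without writing an unsafe block, here is a minimal sketch. It assumes .NET 5 or later, where System.Runtime.CompilerServices.Unsafe ships with the runtime; for a reference type, Unsafe.SizeOf<T>() reports the size of the reference itself, which matches IntPtr.Size.

```csharp
using System;
using System.Runtime.CompilerServices;

// sizeof(int) is allowed in safe code because int has a predefined size.
int intSize = sizeof(int);

// For a reference type, Unsafe.SizeOf<T>() returns the size of the reference,
// not the size of the string object (header, length field and characters) on the heap.
int stringReferenceSize = Unsafe.SizeOf<string>();

Console.WriteLine($"int: {intSize} bytes");
Console.WriteLine($"string reference: {stringReferenceSize} bytes (IntPtr.Size = {IntPtr.Size})");
```

In a 64-bit process both reference numbers should be 8, no matter how long the string is.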
String is a reference type, but immutable
As a reference type, string also behaves differently from other reference types. If you create an instance of a class and change the value of one of its properties, every variable referring to that object reflects the change. However, once a string is created, it can't be changed. See the code below.
Figure 3: Normal reference types behavior
secondObject changes accordingly when the underlying object changes. Let’s look at how the string type behaves:
Figure 4: String immutability proof
According to Microsoft's documentation on the immutability of strings:
“String objects are immutable: they can't be changed after they've been created. All of the String methods and C# operators that appear to modify a string actually return the results in a new string object.”
This explains why secondString stays the same when firstString changes: the “change” produced a new string object, and only firstString was updated to point to it.
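The two behaviors from Figures 3 and 4 can be reproduced side by side. The class used in Figure 3 isn't shown in the text, so in this sketch a List<string> stands in for an ordinary mutable reference type; the variable names follow the ones used above.

```csharp
using System;
using System.Collections.Generic;

// A List<string> stands in for an ordinary mutable reference type.
var firstObject = new List<string> { "original" };
var secondObject = firstObject;      // both variables refer to the same list
firstObject[0] = "changed";          // mutate the shared object in place
Console.WriteLine(secondObject[0]);  // changed: secondObject sees the mutation

string firstString = "original";
string secondString = firstString;   // both variables refer to the same string object
firstString += " changed";           // creates a NEW string and re-points firstString to it
Console.WriteLine(secondString);     // original: the old string object was never modified
Console.WriteLine(ReferenceEquals(firstString, secondString)); // False
```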
If we dig deeper into the string.cs source code, we find that the Concat method allocates a new string whose length equals the total length of the two inputs, fills it with the characters of both, and returns the newly allocated string.
Figure 5: Concat method implementation
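To make that allocation pattern concrete, here is a minimal sketch of what Concat does for two strings. ConcatSketch is a hypothetical name; the real implementation uses an internal allocation helper (FastAllocateString in the runtime source), whereas this sketch uses the public string.Create API and therefore assumes .NET Core 2.1 or later.

```csharp
using System;

// Hypothetical helper that mimics string.Concat(string, string):
// allocate one new string of the combined length, copy both inputs into it, return it.
static string ConcatSketch(string? str0, string? str1)
{
    // Like string.Concat, treat null as an empty string.
    string left = str0 ?? string.Empty;
    string right = str1 ?? string.Empty;

    // string.Create allocates the new string once and lets us fill in its characters.
    return string.Create(left.Length + right.Length, (left, right), static (span, state) =>
    {
        state.left.AsSpan().CopyTo(span);
        state.right.AsSpan().CopyTo(span.Slice(state.left.Length));
    });
}

string first = "Hello, ";
string second = "World!";
string combined = ConcatSketch(first, second);

Console.WriteLine(combined);                         // Hello, World!
Console.WriteLine(ReferenceEquals(combined, first)); // False: a brand-new object
```

Neither input is touched; the only way to get the combined text is to allocate a third string, which is exactly the cost that repeated concatenation multiplies.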
Why is string immutable?
I couldn't find any official documentation from Microsoft explaining why they designed it this way. But there are clearly some use cases where immutability helps avoid nasty bugs and boosts performance.
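For example, if we use a string as a Dictionary key, nothing done to the string variable afterwards can change the key stored in the Dictionary. This matters because the Dictionary files each entry under the key's hash code; if a key could be mutated in place, its hash code would change and the entry could no longer be found.
Figure 6: String immutability keeps Dictionary safe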
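A small sketch of the Dictionary case; the names and values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

var ages = new Dictionary<string, int>();

string key = "alice";
ages[key] = 30; // the dictionary stores a reference to the "alice" string (and its hash code)

// This does not modify "alice"; it creates a new string and re-points the variable.
key += "-smith";

Console.WriteLine(ages.ContainsKey("alice")); // True: the stored key is unchanged
Console.WriteLine(ages.ContainsKey(key));     // False: "alice-smith" was never added
```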
Another example is the Substring method. Because strings are immutable, a substring could in principle be represented by indexes into the existing string instead of a new copy, which would save an allocation: neither the substring nor the original can ever change, so sharing the characters is safe and the program behaves the same. Note, however, that in .NET, String.Substring copies the selected characters into a new string (apart from trivial cases such as a range covering the whole string); the non-allocating form of this idea is a ReadOnlySpan<char> slice over the string.
Figure 7: Substring method works based on the existing string
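The difference between copying and viewing can be shown with a short sketch, assuming .NET Core 2.1 or later for string.AsSpan. The span reuses the original string's characters, which is only safe because that string can never change underneath it.

```csharp
using System;

string original = "Hello, World!";

// Substring copies the selected characters into a new string object.
string copy = original.Substring(7, 5);

// AsSpan returns a read-only view over the original string's characters; nothing is copied.
ReadOnlySpan<char> view = original.AsSpan(7, 5);

Console.WriteLine(copy);            // World
Console.WriteLine(view.ToString()); // World (ToString allocates a string here, just for printing)
```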
One last benefit I can think of is that immutable strings can safely be cached and shared. This technique is called interning, and it can be done by the runtime or manually in our code, as we will see in the next section.
String interning
According to Microsoft’s documentation:
“The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system”.
By default, only string literals are interned. If we want to intern a string created at runtime, we must explicitly call the String.Intern method.
Here's an example of string interning using the String.Intern method:
Figure 8: String.Intern method caches the string
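A self-contained version of the same idea follows. The string "MyTest" is arbitrary; the point is that a string built at runtime is a different object from an equal literal until String.Intern hands back the pooled instance.

```csharp
using System;
using System.Text;

string s1 = "MyTest";                                                   // a literal: interned automatically
string s2 = new StringBuilder().Append("My").Append("Test").ToString(); // built at runtime: not interned
string s3 = string.Intern(s2);                                          // look up "MyTest" in the intern pool

Console.WriteLine(s1 == s2);                // True: same characters
Console.WriteLine(ReferenceEquals(s1, s2)); // False: two different objects
Console.WriteLine(ReferenceEquals(s1, s3)); // True: Intern returned the pooled literal
```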
Using the intern pool the right way can reduce memory usage significantly. For example, if you are writing a program that processes text containing a huge number of duplicate strings, interning is a good fit. However, interning a string at runtime is quite costly, and an interned string is likely to stay in memory until the runtime terminates (see the performance considerations in the String.Intern documentation).
Why do we need the StringBuilder class?
Let's go back to the + operator. A new string is created every time we “change” an existing string, so consider code like the following:
Figure 9: Bad string concatenation example
This code is inefficient because every concatenation allocates a new string and copies all the characters accumulated so far into it. When this happens in a loop of n iterations, the total amount of copying grows roughly with n², so doing it at scale can degrade performance drastically. To address the problem, Microsoft provides the StringBuilder class.
Figure 10: Using StringBuilder instead of string concatenation
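A runnable comparison of the two approaches is sketched below. The iteration count is arbitrary, and no timings are shown because they depend on the machine; both loops produce the same text, but only the first one re-copies everything on each iteration.

```csharp
using System;
using System.Text;

const int count = 10_000;

// Naive concatenation: each += allocates a new string and copies everything built so far.
string result = "";
for (int i = 0; i < count; i++)
{
    result += i.ToString();
}

// StringBuilder: appends go into an internal buffer; ToString() creates the final string once.
var builder = new StringBuilder();
for (int i = 0; i < count; i++)
{
    builder.Append(i);
}
string built = builder.ToString();

Console.WriteLine(result == built); // True: same text, very different amount of copying
```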
A glance at StringBuilder's implementation and why it's better than string concatenation
In this section, we will discover how StringBuilder works, using the code below as our example.
Figure 11: Sample of StringBuilder’s usage
We will also walk through StringBuilder’s implementation on GitHub to see why it's better than string concatenation.
First, the StringBuilder class contains these members:
Figure 12: StringBuilder’s members
If we use the parameterless constructor, as on line 1 of our example, the call is forwarded to a more general constructor that fills in default values and initializes the members above. In our case:
- m_ChunkChars is initialized to a char array with 16 empty elements (the default capacity).
- m_ChunkLength is set to 0.
- m_ChunkOffset is set to 0.
- m_ChunkPrevious is null.
Figure 13: StringBuilder constructor with default parameters
ThreadSafeCopy copies the initial string, if one is given, into the empty m_ChunkChars buffer (we don't have an initial value in this case).
Figure 14: StringBuilder visualization
If you are curious, here's the definition of the ThreadSafeCopy method.
Figure 15: ThreadSafeCopy method’s signature
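The private fields can't be read without reflection, but the public properties expose them: in the current .NET source, Length is m_ChunkOffset + m_ChunkLength and Capacity is m_ChunkOffset + m_ChunkChars.Length. A short sketch of what the constructors leave behind:

```csharp
using System;
using System.Text;

var empty = new StringBuilder();         // parameterless: default capacity
var sized = new StringBuilder(64);       // explicit initial capacity
var seeded = new StringBuilder("Hello"); // the initial value is copied into the first chunk

Console.WriteLine($"empty:  Length={empty.Length}, Capacity={empty.Capacity}"); // Length=0, Capacity=16
Console.WriteLine($"sized:  Length={sized.Length}, Capacity={sized.Capacity}"); // Length=0, Capacity=64
Console.WriteLine($"seeded: Length={seeded.Length}, Capacity={seeded.Capacity}, Text={seeded}");
```

Passing a capacity up front is worth considering when you roughly know the final size, because it avoids growing the chain of chunks later.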
On the second line of our code, we call Append("Hello World!"). This method checks whether the inner buffer (m_ChunkChars) has enough room for the whole string. If it doesn't, the method fills up the current buffer and creates a new chunk (held by a new StringBuilder object in the chain) with enough room for the rest.
In our case, m_ChunkChars has room for 16 characters, so the 12-character string fits without any problem. No extra memory is allocated; only m_ChunkLength is updated, from 0 to 12.
Figure 16: Calling Append(“Hello World!”) on StringBuilder
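The same step can be observed through the public Length and Capacity properties:

```csharp
using System;
using System.Text;

var builder = new StringBuilder();   // one chunk with 16 slots
builder.Append("Hello World!");      // 12 characters fit into the existing chunk

Console.WriteLine(builder.Length);   // 12: m_ChunkLength was advanced
Console.WriteLine(builder.Capacity); // 16: no new memory was allocated
```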
The third line of our code is a bit trickier. “ World!” has a length of 7, but only 4 of the 16 slots in m_ChunkChars are still free (16 - 12), so the buffer cannot hold the whole string. In this case, we fill the remaining 4 slots and allocate more memory for the other 3 characters.
Refer to the code below:
Figure 17: Append method’s implementation
Note that Append calls AppendHelper, which simply calls Append(char* value, int valueCount) with a pointer to “ World!” and 7 as arguments.
Figure 18: Append method’s implementation
Inside Append(char*, int), ExpandByABlock is called with restLength (3 in this case). This method allocates a new chunk whose size is the current total length capped at 8000 elements, but never smaller than restLength, so the rest of the string always fits.
Figure 19: ExpandByABlock method’s implementation
At this point the total length is 16, so Math.Max(3, Math.Min(16, 8000)) returns 16. The new chunk therefore has 16 elements, doubling the total capacity from 16 to 32.
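The sizing rule is easier to appreciate with a few more inputs. NewBlockLength below is a hypothetical helper that only restates the formula above, with 8000 as the cap.

```csharp
using System;

// Restates the chunk-sizing formula: at least restLength,
// otherwise the current total length, capped at 8000 elements.
static int NewBlockLength(int restLength, int currentLength)
{
    const int MaxChunkSize = 8000;
    return Math.Max(restLength, Math.Min(currentLength, MaxChunkSize));
}

Console.WriteLine(NewBlockLength(3, 16));    // 16: our example, total capacity doubles
Console.WriteLine(NewBlockLength(3, 20000)); // 8000: chunk size is capped for large builders
Console.WriteLine(NewBlockLength(9000, 16)); // 9000: one big append still gets a chunk that fits it
```

Growing by roughly the current length keeps the number of chunks small for typical sizes, while the cap keeps individual arrays from becoming huge.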
Figure 20: Calling Append(“ World!”) on third line
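After the third line, the full 16-element chunk has been handed over to a new StringBuilder object that m_ChunkPrevious points to, while our instance keeps working with a fresh 16-element m_ChunkChars. m_ChunkOffset is set to 16 (the number of characters stored in previous chunks), and m_ChunkLength is reset to 0 and then set to 3 once the remaining characters “ld!” are copied in.
This is why StringBuilder performs better than string concatenation: characters that were already appended are not copied again on every append, only when ToString() builds the final string. Memory is also allocated in blocks that grow with the content, so we don't have to allocate every time the string grows.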
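To tie the walkthrough together, here is a simplified model of the chunk chain. It is a sketch, not the real implementation: the real class links StringBuilder objects through m_ChunkPrevious and its ToString() walks that chain, whereas this model keeps the full chunks in a List for simplicity. The local names only mirror the fields discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Simplified model of StringBuilder's chunks (a List stands in for the m_ChunkPrevious chain).
const int MaxChunkSize = 8000;           // chunk size cap used by ExpandByABlock
var previousChunks = new List<char[]>(); // full chunks, oldest first
char[] chunkChars = new char[16];        // like m_ChunkChars: current chunk, default capacity 16
int chunkLength = 0;                     // like m_ChunkLength: characters used in the current chunk
int chunkOffset = 0;                     // like m_ChunkOffset: characters stored in previous chunks

void Append(string value)
{
    // Copy as much as fits into the current chunk.
    int firstLength = Math.Min(value.Length, chunkChars.Length - chunkLength);
    value.CopyTo(0, chunkChars, chunkLength, firstLength);
    chunkLength += firstLength;

    int restLength = value.Length - firstLength;
    if (restLength == 0) return;

    // Same sizing rule as ExpandByABlock; the current chunk is full at this point.
    int newBlockLength = Math.Max(restLength, Math.Min(chunkOffset + chunkLength, MaxChunkSize));
    previousChunks.Add(chunkChars);
    chunkOffset += chunkLength;
    chunkChars = new char[newBlockLength];

    // Copy the remaining characters into the fresh chunk.
    value.CopyTo(firstLength, chunkChars, 0, restLength);
    chunkLength = restLength;
}

string Build()
{
    // Like ToString(): allocate the final characters once and copy every chunk into place.
    var result = new char[chunkOffset + chunkLength];
    int position = 0;
    foreach (char[] chunk in previousChunks)
    {
        chunk.CopyTo(result, position); // previous chunks are always full
        position += chunk.Length;
    }
    Array.Copy(chunkChars, 0, result, position, chunkLength);
    return new string(result);
}

Append("Hello World!");
Append(" World!");

Console.WriteLine(Build()); // Hello World! World!
Console.WriteLine($"offset={chunkOffset}, length={chunkLength}, capacity={chunkOffset + chunkChars.Length}");
// offset=16, length=3, capacity=32

// The real StringBuilder reports the same capacity after the same two appends.
var real = new StringBuilder().Append("Hello World!").Append(" World!");
Console.WriteLine(real.Capacity); // 32
```

Each character is copied once when appended and once more when the final string is built, no matter how many appends happen, which is the key difference from repeated concatenation.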
Conclusion
I hope this post helped you understand how String and StringBuilder work internally, and that you enjoyed the Saigon Technology Tech Blog.
OTHER ARTICLES FROM DO TRAN