Measure string size – the right way.
Recently I needed to measure the exact size of a string in bytes once encoded as UTF-8, so I asked the developer sitting next to me how he would do it. His answer was: "Take the string's length and multiply it by 2 (it's UTF-8 encoded), and you will get the exact size."
Well, that answer was wrong…
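The explanation lies in the definition of the UTF-8 encoding: it uses one to four bytes per character (one for ASCII, two for Hebrew letters, three or four for many other characters), so no fixed multiplier can give the exact size. Multiplying by 2 actually matches UTF-16, the encoding .NET uses for strings in memory, not UTF-8.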
Here is a quote from Wikipedia:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.
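To see the variable length in action, here is a minimal sketch that counts UTF-8 bytes by hand from the code point ranges in the UTF-8 definition (below U+0080 one byte, below U+0800 two, below U+10000 three, otherwise four). The names Utf8Size and CountUtf8Bytes are my own, and the code assumes well-formed strings: char.ConvertToUtf32 throws on a lone surrogate. For comparison it also prints what the framework's Encoding.UTF8.GetByteCount returns.

```csharp
using System;
using System.Text;

static class Utf8Size
{
    // Hypothetical helper: counts UTF-8 bytes by walking Unicode code points.
    // Assumes well-formed UTF-16 input (no lone surrogates).
    public static int CountUtf8Bytes(string s)
    {
        int total = 0;
        for (int i = 0; i < s.Length; i++)
        {
            int codePoint = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
                i++; // a surrogate pair is one code point stored in two chars

            if (codePoint < 0x80) total += 1;         // ASCII
            else if (codePoint < 0x800) total += 2;   // e.g. Hebrew, Cyrillic
            else if (codePoint < 0x10000) total += 3; // e.g. the euro sign
            else total += 4;                          // e.g. emoji
        }
        return total;
    }

    public static void Main()
    {
        string[] samples = { "test", "בדיקה", "€", "\U0001F600" };
        foreach (string s in samples)
        {
            Console.WriteLine($"Length={s.Length}, manual={CountUtf8Bytes(s)}, " +
                              $"Encoding.UTF8={Encoding.UTF8.GetByteCount(s)}");
        }
    }
}
```

For the Hebrew word בדיקה both counts come out at 10 bytes for 5 characters, and the emoji, which .NET stores as two chars (Length 2), needs 4 bytes.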
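The simplest way to measure it that I've found so far is the Encoding class from the .NET Framework (System.Text namespace): Encoding.UTF8.GetByteCount returns the number of bytes a string takes when encoded as UTF-8.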
Example:
using System;
using System.Text;

class Program
{
    public static void Main()
    {
        string str = "בדיקהה test";
        // Number of bytes the string takes when encoded as UTF-8
        int size = Encoding.UTF8.GetByteCount(str);
        Console.WriteLine("Length * 2: " + str.Length * 2 + " Bytes");
        Console.WriteLine("Real size: " + size + " Bytes");
        Console.ReadKey(); // keep the console window open
    }
}
Output:
Length * 2: 22 Bytes
Real size: 17 Bytes
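To see what my colleague's formula actually measures, here is a sketch (the names SizeComparison and Measure are my own) that puts Length * 2 next to Encoding.Unicode.GetByteCount, which is .NET's UTF-16 encoder, and Encoding.UTF8.GetByteCount. Because .NET strings are sequences of 2-byte UTF-16 code units, the first two numbers agree for well-formed strings, while the UTF-8 count only matches them by coincidence, for example in a string made purely of Hebrew letters, which take exactly two bytes each.

```csharp
using System;
using System.Text;

static class SizeComparison
{
    // Hypothetical helper: the colleague's formula next to two encoders' byte counts.
    public static (int LengthTimesTwo, int Utf16Bytes, int Utf8Bytes) Measure(string s)
    {
        // Encoding.Unicode is UTF-16 little-endian; GetByteCount does not include a BOM.
        return (s.Length * 2,
                Encoding.Unicode.GetByteCount(s),
                Encoding.UTF8.GetByteCount(s));
    }

    public static void Main()
    {
        var (doubled, utf16, utf8) = Measure("בדיקהה test");
        Console.WriteLine($"Length * 2: {doubled}, UTF-16: {utf16}, UTF-8: {utf8}");
        // prints "Length * 2: 22, UTF-16: 22, UTF-8: 17"
    }
}
```

Neither number includes the fixed per-object overhead the CLR adds to every string instance, so for the full memory footprint a memory profiler is a better tool than an encoder.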
Enjoy,
Yevgeni