As far as I can tell, the process is something like this:
* Convert all strings (including UTF16) to UTF8 without a BOM.
* Convert Windows line endings ("\r\n") to Unix line endings ("\n").
* Leave Mac line endings ("\r") alone.
* Run the MD5 checksum on that data.
This works for the vast majority of files, but I must be missing something, as it doesn't work for some of them. For those files the digest comes out different, yet a diff of the local file against the depot version reports no differences. The files are identical when viewed in hex too.
The attached file has a depot digest of 378BE809DF7D15AAC75A175693E25FBB, but I can only seem to get a digest of 3DCF20D7995C25F69EB9AB55019B0757 locally. It has Windows line endings.
Here's a LINQPad query to calculate the digest:
FileInfo Info = new FileInfo( "D:\\Root\\ThirdParty\\Mono\\Mac\\etc\\mono\\browscap.ini" );
FileStream InputFile = Info.OpenRead();
byte[] EntireTextFile = new byte[Info.Length];
InputFile.Read( EntireTextFile, 0, ( int )Info.Length );
string InputString = Encoding.UTF8.GetString( EntireTextFile );
// Convert Windows line endings to Unix line endings
InputString = InputString.Replace( "\r\n", "\n" );
// Convert Mac line endings to Unix line endings
//InputString = InputString.Replace( '\r', '\n' );
EntireTextFile = Encoding.UTF8.GetBytes( InputString );
//EntireTextFile.Count( x => x == '\r' ).Dump();
//EntireTextFile.Count( x => x == '\n' ).Dump();
MD5 Checksummer = new MD5CryptoServiceProvider();
byte[] Checksum = Checksummer.ComputeHash( EntireTextFile );
string Digest = "";
foreach( byte Check in Checksum )
    Digest += Check.ToString( "X2", CultureInfo.InvariantCulture );
// Show the result as uppercase hex, the same form as the depot digest
Digest.Dump();
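
For comparison, here is a minimal sketch of the same steps done directly on the raw bytes, with no decode and re-encode through a string. It is only a cross-check, not a statement of how the depot computes its digest. The names DigestSketch, NormalizeCrLf and Md5Hex are made up for illustration, and the sketch assumes the file is already UTF-8 (the UTF-16 conversion step is not covered).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class DigestSketch
{
    // Hypothetical helper: drops every '\r' that is immediately followed by '\n'.
    // A lone '\r' (Mac line ending) is left alone, as in the steps above.
    public static byte[] NormalizeCrLf( byte[] Data )
    {
        using( MemoryStream Output = new MemoryStream( Data.Length ) )
        {
            for( int Index = 0; Index < Data.Length; Index++ )
            {
                bool IsCrBeforeLf = Data[Index] == ( byte )'\r'
                    && Index + 1 < Data.Length
                    && Data[Index + 1] == ( byte )'\n';
                if( !IsCrBeforeLf )
                {
                    Output.WriteByte( Data[Index] );
                }
            }
            return Output.ToArray();
        }
    }

    // Hypothetical helper: MD5 of the bytes as uppercase hex without separators.
    public static string Md5Hex( byte[] Data )
    {
        using( MD5 Checksummer = MD5.Create() )
        {
            return BitConverter.ToString( Checksummer.ComputeHash( Data ) ).Replace( "-", "" );
        }
    }

    static void Main( string[] Args )
    {
        if( Args.Length != 1 )
        {
            Console.WriteLine( "usage: DigestSketch <path to file>" );
            return;
        }

        byte[] Raw = File.ReadAllBytes( Args[0] );
        Console.WriteLine( "Raw bytes:       " + Md5Hex( Raw ) );
        Console.WriteLine( "CRLF -> LF only: " + Md5Hex( NormalizeCrLf( Raw ) ) );
    }
}
```

If the "CRLF -> LF only" digest differs from the one produced by the query above, then the Encoding.UTF8.GetString / GetBytes round trip is changing the data. By default, GetString replaces byte sequences that are not valid UTF-8 with U+FFFD, so the re-encoded bytes would no longer match the file on disk.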