Have you ever wondered how many rows your SQL Server database tables contains?
A few times now I've wanted to grab a list of table names with the row count for each table. In a post about a few undocumented Stored Procedures I came across this neat script.
EXEC sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'
It uses an undocumented stored procedure that iterates over all user tables in the database. The only issue I had with it, was that the output was really tedious to read. I wanted a simple tabular display showing me a list of tables with their name and number of rows, but sorted in descending order by the number of rows. Note that you can get this information in Management Studio
Here's a short script that does that...
if(object_id ('tempdb.dbo.#TableRowCount') is not null)
drop table #TableRowCount
go
create table #TableRowCount
(Id int identity(1,1) NOT NULL primary key
,TableName nvarchar(100)
,RowsInTable int)
insert into #TableRowCount (TableName, RowsInTable)
exec sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'
select * from #TableRowCount order by RowsInTable desc
Alternatively you can use the following script (which is what Management Studio uses)
select
[name] AS TableName
,TableRowCount = (
select sum (spart.rows) from sys.partitions spart
where spart.object_id = tbl.object_id and spart.index_id < 2 )
from sys.tables tbl
order by TableRowCount desc
Which do you prefer?
We had an interesting problem the other day. In our database (SQL Server 2008) we have a few tables with possibly many millions of records. We send some of the data from these and related tables to a third party service for processing and get status reports back. The problem was that the reports that we get back cannot easily be correlated back to the original records in our database.
Let’s say we have a person table and an address table with a 1:m relationship between person and address.
The tables could look something like:
Together with the person and address data we send some additional data points for processing. The processing report contains the following data.
ProcessingResult, FirstName, LastName, AddressLine1, AddressLine2, City, State, Zip.
When we receive the processing report we need to correlate between the person record(s) and the processing result for that name and address. (Could you have two people with the same name at the same address?)
So what do you do if there are 10,000,000 person records with the same amount of addresses?
I don’t know what your first thought is, but my first thought was that either we index the data in some way or this won’t work. You can’t do a table-scan on 10 million rows. From there the next thought that came to mind was to index on all the address columns to limit the resultset down to the persons with a given address. Essentially a covering index on Address. Doable, but there has to be a better way. My issue with this approach is that this index is expensive both in terms of space and IO.
Let’s rephrase the question:
“How can we create an alternative representation of Address that allows us to limit the number of matches?”
It doesn’t have to be exact. If the result would return, say 100 addresses, we could just do an exhaustive search on those and compare all columns with the data returned from the service.
Spoiler below…
public static string Hash(Address address)
{ StringBuilder sb = new StringBuilder(150);
sb.Append(address.AddressLine1);
sb.Append(address.AddressLine2);
sb.Append(address.City);
sb.Append(address.Zip);
sb.Append(address.State);
return new Hasher(HashType.MD5).Hash(sb.ToString().ToLowerInvariant());
}
Create a condensed representation of Address and store that representation with an index.
By hashing the address object using MD5, we obtain a 32 character string representation of the address. To store this hashed representation we can add a column to the address table named AddressHash and create an index on that column. Now, when we receive the processing report, we hash the address in the report and do a lookup against the hash value stored in the database. We then take any matching records (possibly only one) and do a comparison between the found record and the address from the service.
Here’s the Hasher class.
public enum HashType
{ MD5,
Sha1,
Sha256,
Sha384,
Sha512,
}
public class Hasher
{ private HashAlgorithm m_hasher;
private readonly HashType DefaultHashType = HashType.MD5;
/// <summary>
/// Instantiates Hasher with the default HashType
/// </summary>
public Hasher()
{ m_hasher = GetHasher(DefaultHashType);
}
public Hasher(HashType hashType)
{ m_hasher = GetHasher(hashType);
}
/// <summary>
/// Hash input and return the hashed result in a Hex string.
/// </summary>
/// <param name="input"></param>
/// <returns>Hash of input or null if input is null</returns>
public string Hash(string input)
{ if (String.IsNullOrEmpty(input))
return string.Empty;
byte[] data = m_hasher.ComputeHash(Encoding.Unicode.GetBytes(input));
return ToHex(data);
}
/// <summary>
/// Verify that hasing input matches an existing hash value
/// </summary>
/// <param name="input">The string to check against its hash</param>
/// <param name="hash"></param>
/// <returns>Match result or false if either parameter is null</returns>
public bool IsHashMatch(string input, string hash)
{ if (String.IsNullOrEmpty(input) || String.IsNullOrEmpty(hash))
return false;
string inputHash = Hash(input);
return string.Compare(inputHash, hash, StringComparison.OrdinalIgnoreCase) == 0;
}
private string ToHex(byte[] data)
{ StringBuilder builder = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{ builder.Append(data[i].ToString("X")); }
return builder.ToString();
}
private static HashAlgorithm GetHasher(HashType hashType)
{ switch (hashType)
{ case HashType.MD5:
return MD5.Create();
case HashType.Sha1:
return SHA1.Create();
case HashType.Sha256:
return SHA256.Create();
case HashType.Sha384:
return SHA384.Create();
case HashType.Sha512:
return SHA512.Create();
default:
throw new ArgumentOutOfRangeException("Unknown HashType"); }
}
}
P.S While the Hex conversion of the hashed result is not an absolute requirement, we wanted to store the hash in a human readable format.
How would you solve this problem to achieve better performance? (Storage is less important)
Happy Coding!