Category Archives: SQL Tips and Tricks

How to do technical things with SQL Server.

The Proper use of File Groups and Files

If you are creating a database, and it is going to be large, you may want to consider having more file groups than just PRIMARY. File groups enable the DBA to segregate objects into separate files. For example, in our shop, our largest database has the PRIMARY file group, which holds tables and clustered indexes.  Non-clustered indexes go into the INDEX file group, and WORM tables that contain BLOB data that isn’t frequently read go into our ARCHIVE file group.

File groups make it extremely easy to segregate a database into files that can be assigned to drives or LUNs that contain a certain speed of disk.  For example, one can have PRIMARY and INDEX on SSD, while the ARCHIVE group is on SATA. The DBA can gain more granular performance control by using file groups and LUNs together to give or take away speed. The DBA can control cost by using a file group to route infrequently-used data to cheaper (and usually slower) storage.
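As a sketch, the filegroup layout described above could be set up like this (the database name, file paths, and sizes are hypothetical, chosen only to illustrate the pattern):

```sql
-- Hypothetical database name and file paths; sizes are illustrative only.
ALTER DATABASE BigDB ADD FILEGROUP [INDEX];
ALTER DATABASE BigDB ADD FILEGROUP [ARCHIVE];

-- Place the INDEX filegroup's file on fast (SSD) storage...
ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Index1, FILENAME = 'S:\Data\BigDB_Index1.ndf',
     SIZE = 50GB, FILEGROWTH = 5GB)
TO FILEGROUP [INDEX];

-- ...and the ARCHIVE filegroup's file on cheaper SATA storage.
ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Archive1, FILENAME = 'T:\Data\BigDB_Archive1.ndf',
     SIZE = 200GB, FILEGROWTH = 10GB)
TO FILEGROUP [ARCHIVE];

-- Objects are then routed to a filegroup with the ON clause.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID) ON [INDEX];
```

Once a filegroup is marked as the default, new objects land there automatically; otherwise the ON clause does the routing per object.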

The Difference Between File Groups and Files

Objects (tables, indexes) are assigned to file groups.  Each file group represents a span of data pages.  These pages are not, however, constrained to one file.  The DBA can assign many files to a single file group.

Why?  Again, performance control.  Also, space management objectives can be met by using multiple files per file group.

Assume that you have just the PRIMARY file group, and there is a great deal of update traffic.  If you have one file in that group, all of those pages must go to that one file.  Now, imagine that you add a second file, sized the same as the first.  SQL Server will distribute the pages in the file group fairly evenly between the files in the group.  If the files sit on physically separate drives / LUNs / mount points, you now have two paths to read or update data, usually resulting in a performance boost.  Obviously, putting both files on the same drive / mount point / LUN buys you no performance.
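Adding that second, equally sized file to the PRIMARY filegroup is a one-liner; SQL Server's proportional-fill algorithm then spreads new pages across both files.  Names, path, and sizes below are hypothetical:

```sql
-- Second file, sized to match the first, on a separate drive / LUN / mount point.
ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Data2, FILENAME = 'N:\Data\BigDB_Data2.ndf',
     SIZE = 100GB, FILEGROWTH = 10GB)
TO FILEGROUP [PRIMARY];
```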

Using multiple files in multiple file groups multiplies the potential performance gain, provided that you’re not piling the files on the same drive or mount point or LUN.

However, you can also manage space.  As the drive holding the SQL data grows closer to full, you may need to cap one file and open another file to handle new data pages.  This happened to me as the data in our new app grew from 250GB to nearly 2TB, at a rate nearly four times what was predicted before the application went live.  Certain data files were set to grow no more as the mount point holding each file approached 100% full; a new mount point was added and a new file opened.  With mount points, this can be done with minimal or even zero downtime, depending on configuration (results may vary with failover clustering, which may require bringing SQL Server down and back up after adding the drive or mount point as a cluster resource).
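The cap-and-extend maneuver can be sketched like this (file names, mount-point path, and sizes are hypothetical):

```sql
-- Cap the file on the nearly full mount point at its current footprint...
ALTER DATABASE BigDB MODIFY FILE
    (NAME = BigDB_Data1, MAXSIZE = 500GB);

-- ...then open a new file on the newly added mount point to take new pages.
ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Data2, FILENAME = 'M:\MP2\BigDB_Data2.ndf',
     SIZE = 100GB, FILEGROWTH = 10GB)
TO FILEGROUP [PRIMARY];
```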

To conclude, managing file groups and the files within the group is a powerful tool in the DBA’s performance arsenal.  Learn to shoot straight with it.

Latch Waits. What they mean, and when are they bad?

Recently our shop migrated to the new Version 11 of Dell’s Spotlight on SQL Server Enterprise, our monitoring tool.  Immediately we began seeing alerts that read like this:

The latch wait time rate is 7,500 ms per second.

Sounds ominous!  You are waiting 7.5 seconds every second to get a needed latch?!  Why is this happening to my servers?  They seem alright, because queries are moving along nicely and the users are happy.  However, this bad statistic must mean that there is a massive latent problem that this new default alert in the monitoring tool caught.

Research time!

What is a latch wait?

Well, first, we should define a latch.  A latch is a lightweight, short-term lock that protects SQL Server's in-memory structures, such as buffer pool pages and internal data structures.  Latches are vital to maintaining the integrity of in-memory data pages.

Latches are serialized.  One latch completes work before the next latch is granted.  If a process needs a latch and one cannot be obtained, it is suspended until the latch becomes available.

Now, with that description, the latch wait seems even more ominous!  Accumulating 7.5 seconds of latch waits every second?  Yipes!  Anything more than a microsecond’s wait is an eternity – correct?

Well, let’s use a DMV – sys.dm_os_latch_stats to see what’s going on:
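A simple query against the DMV, ordered by total wait time, shows which latch classes dominate:

```sql
-- Cumulative latch waits since the last restart (or since the stats were cleared),
-- worst offenders first.
SELECT latch_class,
       waiting_requests_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_latch_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;
```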


I looked at the total wait time and saw that ACCESS_METHODS_DATASET_PARENT was the predominant latch wait on this server.  Paul Randal wrote a wonderful post that explains many of these latch waits.  This particular wait is created by parallelism.  As Paul warns in his post, let’s not go bananas and set MAXDOP to 1.

Instead, the first thing to check is whether hyperthreading was accidentally enabled on this server.  In this case, the answer appears to be no, but the sysadmins need to verify it (you can’t check it from SQL, but there is a VBScript out there that can detect it, if your sysadmins will let you run that script on a production server).

While the sysadmins investigate, I note that the second most prevalent class is BUFFER.  This is the classic wait for an in-memory page or structure.  This is one I can live with, especially since ACCESS_METHODS_DATASET_PARENT is responsible for over 90% of the wait time and also has the highest max wait times.

As Paul’s article tells us, the NESTING_TRANSACTION_READONLY and NESTING_TRANSACTION_FULL latch classes are related to (duh) nested transactions.  Parallel nested transactions.

What my problem was

In my case, parallelism was the cause; hyperthreading was not the problem.  SSIS also runs packages on this server to do ETL work.  We have four sockets with two cores each.  To resolve the issue, I configured max degree of parallelism to 4, leaving four cores free to handle the SSIS packages and other SQL traffic.
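The configuration change itself is a standard sp_configure call (max degree of parallelism is an advanced option, so it must be exposed first):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Limit parallel query plans to four schedulers on this eight-core box.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

The change takes effect immediately for new query plans; no restart is required.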

We are seeing an incremental improvement in query performance on this server.  What was good is now a little better.  The alert hasn’t shown up since we changed the configuration.

Does the new alert help?

Short answer:  yes.  However, this alert needs to be tuned server-by-server in Spotlight, and may need to be ignored for certain test servers.

I will take this opportunity to strongly encourage every DBA who manages more than five SQL Servers to review all of the available monitoring products out there.  I like Spotlight.  You might like SQL Sentry or Idera or RedGate.  Find the one that works best with your style of DB management and implement it.  Be ready to tune the thing for 30-120 days.

Spotlight did me right with the latest alert in v11.

Understanding Peer-to-Peer Transactional Replication, Part 1

Peer-to-Peer Transactional Replication (“PPTR” for short) is the ugly duckling of SQL Server replication topologies.  However, it need not be so.  There are occasions where PPTR is the perfect solution.  In this article I try to explain the right occasions to use PPTR and what the DBA needs to do when implementing it.

When is PPTR a good solution?

The first question that most admins ask when confronted with PPTR is, “Why don’t I just use Merge Replication?”  At first blush, it seems that Merge Replication is a good fit and PPTR is unnecessary.

However, Merge Replication adds a GUID row identifier to every table that is included in the topology.  That identifier could break the application.  With very large tables, the 16-byte GUID could also cause storage issues.  PPTR has no such limitations.

Therefore, if you are dealing with black-box code that cannot be amended, or if the changes needed to support merge replication cannot be easily implemented, you should consider PPTR.  You should also consider PPTR if you are dealing with storage limitations that make adding 16 bytes to every row of every table to be replicated impractical.

Considerations when implementing PPTR

In PPTR, every database in the topology acts as both publisher and subscriber.  Updates to the database are published to all other databases, and the database is subscribed to receive updates that occur on any other database.

This fact brings up the following considerations, in addition to the normal requirements of transactional replication (e.g., primary keys must be defined):

  • Timing:  Conflicts may arise if the same row is updated at two nodes at the same time.
  • Identity Management:  Identity columns, natural primary keys and other unique constraints need to be managed.  You need to ensure that a row with the same unique identifier is not inserted into more than one database.
  • No Snapshots:  You cannot snapshot any node in PPTR.  This is an important caveat, as explained below.

Timing and Conflict Detection

Timing issues are usually handled by disciplined application design.  For example, if row 123 is being modified in database A, a row is inserted into a replicated table that acts as an application level “lock” on the row.  That table is examined by any program using any database prior to allowing modifications.  This (or another solution) should be done because locks on replicated databases are not replicated to the other databases in the topology.

Sometimes, however, the application is not coded to do such a thing.  In that case, a conflict might occur.  PPTR comes with optional conflict detection, in which each database in the topology is assigned an originator ID, with the highest ID winning conflicts.  Conflicts are resolved by that priority, and the rejected rows are written to a system table, where developers and DBAs can work to ensure that no critical business data are lost.  Each table in the topology gets its own conflict table.  For example, the table [dbo].[orders] would have a conflict table named [dbo].[conflict_dbo_orders].
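Following the naming convention just described, the losing rows for a given table can be reviewed straight out of its conflict table (the [dbo].[orders] table is the article's running example):

```sql
-- Rows that lost a conflict for [dbo].[orders] land in its conflict table.
-- Review these with the developers before deciding whether any need to be
-- merged back into the production table.
SELECT *
FROM [dbo].[conflict_dbo_orders];
```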

Identity and Primary Key Management

I won’t repeat the classic DBA mantra that every table should have a non-business-significant sequential integer (int or bigint) clustered primary key defined, with any “natural” primary keys defined as unique nonclustered indexes.  The performance benefits of this arrangement are well-known, as are the exceptions to this rule.  If you follow this discipline, you will have a primary key defined as int or bigint with the identity specification.

Now, you will have the same table on all of the PPTR nodes, and you need to set ranges for all of those identity columns.  Assigning identity ranges requires you to plan for growth, for if you run out of identity values in one range, you will generate overlapping identities.  If conflict detection is not enabled, replication will fail.  If conflict detection is enabled, you will fill up the conflict tables with data that should be in the production tables.  Either circumstance will result in user discomfort.

I recommend that you set up the identity as a BIGINT and try to project a few years’ worth of growth for the tables in each database.  Then add a fat pad to that amount, and use that to set ranges.  I also recommend creating a SQL job on each server that queries the tables to examine that range for coming close to running out.  If the range is within a defined percentage of running out, email or page DBA.  Some monitoring tools will allow you to put the query into the tool and generate an alert.  It doesn’t matter how you get the alert notice, as long as it gets rapid attention.
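A minimal sketch of such a range check follows.  The table name, range boundaries, and 90% threshold are all hypothetical; plug in your own ranges and wrap the query in a SQL Agent job or your monitoring tool's custom-query alert:

```sql
-- Alert when dbo.orders has consumed more than 90% of this node's identity range.
-- @rangemin / @rangemax are this node's assigned boundaries (hypothetical values).
DECLARE @rangemin bigint = 10000000001,
        @rangemax bigint = 20000000000,
        @current  bigint;

SELECT @current = CONVERT(bigint, IDENT_CURRENT('dbo.orders'));

IF @current > @rangemin + (@rangemax - @rangemin) * 0.9
    RAISERROR('dbo.orders is within 10%% of exhausting its identity range.', 16, 1);
```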

Setting the ranges is not a problem.  You simply use DBCC CHECKIDENT with the RESEED option.  I have a script that I use to generate the reseed commands.  In my environment, the identity column has the same name (SequentialIdentity) on all tables in the PPTR topology.  If your environment differs, you may need to alter the SQL to recognize the tables and the identity columns.

set nocount on;

declare @sn varchar(128), @seedmin bigint, @seedmax bigint, @schemaname varchar(128), @tablename varchar(128);

set @sn = @@SERVERNAME;
print '--Executing partitioning on: ' + @sn

-- This node's identity range.  Adjust the server names and boundaries for your topology.
select @seedmin = case @sn
	when 'Server0' then 500000001
	when 'Server1' then 20000000001
	when 'Server2' then 30000000001
	else 0
	end;

select @seedmax = case @sn
	when 'Server0' then 20000000000
	when 'Server1' then 30000000000
	when 'Server2' then 40000000000
	else 0
	end;

print '--Seed range = ' + convert(varchar(25),@seedmin) + '-' + convert(varchar(25),@seedmax)
print '--Executed at: ' + convert(varchar(80),getdate());
print ' '
print 'declare @maxkey bigint'
print 'declare @newkey bigint'
print 'declare @seed bigint = ' + convert(varchar(25),@seedmin)

-- Find every base table that carries the identity column, skipping the conflict tables.
declare cur cursor for
	select T.TABLE_SCHEMA, T.TABLE_NAME
	from INFORMATION_SCHEMA.TABLES T
	join INFORMATION_SCHEMA.COLUMNS C
		on C.TABLE_SCHEMA = T.TABLE_SCHEMA
		and C.TABLE_NAME = T.TABLE_NAME
	where C.COLUMN_NAME = 'SequentialIdentity'
	and T.TABLE_NAME not like 'conflict%'
	and T.TABLE_TYPE = 'BASE TABLE'

open cur

fetch next from cur into @schemaname, @tablename

while @@FETCH_STATUS = 0
begin
	print ' '
	print ' '
	print 'select
	@maxkey = isnull(max(SequentialIdentity),0)
	from [' + @schemaname + '].[' + @tablename + ']
	where SequentialIdentity >= ' + convert(varchar(25),@seedmin) + '
	and SequentialIdentity < ' + convert(varchar(25),@seedmax) + ';'
	print ' '
	print 'print '' '''
	print 'print ''--Table name = [' + @schemaname + '].[' + @tablename + ']'''
	print 'print ''--Maxkey='' + convert(varchar(25),@maxkey)'
	print 'set @newkey = @seed'
	print 'if @maxkey > @seed
	set @newkey = @maxkey'
	print 'print ''dbcc checkident(''''' + @schemaname + '.' + @tablename + ''''',RESEED,'' + convert(varchar(25),@newkey) + '');'''
	fetch next from cur into @schemaname, @tablename
end

close cur
deallocate cur

The script above prints a second script into the Messages tab in SQL Server Management Studio.  That generated script is executed and in turn prints a script full of DBCC CHECKIDENT commands, which you execute in yet another SSMS window to set the range for the node in question.

Thus, the above script is useful to regenerate range keys in the event that you must restore the topology, a fate that I hope that you never experience!  It may be possible to reduce effort and write a step that does not generate a script that generates a second-level script.  I haven’t put the energy into that effort.

Clustered Index Considerations

If you cluster on a sequential identity (either as a PK or a unique index), you will have identity ranges for each database in the topology.  Therefore, you will have to consider using a fill factor of less than 100 on the clustered index.  Page splits along the high boundaries of the ranges (except the last range) are inevitable.  I generally use a fill factor of 90; you may need to go lower than that.  A lower fill factor wastes some space within each range, but the performance boost is worth it.
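Setting the fill factor is done per index; the index and table names below are hypothetical:

```sql
-- Rebuild the clustered PK leaving 10% free space per leaf page,
-- so mid-range inserts have room before a page split is forced.
ALTER INDEX PK_orders ON dbo.orders
REBUILD WITH (FILLFACTOR = 90);
```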

Whichever database gets the most INSERT traffic should be assigned the highest range.  This reduces page splits, since rows inserted at the end of the clustered index simply fill up a page and allocate the next one.

Snapshots = Never

PPTR cannot be populated by snapshot.  That’s the story, all of it.  When you implement the PPTR topology, all articles must contain identical data rows.  How do you populate the PPTR nodes?  Backup and Restore is the easiest method.  The procedure from 10,000 feet:

  1. Quiesce the system
  2. Take a backup of the canonical database (where I work, this is the database that “wins” all conflicts and is numbered highest in the PPTR topology wizard, which I will demonstrate in the next article).
  3. Restore that backup to all other planned databases
  4. Reset the ranges of all identities and key generation mechanisms.
  5. Create the publication
  6. Place all the nodes into the topology
  7. Release the databases for use.

Of course, Backup and restore assumes that all of the tables in the database are either static or in the PPTR topology.  Isolating the replicated tables into a separate database containing nothing but those tables is an excellent idea if your application can handle it.  I have also had excellent results using RedGate SQL Data Compare to synchronize individual tables when adding them to the PPTR topology after it has initially gone live.

Big caveat:  SQL Server generates snapshot agents and jobs for PPTR nodes.  Please, don’t ever run that agent. However, you also cannot delete that agent.

In the next article, we will set up and demonstrate how PPTR works.