All posts by John Tamburo

Missing Index Suggestions — Discretion Required!

If you are in Management Studio and see a suggested index, or if you have used a resource such as Pinal Dave’s Missing Indexes Script, you will be tempted to apply the suggestion without exercising any discretion.

Please, be careful!  I have seen a table with 10GB of data pages and 50GB+ of index pages.  That is a terrifying thing for write performance, and a little analysis up front is usually in order.

A common anomaly is to see multiple indexes with identical sort columns and different groups of include columns.  Consolidating these into a single covering index is low-hanging fruit: delete the duplicate indexes and replace them with one index that has all of the necessary included columns (most of the time).
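As a hedged illustration (the table and index names here are hypothetical), consolidating two near-duplicates might look like this:

-- Hypothetical near-duplicates: both keyed on CustomerID, differing only in INCLUDE columns:
--   IX_Orders_CustomerID_OrderDate : (CustomerID) INCLUDE (OrderDate)
--   IX_Orders_CustomerID_Status    : (CustomerID) INCLUDE (Status)
DROP INDEX IX_Orders_CustomerID_OrderDate ON dbo.Orders;
DROP INDEX IX_Orders_CustomerID_Status ON dbo.Orders;

-- One covering index replaces both.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Status);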

What if all of those included columns add up to most or all of the table’s columns?  You are then keeping a second copy of the table in the nonclustered index, and every write to the table incurs extra I/O to maintain it.

In that case, the best bet is usually to eliminate the included columns and keep just the sort columns as one index.  You may, however, have an outlier query that needs included columns to evade a performance bottleneck.  How do you determine this?  The best way is with the missing-index DMVs built into SQL Server.  Analyze each suggestion based on its estimated impact, multiplied by the number of executions that asked for the missing index, and look at the highest values of that product.
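Here is a minimal sketch against the missing-index DMVs; the weighting formula below is a common convention rather than an official metric:

-- Rank missing-index suggestions by estimated impact multiplied by the number of requests.
SELECT TOP (25)
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks + s.user_scans AS requests,
    s.avg_total_user_cost * (s.avg_user_impact / 100.0) * (s.user_seeks + s.user_scans) AS estimated_benefit
FROM sys.dm_db_missing_index_details d
INNER JOIN sys.dm_db_missing_index_groups g
    ON g.index_handle = d.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats s
    ON s.group_handle = g.index_group_handle
ORDER BY estimated_benefit DESC;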

Pinal Dave’s script calculates this number.  Again, however, you have to analyze each recommendation.  Just throwing the indexes at the database may solve a read issue while creating a write or disk-space issue.

We are DBAs.  We thrive on using the simplest solution and moving on.  However, in many cases, we must do a little digging to determine which solution is truly the simplest.

Happy New Year!

I wish you all the happiest 2015.

May your queries never be accidentally Cartesian.

May your DBCC CHECKDB always come back without inconsistencies or errors.

May none of your databases ever be marked suspect.

May both your budget and your salary increase.

May the next packaged product that your company installs come with a database that is properly normalized, with the correct indexes, and following all best practices.

May your monitoring tools never generate false positives.

May your new SAN contain nothing but solid state drives.

Have a wonderful new year!

Auto Close – Who Needs it?

When a database is set to Auto Close ON, it is shut down cleanly after the last user exits and SQL Server frees the resources it was using.  While this may have had some use in the past, that is certainly no longer the case.

Auto Close set to ON will cause database-down warnings in modern monitoring tools, which cannot (yet) tell the difference between an auto-closed database and one that is offline or suspect.  In a production environment, that leads to numerous “cry-wolf” error messages.

The best thing to do is to ensure that all of your databases have AUTO_CLOSE set to OFF.  Some people say that auto close can be advantageous in development or testing environments.  However, I find that the risks outweigh the benefits.

AUTO_CLOSE usually defaults to OFF.  However, it defaults to ON in SQL Server Express, so databases moved up from the free editions often arrive with AUTO_CLOSE set to ON.

Setting this pernicious feature off is easy:

ALTER DATABASE [database name] SET AUTO_CLOSE OFF;
GO

Save yourself time and headache and do this to every database on every instance.
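A quick way to find the offenders on an instance, and to generate the fix at the same time:

-- List every database with AUTO_CLOSE still on, and build the ALTER statement for each.
SELECT name,
       'ALTER DATABASE ' + QUOTENAME(name) + ' SET AUTO_CLOSE OFF;' AS fix_statement
FROM sys.databases
WHERE is_auto_close_on = 1;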

I’m open to hearing of a scenario where this option helps; I just cannot see one at this time.

The Importance of Monitoring Tools.

How does the modern DBA deal with 20-1,000 SQL Servers?  There are a few choices.

The first thing a DBA could do is configure a set of alerts to catch error situations. While useful, alerts are reactive in nature and the DBA finds out there is a problem after it happens.
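As a rough sketch of that approach, a severity-based alert can be created with SQL Agent’s stored procedures; the operator name below is an assumption (create your own with sp_add_operator first):

USE msdb;
GO
-- Alert on any severity 17 error and notify an operator by e-mail.
EXEC dbo.sp_add_alert
    @name = N'Severity 017 - Insufficient Resources',
    @severity = 17,
    @delay_between_responses = 300,       -- seconds between repeat notifications
    @include_event_description_in = 1;    -- 1 = include the error text in the e-mail

EXEC dbo.sp_add_notification
    @alert_name = N'Severity 017 - Insufficient Resources',
    @operator_name = N'DBA Team',         -- assumed operator name; adjust to your environment
    @notification_method = 1;             -- 1 = e-mail
GO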

The second thing a DBA could do is to configure a set of scheduled jobs that gather and report on metrics that might predict a failure. One example is a script set that measures available database space and/or free disk space on the drives and mount points, to help catch dwindling free space before it hits you at 3:00am on a Sunday. Again, this is an excellent thing to do, but the implementation may vary by machine, which complicates deployment.
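For example, a job step along these lines (sys.dm_os_volume_stats requires SQL Server 2008 R2 SP1 or later) can report free space on every volume that hosts a database file:

-- Free space per volume hosting at least one database file on this instance.
SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes / 1073741824.0 AS total_gb,
    vs.available_bytes / 1073741824.0 AS free_gb,
    CAST(vs.available_bytes * 100.0 / vs.total_bytes AS decimal(5, 2)) AS percent_free
FROM sys.master_files mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs;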

The third thing a DBA could do is to acquire a good monitoring tool. There are several: SQL Sentry, Idera SQL Diagnostic Manager, Redgate SQL Monitor, and Dell (formerly Quest) Spotlight on SQL Server Enterprise.

A DBA who has to monitor the health of more than five SQL servers needs the “force multiplier” effect of a good monitoring tool. Scripts and alerts won’t cut it.  My environment uses Spotlight, and has previously dallied with Idera and SQL Sentry. The tool you choose should be determined by your comfort level with said tool, and how effectively it helps you to manage your environment.

What do packaged monitoring tools offer you?

1. Solid Monitoring

Your monitoring product should monitor each SQL instance for performance, blocking, page life expectancy (PLE), cache hit rates and more.  It should also monitor the health of features such as replication and/or Availability Groups. It should be able to gather Windows metrics, and if the server is a VM, it should gather at least some basic metrics from the underlying hypervisor.
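These are the same counters you can pull by hand; page life expectancy, for instance, comes straight out of sys.dm_os_performance_counters. A good tool samples, stores and baselines such values for you:

-- Current page life expectancy (Buffer Manager total, plus one row per NUMA buffer node).
SELECT [object_name], instance_name, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy';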

2. Solid Alerting

A good monitoring tool should enable the DBA to configure alerts that page and/or email.  Alert thresholds need sensible defaults but must be customizable for a given machine or SQL Server instance.  Alerts should also be able to drive actions; for example, a replication alert might be set up to stop a user process in the event of excess latency.

3. Solid High-Level Reporting

Without making myself into a Spotlight evangelist, this is one area where the tool truly excels. It has a heat map with every instance in several colors, from green for all-OK to red for an immediate-attention item. Moreover, multiple alerts increase the size of the non-green instance icon to draw attention. Even better, there is an iPhone app that we can reference anywhere to temperature-check the environment.

4. Easy Provisioning

One should be able to commence monitoring with little to no effort, and preferably without installing client software or scripts.

5. A Solid Repository

This is important. You should be able to store metrics over time for trend reporting, and to access those metrics for custom reporting as needed. Such power, used wisely, can stabilize environments and make production DBAs proactive rather than reactive.

Some companies may view monitoring tools as a waste of money. I’m blessed to work someplace where management understands the need for the tools, and the end result is a stable environment that is easier for me to manage and is much more effective at meeting corporate needs.

I suggest that you obtain evaluation copies of the popular tools and see for yourself what they buy you in terms of proactive management of your SQL Server environment.

Why I chose WordPress and how it relates to modern DBA work.

This is a new blog.  I decided that I could go home-brew and spend a while creating and debugging something, or I could get busy and put up a new site in a day using the Azure Marketplace (Microsoft, please send me a check).  I chose the latter.

WordPress makes a complete and easy-to-use blogging product.  It was simple to get up and running.  Much better than reinventing the wheel.

I’m an old geezer in the IT world, starting work over 30 years ago. As such, there is a strong desire within me to craft every solution from a blank slate.  However, times change.  Adapting to those changing times is important.  Is my winter vacation time better spent developing something already well-done by others, or with family? This choice is a no-brainer.

We see this all the time in the work world.  Companies have wised up.  What company builds an ERP solution from scratch anymore? A package is purchased, customized features are added, and the product is deployed.  For DBAs this is a challenge set all its own.

Packaged software vendors rarely prioritize the needs of a SQL Server DBA to maintain performance as an application scales.  For many packaged products, things like Entity Framework and SQL generators restrict DBA query tuning options.  Database structures are mostly out of DBA hands, meaning that the options for fixing poorly-envisioned table structures and the like are also limited.  How do we respond when performance tanks?  Is the performance-tuning role of a 21st-Century DBA relegated to tossing indexes over the fence? May it never be!

As the site grows, I will share more and more about how to deal with these frustrations and challenges.  I also look forward to your comments.

Understanding Peer-to-Peer Transactional Replication, Part 1

Peer-to-Peer Transactional Replication (“PPTR” for short) is the ugly duckling of SQL Server replication topologies.  However, it need not be so.  There are occasions where PPTR is the perfect solution.  In this article I try to explain the right occasions to use PPTR and what the DBA needs to do when implementing it.

When is PPTR a good solution?

The first question that most admins ask when confronted with PPTR is, “Why don’t I just use Merge Replication?”  At first blush, it seems that Merge Replication is a good fit and PPTR is unnecessary.

However, Merge Replication adds a GUID row identifier to every table that is included in the topology.  That identifier could break the application.  With very large tables, the 16-byte GUID could also cause storage issues.  PPTR has no such limitations.

Therefore, if you are dealing with black-box code that cannot be amended, or if the changes needed to support merge replication cannot be easily implemented, you should consider PPTR.  You should also consider PPTR if you are dealing with storage limitations that make adding 16 bytes to every row of every table to be replicated impractical.

Considerations when implementing PPTR

In PPTR, every database in the topology acts as both publisher and subscriber.  Updates to the database are published to all other databases, and the database is subscribed to receive updates that occur on any other database.

This fact brings up the following considerations, in addition to the normal requirements of transactional replication (e.g., primary keys must be defined):

  • Timing:  Conflicts may arise if the same row is updated at two nodes at the same time.
  • Identity Management:  Identity columns, natural primary keys and other unique constraints need to be managed.  You need to ensure that a row with the same unique identifier is not inserted into more than one database.
  • No Snapshots:  You cannot snapshot any node in PPTR.  This is an important caveat, as explained below.

Timing and Conflict Detection

Timing issues are usually handled by disciplined application design.  For example, if row 123 is being modified in database A, a row is inserted into a replicated table that acts as an application level “lock” on the row.  That table is examined by any program using any database prior to allowing modifications.  This (or another solution) should be done because locks on replicated databases are not replicated to the other databases in the topology.
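A minimal sketch of that pattern, with hypothetical table and column names, might look like the following (the lock table is itself an article in the PPTR publication, so the “lock” row flows to the other nodes):

-- Hypothetical application-level lock table; it is itself replicated in the topology.
CREATE TABLE dbo.AppRowLock
(
    TableName   sysname       NOT NULL,
    RowKey      bigint        NOT NULL,
    LockedBy    nvarchar(128) NOT NULL,
    LockedAtUtc datetime2(0)  NOT NULL CONSTRAINT DF_AppRowLock_LockedAtUtc DEFAULT (sysutcdatetime()),
    CONSTRAINT PK_AppRowLock PRIMARY KEY CLUSTERED (TableName, RowKey)
);

-- Before modifying row 123 of dbo.orders on any node, the application records the lock;
-- every program checks this table before allowing its own modification of the row.
INSERT INTO dbo.AppRowLock (TableName, RowKey, LockedBy)
VALUES (N'dbo.orders', 123, ORIGINAL_LOGIN());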

Sometimes, however, the application is not coded to do such a thing.  In that case, a conflict might occur.  PPTR comes with optional conflict detection, in which each database in the topology is assigned a number (its originator ID); by default, the change from the node with the highest ID wins.  The rejected rows are written into a system table, where developers and DBAs can work to ensure that no critical business data are lost.  Each table in the topology gets its own conflict table.  For example, the table [dbo].[orders] would have a conflict table named [dbo].[conflict_dbo_orders].

Identity and Primary Key Management

I won’t repeat the classic DBA mantra that every table should have a non-business-significant sequential integer (int or bigint) clustered primary key defined, with any “natural” primary keys defined as unique nonclustered indexes.  The performance benefits of this arrangement are well-known, as are the exceptions to this rule.  If you follow this discipline, you will have a primary key defined as int or bigint with the identity specification.
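In concrete terms (the table and column names here are illustrative), that pattern looks like this:

-- Surrogate bigint identity as the clustered primary key; the "natural" key kept unique nonclustered.
CREATE TABLE dbo.Customers
(
    SequentialIdentity bigint IDENTITY(1, 1) NOT NULL,
    CustomerCode       varchar(20)   NOT NULL,   -- business ("natural") key
    CustomerName       nvarchar(200) NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (SequentialIdentity),
    CONSTRAINT UQ_Customers_CustomerCode UNIQUE NONCLUSTERED (CustomerCode)
);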

Now, you will have the same table on all of the PPTR nodes, and you need to set ranges for all of those identity columns.  Assigning identity ranges requires you to leave growth room, because if one range runs out of identity values it will overlap the next range.  If conflict detection is not enabled, PPTR will break on the resulting key violations.  If conflict detection is enabled, you will fill up the conflict tables with data that should be in the production tables.  Either circumstance will result in user discomfort.

I recommend that you set up the identity as a BIGINT and try to project a few years’ worth of growth for the tables in each database.  Then add a fat pad to that amount, and use that to set the ranges.  I also recommend creating a SQL Agent job on each server that queries the tables to see whether the assigned range is coming close to running out.  If the range is within a defined percentage of running out, email or page the DBAs.  Some monitoring tools will let you put the query into the tool and generate an alert.  It doesn’t matter how you get the notice, as long as it gets rapid attention.
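A sketch of that check, assuming this node owns the range 20,000,000,001 through 30,000,000,000 and the identity column is named SequentialIdentity (both assumptions; adjust for your topology), could be the body of that job:

-- Warn when this node's identity range for a given table is more than 80 percent consumed.
declare @rangemin bigint = 20000000001,
        @rangemax bigint = 30000000000,
        @maxused  bigint;

select @maxused = isnull(max(SequentialIdentity), @rangemin)
from dbo.orders                                   -- hypothetical replicated table
where SequentialIdentity between @rangemin and @rangemax;

if (@maxused - @rangemin) * 100.0 / (@rangemax - @rangemin) > 80.0
    raiserror('Identity range for dbo.orders is more than 80 percent consumed.', 16, 1);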

Setting the ranges is not a problem.  You simply use DBCC CHECKIDENT with the RESEED option.  I have a script that I use to generate the reseed commands.  In my environment, the identity column has the same name on all tables in the PPTR topology.  If your environment differs, you may need to alter the SQL to recognize the tables and the identity columns.

set nocount on;

declare @sn varchar(128), @seedmin bigint, @seedmax bigint, @schemaname varchar(128), @tablename varchar(128), @maxkey bigint

set @sn = @@SERVERNAME;
print '--Executing partitioning on: ' + @sn

select @seedmin = case @sn
	when 'Server0' then 500000001
	when 'Server1' then 20000000001
	when 'Server2' then 30000000001
	else 0
end

select @seedmax = case @sn
	when 'Server0' then 10000000001
	when 'Server1' then 30000000001	-- upper bounds are exclusive in the generated WHERE clause
	when 'Server2' then 40000000001
	else 0
end

print '--Seed value = ' + convert(varchar(25),@seedmin) + '-' + convert(varchar(25),@seedmax) 
print '--Executed at: ' + convert(varchar(80),getdate());
print ' '
print 'declare @maxkey bigint'
print 'declare @newkey bigint'
print 'declare @seed bigint = ' + convert(varchar(25),@seedmin)

declare cur cursor for 

select
	T.TABLE_SCHEMA
	,T.TABLE_NAME 
from INFORMATION_SCHEMA.TABLES T
inner join INFORMATION_SCHEMA.COLUMNS C
	on T.TABLE_SCHEMA = C.TABLE_SCHEMA	-- match on schema as well as name to avoid cross-schema duplicates
	and T.TABLE_NAME = C.TABLE_NAME
where C.COLUMN_NAME = 'SequentialIdentity'
and T.TABLE_NAME not like 'conflict%'
and T.TABLE_TYPE = 'BASE TABLE'

open cur

fetch next from cur into @schemaname,@tablename

while @@FETCH_STATUS = 0
begin
	print ' '
	print ' '
	print 'select 
	@maxkey = isnull(max(SequentialIdentity),0)
	from [' + @schemaname + '].[' + @tablename + ']
	where SequentialIdentity >= ' + convert(varchar(25),@seedmin) + '
	and SequentialIdentity < ' + convert(varchar(25),@seedmax) + ';'
	print ' ' 
	print 'print '' '''
	print 'print ''--Table name = [' + @schemaname + '].[' + @tablename + ']'''
	print 'print ''--Maxkey='' + convert(varchar(25),@maxkey)'
	print 'set @newkey = @seed'
	print 'if @maxkey > @seed
	set @newkey = @seed + (@maxkey - @seed)'
	print 'print ''dbcc checkident(''''' + @schemaname + '.' + @tablename + ''''',RESEED,'' + convert(varchar(25),@newkey) + '');'''
	fetch next from cur into @schemaname,@tablename
end

close cur
deallocate cur

The script above prints a second script into the Messages pane of SQL Server Management Studio.  You run that generated script, and it in turn prints a third script full of DBCC CHECKIDENT commands that you execute in yet another SSMS window to set the range for the node in question.

Thus, the script above is useful for regenerating range keys in the event that you must restore the topology, a fate that I hope you never experience!  It may be possible to reduce the effort with a version that reseeds directly instead of generating a script that generates a second-level script; I haven’t put the energy into that yet.

Clustered Index Considerations

If you cluster on a sequential identity (either as the primary key or as a unique index), you will have a separate identity range for each database in the topology.  Therefore, you will have to consider using a fill factor of less than 100 on the clustered index, because page splits along the high boundaries of every range except the last are inevitable.  I generally use a fill factor of 90; you may need to go lower than that.  A lower fill factor wastes some space within each range, but the performance boost is worth it.
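For example, on a hypothetical dbo.orders table whose clustered primary key index is named PK_orders, the rebuild would be:

-- Rebuild the clustered index, leaving free space on each page to absorb mid-range inserts.
ALTER INDEX PK_orders ON dbo.orders
REBUILD WITH (FILLFACTOR = 90);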

Whatever database gets the most INSERT traffic should be placed into the highest range.  This will reduce page splits, since rows inserted at the end of the clustered index will simply fill up a page and allocate the next one.

Snapshots = Never

PPTR cannot be populated by snapshot.  That’s the story, all of it.  When you implement the PPTR topology, all articles must contain identical data rows.  How do you populate the PPTR nodes?  Backup and Restore is the easiest method.  The procedure from 10,000 feet:

  1. Quiesce the system.
  2. Take a backup of the canonical database (where I work, this is the database that “wins” all conflicts and is numbered highest in the PPTR topology wizard, which I will demonstrate in the next article).
  3. Restore that backup to all other planned databases.
  4. Reset the ranges of all identities and key generation mechanisms.
  5. Create the publication.
  6. Place all the nodes into the topology.
  7. Release the databases for use.

Of course, backup and restore assumes that all of the tables in the database are either static or in the PPTR topology.  Isolating the replicated tables into a separate database containing nothing but those tables is an excellent idea if your application can handle it.  I have also had excellent results using Redgate SQL Data Compare to synchronize individual tables when adding them to the PPTR topology after it has initially gone live.

Big caveat:  SQL Server generates snapshot agents and jobs for PPTR nodes.  Please, don’t ever run that agent. However, you also cannot delete that agent.

In the next article, we will set up and demonstrate how PPTR works.