img by Coba

Walkthrough of diagnosing a SQL Server Error – Part 3

So here’s where we are: In Part 1, I explained the initial error. In Part 2, I began the main troubleshooting process. In this entry, I continue troubleshooting based off of what I learned in Parts 1 & 2.

After I found out there were potential disk errors, I realized that I likely needed a minimum of 30 minutes downtime up to a likely two hours. Sure, it could go beyond that but I did not tell Windows 2003 to repair all errors therefore the time to fix was hopefully shortened. Since I needed downtime and it was the middle of the week, I had two things to consider:

  1. The SQL Server is up and running. The websites are up and running. The websites are online and able to write to the database.
  2. If I could wait until Friday night or Saturday morning for the downtime, it would have the least amount of impact on users.

So that’s great news – that the server is up and running and the websites are online. The only negative, right now, is that I’m running naked – no backups since yesterday. Was I comfortable running naked for another 48-72 hours?

Yes.

Fourth action: Plan for disaster

The worst thing that could happen right now is that I made a change that caused the server to crash. I’d already tried stopping a job (which failed) so I knew there were serious problems. I didn’t want to do anything that destroyed the server state right now. The first thing I needed to do was to come up with a plan on how to get the data out of the “live” production database. Enter SSIS!

Since I could still connect to the server using Windows authentication (BTW – even Dedicated Administration Connections would fail saying that a DAC already exists – wasn’t me!), I fired up SSIS and hooked into a separate instance. For the LearnItFirst database (you know – the company that has awesome SQL Server training videos), there are only 20 or so tables that have changed in a meaningful way since yesterday (Orders, Members, etc). So I created an SSIS package to export that data out and I took a backup.

  • Step 1: Create a new database on a separate instance
  • Step 2: Load the database with the data from the 20 or so tables that have changed since yesterday

Yay! Now I have a “backup”". While it isn’t a traditional SQL Server backup like I would like to get, if worse comes to worst I can be back online in minutes now.

Fifth action: Install another instance of SQL Server 2008

I tried creating a new database on the “affected” server but was unable to – the connection immediately entered a SUSPENDED state. So, in the event that the current state of affairs deteriorated, I wanted to be able to “flick a switch” and turn on a “new” database. SSIS did the backup for me – now I needed somewhere to store it. I installed SQL Server 2008 along with SP1 (check out LearnItFirst’s SQL Server 2008 tutorials for help doing this on your own) and created a new database to store the migrated data. You might be wondering, “Wait – he said he thought his drive was failing. Why not install on a separate machine?” Good question – the reason is that this particular server was running on a dedicated server from a webhost. I only had one server. If I wanted another server, I’d have to wait the 12-24 hours for them to build it. I needed this done NOW. I took a backup of the backup (that is, I took a SQL Server backup of the scripted-out-tables backup I did using SSIS), zipped it, and downloaded it to a separate server in case the drive failed.

Despite having a backup, I’m not fully prepared for the worst-case scenario yet. You see, my quick-and-dirty SSIS backup does three things:

  1. Scripts out the 20 or so tables I needed up-to-date data from
  2. Creates the tables in a new database
  3. Loads the tables with complete data from the original database

What it doesn’t do/have:

  1. Only copy data that has changed since last backup
  2. New database does not have all other tables

Sixth action: Restore yesterday’s backup onto new instance

Now that I have the new instance installed (let’s call it “LIF2” for short), I need a database to start with. I restored last night’s full database backup and now it’s time to synch the database with the live copy.

TIME OUT – Question time!

There are likely to be a few questions – let me try to answer them:

  • Did you try logging in using SQL Server’s Dedicated Administration Connection?
    • Yes – would not allow it. I tried from the New Query button and from sqlcmd. The error was that there was already a DAC connection. sp_who showed that my connection and the app connections were the only “live” connections though.
  • Are you ever going to tell us how you fixed it?
    • Eventually… hopefully… It’s still on-going as I post this!
  • Why are posting here instead of fixing it?
    • It’s one of those things where I’ve done everything I can during peak time – now I’m waiting for a non-peak time to do more!

'I’ll post more tomorrow on this…

authors
scott whigham
grant moyle
chad weaver
recent comments
  • rolex watches: In fact, some of the Swiss luxury watch, in read more