Walkthrough of diagnosing a SQL Server error – Part 1
- by Scott Whigham on June 10, 2009 12:01 PMToday while reviewing jobs on the LearnItFirst SQL Server 2008 database server (I record/release my SQL Server training classes on video through LearnItFirst.com), I noticed some odd behaviors and error log entries. Now, a full six hours later, my server isn't in a better place. In fact, I think it's worse off than before. So much for troubleshooting lol. This is still, as of this writing, an ongoing issue - which is actually a *good* thing. The mere fact that I have time to make this post shows that I "have it under control" lol. Really where I am right now is "waiting for the fail"... Let me give you more background...
It all started...
When I looked at my logs this morning, I noticed that a job had not completed. This particular job makes a call to xp_cmdshell and runs an old pre-SSIS backup script that I've used for years. It creates a folder and creates the SQL backup in that folder with a timestamp in the file name. Yes, today it should've been done using SSIS but it's worked for four years straight on SQL Server 2005 and for a year on SQL 2008 so I wasn't worried. When I saw that the backup job was still running, I knew there was a problem - the job usually runs in about 10 minutes yet this job had gone on for hours and hours.
I began doing my investigation of (a) what happened, (b) what else is being affected, and (c) how to resolve it by opening up and running sp_who in a query. I needed to know what was happening in my server. The sp_who output showed that spid 51 was "SUSPENDED" due to an UNKNOWN TOKEN in the msdb database. Hmmmm - that was my job step... Okay - Spid 55 was being blocked by spid 51 and spids 56-59 were all being blocked by 55 (a wait). It was time to look deeper and to view the SQL Server commands for those spids:
Screenshot #1 (click to view full screen)
If you look at the code for spid 51, you'll see that it is running CREATE PROC sys.sp_configure. Wha?! "Hmmmm", I thought. "I did run RECONFIGURE the other day after turning on xp_cmdshell access, didn't I?" So I ran RECONFIGURE and here's what I saw next:
Screenshot #2
Let's take some backups
Okay - this was getting serious. It was time to take some backups. So I tried and tried - not allowed. Damnit. Okay - let's look at sys.databases:
Screenshot #3
Do you see the log_reuse_wait_desc columns that say, "ACTIVE_BACKUP_OR_RESTORE"? Well, they are being backed up currently by SQL Server. Wait - what? No they aren't - if you look at Screenshot #1, you'll notice that several spids have issued BACKUP DATABASE xxx to VIRTUAL_DEVICE statements. But those spids are SUSPENDED as well as blocked by spid 51.
Let's Review
I can't take backups. I have a job that says it's running but it's clearly hung (wait - is the job "hung" or "hanging"? I'm not sure on the verbiage...). I know what to do! Let's do some killin'!
Continued in Part 2




Leave a comment