Is Compulsive Tuning Disorder a sign that DBAs are doing their jobs effectively? Or a sign that they are not?
DBAs are not doing their jobs if they suffer from CTD. In the methodology that I proposed in 2001, I invented the term "compulsive tuning disorder." I was writing my first book, and one dark November night, I asked myself what would be a catchy way of representing this problem -- DBAs tuning, tuning, tuning, but looking at irrelevant things. Lots of people are using the term now and it's really taking off. I hear people using it using it in presentations as something not to do. The whole idea of this presentation is to help people get over it -- not just for 10g but for all versions of Oracle. Is there some standard DBAs should be striving for? How does a DBA know when performance is good enough?
You have to define response time goals. Once response time goals have been achieved, performance is good enough. That's the crux of the problem -- people don't define goals, and they think tuning is a voodoo witchcraft thing that only tuning magicians can do. My rational for performance management is that you've got to have mathematical evidence for a problem. And that is the key that should drive you to solutions -- not just [following] a bunch of best practices. You cannot tune to a laundry list of things. My bottom line is: Let's find out where problem is. Let's have math drive what we should do, rather than opinions or expert voodoo techniques. What are the most common performance problems?
A query runs too slow or a job runs too slow. The question is what is making it slow. Lots of people don't spend time figuring out what it is. They spend time changing things, and you can change eight things and problem goes away. Have you solved it or just camouflaged it? And it can come back. If you don't use math, the solution won't be repeatable, and you will not have a consistent method of solving performance problems. It will always be a hit or miss, trial and error effort. What's the mathematical method?
Use low-level tracing methods to find out where the application is spending time. Take a trace statement and analyze it. After that, it is very simple. You might find that you are spending 78% time on I/O, so you ask why you are spending so much time on that. Do you see different problems arising in the different Oracle versions?
Yes. You could have some things that work right in one release, and the same application doesn't work after an upgrade because the DBMS has undergone change, and the optimizer plan changed. Sometimes new parameters have been introduced. It used to be good to have hundreds of parameters to tweak, but once people started having to manage hundreds of databases, people didn't have time to tweak a hundred things. The optimizer started maturing, taking on some of these tweaks. Most of the time, across releases, problems exist because of a bug, or lack of certain functionality that was required [by the application]. What are some features in 10g that address specific performance issues not addressed in previous versions?
There are multiple features, some deal with the collection of data, like the historical collection in 10g -- Automatic Workload Repository. And as part of the data collection, there is another feature, Active Session History, which is one of inflows into AWR. Once you have collected data, you need some sort of analysis engine for it, and that is now is available -- it's called Automatic Database Diagnostic Monitor. Those are some of the new things in Oracle that help you go down the path of knowing where it is spending time, rather than looking at ratios. People always want to find silver bullet and change it, but those things don't exist. It's all about using a method that is logical, repeatable, and mathematical -- driven by response time. 10g provides functionality for collection and analysis to channel the DBA on the right path of getting down to the core problem. What are the most common tuning mistakes or misconceptions?
A lot of people try to tune the database environment itself rather than look at application first. It's part of the silver bullet mindset. The database is easy -- change a parameter and boom, you made a change. Application tuning takes more effort -- finding out what components of the application is the problem stemming from, then the actual task of finding out what is the problem, then analyzing it, then figuring it out. It takes discipline and effort, but not a long drawn effort. If you are in mindset of waving a magic wand, you won't subscribe to this methodology. You did something before and you expect same thing to happen now, but the parameters may not be there, or they may not do anything because they are no longer relevant. The fundamental goal behind the effort of looking at response time is just to get people to look at the right data, and to wean people away from the old method of looking at a laundry list. It's humanly impossible to look at 100 databases every day. Even if you say, "That's why we have monitoring software," well, if it throws 100 alerts, which is the most important? Look at it from the perspective of asking do I have a performance problem, which means do I have a response time problem.
The best analogy is if you go to the doctor. You say that you have pain in your right foot. If he says he has to do a complete scan and do some blood work, you'd be thinking, why don't you just look at my right foot? You might question doctor's integrity. And that is the old method. I'll do blood test, CT scan, and an MRI every time we have a problem. We don't need to do complete blood work because of a pain in the foot. That's what we are trying to bring across.