Virtual functions in C++ allow us to do mocking, stubbing, and faking, which helps us test code in isolation, whether we're doing TDD or just microtesting in general. A frequent objection to virtual functions is that they "cost more" than simple function calls.
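To make the testing angle concrete, here is a minimal sketch (the names are hypothetical, not taken from any particular codebase): the code under test depends only on a pure virtual interface, so a micro-test can hand it a fake in place of the real collaborator.

    // The code under test depends on the abstract Thermometer interface,
    // so a test can substitute a fake without touching real hardware.
    #include <cassert>

    class Thermometer {                        // pure virtual interface
    public:
        virtual ~Thermometer() {}
        virtual double readCelsius() const = 0;
    };

    class FakeThermometer : public Thermometer {
    public:
        explicit FakeThermometer(double t) : temp(t) {}
        double readCelsius() const { return temp; }   // canned value for tests
    private:
        double temp;
    };

    bool tooHot(const Thermometer& t) {        // production code under test
        return t.readCelsius() > 40.0;
    }

    int main() {
        FakeThermometer fake(45.0);
        assert(tooHot(fake));                  // micro-test, no hardware needed
        return 0;
    }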
The cost of a virtual function is hardly ever important, unless the function is being called in a very tight loop, or it is being called at interrupt-time, and even then compiler / linker optimization might overcome the "extra" table-lookup that makes a function virtual.
In C, calling a function through a function-pointer is equivalent to calling a virtual function in C++. Implementations of file-system APIs in Linux and BSD Unix use function-pointers internally to do what C++ does with virtual functions.
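For readers who haven't seen the C idiom, here is a rough, simplified illustration (the names are made up, not the actual kernel structures): a table of function pointers gives the same run-time dispatch that a C++ vtable provides.

    // A C-style dispatch table, loosely in the spirit of a file-operations
    // structure. The caller picks an implementation at run time and calls
    // through the pointers, just like a virtual call.
    #include <stdio.h>

    struct file_ops {
        int (*open)(const char* path);
        int (*read)(int fd, char* buf, int len);
    };

    static int ramfs_open(const char* path) { printf("ramfs open %s\n", path); return 3; }
    static int ramfs_read(int fd, char* buf, int len) { (void)fd; (void)buf; return len; }

    static const struct file_ops ramfs_ops = { ramfs_open, ramfs_read };

    int main(void) {
        const struct file_ops* ops = &ramfs_ops;   // chosen at run time
        int fd = ops->open("/tmp/x");              // indirect call, like a virtual call
        char buf[16];
        ops->read(fd, buf, sizeof buf);
        return 0;
    }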
The cost of cache-misses (on-chip cache and off-chip cache) is a LOT bigger than the cost of virtual function pointers. Incorrect branch-prediction is also a CPU-level cost that people fail to understand.
The best way to know if making a function virtual has any user-apparent cost is to measure the application via some tool that does non-invasive timing. Often, there will be no visible effect if you make some functions virtual that were non-virtual before. (I'm talking about REAL applications, not benchmarks.)
Nice post, Keith! "Don't optimize unless you know you have a performance problem."
Where's the danger in premature optimization? That you waste time and effort optimizing the wrong thing. Presumably you can also break things if you're not careful, but let's assume a good suite of tests. ;-)
In my experience, only real world profiling can tell you what needs to be optimized. We are surprisingly poor judges of where the bottlenecks are. So many times, what appears to be the problem in a thought experiment is innocent, and something stranger is at work.
Often, the best optimization is at a higher level, not at the instruction level. In other words, things like picking a better algorithm might buy you more.
Still, as you indicated there might well be times when timing is so tight that you need to optimize to the point of eliminating virtual functions. In that case, is it time to look at cache-misses? And if so, what do you prescribe for optimizing for instruction (and data) caches?
Optimizing for instruction (and data) caches is very machine-specific, so I wouldn't recommend looking into that unless your code is itself specific to one machine. At that point, you might consider dropping into assembly language. At which point _my_ expertise has expired. I know what I don't know.
I think that removing duplicate code, keeping code and data "together" as per OO encapsulation, and keeping classes (and data) "minimal" as per XP's "simple design" provide an environment where performance bottlenecks can be found and fixed fairly easily. Smaller code is less likely to provoke instruction cache-misses; ditto for smaller data.
If the algorithm and data are well-encapsulated, switching algorithms doesn't affect many classes. For example, you could pick a sort algorithm that has better locality than your default choice, if it happens that sorting is the bottleneck.
But a higher-level look-around may let you discover that sorting isn't needed, or that a different data structure gets you better performance _plus_ sorted data.
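Going back to the sort example for a moment, here is a small sketch (hypothetical names) of what that encapsulation might look like: the sorting strategy hides behind one pure virtual interface, so swapping in an algorithm with better locality touches a single class and none of its clients.

    // The client sees only the Sorter interface; implementations can be
    // swapped without changing calling code.
    #include <algorithm>
    #include <vector>

    class Sorter {
    public:
        virtual ~Sorter() {}
        virtual void sort(std::vector<int>& data) const = 0;
    };

    class StdSorter : public Sorter {          // default choice
    public:
        void sort(std::vector<int>& data) const { std::sort(data.begin(), data.end()); }
    };

    class CacheFriendlySorter : public Sorter { // drop-in replacement
    public:
        void sort(std::vector<int>& data) const {
            // imagine a blocked / locality-aware sort here
            std::stable_sort(data.begin(), data.end());
        }
    };

    void report(const Sorter& sorter, std::vector<int>& readings) {
        sorter.sort(readings);                  // callers never change
        // ... produce the report ...
    }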
Most of the time, we don't need to concern ourselves with the cache, nor do we need to concern ourselves with the virtual function table-lookup. Working, tested, well-designed code comes first.
I mostly agree with you here, Keith. Usually, spatial locality is so poor in a system as a whole that you can add hundreds of indirections and they are just very tiny drops in a very large bucket.
That being said, have you worked with teams who do have well-tested code, and have taken advantage of all the low-hanging fruit that a profiler has pointed out to them, like using Link-Time Optimization and Profile-Guided Optimization (automated integration tests are great for generating said profiles!), and then analyzing cache and prefetch issues using callgrind?
When you get out of the mode where you're just trying to get the code out of the hole it's been buried in, and into the proactive mode, virtuals can end up being an insurmountable bottleneck. This is why I convinced my current client to fund a project to add de-virtualization as an optimization to GCC for both stack- and heap-allocated objects. It's coming along very nicely and should optimize not only across function pointer boundaries, but shared_ptr<> boundaries as well. The milestone tests I gave them included abstract factory patterns that returned pure virtual interface classes, as well as auto_ptr<> to a pure virtual interface class.
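To sketch the shape of those milestone cases (simplified, hypothetical names; not the actual test code): a factory hands back a pure virtual interface behind auto_ptr<>, which is exactly the kind of indirection a de-virtualization pass has to see through.

    // Abstract factory returning a pure virtual interface via auto_ptr<>
    // (idiomatic for the GCC 4.x era this discussion is about).
    #include <memory>

    class Codec {                               // pure virtual interface
    public:
        virtual ~Codec() {}
        virtual int encode(int sample) const = 0;
    };

    class PcmCodec : public Codec {
    public:
        int encode(int sample) const { return sample; }
    };

    struct CodecFactory {                       // abstract factory
        virtual ~CodecFactory() {}
        virtual std::auto_ptr<Codec> create() const = 0;
    };

    struct PcmCodecFactory : CodecFactory {
        std::auto_ptr<Codec> create() const { return std::auto_ptr<Codec>(new PcmCodec); }
    };

    int process(const CodecFactory& factory, int sample) {
        std::auto_ptr<Codec> codec = factory.create();
        return codec->encode(sample);           // virtual call through a smart pointer
    }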
Even before virtual functions, there's the issue of clean header files that have little to no implementation detail in them. When C++ codebases heavily favor templates instead of passing pure virtual interfaces as parameters to the constructor or to individual methods, that is usually the first nightmare. For plain old classes and methods without parameterized types, moving their implementations into a .cpp can cause dramatic performance shifts. Thankfully, one can either use GCC 4.5's Link-Time Optimization (mostly contributed by folks at Google), or just do the old mega-compilation trick with any compiler. Doing that allowed one of the teams I work with to not only move lots of implementation detail out of header files, but it also ended up giving a 12% performance boost at the system throughput level!
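As a rough illustration of what I mean by a clean header (hypothetical names, heavily simplified): take the collaborator as a pure virtual interface in the constructor and keep the method bodies in the .cpp, instead of templating everything into the header.

    // logger.h -- interface only, no implementation details
    class Logger {
    public:
        virtual ~Logger() {}
        virtual void write(const char* msg) = 0;
    };

    // order_processor.h -- depends on Logger by reference only
    class OrderProcessor {
    public:
        explicit OrderProcessor(Logger& logger);
        void process(int orderId);
    private:
        Logger& log;
    };

    // order_processor.cpp -- bodies live here, so they can change
    // without recompiling every client of the header
    OrderProcessor::OrderProcessor(Logger& logger) : log(logger) {}
    void OrderProcessor::process(int orderId) {
        (void)orderId;
        log.write("processing order");
    }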
So, while I agree with your overall advice and attitude, one needs to provide an actual solution for teams that aren't stuck in the stone age and want to balance TDD/OO and performance. Hopefully the GCC optimizations I was able to get funded remove one more manual barrier, just like LTO did.
I go into the tools and guidance of when and how to use them in the book I'm working on regarding C/C++ Unit Testing.
PS: Please stop calling real unit tests "microtests". It's just confusing people who don't know any better and who think that their unit-level tests can be slow and flaky since they aren't "microtests". Unit, integration, system: those terms are well-established, and I can just point people at wikipedia or the c2 wiki for good definitions in case they don't believe what I tell them.
The term "microtesting" is utterly unnecessary at best, and damaging to the overall understanding of developer testing at worst. If coining the term "microtesting" is just an attempt by Industrial Logic to somehow make their courseware appear unique, seem like thought leaders/visionaries, or as some oddly chosen branding opportunity, the cost of further confusing people's understanding is surely not worth whatever advantage you are probably not even getting out of it.
Thanks in advance for knocking it off :)
Nice to hear from you, Matt. Glad to hear about the optimizations for GCC that are available and are in progress. One of these days I may write about clean header-files: I think we'd be in agreement on what makes for a clean header file.
One of the teams I have worked with is dealing with spatial locality. Working memory is quite small, so they have to page code in and out manually. (I don't think they worry about the cache, or whether the processor they are using even has one.) They're using C instead of C++, but in many places they use function-pointers for faking, stubbing, and mocking in tests.
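As a guess at the kind of seam that team uses (hypothetical names; their actual code is not shown here): production code calls through a function pointer, and a test simply re-points it at a stub. This compiles as either C or C++.

    /* Function-pointer seam: tests replace the pointer with a stub. */
    #include <assert.h>

    static int real_read_sensor(void) { /* would talk to hardware */ return 0; }

    /* the seam */
    static int (*read_sensor)(void) = real_read_sensor;

    static int sensor_is_overheating(void) {
        return read_sensor() > 100;
    }

    static int stub_hot_sensor(void) { return 250; }

    int main(void) {
        read_sensor = stub_hot_sensor;         /* install the stub */
        assert(sensor_is_overheating());       /* test without hardware */
        return 0;
    }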
Matt - we started using the term "microtest" precisely because people already had a very well-established idea of what a unit test is, and those ideas were not anything close to what we think of as a great unit test. A new word helped us get them to consider a new way to think about these tests. Our use of the word had and has nothing to do with branding or thought leadership - we are simply looking out for our clients, most of whom don't find the term to be confusing at all. --Joshua Kerievsky