Family Office Analytics: Constructing accurate benchmarks (Part I)

It's crucial that we get back to basics and start getting
benchmarks right. Ronald Surz is president of PPCA, a San
Clemente, Calif.-based software firm that provides advanced
performance-evaluation and attribution analytics, and a principal
of RCG Capital Partners, a Denver, Colo.-based
fund-of-hedge-funds manager. He is a prolific and widely
published author.
In this two-part series we discuss the common benchmarks of indexes and peer groups, and offer improvements. In Part I, we outline the shortcomings of common practice and describe indexing. In Part II, we discuss peer groups and show how indexes and peer groups can be unified to provide more accurate benchmarks.
Synopsis
Investment-performance evaluators have lost touch with a basic
and self-evident truth: when the benchmark is wrong, all of
the analytics are wrong.
The cost of this mistake is high. Investment managers get hired
and fired for the wrong reasons. So it's imperative that we get
back to basics, that we get the benchmark right.
Fiduciary prudence dictates best practice over
common practice, as does the "do no harm" rule.
Indexes and peer groups are the common forms of benchmarks, but
neither represents best practice.
The article describes how accurate benchmarks can be constructed
from indexes and how peer group biases can be overcome.
Worth doing well
Safeguarding against "garbage in, garbage out" scenarios was once
a primary concern of investment-manager researchers. But in
recent years this self-evident "GIGO" admonition has been
forgotten by many. As the performance-evaluation industry
directed its focus to improving performance measurements, it lost
sight of the basic and critical principle of accurate
benchmarking.
The result: benchmarks are now routinely mis-specified, and
performance measurements use "garbage" benchmarks to end up with
"garbage" manager evaluations.
The cost of this fundamental error is high. Investment managers
are hired and fired for the wrong reasons, with a consequent loss
to the investor in ill-spent fees and lost performance. It's
imperative that we get back to basics and get the benchmark
right. After all, fiduciary prudence dictates best practice over
common practice.
The need for accurate benchmarking has come into focus because of
increasing interest in portable alpha. Alpha is transported by
shorting the benchmark, thereby removing beta effects. But this
can be done properly only if the benchmark is known; you can't
after all short what you don't know. And of course the benchmark
has to be specified properly. This is an insidious problem for
hedge funds, where value added, or alpha, can easily be confused
with factor exposures such as low market participation in a
falling market.
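The arithmetic behind alpha transport can be sketched in a few lines. The numbers below are hypothetical, purely illustrative: the manager's benchmark is shorted to strip out beta, and the residual alpha is layered onto whatever market exposure the investor wants to keep.

    # A minimal sketch of alpha transport, with made-up monthly returns.
    manager_return = 0.012      # manager's monthly return (hypothetical)
    benchmark_return = 0.009    # return of the manager's true benchmark
    target_beta_return = 0.007  # return of the exposure we want to keep

    alpha = manager_return - benchmark_return  # value added vs. benchmark
    transported = target_beta_return + alpha   # alpha moved onto the new beta

    print(f"alpha = {alpha:.4f}, transported return = {transported:.4f}")
    # If the benchmark is misspecified, 'alpha' silently includes factor
    # exposure, and the short hedges away the wrong risk.

The point of the sketch: the only return hedged away is the benchmark's, so a wrong benchmark leaves unintended beta masquerading as alpha.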
Others have noticed these deficiencies. A study called
"Assessing the Costs and Benefits of Brokers in the Mutual Fund
Industry" suggests that investment consultants are actually
worse at picking managers than do-it-yourself investors.
Journalists have observed that the average fund-of-hedge-funds
consistently underperforms the average hedge fund, and that this
underperformance is not due solely to fees.
Simply stated, outside observers find that consultants have not
delivered on their promise of finding skillful managers. The
profession should heed this failure and take steps to change what
has clearly been a losing game.
We have an extensive menu of performance measurements such as the
Sharpe ratio, the Sortino ratio, the Treynor ratio, the
Information ratio and alpha. New and improved measurements such
as risk-and-style adjusted Omega Excess and the time-and-style
adjusted Sturiale Consistency ratio continue to be introduced.
Performance reports often include an array of such measures for
the edification of sophisticated clients. But truly sophisticated
investors should substantiate the accuracy of the benchmark
before trusting any measure of performance. If the benchmark is
wrong all of the analytics are wrong.
New measures do nothing to get us back to the basics of accurate
benchmarking, though the emphasis we put on them is
understandable. Getting the benchmark right is more difficult
than concocting new measurements or improving old ones. Tinkering
with mathematical formulas is more fun than agonizing over the
minutia of benchmark construction. Unfortunately, no amount of
arithmetic can bail us out if the benchmark is wrong.
Accurate benchmarking entails a lot of work, but it's worth the
effort. Without it we can forget about accuracy in performance
evaluation and attribution.
Indexes
A benchmark establishes a goal for the investment manager. A
reasonable goal is to earn a return that exceeds a low-cost,
passive implementation of the manager's investment approach,
because the investor always has the choice of active or passive
management. The relatively recent introduction of style indexes
helps, but these need to be employed wisely.
Before style indexes were developed, there was wide acceptance
and support for the concept of a "normal portfolio," which is a
customized list of stocks with their neutral weights. "Normals"
were intended to capture the essence of the people, process, and
philosophy behind an investment product. But only a couple of
consulting firms were any good at constructing these custom
benchmarks.
Today we can approximate these "designer benchmarks" with style
analysis, sometimes called "the poor man's normals." Although
style analysis may not be as comprehensive as the original idea
of normal portfolios, it makes it possible for many firms to
partake in this custom blending of style indexes. Style analysis
can be conducted with returns or holdings. Both approaches are
designed to identify a style blend that, like normals, captures
the people, process, and philosophy of the investment
product.
Whether a "returns" or "holdings" approach to style analysis is
used, the starting point is defining investment styles. The
classification of stocks into styles leads to style indexes,
which are akin to sector indexes such as technology or
energy.
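As a rough illustration of such a classification, stocks can be sorted into mutually exclusive style groups along a size dimension and a value/growth dimension. The thresholds and score below are invented; actual index providers use more refined rules.

    # Illustrative sketch: assign each stock to exactly one style group
    # based on market capitalization and an aggregate value score.
    # Thresholds are hypothetical, chosen only to show the mechanics.
    def classify(market_cap_usd: float, value_score: float) -> str:
        size = ("large" if market_cap_usd > 10e9
                else "mid" if market_cap_usd > 2e9
                else "small")
        style = ("value" if value_score > 0.6
                 else "growth" if value_score < 0.4
                 else "core")
        return f"{size}-{style}"  # e.g. "large-value": one and only one group

    print(classify(50e9, 0.7))  # -> large-value
    print(classify(1e9, 0.5))   # -> small-core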
It's important to recognize the distinction between indexes and
benchmarks. Indexes are barometers of price changes in segments
of the market. Benchmarks are passive alternatives to active
management. Historically, common practice has been to use indexes
as benchmarks, but style analyses have shown that most managers
are best benchmarked as blends of styles.
As a practical matter, we are no worse off with style blends: the
old practice remains a special case of the new, since the best
"blend" may turn out to be a single index.
One form of style analysis is returns-based style analysis
(RBSA). This regresses a manager's returns against a family of
style indexes to determine the combination of indexes that best
tracks the manager's performance. The interpretation of the "fit"
is that the manager is employing this "effective" style mix
because performance could be approximately replicated with this
passive blend.
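In skeleton form, RBSA is a constrained optimization: find the non-negative style weights, summing to one, that minimize the variance of the difference between the manager's returns and the blended index returns. A minimal sketch with simulated data:

    # Returns-based style analysis (RBSA) sketch on simulated monthly data.
    # Real use would substitute actual manager and style-index return series.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    styles = rng.normal(0.008, 0.04, size=(60, 4))        # 60 months, 4 style indexes
    true_w = np.array([0.5, 0.3, 0.2, 0.0])
    manager = styles @ true_w + rng.normal(0, 0.005, 60)  # blend plus noise

    def tracking_var(w):
        # variance of (manager minus passive blend): what RBSA minimizes
        return np.var(manager - styles @ w)

    res = minimize(tracking_var, x0=np.full(4, 0.25), method="SLSQP",
                   bounds=[(0, 1)] * 4,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    print("effective style mix:", np.round(res.x, 2))  # ~ [0.5, 0.3, 0.2, 0.0]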
Another approach, called holdings-based style analysis (HBSA),
examines the stocks actually held in the investment portfolio and
maps these into styles at points in time. Once a sufficient
history of these holdings-based snapshots is developed, an
estimate of the manager's average style profile can be developed
and used as the custom benchmark.
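In skeleton form, with hypothetical holdings and an invented style map, the procedure looks like this:

    # Holdings-based style analysis (HBSA) sketch: classify each holding
    # at each snapshot date, then average the style weights across dates.
    from collections import defaultdict

    style_of = {"AAA": "large-growth", "BBB": "large-value", "CCC": "small-value"}

    snapshots = [  # {ticker: portfolio weight} at successive quarter-ends
        {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2},
        {"AAA": 0.4, "BBB": 0.4, "CCC": 0.2},
    ]

    totals = defaultdict(float)
    for snap in snapshots:
        for ticker, weight in snap.items():
            totals[style_of[ticker]] += weight

    profile = {s: w / len(snapshots) for s, w in totals.items()}
    print(profile)  # average style profile, used as the custom benchmark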
Note that HBSA, like normal portfolios, starts at the individual
security level and that both normal portfolios and holdings-based
style analysis examine the history of holdings. The departure
occurs at the blending. Normal portfolios blend stocks to create
a portfolio profile that is consistent with investment
philosophy. HBSA makes an inference from the pattern of
point-in-time style profiles and translates the investment
philosophy into style.
Choosing between RBSA and HBSA is a complicated business. The
major trade-off between the two approaches is ease of use, where
RBSA has an edge, versus accuracy and ease of understanding,
where HBSA comes out on top.
RBSA has become a commodity that is quickly available and
operated with a few clicks. Some websites offer free
RBSA for a wide range of investment firms and products. Find the
product, click on it, and out comes a style profile. Offsetting
this ease of use is the potential for error. RBSA uses
sophisticated regression analysis to do its job. As in any
statistical process, data problems can go undetected and
unrecognized, leading to faulty inferences. One such problem is
multicollinearity, which exists when the style indexes
used in the regression overlap in membership. Multicollinearity
invalidates the regression and usually produces spurious results.
The user of RBSA must trust the "black box," because the
regression can't explain why that particular blend is the best
solution.
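A simple diagnostic, again on simulated data, shows how overlapping style indexes reveal themselves: near-perfect pairwise correlations and a large condition number for the matrix of index returns.

    # Multicollinearity check: correlations and condition number of the
    # style-index return matrix. The fourth index is built as a near-copy
    # of the first, mimicking overlapping membership.
    import numpy as np

    rng = np.random.default_rng(1)
    base = rng.normal(0, 0.04, size=(60, 3))
    overlap = 0.9 * base[:, 0] + 0.1 * rng.normal(0, 0.04, 60)
    X = np.column_stack([base, overlap])

    print(np.round(np.corrcoef(X, rowvar=False), 2))      # entries near 1.0 flag overlap
    print("condition number:", round(np.linalg.cond(X)))  # large => unstable regression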
In the 1988 article that introduced RBSA, Nobel laureate William Sharpe says the "style palette" indexes used in this approach should be:
- Mutually exclusive (no class should overlap with another)
- Exhaustive (all securities should fit in the set of asset classes)
- Investable (it should be possible to replicate the return of each class at relatively low cost)
- Macro-consistent (the performance of the entire set should be replicable with some combination of asset classes).
The criterion of mutual exclusivity addresses the
multicollinearity problem. The other criteria provide solid
regressors for the style match. The only indexes that meet all of
these criteria are provided by Morningstar and by my company.
Morningstar's indexes cover U.S. stocks; my own Surz indexes cover
the U.S., international and global stock markets. Using
indexes that don't meet Sharpe's criteria is like using
low-octane fuel in your high-performance car.
HBSA -- holdings-based style analysis, remember -- provides an
alternative to RBSA. The major benefits of HBSA are that the
analyst can both observe the classification of every stock in the
portfolio as well as question these classifications. This results
in total transparency and easy understanding, but at a cost of
additional operational complexity.
HBSA requires more information than RBSA. It needs individual
security holdings at various points in time, rather than returns.
Since these holdings are generally not available on the Internet,
as returns are, the holdings must be fed into the analysis system
through some means other than point-and-click. Despite the
benefits, this additional work, sometimes called "throughput,"
may be too onerous for some. Like RBSA, HBSA also requires that
stocks be classified into style groups, or indexes.
Sharpe's criteria, already mentioned, work for HBSA as much as
for RBSA. Consistency calls for using the same "palette" for both
types of style analysis. Note though that the "mutually
exclusive" and "exhaustive" criteria are particularly important
to HBSA because it is desirable to have stocks in only one style
group and to classify all stocks.
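These two criteria are also straightforward to verify mechanically. A small sketch, with an invented universe and palette, checks that every stock lands in exactly one style group:

    # Check Sharpe's "mutually exclusive" and "exhaustive" criteria for a
    # hypothetical style palette over a hypothetical stock universe.
    universe = {"AAA", "BBB", "CCC", "DDD"}
    palette = {
        "large-value":  {"BBB"},
        "large-growth": {"AAA"},
        "small-value":  {"CCC", "DDD"},
    }

    members = [t for group in palette.values() for t in group]
    exclusive = len(members) == len(set(members))  # no stock in two groups
    exhaustive = set(members) == universe          # every stock classified
    print(f"mutually exclusive: {exclusive}, exhaustive: {exhaustive}")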
In certain circumstances, deciding between RBSA and HBSA is
really a Hobson's choice. When holdings data is difficult to
obtain, as with some mutual funds and unregistered investment
products, or when derivatives are used in the portfolio, RBSA is
the only choice. RBSA can also be used to calculate information
ratios, which are style-adjusted return-to-risk measures.
Some researchers are finding persistence in information ratios,
so they should be used as a first cut for identifying skill.
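The information ratio itself is simple to compute: the annualized mean of the style-adjusted excess return divided by its annualized standard deviation, the tracking error. A sketch with simulated monthly excess returns:

    # Information ratio sketch on simulated monthly style-adjusted excess
    # returns (manager minus custom style benchmark).
    import numpy as np

    rng = np.random.default_rng(2)
    excess = rng.normal(0.002, 0.01, 36)  # 36 months, hypothetical

    ir = (excess.mean() * 12) / (excess.std(ddof=1) * np.sqrt(12))
    print(f"information ratio: {ir:.2f}")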
Similarly, when it is necessary to detect style drift or to
understand fully the portfolio's actual holdings, HBSA is the
only choice. Holdings are also required for performance
attribution analysis that is focused on differentiating skill
from luck and style. This level of analysis must use holdings
because performance must be decomposed into stock selection and
sector allocation. Returns cannot make this distinction.
Custom benchmarks developed through either RBSA or HBSA solve the
GIGO problem. But statisticians estimate that it takes
decades to develop confidence in a manager's success at
beating the benchmark, even one that is customized. This is
because when custom benchmarks are used, the hypothesis test
"Performance is good" is conducted across time. An alternative is
to perform this test in the cross-section of other active
managers, which is the role of peer group comparisons.
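The "decades" claim follows from simple arithmetic: the t-statistic on alpha grows only with the square root of the observation period, roughly t = IR * sqrt(years), so the years required scale as (t / IR) squared. A quick sketch:

    # Back-of-the-envelope: years of data needed for a t-statistic of ~2
    # (rough 95% confidence) at various information ratios (IR).
    for ir in (0.25, 0.5, 1.0):
        years = (2.0 / ir) ** 2
        print(f"IR {ir:.2f}: about {years:.0f} years of data needed")
    # IR 0.25 -> 64 years; IR 0.50 -> 16 years; IR 1.00 -> 4 years

Even a very good manager with an information ratio of 0.5 needs roughly 16 years of history before the evidence of skill is statistically persuasive, which is what pushes evaluators toward the cross-sectional test.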
To continue reading this article, please go to Viewpoint:
Constructing accurate benchmarks (Part II). -FWR