Family Office
Analytics: Constructing accurate benchmarks (Part II)

Common practice is not always best practice; we can and must do
better. Ronald Surz is president of PPCA, a San Clemente,
Calif.-based software firm that provides advanced
performance-evaluation and attribution analytics, and a principal
of RCG Capital Partners, a Denver, Colo.-based
fund-of-hedge-funds manager. He is a prolific and widely
published author.
Peer groups
Peer groups place a portfolio's performance into perspective by
ranking it against the performance of similar portfolios.
Accordingly, performance for even a short period of time can be
judged significant if it ranks near the top of the distribution.
When traditional peer groups are used, the "Performance is good"
hypothesis is tested by comparing performance with that of a
group of portfolios presumed to be managed in a manner similar to
the portfolio being evaluated; in other words, the hypothesis is
tested relative to the stock picks of similar professionals.
This makes sense -- provided someone defines "similar" and then
collects data on the funds that fit this particular definition of
similar. Each peer group provider has its own definitions and its
own collection of funds, so each provider has a different sample
for the same investment mandate. "Large-cap growth" is one set of
funds in one provider's peer group, and another set of funds in
the next provider's peer group. These sampling idiosyncrasies are
the source of the following well-documented peer group
biases.
Classification bias results from the practice of forcing
every manager into a pre-specified pigeonhole, such as growth or
value. It is now commonly understood that most managers employ a
blend of styles, so that pigeonhole classifications misrepresent
the manager's actual style as well as those employed by peers.
Classification bias is the reason that a style index ranks well,
outperforming the majority of managers in an associated style
peer group, when that style is in favor. Conversely, the majority
of managers in an out-of-favor style tend to outperform an
associated index. Until recently it was believed that skillful
managers excelled when their style was out of favor. However,
research has shown that this phenomenon is a direct result of the
fact that many managers in a given style peer group are not
"style pure," and it is this impurity, or classification bias,
that leads to success or failure versus the index.
The illustration below shows the effect of classification bias.
The scatter chart uses returns-based style analysis (RBSA) to
locate members of the Morningstar peer group in style space. As
you can see, the funds tend to be somewhat similar, but
significant compromises have been made.
|image1|
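To make the scatter concrete, below is a minimal sketch of returns-based
style analysis: find the non-negative mix of style indexes, summing to
one, whose returns most closely track the fund's returns. The index
labels and sample data are hypothetical, and the sketch illustrates the
general technique rather than any provider's methodology.

    # Minimal returns-based style analysis (RBSA) sketch; data are hypothetical.
    import numpy as np
    from scipy.optimize import minimize

    def rbsa_weights(fund_returns, style_returns):
        """Style weights that minimize tracking variance, constrained to be
        non-negative and to sum to one (Sharpe-style RBSA)."""
        n_styles = style_returns.shape[1]

        def tracking_variance(w):
            return np.var(fund_returns - style_returns @ w)

        constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
        bounds = [(0.0, 1.0)] * n_styles
        x0 = np.full(n_styles, 1.0 / n_styles)
        return minimize(tracking_variance, x0, bounds=bounds,
                        constraints=constraints).x

    # 36 months of hypothetical returns for the fund and four style indexes
    # (large value, large growth, small value, small growth).
    rng = np.random.default_rng(0)
    styles = rng.normal(0.008, 0.04, size=(36, 4))
    fund = 0.6 * styles[:, 1] + 0.4 * styles[:, 2] + rng.normal(0, 0.01, 36)

    print(dict(zip(["LV", "LG", "SV", "SG"], rbsa_weights(fund, styles).round(2))))

A manager whose weights spread across several boxes is a blend; forcing
such a fund into a single pigeonhole is exactly the classification bias
described above.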
Classification bias is a boon to client-relations personnel
because there is always an easy target to beat. When your style
is out of favor, you beat the index; when it's in favor, you beat
the median.
Composition bias occurs because each peer group provider has its
own set of fund data. This bias is particularly pronounced when a
provider's database contains concentrations of certain fund
types, such as bank commingled funds, and when it contains an
insufficient number of funds. For example, international managers
and socially responsible managers cannot be properly evaluated
using peer groups because there are no databases of adequate
size. Composition bias is the reason that managers frequently
rank well in one peer group, but simultaneously rank poorly
against a similar group of another provider, as Randall Eley,
president and CIO of Springfield, Va.-based Edgar Lomax, shows in
a 2004 article in Pensions & Investments.
|image2|
Don't like your ranking? Pick another peer group provider. It is
frequently the case that a manager's performance result is judged
to be both a success and a failure because the performance ranks
differently in different peer groups for the same mandate, such
as large cap value.
Survivorship bias is the best understood and most documented
problem with peer groups. Survivor bias causes performance
results to be overstated because defunct accounts, some of which
may have underperformed, are no longer in the database. For
example, an unsuccessful management product that was terminated
in the past is excluded from current peer groups. This removal of
losers results in an overstatement of past performance. A related
bias is called "backfill bias," which results from managers
withholding their performance data for new funds from peer group
databases until an incubator period produces good performance.
Both survivor and backfill biases raise the bar. A simple
illustration of the way survivor bias skews results is provided
by the "marathon analogy." Suppose only 100 runners in a
1,000-contestant marathon actually finish. Is the 100th finisher
dead last, or in the top 10% of the field?
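A few lines of simulation make the marathon point concrete; all figures
below are invented for the illustration.

    # Toy survivorship-bias example: 1,000 funds, only the best 100 "survive."
    import numpy as np

    rng = np.random.default_rng(1)
    returns = rng.normal(0.06, 0.10, 1000)      # annual returns of the full field
    survivors = np.sort(returns)[-100:]         # only the top 100 remain in the database
    fund = survivors.min()                      # the weakest surviving fund

    rank_vs_survivors = (survivors < fund).mean()   # 0% -- dead last among survivors
    rank_vs_full_field = (returns < fund).mean()    # ~90% -- top decile of the field
    print(f"percentile vs survivors: {rank_vs_survivors:.0%}, "
          f"vs full field: {rank_vs_full_field:.0%}")

The same fund is dead last against the survivor-only database and in the
top decile against the field it actually competed in; that gap is the
raised bar.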
Peer group comparisons are more likely to mislead than to inform,
and so they should be avoided. Given the wide use of peer group
comparisons, we realize this position is an unpopular one. The
fact is that sometimes common practice defies common sense. Try
as we may, there is no way to make the biases described above go
away. The most that can be done is to try to minimize the effects
of these biases, which can best be accomplished with the approach
described in the next section.
Unification
Let's summarize what we've covered so far. Custom blended indexes
provide accurate benchmarks, but we have to wait decades to gain
confidence in a manager's success at beating the benchmark. Peer
groups don't have this "waiting problem," but are contaminated by
myriad biases that render them useless. A solution to these
problems is actually quite simple, at least in concept, but was
only recently made practical when the requisite computing power
became available. The solution uses custom benchmarks to create a
peer group backdrop that does not have a waiting problem, that
is, we know right away if a manager has significantly succeeded
or failed.
As noted above, performance evaluation can be viewed as a
hypothesis test that assesses the validity of the hypothesis
"Performance is good." To accept or reject this hypothesis, we
construct an approximation of all of the possible outcomes and
determine where the actual performance result falls. This
solution begins with identification of the best benchmark
possible, like a custom index blend, and then expands this
benchmark into a peer group by creating thousands of portfolios
that could have been formed from stocks in the benchmark,
following reasonable portfolio construction rules. This approach,
illustrated in Exhibit 3, combines the better characteristics of
both peer groups and indexes, while reducing the deficiencies of
each.
|image3|
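As a rough sketch of the mechanics, the simulation below expands a
hypothetical benchmark into a universe of portfolios that could have been
built from its constituents under one simple construction rule (equal
weighting of a fixed number of randomly chosen holdings). Both the rule
and the data are assumed stand-ins, not PPCA's actual specifications.

    # Monte Carlo expansion of a benchmark into a peer universe (hypothetical data).
    import numpy as np

    def simulate_portfolios(constituent_returns, n_portfolios=10_000,
                            n_holdings=40, rng=None):
        """Period returns of randomly constructed, equally weighted portfolios
        drawn from the benchmark's constituents."""
        rng = rng or np.random.default_rng(0)
        sims = np.empty(n_portfolios)
        for i in range(n_portfolios):
            picks = rng.choice(len(constituent_returns), n_holdings, replace=False)
            sims[i] = constituent_returns[picks].mean()   # equal-weight rule
        return sims

    # Hypothetical quarter: 500 benchmark constituents with dispersed returns.
    rng = np.random.default_rng(42)
    constituents = rng.normal(0.02, 0.12, 500)
    universe = simulate_portfolios(constituents, rng=rng)
    print(f"5th-95th percentile spread: "
          f"{np.percentile(universe, 95) - np.percentile(universe, 5):.1%}")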
Statistical significance is determined much more quickly with
this approach than with benchmarks because inferences are drawn
in the cross-section rather than across time. In other words, the
ranking of actual performance against all possible portfolios is
a measure of statistical confidence.
Let's say the manager has underperformed the benchmark by 3%.
Exhibit 4 shows that in a recent quarter this underperformance
would have been significant if the S&P 500 were the
benchmark, but not significant if the benchmark were the Russell
2000. We use 90% confidence as the breakpoint for declaring
significance. Because they provide indications of significance
very quickly, Monte Carlo simulations (MCSs) solve the waiting
problem of benchmarks.
|image4|
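Continuing the same sketch (and reusing the simulate_portfolios function
above), the manager's actual return is ranked inside the simulated
universe and the 90% breakpoint is applied. The 3% shortfall is the
article's example; the universe and its dispersion are hypothetical.

    # Rank the manager inside the simulated universe; 90% confidence breakpoint.
    benchmark_return = float(np.median(universe))   # the benchmark sits at the median
    manager_return = benchmark_return - 0.03        # underperformed by 3%

    percentile = (universe < manager_return).mean() # cross-sectional ranking
    significantly_bad = percentile <= 0.10          # bottom decile -> reject "good" at 90%
    print(f"manager percentile: {percentile:.0%}, "
          f"significant shortfall: {significantly_bad}")

Whether the same 3% is significant depends on the dispersion of the
universe: portfolios drawn from a narrow large-cap benchmark scatter
tightly, so a 3% gap lands deep in the tail, while portfolios drawn from
a wide small-cap benchmark scatter far more, so the same gap may rank
unremarkably.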
There are two central questions in the due diligence process.
"What does this manager do?" -- that is, how and what does he
manage -- and "Does the manager do this well?" The first question
addresses the form of the investment, and the second identifies
the substance, or skill. In this context, the benchmark provides
the answer to the first question. The ranking within the
manager's customized opportunity set answers the second question.
Note that in properly constructed MCSs, the benchmark always
ranks median. This provides for the interpretation of an MCS
ranking as the "statistical distance" of return away from the
benchmark.
The MCS approach has been used to evaluate traditional investing
for more than a decade. MCS has yet to be accepted as standard
practice, but this doesn't make it faulty. It took 30 years for
Modern Portfolio Theory (MPT) to gain wide acceptance. Further
improving its potential for acceptance, MCS technology has been
extended to hedge funds, where recognition of the fact that peer
groups don't work for performance evaluation has lowered inherent
barriers to adoption.
Hedge funds
The first question of due diligence -- "What does this manager
do?" -- can be hard to answer in the hedge fund world. Here
though, the old tenet about not investing in things you don't
understand comes in handy. The beta of a specific hedge fund can
be replicated with a long-short blend of passive portfolios such
as exchange-traded funds. We shouldn't pay for beta, but its
identification sets the stage for the second question regarding
substance.
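One common way to estimate that passive long-short blend is to regress
the fund's returns on the returns of a handful of ETFs and read the
coefficients as long and short exposures. The sketch below uses invented
data and generic ETF stand-ins; it illustrates the idea rather than
prescribing a procedure.

    # Replicate a hedge fund's beta with a long-short ETF blend (hypothetical data).
    import numpy as np

    rng = np.random.default_rng(7)
    months = 36
    etf_returns = rng.normal(0.007, 0.04, size=(months, 3))   # three passive ETF stand-ins
    # Hypothetical fund: long the first ETF, short the second, plus noise and small alpha.
    fund_returns = (0.9 * etf_returns[:, 0] - 0.5 * etf_returns[:, 1]
                    + 0.002 + rng.normal(0, 0.01, months))

    X = np.column_stack([np.ones(months), etf_returns])        # intercept + ETF returns
    coefs, *_ = np.linalg.lstsq(X, fund_returns, rcond=None)
    alpha, betas = coefs[0], coefs[1:]
    print("long/short replicating weights:", betas.round(2), "alpha:", round(alpha, 4))

Positive weights are long exposures and negative weights are short
exposures; the blend is the passive beta we shouldn't pay hedge fund fees
for, and what remains is the alpha to be tested.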
As with traditional long-only investing, MCSs provide the answer
to the question of manager skill. In constructing a specific
custom peer group, Monte Carlo simulations follow the same rules
that the individual hedge fund manager follows in constructing
portfolios: going both long and short, following custom benchmark
specifications on each side, and using leverage, subject to
controls such as those shown in this illustration.
|image5|
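A minimal sketch of that long-short extension appears below. Random
portfolios are drawn separately from hypothetical long-side and
short-side benchmark pools and combined with leverage; the 130/30-style
sizing and every other control here are assumptions standing in for
those shown in the illustration.

    # Monte Carlo universe for a long-short mandate (hypothetical pools and controls).
    import numpy as np

    def simulate_long_short(long_pool, short_pool, n_portfolios=10_000,
                            n_long=30, n_short=20, gross_long=1.3,
                            gross_short=0.3, rng=None):
        """Returns of randomly constructed, equally weighted long-short portfolios."""
        rng = rng or np.random.default_rng(0)
        sims = np.empty(n_portfolios)
        for i in range(n_portfolios):
            longs = rng.choice(len(long_pool), n_long, replace=False)
            shorts = rng.choice(len(short_pool), n_short, replace=False)
            sims[i] = (gross_long * long_pool[longs].mean()
                       - gross_short * short_pool[shorts].mean())
        return sims

    rng = np.random.default_rng(3)
    long_pool = rng.normal(0.02, 0.12, 500)     # hypothetical long-side benchmark stocks
    short_pool = rng.normal(0.02, 0.12, 500)    # hypothetical short-side benchmark stocks
    universe = simulate_long_short(long_pool, short_pool, rng=rng)

Ranking the hedge fund's actual return within this universe yields both
the alpha (its distance from the median) and an indication of
significance, just as in the long-only case.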
An MCS approach addresses the unique challenge of evaluating
hedge fund performance by randomly creating a broad
representation of all of the possible portfolios that a manager
could have conceivably held following his unique investment
process, and so applies the scientific principles of modern
statistics to the problem of performance evaluation. This solves
the problem arising from the fact that members of hedge fund peer
groups are uncorrelated with one another, which violates the
central homogeneity principle of peer groups.
Some observers say it's good that the members of hedge fund peer
groups are unlike one another, because this produces
diversification benefits. While it may be good for portfolio
construction, it's bad for performance evaluation. Comparing
funds in hedge fund peer groups is like comparing apples and
oranges. Hedge funds really do require not only custom MCS peer
groups for accurate evaluation, but also custom benchmarks that
show both the longs and shorts, thereby estimating the hedge
fund's beta. A ranking in a hedge fund MCS universe renders both
the alpha and its significance.
Attribution
Up to this point we have been discussing performance evaluation,
which determines whether performance is good or bad. The next,
and more crucial, question is "Why?" -- the role of performance
attribution. Attribution is important because it is
forward-looking, providing the investor with information for
deciding if good performance is repeatable in the future. We want
to know which sectors had good stock selection or favorable
allocations and if the associated analysts are likely to continue
providing these good results.
We also want to know what mistakes have been made and what is
being done to avoid these mistakes in the future. These are
important considerations that fortunately can be addressed with
the same accurate, customized benchmark that we've described for
use in performance evaluation.
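As one illustration of what sector-level attribution against a custom
benchmark looks like, the sketch below uses a textbook Brinson-style
decomposition into allocation, selection and interaction effects. The
sector weights and returns are hypothetical, and the decomposition is
the generic version rather than any particular vendor's.

    # Brinson-style attribution against a custom benchmark (hypothetical inputs).
    import numpy as np

    sectors = ["Tech", "Financials", "Health", "Energy"]
    wp = np.array([0.40, 0.20, 0.25, 0.15])    # portfolio sector weights
    wb = np.array([0.30, 0.25, 0.30, 0.15])    # custom benchmark sector weights
    rp = np.array([0.06, 0.01, 0.03, -0.02])   # portfolio sector returns
    rb = np.array([0.05, 0.02, 0.03, -0.01])   # benchmark sector returns

    benchmark_total = wb @ rb
    allocation = (wp - wb) * (rb - benchmark_total)   # reward for over/underweighting sectors
    selection = wb * (rp - rb)                        # reward for stock picking within sectors
    interaction = (wp - wb) * (rp - rb)

    for s, a, sel, x in zip(sectors, allocation, selection, interaction):
        print(f"{s:<11} allocation {a:+.4f}  selection {sel:+.4f}  interaction {x:+.4f}")
    print(f"active return {wp @ rp - benchmark_total:+.4f} = "
          f"{allocation.sum() + selection.sum() + interaction.sum():+.4f}")

The sector-by-sector selection effects point to the analysts whose stock
picks added or subtracted value, which is the forward-looking information
attribution is meant to provide.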
Using the same customized benchmark for attribution enables us to
steer clear of the problem associated with more common attribution
systems, i.e., the frequent
disconnect between the benchmark used for evaluation and the one
used for attribution. This disconnect is due to the fact that
most performance attribution systems are currently limited to
popular indexes and cannot accommodate custom benchmarks. This
unfortunate limitation creates the
very "garbage-in, garbage-out" problem we set out
to avoid. We should not throw away all of our hard work in
constructing an accurate benchmark when it comes to the important
step of attribution.
Put another way, we shouldn't bother with attribution analyses if
we can't customize the benchmark. We'll just spend a lot of time
and money to be misled and misinformed.
Getting back to basics is more than just a good thing to do.
Getting the benchmark right is a fiduciary imperative, an
obligation. Even if you don't agree with this article's
recommended best practices, you can't deny the failure of common
practices. Something has to change. Current common practices are
not best practices; we can and must do better.
The components of investment return as we understand them today
are summarized in the accompanying graphic entitled "The Complete
Performance Picture." The new element in this picture, beyond
Modern Portfolio Theory, is indicated by the box labeled "Style
Effects." MPT, which relies exclusively on market-related
effects, has not worked as predicted because of the powerful
influences of investment style. It's easy to confuse style with
skill, but difficult to make good decisions once this mistake has
been made.
|image6|
Accurate benchmarks are customized to each individual manager's
style and should be used for both performance evaluation and
performance attribution. Monte Carlo simulations expand these
custom benchmarks into accurate and fair universes, similar to
peer groups but without the biases, and provide indications of
significance very quickly. Both traditional and hedge fund
managers are best reviewed with these techniques. -FWR