Analytics: Constructing accurate benchmarks (Part I)

Ronald Surz October 3, 2006

It's crucial that we get back to basics and start getting benchmarks right. Ronald Surz is president of PPCA, a San Clemente, Calif.-based software firm that provides advanced performance-evaluation and attribution analytics, and a principal of RCG Capital Partners, a Denver, Colo.-based fund-of-hedge-funds manager. He is a prolific and widely published author.


In this two-part series we discuss the common benchmarks of indexes and peer groups, and offer improvements. In Part I, we outline the shortcomings of common practice and describe indexing. In Part II, we discuss peer groups and show how indexes and peer groups can be unified to provide more accurate benchmarks.



Synopsis

Investment-performance evaluators have lost touch with a basic and self-evident truth: when the benchmark is wrong, all of the analytics are wrong.
The cost of this mistake is high. Investment managers get hired and fired for the wrong reasons. So it's imperative that we get back to basics, that we get the benchmark right.
Fiduciary prudence dictates best practice over common practice, as does the "do no harm" rule.
Indexes and peer groups are the common forms of benchmarks. These are not best practices.
The article describes how accurate benchmarks can be constructed from indexes and how peer group biases can be overcome.

Worth doing well

Safeguarding against "garbage in, garbage out" scenarios was once a primary concern of investment-manager researchers. But in recent years this self-evident "GIGO" admonition has been forgotten by many. As the performance-evaluation industry directed its focus to improving performance measurements, it lost sight of the basic and critical principle of accurate benchmarking.

The result: benchmarks are now routinely mis-specified, and performance measurements use "garbage" benchmarks to end up with "garbage" manager evaluations.

The cost of this fundamental error is high. Investment managers are hired and fired for the wrong reasons, with a consequent loss to the investor in ill-spent fees and lost performance. It's imperative that we get back to basics and get the benchmark right. After all, fiduciary prudence dictates best practice over common practice.

The need for accurate benchmarking has come into focus because of increasing interest in portable alpha. Alpha is transported by shorting the benchmark, thereby removing beta effects. But this can be done properly only if the benchmark is known; after all, you can't short what you don't know. And, of course, the benchmark has to be specified properly. This is an insidious problem for hedge funds, where value added, or alpha, can easily be confused with factor exposures such as low market participation in a falling market.
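
To make the hedge-fund point concrete, here is a minimal sketch with hypothetical numbers. The function names and the single-factor beta model are assumptions made for illustration and do not come from any particular analytics package. It compares a naive excess return against the market with a beta-adjusted alpha for a fund that has low market participation in a falling market.

```python
def naive_excess(fund_return, market_return):
    """Excess return when the full market index is used as the benchmark."""
    return fund_return - market_return


def beta_adjusted_alpha(fund_return, market_return, beta, risk_free=0.0):
    """Return left over after shorting `beta` units of the market.

    Assumes a single market factor, which is a simplification.
    """
    return (fund_return - risk_free) - beta * (market_return - risk_free)


# Hypothetical fund: 30% market participation plus 1% of genuine value added.
market = -0.10
beta = 0.3
true_alpha = 0.01
fund = true_alpha + beta * market  # roughly -2% while the market fell 10%

print(f"naive excess vs. market: {naive_excess(fund, market):+.2%}")
print(f"beta-adjusted alpha:     {beta_adjusted_alpha(fund, market, beta):+.2%}")
```

Measured against the market, the fund appears to add about eight percentage points. Once the market exposure is removed, only about one point remains. The rest is factor exposure, not skill.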

Others have noticed these deficiencies. A study titled "Assessing the Costs and Benefits of Brokers in the Mutual Fund Industry" suggests that investment consultants are actually worse at picking managers than do-it-yourself investors. Journalists have observed that the average fund-of-hedge-funds consistently underperforms the average hedge fund, and that this underperformance is not due solely to fees.

Simply stated, outside observers find that consultants have not delivered on their promise of finding skillful managers. The profession should heed this failure and take steps to change what has clearly been a losing game.

We have an extensive menu of performance measurements such as the Sharpe ratio, the Sortino ratio, the Treynor ratio, the information ratio and alpha. New and improved measurements such as risk-and-style adjusted Omega Excess and the time-and-style adjusted Sturiale Consistency ratio continue to be introduced. Performance reports often include an array of such measures for the edification of sophisticated clients. But truly sophisticated investors should substantiate the accuracy of the benchmark before trusting any measure of performance. If the benchmark is wrong, all of the analytics are wrong.
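
A small sketch shows how strongly a benchmark-relative measure depends on the benchmark. All return series below are hypothetical, and the ratios are computed per period without annualization. The same manager is scored against a single growth index and against a 50/50 style blend.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return over a constant per-period risk-free rate, per unit of volatility."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def information_ratio(returns, benchmark):
    """Mean active return divided by the volatility of active return."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active)


# Hypothetical quarterly returns.
manager = [0.033, -0.011, 0.044, 0.010, -0.020, 0.035]
growth_index = [0.050, -0.030, 0.070, 0.000, -0.045, 0.060]
value_index = [0.010, 0.004, 0.020, 0.014, 0.001, 0.012]
style_blend = [0.5 * g + 0.5 * v for g, v in zip(growth_index, value_index)]

ir_vs_growth = information_ratio(manager, growth_index)
ir_vs_blend = information_ratio(manager, style_blend)
print(f"IR vs. growth index: {ir_vs_growth:+.2f}")
print(f"IR vs. style blend:  {ir_vs_blend:+.2f}")
```

With these made-up numbers the manager looks unskilled against the growth index and skilled against the blend. Nothing about the manager changed; only the benchmark did.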

New measures do nothing to get us back to the basics of accurate benchmarking, though the emphasis we put on them is understandable. Getting the benchmark right is more difficult than concocting new measurements or improving old ones. Tinkering with mathematical formulas is more fun than agonizing over the minutiae of benchmark construction. Unfortunately, no amount of arithmetic can bail us out if the benchmark is wrong.

Accurate benchmarking entails a lot of work, but it's worth the effort. Without it we can forget about accuracy in performance evaluation and attribution.

Indexes

A benchmark establishes a goal for the investment manager. A reasonable goal is to earn a return that exceeds a low-cost, passive implementation of the manager's investment approach, because the investor always has the choice of active or passive management. The relatively recent introduction of style indexes helps, but these need to be employed wisely.

Before style indexes were developed, there was wide acceptance and support for the concept of a "normal portfolio," which is a customized list of stocks with their neutral weights. "Normals" were intended to capture the essence of the people, process, and philosophy behind an investment product. But only a couple of consulting firms were any good at constructing these custom benchmarks.

Today we can approximate these "designer benchmarks" with style analysis, sometimes called "the poor man's normals." Although style analysis may not be as comprehensive as the original idea of normal portfolios, it makes it possible for many firms to partake in this custom blending of style indexes. Style analysis can be conducted with returns or holdings. Both approaches are designed to identify a style blend that, like normals, captures the people, process, and philosophy of the investment product.

Whether a "returns" or "holdings" approach to style analysis is used, the starting point is defining investment styles. The classification of stocks into styles leads to style indexes, which are akin to sector indexes such as technology or energy.

It's important to recognize the distinction between indexes and benchmarks. Indexes are barometers of price changes in segments of the market. Benchmarks are passive alternatives to active management. Historically, common practice has been to use indexes as benchmarks, but style analyses have shown that most managers are best benchmarked as blends of styles.

As a practical matter, we are no worse off with style blends. The old practice of using a single index is among the candidate solutions, so the best "blend" can always turn out to be a single index.

One form of style analysis is returns-based style analysis (RBSA). This regresses a manager's returns against a family of style indexes to determine the combination of indexes that best tracks the manager's performance. The interpretation of the "fit" is that the manager is employing this "effective" style mix because performance could be approximately replicated with this passive blend.
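
The idea behind RBSA can be sketched in a few lines. This is an illustration under stated assumptions, not a production method. It assumes the style weights are non-negative and sum to one, it scores each candidate blend by the variance of the difference between manager and blend returns, and it substitutes a brute-force grid search for the quadratic optimizer a real system would use. The index names and return series are hypothetical.

```python
from itertools import product


def simplex_grid(n, steps):
    """Yield weight vectors of length n, in multiples of 1/steps, that sum to 1."""
    for combo in product(range(steps + 1), repeat=n - 1):
        if sum(combo) <= steps:
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]


def tracking_variance(manager, indexes, weights):
    """Variance of the manager's return minus the weighted blend of index returns."""
    diffs = [
        m - sum(w * idx[t] for w, idx in zip(weights, indexes))
        for t, m in enumerate(manager)
    ]
    avg = sum(diffs) / len(diffs)
    return sum((d - avg) ** 2 for d in diffs) / len(diffs)


def rbsa(manager, indexes, steps=20):
    """Return the grid point whose blend tracks the manager most closely."""
    return min(
        simplex_grid(len(indexes), steps),
        key=lambda w: tracking_variance(manager, indexes, w),
    )


# Hypothetical period returns for three style indexes.
large_value = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
large_growth = [0.01, 0.02, -0.01, 0.03, -0.02, 0.00]
small_cap = [0.00, 0.01, 0.01, -0.01, 0.02, 0.01]

# A manager who runs a 60/40 value/growth mix and adds 0.1% per period.
manager = [0.6 * v + 0.4 * g + 0.001 for v, g in zip(large_value, large_growth)]

weights = rbsa(manager, [large_value, large_growth, small_cap])
print(weights)
```

The recovered weights are the "effective" style mix. The constant 0.1% per period does not affect the fit because the search minimizes the variance of the difference, not its level.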

Another approach, called holdings-based style analysis (HBSA), examines the stocks actually held in the investment portfolio and maps these into styles at points in time. Once a sufficient history of these holdings-based snapshots is developed, an estimate of the manager's average style profile can be developed and used as the custom benchmark.
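
The mechanics of HBSA can be sketched as follows. The tickers, style labels and weights are hypothetical, and a simple average across snapshots is assumed as the estimate of the manager's style profile.

```python
from collections import defaultdict


def style_profile(holdings, style_of):
    """Map one snapshot of holdings (ticker -> portfolio weight) into style weights."""
    profile = defaultdict(float)
    for ticker, weight in holdings.items():
        profile[style_of[ticker]] += weight
    return dict(profile)


def average_profile(snapshots, style_of):
    """Average the point-in-time style profiles into a single custom benchmark blend."""
    totals = defaultdict(float)
    for holdings in snapshots:
        for style, weight in style_profile(holdings, style_of).items():
            totals[style] += weight
    return {style: total / len(snapshots) for style, total in totals.items()}


# Hypothetical classification of stocks into styles.
style_of = {"AAA": "large value", "BBB": "large growth", "CCC": "small value"}

# Two hypothetical holdings snapshots taken at different dates.
snapshots = [
    {"AAA": 0.50, "BBB": 0.50},
    {"AAA": 0.25, "BBB": 0.25, "CCC": 0.50},
]

benchmark_blend = average_profile(snapshots, style_of)
print(benchmark_blend)
```

Because every stock's classification is visible in `style_of`, the analyst can inspect and challenge each mapping, which is not possible with a regression.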

Note that HBSA, like normal portfolios, starts at the individual security level and that both normal portfolios and holdings-based style analysis examine the history of holdings. The departure occurs at the blending. Normal portfolios blend stocks to create a portfolio profile that is consistent with investment philosophy. HBSA makes an inference from the pattern of point-in-time style profiles and translates the investment philosophy into style.

Choosing between RBSA and HBSA is a complicated business. The major trade-off between the two approaches is ease of use, where RBSA has an edge, versus accuracy and ease of understanding, where HBSA comes out on top.

RBSA has become a commodity that is quickly available and operated with a few clicks. Some websites offer free RBSA for a wide range of investment firms and products. Find the product, click on it, and out comes a style profile. Offsetting this ease of use is the potential for error. RBSA uses sophisticated regression analysis to do its job. As in any statistical process, data problems can go undetected and unrecognized, leading to faulty inferences. One such problem is multicollinearity, which exists when the style indexes used in the regression overlap in membership. Multicollinearity makes the estimated style weights unstable and usually produces spurious results. The user of RBSA must trust the "black box," because the regression can't explain why that particular blend is the best solution.
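
One general diagnostic for this problem is to look at pairwise correlations among the index return series before running the regression. The sketch below is an illustration only. The index names and returns are hypothetical, and the 0.95 threshold is an arbitrary assumption. High correlation can also arise without shared membership, so this check complements an inspection of index constituents and does not replace it.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(index_returns, threshold=0.95):
    """Return pairs of index names whose absolute correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(index_returns, 2)
        if abs(pearson(index_returns[a], index_returns[b])) > threshold
    ]


# Hypothetical palette in which two indexes largely hold the same stocks.
index_returns = {
    "broad": [0.020, -0.010, 0.030, 0.000],
    "broad_value": [0.021, -0.011, 0.029, 0.001],
    "small_growth": [-0.010, 0.020, 0.000, 0.030],
}

flagged = collinear_pairs(index_returns)
print(flagged)
```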

In the 1988 article that introduced RBSA, Nobel laureate William Sharpe says the "style palette" indexes used in this approach should be:

Mutually exclusive (no class should overlap with another)
Exhaustive (all securities should fit in the set of asset classes)
Investable (it should be possible to replicate the return of each class at relatively low cost)
Macro-consistent (the performance of the entire set should be replicable with some combination of asset classes)

The criterion of mutual exclusivity addresses the multicollinearity problem. The other criteria provide solid regressors for the style match. The only indexes that meet all of these criteria are provided by Morningstar and by my company. The Morningstar indexes cover U.S. stocks; my own Surz indexes cover U.S., international and global stock markets. Using indexes that don't meet Sharpe's criteria is like using low-octane fuel in your high-performance car.
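
The first two of Sharpe's criteria can be checked mechanically when index constituents are available. The following sketch uses hypothetical tickers and index names. Investability and macro-consistency depend on trading costs and weighting schemes, and are not covered here.

```python
def overlapping_members(palette):
    """Securities that appear in more than one style index (violates mutual exclusivity)."""
    seen, duplicates = set(), set()
    for members in palette.values():
        duplicates |= seen & members
        seen |= members
    return duplicates


def unclassified(palette, universe):
    """Securities in the universe that no style index contains (violates exhaustiveness)."""
    covered = set().union(*palette.values())
    return universe - covered


# Hypothetical palette: "BBB" sits in two indexes and "EEE" sits in none.
palette = {
    "large value": {"AAA", "BBB"},
    "large growth": {"BBB", "CCC"},
    "small cap": {"DDD"},
}
universe = {"AAA", "BBB", "CCC", "DDD", "EEE"}

print("overlap:", overlapping_members(palette))
print("unclassified:", unclassified(palette, universe))
```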

HBSA -- holdings-based style analysis, remember -- provides an alternative to RBSA. The major benefit of HBSA is that the analyst can both observe the classification of every stock in the portfolio and question these classifications. This results in total transparency and easy understanding, but at a cost of additional operational complexity.

HBSA requires more information than RBSA. It needs individual security holdings at various points in time, rather than returns. Since these holdings are generally not available on the Internet, as returns are, the holdings must be fed into the analysis system through some means other than point-and-click. Despite the benefits, this additional work, sometimes called "throughput," may be too onerous for some. Like RBSA, HBSA also requires that stocks be classified into style groups, or indexes.

Sharpe's criteria, already mentioned, work for HBSA as much as for RBSA. Consistency calls for using the same "palette" for both types of style analysis. Note though that the "mutually exclusive" and "exhaustive" criteria are particularly important to HBSA because it is desirable to have stocks in only one style group and to classify all stocks.

In certain circumstances, deciding between RBSA and HBSA is really a Hobson's choice. When holdings data is difficult to obtain, as with some mutual funds and unregistered investment products, or when derivatives are used in the portfolio, RBSA is the only choice. RBSA can also be used to calculate information ratios, which are style-adjusted return-to-risk measures. Some researchers are finding persistence in information ratios, so these ratios should be used as a first cut for identifying skill.

Similarly, when it is necessary to detect style drift or to understand fully the portfolio's actual holdings, HBSA is the only choice. Holdings are also required for performance attribution analysis that is focused on differentiating skill from luck and style. This level of analysis must use holdings because performance must be decomposed into stock selection and sector allocation. Returns cannot make this distinction.
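
The decomposition into allocation and selection can be sketched with a Brinson-style calculation. This is one common variant, chosen here as an assumption since the article does not specify a formula, and in it the interaction term is folded into selection. The group names, weights and returns are hypothetical.

```python
def attribute(port_w, port_r, bench_w, bench_r):
    """Split excess return into allocation and selection effects per group.

    Allocation rewards overweighting groups that beat the total benchmark.
    Selection rewards beating the benchmark within a group, at portfolio weight.
    All four dicts are assumed to share the same keys, with weights summing to 1.
    """
    bench_total = sum(bench_w[g] * bench_r[g] for g in bench_w)
    allocation = {
        g: (port_w[g] - bench_w[g]) * (bench_r[g] - bench_total) for g in bench_w
    }
    selection = {g: port_w[g] * (port_r[g] - bench_r[g]) for g in bench_w}
    return allocation, selection


# Hypothetical single-period data for two groups.
port_w = {"value": 0.7, "growth": 0.3}
port_r = {"value": 0.04, "growth": 0.01}
bench_w = {"value": 0.5, "growth": 0.5}
bench_r = {"value": 0.03, "growth": 0.02}

allocation, selection = attribute(port_w, port_r, bench_w, bench_r)
total_allocation = sum(allocation.values())
total_selection = sum(selection.values())
print(f"allocation: {total_allocation:+.4f}")
print(f"selection:  {total_selection:+.4f}")
```

The two effects add up to the portfolio's excess return over the benchmark. Both calculations need the portfolio's weights and returns by group, which only holdings can supply.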

Custom benchmarks developed through either RBSA or HBSA solve the GIGO problem. But statisticians estimate that it takes decades to develop confidence in a manager's success at beating the benchmark, even one that is customized. This is because when custom benchmarks are used, the hypothesis test "Performance is good" is conducted across time. An alternative is to perform this test in the cross-section of other active managers, which is the role of peer group comparisons.
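
A back-of-the-envelope calculation shows where "decades" comes from. Under the simplifying assumptions that annual excess returns are independent and roughly normal, the t-statistic for the test is approximately the annualized information ratio times the square root of the number of years. Setting the t-statistic to about 2, a rough 95% threshold, and solving for years gives the sketch below. The information ratios shown are hypothetical.

```python
def years_to_significance(information_ratio, t_critical=2.0):
    """Years of history needed for t = IR * sqrt(years) to reach t_critical."""
    if information_ratio <= 0:
        raise ValueError("information ratio must be positive")
    return (t_critical / information_ratio) ** 2


for ir in (1.0, 0.5, 0.25):
    print(f"IR {ir:.2f}: about {years_to_significance(ir):.0f} years")
```

Even a manager with an information ratio of 0.5 needs roughly 16 years of history under these assumptions, and halving the ratio quadruples the wait.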

To continue reading this article, please go to Viewpoint: Constructing accurate benchmarks (Part II). -FWR
