Factor Investing Clearing the Air – Datamining and the Antidotes
Tommi Johnsen, PhD | Advisory Board
Trent Ambler | Portfolio Manager
Factor Investing Origins and Implications
The roots of factor investing can be traced to work published in the early 1990s by two academics: Ken French and Eugene Fama. In two of their publications[1], they identified a set of risk factors that are priced to provide a consistent and robust return premium. Three of these, Beta (market), Size (market capitalization), and Value (book-to-market), survived the rigorous analysis required by academia and became the Fama-French three-factor risk model.
Thirty years later, where are we? Did the research pan out? Is factor investing real? The weight of the evidence is strongly in the affirmative, and the result has been a factor revolution. A quant strategy once available only to quantitative boutique funds and premier investment firms has been democratized by “smart beta” products, which have garnered AUM over $1 trillion in the US, with more trillions to come from Asia, Europe, and the Pacific countries.
On the other hand, much rhetoric, debate, and misinformation have arisen, reflecting a fundamental misunderstanding of why factors offer reliable risk premiums and of how many factors there should be. Practitioners and academics alike add new ‘factors’ into the mix with regularity. Fama and French themselves recently expanded their original three-factor model to include two new factors: profitability and investment.[2]
For confidence in the reliability of any factor premium, the mechanism for its existence and persistence should be well articulated. The minimum threshold for including a factor in an investment process is the ability to explain why taking on exposure to that factor offers a payoff.
Offering a defensible economic rationale for the existence of a factor premium is a critical consideration when evaluating factors for a live investment process.
The justifications for the existence of factor premiums often come from either a risk-centric view of the world or a more behaviorally driven view. The factor premiums that can best be understood under both paradigms are those that have an abundance of academic research, a long history of practical use behind them, and substantial out-of-sample performance.
The number of factors, or, better stated, factor styles, that survive shrinks dramatically when reliability and magnitude of returns are a priority. There are perhaps fewer than a dozen of these true style factors available to investors.
One of the harshest criticisms leveled against factor investing maintains that factors are “datamined”: researchers “snoop around in the data” and simply capitalize on random events. If this were true, and the expected payoff to a factor were merely a result of data mining, then we shouldn’t expect that payoff to be reliable, let alone to persist. By definition, random events don’t predictably repeat themselves, so a factor born of data mining cannot be counted on to provide a stable and persistent source of returns and should not be included in any investment process.
The modern era has gifted investors with an abundance of data; hundreds of millions of potential data points could be used to inform an investment process. Add recent advances in processing power and technologies such as AI, and it is easy to see how the data could be sufficiently ‘tortured’ to generate a false ‘signal.’ And so the savvy investor looking to make sense of all of this data, who understands the incredible potential made available by such a wealth of information, must also be careful to avoid relying on false signals born of data mining.
For decades academics and practitioners alike have continuously broadened their understanding of the reasons for the existence and persistence of investment returns. As access to data and computing power has expanded, so too has the capacity to turn up false signals and spurious return drivers, which cannot be relied upon to explain factor premiums. Both academics and practitioners will find compelling reasons to guard against data mining in their respective research processes. Whether it be the academic concerned with publishing rigorous and defensible results or the practitioner aiming to build a real-world investment process around reliable and meaningful return expectations, the antidotes to data mining have perhaps never been more relevant.
The Antidote for Data Mining
Data mining occurs when the researcher tests many variables in the absence of a good theory or proper methodology until a significant result is found. There are likely numerous studies and backtests with great results that you cannot really trust or believe in. You cannot have confidence in an investment strategy that makes no fundamental sense, and without a proper methodology it is difficult to ensure that the results are not “one-time wonders.”
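To make the mechanism concrete, the following is a minimal simulation sketch rather than a description of any particular study. It assumes 1,000 hypothetical candidate "factors" whose monthly returns are pure noise with a true mean of zero, and counts how many nevertheless clear a t-statistic hurdle. The function names, the 240-month sample length, and the 5% monthly volatility are illustrative assumptions.

```python
import random
import statistics


def t_statistic(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    std_err = statistics.stdev(returns) / n ** 0.5
    return statistics.fmean(returns) / std_err


def count_false_signals(n_candidates, n_months, threshold, seed=0):
    """Count pure-noise candidates whose |t| exceeds the threshold.

    Every candidate has a true mean return of zero, so any candidate
    that clears the hurdle is a false signal by construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_candidates):
        # Assumed: 5% monthly volatility, zero expected return.
        returns = [rng.gauss(0.0, 0.05) for _ in range(n_months)]
        if abs(t_statistic(returns)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for threshold in (2.0, 3.0):
        hits = count_false_signals(1000, 240, threshold)
        print(f"|t| > {threshold}: {hits} of 1000 noise-only candidates pass")
```

Under these assumptions, roughly one noise-only candidate in twenty should be expected to clear a hurdle of 2, and far fewer a hurdle of 3. A researcher who tests enough variables without a theory will therefore always find something that looks significant.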
For the researcher and investor, several activities should precede any analysis of factors:
1. Develop and present a theory regarding the underlying mechanism of interest and what hypotheses can be derived from such a theory. Do this before conducting any data analysis.
2. Define the methodology, including the period of analysis, how the data will be handled or transformed, and what statistical approach will be used.
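3. Use a t-statistic criterion that is greater than 3, rather than the conventional threshold of about 2, so that significance is not diluted by the large number of variables that have been tested.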
4. Be sure to include out-of-sample testing of some sort. For example, out-of-sample testing conditions can include periods surrounding the actual time frame, different asset classes, non-US markets, sectors, countries with varying governance norms, and varying tax rates and trading costs.
The number of investment factors that survive such stringent criteria is not large. However, a research process that adheres to these guidelines while attempting to find a signal in the vast quantities of data available to modern investors does offer some confidence in the persistence and magnitude of the returns that are discovered.
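The t-statistic criterion and the out-of-sample check from the list above can be sketched as a simple screen. This is an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, the out-of-sample test used here (the held-out mean return must have the same sign as the in-sample result) is only one of many possible choices, and the return series are made-up numbers.

```python
import statistics

T_THRESHOLD = 3.0  # the "greater than 3" criterion from the list above


def t_statistic(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    std_err = statistics.stdev(returns) / n ** 0.5
    return statistics.fmean(returns) / std_err


def passes_screen(returns, split, threshold=T_THRESHOLD):
    """Return True if the factor survives both checks.

    1. In-sample: |t| on returns[:split] must exceed the threshold.
    2. Out-of-sample (assumed rule): the mean of returns[split:] must
       have the same sign as the in-sample t-statistic.
    """
    in_sample, out_sample = returns[:split], returns[split:]
    t_in = t_statistic(in_sample)
    mean_out = statistics.fmean(out_sample)
    same_sign = (t_in > 0) == (mean_out > 0)
    return abs(t_in) > threshold and same_sign


if __name__ == "__main__":
    # Made-up monthly returns: 120 months in-sample, 60 held out.
    strong = [0.02, 0.0] * 60
    persists = strong + [0.01, -0.005] * 30   # premium continues
    reverses = strong + [-0.01, 0.005] * 30   # premium flips sign
    print(passes_screen(persists, split=120))
    print(passes_screen(reverses, split=120))
```

In practice the held-out sample could equally be a different asset class, a non-US market, or a different period, as the list suggests. The point of the sketch is only that the hurdle is fixed before the held-out data is examined.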
Searching For an Edge in Factor Strategies
An investment professional seeking an edge will expend effort creating and discovering better ways to gain exposure to known factors. Of course, researching new and novel exposures has excellent benefits. Still, a great deal of alpha exists in better understanding the payoff mechanisms and methods for gaining exposure amongst known factor styles.
For example, signals that pull from disparate data sources to establish factor exposures might be given more weight in a more extensive process. Suppose signals generated with data pulled from the Income Statement, the Balance Sheet, the Statement of Cash Flows, and even potentially some non-traditional accounting data all agree that a given company is a ‘value’ play. In that case, the combined signal should perhaps be given more credence than one that pulls only from a single accounting statement.
A singular signal might suggest exposure to a given factor style for anomalous reasons. Including a position in an investment portfolio based on a singular signal may not actually result in exposure to the desired factor style. When the signal is confirmed from sources that pull from several different sides of a company’s operations or other datasets, there can be more confidence that the position will deliver the desired exposure.
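One way to express this idea of confirmation is to count how many distinct data sources agree. The sketch below is a minimal illustration only: the source labels, the `Signal` type, and the equal weighting of sources are all assumptions made for the example, not a description of any particular investment process.

```python
from dataclasses import dataclass

# Hypothetical source labels, taken from the statements named in the text.
ALL_SOURCES = (
    "income_statement",
    "balance_sheet",
    "cash_flow_statement",
    "alternative_data",
)


@dataclass(frozen=True)
class Signal:
    source: str     # which data source produced the signal
    is_value: bool  # True if the signal flags the company as a value play


def confirmation_score(signals):
    """Fraction of the known sources with at least one confirming signal.

    Several signals from the same source count once, so a company
    flagged repeatedly by a single statement scores lower than one
    flagged by every source. Unknown source labels are ignored.
    """
    confirming = {
        s.source for s in signals if s.is_value and s.source in ALL_SOURCES
    }
    return len(confirming) / len(ALL_SOURCES)


if __name__ == "__main__":
    broad = [Signal(src, True) for src in ALL_SOURCES]
    narrow = [Signal("income_statement", True)] * 2
    print(confirmation_score(broad))   # confirmed by all four sources
    print(confirmation_score(narrow))  # confirmed by one source only
```

Equal weighting is the simplest choice. A fuller process might weight sources by how independent they are of one another.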
Consider an investor looking to build a value exposure into a portfolio. A viable economic rationale for the existence of the value premium is compensation for distress risk; investors who choose to hold exposure to value are getting paid for the risk that these companies may ultimately fail. Under this paradigm, the relevant signal is ‘distress,’ and the more sources confirming said distress, the more likely the position will establish exposure to the desired payoff mechanism.
Behavioralists might argue that value pays because companies whose recent performance has suffered are penalized unfairly: their current poor performance is over-extrapolated into the future, and the potential for a recovery is not adequately reflected in the firm’s market price. An investor looking to gain exposure to value under this paradigm will want to find signals that suggest over-extrapolation, again prioritizing positions where multiple signals confirm the exposure.
Value as an investment style has been around for a very long time, and yet there continues to be alpha in better understanding why value exposure pays and thereby discovering novel and better ways to establish exposure to that style. The good news is that style exposures like momentum, quality, and others offer similar rewards.
[1] Fama, Eugene F., and French, Kenneth R., “The Cross-Section of Expected Stock Returns” (1992) and “Common Risk Factors in the Returns on Stocks and Bonds” (1993)
[2] Fama, Eugene F., and French, Kenneth R., “A Five-Factor Asset Pricing Model” (2014)