You've spent weeks creating your strategy. The backtest shows a Sharpe Ratio of 2.5, Profit Factor of 3.2, and a flawless equity curve. You take it live and lose money.
It's not you. It's not bad luck. Your backtest lied.
Backtests have structural problems that artificially inflate results. Some are technical errors, others are subtle statistical biases, and others stem from how your platform works. If you first want to understand what algorithmic trading is before diving into its pitfalls, we recommend starting there. This article documents the 7 most dangerous problems and how to detect them.
"Over 90% of strategies that work in backtest fail in real implementation." — Marcos López de Prado, Advances in Financial Machine Learning
Why that "perfect" equity curve never replicates in live trading
Is your backtest telling the truth?
Upload your strategy and get professional metrics. Our analysis automatically detects warning signs.
Analyze my strategy →
The 7 deadly biases of backtesting
Any of these biases can completely invalidate a backtest. Most traders have at least 2 or 3 active in their tests without knowing it.
| # | Bias | What it does | Typical impact |
|---|---|---|---|
| 1 | Look-ahead bias | Uses future information | Critical |
| 2 | Survivorship bias | Ignores bankrupt assets | Critical |
| 3 | Overfitting | Fits noise, not signal | Critical |
| 4 | Data snooping | Mining spurious patterns | High |
| 5 | No slippage/commissions | Ignores real costs | High |
| 6 | No intrabar data | Ignores internal movement | Medium-High |
| 7 | Spot vs Futures | Different products | Medium |
How to detect and prevent each bias
| Bias | How to detect | How to prevent |
|---|---|---|
| Look-ahead bias | Check if signal uses data not available in real-time | Use only data up to current bar; avoid future-looking functions |
| Survivorship bias | Check if universe includes delisted assets | Use databases with full history including bankruptcies |
| Overfitting | Sharpe > 3, Profit Factor > 3, perfect equity | Limit parameters, validate out-of-sample and Walk Forward |
| Data snooping | Many combinations tested without statistical adjustment | Apply Deflated Sharpe Ratio or White's Reality Check |
| Selection bias | Only the best asset or period is shown | Test across multiple assets and periods; report all results |
| Ignored costs | Results without deducting real costs | Include realistic slippage and commissions per asset |
| Intrabar data | Stops or targets fill mid-bar without being reflected in results | Enable bar magnifier or use tick/minute data |
| Spot vs Futures | Backtest data doesn't match the product actually traded | Backtest on backadjusted continuous futures, not spot |
To dive deeper into how to correctly interpret these warning signs, check our guide on advanced trading metrics.
Let's examine each one in detail.
Look-ahead bias: the time traveler
Look-ahead bias occurs when your backtest uses information that didn't exist at the time of the decision. It's like trading with a crystal ball showing the future.
Look-ahead bias (definition): A bias that introduces future information into past decisions in a backtest. Any data that wasn't available at the time of the trade contaminates results.
Common examples
Using tomorrow's price today
"If tomorrow's opening price is higher than today's close, buy today". Impossible in reality because you don't know the future. Important: In generic programming languages like Python or C++, this type of error is possible because you can access any index in the data array. However, trading-specific languages like EasyLanguage (TradeStation) or Pine Script (TradingView) are designed to prevent this: they only allow access to past or current data, never future.
Indicators recalculated with future data
Some indicators "repaint": they recalculate with each new bar, changing history. What you see today isn't what existed yesterday.
Economic data before publication
"If quarterly GDP is positive, go long". But GDP is published weeks after the quarter ends. Your backtest uses the data from day 1 of the quarter.
How to detect it
- Results too good to be true: If your strategy seems to "know" when big moves are coming, it probably has look-ahead
- Repainting indicators: Compare current indicator values with historical screenshots. If they differ, it repaints
- Check operation order: Is the decision made BEFORE having the data it uses?
How to avoid it
- Use only data available at the time of the decision
- Execute orders at the NEXT bar's open price, not the current one
- Verify indicators don't repaint (many TradingView ones do)
- For economic data, use publication date, not data date
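In a vector-based backtest, a simple way to enforce next-bar execution is to shift signals one bar forward before computing returns. A minimal sketch, continuing the pandas example above:

```python
# Execute at the NEXT bar's open: shift the signal one bar forward
df["position"] = df["signal_ok"].shift(1).fillna(False).astype(int)

# P&L measured from this bar's open to the next bar's open
open_to_open = df["open"].shift(-1) / df["open"] - 1
df["strategy_ret"] = df["position"] * open_to_open
```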
Survivorship bias: the dead don't speak
Survivorship bias occurs when you backtest only with assets that exist today, ignoring those that went bankrupt, were delisted, or were absorbed. It's like studying "the secret to success" by only interviewing millionaires.
Survivorship bias (definition): A bias that overestimates results by excluding assets that ceased to exist (bankruptcies, mergers, delisting) from the analysis. Only "survivors" are analyzed.
The Enron/Lehman case
A backtest of the S&P 500 from 2000 to today with CURRENT index components automatically excludes:
- Enron - bankrupt in 2001, was the 7th largest US company
- WorldCom - bankrupt in 2002, massive accounting fraud
- Lehman Brothers - bankrupt in 2008, financial crisis
- Hundreds more companies that were absorbed, delisted, or went bankrupt
Your strategy never bought these companies in your backtest because they don't exist today. But in 2007, Lehman Brothers was one of the world's largest. Your momentum strategy would have bought it.
The real impact
Studies show survivorship bias can inflate annual returns by 0.5% to 1.5%. Over a 20-year backtest, that can make a mediocre strategy look brilliant.
How to avoid it
✅ Use "point-in-time" data
Providers like CRSP, Compustat, or premium services include all historical index components, not just current ones.
✅ Include delisted assets
If a company was delisted, assume partial or total loss. Don't simply ignore it because it no longer exists.
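A minimal sketch of the idea, assuming a hypothetical universe file with listing and delisting dates (the file name and columns are placeholders):

```python
import pandas as pd

# Hypothetical universe file: one row per listing period, with columns
# ticker, listed_from, listed_to (empty if still listed)
universe = pd.read_csv("universe.csv", parse_dates=["listed_from", "listed_to"])

def point_in_time_universe(date: pd.Timestamp) -> list:
    """Tickers actually tradable on `date`, including names that were
    later delisted or went bankrupt."""
    alive = (universe["listed_from"] <= date) & (
        universe["listed_to"].isna() | (universe["listed_to"] >= date)
    )
    return universe.loc[alive, "ticker"].tolist()

# In June 2007 this universe should still contain Lehman Brothers
tradable = point_in_time_universe(pd.Timestamp("2007-06-01"))
```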
Overfitting: the perfect curve that lies
Overfitting occurs when your strategy captures past noise instead of real patterns. It's the most common and dangerous sin of backtesting.
Overfitting (definition): Excessively adjusting a strategy to historical data, capturing random fluctuations instead of generalizable patterns. The strategy works perfectly on the past but fails in the future.
"With enough parameters, I can fit an elephant." — John von Neumann
Warning signs
| Metric | Suspicious value | Realistic value | Why |
|---|---|---|---|
| Sharpe Ratio | > 3.0 | 0.5 - 2.0 | World's best funds rarely exceed 2.0 |
| Profit Factor | > 3.0 | 1.3 - 2.0 | Robust strategies are in 1.5-2.0 range |
| Win Rate | > 80% | 40% - 65% | Trend strategies typically have 35-45% |
| Max Drawdown | < 5% | 15% - 30% | If DD is very low, it's probably curve-fitted |
| Total trades | < 30 | > 100 | Few trades yield statistically insignificant results |
According to Bailey and López de Prado (2014), a backtest with a Sharpe Ratio above 3.0 has a high probability of being the product of overfitting. To evaluate whether a result is genuine, they proposed the Deflated Sharpe Ratio (DSR), which adjusts the observed Sharpe for the number of tests performed, the skewness, and the kurtosis of returns. If you have tested dozens of variants, you need a much higher Sharpe for it to be statistically significant.
Why more parameters = more danger
Each parameter you add is a "dial" you can adjust until the backtest works. With 10 parameters and 10 values each, you have 10 billion combinations. Guaranteed some will be "profitable" by pure chance.
Robert Pardo in Design, Testing, and Optimization of Trading Systems recommends: "If you need more than 5 rules to describe your strategy, you're probably fitting noise".
Practical rule
If your strategy has more than 4 parameters, ask yourself why. Each one should have a clear economic justification, not simply "because it improves the backtest".
Monte Carlo: the ultimate stress test
Would your strategy survive 10,000 random scenarios? Our Monte Carlo simulation reveals whether it's robust or just noise.
Try Monte Carlo →
Data snooping: mining noise
Data snooping is testing variant after variant until one works by chance. It's the statistical equivalent of throwing darts until you hit the bullseye and then bragging about your aim.
Data snooping (definition): Intensively searching for patterns in data until finding something that appears to work, inflating statistical significance because the same dataset is used repeatedly to generate and test hypotheses.
The 7,846 rules study
Sullivan, Timmermann & White (1999) conducted a seminal study published in The Journal of Finance. They analyzed 7,846 technical trading rules over 100 years of Dow Jones data.
Result: Many rules showed high returns in the test period. But when they applied White's Reality Check (a statistical method adjusting for multiple tests), almost none showed real predictive power.
Absurd example that "works"
Andrew Lo from MIT documents cases like: "Go long on stocks with letter S in the third position of the ticker and short those with U". With enough tests, even this can appear "profitable" in a specific period.
The multiple testing problem
If you test 100 strategies with a 95% confidence level, you expect 5 to appear "significant" by pure chance. If you test 1,000, it's 50. If you test 10,000 (easy with modern computers), you'll have 500 "winning" strategies that are pure noise.
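You can verify this arithmetic yourself. A minimal simulation sketch: 10,000 "strategies" that are pure random noise, with zero edge by construction:

```python
import numpy as np

rng = np.random.default_rng(42)
n_strategies, n_days = 10_000, 1_250   # 5 years of daily returns, zero edge

# Pure noise: every "strategy" is random daily returns with no signal at all
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Annualized Sharpe Ratio of each noise strategy
sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)

# Roughly a hundred of them will still look like "winners"
print((sharpe > 1.0).sum(), "noise strategies out of", n_strategies, "show Sharpe > 1")
```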
According to Harvey, Liu, and Zhu (2016), at least 50% of published factors in academic finance are false positives resulting from data snooping. If this happens in peer-reviewed academic research, imagine what occurs when a retail trader tests thousands of combinations without any statistical adjustment.
Solutions
The Deflated Sharpe Ratio (DSR)
David H. Bailey and Marcos López de Prado proposed the Deflated Sharpe Ratio as a solution to the multiple testing problem. The idea is to adjust the observed Sharpe Ratio to account for how many strategies you tested before finding "the good one":
Deflated Sharpe Ratio (DSR):
DSR = Φ( (SR − SR*) · √(T − 1) / √(1 − γ₃·SR + ((γ₄ − 1)/4)·SR²) )
Where: SR = observed Sharpe Ratio, T = number of return observations, γ₃ = skewness and γ₄ = kurtosis of the returns, Φ = the standard normal CDF, and SR* = the maximum Sharpe Ratio you would expect from the N backtests you tried even with zero real edge. SR* grows with N, so the more tests you have run, the higher the observed SR must be to remain significant.
- Deflated Sharpe Ratio (DSR): Metric by David H. Bailey and Marcos López de Prado that adjusts the Sharpe Ratio for the number of strategies tested
- White's Reality Check: Statistical test by Halbert White (2000), applied to trading rules by Sullivan, Timmermann, and White (1999), that evaluates whether the best strategy found truly outperforms the benchmark after accounting for all variants tested
- Pre-specify hypotheses: Document your strategy BEFORE seeing results
- Out-of-sample: Reserve data you never use to generate ideas
- Fewer parameters: The fewer "dials", the lower snooping risk
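A minimal Python sketch of this computation, following Bailey and López de Prado's formulas. The `sr_trials_var` input (the variance of the Sharpe Ratios across your N trials) is something you must estimate from your own testing history:

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis

def deflated_sharpe(returns: np.ndarray, n_trials: int, sr_trials_var: float) -> float:
    """Probability that the observed Sharpe beats the best Sharpe expected
    from `n_trials` zero-edge strategies (Bailey & López de Prado)."""
    T = len(returns)
    sr = returns.mean() / returns.std(ddof=1)    # per-period Sharpe
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)         # raw kurtosis (normal = 3)

    # Expected maximum Sharpe among n_trials under the null of zero skill
    em = 0.5772156649  # Euler–Mascheroni constant
    sr_star = np.sqrt(sr_trials_var) * (
        (1 - em) * norm.ppf(1 - 1 / n_trials)
        + em * norm.ppf(1 - 1 / (n_trials * np.e))
    )

    # Probabilistic Sharpe Ratio evaluated at the deflated benchmark sr_star
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr**2)
    return norm.cdf((sr - sr_star) * np.sqrt(T - 1) / denom)
```

A DSR close to 1 means the observed Sharpe is very unlikely to be a multiple-testing artifact; values near 0.5 or below mean it is indistinguishable from the best of N noise strategies.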
Slippage and commissions: the invisible cost
A backtest without slippage and commissions is fantasy. I've seen scalping strategies that appeared profitable but whose equity curve inverted once realistic costs were added.
"Any backtest that doesn't model commissions and slippage shouldn't be considered representative of the strategy." — Anonymous professional trader
Every trade has a real cost that goes far beyond the broker commission. To calculate the true impact, use these formulas:
Estimated slippage per side:
Slippage = average spread / 2 + market impact
Market impact depends on your order volume relative to the asset's average volume. The larger the relative size, the greater the slippage.
Total cost per trade (round trip):
Cost = entry commission + exit commission + entry slippage + exit slippage
Each trade has 4 cost impacts. A scalper with 50 daily trades suffers 200 cost impacts per day. That's why high-frequency strategies are so sensitive to costs.
Typical slippage on ES futures (E-mini S&P 500) is approximately $12.50 per side (1 tick) under normal conditions, but can exceed $50 during high-volatility events such as macro data releases or Fed decisions. Including these extreme scenarios in your backtest is key to evaluating the real robustness of the strategy.
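To see how fast these costs compound, here is a back-of-the-envelope sketch for a hypothetical ES scalper; every number is illustrative:

```python
# Rough annual cost drag for a hypothetical ES scalper; all numbers illustrative
commission_per_side = 2.50      # USD, broker-dependent
slippage_per_side = 12.50       # USD, ~1 tick on ES in normal conditions
trades_per_day = 50
trading_days = 250

cost_per_round_trip = 2 * (commission_per_side + slippage_per_side)   # 4 impacts
annual_cost = cost_per_round_trip * trades_per_day * trading_days
print(f"Annual cost drag: ${annual_cost:,.0f}")   # $375,000 in this example
```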
How much slippage to include by asset
| Asset | Typical slippage (per side) | Adverse conditions |
|---|---|---|
| ES (S&P 500 Futures) | ~$12.50 | $25-50 |
| NQ (Nasdaq Futures) | ~$20 | $40-80 |
| Liquid stocks/ETFs (SPY) | $0.02-0.05 | $0.10-0.20 |
| Small cap stocks | $0.05-0.15 | $0.30+ |
| EUR/USD (Forex) | 0.5-1 pip | 2-5 pips |
| Crypto (BTC/USDT) | 0.01% - 0.05% | 0.1% - 0.5% |
| Illiquid futures (KC, OJ) | $20-40 | $50-100+ |
The 2% rule
Slippage shouldn't exceed 2% of trade value. If it does, you're probably trading an asset too illiquid for your position size.
Stress tests for slippage and commissions
Our Algo Strategy Analyzer tool includes stress tests that simulate increased slippage and commissions. You can see how your strategy would perform if actual costs were 2x or 3x higher than expected - something common in high volatility conditions.
Commissions that change the equation
Commissions vary enormously between brokers and account types. A scalping strategy doing 50 daily trades with $5 commission per trade pays $250/day = $62,500/year in commissions alone.
For quality market data you also need to consider subscription costs.
Intrabar data: what you don't see kills you
Bars are aggregations. They hide all movement that occurred during their formation. A backtest using only bars assumes price went in a straight line from open to close. Unrealistic.
The problem of bar-based backtesting
Imagine a 30-minute green bar (close > open). Your backtest assumes price rose continuously. But if you look at the 1-minute chart for the same period, you'll see:
- Price went up, then dropped 2%, then rose again
- Your 1.5% stop loss would have triggered mid-bar
- But the backtest doesn't know, because it only sees the green close
Comparison: trade execution with and without intrabar data
Scenario: a candle whose range includes both the entry level and the stop loss.
- Without intrabar data: the backtest assumes price rose directly from open to close, passing through the entry without ever hitting the stop. The trade closes as a winner.
- With intrabar data: the backtest sees that price first hit the entry, then dropped to the stop loss, then rose. The trade closes at a loss.
Same trade, opposite results. Without intrabar data, the backtest lies.
This problem affects all platforms
The intrabar data problem doesn't depend on the platform but on the backtest logic. TradingView, TradeStation, NinjaTrader, and MultiCharts all use bars by default. If you don't enable the intrabar data option, stops and targets can be completely ignored during simulation.
Bar Magnifier and alternatives
The solution is to enable intrabar data:
| Platform | Option | Location |
|---|---|---|
| TradeStation | Look-Inside-Bar Backtesting | Format Strategy → Properties → General |
| MultiCharts | Bar Magnifier | Strategy Properties → Backtesting |
| NinjaTrader | Tick Replay | Strategy Analyzer → Data Series |
| TradingView | Bar Magnifier | Strategy Settings → Properties → Execution |
In Python with Backtrader or similar, you can use two datasets: one for signals (30M) and another for stops/targets (1M or tick).
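A minimal two-timeframe sketch of that idea in pandas, with hypothetical file and column names: signals come from 30-minute bars, while 1-minute bars decide whether the stop actually fired inside the bar:

```python
import pandas as pd

# Hypothetical 1-minute data used only to replay what happened inside 30M bars
bars_1m = pd.read_csv("bars_1m.csv", index_col="time", parse_dates=True)

def stop_hit_intrabar(bar_start: pd.Timestamp, bar_end: pd.Timestamp,
                      stop_price: float) -> bool:
    """Replay the 1-minute lows inside a 30M bar to check whether a long
    stop would have triggered mid-bar."""
    window = bars_1m.loc[bar_start:bar_end]
    return bool((window["low"] <= stop_price).any())
```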
Spot vs Futures: different products
Backtesting on the SPX index and trading ES futures is comparing apples to oranges. They're related products but with different dynamics.
Critical differences
Spot/Index
- "Theoretical" price of the asset
- No expiration
- Doesn't include carry costs
- Dividends included in price
Futures
- Derivative contract
- Expires (roll every quarter)
- Contango/backwardation affects price
- Dividends NOT included
The roll problem
Futures expire. Every quarter (or month, depending on the product) you have to "roll" to the next contract. This has a cost:
- Contango: the next contract is more expensive → you lose on the roll
- Backwardation: the next contract is cheaper → you gain on the roll
A backtest on spot data ignores this cost. In products with strong contango (VIX futures, oil), the roll can cost 15-20% annually that your backtest doesn't show.
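A back-of-the-envelope sketch of how roll drag accumulates; all numbers are illustrative:

```python
# Rough annualized roll drag under contango; all numbers illustrative
front = 20.0          # price of the expiring contract
next_contract = 20.3  # price of the next contract (contango)
rolls_per_year = 12   # monthly roll, e.g. VIX futures

roll_cost = (next_contract - front) / front      # 1.5% lost per roll
annual_drag = roll_cost * rolls_per_year         # ~18% per year in this example
print(f"Annual roll drag: {annual_drag:.1%}")
```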
Solution
Backtest with backadjusted continuous futures data. Or use the actual contract and program the roll manually. Never backtest on spot if you're trading futures.
How to create a backtest that doesn't lie
The perfect backtest doesn't exist, but you can minimize the lies. Here's the checklist I use before trusting any result.
Backtest Validation Checklist
- No look-ahead: every signal uses only data available at decision time, and orders fill at the next bar's open
- Point-in-time universe: delisted and bankrupt assets are included
- Few parameters (ideally 4 or fewer), each with a clear economic justification
- Validated out-of-sample, not just on the data used to generate the idea
- Sharpe Ratio adjusted for the number of variants tested (Deflated Sharpe Ratio)
- Realistic slippage and commissions per asset, stress-tested at 2x-3x
- Intrabar data enabled (Bar Magnifier, Look-Inside-Bar, or tick data)
- Backtest data matches the product actually traded (continuous futures, not spot)
If you pass all points, you have a backtest you can start to trust. But it's still not a guarantee of live success. For that you need complete validation: Walk Forward, Monte Carlo, and paper trading.
Conclusion
Backtests lie. It's their natural state. Not out of malice, but because simulating the past is inherently imperfect. Your job as an algorithmic trader is to minimize the lies, not eliminate them.
The 7 biases we've covered (look-ahead, survivorship, overfitting, data snooping, no costs, no intrabar, spot/futures) are the most dangerous because they're invisible. A perfect equity curve is the biggest warning sign. Pay special attention to drawdown in trading: if your backtest shows a maximum drawdown below 5%, it is almost certainly overfitted.
The true value of a backtest isn't validating winning strategies. It's discarding losing ones. If your strategy survives a rigorous backtest, you have something worth testing live. If it doesn't survive, better to know before risking money.
The next step is learning to properly validate strategies that pass this initial test.
Validate your strategy with professional rigor
Walk Forward, Monte Carlo, advanced metrics. Find out if your backtest is telling the truth before risking real capital.
Validate my strategy for free →