Massive Problems (Part 2)
How I Handle Splits and Dividends
Corporate actions are a big headache.
First you need to figure out the math. Math is confusing.
But the real challenge comes after: storing actions, rebuilding databases, handling fake splits, and keeping your entities consistent.
In this article, I’ll show you how I solved these problems.
What’s adjusted vs. unadjusted
Unadjusted prices are what stocks actually traded for on that date.
If you want to filter stocks by price—say, only stocks above $5—use unadjusted prices.
Same goes for volume.
The actual number of shares that traded on any day is the unadjusted number.
So if you only want stocks that trade over 30M shares per day on average, use unadjusted volume.
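A screening filter like the two above only needs unadjusted data. Here is a minimal sketch, assuming each ticker's daily bars are already loaded as plain Python lists of unadjusted closes and volumes, oldest first. The function name and the way the averaging window is chosen are hypothetical.

```python
from statistics import mean

def passes_universe_filter(closes, volumes, min_price=5.0, min_avg_volume=30_000_000):
    """Screen a ticker on UNADJUSTED data.

    closes, volumes: unadjusted daily closes and share volumes, oldest first,
    covering whatever lookback window you average over (hypothetical choice).
    """
    if not closes or not volumes:
        return False
    price_ok = closes[-1] > min_price              # "only stocks above $5"
    volume_ok = mean(volumes) > min_avg_volume     # "over 30M shares per day on average"
    return price_ok and volume_ok
```

Feeding adjusted series into this function would quietly break both thresholds for any ticker that has split, which is exactly why the filter has to run on unadjusted numbers.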
But if you're looking for multi-day price patterns, use adjusted prices. Otherwise you'll see fake gaps that appear after corporate actions.
Adjusted prices account for splits and dividends.
If a stock splits, all the adjusted prices before that split aren’t the actual traded prices—they’re recalculated to match the split.
The problem with Polygon: their flat files API only gives you unadjusted prices. If you use their REST API, you need one call for adjusted prices and another for unadjusted.
If you’re downloading years of 1-minute data like me, you have two options: double your API calls and download time (very bad), or adjust the prices yourself based on splits.
Adjusting yourself is faster. But it also sucks.
How I adjust for splits
First, download and maintain all splits and dividends from Polygon. That’s easy—there’s an endpoint for it.
Then, every time you download a new date, you have to check for new splits and adjust all past prices in your database accordingly.
Here’s how.
A split is just a change in share count. Nothing economic happened. Only the units changed.
So the rule is:
Never adjust prices forward
Always adjust prices backward
When a split happens on date S with ratio:
k = shares_after / shares_before
Examples:
2-for-1 split → k = 2.0
4-for-1 split → k = 4.0
1-for-10 reverse split → k = 0.1
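If your split records store the ratio as two share counts, computing k is one division. The sketch below assumes Polygon-style `split_from` / `split_to` fields, where a 2-for-1 split is `split_from=1, split_to=2`; check the field names against your own download before relying on them.

```python
def split_ratio(split_from: float, split_to: float) -> float:
    """k = shares_after / shares_before.

    Assumes split_from = old share count and split_to = new share count,
    e.g. 2-for-1 -> (1, 2), 1-for-10 reverse -> (10, 1).
    """
    if split_from <= 0 or split_to <= 0:
        raise ValueError("split share counts must be positive")
    return split_to / split_from
```

The three examples above map to `split_ratio(1, 2)`, `split_ratio(1, 4)` and `split_ratio(10, 1)`.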
For all dates before the split:
adjusted_price = unadjusted_price / k
adjusted_volume = unadjusted_volume * k
For dates on or after the split:
adjusted_price = unadjusted_price
adjusted_volume = unadjusted_volume
That’s it.
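The rule for a single split translates almost line for line into code. This is a minimal sketch, assuming bars are held as `(day, close, volume)` tuples with unadjusted values; the tuple layout and function name are illustrative, not taken from any library.

```python
from datetime import date

def adjust_for_split(bars, split_date: date, k: float):
    """Backward-adjust daily bars for one split.

    bars: list of (day, close, volume) tuples with UNADJUSTED values.
    Bars strictly before split_date get close / k and volume * k;
    bars on or after split_date are returned unchanged.
    """
    adjusted = []
    for day, close, volume in bars:
        if day < split_date:
            adjusted.append((day, close / k, volume * k))
        else:
            adjusted.append((day, close, volume))
    return adjusted
```

With made-up numbers for a 10-for-1 split, a pre-split bar of `(date(2024, 6, 7), 1200.0, 1_000_000)` becomes `(date(2024, 6, 7), 120.0, 10_000_000.0)`, while the bar on the split date keeps its traded values.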
Important rules:
The most recent adjusted price always equals the most recent unadjusted price
Dollar volume stays the same
Only historical data changes when you detect a new split
If a new split is discovered later, you must re-adjust all prior history.
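Because a late-discovered split forces a re-adjustment, one simple approach is to always rebuild adjusted bars from the stored unadjusted bars plus the full split list, instead of patching adjusted values in place. The sketch below does that (each bar is divided by the product of k over every split dated after it) and checks the rules above. The names and the tolerance are hypothetical choices, not a prescribed implementation.

```python
from datetime import date

def rebuild_split_adjusted(bars, splits):
    """Rebuild split-adjusted bars from scratch.

    bars:   list of (day, close, volume) UNADJUSTED tuples.
    splits: list of (split_date, k) with k = shares_after / shares_before.
    Discovering a new split just means appending it and calling this again.
    """
    out = []
    for day, close, volume in sorted(bars):
        factor = 1.0
        for split_date, k in splits:
            if day < split_date:          # only splits AFTER this bar affect it
                factor *= k
        out.append((day, close / factor, volume * factor))
    return out

def check_invariants(unadjusted, adjusted, tol: float = 1e-9) -> bool:
    """Latest adjusted close == latest unadjusted close, and
    dollar volume (close * volume) is unchanged on every bar."""
    u, a = sorted(unadjusted), sorted(adjusted)
    latest_ok = abs(u[-1][1] - a[-1][1]) <= tol
    dollar_ok = all(
        abs(uc * uv - ac * av) <= tol * max(1.0, abs(uc * uv))
        for (_, uc, uv), (_, ac, av) in zip(u, a)
    )
    return latest_ok and dollar_ok
```

Rebuilding from raw data costs more CPU than an in-place patch, but it can never compound a rounding error or apply the same split twice.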
How I adjust for dividends
Dividends are different from splits.
A split changes units.
A dividend changes value.
This is how I handle dividends:
For a dividend of amount D on ex-date S:
For all dates before the dividend:
adjusted_price = unadjusted_price - D
(Optionally scaled multiplicatively for total-return calculations.)
For dates on or after the dividend:
adjusted_price = unadjusted_price
Volumes are never adjusted for dividends.
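Here is a minimal sketch of the dividend rule, again assuming `(day, close, volume)` tuples sorted oldest first. The subtractive branch follows the rule above. The multiplicative branch uses the common `1 - D / C` factor, with C the last close before the ex-date; that formula is my assumption about what "scaled multiplicatively" means here, not necessarily the exact method used in this article.

```python
from datetime import date

def adjust_for_dividend(bars, ex_date: date, amount: float, multiplicative: bool = False):
    """Backward-adjust daily closes for one cash dividend.

    bars: list of (day, close, volume) tuples, sorted oldest first.
    Subtractive (default): close - amount for days before ex_date.
    Multiplicative (assumed variant): close * (1 - amount / prev_close),
    where prev_close is the last close before ex_date.
    Volumes are never touched.
    """
    before = [b for b in bars if b[0] < ex_date]
    if not before:
        return list(bars)                  # nothing predates the ex-date
    factor = 1 - amount / before[-1][1]
    out = []
    for day, close, volume in bars:
        if day < ex_date:
            close = close * factor if multiplicative else close - amount
        out.append((day, close, volume))
    return out
```

The subtractive version keeps the dollar gap on the ex-date closed; the multiplicative version keeps percentage returns consistent, which is what total-return calculations care about.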
Key points:
Dividends only affect historical prices
Latest adjusted price == latest unadjusted price
Dividends do not change share count
Splits and dividends are applied independently
If you don’t care about total-return strategies, you can even skip dividend adjustment entirely and still be correct for most trading use cases.
But there’s a problem
Why adjusting for splits sucks
Of course with Polygon, it’s never just math.
Some splits in their database aren’t real. You have to figure out which ones are legit. Otherwise you’ll adjust data that shouldn’t be adjusted, creating fake gaps.
How I handle fake splits
I validate every split before storing it.
Here’s how:
For a split on date D:
Get the trading day before (D-1) and after (D+1)
Fetch adjusted AND unadjusted closes for both days
Calculate the ratio for each day:
ratio_before = adjusted_close(D-1) / unadjusted_close(D-1)
ratio_after = adjusted_close(D+1) / unadjusted_close(D+1)
If the ratios are identical → fake split
Why this works:
Real split: The unadjusted price jumps by the split ratio while the adjusted price stays continuous. The ratio changes.
Fake split: Neither price jumps. The ratio stays the same.
If the two ratios differ by less than 0.0000000001 (1e-10), I reject the split as fake.
Fake splits get logged to fake_splits.txt and never touch my adjustment files.
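The ratio test is small enough to sketch in full. This assumes you have already fetched the adjusted and unadjusted closes for D-1 and D+1; the function names, the dict keys and the one-line log format are hypothetical, and only the `fake_splits.txt` filename comes from the text above.

```python
from datetime import date

def is_fake_split(adj_before: float, unadj_before: float,
                  adj_after: float, unadj_after: float,
                  tol: float = 1e-10) -> bool:
    """A real split changes adjusted/unadjusted across the split date;
    a fake one leaves the ratio (practically) unchanged."""
    ratio_before = adj_before / unadj_before
    ratio_after = adj_after / unadj_after
    return abs(ratio_before - ratio_after) < tol

def validate_split(ticker: str, split_date: date, closes: dict,
                   log_path: str = "fake_splits.txt") -> bool:
    """closes: hypothetical dict with keys 'adj_before', 'unadj_before',
    'adj_after', 'unadj_after' (closes on D-1 and D+1).
    Returns True if the split should be stored; logs and rejects it otherwise."""
    if is_fake_split(closes["adj_before"], closes["unadj_before"],
                     closes["adj_after"], closes["unadj_after"]):
        with open(log_path, "a") as f:
            f.write(f"{ticker},{split_date.isoformat()}\n")
        return False
    return True
```

For a real 2-for-1 split, the D-1 ratio is about 0.5 and the D+1 ratio is 1.0, so the check passes easily; the tight tolerance only matters when both ratios come back equal.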
Zombie Tickers
As I wrote in the last article, tickers get reused. You can’t just adjust all past dates for a ticker—you need to know its entity boundaries.
An adjustment only applies if:
The adjustment’s entity dates match the data’s entity dates (both start and end)
The adjustment date falls within the entity’s active period
This means AAPL Entity 1’s 2003 split only affects AAPL Entity 1’s data. AAPL Entity 2’s 2020 split only affects AAPL Entity 2’s data.
No cross-contamination.
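The two conditions can be expressed as a single predicate. In this minimal sketch, `Entity`, `Adjustment` and `applies_to` are hypothetical names, and I assume each stored adjustment carries the start and end dates of the entity it was recorded against, as set up in part 1.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Entity:
    ticker: str
    start: date
    end: date            # last active day; date.max for a still-listed entity

@dataclass(frozen=True)
class Adjustment:
    ticker: str
    action_date: date    # split date or dividend ex-date
    entity_start: date   # entity dates recorded with the action
    entity_end: date

def applies_to(adj: Adjustment, entity: Entity) -> bool:
    """Rule 1: the adjustment's entity dates match the data's entity (start AND end).
    Rule 2: the action date falls inside the entity's active period."""
    same_entity = (adj.ticker == entity.ticker
                   and adj.entity_start == entity.start
                   and adj.entity_end == entity.end)
    in_period = entity.start <= adj.action_date <= entity.end
    return same_entity and in_period
```

Checking both dates, not just the ticker, is what keeps a split recorded against one entity from leaking into another entity that later reused the same symbol.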
Conclusion
If you follow parts 1 and 2, you should now have a consistent database that is free of survivorship bias and adjusted for corporate actions.
But there’s one more problem before you can start scanning and backtesting.
Late prints.
I’ll cover that in the next article.