Mastodawn

Lily Dec 26, 2022

Question of the day:
You are studying a certain asset with OHLCV data.
You have two copies of the OHLCV data from two different, independent data sources (same frequency and time series length).
When you diff the two, you notice conflicts (say they disagree on the open price for 5 non-consecutive days).
How do you determine which one is right?
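
To make the setup concrete, here is a minimal sketch of the diffing step in plain Python. It assumes each source is a dict keyed by ISO date string with one bar per day. The function name `find_conflicts`, the field names, and the sample bars are all made up for illustration and are not from either data source.

```python
from math import isclose

# Assumed field names for one OHLCV bar (hypothetical layout).
FIELDS = ("open", "high", "low", "close", "volume")

def find_conflicts(source_a, source_b, rel_tol=1e-9):
    """Return (date, field, value_a, value_b) for every field that disagrees."""
    conflicts = []
    # Only compare dates present in both sources; ISO dates sort chronologically.
    for date in sorted(source_a.keys() & source_b.keys()):
        for field in FIELDS:
            a, b = source_a[date][field], source_b[date][field]
            if not isclose(a, b, rel_tol=rel_tol):
                conflicts.append((date, field, a, b))
    return conflicts

# Made-up bars: the two sources disagree only on the 2022-12-02 open.
source_a = {
    "2022-12-01": {"open": 100.0, "high": 105.0, "low": 99.0, "close": 104.0, "volume": 1000},
    "2022-12-02": {"open": 104.0, "high": 108.0, "low": 103.0, "close": 107.0, "volume": 1200},
}
source_b = {
    "2022-12-01": {"open": 100.0, "high": 105.0, "low": 99.0, "close": 104.0, "volume": 1000},
    "2022-12-02": {"open": 104.5, "high": 108.0, "low": 103.0, "close": 107.0, "volume": 1200},
}

conflicts = find_conflicts(source_a, source_b)
for date, field, a, b in conflicts:
    print(f"{date} {field}: source A = {a}, source B = {b}")
```

The tolerance is there because two vendors may round the same price slightly differently; how loose it should be depends on the asset's tick size.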

RuhmUndAnsehen

@nope_its_lily In terms of backtesting, the one that yields worse performance is the better one, since it gives the more conservative estimate.

Another option would be to research the reasons for the discrepancy. Maybe one dataset includes later revisions to the market data and the other doesn't.
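
The backtesting heuristic could be sketched roughly like this: run the same backtest on both copies and keep whichever gives the lower result. The toy strategy (buy at the open, sell at the close, every day), the helper names, and the sample bars are all assumptions made up for illustration; any real backtest function could be passed in instead.

```python
def intraday_total_return(bars):
    """Toy backtest: buy at each open, sell at the same day's close, compound."""
    total = 1.0
    for bar in bars:
        total *= bar["close"] / bar["open"]
    return total - 1.0

def more_conservative(candidates, backtest):
    """Return the name of the dataset whose backtest result is lowest."""
    return min(candidates, key=lambda name: backtest(candidates[name]))

# Made-up bars: the sources disagree only on the second day's open.
candidates = {
    "source_a": [
        {"open": 100.0, "close": 110.0},
        {"open": 110.0, "close": 121.0},
    ],
    "source_b": [
        {"open": 100.0, "close": 110.0},
        {"open": 111.0, "close": 121.0},
    ],
}

choice = more_conservative(candidates, intraday_total_return)
print(choice)
```

This only picks the more pessimistic dataset for the strategy being tested. It does not say which source is actually correct, which is what researching the reasons would address.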