Mastodawn

Question of the day:
You are studying a certain asset with OHLCV data.
You have two copies of the OHLCV data from two different, independent data sources (same frequency and time series length).
On diffing the two, you notice conflicts (say they disagree on open prices for 5 non-consecutive days).
How do you determine which one is right?
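
For anyone who wants to reproduce the diff step, here is a minimal sketch. It assumes each source has already been loaded into a dict keyed by date, with one dict of fields per bar. The function name `open_conflicts`, the field name `"open"`, and the sample prices are all made up for illustration and are not from either data source.

```python
import math
from datetime import date

def open_conflicts(source_a, source_b, rel_tol=1e-9):
    """Return sorted (day, open_a, open_b) tuples where the sources disagree."""
    # Only compare days that both sources actually cover.
    shared_days = source_a.keys() & source_b.keys()
    return sorted(
        (day, source_a[day]["open"], source_b[day]["open"])
        for day in shared_days
        # isclose avoids flagging pure floating-point noise as a conflict.
        if not math.isclose(
            source_a[day]["open"], source_b[day]["open"], rel_tol=rel_tol
        )
    )

# Made-up bars, for illustration only.
source_a = {
    date(2022, 12, 19): {"open": 100.0, "close": 101.0},
    date(2022, 12, 20): {"open": 101.5, "close": 100.2},
    date(2022, 12, 21): {"open": 100.0, "close": 102.0},
}
source_b = {
    date(2022, 12, 19): {"open": 100.0, "close": 101.0},
    date(2022, 12, 20): {"open": 101.2, "close": 100.2},
    date(2022, 12, 21): {"open": 100.0, "close": 102.0},
}

conflicts = open_conflicts(source_a, source_b)
for day, open_a, open_b in conflicts:
    print(day, open_a, open_b)
```

This only lists where the sources disagree. It says nothing about which one is right.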

Dallman Ross Dec 26, 2022

@nope_its_lily Tough question. Maybe track it in real time for a few days more, at the open, for instance, and see if one source is obviously less trustworthy.

DinCA 🏳️‍🌈 🎾 🐶 🖖🥄 Dec 26, 2022

@nope_its_lily Which data set do the main brokerage houses use? Try to use the data set that the market uses the most.

DinCA 🏳️‍🌈 🎾 🐶 🖖🥄 Dec 26, 2022

@nope_its_lily also, the answer to which one is right: neither and both!

RuhmUndAnsehen Dec 26, 2022

@nope_its_lily In terms of backtesting, the one that yields worse performance is the better one.

Another option would be to research the reasons for the discrepancies. Maybe one dataset includes later revisions to the market data and the other doesn't.
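
The backtesting suggestion above could be sketched roughly as follows: run the same strategy on both sources and keep the one with the lower result. The toy strategy (buy at each open, sell at the same day's close), the function names `total_return` and `more_conservative`, and the sample bars are all assumptions made for illustration, not anything from the thread.

```python
def total_return(bars):
    """Compound return of buying at each open and selling at that day's close."""
    growth = 1.0
    for bar in bars:
        growth *= bar["close"] / bar["open"]
    return growth - 1.0

def more_conservative(sources):
    """Return the name of the source whose backtest result is lowest."""
    return min(sources, key=lambda name: total_return(sources[name]))

# Made-up bars, for illustration only; the sources differ on one open price.
bars_a = [{"open": 100.0, "close": 102.0}, {"open": 102.0, "close": 101.0}]
bars_b = [{"open": 100.0, "close": 102.0}, {"open": 102.5, "close": 101.0}]

choice = more_conservative({"a": bars_a, "b": bars_b})
print(choice)
```

Picking the pessimistic source does not settle which data is correct. It only keeps the backtest from looking better than either source can support.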