The 2023 play-by-play data is now available at:

https://www.retrosheet.org/game.htm

This data model used data from #retrosheet to make yearly day-by-day snapshots from 1903-2022. Today starts the laborious job of compiling their 2023 data and importing it into the database here.

Retrosheet.org is an amazing baseball historical resource described in detail here:
https://www.mlb.com/news/the-history-of-retrosheet

This thread will cover how the sausage is made each December to build a historical year database.

1/

#baseball

Retrosheet Event Files

The event table contains play-by-play information used to calculate daily snapshots of all players for a given year. The two attached pics show the retro version of inning 1 of opening day between the Cubs and Brewers, and the compiled version.

The retrosheet version is coded in baseball scorekeeper jargon. When we were kids, every adult who took us to ballgames made us keep score. I recall Wrigley had 10-cent folded cardboard scoresheets, while at Comiskey they came with a magazine.

#baseball

2/

The event parser translates the play string and keeps track of game state: who is on what base, pitcher, catcher, how many runs scored, play event, fielders, etc. This table is used by subsequent scripts to calculate WAA, BA, OBP, SLG, etc. Here is a sample:

https://baseball-handbook.com/index.php?events=202204070_MIL_CHN
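
For anyone curious what translating a play string looks like, here is a minimal Python sketch. It is not the parser used here. It assumes the standard Retrosheet `play` record layout (`play,inning,home flag,batter id,count,pitches,event`) and only tallies a batting line (BA, OBP, SLG) from the event field. It ignores base runners, fielders and run scoring, and the names `event_from_record` and `BattingLine` are made up for illustration.

```python
import re
from dataclasses import dataclass

# Bases earned per hit code: single, double, ground-rule double, triple, home run.
HIT_BASES = {"S": 1, "D": 2, "DGR": 2, "T": 3, "H": 4, "HR": 4}
# Plays that happen during a plate appearance without ending it.
NOT_A_PA = {"NP", "SB", "CS", "PO", "POCS", "WP", "PB", "BK", "DI", "OA", "FLE"}


def event_from_record(line):
    """Return the event field of a 'play' record, or None for other records."""
    fields = line.strip().split(",", 6)
    if len(fields) != 7 or fields[0] != "play":
        return None
    return fields[6]


@dataclass
class BattingLine:
    ab: int = 0   # at bats
    h: int = 0    # hits
    bb: int = 0   # walks, including intentional
    hbp: int = 0  # hit by pitch
    sf: int = 0   # sacrifice flies
    tb: int = 0   # total bases

    def add_event(self, event):
        # "S8/G.1-3" -> play part "S8/G" (runner advances after "." are ignored here)
        play = event.split(".", 1)[0]
        basic, *modifiers = play.split("/")
        # Leading letters of the basic play: "S8" -> "S", "K+WP" -> "K", "63" -> ""
        code = re.match(r"[A-Z]*", basic).group()
        if code in NOT_A_PA or code == "C":  # "C" is catcher's interference
            return
        if code in ("W", "IW", "I"):
            self.bb += 1
        elif code == "HP":
            self.hbp += 1
        elif "SH" in modifiers:  # sacrifice bunt: no at bat charged
            return
        elif "SF" in modifiers:
            self.sf += 1
        else:
            self.ab += 1  # hits, strikeouts, fielded outs, errors, fielder's choices
            if code in HIT_BASES:
                self.h += 1
                self.tb += HIT_BASES[code]

    @property
    def ba(self):
        return self.h / self.ab if self.ab else 0.0

    @property
    def obp(self):
        pa = self.ab + self.bb + self.hbp + self.sf
        return (self.h + self.bb + self.hbp) / pa if pa else 0.0

    @property
    def slg(self):
        return self.tb / self.ab if self.ab else 0.0


# Usage with made-up records (the batter id is a placeholder, not a real player).
line = BattingLine()
for rec in ["play,1,0,batr0001,12,CBX,S8/G", "play,3,0,batr0001,32,BBCBB,W"]:
    event = event_from_record(rec)
    if event is not None:
        line.add_event(event)
print(f"BA {line.ba:.3f}  OBP {line.obp:.3f}  SLG {line.slg:.3f}")
```

A real parser also has to apply the runner advances after the `.` and carry base and out state from play to play, which is where most of the work is.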

Since this model now handicaps baseball daily during the current year using data from mlb.com, we only use the retro event calculations for integrity checks. Historical years, however, must rely on event data. 3/

202204070 MIL CHN Events

Events which occurred during game between Milwaukee Brewers and Chicago Cubs on 2022-04-07