League Updates

The State of our Stats

Friends:

Dave, Jamie and I (Dave especially) have spent considerable time over the last two days examining our stats collection and computing system. We have even corresponded with Baseball Reference (BR), which gave us a helpful but incomplete response. Here are our most important findings:

  • MLB routinely makes errors in its initial daily stats reports used by Baseball Reference, our source for daily stats.  We always knew errors were possible, but we have discovered they are considerably more frequent than we had ever suspected.
  • BR doesn’t catch those errors in the data they supply us.  Apparently they do not correct errors themselves.  Instead, MLB will catch and correct errors, and send those corrections along in separate reports to BR.  
  • We do not necessarily get those corrections under our current system.  We just take each new daily report and lay it on top of previous days to get our running totals each month.   
  • The errors are generally small and, at least for hitters, appear to be random, although GIDP may tend to go under-reported. Their effect on the standings is small, but it could come into play in very close races.
  • Under our old Baseball Prospectus system these errors were occurring, but since we were using month-to-date stats, we would get the corrections, generally without noticing anything.  We were still vulnerable to any errors on the last day of the month, but I don’t recall ever noticing one. 
  • Under our current system we are vulnerable to every day’s errors, and have been since we adopted it.
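
For anyone who wants to see the difference between the two systems concretely, here is a minimal sketch in Python. It is only an illustration, not our actual spreadsheet logic: the function names and the stat lines are made up, and it assumes a correction shows up in a later month-to-date figure but never in a later daily report.

```python
def old_system_total(month_to_date_snapshots):
    """Old approach: trust the latest month-to-date line, corrections included."""
    return dict(month_to_date_snapshots[-1])


def current_system_total(daily_reports):
    """Current approach: lay each daily report on top of the previous days."""
    total = {}
    for report in daily_reports:
        for stat, value in report.items():
            total[stat] = total.get(stat, 0) + value
    return total


# Hypothetical hitter: the day-1 report missed a GIDP that MLB corrected later.
daily_reports = [
    {"AB": 4, "GIDP": 0},  # day 1 as first published (should have been GIDP 1)
    {"AB": 3, "GIDP": 0},  # day 2
]
month_to_date_snapshots = [
    {"AB": 4, "GIDP": 0},  # after day 1, before the correction
    {"AB": 7, "GIDP": 1},  # after day 2, correction folded in
]

old_total = old_system_total(month_to_date_snapshots)
current_total = current_system_total(daily_reports)
print(old_total)      # month-to-date picks up the corrected GIDP
print(current_total)  # summed dailies never see it
```

In this made-up case the two totals agree on AB but disagree on GIDP, which is the kind of gap Dave found.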

The upshot is this: We have always had errors in our data. We do not recommend going back to the BP system, because under it we had to do wonky, highly unrealistic things to modify our allocations mid-month, and those changes often had retroactive effect. The current system delivers more realistic results, even though we are all vulnerable to random errors.

We do not believe there is any way to avoid the daily errors. We are still working on whether we can, on the last day of the month, run an extra error-correction “day” to undo all the month’s accumulated errors. We want to avoid having to hand-check all 300+ EFL players for errors each month, which would be several hours of work. Even if we can avoid that problem, the fix won’t be perfect, because the errors might have occurred under different player allocations than those that prevail at the end of the month. And any errors in the stats for the last day of the month would go uncorrected (which has always been the case, it turns out, since the beginning of the league).
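
One way such an error-correction “day” might work is sketched below in Python. This is a rough idea under stated assumptions, not a method we have settled on: it assumes we can pull a corrected month-to-date line for every player, and the function name, player names and numbers are all hypothetical. It also ignores allocations entirely.

```python
def correction_day(daily_reports, month_to_date):
    """Build one extra 'day' holding corrected totals minus our summed dailies."""
    summed = {}
    for report in daily_reports:
        for player, line in report.items():
            totals = summed.setdefault(player, {})
            for stat, value in line.items():
                totals[stat] = totals.get(stat, 0) + value

    fix = {}
    for player, line in month_to_date.items():
        ours = summed.get(player, {})
        # Keep only the stats where our running total disagrees.
        delta = {s: v - ours.get(s, 0) for s, v in line.items() if v != ours.get(s, 0)}
        if delta:
            fix[player] = delta
    return fix


# Hypothetical data: Player B was credited with one AB too many on some day.
dailies = [
    {"Player A": {"AB": 4, "H": 2}, "Player B": {"AB": 3, "H": 0}},
    {"Player A": {"AB": 4, "H": 1}, "Player B": {"AB": 4, "H": 1}},
]
month_to_date = {"Player A": {"AB": 8, "H": 3}, "Player B": {"AB": 6, "H": 1}}

fix = correction_day(dailies, month_to_date)
print(fix)  # only Player B needs an adjustment

# Laying the correction day on top of the dailies leaves nothing left to fix.
print(correction_day(dailies + [fix], month_to_date))
```

Because the correction is computed for every player at once, nobody would have to hand-check 300+ stat lines.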

Dave did the most to discover this. Using the Pears as guinea pigs, he checked every player’s accumulated August stats, according to BR and BP, against our EFL totals.  The EFL totals represent the sum of each daily BR report, without error correction. The monthly accumulated stats include any error corrections from MLB.  Here is what he found for Peshastin’s hitters this month, as of yesterday:

Albies add 2 GDP both 8/3
Robles add 1 GDP 8/2
Soto add 1 hit, 1 BB, 1 IBB, 1 GDP, 1 R
Mitch Haniger add 1 GDP 8/5 or 8/10
Yoan Moncada add 1 PA, 1 AB, 1 R, 2 H, 1 RBI, 3 SO
Willi Castro subtract 4 PA, 4 AB, 2 H, 1 CS
Jo Adell add 1 PA, 1 AB, 2 R, 1 3B, 4 RBI, 2 SO, 1 GDP
Jazz Chisholm subtract 1 PA, 1 AB, 1 SO, add 1 SB, 1 GDP
Mountcastle add 1 AB, 2 H, 2 HR, 2 RBI, 1 BB, 2 SO, 1 GDP
Zimmerman add 4 PA, 4 AB, 1 R, 1 H, 1 HR, 2 RBI, 2 SO, 1 GDP
Schrock subtract 1 PA, 1 AB, add 1 R
 
Clarification: ‘add’ means that we would have to add to our database to get it to agree with BR. ‘Subtract’ is the opposite. Dave has not made any corrections to Pear stats. We presume all of us have similar errors. 
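
For the curious, the add/subtract notation can be generated mechanically from two stat lines. The Python sketch below is only an illustration with made-up totals (they are not any real player’s numbers), and `describe_fix` is a hypothetical helper, not something Dave actually used.

```python
def describe_fix(ours, reference):
    """Say what to add to or subtract from `ours` so it agrees with `reference`."""
    adds, subs = [], []
    for stat in reference:
        delta = reference[stat] - ours.get(stat, 0)
        if delta > 0:
            adds.append(f"{delta} {stat}")
        elif delta < 0:
            subs.append(f"{-delta} {stat}")

    parts = []
    if subs:
        parts.append("subtract " + ", ".join(subs))
    if adds:
        parts.append("add " + ", ".join(adds))
    return ", ".join(parts) or "no change"


# Hypothetical totals: our summed dailies vs. the corrected monthly line.
ours = {"PA": 21, "AB": 19, "R": 2}
reference = {"PA": 20, "AB": 18, "R": 3}
print(describe_fix(ours, reference))  # subtract 1 PA, 1 AB, add 1 R
```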
 
If all these errors were fixed, the Pears’ offense would have produced about 2 fewer runs than we currently credit it with; adding the missing GIDPs would hurt. Those 2 or so runs would cost the Pears about 0.2 wins.
 
Note that these are NET errors. Albies could have had an extra hit one day and then an omitted hit another day, and we wouldn’t know it. Those kinds of self-cancelling errors are not really a problem for us, except where allocations have changed between the two errors.
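
A small Python sketch shows why a change in allocation matters. It assumes, purely for illustration, that an allocation is the fraction (0 to 1) of a player’s stats credited to his EFL team on a given day; the numbers are invented.

```python
def net_error(errors):
    """Raw net error: what a month-end comparison of totals would reveal."""
    return sum(error for error, _allocation in errors)


def credited_error(errors):
    """Error actually baked into the team's totals, weighted by each day's allocation."""
    return sum(error * allocation for error, allocation in errors)


# Hypothetical: one extra hit at full allocation, one omitted hit at half allocation.
errors = [(+1, 1.0), (-1, 0.5)]

print(net_error(errors))       # 0: the two errors cancel in the player's raw totals
print(credited_error(errors))  # 0.5: half a phantom hit still sits in the team's stats
```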
 
Two major mysteries remain.  First, as we understand it from BR, corrections are managed separately, not incorporated in a subsequent day’s report. However, if this is true, we still don’t know why the Pears, Rosebuds, Alleghenys and Cheese all had suspiciously prominent negative runs scored during play on Wednesday. Our helper at BR said he didn’t see anything unusual in the Aug 24 or 25 data. 
 
Second: there were no errors in the Peshastin pitching data. We do not know why this is.  It may be because we only care about two data points for each pitcher: ip and er allowed. Maybe the Pears were just lucky this month, so far, that those two items had no mistakes.
 
Today’s update includes this uncorrected data, almost certainly for every team. 
 
 
….
 
So.  We have at least two decisions to make. 
 
First, what do we do about the errors?  If Dave can concoct a reasonable method of doing an end-of-the-month error correction day, I am inclined to do this to minimize the effects of the errors, starting with this month. It has the virtue of adding suspense to our pennant races…  But now that we have seen how our sausage is made, can we live with it?  Does anyone have any ideas about how we should respond?
 
Second, we also discovered that BP and BR do not report intentional walks the same way.  BP treats them as a subset of regular walks, as our rc/g formula expects.  But BR counts them separately.  Which means Juan Soto’s BB total is only his unintentional BB.  Since our formula adds up total walks for rc/g purposes, and then slightly penalizes IBB (because they generate slightly fewer runs than unintentional BB), Soto is actually punished in our formula for getting so many IBB that do not feed into his BB total in BR. 
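
One possible shape for the fix is to normalize walks before they reach the rc/g formula, sketched here in Python. It takes the description above at face value (BP’s BB already includes IBB, BR’s does not); the function name and the numbers are hypothetical, and the rc/g formula itself is not shown.

```python
def total_walks(bb, ibb, source):
    """Return walks including intentional walks, which is what the rc/g formula expects."""
    if source == "BP":
        return bb          # BP: IBB is already a subset of BB
    if source == "BR":
        return bb + ibb    # BR, as described above: BB excludes IBB
    raise ValueError(f"unknown source: {source!r}")


# Hypothetical hitter with 20 intentional walks among 100 total walks.
print(total_walks(100, 20, "BP"))  # 100
print(total_walks(80, 20, "BR"))   # 100
```

Either source then feeds the same total into the formula, and the small IBB penalty is applied on top of it exactly once.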
 
We will change our formula. But when should we do it? 
 
I invite you to discuss these questions in the comments.  We’ll put the topic on the agenda for our Sep 2 meeting, although unless it seems urgent, we can put off any decisions until the winter meeting in December or January.
 
Finally, please forgive the sparse update this morning.  My budget for EFL time has been spent on the data error problem.  Dave, Jamie and I will be working on this more this afternoon. 

 

EFL Standings for 2021
EFL
TEAM WINS LOSSES PCT. GB RS RA
Old Detroit Wolverines 89 39 .693 - 734.3 489.3
Flint Hill Tornadoes 83 45 .649 5.5 676.3 495.9
D.C. Balk 80 47 .634 7.6 730.2 555.1
Kaline Drive 79 49 .615 9.9 691.5 546.0
Peshastin Pears 78 50 .612 10.3 631.8 507.4
Cottage Cheese 73 56 .566 16.1 728.8 655.7
Haviland Dragons 71 57 .553 17.8 664.3 617.4
Canberra Kangaroos 70 57 .551 18.2 655.4 603.2
Pittsburgh Alleghenys 70 60 .536 19.9 654.3 607.1
Bellingham Cascades 67 63 .518 22.3 560.2 540.8
Portland Rosebuds 61 67 .475 27.9 672.6 713.3
 
August 26 – 27 results, with only Friday’s daily stats visible to me: 
 
Old Detroit: L, 11  – 14.  A bad couple of days for Wolverine pitchers, but Byron Buxton is back! For a few hours, at least.
 
Flint Hill:  W, 3 – (-3).  They just won’t go away.  In fact, they gained 1.2 games in the last 48 hours. 13 out of 14 Tornadoes reached safely Friday.
 
DC: L, 7 – 8.  The Balk must have hit well on Thursday, because they went .121, .237, .273 on Friday.   It was a mixed bag for DC the last two days, but they still gained 0.5 games. 
 
Kaline: W 2, L (-1); 7 – 2.  The Drive drove past the softening Pears, and gained an entire game in the standings.  I’m not sure they’re out of it yet this season.
 
Peshastin: W 0, L 2; 6 – 16.  It’s been a tough week in Peshastin, dropping all the way to 5th from 3rd place.  But Mike Zunino hit another homer yesterday, his 27th of the season, to bring his OPS up to .859, so there’s that. 
 
Cottage:  W 2, L 0; 11 – 6. Cottage stole 6th place from the draggin’ Dragons, thanks in no small part to Steven Matz (6 ip, 1 er).  Only 5.8 games behind the Pears now!
 
Haviland: L, (-4) – 9.  The Dragons hit .118, .143, .147 in 35 PA Friday. That could easily tear a hole in your raft.
 
Canberra: W, 5 – 4. Friday wasn’t so hot: OPS was only .643; ERA was a Ruthian 7.14.  I’m guessing Thursday was a better day.  Still, the ‘Roos are within striking range of Haviland. 
 
Pittsburgh:  W 2, L 0; 15 – 10.  The Alleghenys gained 1.1 games the last two days while the Dragons lost 0.6.  At that pace, Pittsburgh will pass Haviland by the end of the month. 
 
Bellingham:  W 0, L 2; 5 – 11.  Lots of good pitching on Friday (14 ip, 3.86 ERA).  Thursday must have been tough for Cascade pitchers. 
 
Portland:  W 2, L 0; 12 – 4.  Even better pitching for the Rosebuds Friday (18.7 ip, 7 er, 3.37 ERA) but it was probably pretty good Thursday, too. So the Rosebuds continue to generally keep pace with the Wolverines. If this season lasted forever, the Rosebuds would still be about 27-28 games back. 
 
 
Combined MLB + EFL Standings for 2021
AL East
TEAM WINS LOSSES PCT. GB
Old Detroit Wolverines 89 39 .693
Flint Hill Tornadoes 83 45 .649 5.5
Tampa Bay Rays 80 48 .625 8.6
New York Yankees 76 52 .594 12.6
Boston Red Sox 74 56 .569 15.6
Toronto Blue Jays 66 61 .520 22.1
Baltimore Orioles 40 87 .315 48.1
NL East
TEAM WINS LOSSES PCT. GB
D.C. Balk 80 47 .634
Canberra Kangaroos 70 57 .551 10.5
Atlanta Braves 69 58 .543 11.5
Philadelphia Phillies 64 64 .500 17
New York Mets 61 67 .477 20
Washington Nationals 55 72 .433 25.5
Miami Marlins 53 76 .411 28.5
 
AL Central
TEAM WINS LOSSES PCT. GB
Chicago White Sox 75 55 .577
Pittsburgh Alleghenys 70 60 .536 5.3
Bellingham Cascades 67 63 .518 7.7
Cleveland Indians 63 63 .500 10
Detroit Tigers 62 67 .481 12.5
Kansas City Royals 58 70 .453 16
Minnesota Twins 56 72 .438 18
NL Central
TEAM WINS LOSSES PCT. GB
Milwaukee Brewers 78 51 .605
Cottage Cheese 73 56 .566 5
Cincinnati Reds 71 59 .546 7.5
St. Louis Cardinals 65 62 .512 12
Chicago Cubs 56 74 .431 22.5
Pittsburgh Pirates 47 82 .364 31
 
AL West
TEAM WINS LOSSES PCT. GB
Kaline Drive 79 49 .615
Houston Astros 76 52 .594 2.8
Haviland Dragons 71 57 .553 7.9
Oakland A’s 70 59 .543 9.3
Seattle Mariners 69 60 .535 10.3
Los Angeles Angels 63 67 .485 16.8
Texas Rangers 44 84 .344 34.8
NL West
TEAM WINS LOSSES PCT. GB
San Francisco Giants 83 45 .648
Los Angeles Dodgers 81 48 .628 2.5
Peshastin Pears 78 50 .612 4.7
San Diego Padres 69 61 .531 15
Portland Rosebuds 61 67 .475 22.2
Colorado Rockies 59 69 .461 24
Arizona Diamondbacks 44 86 .338 40
 
 

1 Comment

  • If there is no easy fix then we can just ignore them, assuming they are random thus affecting all teams similarly. Ignorance is bliss.