WAR discussion

ericmvan
Veteran

Supposed to be working on something more important

Posts: 8,908

WAR discussion Nov 7, 2014 18:57:13 GMT -5 jimed14 likes this

Quote

Post by ericmvan on Nov 7, 2014 18:57:13 GMT -5

I'm on record as hating fWAR and liking bWAR, but I have my own take on what kind of stats I'd like to see.

First, WAR has two separate functions and ought to be two separate stats. You absolutely want and need a retrospective WAR that represents the actual value to the team, for purposes of dividing credit and blame for team wins, and for MVP arguments, etc. And you also obviously want and need a prospective WAR that reflects the actual skill, and can be used for projections.

For pitchers, the two stats differ in numerous ways.

BABIP: This has a park and fielding support element that needs to be factored out of both WARs. Whatever difference from league average that remains is, for most pitchers, some luck and some skill (I would argue, more skill than usually thought; I've found that regressing defense-adjusted BABIP halfway to league average seems to work). Buchholz happens to be a guy with an unusually large skill component (both good and bad, depending on how he's going.) You want to factor out just the luck out of just the prospective WAR. Using FIP for that is like eating lobster with a garden rake.
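To make the "regress halfway" idea concrete, here is a minimal sketch. The function name and the sample numbers are made up for illustration; the only thing taken from the post is the 50% regression toward league average, applied after park and defense have already been factored out.

```python
def regressed_babip(defense_adjusted_babip, league_babip, regression=0.5):
    """Pull a pitcher's defense-adjusted BABIP part of the way to league average.

    regression=0.5 is the "halfway" figure from the post; everything else
    here is an illustrative assumption.
    """
    gap = defense_adjusted_babip - league_babip
    return league_babip + (1.0 - regression) * gap


# Hypothetical pitcher: .270 after park/defense adjustment, league at .300.
estimate = regressed_babip(0.270, 0.300)
print(f"{estimate:.3f}")
```

The half of the gap that survives the regression is what would be treated as skill for the prospective WAR; the other half is treated as luck.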

HR/FB: Has a park adjustment that you need for both WARs, and needs to be tweaked at least somewhat for prospective WAR.

Ideally, for prospective WAR, we'd measure the quality of the pitches with pitch/fx, and translate those directly into run values, and ignore the actual results entirely. If we're doing it right, there's no residual that correlates from one time period to the next. But that's a huge project.

Sequencing: See BABIP, although the skill component is probably much smaller. There are definitely pitchers who pitch relatively better or worse than others out of the stretch, and I believe that pitchers also differ in their batter-to-batter consistency; those that tend to lose their mechanics for stretches tend to have higher inherited runner-adjusted-RA than expected RA based on BaseRuns (or any other run metric), and there are probably opposite types as well. This is understudied.

Start-to-start inconsistency: All things being equal, the more inconsistent pitcher is better. This is currently reflected in WPA, but not any form of WAR. It's a small tweak, but it should be included entirely in ret-WAR and studied to see if any part of it is predictive and should be included in pro-WAR.

Pitching to the score: No one's found any evidence that it exists in terms of close games, but it's probably real when pitching with a huge lead, and it's certainly real when being left in to take a pounding when the team is losing big and the bullpen needs saving. All of this should be in ret-WAR (and is currently only in WPA), while pro-WAR should reflect some of the blowout differential.

Managerial hook support: In no known metric, but a manager who continually leaves a pitcher in too long (as Grady used to do with John Burkett) is skewing his numbers in a way that doesn't reflect his skill. I'd even take the difference out of ret-WAR and charge it to the manager! Lots of work needed to measure this, of course.

Catcher framing support: A can of worms we've just opened, made difficult by the probability that different pitchers benefit differently from framing. Christian Vazquez (sent back in time through a wormhole) catching Greg Maddux creates extra value from their interaction that would not exist if Maddux were pitching to an average catcher, or Vazquez catching an average pitcher. How do you divide it up for ret-WAR? And for pro-WAR, one could argue that if you've got a Greg Maddux throwing everything on the black, you can expect that any good organization would get him a good framing catcher, so some of that potential extra value should be part of his prospective WAR. Whereas a pitcher who benefits very little from framing would optimally be paired with a catcher whose strength is elsewhere (e.g., offense), so maybe he gets a negative interaction value for his framing.

Not to mention that no one is adjusting for opponent quality.

"You either need some medication or you're an a******." -- David Ortiz correctly diagnosing Bobby Valentine

wcsoxfan
Veteran

Posts: 2,318

WAR discussion Nov 8, 2014 17:11:37 GMT -5

Quote

Post by wcsoxfan on Nov 8, 2014 17:11:37 GMT -5

Nov 7, 2014 18:57:13 GMT -5 ericmvan said:

I'm on record as hating fWAR and liking bWAR, but I have my own take on what kind of stats I'd like to see.

First, WAR has two separate functions and ought to be two separate stats. You absolutely want and need a retrospective WAR that represents the actual value to the team, for purposes of dividing credit and blame for team wins, and for MVP arguments, etc. And you also obviously want and need a prospective WAR that reflects the actual skill, and can be used for projections.

For pitchers, the two stats differ in numerous ways.

BABIP: This has a park and fielding support element that needs to be factored out of both WARs. Whatever difference from league average that remains is, for most pitchers, some luck and some skill (I would argue, more skill than usually thought; I've found that regressing defense-adjusted BABIP halfway to league average seems to work). Buchholz happens to be a guy with an unusually large skill component (both good and bad, depending on how he's going.) You want to factor out just the luck out of just the prospective WAR. Using FIP for that is like eating lobster with a garden rake.

HR/FB: Has a park adjustment that you need for both WARs, and needs to be tweaked at least somewhat for prospective WAR.

Ideally, for prospective WAR, we'd measure the quality of the pitches with pitch/fx, and translate those directly into run values, and ignore the actual results entirely. If we're doing it right, there's no residual that correlates from one time period to the next. But that's a huge project.

Sequencing: See BABIP, although the skill component is probably much smaller. There are definitely pitchers who pitch relatively better or worse than others out of the stretch, and I believe that pitchers also differ in their batter-to-batter consistency; those that tend to lose their mechanics for stretches tend to have higher inherited runner-adjusted-RA than expected RA based on BaseRuns (or any other run metric), and there are probably opposite types as well. This is understudied.

Start-to-start inconsistency: All things being equal, the more inconsistent pitcher is better. This is currently reflected in WPA, but not any form of WAR. It's a small tweak, but it should be included entirely in ret-WAR and studied to see if any part of it is predictive and should be included in pro-WAR.

Pitching to the score: No one's found any evidence that it exists in terms of close games, but it's probably real when pitching with a huge lead, and it's certainly real when being left in to take a pounding when the team is losing big and the bullpen needs saving. All of this should be in ret-WAR (and is currently only in WPA), while pro-WAR should reflect some of the blowout differential.

Managerial hook support: In no known metric, but a manager who continually leaves a pitcher in too long (as Grady used to do with John Burkett) is skewing his numbers in a way that doesn't reflect his skill. I'd even take the difference out of ret-WAR and charge it to the manager! Lots of work needed to measure this, of course.

Catcher framing support: A can of worms we've just opened, made difficult by the probability that different pitchers benefit differently from framing. Christian Vazquez (sent back in time through a wormhole) catching Greg Maddux creates extra value from their interaction that would not exist if Maddux were pitching to an average catcher, or Vazquez catching an average pitcher. How do you divide it up for ret-WAR? And for pro-WAR, one could argue that if you've got a Greg Maddux throwing everything on the black, you can expect that any good organization would get him a good framing catcher, so some of that potential extra value should be part of his prospective WAR. Whereas a pitcher who benefits very little from framing would optimally be paired with a catcher whose strength is elsewhere (e.g., offense), so maybe he gets a negative interaction value for his framing.

Not to mention that no one is adjusting for opponent quality.

I completely agree - most frustrating thing about discussing baseball statistics is that many people don't realize the difference between performance statistics and predictive statistics. The two separate WAR statistics that you propose would make this much easier to explain.

One other thing I would like to do with WAR is to adjust the positional adjustment as it relates to defense. This is necessary because the positional adjustments assume 'no player could possibly play another position' which is of course ridiculous (David Ortiz could even play SS - he would just be brutally terrible at it). This would remove instances where a LF is valued higher in 2014 than in 2013 because other LFers didn't field as well in 2014; to me this change in value as a result of the player's peers in no way increases the player's value to his team or indicates that the player is going to be more valuable going forward. This is much easier to see when it comes to OFers as we can all easily see how an average CFer should always be more valuable (all things remaining equal) than an average LFer as in the majority of cases that CFer could switch to LF and become a superior LFer (in defense). With more modern defensive statistics I have to think this is possible but would be very tricky due to the following issues:

1. Each defensive form of measurement must be adjusted accordingly to the value it represents in winning a game. This is quite a task as you would have to place a value on a SS reaching a ball outside of his zone vs a 1B reaching a ball outside of his zone. Then you would have to weight a 1B catching a throw and tagging 1B for an out vs a 2B doing the same thing at 2nd base. This is an enormous task, but if done well would create a much more accurate WAR (or whatever you want to call it) calculation.

2. What do you do about the DH? As the DH doesn't field, he would create no positive or negative value to the team, thus making a DH appear to be much more valuable in general as most full-time DHs tend to be butchers in the field. Perhaps for each out that a player isn't involved in there is somehow a negative adjustment made to his value. This could not only help account for this issue but would also help account for position players as well, as players who don't get to balls in or out of their zone would have diminished value and players who play positions where there are few chances to field would have the same (fewer chances normally means less valuable defensive position).

3. What about pitchers? In theory every pitch could be considered a value of some sort - but this would be very difficult to integrate if the value measurement is based on outs or innings, since the number of pitches varies greatly and a pitcher's ability to reduce the number of pitches he throws per out is generally a good sign of future success (and reduces injury risk - not accounting for pitch leverage of course).

I think that throwing out league adjustments and replacing them with a 'strength of competition' adjustment would help quite a bit. (Think how college sports look at 'strength of schedule' rather than just 'they're in the SEC so they're good!').

If you throw out the 'in general' park factors and replace them with 'instance by instance' park factors (e.g. a 330' fly ball down the field in left is equally predictive as a HR or an out - but not for MVP voting as Eric mentions above) then this could help considerably and could also be used to have a better idea of how a player will perform when they change teams.

These things that I'm adding are A LOT more work than the current WAR statistics; but in an industry that is flush with money, I would be surprised if there aren't teams that already have something closer to this.

Last Edit: Nov 9, 2014 0:34:10 GMT -5 by wcsoxfan: edited to fix 'hit' with 'field'

jmei
Global Moderator

Posts: 15,065

WAR discussion Nov 8, 2014 19:10:38 GMT -5

Quote

Post by jmei on Nov 8, 2014 19:10:38 GMT -5

Nov 8, 2014 17:11:37 GMT -5 wcsoxfan said:

This would remove instances where a LF is valued higher in 2014 than in 2013 because other LFers didn't hit as well in 2014

I might be wrong, but as far as I know, this is not a thing that WAR (at least the Fangraphs version) does. Replacement level and positional adjustment do not vary based on how well the league hit at any given position.

What you might be thinking of is the fact that UZR calibrates itself to league-average every year. So Alex Gordon didn't actually field much better this year than he did last year, but LFers as a whole fielded much worse, so his UZR score went way up. I agree that this is a flaw that should be corrected, but it has a much more peripheral effect on how WAR is calculated.
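A toy illustration of the recalibration effect described above. This is not how UZR is actually computed; it is a deliberately simplified stand-in in which a fielder's score is his plays-made rate minus the league rate at his position, scaled into runs. The rates, the chance count, and the 0.8 runs-per-play figure are all hypothetical.

```python
def runs_vs_league(player_rate, league_rate, chances, runs_per_play=0.8):
    """Runs above the league-average fielder at the same position.

    A simplified stand-in for a league-relative defensive metric; every
    number fed into it below is invented for illustration.
    """
    return (player_rate - league_rate) * chances * runs_per_play


same_fielder = 0.92  # identical true performance in both seasons

season_one = runs_vs_league(same_fielder, league_rate=0.90, chances=300)
season_two = runs_vs_league(same_fielder, league_rate=0.87, chances=300)

print(f"season one: {season_one:+.1f} runs")
print(f"season two: {season_two:+.1f} runs")
```

The fielder did nothing differently, but his score rises in the second season purely because the league baseline at his position dropped.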

izzy
Rookie

Posts: 44

WAR discussion Nov 8, 2014 19:54:33 GMT -5

Quote

Post by izzy on Nov 8, 2014 19:54:33 GMT -5

Do you think the Red Sox have their own WAR stat that they use? I don't know what kind of pull Bill James has with the Red Sox but he's been pretty harsh about WAR's flaws (http://sportsworld.nbcsports.com/bill-james-statistical-revolution/). On the other hand, he's a good friend of TangoTiger and seems to really respect his baseball knowledge.

wcsoxfan
Veteran

Posts: 2,318

WAR discussion Nov 9, 2014 0:37:46 GMT -5

Quote

Post by wcsoxfan on Nov 9, 2014 0:37:46 GMT -5

Nov 8, 2014 19:10:38 GMT -5 jmei said:

Nov 8, 2014 17:11:37 GMT -5 wcsoxfan said:

This would remove instances where a LF is valued higher in 2014 than in 2013 because other LFers didn't hit as well in 2014

I might be wrong, but as far as I know, this is not a thing that WAR (at least the Fangraphs version) does. Replacement level and positional adjustment do not vary based on how well the league hit at any given position.

What you might be thinking of is the fact that UZR calibrates itself to league-average every year. So Alex Gordon didn't actually field much better this year than he did last year, but LFers as a whole fielded much worse, so his UZR score went way up. I agree that this is a flaw that should be corrected, but it has a much more peripheral effect on how WAR is calculated.

I should have placed 'field' instead of 'hit'. Was on a bit of a tangent so not surprised if there are a couple of mistakes. Thanks for catching it - corrected now.

Left the rest the same as I don't believe that UZR calibrates for positional adjustments but that there is a separate defensive metric which accounts for UZR and the defensive adjustment (as far as I know it's just called 'DEF')

jimed14
Veteran

Posts: 25,814

WAR discussion Nov 9, 2014 8:42:58 GMT -5

Quote

Post by jimed14 on Nov 9, 2014 8:42:58 GMT -5

This is kind of a confusing thread. I'm pretty sure Eric was only discussing pitcher WAR and now we're all over the place. It's kind of hard to talk about both pitcher and hitter WAR without two separate discussions.

I like fWAR for position players and bWAR for pitchers. And pitcher WAR has a long way to go because of a lot of the things mentioned in Eric's post. The inconsistency thing is a great point because a pitcher that gives up 1 run in 5 games and then gives up 10 in the 6th probably contributes to a lot more wins than someone giving up 3 runs per game. And how many times did Farrell push his starters out there for one inning too many when we all knew he should have given a RP a clean inning? I know about saving the bullpen, but sometimes it was like he was hoping for some luck on hard hit balls. Does that factor mean our pitchers are worth less than pitchers on another team when the manager rarely does that?
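A rough way to check the inconsistency point with numbers. This sketch reads the example as one run allowed in each of five starts and ten in the sixth, and it borrows the Pythagorean win expectation (exponent 2) as a per-game win probability, which is an assumption made here rather than anything established in the thread. The 4.5 runs of support per game is also a made-up figure.

```python
def win_prob(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean-style win expectancy, applied to a single game (assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_allowed_by_game, team_runs_per_game=4.5):
    """Sum per-game win probabilities over a list of runs-allowed totals."""
    return sum(win_prob(team_runs_per_game, ra) for ra in runs_allowed_by_game)


streaky = [1, 1, 1, 1, 1, 10]   # 15 runs allowed in total
steady = [3, 3, 3, 3, 3, 3]     # 18 runs allowed in total
steady_same_runs = [2.5] * 6    # 15 runs, spread evenly

for label, games in [("streaky", streaky), ("steady", steady),
                     ("steady, same runs", steady_same_runs)]:
    print(f"{label}: {expected_wins(games):.2f} expected wins")
```

The third line is there because the original example is not run-neutral: the streaky pitcher allows fewer runs overall. Under these assumptions he still comes out ahead of an even spread of the same 15 runs, which is the part WAR does not see.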

Last Edit: Nov 9, 2014 8:44:38 GMT -5 by jimed14

“We just lost a World Series game in 18 innings. But after that [meeting], it didn’t feel like we lost. It felt like we won.”

mgoetze
Veteran

Posts: 5,057

WAR discussion Nov 9, 2014 23:42:37 GMT -5

Quote

Post by mgoetze on Nov 9, 2014 23:42:37 GMT -5

Nov 9, 2014 8:42:58 GMT -5 jimed14 said:

This is kind of a confusing thread. I'm pretty sure Eric was only discussing pitcher WAR and now we're all over the place. It's kind of hard to talk about both pitcher and hitter WAR without two separate discussions.

Not to mention that wcsoxfan doesn't seem to be discussing WAR at all because he either dislikes or is utterly confused by the concept of "replacement level".

Ceterum censeo John Farrell esse dismissiendam.

mattpicard
Veteran

Posts: 4,024

WAR discussion Nov 10, 2014 0:45:05 GMT -5

Quote

Post by mattpicard on Nov 10, 2014 0:45:05 GMT -5

Nov 9, 2014 0:37:46 GMT -5 wcsoxfan said:

Nov 8, 2014 19:10:38 GMT -5 jmei said:

I might be wrong, but as far as I know, this is not a thing that WAR (at least the Fangraphs version) does. Replacement level and positional adjustment do not vary based on how well the league hit at any given position.

What you might be thinking of is the fact that UZR calibrates itself to league-average every year. So Alex Gordon didn't actually field much better this year than he did last year, but LFers as a whole fielded much worse, so his UZR score went way up. I agree that this is a flaw that should be corrected, but it has a much more peripheral effect on how WAR is calculated.

Left the rest the same as I don't believe that UZR calibrates for positional adjustments but that there is a separate defensive metric which accounts for UZR and the defensive adjustment (as far as I know it's just called 'DEF')

Right. UZR is made up of several components that calculate how many runs a player is above/below average at their specific position. The "DEF" metric is Fangraphs' way of saying, OK you can look to UZR to see how valuable a player is compared to others at their position, but how valuable is he compared to all other fielders? That's why elite defensive corner outfielders like Gordon and Heyward rank behind guys like Pedroia, Simmons, and Lagares in DEF despite having superior UZRs -- they simply are playing a much easier position, and that needs to be accounted for when calculating overall value (i.e., putting value into a form where players can be compared across positions). The adjustments Fangraphs uses may not be perfect, but they're necessary.
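A small sketch of how the UZR-to-DEF step works: DEF is fielding runs plus a positional adjustment prorated by playing time. The per-162-game run values below are the commonly cited FanGraphs figures, but treat them, the innings proration, and the two sample players as assumptions for illustration rather than an exact reproduction of the site's calculation.

```python
# Commonly cited positional adjustments, in runs per 162 defensive games.
POSITIONAL_RUNS_PER_162 = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}
FULL_SEASON_INNINGS = 162 * 9


def def_runs(uzr, position, innings):
    """Fielding runs plus a positional adjustment prorated by innings played."""
    adjustment = POSITIONAL_RUNS_PER_162[position] * innings / FULL_SEASON_INNINGS
    return uzr + adjustment


# Hypothetical players: an elite LF and a merely good SS, same playing time.
elite_lf = def_runs(uzr=15.0, position="LF", innings=1350)
good_ss = def_runs(uzr=5.0, position="SS", innings=1350)
print(f"LF: {elite_lf:+.1f}  SS: {good_ss:+.1f}")
```

With these inputs the shortstop ends up ahead in DEF despite a UZR ten runs lower, which is the Gordon/Heyward versus Simmons pattern described above.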

Last Edit: Nov 10, 2014 0:45:52 GMT -5 by mattpicard

ericmvan
Veteran

Supposed to be working on something more important

Posts: 8,908

WAR discussion Nov 10, 2014 8:43:39 GMT -5

Quote

Post by ericmvan on Nov 10, 2014 8:43:39 GMT -5

There's another significant problem with WAR, which is kind of a dirty secret. It's really only accurate for regular players. Have you ever noticed how many seasons by actual replacement players have negative WAR? That can't be right. There are in fact two reasons for that.

The agreed-upon replacement level is equivalent to a .230 TAv (EqA), which is said to be the offensive level of the average replacement. But that's not true. It's more or less the average offense level of all bench players and true replacements, combined. And that's mostly bench players.

I last studied this over the 2007-9 seasons. If you sort all seasons by innings in the field or at DH, descending, you get a strong correlation of TAv to innings. Better players play more, wow! To divide these player seasons into two chunks, the bottom one which averages .230 (weighted by PA, of course), you have to draw the line at 600 innings in the field, and that leaves 9.4 players per team per season above the line. That's basically one bench player per team (since AL teams have 9 regulars and NL teams have 8).

(Note that if we went to the trouble to identify injured players and move them upwards in the sorted data, it would make the currently defined .230 group worse, and we'd have to move the bar higher and make the "non-replacement" pool even smaller. So ignoring regulars who get hurt, and hence show up as bench players or replacement players, just gives us a conservative estimate of how bad those players are, and we can live with that understatement.)

We can draw the correct lines by including 8.5 and then 13 players per team per season, starting from the top. That's 680 innings in the field or more, and then 300 - 680. You get regulars hitting about .272, backups hitting about .244, and replacement players hitting about .211. (It's "about" because the figures need to be massaged so that each pool has the proper distribution of players by position -- I did that in the original study, but I was using slightly different borderlines.)
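The bucketing described above can be sketched roughly as follows. The 680 and 300 inning cut-offs and the PA weighting come from the post; the `PlayerSeason` type, the function names, and the six sample seasons are hypothetical, and the positional rebalancing mentioned in the parenthetical is not attempted.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    innings: float  # innings in the field or at DH
    pa: int         # plate appearances
    tav: float      # True Average (EqA)


def tier(season, regular_min=680, bench_min=300):
    """Assign a season to a playing-time tier using the post's cut-offs."""
    if season.innings >= regular_min:
        return "regular"
    if season.innings >= bench_min:
        return "bench"
    return "replacement"


def pa_weighted_tav(seasons):
    """Average TAv, weighting each season by its plate appearances."""
    total_pa = sum(s.pa for s in seasons)
    return sum(s.tav * s.pa for s in seasons) / total_pa


def tav_by_tier(seasons):
    """Group seasons by tier and return the PA-weighted TAv of each group."""
    groups = {}
    for s in seasons:
        groups.setdefault(tier(s), []).append(s)
    return {name: pa_weighted_tav(group) for name, group in groups.items()}


# Invented data, only to show the shape of the calculation.
sample = [
    PlayerSeason(1300, 650, 0.280), PlayerSeason(900, 450, 0.265),
    PlayerSeason(500, 250, 0.245), PlayerSeason(350, 150, 0.240),
    PlayerSeason(200, 80, 0.215), PlayerSeason(100, 40, 0.200),
]
for name, value in tav_by_tier(sample).items():
    print(f"{name}: {value:.3f}")
```

Run over real player seasons, this is the step that would produce the three tier averages quoted in the post, before any adjustment for positional mix.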

The second problem involves the very concept of "replacement level." If we define that as the average performance of those players, that means that in any given year ... half the replacement level players are below replacement level. That can't be right! A AAA scrub who gets called up and is a little below average for such a player does not have negative value. He has a tiny positive value for not being a bad replacement.

The TAv standard deviation (weighted by PA) of players with 150-300 innings is about .055. I would argue that to value bench and replacement level players accurately, we should set replacement level at 1.5 SD below average, which is a .130 TAv. That's the actual level of a worthless callup.
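The arithmetic behind the proposed scrub baseline, as a minimal sketch. It assumes the "average" being referred to is the roughly .211 replacement-tier figure from earlier in the post; the function name is made up.

```python
def scrub_baseline(mean_tav, sd_tav, sd_below=1.5):
    """Baseline set a fixed number of standard deviations below the tier mean."""
    return mean_tav - sd_below * sd_tav


# Assumed inputs: .211 tier average and .055 standard deviation from the post.
baseline = scrub_baseline(0.211, 0.055)
print(f"{baseline:.4f}")
```

That works out to roughly .1285, in line with the .130 quoted above once you allow for the "about" on both inputs.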

So how does this affect the WAR we use?

Regulars have an adjustment for PT that is probably a little excessive, although I'm not certain of that. When they are hurt and miss PT, they are replaced by .245-ish players, not .230. Those bench players are replaced by .212 players, on average. Bench player innings are about 23% of regular innings, so when you do the chaining, you get .236 as a baseline for Wins Above Bench. (I do want to think more about the most accurate way of doing the chaining.)
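The post does not spell out how the chaining was done, so the following is only one possible reading that happens to land on about .236: start from the bench level and charge 23% of the bench-to-replacement gap for the knock-on effect of the bench player's own spot being backfilled. The formula and names are assumptions; the inputs use the .244 and .211 figures from earlier in the post.

```python
def chained_baseline(bench_tav, replacement_tav, bench_share=0.23):
    """One guess at the chaining: bench level minus a share of the drop
    from bench to replacement. Not confirmed as the author's method."""
    return bench_tav - bench_share * (bench_tav - replacement_tav)


print(f"{chained_baseline(0.244, 0.211):.3f}")
```

Using the rounder .245 and .212 instead moves the result up by about a point, so the exact baseline is sensitive to which tier averages go in.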

Bench and replacement level players are getting hosed. When bench players get hurt, in reality, some of that missing PT is taken by lesser bench players on their team, whose PT is in turn taken by replacement level players. But much of it is taken by replacement level players, and we should use a .130 baseline for that.

The way you do this is to calculate a WAB (.245-ish baseline plus chaining TBD) and WAS (Wins above Scrub, .130 baseline) for each player. WAR would be calculated by weighting the two based on PT. A full-time player's WAR is his WAB, a scrub's WAR is his WAS, and in between we gradually shift the mix as to get a smooth curve with no paradoxes where playing less would have made you more valuable because you were shifted into a group with a different baseline.
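A minimal sketch of the blending step, assuming WAB and WAS have already been computed for the player. The post only asks for a smooth shift between the two with playing time; the smoothstep curve and the 300 and 680 inning endpoints (borrowed from the tier cut-offs earlier in the post) are choices made here for illustration.

```python
def blend_weight(innings, scrub_max=300.0, regular_min=680.0):
    """Weight on WAB: 0 at or below scrub_max, 1 at or above regular_min,
    with a smoothstep curve in between (the curve is an assumption)."""
    t = (innings - scrub_max) / (regular_min - scrub_max)
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def blended_war(wab, was, innings):
    """Mix Wins Above Bench and Wins Above Scrub according to playing time."""
    w = blend_weight(innings)
    return w * wab + (1.0 - w) * was


# Hypothetical player with 2.0 WAB and 3.0 WAS at various workloads.
for innings in (100, 490, 1000):
    print(innings, round(blended_war(2.0, 3.0, innings), 2))
```

A smooth weight removes the cliff at the tier boundaries, but whether the final number is free of "play less, be worth more" paradoxes also depends on how WAB and WAS themselves scale with playing time, which this sketch does not address.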

Now, if we did this, and hence valued bench players and scrubs accurately, it would compress the scale between them and the regulars. And that would in turn require us to think about WAR scarcity. A 6.0 WAR player is vastly more valuable than three 2.0 WAR players, and everyone knows that, but at current we have no way of putting a value on that. We live without that adjustment only because, in fact, the WAR scale is erroneous and already expanded downward so that a true 2.0 WAR player is probably showing up as 1.0 WAR (which really means more like 1.0 WAB, remember). So in this new way of doing WAR, there would be a Scarcity value, based on a simple power function of WAR, that would accurately represent the relationship of nominal WAR to team wins. I don't know if such studies have been done, but it's possible that you couldn't do them accurately without first fixing the unadjusted WAR scale.
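To show what a scarcity adjustment based on a power function might look like, here is a sketch. The exponent of 1.2 is purely a placeholder; the post says the right relationship would have to come from studies that may not exist yet.

```python
def scarcity_value(war, exponent=1.2):
    """Convex transform of WAR so that concentrated value is worth more.

    The exponent is a hypothetical placeholder, not an estimated parameter.
    Negative WAR is mirrored so the function stays defined and odd.
    """
    sign = 1.0 if war >= 0 else -1.0
    return sign * abs(war) ** exponent


one_star = scarcity_value(6.0)
three_average = 3 * scarcity_value(2.0)
print(f"one 6.0 player: {one_star:.2f}")
print(f"three 2.0 players: {three_average:.2f}")
```

Any exponent above 1 makes the single 6.0 player worth more than the three 2.0 players combined; how much more is exactly the thing that would need to be fit to team wins.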

(And, yes, someday this will be a widely distributed article, but with the last 10 years of data, and as many loose ends tied up as possible!)

"You either need some medication or you're an a******." -- David Ortiz correctly diagnosing Bobby Valentine

rjp313jr
Veteran

Posts: 14,000

WAR discussion Nov 10, 2014 12:17:00 GMT -5 via the ProBoards App

Quote

Post by rjp313jr on Nov 10, 2014 12:17:00 GMT -5

Good stuff Eric. It just highlights how a lot of these advanced statistics are extremely flawed and make some big assumptions yet are spoken about by the community at large like they are Gospel.

WAR has always bothered me, and always will, when people talk about it as if it's a real way to measure wins on the field. Besides the fact that it's just not accurate, part of building a team is fitting the pieces together so they work as one. WAR cannot capture this.

mgoetze
Veteran

Posts: 5,057

WAR discussion Nov 10, 2014 12:32:03 GMT -5 thursty likes this

Quote

Post by mgoetze on Nov 10, 2014 12:32:03 GMT -5

Nov 10, 2014 12:17:00 GMT -5 rjp313jr said:

Good stuff Eric. It just highlights how a lot of these advanced statistics are extremely flawed and make some big assumptions yet are spoken about by the community at large like they are Gospel.

WAR has and always will bother me when people truly talk about it as if it's a real way to measure wins on the field. Besides the fact that it's just not accurate, part of building a team is fitting the pieces together so they work as one. WAR cannot capture this.

This post just highlights how some people always read what they wanted to read, regardless of what was written.

Ceterum censeo John Farrell esse dismissiendam.

rjp313jr
Veteran

Posts: 14,000

WAR discussion Nov 10, 2014 12:39:31 GMT -5 via the ProBoards App

Quote

Post by rjp313jr on Nov 10, 2014 12:39:31 GMT -5

Yea I guess I misread the post about how significantly flawed WAR was to read it was a flawed statistic.... My bad

raftsox
Veteran

Posts: 678

WAR discussion Nov 10, 2014 12:45:35 GMT -5

Quote

Post by raftsox on Nov 10, 2014 12:45:35 GMT -5

Nov 8, 2014 17:11:37 GMT -5 wcsoxfan said:

One other thing I would like to do with WAR is to adjust the positional adjustment as it relates to defense.
2. What do you do about the DH? As the DH doesn't field, he would create no positive or negative value to the team, thus making a DH appear to be much more valuable in general as most full-time DHs tend to be butchers in the field. Perhaps for each out that a player isn't involved in there is somehow a negative adjustment made to his value. This could not only help account for this issue but would also help account for position players as well, as players who don't get to balls in or out of their zone would have diminished value and players who play positions where there are few chances to field would have the same (fewer chances normally means less valuable defensive position).

I agree, but take a different approach. I think the baseline should be set by offense with defensive position and ability adding value. Therefore in my world a DH gets no value taken away.

However, I think the defensive adder is too great as it's presently constructed. In my opinion, a "routine" play doesn't add value, but it can take value away if muffed. I personally believe that the pitcher and defensive alignment are the lion's share of the defensive pie.

mgoetze
Veteran

Posts: 5,057

WAR discussion Nov 10, 2014 12:49:00 GMT -5

Quote

Post by mgoetze on Nov 10, 2014 12:49:00 GMT -5

Nov 10, 2014 12:39:31 GMT -5 rjp313jr said:

Yea I guess I misread the post about how significantly flawed WAR was to read it was a flawed statistic.... My bad

It's just flawed in completely different ways than the ones you want it to be.

Ceterum censeo John Farrell esse dismissiendam.

wcsoxfan
Veteran

Posts: 2,318

WAR discussion Nov 10, 2014 16:38:34 GMT -5

Quote

Post by wcsoxfan on Nov 10, 2014 16:38:34 GMT -5

Nov 10, 2014 12:45:35 GMT -5 raftsox said:

I agree, but take a different approach. I think the baseline should be set by offense with defensive position and ability adding value. Therefore in my world a DH gets no value taken away.

However, I think the defensive adder is too great as it's presently constructed. In my opinion, a "routine" play doesn't add value, but it can take value away if muffed. I personally believe that the pitcher and defensive alignment are the lion's share of the defensive pie.

The problem with keeping a DH 'neutral' and subtracting when a player misses a 'routine play' is that it would leave you with a player like Ortiz actually lowering his value when he plays 1B. A player who is below average at a given position shouldn't have value added for playing DH rather than that position. This part is a bit tricky.

But the above issue is more of a problem for the 'predictive' WAR calculation and less important for the 'real value gained' calculation. Then again, I would hope that a team could predict, within some reason, a player's value if they were moved to DH.

Agree with you on the pitcher part, but not so sure about the 'defensive position part'. I think defenders' reactions and 'first step' are the most important thing for most infielders at least.

dewey1972
Rookie

Posts: 134

WAR discussion Nov 12, 2014 23:54:50 GMT -5

Quote

Post by dewey1972 on Nov 12, 2014 23:54:50 GMT -5

Nov 10, 2014 8:43:39 GMT -5 ericmvan said:

There's another significant problem with WAR, which is kind of a dirty secret. It's really only accurate for regular players. Have you ever noticed how many seasons by actual replacement players have negative WAR? That can't be right. There are in fact two reasons for that.

The agreed-upon replacement level is equivalent to a .230 TAv (EqA), which is said to be the offensive level of the average replacement. But that's not true. It's more or less the average offense level of all bench players and true replacements, combined. And that's mostly bench players.

I last studied this over the 2007-9 seasons. If you sort all seasons by innings in the field or at DH, descending, you get a strong correlation of TAv to innings. Better players play more, wow! To divide these player seasons into two chunks, the bottom one which averages .230 (weighted by PA, of course), you have to draw the line at 600 innings in the field, and that leaves 9.4 players per team per season above the line. That's basically one bench player per team (since AL teams have 9 regulars and NL teams have 8).

(Note that if we went to the trouble to identify injured players and move them upwards in the sorted data, it would make the currently defined .230 group worse, and we'd have to move the bar higher and make the "non-replacement" pool even smaller. So ignoring regulars who get hurt, and hence show up as bench players or replacement players, just gives us a conservative estimate of how bad those players are, and we can live with that understatement.)

We can draw the correct lines by including 8.5 and then 13 players per team per season, starting from the top. That's 680 innings in the field or more, and then 300 - 680. You get regulars hitting about .272, backups hitting about .244, and replacement players hitting about .211. (It's "about" because the figures need to be massaged so that each pool has the proper distribution of players by position -- I did that in the original study, but I was using slightly different borderlines.)

The second problem involves the very concept of "replacement level." If we define that as the average performance of those players, that means that in any given year ... half the replacement level players are below replacement level. That can't be right! A AAA scrub who gets called up and is a little below average for such a player does not have negative value. He has a tiny positive value for not being a bad replacement.

The TAv standard deviation (weighted by PA) of players with 150-300 innings is about .055. I would argue that to value bench and replacement level players accurately, we should set replacement level at 1.5 SD below average, which is a .130 TAv. That's the actual level of a worthless callup.
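
The arithmetic behind that .130 can be checked in a couple of lines. This is only a sketch: it assumes the "average" being referred to is the roughly .211 replacement-pool figure quoted earlier in the post, which the post does not state outright, and the function name is hypothetical.

```python
def scrub_baseline(pool_mean_tav: float, pool_sd_tav: float, sd_below: float = 1.5) -> float:
    """Baseline set a fixed number of standard deviations below the pool average."""
    return pool_mean_tav - sd_below * pool_sd_tav


# Assumption: pool mean of about .211 (the replacement-pool figure from the post)
# and the quoted SD of about .055.
baseline = scrub_baseline(0.211, 0.055)
print(f"{baseline:.4f}")
```

Under that assumption the result lands within a point or two of the .130 in the post; a pool mean of .212 gets you closer still.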

So how does this affect the WAR we use?

Regulars have an adjustment for PT that is probably a little excessive, although I'm not certain of that. When they are hurt and miss PT, they are replaced by .245-ish players, not .230. Those bench players are replaced by .212 players, on average. Bench player innings are about 23% of regular innings, so when you do the chaining, you get .236 as a baseline for Wins Above Bench. (I do want to think more about the most accurate way of doing the chaining.)
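
One possible reading of the chaining, sketched below, is a simple weighted average: 23% of the missing regular's time ends up being covered at the replacement level and the rest at the bench level. That reading is an assumption (the post itself says the chaining method still needs more thought), and the function name is hypothetical. The inputs are the .244 and .211 pool figures quoted earlier.

```python
def chained_bench_baseline(bench_tav: float, replacement_tav: float, chain_share: float) -> float:
    """Weighted average of bench and replacement levels.

    chain_share is the fraction of the vacated playing time that ultimately
    falls to replacement-level players once the bench moves up.
    """
    return (1.0 - chain_share) * bench_tav + chain_share * replacement_tav


# Figures from the post: backups about .244, replacements about .211, 23% share.
wab_baseline = chained_bench_baseline(0.244, 0.211, 0.23)
print(round(wab_baseline, 3))
```

Using the rounder .245 and .212 instead moves the answer up by about a point, so the exact inputs matter at the third decimal.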

Bench and replacement level players are getting hosed. When bench players get hurt, in reality, some of that missing PT is taken by lesser bench players on their team, whose PT is in turn taken by replacement level players. But much of it is taken by replacement level players, and we should use a .130 baseline for that.

The way you do this is to calculate a WAB (.245-ish baseline plus chaining TBD) and WAS (Wins above Scrub, .130 baseline) for each player. WAR would be calculated by weighting the two based on PT. A full-time player's WAR is his WAB, a scrub's WAR is his WAS, and in between we gradually shift the mix so as to get a smooth curve with no paradoxes where playing less would have made you more valuable because you were shifted into a group with a different baseline.
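
A rough sketch of what that weighting might look like, using a straight-line shift between two innings thresholds. The linear shape and the choice of 300 and 680 innings as the endpoints are assumptions borrowed from the pool boundaries above, not something the post specifies, and the function names are hypothetical.

```python
def blend_weight(innings: float, scrub_cap: float = 300.0, regular_floor: float = 680.0) -> float:
    """Share of WAB in the blend: 0 at or below scrub_cap, 1 at or above regular_floor.

    Both thresholds are illustrative assumptions.
    """
    if innings <= scrub_cap:
        return 0.0
    if innings >= regular_floor:
        return 1.0
    return (innings - scrub_cap) / (regular_floor - scrub_cap)


def blended_war(wab: float, was: float, innings: float) -> float:
    """Mix Wins Above Bench and Wins Above Scrub according to playing time."""
    w = blend_weight(innings)
    return w * wab + (1.0 - w) * was


full_timer = blended_war(wab=2.0, was=3.5, innings=900)   # all WAB
scrub = blended_war(wab=0.1, was=0.6, innings=150)        # all WAS
tweener = blended_war(wab=1.0, was=2.0, innings=490)      # halfway between
```

A linear weight is continuous, but it does not by itself guarantee the "no paradoxes" property; that would have to be checked against how WAB and WAS each grow with playing time.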

Now, if we did this, and hence valued bench players and scrubs accurately, it would compress the scale between them and the regulars. And that would in turn require us to think about WAR scarcity. A 6.0 WAR player is vastly more valuable than three 2.0 WAR players, and everyone knows that, but at present we have no way of putting a value on that. We live without that adjustment only because, in fact, the WAR scale is erroneous and already expanded downward so that a true 2.0 WAR player is probably showing up as 1.0 WAR (which really means more like 1.0 WAB, remember). So in this new way of doing WAR, there would be a Scarcity value, based on a simple power function of WAR, that would accurately represent the relationship of nominal WAR to team wins. I don't know if such studies have been done, but it's possible that you couldn't do them accurately without first fixing the unadjusted WAR scale.
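
To make the "simple power function" idea concrete, here is a toy sketch. The exponent of 1.2 is purely a placeholder (the post gives no value, and says the studies may not exist yet), and the function name is hypothetical. The only point is that any exponent above 1 makes one 6.0 WAR player worth more than three 2.0 WAR players.

```python
import math


def scarcity_value(war: float, exponent: float = 1.2) -> float:
    """Power-function adjustment of nominal WAR.

    The exponent is a made-up placeholder, not a fitted value. The sign is
    preserved so that negative WAR stays negative.
    """
    return math.copysign(abs(war) ** exponent, war)


one_star = scarcity_value(6.0)
three_average = 3 * scarcity_value(2.0)
print(one_star > three_average)
```

Fitting the exponent would mean regressing team wins on roster WAR, which, as noted, probably requires fixing the underlying scale first.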

(And, yes, someday this will be a widely distributed article, but with the last 10 years of data, and as many loose ends tied up as possible!)

Eric, I have two questions. The first relates to your feelings about fWAR. I'm pretty sure that you think it's worthless, but I sometimes think you might be having a MGL moment in which your comment seems extremely negative, but really you think fWAR's decent, you just want to clarify all of the things you think aren't good about it. Am I right that you think it's worthless?

The second follows the first. So if I'm wrong and you don't think it's worthless, ignore this one. You have obviously thought about these things on a much deeper level than I ever will. There's a lot about fWAR that makes sense to me, but that's not the reason I assume it's useful. The reason I trust it's not worthless is because people like Tom Tango and many commenters on his site (guys who produce high-quality research) consistently refer to it and judge it as useful. Why do you think they would ignore the serious flaws you seem to see in it? Do they not see the flaws? Do they think it's good enough?

Last Edit: Nov 12, 2014 23:56:11 GMT -5 by dewey1972

raftsox
Veteran

Posts: 678

WAR discussion Nov 13, 2014 11:40:47 GMT -5

Quote

Post by raftsox on Nov 13, 2014 11:40:47 GMT -5

Nov 10, 2014 16:38:34 GMT -5 wcsoxfan said:

Nov 10, 2014 12:45:35 GMT -5 raftsox said:

I agree, but take a different approach. I think the baseline should be set by offense with defensive position and ability adding value. Therefore in my world a DH gets no value taken away.

However, I think the defensive adder is too great as it's presently constructed. In my opinion, a "routine" play doesn't add value, but it can take value away if muffed. I personally believe that the pitcher and defensive alignment are the lion's share of the defensive pie.

The problem with keeping a DH 'neutral' and subtracting when a player misses a 'routine play' is that it would leave you with a player like Ortiz actually lowering his value when he plays 1B. A player who is below average at a given position shouldn't have value added for playing DH rather than that position. This part is a bit tricky.

But the above issue is more of a problem for the 'predictive' WAR calculation and less important for the 'real value gained' calculation. Then again, I would hope that a team could predict, within reason, a player's value if they were moved to DH.

Agree with you on the pitcher part, but not so sure about the 'defensive position' part. I think a defender's reactions and 'first step' are the most important things, for most infielders at least.

BOLD: That's kinda the point. Ortiz shouldn't be artificially rewarded by playing a terrible first base. Conversely, if someone is so terrible at defense that their value detracts from the offensive baseline, then they shouldn't be in the field.
ITALIC: Either the value ascribed to defense is too great, or the way value is shared between pitcher and fielder needs to be rethought. I personally think a routine play (e.g., a low-leverage, middling grounder to second) is an out owed entirely to the pitcher's ability to induce weak, easily defended contact, whereas a smoking grounder deep in the hole that Andrelton Simmons gets to, throwing the runner out, is completely on him. However, while the current model adds more value for more difficult plays, it still adds value for routine plays based on the runs-saved assumption of (roughly) 75% pitcher/25% fielder.

jimed14
Veteran

Posts: 25,814

WAR discussion Nov 13, 2014 17:00:37 GMT -5

Quote

Post by jimed14 on Nov 13, 2014 17:00:37 GMT -5

If Farrell was dumb enough to play Papi at third base next year, would he be a worse player than he is now?

“We just lost a World Series game in 18 innings. But after that [meeting], it didn’t feel like we lost. It felt like we won.”

lonborgski
Rookie

Posts: 90

WAR discussion Nov 14, 2014 4:00:30 GMT -5

Quote

Post by lonborgski on Nov 14, 2014 4:00:30 GMT -5

Nov 12, 2014 23:54:50 GMT -5 dewey1972 said:

Nov 10, 2014 8:43:39 GMT -5 ericmvan said:

. . . a MGL moment . . .

What's "a MGL moment"?

"I'm sure we could tie you to a futon and have Maura Tierney and Liv Tyler patiently explain the concept of Major League Equivalences to you, and your understanding of the game of baseball would . . . remain limp." Eric Van 8/13/05 (at SoSH)

fenwaythehardway
Veteran

Posts: 5,573

WAR discussion Nov 14, 2014 11:13:16 GMT -5

Quote

Post by fenwaythehardway on Nov 14, 2014 11:13:16 GMT -5

Nov 13, 2014 17:00:37 GMT -5 jimed14 said:

If Farrell was dumb enough to play Papi at third base next year, would he be a worse player than he is now?

Yes. He would cost the team wins.

jimed14
Veteran

Posts: 25,814

WAR discussion Nov 14, 2014 13:30:26 GMT -5

Quote

Post by jimed14 on Nov 14, 2014 13:30:26 GMT -5

Nov 14, 2014 11:13:16 GMT -5 fenwaythehardway said:

Nov 13, 2014 17:00:37 GMT -5 jimed14 said:

If Farrell was dumb enough to play Papi at third base next year, would he be a worse player than he is now?

Yes. He would cost the team wins.

And then there's Gomes vs. RHP and Nava vs. LHP.

“We just lost a World Series game in 18 innings. But after that [meeting], it didn’t feel like we lost. It felt like we won.”

ericmvan
Veteran

Supposed to be working on something more important

Posts: 8,908

WAR discussion Nov 14, 2014 15:44:59 GMT -5

Quote

Post by ericmvan on Nov 14, 2014 15:44:59 GMT -5

Nov 14, 2014 11:13:16 GMT -5 fenwaythehardway said:

Nov 13, 2014 17:00:37 GMT -5 jimed14 said:

If Farrell was dumb enough to play Papi at third base next year, would he be a worse player than he is now?

Yes. He would cost the team wins.

You're both right, and importantly so.

He would have a lower retrospective WAR, but his prospective WAR wouldn't change at all. The former measures contribution to team wins, the latter measures player quality.

For a real world example, Ryan Howard has been a better player than his WAR would indicate, because his manager keeps throwing him out there to be below replacement level against LHP. So you can add platoon usage to the list of things that should be captured by pro-WAR, but are currently being ignored because no one is even making the distinction between the two functional varieties.

(I'll answer the fWAR question when I get the chance, BTW!)

"You either need some medication or you're an a******." -- David Ortiz correctly diagnosing Bobby Valentine

wcsoxfan
Veteran

Posts: 2,318

WAR discussion Nov 14, 2014 16:16:27 GMT -5

Quote

Post by wcsoxfan on Nov 14, 2014 16:16:27 GMT -5

I think we're getting to a point where there is no single number for a predictive WAR. Instead there would have to be an outline of situational instances that would have to be plugged in (position, team, vs LHP, vs RHP, etc.).

I guess the question is: "how accurate would this be?"

Essentially the 'large sample size' would be reduced to several smaller sample sizes or estimates (like how Ortiz would perform at 3B, given his small sample of recent playing time at 1B and other players' drop in defensive performance when shifted from 1B to 3B).

I think the above should be done and would be quite interesting, but there will always be flaws, as statistics need to be weighted differently on a case-by-case basis in order to forecast as accurately as possible. I'm sure we will see something like this soon.

dewey1972
Rookie

Posts: 134

WAR discussion Nov 14, 2014 23:08:34 GMT -5

Quote

Post by dewey1972 on Nov 14, 2014 23:08:34 GMT -5

Nov 14, 2014 4:00:30 GMT -5 lonborgski said:

Nov 12, 2014 23:54:50 GMT -5 dewey1972 said:

What's "a MGL moment"?

Mitchel Lichtman (often referred to, both by others and himself, as MGL) is a prominent sabermetrician who co-wrote The Book with Tom Tango and Andrew Dolphin. He is incredibly blunt in his criticism. Sometimes he will write an incredibly long comment mentioning numerous flaws in an argument, only to end with something like "Good piece overall, though."
