The views expressed by the members of this Forum do not necessarily reflect the views of SoxProspects, LLC.
© 2003-2024 SoxProspects, LLC
Price/Porcello for Cy Young
|
Post by manfred on Sept 22, 2016 16:28:00 GMT -5
Quote: The only reason the run support would hurt him would be if voters were going to grant outsized value to his W-L record in the first place. Like, if Kluber beats Porcello, it's not going to be because they disqualified Porcello based on having better run support but rather because they decided that Kluber's numbers were superior before reaching that point. It's really Kluber's to lose. He leads the league in both ERA+ and FIP (and by extension, both bWAR and fWAR). He's going to end up north of 220 strikeouts. He's at 7.0 H/9, which is outstanding, and his .272 BABIP isn't remarkably lucky or anything.

It's Kluber's to lose... he leads in both ERA+ and FIP. Cookies accepted.

Seriously, though, my point returns to two partly related matters: what do voters look for, and what should they look for. What they do look for (and I tried to outline this earlier) varies to a degree, but when there are 20-game winners, they almost always win CY, even when there are pitchers who are more "dominant" in peripheral stats (the great exception in the last 14 contests being CC Sabathia and Felix).

What they should vote on are the traditional stats first: wins do matter, ERA does matter, innings, maybe WHIP etc. The question of how those wins come about should matter on some level, but when one starts saying, for example, that Kluber's first-place team vs. Porcello's first-place team should tip the balance slightly to Kluber, it seems like we are moving away from the bottom line: that they pitch for almost identically good teams and Porcello has had, in the most important results-oriented categories, the better year.
|
|
|
Post by jchang on Sept 22, 2016 21:59:28 GMT -5
This time next year, Price, Porcello and Buchholz will be in a 3-way tie for CY. Sorry deepJohn, I don't see Kopech in this group yet.
|
|
|
Post by dcsoxfan on Sept 23, 2016 11:58:26 GMT -5
I think Rick Porcello should be the Cy Young winner. While I would agree that wins is, in general, a meaningless statistic, in this case it is actually indicative of something very real: consistency.
Porcello has pitched at least five innings every outing and has pitched at least six in all but three. He has only allowed 5 ER once compared to Verlander (3), Tanaka (4), Kluber (5) and Sale (6). He has allowed 4 or more runs only six times (better than everyone but Verlander).
These five pitchers are incredibly close, but Porcello has put his team in position to win more often.
|
|
ericmvan
Veteran
Supposed to be working on something more important
Posts: 8,911
|
Post by ericmvan on Sept 23, 2016 12:19:58 GMT -5
[Quoting manfred's post of Sept 22, 2016 16:28:00]

Porcello leads the league in ERA-, though. Most voters will break the virtual tie in ERA by W/L record rather than by deeper metrics. I still plan to look at them both in detail, and maybe Sale, Tanaka, and Verlander, too.
Let's see when they each will make their final start:
Porcello: Friday 9/30
Kluber: Saturday 10/1, unless they go to a 4-man rotation
Sale: Sunday 10/2
Tanaka: 10/1
Verlander: 10/2
|
|
gerry
Veteran
Enter your message here...
Posts: 1,656
|
Post by gerry on Sept 23, 2016 13:31:59 GMT -5
[Quoting dcsoxfan's post of Sept 23, 2016 11:58:26]

Which seems the best definition of most valuable.
|
|
|
Post by jmei on Sept 23, 2016 14:36:45 GMT -5
|
|
|
Post by ericmvan on Sept 23, 2016 15:42:54 GMT -5
[Quoting dcsoxfan's post of Sept 23, 2016 11:58:26 and gerry's reply]

However ... for pitchers with the same ERA and IP, the less consistent pitcher is more valuable. Imagine two guys with a 4.50 ERA. The first guy allows 3 runs in 6 innings every game, for 32 starts. The second guy has 3 starts where he allows 5 runs in 5 IP. In those 3 games, he gave you less of a chance to win. But to offset those 3 bad starts and have the same total of IP and ER allowed, he also has 3 games where he allows 2 runs in 6 IP and 3 games where he allows 2 runs in 7 IP. He helps the team more in those six games than he hurts them in the 3 bad games.

You can prove that point by taking this to an absurd extreme. The guy with a 4.50 ERA, 32 GS, and 192 IP you'd really want is one who gave up all of his 96 ER in one game, without retiring a batter, and pitched 6 shutout innings in every other start. That guy would be the most valuable pitcher in baseball.

The more you cluster your badness, the more valuable you are. Giving up runs 4 and 5 is less damaging to your chances of winning than giving up runs 2 and 3.
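A quick arithmetic check of the two hypothetical pitchers above, as a minimal sketch. It assumes the second pitcher matches the first (3 runs in 6 innings) in his other 23 starts, which the post implies but does not spell out; the function and variable names are illustrative only.

```python
def season_totals(starts):
    """Sum (innings, earned_runs) pairs and return (IP, ER, ERA)."""
    ip = sum(innings for innings, _ in starts)
    er = sum(runs for _, runs in starts)
    return ip, er, 9 * er / ip

# Pitcher A: 3 runs in 6 innings in all 32 starts.
consistent = [(6, 3)] * 32

# Pitcher B: three 5-run/5-inning starts, offset by three 2-run/6-inning
# and three 2-run/7-inning starts.
# ASSUMPTION: his remaining 23 starts are identical to Pitcher A's.
inconsistent = [(5, 5)] * 3 + [(6, 2)] * 3 + [(7, 2)] * 3 + [(6, 3)] * 23

print(season_totals(consistent))
print(season_totals(inconsistent))
```

Both profiles come out to 192 IP, 96 ER and a 4.50 ERA, so any difference in value between them has to come from how the runs are distributed across starts.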
|
|
|
Post by dcsoxfan on Sept 23, 2016 16:20:33 GMT -5
[Quoting ericmvan's post of Sept 23, 2016 15:42:54]

Good point.
|
|
|
Post by wcsoxfan on Sept 23, 2016 22:01:56 GMT -5
[Quoting ericmvan's post of Sept 23, 2016 15:42:54]

This is only half the equation. To follow your own extreme example: if the pitcher's team scored exactly zero runs in every game except for the one where the pitcher gave up 96 runs, and in that game they scored 95 runs, then the pitcher's inconsistency would have made him as unvaluable as possible. So you really can't say whether the consistency is good or bad without the other part of the equation, but I would argue that as long as the pitcher's offense averages more runs in the games he pitches than they average giving up in those games, then consistency is more valuable than inconsistency.

(So in the case of most teams with positive run differentials it's better to be consistent, and in the case of most teams with negative run differentials it's better to be inconsistent, unless you're making extreme examples like we both do above, in which case you can make any case you want as long as you fudge the numbers to your liking.)
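The "other half of the equation" can be illustrated with a toy sketch. It assumes the offense scores exactly the same number of runs in every game (an unrealistic simplification, chosen only to isolate the effect), ignores bullpens, and counts a tie as half a win; none of these assumptions come from the thread.

```python
def wins(runs_allowed, runs_scored_per_game):
    """Count wins when the offense scores a fixed number of runs every game.

    ASSUMPTION: a tie is treated as a coin flip and counted as half a win.
    """
    total = 0.0
    for ra in runs_allowed:
        if runs_scored_per_game > ra:
            total += 1.0
        elif runs_scored_per_game == ra:
            total += 0.5
    return total

# Nine starts each, 27 runs allowed in total by both pitchers.
consistent = [3] * 9
inconsistent = [5] * 3 + [2] * 6

for offense in (2, 3, 4):
    print(offense, wins(consistent, offense), wins(inconsistent, offense))
```

With a fixed 4-run offense the consistent pitcher comes out ahead, and with a fixed 2-run offense the inconsistent one does. Real offenses vary from game to game, which is where the disagreement in the thread lies.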
|
|
|
Post by ericmvan on Sept 24, 2016 8:10:12 GMT -5
[Quoting wcsoxfan's post of Sept 23, 2016 22:01:56]

I don't think this is true. Teams with a higher standard deviation of RA outperform their Pythag, overall. I don't think that's possible if the teams with an above-average RS (or positive run differential) are underperforming and only the teams with a below-average RS (or negative run differential) are outperforming.

This follows from the basic nature of the game. Each run you give up reduces your chances of winning less than the previous run. This is true no matter how good your offense is.

If your version worked, the mirror version would have to work as well. Offensive inconsistency is bad: when you inflate your RS by putting up 12 or 16 or 22 runs, the extra runs in those games would have obviously been more useful if they had happened in games where you only scored a few. Your argument here is equivalent to one that it's actually only bad for teams with below-average RA, because if you have lots of games with a high number of RA, then frequent blowout-level RS totals become so useful that they become a positive in general. But, again, I don't think that's the case. The distributions of RS and RA have the same rough shape regardless of how good a team is; they're just shifted on the axis.

I do think you're onto something, in that the degree to which inconsistency of RA is good depends on RS, and the degree to which inconsistency of RS is bad depends on the RA. That's an interaction that you'd need to look at if you're constructing a new version of Pyth that includes the SD of RS and RA. What I'm saying is that this influence of the opposite stat on the value of consistency / inconsistency of RS or RA is an ameliorating / enhancing one, but it almost certainly does not cause it to flip.
|
|
|
Post by wcsoxfan on Sept 25, 2016 1:43:22 GMT -5
Quote (ericmvan): I don't think this is true. Teams with a higher standard deviation of RA outperform their Pythag, overall. I don't think that's possible if the teams with an above-average RS (or positive run differential) are underperforming and only the teams with a below-average RS (or negative run differential) are outperforming.

If the statistics you are using show deviation in terms of total runs in relation to teams overall rather than the individual team, then it would make sense that teams which give up more runs would have larger deviations, as there are a larger number of possible outcomes as a result of the greater number of runs given up. Do you have the numbers which show the above to be true?

Quote (ericmvan): This follows from the basic nature of the game. Each run you give up reduces your chances of winning less than the previous run. This is true no matter how good your offense is.

This simply isn't true at all and is easily disproven by looking at win percentages at each number of runs scored. That sounds like a lot of work, but luckily Scott Lindholm did the heavy lifting a few years back. It's a funny bell curve, with the difference between 3 and 4 runs scored having the largest effect on win% since the Deadball Era. Here is the link if you are interested: beyondthescorecard.blogspot.com/2013/05/runs-scored-and-winning-percent.html

Quote (ericmvan): If your version worked, the mirror version would have to work as well. Offensive inconsistency is bad: when you inflate your RS by putting up 12 or 16 or 22 runs, the extra runs in those games would have obviously been more useful if they had happened in games where you only scored a few. Your argument here is equivalent to one that it's actually only bad for teams with below-average RA, because if you have lots of games with a high number of RA, then frequent blowout-level RS totals become so useful that they become a positive in general. But, again, I don't think that's the case. The distributions of RS and RA have the same rough shape regardless of how good a team is; they're just shifted on the axis.

I'm not saying that teams with negative run differentials are benefited more by blowouts, simply stating that if they average fewer runs than the opposing team, and in each game between the two teams they each score their expected average, then the team which averages fewer RS would lose every game and therefore benefit from wider distributions than the perfect average each time. I'm simply concluding that this would also be the case in more than this extreme instance. I'm sure someone has done the research, and I will check sometime later when I have more free time unless someone else beats me to it.

Quote (ericmvan): I do think you're onto something, in that the degree to which inconsistency of RA is good depends on RS, and the degree to which inconsistency of RS is bad depends on the RA. That's an interaction that you'd need to look at if you're constructing a new version of Pyth that includes the SD of RS and RA. What I'm saying is that this influence of the opposite stat on the value of consistency / inconsistency of RS or RA is an ameliorating / enhancing one, but it almost certainly does not cause it to flip.

Yeah - I really don't like the current Pyth and I think you got my gist. A Pyth which measures the relationship between RS and RA as a % of total RS+RA would make far more sense than a simple RS-RA formula. The simplest way to observe this is by starting with a team that outscores their opponent 324-162 during the season. You may expect them to win the vast majority of their games, but not all of their games. But a team that outscores their opponent 162-0 would clearly win 100% of their games. As RS-RA = 162 in each of these instances, it's clear that the formula doesn't work. I bet that if you accounted for this, it would lead to a more predictive formula than what is currently available (although it would surprise me if someone isn't already doing this).

--apologies for going so far off-subject everyone - feel free to move us to an off-subject thread.
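For reference, a minimal sketch of the commonly cited Pythagorean expectation, assuming "Pyth" here means the Bill James form with an exponent of 2 (other exponents are also in use). That form is built on the ratio of runs scored to runs allowed rather than their difference, so it already separates the two example teams.

```python
def pythag_win_pct(rs, ra, exponent=2.0):
    """Pythagorean expected winning percentage: RS^x / (RS^x + RA^x)."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

# Both teams have RS - RA = 162, but very different RS/RA ratios.
print(pythag_win_pct(324, 162))
print(pythag_win_pct(162, 0))
```

Under this formula the 324-162 team projects to an .800 winning percentage and the 162-0 team to 1.000.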
|
|
|
Post by wcsoxfan on Sept 25, 2016 1:57:34 GMT -5
Back to topic - ESPN's Cy Young predictor now has Porcello as a runaway winner. I don't agree with the distance between himself and the other guys but do agree he's the leader. Only thing holding him back in the advanced metrics is his lack of Ks. www.espn.com/mlb/features/cyyoung
|
|
|
Post by jodyreidnichols on Sept 25, 2016 13:30:11 GMT -5
[Quoting ericmvan's post of Sept 23, 2016 15:42:54]

Your point stands on its own and I think most fans figure this out for themselves, but in this case he's only had one bad start.
|
|
|
Post by ericmvan on Sept 25, 2016 21:59:46 GMT -5
[Quoting wcsoxfan's post of Sept 25, 2016 1:43:22]

Actually, that table shows that the second run you score is the most important, the third and fourth just a bit less so, and that it falls off from there. You have to subtract each Win% from the previous: .110, .147, .140, .136, .106, .087, .071, .051, .041. You can see that a 5th, 6th, or 7th run scored (or given up) matters a lot less than runs 1 through 4. Now, for starting pitchers going 6 innings on average (or a bit more) and bullpens on average giving up another run, outings with 4 ER allowed are the equivalent of 5 RA for the team, so a pitcher's 4th RA is discounted.

The big confound here when using historical data is that the scoring of the two teams is not independent -- it is correlated by the weather, the park, and the era played in (if you're totaling over a long stretch of time). Take the actual distribution of RS and RA of each team in a league over a season and simulate, shuffling all the RS and RA at random (resolving ties in some fair way), and you will get fewer 2-1 and 3-2 games than you will in reality.

I really miss BP's "Support-Neutral Winning Percentage," which I had in fact conceived separately years before they introduced it. For a given IP and RA (properly adjusted for park, defense, quality of opposition -- the works), what are the odds of winning that game, with an average bullpen? That is essentially a perfected bWAR that also factors in these run distribution effects.

Weather absolutely changes the run expectation for both teams in tandem. You can factor the weather into the SNWP, but the underlying historical data has the confound that the weather and park correlate the scoring. For instance, for a given RA, there's a huge difference in expected win percentage in 50 degree weather with the wind howling in, versus the same RA on a hot day with the wind blowing out. A proper SNWP needs to in part adjust the RA, and in part the opposition RS expectation. It's a major project just to work out how it ought to work, let alone derive the model.
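The marginal figures listed in this post can be applied to the earlier 4.50-ERA example as a rough sketch. It assumes the listed differences (.110 for the first run through .041 for the ninth) can be read as the win-probability cost of each successive run allowed, and it ignores innings pitched, bullpens, and the correlation issues described above. The names are illustrative.

```python
# Win% change attributed to the 1st through 9th run, as listed in the post.
MARGINAL = [0.110, 0.147, 0.140, 0.136, 0.106, 0.087, 0.071, 0.051, 0.041]

def win_cost(runs_allowed):
    """Total win% given up in one game by allowing this many runs (0 to 9)."""
    return sum(MARGINAL[:runs_allowed])

def total_cost(starts):
    """Sum the per-game cost over a list of runs-allowed totals."""
    return sum(win_cost(r) for r in starts)

consistent = [3] * 9               # nine starts of 3 runs each
inconsistent = [5] * 3 + [2] * 6   # the same 27 runs, clustered

print(round(total_cost(consistent), 3))
print(round(total_cost(inconsistent), 3))
```

Under these assumptions the clustered profile gives up about 0.11 fewer wins over the nine starts, which is the direction of the effect being argued for, though the size depends entirely on which era's table is used.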
|
|
|
Post by wcsoxfan on Sept 25, 2016 22:42:26 GMT -5
[Quoting ericmvan's post of Sept 25, 2016 21:59:46]

You really need to read the article, as the numbers you posted above are the average going back 100 years and include non-relevant data, which is why I stated "since the Deadball Era". They clearly map out the difference in win percentage in the graphic for each time period, and independently each period since the Deadball Era shows the 3-4 run gap as the largest change in win%.

But as you mentioned the largest gap in win%, with Deadball Era data included, as being much lower, you are essentially proving my point: as the average runs increase, the greatest increase in win% becomes clustered around the average number of runs.

The part I find interesting is that the greatest win% increase seems to be on the low side for the given era's average runs scored, which to me indicates that there are outlier games which fail to be indicators of win% after a large enough number of runs are scored. It makes sense that there would be more outliers on the high side than the low side, pushing the mean artificially high for RS/RA and low as an indicator of win%, because the greatest outlier down is 0 while the greatest outlier up is undetermined and much further from the mean than zero. I'm curious whether median runs would be more indicative of projecting win% than mean runs.

As I disproved the premise for your opinion above, I'd also like to know if you have any data to back up your opinions on the subject, or if it's purely subjective.
|
|
|
Post by ericmvan on Sept 26, 2016 0:28:49 GMT -5
Actually, that table shows that the second run you score is the most important, the third and fourth just a bit less so, and that it falls off from there. You have to subtract each Win% from the previous: .110, .147, ,140, .136, .106, .087, .071, .051, .041. You can see that scoring (or giving up) a 5th, 6th, or 7th run is a lot less damaging than giving up runs 1 through 4. Now, for starting pitchers going 6 innings on average (or a bit more) and bullpens on average giving up another run, outing with 4 ER allowed are the equivalent of 5 RA for the team, so a pitcher's 4th RA is discounted. The big confound here when using historical data is that the scoring of the two teams in not independent -- it is correlated by the weather, the park, and the era played in (if you're totaling over a long stretch of time). Take the actual distribution of RS and RA of each team in a league over a season and simulate, shuffling all the RS and RA at random (resolving ties in some fair way), and you will get fewer 2-1 and 3-2 games than you will in reality. weather absolutely changes the run expectation for both teams in tandem. There's You can factor the weather into the SNWP, but the underlying historical data has the confound that the weather and park ( correlates the scoring I really miss BP's "Support-Neutral Winning Percentage," which I had in fact conceived separately years before they introduced it. For a given IP and RA (properly adjusted for park, defense, quality of opposition -- the works), what are the odds of winning that game, with an average bullpen? That is essentially a perfected bWAR that also factors in these run distribution effects. For instance, for a given RA, there's a huge difference in expected win percentage in 50 degree weather with the wind howling in, versus the same RA on a hot day with the wind blowing out. A proper SNWP needs to in part adjust the RA, and in part the opposition RS expectation. 
It's a major project just to work out how it ought to work, let alone derive the model. You really need to read the article, as the numbers you posted above are the average going back 100 years and include non-relevant data, which is why I stated "since the Deadball Era". They clearly map out the difference in win percentage in the graphic below for each time period, and independently each period since the Deadball Era equates to the 3-4 run gap as being the largest change in win%. But as you mentioned the largest gap in win%, with Deadball Era data included, as being much lower, you are essentially proving my point: as the average runs increase the greatest increase in win% becomes clustered around the average number of runs. The part I find interesting is that the greatest win% increase seems to be on the low side for the given era's average runs scored, which to me indicates that there are outlier games which fail to be indicators of win% after a large enough number of runs are scored. It makes sense that there would be more outliers on the high-side than low-side pushing the mean artificially high for RS/RA and low as an indicator of win% because the greatest outlier down is '0' while the greatest outlier up is undetermined and much further from the mean than zero. I'm curious whether median runs would be more indicative of projecting win% than mean runs. As I disproved the premise for your opinion above, I'd also like to know if you have any data to back up your opinions on the subject, or if it's purely subjective. As far as the version of Pyth with RS +/- SD and RA +/- SD, someone did that work on SoSH about 10 years ago. Yes, it does follow that the benefit of team pitching inconsistency comes from the fact that clustering your RA in blowouts makes you beat your Pyth. Which is to say, it's from the skew in the distribution. So it doesn't apply carte blanche when comparing two good pitchers to one another. 
Based on the most recent set of data, a team's nth run scored or allowed should be multiplied by this factor to represent its impact on winning: 6: .948, 7: .894, 8: .845, 9: .779. For starting pitchers we can assume 1 RA by the average pen, so we can look at starts with 5 or more ER allowed and give a discount for those R allowed (by adding 1 and using that chart). Porcello has allowed 5 ER once, which is a trivial .05 run adjustment (in his favor). Kluber has allowed 5, 5, 5, 6, 6, and that's 3 * .05 + 2 * .115 = .38 ER. So Kluber's inconsistency has actually made him about 1/3 of an ER more valuable. These adjustments are much smaller than inherited runner adjustments (which I plan to do), but that doesn't mean you wouldn't include them in an ideal system. Adjustments for the distribution of 0 to 4 ER allowed would very likely be even smaller.
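For anyone who wants to redo the blowout-discount arithmetic for other pitchers, here is a minimal Python sketch. It only tallies the per-start discounts quoted in this post (.05 for a 5 ER start, .115 for a 6 ER start) and does not re-derive them from the factor chart; the names `DISCOUNT_BY_ER` and `total_discount` are made up for illustration.

```python
# Per-start discounts in earned runs, taken as quoted in the post
# (5 ER start -> .05, 6 ER start -> .115). Assumption: these are used
# as-is, not recomputed from the nth-run factor chart.
DISCOUNT_BY_ER = {5: 0.05, 6: 0.115}


def total_discount(er_by_start):
    """Sum the blowout discount over a list of ER totals, one per start."""
    total = 0.0
    for er in er_by_start:
        if er < 5:
            continue  # the post applies no discount to starts under 5 ER
        if er not in DISCOUNT_BY_ER:
            # The post gives no per-start figure for 7+ ER, so refuse to guess.
            raise ValueError(f"no quoted discount for a {er} ER start")
        total += DISCOUNT_BY_ER[er]
    return total


porcello = total_discount([5])
kluber = total_discount([5, 5, 5, 6, 6])
print(round(porcello, 3), round(kluber, 3), round(kluber - porcello, 3))
```

With the start lists from the post this prints 0.05, 0.38 and 0.33, so the net edge to Kluber from this adjustment is about a third of an earned run.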
|
|
|
Post by philarhody on Sept 26, 2016 22:36:11 GMT -5
You really need to read the article, as the numbers you posted above are the average going back 100 years and include non-relevant data, which is why I stated "since the Deadball Era". They clearly map out the difference in win percentage in the graphic below for each time period, and independently each period since the Deadball Era equates to the 3-4 run gap as being the largest change in win%. But as you mentioned the largest gap in win%, with Deadball Era data included, as being much lower, you are essentially proving my point: as the average runs increase the greatest increase in win% becomes clustered around the average number of runs. The part I find interesting is that the greatest win% increase seems to be on the low side for the given era's average runs scored, which to me indicates that there are outlier games which fail to be indicators of win% after a large enough number of runs are scored. It makes sense that there would be more outliers on the high-side than low-side pushing the mean artificially high for RS/RA and low as an indicator of win% because the greatest outlier down is '0' while the greatest outlier up is undetermined and much further from the mean than zero. I'm curious whether median runs would be more indicative of projecting win% than mean runs. As I disproved the premise for your opinion above, I'd also like to know if you have any data to back up your opinions on the subject, or if it's purely subjective. As far as the version of Pyth with RS +/- SD and RA +/- SD, someone did that work on SoSH about 10 years ago. Yes, it does follow that the benefit of team pitching inconsistency comes from the fact that clustering your RA in blowouts makes you beat your Pyth. Which is to say, it's from the skew in the distribution. So it doesn't apply carte blanche when comparing two good pitchers to one another. 
Based on the most recent set of data, a team's nth run scored or allowed should be multiplied by this factor to represent its impact on winning: 6: .948, 7: .894, 8: .845, 9: .779. For starting pitchers we can assume 1 RA by the average pen, so we can look at starts with 5 or more ER allowed and give a discount for those R allowed (by adding 1 and using that chart). Porcello has allowed 5 ER once, which is a trivial .05 run adjustment (in his favor). Kluber has allowed 5, 5, 5, 6, 6, and that's 3 * .05 + 2 * .115 = .38 ER. So Kluber's inconsistency has actually made him about 1/3 of an ER more valuable. These adjustments are much smaller than inherited runner adjustments (which I plan to do), but that doesn't mean you wouldn't include them in an ideal system. Adjustments for the distribution of 0 to 4 ER allowed would very likely be even smaller. *see aporia
|
|
|
Post by telluricrook on Sept 26, 2016 23:32:10 GMT -5
If the statistics you are using show deviation in terms of total runs in relation to teams overall rather than the individual team, then it would make sense that teams which give up more runs would have larger deviations as there are a larger number of possible outcomes as a result of the greater number of runs given up. Do you have the numbers which show the above to be true? This simply isn't true at all and is easily disproven by looking at win percentages at each number of runs scored. That sounds like a lot of work, but luckily Scott Lindholm did the heavy lifting a few years back. It's a funny bell curve with the difference between 3 and 4 runs scored having the largest outcome on win% since the Deadball Era. Here is the link if you are interested: beyondthescorecard.blogspot.com/2013/05/runs-scored-and-winning-percent.html I'm not saying that teams with negative run differentials are benefited more by blowouts, simply stating that if they average fewer runs than the opposing team, and in each game between the two teams they each score their expected average, then the team which averages fewer RS would lose every game and therefore benefit from wider distributions than the perfect average each time. I'm simply concluding that this would also be the case in more than this extreme instance. I'm sure someone has done the research and will check sometime later when I have more free time unless someone else beats me to it. Yeah - I really don't like the current Pyth and I think you got my gist. A Pyth which measures the relationship between RS and RA as a % of total RS+RA would make far more sense than a simple RS-RA formula. The simplest way to observe this is by starting with a team that outscores their opponent 324-162 during the season. You may expect them to win the vast majority of their games, but not all of their games. But a team that outscores their opponent 162-0 would clearly win 100% of their games. 
As the RS-RA = 162 in each of these instances, it's clear that the formula doesn't work. I bet that if you accounted for this, it would lead to a more predictive formula than what is currently available (although it would surprise me if someone isn't already doing this). --apologies for going so far off-subject everyone - feel free to move us to an off-subject thread. Actually, that table shows that the second run you score is the most important, the third and fourth just a bit less so, and that it falls off from there. You have to subtract each Win% from the previous: .110, .147, .140, .136, .106, .087, .071, .051, .041. You can see that scoring (or giving up) a 5th, 6th, or 7th run matters a lot less than runs 1 through 4. Now, for starting pitchers going 6 innings on average (or a bit more) and bullpens on average giving up another run, outings with 4 ER allowed are the equivalent of 5 RA for the team, so a pitcher's 4th RA is discounted. The big confound here when using historical data is that the scoring of the two teams is not independent -- it is correlated by the weather, the park, and the era played in (if you're totaling over a long stretch of time). Take the actual distribution of RS and RA of each team in a league over a season and simulate, shuffling all the RS and RA at random (resolving ties in some fair way), and you will get fewer 2-1 and 3-2 games than you will in reality. Weather absolutely changes the run expectation for both teams in tandem. You can factor the weather into the SNWP, but the underlying historical data has the confound that the weather and park correlate the scoring. I really miss BP's "Support-Neutral Winning Percentage," which I had in fact conceived separately years before they introduced it. For a given IP and RA (properly adjusted for park, defense, quality of opposition -- the works), what are the odds of winning that game, with an average bullpen? 
That is essentially a perfected bWAR that also factors in these run distribution effects. For instance, for a given RA, there's a huge difference in expected win percentage in 50 degree weather with the wind howling in, versus the same RA on a hot day with the wind blowing out. A proper SNWP needs to in part adjust the RA, and in part the opposition RS expectation. It's a major project just to work out how it ought to work, let alone derive the model. Jibber Jabber Jibber Jabber Mumbo Jumbo!
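On the 324-162 versus 162-0 comparison quoted in this post, it may help to run both seasons through the standard Pythagorean expectation, RS^x / (RS^x + RA^x). The sketch below assumes the classic exponent of 2 (other exponents are in use), and the function name `pythagorean_win_pct` is just an illustrative choice.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x).

    Assumption: exponent defaults to 2, the original textbook value.
    """
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("undefined when no runs are scored or allowed")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Both hypothetical seasons from the post have RS - RA = 162.
for rs, ra in [(324, 162), (162, 0)]:
    print(rs, ra, round(pythagorean_win_pct(rs, ra), 3))
```

Under that assumption the 324-162 team comes out at .800 and the 162-0 team at 1.000, so the standard formula already works off the ratio of RS to RA rather than the raw difference.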
|
|
|
Post by telluricrook on Sept 26, 2016 23:34:10 GMT -5
If you want to win you have to score more runs than the other team!
|
|
|
Post by bosox81 on Oct 4, 2016 20:00:30 GMT -5
Wow, I hadn't even realized this until now, but by fWAR Porcello was the best pitcher in the AL. I'd be surprised if he doesn't win the award by a landslide.
|
|
|
Post by rookie13 on Oct 4, 2016 20:49:46 GMT -5
I'm still undecided as to who should win the AL Cy Young. In my opinion, Porcello, Kluber, and Sale should be the top 3 in whatever order. But honestly, did anyone on earth think for a second, before the season started, that Porcello would even be mentioned for this award?
|
|
|
Post by threeifbaerga on Oct 5, 2016 7:21:22 GMT -5
I'm still undecided as to who should win the AL Cy Young. In my opinion, Porcello, Kluber, and Sale should be the top 3 in whatever order. But honestly, did anyone on earth think for a second, before the season started, that Porcello would even be mentioned for this award? Joe Kelly, maybe.
|
|
|
Post by Oregon Norm on Nov 16, 2016 21:18:13 GMT -5
Moved all the posts about Porcello having won the award to a new thread.
|
|
|