SoxProspects News
The views expressed by the members of this Forum do not necessarily reflect the views of SoxProspects, LLC.
© 2003-2024 SoxProspects, LLC
MLB umpires show discrimination against non-white players
Post by redsoxfan2 on Aug 14, 2021 15:10:21 GMT -5
Post by foreverred9 on Aug 14, 2021 15:59:19 GMT -5
The author of the analysis (who did this as a senior thesis) explains his work in the following thread:
Post by benzinger on Aug 14, 2021 16:37:56 GMT -5
Does this take into account calls made by non-white umpires against non-white players?
Post by redsoxfan2 on Aug 14, 2021 16:47:41 GMT -5
Does this take into account calls made by non-white umpires against non-white players?
Post by jbuttah on Aug 14, 2021 16:48:25 GMT -5
Does this take into account calls made by non-white umpires against non-white players?

I don't know, but it's not uncommon for people of color to be biased towards other people of color. I've noticed that employees at Chinatown restaurants in NY, for example, treat their white patrons way more politely than others, even other Asians.
Post by jl1947 on Aug 14, 2021 17:31:17 GMT -5
Post by jl1947 on Aug 14, 2021 17:32:15 GMT -5
If the sample size is large enough, correlation = causation.
Post by incandenza on Aug 14, 2021 17:39:52 GMT -5
A nice short article on this sort of bias from a few years back mentions a similar finding among NBA refs: It's good that racial bias is correctable in this sort of context, but if the study holds up (maybe even if it doesn't) it sure gives another big rhetorical tool to the argument for robo umps.
Post by jaffinator on Aug 14, 2021 19:53:24 GMT -5
So reading the actual paper, this is a conclusion that I might have been likely to "buy" ahead of time, but this is not an empirically convincing paper on its own (I am not aware of the state of this specific field at the moment). The inconsistent results by year and race-match within majority-minority status (comparing Hispanic and Black umps) or pitcher/batter status within racial pairing are not strongly suggestive of a robust statistical relationship. The combination of tremendous sample size + small effect invites questions as well. There can also be subtle issues when using logistic regression with numerous controls, though that may not be a problem here. Regardless of sample size, correlation is not causation. This is a case where Bayesian methods (I remember mentioning this to another commenter earlier) may have been preferable. Pitch movement might have been worth including as well. Interesting subject.
Post by rangoon82 on Aug 15, 2021 15:03:18 GMT -5
If the sample size is large enough, correlation = causation.

Everything else aside, this is not a true statement. In very large samples we can see that ice cream sales have a positive correlation with drownings. Ice cream doesn't cause drownings.
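The ice cream/drowning point above is a classic confounding example, and it can be sketched in a few lines of code. All numbers here are invented for illustration: both series are driven only by a hidden "temperature" variable, with no causal link between them, yet they still correlate strongly.

```python
import random
import statistics

random.seed(42)

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Temperature is the hidden common cause of both series.
temps = [random.gauss(70, 15) for _ in range(5000)]
ice_cream = [2.0 * t + random.gauss(0, 10) for t in temps]  # sales rise with heat
drownings = [0.1 * t + random.gauss(0, 1) for t in temps]   # more swimming in heat

# Strong positive correlation despite no causal arrow between the two series.
print(pearson_r(ice_cream, drownings))
```

Because the only shared ingredient is temperature, the correlation survives any sample size you throw at it without ever becoming causation.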
Post by GyIantosca on Aug 15, 2021 15:11:51 GMT -5
Boy, I thought I had seen everything.
Post by jimed14 on Aug 15, 2021 15:30:24 GMT -5
If the sample size is large enough, correlation = causation. Everything else aside, this is not a true statement. In very large samples we can see that ice cream sales have a positive correlation with drownings. Ice cream doesn't cause drownings.

Also, just because correlation does not equal causation doesn't mean the correlated factor isn't the source of the cause. You can say hitting yourself in the head with a hammer leads to head injuries, and someone can say that correlation does not equal causation. But sometimes it is.
Post by jaffinator on Aug 15, 2021 16:08:54 GMT -5
Everything else aside, this is not a true statement. In very large samples we can see that ice cream sales have a positive correlation with drownings. Ice cream doesn't cause drownings. Also, just because correlation does not equal causation doesn't mean the correlated factor isn't the source of the cause. You can say hitting yourself in the head with a hammer leads to head injuries, and someone can say that correlation does not equal causation. But sometimes it is.

Right. If a and b are correlated, that does not inherently mean a causes b. But if a does cause b, it should be the case that under correct model specification a and b are correlated.
Post by patford on Aug 15, 2021 17:51:18 GMT -5
Now do the Yankees.
Post by Underwater Johnson on Aug 15, 2021 17:57:09 GMT -5
So reading the actual paper, this is a conclusion that I might have been likely to "buy" ahead of time, but this is not an empirically convincing paper on its own (I am not aware of the state of this specific field at the moment). The inconsistent results by year and race-match within majority-minority status (comparing Hispanic and Black umps) or pitcher/batter status within racial pairing are not strongly suggestive of a robust statistical relationship. The combination of tremendous sample size + small effect invites questions as well. There can also be subtle issues when using logistic regression with numerous controls, though that may not be a problem here. Regardless of sample size, correlation is not causation. This is a case where Bayesian methods (I remember mentioning this to another commenter earlier) may have been preferable. Pitch movement might have been worth including as well. Interesting subject.

So, either the results are statistically significant or they're not. Are they?
And if the experimental design is flawed (is that what you're suggesting?), then it's a lot of ado about nothing -- someone would need to go and do the study using the same data and a better design. It would appear that the data are publicly available...
Post by rangoon82 on Aug 15, 2021 19:36:57 GMT -5
Disclaimer: I haven't read this paper. But jaffinator is right. At a high enough sample size everything becomes significant. So you have to consider the effect size as well, which is the size of the difference between the two groups. Here that's the difference between white and non-white umps for some endpoint (maybe called strikes or safe calls or something - I should read the paper).
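The interaction between effect size and sample size described above can be sketched numerically. This is an idealized two-proportion z-test, not the paper's actual model: the effect is pinned at 0.3 percentage points (the magnitude discussed in this thread), the observed rates are assumed to equal the true rates, and only the per-group sample size varies.

```python
import math

def two_prop_pvalue(p1, p2, n):
    """Two-sided p-value for H0: p1 == p2, with n observations per group
    (normal approximation, pooled proportion)."""
    pbar = (p1 + p2) / 2
    se = math.sqrt(pbar * (1 - pbar) * (2 / n))
    z = abs(p2 - p1) / se
    # Two-sided tail probability of a standard normal, via erfc.
    return math.erfc(z / math.sqrt(2))

# Same 0.3-percentage-point "effect" at every n; only the sample size changes.
for n in (10_000, 100_000, 1_000_000, 5_000_000):
    print(n, two_prop_pvalue(0.500, 0.503, n))
```

At ten thousand observations per group the difference is nowhere near significant; by millions of observations the p-value is microscopic, even though the practical difference is identical. That is why a significant result on millions of pitches still demands a look at the effect size.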
Post by foreverred9 on Aug 15, 2021 20:34:39 GMT -5
What I'd be interested in seeing this study quantify are the other biases. For example, how does this bias compare to the home field bias, the superstar pitcher bias, the winning team bias, late-inning bias, etc.? I'm not sure some of those are real, but what I want to see is whether this is the most extreme bias, a top-3 bias, or just one of many equally-sized biases.
What I'm certain of after reviewing this study is that more analysis should be done.
Post by jaffinator on Aug 15, 2021 21:01:59 GMT -5
So reading the actual paper, this is a conclusion that I might have been likely to "buy" ahead of time, but this is not an empirically convincing paper on its own (I am not aware of the state of this specific field at the moment). The inconsistent results by year and race-match within majority-minority status (comparing Hispanic and Black umps) or pitcher/batter status within racial pairing are not strongly suggestive of a robust statistical relationship. The combination of tremendous sample size + small effect invites questions as well. There can also be subtle issues when using logistic regression with numerous controls, though that may not be a problem here. Regardless of sample size, correlation is not causation. This is a case where Bayesian methods (I remember mentioning this to another commenter earlier) may have been preferable. Pitch movement might have been worth including as well. Interesting subject.

So, either the results are statistically significant or they're not. Are they?
And if the experimental design is flawed (is that what you're suggesting?), then it's a lot of ado about nothing -- someone would need to go and do the study using the same data and a better design. It would appear that the data are publicly available...
To be precise with language, there is no experimental design because this is not an experiment. I would say that I don't agree with the interpretation/presentation of the statistical modelling. In general, when writing/reading social science research (which is what this essentially is) there's a lot of ground in between "this is significant so I believe it" and "this is insignificant so I don't." The results are significant, but there are various reasons that I'm at something less than "this is significant so I believe it." First is the inconsistency with the relationships presented in the paper. If bias is driving these results, why are effects not consistent across some breakdowns (white umps are reported as discriminating against both Black and Hispanic pitchers, but only against Hispanic batters for instance)? Then yes, as others have mentioned the combination of the small effect sizes and gigantic sample sizes (tenths of a percent effect size, millions of observations) makes it extremely easy to detect a relationship, when in reality there may not be one. I'm not saying that bias doesn't affect strikes/balls but instead that this paper is not strongly convincing evidence that it does.
Post by johnsilver52 on Aug 15, 2021 21:18:23 GMT -5
Boy, I thought I had seen everything.

It's called today's world of looking everywhere until you find something you don't like, then either writing facts or, in some cases as we see all the time now, making them up and writing them anyway. All this tells me is that we need fewer Angel Hernandez (blind) umpires calling balls and strikes, and kids in school writing about actual problems if they want to go that route, rather than the current wokie item of the day awards.
Post by jaffinator on Aug 15, 2021 21:20:24 GMT -5
What I'd be interested in seeing this study quantify are the other biases. For example, how does this bias compare to the home field bias, the superstar pitcher bias, the winning team bias, late-inning bias, etc.? I'm not sure some of those are real, but what I want to see is whether this is the most extreme bias, a top-3 bias, or just one of many equally-sized biases. What I'm certain of after reviewing this study is that more analysis should be done.

That's kind of in the paper, in the form of controls where you can compare the average marginal effects. Just straight up comparing the AME to the AME of some other controls:
1) One magnitude greater than batter All-Star appearances and score effects.
2) Same magnitude as it being any inning other than the first; it being the top of the inning (I guess this is home field advantage?); there being a right-handed batter; there being a right-handed pitcher; pitcher WAR; and pitcher All-Star appearances.
3) One magnitude less than the effects of there being one strike (compared to 0); there being more than 0 balls; there being two outs; there being a runner stealing; there being men on base; there being men in scoring position; batter WAR; and some of the years.
4) Two magnitudes less than the effects of some years and the bases being loaded.
5) Three magnitudes less than how far off the actual call was.
All the effects I listed above are statistically significant according to the paper. An aside, but please don't just report average marginal effects. I understand that the paper is meant to be readable by a wider audience, but AME depends strongly on the distribution of the data and it would be nice to see log-odds as well.
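The aside above about AMEs versus log-odds can be illustrated with a toy logistic model. The coefficients and covariate distributions below are invented; the point is only that the same log-odds coefficient produces different average marginal effects depending on where the observed data sit on the curve.

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ame(b0, b1, xs):
    """Average marginal effect of x in the model P(y=1) = sigmoid(b0 + b1*x).
    The marginal effect at each point is b1 * p * (1 - p), averaged over the data."""
    return statistics.fmean(
        [b1 * sigmoid(b0 + b1 * x) * (1 - sigmoid(b0 + b1 * x)) for x in xs]
    )

b0, b1 = 0.0, 0.5  # one fixed log-odds model for both samples

centered = [i / 100 for i in range(-100, 101)]      # x near 0: probabilities near 0.5
shifted = [6 + i / 100 for i in range(-100, 101)]   # x far out: probabilities saturated

print(ame(b0, b1, centered))  # larger: p*(1-p) is near its maximum of 0.25
print(ame(b0, b1, shifted))   # smaller: the curve has flattened out
```

The log-odds coefficient (0.5) never changes, yet the AME shrinks substantially for the shifted sample, which is exactly why reporting only the AME hides information that the coefficient itself would convey.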
Post by jaffinator on Aug 15, 2021 21:22:15 GMT -5
Boy, I thought I had seen everything. It's called today's world of looking everywhere until you find something you don't like, then either writing facts or, in some cases as we see all the time now, making them up and writing them anyway. All this tells me is that we need fewer Angel Hernandez (blind) umpires calling balls and strikes, and kids in school writing about actual problems if they want to go that route, rather than the current wokie item of the day awards.

If you don't understand something, you can just not comment on it.
Post by Underwater Johnson on Aug 15, 2021 22:58:13 GMT -5
So, either the results are statistically significant or they're not. Are they?
And if the experimental design is flawed (is that what you're suggesting?), then it's a lot of ado about nothing -- someone would need to go and do the study using the same data and a better design. It would appear that the data are publicly available...
To be precise with language, there is no experimental design because this is not an experiment. I would say that I don't agree with the interpretation/presentation of the statistical modelling. In general, when writing/reading social science research (which is what this essentially is) there's a lot of ground in between "this is significant so I believe it" and "this is insignificant so I don't." The results are significant, but there are various reasons that I'm at something less than "this is significant so I believe it." First is the inconsistency with the relationships presented in the paper. If bias is driving these results, why are effects not consistent across some breakdowns (white umps are reported as discriminating against both Black and Hispanic pitchers, but only against Hispanic batters for instance)? Then yes, as others have mentioned the combination of the small effect sizes and gigantic sample sizes (tenths of a percent effect size, millions of observations) makes it extremely easy to detect a relationship, when in reality there may not be one. I'm not saying that bias doesn't affect strikes/balls but instead that this paper is not strongly convincing evidence that it does.

I appreciate that people ask questions, whether wokie or otherwise, because the alternative is... not asking questions. Seriously, how is that better? I would just like them to be asked in ways that result in defensible conclusions, whichever direction those go.
I have not read the paper (and have no interest in doing so... maybe I shouldn't be commenting) but it's hard for me to see how such a study could not rather easily produce simple, even quantitative results (e.g. calculate the total area of the strike zone) for each batter/umpire combination and group the data by race of ump, batter, and pitcher; and further break them down into individual umps, batters, and pitchers, with individual x individual comparisons, as well as individual x race comparisons. The statistics shouldn't be that complex. (And I'm not sure that Bayesian analysis would be necessary or appropriate here, based on our earlier discussion...) And the null hypothesis is simple: umpires call balls and strikes the same, regardless of whether a batter or pitcher is the same race as the ump.
Also, I disagree that with more data points you inevitably drive toward statistical significance. No. Things are either significantly different or they aren't. The grand assumption of analytical statistics is that once you have enough data points, it shouldn't matter how many more you add because they will simply reinforce the original finding. For example, let's say you test the flight of 100 baseballs from 2014 hit by a mechanical batter that always hits with the same force, compare the distances to 100 baseballs from 2015, and you find no significant difference between the distances. Ramping the data points up to 1,000 of each or 10,000 of each is not going to magically reveal a statistical significance. Now, if you started with 5 of each year's balls, maybe there's not enough data yet -- just like if you studied 5 umpires calling balls and strikes on 5 batters. But with a dataset as large as is described for this study, statistical significance should not be a problem -- the null hypothesis is either rejected or it's not.
I would apologize for the TLDR post but if you're reading this thread, this is all relevant and essential. You can't talk about the results of a study without talking about how it was done. It's all about experimental design -- you cannot have a real result without a good design.
Post by cheers on Aug 15, 2021 23:59:46 GMT -5
Regardless of methodology or sample size, isn't .003 well within the range of "noise"? I'm not sure the ramifications are even worthy of debate.
That said, I genuinely hope there is no race element in umpiring...
Post by rangoon82 on Aug 16, 2021 6:21:52 GMT -5
Statistical significance is driven by sample size and effect size, among other things. You can achieve statistical significance between two groups at very low sample sizes if the effect size is big enough. For example, I wouldn't need many at bats vs major league pitching before a statistical test (and anyone watching) declared I am a significantly different hitter than Ted Williams.
If you want to detect very small differences between two groups you need increasingly large sample sizes. Say you have a clinical trial with 1000 patients that shows no statistical significance between two drugs. Adding another 1000 patients can absolutely help achieve statistical significance - though the difference between the groups may be small and meaningless for all practical purposes.
In 2013 Jose Iglesias got 9 hits in his first 20 at bats (.450). Based on those ABs alone he was not statistically different from Ted Williams. As his number of at bats increased, we could say with statistical significance that Jose Iglesias's batting average is significantly worse than Ted Williams'.
In your example of the 2014 vs 2015 mechanical batter, I would suggest that as you approach infinity with your sample size you would start to detect subtle differences between the years. But you are right: for two inherently identical processes the effect size is 0, so you would need an infinite sample size to achieve statistical significance.
For this study on ump racial bias (oops still haven't read it), I think jaffinator and others have asked questions about the effect size despite it claiming statistical significance.
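The Iglesias/Williams comparison above can be sketched with a one-sample proportion test (normal approximation) against Ted Williams's .344 career average. The 9-for-20 start is from the post; the 540-for-2,000 career-sized line is a rough stand-in for illustration, not Iglesias's actual totals.

```python
import math

def prop_test_pvalue(hits, at_bats, p0=0.344):
    """Two-sided p-value for H0: true batting average == p0,
    using the normal approximation to the binomial."""
    phat = hits / at_bats
    se = math.sqrt(p0 * (1 - p0) / at_bats)
    z = abs(phat - p0) / se
    return math.erfc(z / math.sqrt(2))

# Hot start: 20 at bats is far too few to distinguish a .450 stretch from .344.
print(prop_test_pvalue(9, 20))

# Career-sized sample (illustrative): the same test easily separates ~.270 from .344.
print(prop_test_pvalue(540, 2000))
```

With 20 at bats the test cannot reject the null even though the observed average is more than 100 points higher; with a career's worth of at bats a 74-point gap is overwhelmingly significant. Same test, same hypothesis; the sample size does the work.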
Post by TearsIn04 on Aug 16, 2021 13:02:08 GMT -5
I want to read the study and learn more, but .3 percent seems like a tiny deviation that is the result of randomness, not racism. Someone who knows more about stats than me can comment on whether .3 percent falls within the standard deviation.
I'm someone whose life has been shaped by racism, is pretty quick to suspect racism and hates racism. Other than concerns that everyone has - such as money, personal safety, well-being of loved ones, the Red Sox! - I think race is the greatest motivator in our society. Nothing else has caused us to enslave people, declare people 3/5 of a human being, segregate people, intern people, keep people from voting, etc. But I can't draw a conclusion from the numbers in that study.
I think the racism argument becomes even weaker when we consider that the umps miss so many damn calls! They are equal opportunity incompetents when it comes to calling balls and strikes.