Thoughts on the 20-80 scale

ericmvan
Veteran

Supposed to be working on something more important

Posts: 8,936

Thoughts on the 20-80 scale Mar 31, 2021 18:44:31 GMT -5

Post by ericmvan on Mar 31, 2021 18:44:31 GMT -5

This reminded me that I had written a whole nearly-finished argument for replacing the current grade system with 25 to 80, back in late October! And forgot about it.

Rereading it ... I also forgot that I used WAR to tweak Kiley McDaniel's definitions of each grade!

It's quite long, so here's the tl;dr version.

1) A system that forces you to put Bobby Dalbec and Tanner Houck either in the same bucket as Triston Casas or in the same bucket as Jay Groome is plainly not fine-grained enough to do its job.

2) The 20-80 scale, redefined via the actual distribution of roles in MLB from 2017 to 2019. The WAR/150 figures for 45 and above are empirical; they match Kiley's at 60, but the observed spread is 1.1 WAR per grade, not 1.0. Obviously, you'd stick with Kiley's in most cases. Kiley has 45 as 1.5 WAR and Bench at 1.0; I have to revise the WAR scale for 45 and below, and I've just figured out what to do differently.

Gr   Kiley                   Me (Objective)            WAR/150
80   Top 1-2 Player          Best player in MLB
75   Top 2 to 3 player       Top 2 to 3 player         7.3
70   Top 5 player            Top 5 player              6.2
65   All-Star                Perennial All-Star        5.1
60   "Plus"                  All-Star Candidate        4.0
55   Above-Average Regular   First-Division Regular    2.9
50   Average Regular         Average Regular           1.8
45   Platoon / Utility       Second Division Regular   0.7
40   Bench Player            Bench Player
35   Emergency Player        Up-and-Down Player
30                           Solid Org Player
25                           Ordinary Org Player
20                           Fringe Org Player
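
For anyone who wants to check the WAR/150 column: the values from 45 through 75 sit on a straight line, anchored at 4.0 WAR for a 60 with 1.1 WAR per 5-point grade. A minimal Python sketch of that ladder follows. The function and constant names are made up for illustration, and nothing here extends the line below 45 or up to 80, since the table leaves those cells blank.

```python
# Linear WAR/150 ladder read off the table (grades 45 through 75 only).
ANCHOR_GRADE = 60     # grade where the two scales agree, per the post
ANCHOR_WAR = 4.0      # WAR/150 at that grade, from the table
WAR_PER_GRADE = 1.1   # observed spread per 5-point grade, from the post


def war_per_150(grade: int) -> float:
    """Return the tabulated WAR/150 for a grade between 45 and 75."""
    if grade % 5 != 0 or not 45 <= grade <= 75:
        raise ValueError("only grades 45-75 in steps of 5 are in the table")
    steps = (grade - ANCHOR_GRADE) // 5
    # Round to one decimal to hide floating-point noise.
    return round(ANCHOR_WAR + WAR_PER_GRADE * steps, 1)


if __name__ == "__main__":
    for g in range(75, 40, -5):
        print(g, war_per_150(g))
```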

And now the full argument and analysis.

(A good chunk of this belongs in the Meta forum, e.g., under a new thread called "Grading System," and mods can feel free to move it there once folks have had a chance to get a look at the stuff of general interest).

I would love to see you replace the 2 through 8 grade scale with a 20 through 80, which would make your evaluations directly comparable to BA, MLB, and FB (among others, no doubt). That would be hugely useful.

(BA and MLB grades are realistic ceiling plus risk of not reaching it, rather than an Overall Future Production, but you can convert their system to OFP fairly well by subtracting 5 points for High risk and 10 for Very High or Extreme).
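
The conversion in that parenthetical is simple enough to write down. Here is a rough Python sketch of it; the names are hypothetical, and treating any risk label not mentioned above as a zero adjustment is my assumption, not something BA or MLB specify.

```python
# Rule-of-thumb conversion from a ceiling grade plus risk label to an
# approximate OFP grade. Only the three penalties named in the post are known.
RISK_PENALTY = {
    "high": 5,
    "very high": 10,
    "extreme": 10,
}


def ceiling_to_ofp(ceiling_grade: int, risk: str) -> int:
    """Subtract the risk penalty from the ceiling grade.

    Assumption: risk labels not listed in RISK_PENALTY get no adjustment.
    """
    penalty = RISK_PENALTY.get(risk.strip().lower(), 0)
    return ceiling_grade - penalty


if __name__ == "__main__":
    print(ceiling_to_ofp(60, "High"))
    print(ceiling_to_ofp(60, "Extreme"))
```

So a 60 ceiling with High risk reads as roughly a 55 OFP, and the same ceiling with Extreme risk reads as roughly a 50.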

The short version of this argument: a grading system that gives Bobby Dalbec and Tanner Houck, who already have had significant MLB success and shown serious tools, the same grade as Jay Groome and Also Ramirez (that's a typo, but it's also a good nickname!) is clearly too coarse-grained to do the job it's designed for. I just have to say those names and the distinction between the two pairs of prospects is clear and anything but negligible.

So, why not promote Dalbec and Houck to 5’s? Because now they have the same grade as Triston Casas, which is just as obviously wrong, and for the same reason. Our minds can easily handle distinctions among groups of prospects that have double the discriminatory power of a simple 8 grade system, which in practice is really 4 grades, given the rarity of 7’s here (has it ever happened?).

Let’s start by verifying if there’s a WAR-based statistical rationale for the current descriptions of all the grades, and perhaps fine-tune the definitions and/or improve the descriptions thereby. I did hitters first and will get to pitchers soon [or never!].

I used fWAR from 2017-2019 as my data. I chose Kiley McDaniel’s descriptions of the grades. What I discovered was that if you defined 80 as 4 standard deviations above average, rather than the 3 as it is for tools, it matches beautifully. And this makes sense, because when you add up a set of tools, you get extra variance.
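
To make the 3-versus-4 standard deviation point concrete, here is a small Python sketch. It assumes the usual convention that 50 is exactly average and that grades scale linearly with standard deviations in between; the function names are invented for illustration.

```python
# Map a z-score (standard deviations above average) to a 20-80 grade,
# given how many standard deviations an 80 is defined to be.
# Assumption: 50 is average and the scale is linear.

def grade_from_z(z: float, sd_at_80: float) -> float:
    """Linear grade: 50 at z = 0 and 80 at z = sd_at_80."""
    return 50 + 30 * z / sd_at_80


def tool_grade(z: float) -> float:
    """Individual tools: 80 is 3 SD above average (10 points per SD)."""
    return grade_from_z(z, 3.0)


def overall_grade(z: float) -> float:
    """Overall value: 80 is 4 SD above average (7.5 points per SD)."""
    return grade_from_z(z, 4.0)


if __name__ == "__main__":
    for z in (0.0, 2.0, 3.0, 4.0):
        print(z, tool_grade(z), overall_grade(z))
```

Under those assumptions a player 2 SD above average overall is a 65, while a tool 2 SD above average is a 70.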

75 is described as top 2-3 players. Mookie and Aaron Judge are the only guys who score 75—and they rank second and third.

70 is a top 5 player. Rendon, Yelich, and Bregman are 70.

But here’s the best part. 80 means best player in MLB, but Mike Trout’s an 85. And obviously an 85 score is meaningless for scouting … but just as obviously, if an 80 means, best guy in baseball, future HOF, you need an 85 for maybe the best player in MLB history.

A 65 is described as All-Star. There are 20 guys with that grade or better, which is 2.4 players per position (counting 8.5 positions). So if you refine that to “Perennial All-Star,” this is spot-on. It’s 1.2 guys per position per league, which means that in an off year you’re likely a reserve rather than starting.

A 60 is described as “Plus,” which is just the definition of the grade and not really helpful! But there are 5 guys per position this good, one more than you can fit on an All-Star team. So this can be described as “All-Star Candidate.”

A 55 is “Above Average Regular.” There are 9.6 players this good per position. Call that 10 and it’s perfect; it’s the top third of regulars. Since “Above Average” technically describes the 14th best player, what we’re talking about here is “First Division Regular.” If you’ve got one of the 10 best players at a position, there’s little thought of upgrading him.

A 50 is Average, which you knew. There are 149 players this good, 5 per team and 18 per position. What we can now say is that the 11th through 13th best players at a position are “Solid Average,” the 14th through 16th are just plain Average, and the 17th and 18th are “Fringe Average.”
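
The per-position and per-team figures are just the player counts divided by 8.5 positions or 30 teams. A quick Python sketch of that arithmetic, using only the two raw counts given above (20 players at 65 or better, 149 at 50 or better); the helper names are made up.

```python
# Convert raw player counts into per-position, per-league and per-team rates.
POSITIONS = 8.5   # position count used in the post
LEAGUES = 2       # AL and NL
TEAMS = 30        # MLB teams


def per_position(count: float) -> float:
    return count / POSITIONS


def per_position_per_league(count: float) -> float:
    return count / POSITIONS / LEAGUES


def per_team(count: float) -> float:
    return count / TEAMS


if __name__ == "__main__":
    print(round(per_position(20), 1))             # 65 or better, per position
    print(round(per_position_per_league(20), 1))  # 65 or better, per league
    print(round(per_position(149)))               # 50 or better, per position
    print(round(per_team(149)))                   # 50 or better, per team
```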

A 45 is described as “Platoon / Utility.” Really? What happened to all the below average regulars? This grade has to be “Second Division Regular.” These are the 19th through 25th best players at a position. Beyond that, you’re starting a guy who should be on the bench of a good team.

A 40 is a Bench Player, a 35 is better described as “Up-and-Down Player” than Kiley’s “Emergency Player,” and 30, 25, and 20 would be Solid, normal, and Fringe Organizational players.

Your 5 grade includes both 50 and 55 -- Casas is clearly the latter now -- and I believe it’s one of the reasons the 4.5 bucket, itself already the sole intermediate grade in the system, is bloated in range (I’ll get to the other in a moment).

By definition 50 is an average regular. That would mean you rank 11th to 20th among players at your position, with the reminder that not that much separates the top and bottom of that group. It's "solid" to "fringe" first division starters. It's a guy you're thinking about upgrading, especially in the lower stretch, but even then it's not quite perceived as a "hole." It's hard for me to say that Dalbec now projects to be the 21st to 30th best 3B in MLB, in the sense of median outcome, which is what putting a 45 on him means. He has the one weakness, which he has continually improved, and several strengths.

And the same argument goes for Houck. Justin Masterson had a 4.3 bWAR season for the Indians throwing 99.6% fastballs and sliders, so the whole "needs a third pitch" argument is dubious. And Masterson's stuff doesn't grade out nearly as well as Houck's. Yeah, command is huge, but it's hard to regress Houck's projection to being in the bottom 1/3 of MLB pitchers after what we've seen. With both him and Dalbec I can certainly see an outcome that's just below average as the median -- Dalbec as the 17th or 18th best 3B -- but putting a 45 on a guy is saying he's more likely to be in the bottom 17% of MLB players than dead-average. That doesn't seem right for either of these guys. If you were forced to pick one, you'd pick the latter.

The current problem, of course, is that if you put a 5 on Dalbec and Houck, that puts them in the same bucket as Casas, and that's obviously wrong.

(As I said in the short version) ... A system that forces you to put Dalbec and Houck either in the same bucket as Casas or in the same bucket as Groome is plainly not fine-grained enough to do its job.

But if you had a 55 for Casas, then Dalbec and Houck can be 50 where they belong.

And the system will really come in useful when we need to differentiate the 65's from the 60's.

Last Edit: Mar 31, 2021 18:46:12 GMT -5 by ericmvan

"You either need some medication or you're an a******." -- David Ortiz correctly diagnosing Bobby Valentine

philsbosoxfan
Veteran

Posts: 15,903

Thoughts on the 20-80 scale Mar 31, 2021 20:49:44 GMT -5

Post by philsbosoxfan on Mar 31, 2021 20:49:44 GMT -5

I give it an 80.

Proud survivor of a hole in the ozone layer, an ice age, a complete polar cap meltdown, a worldwide millennium computer shutdown, and multiple; solar storms, Mayan calendar dates, Nostradamus quatrains and Apocalypses.

soxinsf
Veteran

Posts: 778

Thoughts on the 20-80 scale Mar 31, 2021 23:22:59 GMT -5 via mobile

Post by soxinsf on Mar 31, 2021 23:22:59 GMT -5

For the past several decades, I have written about an industry for which rating systems and the decisions that flow from them are extremely important.

Over that entire time, there have been debates about the many variations of evaluative notations. For example, the Michelin Guide, the most important restaurant rating guide, uses a simple zero/one star/two star/three star system plus other symbols for good value meals. A competing system, also from France, uses a 20-point system and also uses the half points, so that system has a possible range of 40 different ratings.

The current 2-8 system, which uses half points when it wants to, has a maximum of 13 rating possibilities. The 20-80 system, if it uses only the fives and tens, also has a maximum of thirteen rating possibilities.

There are as many possible rating systems as there are imaginations. The 100-point system in use by Consumer Reports is also in very wide use in publications that rate all manner of alcoholic beverages.

Ultimately, however, it is not the rating system itself so much as the operator of that system that determines the value and utility of the ratings. To put it another way, a 5-rating from an unreliable source is not nearly as useful as a five from a trusted source. The same is true of similar ratings in any system.

So, what about changing from 2/8 to 20/80? Is there really a difference in potential accuracy?

ericmvan
Veteran

Supposed to be working on something more important

Posts: 8,936

Thoughts on the 20-80 scale Apr 1, 2021 1:49:26 GMT -5

Post by ericmvan on Apr 1, 2021 1:49:26 GMT -5

The current system here uses only 4.5 among the half-points. A 2 through 8 system with half points everywhere IS the 20-80 system with only the multiples of 5 used. It just has the decimal point moved.

It would indeed be less disruptive if SP simply added 3.5, 5.5, and 6.5 to the system. As I said, that allows you to put a 5.5 on Casas and Downs, which in turn allows you to put a 5 on Duran, Dalbec, Mata, Houck, and Jimenez, who are clearly in a class above Seabold, Ward, Song, Yorke, Groome, and Ramirez.
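
If anyone wants to see the equivalence spelled out, here is a short Python sketch that enumerates both scales and checks that multiplying every half-point grade by 10 gives exactly the 20-80 grades in steps of 5. By a direct count each scale has 13 possible values. The helper name is made up for illustration.

```python
from fractions import Fraction


def scale_values(low, high, step):
    """List every grade from low to high inclusive, using exact fractions."""
    values = []
    v = Fraction(low)
    while v <= high:
        values.append(v)
        v += step
    return values


half_point_scale = scale_values(2, 8, Fraction(1, 2))   # 2, 2.5, ..., 8
twenty_eighty = scale_values(20, 80, 5)                 # 20, 25, ..., 80

if __name__ == "__main__":
    print(len(half_point_scale), len(twenty_eighty))
    # Moving the decimal point maps one scale onto the other.
    print([v * 10 for v in half_point_scale] == twenty_eighty)
```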

When you put a 5.0 on a position player who is close to MLB, where the mean and median projections become close to one another, you're saying that if you let his career play out 100 times in parallel universes with different values for the currently unknown factors, the 50th best outcome has him as an average MLB player at his position. Putting a 4.5 says you get the 22nd best player at his position, which is a guy you'd be looking to relegate to a bench role if you wanted to contend.

The simple question for 5 vs. 4.5 is thus, which is more likely -- that this guy ends up as an acceptable starter at his position for a contender, or not? Would anyone bet against Duran or Dalbec being good enough to merely hold down a job on a contender? (And no, I would not have made the same argument about Will Middlebrooks!) It may not be hugely more likely than not that Dalbec and Duran become acceptable starters for a contending team, but I think it's more likely than not. That makes them 5 players, not 4.5.

soxinsf
Veteran

Posts: 778

Thoughts on the 20-80 scale Apr 1, 2021 9:56:09 GMT -5 via mobile

Post by soxinsf on Apr 1, 2021 9:56:09 GMT -5

Eric, so let’s boil it down to this: the current system would work better if it used half points whenever that helped provide a shorthand notation for the long and thoughtful word description given for each player.

Yes!
