Monday, November 25, 2019

What’s Wrong with the AIS Awards System


by Tom Waters

One of the most important functions of the American Iris Society (AIS) is to carefully evaluate new irises as they grow in gardens and decide which are worthy of commendation and can be recommended to the gardening public. This is done through a system of trained garden judges working in all geographical regions, who evaluate the irises and vote them awards.

I’ve been growing irises on and off since the 1970s, and served as a judge for many years. There have always been grumblings about the award system, from simple shaking of the head (“What were the judges thinking?”) to tales of secret regional cabals working to subvert the process. I’ve not taken much heed of such complaints, attributing them to a combination of sour grapes and the ubiquitous human inclination to complain and gossip. Although there are exceptions, I’m sure, judges I have known personally have all been honest, conscientious, and reasonably skilled and knowledgeable. They do their very best to vote for irises they deem truly worthy of recognition.

Nevertheless, I think there is a fundamental structural problem with the process of voting for AIS awards that keeps some good irises from being recognized and elevates some mediocre ones to unearned fame.

The awards system asks judges to vote following the model of a political election: an assortment of eligible candidates are placed on the ballot, and the judges are to vote for the one(s) they deem best. For this system to identify the best irises, judges need to be familiar with all or most of the candidates on the ballot. The rules state that you should not vote for an iris unless you have seen it growing in a garden (or gardens) over more than one year. Ideally, the judges should grow the irises themselves. The ideal of judges intimately familiar with all the candidates is not usually met. Often, judges have seen only a smattering of the eligible irises (particularly for early awards, such as honorable mention). They may select the best of those they are familiar with, but if they are only familiar with 10%, what of the other 90%?

When there are many names on the ballot, but only a few are actually seen and evaluated by the judges, the system is very vulnerable to a particular sort of bias. Not an intentional bias on the part of judges, but a systemic bias built into the process: the more widely grown an iris is, the more likely it is to win awards.

Consider this hypothetical. Assume there are about 400 judges voting. Iris A is bred by a famous hybridizer that many iris growers order from. It is thus widely distributed and widely grown. 350 of those judges have seen it growing in a garden. It is a nice iris, but only 10% of the judges who have seen it think it should win the award. 10% is still 35 judges! Now consider iris B, introduced through a smaller iris garden that sells only a few irises each year. Maybe only 20 judges grow iris B. But iris B is extraordinary! It is so good in every way that 90% of the judges who grow it think it should win the award! But 90% of 20 judges is just 18, so iris B gets only about half the votes of iris A, although it is clearly a much better iris.
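The arithmetic of this hypothetical can be checked in a few lines of Python. This is only a sketch: the function name is made up for illustration, and the numbers are the hypothetical ones above, not real ballot data.

```python
def votes_received(judges_who_saw: int, percent_voting_for: int) -> int:
    """Votes an iris gets when only judges who have seen it can vote for it."""
    return judges_who_saw * percent_voting_for // 100

# Hypothetical figures from the example above.
votes_a = votes_received(350, 10)  # widely grown, liked by 10% of those who saw it
votes_b = votes_received(20, 90)   # rarely grown, liked by 90% of those who saw it

print(votes_a, votes_b)  # 35 18
```

The count depends as much on how many judges saw the iris as on how much they liked it, which is the bias described above.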

Note that this undesirable result is not a consequence of anyone making bad choices, being unethical, or doing anything wrong. The hybridizers, growers, and judges are all doing their best; it’s just the way the numbers play out.

Another way to look at this phenomenon is to consider the meaning of a judge voting for an iris or not voting for an iris. Clearly, a vote for an iris means the judge thought it was the best among those seen. But what does a judge not voting for an iris mean? It can mean two very different things: the judge has evaluated the iris and found it wanting, or the judge has simply not seen the iris. Treating these two circumstances the same is a very bad idea.

In 2019, 378 judges voted for the Dykes Medal, and the iris that won received only 29 votes. That’s less than 8%. This is nothing new; it is typical of recent years. What does that mean? It is difficult for the public to be confident that this is the best iris of the year, when we don’t know what the other 349 judges thought of it. Did they love it, but just slightly preferred another iris over it? Did they think it was bad? Did they just not see it? Such ambiguous results are a direct consequence of using an election model with a long list of candidates, many of which are not familiar to most of the judges.

There is a way to address this structural bias. If we moved from an election model to a rating model, we could much more accurately identify the worthiest irises. A rating model is what is commonly used for reviews of products, businesses, restaurants, and so on. Everyone who is familiar with the product gives it a rating, and the average of those ratings is what helps future consumers decide whether the product is worthy or not.

How would a rating system for irises work? It would not have to be as elaborate as the 100-point scoring systems presented in the judges’ handbook. A rating from 1-10 would do just fine, or even a scale of 1-5 stars, like you often see in other product ratings.

Consider our two hypothetical irises again. Assume that judges who vote the iris worthy of the award rate it at 5 stars, and those who have seen it but do not vote for it rate at 3 stars. Iris A, which 350 have seen but only 10% vote for, would have an average rating of (315 x 3 + 35 x 5)/350 = 3.2. Iris B, which only 20 judges have seen but 90% vote for, would have an average rating of (2 x 3 + 18 x 5)/20 = 4.8. Iris B is the clear winner, as I think it should be.
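The same averages can be reproduced with a short Python sketch. The function name and the dictionary layout (star rating mapped to number of judges) are assumptions made for illustration, and the 3-star and 5-star values are the assumed ratings from the paragraph above.

```python
def average_rating(judges_by_stars: dict) -> float:
    """Average star rating, given a mapping of star value -> number of judges."""
    total_judges = sum(judges_by_stars.values())
    total_stars = sum(stars * count for stars, count in judges_by_stars.items())
    return total_stars / total_judges

# Hypothetical irises from the example: 3 stars = seen but not voted for,
# 5 stars = voted worthy of the award.
avg_a = average_rating({3: 315, 5: 35})  # 350 judges saw iris A
avg_b = average_rating({3: 2, 5: 18})    # 20 judges saw iris B

print(avg_a, avg_b)  # 3.2 4.8
```

Because each average is divided by the number of judges who actually rated the iris, how widely it is grown no longer decides the outcome.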

In this system, judges would enter a rating for every iris they have evaluated. They would not have to pick the single best one to receive an award. They could rate any number of irises highly, and if they saw some with serious faults, they could give them low ratings, which would bring the average rating down and make it much less likely for these poorer irises to win awards, no matter how widely grown they are.

Judges would not enter a rating for irises they had not evaluated. So their not having seen it would not penalize the iris, since it would not affect its average rating at all. A non-rating (from not having seen the iris) would have a very different consequence from a low rating (the judge evaluated the iris and found it unworthy).
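A small Python sketch makes the difference concrete. Here `None` stands for "not seen"; that convention, the function name, and the sample ratings are all assumptions for illustration rather than part of any actual AIS ballot.

```python
from statistics import mean

def average_of_entered(ballots):
    """Average the ratings judges actually entered; None marks 'not seen'."""
    entered = [stars for stars in ballots if stars is not None]
    return mean(entered) if entered else None

seen_only = [5, 5, 4, 5, 5]             # five judges rated the iris
plus_unseen = seen_only + [None] * 300  # 300 more judges never saw it
plus_low = seen_only + [1] * 5          # five more judges saw it and rated it poorly

print(average_of_entered(seen_only))    # 4.8
print(average_of_entered(plus_unseen))  # 4.8, unchanged by the non-ratings
print(average_of_entered(plus_low))     # 2.9
```

Not being seen leaves the average alone, while being seen and found wanting pulls it down.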

If such a system were implemented, some additional considerations would probably have to come into play. We might want the iris to be rated by some minimum number of judges before we would trust the average and give it an award, for example. We could also use this system to check for consistent performance in geographical areas, if that were deemed desirable. We could also demand a certain minimum average rating (say 4, perhaps), so that if no candidate iris were rated very highly, no award would be given.
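As one possible sketch of the first and last of these safeguards, the Python below applies a minimum number of ratings and a minimum average before picking a winner. The threshold of 15 ratings is purely hypothetical (no number is proposed here), the minimum average of 4 is the tentative figure mentioned above, and "Iris C" is an extra made-up candidate added only to show the minimum-ratings rule at work.

```python
MIN_RATINGS = 15   # hypothetical threshold, chosen only for illustration
MIN_AVERAGE = 4.0  # the tentative "say 4, perhaps" from the text

def pick_winner(ratings):
    """ratings maps iris name -> list of star ratings. Returns a name or None."""
    eligible = {}
    for name, stars in ratings.items():
        if len(stars) < MIN_RATINGS:
            continue  # too few judges to trust the average
        average = sum(stars) / len(stars)
        if average >= MIN_AVERAGE:
            eligible[name] = average
    if not eligible:
        return None  # no candidate rated highly enough, so no award
    return max(eligible, key=eligible.get)

ratings = {
    "Iris A": [3] * 315 + [5] * 35,  # average 3.2, below the minimum average
    "Iris B": [3] * 2 + [5] * 18,    # average 4.8 from 20 judges
    "Iris C": [5] * 4,               # average 5.0, but only 4 judges
}

print(pick_winner(ratings))                          # Iris B
print(pick_winner({"Iris A": ratings["Iris A"]}))    # None
```

Where exactly to set the thresholds would be a policy decision; the sketch only shows that both rules are simple to apply once ratings are collected.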

Under the current system, I think the training and skill of the judges are largely wasted. They evaluate many irises over the course of the year, and form opinions about each one. That information is lost when they are instructed to simply vote for the best one. Every time a judge rates an iris favorably, its chance of receiving an award should go up; every time a judge rates an iris unfavorably, its chance should go down. Not being seen should not be a penalty.

A rating system would also encourage new hybridizers, as it would give us a way to recognize really exceptional irises that aren’t introduced through the big growers. It would allow hybridizers to build their reputation by receiving awards for quality work, rather than receiving awards because of an established reputation. Established hybridizers would not be much hurt by such a change; they still have the advantage of large, extended breeding programs and experience in recognizing quality seedlings. They don’t need the additional advantage of distribution bias to have a fair chance at awards.

I hope this post stimulates some discussion on the topic of our awards system and the consequences of structuring it as we have. I see the potential to improve the system in a way that makes it more fair to all new irises, more useful and credible with the gardening public, more supportive of new hybridizers, and more conscientious in reflecting the careful evaluation work of our judges.


13 comments:

  1. This makes a great deal of sense! Outstanding, and I hope it is considered by the AIS.

  2. Excellent idea. Who should we talk to to get it done?

  3. What a great idea Tom. I hope AIS will consider doing such a rating system.

  4. Sounds like an ideal way to judge all irises fairly! Great idea!

  5. I had no idea that a rating system was NOT used! No wonder I questioned the quality of some of my iris. I wish you luck on getting the change implemented and thank you for a well thought out and well written article.

  6. This does sound like a plausible concept as long as an iris with few votes but high rankings can be kept accurate with no possibility of cheating during the ranking process. Worth looking into in my opinion! verderandy

  7. Thank you, Tom. It's wonderful that at last someone has publicly recognized that elephant in the room. Now is there any way to remove fashion and personal preference from the judging? Prize-winners should be great garden plants in as many climates as possible, and should be significantly different from what has come before. Someday a breeder will recognize that ruffles aren't necessarily required for beauty, that there's such a thing as simple elegance, but as long as judges are allowed to express preference in the degree of ruffling, we'll never get tailored irises with desirable modern characteristics.

  8. Tom, thank you for an outstanding and thought-provoking post. Changing to a ratings model makes a great deal of sense and I hope the AIS will work to reform the awards system. As you pointed out, smaller hybridizers whose irises aren't widely distributed have almost no chance, and the same group of larger hybridizers tend to win year after year. I'm not knocking their irises, but how many wonderful new irises are we missing out on because they can't compete in the current system? Thanks again for your post. - Cathy Egerer

  9. This suggestion is an excellent one for improving the awards process!! Thank you Tom. I also wish there were some way to limit the introduction of varieties not of any significant difference from things already on the market. There are now probably hundreds of iris introductions that no one grows, not only because they have not received recognition and awards, but because they are not significantly different from others already introduced. Though I realize such limiting of introductions is probably not practical, it would, in my humble opinion, have several benefits - fewer varieties for judges to have to keep abreast of, more accuracy of judging at shows (almost every year we have varieties entered under the wrong name and that not discovered until after the judging), and ultimately probably more income for hybridizers and easier decisions for buyers who tend to buy mostly "something new" that they recognize has merit and/or show winning qualities. (I can offer this viewpoint because I am one who, as a novice, introduced something I should not have because a few people in our club liked it though, while different, it was basically inferior to both of its parents). But I also agree with Nancy McDonald that excellent, beautiful self-colored iris of classic form - "simple elegance" as Nancy put it - get overlooked for awards because judges are looking for the fancy, exotic and unique blooms, though they often do not grow well except on the West Coast. "Absolute Treasure" is an example that comes to mind of a "perfect" iris that, though it did get recognition, in my opinion should have gotten the Dykes Medal. But here, again, I guess, we come into the realm of personal preference. Anyway, Tom's idea merits wide acceptance.

  10. Bonjour,

    Even if the word "impossible" is not supposed to exist in the French language, I would dare say that I find it impossible not to agree with the content of Tom's article.

    Now comes the interesting question of the rating system itself. I immediately think about the voting cards used by the judges in iris competitions. Their criteria and sometimes criteria coefficients may vary according to the country organizing the competition.

    Since judges worldwide are asked to vote, could this be an opportunity to create a basis for a unique voting rating system/card? Florence Darthenay.



  11. What an improvement this would be. Well done Tom.

  12. I also like the rating system very much. I have often thought how unfair it was to say an iris was the best when I had never gotten to see so very many that were on the ballot. How could I possibly say in all truthfulness that it is the best when I had not had the opportunity to compare!!
    Cynthia Wade Region 22

  13. This makes so much sense and would level the playing field for smaller hybridizers. I hope the AIS considers the change.
