by Tom Waters
One of the most important functions of the American Iris
Society (AIS) is to carefully evaluate new irises as they grow in gardens and
decide which are worthy of commendation and can be recommended to the gardening
public. This is done through a system of trained garden judges working in all
geographical regions, who evaluate the irises and vote them awards.
I’ve been growing irises on and off since the 1970s, and
served as a judge for many years. There have always been grumblings about the award
system, from simple shaking of the head (“What were the judges thinking?”) to
tales of secret regional cabals working to subvert the process. I’ve not taken
much heed of such complaints, attributing them to a combination of sour grapes
and the ubiquitous human inclination to complain and gossip. Although there are
exceptions, I’m sure, judges I have known personally have all been honest,
conscientious, and reasonably skilled and knowledgeable. They do their very
best to vote for irises they deem truly worthy of recognition.
Nevertheless, I think there is a fundamental structural problem with the process of
voting for AIS awards that keeps some good irises from being recognized and
elevates some mediocre ones to unearned fame.
The awards system asks judges to vote following the model of
a political election: an assortment of eligible candidates are placed on the
ballot, and the judges are to vote for the one(s) they deem best. For this
system to identify the best irises, judges need to be familiar with all or most
of the candidates on the ballot. The rules state that you should not vote for
an iris unless you have seen it growing in a garden (or gardens) over more than
one year. Ideally, the judges should grow the irises themselves. The ideal of
judges intimately familiar with all the candidates is not usually met. Often,
judges have seen only a smattering of the eligible irises (particularly for
early awards, such as honorable mention). They may select the best of those
they are familiar with, but if they are only familiar with 10%, what of the
other 90%?
When there are many names on the ballot, but only a few are
actually seen and evaluated by the judges, the system is very vulnerable to a
particular sort of bias. Not an intentional bias on the part of judges, but a
systemic bias built into the process: the more widely grown an iris is, the
more likely it is to win awards.
Consider this hypothetical. Assume there are about 400
judges voting. Iris A is bred by a famous hybridizer that many iris growers
order from. It is thus widely distributed and widely grown. 350 of those judges
have seen it growing in a garden. It is a nice iris, but only 10% of the judges
who have seen it think it should win the award. 10% is still 35 judges! Now
consider iris B, introduced through a smaller iris garden that sells only a few
irises each year. Maybe only 20 judges grow iris B. But iris B is
extraordinary! It is so good in every way that 90% of the judges who grow it
think it should win the award! But 90% of 20 judges is just 18, so iris B gets
only about half the votes of iris A, although it is clearly a much better iris.
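Here is a minimal sketch of that arithmetic in Python; the numbers are the made-up ones from the hypothetical above, not real award data, and the function and variable names are my own, purely for illustration:

# Under the election model, an iris's vote total is simply
# (judges who have seen it) x (fraction of those judges who vote for it).
def election_votes(judges_who_saw_it, fraction_voting_for_it):
    return judges_who_saw_it * fraction_voting_for_it

iris_a = election_votes(350, 0.10)  # widely grown, modestly admired
iris_b = election_votes(20, 0.90)   # scarcely grown, greatly admired
print(iris_a, iris_b)               # 35.0 18.0 -- the better iris loses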
Note that this undesirable result is not a consequence of
anyone making bad choices, being unethical, or doing anything wrong. The hybridizers, growers, and judges are all doing their best; it’s just the way the numbers play
out.
Another way to look at this phenomenon is to consider the
meaning of a judge voting for an iris or not voting for an iris. Clearly, a
vote for an iris means the judge thought it was the best among those seen. But
what does a judge not voting for an
iris mean? It can mean two very different things: it can mean the judge has
evaluated the iris and found it wanting, or it can simply mean the judge has
not seen the iris. These are two very
different circumstances, and treating them the same is a very bad idea.
In 2019, 378 judges voted for the Dykes Medal, and the iris
that won received only 29 votes. That’s less than 8%. This is nothing new; it
is typical of recent years. What does that mean? It is difficult for the public
to be confident that this is the best iris of the year, when we don’t know what
the other 349 judges thought of it. Did they love it, but just slightly
preferred another iris over it? Did they think it was bad? Did they just not
see it? Such ambiguous results are a direct consequence of using an election
model with a long list of candidates, many of which are not familiar to most of
the judges.
There is a way to address this structural bias. If we moved
from an election model to a rating model, we could much more
accurately identify the worthiest irises. A rating model is what is commonly
used for reviews of products, businesses, restaurants, and so on. Everyone who
is familiar with the product gives it a rating, and the average of those
ratings is what helps future consumers decide whether the product is worthy or
not.
How would a rating system for irises work? It would not have
to be as elaborate as the 100-point scoring systems presented in the judges’
handbook. A rating from 1-10 would do just fine, or even a scale of 1-5 stars,
like you often see in other product ratings.
Consider our two hypothetical irises again. Assume that
judges who vote the iris worthy of the award rate it at 5 stars, and those who have seen it but do not vote for it rate it at 3 stars. Iris A, which 350 have seen but only 10%
vote for, would have an average rating of (315 x 3 + 35 x 5)/350 = 3.2. Iris B,
which only 20 judges have seen but 90% vote for, would have an average rating
of (2 x 3 + 18 x 5)/20 = 4.8. Iris B is the clear winner, as I think it should
be.
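The same hypothetical figures can be run through the rating calculation as a sketch; the 5-star and 3-star values are just the simplifying assumption stated above, and the function is mine, not anything the AIS uses:

def average_rating(judges_who_saw_it, fraction_who_would_vote_for_it):
    # Assume judges who would have voted for the iris rate it 5 stars,
    # and judges who saw it but would not have voted for it rate it 3 stars.
    five_star = round(judges_who_saw_it * fraction_who_would_vote_for_it)
    three_star = judges_who_saw_it - five_star
    return (five_star * 5 + three_star * 3) / judges_who_saw_it

print(average_rating(350, 0.10))  # 3.2 for iris A
print(average_rating(20, 0.90))   # 4.8 for iris B -- the better iris now comes out ahead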
In this system, judges would enter a rating for every iris
they have evaluated. They would not have to pick the single best one to receive
an award. They could rate any number of irises highly, and if they saw some
with serious faults, they could give them low ratings, which would bring the
average rating down and make it much less likely for these poorer irises to win
awards, no matter how widely grown they are.
Judges would not enter a rating for irises they had not
evaluated. So their not having seen it would not penalize the iris, since it
would not affect its average rating at all. A non-rating (from not having seen
the iris) would have a very different consequence from a low rating (the judge
evaluated the iris and found it unworthy).
If such a system were implemented, some additional
considerations would probably have to come into play. We might want the iris to
be rated by some minimum number of judges before we would trust the average and
give it an award, for example. We could also use this system to check for
consistent performance in geographical areas, if that were deemed desirable. We
could also demand a certain minimum average rating (say, 4), so that if
no candidate iris were rated very highly, no award would be given.
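As a rough sketch of how those safeguards might fit together (the thresholds here, ten ratings and a 4.0 minimum average, are placeholders for illustration, not a concrete proposal):

def eligible_for_award(ratings, min_ratings=10, min_average=4.0):
    # Require that enough judges have actually rated the iris, and that its
    # average rating clears a minimum bar. An iris no judge has rated simply
    # has an empty list here -- it is neither helped nor hurt by being unseen.
    if len(ratings) < min_ratings:
        return False
    return sum(ratings) / len(ratings) >= min_average

print(eligible_for_award([5, 5, 4, 5, 5, 4, 5, 5, 5, 4]))  # True: ten ratings, average 4.7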
Under the current system, I think the training and skill of
the judges are largely wasted. They evaluate many irises over the course of the
year, and form opinions about each one. That information is lost when they are
instructed to simply vote for the best one. Every time a judge rates an iris
favorably, its chance of receiving an award should go up; every time a judge
rates an iris unfavorably, its chance should go down. Not being seen should not
be a penalty.
A rating system would also encourage new hybridizers, as it
would give us a way to recognize really exceptional irises that aren’t introduced
through the big growers. It would allow hybridizers to build their reputation
by receiving awards for quality work, rather than receiving awards because of
an established reputation. Established hybridizers would not be much hurt by
such a change; they would still have the advantage of large, extended breeding
programs and experience in recognizing quality seedlings. They don’t need the
additional advantage of distribution bias to have a fair chance at awards.
I hope this post stimulates some discussion on the topic of
our awards system and the consequences of structuring it as we have. I see the
potential to improve the system in a way that makes it more fair to all new
irises, more useful and credible with the gardening public, more supportive of
new hybridizers, and more conscientious in reflecting the careful evaluation work
of our judges.
Comments:

This makes a great deal of sense! Outstanding, and I hope it is considered by the AIS.
Excellent idea. Who should we talk to to get it done?
What a great idea, Tom. I hope AIS will consider doing such a rating system.
Sounds like an ideal way to judge all irises fairly! Great idea!
I had no idea that a rating system was NOT used! No wonder I questioned the quality of some of my iris. I wish you luck on getting the change implemented and thank you for a well-thought-out and well-written article.
This does sound like a plausible concept, as long as an iris with few votes but high rankings can be kept accurate, with no possibility of cheating during the ranking process. Worth looking into, in my opinion! - verderandy
Thank you, Tom. It's wonderful that at last someone has publicly recognized that elephant in the room. Now is there any way to remove fashion and personal preference from the judging? Prize-winners should be great garden plants in as many climates as possible, and should be significantly different from what has come before. Someday a breeder will recognize that ruffles aren't necessarily required for beauty, that there's such a thing as simple elegance, but as long as judges are allowed to express preference in the degree of ruffling, we'll never get tailored irises with desirable modern characteristics.
Tom, thank you for an outstanding and thought-provoking post. Changing to a ratings model makes a great deal of sense and I hope the AIS will work to reform the awards system. As you pointed out, smaller hybridizers whose irises aren't widely distributed have almost no chance, and the same group of larger hybridizers tend to win year after year. I'm not knocking their irises, but how many wonderful new irises are we missing out on because they can't compete in the current system? Thanks again for your post. - Cathy Egerer
This suggestion is an excellent one for improving the awards process! Thank you, Tom. I also wish there were some way to limit the introduction of varieties not of any significant difference from things already on the market. There are now probably hundreds of iris introductions that no one grows, not only because they have not received recognition and awards, but because they are not significantly different from others already introduced. Though I realize such limiting of introductions is probably not practical, it would, in my humble opinion, have several benefits - fewer varieties for judges to have to keep abreast of, more accuracy of judging at shows (almost every year we have varieties entered under the wrong name, and that is not discovered until after the judging), and ultimately probably more income for hybridizers and easier decisions for buyers who tend to buy mostly "something new" that they recognize has merit and/or show-winning qualities. (I can offer this viewpoint because I am one who, as a novice, introduced something I should not have because a few people in our club liked it, though, while different, it was basically inferior to both of its parents.) But I also agree with Nancy McDonald that excellent, beautiful self-colored irises of classic form - "simple elegance" as Nancy put it - get overlooked for awards because judges are looking for the fancy, exotic and unique blooms, though they often do not grow well except on the West Coast. "Absolute Treasure" is an example that comes to mind of a "perfect" iris that, though it did get recognition, in my opinion should have gotten the Dykes Medal. But here, again, I guess, we come into the realm of personal preference. Anyway, Tom's idea merits wide acceptance.
ReplyDeleteBonjour,
Even if the word "impossible" is not supposed to exist in the French language, I would dare say that I find it impossible not to agree with the content of Tom's article.
Now comes the interesting question of the rating system itself. I immediately think about the voting cards used by the judges in iris competitions. Their criteria and sometimes criteria coefficients may vary according to the country organizing the competition.
Since judges worldwide are asked to vote, could this be an opportunity to create a basis for a unique voting rating system/card? - Florence Darthenay
What an improvement this would be. Well done, Tom.
I also like the rating system very much. I have often thought how unfair it was to say an iris was the best when I had never gotten to see so very many that were on the ballot. How could I possibly say in all truthfulness that it is the best when I had not had the opportunity to compare!
ReplyDeleteCynthia Wade Region 22
This makes so much sense and would level the playing field for smaller hybridizers. I hope the AIS considers the change.