Performance Appraisals, an Inspector General, and the Bell Curve

Performance appraisal season is approaching for federal employees. The author says that past performance ratings do not reliably predict positive changes for the future and offers some supporting evidence as to why.

’Tis the season

The performance appraisal season has arrived for hundreds of thousands of Federal employees.  Ratings will be assigned to civil servants across the country and the world.  Some ratings will be used to determine salary increases, others bonuses/awards, some both, and many neither.  Those payouts, however, are drying up in most of the Executive Branch as a pay freeze and budget cuts take effect.

All of these appraisals will focus on a year’s worth of performance that will have already passed.  According to the late Dr. W. Edwards Deming, a brilliant statistician and management thinker, it is unlikely that all of this time and effort will lead to tangible improvements in the coming fiscal year.  He found that past ratings do not reliably predict positive changes for the future.  I haven’t found evidence that he was wrong.

Throwing darts and the bell curve

Most of the appraisals coming up next month, however, will be arrived at without much, if any, actual evidence obtained and recorded during the rating year.  Dr. Deming would be surprised to learn that Federal supervisors and managers commonly lack sufficient documentation to grade employees objectively.

As students of appraisal know, subjective ratings are susceptible to factors other than the past year’s achievement.  They may be unduly influenced by recent events, by personal friendships and alliances, and by other biases – most of them unconscious.  In fact, numerous studies of the evaluation process have shown that impressions of other people can be subject to any number of “non-merit” factors.

Another likely influence will be the “bell curve.”  The name comes from the bell shape of a distribution in which people are clustered around a mean or central rating/outcome.  Federal agencies, however, may not impose such a “forced distribution” of ratings.  To do so would violate Federal regulations.  5 CFR 430.208(c) states in part,

“The method for deriving and assigning a summary level may not limit or require the use of particular summary levels (i.e., establish a forced distribution of summary levels).”

It’s not your year

The use of a bell-curve philosophy of performance ratings has always been a curiosity to me.  In most agencies it commonly works like this:

  1. An agency declares that goals, strategic plans, and objectivity are what drive performance evaluations.
  2. Many experts, HR professionals, and senior leadership profess that “objective measures” can and should be developed for any and every job.
  3. Managers and supervisors who are closer to where work actually gets accomplished see that objective metrics create half a picture (at best) of actual job performance.  Moreover, such metrics demand that considerable time and effort be spent documenting employees’ work.
  4. Directives eventually provide a way out by: a)  Using vague “generic” or “benchmark” standards (or “contributing factors”) which can lead back to subjective ratings; and/or b)  Suggesting that higher level standards not be defined in writing.

For a number of reasons, performance ratings tend to rise over the years.  In other words, the curve bulges to the right (due to higher performance ratings) over time.  Eventually, senior management and/or HR become alarmed that too many employees are being evaluated as above average.  Knowing that ratings in most agencies are anchored more to impressions than data, the “quota system” or “bell curve” is enforced… unofficially, of course.

I recall being told, “It’s not your year.  You got one [Outstanding rating] last year.  It’s someone else’s turn.”  This was an honest supervisor responding to the bell curve limitations being imposed from above.  I understood and appreciated the candid explanation.  It felt better than some pained pretext.  As his subject matter expert in the area of performance appraisals, however, I also knew it violated the Code of Federal Regulations.

As if the Postal Service wasn’t in enough trouble

It was in this light that a recent report from the US Postal Service’s Office of Inspector General (OIG) caught my eye.  It seems that many postal supervisors and managers, who are evaluated as “Executive and Administrative Schedule” employees under a pay-for-performance (PFP) system, are unhappy.  On their behalf, the National Association of Postal Supervisors requested an investigation by their OIG regarding the efficacy of their evaluation process.

After examining the 2009 rating cycle, the IG found something that would startle only the most naïve among FedSmith readers.  The report concludes,

“We determined that individuals responsible for evaluating or approving sampled employees’ FY 2009 core requirement ratings were not compliant with PFP policies and procedures. Specifically, we found that managers lowered core requirement ratings in a manner inconsistent with PFP policies and procedures, which state that employees should be rated on these requirements based on agreed-upon objectives and targets and that end-of-year ratings should reflect employees’ individual achievements. In addition, managers used numeric targets to rate postmasters on their core requirements, which they are supposed to base on behavioral objectives.”

Evidence didn’t stand up to the curve

Despite the specific metric objectives touted by PFP advocates, the report found that the Postal Service still employed a quota system.  The OIG wrote:

“…46 percent of the evaluators, and 40 percent of the second-level reviewers responsible for rating 59 sampled employees lowered employee ratings because they either were instructed to do so or believed ratings should be in line with unit scores. Managers also used numeric targets to rate postmasters’ core requirements contrary to policy.”

I have run across the phenomenon of senior managers lowering performance ratings throughout my 36 years in and outside of government.  The complaint is too consistent to be coincidental.  I have heard of upper management increasing an employee’s rating, but that’s unusual.  In the case of the Postal Service, even objective measurements (that prove so elusive in practice) were trumped by the bell curve.

Assessing the costs

In the case of the USPS, how many employees win and how many lose seems to have been paramount.  The small group of winners should be well satisfied.  The majority of supervisors and managers contacted by their OIG, however, show symptoms that should be of concern to leaders throughout government.  If their PFP program was designed to motivate personnel, it doesn’t appear to be working very well.

All of this raises the question: Are appropriate rating dispersions and a Darwinist pay system actually more important to our government than the loyalty, commitment, and job satisfaction of those who did not rate among the elite?  Performance appraisal literature shows little objective evidence that validates the benefits of evaluating by the bell curve and/or tying the outcomes to monetary rewards.  In fact, there appears to be more evidence to the contrary.

Even when PFP isn’t the issue, the bell curve can adversely affect those who actually do the work of government.  I wouldn’t trust an organization where the vast majority of employees are rated above average.  By the same token, changes to ratings by managers who are least likely to know the individuals affected can be more destructive than the problem they were intended to correct.

What is the ultimate objective?

I suggest that we consider performance standards that focus more on improvement of individual performance, rather than just racking and stacking an agency’s workforce.  In an era that stresses goals, objectives, measures, and results, this can be a challenge.  Managers could focus more on how employees perform and less on outcomes that are hard to successfully attribute and quantify.

The Code of Federal Regulations reads,

“Performance standard means the management-approved expression of the performance threshold(s), requirement(s), or expectation(s) that must be met to be appraised at a particular level of performance. A performance standard may include, but is not limited to, quality, quantity, timeliness, and manner of performance.”

Little energy has been spent on the last option, manner of performance, which might concentrate attention on the ways employees approach and execute their work.

Telling an employee to write 8-10 reports per year is a quantity standard.  Demanding that no more than 2 re-writes per year be required is a quality standard.  Insisting that at least 75-85% of reports be completed by or before deadlines is a timeliness standard.  These three examples focus management’s attention on the past.  Furthermore, they require bean counting throughout the year.

The “manner of performance” alternative has not been seriously explored by the Office of Personnel Management (OPM) to date.  It’s an area that has gotten a lot of my attention since the regulations were issued by OPM decades ago.  If it can serve to point the employee toward more desirable work habits, it may begin to shift the objective of appraisals toward real value – the actual saving of time and money.  For those familiar with TQM and LEAN, manner of performance standards can be used for process improvements at the individual level.

Looking ahead

The desire to create some sort of performance competition within the Federal workplace is strong.  Add pay to the equation and the stakes get higher.  This apparently led the USPS to ignore its own measures in favor of quotas.

Even if their OIG’s conclusion is misplaced and only reflects the perceptions of disgruntled supervisors and managers, what has been gained and lost by their adoption of PFP and use of the bell curve?  For all of the consternation that led to a request for their OIG to investigate, is there evidence that employees are better at performing their jobs from one year to the next?  If so, was their pay-for-performance system the reason for such success?  Was it the cause of any failures?  I hope the IG returns in the future for a deeper exploration.

In the meantime, what has not worked in the past need not be repeated year after year.  Where the evaluation process causes more consternation (costs) than motivation (benefits), OPM, Human Capital Officers, and senior management should be looking for better options.  Focusing less on quantitative results and bell curves while doing more to anchor appraisals to the improvement of individual work habits might be a good place to begin.

About the Author

Robbie Kunreuther is the Director of Government Personnel Services (GPS). GPS provides 1 to 3-day seminars to Federal agencies in four subject areas: Dealing with performance and conduct issues; Developing sensible performance appraisal criteria; Fostering cooperative labor-management relations; and Applying mediation skills in the workplace. Over the years, Robbie has trained thousands of Federal supervisors, managers, HR specialists, and union officials. For more information about him and GPS, go to