Stack Ranking Destroyed the Best Team I Ever Worked On
TL;DR / Key Takeaways
- Stack ranking forces a fixed percentage of employees into "below expectations" even when the entire team performs well, making the rating mathematically arbitrary.
- Research from Microsoft's own post-mortem and academic work on forced distributions shows stack ranking reliably destroys knowledge sharing and collaborative behavior.
- The damage is not just morale. It's structural. Teams optimize for internal competition instead of external output.
- Automated systems like the Predict & Profit trading bot have no politics. The edge calculation doesn't know your name and doesn't care who your manager likes.
I was on a team once where everyone was genuinely good.
Not "good for a government contractor" good. Not "good considering the budget" good. Actually competent, senior engineers who could scope a problem, write clean code, and ship without being babysat. I have worked on maybe four teams like that in thirty years. This was one of them.
Then the company rolled out forced distribution reviews. Fifteen percent must exceed expectations. Seventy percent must meet them. Fifteen percent must fall below.
Every cycle, somebody good had to lose.
What Forced Distribution Actually Is
Stack ranking, forced distribution, vitality curve. Microsoft called it the curve. Amazon ran its version through the Organization Level Review. GE under Jack Welch made it famous with the 20-70-10 vitality curve and called removing the bottom ten percent "managing out."
The mechanism is always the same. You take your team, rank everyone from best to worst, and assign predetermined outcome percentages regardless of absolute performance. If you have ten engineers and all ten are excellent, one of them is still below expectations. The math demands it.
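The mechanism described above fits in a dozen lines. This is a toy sketch, not any company's actual HR system; the names, scores, and percentages are hypothetical.

```python
# Toy illustration of forced distribution: ratings depend only on
# relative rank within the team, never on absolute performance.

def force_distribute(scores, pct_exceeds=0.15, pct_below=0.15):
    """Assign ratings by rank position, ignoring absolute scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    n_exceeds = round(n * pct_exceeds)
    n_below = round(n * pct_below)
    ratings = {}
    for i, (name, _) in enumerate(ranked):
        if i < n_exceeds:
            ratings[name] = "exceeds"
        elif i >= n - n_below:
            ratings[name] = "below"
        else:
            ratings[name] = "meets"
    return ratings

# Ten engineers, all scoring 88-94 on an absolute scale where 70 is a
# solid "meets". The curve still brands two of them "below".
team = {f"eng{i}": 88 + i * 0.7 for i in range(10)}
print(force_distribute(team))
```

Note that nothing in the function ever compares a score to a standard. Swap in a team of ten disasters and the output shape is identical.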
This is not a performance management system. It is a population control system dressed up in HR language.
The Research Is Not Ambiguous
This is not a matter of opinion. The organizational behavior research on stack ranking is pretty consistent.
Kurt Eichenwald's 2012 Vanity Fair piece, "Microsoft's Lost Decade," cited stack ranking as a central cause of the company's decade-long drift from technical leadership. Engineers described spending as much energy managing their relative standing as doing actual engineering work. Collaboration collapsed because sharing knowledge with a peer made that peer more competitive against you.
A study published in the Academy of Management Journal found that forced distribution systems reduce employee cooperation and increase counterproductive work behaviors. The effect size was not small. Teams under forced distributions showed measurably worse knowledge transfer than teams under absolute rating systems.
Samuel Culbert, a UCLA professor who spent years studying performance reviews, wrote a book called "Get Rid of the Performance Review!" and was fairly direct about it: the review serves the manager's need for control, not the company's need for performance.
Microsoft killed stack ranking in 2013. They said publicly it was hurting collaboration. Amazon has modified its system multiple times. The people who ran these experiments at scale eventually walked them back. That should tell you something.
What Happened to My Team
The rollout was gradual. First cycle, people were nervous but figured the ratings would shake out roughly fairly. Most people rated meets expectations. A few people got dinged and were surprised but not devastated.
Second cycle, the behavior started to shift.
The engineer who used to walk over and help you debug something started being slightly less available. Not dramatically. Just a five-minute conversation became a two-minute conversation. The informal knowledge sharing that made the team fast started to thin out.
By the third cycle, people were actively managing optics. Sending emails to create paper trails. Making sure their work was visible in the right meetings. Volunteering for high-visibility projects and quietly deprioritizing important but unglamorous work.
The unglamorous work, of course, is most of what keeps systems running.
The Mathematics of Arbitrary Failure
Here is the part that should make any engineer furious.
If your team has ten people and all ten perform at a level that would earn "meets expectations" under any reasonable absolute standard, forced distribution still puts one or two of them below expectations. The rating carries no information about absolute performance. It only carries information about relative rank within that specific team at that specific moment.
This means two engineers doing identical work at identical quality levels can receive different ratings simply because one of them happened to be on a stronger team. The person who rates "below expectations" on a strong team might rate "exceeds expectations" on a weaker one.
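You can demonstrate the team-dependence in a few lines. Again, a toy sketch with hypothetical names and scores: the same absolute output, run through the same curve, on two different teams.

```python
# Toy demonstration: identical performance, opposite ratings, purely
# from team composition. Names, scores, and cutoffs are hypothetical.

def rating(name, team_scores, pct=0.15):
    """Rate one member by rank percentile within their team."""
    ranked = sorted(team_scores, key=team_scores.get, reverse=True)
    pos = ranked.index(name) / (len(ranked) - 1)  # 0.0 = top, 1.0 = bottom
    if pos <= pct:
        return "exceeds"
    if pos >= 1 - pct:
        return "below"
    return "meets"

score = 85  # the same absolute output in both cases
strong_team = {"you": score, **{f"peer{i}": 90 + i for i in range(9)}}
weak_team = {"you": score, **{f"peer{i}": 60 + i for i in range(9)}}

print(rating("you", strong_team))  # last on a strong team
print(rating("you", weak_team))    # first on a weak team
```

One input, two opposite labels. Any metric that behaves this way is measuring the roster, not the person.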
The number is not measuring what it claims to measure.
And then the company makes compensation decisions, promotion decisions, and termination decisions based on that number. The feedback loop becomes self-reinforcing. People who are good at internal competition get promoted into management and design the next cycle of the same system.
What Collaboration Is Actually Worth
The thing that made that team good was not that any individual was a genius. It was that we had built a culture where you could ask a dumb question out loud and get a real answer instead of a political one.
That culture has real economic value. It compresses debugging time. It prevents the kind of siloed architectural decisions that create ten-year maintenance debt. It catches the "wait, that assumption is wrong" conversation before it becomes a production incident at 2am.
Forced distribution destroys that culture because it makes every interaction have a hidden competitive subtext. You are not just helping your colleague. You are deciding whether to share something that might make them rank above you.
The value of the collaboration does not show up in any single engineer's rating. So the system cannot see it, cannot reward it, and systematically punishes the behaviors that produce it.
The Contrast That Made Me Build Something Else
I have thought about this a lot since I started building automated trading systems.
The bot I run does not know my name. It does not know my manager's name. It does not know who gave a good presentation in Q3 or who has lunch with the VP. It pulls ensemble forecasts from four independent models, calculates an edge against market pricing, and either the edge clears the threshold or it does not.
The winning trade does not care who wrote the algorithm.
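The decision rule is almost embarrassingly simple to sketch. To be clear, this is an illustration and not the actual Predict & Profit code: the forecast values, the threshold, and the structure are all assumptions.

```python
# Minimal sketch of an ensemble edge check. Illustrative only: the
# model outputs, market price, and 0.05 threshold are hypothetical,
# not the real Predict & Profit implementation.

def edge(ensemble_probs, market_prob):
    """Edge = average model probability minus market-implied probability."""
    return sum(ensemble_probs) / len(ensemble_probs) - market_prob

def decide(ensemble_probs, market_prob, threshold=0.05):
    """Trade only when the edge clears the threshold. No curve, no politics."""
    e = edge(ensemble_probs, market_prob)
    return ("trade", e) if e >= threshold else ("pass", e)

# Four independent model forecasts vs. a market-implied probability.
print(decide([0.62, 0.58, 0.65, 0.60], market_prob=0.52))  # edge clears
print(decide([0.50, 0.52, 0.51, 0.53], market_prob=0.52))  # edge does not
```

There is no input for who wrote the models, who presented well last quarter, or who the threshold likes. That absence is the whole point.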
That is not a small thing after thirty years of watching good work turn invisible because the person doing it was not good at office politics. The Predict & Profit bot has generated real returns, logged every decision, and never once factored in whether I remembered to CC the right people on a status update.
There is something clarifying about a system that is purely outcome-based. Not inspirational. Just clarifying. The edge calculation is the edge calculation. Pass or fail. No curve.
Why Companies Keep Doing It
If the research is this clear, why do large companies still use some version of forced distribution?
A few reasons, none of them good.
It makes termination easier to defend legally. If you have documented that someone rated in the bottom fifteen percent for two cycles, the legal exposure of letting them go drops. The system is partly a paper trail generator.
It gives managers a forced mechanism for having hard conversations they would otherwise avoid. If the curve requires one person to be below expectations, the manager cannot just give everyone a "meets" and move on. The system forces action. That the forced action is often wrong does not change the fact that it spares the manager from owning the decision.
And it creates the appearance of rigor. Numbers and percentages and quartiles look like data. They look like the company is being analytical rather than political. They are not. They are politics with a spreadsheet attached.
The Team Eventually Fell Apart
The engineers who were best at internal competition got promoted. The engineers who were best at actual engineering left for places with better cultures. A few people who got repeatedly dinged in the bottom tier internalized it and started underperforming in ways that matched the rating they had been assigned.
That last one is the most corrosive. You tell a good engineer they are below expectations enough times, some of them start to believe you. The rating becomes a self-fulfilling prophecy.
Within two years the team I described at the start of this post was unrecognizable. Not because the people got worse. Because the incentive structure had selected for a completely different type of behavior.
I stayed too long. I kept thinking the culture would correct. It did not.
What I Would Tell a Younger Engineer
Document your own work obsessively, not for political reasons but because your memory is unreliable and your manager's memory is worse. Keep a running log of what you shipped, what you fixed, what you caught before it became a problem.
Build relationships outside your immediate team so your rating conversation is not entirely controlled by one manager who may or may not have a complete picture of what you do.
And if you are on a team that has just rolled out forced distribution, start looking. Not because you will necessarily be the one who gets hurt. Because the culture you are in is now slowly eating itself, and by the time it becomes obviously toxic it will already be too late to leave gracefully.
The Honest Closer
I built the Predict & Profit bot partly because I was tired of systems where the output of your work and the recognition of your work had almost no relationship to each other. The bot is not a perfect system. It loses trades. It has bugs. But it wins or loses based on the edge calculation, not on who liked who.
Thirty years in corporate taught me what a pure outcome signal feels like by contrast. It feels like relief.