When everything gets 4 or 5 stars, nothing is truly important. Ranking questions force respondents to choose - and that’s where real priorities surface.
Rating scales measure intensity. Ranking measures priority. If you need trade-offs, ratings won’t give them to you. People cluster their ratings at the top and you end up with ten items all scored 4.2 out of 5. Ranking makes them pick, and picking is where the data gets honest.
When to Use
Use ranking questions when you need to know what comes first, not just what’s good:
- Prioritize features - “Which improvements matter most to you?”
- Test messaging - “Which value proposition resonates best?”
- Inform roadmaps - “Rank these problems by impact on your work”
- Compare ideas - “Which concepts do you prefer?”
- Allocate resources - “Where should we invest next?”
- Segment audiences - Different groups rank differently, and that difference is the insight
Ranking Methods
Four methods. Each works for a different list size and context. The system picks one based on your option count, or you choose manually.
Drag & Drop (Sort)
Respondents drag items into their preferred order. Simple, familiar, fast.
Use it for 5 or fewer items. Beyond that, people carefully place the top 2-3, then drag the rest into random spots just to finish. The middle of a long drag-and-drop list is noise, not signal.
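The scoring behind drag-and-drop isn't spelled out here, but one common way to aggregate orderings is to average each item's position across respondents - a lower average means higher priority. A minimal sketch with made-up responses:

```python
from collections import defaultdict

# Hypothetical submitted orderings: each inner list is one respondent's
# top-to-bottom drag-and-drop order.
responses = [
    ["Faster search", "Better onboarding", "Dark mode", "Export to CSV"],
    ["Better onboarding", "Faster search", "Export to CSV", "Dark mode"],
    ["Faster search", "Export to CSV", "Better onboarding", "Dark mode"],
]

positions = defaultdict(list)
for ordering in responses:
    for position, item in enumerate(ordering, start=1):
        positions[item].append(position)

# Lower average position = ranked higher overall.
average_position = {item: sum(p) / len(p) for item, p in positions.items()}
for item, avg in sorted(average_position.items(), key=lambda kv: kv[1]):
    print(f"{item}: average position {avg:.2f}")
```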
Pairwise Comparison
Two items at a time. Pick the better one. Repeat.
This mirrors how people naturally decide things. Comparing two options is fast and feels effortless. Behind the scenes, each option gets a win rate based on how often it was chosen, producing a full ranking from simple binary choices.
You choose between “complete” pairwise (every possible pair shown) or “partial” (each respondent sees a sample). With 10 options there are 45 pairs; with 20 there are 190. Partial pairwise spreads the work across respondents so nobody votes on all combinations.
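A quick sketch of both ideas - the pair count and the win-rate scoring - using made-up option names and votes. The exact scoring the system applies may differ:

```python
from itertools import combinations
from collections import Counter

options = [f"Option {i}" for i in range(1, 11)]  # 10 options

# Complete pairwise shows every unordered pair: n * (n - 1) / 2.
all_pairs = list(combinations(options, 2))
print(len(all_pairs))  # 45 pairs for 10 options; 190 for 20

# Hypothetical recorded votes: (winner, loser) tuples from respondents.
votes = [
    ("Option 1", "Option 2"),
    ("Option 1", "Option 3"),
    ("Option 2", "Option 3"),
    ("Option 3", "Option 1"),
]

wins = Counter(winner for winner, _ in votes)
appearances = Counter()
for winner, loser in votes:
    appearances[winner] += 1
    appearances[loser] += 1

# Win rate: how often an option was chosen when it was shown.
win_rates = {opt: wins[opt] / appearances[opt] for opt in appearances}
for opt, rate in sorted(win_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{opt}: {rate:.0%}")
```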
Use it for 6-15 options. It can feel repetitive if one person sees too many comparisons - keep the number of pairs per respondent reasonable.
MaxDiff (Best-Worst Scaling)
Respondents see small subsets (3-5 options at a time) and pick the most and least important from each set. After several rounds with different combinations, votes are scored into a full ranking.
Each screen gives you two data points - a “best” and a “worst” signal - so MaxDiff extracts twice the information per screen compared to pairwise. It handles long lists because respondents never see the full set at once.
Raw scores are based on best and worst selections, then normalized for comparison.
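As a rough illustration of that scoring, here is a common count-based approach - best picks minus worst picks, divided by how often the item was shown - with made-up screens. The exact formula the system uses may differ:

```python
from collections import Counter

# Hypothetical MaxDiff screens: each respondent saw a subset and
# picked one "best" and one "worst".
screens = [
    {"shown": ["A", "B", "C", "D"], "best": "A", "worst": "D"},
    {"shown": ["B", "C", "E", "F"], "best": "C", "worst": "F"},
    {"shown": ["A", "D", "E", "F"], "best": "A", "worst": "F"},
]

best = Counter()
worst = Counter()
shown = Counter()
for screen in screens:
    shown.update(screen["shown"])
    best[screen["best"]] += 1
    worst[screen["worst"]] += 1

# Count-based score: (best picks - worst picks) / times shown.
raw_scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
for item, score in sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {score:+.2f}")
```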
Use it for 8+ options. Sweet spot is 10-30 items with 3-5 shown per screen.
The catch: respondents must be familiar with all the options. Picking the best only requires recognizing one strong option. Picking the worst requires understanding all of them. If awareness varies, respondents guess - and guesses turn into noise.
Budget Allocation
Give respondents a fixed pool of points and let them distribute it. Also known as constant sum.
This is the only method that captures magnitude. Pairwise and MaxDiff tell you respondents prefer A over B. Budget allocation tells you they’d spend 40 points on A and 5 on B - they care about A eight times more.
The tradeoff: people tend to dump most points into one item and scatter the rest without much thought. Don’t over-interpret small differences between low-scoring items. Keep the list under 8 options - splitting 100 points across 15 items is more arithmetic than research.
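A minimal sketch of reading a constant-sum response - checking it uses the full pool and comparing point ratios - with hypothetical options and numbers:

```python
# Hypothetical constant-sum response: 100 points split across options.
POOL = 100
allocation = {
    "Faster search": 40,
    "Better onboarding": 25,
    "Dark mode": 20,
    "Export to CSV": 10,
    "Keyboard shortcuts": 5,
}

assert sum(allocation.values()) == POOL, "Allocation must use the full pool"

# Unlike pairwise or MaxDiff, the points carry magnitude directly:
# 40 vs 5 means the respondent weights one item 8x more than the other.
top = max(allocation, key=allocation.get)
for item, points in sorted(allocation.items(), key=lambda kv: kv[1], reverse=True):
    ratio = points / allocation[top]
    print(f"{item}: {points} points ({ratio:.0%} of the top item)")
```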
Which Method Should I Pick?
| Method | Best for | Options | Respondent effort |
|---|---|---|---|
| Drag & Drop | Quick ranking of short lists | 2-5 | Low |
| Pairwise | Medium lists, mobile-friendly | 6-15 | Low per vote |
| MaxDiff | Long lists, research projects | 8-30+ | Moderate |
| Budget | When magnitude matters | 3-8 | Higher |
Simple rules: fewer than 6 items, use drag-and-drop. Between 6 and 15, use pairwise. Above that, use MaxDiff - unless you need to know how much more one option matters than another, in which case use budget with a short list.
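Those rules of thumb, expressed as a small helper. This mirrors the guidance above, not the system's auto-selection logic:

```python
def pick_method(option_count: int, need_magnitude: bool = False) -> str:
    """Rule of thumb from the table above, not the product's auto algorithm."""
    if need_magnitude and option_count <= 8:
        return "budget"
    if option_count < 6:
        return "drag_and_drop"
    if option_count <= 15:
        return "pairwise"
    return "maxdiff"

print(pick_method(4))                        # drag_and_drop
print(pick_method(12))                       # pairwise
print(pick_method(25))                       # maxdiff
print(pick_method(6, need_magnitude=True))   # budget
```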
Configuration Options
- Algorithm - Leave on `auto` unless you have a reason to override. The system picks the method that fits your list size and minimizes respondent fatigue
- Mode (Pairwise) - `full` shows every possible pair to each respondent. `partial` samples a subset and spreads coverage across respondents. Defaults to `partial`, which is what you want for most surveys
- Items per screen (MaxDiff) - How many options appear in each subset, 3-10. Defaults to `min(5, list size)`. Lower values are easier for respondents; higher values extract more data per screen
- Target views per item (MaxDiff, Pairwise partial) - How many times each option appears across all screens, 1-10. Defaults to 3. The system reduces this to 2 automatically if the resulting survey would exceed 20 screens
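Taken together, a ranking question configured with these options might look something like the sketch below. The field names mirror the settings above, but the shape is illustrative, not an actual API schema:

```python
# Illustrative only: field names mirror the options described above,
# not a real configuration format.
ranking_question_config = {
    "algorithm": "auto",             # or "sort", "pairwise", "maxdiff", "budget"
    "pairwise": {
        "mode": "partial",           # "full" shows every pair to each respondent
    },
    "maxdiff": {
        "items_per_screen": 5,       # 3-10; default is min(5, list size)
        "target_views_per_item": 3,  # 1-10; dropped to 2 if over 20 screens
    },
}
```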
Interpreting Results
Collecting rankings is half the job. Reading them correctly is the other half.
All methods produce different raw scores - win rates, best-worst differences, point totals, position order. To make results comparable regardless of method, we normalize everything to a 0-100 scale. An item scoring 85 means the same thing whether it came from pairwise, MaxDiff, or budget allocation. This lets you switch methods between surveys or compare results across different ranking questions without recalibrating how you read the numbers.
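The exact normalization isn't described here, but a simple min-max rescale to 0-100 captures the idea - scores are relative to the other items in the list:

```python
def normalize_to_100(raw_scores: dict) -> dict:
    """Min-max rescale to 0-100 (one simple normalization; the exact
    formula the product uses may differ)."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        return {item: 50.0 for item in raw_scores}  # no spread to rescale
    return {
        item: 100 * (score - lo) / (hi - lo)
        for item, score in raw_scores.items()
    }

# Works the same whether the raw scores are win rates, best-worst
# differences, or point totals.
print(normalize_to_100({"A": 0.67, "B": 0.50, "C": 0.33}))
```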
- Small gaps between middle-ranked items are often statistical noise. Large gaps at the top or bottom are where decisions should focus.
- Scores are relative, not absolute. A score of 85 means that option ranked highly compared to the others in your list. It does not mean 85% of people want it. Change the list and the scores shift
- Budget results show magnitude, others don’t. Pairwise and MaxDiff tell you the order. Budget tells you how much more one item matters than another. A 10-point gap between two items in budget allocation is meaningful. The same gap in normalized pairwise scores may not be
- Segment before you conclude. Overall rankings often hide the real story. Your power users and casual users may rank the same list in opposite order. Break results down by audience segment before making decisions
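A minimal sketch of that segment breakdown, assuming you have per-respondent scores and a segment label for each response:

```python
from collections import defaultdict

# Hypothetical per-respondent scores plus an audience segment label.
responses = [
    {"segment": "power_user", "scores": {"API access": 90, "Onboarding": 20}},
    {"segment": "power_user", "scores": {"API access": 80, "Onboarding": 30}},
    {"segment": "casual",     "scores": {"API access": 15, "Onboarding": 85}},
    {"segment": "casual",     "scores": {"API access": 25, "Onboarding": 75}},
]

by_segment = defaultdict(lambda: defaultdict(list))
for response in responses:
    for item, score in response["scores"].items():
        by_segment[response["segment"]][item].append(score)

# Average per segment, then rank - opposite orders are the insight.
for segment, items in by_segment.items():
    averages = {item: sum(s) / len(s) for item, s in items.items()}
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    print(segment, ranked)
```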
Best Practices
Write Comparable Options
All items should be the same type of thing. Mixing features with bug fixes and business goals gives you a ranking that nobody can act on. “Faster search” and “Better onboarding” are comparable. “Faster search” and “Fix login bug” are not.
Keep Lists Focused
Every option you add costs respondent attention and adds noise to results. Only include items you’d actually act on. If an option ranks last and you still wouldn’t cut it, leave it out.
Watch for Position Bias
In drag-and-drop, items at the top tend to stay there. Randomize the starting order so position doesn’t become a hidden variable in your data.
Match Method to Device
Pairwise and MaxDiff work well on phones. Drag-and-drop is harder on small screens - dragging on mobile is fiddly. Budget allocation requires concentration that a phone-on-the-bus respondent won’t give you.