Every classic sorting algorithm, explained step by step. Start with Bubble Sort, end with Radix Sort. Plain English, C# code, array traces, Big O and references on every card.
Beginner FriendlyStep by StepWith References
01
What is Sorting?
Sorting means putting a list of things in order — smallest to largest, A to Z, oldest to newest. It sounds simple, but how you do it changes how fast it runs. The same job can take a fraction of a second or a full minute depending on the algorithm you choose. This guide walks you through the classics, one at a time.
1.1
Why are there so many sorting algorithms?
Because no single algorithm wins everywhere. Some are simple to write, some are fast on big data, some keep equal items in order (stable), some need extra memory, some don't. You pick the one that fits the job.
// ── The four properties you compare sorts on ────────
//
// 1. TIME — how many steps as the array grows.
// Best / Average / Worst (Big O notation)
//
// 2. SPACE — extra memory needed besides the input.
// "in-place" → O(1) extra, sorts inside the original array.
// "out-of-place" → needs an auxiliary array.
//
// 3. STABILITY — do equal elements keep their original order?
// Sorting people by age — two 30-year-olds stay in the same
// order they came in? That's "stable".
//
// 4. ADAPTIVE — does the algorithm go faster when the array
// is already partly sorted?
//
// You'll see all four mentioned on every card below. ↓
1.2
Big O in 60 seconds
Big O describes how the number of steps grows as the input grows. It doesn't measure seconds — it measures shape. For sorting n items, the common shapes are:
// n = number of items in the array
//
// O(n²) — "quadratic" . double the array, work goes 4×
// Bubble, Selection, Insertion
// Fine for n < 1,000. Slow for 1,000,000.
//
// O(n log n) — "linearithmic". double the array, work just over 2×
// Merge, Quick, Heap
// The gold standard for general-purpose sorting.
//
// O(n + k) — "linear" . work grows with array AND value range k.
// Counting, Radix (only for integers / bounded keys)
// Fastest when k is small.
//
// O(log n) — "logarithmic" . each step halves the work
// (you won't see this for SORTING, but for SEARCH a sorted array)
//
// Rule of thumb:
// small array (n < ~50) → Insertion Sort is genuinely fastest
// general purpose, any data → built-in Array.Sort (n log n)
// only integers in a small range → Counting / Radix Sort (O(n))
1.3
The one helper we'll use everywhere — Swap
Almost every sorting algorithm in this guide swaps two elements in an array. So let's define Swap once. In modern C# it's a single line thanks to tuple assignment.
// ── Swap two array slots in place ────────────────────
static void Swap(int[] a, int i, int j)
{
(a[i], a[j]) = (a[j], a[i]);
// Right-hand side is evaluated first, then assigned.
// No temp variable needed in C# — the tuple does it for you.
}
// ── Example ──────────────────────────────────────────
int[] arr = [5, 2, 8, 1];
Swap(arr, 0, 2);
// arr is now [8, 2, 5, 1]
// In old-school code you'd write:
// int tmp = a[i];
// a[i] = a[j];
// a[j] = tmp;
// Same result. The tuple version is just neater.
02
Bubble Sort — the friendliest one
So you've got an unsorted array, and you want a sorted one. Where do you even start?
Here's a thought experiment. Imagine I hand you five number cards face-up on a table — but with two rules: you can only look at two cards at a time, and you can only swap cards that are next to each other. How would you sort them?
Take a second. Actually try to picture what you'd do. We'll wait.
Brain Power
Before reading on, predict the answer:
If the leftmost card is bigger than its right neighbour, what should you do? And once you've done it, where do you look next?
That tiny intuition is the entire algorithm. Seriously.
You probably came up with something like: "Compare the first two. Swap them if the left one's bigger. Slide right, repeat." Congratulations — you just invented Bubble Sort. We're going to write that idea down formally now and then watch it work.
2.1
The recipe — what Bubble Sort actually does
Picture bubbles rising in a glass of soda. After each full sweep across the array, the biggest unsorted value has floated all the way to the right. Do another sweep, the next biggest floats up. And so on. That's where the name comes from.
// ── The recipe (memorise this — everything else is detail) ──
//
// 1. Start at the left.
// 2. Look at arr[j] and arr[j+1]. ← a single pair
// 3. If they're in the wrong order, SWAP.
// 4. Move one slot right. Repeat 2–3.
// 5. When you reach the end, that's ONE PASS.
// ☞ After pass 1, the LARGEST value is in its final spot.
// 6. Do another pass — one slot shorter than the last
// (you don't need to look at the already-final tail).
// 7. If a whole pass made ZERO swaps, you're done. Stop.
There are no dumb questions
Wait — why are only neighbours compared? Couldn't we just swap any two out-of-order values?
Sure, but then it wouldn't be Bubble Sort. Restricting yourself to adjacent swaps is what makes the algorithm so trivial to write — and that's the whole point of learning it first.
Why does the largest value reach the end after one pass?
Because whenever the current "winner" meets a bigger neighbour, it gets pushed forward by the swap. The biggest value in the array always wins every comparison, so it surfs all the way to the right.
Do I really need the "zero swaps" check?
No — but without it, Bubble Sort makes n − 1 passes even on an already-sorted array. The check turns the best case from O(n²) into O(n). Always include it.
2.2
Watch it sort — [5, 1, 4, 2, 8]
Talk is cheap. Let's actually sort an array. Reading the trace top-to-bottom is the same as watching the algorithm run in slow motion.
being compared just swapped in its final spot
Start51428unsorted
Pass 1 · a154285 > 1 → swap
Pass 1 · b145285 > 4 → swap
Pass 1 · c142585 > 2 → swap
Pass 1 · d142585 < 8 → no swap, 8 is final
Pass 2 · b124584 > 2 → swap, 5 is final
Pass 312458zero swaps → done!
Sharpen your pencil
Your turn. Apply Bubble Sort to [3, 8, 1, 4]. How many passes does it take, and how many total swaps?
Reveal the trace + answer
Start: 3, 8, 1, 4
Pass 1: 3,8 ok 3, 8, 1, 4
8,1 swap 3, 1, 8, 4
8,4 swap 3, 1, 4, 8 ← 8 final
Pass 2: 3,1 swap 1, 3, 4, 8
3,4 ok 1, 3, 4, 8 ← 4 final
Pass 3: 1,3 ok ← zero swaps, done!
Answer: 3 passes (the last is the empty-pass check),
3 swaps total.
Brain Power
Pass 2 in our trace only needed 3 comparisons, not 4. Why?
Hint: look at the last cell after Pass 1.
2.3
The code (C#) — annotated
Here it is in C#. The numbered pins (1, 2, 3, 4) match the explanation right below the code — read them in order.
static void BubbleSort(int[] arr)
{
int n = arr.Length;
for (int i = 0; i < n - 1; i++) 1
{
bool swapped = false; 2
for (int j = 0; j < n - 1 - i; j++) 3
{
if (arr[j] > arr[j + 1])
{
(arr[j], arr[j + 1]) = (arr[j + 1], arr[j]);
swapped = true;
}
}
if (!swapped) break; 4
}
}
The pass counter — the metronome of the algorithm. Each tick of i is one full sweep. Why not n? Because after n − 1 biggest-values have surfed to the right, the smallest value is left at index 0 by pure elimination — no pass needed to "place" it. Trap: writing i <= n doesn't break the answer but does a guaranteed-useless extra pass.
The detector flag — a one-bit memory of "did anything move?" Reset every pass. Why not just measure passes? Because nearly-sorted input might finish in pass 2, not pass n − 1 — the flag catches this the instant it happens. Trap: declaring swappedoutside the outer loop is the classic bug — once you set it true the first time, you've defeated the optimisation for the rest of the run.
The comparing loop — note the shrinking bound n − 1 − i. Every pass is one slot shorter than the last. Why not always go to n − 1? Because after pass i, the i largest values are already docked at the end — re-comparing them is pure waste. (Remember the Brain Power prompt above? This is the answer.) Trap: using j < n instead of j < n - 1 reads past the end of the array on the comparison arr[j] > arr[j+1] — guaranteed IndexOutOfRangeException.
The early exit — the line that earns Bubble Sort the word "adaptive". If a whole pass moved nothing, the array is sorted; stop. Why does this matter? Without it, an already-sorted array still costs O(n²); with it, O(n). One line, an entire order-of-magnitude. Trap: putting swapped = true in the wrong scope (see annotation 2) silently makes this if always-false-then-always-true — and you'd never notice in a code review unless you ran a benchmark on sorted input.
Mastery Move
Cocktail Shaker Sort. If you alternate the direction of each pass (left-to-right, then right-to-left), small values at the right end stop being stranded for n−1 passes — they migrate left in a single reverse sweep. Same Big O, but constant-factor faster and a favourite interview "can you improve this?" follow-up.
Spotting it in production. If you ever inherit a code review with two nested loops, an adjacent comparison, and a swap, it's Bubble Sort wearing a costume. Suggest Array.Sort unless the input is small and almost-sorted (then it's actually fine).
Watch Out
Using arr[j] >= arr[j + 1] (greater-or-equal) instead of > breaks stability — equal values now leap-frog each other. If you ever sort objects by a key (date, name, price) and care about preserving ties, that single character is the difference between "stable sort" and "subtle production bug".
Run it in your head on [5, 1, 4, 2, 8] and you should land at the same trace you saw on card 2.2. If it doesn't match — stop here, draw the array on paper, and step through it. Don't move on until it clicks.
It's a tuple swap — modern C#'s way of saying "swap these two values without a temp variable". The right side is built first, then assigned to the left.
So… should I actually use Bubble Sort in real code?
Honestly? Almost never. Just call Array.Sort(data). Bubble Sort is for learning what sorting is. On a million items it would be roughly a million times slower than QuickSort.
Is Bubble Sort stable?
Yes — because we only swap when arr[j] > arr[j+1] (strict greater-than). Equal values never swap past each other, so their original order is preserved.
Bullet Points
Bubble Sort = "walk the array, swap wrong-order neighbours, repeat".
Each pass sends the next-largest value to its final position at the end.
Time: O(n²) average and worst, O(n) best (with the early-exit flag).
Space: O(1) — sorts in place.
Stable: yes. In-place: yes. Adaptive: yes (because of the flag).
Used in real code: basically never. Used to learn how sorting works: essential.
03
Selection Sort — pick the smallest, place it, repeat
Bubble Sort was clumsy — every wrong pair generated a swap, and you got a lot of wrong pairs. Watch yourself sort a hand of playing cards in real life and you'll notice you don't do that. You scan the whole hand, find the lowest card, drop it on the left. Repeat with the next lowest. One move per pass, instead of "many small bumps."
That's Selection Sort. Same O(n²) Big O as Bubble — but a very different feel, and a very different real-world niche.
Brain Power
For an array of n elements, how many swaps does Selection Sort do — best case, worst case?
Bubble Sort can do up to n² / 2. Selection Sort does… something much smaller. Take a guess before scrolling.
The answer is at most n − 1 swaps, always. One per pass, and the last slot doesn't need one. That single fact is what makes Selection Sort relevant in 2026 — it's the algorithm of choice when writing data is expensive (flash memory, network sync, expensive serialisation). Everything else about it is "Bubble Sort but cleaner".
3.1
The recipe — scan, find min, dock it
The array gets split into two regions: a sorted prefix on the left (starts empty) and an unsorted suffix on the right. Each pass scans the suffix for its minimum and docks that value at the boundary.
// ── The recipe ───────────────────────────────────────
//
// 1. Boundary starts at index 0. Everything to the right is unsorted.
// 2. Scan the unsorted region. Remember the INDEX of the smallest value.
// 3. Swap that value with the element AT the boundary.
// ☞ The boundary slot is now permanently correct.
// 4. Move the boundary one step right.
// 5. Repeat until the boundary reaches the last slot.
// The final slot is automatically correct (only value left).
//
// Comparisons per pass: n − 1, n − 2, n − 3, … 1 → ~n²/2 total
// Swaps total: at most n − 1 (one per pass, often skipped)
//
// Insight: comparisons don't shrink with how sorted the data is.
// Selection Sort does the EXACT SAME WORK every time —
// it's NOT adaptive. Bubble Sort with the flag is.
There are no dumb questions
Why find the index of the minimum and not the value itself?
Because we need to swap at the end — and to swap you need to know where the smallest value lives, not just what it is.
Why does Bubble Sort do up to n²/2 swaps but Selection only n − 1?
Because Bubble Sort fixes pairs one bump at a time — each fixed pair is one swap. Selection Sort does all the "looking" first, then performs exactly one big jump per pass. Same total work, very different distribution.
If both are O(n²), why ever pick Selection over Bubble?
Two reasons. (1) Predictable cost: every pass takes the same time, regardless of input. (2) The number of writes is minimised — irreplaceable when writes are expensive (flash memory wear, audited mutations, replicated state).
3.2
Watch it sort — [64, 25, 12, 22, 11]
Each row = one full scan. Green cells are docked (final). Orange = current minimum being shuttled to the front.
the min we found just swapped docked / final
Start6425122211scan: min = 11
After pass 11125122264swap 11 ↔ 64 · next min = 12
After pass 21112252264swap 12 ↔ 25 · next min = 22
After pass 31112222564swap 22 ↔ 25 · next min = 25
After pass 4111222256425 already in place · done
Sharpen your pencil
Sort [5, 2, 4, 1, 3] with Selection Sort. How many actual swaps happen — i.e. cases where the min isn't already in place?
Reveal answer
Start: 5, 2, 4, 1, 3
Pass 1: min = 1 (idx 3). Swap with idx 0.
→ 1, 2, 4, 5, 3 (swap #1)
Pass 2: min in {2,4,5,3} = 2 (idx 1). Already there.
→ 1, 2, 4, 5, 3 (no swap)
Pass 3: min in {4,5,3} = 3 (idx 4). Swap with idx 2.
→ 1, 2, 3, 5, 4 (swap #2)
Pass 4: min in {5,4} = 4 (idx 4). Swap with idx 3.
→ 1, 2, 3, 4, 5 (swap #3)
Answer: 3 swaps out of a possible 4.
(Pass 2's "min is already in place" check saved one.)
Brain Power
If the array was already sorted [1, 2, 3, 4, 5], how many comparisons would Selection Sort still make?
(Hint: it's not zero. That's a key difference from Bubble Sort with the flag.)
3.3
The code (C#) — annotated
Same skeleton as Bubble Sort — outer loop + inner loop — but the inner loop is a scan, not a compare-and-swap. Read the pins.
static void SelectionSort(int[] arr)
{
int n = arr.Length;
for (int i = 0; i < n - 1; i++) 1
{
int minIndex = i; 2
for (int j = i + 1; j < n; j++) 3
{
if (arr[j] < arr[minIndex])
minIndex = j;
}
if (minIndex != i) 4
(arr[i], arr[minIndex]) = (arr[minIndex], arr[i]);
}
}
The boundary cursor.i is the index of the next slot to dock. We stop at n − 1 because the final slot self-resolves. Why not n? Because by the time the unsorted region has one element, that element is automatically the minimum — no pass needed. Trap: running to n isn't wrong, but you'll waste a comparison-less iteration. Aesthetic, not correctness.
The "min so far" pointer. Start by assuming the boundary itself holds the smallest — then prove yourself wrong. Why an index and not a value? Because you need the location to swap. Storing the value would force a second scan to find it. Trap: resetting minIndex outside the outer loop turns a sort into "find the global minimum and put it at slot 0 forever" — a single-pass nothing.
The scan — note j = i + 1. Start one past the boundary, because slot i is already the candidate. Why not j = i? You'd compare minIndex against itself on the first iteration — harmless but wasteful. Why < and not <=? Strict less-than makes the sort behave as if it were stable on the comparison side — though the swap on line 4 still breaks stability. (See the trap below.)
The conditional swap. Only swap if you found a smaller value somewhere down the array. Why bother with the if? An unconditional swap costs ~3 writes; skipping it when the min is already in place saves work — especially on flash storage where writes wear the device. Trap: removing the if doesn't change correctness, but on already-sorted input you go from 0 writes to 3 × (n − 1) writes — a silent perf regression.
Watch Out — Selection Sort is NOT stable
Consider [2a, 2b, 1] where 2a and 2b are tied. Pass 1 finds the min at index 2 (value 1) and swaps it with index 0 — which means 2a jumps over 2b all the way to the end.
Result: [1, 2b, 2a]. Their order flipped.
If you're sorting people by age (and two are the same age), or events by timestamp, this is a data corruption bug. Use Insertion Sort or LINQ OrderBy instead — both stable.
Mastery Move
Double Selection. While scanning for the min, track the max too. One pass now docks two values (min on the left, max on the right). The array shrinks from both ends — half the passes, same total comparisons, half the boundary management. Same Big O, ~2× faster in practice.
Stable Selection Sort. Replace the swap with a shift: rotate the min into position by sliding everything between it and the boundary one slot right. Now stable — but you've traded O(n) swaps for O(n²) writes. Use only if you need stability and can't afford the auxiliary memory Merge Sort wants.
Heap Sort = Selection Sort with a better "find the min". Selection finds the next extreme in O(n); a heap does it in O(log n). That single substitution turns Selection's O(n²) into Heap's O(n log n). You'll meet it in section 7.
There are no dumb questions
Does the early-exit trick from Bubble Sort apply here?
No. There's no flag that can detect "already sorted" because Selection always scans the whole remaining region. The cost is the same on a sorted, random, or reverse-sorted array. This is what "non-adaptive" means.
Selection Sort vs Bubble Sort — which should I prefer for tiny arrays?
Neither. Use Insertion Sort (next section) — it's the de facto standard for n < ~16. Selection Sort only wins when you specifically need to minimise writes.
"Minimises writes" — when does that actually matter outside textbooks?
EEPROM, NOR/NAND flash, write-amplification-aware databases, mutation-logged systems (event sourcing), and anywhere the "change" of a value triggers cascading work (cache invalidation, replication, audit). In those worlds, Selection Sort genuinely earns its keep.
// ── Try it ───────────────────────────────────────────
int[] data = [64, 25, 12, 22, 11];
SelectionSort(data);
Console.WriteLine(string.Join(", ", data));
// → 11, 12, 22, 25, 64 ✓ (3 actual swaps, 10 comparisons)
Bullet Points
Selection Sort = "scan, find the min, swap it to the front, repeat".
Time: O(n²) always — best, average, worst. Not adaptive.
Space: O(1) — in-place.
Writes: at most n − 1 swaps. The lowest of any general-purpose sort.
Stable: NO (the long-range swap breaks ties).
Sweet spot: small arrays + write-expensive storage (flash, audit logs).
Conceptual launchpad for Heap Sort — same idea, faster "find the min".
04
Insertion Sort — the algorithm your brain already runs
Here's a fun experiment. Deal yourself a 7-card hand of playing cards, face-up. Now sort them into ascending order. Pay attention to how you do it.
You probably picked up the cards one at a time and slid each new card into the right spot among the ones already in your hand. Right card already in your left hand? Slip it in. Bigger card? Push the smaller ones over.
You just ran Insertion Sort. Out of all the O(n²) sorts in this guide, this is the one your actual hand uses — and it's the one .NET, Java, and V8 all fall back to for small sub-arrays. There's a real reason for that.
Brain Power
Here's a real-world scenario: a live feed of incoming temperature readings, and you want to keep the last 100 sorted by value at all times. A new reading arrives every second.
Which sort would you reach for? Bubble? Selection? Something else?
(Quick Sort and Merge Sort, by the way, are the wrong answers — and the reason is exactly the point of this section.)
The right answer is Insertion Sort. Each new reading is a single value being inserted into an already-sorted list — that's literally the inner loop of Insertion Sort. Quick Sort would re-shuffle the whole array, Merge Sort would allocate. Insertion Sort just slides the newcomer left until it lands. O(n) per insert, on the dot.
4.1
The recipe — grow a sorted prefix, one value at a time
Same two-region split as Selection Sort: sorted prefix on the left, unsorted suffix on the right. But you don't find anything — you insert. Each pass takes the next unsorted value and shuffles it left until it finds its home.
// ── The recipe ───────────────────────────────────────
//
// 1. The first element is "a sorted list of 1" by definition. Skip it.
// 2. KEY = the next unsorted value (at index i).
// 3. Walk LEFT from index i − 1:
// • If the value to the left is bigger than KEY, shift it RIGHT.
// • Stop when the value to the left is ≤ KEY (or you fall off slot 0).
// 4. Drop KEY into the empty slot you just created.
// 5. Repeat with the next element. The sorted region grows by 1.
//
// Time per pass:
// • Best (already sorted): 1 comparison. Total: O(n).
// • Worst (reverse sorted): i comparisons. Total: O(n²).
//
// ☞ This is the FIRST sort in the guide that's actually adaptive
// in a meaningful way — and the reason it's the workhorse for
// small / nearly-sorted inputs everywhere.
There are no dumb questions
Why "shift right one at a time" instead of just swapping like Bubble Sort?
Because a swap is 3 writes (temp, dest, source); a shift is 1 write. Over an entire run, shifts cut the write count roughly in half. Same comparisons, fewer memory writes.
What does "adaptive" mean exactly?
The algorithm runs faster the more sorted the input already is. Bubble Sort with the flag was lightly adaptive. Insertion Sort is strongly adaptive: an array that's already sorted finishes in linear time, and "almost sorted" (a few values out of place) finishes in almost-linear time.
I thought all O(n²) sorts were equally bad. Are they?
On the asymptotic ladder, yes. In practice, no. Insertion Sort with small n (say n < 16) outperforms Merge and Quick on real hardware because of cache locality and zero recursion overhead. That's why hybrid sorts (IntroSort, TimSort) use it as their base case.
4.2
Watch it sort — [5, 2, 4, 6, 1, 3]
Each row = the array after inserting the next KEY. Orange = the KEY being placed. Green = the growing sorted prefix.
KEY (being inserted) sorted prefix
Start524613prefix = [5]
KEY = 2254613shift 5 → drop 2 at 0
KEY = 4245613shift 5 → drop 4 at 1
KEY = 62456135 ≤ 6 → stays put (best case)
KEY = 1124563shift 6,5,4,2 → drop 1 at 0 (worst case)
KEY = 3123456shift 6,5,4 → drop 3 at 2 · done
Sharpen your pencil — two arrays
Count the total shifts Insertion Sort performs on each of these:
(a) [1, 2, 3, 4, 5] (b) [5, 4, 3, 2, 1]
Reveal answers
(a) [1, 2, 3, 4, 5] — already sorted
KEY=2: 1 ≤ 2, no shifts
KEY=3: 2 ≤ 3, no shifts
KEY=4: 3 ≤ 4, no shifts
KEY=5: 4 ≤ 5, no shifts
────────────────────────
Total: 0 shifts. O(n) — best case.
(b) [5, 4, 3, 2, 1] — reverse sorted
KEY=4: shift 5 → 1 shift
KEY=3: shift 5, 4 → 2 shifts
KEY=2: shift 5, 4, 3 → 3 shifts
KEY=1: shift 5, 4, 3, 2 → 4 shifts
────────────────────────
Total: 10 shifts = n(n-1)/2. O(n²) — worst case.
The 0-vs-10 gap on the same n is exactly what
"strongly adaptive" looks like in numbers.
4.3
The code (C#) — annotated
Note: the inner loop is a while, not a for — because we don't know how many shifts each KEY will need. We stop the moment we find the right spot.
static void InsertionSort(int[] arr)
{
for (int i = 1; i < arr.Length; i++) 1
{
int key = arr[i]; 2
int j = i - 1;
while (j >= 0 && arr[j] > key) 3
{
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key; 4
}
}
Outer loop starts at i = 1, not 0. The element at index 0 is, by definition, "a sorted list of size 1". Why not 0? KEY at index 0 would do nothing — j starts at -1, the while never runs, the value gets put back where it was. Harmless but wasteful. Trap: mixing this up with Selection Sort's i < n - 1 bound — Insertion needs to reach the last element to insert it.
Save the KEY before the shifts start. Critical. The shifts are about to overwrite the slot KEY currently lives in — if you don't snapshot the value first, it's gone. Why not just arr[j + 1] at the end? Because by then, that slot has been overwritten by a shifted value. KEY only exists in this local int. Trap: forgetting to save and then reading arr[i] later — silent data corruption that passes most small test cases.
The shift loop — note the short-circuit j >= 0 &&. Both conditions must hold. Why the bounds check first? Because C# evaluates left-to-right and short-circuits. If j falls below 0, the arr[j] read would throw — putting the cheap check first prevents that. Trap: swapping the operand order (arr[j] > key && j >= 0) compiles fine but throws IndexOutOfRangeException the moment KEY needs to be inserted at index 0.
Drop KEY into the gap at j + 1, not j. When the while exits, j is pointing at the value that's smaller than (or equal to) KEY — KEY belongs one slot to its right. Why + 1? The shift moved each arr[j] into arr[j + 1], leaving arr[j + 1] as the empty gap to fill. Trap: writing arr[j] = key instead — you'll overwrite the value KEY was supposed to land in front of, breaking the sort silently.
Mastery Move
Binary Insertion Sort. Replace the linear while with a binary search to find the insertion point in O(log n). Comparisons drop from O(n²) to O(n log n) — but the shifts are still O(n²), because once you've found the gap you still need to slide the elements over. Useful only when comparisons are expensive (e.g. sorting strings by lexicographic comparison).
TimSort = Insertion Sort + Merge Sort. Python's sorted() and Java's Arrays.sort() use TimSort: split the array into small "runs", sort each run with Insertion Sort (fast on small data), then merge the runs (O(n log n) overall). Insertion Sort isn't a relic — it's the base case of the world's most-used production sort.
.NET's IntroSort drops to Insertion Sort at n ≤ 16 inside Array.Sort. The 16 isn't arbitrary — it's roughly the size where Insertion Sort's cache-friendliness and zero-overhead beat Quick Sort's partition cost on real hardware.
Watch Out — the stability comparator
Insertion Sort is stable only because the inner condition is arr[j] > key (strict). If you "improve" it to arr[j] >= key, equal values now slide past each other — and you've broken stability without any tests failing on integer arrays (where stability is invisible).
This is the bug-of-bugs when you later refactor to sort objects by some key. Always keep the comparator strict.
There are no dumb questions
Can Insertion Sort be parallelised?
Not directly — each KEY's final position depends on every earlier insertion. The way to "parallelise" is what TimSort does: sort small chunks independently with Insertion, then merge.
Insertion Sort on linked lists — same algorithm?
The idea works, but the cost profile flips: linked lists have O(1) shifts (pointer reassign) but O(n) comparisons to find the insertion point. On linked lists, Merge Sort wins outright.
If Insertion Sort is so good for small n, why bother learning the others?
Because for n > ~32 Insertion's O(n²) catches up and crushes you. On a million items, Insertion is roughly 100,000× slower than Merge Sort. You need both: Insertion as the close-quarters specialist, an O(n log n) sort for the big jobs.
Insertion Sort = "pick the next value, slide it left until it fits".
Time: O(n²) worst, O(n) best — strongly adaptive.
Space: O(1) — in-place.
Stable: YES — provided the comparator stays strict (>, not >=).
The reigning champ for small n (< ~16) and nearly-sorted input.
Used by every major language's standard sort as a sub-routine (IntroSort, TimSort).
Mental model = "sorting a hand of playing cards". It's the most human-friendly sort.
05
Merge Sort — divide, conquer, never embarrass yourself
You and three friends are handed a giant stack of papers and asked to sort them by date. What's the fastest way?
Easy: split the stack into four piles, one per person. Each person sorts their pile. Then you take two sorted piles, walk them in parallel, and interleave them into one combined sorted pile. Two of you do this; one final merge and you're done.
That's Merge Sort. The two halves don't have to be sorted by people — recursion handles that. The only step you actually perform is the merge. Everything else is the same step at a smaller scale.
Brain Power
Why is Merge Sort always O(n log n) — not just on average, but in the absolute worst case too?
(Hint: think about two things. How many times can you cut an array in half before you get down to singletons? And how much work does each "level" of merging do?)
You can halve an array of size n exactly log₂(n) times before you hit 1-element pieces. Each level of merges costs O(n) total — because every element gets compared and copied at most once per level. log n levels × n work per level = n log n. Always. There is no input that makes this worse. That's the guarantee Quick Sort can't give you.
5.1
Two ideas, one algorithm
Merge Sort is two simple ideas stacked. Master both and you've mastered the algorithm — and a third of the algorithms book you'll read later.
// ── Idea #1 — DIVIDE & CONQUER ───────────────────────
//
// "I don't know how to sort 1,000,000 things.
// I DO know how to sort 1 thing — it's already sorted.
// So I'll split until everything's a 1-thing pile, and
// then I only need to know how to MERGE."
//
// This is a pattern, not just an algorithm:
// binary search, FFT, the closest-pair problem, parallel
// reduce — same shape.
// ── Idea #2 — MERGING TWO SORTED LISTS ───────────────
//
// Given two sorted lists, you can produce one sorted list
// in O(n + m) time using TWO POINTERS:
//
// Left: [1, 4, 8] Right: [2, 5, 7]
//
// 1 vs 2 → take 1 → [1]
// 4 vs 2 → take 2 → [1, 2]
// 4 vs 5 → take 4 → [1, 2, 4]
// 8 vs 5 → take 5 → [1, 2, 4, 5]
// 8 vs 7 → take 7 → [1, 2, 4, 5, 7]
// right done — drain left → [1, 2, 4, 5, 7, 8]
//
// That's the whole merge step. Two pointers, one comparison
// per slot of output. Try writing it from scratch — it's
// shorter than your typical for loop.
There are no dumb questions
If you're splitting until you have 1-element pieces, isn't that exponential work?
It's the opposite — it's logarithmic. Each split halves the size. n → n/2 → n/4 → … → 1 takes log₂(n) steps. For n = 1 million, that's only 20 levels of splitting.
Why O(n) per level of merging?
At each level, every element of the original array appears in exactly one merge. The total comparisons across all merges at that level = n. Multiply by log n levels and you get n log n.
Does the array actually get split into separate physical arrays?
Conceptually yes. In code, two cheap variants exist: (1) the textbook version that does allocate temporary halves (clearer), (2) an in-place version that uses index pairs and one auxiliary buffer (more efficient). Production code does (2). We'll show (1) for clarity.
5.2
Watch it sort — [38, 27, 43, 3, 9, 82, 10]
Read the diagram top → bottom (the split phase), then bottom → top (the merge phase). The split phase is "logically" recursive — no real work happens until the leaves.
You have L = [1, 4, 8, 9] and R = [2, 3, 5, 7]. Walk through the two-pointer merge and write down the order in which elements are taken.
Reveal the merge trace
L = [1, 4, 8, 9] R = [2, 3, 5, 7]
i=0 j=0
L[0]=1 vs R[0]=2 → take 1 (i→1) out: [1]
L[1]=4 vs R[0]=2 → take 2 (j→1) out: [1, 2]
L[1]=4 vs R[1]=3 → take 3 (j→2) out: [1, 2, 3]
L[1]=4 vs R[2]=5 → take 4 (i→2) out: [1, 2, 3, 4]
L[2]=8 vs R[2]=5 → take 5 (j→3) out: [1, 2, 3, 4, 5]
L[2]=8 vs R[3]=7 → take 7 (j→4) out: [1, 2, 3, 4, 5, 7]
R exhausted — drain L → 8, 9
out: [1, 2, 3, 4, 5, 7, 8, 9]
8 elements, 6 comparisons + 2 drained = O(n + m).
Brain Power
The merge step uses L[i] <= R[j] (less-than-OR-equal), not strict <. Why does that single character matter?
(Hint: think about what happens to two equal values when one is in L and the other is in R.)
5.3
The code (C#) — annotated
Two methods. MergeSort is the recursive shell (split). Merge is where the actual work happens (combine two sorted runs). Read the pins on each.
static void MergeSort(int[] arr, int lo, int hi)
{
if (lo >= hi) return; 1
int mid = lo + (hi - lo) / 2; 2
MergeSort(arr, lo, mid);
MergeSort(arr, mid + 1, hi);
Merge(arr, lo, mid, hi);
}
static void Merge(int[] arr, int lo, int mid, int hi)
{
int[] left = arr[lo..(mid + 1)]; 3
int[] right = arr[(mid + 1)..(hi + 1)];
int i = 0, j = 0, k = lo;
while (i < left.Length && j < right.Length)
{
arr[k++] = left[i] <= right[j] 4
? left[i++]
: right[j++];
}
while (i < left.Length) arr[k++] = left[i++]; 5
while (j < right.Length) arr[k++] = right[j++];
}
The recursion base case. When lo >= hi we have 0 or 1 element — already sorted. Why not lo == hi? Because of safety: in some corruption paths or off-by-one bugs lo > hi can happen, and a strict == would skip the bounce-out, recurse infinitely, and stack-overflow. Trap: writing lo > hi alone misses the lo == hi case and you'll recurse into a single-element sub-array forever.
The mid-point — written this way on purpose.lo + (hi - lo) / 2 is identical to (lo + hi) / 2 mathematically but cannot overflow. Why does that matter? When lo and hi are both large (close to int.MaxValue in production code that sorts huge arrays), lo + hi wraps around to a negative number and your index goes off into nowhere. Trap: the famous "binary search bug" Jon Bentley wrote about in 2006 — sat undetected in the JDK for decades. Same fix applies here.
Two temporary arrays — the cost of stability and simplicity. Merge needs the left and right halves untouched while it writes back into arr. Copying them out is the cleanest way. Why not merge in place? You can, but it's surprisingly hard (look up "in-place merge sort") and usually slower than the simple version. Trap: trying to "save memory" by reading from arr directly during the merge — you'll overwrite values you haven't merged yet. Use the temp arrays.
The stability switch — <=, not <. When left and right have equal values, taking from left first preserves the original order. Why does this matter? When you sort objects by a secondary key after already sorting by a primary, stability is what keeps the primary sort intact. Trap: "optimising" to strict < looks innocent and breaks every stable-sort guarantee in your codebase. The Brain Power above is asking exactly this.
The two drain loops. When one half is exhausted, the other half's remaining values are already in order — just copy them in. Why two separate loops instead of one? Because only one of them ever runs (the half with elements left). Combining them needs an extra branch in the hot loop, which is slower. Trap: forgetting one of these silently truncates the output — you'd lose the tail of either left or right.
Mastery Move
External Merge Sort — sorting more data than fits in RAM. Same algorithm, different storage. Read a chunk that fits in memory, sort it (with anything — QuickSort works fine), write it to a temp file. Repeat for every chunk. Then do a k-way merge of all the temp files using a min-heap to pick the next smallest value across all files. This is how databases sort 100 GB of data on a 16 GB box. It's the only general algorithm that scales past RAM.
LINQ's OrderBy uses a stable Merge Sort variant — that's why it preserves order on ties. If you ever wondered why list.OrderBy(x => x.Name).ThenBy(x => x.Date) works correctly (Date order is preserved within Name groups), that's stability + Merge Sort doing the heavy lifting.
Linked-list Merge Sort is the natural choice — splitting a linked list is O(n) regardless of structure, but the merge needs zero extra memory (just pointer reassignment). That's why functional languages (Haskell, OCaml) default to Merge Sort.
Watch Out — recursion depth + the overflow bug
Stack overflow on huge arrays. Recursion depth = log₂(n). For n = 1 million that's 20 frames — fine. But for n = 10¹⁸ (database sort)? Still only 60 frames. So this isn't actually a Merge Sort problem in practice. (Quick Sort, by contrast, can recurse n deep on bad pivots — see next section.)
The classic (lo + hi) / 2 overflow. Always write lo + (hi - lo) / 2. This bug lived in java.util.Arrays.binarySearch for nine years before being fixed in 2006. Don't reinvent it.
There are no dumb questions
Quick Sort vs Merge Sort — which one should I learn first for interviews?
Both, but if forced to pick: Merge Sort. Its analysis is cleaner (no probability), its guarantees are stronger, and the merge step is its own reusable algorithm. Quick Sort builds on the partition concept that you'll hit in many other algorithms — learn it second.
Why does .NET use Quick-based IntroSort instead of Merge Sort for Array.Sort?
Cache locality and zero allocations. Quick Sort partitions in place; Merge Sort needs O(n) auxiliary memory. For primitive arrays in RAM, Quick beats Merge in real wall-clock time despite the same Big O. For objects and LINQ, where stability matters, .NET uses Merge.
Can Merge Sort be parallelised?
Beautifully. The two recursive calls are independent — fork them onto two threads. Parallel.Invoke, Task.Run, OpenMP, anything works. This is why "external + parallel Merge Sort" is the algorithm of choice for big-data systems like Spark and Hadoop.
Always use lo + (hi - lo) / 2 — never (lo + hi) / 2.
06
Quick Sort — the practical champion (with a temper)
Merge Sort gives you a guarantee. Quick Sort gives you a gamble — and the gamble almost always pays off.
The idea is so clean it's almost insulting: pick any value (the pivot). Rearrange the array so everything smaller is on the left, everything bigger on the right. Now the pivot is in its final sorted position — and the left and right are independent problems you can solve recursively.
No merge step. No auxiliary memory. On average, devastatingly fast. On bad luck — O(n²). Most of this section is about how the gamble works and how production code rigs the deck.
Brain Power
You're going to write a naive Quick Sort that always uses the last element as the pivot. What's the worst possible input for that choice?
(Hint: think about an input where the pivot would be the smallest — or largest — value every single time. What does that input look like?)
The pathological inputs for last-element pivot are already-sorted and reverse-sorted arrays. Every partition leaves one side empty and the other side with n − 1 elements — recursion goes n deep, comparisons sum to n + (n-1) + ... + 1 = O(n²), and you've turned the fastest sort into the slowest. It's the most common trap in interview Quick Sort questions.
6.1
The recipe — pivot, partition, recurse
Same divide-and-conquer skeleton as Merge Sort, but smarter splitting. Instead of cutting blindly in the middle and merging at the end, Quick Sort uses a single rearrangement to put every element on the correct side of the pivot. Both halves can then be sorted independently — no merge step needed.
// ── The recipe ───────────────────────────────────────
//
// 1. PICK a PIVOT from the array.
// Naive: last element. Smart: median-of-three or random.
//
// 2. PARTITION the array around the pivot:
// everything <= pivot → goes LEFT
// the pivot → sits in the MIDDLE (final position!)
// everything > pivot → goes RIGHT
//
// 3. RECURSE on the left sub-array.
// RECURSE on the right sub-array.
// Don't recurse on the pivot — it's already done.
//
// 4. Base case: sub-arrays of size 0 or 1 are done.
// ── Why pivot choice is everything ───────────────────
//
// GOOD pivot (median):
// splits the array roughly 50/50
// recursion depth ≈ log n
// total work ≈ n log n ← best case scenario
//
// BAD pivot (max or min):
// one side empty, other side has n − 1
// recursion depth ≈ n
// total work ≈ n² ← pathological
//
// Naive Quick Sort + sorted input = the canonical example
// of "looked fine in tests, exploded in production."
There are no dumb questions
Quick Sort vs Merge Sort — they're both O(n log n) average, so which wins in real code?
Quick Sort, usually — by 2–3× on real hardware. Quick partitions in place (no allocations) and accesses memory linearly (cache-friendly). Merge needs an O(n) auxiliary buffer. For primitive arrays in RAM, that overhead is huge.
Why is Quick Sort NOT stable?
The partition step uses long-range swaps — moving a value to the other side of the pivot can leap-frog it over an equal value. Same problem as Selection Sort, same workaround: use a stable sort (Merge / LINQ OrderBy) when you care about ties.
Lomuto vs Hoare partition — which should I learn?
Lomuto (this card). It's simpler, easier to reason about, and the basis for most interview implementations. Hoare's partition is faster in practice (~3× fewer swaps) but trickier to get right — it's what production libraries use. Learn Lomuto first, graduate to Hoare when you need the speed.
6.2
Watch it sort — [7, 2, 1, 8, 6, 3, 5, 4]
Lomuto scheme: pivot = last element. The dark-bordered cell is the pivot. Green = sub-array that's already final. We'll only show the first partition step in detail, then the recursive results.
the pivot just swapped during partition in its final spot
Start72186354pivot = 4 (last)
After partition213468574 is final · 3 ≤4 left · 4 >4 right
After ⇐ [2,1,3]12346857left sub-array sorted recursively
After ⇒ [6,8,5,7]12345678right sub-array sorted · done
// ── Recursion tree on this run ───────────────────────
//
// [7,2,1,8,6,3,5,4] pivot 4 → final pos 3
// ╱ ╲
// [2,1,3] [6,8,5,7]
// pivot 3 pivot 7
// ╱ ╲ ╱ ╲
// [2,1] [] [6,5] [8]
// pivot 1 pivot 5
// ╱ ╲ ╱ ╲
// [] [2] [] [6]
//
// Depth ≈ log₂ 8 = 3 ✓
// On a SORTED input, the same algorithm makes
// depth = 8 → O(n²). That's the temper.
Sharpen your pencil — partition a sub-array
Run Lomuto partition on [3, 8, 1, 5, 2] with pivot = last element (2). What does the array look like after the partition, and at what index does the pivot end up?
Two methods. QuickSort = recursive shell. Partition = the real algorithm. The trick of Lomuto: i tracks the boundary of "things <= pivot seen so far".
static void QuickSort(int[] arr, int lo, int hi)
{
if (lo >= hi) return; 1
int p = Partition(arr, lo, hi);
QuickSort(arr, lo, p - 1);
QuickSort(arr, p + 1, hi); 2
}
static int Partition(int[] arr, int lo, int hi)
{
int pivot = arr[hi]; 3
int i = lo - 1; 4
for (int j = lo; j < hi; j++)
{
if (arr[j] <= pivot)
{
i++;
(arr[i], arr[j]) = (arr[j], arr[i]); 5
}
}
(arr[i + 1], arr[hi]) = (arr[hi], arr[i + 1]); 6
return i + 1;
}
The base case. Sub-arrays of 0 or 1 are sorted. Why >= not ==? Sub-arrays of size 0 happen naturally — when the pivot was the smallest or largest value, one side of the partition is empty (p − 1 < lo or p + 1 > hi). Strict equality misses these. Trap:lo == hi alone causes infinite recursion on empty sub-arrays.
Critically: don't recurse over the pivot. The pivot is at index p and already in its final sorted position. Recursing on (lo, p) instead of (lo, p - 1) wastes work and breaks the algorithm's guarantee. Trap: off-by-one here is the #1 Quick Sort bug — produces wrong output on inputs with duplicates.
Pivot = last element — naive but legible. Production code never does this. Why? Already-sorted input degrades to O(n²) (see the Brain Power). How to fix: use median-of-three (arr[lo], arr[mid], arr[hi] — pick the middle one) or random pivot (swap arr[hi] with a random index before partitioning). Both fix the worst case.
The boundary pointer starts at lo − 1. It represents "the last index that's known to be ≤ pivot." Starts before the sub-array because we haven't seen any small values yet. Why not lo? Because that would imply arr[lo] is already known small — but we haven't checked it. Starting at lo − 1 means i + 1 is always the next empty "small region" slot.
The partition swap. When we find a value <= pivot, increment i and swap it into the small region. Why <= not <? So that values equal to the pivot land on the left side. Mixed sides break the recursion balance on duplicate-heavy inputs. Trap:< instead of <= creates a bizarre worst case on arrays of all identical values — they all go right, partition produces an empty left, recursion depth = n, performance dies.
Final swap — pivot to its sorted position. The pivot has been sitting at arr[hi] the whole time. After the loop, i + 1 is the first "> pivot" slot. Swap the pivot there. Why i + 1 not i? Because i is the last small index — the pivot belongs just past the small region, not inside it. Trap: swapping with arr[i] moves a value <= pivot to the end and the pivot lands in the wrong half — algorithm silently produces unsorted output.
Mastery Move
Median-of-three pivot. Take the median of arr[lo], arr[mid], arr[hi] and use that as the pivot. Three comparisons, eliminates the sorted-input pathology, makes O(n²) cases astronomically rare on real data. This is what most libraries used to do.
Introsort = QuickSort + Heapsort + Insertion. Modern .NET Array.Sort uses IntroSort: start with Quick Sort, but if recursion depth exceeds 2 × log₂(n), switch to Heap Sort for the rest. Why? Heap Sort guarantees O(n log n) regardless of input — it's the safety net. And for sub-arrays of n ≤ 16, switch to Insertion Sort. Best of all three worlds, guaranteed O(n log n) worst case.
Three-way partitioning (Dutch National Flag). Standard Lomuto gives you two regions: <= pivot and > pivot. Dutch National Flag gives you three: <, =, and >. Recurse only on the < and > regions. On arrays with many duplicates (think: sorting by category), this turns O(n²) into O(n).
Quickselect — Quick Sort's clever cousin. If you only need the kth smallest value (not the full sort), you only need to recurse into one side after each partition. Average O(n) total. The standard library's nth_element in C++ and the basis of LINQ's OrderBy().First() optimisation in some runtimes.
Watch Out — naive Quick Sort is a DDoS vector
If you accept user-supplied data and run a fixed-pivot Quick Sort on it, an attacker can craft inputs that force O(n²) every time — and bring down your server with carefully chosen "sorted" payloads. This is called algorithmic complexity attack and it's a real CVE class. Always randomise the pivot (or use IntroSort) for production sorts on untrusted data.
Always use Array.Sort in .NET. It's IntroSort under the hood — none of these traps apply. The only reason to roll your own Quick Sort is to learn, or to teach.
There are no dumb questions
How deep can the recursion go in the worst case?
n levels — and on a million-element array that's a stack overflow waiting to happen. Production libraries (IntroSort, dual-pivot quick) cap recursion depth at ~2 log n and switch to Heap Sort if exceeded. The basic Lomuto QuickSort here is fine for n < ~10⁴.
Tail-call optimisation — does C# help?
Not automatically. C# doesn't guarantee TCO. But you can manually convert one recursive call to a loop (recurse on the smaller half, iterate on the larger). That keeps the stack at O(log n) even with bad pivots. Production-grade trick.
Is Quick Sort actually used anywhere I care about?
Everywhere. C's qsort, C++'s std::sort, Rust's unstable sort, .NET's Array.Sort for primitives — all variants of Quick + Insertion + sometimes Heap. It's the default in-place sort of every major systems language.
// ── Try it ───────────────────────────────────────────
int[] data = [7, 2, 1, 8, 6, 3, 5, 4];
QuickSort(data, 0, data.Length - 1);
Console.WriteLine(string.Join(", ", data));
// → 1, 2, 3, 4, 5, 6, 7, 8 ✓
// In production: just call Array.Sort(data). It's IntroSort,
// it's faster than your handwritten Quick Sort, and it's
// algorithmic-complexity-attack-resistant.
Bullet Points
Quick Sort = "pick pivot, partition, recurse on both halves".
Fastest in practice for primitive arrays — beats Merge by ~2× due to cache + no allocation.
Pivot choice is the whole algorithm: naive last-element = sorted-input DOS attack. Median-of-three or random fixes it.
Real-world variant: IntroSort (Quick + Heap + Insertion) is what .NET Array.Sort actually uses.
Three-way partition + Quickselect are the two interview "next-level" follow-ups worth knowing.
07
Heap, Counting & Radix — the specialists
The first six sections covered the general-purpose sorts. Section 7 is where the specialists live — the algorithms that beat the others only under specific conditions, but when those conditions hold, they beat them decisively.
Heap Sort = "Selection Sort, but the find-the-min step is O(log n) instead of O(n)". Same shape, dramatically faster.
Counting Sort and Radix Sort are even wilder — they don't compare elements at all. They count and bucket. Big O? O(n). Yes, linear. Yes, faster than any comparison sort. But only on integers in a known range.
Brain Power
The "comparison sorting lower bound" theorem says: any sort that decides positions by comparing pairs of values needs at least Ω(n log n) operations in the worst case.
But Counting Sort runs in O(n). How is that possible? What rule does it break?
(Hint: read the sentence again. What word matters most?)
The word is comparing. Counting Sort never compares two values to each other — it uses them as indices into a counting array. By stepping outside the comparison model, it sidesteps the lower bound entirely. The same trick is what makes hash tables O(1): they don't compare, they index.
7.1
Heap Sort — selection sort with a smarter min-finder
A heap is a binary tree stored in an array, where every parent is ≥ both its children (a max-heap). The root is always the maximum. Take the root, swap it to the end, restore the heap, repeat. Two for-loops, no recursion, in-place, guaranteed O(n log n). In .NET this same data structure powers PriorityQueue<T, K>.
// ── The recipe ───────────────────────────────────────
//
// Phase 1 — BUILD a max-heap from the input. O(n)
// (Yes, only n. Surprising. Proof in references.)
//
// Phase 2 — Repeatedly do:
// • swap root (max) with the last unsorted slot
// • the array now has [..., max] sorted at the end
// • shrink the "heap region" by one
// • re-heapify the root to restore the property
// That's O(log n) per iteration, n iterations → O(n log n)
// ── Heap-as-array — the secret weapon ────────────────
//
// Heap stored as an array, NOT a tree of nodes:
//
// index: 0 1 2 3 4 5 6
// value: [16 14 10 8 7 9 3] ← max-heap
//
// For any index i:
// parent(i) = (i - 1) / 2
// left(i) = 2*i + 1
// right(i) = 2*i + 2
//
// No pointers. No allocations. Cache-friendly.
// This is why heaps are the right way to ship priority queues.
There are no dumb questions
Why is building a heap O(n) instead of O(n log n)?
Looks paradoxical but the math is clean. Half the nodes are leaves (zero work), half the rest are at depth 1 (1 swap each), etc. The total is bounded by 2n, not n log n. Worth reading the proof once — it's elegant.
Why does the root have to go to the END of the array, not the front?
Because the heap region shrinks from the back, putting sorted values exactly where they belong without any extra memory. The same array stores both "still heaping" and "final sorted output". A small data-layout trick that makes the algorithm in-place.
Heap Sort vs Quick Sort — which is "better"?
Heap Sort has the better guarantee (no O(n²) worst case). Quick Sort has the better average (more cache-friendly, fewer swaps). Production code uses Quick by default and falls back to Heap when Quick's recursion gets too deep — that's IntroSort.
static void HeapSort(int[] arr)
{
int n = arr.Length;
// Phase 1: build max-heap from the bottom up.
for (int i = n / 2 - 1; i >= 0; i--) 1
Heapify(arr, n, i);
// Phase 2: extract max one at a time.
for (int i = n - 1; i > 0; i--)
{
(arr[0], arr[i]) = (arr[i], arr[0]); 2
Heapify(arr, i, 0); 3
}
}
// "Sift-down" — restore the max-heap property at `root`.
static void Heapify(int[] arr, int heapSize, int root)
{
int largest = root;
int left = 2 * root + 1;
int right = 2 * root + 2;
if (left < heapSize && arr[left] > arr[largest]) largest = left;
if (right < heapSize && arr[right] > arr[largest]) largest = right;
if (largest != root) 4
{
(arr[root], arr[largest]) = (arr[largest], arr[root]);
Heapify(arr, heapSize, largest);
}
}
Start at the last non-leaf and walk backwards. Index n/2 − 1 is the deepest internal node. Why backwards? So that when we heapify a node, its children's subtrees are already heaps — that's what makes a single sift-down sufficient. Trap: starting at index 0 (top-down) doesn't build a valid heap and costs O(n log n) instead of O(n).
Move the max (root) to the end of the unsorted region. The root is the current maximum; the last slot of the heap region is its final sorted home. Why i > 0 not i >= 0? When only one element is left it's already in place — no swap needed. Trap:i >= 0 still works but does one wasted heapify on an empty heap.
Re-heapify with the SHRUNK size i. The freshly-placed max at index i is now outside the heap region — passing i (not n) is what keeps the algorithm in-place. Trap: passing n would re-include the sorted suffix, destroying both the heap and the sort order.
Recurse only if something actually moved. If the root was already the largest, the subtree is already a heap. Why bother with the check? Avoids meaningless recursion that does no work. Trap: not having the check is correct but burns stack frames — and the recursion can be replaced by a while-loop in production code to make it iterative.
Mastery Move
The heap is more useful than the sort. Heap Sort itself isn't a default — IntroSort uses it only as a safety net. But the data structure behind it (the heap) is the foundation of every priority queue on the planet: Dijkstra's algorithm, A* search, OS process schedulers, event loops, Top-K-from-a-stream problems. PriorityQueue<TElement, TPriority> in .NET 6+ is a heap with a tidy API — use it directly.
Top-K with a heap. Stream of a billion values, you want the top 100. Build a min-heap of size 100. For each new value, if it's bigger than the heap's min, pop the min and push the new value. After one pass: the heap is the top 100. O(n log k) time, O(k) memory — the canonical efficient solution.
Heap Sort is the basis of "external" priority queues too — disk-backed heaps for queues larger than RAM. Lucene's segment-merging uses one. The k-way merge step of external Merge Sort uses one.
7.2
Counting Sort — sort without comparing
Counting Sort doesn't compare values. It uses them — as indices into a counting array. If you know the value range up front and the range is small, this beats every comparison sort: O(n) flat. The catch is in the words "if you know" and "small".
// ── The recipe ───────────────────────────────────────
//
// You want to sort ages of 1,000 employees. Ages are 0–120.
//
// 1. Allocate count[121] = all zeros.
// 2. For every input value v: count[v]++
// Now count[v] = how many times v appeared.
// 3. PREFIX-SUM the counts left-to-right:
// count[i] = number of values <= i
// = "where does the LAST occurrence of i go in output?"
// 4. Walk the input IN REVERSE. For each value v:
// output[--count[v]] = v
// (Reverse walk is what keeps the sort stable.)
//
// Total: 3 linear passes. O(n + k) time, O(n + k) space.
// ── Example: [4, 2, 2, 8, 3, 3, 1] k = 8 ────────────
//
// Phase 1: count = [0, 1, 2, 2, 1, 0, 0, 0, 1]
// ↑ ↑ ↑ ↑
// one 1 two 3's one 8
//
// Phase 2: prefix sum → [0, 1, 3, 5, 6, 6, 6, 6, 7]
// ↑ ↑ ↑
// 3 ≤ 3 ≤ 4 ≤ 8
//
// Phase 3: walk input in reverse, drop into output
// → output = [1, 2, 2, 3, 3, 4, 8] ✓ sorted, stable.
There are no dumb questions
Why walk the input in REVERSE in phase 3? Doesn't forward work too?
Forward works for correctness — but breaks stability. Reverse + decrementing the count means the last occurrence of v in the input lands at the last slot for v in the output, preserving relative order for ties. Critical when sorting objects by a key.
What if the values are negative?
Add an offset. Shift all values up by −min so the smallest becomes 0, sort, then subtract the offset back. Same algorithm, two lines extra.
If the value range is huge (say up to 1 million), is Counting Sort still worth it?
For 1 million values in [0, 1 million]: roughly. The 4 MB count array fits in L2 cache and the algorithm is still ~10× faster than Quick Sort. For [0, 2 billion]? Forget it — you'd need 8 GB. That's where Radix Sort takes over.
static int[] CountingSort(int[] arr, int maxVal)
{
int[] count = new int[maxVal + 1]; 1
int[] output = new int[arr.Length];
foreach (int x in arr) count[x]++; 2
for (int i = 1; i <= maxVal; i++) 3
count[i] += count[i - 1];
for (int i = arr.Length - 1; i >= 0; i--) 4
output[--count[arr[i]]] = arr[i];
return output;
}
Count array sized for the value range, not the input. If max value is 120, allocate int[121]. Why size by k and not n? Because the index into count is the value itself — slots not corresponding to actual values stay zero. Trap: sizing it by arr.Length instead of maxVal + 1 means values larger than n explode with IndexOutOfRangeException.
Count occurrences — a single pass. Use each value as the index. Why foreach instead of for? Negligible difference here; both are O(n). I prefer foreach for clarity since we never need the index. Trap: negative values throw — handle with an offset.
Prefix sum — converts "count" into "ending position". After this, count[v] − 1 is the index in output where the last occurrence of v belongs. Why a separate loop? Because phase 4 needs all counts finalised. Combining phase 2 and 3 in one loop produces wrong totals. Trap: off-by-one with i <= maxVal vs i < maxVal — the count array has maxVal + 1 slots, so <= is correct.
Reverse walk + decrement = stable output.--count[arr[i]] reserves a slot and immediately writes into it. Decrementing makes the next occurrence of the same value land in the previous slot. Why reverse? Walking forward would put the first input value at the last output slot for ties — order flipped. Trap:count[arr[i]]-- (postfix on the index expression) computes the slot then decrements — same effect here but easier to get wrong elsewhere.
Watch Out — value range trap
Someone in code review says "let's use Counting Sort, it's O(n) — way faster than Quick Sort." Then someone else points out that the values are 64-bit integers up to 10¹⁸. Counting Sort would need an exabyte-sized count array.
The right question isn't "is Counting Sort faster?" — it's "is k small enough that n + k beats n log n?". Rule of thumb: k < n · log n is fine. Above that, you're paying more in memory than you save in time. For huge ranges, use Radix Sort instead.
7.3
Radix Sort — Counting Sort, applied per digit
If the value range is too big for Counting Sort to allocate (think 32-bit integers, ID numbers, sortable strings), Radix Sort works around it. Sort by the least-significant digit first using a stable Counting Sort, then by the next digit, then the next. After d passes — where d = number of digits — the array is sorted. Linear time per pass, d passes, O(d · n) total.
// ── Example trace ────────────────────────────────────
//
// Start: 170, 45, 75, 90, 802, 24, 2, 66
//
// By 1s digit → 170, 90, 802, 2, 24, 45, 75, 66
// ones: 0 0 2 2 4 5 5 6
//
// By 10s digit → 802, 2, 24, 45, 66, 170, 75, 90
// tens: 0 0 2 4 6 7 7 9
//
// By 100s digit → 2, 24, 45, 66, 75, 90, 170, 802
// hundreds: 0 0 0 0 0 0 1 8
//
// Sorted! Each pass = one stable Counting Sort.
//
// ── Why does this work? ──────────────────────────────
//
// Because each pass is STABLE, the order from previous passes
// is preserved within each digit bucket. So after sorting by 10s,
// numbers with the same 10s digit are still in 1s-digit order.
// After sorting by 100s, same logic on the 100s bucket.
//
// Read each row carefully — that's the whole proof.
static int[] RadixSort(int[] arr)
{
int max = arr.Max(); 1
for (int exp = 1; max / exp > 0; exp *= 10) 2
arr = CountingSortByDigit(arr, exp);
return arr;
}
static int[] CountingSortByDigit(int[] arr, int exp)
{
int[] output = new int[arr.Length];
int[] count = new int[10]; 3
foreach (int x in arr) count[(x / exp) % 10]++; 4
for (int i = 1; i < 10; i++) count[i] += count[i - 1];
for (int i = arr.Length - 1; i >= 0; i--)
{
int digit = (arr[i] / exp) % 10;
output[--count[digit]] = arr[i];
}
return output;
}
Find the max — it tells you when to stop. Number of digits in max = number of passes needed. Why not pick a fixed number? If all your data is small (3-digit ID codes), running 10 passes is pure waste. Trap: empty arrays — arr.Max() throws on those. Add an early return.
The loop is unusual: by powers of 10.exp jumps 1, 10, 100, 1000… Why max / exp > 0? Because once exp exceeds max, all digits at that place are 0 — no more sorting needed. Trap: running while exp <= max wastes one extra pass. Using != 0 instead of > 0 is fine for positive values but breaks if you ever extend to negatives.
10 buckets — one per digit. Base 10 means digits are 0–9. Why not base 256? You can. Sorting 32-bit integers in base 256 = 4 passes total (vs 10 for base 10). Production radix sorts use base 256 or higher. Trap: mistakenly sizing count by arr.Length — index goes 0–9 regardless of input size.
Extract the digit: (x / exp) % 10. Divide by exp to right-shift past the lower digits, mod 10 to grab the rightmost remaining. Why not bit-shifts? For base 10 you need division; for base 256 (or any power of 2), bit-shifts ((x >> shift) & 0xFF) are vastly faster. That's how production radix sorts get their speed. Trap: negative numbers — % in C# returns the sign of the dividend, so digits go negative. Handle with offset or two-pass (negatives, then positives).
Mastery Move
MSD Radix Sort. Same algorithm but starting from the most significant digit. After the first pass, the array is partitioned by leading digit — recurse into each bucket. Used for sorting variable-length strings (dictionaries, log files by URL). The trade-off: MSD is more cache-friendly but harder to implement than LSD.
Bit-level Radix Sort. For 32-bit integers, pick base 256 (8 bits at a time). Sort by the low byte, then second-low byte, then third, then high. Exactly 4 stable Counting Sorts per array. On modern hardware this beats Quick Sort for n > ~10⁵.
Radix sort is what databases use to build indexes. Sorting trillion-row tables by a numeric column? Radix. PostgreSQL's external sort is a Quick + Radix hybrid for numeric types. Knowing this is a real interview differentiator.
There are no dumb questions
So Radix is O(n) — why isn't it the default for everything?
Three reasons. (1) The constant factor is high; for n < ~10⁵ Quick Sort wins in wall-clock time. (2) Only works on integer-like data — no generic IComparer<T>. (3) Each pass allocates an output array (no in-place radix). Quick Sort is more flexible.
Can I Radix-sort strings?
Yes — treat each character as a "digit" (in base 26 or 128 or whatever). MSD Radix Sort is the standard string-sorting algorithm in compilers, search engines, and dictionary tools. Trie + Radix together is a power combo.
Does .NET have a built-in Radix Sort?
No — Array.Sort is always comparison-based. If you need Radix for big integer arrays, you'll write or grab one. The implementation is short enough (~50 lines) that copying from Knuth Vol 3 is realistic.
Bullet Points — the specialists
Heap Sort — O(n log n) always, in-place, not stable. Selection Sort with a smarter min. Real value is the underlying heap → priority queue.
Counting Sort — O(n + k), stable, NOT in-place. Use when k (value range) is small enough.
Radix Sort — O(d · n), stable, NOT in-place. Use when k is too big for Counting but values are integer-like.
Together they break the O(n log n) "lower bound" — by not comparing at all.
None are the default — they shine only under specific conditions, but when they shine they devastate comparison sorts.
Sentinel intuition: if your data has structure (integers, fixed-width strings, bounded range), look at Counting / Radix before Quick. If you need ordering guarantees AND in-place, Heap.
08
Cheatsheet, Decision Tree & References
You've now met every sorting algorithm worth knowing. The last section is the one to come back to once a year — the cheatsheet, the decision tree, and the resources to dig deeper.
Memorise the table. Internalise the decision tree. Bookmark VisuAlgo. Done.
Brain Power — interview rehearsal
You're in a system design interview. The interviewer says: "We have 100 million 32-bit unsigned integers stored in a single file on a server with 4 GB of RAM. Sort them and write the result back. What's your approach?"
Before reading the card below, think it through. What rules out Quick Sort? What rules out Counting Sort? Where does the answer live?
The answer is external Merge Sort + Radix per chunk. RAM-sized chunks → Radix-sort each (linear, fits in cache), spill sorted chunks to disk. Then k-way merge the chunks with a min-heap. That's how databases do it. Quick Sort needs RAM. Counting Sort needs an 8 GB count array. The unique combination this problem asks for is exactly what production systems use.
8.1
Complexity cheatsheet — print this, pin it to the wall
Eight sorts, one table. Memorise this and you can answer 90% of "which sort?" questions on the fly. n = items, k = value range, d = digits, b = base (10 for decimal radix).
// Algorithm | Best | Average | Worst | Space | Stable | In-place | Adaptive
// --------------+------------+------------+------------+----------+--------+----------+----------
// Bubble | O(n) | O(n²) | O(n²) | O(1) | YES | YES | YES
// Selection | O(n²) | O(n²) | O(n²) | O(1) | NO | YES | NO
// Insertion | O(n) | O(n²) | O(n²) | O(1) | YES | YES | YES ★★★
// Merge | O(n log n) | O(n log n) | O(n log n) | O(n) | YES | NO | NO
// Quick | O(n log n) | O(n log n) | O(n²) | O(log n) | NO | YES | NO
// Heap | O(n log n) | O(n log n) | O(n log n) | O(1) | NO | YES | NO
// Counting | O(n + k) | O(n + k) | O(n + k) | O(n + k) | YES | NO | NO
// Radix (LSD) | O(d·(n+b)) | O(d·(n+b)) | O(d·(n+b)) | O(n + b) | YES | NO | NO
// ── Reading the table in 30 seconds ──────────────────
//
// STABILITY — preserves order of equal items? Critical when
// sorting by a secondary key after a primary.
// IN-PLACE — sorts inside the input, no auxiliary array.
// Matters when memory is tight or O(n) extra is a no-go.
// ADAPTIVE — faster on already-sorted or nearly-sorted input?
// Insertion Sort is the king here.
//
// ── The 5 sentences that summarise everything ────────
//
// 1. Comparison-based sorts can't go below O(n log n) in the worst case.
// 2. Counting and Radix sidestep that by NOT comparing.
// 3. Quick is fastest in practice; Merge is fastest with a guarantee.
// 4. Insertion is fastest for small n and almost-sorted data.
// 5. Heap is the safety net — guaranteed O(n log n) in-place.
8.2
Decision tree — "which sort do I use?"
Walk the tree top-down. Answer "yes" or "no" at each node. Most real-world problems land at one of the first three leaves.
Is your data in a .NET collection?
│
├─ YES → does order of EQUAL items matter (stability)?
│ │
│ ├─ YES → LINQ .OrderBy(...).ThenBy(...) ← stable Merge
│ │
│ └─ NO → Array.Sort / List<T>.Sort ← IntroSort
│
└─ NO (you're rolling your own — interview, embedded, custom data)
│
├─ Is n < ~16?
│ └─ YES → Insertion Sort ← simplest, fast
│
├─ Is the data almost-sorted (streaming inserts)?
│ └─ YES → Insertion Sort ← strongly adaptive
│
├─ Are the values integers with range < n · log n?
│ └─ YES → Counting Sort ← linear
│
├─ Are the values integers in a HUGE range (or fixed-width strings)?
│ └─ YES → Radix Sort ← linear
│
├─ Need GUARANTEED O(n log n) and stability?
│ └─ YES → Merge Sort ← safe
│
├─ Need GUARANTEED O(n log n) but in-place?
│ └─ YES → Heap Sort ← steady
│
└─ Otherwise (the 80% case)
└─ Quick Sort with median-of-three pivot ← fastest in practice
// ── The interview shortcut ───────────────────────────
// 1. Ask: "what's n? what's the value range? do duplicates matter?"
// 2. Match the answers against the tree.
// 3. State your choice + WHY.
// Interviewers don't grade you on the algorithm — they grade you on the WHY.
Mastery Move — system design framing
Sorting is rarely the problem — it's the constraint. In a real interview ("design a leaderboard", "sort 1 TB of logs by timestamp", "find top-K trends"), the right sort is usually derived from the question, not the answer to it.
Three real-world constraints to listen for:
• "More data than RAM" → external Merge Sort.
• "Streaming data, query top-K" → heap of size k, NOT a full sort.
• "Sort millions of fixed-width keys" → Radix on the bytes, not a comparison sort.
If you never write your own sort in production code, you've understood the lesson. Use Array.Sort or LINQ.OrderBy. The sorts in this guide are the foundation — but in a JIT-compiled, profile-guided runtime, the built-in is faster than anything you'd hand-write.
8.3
What .NET actually does — the production reality
In real code you almost never roll your own sort — and you shouldn't. The runtime is smarter than you. Here's exactly what each call does under the hood, in 2025-vintage .NET.
// ── Array.Sort / List<T>.Sort — IntroSort hybrid ─────
//
// For primitive types (int, long, etc.):
// 1. Quick Sort with median-of-three pivot.
// 2. If recursion depth exceeds 2 · log₂(n) → switch to Heap Sort
// (guarantees O(n log n) even on adversarial input).
// 3. For sub-arrays of size ≤ 16 → Insertion Sort.
//
// Result: NOT STABLE, but the fastest general-purpose
// in-place sort the runtime can offer.
int[] nums = [5, 1, 4, 2, 8, 3];
Array.Sort(nums); // ≈ best in class for primitives
// ── LINQ OrderBy / OrderByDescending — stable Merge ──
//
// • Uses a stable Merge-like algorithm internally.
// • Allocates a working buffer of n.
// • ThenBy/ThenByDescending chain into a composite comparer
// — total sort, NOT a re-sort per call.
// • Deferred execution: nothing sorts until you enumerate.
var sorted = orders
.OrderBy(o => o.Date) // primary key
.ThenBy(o => o.Total) // secondary key (stability preserves date order)
.ToList(); // ← THIS is when the sort runs
// ── Span<T>.Sort — zero-allocation primitive sort ────
//
// Same algorithm as Array.Sort, but operates on a span —
// useful inside hot paths where you can't allocate.
Span<int> span = stackalloc int[] { 5, 2, 8, 1, 9, 3 };
span.Sort(); // sorts in place, no GC pressure
// ── PriorityQueue<TElement, TPriority> — the heap ────
//
// Added in .NET 6. Backed by a binary min-heap.
// THIS is what you actually want when "I need the smallest first"
// — not a sort, a priority queue.
var pq = new PriorityQueue<string, int>();
pq.Enqueue("low", 5);
pq.Enqueue("urgent", 1);
pq.Enqueue("medium", 3);
while (pq.Count > 0)
Console.WriteLine(pq.Dequeue());
// → urgent, medium, low (O(log n) per enqueue/dequeue)
// ── The performance table (rough, m1-pro, .NET 8) ────
//
// n = 1,000,000 random ints:
// Array.Sort ~ 35 ms
// Insertion (hand-written) ~ 600,000 ms (don't try this at home)
// Hand-written QuickSort ~ 60 ms (no JIT-tuned IntroSort tricks)
// LINQ OrderBy → ToArray ~ 110 ms (extra allocations)
// Counting Sort (k=1000) ~ 12 ms (when applicable, wow)
// Radix Sort (4 passes) ~ 25 ms (faster than Array.Sort!)
//
// You will not beat Array.Sort by writing your own Quick Sort.
// The only sorts that beat it are problem-specific (Radix, Counting).
Watch Out — the 3 most common production sort bugs
1. "I'll just use OrderBy twice for a multi-key sort."list.OrderBy(x => x.A).OrderBy(x => x.B) sorts by A then resorts by B — losing the A order. You want .OrderBy(x => x.A).ThenBy(x => x.B). (Stability + ThenBy is the only reason this works.)
2. "Why is Array.Sort not preserving my order on ties?" Because it's not stable. If you need stability on a primitive array, you don't have a built-in — wrap with index, sort by (value, original-index), or use Enumerable.OrderBy.
3. "Sort comparer threw, now my list is half-mutated." If your IComparer<T> throws mid-sort, the array is in an arbitrary partially-sorted state. Production tip: validate inputs before calling Sort, and use stateless comparers.
8.4
Further reading — the sources I actually trust
Everything in this guide cross-checks against these. The interactive visualisers are especially worth a few minutes — watching the algorithms run locks intuition in faster than any prose can.
// ── Books ────────────────────────────────────────────
//
// • CLRS — Cormen, Leiserson, Rivest, Stein
// "Introduction to Algorithms", 4th ed. (MIT Press, 2022)
// The standard reference for the field. Chapters 2, 6–8.
//
// • Sedgewick & Wayne — "Algorithms", 4th ed.
// Friendlier than CLRS. Java code. Pairs with the Coursera course.
//
// • Knuth — "TAOCP Vol 3: Sorting and Searching"
// The deep end. Reach for it when you want to read the bedrock
// of the field. Not for first-time learning.
//
// • Skiena — "The Algorithm Design Manual", 3rd ed.
// Practical, problem-solving focus. Great companion to CLRS.
// ── Free university courses (best of the best) ───────
//
// • MIT 6.006 — Introduction to Algorithms (Erik Demaine et al.)
// https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/
// Lectures 3–7 cover sorting comprehensively. Free, on YouTube too.
//
// • Princeton Algorithms (Sedgewick) on Coursera
// Both Algorithms I & II free to audit. Pairs with Sedgewick's book.
//
// • Stanford CS161 (Tim Roughgarden)
// Lecture notes free online. Excellent on Quick Sort analysis.
// ── Interactive visualisers ──────────────────────────
//
// • https://visualgo.net/en/sorting
// Step through every sort. Side-by-side mode. Best in class.
//
// • https://sorting.at/
// Race multiple sorts on the same input. Eye-opening.
//
// • https://www.toptal.com/developers/sorting-algorithms
// Animated, classic distributions (random, sorted, reverse, ...).
// ── Quick-reference sites ────────────────────────────
//
// • https://en.wikipedia.org/wiki/Sorting_algorithm overview
// • https://www.bigocheatsheet.com/ Big O at a glance
// • https://www.geeksforgeeks.org/sorting-algorithms/ many implementations
// • https://learn.microsoft.com/dotnet/api/system.array.sort .NET specifics
// ── Papers / talks worth your time ───────────────────
//
// • Robert Sedgewick — "Quicksort is optimal" (1977)
// The original average-case analysis. Still beautiful.
//
// • David Musser — "Introspective Sorting and Selection Algorithms" (1997)
// The IntroSort paper. 8 pages. Read it to understand
// why .NET's Array.Sort is the way it is.
//
// • Tim Peters — "List sort algorithm description" (CPython source)
// The original TimSort note. Lives in cpython/Objects/listsort.txt.
// ── Want to keep going? ──────────────────────────────
//
// Searching → binary search and its variants
// Data structures → trees, heaps (you've already met heaps!), graphs
// Algorithms → BFS, DFS, dynamic programming
//
// You already know enough about Big O to start any of them today.
// Head to the "Algorithms & Data Structures" full guide on this site.
The whole guide in 8 bullets
Bubble Sort — for learning, never for shipping.
Selection Sort — when writes are expensive (flash, audit logs).
Insertion Sort — best for small n, best for nearly-sorted, basis of TimSort and IntroSort.