# New Capacity Bounds for the Binary Deletion Channel New Results and Open Problems for Insertion/Deletion Channels and Related Problems Michael Mitzenmacher Harvard University The Most Basic Channels Binary erasure channel. Each bit replaced by a ? with probability p. Binary symmetric channel. Each bit flipped with probability p. Binary deletion channel. Each bit deleted with probability p. M. Mitzenmacher

2 The Most Basic Channels Binary erasure channel. Each bit replaced by a ? with probability p. 1101110010 01 1? 0111 ? 01? 01 Binary symmetric channel. Each bit flipped with probability p. 1101110010 01 1001111011 01 Binary deletion channel. Each bit deleted with probability p. 1101110010 01 101110101 M. Mitzenmacher 3

The Most Basic Channels Binary erasure channel. Each bit is replaced by a ? with probability p. Very well understood. Binary symmetric channel. Each bit flipped with probability p. Very well understood. Binary deletion channel. Each bit deleted with probability p. We dont even know the capacity!!! M. Mitzenmacher 4

Motivation This seems a disturbing, sad state of affairs. It bothers me greatly. M. Mitzenmacher 5 Motivation This seems a disturbing, sad state of affairs. It bothers me greatly. And there may be applications M. Mitzenmacher 6

Motivation This seems a disturbing, sad state of affairs. It bothers me greatly. And there may be applications Hard disks, pilot tones, genetics, etc. M. Mitzenmacher 7 Whats the Problem? Erasure and error channels have pleasant symmetries; deletion channels do not. Example: Delete one bit from 1010101010. Delete one bit from 0000000000.

Understanding this asymmetry seems fundamental. Requires deep understanding of combinatorics of random sequences and subsequences. Not a historical strength of coding theorists. But it is for CS theory. M. Mitzenmacher 8 In This Talk Main result: capacity of binary deletion channel is at least (1- p)/9. Compare to capacity (1- p) for erasure channel. First within constant factor result. Still not tight.

We describe path to this result. Generally, well follow the history chronologically. We describe recent advances on related problems. Insertion channels, alternative models. We describe many related open problems. What do random subsequences of random sequences look like? M. Mitzenmacher 9 Capacity Lower Bounds Shannon-based approach: 1. Choose a random codebook. 2. Define typical received sequences.

3. Construct a decoding algorithm. M. Mitzenmacher 10 Capacity Lower Bounds: Erasures 1. Choose a random codebook. Each bit chosen uniformly at random. 2. Define typical received sequences. No more than (p + ) fraction of erasures. 3. Construct a decoding algorithm.

Find unique matching codeword. M. Mitzenmacher 11 Capacity Lower Bounds: Errors 1. Choose a random codebook. Each bit chosen uniformly at random. 2. Define typical received sequences.

Between (p ),(p + ) fraction of errors. 3. Construct a decoding algorithm. Find unique matching codeword. M. Mitzenmacher 12 Capacity Lower Bounds: Deletions 1. Choose a random codebook. Each bit chosen uniformly at random.

2. Define typical received sequences. No more that (p + ) fraction of deletions. 3. Construct a decoding algorithm. Find unique matching codeword. Yields poor bounds, and no bound for p > 0.5. M. Mitzenmacher 13 GREEDY Subsequence Algorithm Is S a subsequence of T?

Start from leftmost point of S and T Move right on T until match next character of S Move to next character of T T 000110010 S 01010 M. Mitzenmacher 14 Basic Failure Argument When codeword X of length n is sent, and p just greater than 0.5, received sequence R has just less than n/2 bits. Is R a subsequence of another codeword Y? Consider GREEDY algorithm If Y is chosen u.a.r., on average two bits of Y

are needed to cover each bit of R. So most other codewords match! M. Mitzenmacher 15 Deletions: Diggavi/Grossglauser 1. Choose a random codebook. Codeword sequences chosen by a symmetric first order Markov chain. 2. Define typical received sequences. No more that (p + ) fraction of deletions.

3. Construct a decoding algorithm. Find unique matching codeword. M. Mitzenmacher 16 Symmetric First Order Markov Chain 1 q/1 q/0 0 q/1

1 1 q/0 0s tend to be followed by 0s, 1s tend to be followed by 1s M. Mitzenmacher 17 Intuition To send a 0 bit, if deletions are likely, send many copies in a block. Lowers the rate by a constant factor. But makes it more likely that the bit gets through.

First order Markov chain gives natural blocks. M. Mitzenmacher 18 Diggavi/Grossglauser Results Calculate distribution of number of bits required for GREEDY to cover each bit of received sequence R using random codeword Y. If R is a subsequence of Y, GREEDY algorithm will show it! Received sequence R also behaves like a sym. first order Markov chain, with parameter q. Use Chernoff bounds to determine how many codewords Y of length n are needed before R is

covered. Get a lower bound on capacity! M. Mitzenmacher 19 The Block Point of View Instead of thinking of codewords being randomly chosen bit by bit: 0, 00, 000, 0001, 00011, 000110, 0001101. Think of codewords as being a sequence of maximal blocks: 000, 00011, 000110, . M. Mitzenmacher 20

Improvements, Random Codebook 1. Choose a random codebook. Codeword sequences chosen by laying out blocks according to a given distribution. 2. Define typical received sequences. No more that (p + ) fraction of deletions, and number of blocks of each length close to the expectation. 3. Construct a decoding algorithm.

Find unique matching codeword. M. Mitzenmacher 21 Changing the Codebook Fix a distribution Z on positive integers. Probability of j is Zj. Start sequence with 0s. First block of 0s has length given by Z. Then block of 1s has length given by Z. And so on. Generalizes previous work: first order Markov chains lead to geometric distributions Z.

M. Mitzenmacher 22 Choosing a Distribution Intuition: when a mismatch between received sequence and random codeword occurs under GREEDY, want it to be long lasting with significant probability. (a,b,q)-distributions: A short block a with probability q, long block b with probability 1 q. Like Morse code. M. Mitzenmacher 23

Results So Far M. Mitzenmacher 24 So Far Decoding algorithm has always been GREEDY. Cant we do better? For bigger capacity improvements, it seems we need better decoding. Best algorithm: maximum likelihood. Find the most likely codeword given the received sequence. M. Mitzenmacher

25 Maximum Likelihood Pick the most likely codeword. Given codeword X and received sequence R, count the number of ways R is obtained as a subsequence of X. Most likely = biggest count. Via dynamic programming. Let C(j,k) = number of ways first k characters of R are subsequence of first j characters of X. C ( j , k ) C ( j 1, k ) I [ X j Rk ]C ( j 1, k 1) Potentially exponential time, but we just want capacity bounds. M. Mitzenmacher

26 The Recurrence C ( j , k ) C ( j 1, k ) I [ X j Rk ]C ( j 1, k 1) I would love to analyze this recurrence when: Y is independent of X Y is obtained from X by random deletions. If the I[Xj = Rk] values were all independent, would be possible. But dependence in both cases makes analysis challenging. I bet someone here can do it. Lets talk. M. Mitzenmacher 27

Maximum Likelihood Standard union bound argument: Let sequence R be obtained from codeword X via a binary deletion channel; let S be a random sequence obtained from another random codeword Y. Let C(R) = # of ways R is a subsequence of X. Similarly C(S) = # of ways S is a subsequence of X. What are the distributions of C(R), C(S)? Unknown; guess is a lognormal or power law type distribution. Also C(S) is often 0 for many parameters. Want C(R) > C(S) with suitably high probability. M. Mitzenmacher 28

Conclude: Maximum Likelihood This is really the holy grail. As far as capacity arguments. Questions: What is the distribution of the number of times a small random sequence appears as a subsequence of a larger random sequence? Same question, when the smaller random sequence is derived from the larger through a deletion process. M. Mitzenmacher 29 Better Decoding

Maximum likelihood havent got it yet An approximation, intuitively like mutual information: Consider a received block of 0s (or 1s). What block(s) did it arise from? Call that sequence a type. For random codewords and deletions, number of (type,block) pairs for each type/block combination is highly concentrated around its expectation. M. Mitzenmacher 30 Type Examples 1111000011000100011100110011000 1111000011000100011100110011000 Received sequence:

1100000111100 (type,block) pairs: (1 1 1 1 , 1 1) (0 0 0 0 1 1 0 0 0 1 0 0 0 , 0 0 0 0 0) (1 1 1 0 0 1 1 0 0 1 1 , 1 1 1 1) (0 0 0 , 0 0) M. Mitzenmacher 31 New Decoding Algorithm 1. Choose a random codebook. Codeword sequences chosen by laying out blocks

according to a given distribution. 2. Define typical received sequences. No more that (p + ) fraction of deletions, and has near the expected number of (type,block) occurrences for each (type,block) pair. 3. Construct a decoding algorithm. Find unique codeword that could be derived from the expected number of (type,block) occurrences given the received sequence. M. Mitzenmacher

32 Jigsaw Puzzle Decoding Received sequence: 0 0 1 1 0 0 1 1 0 1 Jigsaw puzzle pieces 00000 0000 0110 1011 11

111 00 00 00 11 11 11 x44 x88

x8 x16 x100 x210 M. Mitzenmacher 33 Jigsaw Puzzle Decoding : Examples Received sequence: 0 0 1 1 0 0 1 1 00000 1011 0110 11 00

11 0000 111 00 11 00000 111 00

11 M. Mitzenmacher 00 11 000001011011011 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 1 1 0000111000001011 00 11 0000 11

00 11 0 0 1 1 0 0 1 1 00000111000011 0 0 1 1 0 0 1 1 34 Formal Argument Calculate upper bound on number of possible jigsaw puzzle coverings. Get lower bound on capacity. Challenge 1: Dont get exactly the expected number of pieces for each (type,block) pair; just close. Challenge 2: For very rare pieces, might not even be close.

End result: an expression that can be numerically computed to give a lower bound, given input distribution. M. Mitzenmacher 35 Calculations All done by computer. Numerical precision not too challenging for moderate deletion probabilities. Terms in sums become small quickly. Fairly smooth. We guarantee our output is a lower bound. Computations become time-consuming for

large deletion probabilities. M. Mitzenmacher 36 Improved Results M. Mitzenmacher 37 Ullmans Bound Ullman has an upper bound for synchronization channels. For insertions of a specific form. Zero-error probability.

Does not apply to this channel although it has been used as an upper bound! We are the first to show Ullmans bound does not hold for this case. What is a (non-trivial) upper bound for this channel? Some initial results M. Mitzenmacher 38 Insertion/Deletion Channels Our techniques apply for some insertion/deletion channels. GREEDY decoding cannot; depends on received sequence being a subsequence of the original codeword.

Specifically, the case of duplications: 0 becomes 000. Maintains block structure. M. Mitzenmacher 39 Poisson Channels Recall discrete Poisson distribution with mean m. Pr[ X k ] e m m k / k! Consider a channel that replaces each bit with a Poisson number of copies. Call this a Poisson channel. Poisson channels can be studied using our duplication/deletion analysis. Capacity when m = 1 is approx. 0.1171.

From numerical calculations. M. Mitzenmacher 40 Reduction! A code for any Poisson channel gives a code for every deletion channel. Intuition: solve deletion problem by replacing every bit by 1/(1 p) bits. On average, 1 copy of every original bit gets through. Distribution of number of copies is approximately Poisson. So its like a Poisson channel, but rate reduced by factor of (1 p). M. Mitzenmacher

41 Randomized Reduction Picture Take codeword X for Poisson channel Randomly expand to X for deletion channel using a Poisson number of copies per bit Expands by 1/(1 p) factor Send X over deletion channel Receive R M. Mitzenmacher

Decode R using the Poisson channel codebook 42 Reduction! A code for any Poisson channel gives a code for every deletion channel. To send codeword over deletion channel with deletion probability p, use a codeword X for the Poisson channel code But independently replace each bit by a Poisson distributed number of bits with mean 1/(1 p). At output, each bit of X appears as a Poisson distributed number of copies (with mean 1) a Poisson channel. Decode for the Poisson channel.

M. Mitzenmacher 43 Capacity Result Input to the deletion channel is 1/(1 p) factor larger than for Poisson channel. Implies capacity for the deletion channel is at least 0.1171(1 p) > (1 p) / 9. Deletion channel capacity is within a constant factor of the erasure channel (1 p). First result of this type that we know of. Best result (using a different mean) is 0.1185(1 p). M. Mitzenmacher 44

More New Directions Information theoretic analysis Upper bounds Sticky channels Segmented deletion/insertion channels Large alphabets Trace reconstruction See survey article

M. Mitzenmacher 45 Sticky Channels Motivation: insertion/deletion channels are hard. So what is the easiest such channel we can study? Sticky channels: each symbol duplicated a number of times. Like a sticky keyboard! xxxxxxxxxxxxx Examples: each bit duplicated with probability p, each bit replaced by a geometrically distributed number of copies. Key point: no deletions. Intuitively easy: block structure at sender completely preserved at the receiver.

M. Mitzenmacher 46 Sticky Channels : Results New work: numerical method that give near-tight bounds on the capacity of such channels. Key idea: symbols are block lengths. 000 becomes 3. Capacity for original channel becomes capacity per unit cost in this channel. Use techniques for capacity per unit cost for Markovian channels. M. Mitzenmacher

47 Bounds Achieved : Duplication p 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 M. Mitzenmacher Lower bound Upper bound 0.7405

0.7406 0.6611 0.6618 0.6400 0.6404 0.6488 0.6499 0.6788 0.6797 0.7273 0.7277 0.7914 0.7915 0.8674 0.8675 0.9469 0.9479

48 Segmented Channels Motivation: what about deletions makes them so hard? Can we restrict deletions and make them easy? Segmented deletion channel: guarantee at most one deletion per segment. Example: At most 1 deletion per original byte. Similarly, can have segmented insertion channel. Motivation: Clock skew/timing errors. M. Mitzenmacher 49

Codes for Segmented Deletions : Our Approach Create a codebook C with strings of b bits. Codeword is concatenation of blocks from C. Aim to decode blocks from left to right, without losing synchronization, regardless of errors. Questions: How can this be done? What properties does C need? How large can C be? M. Mitzenmacher 50 Segmented Channels : Results New work: 0-error, deterministic algorithms for segmented deletion/insertion

channels. With reasonable rates. 44% for one bit/byte. 59% for one bit/2 bytes. Simple computationally linear time. M. Mitzenmacher 51 Upper Bounds: Results An upper bound approach using side information that gives numerical bounds. An upper bound approach using side information that gives asymptotic behavior as fraction of deletions p goes to 1.

As p goes to 1, cannot obtain capacity better than 0.7918(1 p). Gap between (asymptotic) upper/lower bounds now roughly (1 p)/9 and 4(1 p)/5. M. Mitzenmacher 52 Bounds Achieved p 0.1 0.2 0.3 0.4 0.5 0.6 0.7

0.8 0.9 M. Mitzenmacher Lower bound Upper bound 0.56196 0.70428 0.34669 0.55078 0.22243 0.44664 0.14841 0.37118 0.10186 0.31109 0.06956 0.25815

0.04532 0.20811 0.02727 0.16341 0.01238 0.10000 53 Open Questions Capacity lower bounds: Improvements to argument. What is the best distribution for codewords? Can even more general (type,block) pairs be usefully used?

Avoid overcounting jigsaw solutions that appear multiple times? Specific better lower bounds for Poisson channel, translate immediately into better general bounds! Upper bounds: Tighter upper bound for capacity of binary deletion channel? Maximum likelihood arguments: How do random subsequences of random sequences behave? Look for threshold behaviors, heavy-tailed distributions. M. Mitzenmacher 54 Open Questions : Coding All this has been on lower bounds

almost nothing about coding! Except segmented deletion channel. There has been some experimental work done, but very, very limited results so far. And little good theory. Can we take the insight here and use it to develop good codes? M. Mitzenmacher 55 Specific Code Challenges Find a good code for the Poisson channel. Code for the Poisson channel immediately gives codes for the deletion channel!

Find good codes for basic sticky channels. Easiest channels with block structure : no deletions, just duplicates. May yield insight for other channels. Low-density parity-check coding techniques seem applicable. M. Mitzenmacher 56 The Goals Simple, clear, tight bounds for capacity for binary deletion channels, and practical codes that are close to capacity. Its been done for erasure channels and error-correcting channels, why not deletion

channels too? Many more specific open questions on this path given in survey article M. Mitzenmacher 57