Ralph C. Merkle
This is an early draft.
In a previous proposal (Merkle and Hellman, 1978) a public key system was derived by creating instances of the knapsack problem that could be solved if certain secret information was known. Unfortunately, the resulting instances proved to be less secure than the general (unrestricted) knapsack problem (Adleman 1983, Brickel 1985).
The present system uses the knapsack problem but need not restrict the type of knapsack that is used. As a consequence, solving for x remains NP complete. This eliminates the types of attacks that were used against the earlier system. However, it also complicates the task of Alice and Bob, the two communicants who wish to agree upon a common key over an open communication channel. If there is no secret information which permits solution of the knapsack, what is the mechanism used by Alice and Bob to agree on common information which cannot be deduced by the eavesdropper Eve?
|AL(i+j, w) ~ AL(i, w) + AL(j, w) mod k|
where w parameterizes the family by selecting a specific linear function, AL(*, w).
We will be deliberately vague about the exact meaning of "~" in this definition. This vagueness is dictated by the fact that the degree of inexactness that we desire is not sharply defined. In the specific system proposed, we will operationally define a precise definition of "~", but this precise definition is only one of many possible precise definitions and should not be construed as "the" definition.
A class of functions with this property can be defined by:
|AL(i, w) = floor((( w * i mod m ) / m ) * k)|
While k is an integer parameter the parameters w and m can take on either real or integer values. All multiplication and division operations used in this paper are defined as the standard operations over the reals (divisions, in particular, are not integer divides over the modulus m but are real divisions even if both operands happen to have integer values). The mod operation is also defined over the reals and produces a value which is greater than or equal to zero, and less than m. The function floor(arg) converts a real value to an integer value, and is defined to be the greatest integer less than or equal to arg. This function will be used to explicitly indicate when real values should be converted to integers (possibly with loss of information caused by elimination of the fractional part).
In this paper we will use integer values for w and m (as well as k). We will further assume that k and m are powers of 2, i.e., log2 m and log2 k are both integers. We are making m and k fixed system parameters even though this is not strictly necessary, and in our subsequent examples will set m = 2200 and k = 28.
Looking at the definition of AL, we first note that ( w * i mod m ) / m is a real number between 0 and 1 (inclusive of 0 and exclusive of 1). When we multiply by the integer k, we produce a real number between 0 and k (inclusive of 0 and exclusive of k). When we convert to an integer using the floor function we produce an integer between 0 and k (inclusive of 0 and exclusive of k). Thus, the function AL(i, w) does indeed produce an integer result, as desired.
Further, AL(*, w) satisfies our property: AL(i+j, w) ~ AL(i, w) + AL(j, w) mod k. We can see this by the following argument. First, we define L:
|L(i, w) = (( w * i mod m ) / m ) * k|
L(*, w) maps integers onto reals by performing almost the same calculation as AL(*, w) but leaving out the "floor" operation. L(*, w) satisfies the property:
|L(i+j, w) = L(i, w) + L(j, w) mod k|
That is, L(*, w) is a linear function. We show this as follows by repeated application of distributive laws:
L(i+j, w) = ( w * (i+j) mod m / m ) * k
= ( ((w*i) + (w*j)) mod m / m ) * k
= ( (w*i mod m) + (w*j mod m) - r*m ) / m * k
= ( (w*i mod m)/m + (w*j mod m)/m - r*m/m ) * k
= ((w*i mod m)/m )*k + ((w*j mod m)/m)*k - (r*m/m)*k
= L(i, w) + L(j, w) - r*k
(where r is 0 if (w*i mod m) + (w*j mod m) is less than m, and 1 otherwise)
As the only difference between L(*, w) and AL(*, w) is the floor operation, we would expect that
L(i+j, w) = L(i, w) + L(j, w) - r*k
AL(i+j, w) ~ AL(i, w) + AL(j, w) mod k
This approximate equality would frequently be exact, but would also frequently produce an integer result that was "off by one." This satisfies our claim that AL(*, w) is an approximately linear function (and provides a specific definition of "approximate" for this particular class of functions).
For the sake of exposition we assume that w and the elements of a are randomly chosen 200 bit quantities, that k is 28 (256), that m is 2200, and that n is 200. The bi are 8-bit quantities. No particular assumptions should be made about the security or efficiency provided by this method of selecting the parameters: it is provided only as an aid to intuition in understanding the proposal. Note that, although m can be used as part of the private key, we are deliberately making it a public and fixed system parameter to simplify the calculations. Thus, the secret key (in this example) is w alone.
m, k (both powers of 2) and n
|Alice knows||Bob knows||Bob chooses||Bob computes||Alice computes|
|secret||w||x||T' = floor ( x * ( b
+ 0.5) mod k )
S = x * a
Tmin = min(T', (T'+ k/2) mod k)
If Tmin = T' then bit = 0 else bit = 1
|T = AL(S, w) = floor((( w * S mod m ) / m ) * k)
If distance( T, Tmin, k ) < distance ( T, Tmin + k/2, k ) then bit = 0 else bit = 1
|public||a, b||a, b||S, Tmin|
Assume that Bob knows a and b (the public key) and the system parameters m, k and n. Alice knows these, and in addition knows w. To agree upon a common key, Bob selects a random integer n-vector x and computes S = x * a. Note that we are relaxing the restriction that the xi must be boolean and will permit a larger range of values. For example, we might select the xi from the range [0,1,2,3] instead of the range [0,1]. Bob transmits S to Alice. Alice computes T = AL(S, w). Bob computes T', an approximation to T, by adding elements of b:
T' = floor ( x * ( b + 0.5 ) mod k )
The expression b + 0.5 means the vector (b1 + 0.5, b2 + 0.5, b3 + 0.5, ... bn + 0.5).
It is central to this proposal that
|T ~ T'|
We show that T ~ T' by the following argument. We first define:
err = L(a, w) - AL(a, w) - 0.5 = L(a, w) - b - 0.5
and note that each erri can be treated as a random variable over the interval [-0.5 .. 0.5] which has mean 0 and variance 1/12 ( 1/12 is the integral of x2 over [-0.5 .. 0.5]). The constant 0.5 is a consequence of the fact that AL(i, w) is defined using the floor function rather than the nearest integer. While we could change this definition and eliminate the constant 0.5, this would complicate further proofs.
We now observe that (taken mod k):
AL(S, w) ~
L(S, w) =
L(x * a, w) =
x * L(a, w) =
x * ( err + b + 0.5 ) =
x * err + x * (b + 0.5) ~
x * (b + 0.5) ~
The error term x * err has mean 0 and variance:
summation over i = 1 to n of: xi2 * 1/12
|S = 75895||T' = 8
T = 8
At this point, Alice knows T and Bob knows T'. As T ~ T', if k is sufficiently large then Alice and Bob have agreed upon some common information. As cryptographic keys must be bit-for-bit identical (and in public key distribution we want to agree upon cryptographic keys), we need some mechanism of converting the approximately equal T and T' into exactly equal information known to Alice and Bob. This can be done using various error correcting methods (generally similar in concept to some methods used in quantum cryptography to eliminate random errors introduced during the basic key exchange process). We will provide only one simple example of such a method, but the reader should understand that other (and better) methods are possible.
We assume that Alice and Bob wish to agree upon a single bit. Intuitively, we can't set an arbitrary fixed cutoff for selecting a 0 or a 1 because T might be just below the cutoff and T' just above the cutoff. We want to tailor the cutoff to the particular value of T. As Bob can't transmit T' without giving away information to any eavesdropper, Bob instead transmits both T' and the value farthest away from T' mod k (which is (T' + k/2) mod k, where we assume that k is divisible by 2). Alice can then see if T is closer to T' or (T' + k/2) mod k, which will reliably let Alice and Bob agree upon a common bit provided that T' - T (the error introduced by our approximation) isn't too large. Transmitting both T' and (T' + k/2) mod k in a fixed order would compromise the bit being agreed upon, but we really only need to transmit the smaller of these two values and can therefore avoid this problem.
More precisely, Bob can transmit Tmin = min(T', (T'+ k/2) mod k) ) to Alice. If Bob transmitted T', then Bob selects the bit 0, otherwise Bob selects the bit 1. Alice knows Tmin, and can compare T with Tmin. If distance(T, Tmin, k) < distance(T, Tmin + k/2, k) then Alice assumes that Bob transmitted T', and therefore that Alice should select the bit 0; otherwise Alice selects the bit 1. (The notation "distance(i, j, k)" denotes the distance between i and j modulo k, i.e., the minimum of ( i - j ) mod k and ( j - i ) mod k). In this fashion, Alice and Bob have agreed upon a single bit of common information.
This method can be generalized to permit Alice and Bob to agree upon a single digit in bases larger than 2.
We first note that we chose the vector a randomly. More generally, it could be chosen by any method which gave us confidence that solving for some x given a and S was a hard problem. As this is an NP complete problem, we expect finding hard instances should be feasible. It is important to realize that previous systems based on the knapsack (Merkle and Hellman, 1978, being just one example), used restricted knapsacks as one-way hash functions, with secret "trap door" information that allowed the particular instance to be easily solved. In this proposal, no such "trap-door" information exists. The particular knapsack instance is unrestricted, and neither A nor B will ever solve for x. As a consequence, solving for x is an NP complete problem.
Knowledge of the private key, w, is of no help in solving for x, as Alice (who generated w) must still solve an NP complete problem to find x. Put another way, if w was helpful to Bob in solving for x, then it would be helpful to anyone attempting to solve any knapsack problem, for anyone could select a random w and compute some b from a, where a was any knapsack problem they wished to solve.
We have transmitted Tmin = min( T', (T' + k/2) mod k ) from Bob to Alice. It might appear that we could transmit min (T, (T + k/2) mod k ) from Alice to Bob and get much the same result. However, to compute T we must know w. It is possible that the cumulative effect of disclosing many different values computed from T (when the basic protocol is repeated to compute many common key bits) would provide leverage in determining w. As the calculation of T' does not use w its disclosure can provide no such leverage.
The disclosure of Tmin might theoretically provide assistance to Eve in computing x (as to compute Tmin Bob had to use his knowledge of x). There are various reasons for believing this is not the case, but it is sufficient to note that Tmin can provide at most log2 k bits of information and could therefore provide at most a factor of k improvement in any algorithm for computing x. As a new random value of x is selected each time the basic protocol is iterated, there can be no cumulative gain of information.
A seemingly more hopeful route than solving for x would be to solve for w. However, we prove that solving for w is NP complete. We can encode an NP complete problem into the vectors a and b in such a way that determining w will allow us to find a solution.
A more subtle problem is that the proof that solving for w is NP complete compromises our proof that solving for x is NP complete. It might be that all vectors a for which it is hard to find w make it easy to find x, and vice versa.
It is worth emphasizing that even proofs that a cryptographic problem is NP complete do not provide assurances that any specific instance of the problem is hard to solve.
b = AL(a, w)
is NP complete.
In outline, we will embed 3SAT into a and b, and will represent boolean variables by selected bits of w. Other bits of w will provide additional degrees of freedom required to make the embedding work, while still other bits of w will be set to 0 to prevent undesired propogation of carries.
(v1 or v2 or not v3) and
(v2 or v3 or v4) and
(not v7 or not v1 or v3) and
In words: the formula must be the logical "and" of a set of clauses, and each clause must consist of the logical "or" of three literals, and each literal must be either a variable or the negation of a variable, and each variable can take on one of exactly two values: TRUE or FALSE (0 or 1).
w = 00000000v00000000v00000000v .... 00000000f00000000f .... 00000000fThat is, w consists of a repeating pattern of 8 bits of 0 followed by either a single variable bit or a single freedom bit.
The multiplication of w times some ai can be represented as repeated additions of shifted values of w (w is artificially shortened for purposes of illustration):
ai the shifted w's ai,10 00000000v00000000v00000000v00000000f00000000f ai,9 00000000v00000000v00000000v00000000f00000000f ai,8 00000000v00000000v00000000v00000000f00000000f ai,7 00000000v00000000v00000000v00000000f00000000f ai,6 00000000v00000000v00000000v00000000f00000000f ai,5 00000000v00000000v00000000v00000000f00000000f ai,4 00000000v00000000v00000000v00000000f00000000f ai,3 00000000v00000000v00000000v00000000f00000000f ai,2 00000000v00000000v00000000v00000000f00000000f ai,1 00000000v00000000v00000000v00000000f00000000f ai,0 00000000v00000000v00000000v00000000f00000000f w*ai = ---------bbbbbbbb------------------------------------- bi <log2 k> <--------------- log2 m ------------------>Each row represents w shifted left by the appropriate amount. The binary digits of ai are shown as a column to the left as ai,j's. The dashes ("-") in the product represent don't care bits (i.e., we're not concerned with the value of these bits), while the block of 8 "b"'s represents the bits in bi.
We first show that we can force 8-bit blocks of w to have the value 0. We do this by taking the preceding diagram and setting a1 = 1 and setting b1 = 0. This gives the following picture:
a1 the shifted w's 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f w*a1 = ---------00000000------------------------------------- b1As the only bit in a1 which is 1 is the least significant bit, the only w that participates in the sum is the bottom row, which is shifted left by 0 bits. This being the case, the only possible values of w which satisfy the requirement that b1 = AL(a1, w) must have 8 bits of 0 at the left followed by the first variable value, v.
This same method can be repeated to create the other required blocks of 0's. The next multiplication looks like:
a2 the shifted w's 0 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f w*a2 = ---------00000000------------------------------------- b2That is, a2 is just a1 shifted left by 9 bits. This creates the second block of 8 0's in w. For reasons of space, we have not shown the values of w shifted left by more than 9 bits, but as the reader will appreciate the process illustrated can be continued to create as many blocks of 0 in w as might be desired.
The values of both the variable bits and the freedom bits are left undetermined by this initial step, which merely forces 0's into w. We now wish to show how to create a single 3SAT clause, which forces at least one of 3 boolean variables to be a 1. Before doing this, we first illustrate a simple way of forcing two variable bits to be complementary: the first bit must be 0 if the second bit is 1, while the second bit must be 0 if the first bit is 1. This is shown in the following multiplication:
a3 the shifted w's 1 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f 0 00000000v00000000v00000000v00000000f00000000f w*a3 = ---------00000001------------------------------------- b3Two bits in a3 have been set to "1". These bits select two w's, the first w being shifted left by 1 and the second w being shifted left by 10. This causes two variable bits to be aligned with the bottom bit of b3. We designate the lower variable bit v0, and the next variable bit v1. Note that this renumbering is skipping over the intervening 0 bits.
The value of b3 is 1, so we have 1 = b3 = v0 + v1. This can only be true if either v0 = 0 and v1 = 1, or if v0 = 1 and v1 = 0. This mechanism permits us to take any variable vi and create its negation in vj.
This example illustrates the general approach: we can select a variable bit by placing a "1" in the proper spot of the binary representation of ai. All selected variable bits must sum to the value of bi. To represent a single clause in 3SAT, we will select three variable bits and two freedom bits and demand that they sum to 3. The two freedom bits are otherwise unconstrained, and so can freely take on either the value 0 or the value 1. For all five bits to sum to 3, there must be three bits that have the value 1. As there are only two freedom bits, they can at most provide two of the required values. One of the variable bits must therefore be "1".
If we omit the unselected shifts of w we can illustrate this as follows:
a4 the shifted w's 1 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f 1 00000000v00000000v00000000v00000000f00000000f w*a4 = --------------------------00000011------------------------------------- b4As this method allows us to select any three variable bits that we might desire by setting appropriate bits of ai to "1", successive pairs of ai and bi can encode successive terms of the 3SAT problem. If a literal in a clause is the negation of a variable, we can create (as described above) a new variable which is its negation, and replace the negation of the original variable with the new variable.
As 3SAT is NP complete, the ability to find a w such that b = AL(a, w) would permit us to solve an NP complete problem. An important question not addressed by this proof is whether the security of w depends on the number of elements in the vectors a and b: is w secure even when n is large? The larger the value of n the more we over-determine w. An algorithm by (Frieze et al., 1988) can be adapted to solve for w when n is sufficiently large (an idea proposed by Matt Franklin). This suggests that small values of n are to be preferred. Further investigation to better determine which instances are hard (and hence would make good candidates for cryptographic use) is warranted.
Because determining w is an NP complete problem, it follows that the problem of determining AL(i, w) for any i, given only a and b is also NP complete.
summation over i = 0 to n-1 of: xi2 * 1/12
(as was derived earlier)
This formula lets us compute the specific variance for a specific vector x. Biasing the selection of x towards vectors with a smaller variance might be useful, as this would reduce the error rate. This bias increases the knowledge that Eve has of x, and so these two conflicting desires need to be balanced. We will not investigate this possibility further in this paper.
For our example values of k=256 and n=200, and assuming the xi are randomly chosen and binary, then an average x will have ~100 bits that are "1" and ~100 bits that are 0. This implies a variance of ~8.3, or a standard deviation of ~2.9. An error of 64 (k/4) is required to cause an error big enough to shift a "0" to a "1" (or the reverse), which is over 22 standard deviations. The probability of error is therefore extremely low.
As an aside, it might prove useful to transmit 2 (or more) bits in S, rather than one. If there were 4 distinct code points (instead of the two code points used to transmit 1 bit of information) the separation between adjacent code points would be ~11 standard deviations: this still produces a very low error rate and permits two bits of information to be transmitted for each value of S.
If we assume the xi are drawn uniformly from the distribution [0, 1, 2, 3] then there will be approximately 50 0's, 50 1's, 50 2's, and 50 3's. This gives a typical variance of 50 * ( 02 + 12 + 22 + 32 ) / 12 or ~58, or a standard deviation of ~7.6. This implies that an error of ~8.4 standard deviations would cause an error in the agreed on bit, so the error probability is still very low.
It might be useful to design the system so that these error rates were much higher (e.g., ~ 1 in 100). By making the system more marginal during normal operation, it should be possible to make cryptanalysis more difficult. If this is desired Alice and Bob could agree on many bits (by repeating the basic protocol many times) and then apply an error correcting scheme to eliminate those bits that were in error.
The formula for the variance of T' can be used to compute the variance of T''. Because the variance depends on the square of the x'i, larger values for them will cause the variance to increase rapidly and the error rate to similarly increase rapidly. Once the standard deviation (the square root of the variance) exceeds ~k/4, x' will provide little useful information.
Two ways of increasing the computational efficiency are to (a) agree on more than 1 bit at a time and (b) to share computations across repeated iterations of the basic protocol. An example of the former has already been given, while an example of the latter would be to compute x1 * a and x2 * a by first adding those elements of a that were in both x1 and x2, and then sharing this common sum in the later computations. More complex methods of reducing the total number of additions when many sums over the same vector a are to be computed are possible.
An approach which should reduce both the computational and communications overhead is to use more than one value for w and more than one vector b with the same vector a. For example, we could generate w2 in addition to w, and create b2 = AL(a, w2). After selecting x, Bob would compute:
T2' = floor ( x * ( b2 + 0.5 ) mod k )
while Alice would compute:
T2 = AL(S, w2)
The remaining calculations would proceed as before, but now Alice and Bob would be able to use S twice, and so agree on twice as many bits.
Having used w2 and b2, there is no problem in extending this to w3 and b3, w4 and b4, etc. This allows the single sum S to be used many times and to encode many bits. This reduces communications overhead (we need to transmit fewer values of S to generate the same number of common key bits) and the computational overhead (we need to compute fewer values of S).
As the size of a can be effectively reduced, the size of the next largest component of the public key, b, will become significant. For our example numbers, this will take 200 * 8 = 1600 bits or 200 bytes. The size of the private key is 200 bits (the size of w). As b can be easily recomputed from w and a, and a is generated pseudo randomly from a small key, the total amount of information that must be remembered for the private key is quite small.
If we use several different values w1, w2, w3 ... and several corresponding vectors b1, b2, b3 ... we will increase the size of the public and private keys correspondingly.
At the cost of reducing system security, the size of the public key can be reduced by making k and n smaller. The range of values that can be assumed by the xi could be made larger to compensate for a smaller value of n. How to best manage this tradeoff in the context of a real system deserves further attention.
|L(i, w) = (( w * i mod m ) / m ) * k|
by setting k = m = 1, we get:
L(i, w) = i * w mod 1
where w is a real parameter between 0 and 1. Given this, we can now ask what information we gain if we know the linear function of some arbitrary integer i:
L(i, w) = z
i * w = z mod 1
w = ( z + ii ) / i
for some integer ii. We can express ii as ii = j * i + remainder where remainder is between 0 and i, giving:
w = ( z + j * i + remainder ) / i
w = ( z / i ) + j + ( remainder / i )
The integer j can be dropped, as we are doing arithmetic mod 1. This gives:
w = ( z / i ) + ( remainder / i )
where z is in the range from 0 to 1, remainder is in the range from 0 to i, i is any non-zero integer, and w is the parameter which defines a linear function such that L(i, w) = z.
The remainder can take on one of i distinct values, which in turn define i distinct values of w. Therefore, knowing L(i, w) does not uniquely define L(*, w) or w: far from it. For large values of i, there will be a large number of possible parameters w such that L(i, w) = z. If we know only an approximate value of z (as when we know the value of AL(i, w)) then the number of possible parameters w consistent with this knowledge will be much larger.
As discussed before, given a linear function we can derive an approximately linear function. In fact, we can derive a vast number of approximately linear functions from a single linear function by using different methods of approximation. Beyond this, it is possible to derive approximately linear functions when there is no underlying linear function. To see this, we note that the set of integers defined by all possible subset sums of a vector of integers a might form a tiny portion of the space of possible integers within a given range. This can occur if, for example, the knapsack defined by a was sparse. Note that "the range" of a knapsack is simply [0 .. Smax] where Smax is the sum of all the ai (assuming they are all positive). As a consequence, an approximately linear function need only be defined on those integers for which the knapsack has a solution and could be undefined or generate inconsistent values on other integers in the range of the knapsack.
As the knapsack becomes less and less sparse, the integers within the range of the knapsack for which there is a solution become more and more crowded, the number of integers which have multiple solutions increases, and the approximately linear function must correspond more and more to some linear function.
The security of this new proposal has not been established. Because it is based on an NP complete problem, this system might prove more resistant to cryptanalysis by quantum computers than methods based on factoring or related problems (Okamoto et al., 2000). The eventual feasibility of quantum computers is increasingly of concern (e.g., Mullins, 2001). Further research to evaluate its security and optimize its performance is warranted.