
Privacy Issues in Public Key Infrastructures

The use of electronic transactions is rapidly spreading to include virtually all forms of exchange and communication traditionally handled on paper. This shift is driven primarily by the obvious advantages of electronic communication, including increased speed, reduced cost, and ease of automation. These electronic transactions, however, suffer from two serious problems. First, with computers involved in all transactions, and with the low cost of long-term storage, all information which flows between parties can easily be captured, recorded, and later analyzed, leading to vast stores of information about individuals and destroying personal privacy. Second, electronic transactions lack the traditional security protections which occur in the real world. Most paper-based documents depend, in part, upon the difficulty of forging copies or forging one's own identity. In an electronic realm, however, copying and distributing information is simple and free. Cryptographic techniques, especially in the form of public key certificates, are often viewed as a solution to the identification problem, but they serve to worsen the privacy problem by providing a simple means of identifying and tracking individuals. In order to prevent the most pervasive surveillance system ever from developing while maintaining security, a public key infrastructure, such as those described by Stefan Brands, must be developed which provides security while providing its users some degree of privacy. The technical tools described by Brands, while necessary to build a privacy-preserving public key infrastructure, are not sufficient outside of a theoretical context to provide workable real-world privacy protection.
In his paper, *Rethinking Public Key Infrastructure and Digital Certificates -- Building In Privacy*, Stefan Brands presents a number of different public key infrastructures, suited for a variety of applications, which use a collection of cryptographic techniques to provide privacy protection. Privacy protection naturally comes in the form of some degree of anonymity. These infrastructures are, however, necessarily not completely anonymous. If one user of the system wishes to verify a property about another user of the system, the former must be able to accept the signature of some other party, typically the Certificate Authority (CA), which has verified the property. Clearly in such a system the CA must know some information about the second party in order to assert its truth. The goal of a privacy protecting public key infrastructure is to allow the property to be verified without being able to trace it back to the identifying information held by the CA, even should the verifier and the CA conspire. While public key infrastructures can be designed to have virtually any degree of privacy protection for provers (those who have some property whose truth they wish to assert), verifiers, and CAs (which are just a special case of a prover or a verifier depending on the context), this ``privacy'' will not necessarily translate into real world protection of privacy for consumers. The public key infrastructure schemes can be formally proved not to release any information about the user that the user did not authorize the system to release. In a practical application of these systems, however, implementation issues will likely lead to the need for more information to be released than one might require in the truly minimalistic case. Individuals are also often willing, for one reason or another, to disclose some amount of information about themselves in exchange for a discount, more personalized service, or greater convenience. While consumers may wish to maintain some degree of privacy, the collective temptations to expose information will probably lead to the release of enough personal information that reidentification becomes possible, thereby destroying the privacy the system purports to offer. Privacy protecting public key infrastructures are nonetheless necessary, for without them no privacy can even be practically attempted. Even with their use, however, it will likely remain necessary to provide forms of cultural and legal protection to prevent the abuse of the data which it will still be possible to collect.

Privacy Protecting Public Key Infrastructures

A public key infrastructure is a system of protocols between different parties, built on asymmetric cryptographic algorithms, which usually aims to allow one party to securely prove some property, most commonly identity, to another.

Identity Certificates
A typical view of a public key infrastructure involves three parties: the prover, the verifier, and a CA. Essentially the prover is trying to prove his/her identity to the verifier. The CA either maintains, or simply certifies, a set of bindings between public keys and identifiers. When a prover wishes to prove his/her identity to a verifier, the verifier obtains a copy of the public key for the supposed prover's identity either by receiving a copy from the prover that has been certified by the CA, or by communicating directly, through a presumably secure channel, with the CA. The prover and the verifier then engage in a dialog which will leave the verifier assured that the prover does hold the private key tied to the given public key. A certified copy of a public key provided by a CA in a scheme such as this is essentially an identity certificate. Any attributes which need to be associated with an individual's identity are either encoded into the certificate along with the individual's identifying information, or are stored in a separate database which can be queried. Every transaction, however, regardless of how little information is actually needed, will require one to reveal one's unforgeable identity. This creates the ideal means to track any individual, as every transaction with any party will expose that individual's globally unique identifier.

Attribute Certificates
The first step toward obtaining even a shred of privacy in a public key system must be to provide some level of disassociation between one's unique identity and some given attribute. This way one can show that one has some right, such as holding a driver's license, without revealing one's entire identity. Typically these attribute certificates consist of some statement of the attribute and a public key, which are concatenated and signed to create the certificate. The holder of the certificate also maintains the private key. Just as with identity certificates, which can be considered a special case of attribute certificates, the prover and the verifier engage in a dialog which assures the verifier that the prover does indeed know the private key.

In a scheme using attribute certificates a user is free to only show those attributes he/she wishes to reveal to any given verifier rather than his/her entire identity.

Lack of Privacy
Being able to separate, at least to some degree, the various attributes associated with one individual is essentially the goal of a privacy protecting public key infrastructure. The use of attribute certificates, however, does very little to prevent organizations from combining data about the use of attribute certificates to create a database as comprehensive as could be obtained through the direct use of identity certificates. Since each attribute certificate is unique, it can be used to correlate information between different verifiers in otherwise unrelated transactions. One might also be required to provide certification for multiple attributes, thereby allowing different attribute certificates to be correlated with one another. When a sufficient, and not very sizeable, number of attributes have been grouped, it becomes possible to reidentify a particular individual. Perhaps more dangerously, privacy is lost without the individual being aware of exactly what information about them is known by whom. One particularly worrisome source of privacy invasion comes from the potential use of online systems (meaning that at least one of the communicating parties maintains a real-time connection with a CA) to provide certificate verification and to act as real-time clearing houses to prevent replays. With the information that flows through such a system, however, the CA has an incredible ability to track and compile detailed information about any of its users. Even if correlating attributes is made more difficult by preparing separate certificates for every possible combination of attribute requests, the issuing of attribute certificates still poses a problem, as the CA is able to tie any certificate to the individual to whom it was originally assigned. This becomes a problem if the certificate authority eventually sees the certificates after they are shown.
Such an opportunity is, in fact, very likely to arise if the CA needs to verify the certificates, as is required in many schemes to prevent replay and other forms of attack. While the CA may not necessarily conspire with the verifier and obtain all information known by the verifier, the CA will still learn that the given prover had been interacting with a specific verifier, which alone can be a very intrusive bit of knowledge.
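
The correlation risk described above can be illustrated with a short sketch. It assumes, purely for illustration, that each verifier keeps a log of (certificate identifier, transaction detail) pairs; the verifier names, certificate identifiers, and log contents are hypothetical and do not come from Brands' paper.

```python
from collections import defaultdict

def link_by_certificate(*verifier_logs):
    """Group transaction records from several verifiers by certificate identifier.

    Each argument is a (verifier_name, log) pair, where log is a list of
    (certificate_id, detail) tuples. The identifiers are hypothetical.
    """
    profiles = defaultdict(list)
    for verifier_name, log in verifier_logs:
        for cert_id, detail in log:
            profiles[cert_id].append((verifier_name, detail))
    return dict(profiles)

# Two unrelated verifiers that each saw the same unique attribute certificate.
pharmacy_log = [("cert-7f3a", "bought allergy medication"),
                ("cert-19c2", "bought bandages")]
bar_log = [("cert-7f3a", "proved age over 21")]

profiles = link_by_certificate(("pharmacy", pharmacy_log), ("bar", bar_log))
for cert_id, events in profiles.items():
    print(cert_id, events)
```

Because the certificate itself is the join key, no real name is needed to build the combined profile; a name only has to be attached once, by any one of the parties.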

Blinding Certificates
Many of the problems of traceability stem from the ability of some party, other than the certificate holder, to tie a given attribute certificate to an identity. The information to do so naturally arises because the certificate needs to be signed by the certificate authority and, in order to verify the attribute, the certificate authority needs to know the identity of the entity requesting the signed certificate. A technique known as blinding makes it possible for a certificate authority to issue signed certificates verifying some attribute of the holder in such a manner that the CA cannot, now or later, tie the public key associated with that certificate to the identity of the certificate holder, even though the CA does indeed know the identity of the individual to whom the certificate is being issued. Blinding is commonly used to provide this sort of anonymity in electronic cash applications where the CA, in this case typically a bank, needs to know to whom it is issuing the electronic cash so that it can properly deduct it from their account, but needs to be prevented from determining to whom it issued a particular electronic coin when a merchant attempts to redeem its value. The blinding technique involves the recipient of the certificate first constructing a public and private key pair for the certificate. The client then ``blinds'' the public key in some manner and sends the blinded key to the CA. The CA signs the blinded key and returns it to the client, who then ``unblinds'' it, resulting in a signed public key without the CA ever having possessed an unblinded copy of the public key. A simplistic blinding scheme, which works with some encryption techniques, involves performing a modular multiplication of the public key with a large random number. The signing process then signs the altered public key in a manner which is invariant to multiplication, so that when the resulting certificate is divided by the same number, the result is the same as if the original, unblinded key had been signed.
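
A minimal sketch of the multiplicative blinding idea, assuming textbook RSA as the signature scheme (the text above does not name one). In the RSA variant the client multiplies by the random factor raised to the public exponent, so that dividing the signature by the factor itself undoes the blinding. The key (N, E, D) is a tiny classroom example, and the integer `key_digest` is a hypothetical stand-in for a hash of the client's public key; none of this is secure as written.

```python
import math
import secrets

# Toy RSA key of the CA: N = 61 * 53. Far too small for real use.
N, E, D = 3233, 17, 2753

def blind(value, n=N, e=E):
    """Client: hide value by multiplying with r^e for a random invertible r."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    return (value * pow(r, e, n)) % n, r

def sign(value, n=N, d=D):
    """CA: sign whatever it is handed, without seeing the unblinded value."""
    return pow(value, d, n)

def unblind(blinded_signature, r, n=N):
    """Client: divide out r (modular inverse needs Python 3.8+)."""
    return (blinded_signature * pow(r, -1, n)) % n

def verify(value, signature, n=N, e=E):
    """Anyone: check the signature against the CA's public key."""
    return pow(signature, e, n) == value % n

key_digest = 1234                      # hypothetical digest of the client's public key
blinded, r = blind(key_digest)
signature = unblind(sign(blinded), r)
print(verify(key_digest, signature))
```

The CA only ever sees `blinded`, yet the unblinded signature is identical to the one it would have produced on `key_digest` directly, which is the multiplication invariance described above.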

Selective Disclosure
When discussing attribute certificates one typically thinks of the attribute being certified as some piece of text which gets securely bound into the certificate when it is signed by the CA. Proof of the given attribute is given by proving to a verifier that one does indeed possess the private key associated with the public key contained in the certificate. Unfortunately, if multiple attributes are bound together (potentially because in some instances it is necessary to have them associated with one another), the showing of the certificate will result in a verifier being able to read and verify all attributes in the certificate. In such a situation it is not possible for the prover to selectively reveal certain subsets of attributes without having obtained independent certificates containing exactly each subset which is desired. It is, however, possible to allow selective disclosure of individual attributes when the attributes are actually embedded into the certificate's private key. In this scheme a key of sufficient length can encode an arbitrary number of attributes. Rather than showing a certificate containing the attributes to the verifier, the conversation between the prover and the verifier is designed such that the prover can prove the existence of the attribute. As before, the prover presents a signed certificate of his/her public key to the verifier and then engages in a dialog with the verifier in which the prover attempts to prove not only that he/she possesses the private key associated with the certified public key, but also that the private key satisfies some formula. The formula which can be tested against can be any boolean formula, thereby allowing complex expressions to be evaluated on the attributes.
Furthermore, the prover has complete knowledge of exactly which boolean expression is being evaluated, and the resulting dialog reveals no information to the verifier other than the truth of the formula on the private key held by the prover and associated with the given certified public key. The prover can therefore prove to be older than some age without actually revealing what that age is.
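
The idea of attributes living inside the private key can be sketched with a discrete-logarithm representation. This is a simplified illustration and not Brands' actual protocol: it assumes a public key of the form h = g_age^age * g_license^license * g_secret^secret mod P, and shows only the simplest formula, namely disclosing one attribute (the license class) while proving knowledge of the rest without revealing the age. Proving general boolean formulas, such as an age bound, requires further machinery that is not shown. The group parameters and attribute names are toy assumptions, and the CA's signature on h is omitted.

```python
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime. Far too small for real use.
P, Q = 2039, 1019
# Squares modulo P, so each generates the subgroup of prime order Q.
G_AGE, G_LICENSE, G_SECRET = 4, 9, 25

def rand_exp():
    return secrets.randbelow(Q - 1) + 1        # uniform in [1, Q-1]

def make_public_key(age, license_class, secret):
    """The private key is the tuple (age, license_class, secret)."""
    return (pow(G_AGE, age, P) * pow(G_LICENSE, license_class, P)
            * pow(G_SECRET, secret, P)) % P

def commit():
    """Prover: first message of the proof, covering the hidden exponents."""
    w_age, w_secret = rand_exp(), rand_exp()
    return (w_age, w_secret), (pow(G_AGE, w_age, P) * pow(G_SECRET, w_secret, P)) % P

def respond(nonces, age, secret, challenge):
    """Prover: answer the verifier's challenge without revealing age or secret."""
    w_age, w_secret = nonces
    return (w_age + challenge * age) % Q, (w_secret + challenge * secret) % Q

def verify(public_key, disclosed_license, commitment, challenge, responses):
    """Verifier: strip the disclosed attribute, then check the proof on the rest."""
    r_age, r_secret = responses
    disclosed_part = pow(G_LICENSE, disclosed_license, P)
    remainder = (public_key * pow(disclosed_part, -1, P)) % P   # Python 3.8+
    lhs = (pow(G_AGE, r_age, P) * pow(G_SECRET, r_secret, P)) % P
    return lhs == (commitment * pow(remainder, challenge, P)) % P

age, license_class, secret = 34, 2, rand_exp()
h = make_public_key(age, license_class, secret)
nonces, commitment = commit()
challenge = rand_exp()                          # chosen by the verifier
responses = respond(nonces, age, secret, challenge)
print(verify(h, 2, commitment, challenge, responses))   # honest disclosure
print(verify(h, 3, commitment, challenge, responses))   # false claim about the license
```

The verifier learns the license class and that the prover knows the remaining exponents, but the responses are masked by fresh random nonces, so the age never appears in the transcript.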

Restrictive Blind Issuing

As described above the creation of public-private key sets which contain attributes encoded into the private key requires the certificate authority to generate the key pair. This, while creating a system which allows individuals to selectively disclose information to other parties, still allows the CA almost complete knowledge about all transactions.

In order for such a system to provide a reasonable degree of privacy to the individual user, the CA must be able to encode attributes into the private key of the user without actually learning the final public key or the final private key. Brands introduces a new technique for doing just this, known as restrictive blinding. A restrictive blinding protocol works by dividing the information content of the private key into a blinded portion and a blinding-invariant portion. This allows the CA to encode any desired attributes into the blinding-invariant portion of the key. Unlike the more traditional blinding approach this does, in the end, leave the CA knowing something about the value of the secret key, since the CA clearly knows the blinding-invariant portion. If, however, the total key length is sufficiently long so as to contain enough randomness in the blinded portion of the key, it will remain computationally infeasible for the CA to determine the complete private key. The blinding process, furthermore, can result in complete blinding of the public key so that the CA is not left with partial public keys with which to attempt later correlation of key use.

Limited Show Certificates

The ability to encode attributes directly into the private key of a certified key pair also provides the capability of creating secure and anonymous limited show certificates. A limited show certificate is similar to a bus ticket or a coupon in that it certifies that the holder is entitled to some action, but only for a limited number of times. Limited show certificates are crucial to providing a degree of privacy in many applications. Without them, individuals would have to accept some ticket identifying them, which is then used to query a central database to determine their right to some service and to prevent replay attacks. Unfortunately this can only work if verifiers are tied into a central clearing house (a task most likely performed by the certificate authority) to prevent provers from showing the same certificate more times than is permitted. In order to preemptively protect against replay, all verifiers must be connected in real time to the clearing house. It would be possible to detect replays in an offline manner, but at that point the transaction would have already been performed. In response, the clearing house could then seek damages from the individual who committed the fraud. This, however, would require the clearing house to maintain some means of mapping an overused certificate back to a real identity in order to discourage replay attacks. Naturally this destroys any privacy in the system. Instead, one can use the ability to encode information into the secret key of the certificate to encode the identifier of the user of the key, thereby associating the key with an individual identity. The showing protocol can then be designed such that the verifier receives enough information to be sure that the prover does indeed hold the private key, but not enough to deduce any practical information about the private key.
If, however, the certificate is shown twice, and the challenges given by the verifiers in each case are different, the information available from the transcripts of both showings can be used to compute the identity that was encoded into the private key. This allows the showing process to occur offline and allows a clearing house to later merge the transcripts of the certificate showings to detect replays. When such a replay is detected the clearing house is then able to compute the identity of the individual and seek damages. When examining the replay transcripts the clearing house effectively does more than just determine the identity tied to the replayed certificate; it generates what is essentially a signed confession.
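
The arithmetic behind "one showing reveals nothing, two showings reveal the identity" can be sketched with a linear response of the kind used in Schnorr-style protocols. This is an assumed simplification rather than Brands' full construction: the response is r = challenge * identity + one_time_secret mod Q, where the one-time secret is fixed per certificate. The modulus, identity value, and challenges below are hypothetical, and the part of the protocol that lets a verifier check the response is omitted.

```python
Q = 1019   # toy prime modulus; a real system would use a much larger one

def show(identity, one_time_secret, challenge):
    """Prover's response in one showing of the certificate."""
    return (challenge * identity + one_time_secret) % Q

def recover_identity(challenge_1, response_1, challenge_2, response_2):
    """Clearing house: combine two transcripts of the same certificate."""
    diff = (challenge_1 - challenge_2) % Q
    if diff == 0:
        raise ValueError("identical challenges reveal nothing new")
    # Subtracting the responses cancels the one-time secret.
    return ((response_1 - response_2) * pow(diff, -1, Q)) % Q   # Python 3.8+

identity, one_time_secret = 777, 345          # hypothetical values
c1, c2 = 12, 900                              # challenges from two verifiers
r1 = show(identity, one_time_secret, c1)
r2 = show(identity, one_time_secret, c2)
print(recover_identity(c1, r1, c2, r2))
```

A single response is one equation in two unknowns, so for a uniformly random one-time secret it is consistent with every possible identity. The second showing supplies a second equation, which pins the identity down.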

While it may seem odd that one reveals no information about the secret key in the first showing, and yet reveals the identity of the certificate holder on the second, an intuition can be gained from a simplified example. Consider a number which is the result of a one-way function on two inputs, h = f(x, y). Then suppose that each of those inputs is constructed from a one-way function of two other inputs such that x = g(a, c) and y = g(b, d). If the verifier is told either x or y and the two numbers needed to compute the other (a and c if y was given, or b and d if x was given), then the verifier can verify that h was computed properly without knowing all the inputs that went into its generation. If a, c, and d are chosen to be random numbers and b = a xor i, where i is the user's identity, then revealing either a or b alone will not provide any useful information about the user, but revealing both allows the user's identity to be trivially computed as i = a xor b. In this example one cannot be certain of detecting a fraud, since to do so one verifier would need to choose to receive a while the other chooses to receive b. The information, furthermore, cannot be encoded into the private key, as the verifier is able to reconstruct the number from the information supplied in the showing protocol. It is possible, however, using more complicated techniques developed by Brands, to allow the information to be encoded in the private key and to allow the showing protocol to admit only a fixed, but arbitrary, number of showings before the hidden information is revealed.
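
The simplified example can be run directly. The sketch below assumes SHA-256 stands in for both one-way functions f and g, and it omits the CA's signature on h; the function names and the identity string are hypothetical.

```python
import hashlib
import secrets

def one_way(*parts):
    """Stand-in for the one-way functions f and g (length-prefixed SHA-256)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def xor_bytes(left, right):
    return bytes(p ^ q for p, q in zip(left, right))

def issue(identity):
    """Build the certificate values: b = a xor i, x = g(a, c), y = g(b, d), h = f(x, y)."""
    a, c, d = (secrets.token_bytes(len(identity)) for _ in range(3))
    b = xor_bytes(a, identity)
    x, y = one_way(a, c), one_way(b, d)
    return {"a": a, "b": b, "c": c, "d": d, "x": x, "y": y, "h": one_way(x, y)}

def show(cert, choice):
    """Reveal (a, c, y) or (b, d, x), depending on the verifier's choice."""
    if choice == "a":
        return {"choice": "a", "a": cert["a"], "c": cert["c"], "y": cert["y"]}
    return {"choice": "b", "b": cert["b"], "d": cert["d"], "x": cert["x"]}

def verify(h, transcript):
    """Recompute h from the revealed half and the given other half."""
    if transcript["choice"] == "a":
        x, y = one_way(transcript["a"], transcript["c"]), transcript["y"]
    else:
        x, y = transcript["x"], one_way(transcript["b"], transcript["d"])
    return one_way(x, y) == h

def recover_identity(first, second):
    """Clearing house: i = a xor b, possible only if both halves were revealed."""
    merged = {**first, **second}
    if "a" not in merged or "b" not in merged:
        return None
    return xor_bytes(merged["a"], merged["b"])

identity = b"user-0042"
cert = issue(identity)
shown_a, shown_b = show(cert, "a"), show(cert, "b")
print(verify(cert["h"], shown_a), verify(cert["h"], shown_b))
print(recover_identity(shown_a, shown_b))
```

If both verifiers happen to make the same choice, `recover_identity` returns `None`, which is the detection gap noted in the paragraph above.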

Unavoidable Information Leakage

The possible public key infrastructures developed by Brands provide schemes to be used in a vast array of imagined situations and ensure that the only information conveyed in the showing of a digital certificate is the information which the user of the system has consented to reveal. Unfortunately, in a realistic transaction one will be required to disclose some information, and will often disclose far more than one realizes, even when using what appears to be a privacy protecting form of authentication.

Individuals' Disclosure
In today's culture, the biggest source of information leakage about consumers is consumers themselves. Admittedly a great deal of information garnered about people comes through covert channels, but individuals willingly and freely continue to release vast amounts of information about themselves. Baker, the former chief counsel for the NSA, noted that ``the biggest threats to our privacy in a digital world come not from what we keep secret but what we reveal willingly [...] Encryption can't protect you from the misuse of data you surrendered willingly.'' Individuals are often led to release information about themselves through some degree of ignorance. Quite often people are simply not aware of how much information they actually reveal when engaging in a transaction. This issue, however, will actually be somewhat mitigated by increased use of electronic transactions. Since electronic transactions are conducted through a computer of some form, it becomes natural to employ some sort of proxy or agent to conduct the transaction on one's behalf. Indeed this is what happens at some level regardless of the system, but this proxy could eventually be tied to and trusted by the individual engaging in the transaction. This proxy must then be configured with the actual preferences of the individual, forcing a conscious decision about what information can be released when. Brands actually advocates such a system, in which each user conducts transactions through a trusted smart card. Release of personal information must be authorized through the smart card, which can occur either at the time of the transaction by querying the user, or might be automated through predefined preferences. A similar system for configuring one's web browser to act as such a proxy, called P3P, is being promoted by the W3C. The larger threat of excessive information disclosure by individuals, however, stems from individuals releasing a limited amount of information about themselves under the assumption that they shall remain anonymous. Unfortunately this assumption is often far from the truth, since it is often possible to combine information from multiple sources to uniquely identify individuals from very small fragments of data. This capability has arisen from the ability of computers to store any information they gather for long periods of time. Since storage is not likely to become any more expensive, individuals can expect any information they ever release to be permanently stored and possibly used to reidentify them at a later date. Unfortunately no public key infrastructure can directly solve this, as it will remain a cultural problem that can only be directly resolved by instilling a sense of paranoia into all consumers. It may seem a bit paranoid to constantly assume that information gathered by different sources is likely to be merged into a common database, but large unified database efforts are already under way. Recently Publishing and Broadcasting Ltd entered into an effort to create a unified database of information about the entire population of Australia.
This database will cross-index information from electoral rolls, credit card information, casino records, and bank statements, and the effort will even partner with Microsoft in order to integrate information from Microsoft's Hotmail and Passport services.

Organizations' Demands for Information

While individuals may often be very willing to freely disclose information about themselves, they are often encouraged to do so by numerous organizations looking to collect information. The most often discussed motivation for organizations to gather personal information about consumers is to use it as a source of direct marketing information. While many consumers will freely provide personal contact information to an organization in order to receive notification of future products or sales, they often do not consider that the information they provided has substantial value to the collecting organization beyond its own contact and marketing use. Such data is commonly traded and sold to other organizations looking to reach new customers. Even organizations which are not looking to sell direct marketing information will often have a legitimate desire to collect personal information in order to provide a higher level of service. By collecting information about customers' purchases a company is able to better present itself to each customer and offer personalized services. For instance, Amazon.com tracks individuals' purchases and interests in order to offer web pages customized to show products that might be of interest. It is also able to use this collection of information to offer remarkably intelligent recommendations for new products. A privacy protecting scheme, such as Brands', would block the ability of Amazon.com to collect this information. At some level it would seem proper that one's purchases should remain strictly private and confidential, but the majority of users do not find Amazon's use of aggregate data to produce recommendations of other products of interest to be intrusive (as opposed to Amazon's later use of purchase circles). Consumers will readily trade the small loss in privacy for the benefit of what amounts to a more intelligent store. This same trade-off is often made by consumers in other environments. Grocery store club cards make tracking customers' buying habits easier for the store, a concession consumers readily make in exchange for customized coupons at the checkout stand or discounts on various products. One might attempt to build a privacy-preserving means of allowing Amazon the same benefit without exposing one's identity. Using the tools available in Brands' system, one could envision a system in which individuals securely identify themselves under some pseudonymous identity, which allows Amazon to build up a personal profile of sorts without a tie to a real identity. Unfortunately, for real security most of Brands' certification schemes involve certificates and pseudonymous identities with short lifetimes, to prevent enough information from being revealed to determine the corresponding real identity. Using longer-term certificates would be possible, but the aggregated information about preferences, habits, and purchased products could eventually be tied back to a specific individual. The same argument applies to the supermarket discount card. In the case of Amazon, however, preventing the loss of anonymity becomes even more difficult when dealing with issues of payment and shipping.
While credit card transactions are difficult to safely make anonymous to both the vendor and the credit card company, many anonymous and secure electronic cash schemes have been developed. Of course one must still provide shipping information. Even an attempt to route shipping information through a third party will cause one's privacy to be lost if the various organizations conspire against the consumer.

Problems with Limited Use of Privacy

The main source of anonymity in any system is the ability to disappear in the crowd. Unfortunately, as various data mining techniques have shown, quite often the crowd, once some seemingly innocent information is known, is very small. This is true even when the initial sample set is quite large. Should consumers choose to accept a world in which private, and thereby almost inevitably potentially identifying, information is released, any chance of privacy for those who desire it may be lost. As the sample set becomes smaller, correlating the separate activities of a given individual becomes easier. A privacy protecting system, such as Brands', attempts to counter these problems by allowing individuals to carefully constrain the amount of information they release if they truly wish to remain anonymous. Individuals are also expected to consistently use fresh pseudonymous identities to prevent correlation. Unfortunately, privacy systems such as Brands' only protect information which one reveals or does not reveal through an authentication process. Far more information is revealed in the actual transaction, and that information cannot be withheld or protected using any form of cryptographic technique.
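
How quickly the crowd shrinks can be seen by counting, for each person, how many others share the same revealed attributes. The population below is entirely made up for illustration, and the field names are hypothetical.

```python
from collections import Counter

def crowd_sizes(population, revealed_fields):
    """For each person, count how many people share their revealed values."""
    keys = [tuple(person[field] for field in revealed_fields) for person in population]
    counts = Counter(keys)
    return [counts[key] for key in keys]

# Hypothetical records; no real data.
people = [
    {"zip": "02139", "birth_year": 1975, "gender": "F"},
    {"zip": "02139", "birth_year": 1975, "gender": "M"},
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02142", "birth_year": 1975, "gender": "F"},
]

for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(fields, crowd_sizes(people, fields))
```

Each additional field that is revealed can only keep a person's crowd the same size or make it smaller; once the count reaches one, the "anonymous" record identifies a single individual.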

A potentially very dangerous source of identifying information comes from the underlying network protocols which parties use to communicate. IP addresses can allow different transactions to be correlated quite easily. Even if dynamic IP addresses are used, the IP addresses for a given user are still drawn only from a small pool. One could naturally imagine a system which employs various techniques to provide anonymous connections, but this is hardly likely to occur. With federal legislation pushing to ensure that systems contain provisions to aid law enforcement, it is likely that traceability will remain in the system at some level. With such a capability in the system it seems unlikely that businesses won't eventually be able to exploit it. Consider the current phone system which, while it contains caller-id and caller-id blocking, still unconditionally transmits the caller's number to 800 lines. Tracking individuals through network addresses is a relatively conventional channel which could, however unlikely that is to happen, be guarded against. Information revealed through more covert channels, however, is much harder, if it is possible at all, to protect. When conducting a transaction one will necessarily release information about what was purchased, exchanged, or traded, as well as information about where and when the transaction occurred. When a transaction is conducted online, information about where it occurred is roughly equivalent to network addresses and, as mentioned, may be possible to disguise. With in-person transactions, however, one can identify the physical location of the transaction. Suppose a company has, through typical information gathering techniques, developed a rather large database of consumer preferences and buying choices, such as what Amazon has done with book and CD purchases. This database could then serve as a means of correlating related purchases based on the buying habits of others.
Naturally this would hardly be exact, but given the success of Amazon's recommendations already, it would probably be fairly accurate. When this is combined with information about the times and locations of purchases, one is releasing quite a bit of information which can never be hidden by sophisticated cryptographic techniques, as this information is necessarily released in order to carry out the transaction.

Coping with Information Leakage

While some of the sources of information loss through a privacy protecting public key infrastructure that have been mentioned can be patched by the addition of other anonymizing schemes, the rest lack such a clear-cut technical solution. Of course one could, if sufficiently motivated, remain anonymous by engaging in games of subterfuge to mask one's true activities among fake transactions. Short of clandestine activities, however, this is not at all practical, especially for everyday transactions. In many ways the most desirable solution would involve the public demanding protection of its own privacy. This would most likely lead to the greatest flexibility for consumers by producing a more adaptive system. While there has been some evidence of consumers reacting to companies' attempts to garner more information, such as the failure of Divx, there is little reason to expect that this will happen in general. As already noted, in many current situations consumers are all too ready to release information about themselves in exchange for some service. Perhaps the difference between Divx and supermarket club cards arises from the obvious intrusiveness of the Divx collection scheme, involving a dialup modem and explicit billing statements, while a supermarket club card is always offered up in exchange for something and often does not even contain one's name. Consumer demand will probably lead to at least a greater perceived degree of privacy within certain fields, such as medical information, but this perceived privacy may be little more than just a perception. This leads to the, perhaps unfortunate, conclusion that, even with cryptographic techniques designed to protect one's privacy, society and the market will not produce a system which provides any substantive degree of privacy for most transactions. If privacy is to be protected, therefore, enacting further legislation will likely be necessary.

The ease and efficiency of committing fraud in a world dominated by electronic transactions is creating a strong demand for cryptographically based security and authentication. The natural tendency when dealing with vast numbers of entities interacting in mostly random connections is to employ a public-private key system to provide identification, and thus authentication. These public key systems, unfortunately, provide a far too powerful means of authentication and in doing so could quickly become the key to creating an almost totally inclusive commercial surveillance network. This pervasive tracking system is made possible by the single unique public key that one typically uses as an identifier. It is therefore possible to build a system that better protects one's privacy by using unique, freshly minted identifiers for different applications. In order to provide sufficient authentication, however, these identifiers need to have the necessary rights, privileges, and other attributes securely encoded into them and be signed by an authority which has verified these attributes. Brands has shown how such information can be encoded into the secret key of a certified key pair and how the encoded information can be selectively revealed. Such a system also requires a means of preventing the issuer of the certificate from being able to trace its use, a problem which, even with the use of encoded attributes, can be solved by various blinding techniques which prevent a CA from knowing the true value of the signed public key. In a properly implemented system using these techniques it is possible to show that a certificate holder releases no more information about themselves when showing their certificate than they authorize, and that the resulting log from the showing cannot be traced back to them through the certificate authority.

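
The idea of blinding can be sketched with a toy example. Note that this is not the scheme Brands describes: it uses a Chaum-style textbook RSA blind signature, swapped in here only because it is short enough to show how a CA can sign a value it never sees. The key size is far too small for real use, there is no padding, and every name in the block (`hash_to_int`, `blind`, `ca_sign`, `unblind`, `verify`) is hypothetical.

```python
import hashlib
import math
import secrets

# Toy RSA key for the CA, built from two Mersenne primes.
# Assumption: illustrative only; far too small and unpadded for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # CA's private exponent (Python 3.8+)


def hash_to_int(data: bytes) -> int:
    """Map a freshly minted public key (as bytes) to an integer mod N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def blind(message: int):
    """User side: hide the message behind a random factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (message * pow(r, E, N)) % N, r


def ca_sign(blinded: int) -> int:
    """CA side: sign the blinded value without learning the message."""
    return pow(blinded, D, N)


def unblind(blinded_signature: int, r: int) -> int:
    """User side: strip the blinding factor to get a signature on the message."""
    return (blinded_signature * pow(r, -1, N)) % N


def verify(message: int, signature: int) -> bool:
    """Verifier side: check the CA's signature with the public key (N, E)."""
    return pow(signature, E, N) == message % N


if __name__ == "__main__":
    m = hash_to_int(b"freshly minted pseudonym key")
    blinded, r = blind(m)
    signature = unblind(ca_sign(blinded), r)
    print(verify(m, signature))
```

The CA only ever sees `blinded`, which is the message multiplied by a random value, so its signing log cannot later be matched against the unblinded signature presented to a verifier. Selective disclosure of encoded attributes, which Brands' certificates also provide, is beyond the scope of this sketch.
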
Without the use of a privacy-protecting scheme there is clearly little hope for individuals to maintain even a small degree of privacy, but its use does not immediately result in a system which prevents individuals' lives from being tracked through their electronic transactions. The remaining release of information comes both from the willing release of information by users and from the potential for creative data inference techniques to infer identities from otherwise anonymous transactions. Consumers' willingness to expose information about themselves in exchange for nearly any benefit, and sometimes for no reason at all, will undoubtedly be the largest source of information, at least initially. One might consider this a minor problem since the information is being exposed willingly, but it remains somewhat unsatisfactory, as most people do wish to retain a degree of privacy and often simply do not understand how effectively personal information can be correlated across databases. Even for those individuals who are sufficiently concerned about privacy to carefully guard what information they release when showing a certificate, there remain many covert channels which can still allow an automated tracking system to correlate one's various seemingly independent transactions and so regain the ability to monitor and collect information about people's habits and preferences. A privacy-protecting public key infrastructure is clearly necessary in order to allow individuals to try to maintain the level of privacy that existed before the transition to electronic transactions. These privacy-protecting techniques, however, are not sufficient to provide true privacy, and when used alone may lead to a false sense of privacy. In order to prevent creative information gathering and data mining techniques from circumventing the attempts to hide information, it will therefore be necessary to rely on legislation rather than on the purely technical structure presented by Brands.