Tuesday, November 18, 2008

Arrogance of obfuscation.


"a new statistical method that can establish the presence of a single
individual's genetic signature in a sample containing DNA from
hundreds of different people."

"It was previously assumed that aggregating the data of hundreds or
even thousands of people — essentially giving the overall genetic
composition of the group as a whole — would make it impossible to
identify any one person in it."

"Hypothetical scenarios aside, it is highly unlikely that any person
has ever actually been picked out of an aggregate database, and not
just because the mathematics of the new method are so complex."
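The quotes don't spell out the method. Suppose it resembles a per-SNP distance test: for each marker, is this person's genotype closer to the pooled mixture's allele frequency or to a reference population's? Under that assumption, a minimal sketch on simulated data looks like this. The simulation sizes, the "reference panel" and all helper names are illustrative assumptions, not the published method.

```python
import math
import random
import statistics

def genotype(p: float, rng: random.Random) -> float:
    """Diploid genotype at one SNP as an allele fraction: 0, 0.5 or 1."""
    return ((rng.random() < p) + (rng.random() < p)) / 2

def allele_freqs(people: list[list[float]]) -> list[float]:
    """Per-SNP allele frequency of a group: the aggregate that gets published."""
    return [sum(column) / len(people) for column in zip(*people)]

def membership_t(person, mixture, population) -> float:
    """One-sample t-statistic of D_j = |Y_j - Pop_j| - |Y_j - Mix_j| over all SNPs.

    Clearly positive: the person sits systematically closer to the mixture than
    to the reference population, which is evidence that they are part of it.
    """
    d = [abs(y - pop) - abs(y - mix) for y, mix, pop in zip(person, mixture, population)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

rng = random.Random(2008)
N_SNPS = 5000
freqs = [rng.uniform(0.05, 0.95) for _ in range(N_SNPS)]  # simulated population frequencies

def make_person() -> list[float]:
    return [genotype(p, rng) for p in freqs]

study_group = [make_person() for _ in range(50)]   # the people whose DNA is pooled
reference = [make_person() for _ in range(200)]    # a separate reference panel
outsider = make_person()                           # someone who was never in the pool

mixture = allele_freqs(study_group)    # only these aggregates are released
population = allele_freqs(reference)

member_t = membership_t(study_group[0], mixture, population)
non_member_t = membership_t(outsider, mixture, population)
print(f"member t = {member_t:.1f}, non-member t = {non_member_t:.1f}")
```

Each member contributes only 1/50 of the pooled frequencies. Summed over thousands of SNPs, that tiny pull is still enough to separate a member from an outsider, which is why aggregation on its own is not anonymisation.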

I sometimes worry more about DNA being 'stored', and about the contextual
information derived from it and how securely all of that is held, than
about any credit card data etc. This is prompted by a great new
service, https://www.23andme.com/

In the future, when a genomic representation of me can be anywhere and
stored in multiple locations, how do I assert that I am me?

The biometric system *must* be guaranteed to be localised and not
connected to any other network... erm, a bit like SCADA systems, I hear
you say? Will we need an X-point check system: something I have,
something I know, macro am, micro am, quantum am, how Y is
done/performed, how Z is done/performed, how K is done/performed...?
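Taking the X-point idea literally, here is a minimal sketch of a local verifier that requires every factor to pass. The factor names follow the list above; "quantum am" and the Z/K factors are left out. The enrolled template, thresholds and matching rules are all hypothetical. Nothing in it opens a network connection, in keeping with the "localised" requirement.

```python
import hashlib
import hmac
import math
from typing import Callable

SALT = b"demo-salt"  # hypothetical; real enrolment would use a random per-user salt

def pin_hash(pin: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), SALT, 100_000)

# Hypothetical template captured at enrolment and stored only on the local device.
ENROLLED = {
    "token_id": "TOKEN-0042",
    "pin_hash": pin_hash("2718"),
    "face": [0.12, 0.80, 0.33],                       # "macro am": coarse biometric template
    "markers": {"m1": "AG", "m2": "CC", "m3": "TT"},  # "micro am": a few genetic markers
    "typing_ms": [110, 95, 130, 102],                 # "how Y is performed": keystroke rhythm
}

FACTORS: dict[str, Callable[[dict], bool]] = {
    "have": lambda e: e["token_id"] == ENROLLED["token_id"],
    "know": lambda e: hmac.compare_digest(pin_hash(e["pin"]), ENROLLED["pin_hash"]),
    "macro am": lambda e: math.dist(e["face"], ENROLLED["face"]) < 0.1,
    "micro am": lambda e: e["markers"] == ENROLLED["markers"],
    "how Y is performed": lambda e: max(
        abs(a - b) for a, b in zip(e["typing_ms"], ENROLLED["typing_ms"])) < 20,
}

def verify(evidence: dict) -> tuple[bool, list[str]]:
    """Every factor must pass; also report which ones failed. No network calls."""
    failed = [name for name, check in FACTORS.items() if not check(evidence)]
    return not failed, failed

good = {"token_id": "TOKEN-0042", "pin": "2718", "face": [0.13, 0.79, 0.35],
        "markers": {"m1": "AG", "m2": "CC", "m3": "TT"}, "typing_ms": [115, 90, 128, 110]}
print(verify(good))                      # (True, [])
print(verify({**good, "pin": "0000"}))   # (False, ['know'])
```

Requiring all factors (AND) rather than any one (OR) is the design choice here. A stolen genome alone then gets an attacker no further than a stolen PIN alone.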

Fun: I spoke before about nanobots; I now present "nano-identbots",
which are like mayflies, i.e. they die once they have identified a host.
Each has a localised PGP relationship with the system to which identity
is supposed to be represented. They generate keys at birth and
immediately go out to identify the subject etc...


t.eerkes said...


I want to know more. I've started a genetic testing company that is handling our consumer data differently than 23andMe and Navigenics (not sharing it with other groups, even if "anonymized" and aggregated), but I'm always interested in potential improvement.
We have to convey our information to our customers over a network somehow. They log in to their account to see their results.
How could a SCADA system make this better?
Educate me please.
-Tera Eerkes

Deda said...

Can you expand a bit more on how, and in what precise circumstances, the data can be disaggregated to show an individual's DNA?

I am not familiar with how aggregated data is presented, but in a purely statistical case it would not be possible to disaggregate anonymised samples.

In a real laboratory situation with specimen samples I can see the advantages of this refinement.

In aggregate presentations I don't, on the face of it, see the problem.

Donal said...

It's about de-identification, and about undoing it: re-identification becomes feasible once you have something to compare against.
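To illustrate what "something to compare" can mean, here is a minimal linkage sketch with entirely made-up records. A release with names stripped is joined to a named public list on shared quasi-identifiers (ZIP code, birth date, sex). Any unique match re-identifies the person. The field names, records and the `link` helper are hypothetical.

```python
# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "birth": "1971-11-02", "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical "something to compare": a public list that does carry names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth": "1962-03-14", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth": "1980-01-01", "sex": "F"},
]
QUASI_IDS = ("zip", "birth", "sex")

def link(released, public, keys=QUASI_IDS):
    """Join the two tables on quasi-identifiers; a unique match is a re-identification."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches[names[0]] = row["diagnosis"]
    return matches

print(link(released, public))  # {'Alice Example': 'hypertension', 'Bob Example': 'asthma'}
```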

Here is some more info...

Privacy-Preserving Data Mining: Models and Algorithms (Advances in Database Systems)
Hardcover: 514 pages
Publisher: Springer; 1st edition (July 7, 2008)
Language: English
ISBN-10: 0387709916

K-Anonymity and De-identification:

K-anonymity (k-Anonymity: a model for protecting privacy)
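A release is k-anonymous if every record shares its quasi-identifier values with at least k-1 other records. A minimal sketch measures k and shows how generalising fields raises it. Truncating ZIP codes and keeping only the birth year is one common form of generalisation. The particular rows, fields and the `generalize` rule here are hypothetical.

```python
from collections import Counter

QUASI_IDS = ("zip", "birth", "sex")

def k_of(rows, quasi_ids=QUASI_IDS) -> int:
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(counts.values())

def generalize(row):
    """Hypothetical rule: keep the first 3 ZIP digits and only the birth year."""
    return {**row, "zip": row["zip"][:3] + "**", "birth": row["birth"][:4]}

rows = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth": "1945-02-10", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02141", "birth": "1962-03-14", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "birth": "1962-09-30", "sex": "M", "diagnosis": "flu"},
]
print(k_of(rows))                            # 1: every person is unique
print(k_of([generalize(r) for r in rows]))   # 2: each person hides among at least 2
```

Generalisation trades precision for privacy: the released table is less useful for analysis, but no row can be singled out by these three fields alone.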

Also from a colleague on the SecurityMetrics mailing list:

"I believe that readers of this mailing list might also be interested
in the perturbation approach (originally described in
and covered in chapter 6 in the book). This method is suitable for
survey-like data collection: you mask data by adding noise, but some
statistical properties are maintained despite the noise.
Unfortunately, research shows that the perturbation methods suggested
so far are susceptible to various attacks (chapter 15)."
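Here is a minimal sketch of the additive-noise flavour of perturbation. It assumes each respondent adds zero-mean Gaussian noise with a publicly known standard deviation. Individual answers are masked, yet the collector can still estimate the mean directly. It can also estimate the variance by subtracting the known noise variance. `NOISE_SD`, the simulated ages and the helper names are hypothetical.

```python
import random
import statistics

NOISE_SD = 10.0  # hypothetical; published so analysts can correct for it

def perturb(value: float, rng: random.Random) -> float:
    """Respondent side: mask the true answer with zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, NOISE_SD)

def estimate(noisy: list[float]) -> tuple[float, float]:
    """Collector side: recover mean and variance from the masked answers."""
    mean = statistics.fmean(noisy)                     # zero-mean noise leaves the mean unbiased
    var = statistics.variance(noisy) - NOISE_SD ** 2   # remove the known noise variance
    return mean, var

rng = random.Random(7)
true_ages = [rng.uniform(18, 80) for _ in range(10_000)]
masked = [perturb(a, rng) for a in true_ages]
est_mean, est_var = estimate(masked)
print(f"true mean {statistics.fmean(true_ages):.1f}, estimated {est_mean:.1f}")
print(f"true var  {statistics.variance(true_ages):.1f}, estimated {est_var:.1f}")
```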

Deda said...

That's a lot of material. Can you give me a four-sentence guide/summary to start me out on this safari?