I am in the early design stages of a system (QuestGuide) to help researchers gather information (arbitrarily complex), analyze it, annotate it, visualize it, and, ultimately, share it. There are enough problems to solve that this feels like the software equivalent of the Seven Summits, but I'm retired and this project will keep me off the streets for a few years :-).
After watching a two-part interview with Eben Moglen at Slashdot, I realized that I wasn't designing for, or even thinking about, anonymity. Although my imagined target demographic is academic researchers (and maybe people shopping for a refrigerator), it is entirely possible that this system will be used by people who are gathering and correlating information that, if and when they share it, could get them imprisoned or worse.
To make it possible to backtrack to the original source (e.g. for citations, source checking, etc.), I am collecting the date-time, the IP/URL of original sources, and all manner of other metadata. This is being done as an aid for the researcher, but it could also be used to work backward to the person who published their quest anonymously.
I have identified the following as needing removal / scrubbing / obfuscating:
- UUIDs must be randomly generated (version 4), not derived from the MAC address and time (version 1).
- All metadata containing date-time, URL, IP, etc. must be removed.
- The transmission of the quest must be encrypted with the receiver's public key, but not signed (or otherwise associated) with the sender.
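To make the first two items concrete, here is a minimal Python sketch of what I have in mind. The field names (`collected_at`, `source_url`, `source_ip`, `modified_at`) and the nested dict/list layout of a quest are assumptions made up for illustration, not the actual QuestGuide schema. It only handles metadata stored under known keys; it does nothing about identifying details inside free text, and it does not cover the encryption item.

```python
import uuid

# Hypothetical metadata keys; the real QuestGuide schema may differ.
IDENTIFYING_KEYS = {"collected_at", "source_url", "source_ip", "modified_at"}


def new_anonymous_id() -> uuid.UUID:
    """Return a version 4 UUID, which is built from random bits.

    uuid.uuid1() is avoided because it encodes the host's MAC address
    and a timestamp.
    """
    return uuid.uuid4()


def scrub(record):
    """Return a copy of record with identifying keys removed at every depth."""
    if isinstance(record, dict):
        return {
            key: scrub(value)
            for key, value in record.items()
            if key not in IDENTIFYING_KEYS
        }
    if isinstance(record, list):
        return [scrub(item) for item in record]
    return record  # scalars are returned unchanged


def anonymize(quest: dict) -> dict:
    """Scrub a quest and give the copy a fresh random id (assumed 'id' key)."""
    cleaned = scrub(quest)
    cleaned["id"] = str(new_anonymous_id())
    return cleaned
```

Calling `anonymize(quest)` returns a new structure and leaves the researcher's local copy, with its full citation trail, untouched.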
Question:
What other types of information, in either the data or the supporting metadata, would I need to scrub when someone wants to be truly anonymous?
If there's a list somewhere of things that have tripped people up in the past (anecdotal or real), that would be educational.
Update / Clarification:
QuestGuide is 100% FOSS. I'm currently trying to understand how the Affero GPLv3 (AGPLv3) plays in a system composed of many FOSS components: MariaDB, Django, and numerous other bits of FOSS released under various licenses. It may be necessary to fall back to GPLv2 Affero.
Although I'm sure there can be revenue-generating operations based on it, I see them falling more along the lines of Red Hat selling support for a completely FOSS-based product. I'm retired and have no particular need or desire to be part of any of those operations.
The system itself is designed to run locally on the user's machine—DB, proxy server, UI, all of it. The "central site" is intended purely for the convenience of users who need multi-location access to their quests, who wish to share their results with the world, or who wish to download preconfigured components and entity definitions. Once QuestGuide is installed, the user does not need to have any further contact with any sort of centralized server.