Doug Cutting from Cloudera gave our closing keynote on day 1.
Hadoop started a revolution. It is an open source platform that really harnesses data.
Education! The better data we have the better our education system can be. Education will be much better if we can have a custom experience for each student – these kinds of observations are fed by data. If we’re going to make this happen we’re going to need to study data about these students. The more data you amass the better predictions you can make. On the flip side it’s scary to collect data about kids. inBloom was an effort to collect this data, but they ended up shutting down because of the fear. There is a lot of benefit to be had, and it would be sad if we didn’t enable this type of application.
Heathcare is another area this becomes handy. Medical research benefits greatly from data. The better data we collect the better we can care for people. Once again this is an area that people have fears about shared data.
Climate is the last example. Climate is changing and in order to understand how we can effect it data plays a huge role. Data about our energy consumption is part of this. Some people say that certain data is not useful to collect – but this isn’t a good approach. We want to collect all the data and then evaluate it. You don’t know in advance what value the data you collect will have.
How do we collect this data if we don’t have trust? How do we build that trust? There are some technology solutions like encrypting data and anonymizing data sets – these methods are imperfect though. In fact if you anonymize the data too much it muddies it and makes it less useful. This isn’t just a technical problem – instead we need to build trust.
The first way to build trust is to be transparent. If you’re collecting data you need to let people know you’re collecting it and what you’re going to use it for.
The next key element is establishing best practices around data. These are the technical elements like encryption and anonymization. This also includes language to agree/disagree to ways our data is shared.
Next we need to draw clear lines that people can’t step over – for example we can’t show someone’s home address without their express permission. Which gives us a basis for the last element.
Enforcement and oversight is needed. We need someone who is checking up on these organizations that are collecting data. Regulation can sound scary to people, but we have come to trust it in many markets already.
This is not just a local issue – it needs to be a global effort. As professionals in this industry we need to think about how to build this trust and get to the point where data can be stored and shared.