ATO2014: Pax Data

Doug Cutting from Cloudera gave our closing keynote on day 1.

Hadoop started a revolution. It is an open source platform that really harnesses data.

In movies the people who harness the data are always the bad guys – so how do we save ourselves from becoming the bad guy? What good is coming out of good data?

Education! The better data we have the better our education system can be. Education will be much better if we can have a custom experience for each student – these kinds of observations are fed by data. If we’re going to make this happen we’re going to need to study data about these students. The more data you amass the better predictions you can make. On the flip side it’s scary to collect data about kids. inBloom was an effort to collect this data, but they ended up shutting down because of the fear. There is a lot of benefit to be had, and it would be sad if we didn’t enable this type of application.

Healthcare is another area where this becomes handy. Medical research benefits greatly from data. The better data we collect the better we can care for people. Once again this is an area where people have fears about shared data.

Climate is the last example. Climate is changing, and in order to understand how we can affect it, data plays a huge role. Data about our energy consumption is part of this. Some people say that certain data is not useful to collect – but this isn’t a good approach. We want to collect all the data and then evaluate it. You don’t know in advance what value the data you collect will have.

How do we collect this data if we don’t have trust? How do we build that trust? There are some technology solutions like encrypting data and anonymizing data sets – these methods are imperfect though. In fact if you anonymize the data too much it muddies it and makes it less useful. This isn’t just a technical problem – instead we need to build trust.
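To make the anonymization trade-off concrete, here is a minimal sketch in Python. It is my own illustration, not something shown in the talk: the records, the `SECRET_KEY`, and the helper names are all hypothetical. It replaces identifiers with a keyed hash and coarsens ages into ranges, and you can see that the wider the range, the less the data can tell you.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored apart from the data set.
SECRET_KEY = b"hypothetical-key-kept-out-of-the-dataset"


def pseudonymize(student_id: str) -> str:
    """Replace an identifier with a keyed hash so records can still be linked."""
    digest = hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalize_age(age: int, bucket: int) -> str:
    """Coarsen an exact age into a range; wider buckets hide more but say less."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def anonymize(rows, bucket):
    """Return copies of the rows without the raw identifier or exact age."""
    return [
        {
            "student": pseudonymize(row["student_id"]),
            "age": generalize_age(row["age"], bucket),
            "score": row["score"],
        }
        for row in rows
    ]


# Made-up sample records, for illustration only.
records = [
    {"student_id": "s-001", "age": 11, "score": 82},
    {"student_id": "s-002", "age": 12, "score": 91},
    {"student_id": "s-003", "age": 17, "score": 77},
]

fine = anonymize(records, bucket=2)     # ages stay distinguishable
coarse = anonymize(records, bucket=20)  # every age collapses into one range

print(sorted({row["age"] for row in fine}))
print(sorted({row["age"] for row in coarse}))
```

With the 20 year bucket every student lands in the same age range, so any question about age differences can no longer be answered. That is the "muddying" effect described above.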

The first way to build trust is to be transparent. If you’re collecting data you need to let people know you’re collecting it and what you’re going to use it for.

The next key element is establishing best practices around data. These are the technical elements like encryption and anonymization. This also includes language to agree/disagree to ways our data is shared.

Next we need to draw clear lines that people can’t step over – for example we can’t show someone’s home address without their express permission. That gives us a basis for the last element.

Enforcement and oversight are needed. We need someone who is checking up on these organizations that are collecting data. Regulation can sound scary to people, but we have come to trust it in many markets already.

This is not just a local issue – it needs to be a global effort. As professionals in this industry we need to think about how to build this trust and get to the point where data can be stored and shared.

ATO2014: Saving the world: Open source and open science

Marcus Hanwell, another fellow opensource.com moderator, was the last session of the day with his talk about saving the world with open source and open science!

In science there was a strong ethic of ‘trust, but verify’ – and if you couldn’t reproduce the efforts of the scientist then the theory was dismissed. The ‘but verify’ part of that has kind of gone away in recent years. In science the primary measure of whether you were successful or not was to publish – citations to your work are key. Then when you do publish your content is locked down in costly journals instead of available in the public domain. So if you pay large amounts of money you can have access to the article – but not the data necessarily. Data is kept locked up more and more to keep the findings with the published person so that they get all the credit.

Just like in the talk earlier today on what Academia can learn from open source, Marcus showed us an article from the 17th century next to an article today – the method of publishing has not changed. Plus these articles are full of academese which is obtuse.

All of this makes it very important to show what’s in the black box. We need to show what’s going on in these experiments at all levels. This includes sharing your steps to run calculations – the source code used to get this info should be written in open source because now the tools used are basically notebooks with no version control system. We have to stop putting scientists on these pedestals and start to hold them accountable.

A great quote that Marcus shared from an Economist article was: “Scientific research has changed the world. Now it needs to change itself.” Another was “Publishing research without data is simply advertising, not science.” Scientists need to think more about licenses – they give their rights away to journals because they don’t pay enough attention to the licenses that are out there like Creative Commons.

What is open? How do we change these behaviors? Open means that everyone has the same access. Certain basic rights are granted to all – the ability to share, modify and use the information. There is a fear out there that sharing our data means that someone could prove that we’re wrong or stupid. We need to change this culture. We need more open data (shared in open formats), more use of open source software, more open standards and more open access.

We need to push boundaries – most of what is published is publicly funded so it should be open and available to all of us! We do need some software to share this data – that’s where we come in and where open source comes in. In the end the lesson is that we need to get scientists to show all their data and not reward academics solely for their citations because this model is rubbish. We need to find a new way to reward scientists though – a more open model.

ATO2014: Open Source in Healthcare

Luis Ibanez, my fellow opensource.com moderator, was up next to talk to us about Open Source in Healthcare. Luis’s story was so interesting – I hope I caught all the numbers he shared – but the moral of the story is that hospitals could save insane amounts of money if they switched to an open system.

There are 7 billion people on the planet making $72 trillion a year. In the US we have 320 million people and that’s 5% of the global population, but we make 22% of the economic production on the planet – what do we do with that money? 24% of that money is spent on healthcare ($3.8 trillion) – not just the government, this is the spending of the entire country. This is more than they’re spending in Germany and France. However we’re ranked 38th in healthcare quality in the world. France is #1 however and they spend only 12% of their money on healthcare. This is an example of how spending more money on the problem is not helping.
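Since I wasn’t sure I caught all the numbers, here is a quick check of how they fit together. This is just arithmetic on the figures as I wrote them down (the variable names are mine), not data from Luis’s slides.

```python
# Figures as noted from the talk; all names here are my own.
world_population = 7_000_000_000
world_output_usd = 72e12          # $72 trillion a year
us_population = 320_000_000
us_share_of_output = 0.22         # 22% of global economic production
us_healthcare_share = 0.24        # 24% of that spent on healthcare

us_population_share = us_population / world_population
us_output_usd = us_share_of_output * world_output_usd
us_healthcare_usd = us_healthcare_share * us_output_usd

print(f"US share of world population: {us_population_share:.1%}")
print(f"US economic output: ${us_output_usd / 1e12:.2f} trillion")
print(f"US healthcare spending: ${us_healthcare_usd / 1e12:.1f} trillion")
```

The numbers are consistent with each other: 320 million out of 7 billion is a bit under 5%, and 24% of 22% of $72 trillion comes out at about $3.8 trillion.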

Is there something that geekdom can do to set this straight? Luis says ‘yes!’

So, why do we go to the doctor? To get information. We want the doctor to tell us if we have a problem they can fix and know how to fix it. Information connects directly to our geekdom.

Today if you go to a hospital your data will be stored on paper and will go into a “data center” (a filing cabinet). In 2010 84% of hospitals were keeping paper records versus using software. Healthcare is the only industry that has needed to be paid to switch to using software to store this information – $20 billion was spent between 2010 and 2013 to get us to 60% of hospitals storing information electronically. This is one of the reasons we’re spending so much on healthcare right now.

The problem here (and this is Luis’s rant) is that the hospitals have to pay for this software in the first place. And you’re not allowed to share anything about the system. You can’t take screenshots, you can’t talk about the features, you are completely locked down. This system will run your hospital (a combination of hotel, restaurant, and medical facility) – they have been called the most complex institution of the century. These systems for a 400 bed hospital cost $100 million – and they have to buy these systems with little or no knowledge of how they work because of the security measures around seeing/sharing information about the software. This is against the idea of a free market because of the NDA you have to sign to see the software and use the software.

An example that Luis gave us was Wake Forest hospital which ended up being in the red by $56 million. All because they bought software for $100 million – leading to them having to fire their people, stop making retirement payments and make other cuts. [For me this sounds a lot like what libraries are doing – paying large sums for an ILS instead of saving money on the ILS and putting it toward people and services]

Another problem in the medical industry is that 41% (less than 1/2) have the capability to send secure messages to patients. This is not a technology problem – this is a cultural problem in the medical world. Other industries have solved this technology problem already.

So, why do we care about all of this? There are 5,723 hospitals in the US, 211 of them are federally run (typically military hospitals), 413 are psychiatric, 2,894 are non profits and the others are private or state run. That totals nearly 1 million beds and $830 billion a year is spent in hospitals. The software that these hospitals are buying costs about $250 billion.

The federal hospitals are running a system that was released into the public domain called VistA. OSEHRA was founded to protect this software. This software, though, is written in MUMPS. This is the same language that the $100 million software is written in! Except there is a huge difference in price.

If hospitals switched they’d spend $0. To keep this software running/updated we’d need about 20 thousand developers – but if you divide that by the number of hospitals that’s about 4 developers per hospital. These developers don’t need to be programmers though – they could be doctors, nurses, pharmacists – because MUMPS is so easy to learn.
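The hospital counts and the developer estimate can be checked the same way. Again this is only arithmetic on the figures in my notes, with variable names of my own choosing.

```python
import math

# Hospital counts as noted from the talk.
total_hospitals = 5_723
federal = 211
psychiatric = 413
non_profit = 2_894

# "The others are private or state run."
private_or_state = total_hospitals - federal - psychiatric - non_profit

# Developers needed to keep the open system running, spread over all hospitals.
developers_needed = 20_000
developers_per_hospital = developers_needed / total_hospitals

print(f"Private or state run hospitals: {private_or_state}")
print(f"Developers per hospital: {developers_per_hospital:.2f}")
print(f"Rounded up: {math.ceil(developers_per_hospital)}")
```

The exact division gives roughly 3.5 developers per hospital, so "4 per hospital" is the rounded up figure.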

ATO2014: Open Source & the Internet of Things

Erica Stanley was up next to talk to us about Open Source and the Internet of Things (IoT).

The Internet of Things (Connected Devices) is the connection of things and people over a network. Why the Internet of Things? Why now? Because technology has made it a possibility. Why open source Internet of Things? To ensure that innovation continues.

Some of the applications we have for connected devices are: Health/Fitness, Home/Environment and Identity. Having devices that are always connected to us allows us to do things like monitor our health so that we can see when something might be wrong before we feel symptoms. Some devices like this are vision related (Google Glass), smart watches, wearable cameras, wristbands (Fitbit), smart home devices (some of which are on my wishlist), connected cars (cars that see that the car in front of you has stopped versus slowed down) and smart cities like Raleigh.

There are many networking technologies these devices can use to stay connected, but Bluetooth seems to be the default that is being used. There is a central device and a peripheral device – the central device wants the data that the peripheral device has. They use Bluetooth to communicate with each other – the central device requesting info from the peripheral.
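The central/peripheral relationship can be sketched as a toy model in plain Python. To be clear, this is not a Bluetooth library and none of it came from Erica’s slides: the `Peripheral` and `Central` classes and the sample readings are hypothetical, and they only show which side holds the data and which side asks for it.

```python
from dataclasses import dataclass, field


@dataclass
class Peripheral:
    """Holds the data, like a fitness wristband (toy model, not a Bluetooth API)."""
    name: str
    characteristics: dict = field(default_factory=dict)

    def read(self, key: str):
        """Answer a read request; unknown keys return None."""
        return self.characteristics.get(key)


@dataclass
class Central:
    """Wants the data, like a phone; it initiates every request."""
    name: str

    def request(self, peripheral: Peripheral, key: str) -> dict:
        """Ask the peripheral for one value and wrap the answer."""
        return {"from": peripheral.name, "key": key, "value": peripheral.read(key)}


# Made-up readings, for illustration only.
wristband = Peripheral("wristband", {"heart_rate": 72, "steps": 4210})
phone = Central("phone")

reading = phone.request(wristband, "heart_rate")
print(reading)
```

The point of the split is that the peripheral can stay simple and low powered: it only answers when the central device asks.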

Cloud computing, another important technology, has been one of the foundations for the Internet of Things – this is how we store all the info we’re passing back and forth. As we get more ability for our devices to learn we get more devices that can act on the data they’re gathering (there is a fitness app/device that will encourage you to get up and move once in a while for example).

Yet another technology that’s important is augmented reality, which shows us the results of data in our day to day lives (Google Glass showing you the directions to where you’re walking).

One challenge facing us is the fact that we have devices living in silos. So we have Google devices and Samsung devices – but they don’t talk to each other. We need to move towards a platform for connected devices. This will allow us to have a user controlled and created environment – where the devices I want to talk to each other can and the people I want to see the data can see the data. This allows us to personalize our environment but also secure our environment.

Speaking of security, there are some guidelines for developers that we can all follow to be sure to create secure devices. When building these devices we want to think about security from the very beginning. We need to understand our vulnerabilities, build security from the ground up. This starts with the OS so that we’re building an end-to-end solution. Obviously you want to be proactive in testing your apps and use updated APIs/frameworks/protocols.

Some tools you can use to get started as far as hardware: Arduino Compatible devices (Lilypad, Adafruit Flora and Gemma), Tessel, and Metawear. Software tools include: Spark Core, IoT Toolkit, Open.Sen.se, Cloud Foundry, Eclipse IoT Tools, and Huginn (which is kind of an open source IFTTT).

One thing to keep in mind when designing for IoT is that we no longer own the foreground – we might not have a screen or a full sized screen. We also have to think about integration with other devices and discoverability of functionality if we don’t have a screen (gesture based device). Finally we have to keep in mind low energy and computing power. On the product side you want to think about the form factor – you don’t want a device that no one will want to wear. This also means creating personalizable devices.

Remember that there is no ‘one size fits all’ – your device doesn’t have to be the same as others that are out there. Try to not get in the way of your user – build for people not technology! If we don’t try to take all of the user’s attention with the wearable then we’ll get more users.

ATO2014: How Raleigh Became an Open Source City

Next up was Jason Hibbets and Gail Roper who gave a talk about the open source initiative in Raleigh.

Gail started by saying ‘no one told us we had to be more open’. Instead there were signs that showed that this was a good way to go. In 2010 Forbes labeled Raleigh one of the most wired cities in the country, but what they really want is to be the most connected city in the country.

Raleigh has 3 initiatives: open source, open data, and open access – the city wants to get gigabit internet connections to every household. So far they have a contract with AT&T and they are working with Google to see if Raleigh will become a Google Fiber city.

The timeline leading up to this though required a lot of education of the community about what open meant. It didn’t mean that before this they were hiding things from the community. Instead they had to teach people about open source and open access. There were common stereotypes that the government had about open source – the image of a developer in his basement being among them.

Why did they do this? Why do they want to be an open city? Because of SMAC (Social, Mobile, Analytics, Cloud). Today’s citizens expect that anywhere on any device they should be able to connect to the web. Government organizations like Raleigh’s will have 100x the data to manage. So providing a government that is collaborative and connected to the community becomes a necessity not an option.

“Empowerment of individuals is a key part of what makes open source work, since in the end, innovations tend to come from small groups, not from large, structured efforts.” -Tim O’Reilly

Next up was Jason Hibbets who is the team lead on opensource.com by day and by night he supports the open Raleigh project. Jason shared with us how he helped make the open Raleigh vision a reality. He is not a coder, but he is a community manager. Government to him is about more than putting taxes in and getting out services – it’s about us – the members of the community.

Jason discovered CityCamp – a government unconference that brings together local citizens to build stronger communities where they live. These camps have allowed people to come together to share their ideas openly. Along the way the organizers of this local CityCamp became members of Code for America. Using many online tools they have made it easy to communicate with their local brigade and with others around the state. There is also a meetup group if you’re in the area. If you’re not local you can join a brigade in your area or start your own!

Jason has shared his story in his book The Foundation for an Open Source City.

ATO2014: What Academia Can Learn from Open Source

Arfon Smith from GitHub was up to talk to us about Academia and open source.

Arfon started with an example of a shared research proposal. You create a document and then you edit the filename with each iteration because word processing applications are not good at tracking changes and allowing collaboration. Git though is meant for this very thing. So he showed us a book example on GitHub where the collaborators worked together on a document.
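To show what "tracking changes" means in practice, here is a small sketch using Python’s standard `difflib` module. This is not how Git is implemented and it isn’t from Arfon’s talk; the two drafts are made up. It just illustrates the line-by-line record of changes that a version control system keeps for you, instead of a new filename per iteration.

```python
import difflib

# Two made-up iterations of the same proposal.
draft_v1 = [
    "Research proposal",
    "We will survey 100 galaxies.",
    "Budget: to be decided.",
]
draft_v2 = [
    "Research proposal",
    "We will survey 250 galaxies.",
    "Budget: to be decided.",
    "Data will be released openly.",
]

# A unified diff records exactly which lines were removed (-) and added (+).
diff = list(
    difflib.unified_diff(
        draft_v1,
        draft_v2,
        fromfile="proposal (before)",
        tofile="proposal (after)",
        lineterm="",
    )
)

# Skip the "---" / "+++" file header lines when collecting changes.
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]

print("\n".join(diff))
```

Every collaborator can see what changed between the two drafts without opening both files side by side.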

In open source there is this ubiquitous culture of reuse. Academia doesn’t do this – but why not? The problem is the publishing requirement in academia. The first problem is that ‘Novel’ results are preferred. You’re incentivized to publish new things to move ahead. The second problem is that the value of your citation is more powerful than the number of people you’ve worked with. And thirdly, and more generally, the format sucks. Even if it’s an electronic document it’s still hard to collaborate on it (see the document example above). This is state of the art technology … for the late 17th century. (Reinventing Discovery).

So, what do open source collaborations do well? There is sometimes a difference between open source and open source collaborations, and this is an important distinction. Open source is the right to modify – it’s not the right to contribute back. Open source collaborations are highly collaborative development processes that allow anyone to contribute if they show an interest. This brings us back to the ubiquitous culture of reuse. These collaborations also expose the process by which they work together – unlike the current black box of research in academia.

How do we get 4000 people to work together then? Using git and GitHub specifically you can fork the code from an existing project and work on it without breaking other people’s work and then when you want to contribute it back you submit a pull request to the project. The beauty of this is ‘code first, permission later’ and every time this process happens the community learns.

The goal of a contribution on GitHub is to get it merged into the product. Not all open source projects are receptive to these pull requests though, so those are not the collaborative types of projects.

Fernando Perez: “open source is .. reproducible by necessity.” If you don’t collaborate then these projects wouldn’t move forward – so they need to be collaborative. The difference in academia is that you have to work alone and in a closed fashion to move ahead and get recognition.

Open can mean within your team or institution – it doesn’t have to be worldwide like in open source. But making your content electronic and available (which does not mean a Word doc or email) makes working together easier. Academia can learn from open source – more importantly academia must learn from open source to move forward.

All the above seems kind of negative, but Arfon did show us a lot of examples where people are sharing in academia – we just need to get this to be more widespread. Where might more significant change happen? The most obvious place to look is where communities form – like around a shared challenge – or around shared data. Science and big data are where we’re going to see this more hopefully.

There are challenges still though – so how do we make sharing the norm? The main problem is that academia rewards ‘credit’ – that is, articles written solely by you. Tools like Astropy are hugely successful on GitHub, but the authors had to write a paper about it to get credit. The other issue is trust – academics are reluctant to use other people’s stuff because we don’t know if their work is of value. In open source we have solved this problem already – if the package was downloaded thousands of times it’s probably reliable. There are also tools like Code Climate that give your code a grade.

In short the barriers are cultural not technical!

ATO2014: Using Bootstrap to create a common UI across products

Robb Hamilton and Greg Sheremeta from Red Hat spoke in this session about Bootstrap.

First up was Robb to talk about the problem. The problem that they had at Red Hat was that they had a bunch of products that all had their own different UI. They decided that as you went from product to product there should be a common UI. PatternFly was the initiative to make that happen.

Bootstrap was the framework they chose for this solution. Bootstrap is a front end framework for apps and websites. It’s comprised of HTML, CSS, JavaScript and an icon font (for resolution independent icons). Of course Bootstrap is open source and it’s the most popular project on GitHub. Bootstrap is mobile-first and responsive – design for the smallest screen first and then as the screen gets bigger you can adjust. Bootstrap has a lot of components like a grid, drop down menus, fonts, and form elements. So the answer to ‘Why Bootstrap’ seems obvious now. But one reason that Red Hat chose it was that most everyone was already using it in their products.

PatternFly is basically Bootstrap + extra goodness.

Up next was Greg to talk about using PatternFly on his project – oVirt. First, when you have to work with multiple groups/products you need good communication. The UI team was very easy to reach out to, answering questions in IRC immediately and providing good documentation. One major challenge that Greg ran into was having to write the application in a server-side language and then get it to translate to the web languages that PatternFly was using.

Greg’s favorite quote: “All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections” – David Wheeler. So he needed to come up with a layer of indirection to get from his language to Bootstrap. He Googled his problem though and found a library that would work for him.

ATO2014: Modern Applications & Data

All Things Open

Dwight Merriman from MongoDB was up next to talk to us about modern applications and data.

We’re not building the same things that we were before – we’re building a whole new class of applications that just didn’t exist before. When creating an app these days you might use pieces from 12 other applications. If you had to do this with closed source software this would be very difficult. Open source makes modern applications possible – otherwise you have 12 other things to go buy to make your application work.

We’re in the midst of the biggest technology change in the data layer in 25 years. We talk about big data and this is all part of it. One of the differences is the shape of the data. It’s not all tabular data anymore. The new tools we’re creating today are very good at handling these new shapes. Saying ‘unstructured data’ is inaccurate – it’s dynamic data – hence the word ‘shape’.
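Here is a minimal sketch of what "shape" means, using plain Python dictionaries. This is my own illustration with made-up sample data, not MongoDB code and not an example from Dwight’s talk. Tabular rows all share the same columns, while each document below carries its own set of fields.

```python
# Tabular data: every row has the same columns in the same order.
tabular_rows = [
    ("tractor-1", "2014-10-22", 41.5),
    ("tractor-2", "2014-10-22", 38.0),
]

# Dynamic data: each document has its own shape (made-up sample records).
documents = [
    {"device": "tractor-1", "temp_c": 41.5},
    {"device": "tractor-2", "temp_c": 38.0, "gps": {"lat": 35.78, "lon": -78.64}},
    {"device": "bridge-cam", "speeds_kmh": [48, 62, 55]},
]


def shape(doc: dict) -> tuple:
    """Return the sorted top-level field names: the document's 'shape'."""
    return tuple(sorted(doc))


shapes = {shape(doc) for doc in documents}

# Querying still works when a field is missing from some documents.
speeding = [
    doc["device"]
    for doc in documents
    if any(speed > 60 for speed in doc.get("speeds_kmh", []))
]

print(len(shapes), "distinct shapes")
print("devices that saw speeds over 60 km/h:", speeding)
```

None of the documents is unstructured. Each one has a clear structure, it just isn’t the same structure for every record.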

Speed is another aspect of this. Everything is real-time now – you don’t want to wait overnight for your report anymore. As developers building systems we need to start with a real-time mentality. While this sounds logical, it’s actually a big change from the way we were taught, which was to do things in batches. These days computers are a lot faster, so if you can do it in real time it’s a lot better.
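The batch versus real-time difference can be sketched with something as simple as an average. This is my own illustration (the `RunningMean` class and the readings are hypothetical): the batch version needs all the data before it can answer, while the running version has an up to date answer after every new value.

```python
class RunningMean:
    """Keeps an average that is updated as each value arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one new value into the mean and return the current mean."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Made-up sensor readings.
readings = [10.0, 20.0, 30.0, 40.0]

# Batch mentality: wait until all the data is in, then compute once.
batch_mean = sum(readings) / len(readings)

# Real-time mentality: an answer is available after every reading.
live = RunningMean()
live_means = [live.update(reading) for reading in readings]

print("batch:", batch_mean)
print("live: ", live_means)
```

Both approaches end at the same number, but only the second one could have driven a dashboard while the data was still coming in.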

We also need to think about our approach to writing code these days – this has changed a lot from how we were taught years ago. It’s not just about writing the perfect spec anymore, it’s a lot more collaboration with the customer. Iteration is necessary now – look at how Facebook changes a tiny bit every day.

Dwight then shared with us some real world examples from John Deere, Bosch and Edeva. Edeva is doing some interesting things with traffic data. They have built a technology that will see your speed when you’re driving over this one bridge in Sweden, and if you’re going over the speed limit it will create speed bumps specifically for you. That’s just one way they’re putting their data to use in a real life scenario.

“There’s new stuff to do in all domains – in all fields – and we have the tools to do them now.”

ATO2014: Open Source – The Key Component of Modern Applications

Jeffrey Hammond from Forrester Research started this morning with a talk about Open Source – The Key Component of Modern Applications. Jeffrey wants to talk to us about why open source matters. It’s the golden age to be a developer. If you have people who work for you who are developers you need to understand what’s going on in our space right now. The industry is changing drastically.

When you started a software company years ago it would cost $5 to $10 million. Today software innovation costs about 90% less than it used to. This is because of a variety of things including: elastic infrastructure, services that we can call upon, managed APIs, open source software, and a focus on measurable feedback. Open source is one of the key parts of this. It is one of the driving forces of modern application development. In 2014 4 out of 5 developers use or have used open source software to develop or deploy their software.

The traits of modern applications show why we expect to see more and more open source software everywhere. One of those traits is the API. Another is asynchronous communication – a lot of the traditional frameworks that developers are used to using are not conducive to this so we’re seeing new frameworks and these are open source. We’re seeing less and less comparison of open source versus proprietary and more open source compared to open source.
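As a small illustration of asynchronous communication, here is a sketch using Python’s standard `asyncio` module. It is not from Jeff’s slides: the service names and delays are made up, and `asyncio.sleep` stands in for a real network call. The three calls overlap instead of running one after another.

```python
import asyncio


async def call_service(name: str, delay: float) -> str:
    """Pretend to call a remote service; the sleep stands in for network time."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def main() -> list:
    # All three calls are in flight at once. gather() returns the results
    # in the order the calls were listed, not the order they finished.
    return await asyncio.gather(
        call_service("recommendations", 0.03),
        call_service("profile", 0.01),
        call_service("billing", 0.02),
    )


results = asyncio.run(main())
print(results)
```

Because the waits overlap, the total time is close to the slowest call rather than the sum of all three, which is why this style suits applications that talk to many APIs.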

Jeff showed us Netflix’s engagement platform and how every part of their system is built on open source. Most of the popular tools out there have this same architecture built on open source.

This development is being driven by open source communities, what Jeff calls collaborative collectives. Those of us looking to hire developers need to restructure to use the power of these collectives.

When asked if they write code on their own time 70% of developers say they do. That desire to write code on your own time is built on a variety of motives, and all those motives represent intrinsic motivation – it makes them feel good. Of those developers, a little over 1 in 4 contribute to open source projects on their own time. So, if you’re looking to hire productive developers, Jeff says there is a direct correlation between those who participate in open source and those who are amazing and productive programmers.

I’d add here that we need to educate the next generation in this model better so that they can get jobs when they graduate.

We are in a generational technology shift – web-based applications are very different from the systems that have come before them. The elasticity of open source licenses makes them the perfect fit for these new modern architectures, and it comes naturally to most developers. Open source projects are driving the formation of groups of people who know how to work collaboratively successfully.