Talk 1 Embracing Cloud Deployment for Big Data and DevOps
Steven Woodward, Technology Incubation Lab, CTO, AstraZeneca
Background: At the time of writing, AstraZeneca is the 7th largest pharmaceutical company in the world. Founded in 1999, the British-Swedish multinational has a portfolio of products which address a wide range of illnesses. The research and development of these products alone generates vast volumes of data, including a significant percentage of unstructured and even multimedia (e.g. video) content.
Steven describes the challenge of indexing (i.e. making searchable) 200 million+ unstructured documents spread across disconnected silos of content.
This takes place in AWS.
– Enterprise search
– Serving business applications
– Search for chemical structures after turning the graphical description of a structure into something searchable
– Search for documents which relate (chemically, for example) to documents containing a particular compound
– People search based on skills e.g. genomics, Java
The first four items above have required increasing levels of sophistication, introduced over a number of years.
To support an exponential growth in data volume, the organisation needed to move to using containers:
– AWS + docker
– Elastic cloud provided the required scalability and proved affordable
The company also creates many thousands of videos, which (for example) result from patient interviews. The infrastructure now supports in-video search, which can search video content as well as video metadata. The video is transcribed automatically. The information is then categorised and made available to the search engine. Search can also jump directly to the point where a particular term or expression is used.
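The talk did not go into how the in-video search is implemented, so the following is only a minimal sketch of the general idea: if the automatic transcription yields timestamped segments, an inverted index from words to (video, start time) pairs is enough to jump to the point where a term is spoken. The transcript data, the video identifiers and the function names `build_index` and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercased word to the (video_id, start_seconds) pairs where it is spoken."""
    index = defaultdict(list)
    for video_id, segments in transcripts.items():
        for start_seconds, text in segments:
            # Use a set so a word repeated within one segment is indexed once.
            for word in set(re.findall(r"[a-z']+", text.lower())):
                index[word].append((video_id, start_seconds))
    return index


def search(index, term):
    """Return the sorted (video_id, start_seconds) hits for a single term."""
    return sorted(index.get(term.lower(), []))


# Hypothetical transcript segments: (start time in seconds, transcribed text).
TRANSCRIPTS = {
    "interview_01": [
        (0.0, "Thank you for joining us today"),
        (12.5, "The headache started after the second dose"),
        (47.0, "The headache was gone by the evening"),
    ],
    "interview_02": [
        (5.0, "I did not notice any headache at all"),
    ],
}

INDEX = build_index(TRANSCRIPTS)
hits = search(INDEX, "headache")
```

Each hit carries the offset a video player would seek to. A production system would presumably add stemming, phrase queries and ranking on top of this.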
DevOps: Code changes can be made available to the staging environment within 3 minutes. [From my experience of the Pharma industry, this is quite an impressive achievement which demonstrates an agility that few comparable corporations can boast.]
Future: Wearables, beacons (Eddystone!) and temperature sensors will all be put to use in patient environments whilst paying attention to quality of life (QoL). Retention (or otherwise) of QoL can now be measured objectively using this tech. However, it will also generate vastly greater volumes of data.
Artificial Intelligence: One envisaged application of AI is to provide (automated) alerts when something anomalous takes place with a patient.
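No detail was given on how such alerts would work. As a rough illustration only, one of the simplest approaches is to flag a sensor reading that lies many standard deviations away from the readings immediately before it. The function name `find_anomalies`, the window size, the threshold and the heart-rate values below are all assumptions made for this sketch, not anything presented in the talk.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) for readings far from the mean of the preceding window."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against; skip it.
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            alerts.append((i, readings[i]))
    return alerts


# Hypothetical heart-rate samples (beats per minute) from a wearable.
HEART_RATE = [72, 74, 71, 73, 72, 73, 118, 74, 72]

alerts = find_anomalies(HEART_RATE)
```

Here only the reading of 118 is flagged. Note that the spike then inflates the baseline for the next few readings, which is one reason real systems tend to use more robust statistics or learned models.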
Predictive Modelling: Machine learning, too, will play an increasingly important role.
Great insight into the current activities and thinking of a leading biotech research company. Would love to have heard more detail about AI and machine learning progress and plans.
Talk 2 BI and Analytics in WWF
Samwell Harper, Technology and Innovation Manager, WWF International
WWF collects a lot of species data. WWF wants to use this data to inform actions and policies, which involves addressing some 6500+ employees.
WWF is the world’s largest conservation organisation. Organisations located all over the world. It’s a federated network, which means that the separate organisations collect data in their own ways. The mission is the common theme, but how each one operates is a local decision.
Global Insight team responsible for consolidating data from everyone else. “Converts” local data to make it globally accessible.
WWF is only now beginning to recognise data as the new currency. Data to date has not been valued as much as it should have been.
WWF is now combining pictures with data, thus reflecting the increasing importance of the data.
Data is highly diverse in terms of both structure and content, which makes it difficult to apply data warehouse or machine learning tech to.
Samwell then says that WWF’s thinking on how to process the data is still immature. He therefore won’t elaborate further. [Shame!]
Instead he will focus on WWF’s user community, which is extremely diverse. This means one kind of report won’t cut it.
Report Examples: See slides. Includes financial, funding as well as conservation data.
Questioner asked which tech is involved. Answer: Microsoft stack: SQL Server, Azure, Tableau Analytics. However, no further details are provided. [Again: a bit of a shame.]
Interesting, as far as conservation goes. However, virtually no information was revealed regarding BI, which was in the title of the presentation. The technologies involved were only mentioned briefly by name (after prompting), which from my perspective was a bit of a lost opportunity. Despite this, keep up the great work WWF!
I’m going to take a look at the exhibition and then switch tracks for a bit. Over to Enterprise Mobility…
Talk 3 How Effective is Your Enterprise Strategy?
Anj Latif, Head of IT, British Pearl
Trend: Rapidly increasing numbers of employees out of the office. Why? Companies want employees to be more productive. [A bit obvious, no?]
Companies are embracing mobile devices. [I think I know where this is going…]
Enterprise mobility is going to happen. [If there’s somebody in the audience that doesn’t know this already, they’re at the wrong conference.]
Calor Gas (in collaboration with RNF) has been doing enterprise mobile for years now. Enterprise mobile may be new to some, but it certainly isn’t “going to happen in the future”; it’s been happening for some time. Still, perhaps it’s news for the speaker. His enthusiasm, to be sure, is spot on.
After that unsuccessful change of track, back to IoT and Analytics…
Cloud vs Edge – Centralised, Distributed and Hybrid Computing
Renaud Di Francesco, PhD, Director Europe Technology Standards Office, Sony Europe Ltd
Renaud gives us a brief history recap and observes that the argument between centralised and decentralised has been going back and forth for decades and will continue to do so. The paradigms can be compared in the contexts of storage, architecture, processing and networking:
He points out the suffering we experience when data is centralised. Many scenarios require the infamous Excel export/import cycle in order to get a handle on the data. He describes this [not unjustifiably IMO] as a feudal setup.
Following the IBM PC revolution, we’re back to centralised in the form of the cloud. Star-shaped topology and massive energy consumption: 30 full-time nuclear power plants. [Someone’s thinking about the environment: first mention other than the WWF guy. Shame on us!]
Renaud draws our attention to emerging standards, which are designed to support a new world, which will see centralised systems yielding to the edge i.e. mobile:
The Mobile Edge Computing (MEC) standard, for example, recognises that we are all essentially walking around with supercomputers in our hands (at least by the standards of a few years ago). Consequently there are potentially many problems which would benefit from performing processing tasks closer to the cellular customer, thereby reducing network traffic, improving app performance and reducing server load. More info about this technology and its strategic relevance here.
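To make the network-traffic argument concrete, here is a small sketch of my own (it is not part of the MEC standard, and was not shown in the talk): instead of uploading every raw sensor sample, the device computes a summary locally and sends only that. The sample values, the JSON encoding and the function names `summarise_on_device` and `payload_size` are assumptions chosen purely for illustration.

```python
import json


def summarise_on_device(samples):
    """Reduce a batch of raw samples to a few aggregate values, computed locally."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
    }


def payload_size(obj):
    """Size in bytes of the object when serialised as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


# Hypothetical batch: 600 temperature samples, e.g. one per second for ten minutes.
SAMPLES = [20.0 + (i % 10) * 0.1 for i in range(600)]

raw_bytes = payload_size(SAMPLES)           # cost of uploading everything
summary = summarise_on_device(SAMPLES)
summary_bytes = payload_size(summary)       # cost of uploading the summary only

print(f"raw: {raw_bytes} bytes, summary: {summary_bytes} bytes")
```

The trade-off is that the server no longer sees the raw data, and the device spends some of its own CPU and battery doing the aggregation.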
Then we have an ever-so-slightly cynical take on analytics:
[My interpretation of Renaud’s point is that the economics of analytics are set to change with the introduction of MEC and/or similar standards. But I must confess to being a little fatigued at this point.]
Recommends his own books, which discuss big data, its market places, the economics involved. I do not interpret this as a shameless plug, however, as the books look genuinely highly interesting and they are related to the topic he’s entertainingly presented to us today:
Renaud’s reminder of the oscillation between centralisation and decentralisation is timely (irony: EU referendum is June 23rd). With the typical mobile developer’s emphasis being on utilising RESTful services to invoke server-power, we’re perhaps neglecting the potential of the device itself to handle more complex tasks. Our stance is justified, I feel, by the fact that we generally try to conserve resources in the mobile device, thereby prolonging battery life. Consequently, this driver is in direct conflict with the MEC initiative. I suspect the battle for control will continue for some time to come…
The marketing of the future: gaining new business with new data and analytics
Professor Peter Gentsch, director CRM Aalen University
Herr Gentsch asks the audience to guess the identity of a “celebrity” on the basis of a handful of keywords e.g. nationality, gender, marital status. The person in question turns out to be none other than Prince Charles. He then – provocatively – asks the audience to think of someone else who fits the same description – and it turns out to be former Black Sabbath frontman Ozzy Osbourne. This is obviously quite amusing, but the point is well made: The typical attributes used today to “characterise” identities are surely insufficient for profiling in marketing.
Potential solution to the problem: Associate the identity with a Facebook account and this gives access to a much richer set of attributes, thereby enabling much more fine-grained demographics. In short: Facebook, LinkedIn etc. can be a very reliable source of differentiable data.
The speaker argues that using new data and new analytics will bring the biggest benefits. We can see this with the advent of Deep Blue (the first chess computer to beat a world champion chess player), Watson (Jeopardy, 2011), AlphaGo (2015).
The game changer in all these cases is deep learning. Gentsch observes that deep learning is not necessarily a new approach, but the ability to massively parallelise the computation means we can get much more out of deep learning than was previously possible.
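The point about coarse versus rich attributes can be sketched in a few lines. In the example below, which is my own illustration rather than anything from the slides, two profiles are indistinguishable on the classic demographic keys but share nothing once a richer attribute set is compared. The two profiles, the `interests` values and the function names `coarse_key` and `jaccard` are all hypothetical.

```python
def coarse_key(profile):
    """The classic demographic segment: nationality, gender, marital status."""
    return (profile["nationality"], profile["gender"], profile["marital_status"])


def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical profiles; the interests stand in for social-network attributes.
PERSON_A = {
    "nationality": "British",
    "gender": "male",
    "marital_status": "married",
    "interests": {"gardening", "architecture", "polo"},
}
PERSON_B = {
    "nationality": "British",
    "gender": "male",
    "marital_status": "married",
    "interests": {"heavy metal", "touring", "reality tv"},
}

same_segment = coarse_key(PERSON_A) == coarse_key(PERSON_B)
interest_similarity = jaccard(PERSON_A["interests"], PERSON_B["interests"])
```

With these made-up values the two people land in the same coarse segment, yet their interest similarity is zero, which is exactly the kind of differentiation the richer data is supposed to provide.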
He identifies different types of AI (3 types); we’re only at level 1 (aka “Weak AI”):
Looking at exponential trends (Moore’s law etc.), by 2020 AI will reach the level of one human [cue dramatic music. But seriously: this is impressive and/or scary, depending on your viewpoint].
Deep learning automatically detects new leads. Topics have a much stronger differentiation.
In the case study presented here (a German firm called Berner), exploiting deep learning resulted in: 6x opportunities; 22% increase in conversion rate; identification of unexpected audiences.
Another possibility: Use AI to automatically engage with customers.
The speaker then presents an example where this trend recently went horribly wrong: The infamous Tay chatterbot, which “intelligently” learned from the humans it interacted with that an infamous 20th century dictator and mass murderer “wasn’t all that bad” – as well as several other very bad ideas – before Microsoft pulled the plug.
This was a great reminder that the potential to make a dramatic positive difference to marketing results is available to us here and now. However, it was also a reminder that with great power comes great responsibility (apologies for the cliché, but sometimes this cannot be avoided).
Defining a new profession of the Data Scientists to address critical skills gap in European research and industry
Yuri Demchenko, Research Lead, System and Network Engineering, EU Commission funded project
The McKinsey report (amongst others) highlights a huge shortage of data scientists.
EDISON taxonomy/vocabulary is designed to assist in the furtherance of the data science discipline.
Demchenko provides us with a concise definition of Data Scientist:
Transcribed, for your convenience:
A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.
Excuse me for suggesting that this definition was arrived at by committee. Nevertheless, the gist is clear.
Additional competence areas will be required e.g. data management:
The speaker notes that up to 40% of data-related problems are domain-specific. There is currently no standard model to map between domains and this is a gap that needs to be addressed.
Question from the audience: “Might we envisage a new kind of qualification – the MDS (“Master of Data Science”) or similar – taking on an importance at least as great as that of the classical MBA?”. The questioner then makes a cynical comment about MBAs, which I take very personally indeed! But despite the insult, I think this is an absolutely excellent question. Data is trickier than the vast majority of people realise, yet we insist on gathering it in increasing volumes with the intent of generating insights.
Having worked for many years in both the pharmaceutical and banking industries, I have personally had the opportunity to witness a number of very serious misapplications of mathematics and statistics. In each case this involved well-meaning personnel attempting to take on complex real-world problems, but without having the required qualifications. The result: dodgy scientific conclusions, highly questionable business decisions, and absolutely no handle on the degree of confidence that one should have in any of the conclusions derived.
This can and must change if the world of science and business wants to profit from big data. We need experts and we need to begin investing in the data sciences now.
Big data analytics driven decision making: who makes the decisions, man or machine?
Richard Self, Research Fellow, Derby University
Richard points out that “little data” is also important e.g. a small number of harsh complaints about a restaurant are potentially very useful (for the restaurant) even if most reviews are excellent. However (the key point): determining the veracity of the data will be very hard for a machine.
[After two days of talks, I now confess to being too fatigued to take everything on board that Richard is presenting here. Even so, the main take-home is, I believe…]
Humans must make the decisions, machines should advise.
And one of the reasons for this is that neural networks don’t [yet] explain themselves.
The speaker presents a conclusion from the Standish CHAOS report, that 60% of all big data projects are failing today, which in light of the previous talk, is relatively easy to understand: Too few people understand how to extract value from big data.
Yet more reinforcement that we will be needing experts in order to get the most out of big data. But in addition, the speaker has reminded us that we will need to be very careful when handing over decisions to silicon-based entities. Problems will arise, in particular, when humans begin to experience AI on a daily basis, and begin to trust systems more and more. This will inevitably cause (some) people to stop thinking. Clearly there will be casualties. The question is: Will the benefits outweigh the disadvantages? Collectively, we need to ensure that they do.