As we approach KDD-2015, the largest and highest-quality conference on Data Mining, Data Science, and Knowledge Discovery, we want to introduce you to the amazing invited speakers we have lined up in the Industry and Government invited talks program. The program focuses on deployed, real-world applications in industry and government, with quantifiable value delivered. These talks present a rare opportunity to hear from the very best about the most exciting topics when it comes to building highly scalable platforms and deploying real applications. The speakers will share key insights from their experiences and present valuable lessons learned.

Our theme this year is Data Science and Big Data. This is a rapidly growing sector of our industry and promises to bring nothing less than one of the biggest disruptions to hit the Data and Analytics world since its inception. To give you an idea of what we are talking about, we draw on a recent article by Forbes that proclaimed the market will exceed USD $50 Billion by 2018. This market was barely $6 Billion in 2012! In addition, these figures do not account for the Analytics Industry, which is estimated to total USD $135 Billion in 2015! (See the Forbes article if you find the numbers intriguing.)

Whether or not the industry size estimates bear out, we believe that the Big Data revolution is upon us, and it will change everything. This technology is what enabled Google and other search companies to index the entire World Wide Web (or at least the visible part of it), which at last count had about 1 billion sites with 3 billion global users online (see: the Internet live stats). And now the same Big Data technology has been “democratized” and made available to all via the Hadoop open source initiative. So our list of invited speakers features Amr Awadallah, the co-founder and CTO of Cloudera, the biggest company that supports the open source releases of Hadoop. We also believe that the use of analytics technology on the cloud will be an essential part of tomorrow. For that, we bring to you Joseph Sirosh, Corporate Vice President at Microsoft responsible for the cloud offering of machine learning and data mining algorithms. Joseph left his critical position at Amazon to join Microsoft to launch these services. Microsoft recently bought Revolution Analytics, one of the largest supporters of the open source R project for statistical analysis.

This year, we are also focusing on Open Source as a theme, and we have invited several speakers to cover this important area and its impact on Analytics. Chris White ran the famous XDATA Program at DARPA, where he created the largest library of sophisticated open source analytics packages, before Microsoft recruited him away. He will tell us all about the treasures in this program. Bassel Ojjeh will share with us major lessons learned from leveraging open source and deploying large-scale analytics platforms in open source. He will also cover the issues you need to think about as you leverage the “free” open source software available for Analytics.

The full list of speakers includes luminaries and very senior execs from public companies that are major players in Analytics, including Adobe, LinkedIn, Visa, and Rocket Fuel, along with speakers who run high-impact applications.

In the next few days, we will blog about each of the speakers and the topics they will cover. So please stay tuned and track this blog. More importantly, we hope to see you in Sydney and have you partake in the opportunity to meet our speakers in person and to participate in the lively discussion that will take place.

Stay tuned, and see you in Sydney on August 10.

Usama and Rajesh, co-chairs of Invited Talks Track

Usama Fayyad (@usamaf), Chief Data Officer, Barclays

Rajesh Parekh (@rgparekh), VP Data Science, Groupon

Have you ever wondered what issues, challenges, and opportunities the Open Source movement brings to BigData in the enterprise? It is one thing to ‘play around’ with some free open source tools. It is a whole other thing to officially adopt the platform inside an organization where robustness, reliability, availability, etc. become serious challenges. What about support, upgrades, security, and patches? We have several invited talks in our track from leaders in this area on experiences, lessons learned, and advice.

The Big Data revolution has ushered in a large array of open source software tools and solutions. Open source big data software is seeing widespread adoption in large and medium-sized companies, both high-tech and traditional. Open source solutions have matured from cool technology that a few companies experiment with to robust platforms that are being used live in large-scale production systems. In addition, open source has become a tool for attracting and retaining top talent in data science and engineering. The younger generation of data scientists and engineers is proactively seeking out opportunities at organizations that have bought into the open source mantra.

Some of the best-known tools for Big Data are open source. Perhaps the most famous among these is Hadoop. Hadoop has become so pervasive that Big Data and Hadoop are often used synonymously. The rise of Hadoop has spawned an entire ecosystem of companies developing open source software and value-added services around it. Big Data technologies have matured to such an extent that robust open source tools are now available for the entire data spectrum, from distributed data storage and file systems, to databases and warehouses, to business intelligence and analytics, all the way up to data mining and knowledge discovery.

A key focus area of our Industry and Government Invited Talks at KDD 2015 is open source software and systems for Big Data Analytics. We are delighted to feature Dr. Amr Awadallah, Co-founder and CTO of Cloudera, the largest distributor/support company for Hadoop stack technology. Amr will talk about the impact of Hadoop and projects in the Hadoop ecosystem on big data management. Hadoop will play a very critical role in the data center of 2020 and is likely to benefit a very large spectrum of industries. He will also discuss the tools and technologies that are at risk of displacement or encroachment as the Hadoop tsunami rolls on.

Bassel Ojjeh, CEO of LigaDATA, will share his experiences and insights on powering real-time decision engines using open source software. Financial services and healthcare industries could be the biggest beneficiaries of Big Data solutions. However, they are the most cautious in adopting and leveraging the open source tools that are prevalent in the Big Data space. Bassel will talk about how his company has partnered closely with a big finance firm and a healthcare provider to build an open source real-time decision engine. This engine cleanly integrates with commonly used open source tools and systems such as Kafka for data streaming and HBase and Cassandra for NoSQL data storage. Further, Bassel will quantify the improvements in key metrics driven by this real-time decision engine and outline some unsolved problems and challenges that need to be addressed for much more widespread adoption of open source software.

Finally, Dr. Chris White will share with us his experience and lessons learned from running the largest and most well-known BigData project at DARPA. The XDATA project, a USD $100M effort, funded some of the most advanced security and data intelligence projects in BigData to create a search engine for the Dark Web, and Chris insisted on making all the work open source so as to maximize the benefit to the community. Chris ran this project until last June, when he decided to leave DARPA and join Microsoft. There were many learnings from this large project, which involved academia, startups, and large contractors and built up a great library of open source capabilities over BigData.

Here is the detailed Industry and Government Invited Talks program. We look forward to seeing you at the KDD conference in Sydney in August 2015. Come join the discussion, and learn from the leaders in these fields. In our next blog we will cover other focus areas and great speakers.

Rajesh Parekh (@rgparekh) and Usama Fayyad (@usamaf) co-chairs, Industry and Government Invited talks at KDD 2015

Large-Scale Machine Learning: How do we make the learning algorithms work on BigData? What does the Cloud have to offer here? Learn about this from the industry captains in these areas: Microsoft, LinkedIn, Adobe, and many others...

In this third blog in our series focusing on the distinguished invited talks for the KDD-2015 Conference held in Sydney, Australia starting August 10, 2015 (see the details of the invited talks at: KDD 2015 Industry Invited Talks series), we focus on Large-Scale Machine Learning. The previous two blogs covering other parts of the series can be found at: Overview and Open-Source.

Machine Learning algorithms have come a long way from handling a few hundred data points (for example, the Iris dataset) in the 1990s to billions of data points currently (for example, text classification and ad response prediction). This growth has been fueled by the tremendous advances in computing infrastructure and the development of algorithms that can rapidly train models over large data sets, and now BigData. These technologies have gone beyond the research labs and into the mainstream, with several large corporations (like Google, Facebook, Yahoo!, LinkedIn, Microsoft and Twitter) leading the charge in developing them and releasing them as open source packages and libraries.

Scaling up machine learning to handle BigData sets that are both very large in volume and include unstructured data from sources such as the web, mobile devices, and sensors is non-trivial. The KDD 2015 Industry and Government Invited Talks track features presentations from world-renowned experts who have built large-scale machine learning platforms and delivered successful applications using them.

Joseph Sirosh (@josephsirosh), Corporate Vice President at Microsoft, will present his team’s pioneering work on the intelligent cloud. He has built a commercial Machine Learning as a Service platform that provides cloud-hosted intelligent APIs for connected software applications. The platform is already being used for several cool applications including face analysis, speech recognition, and forecasting. Joseph will present his views on the key trends for the intelligent cloud and their implications for the future of data science.

Deepak Agarwal (@StatGuru1) will showcase LinkedIn’s best practices for scaling machine learning systems for applications like recommendations, search and computational advertising. These systems make an astronomical number of decisions every day on what to serve users when they are visiting the website and/or using the mobile app. The three main challenges in scaling Machine Learning methods are (a) scientific, (b) infrastructure, and (c) organizational. Deepak’s talk will leverage key examples from LinkedIn to show us how these challenges can be practically overcome.

Computational Advertising is one of the earliest and (possibly) the most commercially successful applications of Machine Learning. The landscape of digital advertising is extremely crowded with publishers, agencies, networks, and exchanges all vying for their share of the advertising dollar pool. We have brought you some of the top leaders and experts in this arena to share their learnings with you:

George John (@gjohn) co-founded Rocket Fuel in 2008 to apply large-scale machine learning to digital advertising. George will present a fast-paced overview of the business and technology context for Rocket Fuel from its inception through today and will discuss how mainstream customers were convinced to adopt complex technologies and algorithms that powered their entire digital marketing initiative.

Anil Kamath (@kamathanil) co-founded Efficient Frontier to apply large-scale optimization algorithms to Search Engine Marketing (SEM). Efficient Frontier evolved into a unified advertising platform to allow marketers to manage all of their digital advertising campaigns including SEM, display, and social media. Efficient Frontier was acquired by Adobe in 2011. Anil is now the VP and Technology Fellow at Adobe, where he is applying data science and optimization techniques to cross-channel data to attribute marketing effectiveness, drive media planning, and optimize ad campaigns in real time. Anil’s talk will focus on solutions that help Chief Marketing Officers understand the effectiveness of their campaigns and drive increased return on investment (ROI).

Robo Advisors are rapidly shaking up the investment world! They enable people to make very smart decisions with their money for an extremely low fee. But should you trust your money to a robot? Professor Vasant Dhar (@VasantDhar) will lead us through an insightful discussion on the growth of machine learning algorithms for managing investment portfolios and whether human judgment holds any advantage over fully automated algorithms.

The explosion of ecommerce and online banking and finance has given rise to nefarious crimes and fraud schemes that siphon off billions of dollars each year from unsuspecting individuals and corporations. Waqar Hasan (@whasan), SVP of Data at Visa, and Ming Wang, SVP of Research at Visa, will describe how Visa is leveraging advanced algorithms to counter credit card fraud.

Big Data Analytics and large-scale machine learning are not confined to commercial ventures alone. The theme of last year’s KDD conference (KDD-2014) was Data Mining for Social Good. Julie Batch, the Chief Analytics Officer at IAG, is working on a global initiative that leverages Big Data Analytics for natural disaster preparedness and recovery. Recent calamities such as the Nepal earthquake (April 2015) and the South Asia floods in the aftermath of Cyclone Komen (August 2015) highlight the urgent need for a global system that allows governments and aid agencies to respond efficiently and save lives. Julie will present insights on building such a global platform, the data sources used, and the kinds of insights derived.

Accurate models of user behavior are critical to the success of any user-facing machine learning application. Professor Qiang Yang (@qiangyang) will outline the critical requirements in developing user models and their applications in the telecommunications and internet industries.

Lastly, we will round out the stellar series of talks with a panel discussion on “What Does It Take to Bring Big Data Analytics to the Mainstream?” Our speakers and organizers will participate in this engaging discussion on how successful applications have taken a pragmatic approach to rise above the hype. This panel is moderated by Usama Fayyad (@usamaf), co-chair of this Invited Talks Track and Honorary Conference Chair for KDD-2015.

Here is the detailed program of the KDD 2015 Industry Invited Talks series. We look forward to seeing you at the KDD-2015 conference in Sydney next week!

Rajesh Parekh (@rgparekh) and Usama Fayyad (@usamaf) co-chairs, Industry and Government Invited talks at KDD 2015