Introduction to Big Data Analytics


  • Data analytics is an emerging interdisciplinary area that draws on machine learning, data mining, predictive analytics, text mining, forecasting, optimization, and statistics.

Big data analytics is the process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used to make better decisions. With big data analytics, data scientists and others can analyze huge volumes of data that conventional analytics and business intelligence solutions can't touch. Consider that your organization could accumulate billions of rows of data with hundreds of millions of data combinations in multiple data stores and abundant formats. High-performance analytics is necessary to process that much data in order to figure out what's important and what isn't.

Big data is now a reality: the volume, variety, and velocity of data coming into your organization continue to reach unprecedented levels. The Big Data phenomenon has emerged as vast amounts of data become available in a wide range of application domains across science, business, and government.

Data may originate from many disparate sources, including:

  • scientific instruments;
  • digitally-authored media, including text, images, audio, and emails;
  • streaming data from weblogs, videos, and financial/commercial transactions;
  • data from ubiquitous sensing and control applications in engineered and natural systems;
  • social interaction data from social networks, Twitter feeds, and click streams; or
  • scientific data from large-scale surveys and brain research.

The data can be temporal, spatial, or dynamic; structured or unstructured; and the information and knowledge derived from data can differ in representation, complexity, granularity, context, quality, provenance, reliability, and trustworthiness. This phenomenal growth means that you must not only understand big data in order to decipher the information that truly counts, but also understand the possibilities of big data analytics.

When you successfully complete this non-credit course, you will receive 18 educational contact hours (1.8 Continuing Education Units, or CEUs).


Statistics basics, including t-tests, linear regression, and statistical significance
Case study: weather data

Clustering Analysis
Case study: subtype discovery in medical data

Link analysis and recommendation systems
Case study: friend recommendation in a social network

Classification and Prediction
Case study: physical activity classification

Data and information visualization
Case study: high-dimension data visualization

Text mining
Case study: Enron email corpus

Anomaly analysis: Detecting frequent items
Case study: credit card fraud detection

Data privacy and security
Case study: intrusion detection

Learning Outcomes

Students will be able to:

  • Demonstrate knowledge of statistical data analysis techniques used in decision making
  • Apply principles of Data Science to the analysis of large-scale problems
  • Use data mining software to solve real-world problems
  • Employ cutting edge tools and technologies to analyze Big Data


Ping Chen

Ping Chen is an Associate Professor of Computer Engineering and the Director of the Artificial Intelligence Lab at the University of Massachusetts Boston. His research interests include Bioinformatics, Data Mining, and Computational Semantics. Dr. Chen has received five NSF grants and published over 50 papers in major Data Mining, Artificial Intelligence, and Bioinformatics conferences and journals. He received his BS degree in Information Science and Technology from Xi'an Jiao Tong University, his MS degree in Computer Science from the Chinese Academy of Sciences, and his PhD degree in Information Technology from George Mason University.