Research, Workshops, Talks, and Tutoring

Research

Research at the CI Lab centers on data science, machine learning, and artificial intelligence. Current work includes building a Quantum Machine Learning Classifier (QMLC), which uses the mathematics of quantum computing in a deep neural network. The development work uses R. A. Fisher’s “Iris Data” (Fisher, 1936) to classify flowers into the three iris species. The QMLC performs significantly better than a classifier built with classical deep learning methods, taking fewer epochs to train. Upcoming work includes the use of optimizers, encapsulation, and quantum entanglement and superposition.

We’re also working with outside partners on research to enhance STEM education in middle schools. The CI Lab is here to collaborate with you and your team.

Workshops

The CI Lab provides a place for workshops and tutorials spanning the scope of data analytics. Organized by tools and processes, the workshops cover data collection and munging, exploratory data analysis and visualization, and predictive modeling. Students will learn about these key pillars of data analysis using Python and R. Foundational workshops include topics from probability theory and statistics, modeling and data, and statistical computing.

Click on the modules below to access the workshop videos.

Webscraping

Learn how to extract data from a website. What is webscraping, why do we need it, and how is it done? Mihir Zala walks us through the answers.
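
As a small taste of the idea, here is a minimal sketch (not taken from the workshop itself) that pulls every link out of a page using only Python's standard library. The page content in SAMPLE_HTML and the names LinkCollector and extract_links are made up for illustration; a real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Workshops</h1>
  <a href="/webscraping">Webscraping</a>
  <a href="/eda">Exploratory Data Analysis</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "->", text)
```

The same pattern (parse the page, then keep only the tags you care about) carries over to dedicated scraping libraries.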

Webscraping Slides

Twitter Application Programming Interface (API) - Part 1

Learn how to use the Twitter API to get data from tweets. Nayana Mahajan walks us through the details in this two-part tutorial.

Twitter API tutorial Slides

Twitter API Jupyter Notebook

Twitter API Notebook – Empty Shell

Twitter Application Programming Interface (API) - Part 2

Continue working with Nayana on the Twitter API in part 2 of this topic. (The slides for this video are the same as for part 1.)

Exploratory Data Analysis

Before any model building can begin, a data analyst must get to know the data. Exploratory Data Analysis (EDA) is a key step in the process, where the analyst gets to see and understand the possible relationships in the data before attempting to build a model. Neha Mathur takes us through this critical first step in data analysis.

Exploratory Data Analysis Slides

Exploratory Data Analysis Jupyter Notebook

EDA-Shell Try it!!

Data Cleaning

One outcome of EDA is often the discovery that the data you have are not ideal. In fact, real data are never as pristine as they are in textbooks! Cleaning data and data munging (reshaping and manipulating data for analysis) are essential, and although not the exciting part of a data scientist’s day, they are key to good research. In this module, Gio Abou Jaoude illustrates the highs and lows of data cleansing.
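
To make this concrete, the following is a minimal sketch of two common cleaning steps: normalizing inconsistent labels and dropping rows whose measurement cannot be parsed. It is not taken from the module. The rows in RAW_ROWS are invented for illustration, and clean_rows is a hypothetical helper; whether to drop, flag, or impute bad values is a judgment call that depends on the analysis.

```python
# Invented example rows: stray whitespace, mixed case, and missing values.
RAW_ROWS = [
    {"species": " Setosa ", "petal_length": "1.4"},
    {"species": "setosa", "petal_length": ""},
    {"species": "VIRGINICA", "petal_length": "5.1"},
    {"species": "virginica", "petal_length": "n/a"},
]


def clean_rows(rows):
    """Normalize species labels and keep only rows with a numeric length."""
    cleaned = []
    for row in rows:
        # Strip whitespace and lowercase so " Setosa " and "setosa" match.
        species = row["species"].strip().lower()
        try:
            petal_length = float(row["petal_length"])
        except ValueError:
            # Empty strings and placeholders like "n/a" are dropped here.
            continue
        cleaned.append({"species": species, "petal_length": petal_length})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```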

Data Cleaning Slides

Monte Carlo Simulation

Monte Carlo simulation is at the core of many numerical methods used in statistical computing. It can be used to generate random processes and is central to Bayesian analysis, Markov chain Monte Carlo, and Gibbs sampling. Gio Abou Jaoude explores Monte Carlo methods in this video through their application in several settings.
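
A classic first example of the idea, shown here as a minimal sketch and not necessarily one of the settings covered in the video, is estimating π by random sampling. Points are drawn uniformly in the unit square, and the fraction that lands inside the quarter circle of radius 1 approximates π/4. The function name estimate_pi and the seed parameter are illustrative choices.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / unit-square area = pi / 4.
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The estimate tightens slowly as the sample size grows: the error shrinks roughly in proportion to one over the square root of the number of samples.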

Monte Carlo Slides

Talks

We have the following talks scheduled for the Spring 2022 semester:

    • April 11, 2022 – Seidenberg Lounge 12:10pm-1:10pm. Gio Abou Jaoude (MS in CS May ’22): “Quantum Machine Learning Classifier.”
    • April 27, 2022 – Seidenberg Lounge 12:10pm-1:10pm. Profs. Yegin Genc and Frank Parisi: “Data Science and Computational Intelligence.”

Tutoring

At the CI Lab, you can get help with a variety of topics. We offer tutoring in Python, R, Git, and data science.