CSE Training Workshops in Data Analytics, Fall 2014 • DCL L440, 1–3 pm

Project maintained by uiuc-cse

All workshops will be held in L440 Digital Computer Laboratory, an EWS computer laboratory in the basement. There is no sign-up for this series—walk-ins are welcome and encouraged!

Introduction to R, Part 1

Sep. 8, 1:00–3:00 pm • DCL L440

This workshop targets students with some programming experience and little to no prior exposure to the statistical and data analysis language R. We will conduct a hands-on walkthrough of basic R features and packages.

We will cover the following topics:

How to install RStudio on EWS workstations

Link to lesson

Link to exercises

Introduction to R, Part 2

Sep. 15, 1:00–3:00 pm • DCL L440

This tutorial continues the introduction to R begun previously, including new topics such as importing packages.

Link to lesson

Link to capstone exercise

Pandas (Python Data Analysis) Library

Sep. 17, 9:00–11:00 am • DCL L440

The Pandas module provides an R-like interface for manipulating and analyzing data sets and their statistics.
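
As a rough illustration of what "R-like" means here, the following minimal sketch builds a small data frame and computes a few statistics. The column names and values are made up for this example and are not taken from the workshop materials.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.3, 6.0, 5.1],
})

# Summary statistic over one column.
overall_mean = df["petal_length"].mean()

# Split rows by a categorical column and aggregate each group,
# comparable in spirit to aggregate() or tapply() in R.
mean_by_species = df.groupby("species")["petal_length"].mean()

# Boolean filtering, comparable to subsetting a data frame in R.
long_petals = df[df["petal_length"] > 5.0]

print(overall_mean)
print(mean_by_species)
print(long_petals)
```

The `DataFrame` object plays roughly the role of R's `data.frame`, which is why the two workflows tend to feel similar.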

Introduction to Big Data and Analytics (Seminar by Big Data and Analytics Council)

Sep. 25, 5:00 pm • 1002 Lincoln Hall

Although this talk is not part of the CSE workshop series, we recommend it. Hosted by the student group Big Data and Analytics Council, it will give an accessible overview of Big Data and its applications for those interested in applying data analysis techniques to their research and coursework.

Data Mining Applications

Sep. 29, 1:00–3:00 pm • DCL L440

Machine Learning Applications

Oct. 6, 1:00–3:00 pm • DCL L440

Machine Learning in Python

Oct. 22, 9:00–11:00 am • DCL L440


Oct. 23, 9:00–11:00 am • DCL L440

Presented by Dr. Mark Sammons and Hao Wu of the Cognitive Computation Group. Please note the updated date and time.

Come learn how to perform cloud processing of natural language, whether your interest is business intelligence, computer science, computational linguistics, or text mining.

IllinoisCloudNLP makes it straightforward for experts and nonexperts alike to process large texts as needed.


We will follow the instructions here. Unless you already have an Amazon Web Services account, you will use a CSE training account uniquely assigned to you in the workshop. (Good user practice: you don't want to expose this information, but since I'll reset it immediately after the workshop it's "okay" here.)

Mark Sammons's page

KNIME Graphical Analytics

Oct. 27, 1:00–3:00 pm • DCL L440

KNIME is an open platform for sophisticated data mining and statistics on your data. The visual workbench combines data access, transformation, investigation, predictive analytics, and visualization in one package. Come to this hands-on workshop and get started today!


KNIME can be executed directly from the extracted archive.

Data Files

Big Data: Hadoop and MapReduce

Nov. 10, 1:00–3:00 pm • DCL L440

Today we will discuss MapReduce and Hadoop, respectively a popular algorithm and a popular platform for large-scale data analytics. We will also use Amazon Web Services’ cloud computing infrastructure.
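
To give a feel for the MapReduce idea before touching a cluster, here is a minimal single-machine sketch of the classic word-count example in plain Python. It only imitates the map, shuffle, and reduce phases; it does not use Hadoop, and the function names and sample sentences are assumptions made for this illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Map every document independently, then group and reduce by word.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input records, invented for illustration.
counts = word_count(["big data big ideas", "data at scale"])
print(counts)
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, a platform such as Hadoop can spread those calls across many machines.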

Big Data: SQL, Pig, and the Hadoop Zoo

Nov. 17, 1:00–3:00 pm • DCL L440

We will teach the database language SQL; Pig, the SQL-like interface to Hadoop; and the elements of the Hadoop Zoo, the ecosystem of tools and platforms around Hadoop.
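
For readers who have not seen SQL before, the sketch below runs a small grouped query against an in-memory SQLite database using Python's standard `sqlite3` module. The table name, columns, and rows are hypothetical and chosen only for illustration; the workshop itself may use a different database.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for illustration.
conn.execute("CREATE TABLE attendance (workshop TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?)",
    [("Hadoop", "Physics"), ("Hadoop", "Statistics"), ("Pig", "Physics")],
)

# Count attendees per workshop: GROUP BY collects rows that share a key,
# and COUNT(*) aggregates each group.
rows = conn.execute(
    "SELECT workshop, COUNT(*) AS n "
    "FROM attendance GROUP BY workshop ORDER BY workshop"
).fetchall()

print(rows)
conn.close()
```

The group-then-aggregate pattern in this query is the same shape of computation that higher-level Hadoop tools express over much larger data sets.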

About These Workshops


Neal Davis and Yuanzhi Qi developed these materials. This content is available under a Creative Commons Attribution 3.0 Unported License.


If you have any questions about course availability, concepts, or content, please contact Neal Davis, Training Coördinator for Computational Science & Engineering, at davis68 at illinois dot edu.