CSE Training Workshops in Data Analytics, Fall 2014 • DCL L440, 1–3 pm
All workshops will be held in L440 Digital Computer Laboratory, an EWS computer laboratory in the basement. There is no sign-up for this series—walk-ins are welcome and encouraged!
This workshop targets students with some programming experience and little to no prior exposure to the statistical and data analysis language R. We will conduct a hands-on walkthrough of basic R features and packages.
We will cover the following topics:
This tutorial continues the introduction to R begun previously, including new topics such as importing packages.
The Pandas module provides an R-like interface for manipulating data sets and computing their statistics.
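As a taste of what Pandas offers, here is a minimal sketch of an R-like grouped summary (the data set here is invented for illustration):

```python
import pandas as pd

# Build a small data set (analogous to an R data.frame).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.5, 5.1, 5.9],
})

# Group and summarize, much like R's aggregate() or tapply().
means = df.groupby("species")["petal_length"].mean()
print(means)
```

The grouped mean is the kind of split-apply-combine operation R users will recognize; Pandas extends the same idiom to joins, reshaping, and time series.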
Although not part of the CSE workshop series, we recommend this talk hosted by the student group Big Data and Analytics Council. It will present Big Data and its applications in an accessible manner for those interested in applying data analysis techniques to their research and coursework.
Presented by Dr. Mark Sammons and Hao Wu of the Cognitive Computation Group. Please note the updated date and time.
Come learn how to perform cloud processing of natural language, whether your interest is business intelligence, computer science, computational linguistics, or text mining.
IllinoisCloudNLP makes it straightforward for experts and nonexperts alike to process large texts as needed.
We will follow the instructions here. Unless you already have an Amazon Web Services account, you will use a CSE training account uniquely assigned to you in the workshop. (As a matter of good security practice you should never share credentials like these, but since I'll reset these accounts immediately after the workshop, it's acceptable here.)
On your EWS machine, please open a terminal window to work in and execute the following code:
module load sun-jdk/1.7.0-latest-el6-x86_64
Follow the instructions here.
When your terminal output reads
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
then navigate to
To monitor jobs, you can log in to the AWS site with your assigned username (csetrainingXX) and password (Capricorn1), then select 'EC2' and Monitor Instances on the left. There you can see your machine instances running on the cloud and some data about their execution.
KNIME is an open platform for sophisticated data mining and statistics on your data. The visual workbench combines data access, transformation, investigation, predictive analytics, and visualization in one package. Come to this hands-on workshop and get started today!
KNIME can be executed directly from the extracted archive.
Today we will discuss Hadoop and MapReduce, a popular algorithm and platform for large-scale data analytics. We will also use Amazon Web Services’ cloud computing infrastructure.
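The MapReduce model itself can be illustrated without Hadoop: a map step emits key-value pairs, a shuffle groups pairs by key, and a reduce step aggregates each group. A minimal word-count sketch in plain Python (the function names here are our own, not part of any Hadoop API):

```python
from collections import defaultdict

def map_step(document):
    # Emit a (word, 1) pair for every word in the input document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key, as the framework does between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Aggregate each group; for word count, just sum the ones.
    return (key, sum(values))

docs = ["big data big ideas", "big analytics"]
pairs = [pair for doc in docs for pair in map_step(doc)]
counts = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 1, 'ideas': 1, 'analytics': 1}
```

Hadoop runs the same three phases, but distributes the map and reduce tasks across a cluster and handles the shuffle over the network.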
Please download this file to your Desktop.
Then open a command-line window and execute the following:
cd Desktop
module load canopy
ipython notebook mapreduce.ipynb
You will log in later as well to the AWS CSE workshop site using the login and password distributed in class.
We will teach the database query language SQL; Pig, an SQL-like interface to Hadoop; and the elements of the "Hadoop Zoo," the ecosystem of tools and platforms built around Hadoop.
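As a taste of SQL ahead of the workshop, here is a small aggregation query run through Python's built-in sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

# An in-memory database; no server or files needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (name TEXT, runtime_sec REAL)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("wordcount", 42.0), ("pagerank", 180.5), ("wordcount", 38.0)],
)

# A basic SQL aggregation: average runtime per job name.
rows = conn.execute(
    "SELECT name, AVG(runtime_sec) FROM jobs GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [('pagerank', 180.5), ('wordcount', 40.0)]
```

Hive and Pig let you express much the same GROUP BY logic over files distributed across a Hadoop cluster rather than rows in a local table.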
Neal Davis and Yuanzhi Qi developed these materials. This content is available under a Creative Commons Attribution 3.0 Unported License.
If you have any questions about course availability, concepts, or content, please contact Neal Davis, Training Coördinator for Computational Science & Engineering, at davis68 at illinois dot edu.