Apache Spark for Beginners
Apache Spark is a fast, parallel cluster-computing engine that supports interactive computing on large-scale datasets in popular languages including Python, R, SQL, Scala, and Java.
This training session will cover the basics: importing data into an Apache Spark cluster and an overview of some analytic tools that can be used with Spark including Python (PySpark) in Jupyter notebooks and R (SparkR) for interactive data analysis.
To illustrate the tools, we will show how Spark clusters can be used to perform analysis on both semi-structured data (for applications such as text analysis and genomics) and tabular/columnar formatted data (such as an SQL database).
We will also look at what it takes to set up a Spark cluster, introduce OIT’s Spark services, and run some hands-on data analysis that illustrates how to optimize compute jobs for Spark.
Subjects: intermediate, research computing, spark
|Date||Monday, February 11th, 2019|
|Time||1:00pm - 3:00pm|
|Location||TEC - Classroom|
|Enrolled||14 of 30|