MLlib is Spark’s machine learning library. GraphX is Spark’s API for graphs and graph-parallel computation. SparkR exposes the Spark API to R and allows users to run jobs on a cluster from the R shell. In this course, you will learn how to work with each of these libraries.
About MLlib, GraphX, and R
Who should take this course?
Programmers and developers familiar with Apache Spark who wish to expand their skill sets.
Course Content
- start the course
- describe data types
- recall basic statistics
- describe linear SVMs
- perform logistic regression
- use naïve Bayes
- create decision trees
- use collaborative filtering with ALS
- perform clustering with K-means
- perform clustering with LDA
- perform analysis with frequent pattern mining
- describe the property graph
- describe the graph operators
- perform analytics with neighborhood aggregation
- perform messaging with the Pregel API
- build graphs
- describe vertex and edge RDDs
- optimize representation through partitioning
- measure vertices with PageRank
- install SparkR
- run SparkR
- use existing R packages
- expose RDDs as distributed lists
- convert existing RDDs into DataFrames
- read and write Parquet files
- run SparkR on a cluster
- use the algorithms and utilities in MLlib