Introduction to Big Data: Midterm Exam Solution

QUESTION 1

What are the three characteristics of Big Data, and what are the main considerations in processing Big Data?

Explain the differences between BI and Data Science.

Briefly describe each of the four classifications of Big Data structure types (i.e., from structured to unstructured).

List and briefly describe each of the phases in the Data Analytics Lifecycle.

In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

Which R command would create a scatterplot for the dataframe “df”, assuming df contains values for x and y?

What is a rug plot used for in a density plot?

What is a type I error? What is a type II error? Is one always more serious than the other? Why?

Why do we consider K-means clustering an unsupervised machine learning algorithm?

Detail the four steps in the K-means clustering algorithm.
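
As an illustration of this question, the sketch below follows one common textbook formulation of the four steps: choose k initial centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the centroids stop moving. The function name kmeans, the 2-D tuple representation, and the choice of random data points as starting centroids are assumptions made for this example, not part of the exam.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of 2-D tuples (illustrative sketch only)."""
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (assumption: k distinct data points).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Step 4: repeat steps 2 and 3 until the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

No labels are passed to the function: the groups come only from distances between points.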

List three popular use cases of the Association Rules mining algorithms.

Define Support and Confidence.
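
A small worked example may help with this question. It uses the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The transactions and the function names below are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical market-basket data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(transactions, {"bread", "milk"})       # 2 of 4 transactions
c = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

Here the itemset {bread, milk} appears in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain milk.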

How do you use a “hold-out” dataset to evaluate the effectiveness of the rules generated?

List two use cases of linear regression models.

Compare and contrast linear and logistic regression methods.
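
One way to make the contrast concrete is the minimal sketch below, which assumes a single input variable and already-fitted coefficients (the parameter names intercept and slope are chosen for this example). Both models compute the same linear combination. Linear regression returns it directly as a continuous value, while logistic regression passes it through the logistic (sigmoid) function to obtain a probability between 0 and 1.

```python
import math


def linear_predict(x, intercept, slope):
    """Linear regression: the prediction is an unbounded continuous value."""
    return intercept + slope * x


def logistic_predict(x, intercept, slope):
    """Logistic regression: the linear combination is mapped to a probability."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to show the shape of the outputs.
y_hat = linear_predict(2, intercept=1, slope=3)    # continuous outcome
p_hat = logistic_predict(0, intercept=0, slope=1)  # probability of the positive class
```

In practice the logistic output is compared against a threshold (often 0.5) to produce a class label, which is why it suits binary outcomes rather than continuous ones.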