Diabetes Prediction With Pyspark MLLIB
In this one-hour, project-based course, you will learn to build a logistic regression model with PySpark MLlib that classifies patients as diabetic or non-diabetic. The project uses the popular Pima Indians Diabetes data set, and the goal is to classify diabetes with a basic logistic regression classifier from the PySpark machine learning library. Once PySpark is installed, you can complete the entire project in the Google Colab environment. To complete the project, you'll need a free Gmail account. Please keep in mind that the dataset and model used in this project cannot be used in real-life situations; they are intended for educational purposes only.
By the end of this project, you will be able to build a logistic regression classifier with PySpark MLlib that classifies patients as diabetic or non-diabetic, use PySpark in the Google Colab environment, and clean and prepare data for analysis. You should be familiar with the Python programming language and understand the logistic regression algorithm at a theoretical level.
Note: This course works best for learners based in the North America region. We are currently working on bringing the same experience to other regions.
THE SKILLS YOU WILL DEVELOP
- Data science
- Machine Learning
- Python Programming
- Google Colab
- PySpark
LEARN STEP BY STEP:
- Introduction & Install Dependencies
- Clone and Explore Dataset
- Data Cleaning and Preparation
- Correlation Analysis and Feature Selection
- Split Dataset and Build the Logistic Regression Model
- Evaluate and Save the Model
- Model Prediction on a New Set of Unlabeled Data
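The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's own notebook: the column names are those of the commonly distributed Pima Indians Diabetes CSV, and the function name `train_diabetes_model`, the mean imputation of zero values, the 70/30 split, and the choice of area under the ROC curve as the metric are all assumptions made for the example.

```python
# Column names as found in the commonly distributed Pima Indians Diabetes CSV
# (assumption: the course file uses the same header).
FEATURE_COLS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
LABEL_COL = "Outcome"
# Columns where a 0 is physiologically implausible and is treated here as missing
# (assumption: the course may clean the data differently).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def train_diabetes_model(csv_path, model_dir, seed=42):
    """Load, clean, split, train, evaluate and save. Returns (model, auc)."""
    # Imported inside the function so this module loads even before
    # `pip install pyspark` has been run in a fresh Colab runtime.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("diabetes").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Data cleaning: replace implausible zeros with the mean of the non-zero values.
    for col in ZERO_AS_MISSING:
        mean = df.filter(F.col(col) != 0).agg(F.mean(col)).first()[0]
        df = df.withColumn(col, F.when(F.col(col) == 0, mean).otherwise(F.col(col)))

    # Correlation analysis: Pearson correlation of each feature with the label.
    for col in FEATURE_COLS:
        print(col, df.stat.corr(col, LABEL_COL))

    # MLlib estimators expect all features in a single vector column.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    data = assembler.transform(df).select("features", LABEL_COL)

    # Split the dataset and fit the logistic regression model.
    train, test = data.randomSplit([0.7, 0.3], seed=seed)
    lr = LogisticRegression(featuresCol="features", labelCol=LABEL_COL)
    model = lr.fit(train)

    # Evaluate on the held-out split, then save the fitted model.
    evaluator = BinaryClassificationEvaluator(
        labelCol=LABEL_COL, metricName="areaUnderROC"
    )
    auc = evaluator.evaluate(model.transform(test))
    model.write().overwrite().save(model_dir)
    return model, auc
```

In Colab, PySpark would first be installed with `pip install pyspark`, after which a call such as `train_diabetes_model("diabetes.csv", "lr_model")` (both paths hypothetical) runs the whole flow. To predict on new unlabeled data, the saved model can be reloaded with `LogisticRegressionModel.load` and applied with `transform` to rows assembled by the same `VectorAssembler`.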
Rating: 4.6/5
Enroll here: coursera.org/projects/diabetes-prediction-with-pyspark-mllib