
Commit 11499cc

Update README.md
1 parent 54fa9db commit 11499cc

1 file changed: 19 additions and 6 deletions
‎README.md

@@ -14,7 +14,7 @@ The following libraries are required to successfully implement the projects.
1414
The projects are divided into various categories listed below -
1515

1616
## Supervised Learning
17-
- [##**Linear Regression**]()
17+
- [**Linear Regression**]()
1818
- [Linear Regression Single Variables.](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Linear%20Regression/LinearRegressionSingle%20Variables.ipynb) : A Simple Linear Regression Model to model the linear relationship between Population and Profit for plot sales.
1919
- [Linear Regression Multiple Variables.](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Linear%20Regression/LinearRegressionMultipleVariables.ipynb) : In this project, I build a Linear Regression Model for multiple variables for predicting the House price based on acres and number of rooms.
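As a rough illustration of the single-variable case (not the notebook's actual code), the closed-form least-squares fit can be sketched in plain Python. The `population` and `profit` values below are made-up placeholders, not the project's dataset, and `predict` is a hypothetical helper name.

```python
# Made-up placeholder data (assumption: NOT the dataset used in the notebook).
population = [1.0, 2.0, 3.0, 4.0, 5.0]
profit = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(population)
mean_x = sum(population) / n
mean_y = sum(profit) / n

# Ordinary least squares for one feature:
# slope = covariance(x, y) / variance(x), intercept = mean_y - slope * mean_x
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(population, profit))
var_x = sum((x - mean_x) ** 2 for x in population)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict profit for a given population using the fitted line."""
    return intercept + slope * x


residuals = [y - predict(x) for x, y in zip(population, profit)]
```

With several features (as in the house price project), the same idea generalizes to solving a least-squares problem over a feature matrix, which is usually delegated to a library.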
2020

@@ -31,13 +31,26 @@ The projects are divided into various categories listed below -
3131
- [**Random Forest Classification**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/RandomForest/RandomForest.ipynb) : In this project I used Random Forest Classifier (90.0%) and Random Forest Regressor (61.8%) on the Social Network Ads dataset.
3232

3333
## Unsupervised Learning
34-
- [**K Means Clustering**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/K-means/creditcard.ipynb) : K-Means clustering is used to find intrinsic groups within the unlabelled dataset and draw inferences.It is one of the most detailed projects, In this project, I implement K-Means Clustering on Credit Card Dataset to cluster different credit card users based on the features.I scaled the data using `StandardScaler` because normalizing will improves the convergence.I also implemented the [*Elbow Method*](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to search for the best numbers of clusters.For visualizing the dataset I used [*PCA(Principal Component Analysis)*](https://en.wikipedia.org/wiki/Principal_component_analysis) for dimensionality reduction as the dataset features were large in number.In the end I used [*Silhouette Score*]() which is used to calculate the performance of clustering . It ranges from -1 to 1 and I got a score of 0.203.
35-
36-
## NLP(Natural Language Processing)
37-
- [Text Analytics]()
38-
- [Sentiment Analysis]()
34+
- [**K Means Clustering**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/K-means/creditcard.ipynb) : K-Means clustering is used to find intrinsic groups within an unlabelled dataset and draw inferences. This is one of the most detailed projects: I implement K-Means clustering on a credit card dataset to cluster credit card users based on their features. I scaled the data using `StandardScaler`, because standardizing the features (zero mean, unit variance) improves convergence. I also implemented the [*Elbow Method*](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to search for the best number of clusters. To visualize the dataset I used [*PCA (Principal Component Analysis)*](https://en.wikipedia.org/wiki/Principal_component_analysis) for dimensionality reduction, as the dataset has a large number of features. Finally, I used the [*Silhouette Score*]() to evaluate the quality of the clustering. It ranges from -1 to 1, and I got a score of 0.203.
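The workflow described above (scale, elbow search, cluster, score, project with PCA) might look roughly like the sketch below. It assumes scikit-learn and uses synthetic data from `make_blobs` as a stand-in for the credit card dataset; the choice of `k = 3` and the feature count are illustrative assumptions, not values from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card features (assumption: not the real data).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# In practice k is read off the "elbow" of the inertia curve;
# here it is fixed to 3 because the synthetic data has 3 blobs.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Silhouette score lies in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)

# Project to 2 components so the clusters can be drawn on a scatter plot.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
```

Plotting `inertias` against `k` gives the elbow curve, and `X_2d` coloured by `labels` gives the 2-D cluster view.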
35+
36+
## NLP (Natural Language Processing)
37+
- [Text Analytics](https://github.com/suubh/Machine-Learning-in-Python/blob/master/TextAnalytics/textAnalytics.ipynb) : An introductory text analytics project in NLP. I performed the following important steps -
38+
- ***Tokenization***
39+
- ***Removal of Special Characters***
40+
- ***Lower Case***
41+
- ***Removing StopWords***
42+
- ***Stemming***
43+
- ***Count Vectorizer*** (which generally performs the steps mentioned above except stemming; in scikit-learn's `CountVectorizer`, stop-word removal has to be enabled explicitly)
44+
- ***DTM (Document Term Matrix)***
45+
- ***TF-IDF (Term Frequency-Inverse Document Frequency)***
46+
47+
- [Sentiment Analysis](https://github.com/suubh/Machine-Learning-in-Python/tree/master/Sentiment%20Analysis) : I applied sentiment analysis to the MovieReview dataset (from the nltk library) and the RestaurantReview dataset to predict whether a review is positive or negative. I used a Naive Bayes classifier (78.8%) and Logistic Regression (84.3%) to build the models and make predictions.
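The text analytics steps listed above can be sketched with the standard library alone. This is a hand-rolled illustration, not the notebook's code: the two sample sentences, the tiny `STOP_WORDS` set, and the `naive_stem` suffix stripper (a toy stand-in for a real stemmer such as Porter's) are all assumptions. The TF-IDF here uses the plain textbook formula `tf * log(N / df)`; library implementations often apply smoothing and normalization, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Made-up sample documents (assumption: not taken from the project's datasets).
docs = [
    "The movie was GREAT, really great!",
    "The food was not great.",
]

STOP_WORDS = {"the", "was", "a", "an", "and", "is"}  # tiny illustrative list


def naive_stem(token):
    """Toy stemmer: strip one common suffix. A real project would use a proper stemmer."""
    for suffix in ("ing", "ly", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = text.lower()                                  # lower case
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # remove special characters
    tokens = text.split()                                # tokenization on whitespace
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [naive_stem(t) for t in tokens]               # stemming


tokenized = [preprocess(d) for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})

# Document-term matrix: one row per document, one column per vocabulary term.
counts = [Counter(doc) for doc in tokenized]
dtm = [[c[term] for term in vocab] for c in counts]

# TF-IDF with tf = count / document length and idf = log(N / document frequency).
n_docs = len(docs)
doc_freq = {term: sum(1 for c in counts if c[term] > 0) for term in vocab}
tfidf = [
    [(c[term] / len(doc)) * math.log(n_docs / doc_freq[term]) for term in vocab]
    for c, doc in zip(counts, tokenized)
]
```

A term that appears in every document (here `great`) gets a TF-IDF weight of zero under this formula, which is the intuition behind down-weighting uninformative words.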
3948

4049
## Data Cleaning and Preprocessing
50+
- [Data Preprocessing](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Data%20Preprocessing/Untitled.ipynb) : I perform various data preprocessing and cleaning methods, which are listed below -
51+
- ***Label Encoding*** : It converts each category into a unique integer from 0 to n-1, where n is the number of distinct categories.
52+
- ***Ordinal Encoding*** : It maps categories to integers that follow a meaningful order.
53+
- ***One Hot Encoding*** : It creates one binary (0/1) dummy column for each distinct category value, so a column with n unique values produces n extra columns.
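To make the difference between the three encodings concrete, here is a small plain-Python sketch. The `colors` and `sizes` columns and the chosen `size_order` are made-up examples, not data from the notebook; a real project would more likely use library encoders.

```python
# Made-up categorical columns (assumption: illustrative only).
colors = ["red", "green", "blue", "green"]
sizes = ["small", "large", "medium", "small"]

# Label encoding: each distinct category gets an integer in 0..n_classes-1.
# Categories are sorted first so the mapping is deterministic.
classes = sorted(set(colors))
label_map = {category: index for index, category in enumerate(classes)}
label_encoded = [label_map[c] for c in colors]

# Ordinal encoding: the integers follow an order supplied by the caller.
size_order = ["small", "medium", "large"]
ordinal_encoded = [size_order.index(s) for s in sizes]

# One-hot encoding: one 0/1 column per distinct category, exactly one 1 per row.
one_hot = [[1 if c == category else 0 for category in classes] for c in colors]
```

Label encoding implies no order between categories, ordinal encoding deliberately does, and one-hot encoding avoids any implied order at the cost of extra columns.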
4154

4255

4356
