README.md (19 additions & 6 deletions)
@@ -14,7 +14,7 @@ The following libraries are required to successfully implement the projects.
The projects are divided into various categories listed below -
## Supervised Learning
- -[##**Linear Regression**]()
+ -[**Linear Regression**]()
-[Linear Regression Single Variables.](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Linear%20Regression/LinearRegressionSingle%20Variables.ipynb) : A Simple Linear Regression Model to model the linear relationship between Population and Profit for plot sales.
-[Linear Regression Multiple Variables.](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Linear%20Regression/LinearRegressionMultipleVariables.ipynb) : In this project, I build a Linear Regression Model for multiple variables for predicting the House price based on acres and number of rooms.
@@ -31,13 +31,26 @@ The projects are divided into various categories listed below -
-[**Random Forest Classification**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/RandomForest/RandomForest.ipynb) : In this project I used Random Forest Classifier (90.0%) and Random Forest Regressor (61.8%) on the Social Network Ads dataset.
## Unsupervised Learning
- -[**K Means Clustering**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/K-means/creditcard.ipynb) : K-Means clustering is used to find intrinsic groups within the unlabelled dataset and draw inferences.It is one of the most detailed projects, In this project, I implement K-Means Clustering on Credit Card Dataset to cluster different credit card users based on the features.I scaled the data using `StandardScaler` because normalizing will improves the convergence.I also implemented the [*Elbow Method*](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to search for the best numbers of clusters.For visualizing the dataset I used [*PCA(Principal Component Analysis)*](https://en.wikipedia.org/wiki/Principal_component_analysis) for dimensionality reduction as the dataset features were large in number.In the end I used [*Silhouette Score*]() which is used to calculate the performance of clustering . It ranges from -1 to 1 and I got a score of 0.203.
-
- ## NLP(Natural Language Processing)
- -[Text Analytics]()
- -[Sentiment Analysis]()
+ -[**K Means Clustering**](https://github.com/suubh/Machine-Learning-in-Python/blob/master/K-means/creditcard.ipynb) : K-Means clustering is used to find intrinsic groups within an unlabelled dataset and draw inferences. This is one of the most detailed projects: I implement K-Means clustering on the Credit Card dataset to cluster credit card users based on their features. I scaled the data using `StandardScaler` because standardizing the features (zero mean, unit variance) improves convergence. I also implemented the [*Elbow Method*](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to search for the best number of clusters. To visualize the dataset I used [*PCA (Principal Component Analysis)*](https://en.wikipedia.org/wiki/Principal_component_analysis) for dimensionality reduction, as the dataset has a large number of features. Finally, I used the [*Silhouette Score*](), which measures clustering quality. It ranges from -1 to 1, and I got a score of 0.203.
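The K-Means workflow described above (scaling, elbow search, final clustering, silhouette score, PCA projection) could look roughly like the sketch below. It is not the notebook's code: the synthetic data, the range of `k`, and the choice of three clusters are all assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card features (assumption: the real
# project loads its own dataset instead).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=1.0, size=(100, 6))
    for center in (0.0, 5.0, 10.0)
])

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record the inertia (within-cluster sum of squares) for
# several values of k and look for the point where it stops dropping fast.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# Fit the final model with the chosen k (3 here, by construction of the data).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette score lies in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)

# Project to two principal components for a 2-D scatter plot.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
```

Plotting `inertias` against `k` gives the elbow curve, and `X_2d` colored by `labels` gives the cluster visualization.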
+
+ ## NLP (Natural Language Processing)
+ -[Text Analytics](https://github.com/suubh/Machine-Learning-in-Python/blob/master/TextAnalytics/textAnalytics.ipynb) : An introduction to text analytics in NLP. I performed the following steps -
+ -***Tokenization***
+ -***Removal of Special Characters***
+ -***Lowercasing***
+ -***Removing Stop Words***
+ -***Stemming***
+ -***Count Vectorizer*** (which can perform the tokenization, lowercasing, and stop-word removal steps above, but not stemming)
+ -***DTM (Document Term Matrix)***
+ -***TF-IDF (Term Frequency-Inverse Document Frequency)***
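The steps listed above can be sketched as follows. This is an illustrative example, not the notebook's code: the two sample sentences, the tiny `STOP_WORDS` set, and the `crude_stem` helper are hypothetical. `crude_stem` is only a toy stand-in for a real stemmer such as NLTK's `PorterStemmer`.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical sample documents.
docs = [
    "The cats are chasing the mice!",
    "A cat chased a mouse, quickly.",
]

# Tiny illustrative stop-word list (a real project would use a fuller one).
STOP_WORDS = {"the", "a", "an", "are", "is"}


def crude_stem(token):
    """Toy suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # remove special characters
    tokens = text.lower().split()              # lowercase, then tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    return [crude_stem(t) for t in tokens]     # stem what is left


tokens = [preprocess(d) for d in docs]
cleaned = [" ".join(t) for t in tokens]

# Document-term matrix: one row per document, one column per vocabulary term.
count_vec = CountVectorizer()
dtm = count_vec.fit_transform(cleaned)
vocab = sorted(count_vec.vocabulary_)

# TF-IDF down-weights terms that appear in many documents.
tfidf = TfidfVectorizer().fit_transform(cleaned)
```

Both `dtm` and `tfidf` are sparse matrices; calling `.toarray()` on them shows the counts and weights per term in `vocab`.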
+
+ -[Sentiment Analysis](https://github.com/suubh/Machine-Learning-in-Python/tree/master/Sentiment%20Analysis) : I applied sentiment analysis to the MovieReview (dataset from the nltk library) and RestaurentReview datasets to predict positive and negative reviews. I used a Naive Bayes Classifier (78.8%) and Logistic Regression (84.3%) to build the models and make predictions.
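As a rough illustration of the two classifiers mentioned above, the sketch below trains a Naive Bayes model and a Logistic Regression model on bag-of-words features. The four training sentences and their labels are made up for the example. The project itself uses the MovieReview and RestaurentReview datasets, and its feature extraction may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews: 1 = positive, 0 = negative.
train_texts = [
    "great food and friendly staff",
    "loved the movie",
    "terrible service",
    "boring and awful plot",
]
train_labels = [1, 1, 0, 0]

# Each pipeline turns text into word counts, then fits a classifier.
nb_model = make_pipeline(CountVectorizer(), MultinomialNB())
lr_model = make_pipeline(CountVectorizer(), LogisticRegression())

nb_model.fit(train_texts, train_labels)
lr_model.fit(train_texts, train_labels)

# Predict the sentiment of unseen reviews with both models.
new_reviews = ["friendly staff and great movie", "awful service"]
nb_preds = nb_model.predict(new_reviews)
lr_preds = lr_model.predict(new_reviews)
```

With real data, the accuracy of each model would be measured on a held-out test split rather than on a handful of sentences.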
## Data Cleaning and Preprocessing
+ -[Data Preprocessing](https://github.com/suubh/Machine-Learning-in-Python/blob/master/Data%20Preprocessing/Untitled.ipynb) : I perform various data preprocessing and cleaning methods, which are listed below -
+ -***Label Encoding*** : It converts each category into a unique integer from 0 to n-1, where n is the number of distinct categories.
+ -***Ordinal Encoding*** : It maps categories to ordered numerical values.
+ -***One Hot Encoding*** : It creates one binary (0/1) dummy column for each distinct category value in the column, so extra columns are created.
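The three encodings above can be compared on a small example. This is a minimal sketch using scikit-learn's encoders. The `sizes` column and its small/medium/large ordering are assumptions made for illustration, not data from the notebook.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Hypothetical categorical column.
sizes = np.array(["small", "large", "medium", "small"])

# Label encoding: integers 0..n-1 assigned in sorted (alphabetical) order,
# so large=0, medium=1, small=2.
label_codes = LabelEncoder().fit_transform(sizes)
# label_codes -> [2, 0, 1, 2]

# Ordinal encoding: integers follow an order we specify explicitly,
# so small=0, medium=1, large=2.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal_codes = ordinal.fit_transform(sizes.reshape(-1, 1))
# ordinal_codes -> [[0.], [2.], [1.], [0.]]

# One-hot encoding: one 0/1 column per category (large, medium, small).
one_hot = OneHotEncoder().fit_transform(sizes.reshape(-1, 1)).toarray()
# one_hot -> [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Label encoding imposes an arbitrary alphabetical order, which is why ordinal encoding with an explicit category list is the safer choice when the order carries meaning.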