Holistic Data Modeling Practices in Python A to Z: All Steps A Data Scientist Should Know and Apply

We wanted to create an evolving open resource of data modeling practices covering statistical and machine learning topics from scratch (intro level) to a good level (graduate or applied research level). Almost all useful data science models and statistical methods will be studied under one cover, along with Python notebooks. We think this growing resource captures what a data scientist should know when touching data for the first time.

When building a data-based model in today's machine learning and big data era, it is crucial to explore it holistically and comprehensively from multivariate perspectives, so that models are trained well and then improved. Data mining and feature engineering with statistical and visualization methods are essential when touching data for the first time. Completing a modeling task, including predictive and supervised modeling, also requires advanced methods (such as manifold learning in exploration), automated search algorithms for estimation (such as grid or smart search methods in exploitation), deep tools (such as scikit-learn and Keras), and good choices of activation functions and optimizers. Exploratory data analysis, discovering randomness and interdependence among features, model choice, fitting, validation, and improvement, outlier detection, and reproducibility each require the data scientist to understand nearly all of these methods and tools, in addition to a real time commitment.
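As a minimal sketch of the exploration/exploitation ideas above, assuming scikit-learn is available: manifold learning is used to look at low-dimensional structure, and a pipeline with a cross-validated grid search automates the model choice-fit-validation loop. The dataset and parameter grid are illustrative only, not part of the repository.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.manifold import Isomap
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Exploration: project the features to 2 dimensions with manifold learning
# to inspect low-dimensional structure before committing to a model.
embedding = Isomap(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print(embedding.shape)

# Exploitation: a pipeline plus a grid search automates model choice,
# fitting, and validation over a small (illustrative) hyperparameter grid.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=5000))])
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```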

Here, we apply data science methods and tools in a notebook/workshop style, in an exploratory and applied format with real-world data, aiming to open so-called black boxes. Our group includes a statistician, a programmer, an applied mathematician, an AI expert, and a chameleon. While preparing our notes, we discuss the many aspects and theories of modeling practice using classical and up-to-date methods and tools, and we reflect those discussions here. Designing search grids and pipelines, writing Python functions to automate these practices (see the sketch after this paragraph), and setting up for streaming data are among the practices we include as well.
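The helper below is a hypothetical example of the kind of automating function mentioned above: it wraps a scale-then-model pipeline and a cross-validated grid search into one reusable call. The estimator, parameter grid, and dataset are assumptions for illustration, not fixed by this repository.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_with_grid_search(estimator, param_grid, X, y, cv=5, scoring=None):
    """Build a scale-then-model pipeline and tune it with a cross-validated grid search."""
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring=scoring)
    search.fit(X, y)
    return search


# Example usage with an illustrative random-forest grid.
if __name__ == "__main__":
    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_wine(return_X_y=True)
    search = fit_with_grid_search(
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [100, 200], "model__max_depth": [None, 5]},
        X, y,
    )
    print(search.best_params_, search.best_score_)
```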

The outline of the resource can be found in the table of contents (notebooks/All_Steps_A_Data_Scientist_Should_Know_and_Apply.ipynb). We post and update the notebooks for each method under the Notebook folder.

We will develop the material as we learn, discuss, and apply methods. We owe thanks to the data scientists who share their resources, from which our notes have benefited. Eventually, all notes and files will be reorganized into workshops and notebooks.

YB on behalf of Data Science Group, Rochester, March 2020
