- To get an extensive overview of the tools used, see: https://github.com/Farooq-azam-khan/sklearn-datascience.git
- More complex data science libraries were not used in some scenarios for the following reasons:
  - To provide a first-principles approach.
  - To make the case for why such libraries are useful when dealing with abstractions.
  - A Pythonic / functional approach is used. Some notes and examples on these topics can be found at `./PythonConcepts.md`.
Follow these instructions so that you can use this repository:
- Launch a Python virtual environment by typing `pipenv shell`.
  - If you do not have virtualenv, use: `pipenv install`.
  - Once it is set up, activate the virtual environment by typing: `pipenv shell`
  - This way the Python modules installed on your computer will not affect the modules in this virtual environment, nor will they be affected by them.
- Download the zip file and extract the `src\` folder and the `requirements.txt` file into the virtualenv directory.
- Install the required Python packages from `requirements.txt`. The following command will install all of the requirements: `pip3 install -r requirements.txt`.
  - To check that you have done the above steps correctly, type `pip freeze` to see all the packages that are installed in your virtual environment.
- Now you can run any Python file in this repository: just type `python [file_name].py` in that directory.
- To learn about the perceptron learning algorithm (PLA), look at the following files: `preceptron.py`, `linear_function.py`, `boolean_function.py`, and `planar_equation.py`.
- In the `preceptron.py` file, the `Perceptron()` class contains the perceptron learning algorithm. This is an extremely useful algorithm to understand because neural networks and deep neural networks build on it.
  - The algorithm has two main parts: predicting a result from an input, and training by comparing the desired outcome with the actual outcome.
  - PLA is simple and rarely used in modern research because it can only classify linearly separable data, i.e. data whose classes can be split by a single line (or, in higher dimensions, a hyperplane). This limitation can be proven with linear algebra.
- The `linear_function.py` file gives a graphical view of how PLA linearly separates 2D inputs.
  - The `Point()` class generates random points with a label: `label=1` if the point is above the actual line and `label=-1` if it is below. PLA tries to approximate this line as closely as possible by placing each point on one side of its own line. Think of it as sorting a bowl of dimes and nickels: the dimes go in one basket and the nickels in another.
  - There are four possible outcomes when predicting a set of inputs: false positive, false negative, true positive, and true negative. These can be seen in the legend of the graph.
- The `boolean_function.py` file contains examples of PLA's successes as well as its failure, i.e. the XOR problem. Look at `neural_network.py` for an improvement on PLA and a solution to the `xor` problem.
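
The predict/train split and the XOR failure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `preceptron.py`: the method names `predict` and `train`, the helpers `fit` and `accuracy`, the learning rate, and the number of epochs are all hypothetical choices. Labels follow the `1` / `-1` convention used by `Point()`.

```python
class Perceptron:
    """Minimal perceptron sketch (hypothetical, not the repository's class)."""

    def __init__(self, n_inputs, learning_rate=1.0):
        # Start from all-zero weights so the run is fully deterministic.
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # Weighted sum of the inputs, thresholded at zero.
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1 if total >= 0 else -1

    def train(self, inputs, target):
        # The error is 0 when the prediction is right, otherwise +2 or -2.
        error = target - self.predict(inputs)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, inputs)]
        self.bias += self.learning_rate * error


def fit(samples, epochs=50):
    """Train a fresh two-input perceptron by sweeping over the samples."""
    model = Perceptron(n_inputs=2)
    for _ in range(epochs):
        for inputs, target in samples:
            model.train(inputs, target)
    return model


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the target."""
    correct = sum(1 for inputs, target in samples
                  if model.predict(inputs) == target)
    return correct / len(samples)


# AND is linearly separable, XOR is not.
AND_SAMPLES = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR_SAMPLES = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

and_model = fit(AND_SAMPLES)
xor_model = fit(XOR_SAMPLES)

print(accuracy(and_model, AND_SAMPLES))  # 1.0
print(accuracy(xor_model, XOR_SAMPLES))  # always below 1.0
```

No choice of weights and bias can classify all four XOR points, so more epochs would not help; that is the failure `boolean_function.py` demonstrates.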
- to be implemented
- In the directory `Neural_Network` you will find 2 files: `matrix.py` and `neural_network.py`.
- `matrix.py` contains matrix operations (you can look at it if you are interested, but it is not necessary for an intuitive understanding).
- `neural_network.py` contains two important methods: `feed_forward(inputs)` and `train(inputs, targets)`. Both expect arrays as parameters. This is very similar to the perceptron learning algorithm; the added complexity comes from the linear algebra and calculus involved, because the weights and biases are stored in matrices and, in some cases, we need the derivative/gradient of those matrices. The necessary linear algebra comes from the `matrix.py` file.
  - An interesting feature in this class is the `map(func)` method. If you are coming from `Java/C/C++/JS`, note that in `Python` you can pass a function to another function, i.e. a function can be treated as a parameter. For example, if `func(x) = 2*x`, then `map(func)` is allowed and behaves like `map(2*x)`.
  - On the subject of Python, there is no such thing as `array` or `ArrayList`. There is just `list`, which behaves like an `ArrayList`.
  - You do not have to worry about double or single quotation marks; you can use either as long as you are consistent.
  - A common way of writing a `list` uses a `for each` loop, which in Python is the default `for` loop.
- With the `neural_network.py` file you can see that the `xor` problem, although simple to us, cannot be solved by PLA but is very easy for the NN (after 1000 iterations of training).
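
To make the `map(func)` idea and the role of a hidden layer concrete, here is a minimal sketch. It assumes a tiny stand-in `Matrix` class (the real `matrix.py` may look quite different), and the helper names `double`, `step`, `dot`, and `add` are hypothetical. The weights in `feed_forward` are picked by hand rather than learned with `train`, purely to show that one hidden layer is enough to represent XOR.

```python
class Matrix:
    """Hypothetical, minimal stand-in for the class in matrix.py."""

    def __init__(self, rows):
        # Copy the rows so the caller's lists are never modified.
        self.rows = [list(row) for row in rows]

    def map(self, func):
        # func is an ordinary function passed in as an argument;
        # it is applied to every element and a new Matrix is returned.
        return Matrix([[func(value) for value in row] for row in self.rows])

    def dot(self, other):
        # Plain row-by-column matrix product.
        columns = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col))
                        for col in columns] for row in self.rows])

    def add(self, other):
        # Element-wise sum of two matrices with the same shape.
        return Matrix([[a + b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(self.rows, other.rows)])


def double(x):
    return 2 * x


def step(x):
    # Threshold activation: 1 for positive input, otherwise 0.
    return 1 if x > 0 else 0


def feed_forward(inputs):
    """Two inputs -> two hidden units -> one output, hand-picked weights."""
    x = Matrix([[value] for value in inputs])  # column vector
    # Hidden unit 1 computes OR, hidden unit 2 computes NAND.
    hidden = Matrix([[1, 1], [-1, -1]]).dot(x).add(
        Matrix([[-0.5], [1.5]])).map(step)
    # The output unit computes AND of the two hidden units, which is XOR.
    output = Matrix([[1, 1]]).dot(hidden).add(Matrix([[-1.5]])).map(step)
    return output.rows[0][0]


print(Matrix([[1, 2], [3, 4]]).map(double).rows)  # [[2, 4], [6, 8]]
print([feed_forward(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Note that `double` and `step` are passed to `map` without parentheses: the function itself is the argument, not its result.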
- Before using Tensorflow we must try to understand what a tensor is. We have a strong understanding of scalars and vectors. Matrices are intuitive, but tensors can be a bit tricky. As we know, scalars are just real numbers, while vectors have both magnitude and direction. Note also that scalars and vectors follow different rules for multiplication, addition, etc. A matrix is a collection of vectors, or simply a table of rows and columns. With tensors we go a step further, which gives us a higher-order generalization.
- A tensor can be thought of as an array of matrices. For example, if we had two m by n matrices, we could store them in an array object and we would have a tensor. In theory, operations on tensors are well defined, but they are much harder to implement efficiently on a computer. Large matrix operations are already very expensive, and one can imagine how expensive tensor operations could get.
- The beauty of Tensorflow is that it is heavily optimized. It takes care of the memory management involved in tensor operations and moves tensors through a graph of operations, hence the "flow" in "Tensorflow".
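
The scalar, vector, matrix, tensor progression can be illustrated with plain nested lists. This is a sketch only: it does not use Tensorflow, the helpers `shape` and `add` are hypothetical names, and it assumes regular, non-empty nested lists.

```python
def shape(nested):
    """Return the dimensions of a regular, non-empty nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]  # descend one level
    return tuple(dims)


def add(a, b):
    """Element-wise addition of two equally shaped nested lists."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


scalar = 5.0
vector = [1.0, 2.0, 3.0]
matrix_a = [[1, 2, 3], [4, 5, 6]]        # 2 by 3
matrix_b = [[10, 20, 30], [40, 50, 60]]  # 2 by 3
tensor = [matrix_a, matrix_b]            # two 2 by 3 matrices in one array

print(shape(scalar))    # ()
print(shape(vector))    # (3,)
print(shape(matrix_a))  # (2, 3)
print(shape(tensor))    # (2, 2, 3)
print(add(tensor, tensor)[1][1][2])  # 120
```

Each extra level of nesting adds one dimension, and even a simple element-wise sum has to visit every entry, which hints at why optimized libraries matter for large tensors.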
- In the directory `Tensorflow` we have the following files: `NN_tf.py`, `iris_tf.py`.
- The `NN_tf.py` file contains an implementation of a Deep Neural Network with Tensorflow.
  - A Deep Neural Network is essentially a Neural Network with many hidden layers.
  - `NN_tf.py` trains a model on the mnist dataset, using 784 inputs, 3 hidden layers, and 10 outputs. The mnist dataset contains handwritten digits. With our model we are trying to predict which digit an image shows, hence the 10 outputs; the 784 inputs are the individual pixels of each 28 by 28 image. The choice of 3 hidden layers is arbitrary and was arrived at through experimentation.
- The `iris_tf.py` file is another application of the Tensorflow library. The dataset used here is another famous one, the iris dataset. It has 4 inputs and 3 outputs. The 4 inputs are the sepal width, sepal length, petal length, and petal width. The model tries to predict the type of iris flower (setosa, virginica, or versicolor) from these inputs.
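
To make the layer sizes concrete, the sketch below only does the bookkeeping for a fully connected network: which weight matrices are needed and how many trainable numbers they hold. It does not use Tensorflow and is not taken from `NN_tf.py`. The hidden layer width of 500 is an assumption chosen for illustration (the text above does not state it), and `layer_shapes`, `parameter_count`, and `one_hot` are hypothetical helper names.

```python
def layer_shapes(layer_sizes):
    """Weight-matrix shape (inputs, outputs) for each pair of adjacent layers."""
    return [(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def parameter_count(layer_sizes):
    """Total number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(layer_sizes))


def one_hot(label, n_classes):
    """Encode a class index as a list with a single 1, one slot per output."""
    return [1 if index == label else 0 for index in range(n_classes)]


HIDDEN = 500  # assumed width of each hidden layer, not from the source

# 28 * 28 = 784 pixel inputs, 3 hidden layers, 10 digit outputs.
mnist_layers = [28 * 28, HIDDEN, HIDDEN, HIDDEN, 10]

print(layer_shapes(mnist_layers))
# [(784, 500), (500, 500), (500, 500), (500, 10)]
print(parameter_count(mnist_layers))  # 898510
print(one_hot(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# The iris model has 3 outputs, one per flower type.
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]
print(one_hot(IRIS_CLASSES.index("versicolor"), len(IRIS_CLASSES)))  # [0, 1, 0]
```

Changing `HIDDEN` or the number of hidden entries in the list shows how quickly the parameter count grows, which is one reason the number of hidden layers is found through experimentation.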
- to be implemented