- Understand data visualization capabilities in Python
- Introduction to NumPy
- Introduction to Pandas
poetry env use $(pyenv which python)
poetry add matplotlib seaborn
poetry run jupyter lab
- The basic statistical functions like `mean`, `median`, `mode`, `sum`, and `count` are all types of "reduce" functions because they accept an iterable and return a single value.
- Lambda functions in Python are expression lambdas only; unlike Java, Python does not support statement lambdas.
- `map`, `filter`, `reduce` - similar to how these concepts are used in Java; `map` and `filter` return an iterator, while `functools.reduce` returns a single value
- What is the difference between list comprehensions and generator expressions?
- How to perform a case-insensitive comparison of strings? (hint: `str.casefold()`)
- Recollect how dictionary comprehension works. See - Dictionary comprehension
- How to represent an empty set? Why can't we represent an empty set as `{}`, similar to how we represent an empty list using `[]`?
- Dictionary comprehensions
- Recollect Set mathematical operations (union, intersection, difference, symmetric difference), see Set Math
- What are the basic stat functions in Python (mean, median, mode, sum, count, range)? See - Stat functions
- How can you chart a bar plot for rolls of a six-sided die? See - Static Visualizations
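Several of the questions above can be checked directly in the interpreter. The following is a minimal sketch using only the standard library; the sample values and variable names are made up for illustration.

```python
from functools import reduce

numbers = [3, 1, 4, 1, 5, 9]  # illustrative sample data

# map and filter are lazy: they return iterators, not lists.
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce folds the whole iterable down to a single value.
total = reduce(lambda acc, n: acc + n, numbers, 0)

# A list comprehension builds the full list eagerly;
# a generator expression produces items one at a time on demand.
eager = [n * 2 for n in numbers]
lazy = (n * 2 for n in numbers)
first_lazy = next(lazy)

# Case-insensitive comparison via str.casefold().
same = "Straße".casefold() == "STRASSE".casefold()

# {} is an empty dict, so an empty set has to be written as set().
empty_dict = {}
empty_set = set()

# Dictionary comprehension: map each distinct number to its square.
square_of = {n: n * n for n in numbers}
```

Note that `lazy` can only be consumed once, whereas `eager` can be iterated as often as needed.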
- seaborn
- matplotlib
- sum
- count
- range
- minimum
- maximum
- mean
- median
- mode: most frequent occurrence
Dispersion - describes the spread of the given numbers, i.e. how close together or far apart they are
- Standard deviation: square root of the variance
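As a rough illustration of the measures listed above, here is a small sketch using Python's built-ins and the standard `statistics` module. The `rolls` data is invented, and the population variants (`pvariance`, `pstdev`) are an assumption about which form of variance is intended.

```python
import math
import statistics

rolls = [1, 3, 4, 2, 6, 5, 3, 4, 5, 3]  # made-up sample of die rolls

total = sum(rolls)
count = len(rolls)
minimum = min(rolls)
maximum = max(rolls)
value_range = maximum - minimum

mean = statistics.mean(rolls)
median = statistics.median(rolls)
mode = statistics.mode(rolls)  # most frequent value

# Dispersion: population variance and its square root, the standard deviation.
variance = statistics.pvariance(rolls)
std_dev = statistics.pstdev(rolls)
std_dev_by_hand = math.sqrt(variance)
```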
- NumPy (Numerical Python) - most two-dimensional array work in Python uses NumPy
- The pandas library is built on NumPy
NumPy has a high-performance array type called `ndarray`, which is said to be about 35% more performant than Python's built-in list.
See numpy_ndarray
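To make the `ndarray` concrete, the sketch below creates a small array and inspects its basic attributes. The values and variable names are illustrative, and the dtype is set explicitly so the byte sizes do not depend on the platform default.

```python
import numpy as np

# Illustrative 2 x 3 array with an explicit element type.
grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

element_type = grid.dtype          # type of every element
dimensions = grid.ndim             # number of axes
rows_cols = grid.shape             # length of each axis
element_count = grid.size          # total number of elements
bytes_per_element = grid.itemsize  # bytes used by one element

# flatten() returns an independent one-dimensional copy.
flat_copy = grid.flatten()
flat_copy[0] = -1                  # grid itself is not affected

# .flat is an iterator over the array's own elements.
flat_values = list(grid.flat)
grid.flat[0] = 10                  # writes through to grid
```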
- How to find out the datatype of a numpy array?
- What are dimensions in ndarray?
- What is the `shape` attribute of an `ndarray`?
- What is the difference between `size` and `itemsize` in an `ndarray`?
- How to flatten an `ndarray`? What is the difference between `flatten()` and the `.flat` attribute?
- Explain the `arange` function. How similar or different is it from `range`?
- How to generate floating point array ranges using numpy?
- What is the purpose of `reshape` in numpy?
- How does array broadcasting work in numpy?
- What is the use of the `linspace` function in numpy?
- Does numpy have built-in functions for stats? See - Numpy stats
- How to find the average by row or column using numpy?
- What are universal functions in numpy, and how are they used? See - Numpy universal functions
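The range, reshape, broadcasting, and stats questions above could be explored with a sketch like the following. The arrays are illustrative; only widely used NumPy calls (`arange`, `linspace`, `reshape`, `mean`, `sqrt`) are assumed.

```python
import numpy as np

# arange works like range but returns an ndarray and accepts float steps.
ints = np.arange(0, 10, 2)
floats = np.arange(0.0, 1.0, 0.25)

# linspace takes the number of points and includes the end value by default.
points = np.linspace(0.0, 1.0, num=5)

# reshape gives the same six values a 2 x 3 layout.
table = np.arange(1, 7).reshape(2, 3)

# Broadcasting: the scalar and the 1-D array are stretched to match table.
scaled = table * 10
shifted = table + np.array([100, 200, 300])

# axis=0 averages down each column, axis=1 averages across each row.
col_means = table.mean(axis=0)
row_means = table.mean(axis=1)

# A universal function (ufunc) is applied element by element.
roots = np.sqrt(np.array([1, 4, 9]))
```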
See - Numpy Array Operations
- How to select specific rows in an `ndarray`?
- How to select specific columns in an `ndarray`?
- How to slice data from an `ndarray`?
- How to transpose an `ndarray`?
- How do horizontal and vertical array stacking work?
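One way to try out the selection, slicing, transpose, and stacking questions above is sketched here. The `grades` array is made-up sample data.

```python
import numpy as np

grades = np.array([[87, 96, 70],
                   [100, 87, 90],
                   [94, 77, 90]])  # illustrative data: 3 rows x 3 columns

# Rows: a single index, or a list of indices for several rows.
second_row = grades[1]
rows_0_and_2 = grades[[0, 2]]

# Columns: ":" keeps every row, the second index picks the column(s).
first_col = grades[:, 0]
cols_0_and_2 = grades[:, [0, 2]]

# Slicing both axes at once; the end index is exclusive.
block = grades[0:2, 1:3]

# Transpose swaps rows and columns.
transposed = grades.T

# hstack joins side by side (more columns), vstack joins top to bottom (more rows).
wide = np.hstack((grades, grades))
tall = np.vstack((grades, grades))
```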
See - Numpy References
- What is the difference between `copy` and `view` in numpy?
- How does `deepcopy` work?
- What is the difference between `flatten` and `ravel`?
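The differences asked about above can be observed by modifying an array and checking which other objects change. This is a minimal sketch with illustrative names; it relies on `ravel` returning a view for a contiguous array, which is the usual case but not guaranteed for every memory layout.

```python
import copy
import numpy as np

original = np.arange(1, 7).reshape(2, 3)

shared = original.view()        # new array object, same underlying data
independent = original.copy()   # separate data
deep = copy.deepcopy(original)  # also separate data

original[0, 0] = 99             # visible through the view only

# flatten always copies; ravel returns a view when it can.
flattened = original.flatten()
raveled = original.ravel()
raveled[1] = -1                 # changes original[0, 1] as well
```

`np.shares_memory(a, b)` is a convenient way to confirm whether two arrays are backed by the same data.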
- Created by Wes McKinney
- pandas - "panel data", i.e. data measurements taken over time
- 2 key data structures - `Series` and `DataFrame`
- `Series` is used for one-dimensional collections, `DataFrame` for two-dimensional data
- SQL style data manipulations
pandas uses NumPy under the hood, so a good grounding in NumPy helps!
Later, read Wes McKinney's book - Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython
See - Pandas Intro
- How to create a pandas `Series` with custom indices?
- Does pandas have built-in stats methods, and how does it offer more convenience than using numpy?
- What happens when we apply string functionality to a `Series`? Does it modify the original `Series`?
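The `Series` questions above might be explored along these lines. The labels and values are invented for illustration.

```python
import pandas as pd

# A Series with custom index labels instead of the default 0, 1, 2.
grades = pd.Series([87, 100, 94], index=["Wally", "Eva", "Sam"])

eva_grade = grades["Eva"]      # access by label
average = grades.mean()        # built-in stats method
summary = grades.describe()    # count, mean, std, min, quartiles, max in one call

# String methods live under .str and return a new Series;
# the Series they are called on is left unchanged.
names = pd.Series(["Hammer", "Saw", "Wrench"])
upper = names.str.upper()
has_a = names.str.contains("a")
```

Compared with calling individual NumPy functions, `describe()` gathers the common statistics into a single labeled result.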
- Enhanced two-dimensional array with support for missing data. This is a key benefit, as a NumPy array is homogeneous (one dtype for all elements), while each DataFrame column can have its own dtype.
- Additional operations and capabilities that are useful in data science tasks.
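A small sketch of the two points above: columns of different types living side by side, and missing values being tolerated by the stats methods. The `readings` frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

readings = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Chicago"],  # text column
        "temp_c": [31.5, np.nan, 24.0],           # float column with a missing value
        "sensors": [3, 2, 4],                     # integer column
    }
)

column_types = readings.dtypes                    # one dtype per column

missing_count = readings["temp_c"].isna().sum()   # how many values are missing
mean_temp = readings["temp_c"].mean()             # NaN is skipped by default

# Replace the missing temperature with the column mean, in a new DataFrame.
filled = readings.fillna({"temp_c": mean_temp})
```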
Practice: Pandas Dataframes
- How do pandas `DataFrame` stats compare to numpy or `pandas.Series` stats?
- How to define the precision for stats returned by the pandas `describe` operation?
- How to assign custom index labels while creating a dataframe?
- What is the difference between the `loc` and `iloc` attributes in dataframes?
- Explain how slicing by row and column works in pandas dataframes.
- How to select specific rows or columns?
- How to select all rows while slicing data?
- How does boolean indexing work in dataframes? What value is placed in cells that don't match the given condition?
- Explain `at` and `iat`. When are they useful?
- What is the difference between using `T` vs the `transpose()` method for dataframes / numpy?
- How to sort dataframes?
- How does transpose work?
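Most of the DataFrame questions above can be tried on one small frame. The sketch below uses invented grades; rounding the `describe()` result is just one way to control precision (the `display.precision` option is another, affecting only how values are printed).

```python
import pandas as pd

# Custom row labels are assigned through the index argument.
grades = pd.DataFrame(
    {"Wally": [87, 96, 70], "Eva": [100, 87, 90], "Sam": [94, 77, 90]},
    index=["Test1", "Test2", "Test3"],
)

# loc selects by label, iloc by integer position.
by_label = grades.loc["Test2"]
by_position = grades.iloc[1]

# Label slices include the end label; position slices exclude the end index.
label_slice = grades.loc["Test1":"Test2", ["Eva", "Sam"]]
pos_slice = grades.iloc[0:2, 1:3]
all_rows_one_col = grades.loc[:, "Eva"]

# at / iat read or write a single cell by label / position.
cell = grades.at["Test2", "Eva"]
same_cell = grades.iat[1, 1]

# Boolean indexing keeps matching cells; the rest become NaN.
high = grades[grades >= 90]

# Column-wise stats, rounded to two decimals.
summary = grades.describe().round(2)

# T and transpose() give the same result: rows and columns swapped.
transposed = grades.T

# Sorting: rows by index label, rows by one column, columns by one row.
rows_desc = grades.sort_index(ascending=False)
rows_by_eva = grades.sort_values(by="Eva")
cols_by_test1 = grades.sort_values(by="Test1", axis=1, ascending=False)
```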