8 Python libraries for data science
1. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
In the Python interpreter:
>>> import numpy as np
>>> a = np.arange(15).reshape(3,5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
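Beyond creating arrays, the high-level mathematical functions mentioned above operate on whole arrays at once, without explicit Python loops. A minimal sketch using the same 3x5 array (the variable names are just for illustration):

```python
import numpy as np

# Same 3x5 array as in the interpreter session: the integers 0..14, row by row
a = np.arange(15).reshape(3, 5)

col_sums = a.sum(axis=0)   # sum down each column -> [15 18 21 24 27]
row_sums = a.sum(axis=1)   # sum across each row  -> [10 35 60]
overall_mean = a.mean()    # mean of all 15 values -> 7.0
doubled = a * 2            # element-wise arithmetic, no loop needed
gram = a @ a.T             # (3x5) @ (5x3) matrix product gives a 3x3 matrix

print(col_sums, row_sums, overall_mean)
print(doubled)
print(gram)
```

The `axis` argument picks the dimension that is collapsed: `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns (one result per row).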
2. scikit-learn
scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities. It features algorithms such as support vector machines, random forests, and k-nearest neighbours, and it is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
>>> from sklearn.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier(random_state=0)
>>> x = [[1, 2, 3],
...      [11, 12, 13]]
>>> y = [0, 1]
>>> clf.fit(x, y)  # releases before 0.23 print every parameter here
RandomForestClassifier(random_state=0)
>>> clf.predict(x)
array([0, 1])
>>> clf.predict([[4, 5, 6], [13, 14, 15]])
array([0, 1])
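The description above also lists data preprocessing, model selection and evaluation. A minimal sketch of how those pieces fit together, assuming the small Iris dataset that ships with scikit-learn (so nothing is downloaded) and a support vector machine as the model; the split size, `C` and `random_state` values are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 150 flowers, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for a final check; stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing + model in one estimator: the scaler is fit only on training data
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Model selection: 5-fold cross-validation on the training portion
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Evaluation: fit on all training data, score on the untouched test set
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"cross-validation accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
print(f"held-out test accuracy:    {test_accuracy:.3f}")
```

Wrapping the scaler and the SVM in one pipeline means cross-validation re-fits the scaler inside each fold, so nothing from the validation folds leaks into preprocessing. SVMs are sensitive to feature scale, which is why the scaler is there; a random forest like the one in the session would not need it.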
3. pandas
When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. It will help you explore, clean, and process your data. In pandas, a data table is called a DataFrame.
https://pandas.pydata.org/getting_started.html
The two primary components of pandas are the Series and the DataFrame. A Series is a one-dimensional labelled array; a DataFrame is a two-dimensional table whose columns are Series sharing a common index.
>>> import pandas as pd
>>> data = {
... 'naveen':[50,40,30,20],
... 'John':[23,50,34,22]
... }
>>> marks = pd.DataFrame(data)
>>> marks
   naveen  John
0      50    23
1      40    50
2      30    34
3      20    22
>>> marks = pd.DataFrame(data, index=['English', 'Maths', 'Science', 'History'])
>>> marks
         naveen  John
English      50    23
Maths        40    50
Science      30    34
History      20    22
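A few everyday operations on the same `marks` table, as a minimal sketch: selecting a column returns a Series, and summaries, derived columns and filters work on whole columns at once. The `best` column and the 10-mark threshold are made up for illustration.

```python
import pandas as pd

data = {
    'naveen': [50, 40, 30, 20],
    'John': [23, 50, 34, 22],
}
marks = pd.DataFrame(data, index=['English', 'Maths', 'Science', 'History'])

# Selecting one column gives a Series that keeps the DataFrame's index
naveen = marks['naveen']
print(type(naveen).__name__)  # Series

# Summaries: per student (down the columns) and per subject (across the rows)
totals = marks.sum()               # naveen 140, John 129
subject_avg = marks.mean(axis=1)   # e.g. English 36.5

# Derive a new column from existing ones
marks['best'] = marks[['naveen', 'John']].max(axis=1)

# Filter rows with a boolean mask: subjects where the two scores differ by less than 10
close = marks[(marks['naveen'] - marks['John']).abs() < 10]

print(totals)
print(subject_avg)
print(marks)
print(close)
```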
4. SymPy
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible.
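>>> from sympy import solve, Eq, symbols
>>> x, y, z, d = symbols('x y z d')
>>> eq1 = Eq(x + y, 8)
>>> eq2 = Eq(x + z, 13)
>>> eq3 = Eq(z - d, 6)
>>> eq4 = Eq(y + d, 8)
>>> # Pass the equations as one list; extra positional arguments are read as symbols
>>> solve([eq1, eq2, eq3, eq4], (x, y, z, d))  # solution: x = 7/2, y = 9/2, z = 19/2, d = 7/2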
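Solving equations is only one part of a computer algebra system. A minimal sketch of a few other symbolic operations SymPy provides; results stay exact (fractions, symbols) rather than floating-point approximations:

```python
from sympy import Rational, cos, diff, expand, factor, integrate, simplify, sin, symbols

x = symbols('x')

expanded = expand((x + 1)**2)                # x**2 + 2*x + 1
factored = factor(x**2 - 1)                  # (x - 1)*(x + 1)
derivative = diff(x**3 * sin(x), x)          # product rule: x**3*cos(x) + 3*x**2*sin(x)
antiderivative = integrate(2 * x, x)         # x**2 (SymPy omits the constant of integration)
identity = simplify(sin(x)**2 + cos(x)**2)   # 1
exact_sum = Rational(1, 3) + Rational(1, 6)  # exactly 1/2, no rounding error

for value in (expanded, factored, derivative, antiderivative, identity, exact_sum):
    print(value)
```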
5. Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension, NumPy.
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> plt.plot(x,x,label='linear')
[<matplotlib.lines.Line2D object at 0x7fe300680910>]
>>> plt.legend()
<matplotlib.legend.Legend object at 0x7fe2f38f6450>
>>> plt.show()
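The session above opens an interactive window. In a script, or on a server without a display, a common pattern is to use the object-oriented interface and save the figure to a file instead. A minimal sketch, assuming the non-interactive `Agg` backend and a hypothetical output file name `simple_plot.png`:

```python
import os

import matplotlib
matplotlib.use("Agg")  # render to files only; no display window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One figure with one set of axes; methods on `ax` replace the plt.* calls
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Simple plot')
ax.legend()

out_path = "simple_plot.png"  # hypothetical file name
fig.savefig(out_path, dpi=100)
print(f"saved {out_path} ({os.path.getsize(out_path)} bytes)")
```

Holding on to `fig` and `ax` makes it explicit which figure each call modifies, which matters once a script draws more than one plot.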
6. TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.
https://www.tensorflow.org/api_docs/python
https://machinelearningmastery.com/introduction-python-deep-learning-library-tensorflow/
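Differentiable programming means TensorFlow can compute gradients of a computation it has recorded. A minimal sketch, assuming TensorFlow 2.x with eager execution (the default): `tf.GradientTape` records operations, first for a single derivative and then to drive plain gradient descent that fits a line to made-up points on y = 2x + 1. The learning rate and step count are arbitrary choices for this toy problem.

```python
import tensorflow as tf

# 1) A single gradient: d/dx (x**2 + 2x) at x = 3 is 2*3 + 2 = 8
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2 + 2.0 * x
dy_dx = tape.gradient(y, x)
print("dy/dx at x=3:", float(dy_dx.numpy()))

# 2) Gradient descent: fit w and b so that w*xs + b matches ys (toy data on y = 2x + 1)
xs = tf.constant([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0
w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.05

for step in range(500):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * xs + b - ys) ** 2)  # mean squared error
    dw, db = tape.gradient(loss, [w, b])
    w.assign_sub(learning_rate * dw)  # w <- w - lr * dL/dw
    b.assign_sub(learning_rate * db)  # b <- b - lr * dL/db

print(f"w = {float(w.numpy()):.3f}, b = {float(b.numpy()):.3f}, loss = {float(loss.numpy()):.6f}")
```

Keras optimizers and `model.fit` automate this same loop; writing it by hand shows what "differentiable" buys you.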
7. Keras: The Python Deep Learning library
Keras is an open-source neural-network library written in Python. Earlier versions could run on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML; Keras 3 can run on JAX, TensorFlow, or PyTorch.
There are two main types of models available in Keras: the Sequential model, and the Model class used with the functional API.
https://keras.io/
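The two model types mentioned above can describe the same network. A minimal sketch, assuming Keras as bundled with TensorFlow 2.x (`from tensorflow import keras`) and random placeholder data (4 features, 3 classes) that exists only to exercise the calls; the layer sizes are arbitrary:

```python
import numpy as np
from tensorflow import keras

# Sequential model: a plain stack of layers, each feeding the next
seq_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Functional API: call layers on tensors explicitly, then wrap inputs/outputs in keras.Model
inputs = keras.Input(shape=(4,))
hidden = keras.layers.Dense(16, activation="relu")(inputs)
outputs = keras.layers.Dense(3, activation="softmax")(hidden)
func_model = keras.Model(inputs=inputs, outputs=outputs)

# Placeholder data: 64 random samples with random integer labels 0..2
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = rng.integers(0, 3, size=64)

for model in (seq_model, func_model):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=2, verbose=0)

# Each row of the prediction is a probability distribution over the 3 classes
preds = func_model.predict(X[:5], verbose=0)
print(preds.shape, seq_model.count_params(), func_model.count_params())
```

The Sequential model suits a straight stack of layers; the functional API is the one to reach for when a network has several inputs or outputs, shared layers, or branches.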
8. SciPy
https://www.scipy.org/getting-started.html
The SciPy ecosystem
Scientific computing in Python builds upon a small core of packages:
- Python, a general purpose programming language. It is interpreted and dynamically typed and is very well suited for interactive work and quick prototyping, while being powerful enough to write large applications in.
- NumPy, the fundamental package for numerical computation. It defines the numerical array and matrix types and basic operations on them.
- The SciPy library, a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, and much more.
- Matplotlib, a mature and popular plotting package that provides publication-quality 2-D plotting, as well as rudimentary 3-D plotting.
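The SciPy library item above mentions optimization and statistics among its toolboxes. A minimal sketch of three subpackages, `scipy.optimize`, `scipy.integrate` and `scipy.stats`. The Rosenbrock test function and its gradient ship with SciPy; reusing the marks from the pandas example in a t-test is purely illustrative, since four values per group is far too few for a meaningful result.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Optimization: minimize the Rosenbrock function, whose minimum is at (1, 1)
result = optimize.minimize(
    optimize.rosen,            # objective function
    x0=np.array([-1.2, 1.0]),  # classic hard starting point
    method="BFGS",
    jac=optimize.rosen_der,    # analytic gradient speeds up convergence
)
print("minimum found at:", result.x)

# Numerical integration: the integral of sin(x) over [0, pi] is exactly 2
area, abs_err = integrate.quad(np.sin, 0, np.pi)
print("area:", area, "estimated error:", abs_err)

# Statistics: two-sample t-test on the two students' marks
t_stat, p_value = stats.ttest_ind([50, 40, 30, 20], [23, 50, 34, 22])
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```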