Pandas in Python

Pandas is an open-source Python package that is widely used for data analysis, data science, and machine learning tasks.

Install and import

->pip install pandas

Because pandas is used so often, it is usually imported under a shorter alias:

import pandas as pd

Data Structures in Pandas

The two primary components of pandas are the Series and the DataFrame.

A Series is essentially a single column, and a DataFrame is a two-dimensional table made up of a collection of Series that share the same index.

Series vs DataFrame
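
As a minimal sketch of that relationship (the variable names and sample values here are chosen only for illustration), two Series can be combined column-wise into a DataFrame, and selecting one column gives a Series back:

```python
import pandas as pd

# Two Series sharing the default integer index (illustrative data).
names = pd.Series(["ram", "som", "ravi"], name="names")
marks = pd.Series([90, 80, 85], name="marks")

# Combining the Series column-wise gives a two-dimensional DataFrame.
table = pd.concat([names, marks], axis=1)

# Selecting a single column returns a Series again.
column = table["marks"]

print(type(table).__name__)   # DataFrame
print(type(column).__name__)  # Series
print(table.shape)            # (3, 2)
```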



Pandas Series

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet.



>>> import pandas as pd

>>> a=pd.Series([1,2,3,4,5,6,7,8])

>>> a

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8

dtype: int64

>>> b=pd.Series([1,2,3,4,5,6,7,'hello'])

>>> b

0        1
1        2
2        3
3        4
4        5
5        6
6        7
7    hello


dtype: object  #mixed values (here integers and a string) are stored with the generic object dtype


>>> a.ndim
        1
>>> b.ndim
        1
>>> a.size
        8
>>> b.size
        8
>>> a.shape
        (8,)
>>> b.shape
        (8,)
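
The Series above use the default integer index. As a small sketch (the labels "x", "y", and "z" are made up for illustration), the index can also be supplied explicitly, after which values can be looked up by label or by position:

```python
import pandas as pd

# Hypothetical labels chosen for illustration.
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(s.index.tolist())  # ['x', 'y', 'z']
print(s["y"])            # 20, lookup by label
print(s.iloc[1])         # 20, lookup by position
```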

Dictionary in Series

>>> c=pd.Series({'chennai':12,'delhi':35,'kolkaata':56,'banglore':32})

>>> c['delhi']  #Accessing through key

        35

>>> c.iloc[1]  #Accessing by position (plain c[1] is deprecated for a label-indexed Series in recent pandas)

        35

>>> c.index  

        Index(['chennai', 'delhi', 'kolkaata', 'banglore'], dtype='object')



>>> c[c>10]   #this will give the values of c which are greater than 10

        chennai     12
        delhi       35
        kolkaata    56
        banglore    32
        dtype: int64


>>> d=c

>>> c+d  #this will sum the values of respective keys

        chennai      24
        delhi        70
        kolkaata    112
        banglore     64
        dtype: int64
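
In the example above both Series have identical keys. As a hedged sketch of what happens when they do not (the second Series `e` and its values are invented for illustration), pandas aligns on the index labels, and a label missing from either side produces NaN unless a fill value is given:

```python
import pandas as pd

c = pd.Series({"chennai": 12, "delhi": 35})
e = pd.Series({"delhi": 5, "mumbai": 7})  # hypothetical second Series

# Labels present in only one Series give NaN in the result.
total = c + e

# Series.add with fill_value treats a missing label as 0 instead.
filled = c.add(e, fill_value=0)

print(total["delhi"])    # 40.0
print(filled["mumbai"])  # 7.0
```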


Data Frames in Pandas

--> DataFrame Creation:

>>> data=pd.DataFrame({
...     'names':['ram','som','ravi'],
...     'marks':[90,80,85],
...     'city':['chennai','chennai','chennai']})

>>> data

  names  marks     city
0   ram     90  chennai
1   som     80  chennai
2  ravi     85  chennai

>>> data['names']

0     ram
1     som
2    ravi
Name: names, dtype: object

Applying functions

It is possible to iterate over a DataFrame or Series as you would with a list, but doing so — especially on large datasets — is very slow.

An efficient alternative is to apply() a function to the dataset. For example, we could use a function to convert marks of 85 or greater to a string value of "good" and the rest to "bad", and use these transformed values to create a new column.

First we would create a function that, when given a mark, determines if it's good or bad:

def rating_function(x):
    if x >= 85:
        return "good"
    else:
        return "bad"

Now we want to send the entire marks column through this function, which is what apply() does:

>>> data["grade"]=data["marks"].apply(rating_function)

>>> data

  names  marks     city grade
0   ram     90  chennai  good
1   som     80  chennai   bad
2  ravi     85  chennai  good
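
For a simple threshold like this one, a whole-column comparison is a common alternative to apply() and is often faster on large data. The following is only a sketch under that assumption; it rebuilds a small frame and checks that both approaches give the same labels:

```python
import pandas as pd

data = pd.DataFrame({"names": ["ram", "som", "ravi"], "marks": [90, 80, 85]})

def rating_function(x):
    # Same rule as above: 85 or more counts as "good".
    return "good" if x >= 85 else "bad"

# Element by element, through apply().
via_apply = data["marks"].apply(rating_function)

# Vectorized: compare the whole column at once, then map booleans to labels.
via_mask = (data["marks"] >= 85).map({True: "good", False: "bad"})

print(via_apply.tolist())  # ['good', 'bad', 'good']
print(via_mask.tolist())   # ['good', 'bad', 'good']
```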


Dataframe/Series.head() method

The head() method returns the first n rows (5 by default) of a DataFrame or Series.

>>> data.head(2)

  names  marks     city grade
0   ram     90  chennai  good
1   som     80  chennai   bad
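
A short sketch of the default behaviour on a Series (the eight-value Series mirrors the one used earlier): calling head() without an argument returns five rows, and passing n changes that.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

first_five = a.head()   # n defaults to 5
first_two = a.head(2)

print(first_five.tolist())  # [1, 2, 3, 4, 5]
print(first_two.tolist())   # [1, 2]
```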


Pandas Dataframe.describe() method

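The describe() method summarizes numeric data with basic statistics: count, mean, standard deviation, minimum, the 25%, 50%, and 75% percentiles, and maximum. On a DataFrame with mixed column types it reports only the numeric columns by default.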

>>> data.describe()

       marks
count    3.0
mean    85.0
std      5.0
min     80.0
25%     82.5
50%     85.0
75%     87.5
max     90.0
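
The result of describe() is itself a pandas object, so individual statistics can be read by label. A minimal sketch on a Series holding the same three marks:

```python
import pandas as pd

marks = pd.Series([90, 80, 85], name="marks")

# For a Series, describe() returns a Series indexed by statistic name.
summary = marks.describe()

print(summary["mean"])  # 85.0
print(summary["50%"])   # 85.0
print(summary["max"])   # 90.0
```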

DataFrames Concatenation


The concat() function does all of the heavy lifting of concatenating objects along an axis, while optionally applying set logic (union or intersection) to the indexes on the other axes.

# Creating first dataframe
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index = [0, 1, 2, 3])
  
# Creating second dataframe
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index = [4, 5, 6, 7])
  
# Creating third dataframe
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                    index = [8, 9, 10, 11])
  
# Concatenating the dataframes
pd.concat([df1, df2, df3])

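Output: a single DataFrame with the twelve rows of df1, df2, and df3 stacked in order (index 0 to 11) and the columns A to D.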
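
The "set logic" mentioned above shows up when the frames do not share all columns. The sketch below uses two small made-up frames (`top` and `bottom`) to contrast the default union of columns with `join="inner"`, and shows `ignore_index=True` for renumbering the rows:

```python
import pandas as pd

# Hypothetical frames that only share column B.
top = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
bottom = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]}, index=[0, 1])

# Default join="outer": union of the columns, missing cells become NaN.
union = pd.concat([top, bottom])

# join="inner": only columns present in every frame are kept.
shared = pd.concat([top, bottom], join="inner")

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(list(union.columns))        # ['A', 'B', 'C']
print(list(shared.columns))       # ['B']
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```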




DataFrame.loc[]


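DataFrame.loc[] retrieves rows (and, optionally, columns) by index label rather than by position. It returns a Series for a single label and a DataFrame for a list or slice of labels, and it raises a KeyError if a single requested label does not exist. Unlike ordinary Python slices, a label slice such as 1:2 includes both endpoints.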



>>> d={'names':['ram','som','kumar','bala','arun'],'marks':[90,90,90,78,56],'sections':['A','B','A','B','A']}

>>> data=pd.DataFrame(d)

>>> data

   names  marks sections
0    ram     90        A
1    som     90        B
2  kumar     90        A
3   bala     78        B
4   arun     56        A

>>> data.loc[1:2]

   names  marks sections
1    som     90        B
2  kumar     90        A
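
With the default integer index, labels and positions look the same, which can hide what loc is doing. As an illustrative sketch (using names as the index is an assumption made only for this example), loc also works with string labels, label slices, and boolean conditions:

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

# Use the names as row labels (assumption for illustration).
by_name = data.set_index("names")

# A single row label and a single column label give one value.
kumar_marks = by_name.loc["kumar", "marks"]

# A label slice includes both endpoints.
som_to_bala = by_name.loc["som":"bala"]

# A boolean condition selects rows; the list selects columns.
section_a = data.loc[data["sections"] == "A", ["names", "marks"]]

print(kumar_marks)                  # 90
print(som_to_bala.index.tolist())   # ['som', 'kumar', 'bala']
print(section_a["names"].tolist())  # ['ram', 'kumar', 'arun']
```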



iloc:

“iloc” in pandas is used to select rows and columns by integer position, in the order that they appear in the DataFrame.

>>> data.iloc[0]

names       ram
marks        90
sections      A
Name: 0, dtype: object
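
A few more positional selections, as a sketch on the same data: unlike loc, an iloc slice excludes its end position, negative positions count from the end, and a row position can be paired with a column position.

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

first_two = data.iloc[0:2]         # positions 0 and 1; the end is excluded
last_row = data.iloc[-1]           # negative positions count from the end
cell = data.iloc[1, 0]             # row position 1, column position 0
block = data.iloc[[0, 4], [0, 1]]  # rows 0 and 4, first two columns

print(first_two["names"].tolist())  # ['ram', 'som']
print(last_row["names"])            # arun
print(cell)                         # som
print(block.shape)                  # (2, 2)
```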



DataFrames Merge



Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

# Python program to merge
# dataframes using pandas
  
# Dataframe created
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
  
right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
                        
# Merging the dataframes                      
pd.merge(left, right, how ='inner', on ='Key')

Output: a DataFrame with one row for each key K0 to K3 and the columns Key, A, B, C, and D.
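
In the example above both frames contain exactly the same keys, so every join type would give the same rows. The sketch below uses smaller made-up frames whose keys only partly overlap to show how the `how` argument changes the result:

```python
import pandas as pd

# Hypothetical frames: K0 appears only on the left, K3 only on the right.
left = pd.DataFrame({"Key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"Key": ["K1", "K2", "K3"], "C": ["C1", "C2", "C3"]})

# inner: only keys found in both frames.
inner = pd.merge(left, right, how="inner", on="Key")

# left: every key from the left frame; unmatched cells become NaN.
left_join = pd.merge(left, right, how="left", on="Key")

# outer: the union of keys from both frames.
outer = pd.merge(left, right, how="outer", on="Key")

print(inner["Key"].tolist())      # ['K1', 'K2']
print(left_join["Key"].tolist())  # ['K0', 'K1', 'K2']
print(sorted(outer["Key"]))       # ['K0', 'K1', 'K2', 'K3']
```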



Pandas GroupBy:

Groupby refers to a process involving one or more of the following steps:

  • Splitting : The data is split into groups by applying some condition to the dataset.
  • Applying : A function is applied to each group independently.
  • Combining : The results of applying the function are combined into a single data structure.

>>> d={
...     'names':['ram','som','kumar','bala','arun'],
...     'marks':[90,90,90,78,56],
...     'sections':['A','B','A','B','A']}

>>> data=pd.DataFrame(d)

>>> data
   names  marks sections
0    ram     90        A
1    som     90        B
2  kumar     90        A
3   bala     78        B
4   arun     56        A

>>> data.groupby('sections')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002B888579D88>

>>> data.groupby('sections').groups
{'A': [0, 2, 4], 'B': [1, 3]}

>>> data.groupby('sections').sum(numeric_only=True)  #sum only the numeric columns (marks)

           marks
sections       
A           236
B           168
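
The three steps listed above can be written out by hand to see what groupby does in a single call. This is only an illustrative sketch (the intermediate names `pieces`, `means`, and `combined` are made up), using the mean of marks per section:

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

# Splitting: iterating over a groupby yields (key, sub-frame) pairs.
pieces = {section: group for section, group in data.groupby("sections")}

# Applying: compute the mean mark of each group independently.
means = {section: group["marks"].mean() for section, group in pieces.items()}

# Combining: collect the per-group results into one Series.
combined = pd.Series(means, name="marks")

# The same computation as a single groupby call.
direct = data.groupby("sections")["marks"].mean()

print(sorted(pieces))  # ['A', 'B']
print(combined["B"])   # 84.0
print(direct["B"])     # 84.0
```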