
Pandas in Python

Pandas is an open-source Python package that is widely used for data science, data analysis, and machine learning tasks.

Install and import

->pip install pandas

Pandas is usually imported under a shorter alias, since it is used so often:

import pandas as pd

Data Structures in Pandas

The two primary components of pandas are the Series and the DataFrame.

A Series is essentially a single column, and a DataFrame is a two-dimensional table made up of a collection of Series.
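To see how the two relate, here is a minimal sketch. The names df and col and the sample values are made up for illustration; the point is only that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A small two-column DataFrame (illustrative data only)
df = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]})

# Selecting a single column returns a Series
col = df["x"]

print(type(col).__name__)   # Series
print(df.ndim, col.ndim)    # 2 1
```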

Series vs DataFrame



Pandas Series

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet.



>>> import pandas as pd

>>> a=pd.Series([1,2,3,4,5,6,7,8])

>>> a

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8

dtype: int64

>>> b=pd.Series([1,2,3,4,5,6,7,'hello'])

>>> b

0        1
1        2
2        3
3        4
4        5
5        6
6        7
7    hello


dtype: object  #a Series holding a string (or mixed types) gets the object dtype


>>> a.ndim
        1
>>> b.ndim
        1
>>> a.size
        8
>>> b.size
        8
>>> a.shape
        (8,)
>>> b.shape
        (8,)
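The Series above use the default integer index (0, 1, 2, ...). As a small sketch, labels can also be supplied explicitly through the index argument; the name temps and its values are invented for this example.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0..n-1 index
temps = pd.Series([31.5, 28.0, 35.2], index=["mon", "tue", "wed"])

print(temps.index.tolist())  # ['mon', 'tue', 'wed']
print(temps["tue"])          # 28.0
print(temps.shape)           # (3,)
```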

Dictionary in Series

>>> c=pd.Series({'chennai':12,'delhi':35,'kolkaata':56,'banglore':32})

>>> c['delhi']  #Accessing through key

        35

>>> c.iloc[1]  #Accessing through position (plain c[1] is deprecated when the index holds labels)

        35

>>> c.index  

        Index(['chennai', 'delhi', 'kolkaata', 'banglore'], dtype='object')



>>> c[c>10]   #this will give the values of c which are greater than 10

        chennai     12
        delhi       35
        kolkaata    56
        banglore    32
        dtype: int64


>>> d=c

>>> c+d  #this will sum the values of respective keys

        chennai      24
        delhi        70
        kolkaata    112
        banglore     64
        dtype: int64
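In the addition above both Series share the same keys. When the keys differ, pandas aligns on the labels and fills unmatched ones with NaN. A minimal sketch, where the second Series e is hypothetical:

```python
import pandas as pd

c = pd.Series({"chennai": 12, "delhi": 35})
e = pd.Series({"delhi": 5, "mumbai": 7})   # hypothetical second Series

# Labels are aligned; labels present on only one side give NaN
total = c + e
print(total)

# Series.add with fill_value treats a missing label as 0 instead
filled = c.add(e, fill_value=0)
print(filled)
```

Here only 'delhi' exists on both sides, so it is the only label with a real sum in total.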


Data Frames in Pandas

--> DataFrame Creation:

>>> data=pd.DataFrame({
        'names':['ram','som','ravi'],
        'marks':[90,80,85],
        'city':['chennai','chennai','chennai']})

>>> data

  names  marks     city
0   ram     90  chennai
1   som     80  chennai
2  ravi     85  chennai

>>> data['names']

0     ram
1     som
2    ravi
Name: names, dtype: object

Applying functions

It is possible to iterate over a DataFrame or Series as you would with a list, but doing so, especially on large datasets, is very slow.

A more concise and usually faster alternative is to apply() a function to the data. For example, we could use a function to convert marks of 85 or greater to the string value "good" and the rest to "bad", and use these transformed values to create a new column.

First we would create a function that, when given a mark, determines if it's good or bad:

def rating_function(x):
    if x >= 85:
        return "good"
    else:
        return "bad"

Now we want to send the entire marks column through this function, which is what apply() does:

>>> data["grade"]=data["marks"].apply(rating_function)

>>> data

  names  marks     city grade
0   ram     90  chennai  good
1   som     80  chennai   bad
2  ravi     85  chennai  good
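For a simple two-way rule like this one, the same column can also be built without calling a Python function per row. The sketch below uses numpy.where, which is not part of the example above and is shown only as one possible alternative:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"names": ["ram", "som", "ravi"],
                     "marks": [90, 80, 85]})

# Compare the whole column at once, then pick a label for each row
data["grade"] = np.where(data["marks"] >= 85, "good", "bad")

print(data["grade"].tolist())   # ['good', 'bad', 'good']
```

apply() remains the more flexible choice when the rule is too involved to express as a column-wide comparison.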


DataFrame/Series.head() method

The head() method returns the first n rows (5 by default) of a DataFrame or Series.

>>> data.head(2)

  names  marks     city grade
0   ram     90  chennai  good
1   som     80  chennai   bad
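A short sketch of the default behaviour, together with tail(), the counterpart that returns rows from the end. The Series s is made up for this example.

```python
import pandas as pd

s = pd.Series(range(10))

print(len(s.head()))        # 5, the default number of rows
print(s.head(3).tolist())   # [0, 1, 2]
print(s.tail(2).tolist())   # [8, 9]
```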


Pandas Dataframe.describe() method

The describe() method returns basic summary statistics (count, mean, standard deviation, minimum, percentiles, and maximum) for the numeric columns of a DataFrame or for a numeric Series.

>>> data.describe()

       marks
count    3.0
mean    85.0
std      5.0
min     80.0
25%     82.5
50%     85.0
75%     87.5
max     90.0
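The individual figures in this summary can also be computed one at a time, which may help in reading the table. A minimal sketch on the same three marks; the variable names are illustrative:

```python
import pandas as pd

marks = pd.Series([90, 80, 85], name="marks")

# describe() on a Series returns a Series keyed by statistic name
stats = marks.describe()
print(stats["mean"])          # 85.0

# The same values via dedicated methods
print(marks.mean())           # 85.0
print(marks.quantile(0.25))   # 82.5
print(marks.max())            # 90
```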

DataFrames Concatenation


The concat() function does all of the heavy lifting of performing concatenation operations along an axis, while applying optional set logic (union or intersection) to the indexes (if any) on the other axes.

# Creating first dataframe
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index = [0, 1, 2, 3])
  
# Creating second dataframe
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index = [4, 5, 6, 7])
  
# Creating third dataframe
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                    index = [8, 9, 10, 11])
  
# Concatenating the dataframes
pd.concat([df1, df2, df3])

Output:

Concatenation
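The three frames above share the same columns, so the set logic never comes into play. The sketch below uses two small hypothetical frames with only partly overlapping columns to show the difference between the default union (join="outer") and the intersection (join="inner").

```python
import pandas as pd

# Hypothetical frames that only share column 'B'
left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
right = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# Default join="outer": union of the columns, missing cells become NaN
outer = pd.concat([left, right], ignore_index=True)
print(outer.columns.tolist())   # ['A', 'B', 'C']

# join="inner": keep only the columns present in every frame
inner = pd.concat([left, right], join="inner", ignore_index=True)
print(inner.columns.tolist())   # ['B']
```

ignore_index=True simply renumbers the rows 0 to 3 instead of keeping each frame's own index.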




DataFrame.loc[]


Pandas provides the DataFrame.loc[] indexer to retrieve rows by label. It takes index labels and returns a single row or a DataFrame if the labels exist in the calling DataFrame. Note that a label slice such as 1:2 includes both endpoints.



>>> d={'names':['ram','som','kumar','bala','arun'],'marks':[90,90,90,78,56],'sections':['A','B','A','B','A']}

>>> data=pd.DataFrame(d)

>>> data

   names  marks sections
0    ram     90        A
1    som     90        B
2  kumar     90        A
3   bala     78        B
4   arun     56        A

>>> data.loc[1:2]

   names  marks sections
1    som     90        B
2  kumar     90        A
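loc[] can also take a list of column labels or a boolean condition. A short sketch on the same data; the variable names subset and low are illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

# Rows labelled 1 through 3 (both ends included), two columns only
subset = data.loc[1:3, ["names", "marks"]]
print(subset.shape)   # (3, 2)

# Boolean condition: names of the rows where marks are below 80
low = data.loc[data["marks"] < 80, "names"]
print(low.tolist())   # ['bala', 'arun']
```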



iloc:

iloc is used to select rows and columns by integer position, in the order in which they appear in the DataFrame.

>>> data.iloc[0]

names       ram
marks        90
sections      A
Name: 0, dtype: object
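Unlike loc[], positional slices with iloc[] follow normal Python rules and leave out the end position. A minimal sketch on the same data:

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

# Positions 1 and 2 (position 3 is excluded)
rows = data.iloc[1:3]
print(rows["names"].tolist())   # ['som', 'kumar']

# A single cell: first row, second column
print(data.iloc[0, 1])          # 90

# Negative positions count from the end
print(data.iloc[-1]["names"])   # arun
```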



DataFrames Merge



Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

# Python program to merge
# dataframes using pandas
  
# Dataframe created
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
  
right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
                        
# Merging the dataframes                      
pd.merge(left, right, how ='inner', on ='Key')

Output:

Merging
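In the example above every key appears in both frames, so all join types would give the same rows. The sketch below uses two smaller hypothetical frames whose keys only partly overlap, to show how the how argument changes the result.

```python
import pandas as pd

# Hypothetical frames: K0 is only on the left, K3 only on the right
left = pd.DataFrame({"Key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"Key": ["K1", "K2", "K3"], "C": ["C1", "C2", "C3"]})

inner = pd.merge(left, right, how="inner", on="Key")      # keys in both
left_join = pd.merge(left, right, how="left", on="Key")   # all left keys
outer = pd.merge(left, right, how="outer", on="Key")      # keys from either

print(inner["Key"].tolist())       # ['K1', 'K2']
print(left_join["Key"].tolist())   # ['K0', 'K1', 'K2']
print(len(outer))                  # 4
```

Rows without a match on the other side get NaN in the columns that come from that side.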



Pandas GroupBy:

GroupBy refers to a process involving one or more of the following steps:

  • Splitting: the data is split into groups based on some condition.
  • Applying: a function is applied to each group independently.
  • Combining: the results from each group are combined into a single data structure.
>>> d={
        'names':['ram','som','kumar','bala','arun'],
        'marks':[90,90,90,78,56],
        'sections':['A','B','A','B','A']}

>>> data=pd.DataFrame(d)

>>> data
   names  marks sections
0    ram     90        A
1    som     90        B
2  kumar     90        A
3   bala     78        B
4   arun     56        A

>>> data.groupby('sections')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002B888579D88>

>>> data.groupby('sections').groups
{'A': [0, 2, 4], 'B': [1, 3]}

>>> data.groupby('sections').sum(numeric_only=True)  #numeric_only=True skips the text column 'names'

           marks
sections       
A           236
B           168
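The three steps can be seen separately in the following sketch. It reuses the same data; the name summary and the choice of aggregations are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({
    "names": ["ram", "som", "kumar", "bala", "arun"],
    "marks": [90, 90, 90, 78, 56],
    "sections": ["A", "B", "A", "B", "A"],
})

# Splitting: iterating a GroupBy yields (key, sub-frame) pairs
for section, group in data.groupby("sections"):
    print(section, group["marks"].tolist())

# Applying and combining: several aggregations per group in one call
summary = data.groupby("sections")["marks"].agg(["mean", "max", "count"])
print(summary)
```

Selecting the 'marks' column before aggregating keeps the text column 'names' out of the calculation.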