Class 12 IP revision notes Comprehensive Guide

This article consists of the Chapter-wise term 1 class 12 IP revision notes as a comprehensive guide. There are two units for the Term I syllabus.

Distribution of Marks

Unit No | Unit Name | Marks
1 | Data handling using pandas and data visualization | 25
2 | Database Query using SQL | 25
3 | Introduction to Computer Networks | 10
4 | Societal Impacts | 10
Total | | 70

So let’s start with this unit 1 data handling using pandas for Term 1 Class 12 IP revision notes:

Data handling using pandas

  • Python supports a number of libraries for working with data.
  • Python libraries provide modules, many of them written in C, that give access to functions for I/O operations, complex problem solving, data science, GUI design, etc.
  • In other words, libraries are collections of modules and packages that fulfil specific needs or applications.
  • Some commonly used libraries are the Python standard library, NumPy, SciPy, Tkinter, Pandas and Matplotlib.

Introduction to Pandas – Term 1 Class 12 IP revision notes

  • The name Pandas is derived from "panel data" (PANel DAta)
  • It was developed by Wes McKinney
  • It is an open-source Python library that makes data science and data analysis easy and effective
  • It provides flexible and powerful functions and properties for 1D and 2D data structures
  • It provides high-performance data analysis tools
  • It is used in academic and commercial fields such as finance, economics, statistics and analytics

Difference between NumPy and Pandas – Term 1 Class 12 IP revision notes

Key Point | NumPy | Pandas
Data | Requires homogeneous data | Can have heterogeneous data
Effectiveness | Very effective for collections of the same kind of data | Provides a simple interface for operations like select, access, plot, join and group by
Kind of data | A handy tool for numeric data | A handy tool for processing data in tabular form
Memory | Consumes less memory | Consumes more memory
Indexing | Indexing is very quick | Indexing is slow compared to NumPy
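
The first row of the table can be checked with a short sketch. The names and marks below are made-up sample data, not part of the notes:

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type; mixed numbers are converted to one type.
arr = np.array([1, 2.5, 3])
print(arr.dtype)   # the integers are stored as floats

# A DataFrame keeps a separate data type for each column.
df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [91, 78]})
print(df.dtypes)
```

Here the array ends up with one common type, while the dataframe reports a text type for name and an integer type for marks.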

Features of Pandas – Term 1 Class 12 IP revision

  • Efficient to read different types of data like integer, float, double etc.
  • In a data frame rows and columns can be added, deleted or modified anytime
  • Support group by, aggregate functions, joining, merging
  • Capable to pull data from MySQL database and CSV files and vice-versa
  • Can extract data from large data set and combine multiple tabular data structures in a single unit
  • Can find and fill missing data
  • Reshaping and reindexing can be done in various forms
  • Can be used for future prediction from received data
  • Provides functions for data visualization using matplotlib and seaborn

Installing Pandas – Term 1 Class 12 IP revision

  • Pandas can be installed using the pip command.
  • Open cmd prompt to use pip commands
  • The following commands can be useful for installation with pip installer:
  • Checking whether pandas is installed or not – pip list
  • Installing pandas – pip install pandas
  • To uninstall pandas – pip uninstall pandas

Importing Pandas for a program

To import pandas follow this command

import pandas as pd

Data Structures in Pandas – Term 1 Class 12 IP revision notes

  • The way of storing, organizing, and maintaining data for appropriate applications is known as a data structure
  • Can help in extracting information easily
  • Pandas provides the following data structures:
    • Series:
      • It is a 1-dimensional data structure
      • Stores homogeneous data
      • Its data is mutable but its size is immutable
    • Dataframe:
      • It is 2 dimensional data structure
      • Stores heterogeneous data
      • It is data mutable as well as size mutable
    • Panel
      • It is a 3-dimensional data structure (Panel was removed from pandas in version 0.25)

Working with series

  • A Series is like an ordered dictionary: a set of values, each with an associated index.
  • An associated index refers to the numeric position starting with 0
  • Users can also assign values or labels for the index in series
  • The Series() function is used to create a series
  • Syntax:
import pandas as pd
<series_object> = pd.Series(data=None, index=None, dtype=None, name=None, copy=False)

Parameters

  • data :
    • sequences, array, iterable, dictionary, list, or scalar value
    • Contains data stored in Series.
    • If data is a dict, argument order is maintained.
  • index : array-like or Index (1d)
    • Values must be hashable and have the same length as data.
    • Non-unique index values are allowed.
    • Will default to RangeIndex (0, 1, 2, …, n) if not provided.
    • If data is dict-like and index is None, then the keys in the data are used as the index.
    • If the index is not None, the resulting Series is reindexed with the index values.
  • dtype : str, numpy.dtype, or ExtensionDtype, optional
    • Data type for the output Series. If not specified, this will be inferred from data.
  • name : str, optional
    • The name to give to the Series.
  • copy : bool, default False
    • Copy input data. Only affects Series or 1d ndarray input.
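
A minimal sketch of how the dtype, name and index parameters behave with dictionary data. The variable names are only illustrative:

```python
import pandas as pd

marks = {"Sachin": 45, "Kapil": 67}

# dtype forces the values to float; name labels the series.
s1 = pd.Series(marks, dtype=float, name="Marks")
print(s1)

# An explicit index reindexes the dict data:
# "Bhavin" is not a key of the dict, so its value becomes NaN.
s2 = pd.Series(marks, index=["Kapil", "Bhavin"])
print(s2)
```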

Creating an empty series

import pandas as pd
s=pd.Series()
print(s)

Creating series using a sequence (List)

import pandas as pd
s=pd.Series([23,45,67,78])
print(s)

Creating series using a sequence (List) and Assigning index

import pandas as pd
s=pd.Series([23,45,67,78],index=['JAN','FEB','MARCH','APRIL'])
print(s)

Creating series using multiple lists

import pandas as pd
sales=[23,45,67,78]
mon=['Jan','Feb','March','April']
s=pd.Series(data=sales, index=mon)
print(s)

Creating series using range() function

import pandas as pd
s=pd.Series(range(5))
print(s)

Creating series using range and assigning index using for loop

import pandas as pd
s=pd.Series(range(5), index=[i for i in 'pqrst'])
print(s)

Creating series with missing values

import pandas as pd
import numpy as np # for NaN value 
s=pd.Series([23,np.nan,67,np.nan])
print(s)

Creating series using scalar value

import pandas as pd
s = pd.Series(7,range(5))
print(s)

Creating series using numpy array

import pandas as pd
import numpy as np
ar=np.array([22,33,44,55])
s=pd.Series(ar)
print(s)

Creating series using dictionary

import pandas as  pd
d={'Sachin':45,'Kapil':67,'Bhavin':89,'Mahesh':78}
s=pd.Series(d)
print(s)

Creating series using mathematics expression

import pandas as pd
import numpy as np
ar=np.arange(11,16)
s=pd.Series(ar,index=ar*3)
print(s)

Select/Access elements of a series

There are certain ways to select and access elements of a series. The most popular ways are indexing and slicing.

Indexing

  • can be used with series with label index or positional index
  • the positional index always starts with 0 and labelled index will be the index assigned by user
  • Example of positional index
import pandas as pd
s=pd.Series([45,67,87,11,23])
#accessing the positional index 
print(s[1])
#accessing multiple index with positional index
print(s[[1,3]])
  • Example of labelled index
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
#accessing the single label index 
print("Accessing the single label index:",s['Feb'])
#accessing multiple indexes with labelled index
print("Accessing multiple indexes with labelled index",s[['Feb','Mar']])

Changing the index using reset_index() function

import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
s.reset_index(inplace=True,drop=True)
print(s)

Accessing series using Slicing

  • used to extract elements from the series
  • slice can be done using [start:stop:step]
  • with positional indexes, the value at the stop position is excluded
  • with labelled indexes, the value at the stop label is included
  • Example
import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
print("Example 1 with position slicing, excludes the value at the 4th index")
print(s[1:4])
print("Example 2 with label slicing, includes all the labels")
print(s['Jan':'Apr'])
print("Example 3 Reverse order")
print(s[::-1])

Modifying values using Slice

import pandas as pd
s=pd.Series([45,67,87,11,23],index=['Jan','Feb','Mar','Apr','May'])
print("Modifying values using positional and label slices")
s[1:3]=30
s['Jan':'Apr':3]=20
print(s)

Attributes of Series

  • The attributes are also known as properties
  • The syntax for accessing attributes/properties is as follows:
    • <series_object>.<property>
Property (Attribute) | Use | Example
index | Returns the index of the series | s.index
Output:
Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object')
index.name | Assigns a name to the index | s.index.name='Month'
Output:
Month
Jan 45
Feb 67
Mar 87
Apr 11
May 23
dtype: int64
name | Assigns a name to the series | s.name='Monthly Data'
print(s.name)
Output:
Monthly Data
values | Returns the values of the series as a NumPy array | s.values
Output:
[45 67 87 11 23]
dtype | Returns the data type of the series | s.dtype
Output:
int64
shape | Returns the shape as a tuple; as a series is a 1D data structure, it contains only the number of rows | s.shape
Output:
(5,)
nbytes | Returns the number of bytes used by the series data | s.nbytes
Output:
40
ndim | Returns the dimension of the given series, which is always 1 | s.ndim
Output:
1
size | Returns the number of elements in the series | s.size
Output:
5
itemsize | Returns the size in bytes of the specified item of the series | s.iloc[2].itemsize
Output:
8
hasnans | Returns True if the series contains NaN values | s.hasnans
Output:
False

s=pd.Series([2,None,3])
s.hasnans
Output:
True
empty | Returns True if the series is empty | s.empty
Output:
False

s=pd.Series()
s.empty
Output:
True

Methods of Series

Certain methods are used for series manipulation. Some of them take parameters inside the brackets. These methods are as follows:

Method | Use | Example
head() | Returns the top 5 rows of the series by default, otherwise the specified number of rows | s.head(2)
Output:
Jan 45.0
Feb 67.0
Name: Monthly Data, dtype: float64

s.head()
Output:
Jan 45.0
Feb 67.0
Mar 87.0
Apr NaN
May 23.0
Name: Monthly Data, dtype: float64
count() | Counts the non-NaN values in the series | s.count()
Output:
4
tail() | Returns the bottom 5 rows of the series by default, otherwise the specified number of rows | s.tail(2)
Output:
Apr NaN
May 23.0
Name: Monthly Data, dtype: float64

s.tail()
Output:
Jan 45.0
Feb 67.0
Mar 87.0
Apr NaN
May 23.0
Name: Monthly Data, dtype: float64
len() | This function returns the length of the given series | len(s)
Output:
5

Mathematical Operations

  • The mathematical operations such as addition, subtraction, multiplication and division can be performed on multiple series.
  • While performing mathematical operations, the values are matched by their index labels.
  • Any missing value, or any label that is not present in both series, gives NaN in the result.
  • Example:
import pandas as pd
s=pd.Series([45,67,87,None,23],index=['Jan','Feb','Mar','Apr','May'])

s1=pd.Series([21,22,23,24,25],index=['Jan','Mar','Apr','May','June'])

s2=s+s1
print(s2)

The calculation will be done as follows:

index | s | s1 | s + s1
Jan | 45 | 21 | 66
Feb | 67 | NaN | NaN
Mar | 87 | 22 | 109
Apr | NaN | 23 | NaN
May | 23 | 24 | 47
June | NaN | 25 | NaN

Output:

Apr       NaN
Feb       NaN
Jan      66.0
June      NaN
Mar     109.0
May      47.0
dtype: float64
  • add() function can be also used for the addition.
  • Add function also supports fill_value parameter to fill the NaN value.
  • Example: s=s.add(s1,fill_value=0)
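
The effect of fill_value can be seen by running both forms on the two series from the example; a small sketch:

```python
import pandas as pd

s = pd.Series([45, 67, 87, None, 23], index=['Jan', 'Feb', 'Mar', 'Apr', 'May'])
s1 = pd.Series([21, 22, 23, 24, 25], index=['Jan', 'Mar', 'Apr', 'May', 'June'])

plain = s + s1                     # a missing value on either side gives NaN
filled = s.add(s1, fill_value=0)   # a missing value on one side is treated as 0
print(plain)
print(filled)
```

With fill_value=0, Feb keeps 67, Apr becomes 23 and June becomes 25, because only one of the two operands was missing in each case.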

Now in the next section of Term 1 Class 12 IP revision notes we will see the data frame portion.

DataFrame

  • It is the 2D data structure of Pandas.
  • It processes the data in tabular form.
  • It has row indexes and column labels.
  • Each column can hold values of a different data type.
  • pd.DataFrame() method is used to create a data frame.
  • The syntax to create a dataframe is as follows:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
  • Parameters
    • data: ndarray (structured or homogeneous), Iterable, dict, or DataFrame. A dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion order. If data is a list of dicts, column order also follows insertion order (changed in version 0.25.0).
    • index: Index or array-like. Index to use for the resulting frame. Will default to RangeIndex if the input data has no indexing information and no index is provided.
    • columns: Index or array-like. Column labels to use for the resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
    • dtype: dtype, default None. Data type to force. Only a single dtype is allowed. If None, infer.
    • copy: bool or None, default None. Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False.

Creating Empty DataFrame

import pandas as pd
df=pd.DataFrame()
print(df)

Output:

Empty DataFrame
Columns: []
Index: []

Creating data frame using numpy array

import pandas as pd
import numpy as np
a1=np.array([11,22,33,44])
a2=np.array([20,30,40,50])
a3=np.array([2,4,6,8])
cols=['A','B','C','D']
df=pd.DataFrame([a1,a2,a3],columns=cols)
print(df)

If the cols list is not supplied, default integer column labels (0, 1, 2, 3) are assigned to the dataframe.

Creating dataframe using list of dictionaries

import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
print(df)

Creating dataframe using dictionary of lists

import pandas as pd
df=pd.DataFrame({'Team':['Australia','India','England'],'Rank':['II','I','III'],'Points':[123,137,120]})
print(df)

Creating dataframe from series

import pandas as pd
d={'KL Rahul':pd.Series([2,21,48],index=['Pak','NZ','AFG']),
'Rohit Sharma':pd.Series([0,17,53],index=['Pak','NZ','AFG']),
'Virat Kohli':pd.Series([57,32,10],index=['Pak','NZ','AFG'])}
df=pd.DataFrame(d)  # build the dataframe from the dictionary of series
print(df)

Iteration on dataframe

  • Iteration can be done in two ways: iterate over rows, iterate over columns
  • Pandas provides two functions for iteration: iterrows() and iteritems() (iteritems() was removed in pandas 2.0; items() does the same job)

Iterating by rows

import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
for (ri,s) in df.iterrows():
  print("~"*50)
  print("City:",ri)
  print("~"*50)
  print("\nTemperature Record:")
  print(s)

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
City: Ahmedabad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Temperature Record:
Mon    30
Tue    40
Wed    44
Name: Ahmedabad, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
City: Baroda
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Temperature Record:
Mon    41
Tue    28
Wed    42
Name: Baroda, dtype: int64

Iterate over dataframe by columns

import pandas as pd
d=[{'Mon':30,'Tue':40,'Wed':44},{'Mon':41,'Tue':28,'Wed':42}]
df=pd.DataFrame(d,index=['Ahmedabad','Baroda'])
for (ci,s) in df.items():  # items() replaces iteritems(), which was removed in pandas 2.0
  print("~"*50)
  print("Day:",ci)
  print("~"*50)
  print("\nTemperature Record:")
  print(s)

Output

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Mon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Temperature Record:
Ahmedabad    30
Baroda       41
Name: Mon, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Tue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Temperature Record:
Ahmedabad    40
Baroda       28
Name: Tue, dtype: int64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Day: Wed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Temperature Record:
Ahmedabad    44
Baroda       42
Name: Wed, dtype: int64

Now in the next section of Term 1 Class 12 IP revision notes we are going to discuss some operations on dataframe. These operations are add/delete or insert/remove rows/columns from dataframe, select/access rows/columns from dataframe, rename, head and tail function, indexing using labels and boolean indexing.

Add columns in dataframe

#Method 1 – Assign to a column label that does not exist yet

  • To add a column, assign data to a new column label
  • Example
df['Thurs']=[33,32]

Output

            Mon  Tue  Wed  Thurs
Ahmedabad   30   40   44     33
Baroda      41   28   42     32

If the column label is given which is already present in the dataframe then it will update the data in the same column.

Add multiple Columns

df[['Thurs','Fri']]=[[32,41],[33,43]]

#Method 2 Insert method

Syntax: df.insert(loc, column, value, allow_duplicates)

Parameters:

  • loc: the position (column index) at which the column is inserted
  • column: the label of the column to insert
  • value: the data to insert; provide a list with one value per row, or a single value that is repeated for all rows
  • allow_duplicates: whether a column label that already exists may be inserted again; can be True or False

Example:

df.insert(2,"Thurs",[20,22],False)

Output:

            Mon  Tue  Thurs  Wed
Ahmedabad   30   40     20   44
Baroda      41   28     22   42

#Method 3 assign() function

  • assign() function will create a new dataframe with the newly added column
  • pass the new column names and their data as keyword arguments to assign() and store the result in a dataframe object

Syntax:df=df.assign(new_column=[data_list])

Parameters:

  • new_column: the name of the column to be added
  • data_list: the list of data for the new column; specify None for NaN
Example:

df=df.assign(Thurs=[20,24])

Output:

            Mon  Tue  Wed  Thurs
Ahmedabad   30   40   44     20
Baroda      41   28   42     24

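#Method 4 By using dictionary

  • Create a dictionary that maps each row label to its value
  • Convert it to a Series and assign it to the column, so the values are matched on the row labels
df['Fri']=pd.Series({'Ahmedabad':30,'Baroda':35})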

Add row in dataframe

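  • You can add a row in the following two ways
    • append() (removed in pandas 2.0; pd.concat() is the replacement in newer versions)
    • loc

The Append Method

# works only on pandas versions before 2.0
df=df.append({'Mon':34,'Tue':37,'Wed':42},ignore_index=True)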

The loc[] method

df.loc[3]=[34,56,78]
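
Since append() was removed in pandas 2.0, a row can be added there with pd.concat() instead. A minimal sketch, where the row label 'Surat' is a made-up example:

```python
import pandas as pd

d = [{'Mon': 30, 'Tue': 40, 'Wed': 44}, {'Mon': 41, 'Tue': 28, 'Wed': 42}]
df = pd.DataFrame(d, index=['Ahmedabad', 'Baroda'])

# Build the new row as a one-row DataFrame ('Surat' is a hypothetical label).
new_row = pd.DataFrame([{'Mon': 34, 'Tue': 37, 'Wed': 42}], index=['Surat'])

# concat() returns a new DataFrame with the row added at the end.
df = pd.concat([df, new_row])
print(df)
```

Passing ignore_index=True to pd.concat() would replace the row labels with 0, 1, 2, much like the same option did in append().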

Select/Access data from dataframe

  • There are various methods to select/access data from dataframe.
  • Data can be extracted in following ways:
    • All rows, all columns
    • All rows, limited columns
    • Limited rows, limited columns
    • Limited rows, All columns
  • Extracting data based on label indexing
    • the label-based indexing can be done using loc method
print(df.loc['Ahmedabad'])
  • Extracting data based on positional indexing
    • the position-based indexing can be done using iloc method
print(df.iloc[0])
  • The above methods extract data from the particular row specified in the square brackets.
  • The loc can be also used to modify the content.
  • Passing a single column label and returning the column as a series.

Syntax: df.loc[:, column]

Example:

print(df.loc[:,'Mon'])
  • Passing range of column labels and returning the range of columns from dataframe

Syntax: df.loc[:, col1:coln]

Example:

print(df.loc[:,'Mon':'Tue'])
  • Passing range of columns labels and row labels and returning the data

Syntax:df.loc[row1:rown,Col1:Coln]

Example:

print(df.loc['Ahmedabad':'Baroda','Mon':'Tue'])
  • Specifying a list of row indexes and returning data:

df.loc[[row1,row2,…]]

Example:

print(df.loc[['Ahmedabad','Baroda']])
  • Boolean Indexing
    • Used to filter records from the dataframe
    • A condition is required
    • The condition returns True or False for each row
    • Indexes can be created with either True or False or with 0 and 1
  • Example:
#Creating DataFrame with boolean indexing
#Method1
df=pd.DataFrame({'R.No':[2,4,5,7,9,11],
                 'name':['Divya','Deepu','Sanjay','Mohan','Chirag','Dhara']},
                index=[True,False,True,True,False,False])

#Accessing boolean indexes
print(df.loc[True])
print(df.iloc[1])
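
Filtering with a condition can be sketched as follows, reusing the data from the example above. The variable name mask is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'R.No': [2, 4, 5, 7, 9, 11],
                   'name': ['Divya', 'Deepu', 'Sanjay', 'Mohan', 'Chirag', 'Dhara']})

# The condition produces a boolean Series: True where R.No is greater than 5.
mask = df['R.No'] > 5
print(mask)

# Passing the boolean Series in square brackets keeps only the True rows.
print(df[mask])
```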

DataFrame Properties

Property/Attribute | Purpose | Example
df.index | Displays the row labels | print(df.index)

Output:
Index(['Ahmedabad', 'Baroda'], dtype='object')
df.columns | Displays the column labels | print(df.columns)

Output:
Index(['Mon', 'Tue', 'Wed'], dtype='object')
df.axes | Displays a list containing the row labels and the column labels | print(df.axes)

Output:
[Index(['Ahmedabad', 'Baroda'], dtype='object'), Index(['Mon', 'Tue', 'Wed'], dtype='object')]
df.dtypes | Returns the data types of the dataframe columns | print(df.dtypes)

Output:
Mon int64
Tue int64
Wed int64
dtype: object
df.size | Fetches the size of the dataframe, which is the product of the number of rows and columns | print(df.size)

Output:
6
df.shape | Displays a tuple of the number of rows and columns | print(df.shape)

Output:
(2, 3)
df.ndim | Returns the dimension of the dataframe, which is always 2 | print(df.ndim)

Output:
2
df.values | Returns the values of the dataframe as a NumPy array | print(df.values)

Output:
[[30 40 44]
[41 28 42]]
df.T | Transposes the dataframe | print(df.T)
df.count() | Counts the non-NaN values in the dataframe along the axis passed by the user.

The possible arguments are as follows:
1. 0 – counts down the rows, giving one count per column
2. 1 – counts across the columns, giving one count per row
3. axis='index' – same as 0
4. axis='columns' – same as 1
| df.count(0)
df.count(1)
df.count(axis='index')
df.count(axis='columns')
df.empty | Checks whether the dataframe is empty or not. Returns True if it is empty, otherwise False | print(df.empty)

Output:
False

Fetching records from dataframe using different conditions

Consider the following dataframe and write a command to do the following:

           Year2018  Year2019  Year2020  Year2021
India            44        35        25        22
Pakistan         41        32        22        18
England          40        30        20        15
Australia        38        28        18        12
  • Fetch the records on India, Pakistan and Australia for the year 2019 and 2021.
print(df.loc[df.index.isin(['India','Pakistan','Australia']),['Year2019','Year2021']])
  • Display the matches played by England in 2021.
print(df.loc[df.index=='England','Year2021'])
  • Display the records for 2020 year for all the teams
print(df['Year2020'])
print(df.loc[:,'Year2020'])
print(df.loc[:,df.columns.isin(['Year2020'])])
  • Display the records of the team which played matches between 40 to 50 in 2018.
#method1
df1=df[df['Year2018'].between(40,50)]
print(df1['Year2018'])

#method2
df1=df[(df['Year2018']>=40)&(df['Year2018']<=50)]
print(df1['Year2018'])

#method 3
df1=df.query('40 <= Year2018 <= 50')
print(df1['Year2018'])
  • Display records in reverse order
print(df[::-1])
  • Display bottom 3 records
print(df.tail(3))

Display the records for 2019 for the teams that played more than 30 matches

#Method1
df1=df.loc[df.Year2019>30]
print(df1['Year2019'])

#Method2
df1=df.query('Year2019>30')
print(df1['Year2019'])

#Method3
df1=df[df.Year2019>30]
print(df1['Year2019'])
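
To try these commands, the dataframe first has to be created. A minimal sketch that builds it from the values given in the question and runs two of the filters (the names in_range and over_30 are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Year2018': [44, 41, 40, 38],
                   'Year2019': [35, 32, 30, 28],
                   'Year2020': [25, 22, 20, 18],
                   'Year2021': [22, 18, 15, 12]},
                  index=['India', 'Pakistan', 'England', 'Australia'])

# Teams that played between 40 and 50 matches in 2018 (both limits included).
in_range = df[df['Year2018'].between(40, 50)]
print(in_range['Year2018'])

# Teams that played more than 30 matches in 2019.
over_30 = df.query('Year2019 > 30')
print(over_30['Year2019'])
```

England appears in the first result because between() includes both limits, but not in the second, because 30 is not greater than 30.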

Delete Columns

  • There are three ways to delete columns
    • Delete using drop() function
    • Delete using label
    • Delete using column property
  • Delete using drop() function
    • The syntax of drop method is as follows

df=df.drop(column_list,axis=1)

Example:

df=df.drop(['Year2018','Year2019'],axis=1)
  • Delete column using column label
    • The syntax of drop method using column label is as follows

df=df.drop(columns=column_list)

Example:

df=df.drop(columns=['Year2018','Year2019'])
  • Delete column using columns property
    • The syntax is as follows

df=df.drop(df.columns[columnindex],axis=1)

Example:

df=df.drop(df.columns[[1,2]],axis=1)

Delete Rows

There are certain ways to delete rows from dataframe. They are:

  1. Using Index Name
  2. Using Drop Method
  3. Using .index

Using Index Name

#Method 1
df=df.drop('Pakistan')

#Method 2
df=df.drop(index='Pakistan')

#Multiple rows
df=df.drop(['Pakistan','England'])
#or, to modify df in place instead of reassigning it
df.drop(['Pakistan','England'], inplace=True)

#Method 3
df=df.drop(df.index[[1,3]])

Rename column names in Dataframe

#Method1
df=df.rename({'Year2018':2018,'Year2019':2019},axis=1)

#method2
df.rename(columns={'Year2018':2018,'Year2019':2019},inplace=True)

head() Function

  • The head() function is used to display top/first n rows from the dataframe.
  • If no parameter is supplied to the head() function, it will display 5 records by default.
  • Example:
df.head(3)
  • The tail() function displays bottom/last n rows from the dataframe.
  • It is opposite of head() function.

Binary Operations

  • Binary operations can be done on multiple dataframes
  • These operations can be addition, subtraction, multiplication and division.
  • For these operations, indexes should be matched.
  • If indexes are not matching then it will return NaN values.
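
A small sketch of addition on two dataframes whose row labels only partly match. The second dataframe and the label 'Surat' are made-up sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'Mon': [30, 41], 'Tue': [40, 28]}, index=['Ahmedabad', 'Baroda'])
df2 = pd.DataFrame({'Mon': [1, 2], 'Tue': [3, 4]}, index=['Baroda', 'Surat'])

# Only 'Baroda' exists in both dataframes, so the other rows become NaN.
total = df1 + df2
print(total)

# add() with fill_value=0 treats a row missing from one dataframe as 0.
filled = df1.add(df2, fill_value=0)
print(filled)
```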

CSV File

  • It stands for Comma Separated Value file
  • Each value is separated by a comma by default
  • The comma is known as a separator or delimiter character
  • It is a common file format to store tabular data
  • It can be operated or opened by text editor (Notepad) or spreadsheet software (MS Excel)
  • There are two operations performed on CSV file to dataframe
    • Write Data into CSV – to_csv() function is used to write
    • Load Data from CSV – read_csv() function is used to read data

Writing Data from pandas to csv using to_csv() function

  • to_csv() function accepts the following parameters:
    • path: the path of the CSV file to write
    • sep: the separator character to use instead of the default comma
    • na_rep: the text written in place of NaN. The default is an empty string ('').
    • float_format: the number format used for floating-point values in the CSV file. Python can display many digits after the decimal point, so this option limits the output to the specified number of digits.
    • header: whether to export the column headers into the CSV. It can be True or False. By default it is True.
    • columns: the columns to write into the CSV. By default it is None, which writes all columns.
    • index: whether to write the row labels. By default it is True.
import pandas as pd
df=pd.DataFrame({'Name':['Akash','Lucky','Nirav','Sameer'],
                 'Sales':[200,130,189,176],
                'Comm':[500,470,495,444]})
df.to_csv("E:\\data.csv")

  • read_csv() function accepts the following parameters:
    • filepath_or_buffer: the path of the CSV file, similar to the path parameter of to_csv().
    • sep: the separator character, similar to the sep parameter of to_csv().
    • index_col: the column to use as the row index
    • header: the row number to use as the column header
import pandas as pd
df=pd.read_csv(r"E:\Python Programs\CSV1.csv")  # raw string keeps the backslashes as they are
print(df)
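
The parameters above can be tried together in one round trip. This is a sketch under assumptions: the file is written to a temporary folder, the file name sales_demo.csv is made up, and one sales value is left missing on purpose:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Akash', 'Lucky', 'Nirav', 'Sameer'],
                   'Sales': [200, 130, None, 176]})

# Write to a temporary folder so the example does not depend on a drive letter.
path = os.path.join(tempfile.mkdtemp(), 'sales_demo.csv')

# Use ';' as separator, write 'NA' for the missing value, skip the row numbers.
df.to_csv(path, sep=';', na_rep='NA', index=False)

# Read the file back with the same separator and use 'Name' as the row index.
df2 = pd.read_csv(path, sep=';', index_col='Name')
print(df2)
```

read_csv() treats the text NA as a missing value by default, so the missing sales figure comes back as NaN.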

Data Visualization

  • Helps to understand data in a better way
  • It is a process of representing data in graphics or pictures
  • It can use various charts and graphs to show trends, relationships between variables and comparisons
  • It provides an effective way to communicate information to intended users
  • Some popular examples are traffic symbols, ultrasound reports, atlas book of maps, the speedometer of a vehicle etc.
  • It is effectively used in many fields like health, finance, science, mathematics, engineering etc.

Plotting using matplotlib – installing and importing matplotlib

  • The matplotlib library is used to plot data on chart or graph
  • It can be installed using pip install command – pip install matplotlib
  • You can import the matplotlib package using – import matplotlib.pyplot
  • pyplot module contains a collection of functions used to plot data
  • Matplotlib provides control over every aspect of a figure
  • It offers interactive and non-interactive plotting and can save images in different formats
  • It was written by J.D.Hunter and developed by full-fledged community
  • It is distributed under a BSD-Style License
  • the plot() function will create necessary figures and axes to achieve the desired plot

Basic components of a chart

The matplotlib chart has the following components:

  1. Figure: The surrounding or outline area of a plot is called a figure
  2. Axes: The region of the figure where the data is plotted. A plot has two axes (x and y) or three (in a 3D plot). The axes contain the title, x-label and y-label.
  3. Artist: Everything present in the figure is called an artist, generally consisting of text objects, line objects and collection objects.
  4. Labels: These indicate what data is plotted on each axis.
  5. Title: It is used to specify the title for the plot
  6. Legend: It shows different types of sets of data plotted in different colours or marks in the chart

Line plot

  • Line plot plots data on straight lines
  • The styles of lines can be modified easily using markers and line styles
  • It requires X-axes and Y-axes data
  • It is mostly used to visualize the trend in data over an interval of time
  • The important functions used for the line chart are as following:
    • plot(x,y,color,others): Draws lines through the specified points
    • xlabel("label"): Sets the label of the x-axis
    • ylabel("label"): Sets the label of the y-axis
    • title("Title"): Sets the title of the axes
    • legend(): For displaying legends
    • show() : Display the graph
import matplotlib.pyplot as mpp 
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],'Red') 
mpp.xlabel('Employee') 
mpp.ylabel('Sales') 
mpp.title('Progress Report Chart') 
mpp.show()

Customizing line chart

Matplotlib provides various functions for customizing a chart. The following functions are used to customize the chart.

Function | Use
grid() | Shows the grid lines on the plot figure
legend() | Displays the legends
savefig() | Saves the current figure
xticks() | Sets the current tick locations for the x-axis
yticks() | Sets the current tick locations for the y-axis

Changing line colour and line style

  • Matplotlib provides different styles and colours for lines.
  • The colour abbreviations b, c, g, k, m, r, w and y stand for blue, cyan, green, black, magenta, red, white and yellow respectively.
  • The styles '-', '--', '-.' and ':' stand for solid line, dashed line, dash-dot line and dotted line respectively.
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],'m',linestyle='--') 

Changing marker, marker size and linewidth

  • The marker, marker size and linewidth can be changed as and when needed.
  • The values for the marker are as follows:
Marker | Description
'o' | Circle
'*' | Star
'.' | Point
',' | Pixel
'x' | X
'X' | X (filled)
'+' | Plus
'P' | Plus (filled)
's' | Square
'D' | Diamond
'd' | Diamond (thin)
'p' | Pentagon
'H' | Hexagon (type 2)
'h' | Hexagon (type 1)
'v' | Triangle Down
'^' | Triangle Up
'<' | Triangle Left
'>' | Triangle Right
'1' | Tri Down
'2' | Tri Up
'3' | Tri Left
'4' | Tri Right
'|' | Vline
'_' | Hline
  • Marker size can be any numeric value for the marker with the parameter markersize or ms.
  • You can set the line width using linewidth parameter to change the width of line
mpp.plot(['Mayank','Shiv','Rani'],[220,190,194],color='g', linestyle='-.', marker='o', linewidth=3) 
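
The customizing functions and line options can be combined in one chart. A minimal sketch: the Agg backend line and the temporary output file are assumptions made so the example also runs without a display, and the file name line_chart.png is made up:

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, remove it to use show()
import matplotlib.pyplot as mpp

names = ['Mayank', 'Shiv', 'Rani']
sales = [220, 190, 194]

# Green dash-dot line, circle markers of size 8, line width 3.
mpp.plot(names, sales, color='g', linestyle='-.', marker='o',
         markersize=8, linewidth=3, label='Sales')
mpp.yticks([180, 200, 220])   # tick positions on the y-axis
mpp.grid(True)                # show the grid lines
mpp.legend()                  # uses the label given in plot()

# Save the current figure to a temporary folder.
out_file = os.path.join(tempfile.mkdtemp(), 'line_chart.png')
mpp.savefig(out_file)
```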

Plotting Bar Chart

  • Bar charts are used to show the comparison between data
  • The bar() function or kind='bar' parameter of plot function is used to plot the bar chart
  • It shows the bar chart with rectangular bars
  • The rectangular bar has a height up to the corresponding value
  • It requires two data series
  • The bars can be plotted vertically or horizontally
  • The parameters for the bar function are as follows:
x | sequence of scalars representing the x coordinates of the bars. align controls if x is the bar center (default) or left edge.
height | scalar or sequence of scalars. the height(s) of the bars
width | scalar or array-like, optional. the width(s) of the bars, default 0.8
bottom | scalar or array-like, optional. the y coordinate(s) of the base of the bars, default None.
align | {'center', 'edge'}, optional, default 'center'
  • The following parameters belong to the hist() function rather than bar():
histtype | {'bar', 'barstacked', 'step', 'stepfilled'}, default 'bar'
orientation | {'horizontal', 'vertical'}, default 'vertical'
import matplotlib.pyplot as plt
courses=['C','C++','Java','DotNet','Python','Perl']
no_of_std=[20,22,25,30,45,21]
plt.bar(courses,no_of_std)
plt.xlabel('Courses') 
plt.ylabel('Strength') 
plt.title('Strength per course') 
plt.show()

Customizing Bar Chart

You can customize the bar chart in the same way as a line chart.

Observe the following code:

plt.bar(courses,no_of_std,width=.5,color=['r','g','b'],label='Courses', edgecolor='k',linewidth=3,linestyle='-.')
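
The notes mention that bars can also be plotted horizontally. One way to do this is the barh() function of pyplot; a sketch with the same course data, where the Agg backend line is an assumption so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, remove it to use show()
import matplotlib.pyplot as plt

courses = ['C', 'C++', 'Java', 'DotNet', 'Python', 'Perl']
no_of_std = [20, 22, 25, 30, 45, 21]

# barh() draws horizontal bars: the categories go on the y-axis
# and the bar length (width) shows the value.
bars = plt.barh(courses, no_of_std, height=0.5, color='c', edgecolor='k')
plt.xlabel('Strength')
plt.ylabel('Courses')
plt.title('Strength per course')
```

Note that the axis labels swap places compared with the vertical chart, and that the bar thickness is set with height instead of width.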

Plotting Histogram

  • It is a powerful technique for data visualization
  • It is a graphical display of frequencies
  • It is an approximate graphical representation of the distribution of numerical data
  • It was introduced by Karl Pearson
  • It plots a quantitative variable
  • It shows what portion of the data set falls into each of a series of non-overlapping intervals called bins
  • To make a histogram the data is sorted into “bins” and the number of data points in each bin is counted
  • The height of each column in the histogram is then proportional to the number of data points its bin contains
  • df.plot(kind=’hist’) will create a histogram
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Ankush', 'Divya', 'Priya', 'Manu', 'Tanu','Bhavin'],
'Height' : [60,61,63,65,61,60],
'Weight' : [53,65,52,58,50,51]}
df=pd.DataFrame(data)
df.plot(kind='hist',bins=5,edgecolor='red',linewidth=3,
        color=['m','y','k'],linestyle=':',fill=False,hatch='o')
plt.show()
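  • The values for the hatch parameter are: ‘/’, ‘\’, ‘|’, ‘-’, ‘+’, ‘x’, ‘o’, ‘O’, ‘.’, ‘*’
# plt.hist() needs numeric data, so pass only the numeric columns and one colour per column
plt.hist(df[['Height','Weight']],bins=[50,55,60,65],edgecolor='red',linewidth=3,
 color=['m','y'],linestyle=':',fill=False,hatch='o')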
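To see what "sorting the data into bins and counting" means, here is a rough sketch that counts the Weight values by hand for the bin edges 50, 55, 60 and 65. The helper name bin_counts is hypothetical. It assumes the usual convention that every bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
def bin_counts(values, edges):
    """Count how many values fall into each bin defined by consecutive edges."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            # each bin is [lo, hi); the last bin also includes its right edge
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts

weights = [53, 65, 52, 58, 50, 51]
print(bin_counts(weights, [50, 55, 60, 65]))   # [4, 1, 1]
```

The bar heights in the histogram are proportional to these counts: four weights fall between 50 and 55, one between 55 and 60, and one between 60 and 65.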

Here is another example of a histogram:

import matplotlib.pyplot as m
english=[77,66,88,99,55,44,33,79,68,83]
maths=[56,89,70,50,60,65,90,80,47,82]
m.hist([english,maths], orientation='horizontal', histtype="stepfilled", cumulative=True)
m.show()
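In the example above, cumulative=True makes each bin show the running total of its own count plus the counts of all the bins before it. A minimal sketch of that idea, using made-up bin counts rather than the counts of the marks above:

```python
from itertools import accumulate

counts = [4, 1, 1]   # illustrative counts for three consecutive bins

# Running total: each entry adds the counts of all earlier bins
cumulative_counts = list(accumulate(counts))
print(cumulative_counts)   # [4, 5, 6]
```

The last cumulative value always equals the total number of data points.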

Societal Impacts

  • Nowadays digital technologies surround the world around us
  • In this world, almost everything is interconnected in one way or another
  • A network is a group of devices connected together for sharing resources and information
  • The following are examples of different networks
    • Social Media Network
    • Mobile Network
    • Computer Networks
    • Various networks like airlines, groups of schools, groups of colleges, hospitals etc.
  • The main purpose of a network is to share data and resources as well as to establish a connection for communication
  • The size of a network may vary from small to large
  • A network consists of different hosts such as servers, desktops, laptops and smartphones, and network devices such as switches, hubs, routers, modems etc.
  • Data packets are the small units into which data is divided for communication
  • The devices can be connected through wired or wireless media
  • A single computer or device connected to a network that receives, creates, stores or sends data is called a node
  • A computer that controls and manages the resources, users, files and databases in the network is called a server
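The idea of data packets mentioned above can be sketched in a few lines. This is only an illustration: the function name and the packet size of 4 bytes are made up, and real packets also carry headers such as addresses and sequence numbers, which are left out here.

```python
def split_into_packets(message: bytes, packet_size: int) -> list[bytes]:
    """Cut a message into consecutive chunks of at most packet_size bytes."""
    return [message[i:i + packet_size]
            for i in range(0, len(message), packet_size)]

packets = split_into_packets(b"Hello, network!", packet_size=4)
print(packets)   # [b'Hell', b'o, n', b'etwo', b'rk!']

# The receiver joins the packets back together in order
reassembled = b"".join(packets)
```

The last packet may be shorter than the others when the message length is not a multiple of the packet size.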

Digital Footprint

  • While surfing the internet, we leave a trail of data that reflects the actions we perform online; this trail is called a digital footprint
  • A digital footprint can be created knowingly or unknowingly
  • It includes the following:
    • Websites visited
    • Sent Emails
    • Online forms
    • IP address
    • Location information
    • Device information
  • The information left as a digital footprint could be used for advertising or misused or exploited
  • So be careful about what you upload, write, download or fill in forms online
  • There are two kinds of digital footprints:
    • Active Digital Footprint
      • Data submitted intentionally online
      • It includes emails, responses, and posts written on different online platforms
    • Passive Digital Footprint
      • Data generated unintentionally online
      • It includes data generated when you visit a website, use a mobile app, browse the internet etc.
  • Anyone who uses the internet is likely to have a digital footprint
  • When you examine the browser settings, you will find that the browser stores browsing history, cookies, passwords, auto-fill data etc.
  • Besides browsers, most digital footprints are stored on servers
  • You cannot access this data, cannot erase or remove it, and have no control over how it is used
  • Even if you delete data from your end, it remains on the servers
  • There is no guarantee that a digital footprint will be deleted from the internet completely
  • It can be used to track the user, their location, device and other usage details

Net and communication etiquette

  • While using the internet, users need to know how to conduct themselves, behave properly with others, and follow ethics, morals and values online
  • Anyone who uses digital technology and the internet is a digital citizen or netizen
  • Everyone who uses the internet should practise safe, ethical and legal use of digital technology
  • Every netizen must abide by net etiquette, communication etiquette and social media etiquette

I hope you have enjoyed these Term 1 Class 12 IP revision notes. If you have any doubts about any topic from these notes, feel free to ask in the comment section. Like and share this article with your classmates and friends. Thank you for reading this article!