NCERT Solution Chapter 3 Data handling using Pandas II

In this article, I will provide NCERT Solution for Chapter 3 Data handling using Pandas II. We have covered Chapter 1 Querying and SQL Functions and Chapter 2 Data Handling using Pandas – I.

Data handling using pandas II class 12 NCERT solutions

Let us start the article data handling using pandas II class 12 questions and answers.

  1. Write the statement to install the python connector to connect MySQL i.e. pymysql.
    • pip install pymysql
    • pip install sqlalchemy
  2. Explain the difference between pivot() and pivot_table() function?
    • The pivot() function is used to reshape and create a new dataframe from the original one.
    • Syntax:df_pi=df.pivot(index=”column_index”, columns=”column”, values=”value_to_display”)
    • The pivot_table() function is used to aggregate the values from rows with duplicate entries for the specified columns. The functions like min, max, mean etc. can be used with pivot_table() function.
    • Syntax:df_pt=df.pivot_table(index=’column_index’,columns=’column’,values=’value_to_display’,aggfunc=[‘aggrgate_functions’])
  3. What is sqlalchemy?
    • The sqlalchemy is a library used to interact with MySQL database by providing the required credentials.
    • It provides various functions like create_engine() to enable connection to be established.
  4. Can you sort a DataFrame with respect to multiple columns?
    • Yes, you can sort dataframe with respect to multiple columns.
    • The multiple columns can be specified one after one as in the form of list.
  5. What are missing values? What are the strategies to handle them?
    • The dataframe consists of rows and columns with certain values.
    • If the associated value is not available for the column in dataframe is considered as missing value.
    • It is denoted by NaN (Not a Number)
    • You can follow these strategies to handle missing values:
      • Drop the object having a missing value using dropna() function
      • Fill or estimate the missing value using fillna() function
      • You can check missing value using isnull() fuction
  6. Define the following terms:
    • median: Median refers to the middle value of a given data set
    • standard deviation: It is measured as the spread of data distribution in the given data set. The built-in function std() is used for calculating the standard deviation for a given dataset.
    • variance: It is the average of squared deviations from the mean.
  7. What do you understand by the term MODE? Name the function which is used to calculate it.
    • Mode refers to the values which the most repeated values for a given set of numbers. The mode() function is used to calculate it.
  8. Write the purpose of Data aggregation.
    • Data aggregation refers to transform the dataset and produce a single value from the dataset.
    • It can be applied to multiple rows and columns together.
    • For example average, maximum, minimum etc.
  9. Explain the concept of GROUP BY with help on an example.
    • The group by function can be used to split data into groups based on some criteria.
    • The group by function works based on a split-apply-combine strategy.
      • Split – The first step of the strategy is split which divides the data into groups
      • Apply – Apply refers to applying the required function
      • Combine – Combine refers to mixing the results from a new dataframe
  10. Write the steps required to read data from a MySQL database to a DataFrame.
    • To read data from a MySQL database to a dataframe follow these steps:
      • Establish the connection using connection string or create_engine() function
      • Fetch data from the table using read_sql_query() or read_sql_table() or read_sql() function
  11. Explain the importance of reshaping of data with an example.
    • Reshaping of data helps to prepare data to make them suitable for data analysis.
  12. Why estimation is an important concept in data analysis?
    • Estimation is an important concept in data analysis to handle the missing values.
    • The estimation refers to approximations to a nearby value or average value or zero.
    • Estimation produces the results not in an actual manner but will be a good approximation of actual results.

Watch this video for more understanding:

13. Assuming the given table: Product. Write the python code for the following:

ItemCompanyRupeesUSD
TVLG12000700
TVVIDEOCON10000600
TVLG15000800
ACSONY14000750

a) To create the data frame for the above table.

import pandas as pd
 di={'Item':['TV','TV','TV','AC'],'Company':['LG','VIDEOCON','LG','SONY'],
       'Rupees':[12000,10000,15000,14000],'USD':[700,600,800,750]}
 df=pd.DataFrame(di)
 print(df)

b) To add the new rows in the data frame.

Method 1:
df=df.append({'Item':'Freeze','Company':'LG','Rupees':17000,'USD':850},ignore_index=True)

Method 2:
df.loc[5]=['TV','SONY',25000,950]

c) To display the maximum price of LG TV.

Method 1:
print(df[(df['Item']=='TV')&(df['Company']=='LG')].max())


Method 2:
print(df.query('Item=="TV" and Company=="LG"').max())

d) To display the Sum of all products.

print(print(df.sum()))
print(df['Rupees'].sum())
print(df['USD'].sum())

e) To display the median of the USD of Sony products.

Method 1:
d=df[(df['Company']=='SONY')]
print(d['USD'].median())


Method 2:
d=df.query('Company=="LG"')
print(d['USD'].median())

f) To sort the data according to the Rupees and transfer the data to MySQL.

df=df.sort_values(by=['Rupees'])
engine=sq.create_engine("mysql+pymysql://root:root@localhost:3306/NCERT")
df.to_sql("product",engine,index=False)

g) To transfer the new dataframe into the MySQL with new values.

Same as above

14. Write the python statement for the following question on the basis of given dataset:

NameDegreeScore
0AparnaMBA90.0
1PankajBCANaN
2RamM.Tech80.0
3RameshMBA98.0
4NaveenNaN97.0
5KrrishnavBCA78.0
6BhawnaMBA89.0

a) To create the above DataFrame.

Note: I have written code for creating a dataframe in part a, for practicals it is required for all respective answers. I have written specific statements required for that answer from the answer b.

import pandas as pd
import numpy as np
d={"Name":["Aparna","Pankaj","Ram","Ramesh","Naveen","Krrishav","Bhawna"],
   "Degree":["MBA","BCA","M.Tech","MBA",np.nan,"BCA","MBA"],
   "Score":[90.0,np.nan,80.0,98.0,97.0,78.0,89.0]}
df=pd.DataFrame(d)
print(df)

b) To print the Degree and maximum marks in each stream.


df_gr=df.groupby("Degree")['Score'].max()
print(df_gr['Score'].max())

Or
print(df.groupby("Degree")['Score'].max())

c) To fill the NaN with 76.

df=df.fillna(76)
print(df)

d) To set the index to Name.

df=df.set_index("Name")
print(df)

e) To display the name and degree-wise average marks of each student.

df=df.set_index("Name")
df=df.groupby(["Name","Degree"])['Score'].aggregate('mean')
print(df)

f) To count the number of students in MBA.

print("No. of students in MBA:",df[df['Degree']=='MBA']["Name"].count())


g) To print the mode marks BCA.

df=df[df['Degree']=='BCA']["Score"].mode()
print("Mode of BCA:",df.to_string(index=False))

Solved Case Study based on Open Datasets

UCI dataset is a collection of open datasets, available to the public for experimentation and research purposes. ‘auto-mpg’ is one such open dataset. It contains data related to fuel consumption by automobiles in a city.

Consumption is measured in miles per gallon (mpg), hence the name of the dataset is auto-mpg. The data has 398 rows (also known as items or instances or objects) and nine columns
(also known as attributes).


The attributes are: mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin, car name. Three attributes, cylinders, model year and origin have categorical values, car name is a string with a unique value for every row, while the remaining five attributes have numeric value.


The data has been downloaded from the UCI data repository available at http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/.

Download Data

Following are the exercises to analyse the data.
1) Load auto-mpg.data into a DataFrame autodf.
2) Give description of the generated DataFrame autodf.
3) Display the first 10 rows of the DataFrame autodf.
4) Find the attributes which have missing values. Handle the missing values using following two
ways:
i. Replace the missing values by a value before that.
ii. Remove the rows having missing values from the original dataset
5) Print the details of the car which gave the maximum mileage.
6) Find the average displacement of the car given the number of cylinders.

7) What is the average number of cylinders in a car?
8) Determine the no. of cars with weight greater than the average weight.

Leave a Reply