Introduction – Dataframe
[Figure: a 3 x 5 DataFrame]
When thinking about two-dimensional data, a spreadsheet such as MS Excel is a familiar example: it represents data in tabular form, in rows and columns.
The word DataFrame can be split into two simple words: i) Data and ii) Frame. In other words, the data is enclosed in a frame of rows and columns. A DataFrame can store data of any type within that frame, and it is widely used for analyzing tabular data.
The image above shows a 2D array of size m x n, where m is the number of rows and n is the number of columns. In this example, the array is 3 x 5 and holds 15 elements.
Characteristics of DataFrame
- A DataFrame has two indexes (axes): a row index and a column index
- Index labels can be numbers, letters, or strings
- A DataFrame can hold columns of different data types
- A DataFrame is value mutable, i.e. its values can be changed
- A DataFrame is also size mutable, i.e. rows and columns can be added or deleted at any time
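The characteristics listed above can be checked with a short sketch. The names, scores, and row labels below are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: two columns holding different data types
df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "score": [82, 75, 91]},
    index=["r1", "r2", "r3"],  # row labels can be strings
)

# Two axes: the row index and the column index
print(df.index.tolist())
print(df.columns.tolist())

# Each column keeps its own data type
print(df.dtypes)

# Value mutable: change a single value in place
df.loc["r2", "score"] = 80

# Size mutable: add a column, then drop a row
df["passed"] = df["score"] >= 80
df = df.drop(index="r1")
print(df)
```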
To create a DataFrame, import the following modules. pandas is mandatory, while numpy is imported only when it is needed.
import pandas as pd
import numpy as np
The general syntax for creating a DataFrame object is:
dfo = pd.DataFrame(<2D DataStructure>, <columns=column_sequence>, <index=index_sequence>, <dtype=data_type>, <copy=bool>)
Creating empty DataFrame & Display:
To create an empty DataFrame, call the DataFrame() function without passing any parameters. The print() function displays its contents.
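A minimal sketch of this step is shown here. The variable name empty_df is only illustrative.

```python
import pandas as pd

# Calling DataFrame() with no arguments gives an empty DataFrame
empty_df = pd.DataFrame()

# print() shows that it has no columns and no index entries
print(empty_df)

# The shape confirms zero rows and zero columns
print(empty_df.shape)  # (0, 0)
```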
Creating DataFrame from List and Display (Single Column):
A DataFrame can be created from a list, either with a single column or with multiple columns. To create a single-column DataFrame, define a list and pass that list object to the DataFrame() function.
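As a small sketch, a flat list can be passed directly. The list of marks is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: a flat list becomes one column
marks = [45, 67, 89, 72]

df_single = pd.DataFrame(marks)

# With no column name given, pandas labels the column 0
# and numbers the rows from 0
print(df_single)
```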
Creating DataFrame from List and Display (Multiple Columns):
A DataFrame with multiple columns can be created from a list of lists, where each inner list supplies the values of one row.
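One way to sketch this is with a nested list. The names and scores are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each inner list is one row
rows = [
    ["Asha", 82],
    ["Ben", 75],
    ["Chen", 91],
]

df_multi = pd.DataFrame(rows)

# Two columns are created, labelled 0 and 1 by default
print(df_multi)
```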
Specifying column names:
To specify column names, pass a sequence of names to the columns parameter of the DataFrame() function.
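A short sketch of the columns parameter follows. The data and the column names "name" and "score" are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: each inner list is one row
rows = [
    ["Asha", 82],
    ["Ben", 75],
    ["Chen", 91],
]

# columns= replaces the default 0, 1, ... labels with names
df_named = pd.DataFrame(rows, columns=["name", "score"])
print(df_named)
```

The number of names must match the number of columns in the data, otherwise pandas raises an error.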
Creating DataFrame from Series:
As covered in an earlier post on Series, a DataFrame can also be created from Series objects. For example, player statistics can be stored in two separate Series objects, which are then passed to the DataFrame() function.
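The sketch below is one possible way to do this and is not necessarily the exact code the post originally showed. The player names and statistics are hypothetical. When a list of Series is passed, each Series becomes one row and the Series index supplies the column labels.

```python
import pandas as pd

# Hypothetical player statistics stored in two Series
players = ["Asha", "Ben", "Chen"]
matches = pd.Series([10, 12, 8], index=players, name="matches")
runs = pd.Series([450, 520, 300], index=players, name="runs")

# Each Series becomes a row; the Series name becomes the row label
df_stats = pd.DataFrame([matches, runs])
print(df_stats)

# Transposing gives one row per player instead
print(df_stats.T)
```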
Creating DataFrame from Dictionaries:
A dictionary can also describe a 2D data structure and can be passed to the DataFrame() function. A DataFrame can be created from a dictionary of Series or from a list of dictionaries. With a dictionary of Series, each key becomes a column name and each Series supplies the values of that column.
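A minimal sketch of the dictionary-of-Series case is given here. The keys, player names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: both Series share the same index
players = ["Asha", "Ben", "Chen"]
data = {
    "matches": pd.Series([10, 12, 8], index=players),
    "runs": pd.Series([450, 520, 300], index=players),
}

# Dictionary keys become column names; the Series index becomes the row index
df_dict = pd.DataFrame(data)
print(df_dict)
```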
Creating DataFrame using a list of dictionaries:
A list of dictionaries is a list containing multiple dictionary objects, each describing one row. If a key is missing from any of the dictionaries, NaN (Not a Number) is displayed for that value in the output.
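The missing-value behaviour can be sketched as follows. The records are hypothetical, and the second one deliberately omits the "score" key.

```python
import pandas as pd

# Hypothetical sample data: one dictionary per row
records = [
    {"name": "Asha", "score": 82},
    {"name": "Ben"},                 # "score" is missing here
    {"name": "Chen", "score": 91},
]

df_records = pd.DataFrame(records)

# The missing score is shown as NaN
print(df_records)

# isna() marks the missing entries
print(df_records["score"].isna())
```

Because NaN is a floating-point value, the score column is stored as float64 rather than as integers.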
Creating DataFrame from ndArrays:
To create a DataFrame from an ndarray, first import the NumPy module and create the array, then pass the array to the DataFrame() function.
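A small sketch with a 2 x 3 array is shown below. The values and the column names "a", "b", and "c" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a 2D array with 2 rows and 3 columns
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Each array row becomes a DataFrame row; columns= names the columns
df_arr = pd.DataFrame(arr, columns=["a", "b", "c"])
print(df_arr)
print(df_arr.shape)  # (2, 3)
```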