In this article, you will get comprehensive notes for Unit 4 Data Science Class 10 AI. This unit covers the data science topics introduced earlier in AI domains. So here we begin!
Unit 4 Data Science – Class 10 AI
In Class 9, you played Rock, Paper & Scissors, a game based on data science. Data Science is one of the domains of AI. As discussed in that article, AI is nothing without data: an AI model needs data to become intelligent.
Introduction to Data Science
Data Science is an important field that brings together statistics, data analysis, machine learning and deep learning. It helps us understand and analyse real-world situations and make sound decisions.
Data Science is not just a single field; it draws on concepts and principles from Mathematics, Statistics, Computer Science and Information Science. It can discover hidden patterns in raw data and can also be used to make predictions. Observe the following picture, which highlights the differences between data analysis and data science.
Watch this video for more clarity:
Why Data Science? – Data Science Class 10 AI
Earlier, data processing was quite easy because data was limited and structured, and structured data can be analysed easily and effectively. Nowadays, more than 80% of data is unstructured, and traditional methods do not work well on unstructured data.
In addition, the number of internet users is increasing day by day, which increases the amount of unstructured data. This unstructured data, collected by various organizations through mobile apps, websites and other platforms, can be used to serve the specific requirements of customers and users. As a result, the demand for data science keeps growing.
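To make the difference between structured and unstructured data concrete, here is a small Python sketch using made-up sample data. The sales records have fixed fields, so a total is a one-line calculation; the reviews are free text, so even a rough positive/negative label needs extra processing. The function `naive_sentiment` and its keyword lists are hypothetical and deliberately simplistic; real systems use far more sophisticated techniques.

```python
# Structured data (made-up sample): every record has the same named fields.
structured_sales = [
    {"item": "pen", "quantity": 10, "price": 5},
    {"item": "notebook", "quantity": 4, "price": 40},
    {"item": "eraser", "quantity": 6, "price": 3},
]

# Fixed fields make analysis direct: revenue is quantity * price for each row.
total_revenue = sum(row["quantity"] * row["price"] for row in structured_sales)
print("Total revenue:", total_revenue)

# Unstructured data (made-up sample): free text with no fixed fields.
unstructured_reviews = [
    "Loved the notebook, great paper quality!",
    "The pen stopped working after two days. Bad.",
    "Eraser is okay, nothing special.",
]

# Hypothetical keyword lists for a deliberately naive sentiment check.
POSITIVE_WORDS = {"loved", "great", "good"}
NEGATIVE_WORDS = {"bad", "stopped", "poor"}


def naive_sentiment(text):
    """Label text by counting positive vs negative keywords (illustration only)."""
    # Split into words, strip simple punctuation and lowercase them.
    words = [w.strip(".,!").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in unstructured_reviews:
    print(naive_sentiment(review), "|", review)
```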
Applications of data science
|No.||Field||Applications|
|4||Transport||Optimizing vehicle performance; self-driving cars; vehicle monitoring system|
|5||Healthcare||Medical image analysis; analysis of reviews|
|7||Artificial Conversational Bots||Speech recognition system; machine learning algorithm; Amazon’s Alexa and Apple’s Siri|
Revisiting AI Project Cycle
Watch this video to learn more about revisiting the AI project cycle for Data Science Class 10 AI.
The concepts of Data Acquisition, Data Exploration, Modelling and Evaluation are explained in the following video:
Data collection is the process of gathering numeric and alphanumeric data, and it must be done before any data analysis. It gives a clear idea about the dataset and adds value to it by enabling deeper and clearer analysis. The predictions and suggestions an AI system makes are possible only because of the data collected.
Collected data is mainly used for record maintenance, among other purposes. Some commonly used datasets are:
|Banks||They hold data on loans, accounts, lockers, payrolls, bank visitors etc.|
|ATM Machines||They hold data on daily transactions, visitor information, money withdrawn etc.|
|Movie Theaters||They hold movie details, tickets sold through online and offline modes, purchases of refreshments etc.|
|Schools||They hold data such as student fee collection, results, teachers' salary records etc.|
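As an illustration of record maintenance, the sketch below stores a few ATM transactions and summarises them. The records are invented sample data and the field names (`date`, `card`, `type`, `amount`) are assumptions, not a real bank format; each transaction is treated as one visit for the visitor count.

```python
from collections import defaultdict

# Hypothetical ATM records (invented sample data, assumed field names).
atm_transactions = [
    {"date": "2024-01-15", "card": "XXXX1234", "type": "withdrawal", "amount": 2000},
    {"date": "2024-01-15", "card": "XXXX5678", "type": "balance_enquiry", "amount": 0},
    {"date": "2024-01-15", "card": "XXXX9012", "type": "withdrawal", "amount": 500},
    {"date": "2024-01-16", "card": "XXXX1234", "type": "withdrawal", "amount": 1000},
]


def daily_withdrawals(transactions):
    """Return the total amount withdrawn on each date."""
    totals = defaultdict(int)
    for t in transactions:
        if t["type"] == "withdrawal":
            totals[t["date"]] += t["amount"]
    return dict(totals)


def visitors_per_day(transactions):
    """Count transactions per date, treating each transaction as one visit."""
    counts = defaultdict(int)
    for t in transactions:
        counts[t["date"]] += 1
    return dict(counts)


print(daily_withdrawals(atm_transactions))  # {'2024-01-15': 2500, '2024-01-16': 1000}
print(visitors_per_day(atm_transactions))   # {'2024-01-15': 3, '2024-01-16': 1}
```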
Sources for data collection
There are various sources of data available nowadays. They fall into two major kinds:
|Online Sources||Offline Sources|
|Open-source web portals run by governments||Sensors|
|Reliable private websites such as Kaggle||Surveys|
|Open-source websites of world organizations||Interviews|
Online sources provide data through various websites, portals and apps. Users browse the web portal or download the app and follow the instructions. This method is currently less popular than offline sources, but it is likely to become more popular in future.
Offline sources are often more effective and useful for data collection, as they give a clear picture for decision-making. Here are a few offline methods:
- Sensors: Sensors are IoT-based devices that collect data from the physical world and convert it into digital form. They are connected through gateways that relay the data to the cloud and servers.
- Surveys: Surveys are conducted using questionnaires and are the most popular method for collecting large amounts of data. They are relatively inexpensive and easy to process, but they must be designed and handled carefully. Surveys are mostly conducted using forms, which can be online or offline.
- Interviews: Interviews are one of the most popular ways to collect data. A list of questions is prepared in advance and used to conduct the interviews. It is one of the primary data collection methods, but also the most expensive one. Interviews can also be conducted over the phone or through a web chat interface.
- Observations: Observation means collecting information without asking questions. It requires researchers or observers to add their own judgement to the data. It can capture the dynamics of a situation that cannot be measured through other data collection techniques, and it can be combined with additional information such as video.
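Survey answers are easy to process once collected. As a minimal sketch, assuming a made-up set of answers to a single multiple-choice question, Python's built-in `collections.Counter` can tally the responses and report each option's share:

```python
from collections import Counter

# Hypothetical answers to one survey question:
# "Which mode of transport do you use to reach school?"
responses = ["bus", "walk", "bus", "bicycle", "car", "bus", "walk", "bicycle", "bus"]

# Count how many times each answer appears.
tally = Counter(responses)
total = len(responses)

# Print options from most to least frequent, with their percentage share.
for option, count in tally.most_common():
    percent = 100 * count / total
    print(f"{option:<8} {count:>2}  ({percent:.1f}%)")
```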
The following points should be remembered when accessing data from any data source:
- Only data that is available for public use should be taken up.
- Personal datasets should only be used with the consent of the owner.
- One should never breach someone’s privacy to collect data.
- Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
- Reliable sources of data ensure the authenticity of data which helps in the proper training of the AI model.
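The consent and privacy guidelines above can be applied as a filtering step before data is used. The sketch below assumes hypothetical records with a `consent` field; it keeps only the records whose owners agreed and drops directly identifying fields (`name`, `phone`). Removing these fields is only a basic step and does not by itself guarantee full anonymisation.

```python
# Hypothetical survey records; "consent" says whether the person agreed
# to let their data be used.
records = [
    {"name": "Asha", "phone": "98xxxxxx01", "age": 15, "hours_online": 3, "consent": True},
    {"name": "Ravi", "phone": "98xxxxxx02", "age": 16, "hours_online": 5, "consent": False},
    {"name": "Meera", "phone": "98xxxxxx03", "age": 15, "hours_online": 2, "consent": True},
]

# Fields that identify a person, plus the consent flag itself.
EXCLUDED_FIELDS = {"name", "phone", "consent"}


def prepare_for_training(rows):
    """Keep only consented rows and drop fields that identify a person."""
    cleaned = []
    for row in rows:
        if not row.get("consent", False):
            continue  # no consent recorded: never use this person's data
        cleaned.append({k: v for k, v in row.items() if k not in EXCLUDED_FIELDS})
    return cleaned


usable = prepare_for_training(records)
print(usable)
```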
Types of data
For data science projects, data is generally collected in tabular form and stored in one of several formats:
- CSV: CSV stands for Comma-Separated Values. It is a common and simple file format for storing data in tabular form. It can be opened in any spreadsheet software (MS Excel), word-processing software (MS Word) or text editor (Notepad). Each line contains one record, each record has a number of fields, and the fields are separated by commas.
- Spreadsheet: A spreadsheet uses rows and columns to represent data in tabular form. Spreadsheets are mostly used to calculate, manipulate and analyse data and to maintain data records. MS Excel is a well-known and popular spreadsheet program.
- SQL: SQL stands for Structured Query Language. It is used to handle data stored in a DBMS (Database Management System). It provides basic commands to create, alter and delete data and to manage transactions in a database.
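The sketch below shows the CSV and SQL formats together using only Python's standard library. The CSV text and the `students` table are hypothetical examples, and SQLite (which ships with Python as the `sqlite3` module) stands in for a full DBMS: the records are read with `csv.DictReader`, inserted into an in-memory database, and queried with an SQL `SELECT` statement.

```python
import csv
import io
import sqlite3

# A small CSV file held as text: the first line holds the field names,
# and each following line is one record with comma-separated fields.
csv_text = """roll_no,name,marks
1,Asha,88
2,Ravi,72
3,Meera,95
"""

# Read the records with the built-in csv module (StringIO acts like a file).
reader = csv.DictReader(io.StringIO(csv_text))
students = [(int(r["roll_no"]), r["name"], int(r["marks"])) for r in reader]

# Store the same records in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", students)
conn.commit()

# Query the table with SQL: students scoring above 80, highest first.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 80 ORDER BY marks DESC"
).fetchall()
print(top)  # [('Meera', 95), ('Asha', 88)]
conn.close()
```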
Once data has been collected from different sources, it needs to be accessed before it can be used for different purposes, so data access is a key factor. In this section of Data Science Class 10 AI, you are going to learn about accessing data using Python code.
A few Python modules and libraries are very useful for data access:
- NumPy: NumPy (Numerical Python) is one of the most popular Python packages for data access. It is a fundamental package for performing arithmetic and logical operations on arrays in Python and is widely used to handle numeric data. It provides various functions, methods and properties for working with numbers. NumPy arrays hold homogeneous data, that is, elements of a single type such as numbers, characters or booleans.
- Pandas: Follow the links given below to learn about pandas.
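A minimal NumPy sketch of the points about NumPy above, using a made-up array of marks: arithmetic and comparison (logical) operations apply to every element at once, and because arrays are homogeneous, mixing integers and a float produces a single floating-point array.

```python
import numpy as np

# A NumPy array holds homogeneous data: every element has the same type.
marks = np.array([88, 72, 95, 64, 80])
print(marks.dtype)  # an integer type (exact name depends on the platform)

# Arithmetic operations apply element-wise to the whole array.
scaled = marks / 100   # element-wise division gives floats
bonus = marks + 5      # element-wise addition

# Comparison (logical) operations also work element-wise and return booleans.
passed = marks >= 75
print(passed)
print(marks[passed])   # keep only the marks where the condition is True

# Built-in methods summarise the data.
print(marks.mean(), marks.max())

# Mixing ints and a float forces one common type, since arrays are homogeneous.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```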