Major Project Theme: Data preparation for Data mining
Data preparation for Data mining
What is data preparation?
Data Preparation is a pre-processing step in which data from one or more sources is cleaned and transformed to improve its quality prior to its use in business analytics. (Informatica UK, 2022)
Data preparation is often a lengthy undertaking for data professionals and business users, but it is an essential prerequisite: it puts data in context so that it can be turned into insights, and it removes the bias that results from poor data quality.
Data preparation steps:
1. Gather data
2. Discover and access data
3. Cleanse and validate data
4. Transform and enrich data
5. Store data
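The five steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the sample records, the column names (customer, amount, country), the validation rule and the derived large_order column are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical raw data; in practice it would come from files, databases or APIs.
RAW_CSV = """customer,amount,country
Alice,120.50,uk
Bob,,UK
alice,120.50,uk
Carol,-15,us
Dave,80,US
"""


def gather(text):
    """Step 1: gather the raw rows from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def discover(rows):
    """Step 2: profile the data by counting missing values per column."""
    return {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}


def cleanse(rows):
    """Step 3: drop rows with missing or negative amounts, and duplicates."""
    seen, clean = set(), []
    for r in rows:
        if not r["amount"].strip():
            continue  # missing value
        amount = float(r["amount"])
        if amount < 0:
            continue  # fails the (assumed) validation rule
        key = (r["customer"].strip().lower(), amount, r["country"].strip().lower())
        if key in seen:
            continue  # duplicate once case is ignored
        seen.add(key)
        clean.append({
            "customer": r["customer"].strip().title(),
            "amount": amount,
            "country": r["country"].strip().upper(),
        })
    return clean


def transform(rows):
    """Step 4: enrich each row with a derived column."""
    return [dict(r, large_order=r["amount"] >= 100) for r in rows]


def store(rows):
    """Step 5: write the result as CSV text (a stand-in for a file or warehouse)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


raw = gather(RAW_CSV)
missing = discover(raw)
prepared = transform(cleanse(raw))
stored = store(prepared)
print(missing)
print(stored)
```

Keeping each step as its own function makes it easy to inspect the data between stages, which is where most quality problems tend to be found.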
Why is it important?
Data scientists say (Talend, 2022) that data preparation is the worst part of their job, but efficient, accurate business decisions can only be made with clean data. Data preparation helps:
-fix errors quickly
-produce top-quality data
-make better business decisions
Additionally, as data and data processes move to the cloud, data preparation moves with them for even greater benefits, such as:
-superior scalability
-future-proofing
-accelerated data usage and collaboration
Types of data (Bridgwater, 2022)
1 - Big data
2 - Structured, unstructured, semi-structured data
3 - Time-stamped data
4 - Machine data
5 - Spatiotemporal data
6 - Open data
7 - Dark data
8 - Real time data
9 - Genomics data
10 - Operational data
11 - High-dimensional data
12 - Unverified outdated data
13 - Translytic Data
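To make a few of these categories concrete, the sketch below contrasts structured, semi-structured and time-stamped data using Python's standard library. The sample records and field names are invented for illustration and are not taken from the Forbes article.

```python
import csv
import io
import json
from datetime import datetime

# Structured data: every row has the same fixed columns.
structured = "id,name,city\n1,Alice,Leeds\n2,Bob,York\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured data: records share a format (JSON) but not a fixed schema.
semi_structured = (
    '[{"id": 1, "name": "Alice", "tags": ["vip"]},'
    ' {"id": 2, "name": "Bob", "address": {"city": "York"}}]'
)
records = json.loads(semi_structured)
all_keys = sorted({key for rec in records for key in rec})

# Time-stamped data: each record carries the moment it was captured.
event = {"sensor": "s1", "value": 21.5, "timestamp": "2022-04-24T10:15:00"}
when = datetime.fromisoformat(event["timestamp"])

print(rows[0].keys() == rows[1].keys())  # structured rows share the same columns
print(all_keys)                          # union of the keys seen across records
print(when.year, when.hour)
```

The practical consequence for data preparation is that semi-structured records usually need a step that reconciles their differing fields before they can be analysed alongside structured tables.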
Data analysis
Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. (Data Analysis, 2022)
Data visualisation
Data visualization is the representation of data through use of common graphics, such as charts, plots, infographics, and even animations. (Education, 2022) These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.
What tools are used for data preparation, processing, and analysis?
Preparation
Data preparation is the process of gathering, combining, structuring and organizing data (Stedman, 2022) so it can be used in business intelligence (BI), analytics and data visualisation applications. The components of data preparation include data preprocessing, profiling, cleansing, validation and transformation; it often also involves pulling together data from different internal systems and external sources.
Processing
Data processing covers data conversion, pre-processing, alignment, normalization, and statistical analysis. The source cited here (Ingenta Connect, 2022) surveys the data processing tools available and reviews a collection of recent reports on the topic, introducing each of these steps with its advantages and disadvantages and comparing them to guide the reader.
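Normalization is one of the processing steps named above. As a rough sketch, two common ways of doing it are min-max scaling and z-score standardization; the source does not say which method it has in mind, so both are shown here as assumptions, with made-up readings.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


readings = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical measurements
scaled = min_max_normalize(readings)
standardized = z_score_normalize(readings)
print(scaled)
print(standardized)
```

Min-max scaling keeps values in a fixed range but is sensitive to outliers, while z-scores express each value as a distance from the mean in standard deviations.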
Analysis
Statistical methods are involved at every stage of a study: planning, designing, collecting data, analysing, drawing meaningful interpretations, and reporting the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, and the results and inferences are precise only if proper statistical tests are used (Ali & Bhaskar, 2016). Ali and Bhaskar's article introduces the basic research tools used when conducting studies. It gives a brief outline of variables, explains quantitative and qualitative variables and the measures of central tendency, and covers sample size estimation, power analysis and statistical errors. It closes with a summary of the parametric and non-parametric tests used for data analysis.
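The measures of central tendency mentioned above (mean, median and mode) can be computed with Python's built-in statistics module. The scores below are made-up numbers chosen only to show how an outlier affects each measure.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle value (average of the two middle ones here)
mode = statistics.mode(scores)      # most frequent value

# Adding one extreme value pulls the mean up sharply but barely moves the median.
with_outlier = scores + [100]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean, median, mode)
print(round(mean_outlier, 2), median_outlier)
```

This is one reason cleansing matters before analysis: a single bad record can distort the mean, whereas the median is more robust to it.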
Reference list:
Ali, Z. and Bhaskar, S. B., 2016. Basic statistical tools in research and data analysis. Indian Journal of Anaesthesia, 60(9), pp. 662–669. Available at: <https://doi.org/10.4103/0019-5049.190623>
Bridgwater, A., 2022. The 13 Types Of Data. [online] Forbes. Available at: <https://www.forbes.com/sites/adrianbridgwater/2018/07/05/the-13-types-of-data/?sh=107e99333362> [Accessed 24 April 2022].
Education, I., 2022. What is Data Visualization?. [online] Ibm.com. Available at: <https://www.ibm.com/cloud/learn/data-visualization> [Accessed 24 April 2022].
Infogram. 2022. [online] Available at: <https://infogram.com/page/data-visualization> [Accessed 24 April 2022].
Informatica.com. 2022. Definition: Data Preparation | Informatica UK. [online] Available at: <https://www.informatica.com/gb/services-and-training/glossary-of-terms/data-preparation-definition.html> [Accessed 24 April 2022].
Ori.hhs.gov. 2022. Data Analysis. [online] Available at: <https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/datopic.html> [Accessed 24 April 2022].
Talend.com. 2022. What is Data Preparation?. [online] Available at: <https://www.talend.com/resources/what-is-data-preparation/> [Accessed 24 April 2022].


