Statistical Data Cleaning with Applications in R

Binding:
Hardcover
EAN:
9781118897157
Language:
English
Genre:
Application software
Author:
Mark van der Loo, Edwin de Jonge
Publisher:
John Wiley & Sons
Edition:
1st edition
Number of pages:
320
Publication date:
30 March 2018
ISBN:
978-1-118-89715-7

About the authors
Mark van der Loo and Edwin de Jonge, Department of Statistical Methods, Statistics Netherlands, The Netherlands

Blurb
A comprehensive guide to automated statistical data cleaning

The production of clean data is a complex and time-consuming process that requires both technical know-how and statistical expertise. Statistical Data Cleaning brings together a wide range of techniques for cleaning textual, numeric or categorical data. This book examines technical data cleaning methods relating to data representation and data structure. A prominent role is given to statistical data validation, data cleaning based on predefined restrictions, and data cleaning strategy.

Key features:
* Focuses on the automation of data cleaning methods, including both theory and applications written in R.
* Enables the reader to design data cleaning processes for either one-off analytical purposes or for setting up production systems that clean data on a regular basis.
* Explores statistical techniques for solving issues such as incompleteness, contradictions and outliers, integration of data cleaning components and quality monitoring.
* Supported by an accompanying website featuring data and R code.

This book enables data scientists and statistical analysts working with data to deepen their understanding of data cleaning as well as to upgrade their practical data cleaning skills. It can also be used as material for a course in data cleaning and analyses.

Contents
Foreword
About the Companion Website
1 Data Cleaning
  1.1 The Statistical Value Chain
    1.1.1 Raw Data
    1.1.2 Input Data
    1.1.3 Valid Data
    1.1.4 Statistics
    1.1.5 Output
  1.2 Notation and Conventions Used in this Book
2 A Brief Introduction to R
  2.1 R on the Command Line
    2.1.1 Getting Help and Learning R
  2.2 Vectors
    2.2.1 Computing with Vectors
    2.2.2 Arrays and Matrices
  2.3 Data Frames
    2.3.1 The Formula-Data Interface
    2.3.2 Selecting Rows and Columns; Boolean Operators
    2.3.3 Selection with Indices
    2.3.4 Data Frame Manipulation: The dplyr Package
  2.4 Special Values
    2.4.1 Missing Values
  2.5 Getting Data into and out of R
    2.5.1 File Paths in R
    2.5.2 Formats Provided by Packages
    2.5.3 Reading Data from a Database
    2.5.4 Working with Data External to R
  2.6 Functions
    2.6.1 Using Functions
    2.6.2 Writing Functions
  2.7 Packages Used in this Book
3 Technical Representation of Data
  3.1 Numeric Data
    3.1.1 Integers
    3.1.2 Integers in R
    3.1.3 Real Numbers
    3.1.4 Double Precision Numbers
    3.1.5 The Concept of Machine Precision
    3.1.6 Consequences of Working with Floating Point Numbers
    3.1.7 Dealing with the Consequences
    3.1.8 Numeric Data in R
  3.2 Text Data
    3.2.1 Terminology and Encodings
    3.2.2 Unicode
    3.2.3 Some Popular Encodings
    3.2.4 Textual Data in R: Objects of Class Character
    3.2.5 Encoding in R
    3.2.6 Reading and Writing of Data with Non-Local Encoding
    3.2.7 Detecting Encoding
    3.2.8 Collation and Sorting
  3.3 Times and Dates
    3.3.1 AIT, UTC, and POSIX Seconds Since the Epoch
    3.3.2 Time and Date Notation
    3.3.3 Time and Date Storage in R
    3.3.4 Time and Date Conversion in R
    3.3.5 Leap Days, Time Zones, and Daylight Saving Times
  3.4 Notes on Locale Settings
4 Data Structure
  4.1 Introduction
  4.2 Tabular Data
    4.2.1 data.frame
    4.2.2 Databases
    4.2.3 dplyr
  4.3 Matrix Data
  4.4 Time Series
  4.5 Graph Data
  4.6 Web Data
    4.6.1 Web Scraping
    4.6.2 Web API
  4.7 Other Data
  4.8 Tidying Tabular Data
    4.8.1 Variable Per Column
    4.8.2 Single Observation Stored in Multiple Table…

