

【Academic Seminar】DataPrep: Make Data Scientists Not Complain about Data Preparation

October 21, 2020 Academic Events

 

Topic: DataPrep: Make Data Scientists Not Complain about Data Preparation

Speaker: Prof. Jiannan Wang, Simon Fraser University

Time & Date: 10:30 am - 11:30 am, October 21, 2020

Venue: Zoom, Meeting ID: 559 916 3678

Abstract:

Data scientists have been complaining about data preparation (data collection --> data understanding --> data cleaning --> data enrichment --> data integration --> feature engineering) for many years. Despite efforts to solve this problem, a survey released by Anaconda in 2020 shows that it persists: “Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction.”

In this talk, I will explain what makes data preparation hard and present DataPrep, a fast and easy-to-use Python library that addresses these challenges. DataPrep aims to become the "scikit-learn" of data preparation. The library currently contains two components: a data connector that simplifies web data collection and an exploratory data analysis (EDA) component that enables fast data understanding. I will describe their novel design in detail and demonstrate how much time they can save data scientists. I will also discuss our design of other components, such as data enrichment and data cleaning. Finally, I will introduce a framework from Prof. Ion Stoica (UC Berkeley) for picking a research problem and use it to argue that data preparation is a great research problem to work on in the next decade.

Please refer to http://dataprep.ai for more details about the DataPrep project.

Biography:

Professor Jiannan Wang is an Associate Professor in the School of Computing Science at Simon Fraser University. His current research interests are data preparation, ML model debugging/monitoring, and approximate query processing. Before joining Simon Fraser University, he was a postdoc in the AMPLab at UC Berkeley. He obtained his Ph.D. from Tsinghua University. He has won an IEEE TCDE Rising Star Award (2018), an ACM SIGMOD Best Demonstration Award (2016), a Distinguished Dissertation Award from the China Computer Federation (2013), and a Google Ph.D. Fellowship (2011).

https://www.cs.sfu.ca/~jnwang/
