Analytics for NetApp E-Series AutoSupport Data Using Big Data Technologies

Zhang, Jialiang

PDF

Description

Our capstone project, utilizing novel Big Data technology, was to help NetApp Inc. develop the AutoSupport (ASUP) Ecosystem for their E-series products [1]. With this software framework, NetApp Inc. was able to collect normalized data, perform predictive analytics and generate effective solutions for its E-series products customers. We used the Star Schema for the data warehousing structure and built seven dimension tables and two fact tables to handle the plethora of E-series ASUP data. To refine our decision and eliminate improper technologies, we made a comparison of many eligible Big Data technologies with respect to their technical strengths and weaknesses. We utilized the latest Spark/Shark Big Data technology developed by Berkeley AMPLab [2] to construct the software framework. Additionally, to perform the featured predictive analytics we used K-means Clustering and K-fold cross-validation machine learning techniques on the normalized data set.

My main contribution in this project was to develop a parser to convert the majority of the E-series product’s daily/weekly and event-based ASUP logs into the iv normalized data format. After performing multiple trials and the overall assessment of both the difficulty and feasibility of different data parsing approaches, I recommended the approach of parsing the text-based data in raw ASUP data set. Based on the normalized data I generated, we then successfully built a prototype. And we expected that with our ASUP framework and predictive data analysis function, NetApp would have more power and efficiency in resolving the E-series product issue for its customer. At the same time, our project on ASUP framework would revolutionize NetApp’s data storage and customer support business and help the company exploit its niche market in the Big Data industry.

Details

Title

Analytics for NetApp E-Series AutoSupport Data Using Big Data Technologies

Creator

Zhang, Jialiang, Author

Published

2016-05-01

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2016-24

Type

Text

Format

technical reports

Extent

32 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports

Files

Statistics

Download Full History

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Add to Basket