Data mining: What is the VisMiner approach?

Features

  • Author: Statistics Views
  • Date: 18 Dec 2012

Last month saw the publication of Visual Data Mining: The VisMiner Approach.

Data mining has been defined as the search for useful and previously unknown patterns in large datasets, yet when faced with the task of mining a large dataset, it is not always obvious where to start and how to proceed.

This book introduces a visual methodology for data mining, demonstrating its application through a sequence of exercises using VisMiner. Developed by the author, VisMiner is a powerful visual data mining tool that enables readers to see the data they are working on and to visually evaluate the models built from that data.

Here, Statistics Views interviews the author, Professor Russ Anderson, a retired professor from West Texas A&M University who continues to guest lecture, about his career, the book and how the VisMiner software was developed.



1. Congratulations on the publication of Visual Data Mining: The VisMiner Approach. Your academic background is in marketing and computer science at Utah State University and business administration at Berkeley. You taught in the Department of Computer Information & Decision Management at West Texas A&M University and are now retired. What was it that brought you to recognise statistics as a discipline in the first place?

While I was working on my M.B.A. degree at Berkeley, I wanted to go into marketing research, where statistics is obviously heavily involved. Upon leaving and looking for job opportunities, everyone I applied to asked if I knew how to program a computer. I did not, so I took a course just to learn computer programming, hoping it would prepare me for the job market. I had never learnt anything like it, and it opened up a whole new world for me. I entirely changed my mind about wanting to work in marketing and instead wished to pursue a career in computing. I decided to go back to Utah State University and obtain a degree in computer science.

2. With a background in computer science and software programming, is this the field that led you to work on visual data mining?

Whilst I was at Berkeley, I took an experimental class that had never been taught before called Marketing Modelling. They had used some very simple algorithms to write computer programs in a programming language called Basic, which was console-based. It was very interesting, but completely new. You have to realise this was back in the late 1970s, and all that time the professors were saying that when computers become more capable, we will be able to do this and that. The word “when” was always mentioned, and it piqued my interest: OK, I want to be there and be able to participate when computers become more capable.

You need to remember that back then desktop computers were little toys which could not do much. Later, when I was working on my PhD, I looked at the market and the current capabilities of computing software and considered whether we were there yet, according to what my professors had talked about. I decided we were not, but I still worked on developing visualization software for operations research – linear programming, to be specific. It was still hard going with the computing capabilities of that time.

Then, ten years ago, I decided that we had reached the point where I could begin work on visual data mining, and I am still working on it to this day.

3. What were your main objectives during the writing process? What did you set out to achieve in reaching your readers?

Over the years as I taught data mining, there were many ideas that were hard to get across to students. For example, the idea of over-fitting, or over-training, a model: for some reason it was hard for students to grasp. As I developed the software and worked on the book, I wanted to create something that would let students see what I was talking about. I wanted the student to be able to see the data and the point at which the model began to over-fit. So that was my main objective - to help students see the concepts I was trying to teach.
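To make the over-fitting idea concrete, here is a minimal sketch (in Python with scikit-learn and matplotlib, and not part of VisMiner): as model complexity grows, training error keeps falling while validation error eventually rises, and the point where the two curves diverge is the over-fitting point the author wants students to be able to see.

    # Illustrative sketch only -- not VisMiner. It visualizes the over-fitting
    # idea described above: training error keeps falling with model complexity
    # while validation error eventually turns upward.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    # Simulated noisy data (hypothetical, just for the plot)
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(120, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=120)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.4, random_state=0)

    degrees = range(1, 15)
    train_err, valid_err = [], []
    for d in degrees:
        # Polynomial regression of increasing degree = increasing complexity
        model = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(X_tr, y_tr)
        train_err.append(mean_squared_error(y_tr, model.predict(X_tr)))
        valid_err.append(mean_squared_error(y_va, model.predict(X_va)))

    plt.plot(degrees, train_err, label="training error")
    plt.plot(degrees, valid_err, label="validation error")
    plt.xlabel("polynomial degree (model complexity)")
    plt.ylabel("mean squared error")
    plt.legend()
    plt.show()  # the widening gap between the curves marks over-fitting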

4. Were there areas that you found more challenging to write and if so, why?

The hardest part was using mathematics to describe the algorithms that go beyond the standard statistical methodologies. I wanted to make them understandable for students, yet at the same time I did not want to dig into too much detail, otherwise I was afraid I might lose the students! I had to find the right balance so that important details were not left out.

5. What is the VisMiner approach and how does it apply to our every-day lives?

VisMiner is the name of the software that I developed to go along with the book. The approach is that everything is visual - you see what you are doing on the screen. The initial exploration of the data set is a visual process, and the selection and application of the data-mining algorithms are also visual. The analysis of those algorithms is visual too - I wanted students to be able to see the models they were building, using charts, graphs and plots. You don’t just look at the data visually; you take the whole process and perform it visually. In terms of every-day application, the software is designed so that it is easier for students to remember what they have learned. Six months later, you sometimes forget what you learnt and have to go back and learn it again. With VisMiner, it is easy to go back and pick it up again; it does not have a steep learning curve.

6. As a retired university professor who now guest lectures, what do you think the future of teaching statistics will be? What do you think will be the upcoming challenges in engaging students?

I would prefer to answer that from the perspective of teaching data mining, to which statistics of course applies. More and more now, we are working with larger data sets; previously in a course you were looking at 30-100 observations, but now they run into the millions. There is a lot of work to do in getting students to work with these large data sets. A lot of teaching is focussed on methodologies, which is fine for graduate students and those working on their PhD, as they’re the only ones who will ever do hypothesis testing.

I’ve worked in a business environment and think that 99% of my students who graduate will never do hypothesis testing again. That’s why I think data mining is interesting, and I hope it is an area students will come to work in a lot. I’d like to get students to apply algorithms and work beyond the statistical basics, which a lot of them will not use once they are in the workforce.

7. Over the years, how has your teaching, consulting, and research motivated and influenced each other? Do you get research ideas from visual data mining and incorporate your ideas into your teaching?

As I’m teaching the courses, I always check for the blank looks; as I’ve said, that is what I’ve focussed on in developing software to help students understand, so it influences my teaching in that I need to explain visually how you can do this and that. From a consulting perspective, I’ve done a lot of work in personnel and salary data analysis, looking into the problems practitioners were having, and that has fed back into my teaching.

8. What do you think the most important recent developments in the field have been? What do you think will be the most exciting and productive areas of research in statistics during the next few years and the main challenges facing the profession of statistics?

The big thing is Big Data: working with these very large data sets and the online processing of such data. Data is coming in constantly, e.g. internet activity - if you are in charge of an organization’s website, you can capture this data from the moment it is generated and monitor it to inform your business decisions. You may suddenly see interest in one particular product, alter your marketing plans, and so on. These huge flows of data come in, and the faster you respond to them, the better off your organization will be.

9. Are there people or events that have been influential in your career?

That one experimental course I took at Berkeley, put together by Professor David Aaker, was, as I’ve said, massively influential. Professor David B. Montgomery from Stanford helped put the course together, working with Hewlett Packard, who provided the data, and they asked: now that we have online computers, what can we do with them? I had never worked on computers before, and it opened up a whole new world for me that I had to be a part of, even though the capabilities of computing hardware could not yet stand up to what I wanted to do. It started me working on computers and then going back and taking a degree that totally changed my whole life!
