Reactive Business Intelligence

By Volkan TUNALI, December 25, 2010 3:14 pm

I’ve recently found an interesting data analysis and visualization book: Reactive Business Intelligence: From Data to Models to Insight by Roberto Battiti and Mauro Brunato.

The book explains data analysis concepts in an easy and intuitive way, supported with visual elements. It is freely available for download at http://grapheur.com/static/Battiti-Brunato-RBI.pdf.








The book also includes humorous illustrations of its topics. Below are two that I particularly like. I hope the authors don’t mind me putting them here. :)


Figure 7.1 from page 42 – Clustering.


Figure 17.1 from page 167 – Local search.
———————————————————————————————
Update April 6, 2011: A little note from the authors:
———————————————————————————————
Dear colleague:

Our latest book is now printed:

Reactive Business Intelligence. From Data to Models to Insight.
R. Battiti and M. Brunato,
Reactive Search Srl, Italy, February 2011.
ISBN: 978-88-905795-0-9

Full details at the book web site:

http://www.reactivebusinessintelligence.com/

Reactive Business Intelligence is about integrating data mining,
modeling and interactive visualization, into an end-to-end
discovery and continuous innovation process powered
by human and automated learning.
This holistic and unifying goal requires collecting and integrating
topics which are usually dissected in books dedicated to
different areas.

We plan to place figures and slides in the same place very soon.

– Roberto Battiti and Mauro Brunato
———————————————————————————————

Guide to Intelligent Data Analysis

By Volkan TUNALI, December 22, 2010 11:03 pm

I want to introduce a new Data Mining book from Springer: Guide to Intelligent Data Analysis. This book provides a hands-on instructional approach to many basic data analysis techniques, and explains how these are used to solve data analysis problems.

Authors: Michael R. Berthold (University of Konstanz, Germany), Christian Borgelt (European Centre for Soft Computing, Spain), Frank Höppner (Ostfalia University of Applied Sciences, Germany), Frank Klawonn (Ostfalia University of Applied Sciences, Germany).
Publisher: Springer
ISBN: 978-1-84882-259-7

The chapters work through examples that use KNIME and/or R as analysis tools. In addition, two appendices are dedicated to KNIME and R.

This is an excellent book that combines the theory and practice of data analysis very well. I strongly recommend it to data mining researchers. For more information, you can visit the Springer page for the book.

Using “awk” to Join Text Files on Windows

By Volkan TUNALI, November 4, 2010 11:34 pm

Last time I used awk to split the single cisi.all file into small files named cisi.1, cisi.2, and so on. Now I need to join these small files into a single file in a simple XML-like format. I read some awk tutorials but could not find a way to loop over many text files from within awk, so I used Windows batch scripting instead. I wrote a small awk program that formats the content of a given file and appends it to an output file. Then, in a batch file, I loop over the files in a directory and run the awk program on each one.

Here’s the batch file named JOIN.BAT:

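rem Start from a clean output file (skip the delete if it does not exist yet).
if exist output.xml del output.xml
rem Run join.awk once per file; the quotes protect paths that contain spaces.
for /r %%X in (dataset\*.*) do (awk -f join.awk "%%X")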

Here’s the awk file named JOIN.AWK:

# awk runs once per input file, so every file gets its own DOC/BODY wrapper.
BEGIN { print "<DOC>\n<BODY>" >> "output.xml" }
{ print $0 >> "output.xml" }
END { print "</BODY>\n</DOC>\n" >> "output.xml" }

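As you can see in the awk program, the content of each file is appended to the file output.xml. Because the batch file runs awk once per input file, each file ends up wrapped in its own DOC and BODY tags. On Unix-like systems, you can write a similar shell script instead of a batch file.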
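For Unix-like systems, a rough shell equivalent of JOIN.BAT might look like the sketch below. The function name join_files and its two arguments are illustrative choices, not part of the original scripts. The awk program is the same as JOIN.AWK, except that it writes to standard output and lets the shell do the appending.

```shell
#!/bin/sh
# Hypothetical helper: wrap every file in a directory in <DOC><BODY> tags
# and append the results to a single output file.
join_files() {
    dataset_dir=$1   # directory holding cisi.1, cisi.2, ... (assumed layout)
    output_file=$2   # for example output.xml

    rm -f "$output_file"                 # start from a clean output file
    for f in "$dataset_dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and unmatched globs
        # Same logic as JOIN.AWK, run once per input file.
        awk 'BEGIN { print "<DOC>\n<BODY>" }
             { print }
             END { print "</BODY>\n</DOC>\n" }' "$f" >> "$output_file"
    done
}
```

Calling join_files dataset output.xml would then play the role of running JOIN.BAT, provided output.xml does not live inside the dataset directory.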

KDD2011: 17th ACM SIGKDD Conference on KDD

By Volkan TUNALI, October 18, 2010 12:41 am

The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.

KDD-2011 will run from August 21 to 24, 2011, in San Diego, CA, and will bring hundreds of practitioners and academic data miners together in one location.

Important Dates

  • Aug 21-24, 2011 KDD-2011 Conference
  • May 13, 2011 Paper acceptance
  • Feb 18, 2011 Full Paper deadline
  • Feb 11, 2011 Paper abstract deadline

* All deadlines are for 11:59 PM Pacific time.

For more information, you can visit the conference home page.

Using “awk” to Extract Title and Body Text from “cisi.all” File

By Volkan TUNALI, October 10, 2010 11:55 pm

In my previous post I wrote an awk program to extract the title part of the documents in the cisi.all file from the CISI document collection. In this post, I extend that program with a few additional lines so that it also extracts the body part of each document.

Here’s the code:

BEGIN { docNo = 0 }

# A .T line starts the title of a new document.
$0 == ".T" {
    if (docName != "") close(docName)    # release the previous output file
    docNo++
    textStarted = 1
    printThisLine = 0
    print "Processing Doc #" docNo
    docName = sprintf("cisi.%d", docNo)  # output files are named cisi.1, cisi.2, etc.
}

# Any other field marker stops printing until the next .T or .W.
# The .I marker is followed by the document number, so compare the first field.
$0 == ".A" || $0 == ".X" || $1 == ".I" { textStarted = 0 }

# A .W line starts the body.
$0 == ".W" {
    textStarted = 1
    printThisLine = 0
}

{
    if (textStarted == 1) {
        if (printThisLine)
            print $0 > docName
        printThisLine = 1   # lines after the .T or .W marker itself are printed
    }
}

Similar awk code should be sufficient to extract the title and body parts from the CRAN, MED, and CACM collections.
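As a rough sketch of how the same idea could be generalized, the shell function below wraps a variant of the awk program that takes the output prefix as a parameter. It assumes that each collection file starts every document with a line of the form ".I number" and marks the title and body with .T and .W; I have not verified this against every collection, and a collection without .T lines would simply yield body-only files. The name split_collection and the variables prefix, out, and printing are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split_collection FILE PREFIX
# Writes the title and body of document n to PREFIX.n in the current directory.
split_collection() {
    awk -v prefix="$2" '
        $1 == ".I" {                        # ".I n" starts document n
            if (out != "") close(out)       # release the previous output file
            out = prefix "." $2
            printing = 0
            next
        }
        $0 == ".T" || $0 == ".W" { printing = 1; next }   # title or body follows
        /^\.[A-Z]$/ { printing = 0; next }                # any other field marker
        printing && out != "" { print > out }
    ' "$1"
}
```

With this sketch, split_collection cisi.all cisi would produce cisi.1, cisi.2, and so on, and the same call with another collection file and prefix would handle the other collections as long as the assumptions above hold.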
