Author Topic: Data Mining Tool: 64bit, Linux OS  (Read 1117 times)

roo_ster

  • Kakistocracy--It's What's For Dinner.
  • friend
  • Senior Member
  • ***
  • Posts: 21,225
  • Hoist the black flag, and begin slitting throats
Data Mining Tool: 64bit, Linux OS
« on: September 23, 2010, 11:28:51 AM »
Looking for a data mining tool.

It will comb through gigabytes of text files and manipulate the data to create reports and similar output that could go into MS Excel.  It needs to be 64-bit.

A similar tool, Clementine, cost someone doing the same thing we will be doing $90K, and it craps out on them.  They can only get a 32-bit version of Clementine, and I suspect it runs out of memory (or at least the ability to address enough memory).

I'd prefer an open-source tool that runs on a 64-bit Linux OS.  We got some big hoss RHEL5 machines with beaucoup RAM on which to run the beast in a cluster environment (LSF).

Any pointers would be appreciated.
Regards,

roo_ster

“Fallacies do not cease to be fallacies because they become fashions.”
----G.K. Chesterton

mtnbkr

  • friend
  • Senior Member
  • ***
  • Posts: 15,388
Re: Data Mining Tool: 64bit, Linux OS
« Reply #1 on: September 23, 2010, 01:03:46 PM »
Perl.

That's what I used when I was doing that sort of thing. 

If the files are logs, you could use Splunk. 

Chris

roo_ster

  • Kakistocracy--It's What's For Dinner.
  • friend
  • Senior Member
  • ***
  • Posts: 21,225
  • Hoist the black flag, and begin slitting throats
Re: Data Mining Tool: 64bit, Linux OS
« Reply #2 on: September 23, 2010, 01:09:02 PM »
Yep, I was thinking Perl as a backstop.

They are time-stamped logs, but not your usual IT-type logs.
Regards,

roo_ster


mtnbkr

  • friend
  • Senior Member
  • ***
  • Posts: 15,388
Re: Data Mining Tool: 64bit, Linux OS
« Reply #3 on: September 23, 2010, 02:08:00 PM »
It might still work with Splunk.  It's free to try and has a free non-business license as well.

RevDisk

  • friend
  • Senior Member
  • ***
  • Posts: 12,633
    • RevDisk.net
Re: Data Mining Tool: 64bit, Linux OS
« Reply #4 on: September 23, 2010, 03:22:50 PM »
I'd prefer an open-source tool that runs on a 64-bit Linux OS.  We got some big hoss RHEL5 machines with beaucoup RAM on which to run the beast in a cluster environment (LSF).

Any pointers would be appreciated.

You want KNIME.
"Rev, your picture is in my King James Bible, where Paul talks about "inventors of evil."  Yes, I know you'll take that as a compliment."  - Fistful, possibly highest compliment I've ever received.

CNYCacher

  • friend
  • Senior Member
  • ***
  • Posts: 4,438
Re: Data Mining Tool: 64bit, Linux OS
« Reply #5 on: September 23, 2010, 04:39:29 PM »
Look into awk and sed as well as grep.



On two occasions, I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.
Charles Babbage

roo_ster

  • Kakistocracy--It's What's For Dinner.
  • friend
  • Senior Member
  • ***
  • Posts: 21,225
  • Hoist the black flag, and begin slitting throats
Re: Data Mining Tool: 64bit, Linux OS
« Reply #6 on: September 23, 2010, 05:11:21 PM »
Look into awk and sed as well as grep.

awk, sed, & grep are my home boys; but are likely not up to the task for this.

Perl has all the awk, sed, grep functionality with fewer limitations. If it comes to scripting, Perl will be the hammer.
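
For what the scripting route might look like, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and the log layout (`YYYY-MM-DD HH:MM:SS <event> <free text>`), the function names, and the sample lines are all assumptions, since the real log format is not given in this thread. It reads one line at a time, so memory use depends on the number of distinct (date, event) pairs rather than on file size, and it writes CSV that Excel can open.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical line layout: "YYYY-MM-DD HH:MM:SS <event> <free text>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} (\S+)")


def count_events(lines):
    """Count (date, event) pairs one line at a time.

    Only the counters are kept in memory, never the whole file.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            counts[match.groups()] += 1
    return counts


def write_report(counts, out):
    """Write the counts as CSV rows sorted by date, then event."""
    writer = csv.writer(out)
    writer.writerow(["date", "event", "count"])
    for (date, event), n in sorted(counts.items()):
        writer.writerow([date, event, n])


# Made-up sample lines standing in for a real log file.
sample = [
    "2010-09-23 11:28:51 START job=42\n",
    "2010-09-23 11:29:03 ERROR disk full\n",
    "2010-09-23 13:03:46 ERROR timeout\n",
    "not a log line\n",
    "2010-09-24 09:11:38 START job=43\n",
]
counts = count_events(sample)
report = io.StringIO()
write_report(counts, report)
```

For a real file, an open file object can be passed to `count_events` in place of the list, since it also yields one line at a time.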
Regards,

roo_ster


CNYCacher

  • friend
  • Senior Member
  • ***
  • Posts: 4,438
Re: Data Mining Tool: 64bit, Linux OS
« Reply #7 on: September 24, 2010, 09:11:38 AM »
awk, sed, & grep are my home boys; but are likely not up to the task for this.

Perl has all the awk, sed, grep functionality with fewer limitations. If it comes to scripting, Perl will be the hammer.

Have you considered transforming those rows of logs into SQL tables?  Lots of ways you could then build reports.  If you set the proper indexes the report generation would be fast fast fast as well.
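
A rough sketch of that idea, assuming SQLite through Python's standard `sqlite3` module. The table name, column names, and sample rows are hypothetical, since the real log layout isn't known here. It loads parsed rows into a table, indexes the columns the report filters and groups on, and runs one report query.

```python
import sqlite3

# Hypothetical parsed log rows: (timestamp, event, numeric value).
rows = [
    ("2010-09-23 11:28:51", "START", 0.0),
    ("2010-09-23 11:29:03", "ERROR", 3.5),
    ("2010-09-23 13:03:46", "ERROR", 1.5),
    ("2010-09-24 09:11:38", "START", 0.0),
]

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute("CREATE TABLE log (ts TEXT, event TEXT, value REAL)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)

# Index the columns the reports filter and sort on.
conn.execute("CREATE INDEX idx_log_event_ts ON log (event, ts)")
conn.commit()

# Example report: per-day count and total value for one event type.
report = conn.execute(
    "SELECT substr(ts, 1, 10) AS day, COUNT(*), SUM(value) "
    "FROM log WHERE event = ? GROUP BY day ORDER BY day",
    ("ERROR",),
).fetchall()
```

Which columns deserve an index depends on the reports actually needed; `(event, ts)` is just one plausible choice for queries that filter by event and order by time.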

roo_ster

  • Kakistocracy--It's What's For Dinner.
  • friend
  • Senior Member
  • ***
  • Posts: 21,225
  • Hoist the black flag, and begin slitting throats
Re: Data Mining Tool: 64bit, Linux OS
« Reply #8 on: September 24, 2010, 11:24:51 AM »
Have you considered transforming those rows of logs into SQL tables?  Lots of ways you could then build reports.  If you set the proper indexes the report generation would be fast fast fast as well.

If a text-based data mining tool does not do the trick, that is likely the next step.

Regards,

roo_ster


tyme

  • expat
  • friend
  • Senior Member
  • ***
  • Posts: 1,056
  • Did you know that dolphins are just gay sharks?
    • TFL Library
Re: Data Mining Tool: 64bit, Linux OS
« Reply #9 on: September 25, 2010, 05:19:35 AM »
If you want to run SQL queries on the data, and you're not going to write to the database concurrently, SQLite is fast, simple, and speaks reasonably good SQL.  If that won't work, there's MySQL and PostgreSQL, of course.  Even Access and MSSQL Server Express wouldn't be bad for this if it weren't for their database size limits -- apparently 2GB and 4GB.

It looks like Clementine, now called IBM SPSS Modeler, is more of a predictive analysis program, which will be more statistical in focus.  SQL doesn't have the kind of statistical functions that you'd want.  If that's the kind of data mining you're looking for, I suggest R, which is a free, open-source, multi-platform, widely used clone of S+.
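
To give a feel for the kind of per-group statistics meant here, below is a small sketch using Python's standard `statistics` module as a stand-in; it is not R, and the event names and durations are made up. It computes a median and a sample standard deviation per event, two measures that are not among every SQL engine's built-in aggregates.

```python
import statistics
from collections import defaultdict

# Hypothetical (event, duration_seconds) pairs extracted from the logs.
samples = [
    ("load", 2.0),
    ("load", 4.0),
    ("load", 9.0),
    ("save", 1.0),
    ("save", 3.0),
]

# Group the durations by event name.
by_event = defaultdict(list)
for event, duration in samples:
    by_event[event].append(duration)

# Summarize each group; stdev() is the sample standard deviation
# and needs at least two values per group.
summary = {
    event: {
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }
    for event, values in by_event.items()
}
```

For anything beyond descriptive statistics (regression, clustering, and so on), a dedicated environment such as R would be the more natural fit.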
« Last Edit: September 25, 2010, 05:22:48 AM by tyme »
Support Range Voting.
End Software Patents

"Four people are dead.  There isn't time to talk to the police."  --Sherlock (BBC)

GigaBuist

  • friends
  • Senior Member
  • ***
  • Posts: 4,345
    • http://www.justinbuist.org/blog/
Re: Data Mining Tool: 64bit, Linux OS
« Reply #10 on: September 25, 2010, 12:06:21 PM »
Even Access and MSSQL Server Express wouldn't be bad for this if it weren't for their database size limits -- apparently 2GB and 4GB.

The Express version of MSSQL 2008R2 has a DB limit of 10GB now, up from the 4GB limit in 2008.