Mark Needham 10/29/14
0 replies

Python: Converting a date string to timestamp

I’ve been playing around with Python over the last few days while cleaning up a data set and one thing I wanted to do was translate date strings into a timestamp.
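As a minimal sketch of that kind of conversion, the date string can be parsed with `datetime.strptime` and turned into seconds since the Unix epoch. The `%Y-%m-%d` format, the UTC interpretation, and the function name `date_string_to_timestamp` are assumptions made for illustration; the teaser does not say which format the data set uses.

```python
import calendar
from datetime import datetime


def date_string_to_timestamp(date_string, fmt="%Y-%m-%d"):
    """Parse date_string using fmt and return seconds since the Unix epoch.

    Assumption: the date is interpreted as midnight UTC, so the result
    does not depend on the local time zone of the machine.
    """
    parsed = datetime.strptime(date_string, fmt)
    # timegm treats the time tuple as UTC (unlike time.mktime, which uses local time).
    return calendar.timegm(parsed.timetuple())


if __name__ == "__main__":
    print(date_string_to_timestamp("2014-10-29"))
```

A different input format only needs a different `fmt` argument, for example `"%d/%m/%Y"`.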

Rob J Hyndman 10/28/14
0 replies

HTS with Regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints.

Pavithra Gunasekara 10/27/14
0 replies

Getting Hadoop Up and Running on Ubuntu

In this post my aim is to get Hadoop up and running on an Ubuntu host in Local (Standalone) Mode and in Pseudo-Distributed Mode.

Arthur Charpentier 10/27/14
0 replies

Removing Uncited References in a Tex File (with R)

Usually, once you have revised a paper, some references have been added and others dropped, so you need to spend some time checking that every reference is actually cited in the paper. I wanted to do that manually this weekend, but @3wen suggested writing a simple R function that scans the tex file (and the aux file, actually) to remove uncited references.

Giuseppe Vettigli 10/26/14
0 replies

Andrews Curves

Andrews curves are a method for visualizing multidimensional data by mapping each observation onto a function. It has been shown that Andrews curves preserve means, distances (up to a constant) and variances. This means that Andrews curves represented by functions that are close together suggest that the corresponding data points are also close together.
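To make the mapping concrete, here is a minimal sketch that evaluates the standard Andrews curve for one observation: f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... The function name `andrews_curve` and the sample observation are assumptions for illustration and are not taken from the article itself.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of observation x (a sequence of numbers) at t."""
    # The first feature contributes a constant term.
    value = x[0] / math.sqrt(2)
    # Remaining features alternate between sine and cosine terms,
    # with the frequency increasing by one after every sin/cos pair.
    for i, coeff in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2
        if i % 2 == 1:
            value += coeff * math.sin(harmonic * t)
        else:
            value += coeff * math.cos(harmonic * t)
    return value


if __name__ == "__main__":
    observation = [1.0, 2.0, 3.0, 4.0]  # hypothetical four-feature data point
    # Sample the curve on a few points in [-pi, pi], the usual plotting range.
    for k in range(5):
        t = -math.pi + k * (math.pi / 2)
        print(round(t, 3), round(andrews_curve(observation, t), 3))
```

Plotting one such curve per observation over the interval from -pi to pi gives the visualization described above.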

Maarten Ectors 10/26/14
0 replies

Sentiment Analysis Beyond Tweets

Deep belief networks have made it possible to train computers to predict whether a sentence is positive, negative or neutral. Most sentiment analysis captures headlines because tweets can be analysed. However, are there business applications beyond social networking analytics? Here are five examples:

Mark Needham 10/26/14
2 replies

R: Linear models with the lm function, NA values and Collinearity

In my continued playing around with R I’ve sometimes noticed ‘NA’ values in the linear regression models I created but hadn’t really thought about what they meant. On the advice of Peter Huber I recently started working my way through Coursera’s Regression Models, which has a whole slide explaining their meaning:

Benjamin Ball 10/26/14
0 replies

The Best of the Week (Oct 17): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (October 17 - October 24). This week's topics include an Apache Hadoop FAQ for executives, information retrieval with Apache Lucene and Tika, and validating configuration.

Ana-maria Mihalceanu 10/24/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 3

This is a sequel to what was presented in part 1 and part 2 of this tutorial; after indexing and querying, we can highlight the results of a search by making use of Highlighter(s).

Ana-maria Mihalceanu 10/23/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 2

A sequel to what was implemented in Part 1 of this tutorial; we continue indexing and improve search conditions through different features provided by the Apache Lucene library.

Ana-maria Mihalceanu 10/22/14
0 replies

Understanding Information Retrieval by Using Apache Lucene and Tika - Part 1

This tutorial explains the Lucene and Tika frameworks through their core concepts (parsing, mime detection, indexing, scoring, boosting) via illustrative examples that should be accessible not only to seasoned software developers but also to beginners in content analysis and programming.

Linda Gimmeson 10/17/14
0 replies

FAQ of Executives Regarding Apache Hadoop

Apache Hadoop has slowly been infiltrating the mainstream business world, but many executives are still left with doubts about whether adopting Hadoop is a sound strategy for their organization. Is Hadoop enterprise friendly? Is it economical for an organization to use?

Tomasz Sobczak 10/16/14
1 reply

Review of "Scaling Apache Solr" Book

Review of "Scaling Apache Solr" book.

Alec Noller 10/15/14
1 reply

Dev of the Week: Ashwini Kuntamukkala

Every week here and in our newsletter, we feature a new developer/blogger from the DZone community to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Ashwini Kuntamukkala, Software Architect at SciSpike, Inc.

Adam Diaz 10/15/14
0 replies

Hadoop and the mystery of the version number

When I’m working with people on Hadoop I ask what you would think is a simple question: what version of Hadoop are you using? In reality, though, it’s not as straightforward as you might think.

Mikio Braun 10/14/14
0 replies

Parts But No Car

One question that pops up again and again when I talk about streamdrill is whether the same thing could not be done with X, where X is one of Hadoop, Spark, Go, or some other piece of Big Data infrastructure. The truth is that there’s a huge gap between “in principle” and “in reality”, and I’d like to spell this difference out in this post.

Kevin Daly 10/11/14
0 replies

Hadoop 2.0 as Part of a Data Platform: It’s Not Just About Mapreduce!

What exactly is a data platform? Get a better understanding of big data and its application. In this article I’ll be talking about the HortonWorks Data Platform as a reference platform.

David Mai 10/11/14
0 replies

22 Big Data & BI Events (U.S.) that You Must Attend Before the End of 2014

With so many events taking place it can be a very daunting task finding the one that perfectly fits your interests and needs. That being said, I’ve done some research and compiled a comprehensive list of 22 Big Data and Business Intelligence events that you must attend during Q4 of 2014.

Borislav Iordanov 10/10/14
0 replies

Jayson Skima - Validating JavaScript Object Notation Data

A crash course on JSON Schema. A nearly complete coverage of the Draft 4 specification, in brief.

Mark Needham 10/10/14
1 reply

R: A first attempt at linear regression

I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.

David Mai 10/10/14
0 replies

9 Influential Women Writers in Big Data and Business Intelligence

In my own experience as an editor who covers BI, I read numerous BI articles, and I have found that despite the disproportionately low number of women in technology, many of the articles I’ve read were authored by women. In BI, the works of women have provided great insight and thought leadership to the BI community, and I personally want to list nine of the top women writers who have helped shape my view on BI.

Mark Needham 10/09/14
0 replies

R: Deriving a new data frame column based on containing string

I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in the existing column.

Arthur Charpentier 10/09/14
0 replies

How to Import Some Parts of a Large Database

In the introduction of Computational Actuarial Science with R, there was a short paragraph on how we could import only some parts of a large database, by selecting specific variables.

Mark Needham 10/08/14
0 replies

R: Filtering data frames by column type ('x' must be numeric)

I’ve been working through the exercises from An Introduction to Statistical Learning and one of them required you to create a pairwise correlation matrix of variables in a data frame.

John Cook 10/08/14
0 replies

The great reformulation of algebraic geometry

At the Heidelberg Laureate Forum I had a chance to interview John Tate. In his remarks below, Tate briefly comments on his early work on number theory and cohomology. Most of the post consists of his comments on the work of Alexander Grothendieck.