Angela Ashenden, 04/18/14
0 replies

Teradata Looks to Build Bridges and Cross the Big Data Divide

Teradata unveiled a number of enhancements to its core data management offerings. One announcement stood out: the launch of QueryGrid, a tool designed to orchestrate the execution of analytic processing across parallel databases.

Ayende Rahien, 04/17/14
0 replies

The Dark Sides of Lucene

The author has been using Lucene for the past six or seven years, and after his last post, he thought it would be a good idea to talk a bit about the kind of things that it isn't doing well.

Rob J Hyndman, 04/17/14
0 replies

Generating Tables in LaTeX

Typing tables in LaTeX can get messy, but there are some good tools to simplify the process.

Arthur Charpentier, 04/16/14
0 replies

Seasonal Unit Roots

So, whatever the test, we always reject the hypothesis that there is a seasonal unit root. That does not mean that we cannot have a strong cycle! Actually, the series is almost periodic. But there is no unit root!

John Cook, 04/16/14
0 replies

The Mean of the Mean is the Mean

The hypothesis of this theorem is that the underlying distribution has a mean. Let's see where things break down if the distribution does not have a mean.
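A minimal simulation sketch of that breakdown. The teaser does not say which distribution the article uses; the standard Cauchy below is an assumption, chosen because it is a well-known distribution with no mean. The sample sizes, the seed, and the use of the interquartile range as a spread measure are also illustrative choices, not taken from the article.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range: a spread measure that exists even without a mean."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def draw_normal(rng):
    # Standard normal: has a mean, so averaging should shrink the spread.
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    # Standard Cauchy via inverse-CDF sampling (assumed example distribution).
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_means(draw, sample_size, repetitions, rng):
    """Return `repetitions` averages, each over `sample_size` independent draws."""
    return [
        sum(draw(rng) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
spread_ratio = {}
for name, draw in (("normal", draw_normal), ("cauchy", draw_cauchy)):
    singles = sample_means(draw, 1, 2000, rng)     # spread of single draws
    averages = sample_means(draw, 100, 2000, rng)  # spread of means of 100 draws
    spread_ratio[name] = iqr(averages) / iqr(singles)
    print(f"{name}: IQR of means / IQR of single draws = {spread_ratio[name]:.2f}")
```

For the normal draws the ratio should come out near 1/sqrt(100) = 0.1, because averaging concentrates the sample mean around the true mean. For the Cauchy draws the ratio should stay near 1: the average of independent standard Cauchy variates is itself standard Cauchy, so averaging does not help.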

Hubert Klein Ikkink, 04/15/14
0 replies

Coloring Different Data Sources in IntelliJ IDEA

The database plugin in IntelliJ IDEA is a useful tool for working with data in databases. As long as we have a JDBC driver to connect to the database, we can configure a data source.

Oliver Hookins, 04/15/14
0 replies

(Something, Something) Big Data!

SQL as an interface to big data operations is desirable for the same reasons the author found it useful, but it also carries performance implications that traditional MapReduce-style jobs are poorly suited to, since those jobs tend to complete in tens of minutes to hours rather than seconds.

Paul Miller, 04/15/14
0 replies

Infochimps CEO Jim Kaskade Talks About Acquisition and the Big Data Opportunity

Infochimps has moved in a different direction, focusing far more attention upon the tools and services required to work with data, less upon offering a place for customers to find data. We touch upon Hadoop’s role within the growing big data ecosystem, asking if it’s as important as its backers tend to claim.

Bill Jones, 04/15/14
0 replies

Social Media Mining With R

Get down with R and start visualizing your data in a whole new way!

Mikio Braun, 04/14/14
2 replies

Mikio's Guide To Real-Time Big Data

In most of these applications, you have to deal with evented data which comes in “in real-time”. Data is constantly changing and you usually want to consider the data over a certain time frame (“page views in the last hour”), instead of just taking all of the past data into account.
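The "page views in the last hour" idea can be sketched as a sliding-window counter. This is only an illustration, not the approach from the article: the class name `SlidingWindowCounter`, its methods, and the choice of a deque are hypothetical, and the sketch assumes timestamps (in seconds) arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamp lies within the last `window_seconds`.

    Hypothetical helper for illustration; assumes timestamps passed to
    `record` and `count` never go backwards.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # oldest event on the left

    def record(self, timestamp):
        """Register one event (for example, one page view)."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Number of events in the half-open window (now - window, now]."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


# Usage: three page views, counted over a one-hour window.
views = SlidingWindowCounter(window_seconds=3600)
for ts in (0, 1800, 3500):
    views.record(ts)

count_at_3600 = views.count(now=3600)  # the view at t=0 has just expired
count_at_5400 = views.count(now=5400)  # only the view at t=3500 remains
print(count_at_3600, count_at_5400)
```

Keeping every timestamp is exact but uses memory proportional to the number of events in the window; at high volume, a real system would more likely bucket counts per minute or use an approximate structure.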

Sarah Ervin, 04/13/14
0 replies

The Best of the Week (Apr. 04): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Apr. 04 to Apr. 10). This week's best include a discussion of interoperability in the Internet of Things (IoT) discipline, a look at Apache Spark, and an adventure into Lucene's indexing process.

Sarah Ervin, 04/12/14
0 replies

Big Data Zone Link Roundup (Apr. 12)

For a look at what's been happening outside of the Big Data Zone, we've assembled a collection of links including the 30 best tools for data visualization, different perspectives on Hadoop and related tools, New Relic's Splunk-style Analytics, and the role of big data in the rise of the Internet of Things (IoT).

Nati Shalom, 04/11/14
0 replies

Notes from Big Data Business Challenges Panel Discussion

In this post the author summarizes their notes from a conference that included the following topics: Is Big Data a Big Hype? How do you make sense out of your Big Data? Do we need a new role for Chief Data Officer? What is the business value behind Big Data? Is there a good visualization tool for Big Data?

Fredric Paul, 04/11/14
0 replies

Building “The House of Data”

The solution was to “build the house of data” and for the time being, that means using Hadoop for what it calls internally, “hadumping.”

Rob J Hyndman, 04/10/14
0 replies

Interpreting Noise

What is going on here is that the commentators are assuming we live in a noise-free world. However, the world is noisy: real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.

Ayende Rahien, 04/10/14
0 replies

Sorting with Lucene

How do you do sorting on a field value? The answer is, not easily.

Matthew Dubins, 04/09/14
0 replies

Ontario First Nations Libraries Compared Using Ontario Open Data

The author of this article uses data concerning First Nations libraries in Ontario to demonstrate variations of data visualization in R.

Mehdi Daoudi, 04/08/14
0 replies

Simpson’s Paradox: DevOps’ Big Data Problem

Simpson’s Paradox is a phenomenon in which a trend identified from a population is reversed when investigated at the sub-population levels. Think about that again – conclusions drawn from an overall set of data are not indicative of the behavior of the underlying subsets.
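A small worked example of the reversal. The counts below are hypothetical numbers made up for illustration, not measurements from the article: two server configurations, two regions, and for each pair the number of requests that met a latency target out of the total served.

```python
# Hypothetical counts: (requests that met the latency target, total requests).
data = {
    "config_a": {"region_1": (81, 87), "region_2": (192, 263)},
    "config_b": {"region_1": (234, 270), "region_2": (55, 80)},
}


def rate(ok, total):
    """Fraction of requests that met the target."""
    return ok / total


def overall_rate(groups):
    """Pool all regions for one configuration, then compute the rate."""
    ok = sum(o for o, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return ok / total


# Per-region comparison: config_a is ahead in every region.
for region in ("region_1", "region_2"):
    a = rate(*data["config_a"][region])
    b = rate(*data["config_b"][region])
    print(f"{region}: config_a={a:.3f} config_b={b:.3f}")

# Pooled comparison: config_b is ahead overall.
overall_a = overall_rate(data["config_a"])
overall_b = overall_rate(data["config_b"])
print(f"overall:  config_a={overall_a:.3f} config_b={overall_b:.3f}")
```

The reversal comes from the uneven mix: in this made-up data, config_a serves most of its traffic in the harder region_2, while config_b serves most of its traffic in the easier region_1, so the pooled rates mostly reflect where the traffic went.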

Ayende Rahien, 04/08/14
0 replies

Peeking into Lucene indexing

Continuing his trip into the Lucene codebase, the author is now looking into the indexing process as it happens. Interestingly enough, that is something that we never really had to look at before.

Joe Stein, 04/07/14
0 replies

Beyond MapReduce and Apache Hadoop 2.X

In this recap of a podcast with Bikas Saha and Arun Murthy, the author got to hear about some of what is in Hadoop 2.4 and what is coming in 2.5.

Istvan Szegedi, 04/07/14
0 replies

Apache Spark - a Fast Big Data Analytics Engine

Apache Spark is an increasingly popular alternative to MapReduce: it offers a more performant execution engine while still using Hadoop HDFS as the storage layer for large data sets.

Paul Miller, 04/07/14
0 replies

Microsoft Corporate Vice President Discusses Data, Data Platforms, and More

The Data Platform Group at Microsoft does a lot, from SQL Server and their Hadoopey HDInsight offering through to Business Intelligence and analytics capabilities which sit in or on top of the humble Excel spreadsheet.

Arthur Charpentier, 04/07/14
0 replies

Data News: "'Data' the Buzzword vs. Data the Actual Thing," and More

This installment of Arthur Charpentier's regular collection of data science-related links includes "'Big Data' the Buzzword vs. Data the Actual Thing," the influential interaction between data visualization and storytelling, what big data can say about relationships between world leaders, and more.

Sarah Ervin, 04/06/14
0 replies

The Best of the Week (Mar. 28): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Mar. 28 to Apr. 03). This week's best include the Apache Solr/Lucene 4.7.1 announcement, a discussion of how some tools can make things harder instead of easier, and an overview of the upcoming ApacheCon.

Dmitry Kan, 04/04/14
0 replies

Implementing own LuceneQParserPlugin for Solr

One convenience of this implementation is that we can deploy the above classes in a jar under the Solr core's lib directory. We do not need to overhaul the Solr source code or deal with deploying "custom" Solr shards.