Doug Turnbull, 06/20/13
342 views
0 replies

Lockdown Solr with IIS as a Reverse Proxy

We’ve been developing rich client-side applications that talk directly to Solr’s HTTP interface from JavaScript, which requires a publicly accessible Solr instance. One natural concern is that, by default, Solr’s HTTP API has no concept of security.

John Cook, 06/20/13
631 views
0 replies

Gelfand's Question

The MathWorld article on Gelfand’s question says that the answer is no for values of n less than 100,000. That range seemed small to me. My intuition was that you’d need to try larger values of n to have a reasonable chance of finding a solution.

Eric Gregory, 06/20/13
263 views
0 replies

Graph Search: Ontologies and Content Intelligence

Jan Aasman of Franz and Matthieu Jonglez of Smartlogic explore the meaning, theory, and manifold purposes of Content Intelligence and Graph Search, including an in-depth enterprise use case.

Rafał Kuć, 06/20/13
632 views
0 replies

Apache Lucene and Solr 4.3.1

Today the Apache Lucene and Solr PMC announced another version of the Apache Lucene library and the Apache Solr search server, numbered 4.3.1.

Eric Genesky, 06/19/13
427 views
0 replies

Big Data in Practice with Cassandra

Big Data is a fast-growing trend in enterprise applications that comes with a novel promise compared to past technological revolutions . . .

Kay Cichini, 06/19/13
368 views
0 replies

Use R to Bulk-Download Digital Elevation Data with 1" Resolution

Here's a little R script to conveniently download high-quality digital elevation data, i.e. for the Alps, from HERE . . .

Trevor Parsons, 06/19/13
225 views
0 replies

Musings from an AWS Meetup

After opening up our new Boston office earlier this year (for any of you locals, we’re down in the Innovation District on Summer St), we finally got the chance to attend our first AWS Boston meetup.

Maarten Ectors, 06/18/13
996 views
0 replies

Presto – Facebook's Exabyte-Scale Query Engine

Presto is an ANSI SQL-compatible, real-time data warehouse query engine, so existing data tools should work with it, unlike Hive, which needed special integration.

Pushpalanka Jay..., 06/18/13
547 views
0 replies

Useful Commands to Deal with SVN

The commands I came across while working with SVN on Linux.

Christopher Taylor, 06/18/13
398 views
0 replies

Analytics need for speed can cause you to crash and burn

Technology is allowing us to harness big data and understand it in milliseconds, but will this quest for speed be your ultimate undoing?

Eric Genesky, 06/18/13
62 views
0 replies

DevOps and Security

Helen Bravo, of the Open Web Application Security Project, presents a 35-minute discussion at Snowfroc 2013.

Justin Bozonier, 06/17/13
2749 views
0 replies

Fuzzy Puzzles: Having My Baby

A friend at work, Drew Fustin, proposed this puzzle in our group chat one day as I was meandering on about Bayesian shiny things.

Eric Gregory, 06/17/13
1103 views
0 replies

Building a Data Science Platform in Scala

John A. De Goes, CTO of Precog, discusses PrecogDB -- a data science platform in Scala.

Ravi Kalakota, 06/17/13
1704 views
0 replies

NSA PRISM – The Mother of all Big Data Projects

As a data engineer and scientist, I have been following the NSA PRISM raw intelligence mining program with great interest. The engineering complexity, breadth, and scale are simply amazing compared to, say, credit card analytics (Fair Isaac) or marketing analytics firms like Acxiom.

Arthur Charpentier, 06/17/13
1283 views
0 replies

Visualizing Densities of Spatial Processes

We recently uploaded a revised version of our work with Ewen Gallic on visualizing spatial processes using Ripley’s correction: an application to bodily-injury car accident location.

Eric Gregory, 06/16/13
1466 views
0 replies

Data Science and Predictive Modeling at LinkedIn

Monica Rogati, Senior Data Scientist at LinkedIn, discusses data science and predictive modeling.

Anand Epl, 06/16/13
1437 views
0 replies

OCAJP 7 Object Lifecycle in Java

In the real world, we can find many objects around us, for example cars, birds, and humans. All these objects have state and behavior. If we consider a car, it has some data (speed, lights on, direction, etc.) and some actions (turn right, accelerate, turn lights on, etc.).

Kai Wähner, 06/15/13
2034 views
0 replies

How to Create Intelligent Business Processes Thanks to Big Data

BPM is established, tools are stable, many companies use it successfully. However, today’s business processes are based on data from relational databases or web services.

Pieter Humphey, 06/15/13
1772 views
0 replies

Targeting Big Data: Spring XD 1.0 Milestone 1 Released

Spring XD makes it easy to solve common big data problems such as data ingestion and export, real-time analytics, and batch workflow orchestration.

Todd Homa, 06/14/13
1772 views
0 replies

Cassandra Bulk CDC Extract

The development team moved their persistence layer from Oracle to Cassandra. How does the Data Warehouse team extract data for reporting?

John Cook, 06/14/13
2217 views
0 replies

How Many Lights Can You Turn On?

Suppose you have a large n × n grid of lights, some turned on and some turned off. Along the side of each row is a switch that can toggle the lights in that row, turning on lights that were originally off and vice versa. There are similar switches along the top that can toggle the lights in each column. How many lights can you turn on?
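As a rough illustration of the setup, here is a minimal Python sketch of one simple greedy strategy: keep toggling any row or column that has more lights off than on. This is only an assumption about how one might attack the puzzle, not necessarily the approach or the bound discussed in the article, and the names count_on and greedy_toggle are hypothetical.

```python
def count_on(grid):
    """Number of lights that are on (1 = on, 0 = off)."""
    return sum(sum(row) for row in grid)


def greedy_toggle(grid):
    """Toggle rows and columns until none has more lights off than on.

    Each toggle strictly increases the number of lights that are on,
    so the loop must terminate. This is a simple heuristic and is not
    guaranteed to reach the maximum possible number of lights.
    """
    g = [list(row) for row in grid]  # work on a copy
    n = len(g)
    changed = True
    while changed:
        changed = False
        # Row switches: flip any row with fewer than half its lights on.
        for i in range(n):
            if 2 * sum(g[i]) < n:
                g[i] = [1 - v for v in g[i]]
                changed = True
        # Column switches: flip any column with fewer than half on.
        for j in range(n):
            if 2 * sum(g[i][j] for i in range(n)) < n:
                for i in range(n):
                    g[i][j] = 1 - g[i][j]
                changed = True
    return g


if __name__ == "__main__":
    start = [[0, 0, 1],
             [0, 1, 0],
             [1, 0, 0]]
    end = greedy_toggle(start)
    print(count_on(start), "->", count_on(end))
```

When the loop stops, every row and every column has at least half of its lights on, so at least half of the whole grid is lit.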

Eric Gregory, 06/14/13
1404 views
0 replies

How Data Scientists Solve Problems

This fifteen minute video from Troy Sadkowsky explores how data scientists approach problem-solving -- starting with recognizing your problem for what it is.

Nitin Aggarwal, 06/14/13
209 views
0 replies

Running Mediator Instances Issue - Oracle SOA 11g

We encountered an issue with one of our clients where the SOA purge wasn’t very effective because of running mediator instances, even though the rest of the flow trace had completed. This wasn’t an issue for the business as such; however, in most cases the state of these mediator instances caused them to fall outside the criteria for the purge.

John Cook, 06/13/13
1555 views
0 replies

Computing Skewness and Kurtosis in One Pass

If you compute the standard deviation of a data set by directly implementing the definition, you’ll need to pass through the data twice: once to find the mean, then a second time to accumulate the squared differences from the mean.
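To make the contrast concrete, here is a minimal Python sketch comparing the two-pass definition with a one-pass running update (Welford's method). The function names are hypothetical, and the one-pass version shown is a standard technique rather than necessarily the exact code from the article; it covers only the standard deviation, not skewness or kurtosis.

```python
import math


def two_pass_std(data):
    """Sample standard deviation straight from the definition."""
    n = len(data)
    mean = sum(data) / n                      # first pass: the mean
    ss = sum((x - mean) ** 2 for x in data)   # second pass: squared differences
    return math.sqrt(ss / (n - 1))


def one_pass_std(data):
    """Sample standard deviation in a single pass (Welford's update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared differences from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(two_pass_std(sample), one_pass_std(sample))
```

The one-pass form only needs the count, the running mean, and the running sum of squares, so it also works on data that arrives as a stream and cannot be revisited.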

Nishant Chandra, 06/13/13
1770 views
0 replies

Graph Analytics: Discovering the Undiscovered!

Graph analysis and big data are overlapping areas, and I came across a piece of text that beautifully summarizes the difficulty of discovering the unknown.