Ravi Kalakota05/25/13
765 views
0 replies
The billion dollar question facing executives everywhere: How do I monetize my data? What small data or big data monetization strategies should I adopt? Which analytical investments and strategies really increase revenue? What pilots should I run to test data monetization ideas out?
Trisha Gee05/24/13
1706 views
0 replies
You know how I keep banging on about attracting different types of people into programming? You know how we say we need to get them young?
Gary Sieling05/24/13
2624 views
0 replies
A/B testing gets a lot of attention on Hacker News, inbound.org, and other forums, and appeals to me as a data analysis exercise. As a software engineer with a practical bent, I like the concept of data analysis techniques which produce useful results while treating a system as a black box.
Steven Lott05/24/13
1603 views
0 replies
It's important to address language or platform incompatibility as a consequence of technology modernization. We have to convert software manually because of language incompatibility: when no tool can do the conversion, the work has to be done by hand.
Kay Cichini05/24/13
1262 views
0 replies
Quite often I have long procedures to run and want to let them run overnight. However, my computer would then stay on all night after the script has finished.
Eric Gregory05/23/13
1723 views
0 replies
In this really excellent talk from Strata 2013, Twitter's Nathan Marz walks through the challenges and serious rewards of building systems that are resilient even in the face of human error...
John Cook05/23/13
1402 views
0 replies
When you expand (x + y)^n, the coefficients increase then decrease. The largest coefficient is in the middle if n is even; it’s the two in the middle if n is odd. For example, the coefficients for (1 + x)^4 are 1, 4, 6, 4, 1 and the coefficients for (1 + x)^5 are 1, 5, 10, 10, 5, 1.
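As a quick check of the pattern described in this teaser, here is a minimal Python sketch that lists the binomial coefficients for a given n and reports where the largest one sits. The function names `binomial_coefficients` and `largest_positions` are hypothetical, chosen only for this illustration; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_coefficients(n):
    """Return the coefficients of (1 + x)^n, from x^0 up to x^n."""
    return [comb(n, k) for k in range(n + 1)]


def largest_positions(n):
    """Return the indices k at which the coefficient C(n, k) is largest."""
    coeffs = binomial_coefficients(n)
    peak = max(coeffs)
    return [k for k, c in enumerate(coeffs) if c == peak]


if __name__ == "__main__":
    for n in (4, 5):
        print(n, binomial_coefficients(n), largest_positions(n))
```

For even n the list of positions has one entry (the middle), and for odd n it has two, matching the statement above.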
Brian O'Neill05/23/13
1274 views
0 replies
Brian O'Neill gives us a dive into the "Big Data Quadfecta" from Philly JUG.
Col Wilson05/23/13
123 views
0 replies
I'm using a 64 bit Linux, and I was having a lot of trouble getting Calibre to convert books into the Kindle-loving mobi format...
John Cook05/22/13
1176 views
0 replies
You may have seen the joke “Enter any 12-digit prime number to continue.” I’ve seen it floating around as the punchline in several contexts.
Nick Johnson05/22/13
4705 views
1 reply
Suppose you have a very large dataset - far too large to hold in memory - with duplicate entries. You want to know how many duplicate entries there are, but your data isn't sorted, and it's big enough that sorting and counting is impractical. How do you estimate how many unique entries the dataset contains?
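One well-known way to approach this question, not necessarily the one the linked post describes, is a "k minimum values" estimator: hash every item to a number in [0, 1), remember only the k smallest distinct hash values, and infer the number of unique items from how small the k-th value is. The sketch below is an illustration under that assumption; the names `estimate_unique` and `_hash01` and the default `k=256` are hypothetical choices.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-random float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_unique(items, k=256):
    """Estimate the number of distinct items in one pass, using O(k) memory.

    Keeps the k smallest distinct hash values. If fewer than k distinct
    values are seen, the count is exact (barring hash collisions).
    """
    heap = []        # max-heap of the kept hash values, stored negated
    kept = set()     # the same values, for duplicate detection
    for item in items:
        h = _hash01(item)
        if h in kept:
            continue  # a duplicate of an item already being tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th smallest value: swap it in
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


if __name__ == "__main__":
    data = (i % 1000 for i in range(50000))  # 1000 unique values, many repeats
    print(estimate_unique(data))
```

The relative error shrinks roughly as 1/sqrt(k), so a larger k trades memory for accuracy.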
Rafał Kuć05/22/13
2451 views
0 replies
With the release of Solr 4.2 we’ve got the possibility to use the HTTP protocol to get information about Solr index structure. Let's look at the new API by example.
Mikio Braun05/22/13
1672 views
0 replies
It’s one thing to know abstractly that Silicon Valley is home to most computer-related companies, and to drive down Highway 101 and see another well-known company every 30 seconds or so.
Eric Gregory05/21/13
2097 views
0 replies
Data scientist Monica Rogati discusses data scaling at LinkedIn and reflects on the evolving role of the data scientist.
Doug Turnbull05/21/13
5466 views
0 replies
A new feature of Lucene 4 – pluggable codecs – allows for the modification of Lucene’s underlying storage engine. Working with codecs and examining their output yields fascinating insights into how exactly Lucene’s search works in its most fundamental form.
Christopher Taylor05/21/13
1907 views
0 replies
As CBS showed, Intel is already selling software to recognize the rough demographic of an individual in order to deliver a more targeted advertisement, and as the report stated, “Big Brother is no longer Big Government. Big Brother is Big Business,” but without the rules that restrict government.
Arthur Charpentier05/21/13
1184 views
0 replies
Arthur Charpentier's regular data link roundup explores quantified consensus on anthropogenic global warming, compares SAS and R for business analysts, and much more. Plus: zombies (with R).
Arthur Charpentier05/20/13
2211 views
0 replies
Now, I have to confess that I have been surprised, while I was looking for mathematical models for shuffling, to find so many deterministic techniques (and results related to algebra, and cycles).
Raymond Camden05/20/13
1369 views
0 replies
Given a list of dates, how would you rewrite them so that two (or more) consecutive dates are displayed together?
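One possible answer to this question, sketched in Python rather than in whatever language the linked post uses: sort the dates, then extend the current run whenever the next date is exactly one day later. The function names `group_consecutive` and `format_runs`, and the "start to end" output format, are assumptions made for this illustration.

```python
from datetime import date, timedelta


def group_consecutive(dates):
    """Collapse dates into (start, end) runs of consecutive days."""
    runs = []
    for d in sorted(set(dates)):
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = d        # extends the current run by one day
        else:
            runs.append([d, d])    # starts a new run
    return [(start, end) for start, end in runs]


def format_runs(runs):
    """Render single days as one date and longer runs as 'start to end'."""
    parts = []
    for start, end in runs:
        if start == end:
            parts.append(start.isoformat())
        else:
            parts.append(f"{start.isoformat()} to {end.isoformat()}")
    return ", ".join(parts)


if __name__ == "__main__":
    days = [date(2013, 5, 1), date(2013, 5, 2), date(2013, 5, 3), date(2013, 5, 7)]
    print(format_runs(group_consecutive(days)))
```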
Alex Holmes05/20/13
1387 views
0 replies
This is the first blog post in a series which looks at some data organization patterns in MapReduce. We’ll look at how to bucket output across multiple files in a single task, how to multiplex data across multiple files, and also how to coalesce data. These are all common patterns that are useful to have in your MapReduce toolkit.
John Cook05/20/13
157 views
0 replies
"When we’re faced with a 'prove or disprove,' we’re usually better off trying first to disprove with a counterexample, for two reasons..."
Tharindu Mathew05/19/13
1421 views
1 reply
Recently, I read this post about Richard Stallman’s (RMS) visit to India, and read more about how RMS’s visit impacted the country.
George London05/19/13
1822 views
0 replies
We need a way to match queries to entities in our Postgres database. At first, this might seem like a simple problem with a simple solution, especially if you’re using the ORM; just jam the user input into an ORM filter and retrieve every matching string. But there’s a problem.
John Cook05/18/13
2164 views
0 replies
Lisp practically has no syntax. It simply has parenthesized expressions. This makes it very easy to start using the language. And above all, it makes it easy to treat code as data. Lisp macros are very powerful, and these macros are made possible by the fact that the language is simple to parse.