Big Data

Big Data’s first IPO

Big Data’s first IPO has hit NASDAQ, and I’m sure there will be plenty more to come. Why is this a big deal? Big Data, defined as crunching numbers at the gigabyte, terabyte and petabyte scale, is what is taking business intelligence to the next level. Just for comparison, your PC or Mac probably has about 100 to 200 gigabytes of…

Overview of Hive for Hadoop

Hive is a data warehousing system that sits on top of Hadoop and makes querying possible for users who are not familiar with MapReduce. Hive was originally developed by Facebook and, since Facebook donated the software to the Apache Software Foundation, has gained support from many companies. Hadoop by itself has no notion of data types or formats; it is essentially just a…

Cloud Technologies: ZooKeeper

ZooKeeper synchronizes processes across the machines of large clusters. It is one of the cloud-based technologies that are gaining notable acceptance. The biggest public adopters include Rackspace, Yahoo and Zynga. Imagine maintaining configuration information that every machine in a 1,000-node cluster must be aware of, and must react to whenever it changes. ZooKeeper is designed specifically to solve these types of…

ETL for Hadoop: Sqoop

Enter Sqoop, the ETL (Extract, Transform, Load) tool for Hadoop. Hadoop runs on data. The bulk of it might be in flat files, but it must include data from across a business’s entire platform. In a classic data warehouse, ETL tools are required to merge data from operational databases into the warehouse. In the same fashion, to effectively provide useful aggregations, it…

Oracle’s new Hadoop product?

Oracle, king of normalized, structured data, has announced its entry into the real Big Data field. Oracle is no fan of open source projects: when it comes to the NoSQL contenders on the market, it has “sought to expose their limitations and sow some serious doubt over their open-source roots.” And yet, it is forced to use Hadoop, the best product out there for…

Improving Hadoop documentation and configuration

Ari Rabkin, an intern with Cloudera and a Ph.D. student at UC Berkeley, came up with a tool to improve the documentation of configuration options in Hadoop. Many open source projects struggle with documentation. With limited time available, open source developers tend to focus on the highest priority: writing code. Hadoop is such a powerful piece of software that too much…
