Apache Giraph

Apache Giraph is an iterative graph processing system built for high scalability, and it is mostly used to analyze social media data. Yahoo!, Facebook and Twitter have all used Giraph, tailoring the software to their own purposes.

The software enables natural-language relational searches that make social network data more informative. For example, Giraph can be used to show connections between entities in a social graph. Entities in a graph are called nodes, and the relationships or connections between them are called edges. Search queries such as “Popular Japanese restaurants” or “Friends’ pictures in Egypt” return results from your own social network connections as well as public results based on the keywords.
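To make the node and edge terminology concrete, here is a minimal single-machine Java sketch. It stores a small, hypothetical social graph as an adjacency map and uses breadth-first search to find the shortest chain of connections between two people. The class and method names (SocialGraph, addEdge, connectionPath) are illustrative assumptions. This is not Giraph's API, and Giraph itself spreads a graph across many machines rather than holding it in one in-memory map.

```java
import java.util.*;

// Hypothetical in-memory social graph: each node (person) maps to the set of
// nodes it shares an edge (connection) with. NOT Giraph's API; it only
// illustrates the node/edge vocabulary.
class SocialGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adds an undirected edge, creating either node on first use.
    void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    int nodeCount() {
        return adjacency.size();
    }

    // Each undirected edge is stored twice (once per endpoint), so halve the sum.
    int edgeCount() {
        int total = 0;
        for (Set<String> neighbors : adjacency.values()) {
            total += neighbors.size();
        }
        return total / 2;
    }

    // Breadth-first search: returns the shortest chain of connections from
    // 'from' to 'to', or an empty list if the two nodes are not connected.
    List<String> connectionPath(String from, String to) {
        if (!adjacency.containsKey(from) || !adjacency.containsKey(to)) {
            return List.of();
        }
        Map<String, String> parent = new HashMap<>(); // node -> node it was reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : adjacency.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        SocialGraph g = new SocialGraph();
        g.addEdge("Alice", "Bob");
        g.addEdge("Bob", "Carol");
        g.addEdge("Carol", "Dave");
        g.addEdge("Erin", "Frank");
        System.out.println(g.nodeCount() + " nodes, " + g.edgeCount() + " edges"); // 6 nodes, 4 edges
        System.out.println(g.connectionPath("Alice", "Dave")); // [Alice, Bob, Carol, Dave]
        System.out.println(g.connectionPath("Alice", "Erin")); // []
    }
}
```

Alice and Erin share no chain of connections in this toy graph, so their path comes back empty. A single map like this works only while the whole graph fits on one computer, which is the limit Giraph is designed to get past.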

Facebook chose Giraph for its Graph Search because of the software's scalability and speed: Engineers were able to tune its performance to analyze a trillion edges in under four minutes on a cluster of 200 commodity computers. Earlier benchmarks included Yahoo!/AltaVista data, where Giraph processed 6.6 billion edges, and Twitter data, where it processed 1.5 billion edges. Giraph was originally developed at Yahoo!, which contributed it to the Apache Software Foundation for ongoing management.

Because Giraph makes search more powerful, users who have liberally shared embarrassing material in the past may want to adjust their privacy settings: Searches could surface things people posted long ago that would otherwise stay hidden far back in their timelines.

Another issue is that unpredictable factors can affect search accuracy. For example, users often click the Like button for reasons other than genuinely liking a brand, service, product or company. Such reasons include contest requirements, support for the page owner and use as a kind of “I was there” marker.

This was last updated in June 2015
