Apache Spark News
October 03, 2017
Everybody is loving that thing we're all calling machine learning, aren't they? Splunk wants to make it mainstream, several other firms want to demystify it (or probably democratise it... or both) ...
August 31, 2017
SQL on Hadoop arrived -- so did SQL on Spark. Now, SQL on Kafka is emerging to provide a different way to look at Kafka data as it streams through the enterprise.
August 10, 2017
Information Builders has released a free developer version of its big data integration platform, which could potentially help channel companies take on Hadoop projects.
June 06, 2017
Databricks brings new features to its managed Spark platform -- as well as to open source Spark -- that it hopes will make the computing engine more widely usable.
Apache Spark Get Started
Bring yourself up to speed with our introductory content
IoT will help energy suppliers meet global energy demands, but the challenge lies in finding connectivity that offers both security and simplicity.
This Essential Guide explores enterprise data analytics strategies and how to select the right infrastructure, management tactics and technologies for your organization.
The desire to accelerate operational decision-making processes is leading organizations looking for a competitive edge to deploy streaming analytics platforms fed by real-time data.
Evaluate Apache Spark Vendors & Products
Weigh the pros and cons of technologies, products and projects you are considering.
Has the Hadoop elephant left the room? At NBC, ad analytics have evolved in Hadoop style, but with Spark and S3 at the core, as discussed at the Big Data Innovation Summit in Boston.
It's not too late to consider signing up for data management conferences in June. Here's a quick rundown of four events focused on Hadoop, Spark, data governance and other topics.
This Talking Data podcast discusses how latency and development challenges can make it difficult for Spark users to start doing machine learning with Spark systems.
Manage Apache Spark
Learn to apply best practices and optimize your operations.
Organizations hungry for more revenue are using Hadoop and other big data technologies to break their existing business molds and pursue new strategies and product offerings.
Organizations with big data environments are starting to prepare data for analysis before making it available to data scientists and other users, instead of leaving the work to them.
Processing in big data systems can slow to a crawl if queries are not properly tuned or workloads not well balanced -- issues that call for careful monitoring of clusters.
Problem Solve Apache Spark Issues
We’ve gathered up expert advice and tips from professionals like you so that the answers you need are always available.
Data center networking is no longer just a maze of physical cables; it's a tangled web of overlays and firewall rules. Database management is more than ensuring you have enough capacity as your company collects increasing volumes of data and expects real-time analysis.
Yet users demand simplicity; they expect the underlying infrastructure to be invisible. Executives want IT to function like a utility. When they turn on the tap, they don't care about the plumbing required to deliver the water; they simply want it to work. This is the tension threatening to plunge IT shops into chaos -- to build and support ever more complex data center infrastructure while making it appear effortless.
Nowhere is this tension clearer than in the growing demand to store and digest big data. However, it's not just about big data networking today. It's about doing something with that data -- and doing it now. Curiously, technologies that once aimed to streamline operations have sometimes led to more complexity. Networking overlays, for example, have given operators the ability to steer traffic and create logical resource pools, but they also come with additional management overhead.
All these topics and much more are covered in this month's Modern Infrastructure.
The challenges encountered in deriving business benefits from big data are huge, but so are the rewards. Hadoop and related technologies are easing those challenges to the point where companies are willing to graduate from experimental to full-blown big data analytics deployments. Still, the march toward that goal can be long and arduous, and not just from a technological and architectural standpoint. Before taking the plunge, big data users, including data scientists, managers and evangelists, are faced with the sometimes monumental task of justifying big data's return on investment to business executives focused on competition, profit margins and allocation of funds. "For a lot of organizations like ours, big data has not yet become a core foundation of running the business," said Beata Puncevic, director of analytics, data engineering and data management at Blue Cross Blue Shield of Michigan. Yet, actionable insights gained from big data analytics can be indispensable in driving revenue, reducing costs and developing new products.
This handbook on big data analytics examines the trials and tribulations of big data users who are on the front lines, devising and implementing partial and full-blown applications. In the first feature, editor Craig Stedman interviews battle-tested IT and analytics warriors from Blue Cross, Macy's and Progressive Insurance who reveal the business challenges in justifying the worthiness of big data applications. In the second feature, Stedman explains how real-time big data analytics is helping companies like Comcast and eBay to move quickly on massive amounts of incoming information. And in the third feature, reporter Ed Burns spotlights the decisions at Nielsen and Nasdaq on whether to run big data systems in the cloud.
In the business intelligence and analytics world, data lakes are their own region -- one in which today's multifarious forms of information can be stored in their native forms until used -- and cheaply at that. But these vast storage repositories, which are based on open source Apache Hadoop, are not for those seeking rest and recreation. They take serious work -- and often sought-after skills -- to build and maintain.
In this guide, SearchDataManagement peers across several types of data lakes to discover how different organizations today are implementing them. First, editor Craig Stedman talks to three companies that have taken the dive -- and learns about the challenges and benefits presented to each. Next, reporter Jack Vaughan tells one executive's story -- how a Hadoop-based system opened new doors for his company. Finally, Vaughan quizzes Forrester analyst Mike Gualtieri on whether data lakes can compete with data warehouses to quench organizations' thirst for storing and analyzing business data.