Screen scraping is the act of copying information that appears on a digital display so it can be used for another purpose. Visual data can be collected as raw text from on-screen elements, such as text or images, that appear on the desktop, in an application or on a website. Screen scraping can be performed automatically by a scraping program or manually by an individual extracting the data.
Screen scraping has a variety of uses, both ethical and unethical. For example, a banking app might use it to gather data from a user's multiple accounts, while a malicious actor might use it to steal data from applications. A developer might also be tempted to steal code from another application to make development faster and easier for themselves.
What is it used for?
Screen scrapers have been applied in a wide range of fields for a variety of use cases. Some potential uses include:
- supporting banking applications and financial transactions;
- saving meaningful data for later use;
- performing actions that a user would perform on a website;
- translating data from a legacy application to a modern application;
- feeding data aggregators, such as price comparison websites;
- tracking user profiles to see online activities; and
- stealing data.
One of the largest use cases has been in banking. Lenders may want to use screen scraping to gather a customer's financial data. Finance applications may use screen scraping to access a user's multiple accounts and aggregate all the information in one place. Users need to explicitly trust the application, however, because they are trusting that organization with their accounts, customer data and passwords. Screen scraping can also be used in mortgage provider applications.
An organization might also want to use screen scraping to translate between legacy application programs and new user interfaces (UIs) so that the logic and data associated with the legacy programs can continue to be used. This option is rarely used and is only seen as an option when other methods are impractical.
If an individual can gain access to an application's underlying code, that individual could use screen scraping to steal the code and use it in their own application. Doing so without permission would save the individual time and effort or allow them to learn how a feature in the application works.
In some cases, screen scraping involves a third-party system. For example, screen scraping would allow a third-party organization to access data on financial transactions in a budgeting app.
The main use cases for screen scraping have changed over time. For example, in 2019 screen scraping began to be phased out of one of its larger use cases, banking, to ease security concerns surrounding the practice. Budgeting apps must now use open banking technology instead.
How does screen scraping work?
Screen scraping can be accomplished in several ways, depending on what the process is being used for. For example, an individual with direct access to a Java application's source code can copy and paste that code into their own application.
In general, screen scraping allows a user to extract screen display data from a specific UI element or document. Different methods can be used to obtain all the text on a page, either unformatted or formatted with exact positioning. Screen scrapers can be built around tools such as Selenium or PhantomJS, which allow users to obtain information from HTML in a browser. Unix tools, such as shell scripts, can also be used as simple screen scrapers.
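As a minimal sketch of the "all the text on a page, unformatted" case, the following Python example uses only the standard library's `html.parser` module rather than Selenium or PhantomJS. The class name `TextExtractor`, the function `scrape_text` and the sample page are illustrative assumptions, not part of any particular scraping tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, ignoring script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return all visible text on the page as one unformatted string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head><title>Account</title><style>p {color: red}</style></head>
<body><h1>Balance</h1><p>Checking: <b>$1,250.00</b></p>
<script>var x = 1;</script></body></html>
"""

print(scrape_text(SAMPLE_PAGE))
```

This approach discards layout entirely. Obtaining formatted text with exact positioning generally requires a tool that renders the page, such as a browser driven by Selenium.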
In banking, a third party will request that users share their login information so that it can access financial transaction data by logging in to the customers' digital portals. A budgeting app can then retrieve the incoming and outgoing transactions across accounts.
When transferring data from a legacy program, a data scraping program must take the data coming from the legacy program, which is formatted for the screen of an older type of terminal such as an IBM 3270 display, and reformat it for a modern interface, such as Windows 10 or a web browser. The program must also reformat user input from the newer user interface (such as a Windows graphical user interface or a web browser) so that the request can be handled by the legacy application as if it came from the user of the older device and user interface.
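The two directions described above can be sketched in Python, assuming the legacy screen arrives as rows of fixed-position text. The field layout in `FIELD_MAP`, the helper names and the sample screen are all hypothetical; a real terminal scraper would take its field positions from the legacy application's screen definitions and would talk to the host through a terminal emulator.

```python
SCREEN_WIDTH = 80  # a common 3270 screen size is 24 rows by 80 columns

# Hypothetical field layout: name -> (row, column, length), all zero-based.
FIELD_MAP = {
    "customer_id": (1, 14, 8),
    "name": (2, 14, 20),
    "balance": (3, 14, 10),
}


def read_fields(screen_rows, field_map=FIELD_MAP):
    """Legacy to modern: cut each named field out of the fixed-position text."""
    record = {}
    for name, (row, col, length) in field_map.items():
        line = screen_rows[row].ljust(SCREEN_WIDTH)
        record[name] = line[col:col + length].strip()
    return record


def write_field(screen_rows, name, value, field_map=FIELD_MAP):
    """Modern to legacy: return a copy of the screen with value placed in a
    field, truncated or space-padded to the field's fixed width."""
    row, col, length = field_map[name]
    line = screen_rows[row].ljust(SCREEN_WIDTH)
    padded = value[:length].ljust(length)
    updated = list(screen_rows)
    updated[row] = line[:col] + padded + line[col + length:]
    return updated


def make_row(label, value):
    """Build one sample screen row whose value starts at column 14."""
    return label.ljust(12) + ": " + value


# Hypothetical screen contents used only for illustration.
SAMPLE_SCREEN = [
    "ACCOUNT INQUIRY",
    make_row("CUSTOMER ID", "00481516"),
    make_row("NAME", "JANE EXAMPLE"),
    make_row("BALANCE", "1250.00".rjust(10)),
]

record = read_fields(SAMPLE_SCREEN)
print(record)

edited_screen = write_field(SAMPLE_SCREEN, "name", "JOHN SAMPLE")
print(edited_screen[2])
```

The dictionary returned by `read_fields` is the kind of structure a web or desktop front end could display, while `write_field` shows how input from the newer interface might be forced back into the fixed-width layout the legacy program expects.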
How to prevent screen scraping
To help deter screen scraping, an organization can:
- use one-time passwords, because screen scrapers will not be able to see a password until it is used;
- use web application firewalls, which can help detect signature- or behavior-based actions;
- make sure endpoints or APIs aren't exposed;
- run fraud detection software to catch screen scraping potentially while it is happening; and/or
- set content to be shown as an image, which won't stop screen scraping from happening but will stop programs that can't translate images.
All these methods can help deter screen scraping, but they won't stop it completely. In addition, organizations must make sure that their actions won't make the end-user experience worse. For example, setting a website's content to appear as an image may make it difficult for individuals to find the page, because it will affect how search engines find the page to begin with.
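To illustrate what behavior-based detection might look like, the following Python sketch flags a client that sends more requests within a sliding time window than a person plausibly would. The class name `RateMonitor` and the thresholds are assumptions chosen for illustration; commercial web application firewalls and fraud detection products combine many more signals than request rate.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flags clients whose request rate exceeds a sliding-window threshold."""

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client id -> recent timestamps

    def record(self, client_id, timestamp):
        """Record one request; return True if the client now looks automated."""
        window = self._history[client_id]
        window.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


monitor = RateMonitor(max_requests=3, window_seconds=1.0)

# A scripted client sending a request every 0.1 seconds.
bot_flags = [monitor.record("bot", t * 0.1) for t in range(5)]

# A person loading a page every 2 seconds.
human_flags = [monitor.record("human", t * 2.0) for t in range(5)]

print(bot_flags)
print(human_flags)
```

A threshold that is too strict would flag legitimate users, which is the same trade-off with end-user experience noted above.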
Screen scraping tools
If individuals don't want to screen scrape manually, there are several tools that can help automate the process, such as:
- Macro Scheduler
- ScreenScraper Studio
These tools include automation features such as automated user interfaces, macro recorders and editors. They work with Windows or web applications. Some tools offer features that others lack or focus on specific platforms.
Screen scraping vs. web scraping
While screen scraping is the process of extracting data shown on a screen, web scraping extracts data from the web. The two concepts are similar enough that web scraping can be considered a specific type of screen scraping. The main differences lie in where the data is taken from and what it is used for.
Web scraping is used to extract data exclusively from the web -- unlike screen scraping, which can also scrape data from a user's desktop or applications. This form of data extraction can be used to compare prices for goods on e-commerce sites, as well as for web indexing and data mining.
The process accesses the web over HTTP or through a web browser and can be done either manually or automatically through a bot or web crawler.
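The price comparison use case can be sketched in Python with the standard library alone. This example assumes, purely for illustration, that each shop marks its price with an element whose `class` attribute includes `price`; real sites use their own markup, and the shop names, sample pages and `example-scraper/0.1` user agent string are hypothetical. The `fetch_html` function shows the HTTP step but is not called here, so the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page over HTTP(S) and return its body as text."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PriceParser(HTMLParser):
    """Collects numeric prices from elements whose class includes 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if not self._in_price or not data.strip():
            return
        try:
            self.prices.append(float(data.strip().lstrip("$").replace(",", "")))
        except ValueError:
            pass  # text inside the price element was not a number


def extract_prices(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


def cheapest_offer(pages):
    """Return (shop, price) for the lowest price found, or None."""
    best = None
    for shop, html in pages.items():
        for price in extract_prices(html):
            if best is None or price < best[1]:
                best = (shop, price)
    return best


# Hypothetical pages standing in for fetch_html() results.
SAMPLE_PAGES = {
    "shop-a": '<span class="price">$19.99</span>',
    "shop-b": '<div class="item"><span class="price sale">$17.50</span></div>',
}

print(cheapest_offer(SAMPLE_PAGES))
```

In a real crawler, the values in `SAMPLE_PAGES` would come from calls to `fetch_html`, subject to each site's terms of use.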
Difference between screen scraping and data scraping
Data scraping is a variant of screen scraping that is used to copy data from documents and web applications. It is a technique for extracting structured, human-readable data, and it is mostly used to exchange data with a legacy system and make that data readable by modern applications.
Screen scraping and open banking
Open banking is the concept of sharing secured financial information to be used by third-party developers for the creation of banking applications. This concept is based on the sharing of APIs, which allows an application to use the same API to aggregate information from different accounts into one place. This is what allows a banking app to let users look at their multiple accounts from different banks in one place.
In the past, some banking apps would gather information using screen scraping. This process would require a user to share their bank logon credentials with the third-party app. The application would then log on to the user's accounts on his or her behalf and screen scrape the needed data to show in-app.
By contrast, open banking now uses shared APIs, meaning the exact data needed is copied without requiring the user to share logon credentials. The concept was introduced in 2018 and is now becoming the standard in place of screen scraping.
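The aggregation step can be sketched in Python, assuming each bank's API returns account data as JSON. The payload shape, bank names and account IDs below are hypothetical and do not follow any specific open banking standard; the point is only that the app receives structured data directly instead of logging in with the user's credentials and parsing screens.

```python
import json
from decimal import Decimal

# Hypothetical API responses; real open banking APIs define their own schemas.
API_RESPONSES = {
    "bank-a": '{"accounts": [{"id": "A-1", "balance": "1250.00"},'
              ' {"id": "A-2", "balance": "300.10"}]}',
    "bank-b": '{"accounts": [{"id": "B-9", "balance": "89.90"}]}',
}


def aggregate_balances(responses):
    """Return the total balance per bank from JSON API payloads."""
    totals = {}
    for bank, payload in responses.items():
        accounts = json.loads(payload)["accounts"]
        # Decimal avoids floating-point rounding errors with currency amounts.
        totals[bank] = sum(
            (Decimal(account["balance"]) for account in accounts),
            Decimal("0"),
        )
    return totals


totals = aggregate_balances(API_RESPONSES)
print(totals)
print(sum(totals.values(), Decimal("0")))
```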