Statistical noise is unexplained variability within a data sample. The term noise, in this context, comes from signal processing, where it refers to unwanted electrical or electromagnetic energy that degrades the quality of signals and data. The presence of noise means that the results of sampling might not be duplicated if the process were repeated.
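The point about repeated sampling can be illustrated with a small sketch (the distribution parameters and sample size here are arbitrary, chosen only for illustration): two draws from the very same population produce different sample means, purely because of statistical noise.

```python
import random

def sample_mean(seed, n=20, mu=50.0, sigma=10.0):
    """Draw n values from a normal population N(mu, sigma) and return their mean."""
    rng = random.Random(seed)
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

# Two runs of the identical sampling process yield different results;
# neither mean equals the true population mean of 50 exactly.
m1 = sample_mean(seed=1)
m2 = sample_mean(seed=2)
print(round(m1, 2), round(m2, 2))
```

Both means land near 50, but they are not equal to each other or to the population mean; that gap is the noise.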
Noisy data is data that’s rendered meaningless by the existence of too much variation. It’s assumed that the signal (deterministic or meaningful data) is present but obscured by the noise (random data). The term originally referred to corrupt data, but noisy data now refers to any data that is not machine readable. As such, it includes unstructured text as well as any data that has been altered in such a way that it is no longer compatible with the program used to create it.
The problem of separating the noise from the signal has long been a focus in statistics, since only the meaningful portion of the data can inform researchers. Often, however, the portion of noisy data that is meaningful is too small to be useful.
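One elementary way to separate signal from noise is smoothing. The sketch below is purely illustrative (the sine-wave signal, noise level, and window size are all assumptions, not anything from the source): a simple moving average recovers an underlying signal from a noisy series, reducing the mean squared error relative to the raw data.

```python
import math
import random

def moving_average(xs, window):
    """Smooth a sequence with a simple sliding-window mean."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out

rng = random.Random(0)
signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]  # the deterministic signal
noisy = [s + rng.gauss(0, 0.5) for s in signal]                # signal obscured by random noise
smoothed = moving_average(noisy, window=11)

# Mean squared error against the true signal, before and after smoothing.
err_noisy = sum((a - b) ** 2 for a, b in zip(noisy, signal)) / len(signal)
err_smooth = sum((a - b) ** 2 for a, b in zip(smoothed, signal)) / len(signal)
```

Averaging works here because the noise is random and tends to cancel across the window, while the slowly varying signal does not; real statistical methods (filters, regression, hypothesis tests) are far more sophisticated, but rest on the same idea.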
In a non-scientific context, people often use the term statistical noise to dismiss data that doesn’t please them or conform to their expectations.