



The great data garbage heap

Rinckside 2020; 31,2: 3-4.


Among the plethora of disturbing news, some “long-term” topics easily get lost. Some days ago I realized that I had too many files in my back-ups. I have now deleted those I believe I will not need any more, and those whose existence I had forgotten. It took me several days. While waiting for the deletions to finish, I pondered: How big is big data, really?

The German weekly Die Zeit revealed in 2015 that the German Federal Intelligence Service BND rakes in 220 million items of telecommunication metadata from around the world every day and passes them on to its American counterparts, the NSA and the CIA. The German agency states that it keeps these data for “only” half a year [1]. Tens of thousands of people are employed for this kind of non-targeted mass screening.

I have written about (medical) screening and its outcomes several times, the last time in 2014 [2]. There are always pros and cons; the supporters of the pros usually claim that they want to save lives by finding, in the case of secret services, “early” terrorists or, in the case of medical diagnostics, “early” cancer. Who can oppose this, in particular if you are not being asked? Taxpayers’ money is simply spent without asking the payer.

However, I don’t want to discuss screening or mass surveillance, but rather look at the problem of data storage and selection, in our case in medicine and in radiology in particular.





There is a collecting and archiving mania. Today, everything in radiology has to be archived. Data does not really age, we are told. In reality it does, and so do data storage carriers, rapidly so.

Suppose we are in the year 2040. Google has finally been broken up into 50 smaller, independent companies by the antitrust authorities. Because of the ridiculous amount of data, a Google search no longer shows any data or publications created before 2020. If you published a paper in 2014, it is lost in the cloud, if there still is a cloud. If you haven’t paid your cloud fees, your data pool is gone anyway. Or perhaps somebody has accessed your data, processed them for purposes unknown to you, or altered them. Perhaps your data have been destroyed without your knowledge. Whom can you trust? Nobody.

There is another problem with the “cloud”, a term that sounds rather pleasant: white puffy clouds in front of blue skies, the perfect picture for selling a green and clean environment. However, these facilities for data storage and data crunching, which often are mere data cemeteries, are definitely neither clean nor environmentally friendly. Far from being sustainable, they need an outrageous amount of energy for the server machines, for cooling and air conditioning.

In addition, the wide-scale potential of on-line banking, social networking, e-commerce, e-government, information processing and other services results in server workloads that nobody had anticipated.

Then this question arises: Once we have placed our trust in a cloud provider, are we then completely at its mercy? It remains a fact that you place your data in the hands of strangers. What can we do against this dependence?

Cloud computing can be an incalculable risk. Of course you can keep your data under your own control if you don’t want to hand them over to the big monopolies. However, which hospital, which private radiology office has the capacity and the financial resources to store all image and written data for 30 years? Handing out copies of the images on CDs to the patients is also impractical because CDs are not a reliable storage medium.

The explosion of data is being countered by an increasing ignorance of how it came into being. We have more and more information, but less and less information about the information itself. How do you sort out data garbage? Old formats are no longer readable. People create enormous archives of digital content, but after a short while they don't know what's inside [3].

I have had the unpleasant experience that I cannot read images made in scientific studies thirty years ago: they were stored on magnetic tapes, then on floppy disks, later on diskettes, then on CDs, then on USB sticks or hard disks. The half-life of digital media carriers is getting shorter and shorter. Just think of a CD-ROM or a VHS cassette. They are significantly less resistant to aging than books, and the data can no longer be read after just two to three decades. Moreover, there is no software that can decipher the early image formats. This holds not only for images but also for text files. For instance, Adobe PageMaker was a leading layout program for publications, among them scientific papers and books. In the meantime Adobe has discontinued its erstwhile PageMaker format; it can no longer be deciphered today.

Future generations will suffer from a kind of digital amnesia because old formats are no longer readable. Will they have to return to printed books?

There are only unlikely or unappealing solutions — thus, the topic will be adjourned sine die, which means indefinitely. Let’s shoot it into the cloud to be processed there.



References

1. Biermann K. BND stores 220 million telephone data – every day. Zeit Online (in English). 2 February 2015.
2. Rinck PA. Screening mammography: the sequel. Rinckside 2014; 25,3: 5-6.
3. Rinck PA. Datarrhea – the great data revolution. Rinckside 2016; 27,3: 7-8.



Citation: Rinck PA. The great data garbage heap. Rinckside 2020; 31,2: 3-4.

A digest version of this column was published as:
Cloud computing isn't the answer to all our prayers.
Aunt Minnie Europe. Maverinck. 13 April 2020.



Rinckside • ISSN 2364-3889
is published both in an electronic and in a printed version. It is listed by the German National Library.





The Author


Rinck is my last name, and a rink is an area of combat or contest.

Rinkside means by the rink. In a double meaning “Rinckside” means the page by Rinck. Sometimes I could also imagine “Rincksighs”, “Rincksights” or “Rincksites” …