
Product test
NAS: data backup in the modern home
by Dominik Bärlocher
I've already seen the film "Valerian and the City of a Thousand Planets". Apart from the spectacle, one thing in particular has stayed with me: The huge data store in the space city Alpha. That's why I decided to look into the topic.
Luc Besson's new film "Valerian and the City of a Thousand Planets" brings to the screen a world that previously existed only in the classic comic. In the 28th century, spaceships fly through space at faster-than-light speeds, and somewhere out there is the Alpha space station. Alpha was once something like the International Space Station ISS, which has been orbiting the Earth for 18 years now. (A small detail in passing: the ISS's call sign is also "Alpha"). In the film, Alpha is continually expanded, diplomatic contacts are established between Earth and space peoples, and at some point Alpha becomes too big. Once the space station has reached the size of a moon and its gravity affects the Earth, massive engines are installed and Alpha embarks on a long journey through space.
Everywhere Alpha goes, and no matter what races settle on the ever-expanding station: The station's government collects knowledge. In the 28th century, when the plot about agents Valerian and Laureline kicks off, an entire neighbourhood - Alpha is consistently regarded as a city - is dedicated solely to storing this knowledge. This neighbourhood is shown in two short scenes: Walls full of golden storage discs, maintained by the alien race of Omelites, who swap what must be failing hard drives and manage the accumulated knowledge of over 3300 planets. According to the film, this district does not even contain the personal data of the 30 million inhabitants of Alpha, but only the collected knowledge.
Madness.
Of course, the film offers far more than just an insane amount of data. There's the fast-paced plot, the thoroughly convincing special effects, my favourite supporting character in the form of the Boulan-Bathor stylist (let's be honest, nobody is as genuinely devoted to their job as she is) and the titular hero Valerian with his partner Laureline, whose name could just as well have appeared in the film's title. As a comic fan, I know that Laureline is far more than Valerian's sidekick.
In the end, however, the technology enthusiast in me got stuck on one thought: data storage. Hard drives are something we rarely think about. Most computers come with a hard drive pre-installed; if it breaks, it is replaced like any other part, and which external storage we use is often all the same to us. The main thing is that it works. I would therefore like to use this occasion, along with a film recommendation, to talk briefly about storage media. Good transition, huh?
Data is extremely important in our world. More and more business is being conducted digitally. After all, who wants to have folders and folders of paper at home when it all fits on a 3.5-inch disc? But the questions that hardly anyone asks themselves are the following:
As a long-term solution, I recommend a NAS, which I am testing at the moment.
I'm not so keen on cloud services, even though I use Google Drive. I like to keep sovereignty over my own data. My recommended setup, because the IronWolf discs offer additional features on the Synology NAS:
WD Reds are installed in my PC. Two 2 TB discs that are designed for 24-hour operation and an SSD whose content is essentially "everything I can install". The SSD is the disc that I can simply format in the event of a virus attack or similar and then set up again. This is how I try to make my storage as durable as possible. This has worked quite well for a few years now.
M.2 discs are installed in my server computer, which does nothing at home other than process data and send it to the devices I work on outside my home network. The discs look like a stick of RAM and are nevertheless pretty fast. I replaced the original drive that came with the NUC because I have a Linux distro running on the NUC and didn't want to destroy a perfectly good Windows 10 configuration.
I also have USB sticks of all kinds, plus a few external USB hard drives. Manufacturers send them to me loaded with press material, and after formatting they find their way into my everyday office life. Personally, I have had good experiences with HyperX sticks. The only important thing is that the form factor of the sticks is as narrow as possible and that they are USB 3-compatible.
Let's calculate how much data storage I have at home:
I admit, my NAS is a bit oversized. If you have any criticism, suggestions for improvement or ideas, please let me know in a comment. Because even if all the PC parts are eventually rubbish, storage is important to me.
This is just space for my personal data, i.e. holiday photos, scanned invoices, texts, films, music and whatever else is lying around on a hard drive. The oldest data on my discs dates back to before the year 2000, and during this calculation I wondered how public databases develop.
If I already have 48.7 TB, then Wikipedia must be gigantic. Because there's hardly anything that Wikipedia doesn't know or doesn't have any information about. How big is Wikipedia actually?
Since Wikipedia is constantly growing, I simply took a look at the main page on 14 July. There, Wikipedia states that the ten largest language versions together have over 17,708,000 articles. But I'm interested in the size of the data in gigabytes. Or terabytes.
For the sake of simplicity, let's assume that the Earth in "Valerian and the City of a Thousand Planets" stores its knowledge in one language. At this point in time, that would be English. This also makes sense because in Valerian every being is equipped with a translator implant that enables communication between species. So there is no need for different languages. Only the file format of the stored knowledge has to match. So we use the English Wikipedia version, as this is the largest edition to date with 5,430,000 articles.
Wikipedia is nice enough to provide us with the following data:
As of June 2015, the dump of all pages with complete edit history in XML format at enwiki dump progress on 20150602 is about 100 GB compressed using 7-Zip, and 10 TB uncompressed.
Then there is Wikimedia Commons, Wikipedia's image database. All wikis can access this database.
The size of the media files in Wikimedia Commons, which includes the images, videos and other media used across all the language-specific Wikipedias was described as well over 23 TB near the end of 2014.
Since we don't know if the data in the City of a Thousand Planets is compressed, we take the uncompressed data as a reference. So we have 33 TB of data, uncompressed, which makes up the collective knowledge of English speakers.
Let's extrapolate. Since Wikipedia's development over the years follows a rather complex formula and we have no data for the future, we turn the curve into a line. To do this, we calculate the average growth of Wikipedia.
The current size of the English-language Wikipedia is 33 terabytes, i.e. 34,603,008 MB (33 × 1024 × 1024), which accounts for 5,440,850 articles.
34,603,008 / 5,440,850 = 6.359853331740445
The average size of a Wiki article is 6.36 MB. With images and all edits and everything else.
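For anyone who wants to check the arithmetic, here is a minimal Python sketch of the steps so far. It uses only the figures quoted in the text; the variable names are made up for illustration, and it assumes binary units, i.e. 1 TB = 1024 × 1024 MB.

```python
# Figures quoted in the article; the names are illustrative only.
TEXT_DUMP_TB = 10        # uncompressed enwiki dump with full edit history
COMMONS_MEDIA_TB = 23    # media files in Wikimedia Commons
ARTICLES = 5_440_850     # articles in the English Wikipedia

# Assumption: binary units, so 1 TB = 1024 GB = 1024 * 1024 MB.
MB_PER_TB = 1024 * 1024

total_tb = TEXT_DUMP_TB + COMMONS_MEDIA_TB
total_mb = total_tb * MB_PER_TB
mb_per_article = total_mb / ARTICLES

print(f"{total_tb} TB = {total_mb:,} MB")                  # 33 TB = 34,603,008 MB
print(f"average article size: {mb_per_article:.2f} MB")   # 6.36 MB
```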
In order to find the trend line according to which Wikipedia is growing, we calculate the average growth based on the number of articles. That's 302,269.444 articles per year, which corresponds to data growth of 1,922,389.333 MB per year.
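The text does not say over how many years the growth is averaged. The quoted rate matches a span of exactly 18 years, so the sketch below assumes that divisor; treat it as an inference from the numbers, not as the author's stated method.

```python
ARTICLES = 5_440_850
TOTAL_MB = 33 * 1024 * 1024   # 33 TB in MB, binary units

# Assumption: the averaging period is inferred from the quoted rate
# (5,440,850 / 18 = 302,269.444); the article does not state it.
YEARS_OF_GROWTH = 18

articles_per_year = ARTICLES / YEARS_OF_GROWTH
mb_per_year = TOTAL_MB / YEARS_OF_GROWTH

print(f"{articles_per_year:,.3f} articles per year")
print(f"{mb_per_year:,.3f} MB per year")
```

Dividing the total size by the same number of years gives the same data growth as multiplying the article rate by the average article size, which is why both figures line up.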
As a final step, let's do the maths into the future: how big will Wikipedia be in the time of Valerian and Laureline? The film is set in the 28th century, so sometime between 2700 and 2800. Assuming the setting of the film is the year 2700, the calculation looks like this:
(2700 - 2002) * 302,269.444 * 6.359853331740445 = 1,341,827,755
That gives a data volume of 1,341,827,755 MB, or about 1,279.67 terabytes.
By the end of the year 2799, things look a little different: 798 years of growth give around 241,211,016 articles, which correspond to 1,534,066,688 megabytes, or 1,463 terabytes.
But we can do better than that, because computer people know that once a data size reaches 1024, it moves up to the next unit. 1024 megabytes is a gigabyte. 1024 gigabytes is a terabyte. 1024 terabytes is a petabyte. So Earth's Wikipedia will be between 1.249674479 and 1.428710938 petabytes in size at the time of the film.
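The linear projection and the unit conversion can be sketched in a few lines of Python. The start year, growth rate and article size are the figures from the text; the function names are hypothetical. The upper bound corresponds to 798 years of growth, so the sketch evaluates the years 2700 and 2800.

```python
# Linear projection of Wikipedia's size, using the article's own figures.
# Assumption: binary units throughout (1024 MB = 1 GB, and so on).
START_YEAR = 2002
ARTICLES_PER_YEAR = 302_269.444
MB_PER_ARTICLE = 6.359853331740445

UNITS = ["MB", "GB", "TB", "PB", "EB"]


def projected_mb(year: int) -> float:
    """Size in MB after linear growth from START_YEAR to `year`."""
    return (year - START_YEAR) * ARTICLES_PER_YEAR * MB_PER_ARTICLE


def humanize(size_mb: float) -> str:
    """Divide by 1024 until the value drops below 1024, then attach the unit."""
    value = size_mb
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.3f} {unit}"
        value /= 1024


for year in (2700, 2800):
    print(year, humanize(projected_mb(year)))
```

Because the model is a straight line through zero in 2002, it says nothing about how Wikipedia actually grows; it only reproduces the rough estimate made here.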
But that's not the end of the story. The film states that the knowledge of the approximately 3300 peoples living on Alpha is collected in the database. Assuming they all started collecting data in 2002, we can multiply that.
3300 * 1.249674479 = 4123.9257807
And
3300 * 1.428710938 = 4714.7460954
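The two products are in petabytes. A short sketch of the multiplication and the final conversion step, assuming 1024 petabytes per exabyte like the other unit steps (the function name is made up for illustration):

```python
PEOPLES = 3300                 # peoples whose knowledge is stored on Alpha
EARTH_PB_LOW = 1.249674479     # Earth's projected Wikipedia, lower bound
EARTH_PB_HIGH = 1.428710938    # Earth's projected Wikipedia, upper bound
PB_PER_EB = 1024               # assumption: binary units


def alpha_database_eb(earth_pb: float) -> float:
    """Scale one planet's data to all peoples and convert PB to EB."""
    return PEOPLES * earth_pb / PB_PER_EB


low_eb = alpha_database_eb(EARTH_PB_LOW)
high_eb = alpha_database_eb(EARTH_PB_HIGH)
print(f"{low_eb:.1f} to {high_eb:.1f} EB")   # 4.0 to 4.6 EB
```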
This means that the database of the City of a Thousand Planets is between 4.0 and 4.6 exabytes in size. Uncompressed, of course.
I also sent an enquiry to Pornhub about the size of their database. Just out of interest, so that I could do a similar calculation. The answer from Chris Jackson, Communications Director Pornhub:
We saw your recent inquiry and decided that it's not something we'd like to pursue at this time. Appreciate you thinking of us!
What a shame.
This is all amateur maths, of course. But I'm sure you or one of your friends can do better. If you can calculate the data volumes of Valerian and Laureline's era more precisely, with curves and leap years and leap seconds, please send me an email with your figures.
And finally this: According to our calculations, Wikipedia will cross the petabyte threshold in 2530, on 5 March 2530 at 02:10:55 to be precise.
Pathé Films has kindly sent us a bag full of goodies that we would like to give away to you. But you have to answer a question as creatively as possible:
If you had access to the collective knowledge of 3300 planets, what would you look up?
The prizes:
Please write in the comments which prize you would like and we will try to send you the one you asked for. The competition runs until 26 July 2017, after which we will evaluate the results. And a free tip: Your answers do not necessarily have to be related to the competition.
Pathé Films was kind enough to send us more prizes. So we now have three complete sets from:
In addition: anyone who does the wiki calculation properly, not just linearly but with a curve and everything, gets a free ticket for the film.
If you want to secure an extra ticket, you can take part in the second competition.
Journalist. Author. Hacker. A storyteller searching for boundaries, secrets and taboos – putting the world to paper. Not because I can but because I can’t not.