Calculating Books on 40GB Hard Drive: Assumptions & Solutions

  • Thread starter Pauly Man
In summary, the conversation revolves around the question of how many books can fit onto a 40GB hard drive given certain assumptions about character encoding and page length. Different techniques such as a naive encoding and a more sophisticated Huffman coding are discussed, and it is estimated that between 66,666 and 80,000 books can fit on the drive without compression. The conversation also touches on the use of compression to fit even more books on the drive.
  • #1
Pauly Man
How many books could you write onto a 40GB hard drive given the following assumptions?

Assumptions-

  • Must be able to uniquely encode the numerals 0-9
  • Must be able to uniquely encode the lower and upper case letters of the alphabet
  • Must be able to uniquely encode spaces and full stops
  • A book is assumed to have approximately 400 pages in it

Basically the problem is asking these questions:

  • How many bits does it take to uniquely encode a character as described in the above assumptions?
  • How many characters can be encoded onto a 40GB hard drive?
  • How many books does that correspond to?

I want to see how people go about finding a solution to this problem. I got bored on the train to university this morning and wasted the time thinking about this problem. When a few people have had a go at the problem I'll post my working out.
 
  • #2
Take the total number of symbols in the alphabet you've specified to be N.

If you want to see how many books you can fit naively, just assume that you need ceil(log2(N)) bits to encode each symbol. The number of books you can store using this naive encoding is extremely easy to calculate.
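For the alphabet in the question, the naive figure can be checked with a few lines of Python. This is a minimal sketch, and the variable names are illustrative:

```python
import math
import string

# Symbol set from the question: digits 0-9, lower- and upper-case letters,
# space and full stop.
alphabet = string.digits + string.ascii_lowercase + string.ascii_uppercase + " ."
N = len(alphabet)                        # 10 + 26 + 26 + 2 = 64
bits_per_symbol = math.ceil(math.log2(N))

print(N, bits_per_symbol)                # 64 6
```

Because 64 is exactly 2^6, six bits suffice with no unused codes; adding even one more symbol would push the naive encoding to seven bits.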

If you want to be more sophisticated and see how many books you can actually fit into 40GB, perform a frequency analysis over a good-sized sample, and create a Huffman coding for your alphabet. Encode your books using this coding. You may then want to apply some kind of sliding-window or dictionary-based compression scheme. The number of books you can store using these techniques is likely to be something on the order of seven times as many as you can store in the naive encoding, largely because English text has only (IIRC) 1.8 bits per symbol of irreducible information.

- Warren
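The Huffman step can be sketched in Python as below. This is a minimal illustration, not a tuned compressor: it only computes the code length for each symbol, which is enough to estimate the average bits per symbol. The sample text is a placeholder assumption rather than a real frequency analysis of books, and the sliding-window or dictionary stage is not shown.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging the two lightest subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def average_bits_per_symbol(text):
    """Average Huffman code length, weighted by how often each symbol occurs."""
    freq = Counter(text)
    lengths = huffman_code_lengths(text)
    return sum(freq[s] * lengths[s] for s in freq) / len(text)

sample = "the quick brown fox jumps over the lazy dog. " * 100   # placeholder text
print(round(average_bits_per_symbol(sample), 2))
```

On any sample drawn from the 64-symbol alphabet the result cannot exceed the 6 bits of the naive encoding, and on real English prose it drops further because common letters such as e and t get shorter codes than rare ones.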
 
  • #3
You have required that there be 64 distinct symbols (10 digits, 52 letters, the space and the full stop). n bits can encode 2^n symbols, so we would need 6 bits (2^6 = 64). This is basically the "ceil(log2(N))" chroot gave.

There are, however, a few pieces of information you did not give. You say "A book is assumed to have approximately 400 pages in it" but you DON'T say how many symbols to the page! Also, while you talk about individual characters, your "40 GB" is 40 million BYTES (well actually 40*(1,024,000) = 40,960,000 bytes). You don't say how we are to assume bits are fit into bytes. Assuming that we ONLY need the 64 symbols given and are trying to fit as many symbols as possible onto a drive, so that we abut one 6-bit symbol to another, 40 GB = 327680000 bits, or 327680000/6 = 54,613,333 symbols on the drive (with 2 bits left over).

That would be a lot of work and it would be much simpler to use 8 bits, the entire byte, to represent a symbol as is actually done.
In that case we could fit 327680000/8= 40,960,000 symbols on a drive,
allowing for 2^8= 256 different symbols.
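To make the idea of abutting one 6-bit symbol to another concrete, here is a hypothetical Python sketch that packs text from the 64-symbol alphabet into bytes at 6 bits per symbol and unpacks it again. The function names and the order of symbols in the alphabet are illustrative assumptions:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + " ."
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}   # every index fits in 6 bits

def pack6(text):
    """Pack text into bytes at 6 bits per symbol, most significant bit first."""
    acc, nbits = 0, 0            # bit accumulator and number of pending bits
    out = bytearray()
    for ch in text:
        acc = (acc << 6) | INDEX[ch]
        nbits += 6
        while nbits >= 8:        # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the pending bits
    if nbits:                    # pad the final partial byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack6(data, length):
    """Inverse of pack6; length (number of symbols) tells us where padding starts."""
    acc, nbits = 0, 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6 and len(chars) < length:
            nbits -= 6
            chars.append(ALPHABET[(acc >> nbits) & 0x3F])
        acc &= (1 << nbits) - 1
    return "".join(chars)

text = "Hello world. 42"
packed = pack6(text)
print(len(text), len(packed))    # 15 12
print(unpack6(packed, len(text)))
```

Four symbols fit in three bytes, so packing saves a quarter of the space compared with one byte per symbol, at the cost of the bit shuffling shown here.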

We still can't answer "How many books?" because you haven't told us how many symbols to a page (or how many symbols to a book).



 
  • #4
And there are also some fancy space-saving techniques, like saving multiple occurrences of a single string in multiple books as one entry on the hard drive. In an extreme case, you could fit an infinite number of books on your 40 GB hard drive if they were all the same, by saving just one copy and symbolically linking all other copies to that one book.

Hurkyl
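Storing one copy and linking the rest is essentially content-addressed deduplication. A minimal Python sketch, assuming each book is a string and using a SHA-256 hash of its text as the key; the class and method names are hypothetical:

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed store: identical books are kept only once."""

    def __init__(self):
        self.blobs = {}   # SHA-256 hex digest -> book bytes (one stored copy)
        self.index = {}   # title -> digest (plays the role of a symbolic link)

    def add(self, title, text):
        data = text.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # store the contents only if new
        self.index[title] = digest

    def get(self, title):
        return self.blobs[self.index[title]].decode("utf-8")

    def stored_bytes(self):
        # Bytes used by book contents; index entries are not counted here.
        return sum(len(data) for data in self.blobs.values())

book = "The same book. " * 1000             # 15,000 characters
store = DedupStore()
for i in range(1000):
    store.add(f"copy {i}", book)
print(store.stored_bytes())                 # 15000
```

Each extra title still costs an index entry (or, on a real file system, a link), so the number of identical copies is very large rather than literally infinite.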
 
  • #5
1 GB = 1024 MB = 1024*1024 KB ~ 1 billion bytes = 8 billion bits...
Or am I wrong?
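Under either convention a 40GB drive holds roughly 40 billion bytes, about a thousand times the 40,960,000 bytes used in post #3. A small Python check; the function name is illustrative:

```python
def drive_capacity(gigabytes, binary=False):
    """Return (bytes, bits) for a drive of the given size in GB.

    binary=False uses the hard-drive manufacturers' 1 GB = 10**9 bytes;
    binary=True uses 1 GB = 1024**3 bytes (strictly, a gibibyte).
    """
    unit = 1024**3 if binary else 10**9
    n_bytes = gigabytes * unit
    return n_bytes, n_bytes * 8

print(drive_capacity(40))               # (40000000000, 320000000000)
print(drive_capacity(40, binary=True))  # (42949672960, 343597383680)
```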
 
  • #6
Good points Hurkyl.

I used Microsoft Word to work out that you can fit around 2500-3000 symbols on an A4 page (using 12-point Times New Roman), and since an A4 page is roughly twice the size of a typical paperback page, you could fit roughly 1250-1500 symbols on a paperback page. Assuming 400 pages to a book, you could typically fit 500,000-600,000 symbols in a book.

Using your figure of 54,613,333 symbols on a 40GB hard drive, you could fit 91-109 books on it.
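The page-based estimate fits in a short Python function. The 1250-1500 symbols per page and 400 pages per book are the assumptions from this post, and the function name is hypothetical. It reproduces the 91-109 figure from 54,613,333 symbols, and also shows what one symbol per byte on a 40 billion byte drive gives:

```python
def books_on_drive(capacity_symbols, symbols_per_page=(1250, 1500), pages=400):
    """Return (fewest, most) whole books that fit, for a range of page densities."""
    sparse, dense = symbols_per_page
    fewest = capacity_symbols // (dense * pages)   # densest pages: longest books
    most = capacity_symbols // (sparse * pages)    # sparsest pages: shortest books
    return fewest, most

print(books_on_drive(54_613_333))       # (91, 109)
print(books_on_drive(40_000_000_000))   # (66666, 80000), one 8-bit symbol per byte
```

Packing 6-bit symbols instead (320,000,000,000 // 6 symbols on the same drive) raises the range to 88,888-106,666 books.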
 
  • #7
^^^ Whoa, that's way too low. Uncompressed:

There are 64 characters you require, so that's 6 bits. Let's round up to ASCII, 8 bits/character, to give us some breathing space: 1 char/byte.

So 40 billion symbols in a 40gig drive = 80,000 books

Plain Huffman compression should at least double or triple this, and a book-optimized encoding scheme even more -- with chroot's 7x figure, ~half a million books.
 
  • #8
Originally posted by damgo

So 40 billion symbols in a 40gig drive = 80,000 books


I somehow lost about three orders of magnitude in my calculations. You are entirely correct; it is between 66,666 and 80,000 books. That seems much more reasonable than approximately 100 books.

I have ignored compression throughout this problem, as I wanted to see the numbers that popped out assuming only very basic encoding. Those numbers above are still very impressive indeed.
 
  • #9
1 GB is defined (by hard-drive manufacturers anyway) to be 10^9 (1 billion) bytes.

- Warren
 

FAQ: Calculating Books on 40GB Hard Drive: Assumptions & Solutions

What is the storage capacity of a 40GB hard drive?

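A 40GB hard drive holds 40 gigabytes, which hard-drive manufacturers define as 40 billion (40 × 10^9) bytes; an operating system that counts 1 GB as 1024^3 bytes will report it as about 37.25 GB. That is roughly enough for 8,000 songs at about 5 MB each, or 10,000 photos at about 4 MB each.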

How many books can be stored on a 40GB hard drive?

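The number of books that can be stored on a 40GB hard drive depends on the size of each book. Assuming a digital book is about 2 MB, a 40GB hard drive can store approximately 20,000 books. The plain text of a 400-page book (500,000-600,000 characters at one byte each) takes only 0.5-0.6 MB, which is why the estimate in the thread comes out higher, at 66,666-80,000 books.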

What assumptions are made when calculating the number of books on a 40GB hard drive?

When calculating the number of books that can be stored on a 40GB hard drive, it is assumed that the average digital book is 2 MB and that the whole drive is used only for books, with no operating system, programs, or other data taking up space.

Can a 40GB hard drive hold more books if they are compressed?

Yes, a 40GB hard drive can hold more books if they are compressed. Compressed books take up less space, so more can be stored on the hard drive. However, the actual number of books that can be stored will depend on the compression method used and the size of each book.
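One way to estimate the gain for a particular collection is to measure how well a sample compresses. A minimal Python sketch using the standard library's zlib module (DEFLATE, which combines a sliding-window stage with Huffman coding); the 2 MB book size is the assumption from the answer above, the function names are illustrative, and the repetitive sample text is only a placeholder, since real prose compresses far less:

```python
import zlib

def compression_ratio(text, level=9):
    """Uncompressed size divided by zlib-compressed size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw, level))

def books_that_fit(capacity_bytes, book_bytes, ratio=1.0):
    """Whole books that fit when each book shrinks by `ratio` on compression."""
    compressed_size = book_bytes / ratio
    return int(capacity_bytes // compressed_size)

sample = "This sentence stands in for a real book. " * 500   # placeholder text
ratio = compression_ratio(sample)
print(books_that_fit(40_000_000_000, 2_000_000))          # 20000, uncompressed
print(books_that_fit(40_000_000_000, 2_000_000, ratio))   # far higher for this repetitive sample
```

Measuring the ratio on a representative sample of the actual books gives a more realistic figure than a single rule of thumb.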

Are there any solutions for increasing the number of books that can be stored on a 40GB hard drive?

Yes, there are a few ways to increase the number of books that can be stored. One option is to compress the books so they take up less space, as mentioned in the previous question. Another is to upgrade to a hard drive with a higher storage capacity. Cloud storage or external hard drives can also add capacity for books.
