Comp Sci Why Are Hotspots Rare in GFS with Sequential Reads of Large Multi-Chunk Files?

shivajikobardan · Jan 8, 2022

In the Google File System, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially. What does this mean?

Hotspot: a region of a computer program where a high proportion of executed instructions occur.

Lazy space allocation: https://stackoverflow.com/questions/18109582/what-is-lazy-space-allocation-in-google-file-system

With lazy space allocation, the physical allocation of space is delayed as long as possible, until data at the size of the chunk size (in GFS's case, 64 MB according to the 2003 paper) is accumulated.

Large chunk size in GFS:
=> A large chunk size, even with lazy space allocation, has its disadvantages.
=> A small file consists of a small number of chunks, perhaps just one.
=> The chunkservers storing those chunks may become hot spots if many clients are accessing the same file.
=> In practice, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially.
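To make "a small file consists of a small number of chunks" concrete, here is a minimal arithmetic sketch. It assumes only the 64 MB chunk size mentioned above; the function name and the example file sizes are made up for illustration and are not from the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size quoted from the 2003 paper


def chunk_count(file_size_bytes: int) -> int:
    """Number of fixed-size chunks needed to hold a file (at least one)."""
    # Integer ceiling division, so a partly filled last chunk still counts.
    return max(1, (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE)


small_file = 10 * 1024 * 1024          # hypothetical 10 MB file
large_file = 10 * 1024 * 1024 * 1024   # hypothetical 10 GB file

print(chunk_count(small_file))  # 1
print(chunk_count(large_file))  # 160
```

Every read of the 10 MB file goes to the chunkservers holding its single chunk, while reads of the 10 GB file can be spread over up to 160 chunks.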
I don't understand why hotspots are not an issue when we read large multi-chunk files sequentially. They say hotspots are an issue if many clients are accessing the same small file (a file of just one chunk).

I will illustrate the scenario where a small file (a small number of chunks) is being accessed by multiple clients.

It makes sense why the chunkservers would become hotspots in this case, since they are kept busy when multiple clients access them.
But it doesn't make sense to me when the research paper says, "In practice hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially." What's the difference? If I imagine a scenario like the one above, except that the file is made up of multiple chunks and the rest is the same, what difference does that make?

jedishrfu · Jan 8, 2022

Collision issues can occur when multiple clients try to read / write or append to a common file. When writing only one client is given permission to write and all others must wait until the operation is complete before they can access the file.

shivajikobardan · Jan 8, 2022

jedishrfu said:

Collision issues can occur when multiple clients try to read / write or append to a common file.

Alright I get this.

jedishrfu said:

When writing only one client is given permission to write and all others must wait until the operation is complete before they can access the file.

So what? I don't get this.
My question is: in the Google File System, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially. What does this mean?

pbuk · Jan 9, 2022

shivajikobardan said:

In the Google File System, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially. What does this mean?

Because

shivajikobardan said:

our applications mostly read large multi chunk files sequentially

then the situation where multiple clients try to read or write the same chunk at the same time does not occur often so it has not been a major issue.
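A rough way to picture this is to count how many clients are on each chunk at one instant. The sketch below is only an illustration: the 64 MB chunk size is from the paper, but the function name, the client offsets, and the assumption that clients sit at different positions in the large file (for example because they started at different times) are all made up.

```python
from collections import Counter

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the 2003 GFS paper


def readers_per_chunk(offsets):
    """Count how many clients are currently reading each chunk index."""
    return Counter(offset // CHUNK_SIZE for offset in offsets)


# Ten clients reading a 1 MB file: every byte offset falls inside chunk 0.
small_file_offsets = [i * 100_000 for i in range(10)]

# Ten clients spread through a large file, each 200 MB further along
# than the previous one (an assumed, illustrative spacing).
large_file_offsets = [i * 200 * 1024 * 1024 for i in range(10)]

small = readers_per_chunk(small_file_offsets)
large = readers_per_chunk(large_file_offsets)

print(max(small.values()))  # 10: all clients hit the same chunk
print(max(large.values()))  # 1: each client is on a different chunk
```

With the small file the busiest chunk serves all ten clients at once. With the large file, under the spacing assumed here, no chunk serves more than one client at a time.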

shivajikobardan · Jan 9, 2022

pbuk said:

Because

then the situation where multiple clients try to read or write the same chunk at the same time does not occur often so it has not been a major issue.

Can you tell me why this is so? I have one example, but I'd prefer to hear your explanation first.

jedishrfu · Jan 10, 2022

Hmm so we are giving you helpful suggestions here and you have an example but don't want to share until you hear someone else’s example first.

That's not being very open. I would have provided my example, which would get me even more comments, but now I guess I'll just wait and see what happens.

If your example is proprietary to your work then I understand, but I must also say you should not be discussing work-related stuff on the internet.

shivajikobardan · Jan 10, 2022

jedishrfu said:

Hmm so we are giving you helpful suggestions here and you have an example but don't want to share until you hear someone else’s example first.

That's not being very open. I would have provided my example, which would get me even more comments, but now I guess I'll just wait and see what happens.

If your example is proprietary to your work then I understand, but I must also say you should not be discussing work-related stuff on the internet.

LOL, what are you saying? Why wouldn't I share it? Here it is:
Imagine you have a large barrel (file). In it, there is one tennis ball (chunk). Reach in blindfolded and grab the tennis ball (read the file). OK. Now put the ball back and get nine friends to join you. Then have everyone grab the ball. There WILL be contention (hotspot). Now put 100 tennis balls into the barrel, and have you and your friends each try to grab a ball. Most of the time, everyone will get a ball. Occasionally there will be contention (hotspot), but it will be far less frequent.
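The barrel analogy can be turned into numbers. This is a minimal sketch under an assumption the analogy does not state explicitly: each of the ten people independently reaches for one ball chosen uniformly at random. The function names are made up for illustration.

```python
def collision_probability(clients: int, chunks: int) -> float:
    """Chance that at least two clients pick the same chunk.

    Assumes every client independently picks one chunk uniformly at random.
    """
    if clients > chunks:
        return 1.0  # more clients than chunks: some chunk must be shared
    p_all_distinct = 1.0
    for i in range(clients):
        p_all_distinct *= (chunks - i) / chunks
    return 1.0 - p_all_distinct


def expected_sharers(clients: int, chunks: int) -> float:
    """Average number of other clients on the same chunk as a given client."""
    return (clients - 1) / chunks


one_ball = collision_probability(10, 1)
hundred_balls = collision_probability(10, 100)

print(one_ball)                   # 1.0
print(round(hundred_balls, 2))    # 0.37
print(expected_sharers(10, 1))    # 9.0
print(expected_sharers(10, 100))  # 0.09
```

Under this assumption, with one ball every grab is contended and each person competes with all nine others. With 100 balls some overlap still happens in roughly 37% of rounds, but a given person shares a ball with only 0.09 other people on average, so the load on any single ball is far lower.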

jedishrfu · Jan 10, 2022

It’s an interesting analogy, though it’s unlikely that Google chunks data in tennis balls. In filesystems or databases, contention occurs when trying to update a specific resource. Locks are used to ensure only one client may write to that resource.

It may be that Google logs some information as each client tries to read a given chunk which causes other clients to wait on that chunk. It may be that the web service that handles the reads has serialized the client requests which appears to the client as a wait. I’ve seen that in some web services but wouldn’t expect it in a Google service.

I’ve found this writeup on how it works so maybe you can find your answer there:

https://computer.howstuffworks.com/internet/basics/google-file-system.htm

and here’s a stackoverflow discussion on GFS hotspots

https://stackoverflow.com/questions...es-create-hot-spots-in-the-google-file-system

shivajikobardan · Jan 10, 2022

jedishrfu said:

It’s an interesting analogy, though it’s unlikely that Google chunks data in tennis balls. In filesystems or databases, contention occurs when trying to update a specific resource. Locks are used to ensure only one client may write to that resource.

It may be that Google logs some information as each client tries to read a given chunk which causes other clients to wait on that chunk. It may be that the web service that handles the reads has serialized the client requests which appears to the client as a wait. I’ve seen that in some web services but wouldn’t expect it in a Google service.

I’ve found this writeup on how it works so maybe you can find your answer there:

https://computer.howstuffworks.com/internet/basics/google-file-system.htm

and here’s a stackoverflow discussion on GFS hotspots

https://stackoverflow.com/questions...es-create-hot-spots-in-the-google-file-system

Hmm, I didn't come up with it; someone from another forum did. It clicked with me immediately.
