Pointers, RAM, Hard drives and Databases

In summary, databases provide a convenient way to store and organize information, but they may not be the fastest when it comes to retrieving and writing data. Alternatives such as serializing objects or using a client-server model with shared memory can be useful for managing large amounts of data between programs. COM and CORBA are binary serialization technologies, while web services use text serialization and offer platform independence. Modern databases are efficient at caching and transferring data, but careful query design is necessary for optimal performance.
  • #1
John Creighto
A long time ago, I was told that relying too heavily on databases in programming can make programs slow. Databases provide a nice way to store and organize information, but they may not retrieve and write it quickly. Clearly RAM is the fastest memory to access, but its size is limited and there is overhead associated with transferring information between RAM and the hard drive.

One way to store information is to serialize it. For instance, you can serialize objects. You often have the option to select between several formats, and I presume that one format represents the way the information is stored in RAM. So my first question is about pointers. In programming languages that support pointers, does a pointer care whether it points to RAM or to non-volatile memory, or is it necessary to deal with virtual memory in order to treat non-volatile memory as RAM? This could be handy if you have a large object that would take considerable time to load (deserialize) into memory.

I'm thinking that if you have a data source that is a binary file, and you want to access that information through several sources (processes, threads, separate programming languages), a client-server model might be good (note: I'm not sure whether this would be slow, though). The server decides whether to keep the object on the hard disk or load it into RAM, depending on the size of the object, the amount of free memory available and the demand for that object.

I've seen some bridge programs between languages that use client-server models (com connect, for instance), and they seem to pass strings around to tell the object which method to use. This strikes me as not the most efficient way to do things. I'm wondering about dynamic link libraries: are these accessible from multiple sources? I need to do more research, but I'd be interested in any comments people might have on managing large amounts of data between programs.

Perhaps a good idea is to use a model like the one used in com connect for programming languages that don't support pointers, but give the option to get a pointer for programming languages that do. So, for instance, on the Java side of com connect you can use the client-server model, but on the COM side (Visual Basic, C, etc.) you can get a pointer and treat it as a native object.
 
  • #2
John Creighto said:
Databases provide a nice way to store and organize information but they may not retrieve and write information quickly.

To a user, a database is just an interface for storing and retrieving data. That data could be stored on a disk, or it could be stored in RAM.
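As a minimal sketch of this point, Python's built-in sqlite3 module offers the same interface whether the database lives in RAM (the special name ":memory:") or in a file on disk. The table name, the helper store_and_fetch and the temporary file name are made up for illustration.

```python
import os
import sqlite3
import tempfile


def store_and_fetch(database_location):
    """Run the same SQL against whatever storage the location selects."""
    conn = sqlite3.connect(database_location)
    try:
        conn.execute("CREATE TABLE readings (name TEXT, value REAL)")
        conn.execute("INSERT INTO readings VALUES (?, ?)", ("temperature", 21.5))
        conn.commit()
        return conn.execute("SELECT name, value FROM readings").fetchall()
    finally:
        conn.close()


# ":memory:" keeps the whole database in RAM.
in_ram = store_and_fetch(":memory:")

# A file path keeps it on disk; the calling code is unchanged.
with tempfile.TemporaryDirectory() as tmp_dir:
    on_disk = store_and_fetch(os.path.join(tmp_dir, "readings.db"))
```

Both calls return the same rows; only the storage behind the interface differs.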

John Creighto said:
In programming languages that support pointers, does a pointer care whether it points to RAM or to non-volatile memory, or is it necessary to deal with virtual memory in order to treat non-volatile memory as RAM?

Pointers are (literally!) just addresses. They're just numbers. What the addresses actually mean is arbitrary. Typically, the addresses are in some very large virtual memory space. Pages of memory that are not often used will eventually be sent to the disk, while pages that are used frequently will remain in RAM. The program has no direct knowledge of where exactly each page of memory currently exists.
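To make "just numbers" concrete, here is a small sketch using Python's standard ctypes module, which exposes C-style pointers. The variable names are illustrative, and the address printed on any given run is a virtual address chosen by the operating system, so it says nothing about where the data physically lives.

```python
import ctypes

value = ctypes.c_int(42)

# The address of the object is an ordinary integer.
address = ctypes.addressof(value)

# Reinterpret that integer as a pointer to int and dereference it.
pointer = ctypes.cast(address, ctypes.POINTER(ctypes.c_int))
read_back = pointer.contents.value

# Writing through the pointer changes the original object.
pointer.contents.value = 7
```

After this runs, value.value is 7, even though the write went through a pointer rebuilt from a plain integer.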

John Creighto said:
I'm thinking that if you have a data source that is a binary file, and you want to access that information through several sources (processes, threads, separate programming languages), a client-server model might be good (note: I'm not sure whether this would be slow, though). The server decides whether to keep the object on the hard disk or load it into RAM, depending on the size of the object, the amount of free memory available and the demand for that object.

All modern operating systems include a facility called "memory mapping," which maps a range of addresses in the program's virtual address space to a file. If you read from those addresses, you'll get data from the file. It is up to the operating system to determine whether to load the data into RAM all at once, or to read it from the disk in chunks as necessary.
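A minimal sketch of memory mapping, using Python's standard mmap module. The small temporary file stands in for a large binary data file, and its contents are invented for the example.

```python
import mmap
import os
import tempfile

# Create a small file to map (a stand-in for a large binary data file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header:" + bytes(range(16)))

with open(path, "r+b") as f:
    # Length 0 maps the whole file into the process's address space.
    with mmap.mmap(f.fileno(), 0) as mapped:
        # Slicing reads bytes from the file without an explicit read() call.
        first_bytes = mapped[:7]
        # Assigning to a slice writes through to the file.
        mapped[0:6] = b"HEADER"
        mapped.flush()

with open(path, "rb") as f:
    on_disk = f.read()

os.remove(path)
```

The program never decides which parts of the file are in RAM; it only indexes into the mapping and the operating system pages data in as needed.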

John Creighto said:
I need to do more research, but I'd be interested in any comments people might have for managing large amounts of data between programs.

If you're trying to share large amounts of memory between two programs running on the same computer, you should note that all modern operating systems provide mechanisms for shared memory. These shared memory segments can be mapped into the virtual address space of multiple programs simultaneously. Two or more programs can read or write to the shared memory exactly as if it were normal, private memory. (But you should include some thread-safety mechanisms, like mutexes, to make sure your programs won't step on each other's toes.)
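A rough sketch of the idea, using Python's standard multiprocessing.shared_memory module (Python 3.8 or later). To keep it short, both handles are opened in a single process; in real use the second handle would be opened by a different program that has been told the segment's name. The labels "Program A" and "Program B" are only illustrative.

```python
from multiprocessing import shared_memory

# "Program A" creates a named shared memory segment.
segment_a = shared_memory.SharedMemory(create=True, size=64)

# "Program B" attaches to the same segment by name. Here both handles live
# in one process to keep the sketch short; normally this line would run in
# a different process that was told the name.
segment_b = shared_memory.SharedMemory(name=segment_a.name)

# A write through one mapping is visible through the other, with no copying.
segment_a.buf[:5] = b"hello"
seen_by_b = bytes(segment_b.buf[:5])

# Each side closes its own mapping; one side removes the segment.
segment_b.close()
segment_a.close()
segment_a.unlink()
```

The sketch leaves out locking because only one process is involved. With two real processes, reads and writes would need to be guarded, for example with a multiprocessing.Lock.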

If you're trying to share large amounts of memory between programs running on separate computers, use MPI or some other multi-processing library.

- Warren
 
  • #3
Since you mention COM, I assume that by client-server communication you mean the ability to call a subroutine across a network. There are two basic approaches to this: binary serialization and text serialization.

COM (Microsoft) and CORBA are binary serialization technologies. COM is tied closely to Windows. CORBA is an OMG standard designed to be language- and platform-neutral, although client and server still need compatible ORB implementations and must agree exactly on the interface definitions.

So-called "web services" is an OS-independent way of calling across a network, where the serialization of the call and return information is text. This approach has worse performance but it can be platform and version independent, which can sometimes be very useful.
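The trade-off can be sketched with Python's standard struct and json modules. This is not the wire format that COM, CORBA or any web-service stack actually uses; the method id, the method name "move" and the arguments are hypothetical and only illustrate binary versus text encoding of the same call.

```python
import json
import struct

# A hypothetical remote call: method id 3 with arguments (x=1.5, y=-2.25).
method_id, x, y = 3, 1.5, -2.25

# Binary: a fixed layout that both sides must agree on exactly
# ("<" little-endian, "H" unsigned 16-bit int, "d" 64-bit float).
binary_message = struct.pack("<Hdd", method_id, x, y)

# Text: self-describing and readable by any platform with a JSON parser.
text_message = json.dumps(
    {"method": "move", "args": {"x": x, "y": y}}
).encode("utf-8")

decoded_binary = struct.unpack("<Hdd", binary_message)
decoded_text = json.loads(text_message)
```

The binary message is 18 bytes and needs no parsing beyond a fixed layout. The text message is longer and slower to parse, but a receiver needs no knowledge of byte order or field widths.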

Modern databases such as SQL Server, MySQL and Oracle are very efficient at caching information in memory and moving it across a network, though you do have to be careful about what kind of pre-processing you ask the database to do (i.e., what kind of query you send it).
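One reading of this advice, sketched with Python's built-in sqlite3 module standing in for a networked database: the same answer can be obtained by pulling every row to the client or by asking the database for just the result. The orders table and its contents are invented for the example, and which approach is faster on a real server depends on the data, the indexes and the network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 5.5), ("carol", 40.0)],
)

# Option 1: transfer every row, then filter and sum in the client.
all_rows = conn.execute("SELECT customer, total FROM orders").fetchall()
client_side_total = sum(total for customer, total in all_rows if customer == "alice")

# Option 2: let the database filter and aggregate, and transfer one value.
(server_side_total,) = conn.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("alice",)
).fetchone()

conn.close()
```

Both options give the same total; they differ in how much data crosses the connection and how much work the database is asked to do.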

Hope this helps.
 

FAQ: Pointers, RAM, Hard drives and Databases

What are pointers and how do they work?

Pointers are variables that store memory addresses of other variables. They are used to indirectly access and manipulate data stored in memory. When a pointer is dereferenced, it retrieves the value stored at the memory address it points to.

What is RAM and how is it different from a hard drive?

RAM stands for Random Access Memory, and it is a type of computer memory that is used to temporarily store data and instructions that the CPU needs to access quickly. It is volatile, meaning the data is lost when the computer is turned off. A hard drive, on the other hand, is a non-volatile storage device that permanently stores data and programs.

How does a hard drive store and retrieve data?

A hard drive stores data magnetically on spinning disks called platters. A read/write head hovers just above the surface of the platter and reads and writes data in the form of 1s and 0s. When data is needed, the head moves to the correct location on the platter and retrieves the data. The speed at which the platters spin and the read/write head moves determines the speed of data retrieval.

What is a database and why is it important?

A database is a collection of organized and structured data that can be easily accessed, managed, and updated. It is important because it allows for efficient storage and retrieval of large amounts of data, making it an essential tool for businesses and organizations to organize and analyze their data.

What are the different types of databases?

There are several types of databases, including relational databases, object-oriented databases, and NoSQL databases. Relational databases use tables and relationships to store and organize data, while object-oriented databases store data as objects with attributes and methods. NoSQL databases are non-relational databases that can handle large amounts of unstructured data. Other types include graph databases, document databases, and key-value databases.
