[Хд] logo

What is a hash table and how it works

Hash table (or hash map) is a special data structure for storing pairs of keys and values. In fact, it is an associative array, where the key is represented as a hash function. Hash table

Perhaps the most important property of hash table is that all three operations insert, search and delete have the average run of O(1), and the average seek time is also O(1) or O(n) in the worst case.

Simple representation of hash tables

To understand what a hash table is, imagine that you were asked to create a library and fill it with books. But you do not want to fill the shelves at random.

The first thing that comes to mind is to put all the books in alphabetical order and write down everything in a reference book. In this case, you won't search a needed book through the entire library, but only through the reference-book.

And there's even more convenient way. If from the beginning arrange books by title or author's name, it is best to use a hash algorithm, which processes the incoming value and outputs cabinets and shelves numbers for the desired book.

Knowing this hashing algorithm, you will quickly find the desired book by its title.

Note that the hash function must follow these properties:

  • Always return the same address for the same key;
  • Not necessarily returns different addresses for different keys;
  • Use all the address space with equal probability;
  • Calculate adress fast.

Collision resolution

In the ideal case, when you know all the key-value pairs in advance, it is fairly easy to implement a perfect hash table in which the search time is constant (using a perfect hash function, which determines the position in the table for integer values without collisions).

But in most cases, it is necessary to deal with collisions. Usually applied methods of separate chaining and open addressing.

Separate chaining with linked lists

This method is often called the open hashing. Its essence is simple — elements with the same hash fall into one cell in a linked list. Hash chaining

That is if the cell is already occupied, but a new key differs from the existing, then the new element is inserted in the list in the form of key-value pairs.

In separate chaining method the insertion of a new element occurs in O(1) and the seek time depends on the length of the list, and in the worst case is O(n). If there's n keys, that are distributed on m cells, the ratio n/m is the filling factor.

In C ++ chains method is implemented as follows:

class LinkedHashEntry {
private:
      int key;
      int value;
      LinkedHashEntry *next;
public:
      LinkedHashEntry(int key, int value) {
            this->key = key;
            this->value = value;
            this->next = NULL;
      }
 
      int getKey() {
            return key;
      }
 
      int getValue() {
            return value;
      }
 
      void setValue(int value) {
            this->value = value;
      }
 
      LinkedHashEntry *getNext() {
            return next;
      }
 
      void setNext(LinkedHashEntry *next) {
            this->next = next;
      }
};

# Checking the cell and creating a list

Open adressing

The second common method is an open indexing. This means that the key-value pairs are stored directly in the hash table. And insert algorithm checks the cells in a certain order, until an empty cell is reached. The procedure is calculated on the fly. Open addressing linear probing

The simplest implementation is a linear probing. It's simple — in the event of a collision, these cells are checked linearly until an empty cell is reached.

A search algorithm seeks for a cell in the same order as that of the insert until it finds a needed element or a blank cell, which says that the key is missing. If the table is full, it will have to expand dynamically.

Linear probing in C++ looks like this:

class HashEntry {
private:
      int key;
      int value;
public:
      HashEntry(int key, int value) {
            this->key = key;
            this->value = value;
      }
 
      int getKey() {
            return key;
      }
 
      int getValue() {
            return value;
      }
 
      void setValue(int value) {
            this->value = value;
      }
};

# Check cells and insert values

The most important

Hashing and hash tables are used for easy storing of key-value pairs. If you want maximum efficiency, use a hash table with chained lists, as they will be much faster than a regular table.

  читать на русском
[Хд]

Sign up to read high quality stuff on advanced development

Google Email

Esc for later