Connection Layer API

This section covers the classes responsible for handling the connection to the Elasticsearch cluster. The default subclasses used can be overridden by passing parameters to the Elasticsearch class. All of the arguments to the client will be passed on to Transport, ConnectionPool and Connection.

For example, if you want to use your own implementation of the ConnectionSelector class, pass it in via the selector_class parameter.

Note

ConnectionPool and related options (such as selector_class) are only used if more than one connection is defined, either directly or via the sniffing mechanism.

Note

Known binary format mimetypes such as application/mapbox-vector-tile will return the response body as bytes instead of the usual UTF-8 encoded text.

Transport

class elasticsearch.Transport(hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, host_info_callback=construct_hosts_list, sniff_on_start=False, sniffer_timeout=None, sniff_on_connection_fail=False, serializer=JSONSerializer(), max_retries=3, **kwargs)

Encapsulation of transport-related logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them.

The main interface is the perform_request method.

Parameters
  • hosts – list of dictionaries, each containing keyword arguments to create a connection_class instance

  • connection_class – subclass of Connection to use

  • connection_pool_class – subclass of ConnectionPool to use

  • host_info_callback – callback responsible for taking the node information from /_cluster/nodes, along with already extracted information, and producing a list of arguments (same as hosts parameter)

  • sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time

  • sniffer_timeout – number of seconds between automatic sniffs

  • sniff_on_connection_fail – flag controlling if connection failure triggers a sniff

  • sniff_timeout – timeout used for the sniff request; it should be a fast API call and, since we are potentially talking to more nodes, we want to fail quickly. Not used during initial sniffing (if sniff_on_start is on) when the connection is not yet initialized.

  • serializer – serializer instance

  • serializers – optional dict of serializer instances that will be used for deserializing data coming from the server. (key is the mimetype)

  • default_mimetype – mimetype to assume when the server response does not specify one; defaults to ‘application/json’

  • max_retries – maximum number of retries before an exception is propagated

  • retry_on_status – set of HTTP status codes on which we should retry on a different node; defaults to (502, 503, 504)

  • retry_on_timeout – should timeout trigger a retry on different node? (default False)

  • send_get_body_as – for GET requests with a body, this option allows you to specify an alternate way of execution for environments that don’t support passing bodies with GET requests. If you set this to ‘POST’ a POST method will be used instead; if set to ‘source’ the body will be serialized and passed as a query parameter named source.

  • meta_header – If True will send the ‘X-Elastic-Client-Meta’ HTTP header containing simple client metadata. Setting to False will disable the header. Defaults to True.

Any extra keyword arguments will be passed to the connection_class when creating an instance unless overridden by that connection’s options provided as part of the hosts parameter.
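The hosts parameter and the kwarg pass-through described above can be sketched as follows (hostnames and values are placeholders, not a working cluster):

```python
# Hypothetical host entries: each dict holds keyword arguments for the
# connection_class. A kwarg passed to Transport itself (e.g. timeout=10)
# would apply to every connection unless a host dict overrides it, as the
# second entry does here.
hosts = [
    {"host": "es1.example.com", "port": 9200},
    {"host": "es2.example.com", "port": 9200, "timeout": 30},
]
# transport = Transport(hosts, timeout=10)  # requires the elasticsearch package
```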

DEFAULT_CONNECTION_CLASS

alias of elasticsearch.connection.http_urllib3.Urllib3HttpConnection

add_connection(host)

Create a new Connection instance and add it to the pool.

Parameters

host – kwargs that will be used to create the instance

close()

Explicitly closes connections

get_connection()

Retrieve a Connection instance from the ConnectionPool instance.

mark_dead(connection)

Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process.

Parameters

connection – instance of Connection that failed

perform_request(method, url, headers=None, params=None, body=None)

Perform the actual request. Retrieve a connection from the connection pool, pass all the information to its perform_request method and return the data.

If an exception was raised, mark the connection as failed and retry (up to max_retries times).

If the operation was successful and the connection used was previously marked as dead, mark it as live, resetting its failure count.

Parameters
  • method – HTTP method to use

  • url – absolute url (without host) to target

  • headers – dictionary of headers, will be handed over to the underlying Connection class

  • params – dictionary of query parameters, will be handed over to the underlying Connection class for serialization

  • body – body of the request, will be serialized using serializer and passed to the connection
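The retry flow described above can be sketched in plain Python. This is an illustrative simplification, not the library's actual implementation: the pool object here is any object exposing the get_connection / mark_dead / mark_live interface, and the built-in ConnectionError stands in for the library's own exception class.

```python
def perform_request(pool, do_request, max_retries=3):
    """Sketch of the retry loop: try a connection, mark it dead on
    failure and retry on another one, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        connection, fail_count = pool.get_connection()
        try:
            result = do_request(connection)
        except ConnectionError:  # stand-in for the library's ConnectionError
            pool.mark_dead(connection)
            if attempt == max_retries:
                raise
        else:
            if fail_count:
                # the connection was previously marked dead; redeem it
                pool.mark_live(connection)
            return result
```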

set_connections(hosts)

Instantiate all the connections and create a new connection pool to hold them. Tries to identify unchanged hosts and re-use existing Connection instances.

Parameters

hosts – same as __init__

sniff_hosts(initial=False)

Obtain a list of nodes from the cluster and create a new connection pool using the information retrieved.

To extract the node connection parameters the host_info_callback is used.

Parameters

initial – flag indicating if this is during startup (sniff_on_start), ignore the sniff_timeout if True

Connection Pool

class elasticsearch.ConnectionPool(connections, dead_timeout=60, selector_class=RoundRobinSelector, randomize_hosts=True, **kwargs)

Container holding the Connection instances, managing the selection process (via a ConnectionSelector) and dead connections.

Its only interactions are with the Transport class, which drives all the actions within ConnectionPool.

Initially connections are stored on the class as a list and, along with the connection options, get passed to the ConnectionSelector instance for future reference.

Upon each request the Transport will ask for a Connection via the get_connection method. If the connection fails (its perform_request raises a ConnectionError) it will be marked as dead (via mark_dead) and put on a timeout; if it fails N times in a row the timeout grows exponentially, following the formula default_timeout * 2 ** (fail_count - 1). When the timeout is over the connection will be resurrected and returned to the live pool. A connection that was previously marked as dead and then succeeds will be marked as live (its fail count will be deleted).

Parameters
  • connections – list of tuples containing the Connection instance and its options

  • dead_timeout – number of seconds a connection should be retired for after a failure, increases on consecutive failures

  • timeout_cutoff – number of consecutive failures after which the timeout doesn’t increase

  • selector_class – ConnectionSelector subclass to use if more than one connection is live

  • randomize_hosts – shuffle the list of connections upon arrival to avoid dog piling effect across processes
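The backoff behaviour described by the parameters above can be checked with a few lines of Python. The timeout_cutoff of 5 is assumed here purely for illustration; the document does not state a default, so check your client version:

```python
def retirement_seconds(fail_count, dead_timeout=60, timeout_cutoff=5):
    # Exponential backoff: dead_timeout * 2 ** (fail_count - 1), with the
    # exponent capped so the timeout stops growing after timeout_cutoff
    # consecutive failures.
    return dead_timeout * 2 ** min(fail_count - 1, timeout_cutoff)

retirement_seconds(1)  # 60  -> first failure
retirement_seconds(2)  # 120 -> doubles on each consecutive failure
```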

close()

Explicitly closes connections

get_connection()

Return a connection from the pool using the ConnectionSelector instance.

It tries to resurrect eligible connections, forces a resurrection when no connections are available and passes the list of live connections to the selector instance to choose from.

Returns a connection instance and its current fail count.

mark_dead(connection, now=None)

Mark the connection as dead (failed). Remove it from the live pool and put it on a timeout.

Parameters

connection – the failed instance

mark_live(connection)

Mark connection as healthy after a resurrection. Resets the fail counter for the connection.

Parameters

connection – the connection to redeem

resurrect(force=False)

Attempt to resurrect a connection from the dead pool. It will try to locate one (not all) eligible connection (one whose timeout is over) to return to the live pool. Any resurrected connection is also returned.

Parameters

force – resurrect a connection even if there is none eligible (used when we have no live connections). If force is specified resurrect always returns a connection.

Connection Selector

class elasticsearch.ConnectionSelector(opts)

Simple class used to select a connection from a list of currently live connection instances. At initialization time it is passed a dictionary containing all the connections’ options, which it can then use during the selection process. When the select method is called it is given a list of currently live connections to choose from.

The options dictionary is the one that has been passed to Transport as the hosts param and the same one used to construct the Connection object itself. When the Connection was created from information retrieved from the cluster via the sniffing process, it will be the dictionary returned by the host_info_callback.

An example of where this would be useful is a zone-aware selector that would only select connections from its own zone, falling back to other connections only when there are none in its zone.

Parameters

opts – dictionary of connection instances and their options

select(connections)

Select a connection from the given list.

Parameters

connections – list of live connections to choose from
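A zone-aware selector like the one described above might look roughly like this. It is a self-contained sketch: in real use you would subclass elasticsearch.ConnectionSelector and pass the class via selector_class, and the "zone" key in each host's options dict is an assumption of this example, not something the client sets for you:

```python
import random

class ZoneAwareSelector:
    """Prefer connections in our own zone; fall back to any live one."""

    def __init__(self, opts, my_zone="zone-a"):
        # opts maps each connection to the options dict it was created
        # from (the same dicts passed as the hosts parameter). my_zone
        # would normally come from your own configuration.
        self.opts = opts
        self.my_zone = my_zone

    def select(self, connections):
        # keep only live connections whose (assumed) "zone" option matches
        local = [
            c for c in connections
            if self.opts.get(c, {}).get("zone") == self.my_zone
        ]
        return random.choice(local or connections)
```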

Urllib3HttpConnection (default connection_class)

If you have complex SSL logic for connecting to Elasticsearch, using an SSLContext object might be more helpful. You can create one natively using the Python ssl library with the create_default_context (https://docs.python.org/3/library/ssl.html#ssl.create_default_context) method.

To create an SSLContext object you only need to use one of cafile, capath or cadata:

>>> from ssl import create_default_context
>>> context = create_default_context(cafile=None, capath=None, cadata=None)
  • cafile is the path to your CA file

  • capath is the path to a directory containing a collection of CA certificates

  • cadata is either an ASCII string of one or more PEM-encoded certificates or a bytes-like object of DER-encoded certificates.

Please note that the use of SSLContext is only available for urllib3.
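For example, the following builds a context from the system's default trusted CAs (pass cafile for a custom CA instead). The Elasticsearch call is shown commented out because it needs a running cluster and the elasticsearch package:

```python
import ssl
from ssl import create_default_context

# With no cafile/capath/cadata arguments, the system's default trusted
# CA certificates are loaded; hostname checking and certificate
# verification are enabled by default.
context = create_default_context()

# The context is then handed to the client (urllib3 connections only):
# es = Elasticsearch(["https://localhost:9200"], ssl_context=context)
```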

class elasticsearch.Urllib3HttpConnection(host='localhost', port=None, http_auth=None, use_ssl=False, verify_certs=<object object>, ssl_show_warn=<object object>, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, **kwargs)

Default connection class using the urllib3 library and the HTTP protocol.

Parameters
  • host – hostname of the node (default: localhost)

  • port – port to use (integer, default: 9200)

  • url_prefix – optional url prefix for elasticsearch

  • timeout – default timeout in seconds (float, default: 10)

  • http_auth – optional HTTP auth information as either a ‘:’-separated string or a tuple

  • use_ssl – use ssl for the connection if True

  • verify_certs – whether to verify SSL certificates

  • ssl_show_warn – show warning when verify certs is disabled

  • ca_certs – optional path to CA bundle. See https://urllib3.readthedocs.io/en/latest/security.html#using-certifi-with-urllib3 for instructions how to get default set

  • client_cert – path to the file containing the private key and the certificate, or cert only if using client_key

  • client_key – path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert)

  • ssl_version – version of the SSL protocol to use. Choices are: SSLv23 (default) SSLv2 SSLv3 TLSv1 (see PROTOCOL_* constants in the ssl module for exact options for your environment).

  • ssl_assert_hostname – use hostname verification if not False

  • ssl_assert_fingerprint – verify the supplied certificate fingerprint if not None

  • maxsize – the number of connections which will be kept open to this host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more information.

  • headers – any custom HTTP headers to be added to requests

  • http_compress – Use gzip compression

  • cloud_id – The Cloud ID from ElasticCloud. Convenient way to connect to cloud instances. Other host connection params will be ignored.

  • api_key – optional API Key authentication as either base64 encoded string or a tuple.

  • opaque_id – send this value in the ‘X-Opaque-Id’ HTTP header for tracing all requests made by this transport.

close()

Explicitly closes connection

API Compatibility HTTP Header

The Python client can be configured to emit the HTTP header Accept: application/vnd.elasticsearch+json; compatible-with=7, which signals to Elasticsearch that the client is requesting the 7.x version of request and response bodies. This allows for upgrading from 7.x to 8.x versions of Elasticsearch without upgrading everything at once. Once the compatibility header is configured, Elasticsearch should be upgraded first and the clients second.

from elasticsearch import Elasticsearch

client = Elasticsearch("http://...", headers={"accept": "application/vnd.elasticsearch+json; compatible-with=7"})

If you’d like to have the client emit the header without configuring headers you can use the environment variable ELASTIC_CLIENT_APIVERSIONING=1.