Connection Layer API¶
All of the classes responsible for handling the connection to the Elasticsearch cluster. The default subclasses used can be overridden by passing parameters to the Elasticsearch class. All of the arguments to the client will be passed on to Transport, ConnectionPool and Connection.
For example, if you wanted to use your own implementation of the ConnectionSelector class you could just pass in the selector_class parameter.
Note
ConnectionPool and related options (like selector_class) will only be used if more than one connection is defined, either directly or via the sniffing mechanism.
Note
Known binary-format mimetypes like application/mapbox-vector-tile will return the response body as bytes instead of the usual UTF-8 encoded text.
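The bytes-vs-text dispatch described in the note can be sketched as follows. The BINARY_MIMETYPES set and the deserialize_body helper are illustrative only, not part of the library's public API:

```python
# Sketch: deciding between bytes and text based on the response mimetype.
# Only mapbox-vector-tile is taken from the docs above; the helper name and
# the set itself are assumptions for illustration.
BINARY_MIMETYPES = {"application/mapbox-vector-tile"}

def deserialize_body(raw: bytes, mimetype: str):
    """Return raw bytes for known binary mimetypes, UTF-8 text otherwise."""
    if mimetype in BINARY_MIMETYPES:
        return raw  # binary formats are passed through untouched, as bytes
    return raw.decode("utf-8")  # everything else is assumed UTF-8 text

print(type(deserialize_body(b"\x00\x01", "application/mapbox-vector-tile")))
print(deserialize_body(b'{"ok": true}', "application/json"))
```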
Transport¶
- class elasticsearch.Transport(hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, host_info_callback=construct_hosts_list, sniff_on_start=False, sniffer_timeout=None, sniff_on_connection_fail=False, serializer=JSONSerializer(), max_retries=3, **kwargs)¶
Encapsulation of transport-related logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them.
Main interface is the perform_request method.
- Parameters
hosts – list of dictionaries, each containing keyword arguments to create a connection_class instance
connection_class – subclass of Connection to use
connection_pool_class – subclass of ConnectionPool to use
host_info_callback – callback responsible for taking the node information from /_cluster/nodes, along with already extracted information, and producing a list of arguments (same as the hosts parameter)
sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time
sniffer_timeout – number of seconds between automatic sniffs
sniff_on_connection_fail – flag controlling whether a connection failure triggers a sniff
sniff_timeout – timeout used for the sniff request. It should be a fast API call and we are potentially talking to more nodes, so we want to fail quickly. Not used during initial sniffing (if sniff_on_start is on) when the connection still isn't initialized.
serializer – serializer instance
serializers – optional dict of serializer instances that will be used for deserializing data coming from the server (the key is the mimetype)
default_mimetype – mimetype to assume when the server response specifies none; defaults to 'application/json'
max_retries – maximum number of retries before an exception is propagated
retry_on_status – set of HTTP status codes on which we should retry on a different node; defaults to (502, 503, 504)
retry_on_timeout – should a timeout trigger a retry on a different node? (default False)
send_get_body_as – for GET requests with a body, this option allows you to specify an alternate way of execution for environments that don't support passing bodies with GET requests. If you set this to 'POST', a POST method will be used instead; if to 'source', the body will be serialized and passed as the query parameter source.
meta_header – if True, send the 'X-Elastic-Client-Meta' HTTP header containing simple client metadata. Setting to False disables the header. Defaults to True.
Any extra keyword arguments will be passed to the connection_class when creating an instance, unless overridden by that connection's options provided as part of the hosts parameter.
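What send_get_body_as='source' implies can be sketched with the standard library alone: the body is serialized and attached as the source query parameter (Elasticsearch also expects a source_content_type parameter alongside it). The index name and URL here are made up for illustration:

```python
import json
from urllib.parse import urlencode

# A GET-with-body search, rewritten so the body travels as the "source"
# query parameter instead of an HTTP request body.
body = {"query": {"match_all": {}}}
params = {
    "source": json.dumps(body),
    "source_content_type": "application/json",
}
url = "/my-index/_search?" + urlencode(params)
print(url)
```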
- DEFAULT_CONNECTION_CLASS¶
alias of
elasticsearch.connection.http_urllib3.Urllib3HttpConnection
- add_connection(host)¶
Create a new Connection instance and add it to the pool.
- Parameters
host – kwargs that will be used to create the instance
- close()¶
Explicitly closes connections
- get_connection()¶
Retrieve a Connection instance from the ConnectionPool instance.
- mark_dead(connection)¶
Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process.
- Parameters
connection – instance of Connection that failed
- perform_request(method, url, headers=None, params=None, body=None)¶
Perform the actual request. Retrieve a connection from the connection pool, pass all the information to its perform_request method and return the data.
If an exception was raised, mark the connection as failed and retry (up to max_retries times).
If the operation was successful and the connection used was previously marked as dead, mark it as live, resetting its failure count.
- Parameters
method – HTTP method to use
url – absolute url (without host) to target
headers – dictionary of headers, will be handed over to the underlying Connection class
params – dictionary of query parameters, will be handed over to the underlying Connection class for serialization
body – body of the request, will be serialized using serializer and passed to the connection
- set_connections(hosts)¶
Instantiate all the connections and create a new connection pool to hold them. Tries to identify unchanged hosts and re-use existing Connection instances.
- Parameters
hosts – same as __init__
- sniff_hosts(initial=False)¶
Obtain a list of nodes from the cluster and create a new connection pool using the information retrieved.
To extract the node connection parameters use the nodes_to_host_callback.
- Parameters
initial – flag indicating if this is during startup (sniff_on_start); ignore the sniff_timeout if True
Connection Pool¶
- class elasticsearch.ConnectionPool(connections, dead_timeout=60, selector_class=RoundRobinSelector, randomize_hosts=True, **kwargs)¶
Container holding the Connection instances, managing the selection process (via a ConnectionSelector) and dead connections.
Its only interactions are with the Transport class that drives all the actions within ConnectionPool.
Initially connections are stored on the class as a list and, along with the connection options, get passed to the ConnectionSelector instance for future reference.
Upon each request the Transport will ask for a Connection via the get_connection method. If the connection fails (its perform_request raises a ConnectionError) it will be marked as dead (via mark_dead) and put on a timeout (if it fails N times in a row the timeout is exponentially longer - the formula is default_timeout * 2 ** (fail_count - 1)). When the timeout is over the connection will be resurrected and returned to the live pool. A connection that has been previously marked as dead and succeeds will be marked as live (its fail count will be deleted).
- Parameters
connections – list of tuples containing the Connection instance and its options
dead_timeout – number of seconds a connection should be retired for after a failure; increases on consecutive failures
timeout_cutoff – number of consecutive failures after which the timeout doesn't increase
selector_class – ConnectionSelector subclass to use if more than one connection is live
randomize_hosts – shuffle the list of connections upon arrival to avoid a dog-piling effect across processes
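The exponential backoff quoted in the class description, default_timeout * 2 ** (fail_count - 1), can be sketched together with the timeout_cutoff cap. The default values of 60 and 5 here are assumptions for illustration; the real defaults come from the constructor:

```python
def dead_timeout(fail_count, default_timeout=60, timeout_cutoff=5):
    """Seconds a connection stays retired after its N-th consecutive failure.

    The timeout doubles on each consecutive failure until timeout_cutoff
    failures, after which it stops growing.
    """
    return default_timeout * 2 ** (min(fail_count, timeout_cutoff) - 1)

for n in range(1, 7):
    print(n, dead_timeout(n))  # 60, 120, 240, 480, 960, 960
```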
- close()¶
Explicitly closes connections
- get_connection()¶
Return a connection from the pool using the ConnectionSelector instance.
It tries to resurrect eligible connections, forces a resurrection when no connections are available and passes the list of live connections to the selector instance to choose from.
Returns a connection instance and its current fail count.
- mark_dead(connection, now=None)¶
Mark the connection as dead (failed). Remove it from the live pool and put it on a timeout.
- Parameters
connection – the failed instance
- mark_live(connection)¶
Mark connection as healthy after a resurrection. Resets the fail counter for the connection.
- Parameters
connection – the connection to redeem
- resurrect(force=False)¶
Attempt to resurrect a connection from the dead pool. It will try to locate one (not all) eligible connection (its timeout is over) to return to the live pool. Any resurrected connection is also returned.
- Parameters
force – resurrect a connection even if there is none eligible (used when we have no live connections). If force is specified resurrect always returns a connection.
Connection Selector¶
- class elasticsearch.ConnectionSelector(opts)¶
Simple class used to select a connection from a list of currently live connection instances. At initialization time it is passed a dictionary containing all the connections' options, which it can then use during the selection process. When the select method is called it is given a list of currently live connections to choose from.
The options dictionary is the one that has been passed to Transport as the hosts param and the same that is used to construct the Connection object itself. When the Connection was created from information retrieved from the cluster via the sniffing process, it will be the dictionary returned by the host_info_callback.
An example of where this would be useful is a zone-aware selector that would only select connections from its own zones and only fall back to other connections when there would be none in its zones.
- Parameters
opts – dictionary of connection instances and their options
- select(connections)¶
Select a connection from the given list.
- Parameters
connections – list of live connections to choose from
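The zone-aware selector mentioned above can be sketched as follows. To keep the example self-contained and runnable it does not import elasticsearch; a real implementation would subclass elasticsearch.ConnectionSelector. The "zone" key and the zone names are assumptions, standing in for whatever extra options you include in your hosts entries:

```python
import random

class ZoneAwareSelector:
    """Prefer connections in our own zone, fall back to any live connection."""

    def __init__(self, opts):
        # opts maps each connection to the options dict it was created with;
        # we assume a custom "zone" key was included in the hosts entries.
        self.opts = opts
        self.my_zone = "us-east-1a"  # assumed; a real selector would detect this

    def select(self, connections):
        same_zone = [
            c for c in connections if self.opts[c].get("zone") == self.my_zone
        ]
        return random.choice(same_zone or connections)

# Plain strings stand in for Connection instances in this sketch.
conns = ["node1", "node2", "node3"]
opts = {
    "node1": {"zone": "us-east-1a"},
    "node2": {"zone": "us-east-1b"},
    "node3": {"zone": "us-east-1a"},
}
selector = ZoneAwareSelector(opts)
print(selector.select(conns))
```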
Urllib3HttpConnection (default connection_class)¶
If you have complex SSL logic for connecting to Elasticsearch, using an SSLContext object might be more helpful. You can create one natively using the Python ssl library with the create_default_context (https://docs.python.org/3/library/ssl.html#ssl.create_default_context) method.
To create an SSLContext object you only need to use one of cafile, capath or cadata:
>>> from ssl import create_default_context
>>> context = create_default_context(cafile=None, capath=None, cadata=None)
cafile is the path to your CA file
capath is the directory containing a collection of CA certificates
cadata is either an ASCII string of one or more PEM-encoded certificates or a bytes-like object of DER-encoded certificates.
Please note that the use of SSLContext is only available for urllib3.
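Putting the pieces together: create a context from the system's default CA store and hand it to the client via the ssl_context parameter. The client construction is shown as a comment because it needs a running cluster; the URL is a placeholder:

```python
import ssl

# No cafile/capath/cadata given, so the system default CA store is used.
context = ssl.create_default_context()
context.check_hostname = True
context.verify_mode = ssl.CERT_REQUIRED

# from elasticsearch import Elasticsearch
# client = Elasticsearch("https://localhost:9200", ssl_context=context)
print(context.verify_mode)
```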
- class elasticsearch.Urllib3HttpConnection(host='localhost', port=None, http_auth=None, use_ssl=False, verify_certs=<object object>, ssl_show_warn=<object object>, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, **kwargs)¶
Default connection class using the urllib3 library and the http protocol.
- Parameters
host – hostname of the node (default: localhost)
port – port to use (integer, default: 9200)
url_prefix – optional url prefix for elasticsearch
timeout – default timeout in seconds (float, default: 10)
http_auth – optional http auth information as either ‘:’ separated string or a tuple
use_ssl – use ssl for the connection if True
verify_certs – whether to verify SSL certificates
ssl_show_warn – show warning when verify certs is disabled
ca_certs – optional path to CA bundle. See https://urllib3.readthedocs.io/en/latest/security.html#using-certifi-with-urllib3 for instructions on how to get a default set
client_cert – path to the file containing the private key and the certificate, or cert only if using client_key
client_key – path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert)
ssl_version – version of the SSL protocol to use. Choices are: SSLv23 (default), SSLv2, SSLv3, TLSv1 (see PROTOCOL_* constants in the ssl module for exact options for your environment).
ssl_assert_hostname – use hostname verification if not False
ssl_assert_fingerprint – verify the supplied certificate fingerprint if not None
maxsize – the number of connections which will be kept open to this host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more information.
headers – any custom http headers to be added to requests
http_compress – Use gzip compression
cloud_id – The Cloud ID from ElasticCloud. Convenient way to connect to cloud instances. Other host connection params will be ignored.
api_key – optional API Key authentication as either base64 encoded string or a tuple.
opaque_id – send this value in the 'X-Opaque-Id' HTTP header for tracing all requests made by this transport
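When api_key is given as an (id, api_key) tuple, the client sends it as an Authorization: ApiKey header whose value is base64(id:api_key). This sketch reproduces that encoding by hand with the standard library; the id and secret values are made up:

```python
import base64

api_id, api_secret = "my-id", "my-secret"  # placeholder credentials
token = base64.b64encode(f"{api_id}:{api_secret}".encode("utf-8")).decode("ascii")
headers = {"Authorization": f"ApiKey {token}"}
print(headers["Authorization"])
```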
- close()¶
Explicitly closes connection
API Compatibility HTTP Header¶
The Python client can be configured to emit an HTTP header Accept: application/vnd.elasticsearch+json; compatible-with=7 which signals to Elasticsearch that the client is requesting the 7.x version of request and response bodies. This allows for upgrading from 7.x to 8.x versions of Elasticsearch without upgrading everything at once. After the compatibility header is configured, Elasticsearch should be upgraded first and clients should be upgraded second.
from elasticsearch import Elasticsearch

client = Elasticsearch(
    "http://...",
    headers={"accept": "application/vnd.elasticsearch+json; compatible-with=7"},
)
If you’d like to have the client emit the header without configuring headers you can use the environment variable ELASTIC_CLIENT_APIVERSIONING=1.