bibtexparser: API

bibtexparser — Parsing and writing BibTeX files

BibTeX is a bibliographic data file format.

The bibtexparser module can parse and write BibTeX files. The API is similar to that of the json module. Parsed data is returned as a simple BibDatabase object whose main attribute, entries, holds the bibliographic sources, such as books and journal articles.
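
For readers who have not used the json module, the analogy is that json splits its entry points the same way: load and dump work on file objects, while loads and dumps work on strings. The stdlib-only sketch below shows that pattern with json itself; the dictionary contents are purely illustrative.

```python
import io
import json

# Illustrative data only; any JSON-serializable dict would do.
data = {"ID": "example2020", "ENTRYTYPE": "book"}

# String variants: dumps() returns text, loads() reads it back.
text = json.dumps(data)
from_string = json.loads(text)

# File variants: dump() writes to a file object, load() reads from one.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)
```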

The following functions provide a quick and basic way to manipulate a BibTeX file. More advanced features are also available in this module.

Parsing a file is as simple as:

import bibtexparser
with open('bibtex.bib') as bibtex_file:
    bibtex_database = bibtexparser.load(bibtex_file)

And writing:

import bibtexparser
with open('bibtex.bib', 'w') as bibtex_file:
    bibtexparser.dump(bibtex_database, bibtex_file)

bibtexparser.dump(bib_database, bibtex_file, writer=None)

Dump BibDatabase object as a BibTeX text file

Parameters
  • bib_database (BibDatabase) – bibliographic database object

  • bibtex_file (file) – file to write to

  • writer (BibTexWriter) – custom writer to use (optional)

Example:

import bibtexparser
with open('bibtex.bib', 'w') as bibtex_file:
    bibtexparser.dump(bibtex_database, bibtex_file)

bibtexparser.dumps(bib_database, writer=None)

Dump BibDatabase object to a BibTeX string

Parameters
  • bib_database (BibDatabase) – bibliographic database object

  • writer (BibTexWriter) – custom writer to use (optional)

Returns

BibTeX string

Return type

unicode

bibtexparser.load(bibtex_file, parser=None)

Load BibDatabase object from a file

Parameters
  • bibtex_file (file) – input file to be parsed

  • parser (BibTexParser) – custom parser to use (optional)

Returns

bibliographic database object

Return type

BibDatabase

Example:

import bibtexparser
with open('bibtex.bib') as bibtex_file:
    bibtex_database = bibtexparser.load(bibtex_file)

bibtexparser.loads(bibtex_str, parser=None)

Load BibDatabase object from a string

Parameters
  • bibtex_str (str or unicode) – input BibTeX string to be parsed

  • parser (BibTexParser) – custom parser to use (optional)

Returns

bibliographic database object

Return type

BibDatabase

bibtexparser.bibdatabase — The bibliographic database object

class bibtexparser.bibdatabase.BibDatabase

Bibliographic database object that follows the data structure of a BibTeX file.

comments

List of BibTeX comment (@comment{…}) blocks.

entries

List of BibTeX entries, for example @book{…}, @article{…}, etc. Each entry is a simple dict of BibTeX field-value pairs, for example 'author': 'Bird, R.B. and Armstrong, R.C. and Hassager, O.'. Each entry always has the following dict keys (in addition to other BibTeX fields):

  • ID (BibTeX key)

  • ENTRYTYPE (entry type in lowercase, e.g. book, article etc.)

property entries_dict

Return a dictionary of BibTeX entries, keyed by the BibTeX entry key.
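
As a rough picture of how entries and entries_dict relate, the sketch below builds the same kind of mapping by hand from plain dicts. It does not call the library; the second entry (doe2020) is made up for illustration, and entries_by_key is a hypothetical name.

```python
# Two illustrative entries in the shape described above: plain dicts that
# always carry 'ID' and 'ENTRYTYPE' next to the ordinary BibTeX fields.
entries = [
    {"ID": "bird1987", "ENTRYTYPE": "book",
     "author": "Bird, R.B. and Armstrong, R.C. and Hassager, O."},
    {"ID": "doe2020", "ENTRYTYPE": "article", "author": "Doe, J."},
]

# A mapping from BibTeX key to entry, analogous to what entries_dict returns.
entries_by_key = {entry["ID"]: entry for entry in entries}

book = entries_by_key["bird1987"]
```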

preambles

List of BibTeX preamble (@preamble{…}) blocks.

strings

OrderedDict of BibTeX string definitions (@string{…}). In order of definition.

bibtexparser.bparser — Tune the default parser

class bibtexparser.bparser.BibTexParser(data=None, **args)

A parser for reading BibTeX bibliographic data files.

Example:

import bibtexparser
from bibtexparser.bparser import BibTexParser

bibtex_str = ...

parser = BibTexParser()
parser.ignore_nonstandard_types = False
parser.homogenize_fields = False
parser.common_strings = False
bib_database = bibtexparser.loads(bibtex_str, parser)

Parameters
  • customization – function or None (default). Customization to apply to parsed entries.

  • ignore_nonstandard_types – bool (default True). If True, ignore non-standard BibTeX entry types.

  • homogenize_fields – bool (default False). If True, apply common field name replacements (as set in the alt_dict attribute).

  • interpolate_strings – bool (default True). If True, replace BibTeX strings by their values; otherwise use BibDataString objects.

  • common_strings – bool (default False). If True, include common string definitions (e.g. month abbreviations) in the BibTeX file.

  • add_missing_from_crossref – bool (default False). If True, resolve BibTeX references set in the crossref field of BibTeX entries and add the fields from the referenced entry to the referencing entry.
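
To make the add_missing_from_crossref behaviour concrete, here is a minimal stdlib-only sketch of the idea on plain entry dicts. The function name is hypothetical and this is not the library's implementation; it assumes that fields already present on the referencing entry win, and it does not follow chains of cross-references.

```python
def add_missing_from_crossref_sketch(entries):
    """Copy fields from the entry named in 'crossref' into the referencing entry.

    Fields already present on the referencing entry are left untouched.
    """
    by_key = {entry["ID"]: entry for entry in entries}
    for entry in entries:
        parent = by_key.get(entry.get("crossref"))
        if parent is None:
            continue  # no crossref field, or it points at an unknown key
        for field, value in parent.items():
            if field in ("ID", "ENTRYTYPE"):
                continue  # identity fields belong to each entry
            entry.setdefault(field, value)
    return entries
```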

common_strings

Load common strings such as month abbreviations. Default: False.

customization

Callback function to process BibTeX entries after parsing, for example to create a list from a string with multiple values. By default all BibTeX values are treated as simple strings. Default: None.
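
A customization callback takes one entry dict and returns it, possibly modified. The sketch below is one hypothetical callback (split_authors is not part of the library) that turns the author string into a list; it runs on a plain dict so it can be tried without parsing anything.

```python
def split_authors(record):
    """Illustrative customization: turn the 'author' string into a list."""
    if "author" in record:
        record["author"] = [name.strip() for name in record["author"].split(" and ")]
    return record


record = {"ID": "bird1987", "ENTRYTYPE": "book",
          "author": "Bird, R.B. and Armstrong, R.C. and Hassager, O."}
record = split_authors(record)

# Such a function would be assigned to the parser's customization attribute
# documented above, e.g. parser.customization = split_authors.
```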

homogenize_fields

Sanitize BibTeX field names, for example change url to link etc. Field names are always converted to lowercase names. Default: False.

ignore_nonstandard_types

Ignore non-standard BibTeX entry types, that is, types other than the standard ones (book, article, etc.). Default: True.

interpolate_strings

Interpolate BibTeX strings or keep the structure. Default: True.

parse(bibtex_str, partial=False)

Parse a BibTeX string into an object

Parameters
  • bibtex_str (str or unicode) – BibTeX string

  • partial (boolean) – If True, print errors only on parsing failures. If False, an exception is raised.

Returns

bibliographic database

Return type

BibDatabase

parse_file(file, partial=False)

Parse a BibTeX file into an object

Parameters
  • file (file) – BibTeX file or file-like object

  • partial (boolean) – If True, print errors only on parsing failures. If False, an exception is raised.

Returns

bibliographic database

Return type

BibDatabase

bibtexparser.customization — Functions to customize records

A set of functions useful for customizing BibTeX fields. They can also serve as inspiration for writing your own. Each of them takes a record and returns the modified record.

bibtexparser.customization.add_plaintext_fields(record)

For each field in the record, add a plain_ field containing the plain text, stripped of braces and similar. See https://github.com/sciunto-org/python-bibtexparser/issues/116.

Parameters

record (dict) – the record.

Returns

dict – the modified record.
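
The idea can be sketched with the standard library alone. The function below is a hypothetical simplification, not the library's code: it only handles string values and only strips curly braces, whereas the description above also mentions "similar" markup.

```python
def add_plaintext_fields_sketch(record):
    """Add a 'plain_<field>' copy of every string field with braces removed."""
    for key in list(record):  # snapshot the keys, since the dict grows below
        value = record[key]
        if isinstance(value, str):
            record["plain_" + key] = value.replace("{", "").replace("}", "")
    return record
```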

bibtexparser.customization.author(record)

Split the author field into a list of "Surname, Firstname" strings.

Parameters

record (dict) – the record.

Returns

dict – the modified record.

bibtexparser.customization.convert_to_unicode(record)

Convert accents from LaTeX to Unicode style.

Parameters

record (dict) – the record.

Returns

dict – the modified record.

bibtexparser.customization.doi(record)

Parameters

record (dict) – the record.

Returns

dict – the modified record.

bibtexparser.customization.editor(record)

Turn the editor field into a dict composed of the original editor name and an editor id (without commas or blanks).

Parameters

record (dict) – the record.

Returns

dict – the modified record.

bibtexparser.customization.getnames(names)

Convert people's names to the form "surname, firstnames" or "surname, initials".

Parameters

names (list) – a list of names

Returns

list – Correctly formatted names

Note

This function is known to be too simple to properly handle the complex rules of BibTeX names. We would like to enhance this in forthcoming releases.

bibtexparser.customization.homogenize_latex_encoding(record)

Homogenize the LaTeX encoding style for BibTeX.

This function is experimental.

Parameters

record (dict) – the record.

Returns

dict – the modified record.

bibtexparser.customization.journal(record)

Turn the journal field into a dict composed of the original journal name and a journal id (without commas or blanks).

Parameters

record (dict) – the record.

Returns

dict – the modified record.
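
A stdlib-only sketch of that transformation follows. The function name and the dict keys 'name' and 'ID' are assumptions made for illustration; check the library source for the keys it actually uses.

```python
def journal_sketch(record):
    """Replace the journal string by a dict holding the name and a compact id."""
    if "journal" in record:
        name = record["journal"]
        # Key names 'name' and 'ID' are assumptions made for this sketch.
        record["journal"] = {
            "name": name,
            "ID": name.replace(",", "").replace(" ", ""),
        }
    return record
```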

bibtexparser.customization.keyword(record, sep=',|;')

Split keyword field into a list.

Parameters
  • record (dict) – the record.

  • sep – pattern used for the splitting regexp.

Returns

dict – the modified record.
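
Since sep is a regular expression, the default ',|;' splits on either a comma or a semicolon. A minimal sketch of that splitting on a plain dict, with a hypothetical function name and whitespace trimming added as an assumption:

```python
import re


def keyword_sketch(record, sep=",|;"):
    """Split the 'keyword' field on the regular expression sep."""
    if "keyword" in record:
        record["keyword"] = [k.strip() for k in re.split(sep, record["keyword"])]
    return record
```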

bibtexparser.customization.page_double_hyphen(record)

Separate pages by a double hyphen (--).

Parameters

record (dict) – the record.

Returns

dict – the modified record.
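
BibTeX conventionally writes page ranges with two ASCII hyphens. As a rough illustration, the sketch below normalizes a pages field with a single regular expression; the function name is hypothetical, and the set of dash characters it recognizes (hyphen, en dash, em dash) is an assumption rather than the library's exact list.

```python
import re


def page_double_hyphen_sketch(record):
    """Normalize the separator in the 'pages' field to a double hyphen."""
    if "pages" in record:
        # Collapse any run of hyphens or dashes (with surrounding spaces) to '--'.
        record["pages"] = re.sub(r"\s*[-\u2013\u2014]+\s*", "--", record["pages"])
    return record
```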

bibtexparser.customization.splitname(name, strict_mode=True)

Break a name into its constituent parts: First, von, Last, and Jr.

Parameters
  • name (string) – a string containing a single name

  • strict_mode (Boolean) – whether to use strict mode

Returns

dictionary of constituent parts

Raises

customization.InvalidName – If an invalid name is given and strict_mode = True.

In BibTeX, a name can be represented in any of three forms:
  • First von Last

  • von Last, First

  • von Last, Jr, First

This function attempts to split a given name into its four parts. The returned dictionary has keys of first, last, von and jr. Each value is a list of the words making up that part; this may be an empty list. If the input has no non-whitespace characters, a blank dictionary is returned.

It is capable of detecting some errors with the input name. If the strict_mode parameter is True, which is the default, this results in a customization.InvalidName exception being raised. If it is False, the function continues, working around the error as best it can. The errors that can be detected are listed below along with the handling for non-strict mode:

  • Name finishes with a trailing comma: delete the comma

  • Too many parts (e.g., von Last, Jr, First, Error): merge extra parts into First

  • Unterminated opening brace: add closing brace to end of input

  • Unmatched closing brace: add opening brace at start of word
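
The three name forms differ in the number of commas that appear outside braces, which gives a simple way to tell them apart. The helper below is a hypothetical, stdlib-only sketch of that first classification step; it is not splitname itself, it does not split the name into parts, and it treats a trailing comma as an ordinary comma.

```python
def name_form(name):
    """Classify a BibTeX name by counting commas outside braces."""
    depth = 0
    commas = 0
    for char in name:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        elif char == "," and depth == 0:
            commas += 1
    forms = {0: "First von Last", 1: "von Last, First", 2: "von Last, Jr, First"}
    return forms.get(commas, "too many parts")
```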

bibtexparser.customization.type(record)

Put the type into lower case.

Parameters

record (dict) – the record.

Returns

dict – the modified record.

Exception classes

class bibtexparser.customization.InvalidName

Exception raised by customization.splitname() when an invalid name is input.

bibtexparser.bwriter — Tune the default writer

class bibtexparser.bwriter.BibTexWriter(write_common_strings=False)

Writer to convert a BibDatabase object to a string or file formatted as a BibTeX file.

Example:

import bibtexparser
from bibtexparser.bwriter import BibTexWriter

bib_database = ...

writer = BibTexWriter()
writer.contents = ['comments', 'entries']
writer.indent = '  '
writer.order_entries_by = ('ENTRYTYPE', 'author', 'year')
bibtex_str = bibtexparser.dumps(bib_database, writer)

add_trailing_comma

BibTeX syntax allows the comma at the end of the last field in an entry to be omitted. Use this to enable writing this last comma in the bwriter output. Default: False.

comma_first

BibTeX syntax allows comma-first syntax (common in functional languages). Use this to enable comma-first syntax in the bwriter output.

common_strings

Whether common strings are written.

contents

List of BibTeX elements to write, valid values are entries, comments, preambles, strings.

display_order

Tuple of fields for display order in a single BibTeX entry. Fields not listed here will be displayed alphabetically at the end. Set to [] for alphabetical order. Default: [].
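
The ordering rule (listed fields first, the remainder alphabetically) can be sketched in a few lines of plain Python. ordered_fields is a hypothetical helper, not the writer's code; it assumes ID and ENTRYTYPE are written in the entry header and so are left out of the field list.

```python
def ordered_fields(entry, display_order=()):
    """Return field names: those in display_order first, the rest alphabetically."""
    # Assumption: 'ID' and 'ENTRYTYPE' form the entry header, not ordinary fields.
    fields = [f for f in entry if f not in ("ID", "ENTRYTYPE")]
    listed = [f for f in display_order if f in fields]
    rest = sorted(f for f in fields if f not in listed)
    return listed + rest
```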

entry_separator

Character(s) for separating BibTeX entries. Default: new line.

indent

Character(s) for indenting BibTeX field-value pairs. Default: single space.

order_entries_by

Tuple of fields for ordering BibTeX entries. Set to None to disable sorting. Default: the BibTeX key, ('ID',).
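
Sorting by a tuple of fields amounts to sorting on a composite key. The sketch below shows this on plain entry dicts; sort_entries is a hypothetical helper, and treating a missing field as an empty string is an assumption of the sketch rather than documented writer behaviour.

```python
def sort_entries(entries, order_entries_by=("ID",)):
    """Return entries sorted by the given tuple of fields, or unsorted if None."""
    if order_entries_by is None:
        return list(entries)  # sorting disabled: keep the original order

    def sort_key(entry):
        # Missing fields sort as empty strings (an assumption of this sketch).
        return tuple(str(entry.get(field, "")) for field in order_entries_by)

    return sorted(entries, key=sort_key)
```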

write(bib_database)

Converts a bibliographic database to a BibTeX-formatted string.

Parameters

bib_database (BibDatabase) – bibliographic database to be converted to a BibTeX string

Returns

BibTeX-formatted string

Return type

str or unicode

bibtexparser.bibtexexpression — Parser’s core relying on pyparsing

class bibtexparser.bibtexexpression.BibtexExpression

Gives access to pyparsing expressions.

Attributes are pyparsing expressions for the following elements:

  • main_expression: the bibtex file

  • string_def: a string definition

  • preamble_decl: a preamble declaration

  • explicit_comment: an explicit comment

  • entry: an entry definition

  • implicit_comment: an implicit comment

exception ParseException(pstr: str, loc: int = 0, msg: Optional[str] = None, elem=None)

Exception thrown when a parse expression doesn’t match the input string

Example:

from pyparsing import ParseException, Word, nums

try:
    Word(nums).set_name("integer").parse_string("ABC")
except ParseException as pe:
    print(pe)
    print("column: {}".format(pe.column))

prints:

Expected integer (at char 0), (line:1, col:1)
column: 1

add_log_function(log_fun)

Add notice to logger on entry, comment, preamble, string definitions.

Parameters

log_fun – logger function

set_string_expression_parse_action(fun)

Set the parseAction for the string_expression expression.

Note

See set_string_name_parse_action.

set_string_name_parse_action(fun)

Set the parseAction for the string_name expression.

Note

For some reason pyparsing duplicates the string_name expression so setting its parseAction a posteriori has no effect in the context of a string expression. This is why this function should be used instead.

bibtexparser.bibtexexpression.add_logger_parse_action(expr, log_func)

Register a callback on expression parsing with the adequate message.

bibtexparser.bibtexexpression.field_to_pair(string_, location, token)

Looks for parsed element named 'Field'.

Returns

(name, value).

bibtexparser.bibtexexpression.in_braces_or_pars(exp)

exp -> (exp)|{exp}

bibtexparser.bibtexexpression.strip_after_new_lines(s)

Removes leading and trailing whitespace in all but the first line.

Parameters

s – string or BibDataStringExpression
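
For plain strings, the described behaviour can be sketched as follows. This is a hypothetical stand-in that follows the description above literally; it does not handle BibDataStringExpression inputs, and the library's own function may differ in detail.

```python
def strip_after_new_lines_sketch(s):
    """Strip whitespace around every line except the first, which is kept as is."""
    lines = s.splitlines()
    return "\n".join(lines[:1] + [line.strip() for line in lines[1:]])
```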