Module epydoc.docparser

Extract API documentation about Python objects by parsing their source code.

The function parse_docs(), which provides the main interface of this module, reads and parses the Python source code for a module, and uses it to create an APIDoc object containing the API documentation for the variables and values defined in that module.

Currently, parse_docs() extracts documentation from the following source code constructions:

parse_docs() does not yet support the following source code constructions:

By default, parse_docs() will explore the contents of top-level try and if blocks. If desired, parse_docs() can also be configured to explore the contents of while and for blocks. (See the configuration constants, below.)


To Do: Make it possible to extend the functionality of parse_docs(), by replacing process_line with a dispatch table that can be customized (similarly to docintrospector.register_introspector()).

Classes
  ParseError
An exception that is used to signify that docparser encountered syntactically invalid Python code while processing a Python source file.
Functions
    Module parser
  parse_docs(filename=None, name=None, context=None, is_script=False)
Generate the API documentation for a specified object by parsing Python source files, and return it as a ValueDoc.
Returns: ValueDoc
  _parse_package(package_dir)
If the given directory is a package directory, then parse its __init__.py file (and the __init__.py files of all ancestor packages); and return its ModuleDoc.
  handle_special_module_vars(module_doc)
  _module_var_toktree(module_doc, name)
    Module Lookup
  _find(name, package_doc=None)
Return the API documentation for the object whose name is name.
  _is_submodule_import_var(module_doc, var_name)
Return true if var_name is the name of a variable in module_doc that just contains an imported_from link to a submodule of the same name.
  _find_in_namespace(name, namespace_doc)
  _get_filename(identifier, path=None)
    File tokenization loop
  process_file(module_doc)
Read the given ModuleDoc's file, and add variables corresponding to any objects defined in that file.
  add_to_group(container, api_doc, group_name)
  script_guard(line)
Detect the idiomatic trick if __name__ == "__main__":
    Shallow parser
  shallow_parse(line_toks)
Given a flat list of tokens, return a nested tree structure (called a token tree), whose leaves are identical to the original list, but whose structure reflects the structure implied by the grouping tokens (i.e., parentheses, braces, and brackets).
    Line processing
  process_line(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
Returns: new-doc, decorator..?
  process_control_flow_line(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
  process_import(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
  process_from_import(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
  _process_fromstar_import(src, parent_docs)
Handle a statement of the form from <src> import *.
  _import_var(name, parent_docs)
Handle a statement of the form import <name>.
  _import_var_as(src, name, parent_docs)
Handle a statement of the form import src as name.
  _add_import_var(src, name, container)
Add a new imported variable named name to container, with imported_from=src.
  _global_name(name, parent_docs)
If the given name is package-local (relative to the current context, as determined by parent_docs), then convert it to a global name.
  process_assignment(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
  lhs_is_instvar(lhs_pieces, parent_docs)
  rhs_to_valuedoc(rhs, parent_docs)
  get_lhs_parent(lhs_name, parent_docs)
  process_one_line_block(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for single-line blocks.
  process_multi_stmt(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for semicolon-separated statements.
  process_del(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for delete statements.
  process_docstring(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for bare string literals.
  process_funcdef(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for function declaration lines.
  apply_decorator(decorator_name, func_doc)
  init_arglist(func_doc, arglist)
  process_classdef(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)
The line handler for class declaration lines.
  _proxy_base(**attribs)
  find_base(name, parent_docs)
    Parsing
  dotted_names_in(elt_list)
Return a list of all simple dotted names in the given expression.
  parse_name(elt, strip_parens=False)
If the given token tree element is a name token, then return that name as a string.
  parse_dotted_name(elt_list, strip_parens=True, parent_name=None)
  split_on(elt_list, split_tok)
  parse_funcdef_arg(elt)
If the given token tree element contains a valid function definition argument (i.e., an identifier token or nested list of identifiers), then return a corresponding string identifier or nested list of string identifiers.
  parse_classdef_bases(elt)
If the given token tree element contains a valid base list (that contains only dotted names), then return a corresponding list of DottedNames.
  parse_dotted_name_list(elt_list)
If the given list of token tree elements contains a comma-separated list of dotted names, then return a corresponding list of DottedName objects.
  parse_string(elt_list)
  parse_string_list(elt_list)
    Variable Manipulation
  set_variable(namespace, var_doc, preserve_docstring=False)
Add var_doc to namespace.
  del_variable(namespace, name)
    Name Lookup
  lookup_name(identifier, parent_docs)
Find and return the documentation for the variable named by the given identifier.
Returns: VariableDoc or None
  lookup_variable(dotted_name, parent_docs)
  lookup_value(dotted_name, parent_docs)
Find and return the documentation for the value contained in the variable with the given name in the current namespace.
    Docstring Comments
  add_docstring_from_comments(api_doc, comments)
    Tree tokens
  _join_toktree(s1, s2)
  _pp_toktree_add_piece(spacing, pieces, piece)
  pp_toktree(elts, spacing='normal', indent=0)
  _pp_toktree(elts, spacing, indent, pieces)
    Helper Functions
  get_module_encoding(filename)
  _get_module_name(filename, package_doc)
Return (dotted_name, is_package)
  flatten(lst, out=None)
Returns: a flat list containing the leaves of the given nested list.
Variables
dict _moduledoc_cache = {'/home/edloper/newdata/projects/docutils/d...
A cache of ModuleDocs that we've already created.
    Configuration Constants: Control Flow
  PARSE_TRY_BLOCKS = True
Should the contents of try blocks be examined?
  PARSE_EXCEPT_BLOCKS = True
Should the contents of except blocks be examined?
  PARSE_FINALLY_BLOCKS = True
Should the contents of finally blocks be examined?
  PARSE_IF_BLOCKS = True
Should the contents of if blocks be examined?
  PARSE_ELSE_BLOCKS = True
Should the contents of else and elif blocks be examined?
  PARSE_WHILE_BLOCKS = False
Should the contents of while blocks be examined?
  PARSE_FOR_BLOCKS = False
Should the contents of for blocks be examined?
    Configuration Constants: Imports
  IMPORT_HANDLING = 'link'
What should docparser do when it encounters an import statement?
  IMPORT_STAR_HANDLING = 'parse'
When docparser encounters a 'from m import *' statement, and is unable to parse m (either because IMPORT_HANDLING='link', or because parsing failed), how should it determine the list of identifiers exported by m?
  DEFAULT_DECORATOR_BEHAVIOR = 'transparent'
When docparser encounters an unknown decorator, what should it do to the documentation of the decorated function?
  BASE_HANDLING = 'parse'
What should docparser do when it encounters a base class that was imported from another module?
    Configuration Constants: Comment docstrings
  COMMENT_DOCSTRING_MARKER = '#:'
The prefix used to mark comments that contain attribute docstrings for variables.
    Configuration Constants: Grouping
  START_GROUP_MARKER = '#{'
The prefix used to mark a comment that starts a group.
  END_GROUP_MARKER = '#}'
The prefix used to mark a comment that ends a group.
    Line processing
  CONTROL_FLOW_KEYWORDS = ['if', 'elif', 'else', 'while', 'for',...
A list of the control flow keywords.
Function Details

parse_docs(filename=None, name=None, context=None, is_script=False)

Generate the API documentation for a specified object by parsing Python source files, and return it as a ValueDoc. The object to generate documentation for may be specified using either the filename parameter or the name parameter. (It is an error to specify both a filename and a name, or to specify neither.)

Parameters:
  • filename - The name of the file that contains the Python source code for a package, module, or script. If filename is specified, then parse_docs() will return a ModuleDoc describing its contents.
  • name - The fully-qualified Python dotted name of any value (including packages, modules, classes, and functions). parse_docs() will automatically figure out which module(s) it needs to parse in order to find the documentation for the specified object.
  • context - The API documentation for the package that contains filename. If no context is given, then filename is assumed to contain a top-level module or package. It is an error to specify a context if the name argument is used.
Returns: ValueDoc
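
The argument rules stated above can be summarized in a small check. This is only a sketch: check_parse_docs_args is a hypothetical helper, and raising ValueError is an assumption rather than the exception that parse_docs() itself necessarily uses.

```python
def check_parse_docs_args(filename=None, name=None, context=None):
    """Hypothetical helper mirroring the documented argument rules."""
    # Exactly one of filename and name must be given.
    if filename is None and name is None:
        raise ValueError("specify either a filename or a name")
    if filename is not None and name is not None:
        raise ValueError("specify a filename or a name, not both")
    # A context is only meaningful together with a filename.
    if name is not None and context is not None:
        raise ValueError("context may not be combined with name")
    # Report which parameter selects the object to document.
    return "filename" if filename is not None else "name"
```

Under these assumptions, check_parse_docs_args(filename="mod.py") is accepted, while passing both filename and name is rejected.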

_find(name, package_doc=None)

Return the API documentation for the object whose name is name. package_doc, if specified, is the API documentation for the package containing the named object.

_is_submodule_import_var(module_doc, var_name)

Return true if var_name is the name of a variable in module_doc that just contains an imported_from link to a submodule of the same name. (I.e., is a variable created when a package imports one of its own submodules.)

process_file(module_doc)

Read the given ModuleDoc's file, and add variables corresponding to any objects defined in that file. In particular, read and tokenize module_doc.filename, and process each logical line using process_line().

shallow_parse(line_toks)

Given a flat list of tokens, return a nested tree structure (called a token tree), whose leaves are identical to the original list, but whose structure reflects the structure implied by the grouping tokens (i.e., parentheses, braces, and brackets). If the parentheses, braces, and brackets do not match, or are not balanced, then raise a ParseError.

In other words, assign some structure to a flat sequence of tokens by grouping on parentheses, braces, and brackets.
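
The grouping step can be illustrated with a short stack-based sketch. It is not the module's implementation: tokens are assumed to be plain strings rather than whatever token objects docparser uses, and shallow_parse_sketch and the local ParseError class are stand-ins.

```python
class ParseError(Exception):
    """Stand-in for the ParseError described in this module."""

OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())

def shallow_parse_sketch(tokens):
    """Group a flat list of string tokens into a nested token tree."""
    root = []
    stack = [root]   # stack[-1] is the subtree currently being filled
    expected = []    # closing tokens that are still outstanding
    for tok in tokens:
        if tok in OPENERS:
            subtree = [tok]
            stack[-1].append(subtree)
            stack.append(subtree)
            expected.append(OPENERS[tok])
        elif tok in CLOSERS:
            # The closer must match the most recently opened group.
            if not expected or expected.pop() != tok:
                raise ParseError("unbalanced %r" % tok)
            stack.pop().append(tok)
        else:
            stack[-1].append(tok)
    if expected:
        raise ParseError("missing %r" % expected[-1])
    return root
```

Each group keeps its own opening and closing token, so flattening the tree gives back the original token list.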

process_line(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

Returns:
new-doc, decorator..?

_process_fromstar_import(src, parent_docs)

Handle a statement of the form:

>>> from <src> import *

If IMPORT_HANDLING is 'parse', then first try to parse the module <src>, and copy all of its exported variables to parent_docs[-1].

Otherwise, try to determine the names of the variables exported by <src>, and create a new variable for each export. If IMPORT_STAR_HANDLING is 'parse', then the list of exports is found by parsing <src>; if it is 'introspect', then the list of exports is found by importing and introspecting <src>.

_import_var(name, parent_docs)

Handle a statement of the form:

>>> import <name>

If IMPORT_HANDLING is 'parse', then first try to find the value by parsing; and create an appropriate variable in parentdoc.

Otherwise, add a variable for the imported variable. (More than one variable may be created for cases like 'import a.b', where we need to create a variable 'a' in parentdoc containing a proxy module, and a variable 'b' in the proxy module.)

_import_var_as(src, name, parent_docs)

Handle a statement of the form:

>>> import src as name

If IMPORT_HANDLING is 'parse', then first try to find the value by parsing; and create an appropriate variable in parentdoc.

Otherwise, create a variable with its imported_from attribute pointing to the imported object.

process_one_line_block(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for single-line blocks, such as:

>>> def f(x): return x*2

This handler calls process_line twice: once for the tokens up to and including the colon, and once for the remaining tokens. The comment docstring is applied to the first line only.

Returns:
None
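
The split at the colon can be sketched as follows. The sketch assumes the line has already been turned into a token tree of plain strings, with bracketed groups as nested lists, so any colon inside parentheses is not visible at the top level. split_one_line_block is a hypothetical name.

```python
def split_one_line_block(line):
    """Split a token tree into (header, body) at its first top-level colon."""
    for i, tok in enumerate(line):
        # Nested groups are lists, so they never compare equal to ":".
        if tok == ":":
            return line[:i + 1], line[i + 1:]
    raise ValueError("no top-level colon found")

# Token tree for:  def f(x): return x*2
line = ["def", "f", ["(", "x", ")"], ":", "return", "x", "*", "2"]
header, body = split_one_line_block(line)
```

The header (up to and including the colon) and the body would then each be handed to the line processor in turn.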

process_multi_stmt(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for semicolon-separated statements, such as:

>>> x=1; y=2; z=3

This handler calls process_line once for each statement. The comment docstring is not passed on to any of the sub-statements.

Returns:
None
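
Splitting a line into its statements might look like the sketch below. It assumes string tokens and a top-level token list; split_on_sketch is a hypothetical name, and dropping a trailing empty statement (as in "x=1;") is a choice made for this sketch, not documented behavior of split_on.

```python
def split_on_sketch(tokens, sep):
    """Split a top-level token list into sub-lists at each `sep` token."""
    pieces = [[]]
    for tok in tokens:
        if tok == sep:
            pieces.append([])      # start collecting the next statement
        else:
            pieces[-1].append(tok)
    # A trailing separator leaves an empty final piece; drop it.
    if not pieces[-1]:
        pieces.pop()
    return pieces

# Tokens for:  x=1; y=2; z=3
statements = split_on_sketch(
    ["x", "=", "1", ";", "y", "=", "2", ";", "z", "=", "3"], ";")
```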

process_del(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for delete statements, such as:

>>> del x, y.z

This handler calls del_variable for each dotted variable in the variable list. The variable list may be nested. Complex expressions in the variable list (such as x[3]) are ignored.

Returns:
None

process_docstring(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for bare string literals. If prev_line_doc is not None, then the string literal is added to that APIDoc as a docstring. If it already has a docstring (from comment docstrings), then the new docstring will be appended to the old one.
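
The append rule can be pictured with a tiny sketch. DocSketch and attach_docstring are hypothetical, and both the blank-line separator and the decision to ignore the literal when there is no previous doc are assumptions.

```python
class DocSketch(object):
    """Hypothetical stand-in for an APIDoc with a docstring attribute."""
    def __init__(self, docstring=None):
        self.docstring = docstring

def attach_docstring(prev_line_doc, text):
    """Attach a bare string literal to the doc built from the previous line."""
    if prev_line_doc is None:
        return  # nothing to document; this sketch ignores the literal
    if prev_line_doc.docstring is None:
        prev_line_doc.docstring = text
    else:
        # A comment docstring is already present: append, do not replace.
        prev_line_doc.docstring = prev_line_doc.docstring + "\n\n" + text
```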

process_funcdef(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for function declaration lines, such as:

>>> def f(a, (c,d), b=22):

This handler creates and initializes a new VariableDoc containing a RoutineDoc, adds the VariableDoc to the containing namespace, and returns the RoutineDoc.

process_classdef(line, parent_docs, prev_line_doc, lineno, comments, decorators, encoding)

The line handler for class declaration lines, such as:

>>> class Foo(Bar, Baz):

This handler creates and initializes a new VariableDoc containing a ClassDoc, adds the VariableDoc to the containing namespace, and returns the ClassDoc.

parse_name(elt, strip_parens=False)

If the given token tree element is a name token, then return that name as a string. Otherwise, raise ParseError.

Parameters:
  • strip_parens - If true, and elt is a single name enclosed in parentheses, then return that name.

parse_dotted_name(elt_list, strip_parens=True, parent_name=None)

Parameters:
  • parent_name (DottedName) - canonical name of referring module, to resolve relative imports.

Bug: does not handle 'x.(y).z'

parse_funcdef_arg(elt)

If the given token tree element contains a valid function definition argument (i.e., an identifier token or nested list of identifiers), then return a corresponding string identifier or nested list of string identifiers. Otherwise, raise a ParseError.

parse_classdef_bases(elt)

If the given token tree element contains a valid base list (that contains only dotted names), then return a corresponding list of DottedNames. Otherwise, raise a ParseError.

Bug: Does not handle either of:

   - class A( (base.in.parens) ): pass
   - class B( (lambda:calculated.base)() ): pass

parse_dotted_name_list(elt_list)

If the given list of token tree elements contains a comma-separated list of dotted names, then return a corresponding list of DottedName objects. Otherwise, raise a ParseError.

set_variable(namespace, var_doc, preserve_docstring=False)

Add var_doc to namespace. If namespace already contains a variable with the same name, then discard the old variable. If preserve_docstring is true, then keep the old variable's docstring when overwriting a variable.
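
A minimal sketch of the overwrite rule, assuming the namespace is a plain dict keyed by variable name. VarSketch and set_variable_sketch are hypothetical, and carrying the old docstring over only when the new variable has none of its own is an assumption about how "keep" should be read.

```python
class VarSketch(object):
    """Hypothetical stand-in for a VariableDoc: a name and a docstring."""
    def __init__(self, name, docstring=None):
        self.name = name
        self.docstring = docstring

def set_variable_sketch(namespace, var_doc, preserve_docstring=False):
    """Store var_doc in the dict `namespace`, replacing any same-named entry."""
    old = namespace.get(var_doc.name)
    if (preserve_docstring and old is not None
            and old.docstring is not None and var_doc.docstring is None):
        # Carry the earlier docstring over to the replacement variable.
        var_doc.docstring = old.docstring
    namespace[var_doc.name] = var_doc
```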

get_module_encoding(filename)

See Also: PEP 263
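
For reference, PEP 263 declares the source encoding in a comment on the first or second line of the file. The sketch below applies the regular expression given in the PEP to a list of lines. declared_encoding is a hypothetical helper, byte-order-mark handling is left out, and the "ascii" fallback is PEP 263's original default, not necessarily what get_module_encoding returns.

```python
import re

# Regular expression from PEP 263 for the encoding declaration comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(source_lines, default="ascii"):
    """Return the encoding declared on line 1 or 2, else `default`."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return default
```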

flatten(lst, out=None)

Parameters:
  • lst - The nested list that should be flattened.
Returns:
a flat list containing the leaves of the given nested list.
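
A recursive flattening with this signature can be sketched as below. This is an illustration, not the module's code; treating both lists and tuples as nested containers is an assumption.

```python
def flatten_sketch(lst, out=None):
    """Return a flat list containing the leaves of the nested list `lst`."""
    if out is None:
        out = []  # create a fresh accumulator for each top-level call
    for elt in lst:
        if isinstance(elt, (list, tuple)):
            flatten_sketch(elt, out)  # recurse, sharing the accumulator
        else:
            out.append(elt)
    return out
```

Defaulting out to None, rather than to a list literal, keeps separate top-level calls from sharing one accumulator.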

Variables Details

_moduledoc_cache

A cache of ModuleDocs that we've already created. _moduledoc_cache is a dictionary mapping from filenames to ValueDoc objects.

Type:
dict
Value:
{'/home/edloper/newdata/projects/docutils/docutils/__init__.py': <ModuleDoc docutils>,
 '/home/edloper/newdata/projects/docutils/docutils/nodes.py': <ModuleDoc docutils.nodes>,
 '/home/edloper/newdata/projects/docutils/docutils/utils.py': <ModuleDoc docutils.utils>,
 '/home/edloper/newdata/projects/docutils/docutils/writers/__init__.py': <ModuleDoc docutils.writers>,
...

IMPORT_HANDLING

What should docparser do when it encounters an import statement?

  • 'link': Create VariableDoc objects with imported_from pointers to the source object.
  • 'parse': Parse the imported file, to find the actual documentation for the imported object. (This will fall back to the 'link' behavior if the imported file can't be parsed, e.g., if it's a builtin.)
Value:
'link'

IMPORT_STAR_HANDLING

When docparser encounters a 'from m import *' statement, and is unable to parse m (either because IMPORT_HANDLING='link', or because parsing failed), how should it determine the list of identifiers exported by m?

  • 'ignore': ignore the import statement, and don't create any new variables.
  • 'parse': parse it to find a list of the identifiers that it exports. (This will fall back to the 'ignore' behavior if the imported file can't be parsed, e.g., if it's a builtin.)
  • 'introspect': import the module and introspect it (using dir) to find a list of the identifiers that it exports. (This will fall back to the 'ignore' behavior if the imported file can't be parsed, e.g., if it's a builtin.)
Value:
'parse'
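
To illustrate the 'introspect' option, the sketch below lists the names that a star import would bind for a module that has already been imported. It follows Python's own import rules (use __all__ if defined, otherwise every name without a leading underscore). Whether docparser consults __all__ in this mode is not stated here, so treat that part as an assumption. star_import_names is a hypothetical helper.

```python
import types

def star_import_names(module):
    """Names that `from module import *` would bind, found via introspection."""
    explicit = getattr(module, "__all__", None)
    if explicit is not None:
        return list(explicit)
    # Without __all__, a star import binds every public name.
    return [name for name in dir(module) if not name.startswith("_")]

# A throwaway module object to introspect.
demo = types.ModuleType("demo")
demo.visible = 1
demo._hidden = 2
```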

DEFAULT_DECORATOR_BEHAVIOR

When docparser encounters an unknown decorator, what should it do to the documentation of the decorated function?

  • 'transparent': leave the function's documentation as-is.
  • 'opaque': replace the function's documentation with an empty ValueDoc object, reflecting the fact that we have no knowledge about what value the decorator returns.
Value:
'transparent'

BASE_HANDLING

What should docparser do when it encounters a base class that was imported from another module?

  • 'link': Create a ValueDoc with a proxy_for pointer to the base class.
  • 'parse': Parse the file containing the base class, to find the actual documentation for it. (This will fall back to the 'link' behavior if the imported file can't be parsed, e.g., if it's a builtin.)
Value:
'parse'

START_GROUP_MARKER

The prefix used to mark a comment that starts a group. This marker should be followed (on the same line) by the name of the group. Following a start-group comment, all variables defined at the same indentation level will be assigned to this group name, until the parser reaches the end of the file, a matching end-group comment, or another start-group comment at the same indentation level.

Value:
'#{'

END_GROUP_MARKER

The prefix used to mark a comment that ends a group. See START_GROUP_MARKER.

Value:
'#}'
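
As an illustration, a source file might use the group markers together with the comment docstring marker ('#:') like this. The group names and variables are invented for the example, and placing each '#:' comment directly above its assignment is an assumption about the expected layout.

```python
#{ Network settings

#: Seconds to wait before giving up on a connection.
TIMEOUT = 30

#: Number of times a failed request is retried.
RETRIES = 3

#}

#{ Logging

#: Name of the default log level.
LOG_LEVEL = "INFO"

#}
```

Here TIMEOUT and RETRIES would fall in the group "Network settings" and LOG_LEVEL in "Logging". The second group could also be left unclosed, since a group ends at the end of the file.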

CONTROL_FLOW_KEYWORDS

A list of the control flow keywords. If a line begins with one of these keywords, then it should be handled by process_control_flow_line.

Value:
['if', 'elif', 'else', 'while', 'for', 'try', 'except', 'finally']
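
The relationship between these keywords and the control flow configuration constants can be sketched as a lookup table. The constant values are the defaults listed in this module. The table, the two helper functions, and the use of string tokens are assumptions, and the sketch does not distinguish the else clause of a loop from that of an if statement.

```python
# Defaults copied from the configuration constants listed in this module.
PARSE_TRY_BLOCKS = True
PARSE_EXCEPT_BLOCKS = True
PARSE_FINALLY_BLOCKS = True
PARSE_IF_BLOCKS = True
PARSE_ELSE_BLOCKS = True
PARSE_WHILE_BLOCKS = False
PARSE_FOR_BLOCKS = False

CONTROL_FLOW_KEYWORDS = ['if', 'elif', 'else', 'while', 'for',
                         'try', 'except', 'finally']

# Hypothetical table: which switch governs each keyword.
EXPLORE_BLOCK = {
    'try': PARSE_TRY_BLOCKS,
    'except': PARSE_EXCEPT_BLOCKS,
    'finally': PARSE_FINALLY_BLOCKS,
    'if': PARSE_IF_BLOCKS,
    'elif': PARSE_ELSE_BLOCKS,   # else and elif share one switch
    'else': PARSE_ELSE_BLOCKS,
    'while': PARSE_WHILE_BLOCKS,
    'for': PARSE_FOR_BLOCKS,
}

def is_control_flow_line(tokens):
    """True if the line's first token is a control flow keyword."""
    return bool(tokens) and tokens[0] in CONTROL_FLOW_KEYWORDS

def should_explore(tokens):
    """True if the block opened by this line would be examined."""
    return is_control_flow_line(tokens) and EXPLORE_BLOCK[tokens[0]]
```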