Currently SciPy project consists of two packages:
NumPy — it provides packages like:
numpy.distutils - extension to Python distutils
numpy.f2py - a tool to bind Fortran/C codes to Python
numpy.core - future replacement of Numeric and numarray packages
numpy.lib - extra utility functions
numpy.testing - numpy-style tools for unit testing
etc
SciPy — a collection of scientific tools for Python.
The aim of this document is to describe how to add new tools to SciPy.
SciPy consists of Python packages, called SciPy packages, that are available to Python users via the scipy namespace. Each SciPy package may contain other SciPy packages. And so on. Therefore, the SciPy directory tree is a tree of packages with arbitrary depth and width. Any SciPy package may depend on NumPy packages but the dependence on other SciPy packages should be kept minimal or zero.
scipy
A SciPy package contains, in addition to its sources, the following files and directories:
setup.py — building script __init__.py — package initializer tests/ — directory of unittests
setup.py — building script
setup.py
__init__.py — package initializer
__init__.py
tests/ — directory of unittests
tests/
Their contents are described below.
In order to add a Python package to SciPy, its build script (setup.py) must meet certain requirements. The most important requirement is that the package define a configuration(parent_package='',top_path=None) function which returns a dictionary suitable for passing to numpy.distutils.core.setup(..). To simplify the construction of this dictionary, numpy.distutils.misc_util provides the Configuration class, described below.
configuration(parent_package='',top_path=None)
numpy.distutils.core.setup(..)
numpy.distutils.misc_util
Configuration
Below is an example of a minimal setup.py file for a pure SciPy package:
#!/usr/bin/env python3 def configuration(parent_package='',top_path=None): from numpy.distutils.misc_util import Configuration config = Configuration('mypackage',parent_package,top_path) return config if __name__ == "__main__": from numpy.distutils.core import setup #setup(**configuration(top_path='').todict()) setup(configuration=configuration)
The arguments of the configuration function specify the name of parent SciPy package (parent_package) and the directory location of the main setup.py script (top_path). These arguments, along with the name of the current package, should be passed to the Configuration constructor.
configuration
parent_package
top_path
The Configuration constructor has a fourth optional argument, package_path, that can be used when package files are located in a different location than the directory of the setup.py file.
package_path
Remaining Configuration arguments are all keyword arguments that will be used to initialize attributes of Configuration instance. Usually, these keywords are the same as the ones that setup(..) function would expect, for example, packages, ext_modules, data_files, include_dirs, libraries, headers, scripts, package_dir, etc. However, the direct specification of these keywords is not recommended as the content of these keyword arguments will not be processed or checked for the consistency of SciPy building system.
setup(..)
packages
ext_modules
data_files
include_dirs
libraries
headers
scripts
package_dir
Finally, Configuration has .todict() method that returns all the configuration data as a dictionary suitable for passing on to the setup(..) function.
.todict()
In addition to attributes that can be specified via keyword arguments to Configuration constructor, Configuration instance (let us denote as config) has the following attributes that can be useful in writing setup scripts:
config
config.name - full name of the current package. The names of parent packages can be extracted as config.name.split('.').
config.name
config.name.split('.')
config.local_path - path to the location of current setup.py file.
config.local_path
config.top_path - path to the location of main setup.py file.
config.top_path
config.todict() — returns configuration dictionary suitable for passing to numpy.distutils.core.setup(..) function.
config.todict()
config.paths(*paths) --- applies ``glob.glob(..) to items of paths if necessary. Fixes paths item that is relative to config.local_path.
config.paths(*paths) --- applies ``glob.glob(..)
paths
config.get_subpackage(subpackage_name,subpackage_path=None) — returns a list of subpackage configurations. Subpackage is looked in the current directory under the name subpackage_name but the path can be specified also via optional subpackage_path argument. If subpackage_name is specified as None then the subpackage name will be taken the basename of subpackage_path. Any * used for subpackage names are expanded as wildcards.
config.get_subpackage(subpackage_name,subpackage_path=None)
subpackage_name
subpackage_path
None
*
config.add_subpackage(subpackage_name,subpackage_path=None) — add SciPy subpackage configuration to the current one. The meaning and usage of arguments is explained above, see config.get_subpackage() method.
config.add_subpackage(subpackage_name,subpackage_path=None)
config.get_subpackage()
config.add_data_files(*files) — prepend files to data_files list. If files item is a tuple then its first element defines the suffix of where data files are copied relative to package installation directory and the second element specifies the path to data files. By default data files are copied under package installation directory. For example,
config.add_data_files(*files)
files
config.add_data_files('foo.dat', ('fun',['gun.dat','nun/pun.dat','/tmp/sun.dat']), 'bar/car.dat'. '/full/path/to/can.dat', )
will install data files to the following locations
<installation path of config.name package>/ foo.dat fun/ gun.dat pun.dat sun.dat bar/ car.dat can.dat
Path to data files can be a function taking no arguments and returning path(s) to data files – this is a useful when data files are generated while building the package. (XXX: explain the step when this function are called exactly)
config.add_data_dir(data_path) — add directory data_path recursively to data_files. The whole directory tree starting at data_path will be copied under package installation directory. If data_path is a tuple then its first element defines the suffix of where data files are copied relative to package installation directory and the second element specifies the path to data directory. By default, data directory are copied under package installation directory under the basename of data_path. For example,
config.add_data_dir(data_path)
data_path
config.add_data_dir('fun') # fun/ contains foo.dat bar/car.dat config.add_data_dir(('sun','fun')) config.add_data_dir(('gun','/full/path/to/fun'))
<installation path of config.name package>/ fun/ foo.dat bar/ car.dat sun/ foo.dat bar/ car.dat gun/ foo.dat bar/ car.dat
config.add_include_dirs(*paths) — prepend paths to include_dirs list. This list will be visible to all extension modules of the current package.
config.add_include_dirs(*paths)
config.add_headers(*files) — prepend files to headers list. By default, headers will be installed under <prefix>/include/pythonX.X/<config.name.replace('.','/')>/ directory. If files item is a tuple then it’s first argument specifies the installation suffix relative to <prefix>/include/pythonX.X/ path. This is a Python distutils method; its use is discouraged for NumPy and SciPy in favour of config.add_data_files(*files).
config.add_headers(*files)
<prefix>/include/pythonX.X/<config.name.replace('.','/')>/
<prefix>/include/pythonX.X/
config.add_scripts(*files) — prepend files to scripts list. Scripts will be installed under <prefix>/bin/ directory.
config.add_scripts(*files)
<prefix>/bin/
config.add_extension(name,sources,**kw) — create and add an Extension instance to ext_modules list. The first argument name defines the name of the extension module that will be installed under config.name package. The second argument is a list of sources. add_extension method takes also keyword arguments that are passed on to the Extension constructor. The list of allowed keywords is the following: include_dirs, define_macros, undef_macros, library_dirs, libraries, runtime_library_dirs, extra_objects, extra_compile_args, extra_link_args, export_symbols, swig_opts, depends, language, f2py_options, module_dirs, extra_info, extra_f77_compile_args, extra_f90_compile_args.
config.add_extension(name,sources,**kw)
Extension
name
add_extension
define_macros
undef_macros
library_dirs
runtime_library_dirs
extra_objects
extra_compile_args
extra_link_args
export_symbols
swig_opts
depends
language
f2py_options
module_dirs
extra_info
extra_f77_compile_args
extra_f90_compile_args
Note that config.paths method is applied to all lists that may contain paths. extra_info is a dictionary or a list of dictionaries that content will be appended to keyword arguments. The list depends contains paths to files or directories that the sources of the extension module depend on. If any path in the depends list is newer than the extension module, then the module will be rebuilt.
config.paths
The list of sources may contain functions (‘source generators’) with a pattern def <funcname>(ext, build_dir): return <source(s) or None>. If funcname returns None, no sources are generated. And if the Extension instance has no sources after processing all source generators, no extension module will be built. This is the recommended way to conditionally define extension modules. Source generator functions are called by the build_src sub-command of numpy.distutils.
def <funcname>(ext, build_dir): return <source(s) or None>
funcname
build_src
numpy.distutils
For example, here is a typical source generator function:
def generate_source(ext,build_dir): import os from distutils.dep_util import newer target = os.path.join(build_dir,'somesource.c') if newer(target,__file__): # create target file return target
The first argument contains the Extension instance that can be useful to access its attributes like depends, sources, etc. lists and modify them during the building process. The second argument gives a path to a build directory that must be used when creating files to a disk.
sources
config.add_library(name, sources, **build_info) — add a library to libraries list. Allowed keywords arguments are depends, macros, include_dirs, extra_compiler_args, f2py_options, extra_f77_compile_args, extra_f90_compile_args. See .add_extension() method for more information on arguments.
config.add_library(name, sources, **build_info)
macros
extra_compiler_args
.add_extension()
config.have_f77c() — return True if Fortran 77 compiler is available (read: a simple Fortran 77 code compiled successfully).
config.have_f77c()
config.have_f90c() — return True if Fortran 90 compiler is available (read: a simple Fortran 90 code compiled successfully).
config.have_f90c()
config.get_version() — return version string of the current package, None if version information could not be detected. This methods scans files __version__.py, <packagename>_version.py, version.py, __svn_version__.py for string variables version, __version__, <packagename>_version.
config.get_version()
__version__.py
<packagename>_version.py
version.py
__svn_version__.py
version
__version__
<packagename>_version
config.make_svn_version_py() — appends a data function to data_files list that will generate __svn_version__.py file to the current package directory. The file will be removed from the source directory when Python exits.
config.make_svn_version_py()
config.get_build_temp_dir() — return a path to a temporary directory. This is the place where one should build temporary files.
config.get_build_temp_dir()
config.get_distribution() — return distutils Distribution instance.
config.get_distribution()
Distribution
config.get_config_cmd() — returns numpy.distutils config command instance.
config.get_config_cmd()
config.get_info(*names) —
config.get_info(*names)
.src
NumPy distutils supports automatic conversion of source files named <somefile>.src. This facility can be used to maintain very similar code blocks requiring only simple changes between blocks. During the build phase of setup, if a template file named <somefile>.src is encountered, a new file named <somefile> is constructed from the template and placed in the build directory to be used instead. Two forms of template conversion are supported. The first form occurs for files named <file>.ext.src where ext is a recognized Fortran extension (f, f90, f95, f77, for, ftn, pyf). The second form is used for all other cases.
This template converter will replicate all function and subroutine blocks in the file with names that contain ‘<…>’ according to the rules in ‘<…>’. The number of comma-separated words in ‘<…>’ determines the number of times the block is repeated. What these words are indicates what that repeat rule, ‘<…>’, should be replaced with in each block. All of the repeat rules in a block must contain the same number of comma-separated words indicating the number of times that block should be repeated. If the word in the repeat rule needs a comma, leftarrow, or rightarrow, then prepend it with a backslash ‘ ‘. If a word in the repeat rule matches ‘ \<index>’ then it will be replaced with the <index>-th word in the same repeat specification. There are two forms for the repeat rule: named and short.
A named repeat rule is useful when the same set of repeats must be used several times in a block. It is specified using <rule1=item1, item2, item3,…, itemN>, where N is the number of times the block should be repeated. On each repeat of the block, the entire expression, ‘<…>’ will be replaced first with item1, and then with item2, and so forth until N repeats are accomplished. Once a named repeat specification has been introduced, the same repeat rule may be used in the current block by referring only to the name (i.e. <rule1>).
A short repeat rule looks like <item1, item2, item3, …, itemN>. The rule specifies that the entire expression, ‘<…>’ should be replaced first with item1, and then with item2, and so forth until N repeats are accomplished.
The following predefined named repeat rules are available:
<prefix=s,d,c,z>
<_c=s,d,c,z>
<_t=real, double precision, complex, double complex>
<ftype=real, double precision, complex, double complex>
<ctype=float, double, complex_float, complex_double>
<ftypereal=float, double precision, \0, \1>
<ctypereal=float, double, \0, \1>
Non-Fortran files use a separate syntax for defining template blocks that should be repeated using a variable expansion similar to the named repeat rules of the Fortran-specific repeats.
NumPy Distutils preprocesses C source files (extension: .c.src) written in a custom templating language to generate C code. The @ symbol is used to wrap macro-style variables to empower a string substitution mechanism that might describe (for instance) a set of data types.
.c.src
@
The template language blocks are delimited by /**begin repeat and /**end repeat**/ lines, which may also be nested using consecutively numbered delimiting lines such as /**begin repeat1 and /**end repeat1**/:
/**begin repeat
/**end repeat**/
/**begin repeat1
/**end repeat1**/
/**begin repeat on a line by itself marks the beginning of a segment that should be repeated.
Named variable expansions are defined using #name=item1, item2, item3, ..., itemN# and placed on successive lines. These variables are replaced in each repeat block with corresponding word. All named variables in the same repeat block must define the same number of words.
#name=item1, item2, item3, ..., itemN#
In specifying the repeat rule for a named variable, item*N is short- hand for item, item, ..., item repeated N times. In addition, parenthesis in combination with *N can be used for grouping several items that should be repeated. Thus, #name=(item1, item2)*4# is equivalent to #name=item1, item2, item1, item2, item1, item2, item1, item2#.
item*N
item, item, ..., item
*N
#name=(item1, item2)*4#
#name=item1, item2, item1, item2, item1, item2, item1, item2#
*/ on a line by itself marks the end of the variable expansion naming. The next line is the first line that will be repeated using the named rules.
*/
Inside the block to be repeated, the variables that should be expanded are specified as @name@.
@name@
/**end repeat**/ on a line by itself marks the previous line as the last line of the block to be repeated.
A loop in the NumPy C source code may have a @TYPE@ variable, targeted for string substitution, which is preprocessed to a number of otherwise identical loops with several strings such as INT, LONG, UINT, ULONG. The @TYPE@ style syntax thus reduces code duplication and maintenance burden by mimicking languages that have generic type support.
@TYPE@
INT
LONG
UINT
ULONG
The above rules may be clearer in the following template source example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
/* TIMEDELTA to non-float types */ /**begin repeat * * #TOTYPE = BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, * LONGLONG, ULONGLONG, DATETIME, * TIMEDELTA# * #totype = npy_byte, npy_ubyte, npy_short, npy_ushort, npy_int, npy_uint, * npy_long, npy_ulong, npy_longlong, npy_ulonglong, * npy_datetime, npy_timedelta# */ /**begin repeat1 * * #FROMTYPE = TIMEDELTA# * #fromtype = npy_timedelta# */ static void @FROMTYPE@_to_@TOTYPE@(void *input, void *output, npy_intp n, void *NPY_UNUSED(aip), void *NPY_UNUSED(aop)) { const @fromtype@ *ip = input; @totype@ *op = output; while (n--) { *op++ = (@totype@)*ip++; } } /**end repeat1**/ /**end repeat**/
The preprocessing of generically-typed C source files (whether in NumPy proper or in any third party package using NumPy Distutils) is performed by conv_template.py. The type-specific C files generated (extension: .c) by these modules during the build process are ready to be compiled. This form of generic typing is also supported for C header files (preprocessed to produce .h files).
.c
.h
get_numpy_include_dirs() — return a list of NumPy base include directories. NumPy base include directories contain header files such as numpy/arrayobject.h, numpy/funcobject.h etc. For installed NumPy the returned list has length 1 but when building NumPy the list may contain more directories, for example, a path to config.h file that numpy/base/setup.py file generates and is used by numpy header files.
get_numpy_include_dirs()
numpy/arrayobject.h
numpy/funcobject.h
config.h
numpy/base/setup.py
numpy
append_path(prefix,path) — smart append path to prefix.
append_path(prefix,path)
path
prefix
gpaths(paths, local_path='') — apply glob to paths and prepend local_path if needed.
gpaths(paths, local_path='')
local_path
njoin(*path) — join pathname components + convert /-separated path to os.sep-separated path and resolve .., . from paths. Ex. njoin('a',['b','./c'],'..','g') -> os.path.join('a','b','g').
njoin(*path)
/
os.sep
..
.
njoin('a',['b','./c'],'..','g') -> os.path.join('a','b','g')
minrelpath(path) — resolves dots in path.
minrelpath(path)
rel_path(path, parent_path) — return path relative to parent_path.
rel_path(path, parent_path)
parent_path
def get_cmd(cmdname,_cache={}) — returns numpy.distutils command instance.
def get_cmd(cmdname,_cache={})
all_strings(lst)
has_f_sources(sources)
has_cxx_sources(sources)
filter_sources(sources) — return c_sources, cxx_sources, f_sources, fmodule_sources
filter_sources(sources)
c_sources, cxx_sources, f_sources, fmodule_sources
get_dependencies(sources)
is_local_src_dir(directory)
get_ext_source_files(ext)
get_script_files(scripts)
get_lib_source_files(lib)
get_data_files(data)
dot_join(*args) — join non-zero arguments with a dot.
dot_join(*args)
get_frame(level=0) — return frame object from call stack with given level.
get_frame(level=0)
cyg2win32(path)
mingw32() — return True when using mingw32 environment.
mingw32()
True
terminal_has_colors(), red_text(s), green_text(s), yellow_text(s), blue_text(s), cyan_text(s)
terminal_has_colors()
red_text(s)
green_text(s)
yellow_text(s)
blue_text(s)
cyan_text(s)
get_path(mod_name,parent_path=None) — return path of a module relative to parent_path when given. Handles also __main__ and __builtin__ modules.
get_path(mod_name,parent_path=None)
__main__
__builtin__
allpath(name) — replaces / with os.sep in name.
allpath(name)
cxx_ext_match, fortran_ext_match, f90_ext_match, f90_module_name_match
cxx_ext_match
fortran_ext_match
f90_ext_match
f90_module_name_match
numpy.distutils.system_info
get_info(name,notfound_action=0)
combine_paths(*args,**kws)
show_all()
numpy.distutils.cpuinfo
cpuinfo
numpy.distutils.log
set_verbosity(v)
numpy.distutils.exec_command
get_pythonexe()
find_executable(exe, path=None)
exec_command( command, execute_in='', use_shell=None, use_tee=None, **env )
The header of a typical SciPy __init__.py is:
""" Package docstring, typically with a brief description and function listing. """ # import functions into module namespace from .subpackage import * ... __all__ = [s for s in dir() if not s.startswith('_')] from numpy.testing import Tester test = Tester().test bench = Tester().bench
It is possible to specify config_fc options in setup.py scripts. For example, using
config.add_library(‘library’,sources=[…], config_fc={‘noopt’:(__file__,1)})
sources=[…], config_fc={‘noopt’:(__file__,1)})
will compile the library sources without optimization flags.
library
It’s recommended to specify only those config_fc options in such a way that are compiler independent.
Some old Fortran codes need special compiler options in order to work correctly. In order to specify compiler options per source file, numpy.distutils Fortran compiler looks for the following pattern:
CF77FLAGS(<fcompiler type>) = <fcompiler f77flags>
in the first 20 lines of the source and use the f77flags for specified type of the fcompiler (the first character C is optional).
f77flags
C
TODO: This feature can be easily extended for Fortran 90 codes as well. Let us know if you would need such a feature.