Single Structures¶

The main data structure in BiobOx is the Structure class, which handles collections of 3D points. Points are stored in a MxNx3 coordinates array, where M is the number of alternative points arrangements, and N is the amount of points.

At any moment, one of the loaded points conformations is considered to be the active one (a.k.a. current). Any rototranslation or measuring operation will be performed on the current structure only. Some methods, e.g. rmsd allow comparing different conformations, independently from which is the current one.

The current conformation in the coordinates array can be changed by calling the set_current method (altering the Structure’s current attribute). For comfort, the current conformation is accessible in the points Nx3 array, where:

>>> self.points = self.coordinates[self.current]

Points attributes are stored in a pandas dataframe. For instance, self.data[“radius”] contains each point’s radius. A property of the whole point cloud can be stored in a properties dictionary (e.g. self.properties[“center”]).

Several Structure subclasses are available (Molecule, Ellipsoid, Cylinder, Cone, Sphere, Prism, Density, see below).

Structure¶

class Structure(p=array([], shape=(2, 0), dtype=float64), r=1.0)[source]¶

A Structure consists of an ensemble of points in 3D space, and metadata associated to each of them.

Point coordinates and properties data structures are first initialized. properties is a dictionary initially containing an entry for ‘center’ (center of geometry) and ‘radius’ (average radius of points).

Parameters

p – coordinates data structure as a mxnx3 numpy array (alternative conformation x atom x 3D coordinate). nx3 numpy array can be supplied, in case a single conformation is present.
r – average radius of every point in dataset (float), or radius of every point (numpy array)

add_xyz(coords)[source]¶

add a new alternative conformation to the database

Parameters: coords – array of 3D points, or array of arrays of 3D points (in case multiple alternative coordinates must be added at the same time)

align_axes()[source]¶

Align structure on its principal axes.

First principal axis aligned along x, second along y and third along z.

apply_transformation(M)[source]¶

apply a 3x3 transformation matrix

Parameters: M – 3x3 transformation matrix (2D numpy array)

center_to_origin()[source]¶: move the structure so that its center of geometry is at the origin.

clear()[source]¶: remove all the coordinates and empty metadata

convex_hull()[source]¶

compute Structure’s convex Hull using QuickHull algorithm.

Note

Qhull available only on scipy >=0.12

Returns: Structure object, containing the coordinates of vertices composing the convex hull

coordinates¶: numpy array containing an ensemble of alternative coordinates in 3D space

current¶: index of currently selected conformation

data¶: metadata about each atom (pandas Dataframe)

delete_xyz(index)[source]¶

remove one conformation from the conformations database.

the new current conformation will be the previous one.

Parameters: index – alternative coordinates set to remove

get_center()[source]¶: compute protein center of geometry (also assigns it to self.properties[“center”] key).

get_density(step=1.0, sigma=1.0, kernel_half_width=5, buff=3)[source]¶

generate density map from points

Parameters

step – size of cubic voxels, in Angstrom
sigma – gaussian kernel sigma
kernel_half_width – kernel half width, in voxels
buff – padding to add at points cloud boundaries

Returns

Density object, containing a simulated density map

get_principal_axes()[source]¶

compute Structure’s principal axes.

Returns: 3x3 numpy array, containing the 3 principal axes ranked from smallest to biggest.

get_size()[source]¶: compute the dimensions of the object along x, y and z.

get_xyz(indices=[])[source]¶

get points coordinates.

Parameters: indices – indices of points to select. If none is provided, all points coordinates are returned.
Returns: coordinates of all points indexed by the provided indices list, or all of them if no list is provided.

pca(components, indices=[])[source]¶

compute Principal Components Analysis (PCA) on specific points within all the alternative coordinates.

Parameters

components – eigenspace dimensions
indices – points indices to be considered for PCA

Returns

numpy array of projection of each conformation into the n-dimensional eigenspace

Returns

sklearn PCA object

points¶: pointer to currently selected conformation

properties¶: collection of properties. By default, ‘center’ (geometric center of the Structure) is defined

rmsd(i, j, points_index=[], full=False)[source]¶

Calculate the RMSD between two structures in alternative coordinates ensemble. uses Kabsch alignement algorithm.

Parameters

i – index of the first structure
j – index of the second structure
points_index – if set, only specific points will be considered for comparison
full – if True, RMSD an rotation matrx are returned, RMSD only otherwise

Returns

RMSD of the two structures. If full is True, the rotation matrix is also returned

rmsd_distance_matrix(points_index=[], flat=False)[source]¶

compute distance matrix between structures (using RMSD as metric).

Parameters

points_index – if set, only specific points will be considered for comparison
flat – if True, returns flattened distance matrix

Returns

RMSD distance matrix

rmsd_one_vs_all(ref_index, points_index=[], align=False)[source]¶

Calculate the RMSD between all structures with respect of a reference structure. uses Kabsch alignement algorithm.

Parameters

ref_index – index of reference structure in conformations database
points_index – if set, only specific points will be considered for comparison
align – if set to true, all conformations will be aligned to reference (note: cannot be undone!)

Returns

RMSD of all structures with respect of reference structure (in a numpy array)

rmsf(indices=- 1, step=1)[source]¶

compute Root Mean Square Fluctuation (RMSF) of selected atoms.

Parameters

indices – indices of points for which RMSF will be calculated. If no indices list is provided, RMSF of all points will be calculated.
step – timestep between two conformations (useful when using conformations extracted from molecular dynamics)

Returns

numpy aray with RMSF of all provided indices, in the same order

rotate(x, y, z)[source]¶

rotate the structure provided angles of rotation around x, y and z axes (in degrees).

This is a rotation with respect of the origin. Make sure that the center of your structure is at the origin, if you don’t want to get a translation as well! rotating an object being not centered requires to first translate the ellipsoid at the origin, rotate it, and bringing it back.

Parameters

x – rotation around x axis
y – rotation around y axis
z – rotation around z axis

rotation_matrix(axis, theta)[source]¶

compute matrix needed to rotate the system around an arbitrary axis (using Euler-Rodrigues formula).

Parameters

axis – 3d vector (numpy array), representing the axis around which to rotate
theta – desired rotation angle

Returns

3x3 rotation matrix

set_current(pos)[source]¶

select current frame (place frame pointer at desired position)

Parameters: pos – number of alternative conformation (starting from 0)

set_xyz(coords)[source]¶

set point coordinates.

Parameters: coords – array of 3D points

translate(x, y, z)[source]¶

translate the whole structure by a given amount.

Parameters

x – translation around x axis
y – translation around y axis
z – translation around z axis

write_pdb(filename, index=[])[source]¶

write a multi PDB file where every point is a sphere. VdW radius is written into beta factor.

Parameters

filename – name of file to output
index – list of frame indices to write to file. By default, a multipdb with all frames will be produced.

Molecule¶

class Molecule[source]¶

Bases: biobox.classes.structure.Structure

Subclass of Structure, allows reading, manipulating and analyzing molecular structures.

At instantiation, properties associated to every individual atoms are stored in a pandas Dataframe self.data. The columns of the self.data have the following names: atom, index, name, resname, chain, resid, beta, occupancy, atomtype, radius, charge.

self.knowledge contains a knowledge base about atoms and residues properties. Default values are:

‘residue_mass’ property stores the average mass for most common aminoacids (values from Expasy website)
‘atom_vdw’ vdw radius of common atoms
‘atom_mass’ mass of common atoms

The knowledge base can be edited. For instance, to add information about residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42

apply_biomatrix()[source]¶

if biomatrix information is provided, generate a new molecule with all symmetry operators applied.

Returns: new Molecule containing several copies of the current Molecule, arranged according to BIOMT statements contained in pdb, or -1 if no transformation matrix is provided

apply_symmetry()[source]¶

if symmetry information is provided, generate a new molecule with all symmetry operators applied.

Returns: new Molecule containing several copies of the current Molecule, arranged according to SMTRY statements contained in pdb, or -1 if no transformation matrix is provided

assign_atomtype()[source]¶: guess atomtype from atom names

atomignore(chain, res, atom, get_index=False, use_resname=False)[source]¶

Select specific atoms that do not match a specific query (chain, residue ID and atom name). Useful to remove from a molecule atoms unwanted for further analysis, alternative conformations, etc…

Parameters

chain – chain name (accepts * as wildcard). Can also be a list or numpy array of strings.
res – residue ID (accepts * as wildcard). Can also be a list or numpy array of of int.
atom – atom name (accepts * as wildcard). Can also be a list or numpy array of strings.
get_index – if set to True, returns the indices of atoms in self.points array (and self.data)
use_resname – if set to True, consider information in “res” variable as resnames, and not resids

Returns

coordinates of the selected points not matching the query, if get_index is set to true, their indices in self.points array.

atomselect(chain, res, atom, get_index=False, use_resname=False)[source]¶

Select specific atoms in the protein providing chain, residue ID and atom name.

Parameters

chain – selection of a specific chain name (accepts * as wildcard). Can also be a list or numpy array of strings.
res – residue ID of desired atoms (accepts * as wildcard). Can also be a list or numpy array of of int.
atom – name of desired atom (accepts * as wildcard). Can also be a list or numpy array of strings.
get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)
use_resname – if set to True, consider information in “res” variable as resnames, and not resids

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

beta_factor_from_rmsf(indices=- 1, step=1)[source]¶

estimate atoms beta factor on the base of their RMSF.

Parameters

indices – indices of atoms of interest. If not set all atoms will be considered.
step – timestep between two conformations (useful when using conformations extracted from molecular dynamics)

get_atoms_ccs()[source]¶

return array with atomic CCS radii of every atom in molecule

Returns: CCS in Angstrom^2

get_couples(idx, cutoff)[source]¶

given a list of indices, compute the all-vs-all distance and return only couples below a given cutoff distance

useful for the detection of disulfide bridges or linkable sites via cross-linking (approximation, supposing euclidean distances)’

Parameters

idx – indices of atoms to check.
cutoff – minimal distance to consider a couple as linkable.

Returns

nx3 numpy array containing, for every valid connection, id of first atom, id of second atom and distance between the two.

get_data(indices=[], columns=[])[source]¶

Return information about atoms of interest (i.e., slice the data DataFrame)

Parameters

indices – list of indices, if not provided all atom data is returned
columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned

Returns

numpy array containing a slice of molecule’s data

get_dipole_density(dipole_map, orig, min_val, V, outname, vox_in_window=3.0, eqn='gauss', T=310.15, P=101000.0, epsilonE=54.0, resolution=1.0)[source]¶

Method to generate an electron density map based on a voxel grid of dipole vectors

Parameters

dipole_map – The dipole map input. Can be generated with get_dipole_map above
orig – Origin points for voxel grid
min_val – Minimum coordinates of edge points for the voxel grid (i.e. a single x, y, z point defining the start point of the grid to match with the multipdb)
V – Volume of a voxel (can be found by resolution**3, but left blank in case later version institute a sphere)
outname – Name of electron density map file produced
vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.)
eqn – Equation mode to model the electron density
T – Temperature of MD
P – Pressure of MD
epsilonE – Continuum dielectric surrounding the protein
resolution – Desired resolution of voxel

get_dipole_map(orig, pqr, time_start=0, time_end=2, resolution=1.0, vox_in_window=3.0, write_dipole_map=True, fname='dipole_map.tcl')[source]¶

Method for generating dipole maps to be used for electron density map generation. Also prints a dipole map as a result (and if desired). It calls a cython code in lib.

Parameters

orig – Origin points for voxel grid
pqr – PQR file for self. Can be generated by calling pdb2pqr above
time_start – First frame to parse in multipdb
time_end – Last frame to parse in multipdb
resolution – Desired resolution of voxel
vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.)
write_dipole_map – Write a dipole map in TCL format to be read in via VMD.
fname – Name of desired dipole map to be written

get_electrostatics(step=1.0, buff=3, threshold=0.01, vdw_kernel_half_width=5, elect_kernel_half_width=12, chain='*', clear_mass=True)[source]¶

generate electrostatics map from points

Parameters

step – size of cubic voxels, in Angstrom
buff – padding to add at points cloud boundaries
threshold – Threshold used for removing mass occupied space from the electron density map
vdw_kernel_half_width – kernel half width, in voxels
elect_kernel_half_width – kernel half width, in voxels
chain – select chain to use, default all chains

Returns

positive Density object

Returns

negative Density object

Returns

mass density object

get_mass_by_atom(skip_resname=[])[source]¶

compute protein mass using atoms in pdb

sum the mass of all atoms (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding or editing entries to the molecule’s mass dictionary, e.g. to add the atom “PI” mass in molecule M type: M.knowledge[‘atom_mass’][“PI”]=3.141592

Parameters: skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation.
Returns: mass of molecule in Dalton

get_mass_by_residue(skip_resname=[])[source]¶

Compute protein mass using residues (i.e. account also for atoms not present in the structure)

Sum the average mass all every residue (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding entries to the molecule’s mass dictionary, e.g. to add the residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42

Parameters: skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation.
Returns: mass of molecule in Dalton

get_pdb_data(index=[])[source]¶

aggregate data and point coordinates, and return in a unique data structure

Returned data is a list containing strings for points data and floats for point coordinates in the same order as a pdb file, i.e. ATOM/HETATM, index, name, resname, chain name, residue ID, x, y, z, occupancy, beta factor, atomtype.

Returns: list aggregated data and coordinates for every point, as string.

get_secondary_structure(dssp_path='')[source]¶: compute the protein’s secondary structure, calling DSSP :param dssp_path: DSSP executable (path and filename). If not provided, the default behaviour is to seek for this information in the environment variable DSSPPATH :returns: numpy array of characters, with one-letter-coded secondary sctructure according to DSSP.

get_subset(idxs, conformations=[])[source]¶

Return a Molecule object containing only the selected atoms and frames

Parameters

ixds – atoms to extract
conformations – frames to extract (by default, all)

Returns

Molecule object

get_vdw_density(buff=3, step=0.5, kernel_half_width=10)[source]¶

generate density map from points based on the van der Waals readius of the atoms

Parameters

buff – Buffer used to create the boundaries of the density map
step – Stepsize for creating the density object
kernel_half_width – Kernel half width of the gaussian kernel, will be scaled by atom specific sigma

Returns

Density object, containing a density map

guess_chain_split(distance=3, use_backbone=True)[source]¶

reassign chain name, using distance cutoff (cannot be undone). If two consecutive atoms are beyond a cutoff, a new chain is assigned.

Parameters

distance – distance cutoff distanceR: no atomtype found!
use_backbone – if True, splitting will be performed considering backbone atoms (N and C), all atoms in a sequence otherwise

import_gro(filename)[source]¶

read a gro possibly containing multiple structures.

Parameters: filename – name of .gro file to import

import_pdb(pdb, include_hetatm=False)[source]¶

read a pdb (possibly containing containing multiple models).

Models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active.

Parameters

pdb – PDB filename
include_hetatm – if True, HETATM will be included (they get skipped if False)

import_pqr(pqr, include_hetatm=False)[source]¶

Read a pqr (possibly containing containing multiple models).

models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active.

Parameters

pqr – PQR filename
include_hetatm – if True, HETATM will be included (they get skipped if False)

know(prop)[source]¶

return information from knowledge base

Parameters: prop – desired property to extract from knowledge base
Returns: value associated to requested property, or nan if failed

match_residue(M2, sec=5)[source]¶

Compares two bb.Molecule() peptide strands and returns the resids within both peptides when the two are homogenous beyond a certain secondary structure threashold. The default is 5 amino acids (given by sec) in a row must be identical

Useful when aligning PDB structures that have been crystallised separately - so one may be missing the odd residue or have a few extra at the end.

Parameters

M2 – The second bb.Molecule() to compare with
sec – Number of consecutive amino acids in a row that must match before resid’s are recorded

pdb2pqr(ff='')[source]¶

Parses data from the pdb input into a pqr format. This uses the panda dataframe with the information regarding atom indexes, types etc. in the self.data files. It outputs a panda dataframe with the pqr equivilent information. It requires a datafile forcefield input. The default is the amber14sb forcefield file held within the classes/ folder.

Parameters: ff – name of forcefield text file input that needs to be read to read charges / vdw radii.

query(query_text, get_index=False)[source]¶

Select specific atoms in a multimer un the basis of a text query.

Parameters

query_text – string selecting atoms of interest. Uses the pandas query syntax, can access all columns in the dataframe self.data.
get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points (in a unique array) and, if get_index is set to true, a list of their indices in subunits’ self.points array.

rmsf_from_beta_factor(indices=[])[source]¶

calculate RMSF from atoms beta factors.

Parameters: indices – indices of atoms of interest. If not set all atoms will be considered.

s2(atomname1='N', atomname2='H')[source]¶

compute s2, given two atoms defining the vector of interest.

Parameters

atomname1 – name of the first atom
atomname2 – name of the second atom

Returns

data numpy array containing information about residues for which measuring has been performed (i.e.[chain, resid])

Returns

s2 s2 of residues for which both provided input atoms have been found

same_residue(index, get_index=False)[source]¶

Select atoms having the same residue and chain as a given atom (or list of atoms)

Parameters

indices (index) – of atoms of choice (integer of list of integers)
get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

same_residue_unique(index, get_index=False)[source]¶

Select atoms having the same residue and chain as a given atom (or list of atoms)

Parameters

index – indices of atoms of choice (integer of list of integers)
get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

set_data(value, indices=[], columns=[])[source]¶

Return information about atoms of interest (i.e., slice the data DataFrame)

Parameters

indices – list of indices, if not provided all atom data is returned
columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned

Returns

numpy array containing a slice of molecule’s data

write_gro(outname, conformations=[], index='')[source]¶

write structure(s) in .gro format.

Parameters

outname – name of .gro file to be generated.
index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]
conformations – list of conformation indices to write to file. By default, all conformations will be returned.

write_pdb(outname, conformations=[], index=[])[source]¶

overload superclass method for writing (multi)pdb.

Parameters

outname – name of pdb file to be generated.
index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]
conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced.

write_pqr(outname, conformations=[], index=[])[source]¶

overload superclass method for writing (multi)pqr.

Parameters

outname – name of pqr file to be generated.
index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]
conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced.

Convex Point Clouds¶

The following classes allow generating clouds of points arranged according to specific (convex) geometries. All these classes are subclass of Structure.

class Cone(r, h, skew=0, radius=1.1, pts_density_r=0.09817477042468103, pts_density_h=0.2)[source]¶

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a cone.

Parameters

r – radius
h – height
skew – skewing with respect of vertical axis
radius – size of the individual points composing it
pts_density_r – density of points along the rotation axis (using parametric function for cylinder)
pts_density_h – density of points along height (using parametric function for cylinder)

ccs(gas=1)[source]¶

compute cone CCS (use analytical solution using surface and gas effect)

Returns: CCS in A^2

get_surface()[source]¶

compute cone surface.

Returns: surface in A^2

get_volume()[source]¶

Compute cone volume.

Returns: volume in A^3

class Cylinder(r, h, squeeze=1.0, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]¶

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as an elliptical cylinder.

Parameters

r – radius
h – height
squeeze – create an elliptical base, having axes equal to r and squeeze*r
skew – skewing with respect of vertical axis
radius – size of the individual points composing it
density (pts_density_h) – of points along the u angle (using parametric function for cylinder)
density – of points along the v angle (using parametric function for cylinder)

ccs(gas=1)[source]¶

Compute cylinder CCS.

Uses Ramanujan approximation for base perimeter. Good, but not perfect for very elliptical cylinders!

Returns: CCS in A^2

get_surface()[source]¶

Compute cylinder surface.

Uses Ramanujan approximation for base perimeter. Good, but not perfect for very elliptical cylinders!

Returns: surface in A^2

get_volume()[source]¶

Compute cylinder volume.

Returns: volume in A^3

class Ellipsoid(a, b, c, radius=1.9, pts_density_u=0.08726646259971647, pts_density_v=0.08726646259971647)[source]¶

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as an ellipsoid.

Parameters

a – x radius of the ellipsoid
b – y radius of the ellipsoid
c – z radius of the ellipsoid
radius – size of the individual points composing it
pts_density_u – This parameter defines the density of points along the u angle (using parametric function for ellipsoid)
pts_density_v – This parameter defines the density of points along the v angle (using parametric function for ellipsoid)

ccs(gas=1)[source]¶

compute ellipsoid CCS.

Uses analytical approximation to surface area.

Returns: surface in A^2

check_inclusion(p)[source]¶

count how many points in the array p are included in the ellipsoid.

overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things)

Parameters: p – list of points (numpy array)
Returns: quantity of points located inside the ellipsoid

get_sphericity()[source]¶

compute ellipsoid sphericity.

Returns: ellipsoid sphericity

get_surface()[source]¶

compute ellipsoid surface.

Note: using analytical value, uses analytical approximation to surface area

Returns: surface in A^2

get_volume()[source]¶

compute ellipsoid volume.

Returns: volume in A^3

class Prism(r, h, n, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]¶

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a prism (polygonal bottom and top, rectangular sides).

Parameters

r – distance of sides from center of symmetry
h – height
n – number of side faces
skew – skewing with respect of vertical axis
radius – size of the individual points composing it
pts_density_u – density of points along the r axis (polar coordinates)
pts_density_h – the density of points along the vertical axis

ccs(gas=1)[source]¶

compute prism CCS.

Returns: CCS in A^2

get_surface()[source]¶

compute prism surface.

Returns: surface in A^2

get_volume()[source]¶

compute prism volume.

Returns: volume in A^3

class Sphere(r, radius=1.9, n_sphere_point=960)[source]¶

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a sphere.

using golden spiral to approximate an even distribution

Parameters

r – radius of the ellipsoid
radius – size of the individual points composing it
n_sphere_point – This parameter defines the amount of points in the sphere

ccs(gas=1)[source]¶

compute spheroid CCS.

Uses analytical approximation to surface area.

Returns: surface in A^2

check_inclusion(p)[source]¶

count how many points in the array p are included in the sphere.

overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things)

Parameters: p – list of points (numpy array)
Returns: quantity of points located inside the sphere

get_sphericity()[source]¶

compute sphericity (makes sense only for squeezed spheres, obviously..)

Returns: shape sphericity

get_surface()[source]¶

compute sphere surface.

Returns: surface in A^2

get_volume()[source]¶

compute ellipsoid volume.

Returns: volume in A^3

squeeze(deformation, preserve_volume=True)[source]¶

squeeze sphere according to deformation coefficient(s)

Parameters

deformation – deformation coefficient. Can be a float (deformation of one axis), or a list of 2 or 3 floats.
preserve_volume – If true and deformation is either a float or a list of 2 floats, correct remaining axes to preserve volume.

Density¶

class Density[source]¶

Bases: biobox.classes.structure.Structure

Subclass of Structure, allows importing density map, and transform them in a PDB file containing a collection of spheres placed on the map’s high density regions.

A density map is fully described by the following attributes, stored in the self.properties dictionary:

Parameters

density – density map
delta – scaling factor for voxels (default is [1, 1, 1] Angstrom)
size – dimensions in voxels
origin – bottom-left-front corner of the cube
radius – radius of points composing the density map
format – format name (only dx supported at the moment)

best_threshold(mass, density=0.782878356)[source]¶

If mass and density of object are known, try to filter the map so that the mass is best matched.

search for best threshold using bissection method.

Note

in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used.

Note

1 Da/A^3=1.660538946 g/cm^3

Note

1 g/cm^3=0.602214120 Da/A^3

Parameters

mass – target mass in Da
density – target density in Da/A^3

Returns

array reporting tested values and error on mass ([sigma, model_mass-target_mass])

blur(dimension=5, sigma=0.5)[source]¶

blur density applying a cubic gaussian kernel of given kernel dimension (cubic grid size).

Warning

cannot be undone

Parameters

dimension – size of the kernel grid.
sigma – standard deviation of gaussian kernel.

export_as_pdb(outname, step, threshold=0.1)[source]¶

Write a pdb file with points where the density exceeds a threshold

Parameters

outname – output file name
step – stepsize used to generate the density map
threshold – density to be exceeded to generate a point in pdb

find_data_from_ccs(ccs)[source]¶

map experimental data to given ccs

Parameters: ccs – target CCS (in A^2)

find_data_from_sigma(sigma, exact=True, append=False, noise_filter=0.01)[source]¶

map experimental data to given threshold

Parameters

sigma – density threshold
noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1.

find_data_from_volume(vol)[source]¶

map experimental data to given volume

Parameters: vol – volume

get_oversampled_points(sigma=0)[source]¶

return points obtained by oversampling the map (doule points on every axis)

Parameters: sigma – place points only on voxels having intensity greater than threshold
Returns: points 3D points placed on voxels having value higher than threshold
Returns: radius radius of produced points

get_sigma_from_thresh(t)[source]¶

convert cutoff value from actual threshold to sigma multiple

:param threshold value :returns sigma multiple

get_thresh_from_sigma(val)[source]¶

convert cutoff value from sigma multiples into actual threshold

Parameters: val – sigma scaling
Returns: cutoff

get_volume()[source]¶: compute density map volume. This is done by counting the points, and multiplying that by voxels’ volume.

Warning

can be called only after place_points has been called.

Warning

supposes unskewed voxels.

import_map(filename, fileformat='dx')[source]¶

Import density map and fill up the points and properties data structures.

Parameters

filename – name of density file to load
fileformat – at the moment supports dx, ccp4, mrc and imod

import_numpy(data, origin=[0, 0, 0], delta=array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]))[source]¶

import a numpy 3D array to allow manipulation as a density map

Parameters

data – numpy 3D array
origin – coordinates of bottom left corner of the map
delta – voxels’ shape (default is a cubic voxel of 1 Angstrom-long sides).

place_points(sigma=0, noise_filter=0.01)[source]¶

given density information, place points on every voxel above a given threshold.

Parameters

sigma – intensity threshold value.
noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1.

return_density_map()[source]¶

Returns: density map as 3D numpy array

scan_threshold(mass, density=0.782878356, sampling_points=1000)[source]¶

if mass and density of object are known, filter the map on a linear scale of threshold values, and compare the obtained mass to the experimental one.

Note

in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used.

Note

1 Da/A^3=0.602214120 g/cm^3

Parameters

mass – target mass in Da
density – target density in Da/A^3
sampling_points – number of measures to perform between min and max intensity in density map

Returns

array reporting tested values and error on mass ([threshold, model_mass-target_mass])

threshold_vol_ccs(low='', high='', sampling_points=1000, append=False, noise_filter=0.01)[source]¶

return the volume to threshold to CCS relationship

Parameters: sampling_points – number of measures to perform between min and max intensity in density map
Returns: array reporting tested values and error on mass ([threshold, model_mass-target_mass])

write_dx(fname='dens.dx')[source]¶

Write a density map in DX format

Parameters: fname – output file name

Single Structures¶

Structure¶

Molecule¶

Convex Point Clouds¶

Density¶

Table of Contents

Previous topic

Next topic

This Page