Single Structures

The main data structure in BiobOx is the Structure class, which handles collections of 3D points. Points are stored in a MxNx3 coordinates array, where M is the number of alternative points arrangements, and N is the amount of points.

At any moment, one of the loaded points conformations is considered to be the active one (a.k.a. current). Any rototranslation or measuring operation will be performed on the current structure only. Some methods, e.g. rmsd allow comparing different conformations, independently from which is the current one.

The current conformation in the coordinates array can be changed by calling the set_current method (altering the Structure’s current attribute). For comfort, the current conformation is accessible in the points Nx3 array, where:

>>> self.points = self.coordinates[self.current]

Points attributes are stored in a pandas dataframe. For instance, self.data[“radius”] contains each point’s radius. A property of the whole point cloud can be stored in a properties dictionary (e.g. self.properties[“center”]).

Several Structure subclasses are available (Molecule, Ellipsoid, Cylinder, Cone, Sphere, Prism, Density, see below).

Structure

class Structure(p=array([], shape=(2, 0), dtype=float64), r=1.0)[source]

A Structure consists of an ensemble of points in 3D space, and metadata associated to each of them.

Point coordinates and properties data structures are first initialized. properties is a dictionary initially containing an entry for ‘center’ (center of geometry) and ‘radius’ (average radius of points).

Parameters
  • p – coordinates data structure as a mxnx3 numpy array (alternative conformation x atom x 3D coordinate). nx3 numpy array can be supplied, in case a single conformation is present.

  • r – average radius of every point in dataset (float), or radius of every point (numpy array)

add_xyz(coords)[source]

add a new alternative conformation to the database

Parameters

coords – array of 3D points, or array of arrays of 3D points (in case multiple alternative coordinates must be added at the same time)

align_axes()[source]

Align structure on its principal axes.

First principal axis aligned along x, second along y and third along z.

apply_transformation(M)[source]

apply a 3x3 transformation matrix

Parameters

M – 3x3 transformation matrix (2D numpy array)

center_to_origin()[source]

move the structure so that its center of geometry is at the origin.

clear()[source]

remove all the coordinates and empty metadata

convex_hull()[source]

compute Structure’s convex Hull using QuickHull algorithm.

Note

Qhull available only on scipy >=0.12

Returns

Structure object, containing the coordinates of vertices composing the convex hull

coordinates

numpy array containing an ensemble of alternative coordinates in 3D space

current

index of currently selected conformation

data

metadata about each atom (pandas Dataframe)

delete_xyz(index)[source]

remove one conformation from the conformations database.

the new current conformation will be the previous one.

Parameters

index – alternative coordinates set to remove

get_center()[source]

compute protein center of geometry (also assigns it to self.properties[“center”] key).

get_density(step=1.0, sigma=1.0, kernel_half_width=5, buff=3)[source]

generate density map from points

Parameters
  • step – size of cubic voxels, in Angstrom

  • sigma – gaussian kernel sigma

  • kernel_half_width – kernel half width, in voxels

  • buff – padding to add at points cloud boundaries

Returns

Density object, containing a simulated density map

get_principal_axes()[source]

compute Structure’s principal axes.

Returns

3x3 numpy array, containing the 3 principal axes ranked from smallest to biggest.

get_size()[source]

compute the dimensions of the object along x, y and z.

get_xyz(indices=[])[source]

get points coordinates.

Parameters

indices – indices of points to select. If none is provided, all points coordinates are returned.

Returns

coordinates of all points indexed by the provided indices list, or all of them if no list is provided.

pca(components, indices=[])[source]

compute Principal Components Analysis (PCA) on specific points within all the alternative coordinates.

Parameters
  • components – eigenspace dimensions

  • indices – points indices to be considered for PCA

Returns

numpy array of projection of each conformation into the n-dimensional eigenspace

Returns

sklearn PCA object

points

pointer to currently selected conformation

properties

collection of properties. By default, ‘center’ (geometric center of the Structure) is defined

rmsd(i, j, points_index=[], full=False)[source]

Calculate the RMSD between two structures in alternative coordinates ensemble. uses Kabsch alignement algorithm.

Parameters
  • i – index of the first structure

  • j – index of the second structure

  • points_index – if set, only specific points will be considered for comparison

  • full – if True, RMSD an rotation matrx are returned, RMSD only otherwise

Returns

RMSD of the two structures. If full is True, the rotation matrix is also returned

rmsd_distance_matrix(points_index=[], flat=False)[source]

compute distance matrix between structures (using RMSD as metric).

Parameters
  • points_index – if set, only specific points will be considered for comparison

  • flat – if True, returns flattened distance matrix

Returns

RMSD distance matrix

rmsd_one_vs_all(ref_index, points_index=[], align=False)[source]

Calculate the RMSD between all structures with respect of a reference structure. uses Kabsch alignement algorithm.

Parameters
  • ref_index – index of reference structure in conformations database

  • points_index – if set, only specific points will be considered for comparison

  • align – if set to true, all conformations will be aligned to reference (note: cannot be undone!)

Returns

RMSD of all structures with respect of reference structure (in a numpy array)

rmsf(indices=- 1, step=1)[source]

compute Root Mean Square Fluctuation (RMSF) of selected atoms.

Parameters
  • indices – indices of points for which RMSF will be calculated. If no indices list is provided, RMSF of all points will be calculated.

  • step – timestep between two conformations (useful when using conformations extracted from molecular dynamics)

Returns

numpy aray with RMSF of all provided indices, in the same order

rotate(x, y, z)[source]

rotate the structure provided angles of rotation around x, y and z axes (in degrees).

This is a rotation with respect of the origin. Make sure that the center of your structure is at the origin, if you don’t want to get a translation as well! rotating an object being not centered requires to first translate the ellipsoid at the origin, rotate it, and bringing it back.

Parameters
  • x – rotation around x axis

  • y – rotation around y axis

  • z – rotation around z axis

rotation_matrix(axis, theta)[source]

compute matrix needed to rotate the system around an arbitrary axis (using Euler-Rodrigues formula).

Parameters
  • axis – 3d vector (numpy array), representing the axis around which to rotate

  • theta – desired rotation angle

Returns

3x3 rotation matrix

set_current(pos)[source]

select current frame (place frame pointer at desired position)

Parameters

pos – number of alternative conformation (starting from 0)

set_xyz(coords)[source]

set point coordinates.

Parameters

coords – array of 3D points

translate(x, y, z)[source]

translate the whole structure by a given amount.

Parameters
  • x – translation around x axis

  • y – translation around y axis

  • z – translation around z axis

write_pdb(filename, index=[])[source]

write a multi PDB file where every point is a sphere. VdW radius is written into beta factor.

Parameters
  • filename – name of file to output

  • index – list of frame indices to write to file. By default, a multipdb with all frames will be produced.

Molecule

class Molecule[source]

Bases: biobox.classes.structure.Structure

Subclass of Structure, allows reading, manipulating and analyzing molecular structures.

At instantiation, properties associated to every individual atoms are stored in a pandas Dataframe self.data. The columns of the self.data have the following names: atom, index, name, resname, chain, resid, beta, occupancy, atomtype, radius, charge.

self.knowledge contains a knowledge base about atoms and residues properties. Default values are:

  • ‘residue_mass’ property stores the average mass for most common aminoacids (values from Expasy website)

  • ‘atom_vdw’ vdw radius of common atoms

  • ‘atom_mass’ mass of common atoms

The knowledge base can be edited. For instance, to add information about residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42

apply_biomatrix()[source]

if biomatrix information is provided, generate a new molecule with all symmetry operators applied.

Returns

new Molecule containing several copies of the current Molecule, arranged according to BIOMT statements contained in pdb, or -1 if no transformation matrix is provided

apply_symmetry()[source]

if symmetry information is provided, generate a new molecule with all symmetry operators applied.

Returns

new Molecule containing several copies of the current Molecule, arranged according to SMTRY statements contained in pdb, or -1 if no transformation matrix is provided

assign_atomtype()[source]

guess atomtype from atom names

atomignore(chain, res, atom, get_index=False, use_resname=False)[source]

Select specific atoms that do not match a specific query (chain, residue ID and atom name). Useful to remove from a molecule atoms unwanted for further analysis, alternative conformations, etc…

Parameters
  • chain – chain name (accepts * as wildcard). Can also be a list or numpy array of strings.

  • res – residue ID (accepts * as wildcard). Can also be a list or numpy array of of int.

  • atom – atom name (accepts * as wildcard). Can also be a list or numpy array of strings.

  • get_index – if set to True, returns the indices of atoms in self.points array (and self.data)

  • use_resname – if set to True, consider information in “res” variable as resnames, and not resids

Returns

coordinates of the selected points not matching the query, if get_index is set to true, their indices in self.points array.

atomselect(chain, res, atom, get_index=False, use_resname=False)[source]

Select specific atoms in the protein providing chain, residue ID and atom name.

Parameters
  • chain – selection of a specific chain name (accepts * as wildcard). Can also be a list or numpy array of strings.

  • res – residue ID of desired atoms (accepts * as wildcard). Can also be a list or numpy array of of int.

  • atom – name of desired atom (accepts * as wildcard). Can also be a list or numpy array of strings.

  • get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

  • use_resname – if set to True, consider information in “res” variable as resnames, and not resids

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

beta_factor_from_rmsf(indices=- 1, step=1)[source]

estimate atoms beta factor on the base of their RMSF.

Parameters
  • indices – indices of atoms of interest. If not set all atoms will be considered.

  • step – timestep between two conformations (useful when using conformations extracted from molecular dynamics)

get_atoms_ccs()[source]

return array with atomic CCS radii of every atom in molecule

Returns

CCS in Angstrom^2

get_couples(idx, cutoff)[source]

given a list of indices, compute the all-vs-all distance and return only couples below a given cutoff distance

useful for the detection of disulfide bridges or linkable sites via cross-linking (approximation, supposing euclidean distances)’

Parameters
  • idx – indices of atoms to check.

  • cutoff – minimal distance to consider a couple as linkable.

Returns

nx3 numpy array containing, for every valid connection, id of first atom, id of second atom and distance between the two.

get_data(indices=[], columns=[])[source]

Return information about atoms of interest (i.e., slice the data DataFrame)

Parameters
  • indices – list of indices, if not provided all atom data is returned

  • columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned

Returns

numpy array containing a slice of molecule’s data

get_dipole_density(dipole_map, orig, min_val, V, outname, vox_in_window=3.0, eqn='gauss', T=310.15, P=101000.0, epsilonE=54.0, resolution=1.0)[source]

Method to generate an electron density map based on a voxel grid of dipole vectors

Parameters
  • dipole_map – The dipole map input. Can be generated with get_dipole_map above

  • orig – Origin points for voxel grid

  • min_val – Minimum coordinates of edge points for the voxel grid (i.e. a single x, y, z point defining the start point of the grid to match with the multipdb)

  • V – Volume of a voxel (can be found by resolution**3, but left blank in case later version institute a sphere)

  • outname – Name of electron density map file produced

  • vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.)

  • eqn – Equation mode to model the electron density

  • T – Temperature of MD

  • P – Pressure of MD

  • epsilonE – Continuum dielectric surrounding the protein

  • resolution – Desired resolution of voxel

get_dipole_map(orig, pqr, time_start=0, time_end=2, resolution=1.0, vox_in_window=3.0, write_dipole_map=True, fname='dipole_map.tcl')[source]

Method for generating dipole maps to be used for electron density map generation. Also prints a dipole map as a result (and if desired). It calls a cython code in lib.

Parameters
  • orig – Origin points for voxel grid

  • pqr – PQR file for self. Can be generated by calling pdb2pqr above

  • time_start – First frame to parse in multipdb

  • time_end – Last frame to parse in multipdb

  • resolution – Desired resolution of voxel

  • vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.)

  • write_dipole_map – Write a dipole map in TCL format to be read in via VMD.

  • fname – Name of desired dipole map to be written

get_electrostatics(step=1.0, buff=3, threshold=0.01, vdw_kernel_half_width=5, elect_kernel_half_width=12, chain='*', clear_mass=True)[source]

generate electrostatics map from points

Parameters
  • step – size of cubic voxels, in Angstrom

  • buff – padding to add at points cloud boundaries

  • threshold – Threshold used for removing mass occupied space from the electron density map

  • vdw_kernel_half_width – kernel half width, in voxels

  • elect_kernel_half_width – kernel half width, in voxels

  • chain – select chain to use, default all chains

Returns

positive Density object

Returns

negative Density object

Returns

mass density object

get_mass_by_atom(skip_resname=[])[source]

compute protein mass using atoms in pdb

sum the mass of all atoms (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding or editing entries to the molecule’s mass dictionary, e.g. to add the atom “PI” mass in molecule M type: M.knowledge[‘atom_mass’][“PI”]=3.141592

Parameters

skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation.

Returns

mass of molecule in Dalton

get_mass_by_residue(skip_resname=[])[source]

Compute protein mass using residues (i.e. account also for atoms not present in the structure)

Sum the average mass all every residue (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding entries to the molecule’s mass dictionary, e.g. to add the residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42

Parameters

skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation.

Returns

mass of molecule in Dalton

get_pdb_data(index=[])[source]

aggregate data and point coordinates, and return in a unique data structure

Returned data is a list containing strings for points data and floats for point coordinates in the same order as a pdb file, i.e. ATOM/HETATM, index, name, resname, chain name, residue ID, x, y, z, occupancy, beta factor, atomtype.

Returns

list aggregated data and coordinates for every point, as string.

get_secondary_structure(dssp_path='')[source]

compute the protein’s secondary structure, calling DSSP :param dssp_path: DSSP executable (path and filename). If not provided, the default behaviour is to seek for this information in the environment variable DSSPPATH :returns: numpy array of characters, with one-letter-coded secondary sctructure according to DSSP.

get_subset(idxs, conformations=[])[source]

Return a Molecule object containing only the selected atoms and frames

Parameters
  • ixds – atoms to extract

  • conformations – frames to extract (by default, all)

Returns

Molecule object

get_vdw_density(buff=3, step=0.5, kernel_half_width=10)[source]

generate density map from points based on the van der Waals readius of the atoms

Parameters
  • buff – Buffer used to create the boundaries of the density map

  • step – Stepsize for creating the density object

  • kernel_half_width – Kernel half width of the gaussian kernel, will be scaled by atom specific sigma

Returns

Density object, containing a density map

guess_chain_split(distance=3, use_backbone=True)[source]

reassign chain name, using distance cutoff (cannot be undone). If two consecutive atoms are beyond a cutoff, a new chain is assigned.

Parameters
  • distance – distance cutoff distanceR: no atomtype found!

  • use_backbone – if True, splitting will be performed considering backbone atoms (N and C), all atoms in a sequence otherwise

import_gro(filename)[source]

read a gro possibly containing multiple structures.

Parameters

filename – name of .gro file to import

import_pdb(pdb, include_hetatm=False)[source]

read a pdb (possibly containing containing multiple models).

Models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active.

Parameters
  • pdb – PDB filename

  • include_hetatm – if True, HETATM will be included (they get skipped if False)

import_pqr(pqr, include_hetatm=False)[source]

Read a pqr (possibly containing containing multiple models).

models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active.

Parameters
  • pqr – PQR filename

  • include_hetatm – if True, HETATM will be included (they get skipped if False)

know(prop)[source]

return information from knowledge base

Parameters

prop – desired property to extract from knowledge base

Returns

value associated to requested property, or nan if failed

match_residue(M2, sec=5)[source]

Compares two bb.Molecule() peptide strands and returns the resids within both peptides when the two are homogenous beyond a certain secondary structure threashold. The default is 5 amino acids (given by sec) in a row must be identical

Useful when aligning PDB structures that have been crystallised separately - so one may be missing the odd residue or have a few extra at the end.

Parameters
  • M2 – The second bb.Molecule() to compare with

  • sec – Number of consecutive amino acids in a row that must match before resid’s are recorded

pdb2pqr(ff='')[source]

Parses data from the pdb input into a pqr format. This uses the panda dataframe with the information regarding atom indexes, types etc. in the self.data files. It outputs a panda dataframe with the pqr equivilent information. It requires a datafile forcefield input. The default is the amber14sb forcefield file held within the classes/ folder.

Parameters

ff – name of forcefield text file input that needs to be read to read charges / vdw radii.

query(query_text, get_index=False)[source]

Select specific atoms in a multimer un the basis of a text query.

Parameters
  • query_text – string selecting atoms of interest. Uses the pandas query syntax, can access all columns in the dataframe self.data.

  • get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points (in a unique array) and, if get_index is set to true, a list of their indices in subunits’ self.points array.

rmsf_from_beta_factor(indices=[])[source]

calculate RMSF from atoms beta factors.

Parameters

indices – indices of atoms of interest. If not set all atoms will be considered.

s2(atomname1='N', atomname2='H')[source]

compute s2, given two atoms defining the vector of interest.

Parameters
  • atomname1 – name of the first atom

  • atomname2 – name of the second atom

Returns

data numpy array containing information about residues for which measuring has been performed (i.e.[chain, resid])

Returns

s2 s2 of residues for which both provided input atoms have been found

same_residue(index, get_index=False)[source]

Select atoms having the same residue and chain as a given atom (or list of atoms)

Parameters
  • indices (index) – of atoms of choice (integer of list of integers)

  • get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

same_residue_unique(index, get_index=False)[source]

Select atoms having the same residue and chain as a given atom (or list of atoms)

Parameters
  • index – indices of atoms of choice (integer of list of integers)

  • get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data)

Returns

coordinates of the selected points and, if get_index is set to true, their indices in self.points array.

set_data(value, indices=[], columns=[])[source]

Return information about atoms of interest (i.e., slice the data DataFrame)

Parameters
  • indices – list of indices, if not provided all atom data is returned

  • columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned

Returns

numpy array containing a slice of molecule’s data

write_gro(outname, conformations=[], index='')[source]

write structure(s) in .gro format.

Parameters
  • outname – name of .gro file to be generated.

  • index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]

  • conformations – list of conformation indices to write to file. By default, all conformations will be returned.

write_pdb(outname, conformations=[], index=[])[source]

overload superclass method for writing (multi)pdb.

Parameters
  • outname – name of pdb file to be generated.

  • index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]

  • conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced.

write_pqr(outname, conformations=[], index=[])[source]

overload superclass method for writing (multi)pqr.

Parameters
  • outname – name of pqr file to be generated.

  • index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1]

  • conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced.

Convex Point Clouds

The following classes allow generating clouds of points arranged according to specific (convex) geometries. All these classes are subclass of Structure.

class Cone(r, h, skew=0, radius=1.1, pts_density_r=0.09817477042468103, pts_density_h=0.2)[source]

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a cone.

Parameters
  • r – radius

  • h – height

  • skew – skewing with respect of vertical axis

  • radius – size of the individual points composing it

  • pts_density_r – density of points along the rotation axis (using parametric function for cylinder)

  • pts_density_h – density of points along height (using parametric function for cylinder)

ccs(gas=1)[source]

compute cone CCS (use analytical solution using surface and gas effect)

Returns

CCS in A^2

get_surface()[source]

compute cone surface.

Returns

surface in A^2

get_volume()[source]

Compute cone volume.

Returns

volume in A^3

class Cylinder(r, h, squeeze=1.0, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as an elliptical cylinder.

Parameters
  • r – radius

  • h – height

  • squeeze – create an elliptical base, having axes equal to r and squeeze*r

  • skew – skewing with respect of vertical axis

  • radius – size of the individual points composing it

  • density (pts_density_h) – of points along the u angle (using parametric function for cylinder)

  • density – of points along the v angle (using parametric function for cylinder)

ccs(gas=1)[source]

Compute cylinder CCS.

Uses Ramanujan approximation for base perimeter. Good, but not perfect for very elliptical cylinders!

Returns

CCS in A^2

get_surface()[source]

Compute cylinder surface.

Uses Ramanujan approximation for base perimeter. Good, but not perfect for very elliptical cylinders!

Returns

surface in A^2

get_volume()[source]

Compute cylinder volume.

Returns

volume in A^3

class Ellipsoid(a, b, c, radius=1.9, pts_density_u=0.08726646259971647, pts_density_v=0.08726646259971647)[source]

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as an ellipsoid.

Parameters
  • a – x radius of the ellipsoid

  • b – y radius of the ellipsoid

  • c – z radius of the ellipsoid

  • radius – size of the individual points composing it

  • pts_density_u – This parameter defines the density of points along the u angle (using parametric function for ellipsoid)

  • pts_density_v – This parameter defines the density of points along the v angle (using parametric function for ellipsoid)

ccs(gas=1)[source]

compute ellipsoid CCS.

Uses analytical approximation to surface area.

Returns

surface in A^2

check_inclusion(p)[source]

count how many points in the array p are included in the ellipsoid.

overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things)

Parameters

p – list of points (numpy array)

Returns

quantity of points located inside the ellipsoid

get_sphericity()[source]

compute ellipsoid sphericity.

Returns

ellipsoid sphericity

get_surface()[source]

compute ellipsoid surface.

Note: using analytical value, uses analytical approximation to surface area

Returns

surface in A^2

get_volume()[source]

compute ellipsoid volume.

Returns

volume in A^3

class Prism(r, h, n, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a prism (polygonal bottom and top, rectangular sides).

Parameters
  • r – distance of sides from center of symmetry

  • h – height

  • n – number of side faces

  • skew – skewing with respect of vertical axis

  • radius – size of the individual points composing it

  • pts_density_u – density of points along the r axis (polar coordinates)

  • pts_density_h – the density of points along the vertical axis

ccs(gas=1)[source]

compute prism CCS.

Returns

CCS in A^2

get_surface()[source]

compute prism surface.

Returns

surface in A^2

get_volume()[source]

compute prism volume.

Returns

volume in A^3

class Sphere(r, radius=1.9, n_sphere_point=960)[source]

Bases: biobox.classes.structure.Structure

Create an ensemble of points arranged as a sphere.

using golden spiral to approximate an even distribution

Parameters
  • r – radius of the ellipsoid

  • radius – size of the individual points composing it

  • n_sphere_point – This parameter defines the amount of points in the sphere

ccs(gas=1)[source]

compute spheroid CCS.

Uses analytical approximation to surface area.

Returns

surface in A^2

check_inclusion(p)[source]

count how many points in the array p are included in the sphere.

overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things)

Parameters

p – list of points (numpy array)

Returns

quantity of points located inside the sphere

get_sphericity()[source]

compute sphericity (makes sense only for squeezed spheres, obviously..)

Returns

shape sphericity

get_surface()[source]

compute sphere surface.

Returns

surface in A^2

get_volume()[source]

compute ellipsoid volume.

Returns

volume in A^3

squeeze(deformation, preserve_volume=True)[source]

squeeze sphere according to deformation coefficient(s)

Parameters
  • deformation – deformation coefficient. Can be a float (deformation of one axis), or a list of 2 or 3 floats.

  • preserve_volume – If true and deformation is either a float or a list of 2 floats, correct remaining axes to preserve volume.

Density

class Density[source]

Bases: biobox.classes.structure.Structure

Subclass of Structure, allows importing density map, and transform them in a PDB file containing a collection of spheres placed on the map’s high density regions.

A density map is fully described by the following attributes, stored in the self.properties dictionary:

Parameters
  • density – density map

  • delta – scaling factor for voxels (default is [1, 1, 1] Angstrom)

  • size – dimensions in voxels

  • origin – bottom-left-front corner of the cube

  • radius – radius of points composing the density map

  • format – format name (only dx supported at the moment)

best_threshold(mass, density=0.782878356)[source]

If mass and density of object are known, try to filter the map so that the mass is best matched.

search for best threshold using bissection method.

Note

in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used.

Note

1 Da/A^3=1.660538946 g/cm^3

Note

1 g/cm^3=0.602214120 Da/A^3

Parameters
  • mass – target mass in Da

  • density – target density in Da/A^3

Returns

array reporting tested values and error on mass ([sigma, model_mass-target_mass])

blur(dimension=5, sigma=0.5)[source]

blur density applying a cubic gaussian kernel of given kernel dimension (cubic grid size).

Warning

cannot be undone

Parameters
  • dimension – size of the kernel grid.

  • sigma – standard deviation of gaussian kernel.

export_as_pdb(outname, step, threshold=0.1)[source]

Write a pdb file with points where the density exceeds a threshold

Parameters
  • outname – output file name

  • step – stepsize used to generate the density map

  • threshold – density to be exceeded to generate a point in pdb

find_data_from_ccs(ccs)[source]

map experimental data to given ccs

Parameters

ccs – target CCS (in A^2)

find_data_from_sigma(sigma, exact=True, append=False, noise_filter=0.01)[source]

map experimental data to given threshold

Parameters
  • sigma – density threshold

  • noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1.

find_data_from_volume(vol)[source]

map experimental data to given volume

Parameters

vol – volume

get_oversampled_points(sigma=0)[source]

return points obtained by oversampling the map (doule points on every axis)

Parameters

sigma – place points only on voxels having intensity greater than threshold

Returns

points 3D points placed on voxels having value higher than threshold

Returns

radius radius of produced points

get_sigma_from_thresh(t)[source]

convert cutoff value from actual threshold to sigma multiple

:param threshold value :returns sigma multiple

get_thresh_from_sigma(val)[source]

convert cutoff value from sigma multiples into actual threshold

Parameters

val – sigma scaling

Returns

cutoff

get_volume()[source]

compute density map volume. This is done by counting the points, and multiplying that by voxels’ volume.

Warning

can be called only after place_points has been called.

Warning

supposes unskewed voxels.

import_map(filename, fileformat='dx')[source]

Import density map and fill up the points and properties data structures.

Parameters
  • filename – name of density file to load

  • fileformat – at the moment supports dx, ccp4, mrc and imod

import_numpy(data, origin=[0, 0, 0], delta=array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]))[source]

import a numpy 3D array to allow manipulation as a density map

Parameters
  • data – numpy 3D array

  • origin – coordinates of bottom left corner of the map

  • delta – voxels’ shape (default is a cubic voxel of 1 Angstrom-long sides).

place_points(sigma=0, noise_filter=0.01)[source]

given density information, place points on every voxel above a given threshold.

Parameters
  • sigma – intensity threshold value.

  • noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1.

return_density_map()[source]
Returns

density map as 3D numpy array

scan_threshold(mass, density=0.782878356, sampling_points=1000)[source]

if mass and density of object are known, filter the map on a linear scale of threshold values, and compare the obtained mass to the experimental one.

Note

in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used.

Note

1 Da/A^3=0.602214120 g/cm^3

Parameters
  • mass – target mass in Da

  • density – target density in Da/A^3

  • sampling_points – number of measures to perform between min and max intensity in density map

Returns

array reporting tested values and error on mass ([threshold, model_mass-target_mass])

threshold_vol_ccs(low='', high='', sampling_points=1000, append=False, noise_filter=0.01)[source]

return the volume to threshold to CCS relationship

Parameters

sampling_points – number of measures to perform between min and max intensity in density map

Returns

array reporting tested values and error on mass ([threshold, model_mass-target_mass])

write_dx(fname='dens.dx')[source]

Write a density map in DX format

Parameters

fname – output file name