Single Structures¶
The main data structure in BiobOx is the Structure class, which handles collections of 3D points.
Points are stored in a MxNx3 coordinates array, where M is the number of alternative points arrangements, and N is the amount of points.
At any moment, one of the loaded points conformations is considered to be the active one (a.k.a. current).
Any rototranslation or measuring operation will be performed on the current structure only.
Some methods, e.g. rmsd allow comparing different conformations, independently from which is the current one.
The current conformation in the coordinates array can be changed by calling the set_current method (altering the Structure’s current attribute).
For comfort, the current conformation is accessible in the points Nx3 array, where:
>>> self.points = self.coordinates[self.current]
Points attributes are stored in a pandas dataframe. For instance, self.data[“radius”] contains each point’s radius. A property of the whole point cloud can be stored in a properties dictionary (e.g. self.properties[“center”]).
Several Structure subclasses are available (Molecule, Ellipsoid, Cylinder, Cone, Sphere, Prism, Density, see below).
Structure¶
- class Structure(p=array([], shape=(2, 0), dtype=float64), r=1.0)[source]¶
- A Structure consists of an ensemble of points in 3D space, and metadata associated to each of them. - Point coordinates and properties data structures are first initialized. properties is a dictionary initially containing an entry for ‘center’ (center of geometry) and ‘radius’ (average radius of points). - Parameters
- p – coordinates data structure as a mxnx3 numpy array (alternative conformation x atom x 3D coordinate). nx3 numpy array can be supplied, in case a single conformation is present. 
- r – average radius of every point in dataset (float), or radius of every point (numpy array) 
 
 - add_xyz(coords)[source]¶
- add a new alternative conformation to the database - Parameters
- coords – array of 3D points, or array of arrays of 3D points (in case multiple alternative coordinates must be added at the same time) 
 
 - align_axes()[source]¶
- Align structure on its principal axes. - First principal axis aligned along x, second along y and third along z. 
 - apply_transformation(M)[source]¶
- apply a 3x3 transformation matrix - Parameters
- M – 3x3 transformation matrix (2D numpy array) 
 
 - convex_hull()[source]¶
- compute Structure’s convex Hull using QuickHull algorithm. - Note - Qhull available only on scipy >=0.12 - Returns
- Structureobject, containing the coordinates of vertices composing the convex hull
 
 - coordinates¶
- numpy array containing an ensemble of alternative coordinates in 3D space 
 - current¶
- index of currently selected conformation 
 - data¶
- metadata about each atom (pandas Dataframe) 
 - delete_xyz(index)[source]¶
- remove one conformation from the conformations database. - the new current conformation will be the previous one. - Parameters
- index – alternative coordinates set to remove 
 
 - get_center()[source]¶
- compute protein center of geometry (also assigns it to self.properties[“center”] key). 
 - get_density(step=1.0, sigma=1.0, kernel_half_width=5, buff=3)[source]¶
- generate density map from points - Parameters
- step – size of cubic voxels, in Angstrom 
- sigma – gaussian kernel sigma 
- kernel_half_width – kernel half width, in voxels 
- buff – padding to add at points cloud boundaries 
 
- Returns
- Densityobject, containing a simulated density map
 
 - get_principal_axes()[source]¶
- compute Structure’s principal axes. - Returns
- 3x3 numpy array, containing the 3 principal axes ranked from smallest to biggest. 
 
 - get_xyz(indices=[])[source]¶
- get points coordinates. - Parameters
- indices – indices of points to select. If none is provided, all points coordinates are returned. 
- Returns
- coordinates of all points indexed by the provided indices list, or all of them if no list is provided. 
 
 - pca(components, indices=[])[source]¶
- compute Principal Components Analysis (PCA) on specific points within all the alternative coordinates. - Parameters
- components – eigenspace dimensions 
- indices – points indices to be considered for PCA 
 
- Returns
- numpy array of projection of each conformation into the n-dimensional eigenspace 
- Returns
- sklearn PCA object 
 
 - points¶
- pointer to currently selected conformation 
 - properties¶
- collection of properties. By default, ‘center’ (geometric center of the Structure) is defined 
 - rmsd(i, j, points_index=[], full=False)[source]¶
- Calculate the RMSD between two structures in alternative coordinates ensemble. uses Kabsch alignement algorithm. - Parameters
- i – index of the first structure 
- j – index of the second structure 
- points_index – if set, only specific points will be considered for comparison 
- full – if True, RMSD an rotation matrx are returned, RMSD only otherwise 
 
- Returns
- RMSD of the two structures. If full is True, the rotation matrix is also returned 
 
 - rmsd_distance_matrix(points_index=[], flat=False)[source]¶
- compute distance matrix between structures (using RMSD as metric). - Parameters
- points_index – if set, only specific points will be considered for comparison 
- flat – if True, returns flattened distance matrix 
 
- Returns
- RMSD distance matrix 
 
 - rmsd_one_vs_all(ref_index, points_index=[], align=False)[source]¶
- Calculate the RMSD between all structures with respect of a reference structure. uses Kabsch alignement algorithm. - Parameters
- ref_index – index of reference structure in conformations database 
- points_index – if set, only specific points will be considered for comparison 
- align – if set to true, all conformations will be aligned to reference (note: cannot be undone!) 
 
- Returns
- RMSD of all structures with respect of reference structure (in a numpy array) 
 
 - rmsf(indices=- 1, step=1)[source]¶
- compute Root Mean Square Fluctuation (RMSF) of selected atoms. - Parameters
- indices – indices of points for which RMSF will be calculated. If no indices list is provided, RMSF of all points will be calculated. 
- step – timestep between two conformations (useful when using conformations extracted from molecular dynamics) 
 
- Returns
- numpy aray with RMSF of all provided indices, in the same order 
 
 - rotate(x, y, z)[source]¶
- rotate the structure provided angles of rotation around x, y and z axes (in degrees). - This is a rotation with respect of the origin. Make sure that the center of your structure is at the origin, if you don’t want to get a translation as well! rotating an object being not centered requires to first translate the ellipsoid at the origin, rotate it, and bringing it back. - Parameters
- x – rotation around x axis 
- y – rotation around y axis 
- z – rotation around z axis 
 
 
 - rotation_matrix(axis, theta)[source]¶
- compute matrix needed to rotate the system around an arbitrary axis (using Euler-Rodrigues formula). - Parameters
- axis – 3d vector (numpy array), representing the axis around which to rotate 
- theta – desired rotation angle 
 
- Returns
- 3x3 rotation matrix 
 
 - set_current(pos)[source]¶
- select current frame (place frame pointer at desired position) - Parameters
- pos – number of alternative conformation (starting from 0) 
 
 
Molecule¶
- class Molecule[source]¶
- Bases: - biobox.classes.structure.Structure- Subclass of - Structure, allows reading, manipulating and analyzing molecular structures.- At instantiation, properties associated to every individual atoms are stored in a pandas Dataframe self.data. The columns of the self.data have the following names: atom, index, name, resname, chain, resid, beta, occupancy, atomtype, radius, charge. - self.knowledge contains a knowledge base about atoms and residues properties. Default values are: - ‘residue_mass’ property stores the average mass for most common aminoacids (values from Expasy website) 
- ‘atom_vdw’ vdw radius of common atoms 
- ‘atom_mass’ mass of common atoms 
 - The knowledge base can be edited. For instance, to add information about residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42 - apply_biomatrix()[source]¶
- if biomatrix information is provided, generate a new molecule with all symmetry operators applied. - Returns
- new Molecule containing several copies of the current Molecule, arranged according to BIOMT statements contained in pdb, or -1 if no transformation matrix is provided 
 
 - apply_symmetry()[source]¶
- if symmetry information is provided, generate a new molecule with all symmetry operators applied. - Returns
- new Molecule containing several copies of the current Molecule, arranged according to SMTRY statements contained in pdb, or -1 if no transformation matrix is provided 
 
 - atomignore(chain, res, atom, get_index=False, use_resname=False)[source]¶
- Select specific atoms that do not match a specific query (chain, residue ID and atom name). Useful to remove from a molecule atoms unwanted for further analysis, alternative conformations, etc… - Parameters
- chain – chain name (accepts * as wildcard). Can also be a list or numpy array of strings. 
- res – residue ID (accepts * as wildcard). Can also be a list or numpy array of of int. 
- atom – atom name (accepts * as wildcard). Can also be a list or numpy array of strings. 
- get_index – if set to True, returns the indices of atoms in self.points array (and self.data) 
- use_resname – if set to True, consider information in “res” variable as resnames, and not resids 
 
- Returns
- coordinates of the selected points not matching the query, if get_index is set to true, their indices in self.points array. 
 
 - atomselect(chain, res, atom, get_index=False, use_resname=False)[source]¶
- Select specific atoms in the protein providing chain, residue ID and atom name. - Parameters
- chain – selection of a specific chain name (accepts * as wildcard). Can also be a list or numpy array of strings. 
- res – residue ID of desired atoms (accepts * as wildcard). Can also be a list or numpy array of of int. 
- atom – name of desired atom (accepts * as wildcard). Can also be a list or numpy array of strings. 
- get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data) 
- use_resname – if set to True, consider information in “res” variable as resnames, and not resids 
 
- Returns
- coordinates of the selected points and, if get_index is set to true, their indices in self.points array. 
 
 - beta_factor_from_rmsf(indices=- 1, step=1)[source]¶
- estimate atoms beta factor on the base of their RMSF. - Parameters
- indices – indices of atoms of interest. If not set all atoms will be considered. 
- step – timestep between two conformations (useful when using conformations extracted from molecular dynamics) 
 
 
 - get_atoms_ccs()[source]¶
- return array with atomic CCS radii of every atom in molecule - Returns
- CCS in Angstrom^2 
 
 - get_couples(idx, cutoff)[source]¶
- given a list of indices, compute the all-vs-all distance and return only couples below a given cutoff distance - useful for the detection of disulfide bridges or linkable sites via cross-linking (approximation, supposing euclidean distances)’ - Parameters
- idx – indices of atoms to check. 
- cutoff – minimal distance to consider a couple as linkable. 
 
- Returns
- nx3 numpy array containing, for every valid connection, id of first atom, id of second atom and distance between the two. 
 
 - get_data(indices=[], columns=[])[source]¶
- Return information about atoms of interest (i.e., slice the data DataFrame) - Parameters
- indices – list of indices, if not provided all atom data is returned 
- columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned 
 
- Returns
- numpy array containing a slice of molecule’s data 
 
 - get_dipole_density(dipole_map, orig, min_val, V, outname, vox_in_window=3.0, eqn='gauss', T=310.15, P=101000.0, epsilonE=54.0, resolution=1.0)[source]¶
- Method to generate an electron density map based on a voxel grid of dipole vectors - Parameters
- dipole_map – The dipole map input. Can be generated with get_dipole_map above 
- orig – Origin points for voxel grid 
- min_val – Minimum coordinates of edge points for the voxel grid (i.e. a single x, y, z point defining the start point of the grid to match with the multipdb) 
- V – Volume of a voxel (can be found by resolution**3, but left blank in case later version institute a sphere) 
- outname – Name of electron density map file produced 
- vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.) 
- eqn – Equation mode to model the electron density 
- T – Temperature of MD 
- P – Pressure of MD 
- epsilonE – Continuum dielectric surrounding the protein 
- resolution – Desired resolution of voxel 
 
 
 - get_dipole_map(orig, pqr, time_start=0, time_end=2, resolution=1.0, vox_in_window=3.0, write_dipole_map=True, fname='dipole_map.tcl')[source]¶
- Method for generating dipole maps to be used for electron density map generation. Also prints a dipole map as a result (and if desired). It calls a cython code in lib. - Parameters
- orig – Origin points for voxel grid 
- pqr – PQR file for self. Can be generated by calling pdb2pqr above 
- time_start – First frame to parse in multipdb 
- time_end – Last frame to parse in multipdb 
- resolution – Desired resolution of voxel 
- vox_in_window – Amount of surrounding space to contribute to local dipole. vox_in_window * resolution gives window size (in Ang.) 
- write_dipole_map – Write a dipole map in TCL format to be read in via VMD. 
- fname – Name of desired dipole map to be written 
 
 
 - get_electrostatics(step=1.0, buff=3, threshold=0.01, vdw_kernel_half_width=5, elect_kernel_half_width=12, chain='*', clear_mass=True)[source]¶
- generate electrostatics map from points - Parameters
- step – size of cubic voxels, in Angstrom 
- buff – padding to add at points cloud boundaries 
- threshold – Threshold used for removing mass occupied space from the electron density map 
- vdw_kernel_half_width – kernel half width, in voxels 
- elect_kernel_half_width – kernel half width, in voxels 
- chain – select chain to use, default all chains 
 
- Returns
- positive - Densityobject
- Returns
- negative - Densityobject
- Returns
- mass density object 
 
 - get_mass_by_atom(skip_resname=[])[source]¶
- compute protein mass using atoms in pdb - sum the mass of all atoms (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding or editing entries to the molecule’s mass dictionary, e.g. to add the atom “PI” mass in molecule M type: M.knowledge[‘atom_mass’][“PI”]=3.141592 - Parameters
- skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation. 
- Returns
- mass of molecule in Dalton 
 
 - get_mass_by_residue(skip_resname=[])[source]¶
- Compute protein mass using residues (i.e. account also for atoms not present in the structure) - Sum the average mass all every residue (using a knowledge base of atom masses in Dalton) The knowledge base can be expanded or edited by adding entries to the molecule’s mass dictionary, e.g. to add the residue “TST” mass in molecule M type: M.knowledge[‘mass_residue’][“TST”]=142.42 - Parameters
- skip_resname – list of resnames to skip. Useful to exclude ions water or other ligands from the calculation. 
- Returns
- mass of molecule in Dalton 
 
 - get_pdb_data(index=[])[source]¶
- aggregate data and point coordinates, and return in a unique data structure - Returned data is a list containing strings for points data and floats for point coordinates in the same order as a pdb file, i.e. ATOM/HETATM, index, name, resname, chain name, residue ID, x, y, z, occupancy, beta factor, atomtype. - Returns
- list aggregated data and coordinates for every point, as string. 
 
 - get_secondary_structure(dssp_path='')[source]¶
- compute the protein’s secondary structure, calling DSSP :param dssp_path: DSSP executable (path and filename). If not provided, the default behaviour is to seek for this information in the environment variable DSSPPATH :returns: numpy array of characters, with one-letter-coded secondary sctructure according to DSSP. 
 - get_subset(idxs, conformations=[])[source]¶
- Return a - Moleculeobject containing only the selected atoms and frames- Parameters
- ixds – atoms to extract 
- conformations – frames to extract (by default, all) 
 
- Returns
- Moleculeobject
 
 - get_vdw_density(buff=3, step=0.5, kernel_half_width=10)[source]¶
- generate density map from points based on the van der Waals readius of the atoms - Parameters
- buff – Buffer used to create the boundaries of the density map 
- step – Stepsize for creating the density object 
- kernel_half_width – Kernel half width of the gaussian kernel, will be scaled by atom specific sigma 
 
- Returns
- Densityobject, containing a density map
 
 - guess_chain_split(distance=3, use_backbone=True)[source]¶
- reassign chain name, using distance cutoff (cannot be undone). If two consecutive atoms are beyond a cutoff, a new chain is assigned. - Parameters
- distance – distance cutoff distanceR: no atomtype found! 
- use_backbone – if True, splitting will be performed considering backbone atoms (N and C), all atoms in a sequence otherwise 
 
 
 - import_gro(filename)[source]¶
- read a gro possibly containing multiple structures. - Parameters
- filename – name of .gro file to import 
 
 - import_pdb(pdb, include_hetatm=False)[source]¶
- read a pdb (possibly containing containing multiple models). - Models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active. - Parameters
- pdb – PDB filename 
- include_hetatm – if True, HETATM will be included (they get skipped if False) 
 
 
 - import_pqr(pqr, include_hetatm=False)[source]¶
- Read a pqr (possibly containing containing multiple models). - models are split according to ENDMDL and END statement. All alternative coordinates are expected to have the same atoms. After loading, the first model (M.current_model=0) will be set as active. - Parameters
- pqr – PQR filename 
- include_hetatm – if True, HETATM will be included (they get skipped if False) 
 
 
 - know(prop)[source]¶
- return information from knowledge base - Parameters
- prop – desired property to extract from knowledge base 
- Returns
- value associated to requested property, or nan if failed 
 
 - match_residue(M2, sec=5)[source]¶
- Compares two bb.Molecule() peptide strands and returns the resids within both peptides when the two are homogenous beyond a certain secondary structure threashold. The default is 5 amino acids (given by sec) in a row must be identical - Useful when aligning PDB structures that have been crystallised separately - so one may be missing the odd residue or have a few extra at the end. - Parameters
- M2 – The second bb.Molecule() to compare with 
- sec – Number of consecutive amino acids in a row that must match before resid’s are recorded 
 
 
 - pdb2pqr(ff='')[source]¶
- Parses data from the pdb input into a pqr format. This uses the panda dataframe with the information regarding atom indexes, types etc. in the self.data files. It outputs a panda dataframe with the pqr equivilent information. It requires a datafile forcefield input. The default is the amber14sb forcefield file held within the classes/ folder. - Parameters
- ff – name of forcefield text file input that needs to be read to read charges / vdw radii. 
 
 - query(query_text, get_index=False)[source]¶
- Select specific atoms in a multimer un the basis of a text query. - Parameters
- query_text – string selecting atoms of interest. Uses the pandas query syntax, can access all columns in the dataframe self.data. 
- get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data) 
 
- Returns
- coordinates of the selected points (in a unique array) and, if get_index is set to true, a list of their indices in subunits’ self.points array. 
 
 - rmsf_from_beta_factor(indices=[])[source]¶
- calculate RMSF from atoms beta factors. - Parameters
- indices – indices of atoms of interest. If not set all atoms will be considered. 
 
 - s2(atomname1='N', atomname2='H')[source]¶
- compute s2, given two atoms defining the vector of interest. - Parameters
- atomname1 – name of the first atom 
- atomname2 – name of the second atom 
 
- Returns
- data numpy array containing information about residues for which measuring has been performed (i.e.[chain, resid]) 
- Returns
- s2 s2 of residues for which both provided input atoms have been found 
 
 - same_residue(index, get_index=False)[source]¶
- Select atoms having the same residue and chain as a given atom (or list of atoms) - Parameters
- indices (index) – of atoms of choice (integer of list of integers) 
- get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data) 
 
- Returns
- coordinates of the selected points and, if get_index is set to true, their indices in self.points array. 
 
 - same_residue_unique(index, get_index=False)[source]¶
- Select atoms having the same residue and chain as a given atom (or list of atoms) - Parameters
- index – indices of atoms of choice (integer of list of integers) 
- get_index – if set to True, returns the indices of selected atoms in self.points array (and self.data) 
 
- Returns
- coordinates of the selected points and, if get_index is set to true, their indices in self.points array. 
 
 - set_data(value, indices=[], columns=[])[source]¶
- Return information about atoms of interest (i.e., slice the data DataFrame) - Parameters
- indices – list of indices, if not provided all atom data is returned 
- columns – list of columns (e.g. [“resname”, “resid”, “chain”]), if not provided all columns are returned 
 
- Returns
- numpy array containing a slice of molecule’s data 
 
 - write_gro(outname, conformations=[], index='')[source]¶
- write structure(s) in .gro format. - Parameters
- outname – name of .gro file to be generated. 
- index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1] 
- conformations – list of conformation indices to write to file. By default, all conformations will be returned. 
 
 
 - write_pdb(outname, conformations=[], index=[])[source]¶
- overload superclass method for writing (multi)pdb. - Parameters
- outname – name of pdb file to be generated. 
- index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1] 
- conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced. 
 
 
 - write_pqr(outname, conformations=[], index=[])[source]¶
- overload superclass method for writing (multi)pqr. - Parameters
- outname – name of pqr file to be generated. 
- index – indices of atoms to write to file. If empty, all atoms are returned. Index values obtaineable with a call like: index=molecule.atomselect(“A”, [1, 2, 3], “CA”, True)[1] 
- conformations – list of conformation indices to write to file. By default, a multipdb with all conformations will be produced. 
 
 
 
Convex Point Clouds¶
The following classes allow generating clouds of points arranged according to specific (convex) geometries.
All these classes are subclass of Structure.
- class Cone(r, h, skew=0, radius=1.1, pts_density_r=0.09817477042468103, pts_density_h=0.2)[source]¶
- Bases: - biobox.classes.structure.Structure- Create an ensemble of points arranged as a cone. - Parameters
- r – radius 
- h – height 
- skew – skewing with respect of vertical axis 
- radius – size of the individual points composing it 
- pts_density_r – density of points along the rotation axis (using parametric function for cylinder) 
- pts_density_h – density of points along height (using parametric function for cylinder) 
 
 
- class Cylinder(r, h, squeeze=1.0, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]¶
- Bases: - biobox.classes.structure.Structure- Create an ensemble of points arranged as an elliptical cylinder. - Parameters
- r – radius 
- h – height 
- squeeze – create an elliptical base, having axes equal to r and squeeze*r 
- skew – skewing with respect of vertical axis 
- radius – size of the individual points composing it 
- density (pts_density_h) – of points along the u angle (using parametric function for cylinder) 
- density – of points along the v angle (using parametric function for cylinder) 
 
 - ccs(gas=1)[source]¶
- Compute cylinder CCS. - Uses Ramanujan approximation for base perimeter. Good, but not perfect for very elliptical cylinders! - Returns
- CCS in A^2 
 
 
- class Ellipsoid(a, b, c, radius=1.9, pts_density_u=0.08726646259971647, pts_density_v=0.08726646259971647)[source]¶
- Bases: - biobox.classes.structure.Structure- Create an ensemble of points arranged as an ellipsoid. - Parameters
- a – x radius of the ellipsoid 
- b – y radius of the ellipsoid 
- c – z radius of the ellipsoid 
- radius – size of the individual points composing it 
- pts_density_u – This parameter defines the density of points along the u angle (using parametric function for ellipsoid) 
- pts_density_v – This parameter defines the density of points along the v angle (using parametric function for ellipsoid) 
 
 - ccs(gas=1)[source]¶
- compute ellipsoid CCS. - Uses analytical approximation to surface area. - Returns
- surface in A^2 
 
 - check_inclusion(p)[source]¶
- count how many points in the array p are included in the ellipsoid. - overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things) - Parameters
- p – list of points (numpy array) 
- Returns
- quantity of points located inside the ellipsoid 
 
 
- class Prism(r, h, n, skew=0.0, radius=1.1, pts_density_u=0.09817477042468103, pts_density_h=0.2)[source]¶
- Bases: - biobox.classes.structure.Structure- Create an ensemble of points arranged as a prism (polygonal bottom and top, rectangular sides). - Parameters
- r – distance of sides from center of symmetry 
- h – height 
- n – number of side faces 
- skew – skewing with respect of vertical axis 
- radius – size of the individual points composing it 
- pts_density_u – density of points along the r axis (polar coordinates) 
- pts_density_h – the density of points along the vertical axis 
 
 
- class Sphere(r, radius=1.9, n_sphere_point=960)[source]¶
- Bases: - biobox.classes.structure.Structure- Create an ensemble of points arranged as a sphere. - using golden spiral to approximate an even distribution - Parameters
- r – radius of the ellipsoid 
- radius – size of the individual points composing it 
- n_sphere_point – This parameter defines the amount of points in the sphere 
 
 - ccs(gas=1)[source]¶
- compute spheroid CCS. - Uses analytical approximation to surface area. - Returns
- surface in A^2 
 
 - check_inclusion(p)[source]¶
- count how many points in the array p are included in the sphere. - overloading of superclass function, which is slower (here we can use the ellipsoid functional form to speed up things) - Parameters
- p – list of points (numpy array) 
- Returns
- quantity of points located inside the sphere 
 
 - get_sphericity()[source]¶
- compute sphericity (makes sense only for squeezed spheres, obviously..) - Returns
- shape sphericity 
 
 - squeeze(deformation, preserve_volume=True)[source]¶
- squeeze sphere according to deformation coefficient(s) - Parameters
- deformation – deformation coefficient. Can be a float (deformation of one axis), or a list of 2 or 3 floats. 
- preserve_volume – If true and deformation is either a float or a list of 2 floats, correct remaining axes to preserve volume. 
 
 
 
Density¶
- class Density[source]¶
- Bases: - biobox.classes.structure.Structure- Subclass of - Structure, allows importing density map, and transform them in a PDB file containing a collection of spheres placed on the map’s high density regions.- A density map is fully described by the following attributes, stored in the self.properties dictionary: - Parameters
- density – density map 
- delta – scaling factor for voxels (default is [1, 1, 1] Angstrom) 
- size – dimensions in voxels 
- origin – bottom-left-front corner of the cube 
- radius – radius of points composing the density map 
- format – format name (only dx supported at the moment) 
 
 - best_threshold(mass, density=0.782878356)[source]¶
- If mass and density of object are known, try to filter the map so that the mass is best matched. - search for best threshold using bissection method. - Note - in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used. - Note - 1 Da/A^3=1.660538946 g/cm^3 - Note - 1 g/cm^3=0.602214120 Da/A^3 - Parameters
- mass – target mass in Da 
- density – target density in Da/A^3 
 
- Returns
- array reporting tested values and error on mass ([sigma, model_mass-target_mass]) 
 
 - blur(dimension=5, sigma=0.5)[source]¶
- blur density applying a cubic gaussian kernel of given kernel dimension (cubic grid size). - Warning - cannot be undone - Parameters
- dimension – size of the kernel grid. 
- sigma – standard deviation of gaussian kernel. 
 
 
 - export_as_pdb(outname, step, threshold=0.1)[source]¶
- Write a pdb file with points where the density exceeds a threshold - Parameters
- outname – output file name 
- step – stepsize used to generate the density map 
- threshold – density to be exceeded to generate a point in pdb 
 
 
 - find_data_from_ccs(ccs)[source]¶
- map experimental data to given ccs - Parameters
- ccs – target CCS (in A^2) 
 
 - find_data_from_sigma(sigma, exact=True, append=False, noise_filter=0.01)[source]¶
- map experimental data to given threshold - Parameters
- sigma – density threshold 
- noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1. 
 
 
 - get_oversampled_points(sigma=0)[source]¶
- return points obtained by oversampling the map (doule points on every axis) - Parameters
- sigma – place points only on voxels having intensity greater than threshold 
- Returns
- points 3D points placed on voxels having value higher than threshold 
- Returns
- radius radius of produced points 
 
 - get_sigma_from_thresh(t)[source]¶
- convert cutoff value from actual threshold to sigma multiple - :param threshold value :returns sigma multiple 
 - get_thresh_from_sigma(val)[source]¶
- convert cutoff value from sigma multiples into actual threshold - Parameters
- val – sigma scaling 
- Returns
- cutoff 
 
 - get_volume()[source]¶
- compute density map volume. This is done by counting the points, and multiplying that by voxels’ volume. - Warning - can be called only after - place_pointshas been called.- Warning - supposes unskewed voxels. 
 - import_map(filename, fileformat='dx')[source]¶
- Import density map and fill up the points and properties data structures. - Parameters
- filename – name of density file to load 
- fileformat – at the moment supports dx, ccp4, mrc and imod 
 
 
 - import_numpy(data, origin=[0, 0, 0], delta=array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]))[source]¶
- import a numpy 3D array to allow manipulation as a density map - Parameters
- data – numpy 3D array 
- origin – coordinates of bottom left corner of the map 
- delta – voxels’ shape (default is a cubic voxel of 1 Angstrom-long sides). 
 
 
 - place_points(sigma=0, noise_filter=0.01)[source]¶
- given density information, place points on every voxel above a given threshold. - Parameters
- sigma – intensity threshold value. 
- noise_filter – launch DBSCAN clustering algorithm to detect connected regions in density map. Regions representing less than noise_filter of the total will be removed. This is a ratio, value should be between 0 and 1. 
 
 
 - scan_threshold(mass, density=0.782878356, sampling_points=1000)[source]¶
- if mass and density of object are known, filter the map on a linear scale of threshold values, and compare the obtained mass to the experimental one. - Note - in proteins, an average value of 1.3 g/cm^3 (0.782878356 Da/A^3) can be assumed. Alternatively, the relation density=1.410+0.145*exp(-mass(kDa)/13) can be used. - Note - 1 Da/A^3=0.602214120 g/cm^3 - Parameters
- mass – target mass in Da 
- density – target density in Da/A^3 
- sampling_points – number of measures to perform between min and max intensity in density map 
 
- Returns
- array reporting tested values and error on mass ([threshold, model_mass-target_mass]) 
 
 - threshold_vol_ccs(low='', high='', sampling_points=1000, append=False, noise_filter=0.01)[source]¶
- return the volume to threshold to CCS relationship - Parameters
- sampling_points – number of measures to perform between min and max intensity in density map 
- Returns
- array reporting tested values and error on mass ([threshold, model_mass-target_mass])