IDFilter

Filters protein identification engine results by different criteria.

potential predecessor tools           -->  IDFilter  -->  potential successor tools
MascotAdapter (or other ID engines)                       PeptideIndexer
IDFileConverter                                           ProteinInference
FalseDiscoveryRate                                        IDMapper
ConsensusID

This tool filters the identifications found by a peptide/protein identification tool such as Mascot. The available filters are listed with the command line parameters below.

To enable any of the filters, just change their default value. All active filters will be applied in order.
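
As a rough illustration of enabling filters by moving them off their defaults, the following Python sketch assembles an IDFilter command line and skips every option that is still at its default value. The helper `build_idfilter_command` and the `DEFAULTS` table are hypothetical names introduced here; the option names and default values are copied from the parameter list on this page, and only a subset is included.

```python
import shlex

# Defaults copied from the option list on this page (subset only).
DEFAULTS = {
    "score:pep": 0,
    "score:prot": 0,
    "best:n_peptide_hits": 0,
    "min_length": 0,
    "mz:error": -1,
}


def build_idfilter_command(in_file, out_file, settings):
    """Return an argv list for IDFilter containing only non-default filters.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    argv = ["IDFilter", "-in", in_file, "-out", out_file]
    for name, value in settings.items():
        if name not in DEFAULTS:
            raise KeyError(f"option not covered by this sketch: {name}")
        if value == DEFAULTS[name]:
            continue  # still at its default, so the filter stays inactive
        argv += [f"-{name}", str(value)]
    return argv


cmd = build_idfilter_command(
    "input.idXML",
    "filtered.idXML",
    {"score:pep": 30, "best:n_peptide_hits": 1, "min_length": 0},
)
print(" ".join(shlex.quote(part) for part in cmd))
# prints: IDFilter -in input.idXML -out filtered.idXML -score:pep 30 -best:n_peptide_hits 1
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where the TOPP tools are installed.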

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
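
The conversion round trip described in the note could look roughly like the Python sketch below, which only builds the three command lines. The function name `mzid_filter_pipeline` and the file names are hypothetical. The sketch assumes that IDFileConverter accepts `-in` and `-out` and determines the formats from the file extensions.

```python
from pathlib import Path


def mzid_filter_pipeline(mzid_in, mzid_out, filter_args):
    """Return three argv lists: mzid -> idXML, IDFilter, idXML -> mzid.

    Hypothetical helper; intermediate idXML names are derived from the
    mzid names by swapping the extension.
    """
    idxml_in = str(Path(mzid_in).with_suffix(".idXML"))
    idxml_out = str(Path(mzid_out).with_suffix(".idXML"))
    return [
        # Step 1: convert the mzIdentML input to idXML.
        ["IDFileConverter", "-in", mzid_in, "-out", idxml_in],
        # Step 2: filter the idXML file with the requested options.
        ["IDFilter", "-in", idxml_in, "-out", idxml_out, *filter_args],
        # Step 3: convert the filtered idXML back to mzIdentML.
        ["IDFileConverter", "-in", idxml_out, "-out", mzid_out],
    ]


steps = mzid_filter_pipeline(
    "search.mzid", "search_filtered.mzid", ["-best:n_peptide_hits", "1"]
)
for step in steps:
    print(" ".join(step))
```

Each list can be handed to `subprocess.run(step, check=True)` in order once the tools are available on the `PATH`.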

The command line parameters of this tool are:

IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Version: 2.0.0 Aug 25 2015, 00:02:58, Revision: GIT-NOTFOUND

Usage:
  IDFilter <options>

Options (mandatory options marked with '*'):
  -in <file>*                                 Input file  (valid formats: 'idXML')
  -out <file>*                                Output file  (valid formats: 'idXML')

Filtering by precursor RT or m/z:
  -precursor:rt [min]:[max]                   Retention time range to extract. (default: ':')
  -precursor:mz [min]:[max]                   Mass-to-charge range to extract. (default: ':')
  -precursor:allow_missing                    When filtering by precursor RT or m/z, keep peptide IDs with 
                                              missing precursor information ('RT'/'MZ' meta values)?

Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All 
active filters will be applied in order.:
  -score:pep <score>                          The score which should be reached by a peptide hit to be kept.
                                              The score is dependent on the most recent(!) preprocessing -
                                              it could be Mascot scores (if a MascotAdapter was applied
                                              before), or an FDR (if FalseDiscoveryRate was applied before),
                                              etc. (default: '0')
  -score:prot <score>                         The score which should be reached by a protein hit to be kept. 
                                              Use in combination with 'delete_unreferenced_peptide_hits' to
                                              remove affected peptides. (default: '0')

Filtering by significance threshold:
  -thresh:pep <fraction>                      Keep a peptide hit only if its score is above this fraction of 
                                              the peptide significance threshold. (default: '0')
  -thresh:prot <fraction>                     Keep a protein hit only if its score is above this fraction of 
                                              the protein significance threshold. Use in combination with
                                              'delete_unreferenced_peptide_hits' to remove affected peptides.
                                              (default: '0')

Filtering by whitelisting (only instances also present in a whitelist file can pass):
  -whitelist:proteins <file>                  Filename of a FASTA file containing protein sequences.
                                              All peptides that are not a substring of a sequence in this
                                              file are removed.
                                              All proteins whose accession is not present in this file are
                                              removed. (valid formats: 'fasta')
  -whitelist:by_seq_only                      Match peptides with FASTA file by sequence instead of accession
                                              and disable protein filtering.
  -whitelist:protein_accessions <accessions>  All peptides that are not referencing at least one of the
                                              provided protein accessions are removed.
                                              Only proteins of the provided list are retained.

Filtering by blacklisting (only instances not present in a blacklist file can pass):
  -blacklist:peptides <file>                  Peptides having the same sequence and modification assignment
                                              as any peptide in this file will be filtered out. Use with the
                                              'blacklist:ignore_modifications' flag to only compare by
                                              sequence. (valid formats: 'idXML')
  -blacklist:ignore_modifications             Compare blacklisted peptides by sequence only.
                                              

Filtering by RT predicted by 'RTPredict':
  -rt:p_value <float>                         Retention time filtering by the p-value predicted by RTPredict.
                                              (default: '0' min: '0' max: '1')
  -rt:p_value_1st_dim <float>                 Retention time filtering by the p-value predicted by RTPredict 
                                              for first dimension. (default: '0' min: '0' max: '1')

Filtering by mz:
  -mz:error <float>                           Filtering by deviation to theoretical mass (disabled for
                                              negative values). (default: '-1')
  -mz:unit <String>                           Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')

Filtering best hits per spectrum (for peptides) or from proteins:
  -best:n_peptide_hits <integer>              Keep only the 'n' highest scoring peptide hits per spectrum 
                                              (for n>0). (default: '0' min: '0')
  -best:n_protein_hits <integer>              Keep only the 'n' highest scoring protein hits (for n>0).
                                              (default: '0' min: '0')
  -best:strict                                Keep only the highest scoring peptide hit.
                                              Similar to n_peptide_hits=1, but if there are two or more
                                              highest scoring hits, none are kept.

  -min_length <integer>                       Keep only peptide hits with a length greater than or equal to
                                              this value. Value 0 will have no filter effect. (default: '0'
                                              min: '0')
  -max_length <integer>                       Keep only peptide hits with a length less than or equal to this
                                              value. Value 0 will have no filter effect. Value is overridden
                                              by min_length, i.e. if max_length < min_length, max_length will
                                              be ignored. (default: '0' min: '0')
  -min_charge <integer>                       Keep only peptide hits for tandem spectra with charge greater
                                              than or equal to this value. (default: '1' min: '1')
  -var_mods                                   Keep only peptide hits with variable modifications (fixed
                                              modifications from SearchParameters will be ignored).
  -unique                                     If a peptide hit occurs more than once per PSM, only one
                                              instance is kept.
  -unique_per_protein                         Only peptides matching exactly one protein are kept. Remember
                                              that isoforms count as different proteins!
  -keep_unreferenced_protein_hits             Proteins not referenced by a peptide are retained in the ids.
  -remove_decoys                              Remove proteins according to the information in the user
                                              parameters. Usually used in combination with
                                              'delete_unreferenced_peptide_hits'.
  -delete_unreferenced_peptide_hits           Peptides not referenced by any protein are deleted in the ids. 
                                              Usually used in combination with 'score:prot' or 'thresh:prot'.
                                              
Common TOPP options:
  -ini <file>                                 Use the given TOPP INI file
  -threads <n>                                Sets the number of threads allowed to be used by the TOPP tool 
                                              (default: '1')
  -write_ini <file>                           Writes the default configuration file
  --help                                      Shows options
  --helphelp                                  Shows all options (including advanced)
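
To make a few of the option semantics above more concrete, here is a small Python sketch of three of them: the `[min]:[max]` range syntax of `-precursor:rt` and `-precursor:mz`, the interaction of `-min_length` and `-max_length`, and the `-mz:error` tolerance in `Da` or `ppm`. All function names are hypothetical. The sketch assumes inclusive bounds and a deviation computed directly on m/z values; neither detail is stated in the help text, and the code is not taken from the tool itself.

```python
def parse_range(text):
    """Parse '[min]:[max]'; an empty side means that side is unbounded."""
    low, sep, high = text.partition(":")
    if not sep:
        raise ValueError("expected the form '[min]:[max]'")
    return (float(low) if low else None, float(high) if high else None)


def in_range(value, bounds):
    """Check a value against (low, high); bounds assumed inclusive."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def passes_length(sequence, min_length=0, max_length=0):
    """Length filter; 0 disables a bound, max_length < min_length is ignored."""
    n = len(sequence)
    if min_length > 0 and n < min_length:
        return False
    if max_length > 0 and max_length >= min_length and n > max_length:
        return False
    return True


def within_mass_tolerance(observed, theoretical, limit, unit="ppm"):
    """Deviation filter; a negative limit disables it, as for -mz:error."""
    if limit < 0:
        return True
    deviation = abs(observed - theoretical)
    if unit == "ppm":
        deviation = deviation / theoretical * 1e6  # relative error
    elif unit != "Da":
        raise ValueError("unit must be 'Da' or 'ppm'")
    return deviation <= limit


print(parse_range(":"))                        # (None, None): no restriction
print(in_range(1500.0, parse_range("1200:")))  # True
print(passes_length("PEPTIDEK", min_length=6, max_length=4))  # True
print(within_mass_tolerance(500.005, 500.0, 5))               # False
```

With the default range `':'` both bounds are open, which matches the behaviour of an inactive filter.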

INI file documentation of this tool:

Legend: '*' marks a required parameter; '(advanced)' marks an advanced parameter.

+ IDFilter: Filters results from protein or peptide identification engines based on different criteria.
  version: Version of the tool that generated this parameters file. (value: '2.0.0')
  ++ 1: Instance '1' section for 'IDFilter'
    in*: Input file. (input file, '*.idXML')
    out*: Output file. (output file, '*.idXML')
    min_length: Keep only peptide hits with a length greater than or equal to this value. Value 0 will have no filter effect. (default: '0' range: 0:∞)
    max_length: Keep only peptide hits with a length less than or equal to this value. Value 0 will have no filter effect. Value is overridden by min_length, i.e. if max_length < min_length, max_length will be ignored. (default: '0' range: 0:∞)
    min_charge: Keep only peptide hits for tandem spectra with charge greater than or equal to this value. (default: '1' range: 1:∞)
    var_mods: Keep only peptide hits with variable modifications (fixed modifications from SearchParameters will be ignored). (default: 'false' valid: 'true', 'false')
    unique: If a peptide hit occurs more than once per PSM, only one instance is kept. (default: 'false' valid: 'true', 'false')
    unique_per_protein: Only peptides matching exactly one protein are kept. Remember that isoforms count as different proteins! (default: 'false' valid: 'true', 'false')
    keep_unreferenced_protein_hits: Proteins not referenced by a peptide are retained in the ids. (default: 'false' valid: 'true', 'false')
    remove_decoys: Remove proteins according to the information in the user parameters. Usually used in combination with 'delete_unreferenced_peptide_hits'. (default: 'false' valid: 'true', 'false')
    delete_unreferenced_peptide_hits: Peptides not referenced by any protein are deleted in the ids. Usually used in combination with 'score:prot' or 'thresh:prot'. (default: 'false' valid: 'true', 'false')
    log (advanced): Name of log file (created only when specified).
    debug (advanced): Sets the debug level. (default: '0')
    threads: Sets the number of threads allowed to be used by the TOPP tool. (default: '1')
    no_progress (advanced): Disables progress logging to command line. (default: 'false' valid: 'true', 'false')
    force (advanced): Overwrite tool specific checks. (default: 'false' valid: 'true', 'false')
    test (advanced): Enables the test mode (needed for internal use only). (default: 'false' valid: 'true', 'false')
    +++ precursor: Filtering by precursor RT or m/z
      rt: Retention time range to extract. (default: ':')
      mz: Mass-to-charge range to extract. (default: ':')
      allow_missing: When filtering by precursor RT or m/z, keep peptide IDs with missing precursor information ('RT'/'MZ' meta values)? (default: 'false' valid: 'true', 'false')
    +++ score: Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All active filters will be applied in order.
      pep: The score which should be reached by a peptide hit to be kept. The score is dependent on the most recent(!) preprocessing - it could be Mascot scores (if a MascotAdapter was applied before), or an FDR (if FalseDiscoveryRate was applied before), etc. (default: '0')
      prot: The score which should be reached by a protein hit to be kept. Use in combination with 'delete_unreferenced_peptide_hits' to remove affected peptides. (default: '0')
    +++ thresh: Filtering by significance threshold
      pep: Keep a peptide hit only if its score is above this fraction of the peptide significance threshold. (default: '0')
      prot: Keep a protein hit only if its score is above this fraction of the protein significance threshold. Use in combination with 'delete_unreferenced_peptide_hits' to remove affected peptides. (default: '0')
    +++ whitelist: Filtering by whitelisting (only instances also present in a whitelist file can pass)
      proteins: Filename of a FASTA file containing protein sequences. All peptides that are not a substring of a sequence in this file are removed. All proteins whose accession is not present in this file are removed. (input file, '*.fasta')
      by_seq_only: Match peptides with FASTA file by sequence instead of accession and disable protein filtering. (default: 'false' valid: 'true', 'false')
      protein_accessions: All peptides that are not referencing at least one of the provided protein accessions are removed. Only proteins of the provided list are retained. (default: '[]')
    +++ blacklist: Filtering by blacklisting (only instances not present in a blacklist file can pass)
      peptides: Peptides having the same sequence and modification assignment as any peptide in this file will be filtered out. Use with the 'blacklist:ignore_modifications' flag to only compare by sequence. (input file, '*.idXML')
      ignore_modifications: Compare blacklisted peptides by sequence only. (default: 'false' valid: 'true', 'false')
    +++ rt: Filtering by RT predicted by 'RTPredict'
      p_value: Retention time filtering by the p-value predicted by RTPredict. (default: '0' range: 0:1)
      p_value_1st_dim: Retention time filtering by the p-value predicted by RTPredict for first dimension. (default: '0' range: 0:1)
    +++ mz: Filtering by mz
      error: Filtering by deviation to theoretical mass (disabled for negative values). (default: '-1')
      unit: Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')
    +++ best: Filtering best hits per spectrum (for peptides) or from proteins
      n_peptide_hits: Keep only the 'n' highest scoring peptide hits per spectrum (for n>0). (default: '0' range: 0:∞)
      n_protein_hits: Keep only the 'n' highest scoring protein hits (for n>0). (default: '0' range: 0:∞)
      strict: Keep only the highest scoring peptide hit. Similar to n_peptide_hits=1, but if there are two or more highest scoring hits, none are kept. (default: 'false' valid: 'true', 'false')
      n_to_m_peptide_hits (advanced): Peptide hit rank range to extract. (default: ':')
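
The difference between `n_peptide_hits=1` and `strict` in the 'best' section can be sketched in Python as follows. Hits are modelled as hypothetical `(sequence, score)` tuples, and the sketch assumes that a higher score is better, which depends on the score type actually stored in the file.

```python
def best_n(hits, n):
    """Keep the n highest scoring hits; n == 0 leaves the hits untouched."""
    if n <= 0:
        return list(hits)
    # sorted() is stable, so tied hits keep their original relative order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]


def best_strict(hits):
    """Keep the single top hit, or nothing when the top score is tied."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return []  # two or more highest scoring hits: none are kept
    return ranked[:1]


tied = [("PEPTIDEK", 42.0), ("PEPTLDEK", 42.0), ("AAAK", 10.0)]
print(best_n(tied, 1))    # [('PEPTIDEK', 42.0)]
print(best_strict(tied))  # []
```

For a spectrum with a unique top score both functions return the same single hit; they differ only when the best score is shared.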

OpenMS / TOPP release 2.0.0 Documentation generated on Tue Aug 25 2015 05:53:56 using doxygen 1.8.9.1