In the expanding field of pharmaceutical and biotechnology innovation, the accurate and standardized disclosure of nucleotide and amino acid sequences is essential to securing intellectual property rights. The United States Patent and Trademark Office (USPTO) requires all sequence disclosures in patent applications to conform to a strict format called the “Sequence Listing.” As of 2025, the requirements in 37 CFR §§ 1.821–1.825 and 1.831–1.835 ensure that sequence data appears in a uniform, machine-readable format that supports patent examination, public access, and worldwide harmonization.
This manual gives a clear overview of what the USPTO currently requires of a Sequence Listing. It covers the requirements for formatting and content, the filing methods (such as electronic submissions and paper filings), and compliance strategies. Whether you are a patent lawyer, bioinformatics expert, or life sciences researcher about to file a patent application, this manual will guide you through the technical and legal requirements needed to satisfy the USPTO’s expectations and steer clear of frequent mistakes.
A sequence listing provides a standardized means of presenting, in a single document, all of the biological sequence data disclosed in a patent application. More specifically, it lists the nucleotide (DNA or RNA) and/or amino acid (protein) sequences that are described in a patent application by enumeration of their residues and that meet the sequence length thresholds.
Until July 1, 2022, the international standard for sequence listings was World Intellectual Property Organization (WIPO) Standard ST.25, and the USPTO regulations in 37 CFR 1.821–1.825 are based on ST.25. A newer international standard, WIPO Standard ST.26, has since been implemented internationally. The USPTO has adopted this standard and revised its regulations accordingly (see 37 CFR 1.831–1.835).
| Section | Regulation Title | Summary Description |
| --- | --- | --- |
| § 1.821 | Definitions and Scope of Sequence Listings | Defines what constitutes a nucleotide or amino acid sequence; outlines applicability and basic standards. |
| § 1.822 | Symbols and Sequence Representation | Specifies acceptable characters, ambiguity codes, and formatting for sequence data. |
| § 1.823 | Content and Format of Sequence Listing (ST.25) | Details numeric identifiers, formatting structure, and required components like SEQ ID NOs. |
| § 1.824 | File Format and Submission of Sequence Listings | Technical specs for ASCII text file submission and required metadata like file size and name. |
| § 1.825 | Correction and Amendment Requirements | Guidelines for submitting replacement listings and certified computer-readable forms (CRFs). |
| § 1.831 | Applicability of ST.26 Requirements | Scope of current rules for applications filed on or after July 1, 2022; mandates XML format. |
| § 1.832 | Content of Sequence Listings (ST.26) | Details required data elements, sequence identifiers, and general sequence listing structure. |
| § 1.833 | ST.26 XML Structure and Syntax | Technical XML standards, including root elements, encoding (UTF-8), and schema validation requirements. |
| § 1.834 | File Submission and Incorporation by Reference | File naming, incorporation into the specification, and submission method (e-filing or disc). |
| § 1.835 | Amendment and Replacement of ST.26 Listings | Rules for submitting corrected XML sequence listings and statements to ensure no new matter is added. |
Generally, nucleotide sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 10 or more specifically defined nucleotides are required to be listed in a “Sequence Listing XML.”
Amino acid sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 4 or more specifically defined amino acids are required to be listed in a “Sequence Listing XML.”
Additionally, any sequence having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids, must be excluded from any “Sequence Listing XML.”
WIPO Standard ST.26, paragraph 3(k), provides that “specifically defined” means any nucleotide other than those represented by the symbol “n” and any amino acid other than those represented by the symbol “X,” wherein “n” and “X” are used in a conventional manner as shown below in Table 1 for nucleotide symbols, Table 2 for modified nucleotides, and Table 3 for amino acid symbols.
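The length thresholds and the “specifically defined” rule can be sketched as a small check. This is only an illustration: the function names are hypothetical, and the sketch assumes the sequence is already a plain string of one-letter symbols. It does not handle branched sequences, modified residues, or any other part of the definition in 37 CFR 1.831.

```python
def count_specifically_defined(sequence: str, kind: str) -> int:
    """Count residues other than 'n' (nucleotides) or 'X' (amino acids)."""
    if kind == "nucleotide":
        return sum(1 for residue in sequence if residue.lower() != "n")
    if kind == "amino_acid":
        return sum(1 for residue in sequence if residue.upper() != "X")
    raise ValueError("kind must be 'nucleotide' or 'amino_acid'")


def must_be_listed(sequence: str, kind: str) -> bool:
    """Apply the 10-nucleotide / 4-amino-acid threshold described above."""
    threshold = 10 if kind == "nucleotide" else 4
    return count_specifically_defined(sequence, kind) >= threshold


if __name__ == "__main__":
    # Ten symbols, but only eight are specifically defined.
    print(must_be_listed("acgtnnacgt", "nucleotide"))
    # Four specifically defined amino acids.
    print(must_be_listed("MKVL", "amino_acid"))
```

A sequence for which `must_be_listed` returns `False` falls below the threshold and, under the rule above, must be left out of the “Sequence Listing XML.”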
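Table 1: List of Nucleotides Symbols

| Symbol | Definition |
| --- | --- |
| a | adenine |
| c | cytosine |
| g | guanine |
| t | thymine in DNA/uracil in RNA (t/u) |
| m | a or c |
| r | a or g |
| w | a or t/u |
| s | c or g |
| y | c or t/u |
| k | g or t/u |
| v | a or c or g; not t/u |
| h | a or c or t/u; not g |
| d | a or g or t/u; not c |
| b | c or g or t/u; not a |
| n | a or c or g or t/u; “unknown” or “other” |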
Table 2: List of Modified Nucleotides

| Abbreviation | Definition |
| --- | --- |
| ac4c | 4-acetylcytidine |
| chm5u | 5-(carboxyhydroxymethyl)uridine |
| cm | 2'-O-methylcytidine |
| cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
| cmnm5u | 5-carboxymethylaminomethyluridine |
| dhu | dihydrouridine |
| fm | 2'-O-methylpseudouridine |
| gal q | beta, D-galactosylqueuosine |
| gm | 2'-O-methylguanosine |
| i | inosine |
| i6a | N6-isopentenyladenosine |
| m1a | 1-methyladenosine |
| m1f | 1-methylpseudouridine |
| m1g | 1-methylguanosine |
| m1i | 1-methylinosine |
| m22g | 2,2-dimethylguanosine |
| m2a | 2-methyladenosine |
| m2g | 2-methylguanosine |
| m3c | 3-methylcytidine |
| m4c | N4-methylcytosine |
| m5c | 5-methylcytidine |
| m6a | N6-methyladenosine |
| m7g | 7-methylguanosine |
| mam5u | 5-methylaminomethyluridine |
| mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
| man q | beta, D-mannosylqueuosine |
| mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
| mcm5u | 5-methoxycarbonylmethyluridine |
| mo5u | 5-methoxyuridine |
| ms2i6a | 2-methylthio-N6-isopentenyladenosine |
| ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine... |
| mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methy... |
| mv | uridine-5-oxyacetic acid-methylester |
| o5u | uridine-5-oxyacetic acid |
| osyw | wybutoxosine |
| p | pseudouridine |
| q | queuosine |
| s2c | 2-thiocytidine |
| s2t | 5-methyl-2-thiouridine |
| s2u | 2-thiouridine |
| s4u | 4-thiouridine |
| m5u | 5-methyluridine |
| t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carba... |
| tm | 2'-O-methyl-5-methyluridine |
| um | 2'-O-methyluridine |
| yw | wybutosine |
| x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
| OTHER | (requires note qualifier) |
Table 3: List of Amino Acids Symbols

| Symbol | Definition |
| --- | --- |
| A | Alanine |
| R | Arginine |
| N | Asparagine |
| D | Aspartic acid (Aspartate) |
| C | Cysteine |
| Q | Glutamine |
| E | Glutamic acid (Glutamate) |
| G | Glycine |
| H | Histidine |
| I | Isoleucine |
| L | Leucine |
| K | Lysine |
| M | Methionine |
| F | Phenylalanine |
| P | Proline |
| O | Pyrrolysine |
| S | Serine |
| U | Selenocysteine |
| T | Threonine |
| W | Tryptophan |
| Y | Tyrosine |
| V | Valine |
| B | Aspartic acid or Asparagine |
| Z | Glutamine or Glutamic acid |
| J | Leucine or Isoleucine |
| X | A or R or N or D or C or Q or E or G or H or I or L or K or M or F or P or O or S or U or T or W or Y or V; “unknown” or “other” |
A primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and meeting the definition in 37 CFR 1.831(a) and 1.831(b), must each be included in the “Sequence Listing XML” and assigned its own sequence identifier. Where a variant sequence is disclosed as a single sequence with enumerated alternative residues at one or more positions, it must be included in the “Sequence Listing XML” and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. Any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence, should be included in the “Sequence Listing XML”.
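Choosing the “most restrictive ambiguity symbol” for a set of enumerated alternative nucleotides amounts to a lookup in Table 1. The following is a minimal sketch under that reading; the dictionary is transcribed from Table 1, while the function name and the choice to treat “u” as “t” on input are assumptions made for illustration.

```python
# Table 1 symbols mapped to the bases each one stands for (t covers t/u).
AMBIGUITY = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt", "n": "acgt",
}
# Reverse lookup: set of bases -> the single symbol that means exactly that set.
SYMBOL_FOR_SET = {frozenset(bases): symbol for symbol, bases in AMBIGUITY.items()}


def most_restrictive_symbol(alternatives):
    """Return the Table 1 symbol covering exactly the given alternatives."""
    bases = set()
    for alternative in alternatives:
        symbol = alternative.lower().replace("u", "t")
        if symbol not in AMBIGUITY:
            raise ValueError(f"not a Table 1 symbol: {alternative!r}")
        bases.update(AMBIGUITY[symbol])
    return SYMBOL_FOR_SET[frozenset(bases)]


if __name__ == "__main__":
    print(most_restrictive_symbol(["a", "g"]))       # purines
    print(most_restrictive_symbol(["c", "g", "t"]))  # anything but a
```

Because every non-empty subset of a, c, g, and t has exactly one symbol in Table 1, the lookup never has to choose between candidates; “n” is returned only when all four bases are possible.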
The table below indicates the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants:
| Type of Sequence | Feature Key | Qualifier | Use |
| --- | --- | --- | --- |
| Nucleic acid | variation | replace or note | Naturally occurring mutations and polymorphisms, e.g., alleles, RFLPs. |
| Nucleic acid | misc_difference | replace or note | Variability introduced artificially, e.g., by genetic manipulation or by chemical synthesis. |
| Amino acid | VAR_SEQ | note | Variant produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. |
| Amino acid | VARIANT | note | Any type of variant for which VAR_SEQ is not applicable. |
The application filing date determines whether a sequence listing must comply with ST.25 or ST.26.
All applications with a filing date or international filing date BEFORE July 1, 2022 MUST file sequence listings in ST.25 format.
All applications with a filing date or international filing date ON OR AFTER July 1, 2022 MUST file sequence listings in ST.26 format.
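The date rule reduces to a single comparison. A minimal sketch, assuming the caller already knows the application's own filing date or international filing date (the function and constant names are hypothetical):

```python
from datetime import date

# Cut-over date stated above.
ST26_START = date(2022, 7, 1)


def required_standard(filing_date: date) -> str:
    """Return the WIPO standard that applies for the given filing date."""
    return "ST.26" if filing_date >= ST26_START else "ST.25"


if __name__ == "__main__":
    print(required_standard(date(2022, 6, 30)))
    print(required_standard(date(2022, 7, 1)))
```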
The table below highlights the differences between the ST.25 and ST.26 sequence listing standards:

| Feature | ST.25 | ST.26 |
| --- | --- | --- |
| Format | ASCII text format with numeric identifiers | XML format, encoded in UTF-8 (Unicode), with elements and attributes |
| Inclusion of Special Sequence Content | Not required to include: D-amino acids; linear portions of branched sequences; nucleotide analogs | Must include: D-amino acids; linear portions of branched sequences; nucleotide analogs |
| Sequence Annotation | Feature keys only | Feature keys and qualifiers |
| Permitted/Prohibited Sequences | Permitted: fewer than 10 specifically defined nucleotides; fewer than 4 specifically defined amino acids | Prohibited: fewer than 10 specifically defined nucleotides; fewer than 4 specifically defined amino acids (“specifically defined” means any nucleotide other than “n” and any amino acid other than “X”) |
| Priority Application Inclusion | All priority applications may be included | Only the earliest priority application can be included |
| Applicant/Inventor Names | All applicant and inventor names may be included | Only one applicant and optionally one inventor may be included (first or primary only) |
| Invention Titles | Only one invention title allowed | Multiple titles allowed (each in a different language) |
| Character Set for Names and Titles | Basic Latin characters only | Unicode characters allowed (with Latin transliteration for names) |
| Sequence Types | DNA, RNA, or PRT only | DNA, RNA, or AA with a mandatory mol_type qualifier |
| Organism Names | Latin genus/species; virus name; "artificial sequence"; "unknown" | Latin genus/species; virus name; "synthetic construct"; "unidentified" |
| Uracil Symbol | “u” represents uracil | "u" not valid. Use “t” for uracil in RNA and thymine in DNA. Modified bases must be described with the modified_base feature |
| Amino Acid Representation | Three-letter abbreviations | One-letter abbreviations |
| Variable Residues ("n"/"X") | Must provide a definition using a feature | Default assumed unless representing a non-default value (then feature definition required) |
| Feature Location Format | Not clearly defined | Strictly defined; supports symbols and operators such as <, >, ^, join, order, complement |
| Mixed Mode Sequences | Allowed (nucleotide + translation below) | Not allowed; use translation qualifier instead |
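Two of the differences above, three-letter versus one-letter amino acid symbols and “u” versus “t”, are mechanical enough to sketch in code. The mapping below uses the conventional three-letter abbreviations; the function names are hypothetical, and the sketch assumes the ST.25 residues have already been extracted as whitespace-separated text. It is not a full ST.25-to-ST.26 converter.

```python
# Conventional three-letter abbreviations mapped to the one-letter symbols of Table 3.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Pyl": "O", "Ser": "S", "Sec": "U",
    "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V", "Asx": "B", "Glx": "Z",
    "Xle": "J", "Xaa": "X",
}


def protein_to_one_letter(st25_residues: str) -> str:
    """Convert 'Met Lys Val' style text to 'MKV', skipping position numbers."""
    tokens = [tok for tok in st25_residues.split() if not tok.isdigit()]
    return "".join(THREE_TO_ONE[tok.capitalize()] for tok in tokens)


def nucleotides_to_st26(st25_residues: str) -> str:
    """Drop whitespace, lower-case the symbols, and replace 'u' with 't'."""
    return "".join(st25_residues.split()).lower().replace("u", "t")


if __name__ == "__main__":
    print(protein_to_one_letter("Met Lys Val Leu"))
    print(nucleotides_to_st26("AUG GCU"))
```

An unknown three-letter token raises `KeyError`, which is deliberate: a modified or unusual residue needs a human decision about how to annotate it rather than a silent substitution.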
For all U.S. patent applications filed on or after July 1, 2022, that disclose nucleotide and/or amino acid sequences (as defined in 37 CFR 1.831), the sequence data must be submitted in a standardized XML format known as a “Sequence Listing XML.”
This requirement aligns with the WIPO Standard ST.26, which enables a single compliant XML listing to be used across all WIPO member states, including both PCT international applications and national/regional applications.
(a) The “Sequence Listing” must comply with the following:
(b)
The sequence data part is the part of the “Sequence Listing XML” that contains each individual nucleotide or amino acid sequence that meets the definition for inclusion in a “Sequence Listing XML” together with sequence-associated data. WIPO Standard ST.26, paragraph 50, specifies that the sequence data part must be composed of one or more SequenceData elements, each element containing information about one sequence.
WIPO Standard ST.26, paragraph 51, specifies that each SequenceData element must have a mandatory attribute sequenceIDNumber, in which the sequence identifier (see MPEP § 2412.05(a)) for each sequence is contained.
WIPO Standard ST.26 specifies that the SequenceData element must contain a dependent element INSDSeq, consisting of further dependent elements as follows:
| Element | Description | Mandatory/Not Included: Sequences | Mandatory/Not Included: Intentionally Skipped Sequences |
| --- | --- | --- | --- |
| INSDSeq_length | Length of the sequence | Mandatory | Mandatory with no value |
| INSDSeq_moltype | Molecule type | Mandatory | Mandatory with no value |
| INSDSeq_division | Indication that a sequence is related to a patent application | Mandatory with the value "PAT" | Mandatory with no value |
| INSDSeq_feature-table | List of annotations of the sequence | Mandatory | Must NOT be included |
| INSDSeq_sequence | Sequence | Mandatory | Mandatory with the value "000" |
Reproduced from paragraph 52 of WIPO Standard ST.26.
See MPEP § 2412.05(a) for information about intentionally skipped sequences.
WIPO Standard ST.26, paragraph 53, specifies that the element INSDSeq_length must disclose the number of nucleotides or amino acids of the sequence contained in the INSDSeq_sequence element.
WIPO Standard ST.26, paragraph 54, specifies that the element INSDSeq_moltype must disclose the type of molecule that is being represented. For nucleotide sequences, including nucleotide analogue sequences, the molecule type must be indicated as DNA or RNA. For amino acid sequences, the molecule type must be indicated as AA.
WIPO Standard ST.26, paragraph 55, specifies that for a nucleotide sequence that contains both DNA and RNA segments of one or more nucleotides, the molecule type must be indicated as DNA. The combined DNA/RNA molecule must be further described in the feature table, using the feature key “source” and the mandatory qualifier “organism” with the value “synthetic construct” and the mandatory qualifier “mol_type” with the value “other DNA.” Each DNA and RNA segment of the combined DNA/RNA molecule must be further described with the feature key “misc_feature” and the qualifier “note,” wherein the qualifier value indicates whether the segment is DNA or RNA.
WIPO Standard ST.26, paragraph 57, specifies that the element INSDSeq_sequence must disclose the sequence. The sequence must include only the appropriate symbols set forth in Table 1: List of Nucleotides Symbols and Table 3: List of Amino Acids Symbols (see MPEP § 2412.03(a)). The sequence must not include numbers, punctuation or whitespace characters.
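The constraints on INSDSeq_sequence, INSDSeq_moltype, and INSDSeq_length lend themselves to a pre-submission sanity check. The sketch below is an assumption-laden illustration, not a substitute for validation against the official schema or tool: the function name is hypothetical, and it assumes nucleotide symbols are written in lower case and amino acid symbols in upper case.

```python
NUCLEOTIDE_SYMBOLS = frozenset("acgtmrwsykvhdbn")            # Table 1
AMINO_ACID_SYMBOLS = frozenset("ARNDCQEGHILKMFPOSUTWYVBZJX")  # Table 3


def check_sequence(residues: str, moltype: str) -> int:
    """Validate the residues for the molecule type and return INSDSeq_length."""
    if moltype not in {"DNA", "RNA", "AA"}:
        raise ValueError("moltype must be DNA, RNA, or AA")
    allowed = AMINO_ACID_SYMBOLS if moltype == "AA" else NUCLEOTIDE_SYMBOLS
    # Digits, punctuation, whitespace, and 'u' all fall outside the allowed set.
    invalid = sorted(set(residues) - allowed)
    if invalid:
        raise ValueError(f"invalid symbols for {moltype}: {invalid}")
    return len(residues)


if __name__ == "__main__":
    print(check_sequence("atgcatgcatgc", "DNA"))
    try:
        check_sequence("aug gcu", "RNA")
    except ValueError as error:
        print(error)
```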
According to WIPO Standard ST.26, a “feature table” “contains information on the location and roles of various regions within a particular sequence. A feature table is required for every sequence, except for any intentionally skipped sequence, in which case it must not be included. The feature table is contained in the element INSDSeq_feature-table, which consists of one or more INSDFeature elements.” (WIPO Standard ST.26, paragraph 60).
WIPO Standard ST.26 specifies that each INSDFeature element that comprises the feature table describes one feature, and consists of dependent elements as follows:
| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| INSDFeature_key | A word or abbreviation indicating a feature | Mandatory |
| INSDFeature_location | Region of the sequence which corresponds to the feature | Mandatory |
| INSDFeature_quals | Qualifier containing auxiliary information about a feature | Mandatory where the feature key requires one or more qualifiers, e.g., source; otherwise, Optional |
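To make the element nesting concrete, the following sketch assembles one SequenceData element, and one intentionally skipped sequence, with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the tables above. The helper names, the example values, and the choice of a source feature location spanning the whole sequence (`1..length`) are assumptions for illustration, and the result is a fragment rather than a complete, schema-valid “Sequence Listing XML.”

```python
import xml.etree.ElementTree as ET


def build_sequence_data(seq_id, moltype, residues, organism, mol_type):
    """Build a SequenceData element with a single 'source' feature."""
    data = ET.Element("SequenceData", sequenceIDNumber=str(seq_id))
    insd = ET.SubElement(data, "INSDSeq")
    ET.SubElement(insd, "INSDSeq_length").text = str(len(residues))
    ET.SubElement(insd, "INSDSeq_moltype").text = moltype
    ET.SubElement(insd, "INSDSeq_division").text = "PAT"
    table = ET.SubElement(insd, "INSDSeq_feature-table")
    feature = ET.SubElement(table, "INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "source"
    # Assumption: the source feature covers the entire sequence.
    ET.SubElement(feature, "INSDFeature_location").text = f"1..{len(residues)}"
    quals = ET.SubElement(feature, "INSDFeature_quals")
    for name, value in (("mol_type", mol_type), ("organism", organism)):
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        ET.SubElement(qualifier, "INSDQualifier_value").text = value
    ET.SubElement(insd, "INSDSeq_sequence").text = residues
    return data


def build_skipped_sequence(seq_id):
    """Build an intentionally skipped sequence: empty values, no feature table."""
    data = ET.Element("SequenceData", sequenceIDNumber=str(seq_id))
    insd = ET.SubElement(data, "INSDSeq")
    for tag in ("INSDSeq_length", "INSDSeq_moltype", "INSDSeq_division"):
        ET.SubElement(insd, tag)
    ET.SubElement(insd, "INSDSeq_sequence").text = "000"
    return data


if __name__ == "__main__":
    element = build_sequence_data(1, "DNA", "atgcatgcatgc", "Homo sapiens", "genomic DNA")
    print(ET.tostring(element, encoding="unicode"))
    print(ET.tostring(build_skipped_sequence(2), encoding="unicode"))
```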
WIPO Standard ST.26, paragraph 64, specifies that the mandatory element INSDFeature_location must contain at least one location descriptor, which defines a site or a region corresponding to a feature of the sequence in the INSDSeq_sequence element. Amino acid sequences must contain one and only one location descriptor in the mandatory INSDFeature_location element. Nucleotide sequences may have more than one location descriptor in the mandatory INSDFeature_location element when used in conjunction with one or more location operator(s) (more information about location descriptors is discussed below).
WIPO Standard ST.26, paragraph 65, specifies that the location descriptor can be a single residue number, a region delimiting a contiguous span of residue numbers, or a site or region that extends beyond the specified residue or span of residues. The location descriptor must not include numbering for residues beyond the range of the sequence in the INSDSeq_sequence element. For nucleotide sequences only, a location descriptor can be a site between two adjacent residue numbers. Multiple location descriptors must be used in conjunction with a location operator when a feature corresponds to discontinuous sites or regions of a nucleotide sequence (more information about location descriptors and operators is discussed below).
WIPO Standard ST.26, paragraph 66, specifies that the syntax for each type of location descriptor is indicated in Tables (a)-(c) below, where x and y are residue numbers, indicated as positive integers, not greater than the length of the sequence in the INSDSeq_sequence element, and x is less than y.
(a) Location descriptors for nucleotide and amino acid sequences

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| Single residue number | x | Points to a single residue in a sequence. |
| Residue numbers delimiting a sequence span | x..y | Points to a continuous range of residues bounded by and including the starting and ending residues. |
| Residues before the first or beyond the last specified residue number | <x, >x, <x..y, x..>y, <x..>y | Points to a region including a specified residue or span of residues and extending beyond a specified residue. The '<' and '>' symbols may be used with a single residue or the starting and ending residue numbers of a span of residues to indicate that a feature extends beyond the specified residue number. |

(b) Location descriptors for nucleotide sequences only

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| A site between two adjoining nucleotides | x^y | Points to a site between two adjoining nucleotides, e.g., endonucleolytic cleavage site. The position numbers for the adjacent nucleotides are separated by a caret (^). The permitted formats for this descriptor are x^x+1 (for example 55^56), or, for circular nucleotides, x^1, where "x" is the full length of the molecule, i.e. 1000^1 for circular molecule with length 1000. |

(c) Location descriptors for amino acid sequences only

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| Residue numbers joined by an intrachain cross-link | x..y | Points to amino acids joined by an intrachain linkage when used with a feature that indicates an intrachain cross-link, such as "CROSSLNK" or "DISULFID". |
Reproduced from paragraph 66 of WIPO Standard ST.26.
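The location descriptor syntax is regular enough to check with a few regular expressions. The sketch below covers a single descriptor with no location operators and works on the unescaped text, i.e., with literal less-than and greater-than signs rather than XML entities. The function name and the `nucleotide` and `circular` parameters are hypothetical.

```python
import re

_SINGLE = re.compile(r"^[<>]?(\d+)$")           # x, <x, >x
_SPAN = re.compile(r"^<?(\d+)\.\.>?(\d+)$")     # x..y, <x..y, x..>y, <x..>y
_SITE = re.compile(r"^(\d+)\^(\d+)$")           # x^y


def descriptor_ok(text: str, length: int, nucleotide: bool = True,
                  circular: bool = False) -> bool:
    """Check one location descriptor against a sequence of the given length."""
    match = _SINGLE.match(text)
    if match:
        return 1 <= int(match.group(1)) <= length
    match = _SPAN.match(text)
    if match:
        x, y = int(match.group(1)), int(match.group(2))
        return 1 <= x < y <= length
    match = _SITE.match(text)
    if match and nucleotide:  # sites between residues: nucleotides only
        x, y = int(match.group(1)), int(match.group(2))
        if y == x + 1 and 1 <= x and y <= length:
            return True
        return circular and x == length and y == 1
    return False


if __name__ == "__main__":
    for candidate in ("55^56", "<1..>100", "5..3", "101"):
        print(candidate, descriptor_ok(candidate, 100))
```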
WIPO Standard ST.26 specifies that the INSDFeature_location element of nucleotide sequences may contain one or more location operators. A location operator is a prefix to either one location descriptor or a combination of location descriptors corresponding to a single but discontinuous feature, and specifies where the location corresponding to the feature on the indicated sequence is found or how the feature is constructed. A list of location operators is provided in the table below with their descriptions. Location operators can be used for nucleotides only.
| Location syntax | Location description |
| --- | --- |
| join(location,location,...,location) | The indicated locations are joined (placed end-to-end) to form one contiguous sequence. |
| order(location,location,...,location) | The elements are found in the specified order but nothing is implied about whether joining those elements is reasonable. |
| complement(location) | Indicates that the feature is located on the strand complementary to the sequence span specified by the location descriptor, when read in the 5’ to 3’ direction or in the direction that mimics 5’ to 3’ direction. |
Reproduced from paragraph 67 of WIPO Standard ST.26.
WIPO Standard ST.26, paragraph 68, specifies that the join and order location operators require that at least two comma-separated location descriptors be provided. Location descriptors involving sites between two adjacent residues, i.e. x^y, must not be used within a join or order combination of locations. Use of the join location operator implies that the residues described by the location descriptors are physically brought into contact by biological processes (for example, the exons that contribute to a coding region feature).
WIPO Standard ST.26, paragraph 69, specifies that the location operator “complement” can be used in combination with either “join” or “order” within the same location. Combinations of “join” and “order” within the same location must not be used. See paragraph 70, examples of WIPO Standard ST.26.
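The three operator rules in paragraphs 68 and 69 (at least two descriptors inside join or order, no x^y inside join or order, and no mixing of join with order) can be checked with a small parenthesis-aware scan. This is a rough sketch with hypothetical function names; it reports rule violations as strings and does not validate the individual descriptors themselves.

```python
import re


def _split_top_level(text: str) -> list:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def operator_problems(location: str) -> list:
    """Return a list of rule violations for join/order/complement usage."""
    text = location.replace(" ", "")
    problems = []
    if "join(" in text and "order(" in text:
        problems.append("join and order must not be combined")
    for match in re.finditer(r"(join|order)\(", text):
        depth, end = 1, match.end()
        while end < len(text) and depth > 0:
            depth += {"(": 1, ")": -1}.get(text[end], 0)
            end += 1
        if depth != 0:
            problems.append("unbalanced parentheses")
            continue
        inner = text[match.end():end - 1]
        if len(_split_top_level(inner)) < 2:
            problems.append(f"{match.group(1)} needs at least two descriptors")
        if "^" in inner:
            problems.append(f"x^y must not be used inside {match.group(1)}")
    return problems


if __name__ == "__main__":
    print(operator_problems("complement(join(2691..4571,4918..5163))"))
    print(operator_problems("join(1..5,10^11)"))
```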
WIPO Standard ST.26, paragraph 71, specifies that in an XML instance of a “Sequence Listing XML”, the characters “<” and “>” in a location descriptor must be replaced by the appropriate predefined entities, “&lt;” and “&gt;”, respectively (see MPEP § 2413.01(a) regarding the predefined entities).
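In practice this replacement is usually left to an XML library rather than done by hand. A short sketch using only Python's standard library (the helper name `location_xml` is hypothetical) shows both an explicit escape and the escaping that `ElementTree` applies when serializing element text:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def location_xml(raw_location: str) -> str:
    """Serialize a location descriptor; ElementTree escapes the text itself."""
    element = ET.Element("INSDFeature_location")
    element.text = raw_location
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    raw = "<1..>50"
    print(escape(raw))        # explicit escaping of the text value
    print(location_xml(raw))  # same entities, produced during serialization
    # Parsing the XML back yields the original characters.
    print(ET.fromstring(location_xml(raw)).text)
```

Escaping the value manually and then assigning it to `element.text` would escape it twice, so one approach or the other should be used, not both.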
WIPO Standard ST.26, paragraph 72, specifies that qualifiers are used to supply information about features in addition to that conveyed by the feature key and feature location. There are three types of value formats to accommodate different types of information conveyed by qualifiers, namely: free text; controlled vocabulary or enumerated values (e.g., a number or date); and sequences.
WIPO Standard ST.26, paragraph 73, specifies that Section 6 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each nucleotide sequence feature key and Section 8 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each amino acid sequence feature key.
WIPO Standard ST.26, paragraph 74, specifies that any sequence encompassed by 37 CFR 1.831(b) (see MPEP § 2412.03) that is provided as a qualifier value must be separately included in the “Sequence Listing XML” and assigned its own sequence identifier as described in MPEP § 2412.05(a).
WIPO Standard ST.26, paragraph 75, specifies that one mandatory feature key, i.e., “source” requires two mandatory qualifiers, “organism” and “mol_type.” Some optional feature keys also require mandatory qualifiers. See Annex I of WIPO Standard ST.26, Sections 5 and 7, for listings of feature keys with mandatory qualifiers.
WIPO Standard ST.26 specifies that the element INSDFeature_quals contains one or more INSDQualifier elements. Each INSDQualifier element represents a single qualifier and consists of three dependent elements and one optional attribute, as shown below:
| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| INSDQualifier_name | Name of the qualifier (see Annex I, Sections 6 and 8). | Mandatory |
| INSDQualifier_value | Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(b). | Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8) |
| NonEnglishQualifier_value | Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(a). | Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8) |
| id (attribute) | A qualifier with a language-dependent free text value may be uniquely identified by using the optional XML attribute 'id' in the element INSDQualifier (see paragraph 87(d)). The value of the 'id' attribute must start with the letter 'q' and continue with any positive integer. The value of an 'id' attribute must be unique to one INSDQualifier element, i.e. the attribute value must only be used once in a sequence listing file. | Optional |
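The two constraints on the 'id' attribute, its format and its uniqueness within the file, are easy to verify before submission. A minimal sketch follows; the function name is hypothetical, and rejecting leading zeros (for example `q01`) is an assumption, since the text above only says “any positive integer.”

```python
import re
from collections import Counter

# 'q' followed by a positive integer; leading zeros rejected by assumption.
_ID_PATTERN = re.compile(r"^q[1-9][0-9]*$")


def qualifier_id_problems(ids):
    """Return format and uniqueness problems for a list of 'id' values."""
    problems = [f"bad format: {value}" for value in ids
                if not _ID_PATTERN.match(value)]
    problems += [f"duplicate: {value}"
                 for value, count in Counter(ids).items() if count > 1]
    return problems


if __name__ == "__main__":
    print(qualifier_id_problems(["q1", "q2", "q10"]))
    print(qualifier_id_problems(["q1", "q1", "x3"]))
```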
WIPO Standard ST.26, paragraph 77, specifies that the “organism” qualifier must disclose the source, i.e., a single organism or origin, of the sequence. This applies to nucleotide sequences (see Annex I, section 6, of WIPO Standard ST.26 and Table 5: List of Qualifier Values for Nucleotide Sequences with Language-Dependent Free Text Values, reproduced in MPEP § 2413.01(h)) and to amino acid sequences (see Annex I, section 8, of WIPO Standard ST.26 and Table 6: List of Qualifiers for Amino Acid Sequences with Language-Dependent Free Text Values, reproduced in MPEP § 2413.01(h)). Organism designations should be selected from a taxonomy database.
WIPO Standard ST.26, paragraph 78, specifies that if the sequence is naturally occurring and the source organism has a Latin genus and species designation, that designation must be used as the qualifier value. The preferred English common name may be specified using the qualifier “note” for nucleotide sequences and amino acid sequences, but must not be used in the organism qualifier value.
WIPO Standard ST.26, paragraph 80, specifies that if the sequence is naturally occurring and the source organism has a known Latin genus, but the species is unspecified or unidentified, then the organism qualifier value must indicate the Latin genus followed by “sp”.
WIPO Standard ST.26, paragraph 81, specifies that if the sequence is naturally occurring, but the Latin organism genus and species designation is unknown, then the organism qualifier value must be indicated as “unidentified”. Any known taxonomic information should be indicated in the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.
WIPO Standard ST.26, paragraph 82, specifies that if the sequence is naturally occurring and the source organism does not have a Latin genus and species designation, such as a virus, then another acceptable scientific name (e.g., “Canine adenovirus type 2”) must be used as the organism qualifier value.
WIPO Standard ST.26, paragraph 83, specifies that if the sequence is not naturally occurring, the organism qualifier value must be indicated as “synthetic construct.” Further information with respect to the way the sequence was generated may be specified using the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.
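Paragraphs 78 through 83 form a decision sequence for the organism qualifier value, which the sketch below encodes directly. The function and parameter names are hypothetical, and the unspecified-species case is written here as the genus followed by “sp.” with a trailing period, which is an assumption about the exact punctuation.

```python
from typing import Optional


def organism_value(naturally_occurring: bool,
                   genus: Optional[str] = None,
                   species: Optional[str] = None,
                   other_scientific_name: Optional[str] = None) -> str:
    """Pick the organism qualifier value following paragraphs 78 to 83."""
    if not naturally_occurring:
        return "synthetic construct"      # paragraph 83
    if genus and species:
        return f"{genus} {species}"       # paragraph 78
    if genus:
        return f"{genus} sp."             # paragraph 80 (punctuation assumed)
    if other_scientific_name:
        return other_scientific_name      # paragraph 82, e.g. a virus name
    return "unidentified"                 # paragraph 81


if __name__ == "__main__":
    print(organism_value(True, "Homo", "sapiens"))
    print(organism_value(True, other_scientific_name="Canine adenovirus type 2"))
    print(organism_value(False))
```

An English common name never appears in the returned value; under paragraph 78 it belongs in a separate “note” qualifier.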
WIPO Standard ST.26, paragraph 84, specifies that the “mol_type” qualifier for nucleotide sequences and the “mol_type” qualifier for amino acid sequences must disclose the type of molecule represented in the sequence. These qualifiers are distinct from the element INSDSeq_moltype discussed above, where INSDSeq_moltype must be indicated as DNA or RNA for nucleotide sequences, including nucleotide analogue sequences, and as AA for amino acid sequences.
WIPO Standard ST.26, paragraph 85, specifies that “free text” is a type of value format for certain qualifiers presented in the form of a descriptive text phrase or other specified format (see MPEP § 2413.01(h) for the definition of “free text” and see Annex I of WIPO Standard ST.26 for controlled vocabulary).
WIPO Standard ST.26, paragraph 86, specifies that the use of free text must be limited to a few short terms indispensable for the understanding of a characteristic of the sequence. For each qualifier other than the “translation” qualifier, the free text must not exceed 1000 characters.
WIPO Standard ST.26, paragraph 87, specifies that language-dependent free text is the free text value of certain qualifiers that is language-dependent in that it may require translation for international, national, or regional procedures. Qualifiers for nucleotide sequences with a language-dependent free text value format are identified in Annex I, Table 5: List of Qualifiers with Language-Dependent Free Text Values for Nucleotide Sequences (reproduced in MPEP § 2413.01(h)). Qualifiers for amino acid sequences with a language-dependent free text value format are identified in Annex I, Table 6: List of Qualifiers with Language-Dependent Free Text Values for Amino Acid Sequences (reproduced in MPEP § 2413.01(h)).
WIPO Standard ST.26, paragraph 89, specifies that the “CDS” feature key may be used to identify coding sequences, i.e., sequences of nucleotides which correspond to the sequence of amino acids in a protein and the stop codon. The location of the “CDS” feature in the mandatory element INSDFeature_location must include the stop codon.
WIPO Standard ST.26, paragraph 90, specifies that the “transl_table” and “translation” qualifiers may be used with the “CDS” feature key (see Annex I of WIPO Standard ST.26). Where the “transl_table” qualifier is not used, the use of the Standard Code Table (see Annex I, Section 9, Table 7 of WIPO Standard ST.26) is assumed.
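When no “transl_table” qualifier is given, the Standard Code Table is assumed, so a “translation” value can be derived from a CDS as in the sketch below. The codon table is the standard genetic code written in the usual t, c, a, g order. The function name is hypothetical, and the sketch assumes the CDS is in frame, uses only a, c, g, and t, and that the stop codon contributes no symbol to the translation.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, codons enumerated as ttt, ttc, tta, ttg, tct, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for start in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[start:start + 3]]
        if amino_acid == "*":  # stop codon: part of the CDS, not the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("atgaaagtgctgtga"))
```

Codons that contain ambiguity symbols raise `KeyError` here, and codons for pyrrolysine or selenocysteine would be read as stops; those cases need the “transl_except” handling described in paragraph 91.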
WIPO Standard ST.26, paragraph 91, specifies that the “transl_except” qualifier must be used with the “CDS” feature key and the “translation” qualifier to identify a codon that encodes either pyrrolysine or selenocysteine.
WIPO Standard ST.26, paragraph 92, specifies that an amino acid sequence encoded by the coding sequence and disclosed in a “translation” qualifier that is encompassed by the description of sequences found in MPEP § 2412.03 must be included in the sequence listing and assigned its own sequence identifier. The sequence identifier assigned to the amino acid sequence must be provided as the value in the qualifier “protein_id” with the “CDS” feature key. The “organism” qualifier of the “source” feature key for the amino acid sequence must be identical to that of its coding sequence.
MPEP § 2412.05(c) provides information about representation and inclusion of variants.
WIPO Standard ST.26, paragraph 93, specifies that a primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and encompassed by the description of sequences found in MPEP § 2412.03 must each be included in the sequence listing and assigned its own sequence identifier.
WIPO Standard ST.26, paragraph 94, specifies that any variant sequence, disclosed as a single sequence with enumerated alternative residues at one or more positions, must be included in the sequence listing and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. See MPEP § 2412.05(b), subsection II, for more information regarding representing alternative nucleotide residues and MPEP § 2412.05(d), subsection II, for more information regarding representing alternative amino acid residues.
WIPO Standard ST.26, paragraph 95, specifies that any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence in the sequence listing, should be included in the sequence listing. Where included in the sequence listing, such a variant sequence:
WIPO Standard ST.26, paragraph 96, specifies the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants from the table List of Feature Keys and Qualifiers (reproduced in MPEP § 2412.05(c)).
WIPO Standard ST.26, paragraph 97, specifies that annotation of a sequence for a specific variant must include a feature key and qualifier, as indicated in the table in MPEP § 2412.05(c), and the feature location. The value for the “replace” qualifier must be only a single alternative nucleotide or nucleotide sequence using only the symbols set forth in Table 1: List of Nucleotides Symbols (see MPEP § 2413.01(a)), or empty. A listing of alternative residues may be provided as the value in the “note” qualifier. In particular, a listing of alternative amino acids must be provided as the value in the “note” qualifier where “X” is used in a sequence, and represents a value other than “any one of ‘A’, ‘R’, ‘N’, ‘D’, ‘C’, ‘Q’, ‘E’, ‘G’, ‘H’, ‘I’, ‘L’, ‘K’, ‘M’, ‘F’, ‘P’, ‘O’, ‘S’, ‘U’, ‘T’, ‘W’, ‘Y’, or ‘V.’” A deletion must be represented by an empty qualifier value for the “replace” qualifier or by an indication in the “note” qualifier that the residue may be deleted. An inserted or substituted residue(s) must be provided in the “replace” or “note” qualifier. The value format for the “replace” and “note” qualifiers is free text and must not exceed 1000 characters. See below for sequences encompassed by the definition in MPEP § 2412.03 that are provided as an insertion or a substitution in a qualifier value.
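The constraints on a “replace” qualifier value (Table 1 symbols only, possibly empty, at most 1000 characters) can be sketched as a simple predicate. The function names are hypothetical, and the sketch assumes lower-case nucleotide symbols.

```python
NUCLEOTIDE_SYMBOLS = frozenset("acgtmrwsykvhdbn")  # Table 1
MAX_FREE_TEXT = 1000


def replace_value_ok(value: str) -> bool:
    """True if the value is empty or uses only Table 1 symbols within the limit."""
    return len(value) <= MAX_FREE_TEXT and set(value) <= NUCLEOTIDE_SYMBOLS


def describe_replace(value: str) -> str:
    """An empty value denotes a deletion; anything else is the replacement."""
    return "deletion" if value == "" else f"replaced by {value}"


if __name__ == "__main__":
    for candidate in ("", "acgt", "acgu"):
        print(repr(candidate), replace_value_ok(candidate))
    print(describe_replace(""))
```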
WIPO Standard ST.26, paragraph 98, specifies that the symbols set forth in Tables 1 to 4 of Annex I, reproduced in MPEP §§ 2412.03(a), 2412.03(c), and 2412.05(b), subsection III, should be used to represent variant residues where appropriate. For the “note” qualifier, where the variant residue is a modified residue not set forth in Tables 2 or 4 the complete unabbreviated name of the modified residue must be provided as the qualifier value. Modified residues must be further described in a feature table as described in MPEP § 2412.05(b), subsection III for modified nucleotides and MPEP § 2412.05(d), subsection III, for modified amino acids.
WIPO Standard ST.26, paragraph 100, specifies that a sequence encompassed by the description of sequences found in MPEP § 2412.03 that is provided as an insertion or a substitution in a qualifier value for a primary sequence annotation must also be included in the sequence listing and assigned its own sequence identifier.
The WIPO Sequence tool is free to download and runs on Windows, Linux, and macOS. It requires a 64-bit architecture, 8 GB of RAM for handling large sequence listings, and up to 200 MB of disk space.
Once the tool is installed, the user can create a new project and enter the metadata and sequence information in the appropriate tabs. The tool accepts sequences of 10 or more nucleotides or 4 or more amino acids, and it supports only linear, unbranched nucleotide and amino acid sequences.
When the latest version of WIPO Sequence is launched after installation, a home screen appears. It offers options such as Projects, Persons & Organizations, Organisms, Help, References, and language settings.
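The length thresholds can be expressed as a simple pre-check before sequences are entered into the tool. This is a rough sketch that assumes the thresholds are inclusive (10 or more nucleotides, 4 or more amino acids), as in 37 CFR 1.831. The function name `meets_length_threshold` and the molecule labels are hypothetical, and the sketch counts every residue without any special treatment of ambiguity symbols such as “n” or “X”.

```python
# Assumed inclusive minimum lengths for inclusion in a sequence listing.
MINIMUM_LENGTH = {"nucleotide": 10, "amino_acid": 4}


def meets_length_threshold(sequence: str, molecule: str) -> bool:
    """Return True if the sequence is long enough for the given molecule type."""
    return len(sequence) >= MINIMUM_LENGTH[molecule]


print(meets_length_threshold("acgtacgtac", "nucleotide"))  # 10 residues
print(meets_length_threshold("acgtacgta", "nucleotide"))   # 9 residues
print(meets_length_threshold("MKVL", "amino_acid"))        # 4 residues
```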
| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| ApplicationIdentification | The application identification for which the sequence listing is submitted | Mandatory when a sequence listing is furnished at any time following the assignment of the application number |
| IPOfficeCode | ST.3 Code of the office of filing | Mandatory |
| ApplicationNumberText | The application number as provided by the office of filing (e.g., PCT/IB2013/099999) | Mandatory |
| FilingDate | The date of filing of the patent application for which the sequence listing is submitted (e.g., 2015-01-31) | Mandatory when a sequence listing is furnished at any time following the assignment of a filing date |
| ApplicantFileReference | A single unique identifier assigned by applicant to identify a particular application, typed in the characters | Mandatory when a sequence listing is furnished at any time prior to assignment of the application number; otherwise, Optional |
| EarliestPriorityApplicationIdentification | The identification of the earliest priority application | Mandatory where priority is claimed |
| ApplicantName | Name of the first mentioned applicant typed in the characters. | Mandatory |
| InventorName | Name of the first mentioned inventor typed in the characters. | Optional |
| InventorNameLatin | Where InventorName is typed in characters | Optional |
| InventionTitle | Title of the invention typed in the characters in the language of filing. | Mandatory in the language of filing. Optional for additional languages |
| SequenceTotalQuantity | The total number of all sequences in the sequence listing including intentionally skipped sequences | Mandatory |
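To make the element list above more concrete, the sketch below assembles a few of these elements into an XML fragment with Python's standard `xml.etree.ElementTree` module. It is illustrative only. The root element name `ST26SequenceListing` and the nesting of IPOfficeCode, ApplicationNumberText, and FilingDate under ApplicationIdentification are assumptions about the ST.26 schema. The office code, file reference, applicant name, and title are made-up values, and a file built this way is not a complete or valid sequence listing.

```python
import xml.etree.ElementTree as ET


def build_general_information(info: dict) -> ET.Element:
    """Build the general-information part of a listing from a plain dict."""
    # Assumption: root element name taken from the ST.26 DTD.
    root = ET.Element("ST26SequenceListing")
    # Assumption: these three elements nest under ApplicationIdentification.
    application = ET.SubElement(root, "ApplicationIdentification")
    for tag in ("IPOfficeCode", "ApplicationNumberText", "FilingDate"):
        ET.SubElement(application, tag).text = info[tag]
    for tag in ("ApplicantFileReference", "ApplicantName",
                "InventionTitle", "SequenceTotalQuantity"):
        ET.SubElement(root, tag).text = info[tag]
    return root


# Hypothetical values; the number and date reuse the examples in the table.
root = build_general_information({
    "IPOfficeCode": "IB",
    "ApplicationNumberText": "PCT/IB2013/099999",
    "FilingDate": "2015-01-31",
    "ApplicantFileReference": "REF-0001",
    "ApplicantName": "Example Applicant",
    "InventionTitle": "Example title",
    "SequenceTotalQuantity": "2",
})
print(ET.tostring(root, encoding="unicode"))
```

In practice the WIPO Sequence tool writes these elements itself; the sketch only shows how the table maps onto XML.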
After entering the project name and an optional description, click the “Save” button to save the project.
Sequences and other information can be added by clicking the project name, which is now visible on the home screen. The screen then allows the user to enter the application identification, priority identification, applicant, and inventor details; further options can be reached by scrolling down. Sequences can be added manually or imported from a file by clicking the “Import Sequence” button. The uploaded file can contain multiple sequences in FASTA format.
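Before importing, it can help to confirm what a multi-sequence FASTA file actually contains. The following minimal Python sketch parses FASTA text into a dictionary of names and sequences. The function name `parse_fasta` and the sample records are hypothetical, and the sketch does not reproduce how WIPO Sequence itself reads the file.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequence lines are concatenated onto the current record.
            records[name] += line
    return records


sample = """>seq1 hypothetical DNA
acgtacgtac
gtacgt
>seq2 hypothetical protein
MKVLAAGIVG
"""
for header, sequence in parse_fasta(sample).items():
    print(header, len(sequence))
```

Sequence lines that appear before any header are ignored, and a repeated header overwrites the earlier record; a production parser would probably report both cases.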
Once the sequence data has been entered, click “Validation” to check for any errors in the submitted data.
After validation, the sequence listing can be generated in ST.26 format by clicking the “Generate Sequence Listing” option. A window then asks where to save the newly generated XML file.
When opened, the XML file shows the sequence listing with the project name, the software used, and the production date, followed by the bibliographic information, the sequence information, and the sequences themselves.
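The generated file can also be inspected programmatically. The sketch below reads a hand-written stand-in for such a file and compares the declared sequence count with the number of sequence entries found. The attribute names `softwareName` and `productionDate` and the element name `SequenceData` are assumptions about the ST.26 schema, and the sample is not real output from WIPO Sequence.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a generated listing (assumed structure).
SAMPLE = """<ST26SequenceListing softwareName="WIPO Sequence" productionDate="2025-01-15">
  <InventionTitle languageCode="en">Example title</InventionTitle>
  <SequenceTotalQuantity>2</SequenceTotalQuantity>
  <SequenceData sequenceIDNumber="1"/>
  <SequenceData sequenceIDNumber="2"/>
</ST26SequenceListing>"""


def summarize(xml_text: str) -> dict:
    """Collect a few header fields and compare sequence counts."""
    root = ET.fromstring(xml_text)
    return {
        "software": root.get("softwareName"),
        "production_date": root.get("productionDate"),
        "declared": int(root.findtext("SequenceTotalQuantity")),
        "found": len(root.findall("SequenceData")),
    }


summary = summarize(SAMPLE)
print(summary)
print("counts match:", summary["declared"] == summary["found"])
```

A mismatch between the two counts would be worth investigating before filing, although the tool's own validation step remains the authoritative check.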
At our Sequence Listing Company, we specialize exclusively in creating perfect patent sequence listings for biotechnology and pharmaceutical companies. Founded by patent attorneys and bioinformatics specialists with over 10 years of experience, we understand the critical intersection of scientific innovation and intellectual property protection. Our dedicated team has helped hundreds of companies successfully navigate the complex regulatory requirements of sequence listings across global patent offices. We combine technical precision with regulatory expertise to ensure your valuable innovations receive the protection they deserve without delays or complications.