Complete Guide to USPTO Sequence Listing Requirements in 2025

In the expanding field of pharmaceutical and biotechnology innovation, the accurate and standardized disclosure of nucleotide and amino acid sequences is essential to securing intellectual property rights. The United States Patent and Trademark Office (USPTO) requires all sequence disclosures in patent applications to conform to a strict format called the “Sequence Listing.” Set out in 37 CFR §§ 1.821–1.825 and, for applications filed on or after July 1, 2022, in §§ 1.831–1.835, these requirements ensure that sequence data appears in a uniform, machine-readable format that supports patent examination, public access, and worldwide harmonization.

This guide gives a clear overview of the USPTO's current Sequence Listing requirements, covering formatting, content, filing methods (electronic submissions and paper filings), and compliance strategies. Whether you are a patent lawyer, bioinformatics expert, or life sciences researcher preparing to file a patent application, it will walk you through the technical and legal requirements needed to satisfy the USPTO’s expectations and avoid frequent mistakes.

What is a Sequence Listing?

A sequence listing provides a standardized means of presenting, in a single document, all of the biological sequence data disclosed in a patent application. More specifically, it lists the nucleotide (DNA or RNA) and/or amino acid (protein) sequences that are described in the application by enumeration of their residues and that meet the sequence length thresholds.

For applications filed before July 1, 2022, the applicable international standard is World Intellectual Property Organization (WIPO) Standard ST.25, and the USPTO regulations at 37 CFR 1.821–1.825 are based on it. A newer international standard, WIPO Standard ST.26, has since been implemented internationally. The USPTO has adopted this standard and revised its regulations accordingly (see 37 CFR 1.831–1.835).

The table below summarizes the relevant sections of 37 CFR.

Table: USPTO Sequence Listing Requirements (2025)

| Section | Regulation Title | Summary Description |
| --- | --- | --- |
| § 1.821 | Definitions and Scope of Sequence Listings | Defines what constitutes a nucleotide or amino acid sequence; outlines applicability and basic standards. |
| § 1.822 | Symbols and Sequence Representation | Specifies acceptable characters, ambiguity codes, and formatting for sequence data. |
| § 1.823 | Content and Format of Sequence Listing (ST.25) | Details numeric identifiers, formatting structure, and required components like SEQ ID NOs. |
| § 1.824 | File Format and Submission of Sequence Listings | Technical specs for ASCII text file submission and required metadata like file size and name. |
| § 1.825 | Correction and Amendment Requirements | Guidelines for submitting replacement listings and certified computer-readable forms (CRFs). |
| § 1.831 | Applicability of ST.26 Requirements | Scope of current rules for applications filed on or after July 1, 2022; mandates XML format. |
| § 1.832 | Content of Sequence Listings (ST.26) | Details required data elements, sequence identifiers, and general sequence listing structure. |
| § 1.833 | ST.26 XML Structure and Syntax | Technical XML standards, including root elements, encoding (UTF-8), and schema validation requirements. |
| § 1.834 | File Submission and Incorporation by Reference | File naming, incorporation into the specification, and submission method (e-filing or disc). |
| § 1.835 | Amendment and Replacement of ST.26 Listings | Rules for submitting corrected XML sequence listings and statements to ensure no new matter is added. |

2412.03 Nucleotides and Amino Acids Included and Excluded from a “Sequence Listing XML” [R-01.2024]

Generally, nucleotide sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 10 or more specifically defined nucleotides are required to be listed in a “Sequence Listing XML.”

Amino acid sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 4 or more specifically defined amino acids are required to be listed in a “Sequence Listing XML.”

Additionally, any sequence having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids, must be excluded from any “Sequence Listing XML.”

WIPO Standard ST.26, paragraph 3(k), provides that “specifically defined” means any nucleotide other than those represented by the symbol “n” and any amino acid other than those represented by the symbol “X”, wherein “n” and “X” are used in a conventional manner as shown below in Table 1 for nucleotide symbols, Table 2 for modified nucleotides, and Table 3 for amino acid symbols.
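
The inclusion thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not an official validation tool: the function names are hypothetical, and it assumes the input is already an unbranched (or linear) sequence written with lowercase nucleotide symbols or uppercase one-letter amino acid symbols.

```python
def count_specifically_defined(sequence: str, is_protein: bool) -> int:
    """Count residues other than 'n' (nucleotides) or 'X' (amino acids)."""
    placeholder = "X" if is_protein else "n"
    return sum(1 for residue in sequence if residue != placeholder)


def belongs_in_listing(sequence: str, is_protein: bool) -> bool:
    """Apply the 10-nucleotide / 4-amino-acid threshold described above."""
    threshold = 4 if is_protein else 10
    return count_specifically_defined(sequence, is_protein) >= threshold


# Ten specifically defined nucleotides plus four 'n' placeholders: included.
print(belongs_in_listing("acgtnnnnacgtac", is_protein=False))
# Only two specifically defined amino acids: must be excluded.
print(belongs_in_listing("MXXK", is_protein=True))
```

Note that ambiguity symbols other than “n” and “X” (for example “r” or “B”) count as specifically defined under this definition, which is why the sketch only filters out the two placeholder symbols.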

Table 1: List of Nucleotides Symbols

| Symbol | Definition |
| --- | --- |
| a | adenine |
| c | cytosine |
| g | guanine |
| t | thymine in DNA/uracil in RNA (t/u) |
| m | a or c |
| r | a or g |
| w | a or t/u |
| s | c or g |
| y | c or t/u |
| k | g or t/u |
| v | a or c or g; not t/u |
| h | a or c or t/u; not g |
| d | a or g or t/u; not c |
| b | c or g or t/u; not a |
| n | a or c or g or t/u; “unknown” or “other” |

Table 2: List of Modified Nucleotides

| Abbreviation | Definition |
| --- | --- |
| ac4c | 4-acetylcytidine |
| chm5u | 5-(carboxyhydroxymethyl)uridine |
| cm | 2'-O-methylcytidine |
| cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
| cmnm5u | 5-carboxymethylaminomethyluridine |
| dhu | dihydrouridine |
| fm | 2'-O-methylpseudouridine |
| gal q | beta, D-galactosylqueuosine |
| gm | 2'-O-methylguanosine |
| i | inosine |
| i6a | N6-isopentenyladenosine |
| m1a | 1-methyladenosine |
| m1f | 1-methylpseudouridine |
| m1g | 1-methylguanosine |
| m1i | 1-methylinosine |
| m22g | 2,2-dimethylguanosine |
| m2a | 2-methyladenosine |
| m2g | 2-methylguanosine |
| m3c | 3-methylcytidine |
| m4c | N4-methylcytosine |
| m5c | 5-methylcytidine |
| m6a | N6-methyladenosine |
| m7g | 7-methylguanosine |
| mam5u | 5-methylaminomethyluridine |
| mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
| man q | beta, D-mannosylqueuosine |
| mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
| mcm5u | 5-methoxycarbonylmethyluridine |
| mo5u | 5-methoxyuridine |
| ms2i6a | 2-methylthio-N6-isopentenyladenosine |
| ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine... |
| mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methy... |
| mv | uridine-5-oxyacetic acid-methylester |
| o5u | uridine-5-oxyacetic acid |
| osyw | wybutoxosine |
| p | pseudouridine |
| q | queuosine |
| s2c | 2-thiocytidine |
| s2t | 5-methyl-2-thiouridine |
| s2u | 2-thiouridine |
| s4u | 4-thiouridine |
| m5u | 5-methyluridine |
| t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carba... |
| tm | 2'-O-methyl-5-methyluridine |
| um | 2'-O-methyluridine |
| yw | wybutosine |
| x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
| OTHER | (requires note qualifier) |

Table 3: List of Amino Acids Symbols

| Symbol | Definition |
| --- | --- |
| A | Alanine |
| R | Arginine |
| N | Asparagine |
| D | Aspartic acid (Aspartate) |
| C | Cysteine |
| Q | Glutamine |
| E | Glutamic acid (Glutamate) |
| G | Glycine |
| H | Histidine |
| I | Isoleucine |
| L | Leucine |
| K | Lysine |
| M | Methionine |
| F | Phenylalanine |
| P | Proline |
| O | Pyrrolysine |
| S | Serine |
| U | Selenocysteine |
| T | Threonine |
| W | Tryptophan |
| Y | Tyrosine |
| V | Valine |
| B | Aspartic acid or Asparagine |
| Z | Glutamine or Glutamic acid |
| J | Leucine or Isoleucine |
| X | A or R or N or D or C or Q or E or G or H or I or L or K or M or F or P or O or S or U or T or W or Y or V; “unknown” or “other” |

2412.05(c) Representation and Inclusion of Variants [R-01.2024]

A primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and meeting the definition in 37 CFR 1.831(a) and 1.831(b), must each be included in the “Sequence Listing XML” and assigned its own sequence identifier. Where a variant sequence is disclosed as a single sequence with enumerated alternative residues at one or more positions, it must be included in the “Sequence Listing XML” and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. Any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence, should be included in the “Sequence Listing XML”.
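
For nucleotide variants with enumerated alternatives at a position, picking the “most restrictive ambiguity symbol” amounts to finding the Table 1 symbol whose base set exactly matches the alternatives. The following is a rough sketch of that lookup; the function and dictionary names are illustrative only, and it handles nucleotides, not amino acids.

```python
# Base sets for each nucleotide symbol, taken from Table 1 (t stands for t/u).
SYMBOL_TO_BASES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt", "n": "acgt",
}
# Reverse lookup: each set of bases corresponds to exactly one symbol.
BASES_TO_SYMBOL = {frozenset(b): s for s, b in SYMBOL_TO_BASES.items()}


def most_restrictive_symbol(alternatives):
    """Return the narrowest symbol covering all alternative residues."""
    bases = set()
    for alt in alternatives:
        symbol = alt.lower().replace("u", "t")  # 'u' is written as 't'
        bases.update(SYMBOL_TO_BASES[symbol])
    return BASES_TO_SYMBOL[frozenset(bases)]


print(most_restrictive_symbol(["a", "g"]))       # a or g
print(most_restrictive_symbol(["c", "g", "u"]))  # c or g or t/u
```

Because every combination of a, c, g and t has its own symbol in Table 1, the exact match is always the most restrictive choice; “n” is returned only when all four bases are possible.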

The table below indicates the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants:

| Type of Sequence | Feature Key | Qualifier | Use |
| --- | --- | --- | --- |
| Nucleic acid | variation | replace or note | Naturally occurring mutations and polymorphisms, e.g., alleles, RFLPs. |
| Nucleic acid | misc_difference | replace or note | Variability introduced artificially, e.g., by genetic manipulation or by chemical synthesis. |
| Amino acid | VAR_SEQ | note | Variant produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. |
| Amino acid | VARIANT | note | Any type of variant for which VAR_SEQ is not applicable. |

Which Standard Applies: ST.25 or ST.26?

The application filing date determines whether a sequence listing must comply with ST.25 or ST.26.

All applications with a filing date or international filing date BEFORE July 1, 2022 MUST file sequence listings in ST.25 format.

  • For 111(a) applications, the relevant date is the “official filing date,” i.e., the date all the requirements for granting a filing date are met.
  • For U.S. national phase (371) applications, the relevant date is the PCT filing date, NOT the 371(c) date.
  • You cannot choose to file in ST.26.

All applications with a filing date or international filing date ON OR AFTER July 1, 2022 MUST file sequence listings in ST.26 format.

  • An application claiming benefit of or priority to an earlier-filed application (under 35 USC 119, 120, 121 or 365) that may have contained a sequence listing in accordance with ST.25 will nonetheless be REQUIRED to submit a compliant sequence listing in XML file format in accordance with 37 CFR 1.831-1.835 (i.e., in ST.26 format; there will be no “grandfathering”).
  • Provisional applications are not required to include a sequence listing. However, on or after July 1, 2022, if an applicant chooses to submit a sequence listing in a provisional application, that sequence listing must comply with 37 CFR 1.831-1.835 (i.e., be in ST.26 format).
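
The date rule above reduces to a single comparison. A minimal sketch, assuming the caller passes the correct reference date (the official filing date for 111(a) applications, or the PCT international filing date for 371 national phase applications); the function name is hypothetical.

```python
from datetime import date

# Cut-over date stated in the rules above.
ST26_EFFECTIVE = date(2022, 7, 1)


def required_standard(filing_date: date) -> str:
    """Return the sequence listing standard that applies to a filing date."""
    return "ST.26" if filing_date >= ST26_EFFECTIVE else "ST.25"


print(required_standard(date(2022, 6, 30)))  # day before the cut-over
print(required_standard(date(2022, 7, 1)))   # cut-over date itself
```

A priority or benefit claim to an earlier ST.25 application does not change the result: only the filing date of the application at hand is compared.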

The table below highlights the differences between the ST.25 and ST.26 sequence listing standards:

| Feature | ST.25 | ST.26 |
| --- | --- | --- |
| Format | ASCII text format with numeric identifiers | XML format, encoded in UTF-8 (Unicode), with elements and attributes |
| Inclusion of Special Sequence Content | Not required to include: D-amino acids; linear portions of branched sequences; nucleotide analogs | Must include: D-amino acids; linear portions of branched sequences; nucleotide analogs |
| Sequence Annotation | Feature keys only | Feature keys and qualifiers |
| Permitted/Prohibited Sequences | Permitted: < 10 specifically defined nucleotides; < 4 specifically defined amino acids | Prohibited: < 10 specifically defined nucleotides; < 4 specifically defined amino acids (“specifically defined” means any nucleotide other than “n” and any amino acid other than “X”) |
| Priority Application Inclusion | All priority applications may be included | Only the earliest priority application can be included |
| Applicant/Inventor Names | All applicant and inventor names may be included | Only one applicant and optionally one inventor may be included (first or primary only) |
| Invention Titles | Only one invention title allowed | Multiple titles allowed (each in a different language) |
| Character Set for Names and Titles | Basic Latin characters only | Unicode characters allowed (with Latin transliteration for names) |
| Sequence Types | DNA, RNA, or PRT only | DNA, RNA, or AA with a mandatory mol_type qualifier |
| Organism Names | Latin genus/species; virus name; "artificial sequence"; "unknown" | Latin genus/species; virus name; "synthetic construct"; "unidentified" |
| Uracil Symbol | “u” represents uracil | "u" not valid. Use “t” for uracil in RNA and thymine in DNA. Modified bases must be described with the modified_base feature |
| Amino Acid Representation | Three-letter abbreviations | One-letter abbreviations |
| Variable Residues ("n"/"X") | Must provide a definition using a feature | Default assumed unless representing a non-default value (then feature definition required) |
| Feature Location Format | Not clearly defined | Strictly defined; supports symbols like <, >, ^, join, order, complement |
| Mixed Mode Sequences | Allowed (nucleotide + translation below) | Not allowed; use translation qualifier instead |

2412 The Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures to Include a Sequence Listing in XML file format [R-01.2024]

For all U.S. patent applications filed on or after July 1, 2022, that disclose nucleotide and/or amino acid sequences (as defined in 37 CFR 1.831), the sequence data must be submitted in a standardized XML format known as a “Sequence Listing XML.”

This requirement aligns with the WIPO Standard ST.26, which enables a single compliant XML listing to be used across all WIPO member states, including both PCT international applications and national/regional applications.

  • The XML file must conform to 37 CFR §§ 1.831–1.834, which implement specific ST.26 provisions.
  • In U.S. applications, the sequence listing is considered part of the disclosure, even if not duplicated elsewhere in the specification.
  • In international applications, the sequence listing is also part of the application if present on the filing date, and must not rely on incorporation by reference.

1.823 Requirements for content of a “Sequence Listing” part of the specification

(a) The “Sequence Listing” must comply with the following:

  1. The order and presentation of the items of information in the “Sequence Listing” shall conform to the arrangement in Appendix G to this subpart. The submission of those items of information designated with an “M” is mandatory (see Table: Numeric Identifiers). The submission of those items designated with an “O” is optional.
  2. Each item of information shall begin on a new line, with the numeric identifier enclosed in angle brackets, as shown in Appendix G to this subpart.
  3. Set forth numeric identifiers <110> through <170> at the beginning of the “Sequence Listing.”
  4. Include each disclosed nucleotide and/or amino acid sequence, as defined in § 1.821(a).
  5. Assign a separate sequence identifier to each sequence, beginning with 1 and increasing sequentially by integers, and include the sequence identifier in numeric identifier <210>.
  6. Use the code “000” in place of the sequence where no sequence is present for a sequence identifier.
  7. Include the total number of SEQ ID NOs in numeric identifier <160>, as defined in Appendix G to this subpart, whether followed by a sequence or by the code “000.”
  8. Each line must not contain more than 74 characters.
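
Several of the ST.25 rules in paragraph (a) are mechanical enough to check with a short script. The sketch below is only an illustration: it checks the 74-character line limit, sequential <210> numbering, and the <160> count, and it assumes each numeric identifier starts its line and is followed by whitespace and an integer value. The function name and sample text are hypothetical.

```python
import re


def check_st25_listing(text: str) -> list[str]:
    """Return a list of problems found in an ST.25 text listing."""
    problems = []
    # Rule 8: no line longer than 74 characters.
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > 74:
            problems.append(f"line {number}: {len(line)} characters (limit 74)")
    # Rule 5: <210> identifiers start at 1 and increase by one.
    seq_ids = [int(m) for m in re.findall(r"^<210>\s*(\d+)", text, flags=re.M)]
    if seq_ids != list(range(1, len(seq_ids) + 1)):
        problems.append("<210> identifiers are not 1, 2, 3, ... in order")
    # Rule 7: <160> states the total number of SEQ ID NOs.
    declared = re.search(r"^<160>\s*(\d+)", text, flags=re.M)
    if declared is None or int(declared.group(1)) != len(seq_ids):
        problems.append("<160> does not match the number of <210> entries")
    return problems


sample = "<160> 2\n<210> 1\n<400> 1\nacgtacgtac\n<210> 2\n<400> 2\n000\n"
print(check_st25_listing(sample))
```

The sample includes a second identifier carrying the code “000”, which still counts toward the <160> total, as rule 7 requires.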

 

(b)

  1. If the “Sequence Listing” required by § 1.821(c) is submitted as an ASCII plain text file via the USPTO patent electronic filing system or on a read-only optical disc, in compliance with § 1.52(e), then the specification must contain a statement in a separate paragraph (see § 1.77(b)(5)) that incorporates by reference the material in the ASCII plain text file, identifying:
    • (i) The name of the file;
    • (ii) The date of creation; and
    • (iii) The size of the file in bytes.
  2. If the “Sequence Listing” required by § 1.821(c) is submitted as an ASCII plain text file via the USPTO patent electronic filing system or on a read-only optical disc, in compliance with § 1.52(e), for an international application during the international stage, then incorporation by reference of the material in the ASCII plain text file is not required.
  3. A “Sequence Listing” required by § 1.821(c) that is submitted as a PDF file (§ 1.821(c)(2)) via the USPTO patent electronic filing system or on physical sheets of paper (§ 1.821(c)(3)), setting forth the nucleotide and/or amino acid sequence and associated information in accordance with paragraph (a) of this section:
    • (i) Must begin on a new page;
    • (ii) Must be titled “Sequence Listing”;
    • (iii) Must not include material other than the “Sequence Listing” itself;
    • (iv) Must have sheets containing no more than 66 lines, with each line containing no more than 74 characters;
    • (v) Should have sheets numbered independently of the numbering of the remainder of the application; and
    • (vi) Should use a fixed-width font exclusively throughout.

MPEP : The "Sequence Listing XML" Must Contain a Sequence Data Part

The sequence data part is the part of the “Sequence Listing XML” that contains each individual nucleotide or amino acid sequence that meets the definition for inclusion in a “Sequence Listing XML” together with sequence-associated data. WIPO Standard ST.26, paragraph 50, specifies that the sequence data part must be composed of one or more SequenceData elements, each element containing information about one sequence.

WIPO Standard ST.26, paragraph 51, specifies that each SequenceData element must have a mandatory attribute sequenceIDNumber, in which the sequence identifier (see MPEP § 2412.05(a)) for each sequence is contained.

WIPO Standard ST.26 specifies that the SequenceData element must contain a dependent element INSDSeq, consisting of further dependent elements as follows:

| Element | Description | Mandatory/Not Included: Sequences | Mandatory/Not Included: Intentionally Skipped Sequences |
| --- | --- | --- | --- |
| INSDSeq_length | Length of the sequence | Mandatory | Mandatory with no value |
| INSDSeq_moltype | Molecule type | Mandatory | Mandatory with no value |
| INSDSeq_division | Indication that a sequence is related to a patent application | Mandatory with the value "PAT" | Mandatory with no value |
| INSDSeq_feature-table | List of annotations of the sequence | Mandatory | Must NOT be included |
| INSDSeq_sequence | Sequence | Mandatory | Mandatory with the value "000" |

Reproduced from paragraph 52 of WIPO Standard ST.26.

See MPEP § 2412.05(a) for information about intentionally skipped sequences.

WIPO Standard ST.26, paragraph 53, specifies that the element INSDSeq_length must disclose the number of nucleotides or amino acids of the sequence contained in the INSDSeq_sequence element.

WIPO Standard ST.26, paragraph 54, specifies that the element INSDSeq_moltype must disclose the type of molecule that is being represented. For nucleotide sequences, including nucleotide analogue sequences, the molecule type must be indicated as DNA or RNA. For amino acid sequences, the molecule type must be indicated as AA.

WIPO Standard ST.26, paragraph 55, specifies that for a nucleotide sequence that contains both DNA and RNA segments of one or more nucleotides, the molecule type must be indicated as DNA. The combined DNA/RNA molecule must be further described in the feature table, using the feature key “source” and the mandatory qualifier “organism” with the value “synthetic construct” and the mandatory qualifier “mol_type” with the value “other DNA.” Each DNA and RNA segment of the combined DNA/RNA molecule must be further described with the feature key “misc_feature” and the qualifier “note,” wherein the qualifier value indicates whether the segment is DNA or RNA.

WIPO Standard ST.26, paragraph 57, specifies that the element INSDSeq_sequence must disclose the sequence. Only the appropriate symbols set forth in Table 1: List of Nucleotides Symbols and Table 3: List of Amino Acids Symbols (see MPEP § 2412.03(a)) must be included in the sequence. The sequence must not include numbers, punctuation or whitespace characters.
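
To make the element structure concrete, here is a rough sketch that assembles one SequenceData element with Python's standard xml.etree.ElementTree module. The helper name build_sequence_data is hypothetical, and the result is not a complete or schema-valid “Sequence Listing XML”: the feature table is left empty here, whereas a real listing needs at least a “source” feature.

```python
import xml.etree.ElementTree as ET


def build_sequence_data(seq_id, residues=None, moltype=None):
    """Build a SequenceData element; residues=None means intentionally skipped."""
    skipped = residues is None
    if not skipped:
        if moltype not in ("DNA", "RNA", "AA"):
            raise ValueError("INSDSeq_moltype must be DNA, RNA or AA")
        if not residues.isalpha():
            raise ValueError("no numbers, punctuation or whitespace allowed")
    data = ET.Element("SequenceData", {"sequenceIDNumber": str(seq_id)})
    insd = ET.SubElement(data, "INSDSeq")
    # Skipped sequences keep these elements but leave them without a value.
    ET.SubElement(insd, "INSDSeq_length").text = "" if skipped else str(len(residues))
    ET.SubElement(insd, "INSDSeq_moltype").text = "" if skipped else moltype
    ET.SubElement(insd, "INSDSeq_division").text = "" if skipped else "PAT"
    if not skipped:
        # Placeholder only: features would be appended here.
        ET.SubElement(insd, "INSDSeq_feature-table")
    ET.SubElement(insd, "INSDSeq_sequence").text = "000" if skipped else residues
    return data


element = build_sequence_data(1, "acgtacgtac", "DNA")
print(ET.tostring(element, encoding="unicode"))
```

Calling build_sequence_data(2) with no residues produces the intentionally skipped form: no feature table and “000” as the sequence value.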

1. Feature Table

According to WIPO Standard ST.26, a “feature table” “contains information on the location and roles of various regions within a particular sequence. A feature table is required for every sequence, except for any intentionally skipped sequence, in which case it must not be included. The feature table is contained in the element INSDSeq_feature-table, which consists of one or more INSDFeature elements.” (WIPO Standard ST.26, paragraph 60).

WIPO Standard ST.26 specifies that each INSDFeature element that comprises the feature table describes one feature, and consists of dependent elements as follows:

| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| INSDFeature_key | A word or abbreviation indicating a feature | Mandatory |
| INSDFeature_location | Region of the sequence which corresponds to the feature | Mandatory |
| INSDFeature_quals | Qualifier containing auxiliary information about a feature | Mandatory where the feature key requires one or more qualifiers, e.g., source; otherwise, Optional |

Reproduced from paragraph 61 of WIPO Standard ST.26.

2. Feature Keys

WIPO Standard ST.26, paragraph 62, specifies that Annex I contains the exclusive listing of feature keys that must be used when preparing and submitting a “Sequence Listing XML,” along with an exclusive listing of associated qualifiers and an indication as to whether those qualifiers are mandatory or optional. Section 5 of Annex I of WIPO Standard ST.26 provides the exclusive listing of feature keys for nucleotide sequences and Section 7 of Annex I of WIPO Standard ST.26 provides the exclusive listing of feature keys for amino acid sequences.

3. Mandatory Feature Keys

WIPO Standard ST.26, paragraph 63, specifies that the “source” feature key is mandatory for all nucleotide sequences and for all amino acid sequences, except for any intentionally skipped sequence. Each sequence must have a single “source” feature key spanning the entire sequence. Where a sequence originates from multiple sources, those sources may be further described in the feature table, using the feature key “misc_feature” and the qualifier “note” for nucleotide sequences, and the feature key “REGION” and the qualifier “note” for amino acid sequences.

4. Feature Location

WIPO Standard ST.26, paragraph 64, specifies that the mandatory element INSDFeature_location must contain at least one location descriptor, which defines a site or a region corresponding to a feature of the sequence in the INSDSeq_sequence element. Amino acid sequences must contain one and only one location descriptor in the mandatory INSDFeature_location element. Nucleotide sequences may have more than one location descriptor in the mandatory INSDFeature_location element when used in conjunction with one or more location operator(s) (more information about location descriptors is discussed below).

WIPO Standard ST.26, paragraph 65, specifies that the location descriptor can be a single residue number, a region delimiting a contiguous span of residue numbers, or a site or region that extends beyond the specified residue or span of residues. The location descriptor must not include numbering for residues beyond the range of the sequence in the INSDSeq_sequence element. For nucleotide sequences only, a location descriptor can be a site between two adjacent residue numbers. Multiple location descriptors must be used in conjunction with a location operator when a feature corresponds to discontinuous sites or regions of a nucleotide sequence (more information about location descriptors and operators is discussed below).

WIPO Standard ST.26, paragraph 66, specifies that the syntax for each type of location descriptor is indicated in Tables (a)-(c) below, where x and y are residue numbers, indicated as positive integers, not greater than the length of the sequence in the INSDSeq_sequence element, and x is less than y.

(a) Location descriptors for nucleotide and amino acid sequences:

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| Single residue number | x | Points to a single residue in a sequence. |
| Residue numbers delimiting a sequence span | x..y | Points to a continuous range of residues bounded by and including the starting and ending residues. |
| Residues before the first or beyond the last specified residue number | <x, >x, <x..y, x..>y, <x..>y | Points to a region including a specified residue or span of residues and extending beyond a specified residue. The '<' and '>' symbols may be used with a single residue or the starting and ending residue numbers of a span of residues to indicate that a feature extends beyond the specified residue number. |

Reproduced from paragraph 66 of WIPO Standard ST.26.

(b) Location descriptors for nucleotide sequences only:

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| A site between two adjoining nucleotides | x^y | Points to a site between two adjoining nucleotides, e.g., endonucleolytic cleavage site. The position numbers for the adjacent nucleotides are separated by a caret (^). The permitted formats for this descriptor are x^x+1 (for example 55^56), or, for circular nucleotides, x^1, where "x" is the full length of the molecule, i.e. 1000^1 for circular molecule with length 1000. |

Reproduced from paragraph 66 of WIPO Standard ST.26.

(c) Location descriptors for amino acid sequences only:

| Location descriptor type | Syntax | Description |
| --- | --- | --- |
| Residue numbers joined by an intrachain cross-link | x..y | Points to amino acids joined by an intrachain linkage when used with a feature that indicates an intrachain cross-link, such as "CROSSLNK" or "DISULFID". |

Reproduced from paragraph 66 of WIPO Standard ST.26.

WIPO Standard ST.26 specifies that the INSDFeature_location element of nucleotide sequences may contain one or more location operators. A location operator is a prefix to either one location descriptor or a combination of location descriptors corresponding to a single but discontinuous feature, and specifies where the location corresponding to the feature on the indicated sequence is found or how the feature is constructed. A list of location operators is provided in the table below with their descriptions. Location operators can be used for nucleotides only.

| Location syntax | Location description |
| --- | --- |
| join (location, location,..., location) | The indicated locations are joined (placed end-to-end) to form one contiguous sequence. |
| order (location, location,...,location) | The elements are found in the specified order but nothing is implied about whether joining those elements is reasonable. |
| complement (location) | Indicates that the feature is located on the strand complementary to the sequence span specified by the location descriptor, when read in the 5’ to 3’ direction or in the direction that mimics 5’ to 3’ direction. |

Reproduced from paragraph 67 of WIPO Standard ST.26.

WIPO Standard ST.26, paragraph 68, specifies that the join and order location operators require that at least two comma-separated location descriptors be provided. Location descriptors involving sites between two adjacent residues, i.e. x^y, must not be used within a join or order combination of locations. Use of the join location operator implies that the residues described by the location descriptors are physically brought into contact by biological processes (for example, the exons that contribute to a coding region feature).

WIPO Standard ST.26, paragraph 69, specifies that the location operator “complement” can be used in combination with either “join” or “order” within the same location. Combinations of “join” and “order” within the same location must not be used. See the examples in paragraph 70 of WIPO Standard ST.26.

WIPO Standard ST.26, paragraph 71, specifies that in an XML instance of a “Sequence Listing XML”, the characters “<” and “>” in a location descriptor must be replaced by the appropriate predefined entities, “&lt;” and “&gt;”, respectively (see MPEP § 2413.01(a) regarding the predefined entities).
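
The descriptor rules above lend themselves to a small syntax check. The sketch below is an approximation under stated assumptions: it treats `<x`, `>x`, `<x..y`, `x..>y` and `<x..>y` as the permitted “extends beyond” forms, handles single descriptors only (no join, order or complement operators), and uses hypothetical names throughout. The final line shows the entity replacement with the standard library's xml.sax.saxutils.escape.

```python
import re
from xml.sax.saxutils import escape

SPAN = re.compile(r"^([<>]?)(\d+)(?:\.\.(>?)(\d+))?$")  # x, <x, >x, x..y, ...
SITE = re.compile(r"^(\d+)\^(\d+)$")                     # x^y


def is_valid_descriptor(descriptor: str, length: int, nucleotide: bool) -> bool:
    """Check one location descriptor against a sequence of the given length."""
    site = SITE.match(descriptor)
    if site:
        x, y = int(site.group(1)), int(site.group(2))
        adjacent = y == x + 1 <= length          # e.g. 55^56
        circular = x == length and y == 1        # e.g. 1000^1
        return nucleotide and x >= 1 and (adjacent or circular)
    span = SPAN.match(descriptor)
    if not span:
        return False
    prefix, x, _, y = span.groups()
    if y is None:
        return 1 <= int(x) <= length
    if prefix == ">":                            # ">x..y" is not a listed form
        return False
    return 1 <= int(x) < int(y) <= length        # x must be less than y


print(is_valid_descriptor("55^56", 100, nucleotide=True))
print(is_valid_descriptor("90..101", 100, nucleotide=True))
# '<' and '>' must be written as predefined entities inside the XML file.
print(escape("<1..>50"))
```

Keeping the validation on the raw descriptor and escaping only at serialization time avoids having to parse the `&lt;` and `&gt;` entities back out of the string.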

5. Feature Qualifiers

WIPO Standard ST.26, paragraph 72, specifies that qualifiers are used to supply information about features in addition to that conveyed by the feature key and feature location. There are three types of value formats to accommodate different types of information conveyed by qualifiers, namely:

  • (a) free text (see MPEP §§ 2413.01(g), subsection IX and 2413.01(h), for more detail about “free text”);
  • (b) controlled vocabulary or enumerated values (e.g., a number or date); and
  • (c) sequences.

 

WIPO Standard ST.26, paragraph 73, specifies that Section 6 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each nucleotide sequence feature key and Section 8 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each amino acid sequence feature key.

WIPO Standard ST.26, paragraph 74, specifies that any sequence encompassed by 37 CFR 1.831(b) (see MPEP § 2412.03) that is provided as a qualifier value must be separately included in the “Sequence Listing XML” and assigned its own sequence identifier as described in MPEP § 2412.05(a).

6. Mandatory Feature Qualifiers

WIPO Standard ST.26, paragraph 75, specifies that one mandatory feature key, i.e., “source” requires two mandatory qualifiers, “organism” and “mol_type.” Some optional feature keys also require mandatory qualifiers. See Annex I of WIPO Standard ST.26, Sections 5 and 7, for listings of feature keys with mandatory qualifiers.

7. Qualifier Elements

WIPO Standard ST.26 specifies that the element INSDFeature_quals contains one or more INSDQualifier elements. Each INSDQualifier element represents a single qualifier and consists of three dependent elements and one optional attribute, as shown below:

| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| INSDQualifier_name | Name of the qualifier (see Annex I, Sections 6 and 8). | Mandatory |
| INSDQualifier_value | Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(b). | Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8) |
| NonEnglishQualifier_value | Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(a). | Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8) |
| id | A qualifier with a language-dependent free text value may be uniquely identified by using the optional XML attribute 'id' in the element INSDQualifier (see paragraph 87(d)). The value of the 'id' attribute must start with the letter 'q' and continue with any positive integer. The value of an 'id' attribute must be unique to one INSDQualifier element, i.e. the attribute value must only be used once in a sequence listing file. | Optional |

Reproduced from paragraph 76 of WIPO Standard ST.26.
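
The qualifier structure, including the uniqueness rule for the optional 'id' attribute, can be sketched with a small helper that hands out 'q1', 'q2', and so on. The class name QualifierFactory and its method are illustrative assumptions, not part of any USPTO or WIPO tool.

```python
import itertools
import xml.etree.ElementTree as ET


class QualifierFactory:
    """Create INSDQualifier elements with file-wide unique 'id' values."""

    def __init__(self):
        self._counter = itertools.count(1)  # yields 1, 2, 3, ...

    def make(self, name, value=None, non_english_value=None, with_id=False):
        # The 'id' value starts with 'q' followed by a positive integer.
        attrs = {"id": f"q{next(self._counter)}"} if with_id else {}
        qualifier = ET.Element("INSDQualifier", attrs)
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        if value is not None:
            ET.SubElement(qualifier, "INSDQualifier_value").text = value
        if non_english_value is not None:
            ET.SubElement(qualifier, "NonEnglishQualifier_value").text = non_english_value
        return qualifier


factory = QualifierFactory()
organism = factory.make("organism", "Homo sapiens")
note = factory.make("note", "signal peptide", with_id=True)
print(ET.tostring(note, encoding="unicode"))
```

Using one factory per listing file is what keeps the identifiers unique; creating a second factory for the same file would restart the numbering and break the rule.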

8. Organism and Mol_type Qualifiers

WIPO Standard ST.26, paragraph 77, specifies that the organism qualifier, i.e., “organism” for nucleotide sequences (See Table 5: List of Qualifier Values for Nucleotide Sequences with Language-Dependent Free-Text Values reproduced in MPEP § 2413.01(h), Annex I, section 6, of WIPO Standard ST.26) and “organism” for amino acid sequences (see Table 6: List of Qualifiers for Amino Acid Sequences with Language-Dependent Free Text Values reproduced in MPEP § 2413.01(h), Annex I, section 6, of WIPO Standard ST.26) must disclose the source, i.e., a single organism or origin, of the sequence. Organism designations should be selected from a taxonomy database.

WIPO Standard ST.26, paragraph 78, specifies that if the sequence is naturally occurring and the source organism has a Latin genus and species designation, that designation must be used as the qualifier value. The preferred English common name may be specified using the qualifier “note” for nucleotide sequences and amino acid sequences, but must not be used in the organism qualifier value.

WIPO Standard ST.26, paragraph 80, specifies that if the sequence is naturally occurring and the source organism has a known Latin genus, but the species is unspecified or unidentified, then the organism qualifier value must indicate the Latin genus followed by “sp”.

WIPO Standard ST.26, paragraph 81, specifies that if the sequence is naturally occurring, but the Latin organism genus and species designation is unknown, then the organism qualifier value must be indicated as “unidentified”. Any known taxonomic information should be indicated in the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.

WIPO Standard ST.26, paragraph 82, specifies that if the sequence is naturally occurring and the source organism does not have a Latin genus and species designation, such as a virus, then another acceptable scientific name (e.g., “Canine adenovirus type 2”) must be used as the organism qualifier value.

WIPO Standard ST.26, paragraph 83, specifies that if the sequence is not naturally occurring, the organism qualifier value must be indicated as “synthetic construct.” Further information with respect to the way the sequence was generated may be specified using the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.
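
The organism rules in paragraphs 78 to 83 form a small decision tree, which can be sketched as follows. The function and parameter names are hypothetical, and the abbreviation is written as "sp." with a trailing period, which should be treated as an assumption to confirm against the standard.

```python
def organism_qualifier(naturally_occurring, genus=None, species=None,
                       scientific_name=None):
    """Pick an organism qualifier value following paragraphs 78 to 83."""
    if not naturally_occurring:
        return "synthetic construct"      # paragraph 83
    if genus and species:
        return f"{genus} {species}"       # paragraph 78: Latin genus and species
    if genus:
        return f"{genus} sp."             # paragraph 80: species unknown
    if scientific_name:
        return scientific_name            # paragraph 82: e.g. a virus name
    return "unidentified"                 # paragraph 81


# Example: a virus has no Latin genus/species designation.
virus = organism_qualifier(True, scientific_name="Canine adenovirus type 2")
```

Common names never appear in the returned value; under paragraph 78 they belong in the "note" qualifier.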

WIPO Standard ST.26, paragraph 84, specifies that the “mol_type” qualifier for nucleotide sequences and “mol_type” qualifier for amino acid sequences must disclose the type of molecule represented in the sequence. These qualifiers are distinct from the element INSDSeq_moltype discussed above where INSDSeq_moltype for nucleotide sequences, including nucleotide analogue sequences must be indicated as DNA or RNA, and for amino acid sequences, must be indicated as AA:

  • (1) For a nucleotide sequence, the “mol_type” qualifier value must be one of the following: “genomic DNA”, “genomic RNA”, “mRNA”, “tRNA”, “rRNA”, “other RNA”, “other DNA”, “transcribed RNA”, “viral cRNA”, “unassigned DNA”, or “unassigned RNA”. If the sequence is not naturally occurring, i.e. the value of the “organism” qualifier is “synthetic construct”, the “mol_type” qualifier value must be either “other RNA” or “other DNA”;
  • (2) For an amino acid sequence, the “mol_type” qualifier value is “protein.”
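
The mol_type rules above can be checked mechanically. This is a minimal sketch under the assumption that the three values (INSDSeq_moltype, the mol_type qualifier, and the organism qualifier) have already been extracted as strings; the function name is hypothetical.

```python
NUCLEOTIDE_MOL_TYPES = {
    "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "other RNA",
    "other DNA", "transcribed RNA", "viral cRNA", "unassigned DNA",
    "unassigned RNA",
}


def mol_type_problems(insdseq_moltype, mol_type, organism):
    """Return a list of rule violations for one sequence (empty if none)."""
    if insdseq_moltype == "AA":
        return [] if mol_type == "protein" else ["amino acid mol_type must be 'protein'"]
    if insdseq_moltype not in ("DNA", "RNA"):
        return ["INSDSeq_moltype must be DNA, RNA, or AA"]
    if mol_type not in NUCLEOTIDE_MOL_TYPES:
        return [f"{mol_type!r} is not an allowed nucleotide mol_type"]
    if organism == "synthetic construct" and mol_type not in ("other DNA", "other RNA"):
        return ["synthetic constructs must use 'other DNA' or 'other RNA'"]
    return []
```

For example, a sequence with organism "synthetic construct" and mol_type "genomic DNA" is flagged, because only "other DNA" or "other RNA" is permitted in that case.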
9. Free Text

WIPO Standard ST.26, paragraph 85, specifies that “free text” is a type of value format for certain qualifiers presented in the form of a descriptive text phrase or other specified format (see MPEP § 2413.01(h) for the definition of “free text” and see Annex I of WIPO Standard ST.26 for controlled vocabulary).

WIPO Standard ST.26, paragraph 86, specifies that the use of free text must be limited to a few short terms indispensable for the understanding of a characteristic of the sequence. For each qualifier other than the “translation” qualifier, the free text must not exceed 1000 characters.

WIPO Standard ST.26, paragraph 87, specifies that language-dependent free text is the free text value of certain qualifiers that is language-dependent in that it may require translation for international, national, or regional procedures. Qualifiers for nucleotide sequences with a language-dependent free text value format are identified in Annex I, Table 5: List of Qualifiers with Language-Dependent Free Text Values for Nucleotide Sequences (reproduced in MPEP § 2413.01(h)). Qualifiers for amino acid sequences with a language-dependent free text value format are identified in Annex I, Table 6: List of Qualifiers with Language-Dependent Free Text Values for Amino Acid Sequences (reproduced in MPEP § 2413.01(h)).
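
The 1000-character limit from paragraph 86 is easy to test before a listing is generated. The sketch below assumes qualifiers are available as (name, value) pairs and counts characters with Python's `len`, which counts Unicode code points; the function name is hypothetical.

```python
MAX_FREE_TEXT = 1000  # paragraph 86 limit; "translation" is exempt


def overlong_free_text(qualifiers):
    """Return names of qualifiers whose free text exceeds the limit."""
    return [
        name
        for name, value in qualifiers
        if name != "translation" and len(value) > MAX_FREE_TEXT
    ]
```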

10. Coding Sequences

WIPO Standard ST.26, paragraph 89, specifies that the “CDS” feature key may be used to identify coding sequences, i.e., sequences of nucleotides which correspond to the sequence of amino acids in a protein and the stop codon. The location of the “CDS” feature in the mandatory element INSDFeature_location must include the stop codon.

WIPO Standard ST.26, paragraph 90, specifies that the “transl_table” and “translation” qualifiers may be used with the “CDS” feature key (see Annex I of WIPO Standard ST.26). Where the “transl_table” qualifier is not used, the use of the Standard Code Table (see Annex I, Section 9, Table 7 of WIPO Standard ST.26) is assumed.

WIPO Standard ST.26, paragraph 91, specifies that the “transl_except” qualifier must be used with the “CDS” feature key and the “translation” qualifier to identify a codon that encodes either pyrrolysine or selenocysteine.

WIPO Standard ST.26, paragraph 92, specifies that an amino acid sequence encoded by the coding sequence and disclosed in a “translation” qualifier that is encompassed by the description of sequences found in MPEP § 2412.03 must be included in the sequence listing and assigned its own sequence identifier. The sequence identifier assigned to the amino acid sequence must be provided as the value in the qualifier “protein_id” with the “CDS” feature key. The “organism” qualifier of the “source” feature key for the amino acid sequence must be identical to that of its coding sequence.
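
As an illustration of paragraph 89, the sketch below checks that a "CDS" location covers whole codons and ends with a stop codon of the Standard Code Table. It is deliberately limited: it assumes a simple "start..end" location (no complement, join, or partial-location operators) and a hypothetical function name, so it is a sanity check rather than a full location parser.

```python
STOP_CODONS = {"taa", "tag", "tga"}  # stop codons of the Standard Code Table


def cds_includes_stop(sequence, location):
    """True if the 1-based 'start..end' span is whole codons ending in a stop."""
    start_text, end_text = location.split("..")
    start, end = int(start_text), int(end_text)
    # Locations are 1-based and inclusive; normalise case and uracil.
    cds = sequence[start - 1:end].lower().replace("u", "t")
    return len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS
```

For the sequence "atggcctaa", the location "1..9" passes, while "1..6" fails because the stop codon is left outside the feature.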

11. Variants

MPEP § 2412.05(c) provides information about the representation and inclusion of variants.

WIPO Standard ST.26, paragraph 93, specifies that a primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and encompassed by the description of sequences found in MPEP § 2412.03 must each be included in the sequence listing and assigned its own sequence identifier.

WIPO Standard ST.26, paragraph 94, specifies that any variant sequence, disclosed as a single sequence with enumerated alternative residues at one or more positions, must be included in the sequence listing and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. See MPEP § 2412.05(b), subsection II, for more information regarding representing alternative nucleotide residues and MPEP § 2412.05(d), subsection II, for more information regarding representing alternative amino acid residues.
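
To make "most restrictive ambiguity symbol" concrete for nucleotides: if a position may be a or g, the symbol is "r" rather than the broader "n". The sketch below maps a set of alternative bases to the standard IUPAC symbol in lowercase; the function name is hypothetical, and treating "u" as "t" is a convenience of this sketch.

```python
# Each set of alternative bases maps to exactly one IUPAC symbol.
AMBIGUITY = {
    frozenset("a"): "a", frozenset("c"): "c", frozenset("g"): "g",
    frozenset("t"): "t",
    frozenset("ac"): "m", frozenset("ag"): "r", frozenset("at"): "w",
    frozenset("cg"): "s", frozenset("ct"): "y", frozenset("gt"): "k",
    frozenset("acg"): "v", frozenset("act"): "h", frozenset("agt"): "d",
    frozenset("cgt"): "b", frozenset("acgt"): "n",
}


def most_restrictive_symbol(alternatives):
    """Return the narrowest symbol covering all alternative nucleotides."""
    bases = frozenset(base.lower().replace("u", "t") for base in alternatives)
    if bases not in AMBIGUITY:
        raise ValueError(f"no ambiguity symbol for {sorted(bases)}")
    return AMBIGUITY[bases]
```

A position disclosed as "a, c, or g" therefore becomes "v", not "n".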

WIPO Standard ST.26, paragraph 95, specifies that any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence in the sequence listing, should be included in the sequence listing. Where included in the sequence listing, such a variant sequence:

  • (a) may be represented by annotation of the primary sequence, where it contains variation(s) at a single location or multiple distinct locations and the occurrence of those variations are independent;
  • (b) should be represented as a separate sequence and assigned its own sequence identifier, where it contains variations at multiple distinct locations and the occurrence of those variations are interdependent; and
  • (c) must be represented as a separate sequence and assigned its own sequence identifier, where it contains an inserted or substituted sequence that contains in excess of 1000 residues (see WIPO Standard ST.26, paragraph 86).
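
The three cases in paragraph 95 can be summarized as a small decision function. This is only a sketch of the logic as stated above; the function name, parameter names, and returned strings are hypothetical.

```python
def variant_representation(location_count, interdependent,
                           longest_insert_or_substitution=0):
    """Suggest how a variant disclosed by reference to a primary sequence is shown."""
    if longest_insert_or_substitution > 1000:
        return "separate sequence (must)"            # case (c)
    if location_count > 1 and interdependent:
        return "separate sequence (should)"          # case (b)
    return "annotation of primary sequence (may)"    # case (a)
```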

 

WIPO Standard ST.26, paragraph 96, specifies the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants from the table List of Feature Keys and Qualifiers (reproduced in MPEP § 2412.05(c)).

WIPO Standard ST.26, paragraph 97, specifies that annotation of a sequence for a specific variant must include a feature key and qualifier, as indicated in the table in MPEP § 2412.05(c), and the feature location. The value for the “replace” qualifier must be only a single alternative nucleotide or nucleotide sequence using only the symbols set forth in Table 1: List of Nucleotide Symbols (see MPEP § 2413.01(a)), or empty. A listing of alternative residues may be provided as the value in the “note” qualifier. In particular, a listing of alternative amino acids must be provided as the value in the “note” qualifier where “X” is used in a sequence, and represents a value other than “any one of ‘A’, ‘R’, ‘N’, ‘D’, ‘C’, ‘Q’, ‘E’, ‘G’, ‘H’, ‘I’, ‘L’, ‘K’, ‘M’, ‘F’, ‘P’, ‘O’, ‘S’, ‘U’, ‘T’, ‘W’, ‘Y’, or ‘V.’” A deletion must be represented by an empty qualifier value for the “replace” qualifier or by an indication in the “note” qualifier that the residue may be deleted. An inserted or substituted residue(s) must be provided in the “replace” or “note” qualifier. The value format for the “replace” and “note” qualifiers is free text and must not exceed 1000 characters. See below for sequences encompassed by the definition in MPEP § 2412.03 that are provided as an insertion or a substitution in a qualifier value.
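
The constraints on the “replace” qualifier value can be sketched as a simple check. The symbol set below is the lowercase IUPAC nucleotide alphabet, assumed here to match Table 1; uppercase input is flagged on that assumption, and the function name is hypothetical.

```python
# Assumed content of Table 1: lowercase IUPAC nucleotide symbols.
NUCLEOTIDE_SYMBOLS = set("acgtmrwsykvhdbn")


def replace_value_problems(value):
    """Return problems with a 'replace' qualifier value (empty means deletion)."""
    problems = []
    if len(value) > 1000:
        problems.append("exceeds 1000 characters")
    unknown = sorted(set(value) - NUCLEOTIDE_SYMBOLS)
    if unknown:
        problems.append("symbols not in Table 1: " + ", ".join(unknown))
    return problems
```

An empty string passes, which corresponds to the deletion case described above.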

WIPO Standard ST.26, paragraph 98, specifies that the symbols set forth in Tables 1 to 4 of Annex I, reproduced in MPEP §§ 2412.03(a), 2412.03(c), and 2412.05(b), subsection III, should be used to represent variant residues where appropriate. For the “note” qualifier, where the variant residue is a modified residue not set forth in Tables 2 or 4, the complete unabbreviated name of the modified residue must be provided as the qualifier value. Modified residues must be further described in a feature table as described in MPEP § 2412.05(b), subsection III, for modified nucleotides and MPEP § 2412.05(d), subsection III, for modified amino acids.

WIPO Standard ST.26, paragraph 100, specifies that a sequence encompassed by the description of sequences found in MPEP § 2412.03 that is provided as an insertion or a substitution in a qualifier value for a primary sequence annotation must also be included in the sequence listing and assigned its own sequence identifier.

Trust Your Patent Sequence Listings to the Industry's Leading Experts

Diving Deeper into the WIPO Sequence Tool

A. Requirements

WIPO Sequence is a free, downloadable tool that runs on Windows, Linux, and macOS. It requires a 64-bit architecture, 8 GB of RAM for handling large sequence listings, and up to 200 MB of disk space.

Once the tool is installed, the user can create a “new project” and enter the metadata and sequence information in the appropriate tabs. The tool accepts sequences of 10 or more nucleotides or 4 or more amino acids. The software supports only linear, unbranched nucleotide and amino acid sequences.
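
ST.26 sets the length thresholds at ten or more specifically defined nucleotides and four or more specifically defined amino acids, where "specifically defined" excludes the symbols "n" and "X". The sketch below shows how that count could be made before entering a sequence in the tool; it is an independent illustration with a hypothetical function name, not the tool's own logic.

```python
def meets_length_threshold(sequence, kind):
    """Check the ST.26 length threshold; kind is 'nucleotide' or 'amino_acid'."""
    if kind == "nucleotide":
        # 'n' (any nucleotide) is not a specifically defined residue.
        defined = sum(1 for residue in sequence if residue.lower() != "n")
        return defined >= 10
    if kind == "amino_acid":
        # 'X' (any amino acid) is not a specifically defined residue.
        defined = sum(1 for residue in sequence if residue.upper() != "X")
        return defined >= 4
    raise ValueError("kind must be 'nucleotide' or 'amino_acid'")
```

For example, "acgtnnnnnnacgt" has only eight specifically defined nucleotides and falls below the threshold even though it is fourteen residues long.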

B. Generating the Sequence Listing

After the latest version of WIPO Sequence is installed and launched, a home screen appears. It shows options such as Projects, Persons & Organizations, Organisms, Help, References, and language settings.

Table: Elements of the general information part relating to patent application information

| Element | Description | Mandatory/Optional |
| --- | --- | --- |
| ApplicationIdentification | The application identification for which the sequence listing is submitted | Mandatory when a sequence listing is furnished at any time following the assignment of the application number |
| IPOfficeCode | ST.3 Code of the office of filing | Mandatory |
| ApplicationNumberText | The application number as provided by the office of filing (e.g., PCT/IB2013/099999) | Mandatory |
| FilingDate | The date of filing of the patent application for which the sequence listing is submitted (e.g., 2015-01-31) | Mandatory when a sequence listing is furnished at any time following the assignment of a filing date |
| ApplicantFileReference | A single unique identifier assigned by applicant to identify a particular application, typed in the characters | Mandatory when a sequence listing is furnished at any time prior to assignment of the application number; otherwise, Optional |
| EarliestPriorityApplicationIdentification | The identification of the earliest priority application | Mandatory where priority is claimed |
| ApplicantName | Name of the first mentioned applicant typed in the characters | Mandatory |
| InventorName | Name of the first mentioned inventor typed in the characters | Optional |
| InventorNameLatin | Where InventorName is typed in characters | Optional |
| InventionTitle | Title of the invention typed in the characters in the language of filing | Mandatory in the language of filing. Optional for additional languages |
| SequenceTotalQuantity | The total number of all sequences in the sequence listing including intentionally skipped sequences | Mandatory |
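
The mandatory/optional rules in this table can be turned into a quick pre-submission check. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes that IPOfficeCode and ApplicationNumberText are nested inside ApplicationIdentification and that the other elements are direct children of the root, which is an assumption about the XML layout rather than something the table states; the function name is hypothetical, and the FilingDate and priority conditions are left out for brevity.

```python
import xml.etree.ElementTree as ET


def missing_general_information(root, application_number_assigned):
    """List general-information elements that the table marks as mandatory but are absent."""
    missing = []
    for tag in ("ApplicantName", "InventionTitle", "SequenceTotalQuantity"):
        if root.find(tag) is None:
            missing.append(tag)
    app_id = root.find("ApplicationIdentification")
    if app_id is None:
        if application_number_assigned:
            missing.append("ApplicationIdentification")
    else:
        # Children are mandatory whenever the parent element is present.
        for tag in ("IPOfficeCode", "ApplicationNumberText"):
            if app_id.find(tag) is None:
                missing.append(tag)
    if not application_number_assigned and root.find("ApplicantFileReference") is None:
        missing.append("ApplicantFileReference")
    return missing


# Hypothetical, heavily abbreviated listing used only for illustration.
sample = ET.fromstring(
    "<ST26SequenceListing>"
    "<ApplicantName languageCode='en'>Example Corp</ApplicantName>"
    "<InventionTitle languageCode='en'>Example title</InventionTitle>"
    "<SequenceTotalQuantity>1</SequenceTotalQuantity>"
    "</ST26SequenceListing>"
)
```

With this sample, a listing furnished before an application number is assigned is missing only ApplicantFileReference.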

C. Creating a Project

Click the “Projects” option, then click “New Project” to start a new sequence listing.

D. Adding a Project Name & Optional Description

Enter the project name and, if desired, a description, then click the “Save” button to save the project.

E. Adding Sequences and Additional Data

Sequences and other information can be added by clicking the project name, which is now visible on the home screen. The screen now prompts the user to enter the application identification number, priority identification, applicant, and inventor. The various options can be accessed by scrolling down the screen. Sequences can be added manually or imported from a file by clicking the “import sequence” button. The uploaded file can contain multiple sequences in FASTA format.
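
For readers unfamiliar with the multi-sequence FASTA layout mentioned above, each record starts with a ">" header line followed by one or more lines of residues. The sketch below parses such text into (header, sequence) pairs; it only illustrates the file layout and says nothing about how the tool's own importer works. The function name is hypothetical.

```python
def parse_fasta(text):
    """Parse multi-sequence FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Sequence lines that are wrapped across several lines are joined, so a record split over two lines comes back as a single string.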

F. Validation

After entering the sequence data, click “Validation” to check the project for errors.

G. Generating the Sequence Listing

After validation, the sequence listing can be generated in ST.26 format by clicking the “Generate Sequence Listing” option. A window appears asking where to save the newly generated XML file.

H. Result

When opened, the XML file shows the sequence listing with the project name, the software used, and the production date, followed by the bibliographic information, the sequence information, and the sequences themselves.
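
The generated file can also be inspected programmatically. The sketch below reads a few header values with Python's standard `xml.etree.ElementTree`; the attribute and element names (softwareName, productionDate, InventionTitle, SequenceTotalQuantity, SequenceData) are assumed from the ST.26 XML layout and should be checked against an actual generated file. The sample string is hypothetical and heavily abbreviated.

```python
import xml.etree.ElementTree as ET


def summarize_listing(xml_text):
    """Collect a few header values and the sequence count from ST.26 XML text."""
    root = ET.fromstring(xml_text)
    return {
        "software": root.get("softwareName"),
        "production_date": root.get("productionDate"),
        "title": root.findtext("InventionTitle"),
        "declared_total": root.findtext("SequenceTotalQuantity"),
        "sequence_count": len(root.findall("SequenceData")),
    }


# Hypothetical, abbreviated listing for illustration only.
sample_xml = (
    '<ST26SequenceListing softwareName="WIPO Sequence" productionDate="2025-01-31">'
    '<InventionTitle languageCode="en">Example title</InventionTitle>'
    "<SequenceTotalQuantity>1</SequenceTotalQuantity>"
    '<SequenceData sequenceIDNumber="1"/>'
    "</ST26SequenceListing>"
)
summary = summarize_listing(sample_xml)
```

Comparing `declared_total` with `sequence_count` is a quick way to spot a mismatch between the declared and actual number of sequences. For a file on disk, `ET.parse(path).getroot()` can be used in place of `ET.fromstring`.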

We are the leading Patent Sequence Listing Company

At our Sequence Listing Company, we specialize exclusively in creating perfect patent sequence listings for biotechnology and pharmaceutical companies. Founded by patent attorneys and bioinformatics specialists with over 10 years of experience, we understand the critical intersection of scientific innovation and intellectual property protection. Our dedicated team has helped hundreds of companies successfully navigate the complex regulatory requirements of sequence listings across global patent offices. We combine technical precision with regulatory expertise to ensure your valuable innovations receive the protection they deserve without delays or complications.

Powered by

Effectual Services is an award-winning Intellectual Property (IP) management advisory & Consulting firm.

© 2025 The Sequence Listing. All rights reserved.