Here I provide an ordered list of options (note that I am the author of bed2gtf and bed2gff):
bed2gtf
A high-performance BED-to-GTF converter written in Rust from https://github.com/alejandrogzi/bed2gtf.
Usage: bed2gtf[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>
where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gtf)
The isoforms file specification:
a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):
> cat isoforms.txt
ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
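Since every transcript in the .bed file should appear in the isoforms file, it can help to check that before converting. Below is a minimal sketch of such a check, not part of bed2gtf itself. It assumes the transcript name sits in column 4 of the BED file and that the isoforms file lists the gene in column 1 and the transcript in column 2, as in the example above. The function name `missing_transcripts` is my own.

```python
def missing_transcripts(bed_lines, isoform_lines):
    """Return BED transcript names (column 4) that are absent from the isoforms table."""
    # Assumption: isoforms rows look like "<gene>\t<transcript>".
    known = set()
    for line in isoform_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            known.add(fields[1].strip())

    missing = []
    for line in bed_lines:
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # no name column, nothing to look up
        if fields[3] not in known:
            missing.append(fields[3])
    return missing


if __name__ == "__main__":
    # Toy records for illustration only; the coordinates are made up.
    isoforms = [
        "ENSG00000198888\tENST00000361390\n",
        "ENSG00000198763\tENST00000361453\n",
    ]
    bed = [
        "chrM\t10\t20\tENST00000361390\t0\t+\n",
        "chrM\t30\t40\tNOT_IN_TABLE\t0\t+\n",
    ]
    print(missing_transcripts(bed, isoforms))
```

With real files, pass open file handles instead of the lists; an empty result means every transcript in the BED file has a gene assigned.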
Converts
- Homo sapiens GRCh38 GENCODE 44 (252,835 transcripts) in 3.25 seconds.
- Mus musculus GRCm39 GENCODE 44 (149,547 transcripts) in 1.99 seconds.
- Canis lupus familiaris ROS_Cfam_1.0 Ensembl 110 (55,335 transcripts) in 1.20 seconds.
- Gallus gallus bGalGal1 Ensembl 110 (72,689 transcripts) in 1.36 seconds.
bed2gff
A Rust BED-to-GFF3 translator that runs in parallel from https://github.com/alejandrogzi/bed2gff.
Usage: bed2gff[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>
where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gff)
The isoforms file specification:
a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):
> cat isoforms.txt
ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
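If you do not have an isoforms file yet, one way to get it is to pull the gene/transcript pairs out of an existing annotation. The sketch below assumes you have a GTF for the same annotation release whose `transcript` records carry `gene_id "..."` and `transcript_id "..."` attributes (the standard GTF attribute layout). It is not something bed2gff ships, and `isoforms_from_gtf` is a name I made up.

```python
import re

# Standard GTF attribute syntax: key "value";
_GENE = re.compile(r'gene_id "([^"]+)"')
_TRANSCRIPT = re.compile(r'transcript_id "([^"]+)"')


def isoforms_from_gtf(gtf_lines):
    """Return 'gene<TAB>transcript' rows, one per transcript record in the GTF."""
    rows = []
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        gene = _GENE.search(fields[8])
        transcript = _TRANSCRIPT.search(fields[8])
        if gene and transcript and transcript.group(1) not in seen:
            seen.add(transcript.group(1))
            rows.append(f"{gene.group(1)}\t{transcript.group(1)}")
    return rows


if __name__ == "__main__":
    # Toy record for illustration only; the coordinates are made up.
    gtf = [
        'chrM\ttoy\ttranscript\t1\t100\t.\t+\t.\t'
        'gene_id "ENSG00000198888"; transcript_id "ENST00000361390";\n',
    ]
    print("\n".join(isoforms_from_gtf(gtf)))
```

The identifiers written this way must match the names in column 4 of your BED file exactly, including any version suffix.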
Converts
- Homo sapiens GRCh38 GENCODE 44 (252,835 transcripts) in 4.16 seconds.
- Mus musculus GRCm39 GENCODE 44 (149,547 transcripts) in 2.15 seconds.
- Canis lupus familiaris ROS_Cfam_1.0 Ensembl 110 (55,335 transcripts) in 1.30 seconds.
- Gallus gallus bGalGal1 Ensembl 110 (72,689 transcripts) in 1.51 seconds.
bedToGenePred + genePredToGtf + refTable
UCSC offers a fast way to convert BED into GTF files through KentUtils or specific binaries using:
bedToGenePred in.bed /dev/stdout | genePredToGtf file /dev/stdin out.gtf
You can install these tools with bioconda, or download them here. The gene_id attribute is only filled in correctly when using refTables (a format specified in UCSC's web browser). For a more elaborate answer, see "Obtaining Ucsc Tables Via Ftp And Converting Them To Proper Gff3 Via Genepredtogtf?".
Other options
Other scripts/tools that DO NOT produce a complete GTF file (lacking gene_id attributes) are:
gtf2bed < foo.gtf | sort-bed - > foo.bed
awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' foo.bed > foo_from_gtf2bed.gtf
kscript
from https://github.com/holgerbrandl/kscript:
kscript https://git.io/vbJ4B my.bed > my.gtf
from https://github.com/pfurio/bed2gtf:
python bed2gtf [options] <mandatory>
AGAT
Considering only the options that produce gene_id attributes, bed2gtf and bed2gff are ~3-4 seconds faster than UCSC's C binaries. More detailed instructions for these tools are given in the linked sources.