EMODnet Ingestion

Data Ingestion Portal

Details of Annotation of the genome assembly (version 2) of the microalga Tisochrysis lutea

Dataset identification

Title of dataset	Annotation of the genome assembly (version 2) of the microalga Tisochrysis lutea
Narrative summary of dataset	We recently published the first draft genome of T. lutea, obtained with Illumina short-read technology. While this technology has a very low sequencing error rate, assemblers are known to misassemble long repeated sequences, resulting in fragmentation of the genome assembly. The genome of T. lutea was therefore re-sequenced with the Pacific Biosciences long-read technology, because long-read assemblers efficiently resolve the assembly of long repeated elements such as transposable elements (TEs). However, this technology has to date a high sequencing error rate, and combining it with Illumina short-read data has become a common method to overcome this error rate. A de novo genome assembly was performed from the long reads and improved with the Illumina short-read data used in the first genome assembly version. The de novo genome of T. lutea is composed of 193 contigs and has a size of 82 Mb. A gain of around 30 Mb was obtained (+34%) compared to the previous genome assembly, which had a size of 54 Mb and was composed of 7,659 contigs. The size of the coding regions increased only slightly between the two genome versions: the de novo genome assembly encodes ~16,000 genes, corresponding to a coding region length of 28 Mb, whereas the coding regions of the draft genome version amounted to 25 Mb. This suggests that the newly assembled regions are mostly repeated elements. The new genome version is far more accurate than the previous one and was suitable for properly detecting and annotating the TE content. To identify potentially autonomous TEs, we designed a pipeline named PiRATE (Pipeline to Retrieve and Annotate TEs) and conducted an accurate TE annotation of the de novo genome of T. lutea. We established that its genome is composed of 15.9% Class I and 4.9% Class II TEs. Among them, 3.8% and 15.95% correspond to potentially autonomous and non-autonomous TEs, respectively.
Supporting documentation	How to cite: Berthelier Jérémy, Casse Nathalie, Daccord Nicolas, Jamilloux Véronique, Saint-Jean Bruno, Carrier Gregory (2018). Annotation of the genome assembly (version 2) of the microalga Tisochrysis lutea. SEANOE. https://doi.org/10.17882/52231
Start date	1900-01-01
End date	2018-01-01

Responsible organisations

Country	France
Organisation name	Ifremer, Scientific Information Systems for the sea
Role of organisation	Dataset Holding Organisation

Country	France
Organisation name	SEA scieNtific Open data Edition
Role of organisation	Publisher

Dataset availability

Original dataset download link	https://cloud.emodnet-ingestion.eu/index.php/s/J58SGhDhKsSEELQ
Date of original dataset publication	2019-08-06
Dataset format	Text or Plaintext
	xls
	xlsx
Public access	No limitations
License for use	CC0 1.0
Type	Dataset
DOI	https://doi.org/10.17882/52231

Locations

Map	© OpenStreetMap contributors
Latitude north boundary	89
Longitude east boundary	180
Latitude south boundary	-89
Longitude west boundary	-180
Coordinate reference system	World Geodetic System 84

Data types, collection and processing

Observation type	Other biological measurements
Parameter	Concentration of other substances in biota
	Microphytobenthos biomass
Data quality processing information	Quality controlled data

Process information

Submitting organisation	Ifremer, Scientific Information Systems for the sea
Submission identifier (UUID)	4c25bb60-8d90-4127-92ed-c5df6a989e56
Date of dataset creation	2018-01-01
Date of metadata creation	2019-05-17
Date of metadata latest revision	2019-07-19
Date of publishing	2019-08-06
Processing data centre	Ifremer, Scientific Information Systems for the sea
Summary record-ID	544