Datasets
Datasets for AI and digital transformation
AI and digital transformation are moving mainstream at an accelerated pace. Combine your internal knowledge with our domain-specific curated and enriched data to answer R&D questions with greater precision. Explore data packages and custom options below.
Accelerate discovery
Integrate reliable and actionable scientific data into custom applications and third-party tools to enable business use cases, such as:
Enterprise, federated, and/or semantic search
Business intelligence dashboards
Knowledge graph creation
Rising star and KOL analyses
Accurate analyses
Transform validated data into scientific insights by incorporating Elsevier datasets within your computational ecosystems to:
Train algorithms and neural networks
Develop predictive models, such as material property predictions or drug-drug interactions
Perform protein-ligand binding QSAR
Automate and enhance tasks and workflows
Changing AI in drug discovery
Elsevier understands the challenges of life sciences R&D. Our comprehensive, high-quality, machine-readable datasets provide clear data provenance and support evidence-based decisions. Drawing on our long history of supporting the life sciences, we can provide:
Curated datasets from full-text articles in disciplines across life sciences, including medicine, chemistry, biochemistry, genetics, immunology, microbiology, pharmacology, toxicology and more.
FAIR data in 11 therapy areas
Specialized data, such as bioactivity data, biological relationships, substances, reactions and more
What types of datasets are available?
Flexible data packages tailored to your needs are available, including:
Data from 2,500 journals representing 24 major discipline areas
FAIR data in 11 therapy areas – full-text journals data enriched with machine-readable metadata, including premium collection titles from Cell Press and The Lancet
Read more about full-text scientific datasets from Elsevier
Download the factsheet
View the list of titles included in the Journals Data subject collections
Three datasets (abstracts, authors and affiliations, and evaluation metrics) cover 24 research disciplines from 7,000 publishers. Data extracted from peer-reviewed scientific journals, books, serials, patents and conference proceedings includes:
1.8 billion cited references
17.6 million author profiles
94,800 institutional profiles
11.7 million conference papers from more than 149,000 events
Experimentally validated chemical structure, reaction and bioactivity data, available via API or flat file.
Datasets come from a variety of sources, including:
264 million substances and associated properties
62 million reactions with experimental conditions and literature references
65 million documents from 16,000 journals
38 million patents from 105 patent offices
44 million bioactivities
32,000 unique targets
54,000 species
Data from peer-reviewed biomedical literature, in-press publications and conference abstracts, indexed with the Emtree life science thesaurus, includes:
41 million records
8,300 journals including 2,900 not found in MEDLINE
3.6 million conference abstracts from 11,500 conferences
The EmBiology Dataset currently includes 18.6 million biological relationships from 36 million MEDLINE abstracts and 7.6 million full-text articles. These include protein-protein interactions and the effects of proteins, compounds and cells on diseases and cell processes. The information comes from full-text literature on Elsevier’s ScienceDirect, from other high-impact publishers, and from public and proprietary databases. The Dataset also includes:
1.3 million small molecule protein interactions from Reaxys
150,000 data points from ClinicalTrials.gov
600,000 relationships from public databases of protein-protein interactions, small molecule protein interactions, miRNA effects, SNP annotations and more
Download the EmBiology Dataset factsheet
A variety of APIs offer structured data extracted from FDA and EMA regulatory documents, including:
5,000 approved drugs
2 million extracted PK data records on over 95 PK parameters
600,000 extracted enzyme and transporter data records: drug as inducer, inhibitor or substrate
1.8 million extracted safety and adverse event data records
3.8 million extracted efficacy data records on clinical trials from regulatory packages
Data sources:
3 million pages of FDA approval documents, including labels, approval packages, DESI documents and Advisory Board documents
384,000 pages of EMA approval packages
20 million FDA post-market reports (FAERS)