Email

pgferreira@fc.up.pt

Phone

+351 220402959

Location

Porto, Portugal

About me

Education and professional summary.

About

Pedro Ferreira is an Assistant Professor with Habilitation and Tenure at the Department of Computer Science at FCUP and a Senior Researcher at the Artificial Intelligence and Decision Support Lab at INESCTEC. He is the Director of the BSc in Bioinformatics, Sub-Director of the BSc in Artificial Intelligence and Data Science, and Sub-Director of the MSc in Bioinformatics and Computational Biology. His research combines artificial intelligence, machine learning, and bioinformatics to advance genomic data science and precision medicine. He has contributed to major international consortia such as ENCODE, GTEx, and ICGC-CLL, and published in leading journals including Nature, Science, Nature Communications, and Machine Learning.

  • Name : Pedro G. Ferreira
  • Email : pgferreira@fc.up.pt
  • Nationality : Portuguese
  • Phone : +351220402959
  • Position : Assistant Professor
  • Affiliation : Department of Computer Science, Faculty of Sciences, University of Porto
  • Work Address : FC6, Rua do Campo Alegre, 4169-007 Porto
  • Researcher : INESCTEC
My Experience
Assistant Professor - Department of Computer Science, Faculty of Sciences, University of Porto
2019 - present

Participates in the 1st cycle (L:BIOINF, L:IACD), 2nd cycle (M:CC, M:IERSI and M:BBC) and 3rd cycle (MAPi).

Senior Researcher - INESC-TEC
2020 - present

Member of the Artificial Intelligence and Decision Support Laboratory.

Associate Researcher - Ipatimup/i3s
2015 - 2019

Position funded by an FCT Investigator Starting Grant (overall success rate 15.1%).

Senior Bioinformatician - CBR Genomics
2014 - 2015

Genomics-as-a-Service company for personalized medicine.

Training
Habilitation in Computer Science - Faculty of Sciences University of Porto
2022
Advanced Computing Training - University of Texas at Austin
2018

Visiting Researcher at Texas Advanced Computing Center & Dell Medical School. Fellowship from UTAustin|Portugal.

Post-doctoral Fellow - University of Geneva, School of Medicine
2012 - 2014

Supervision: E. Dermitzakis.

Post-doctoral Fellow - Centre for Genomic Regulation
2008 - 2012

Supervision: R. Guigó. Supported by a Postdoctoral fellowship from FCT Portugal (2008-2010).

PhD in Artificial Intelligence - University of Minho
2003 - 2007

Thesis: Sequence Pattern Mining in Biochemical Data. Supervision: P. Azevedo. Supported by a PhD fellowship from FCT Portugal.

Bachelor Degree (5 Years) - University of Minho
1997 - 2002

Systems and Informatics Engineering.

Books

Two books co-authored with Miguel Rocha: one on data analysis with R and one on algorithms for bioinformatics with Python.

Análise e Exploração de Dados em R

Bioinformatics Algorithms: Design and Implementation in Python

Publications

Selected papers from recent years, representative of my main research lines.

GenMed'25

The molecular impact of cigarette smoking resembles aging across tissues

Jose M. Ramirez*, Rogério Ribeiro*, Oleksandra Soldatkina, Athos Moraes, Raquel García-Pérez, Winona Oliveros, Pedro G Ferreira#, Marta Melé#

Genome Medicine

2025

We use data from the Genotype-Tissue Expression Project (GTEx) to perform a characterization of the effect of cigarette smoking across human tissues. We perform a multi-tissue analysis across 46 human tissues. Our multi-omics characterization includes analysis of gene expression, alternative splicing, DNA methylation, and histological alterations. We further analyze ex-smoker samples to assess the reversibility of these molecular alterations upon smoking cessation.

GYOSA'25

Exploiting Trusted Execution Environments and Distributed Computation for Genomic Association Tests

Cláudia V Brito, Pedro G Ferreira, João T Paulo

IEEE Journal of Biomedical and Health Informatics

2025

We introduce Gyosa, a secure and privacy-preserving distributed genomic analysis solution. By leveraging trusted execution environments (TEEs), Gyosa allows users to confidentially delegate their GWAS analysis to untrusted infrastructures. Gyosa implements a computation partitioning scheme that reduces the computation done inside the TEEs while safeguarding the users' genomic data privacy.

ML'24

Integration of multi-modal datasets to estimate human aging

Rogério Ribeiro, Athos Moraes, Marta Moreno, Pedro G Ferreira

Machine Learning

2024

Integration of multi-modal datasets is a powerful approach for the analysis of complex biological systems, with the potential to uncover novel aging biomarkers. In this study, we leveraged publicly available epigenomic, transcriptomic and telomere length data along with histological images from the Genotype-Tissue Expression project to build tissue-specific regression models for age prediction.

IE'23

Privacy-Preserving Machine Learning on Apache Spark

Cláudia V Brito, Pedro G Ferreira, Bernardo L Portela, Rui C Oliveira, João T Paulo

IEEE Access

2023

This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks.

BMC'22

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Moreno, M; Vilaça, R; Ferreira, PGC;

BMC Bioinformatics

2022

Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures. Keywords: Machine learning, Scalable data science, Gene expression, Transcriptomics, Data analysis

BI'20

Deep learning for drug response prediction in cancer

Baptista, D; Ferreira, PG; Rocha, M;

Briefings in Bioinformatics

2020

Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning (ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction has been unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review recently published studies that have employed DL methods to predict drug response in cancer cell lines. We also provide a brief description of DL and the main types of architectures that have been used in these studies. Additionally, we present a selection of publicly available drug screening data resources that can be used to develop drug response prediction models. Finally, we also address the limitations of these approaches and provide a discussion on possible paths for further improvement.

FG'20

Gender Differential Transcriptome in Gastric and Thyroid Cancers

A Sousa, M Ferreira, C Oliveira^C, PG Ferreira^C

Frontiers in Genetics 11, 808

2020

Cancer has an important and considerable gender differential susceptibility confirmed by several epidemiological studies. Gastric (GC) and thyroid cancer (TC) are examples of malignancies with a higher incidence in males and females, respectively. Beyond environmental predisposing factors, it is expected that gender-specific gene deregulation contributes to this differential incidence. We performed a detailed characterization of the transcriptomic differences between genders in normal and tumor tissues from stomach and thyroid using Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) data. We found hundreds of sex-biased genes (SBGs). Most of the SBGs shared by normal and tumor belong to sexual chromosomes, while the normal and tumor-specific tend to be found in the autosomes. Expression of several cancer-associated genes is also found to differ between sexes in both types of tissue. Thousands of differentially expressed genes (DEGs) between paired tumor-normal tissues were identified in GC and TC. For both cancers, in the most susceptible gender, the DEGs were mostly under-expressed in the tumor tissue, with an enrichment for tumor-suppressor genes (TSGs). Moreover, we found gene networks preferentially associated to males in GC and to females in TC and correlated with cancer histological subtypes. Our results shed light on the molecular differences and commonalities between genders and provide novel insights in the differential risk underlying these cancers.

NC'18

The effects of death and post-mortem cold ischemia on human tissue transcriptomes

PG Ferreira^C, M Muñoz-Aguirre, F Reverter, CPS Godinho, A Sousa, ...

Nature Communications 9 (1), 1-15

2018

Post-mortem tissue samples are a key resource for investigating patterns of gene expression. However, the processes triggered by death and the post-mortem interval (PMI) can significantly alter physiologically normal RNA levels. We investigate the impact of PMI on gene expression using data from multiple tissues of post-mortem donors obtained from the GTEx project. We find that many genes change expression over relatively short PMIs in a tissue-specific manner, but this potentially confounding effect in a biological analysis can be minimized by taking into account appropriate covariates. By comparing ante- and post-mortem blood samples, we identify the cascade of transcriptional events triggered by death of the organism. These events do not appear to simply reflect stochastic variation resulting from mRNA degradation, but active and ongoing regulation of transcription. Finally, we develop a model to predict the time since death from the analysis of the transcriptome of a few readily accessible tissues.

SC'16

Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

PG Ferreira, M Oti, M Barann, T Wieland, S Ezquina, MR Friedländer, ...

Scientific Reports 6, 32406

2016

Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

SC'15'A

The human transcriptome across tissues and individuals

M Melé*, PG Ferreira*, F Reverter*, DS DeLuca, J Monlong, M Sammeth, ...

* equal contribution

Science 348 (6235), 660-665

2015

Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes—which is most clearly seen in blood—though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes.

SC'15'B

The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans

GTEx Consortium (including PG Ferreira)

Science 348 (6235), 648-660

2015

Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.

SC'20

The GTEx Consortium atlas of genetic regulatory effects across human tissues

GTEx Consortium (including PG Ferreira)

Science 369 (6509), 1318-1330

2020

The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.

Teaching

In recent years I have taught the following courses, either as course chair or as instructor for practical classes.

Advanced Topics on Machine Learning (MSc in AI)

Introduction to Data Science (MSc in Data Science and MSc in AI)

Algorithms for the Analysis of Biological Sequences (BSc in Bioinformatics)

Foundations and Applications of Machine Learning (Doctoral Program MAPi)

Algorithms for Bioinformatics / Bioinformatics

Bioinformatics for Master in Medical Informatics

Artificial Intelligence

Imperative Programming (C Language)

Introduction to Programming (Python Language)

Supervisions

Concluded PhD supervisions.

Study of control of protein abundance using multi-omics data from cancer samples

Abel Sousa. PhD from University of Porto. Supervision with Carla Oliveira and Pedro Beltrão. 2017-2021

A framework for predicting drug sensitivity and synergy in cancer cells using deep learning.

Delora Soeiro Baptista. PhD from University of Minho. Co-supervision with Miguel Rocha. 2017-2021

Transcriptomics-based prediction of human phenotypes using scalable and secure machine learning approaches

Marta Moreno. PhD from University of Porto. Supervision with M. Melé and R. Oliveira. 2018-2022

Towards a Privacy-Preserving Distributed Machine Learning.

Cláudia Brito. PhD from University of Minho. Co-supervision with João Tiago Paulo. 2019-2023

Ongoing Supervisions

Ongoing PhD supervisions.

BioPredictor: a tool to predict the outcome of molecular alterations.

Marta Ferreira. Candidate, PhD Program in Computer Science at University of Porto. Co-supervision with Carla Oliveira. 2021-2025

Integration of multi-modal genomics datasets with expert data: a patient centered approach to improve diagnosis and prognosis.

Rogério Ribeiro. Candidate, PhD Program in Computer Science at University of Porto. 2021-2025

Deepening the understanding of Alzheimer’s disease risk and progression: a multimodal approach.

Miguel Ângelo Pontes Rebelo. Candidate, PhD Program in Computer Science at University of Porto. 2022-2026

Implementation of Human-Assistant Based on Artificial Intelligence for Critical Networks Infrastructures.

Margarida Antunes da Costa. Candidate, PhD Program in Computer Science at University of Porto. Supervision with Ricardo Bessa. 2022-2026

Multimodal Deep Learning for Biomarker Discovery in Early-Onset Cancer: A Comprehensive Analysis Across Age Cohorts and Tissue Types.

Athos Mekanna Moraes. Candidate, PhD Program in Computer Science at University of Porto. Supervision with Sule Canberk. 2025-2029 (to start in October)

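Teste Genético Pré-Implantação em Portugal - impacto, inovação e expansão no âmbito da Medicina Reprodutiva (Preimplantation Genetic Testing in Portugal: impact, innovation and expansion within Reproductive Medicine).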

Ana Paula Soares Pais Neto. PhD Candidate at University of Porto. 2025-2029 (to start in October)

My Blog

The latest posts on data analysis and machine learning.

July, 2025 #ML
Researchers demonstrate that tobacco mimics and accelerates ageing in human tissues.

A team of researchers from INESC TEC, the Faculty of Sciences of the University of Porto, and the Barcelona Supercomputing Centre analysed 46 types of tissues from over 700 individuals. The team concluded that smoking impacts tissue architecture and can cause molecular changes not only in organs directly associated with smoke inhalation, e.g., lungs, but also in tissues from other organs, including the pancreas, thyroid, oesophagus, and specific regions of the brain. In many cases, the effects of smoking significantly overlap with those of ageing. Check out the coverage of our recent paper in Genome Medicine in the newspaper Público.

Público UPorto INESCTEC
June, 2025 #ML
Secure genetics: innovative data processing in cloud computing.

A team of researchers at INESC TEC has developed a new tool called Gyosa, capable of performing genomic studies securely in cloud computing environments without compromising data privacy. Check out the coverage of one of our recent publications in the IEEE Journal of Biomedical and Health Informatics.

UPorto Coverage INESCTEC Coverage
June, 2025 #ML
Bioinformatics: at the interface of the ongoing revolutions in biology and computer science.

The University of Porto offers a pioneering course in Bioinformatics, of which I am proud to be the current director. Bioinformatics is a multidisciplinary field at the intersection of the ongoing revolutions in biology and computer science. Its goal is to analyze and understand biological and biomedical data, particularly at the cellular and molecular level. It involves the development and application of algorithms and computational tools, namely artificial intelligence and machine learning, to organize, analyze, and extract relevant information from large biological data sets, contributing to advances in areas such as genomics and other omics, genetics, systems biology, pathology, ecology, evolution, and epidemiology, among others.

BSc in Bioinformatics
7 June, 2021 #ML
New Degree in Artificial Intelligence and Data Science at the University of Porto

Check out the new Degree in Artificial Intelligence and Data Science at the University of Porto, which will open this year with 55 places!

1 June, 2021 #ML
Check out the conference: Cutaneous Melanoma: from where to where?

I am honoured to be part of the scientific committee of the annual conference of the Portuguese League Against Cancer.

1 June, 2021 #ML
Dimensionality Reduction: PCA, MDS, t-SNE and UMAP

Dimensionality reduction with matrix factorization and neighbour-graph-based methods, with an application case in R.


Projects

Some of the projects I have taken part in over the last 10 years, as a collaborating researcher or as principal investigator.

AI4REALNET

AI4REALNET covers the perspective of AI-based solutions addressing critical systems (electricity, railway, and air traffic management) modelled by networks that can be simulated, and are traditionally operated by humans, and where AI systems complement and augment human abilities.

Role: Investigator

2023-2026

European Union’s Horizon Europe - Grant Agreement No. 101119527

Natural Traces

The mission of NATURAL TRACES is to educate the next generation of forensic experts in the analysis of non-human biological trace evidence.

Role: Investigator

2023-2028

European Union’s Horizon Europe - Grant Agreement No. 101120165

CDH1 3D CHROM

Solving the 3D chromatin structure of CDH1 locus to identify disease-associated mechanisms

PTDC/BTM-TEC/30164/2017

2018-2021

Role: Co-PI

NASa

Neoantigen signature algorithm to predict immunotherapy response

PTDC/MEC-ONC/32018/2017

2018-2021

Role: Investigator

Treg

Tregs in cancer immune response

PTDC/MED-PAT/32462/2017

2018-2021

Role: Investigator

HDGC

Life-time Risk Estimations and Genetic Modifiers Of Hereditary Diffuse Gastric Cancer

PTDC/BTM-TEC/6706/2020

2021-2024

Role: Investigator

Gastric Cancer

Understanding the impact of acquired and germline genetic variants in the complexity of gastric cancer

IF/01127/2014/CP1245/CT0002

2015-2020

Role: PI

GTEx

The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq.

2012-2021

Role: PI (Non-funded member of AWG)

Geuvadis

The Geuvadis project aims to bring together the knowledge and resources on medical genome sequencing at a European level and allow researchers to develop and test new hypotheses on the genetic basis of disease

2012-2016

Role: Investigator

Contact Me

Feel free to contact me regarding academic matters.

Phone:

+351 220402700

Email:

pgferreira@fc.up.pt

Address:

Rua do Campo Alegre s/n, 4169-007 Porto