Language models have exhibited remarkable performance across diverse tasks, including biological applications such as protein language modeling. Transcription factors (TFs) are pivotal in gene regulation, influencing gene expression by binding to specific DNA sequences. While various TF prediction techniques exist, they often require extensive training datasets or suffer from limited accuracy. In this study, we propose ESM-TFpredict, a model that leverages a pre-trained protein language model to encode amino acid sequences, followed by 1-D convolutional neural networks for TF prediction. To elucidate the model’s decision-making, we employ the integrated gradients method to highlight the features driving TF identification. Comparative experiments against existing models, DeepTFactor and TFpredict, show that ESM-TFpredict scores above 95% on four evaluation metrics, surpassing both competitors. By compressing protein representations with a sliding-window approach, ESM-TFpredict trains in 315.78 seconds, only 51% of the training time required by DeepTFactor and a mere 12% of that required by TFpredict. We further compare the contributions of known TF-related regions (average attribution score 0.9152) and non-TF-related regions (average attribution score 0.0848), demonstrating that TF-related regions have a dominant influence on TF prediction.
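The pipeline described above (pre-trained protein language model embeddings, sliding-window compression, then a 1-D CNN classifier) can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's exact architecture: the ESM per-residue embeddings are simulated with random tensors, and the layer sizes, window length, and embedding dimension (1280, as in ESM-1b/ESM-2 650M) are assumptions.

```python
import torch
import torch.nn as nn


def compress(emb: torch.Tensor, window: int = 4) -> torch.Tensor:
    """Sliding-window average over the sequence dimension.

    Shortens (batch, seq_len, dim) to (batch, seq_len // window, dim),
    reducing the input length the CNN must process.
    """
    return emb.unfold(1, window, window).mean(-1)


class TFPredictHead(nn.Module):
    """Illustrative 1-D CNN head over per-residue embeddings (sizes assumed)."""

    def __init__(self, embed_dim: int = 1280, hidden: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # pool over the sequence dimension
        )
        self.fc = nn.Linear(hidden, 2)  # logits: TF vs. non-TF

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim) -> Conv1d expects channels first
        h = self.conv(x.transpose(1, 2)).squeeze(-1)  # (batch, hidden)
        return self.fc(h)


# Stand-in for ESM embeddings: batch of 2 sequences, 100 residues, dim 1280.
emb = torch.randn(2, 100, 1280)
compressed = compress(emb)            # (2, 25, 1280)
logits = TFPredictHead()(compressed)  # (2, 2)
print(compressed.shape, logits.shape)
```

In practice the embeddings would come from a frozen pre-trained model (e.g. the fair-esm package), and the attribution analysis reported in the abstract could be reproduced by applying integrated gradients to this head with respect to the input embeddings.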