A Large-Scale Database for Chemical Structure Recognition and Preliminary Evaluation

Resource Type: Conference
Authors: Ding, Longfei; Zhao, Mengbiao; Yin, Fei; Zeng, Shuiling; Liu, Cheng-Lin
Source: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 1464-1470, Aug. 2022
Subject: Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Image recognition
Databases
Annotations
Biological system modeling
Benchmark testing
Character recognition
Chemical compounds
Chemical Structure Recognition
Database
Image-to-Markup
Language
ISSN: 2831-7475

Online Access

Full Text (IEEE)

Abstract

Chemical structure recognition (CSR), which transforms chemical structure images into formulas expressed as character strings (such as SMILES), is a challenging problem because of the complex 2D structures and relationships involved. No existing database offers sufficient scale and diversity for model design and fair evaluation in this area. In this paper, we present a large-scale chemical structure database named CASIA-CSDB, containing 480,668 samples (images paired with their SMILES strings). To construct the database, we select chemical structures from ChEMBL, a well-known database of bioactive molecules, and use the RDKit tool to generate images from their SMILES strings. The selected structures represent the major types of chemical compounds and cover eight weight partitions. We also select a subset of 97,309 samples to form the Mini-CASIA-CSDB database. To provide a benchmark, we evaluate three state-of-the-art image-to-markup recognition methods on the database. The results demonstrate how challenging the database is. The database and its annotations are available at http://www.nlpr.ia.ac.cn/databases/CASIA-CSDB/index.html.
