Project files provided as supporting information to the manuscript "In search of a dynamical vocabulary: a pipeline to construct a basis of shared traits in large-scale motions of proteins".

The dataset contains the following files:

- aligned_to_reference.zip: PDB files of the proteins included in the dataset, after dynamics-based alignment with respect to the reference protein.
- basis_orthonormalized.zip: numpy files containing the 45 basis components, interpolated on the lattice and orthonormalized.
- distance_in_family.zip: average distances (in terms of dynamics) between proteins of the dataset belonging to the same family (Fig. 4).
- distance_matrix.zip: files with the RMSIP and z-score for each pair of proteins from the dataset, and the resulting matrix of distances in dynamics. The folder also includes the scripts used to generate the dendrograms (Fig. S2 and S3) and those used to generate the distribution of the members of each subfamily among the different clusters, expressed as a percentage of the total number of members of the subfamily (Fig. 5).
- MD_analysis.zip: results of the analyses performed on the MD simulations of the proteins 1EKB, 1NPM, 4YOG and 3W94, both on the original trajectories and on the ones filtered on the basis. The folder includes the root-mean-square fluctuations (RMSF, Fig. 7 and S6) and the cross-correlation matrices (Fig. S7 and S8).
- resolution_relevance.zip: resolution and relevance data computed from the clustering of the dataset, and the script used to compute them (Fig. 2).
- rgyr_num_res.zip: radii of gyration and chain lengths of the proteins included in the dataset (Fig. S1).
- RMSIP.zip: root-mean-square inner product (RMSIP) between the subspaces spanned by the first 5 modes of each protein and the first n basis vectors, as a function of the basis size n (Fig. 6).
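
For readers who want to reproduce a quantity like the one stored in RMSIP.zip, the following is a minimal sketch of an RMSIP computation between a set of protein modes and the first n vectors of an orthonormal basis. It is not the authors' script. The function name `rmsip`, the array shapes, and the synthetic data are assumptions made for illustration, and no file from the archives is read because their internal names are not listed here. The normalization by the number of protein modes is also an assumption: it is chosen so that the value equals 1 when the modes lie entirely within the span of the basis vectors, and the manuscript should be consulted for the exact convention.

```python
import numpy as np


def rmsip(modes, basis):
    """RMSIP between two sets of orthonormal row vectors.

    modes: array of shape (m, d), e.g. the first 5 modes of a protein.
    basis: array of shape (n, d), e.g. the first n basis vectors.
    Assumption: the sum of squared overlaps is divided by m.
    """
    modes = np.asarray(modes, dtype=float)
    basis = np.asarray(basis, dtype=float)
    overlaps = modes @ basis.T  # (m, n) matrix of inner products
    return float(np.sqrt(np.sum(overlaps ** 2) / modes.shape[0]))


# Synthetic stand-in data (hypothetical, not taken from the dataset).
rng = np.random.default_rng(0)
d = 60  # dimension of each flattened vector, chosen arbitrarily

# 45 orthonormal "basis" vectors as rows, obtained from a QR decomposition.
q, _ = np.linalg.qr(rng.standard_normal((d, 45)))
basis = q.T

# 5 orthonormal "modes" built inside the span of the first 10 basis vectors.
coeffs, _ = np.linalg.qr(rng.standard_normal((10, 5)))
modes = coeffs.T @ basis[:10]

# RMSIP as a function of the basis size n = 1, ..., 45.
curve = [rmsip(modes, basis[:n]) for n in range(1, 46)]
```

With this construction the curve is non-decreasing in n and saturates at 1 once the basis contains the subspace in which the synthetic modes were built. Real modes would need to be expressed on the same lattice and in the same vector layout as the basis files before such a comparison is meaningful.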