UsageΒΆ

To run the project, download the project files from the github repository using the following command:

git clone https://github.com/simonholmes001/amino_acid_feature_extraction.git
cd

into the folder containing the repo & run the following command to execute the project:

bash -i feature_extraction.sh

Running the:

feature_extraction.sh

script provides all of the functionalities required in order to create a dataset of amino acid chemical & physical properties:

  • Creates a virtual environment in which to run the code
  • FTP request to download the aaindex1 file (see [below](#aaindex))
  • API call to the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/) to extract physical-chemical data (see [here](#pubchem))
  • Data pre-processing steps to extract the index information
  • Creates the folder structure necessary in the basic file structure to store the data (will create a ./data folder in which the downloaded aaindex1 file is stored & an ./output folder in which the extracted features are stored)
  • Combines the features extracted from PubChem to the features extracted from the aaindex1 file
  • Standardises the features
  • Saves all the data in a csv file & in a numpy array