Usage
To run the project, clone the project files from the GitHub repository:

```bash
git clone https://github.com/simonholmes001/amino_acid_feature_extraction.git
```
`cd` into the folder containing the repository and run the following command to execute the project:

```bash
cd amino_acid_feature_extraction
bash -i feature_extraction.sh
```
Running the `feature_extraction.sh` script provides all of the functionality required to create a dataset of amino acid chemical and physical properties. The script:
- Creates a virtual environment in which to run the code
- Downloads the aaindex1 file with an FTP request (see [below](#aaindex))
- Calls the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/) API to extract physical-chemical data (see [here](#pubchem))
- Pre-processes the data to extract the index information
- Creates the folder structure needed to store the data: a `./data` folder for the downloaded aaindex1 file and an `./output` folder for the extracted features
- Combines the features extracted from PubChem with the features extracted from the aaindex1 file
- Standardises the features
- Saves all the data in a CSV file and in a NumPy array
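The PubChem step can be sketched against PubChem's PUG REST interface, which returns compound properties as JSON. The URL follows the documented `compound/name/<name>/property/<list>/JSON` pattern, but the property list in `PROPERTIES` and all helper names are hypothetical: this README does not state which properties the project requests or how it issues the request.

```python
import json
from typing import Dict, Sequence
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical property selection; the project may request different ones.
PROPERTIES = ("MolecularWeight", "XLogP", "TPSA", "Complexity")


def property_url(compound_name: str, properties: Sequence[str] = PROPERTIES) -> str:
    """Build a PUG REST URL that looks up a compound by name."""
    return (
        f"{PUG_REST}/compound/name/{quote(compound_name)}"
        f"/property/{','.join(properties)}/JSON"
    )


def parse_properties(payload: dict) -> Dict[str, float]:
    """Extract the first record of a PropertyTable response, dropping the CID."""
    record = payload["PropertyTable"]["Properties"][0]
    # Some values (e.g. MolecularWeight) may arrive as strings, so convert all.
    return {key: float(value) for key, value in record.items() if key != "CID"}


def fetch_properties(compound_name: str) -> Dict[str, float]:
    """Query PubChem over HTTPS (requires network access)."""
    with urlopen(property_url(compound_name), timeout=30) as response:
        return parse_properties(json.load(response))
```

Keeping URL construction and response parsing separate from the network call lets both be checked offline; `fetch_properties("glycine")` would issue a single request. PubChem asks clients to limit their request rate, so a loop over all 20 amino acids may need a short pause between calls.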
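The combining, standardising and saving steps can be sketched with NumPy. This assumes that "standardises" means z-scoring each feature column to zero mean and unit variance, that both feature sources are shaped as `{feature name: {amino acid: value}}`, and that rows follow the AAindex order of the 20 amino acids. The function names and the output file stem are hypothetical.

```python
import csv
from typing import Dict, List, Tuple

import numpy as np

# Row order: the 20 amino acids in AAindex order (an assumption).
AMINO_ACIDS = list("ARNDCQEGHILKMFPSTWYV")


def build_matrix(*feature_sets: Dict[str, Dict[str, float]]) -> Tuple[List[str], np.ndarray]:
    """Merge {feature name: {amino acid: value}} dicts into a
    20 x n_features matrix (rows = amino acids, columns = features)."""
    combined: Dict[str, Dict[str, float]] = {}
    for features in feature_sets:
        combined.update(features)
    names = list(combined)
    matrix = np.array(
        [[combined[name][aa] for name in names] for aa in AMINO_ACIDS], dtype=float
    )
    return names, matrix


def standardise(matrix: np.ndarray) -> np.ndarray:
    """Z-score each column using the population standard deviation (ddof=0)."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # constant columns become 0 instead of dividing by zero
    return (matrix - mean) / std


def save(names: List[str], matrix: np.ndarray, stem: str) -> None:
    """Write the features to <stem>.csv (with a header row) and <stem>.npy."""
    with open(f"{stem}.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["amino_acid", *names])
        for aa, row in zip(AMINO_ACIDS, matrix):
            writer.writerow([aa, *row.tolist()])
    np.save(f"{stem}.npy", matrix)
```

With these helpers, `names, matrix = build_matrix(aaindex_features, pubchem_features)` followed by `save(names, standardise(matrix), "output/features")` would write `output/features.csv` and `output/features.npy`. Using the population standard deviation matches what scikit-learn's `StandardScaler` does.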