===== Usage ===== To run the project, download the project files from the github repository using the following command:: git clone https://github.com/simonholmes001/amino_acid_feature_extraction.git :: cd into the folder containing the repo & run the following command to execute the project:: bash -i feature_extraction.sh Running the:: feature_extraction.sh script provides all of the functionalities required in order to create a dataset of amino acid chemical & physical properties: - Creates a virtual environment in which to run the code - FTP request to download the aaindex1 file (see [below](#aaindex)) - API call to the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/) to extract physical-chemical data (see [here](#pubchem)) - Data pre-processing steps to extract the index information - Creates the folder structure necessary in the basic file structure to store the data (will create a ./data folder in which the downloaded `aaindex1` file is stored & an ./output folder in which the extracted features are stored) - Combines the features extracted from PubChem to the features extracted from the `aaindex1` file - Standardises the features - Saves all the data in a csv file & in a numpy array