1. Installing Ubuntu/Linux via ‘VirtualBox’
[Example (AF’s computer): Windows 7 Home Basic; Intel® Core™ i3-2330M CPU @ 2.20 GHz; 4.00 GB RAM; 64-bit OS.]
1.1 Download and install Ubuntu
(i) Download VirtualBox for Windows (here: v5.0.4 or latest from https://www.virtualbox.org/wiki/Downloads).
(ii) Download Linux (here “Ubuntu 14.04.3 LTS; 64-bit” from http://www.ubuntu.com/download/desktop). [LTS = Long-Term Support, recommended; make a donation!]
(iii) Install VirtualBox. Double-click on the file downloaded under (i). Press “Run”. A wizard opens; press “Next”. Keep the default settings. As you go, you may be asked to install device software (e.g., network adapters); install all of it.
(iv) Open VirtualBox (might open by default), click “New” and create a virtual machine (fill in a “Name”, “Type” is Linux, “Version” is Ubuntu-64-bit); press “Next”. Allocate memory (stay in green zone). Check “Create a virtual hard-drive now”; click “Create”, check VDI, click “Next”. For ‘Storage…’, choose “dynamically allocated”, click “Next”, provide a name for the new virtual hard-disk file and select the size of the virtual hard disk, click “Create”.
(v) You should automatically return to “VirtualBox Manager”. [Click (orange) “Settings”, which opens a menu. Note what can be accessed. On left: e.g., “General”, “System”, etc… On right: choices. Look around; e.g., under “Advanced” allow “DragNDrop” bidirectionality.]
(a) Under “System”, change boot order and place Optical on top (we’ll boot from virtual optical file). Press ‘OK’.
(b) Under “Storage”, select “Controller: IDE”. In the highlighted area, there are two pluses; click the first (“Adds optical drive”), and then “Choose Disk”. Select the Ubuntu file you downloaded under (ii) (here “ubuntu-14.04.3-desktop-amd64.iso”), and click “Open”. Select this file in the “Controller: IDE” tree, and press “OK”.
(c) Start the virtual machine by pressing “Start” (green). At “Welcome”, click on “Install Ubuntu”. You’ll be prompted to make choices; choose the default settings. A screen “Installation type” should appear. High-stress moment! It should say: ‘This computer currently has no detected operating systems. What would you like to do?’ This means that your virtual machine (which is separate from the rest of the computer) contains no operating system, so you can check “Erase disk and install Ubuntu” despite the horrendous warning, click “Install Now”, and select options as appropriate.
(vi) Various things may happen after installation (e.g., a request to switch off your computer). Next, open (or return to) Oracle VM Virtual Box. Click on “System” on the right, and (in the “Motherboard” tab) check boot order – put “Hard Disk” at the top. [To increase CPU number, select the “Processor” tab, and increase.]
(vii) On the “Manager” screen of VirtualBox, press “Start” (green arrow). The Ubuntu Desktop opens within VirtualBox. To get a full Ubuntu screen, press tab ‘Devices’, select ‘Insert Guest Additions cd image’ (installs Virtualbox-Guest-Additions), select “Run”, provide password, press return when asked. Close Ubuntu (press the cog at top right, and select restart). Alternatively, close Ubuntu, and reopen it using the green arrow in VirtualBox – Ubuntu should now fill the VirtualBox screen.
(viii) You’ve done it! Welcome to Linux/Ubuntu.
(ix) Some housekeeping. You may want to share folders or hard-drives between your host (Windows) and guest (Ubuntu) systems. To share folder(s) or hard-drive(s), create a folder in your host system (e.g., on the Desktop in Windows). Here we make a folder called “shared_folder”. Open VirtualBox, and go to “Settings” (orange). At the lower end of the sub-menu click on “Shared Folders”. At the right side of the following pop-up window you can find a folder symbol with a green “+”. Click on it and set the Folder Path to the “shared_folder” (for example, if you made the “shared_folder” on the Windows Desktop, direct the path to that folder on the Windows Desktop). Also, tick “Auto-mount”. When done, click “OK”.
1) Start Ubuntu (press green “Start”). Once you are logged into Ubuntu we’ll make the “shared_folder” accessible to Ubuntu.
In Ubuntu, open a terminal (also called console or command line) with “Ctrl+Alt+t”. A terminal window will open. If you can’t understand the following yet, just hang on; it’ll become clear. We’ll first need to tell Ubuntu where we will be “mounting” the “shared_folder” (that we originally created in Windows). We will be mounting the “shared_folder” to a directory called “/mnt/share”. The folder “mnt” already exists on our Ubuntu system, but the folder “share” needs to be created. In the terminal type the following (without the “$”, which from here on just indicates that we are doing something in the terminal):
$ sudo mkdir /mnt/share
You’ll probably be asked for your password (type the password that you set to log into Ubuntu; while typing, the password will not be shown, so keep on typing and press ENTER afterwards). If you did not make a spelling mistake, you’ll now have a new folder and the terminal will go back to the “$” sign, waiting for the next input.
Next, we’ll be mounting the “shared_folder” (made in Windows) to the “/mnt/share” folder (made in Ubuntu).
$ sudo mount -t vboxsf shared_folder /mnt/share
(If required, provide your password.)
With this basic setup, we are ready to turn to NGS-data analysis.
At the very end of this document, we’ll have a compiled (non-exhaustive) list of useful Linux commands (since they are not specifically explained throughout the data analysis steps).
NGS-sequencing Data in Linux/Ubuntu
-Quality control (using fastqc).
0) The way we will be running the fastqc application demonstrates some principles of using a Linux/Ubuntu system. Those principles hold true for most of the other software/programs we’ll be using later on and will give you an idea of how Linux/Ubuntu ticks.
1) For an initial quality check of the sequencing data (ChIP-seq or RNA-seq), one can use FASTQC.
2)Download here - http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Click on “FastQC v0.11.3 (Win/Linux zip file)” (or a more recent version if available) and select “Save File”.
Unless you change the destination folder for downloads, all files will be saved in the “Downloads” folder. You can access that folder via the terminal (“Ctrl+Alt+t”) by typing
$ cd Downloads
Alternatively, click on the icon “Files” in the left toolbar, then double-click the folder “Downloads”. Move the downloaded zip-file to the Desktop (cut and paste with “Ctrl+X, Ctrl+V” works in Linux just as in Windows).
Once the FastQC zip-file is on the Desktop, select it, right-click, and choose “Extract Here” to unpack it. In order to run fastqc in interactive mode (displaying a user interface), one needs to make the file “fastqc” executable.
First, let’s have a look what is in the FastQC folder itself (open a terminal “Ctrl+Alt+t”) and go into the FastQC folder by typing
$ cd Desktop/FastQC
Type
$ ls -al
You will see files and directories that are within the FastQC folder.
At the very left side, you can see the so-called ‘permission bits’ (e.g. -rwxr-xr--, or drwxrwxrw-; the “d” designates something as a directory, r = read, w = write, x = execute).
If you take a look at the file ‘fastqc’, the permission is initially set to
-rw-rw-r--
Let’s take this apart:
-rw-rw-r--
- means that it is a file (it would say “d” if it is a directory)
rw- means that the “user” can read and write the file, but not execute
rw- means that “groups” can read and write the file but not execute
r-- means that “all others” can read the file, but not write to it or execute it
The permission bits are summed up per user/group/others
For the file “fastqc” the original permission is
-rw-rw-r--
which corresponds to the number/bit code 664 (r=4, w=2, x=1)
6 (r=4 + w=2)
6 (r=4 + w=2)
4 (r=4)
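The arithmetic above can be sketched as a small shell function. This is only an illustration: the function name perm_to_octal is made up for this example and is not part of Ubuntu or FastQC. It takes the nine permission characters (without the leading file-type character) and adds up r=4, w=2, x=1 for user, group and others.

```shell
# Convert a 9-character permission string (e.g. rw-rw-r--) to its octal code.
# perm_to_octal is a hypothetical helper written for this example.
perm_to_octal() {
    perms=$1
    result=""
    # Characters 1-3 = user, 4-6 = group, 7-9 = all others
    for start in 1 4 7; do
        triplet=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
        value=0
        case $triplet in r??) value=$((value + 4)) ;; esac   # read
        case $triplet in ?w?) value=$((value + 2)) ;; esac   # write
        case $triplet in ??x) value=$((value + 1)) ;; esac   # execute
        result="$result$value"
    done
    printf '%s\n' "$result"
}

perm_to_octal rw-rw-r--   # prints 664
perm_to_octal rwxr-xr-x   # prints 755
```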
The following command will change the file permission and make the file executable for everyone
$ chmod 755 fastqc
If you receive a ‘permission denied’ error, retype the above command like this
$ sudo chmod 755 fastqc
Check the permission bits of the fastqc file again and type
$ ls -al
It has now changed to -rwxr-xr-x (which equals 755 in permission bits).
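If you would like to see the effect of chmod without touching the real fastqc file, the following sketch repeats the change on a throw-away file. The file name fastqc_demo is made up for this example, and the “stat -c” format option is assumed to be the GNU version that ships with Ubuntu.

```shell
# Work in a temporary directory so nothing in the FastQC folder is touched.
tmpdir=$(mktemp -d)
demo="$tmpdir/fastqc_demo"          # hypothetical stand-in for the fastqc file
touch "$demo"

chmod 664 "$demo"                   # the permission fastqc starts with
before=$(stat -c '%a %A' "$demo")   # %a = octal code, %A = rwx string

chmod 755 "$demo"                   # make it executable for everyone
after=$(stat -c '%a %A' "$demo")

echo "before: $before"
echo "after:  $after"

rm -r "$tmpdir"                     # clean up
```

The first value on each line is the number code, the second is the same string that “ls -al” shows at the very left.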
Try to run fastqc from within the FastQC folder by typing
$ ./fastqc
That will likely return an error message saying that it can’t exec “java”. fastqc is written in the “java” programming language, which so far is not installed on the Ubuntu distribution we initially installed (14.04.3 LTS).
Type
$ java
You will now see that the ‘program’ java can be found in several packages, among them default-jre.
In order to install ‘java’ type
$ sudo apt-get install default-jre
Provide your password, and answer “y” when asked for using disk space for the installation.
After the installation is complete, type
$ java -version
which tells you which version of java is now on your Ubuntu system.
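The check can also be written as a short snippet that first asks whether a “java” command exists at all. This is a sketch under simple assumptions; the variable name java_status is made up for this example.

```shell
# "command -v" succeeds only if the shell can find a command called java.
if command -v java >/dev/null 2>&1; then
    java_status="installed"
    # java prints its version information to the error stream, hence 2>&1
    java -version 2>&1 | head -n 1
else
    java_status="missing"
    echo "java not found; install it with: sudo apt-get install default-jre"
fi
echo "java is $java_status"
```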
Let’s try again to run fastqc.
Go to the FastQC folder (in case you forgot how to get there, open a terminal with “Ctrl+Alt+t”) and type
$ cd Desktop/FastQC
and press ENTER
Now type
$ ./fastqc
which should start FASTQC in interactive mode.
You can now load your .fastq files (zipped or unzipped) into the application to check the quality of them.
Please note – the way we called the fastqc program was from WITHIN the FastQC folder itself.
(user@user-VirtualBox:~/Desktop/FastQC$ ./fastqc).
If you’d just typed
user@user-VirtualBox:~/Desktop/FastQC$ fastqc
that is without the “./” before the word “fastqc” it would not work.
Why?
Whatever you type in the terminal, Linux will try to find the corresponding command/application/program. By default it looks only in the so-called PATH, which is nothing more than a list of folders that Linux goes through to look for the command/application/software.
To take a look at what the PATH is type
$ echo $PATH
This will show something like this
/usr/local/sbin:/usr/local/bin:……,……/usr/local/games
These are all the folders that Linux will look into (in order from left to right). If it finds the respective command it will execute it; otherwise, it will give you an error (“The program ‘xyz’ is currently not installed”, “Command not found”, etc.).
However, if you want Linux to skip the PATH and run a file from your current working directory (for example the FastQC directory), you’ll need to run the command like this:
user@user-VirtualBox:~/Desktop/FastQC$ ./fastqc
since “./” is a path that points to your current working directory, so Linux runs that file directly instead of searching the PATH. This also means that, currently, the fastqc application can only be called as “./fastqc” from within the FastQC folder itself.
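The difference between “fastqc” and “./fastqc” can be reproduced with a tiny stand-in program. The sketch below is an illustration only; the name hello_tool is made up, and it assumes that your temporary folder allows programs to be executed and that your PATH does not already contain the current directory.

```shell
# Create a throw-away directory with a small executable script in it.
workdir=$(mktemp -d)
cd "$workdir"
printf '#!/bin/sh\necho "hello from hello_tool"\n' > hello_tool
chmod 755 hello_tool

# Without "./" the shell searches only the PATH, so the script is not found.
if command -v hello_tool >/dev/null 2>&1; then
    found_in_path=yes
else
    found_in_path=no
fi
echo "found via PATH: $found_in_path"

# With "./" the shell is given the file location directly and runs it.
direct_output=$(./hello_tool)
echo "$direct_output"

cd "$OLDPWD"        # go back to where we started
rm -r "$workdir"    # clean up
```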
To be able to run fastqc (and other applications that we will be using soon) from anywhere, and not only from within the FastQC folder, there are several ways to proceed. We’ll cover two of them (one is a transient solution, the other a permanent solution).
Transient solution:
Open a terminal (“Ctrl+Alt+t”) and go to the FastQC folder.
$ cd Desktop/FastQC
Then have a look at the PATH (echo $PATH) before and after the following command
$ export PATH=$PATH:$PWD
You will see that your fastqc folder has been added to the default search PATH that Linux is using to find commands/applications/software.
You can now call fastqc from any folder/directory. However, that will be specific to the terminal that you used to type
$ export PATH=$PATH:$PWD
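The effect of extending the PATH can be tried out safely with a stand-in folder. In this sketch the folder and the program demo_tool are made up for the example; the real command from above would add your FastQC folder instead.

```shell
# A throw-away folder containing one small executable script.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "demo tool ran"\n' > "$tooldir/demo_tool"
chmod 755 "$tooldir/demo_tool"

old_path=$PATH                     # remember the original PATH
export PATH=$PATH:$tooldir         # append the folder, as done with $PWD above

# The shell can now locate the script from any directory.
resolved=$(command -v demo_tool)
echo "demo_tool resolves to: $resolved"

# Print the PATH one folder per line; the new folder is the last entry.
last_entry=$(printf '%s\n' "$PATH" | tr ':' '\n' | tail -n 1)
echo "last PATH entry: $last_entry"

export PATH=$old_path              # restore the original PATH
rm -r "$tooldir"                   # clean up
```

Because “export” only changes the PATH of the shell it is typed in, a new terminal window starts again with the default PATH.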
Permanent solution (different options, one is mentioned here):
In a terminal type
$ sudo ln -s /home/rudy/Desktop/FastQC/fastqc /usr/local/bin/fastqc
Provide password
The word “rudy” in the above command needs to be replaced with your personal username. The command makes a symbolic link (“ln -s”) to the fastqc file in your FastQC folder (“/home/rudy/Desktop/FastQC/fastqc”, the file you want to link to), places the link in the folder “/usr/local/bin”, and names it “fastqc” (i.e., /usr/local/bin/fastqc). You could give the link any name; for example, instead of “/usr/local/bin/fastqc” you could have specified “/usr/local/bin/cqtsaf”.
“sudo” runs the command with administrator (root) rights, which are needed to write to “/usr/local/bin”. It works only if you provide the correct password and your account is allowed to use sudo.
If you want to find out more about the “ln” command (used to make links) type in a terminal
$ ln --help
which will show you how the command is run and what it is useful for, as well as the parameters you can provide (in the above example we used “ln -s”).
Now, having done this, you can run the fastqc application from any directory and from any terminal. In case you named the link differently from “fastqc”, you’ll need to call the fastqc application with the name you gave it, for example (the above example) by typing
$ cqtsaf
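To see how a symbolic link behaves without writing to “/usr/local/bin”, the same steps can be repeated inside a throw-away folder. This is a sketch only: the folder layout and the stand-in script are made up for the example and no sudo is needed here.

```shell
# Build a small sandbox: one folder for the program, one for the link.
sandbox=$(mktemp -d)
mkdir "$sandbox/FastQC" "$sandbox/bin"
printf '#!/bin/sh\necho "fastqc stand-in"\n' > "$sandbox/FastQC/fastqc"
chmod 755 "$sandbox/FastQC/fastqc"

# ln -s TARGET LINKNAME : the link may carry any name, here "cqtsaf".
ln -s "$sandbox/FastQC/fastqc" "$sandbox/bin/cqtsaf"

link_target=$(readlink "$sandbox/bin/cqtsaf")   # where the link points to
link_output=$("$sandbox/bin/cqtsaf")            # running the link runs the target

echo "link points to: $link_target"
echo "link output:    $link_output"

rm -r "$sandbox"                                # clean up
```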
Downloading and installing the SRA Toolkit
- If you are using a web browser, the SRA Toolkit download page contains download links to the most current version of the toolkit for each of the supported platforms.
- If you are instead working from a command line interface, you may use FTP or wget to obtain the software from the following directory: 'http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current'.
Example: wget 'http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz'
Unpacking the Toolkit
- Unix: tar -xzf sratoolkit.current-centos_linux64.tar.gz
- Mac OSX: Double-click on the .tar.gz file and the Archive Utility will unpack it. Alternatively, command-line tar will also work (see the Unix example, above).
- Windows: use an archiving and compression utility (e.g., Winzip, 7-Zip, etc.), or simply double-click on the .zip file and drag the 'sratoolkit.' folder to the preferred install location.
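What “tar -xzf” does can be tried out on a small stand-in archive before unpacking the real download. Everything named “sratoolkit.demo” below is made up for this sketch; the real archive contains the actual Toolkit programs.

```shell
# Create a stand-in archive that mimics the layout <toolkit folder>/bin/<program>.
demo=$(mktemp -d)
mkdir -p "$demo/src/sratoolkit.demo/bin"
echo "placeholder" > "$demo/src/sratoolkit.demo/bin/fastq-dump"
tar -czf "$demo/sratoolkit.demo.tar.gz" -C "$demo/src" sratoolkit.demo

# Unpack it: x = extract, z = gunzip, f = archive file name.
# -C selects the folder into which the archive is unpacked.
mkdir "$demo/install"
tar -xzf "$demo/sratoolkit.demo.tar.gz" -C "$demo/install"

# The program now sits in the 'bin' folder of the unpacked directory.
extracted=$(ls "$demo/install/sratoolkit.demo/bin")
echo "unpacked: $extracted"

rm -r "$demo"    # clean up
```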
Note: For most users, the Toolkit functions (fastq-dump, sam-dump, etc.) will not be located in their PATH environmental variable. This may require providing directory information about the location of the Toolkit. See the below examples for how 'fastq-dump' would be called in different circumstances:
- ~/[user_name]/sra-toolkit/fastq-dump
The Toolkit 'bin' directory has been placed in the user-specified directory 'sra-toolkit'.
- ./fastq-dump
The Toolkit components are in the current working directory.
- fastq-dump
This works only if the Toolkit location is specified in your $PATH variable; otherwise the OS cannot locate the fastq-dump program, even if it is in the current directory.
NOTE: Windows users should be able to enter only 'fastq-dump.exe' if you have navigated to the Toolkit 'bin' directory.
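The cases above can be checked from a Linux or Mac terminal with a few lines of shell. This is a sketch only; the variable name fastq_dump_cmd is made up for the example, and the snippet merely reports how fastq-dump would have to be called without running it.

```shell
# Decide how fastq-dump has to be invoked from the current directory.
if command -v fastq-dump >/dev/null 2>&1; then
    fastq_dump_cmd="fastq-dump"      # the Toolkit location is in $PATH
elif [ -x ./fastq-dump ]; then
    fastq_dump_cmd="./fastq-dump"    # the program is in the current directory
else
    fastq_dump_cmd=""                # neither: a full path would be needed
fi

if [ -n "$fastq_dump_cmd" ]; then
    echo "call the program as: $fastq_dump_cmd"
else
    echo "fastq-dump not found; cd into the Toolkit 'bin' directory or give its full path"
fi
```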
Testing the Toolkit Configuration
The Toolkit comes with a default configuration that will work for most users. You may elect to perform the following tests to confirm that your configuration is working correctly. The default location for the 'download repository' is:
- Linux: /home/[user_name]/ncbi/public
- Mac OSX: /Users/[user_name]/ncbi/public
- Windows: C:\Users\[user_name]\ncbi\public
Note that if the tests fail, or if you wish to specify the download location for files sourced from NCBI, you should configure your Toolkit installation. During normal operation, the Toolkit may be required to download the following types of data to the default location:
- Reference sequences: Small (most less than 70 MB) sequences used to decompress aligned SRA data.
- SRA data files: If data are downloaded 'on-demand' using the toolkit, then partial and whole SRA datasets (most are several Gb in size) can be located here. Note: Manually downloaded SRA data obtained using a web browser, wget, ascp, or FTP may be stored anywhere in the local file system.
For the test, we are using an arbitrary dataset, SRR390728 (RNA-Seq (polyA+) analysis of DLBCL cell line HS0798), from the National Cancer Institute’s Cancer Genome Characterization Initiative (CGCI) Project. It is a reasonably small SRA dataset that contains aligned (reference-compressed) data, allowing us to test multiple aspects of the toolkit simultaneously.
- Open a terminal or command prompt and 'cd' into the directory containing the toolkit executables (e.g., [download_location]/sratoolkit[version]/bin/).
- Linux/Mac OSX: ./fastq-dump -X 5 -Z SRR390728
- Windows: fastq-dump.exe -X 5 -Z SRR390728
- If successful, the test should connect to NCBI, download a small amount of data from SRR390728 and the reference sequence needed to extract the data, and stream the first 5 spots of the file ('-X 5' option) to the screen ('-Z' option).
- If the configuration is not valid, an error like the following will likely be displayed:
fastq-dump.2.x err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table
- If you receive an error like the one above, please configure the toolkit (described in the next section). If you have already configured the toolkit but are still unable to complete the test successfully, please email [email protected] with a full description of steps taken and error messages received.