Pipeline walkthrough
The pipeline used in the Pipeline.DefaultConfig.runDefaultPipeline function resembles the steps I used while scripting this tool.
The original script can be found here (it's a bit messy, don't judge me). The basic steps are:
- Creation of all possible primer pairs of length n flanking a template of length m for the input cDNA/gene
- Creating a blast search database from the cDNA source/transcriptome/genome using the Blast BioContainer
- Blasting all primer pairs against the search database using the Blast BioContainer
- Parsing the blast results in a Deedle frame to handle grouping and filtering steps of the data
- Calculating self hybridization/internal loop/fwd-rev primer hybridization energy using the IntaRNA BioContainer
I added an example dataset in the form of Chlamydomonas reinhardtii cDNA here.
To test the pipeline on this dataset, you can use the following script:
Step by step
Creation of all possible primer pairs of length n flanking a template of length m for the input cDNA/gene
The generatePrimerPairs function creates these primer pairs by sliding a window of size 2*n+m over the input sequence and taking the flanking regions of length n at each position.
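The sliding-window idea can be sketched in a few lines of plain F#. This is a minimal illustration, not the library's generatePrimerPairs: the name primerPairsSketch and its signature are hypothetical, and the right flank is returned as-is because the text does not say whether the real function reverse-complements it.

```fsharp
// Hypothetical helper illustrating the sliding window; NOT the library's generatePrimerPairs.
// n = primer length, m = length of the template between the two primers.
let primerPairsSketch (n: int) (m: int) (sequence: string) : (string * string) list =
    let windowSize = 2 * n + m
    if n <= 0 || m < 0 || sequence.Length < windowSize then
        // No window of the requested size fits into the sequence.
        []
    else
        [ for start in 0 .. sequence.Length - windowSize ->
            let window = sequence.Substring(start, windowSize)
            // Left flank: the first n characters of the window.
            let fwd = window.Substring(0, n)
            // Right flank: the last n characters of the window.
            // Assumption: returned unchanged; a real reverse primer would
            // normally be the reverse complement of this region.
            let rev = window.Substring(n + m, n)
            fwd, rev ]

// Usage: primers of length 3 flanking a template of length 2.
let pairs = primerPairsSketch 3 2 "ACGTACGTAC"
```

A sequence of length L yields L - (2*n+m) + 1 pairs, so the number of candidates grows linearly with the length of the input cDNA.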
Creating a blast search database from the cDNA source/transcriptome/genome using the Blast BioContainer
The preparePrimerBlastSearch function prepares a blast database for subsequent blast searches. For the best feature calculation, use the full cDNA transcriptome/genome of the organism.
Blasting all primer pairs against the search database using the Blast BioContainer
The blastPrimerPairs function blasts all generated primer pairs against the previously generated database. Results are written to a file of choice.
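To blast primer pairs they have to be handed over as query sequences, typically in FASTA format. The sketch below shows one way this could look; it is an assumption for illustration only. The function name toFastaRecords and the header scheme (pair index plus a fwd/rev suffix) are hypothetical and not taken from blastPrimerPairs.

```fsharp
// Hypothetical helper; the header scheme "<pairIndex>_fwd" / "<pairIndex>_rev"
// is an assumption, chosen so that hits can later be traced back to a pair and direction.
let toFastaRecords (pairs: (string * string) list) : string list =
    pairs
    |> List.mapi (fun i (fwd, rev) ->
        [ sprintf ">%i_fwd" i; fwd; sprintf ">%i_rev" i; rev ])
    |> List.concat

// Usage: two primer pairs rendered as one FASTA string.
let fasta =
    toFastaRecords [ "ACG", "CGT"; "CGT", "GTA" ]
    |> String.concat "\n"
```

Encoding the pair index and direction in the query id keeps that information available in the search results without a separate lookup table.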
Parsing the blast results in a Deedle frame to handle grouping and filtering steps of the data
Calculating self hybridization/internal loop/fwd-rev primer hybridization energy using the IntaRNA BioContainer
Both steps are handled by the getResultFrame function, which parses the blast results, calculates hybridization energy features for them, and groups them by query id and direction (fwd/rev).
The result of this function is a frame that contains the features for all primer pairs.
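The grouping step can be illustrated with plain F# collections. This is a minimal sketch, not the Deedle-based getResultFrame: the BlastHit record, its field names, and the two summary features (hit count and number of perfect hits) are hypothetical stand-ins for the real frame columns.

```fsharp
// Hypothetical types; the real pipeline works on a Deedle frame with its own columns.
type Direction =
    | Fwd
    | Rev

type BlastHit =
    { QueryId: string
      Direction: Direction
      SubjectId: string
      Mismatches: int }

// Groups hits by (query id, direction) and derives two illustrative features:
// the total number of hits and the number of hits without mismatches.
let summarizeHits (hits: BlastHit list) : Map<string * Direction, int * int> =
    hits
    |> List.groupBy (fun h -> h.QueryId, h.Direction)
    |> List.map (fun (key, group) ->
        let hitCount = List.length group
        let perfectHits =
            group |> List.filter (fun h -> h.Mismatches = 0) |> List.length
        key, (hitCount, perfectHits))
    |> Map.ofList

// Usage with made-up hits for a single primer pair.
let summary =
    summarizeHits
        [ { QueryId = "0"; Direction = Fwd; SubjectId = "geneA"; Mismatches = 0 }
          { QueryId = "0"; Direction = Fwd; SubjectId = "geneB"; Mismatches = 2 }
          { QueryId = "0"; Direction = Rev; SubjectId = "geneA"; Mismatches = 0 } ]
```

A forward primer with more than one hit, as in the made-up data above, is the kind of candidate a later filtering step would flag as potentially unspecific.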
Short Conclusion
While this may not be the flashiest algorithm, I think my post highlights the strengths of F# in data science pretty well. In a little more than 3 days I was able to predict oligonucleotide interactions, blast sequences against genomes, and group the results in a safe and visually accessible way during the exploratory data analysis.
Furthermore, the script was easily transferable to .fs files and could therefore be compiled as a library in no time. I think F# has great applications in research, and my group and I will continue to use it for all kinds of (bioinformatic) workflows.