Fix cuts distribution issue

The following discussion from !20 should be addressed:

  • @kanitz started a discussion: (+1 comment)

    So you are using the number of cuts that you determined above and sampling the four nucleotides that many times.

    So, basically, you sample from the four nucleotides as many times as the number of cuts determined previously, accounting for the different probabilities that a cut occurs at any given nucleotide. Then you determine all the occurrences of each resulting nucleotide, randomly select the position of one of them and add it to a list of cut sites (unless that particular site has already been added).

    Again, I'm a bit concerned that you don't take the fragment length distribution into account here. Even though you may have used the fragment length distribution (or at least its mean) to determine the number of cuts, I suspect that the resulting fragments will not follow that distribution; that is, the probability of obtaining a fragment of a given length with your method is not the same as when sampling from the fragment length distribution. At the very least, you should test that empirically.

    Perhaps @zavolan would like to comment here?
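
One way to run the empirical test requested above is sketched below. It is a reconstruction from the description in the comment, not the code from !20. The function names (`sample_cut_sites`, `fragment_lengths`, `ks_statistic`, `simulate`), the per-nucleotide cut probabilities, the sequence length and the normal target distribution (mean 300, standard deviation 60) are all assumptions chosen for illustration. The sketch places cuts as described, derives the fragment lengths, and compares them with lengths drawn directly from the assumed target using a two-sample Kolmogorov-Smirnov distance.

```python
import random


def sample_cut_sites(seq, n_cuts, nuc_probs, rng):
    """Pick cut sites as described in the discussion (hypothetical reconstruction).

    Nucleotides are sampled n_cuts times, weighted by nuc_probs; for each one,
    a random occurrence in seq becomes a cut site unless it was already chosen.
    """
    nucs = list(nuc_probs)
    weights = [nuc_probs[n] for n in nucs]
    positions = {n: [i for i, c in enumerate(seq) if c == n] for n in nucs}
    cuts = set()
    for nuc in rng.choices(nucs, weights=weights, k=n_cuts):
        if positions[nuc]:
            cuts.add(rng.choice(positions[nuc]))  # repeated sites are dropped
    return sorted(cuts)


def fragment_lengths(seq_len, cuts):
    """Lengths of the pieces obtained by cutting [0, seq_len) at the given sites."""
    bounds = [0] + list(cuts) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def simulate(seed=0, seq_len=100_000, mean_len=300, sd_len=60):
    """Compare fragments from the cut procedure with an assumed normal target."""
    rng = random.Random(seed)
    # Assumed per-nucleotide cut probabilities; not taken from the source.
    nuc_probs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    seq = "".join(rng.choices("ACGT", k=seq_len))
    # Number of cuts derived from the mean fragment length only.
    n_cuts = seq_len // mean_len - 1
    cuts = sample_cut_sites(seq, n_cuts, nuc_probs, rng)
    observed = fragment_lengths(seq_len, cuts)
    # Lengths drawn directly from the assumed target distribution.
    target = [max(1, round(rng.gauss(mean_len, sd_len))) for _ in observed]
    return observed, target, ks_statistic(observed, target)


if __name__ == "__main__":
    observed, target, ks = simulate()
    print(f"fragments: {len(observed)}")
    print(f"observed mean: {sum(observed) / len(observed):.1f}")
    print(f"KS distance to target sample: {ks:.3f}")
```

A KS distance close to 0 would indicate that the two samples are distributed similarly, and a distance close to 1 that they are very different. Because the cut positions in this sketch are chosen almost uniformly along the sequence, the gaps between them are expected to look roughly exponential rather than normal, so a large distance would not be surprising here. If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.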

Edited by Tanya Santosh Nandan