Python for identifying minimal chromosomal regions among samples



I have multiple sample files (>20) that look like this:

chr startpos endpos
1 14930 818094
1 818161 31595422
2 35593931 35865807
2 35868158 104785784

I would like to output the regions that are common among samples. E.g., if sample 1 has:

1 14900 818000

sample 2:

1 15000 605000

and sample 3:

1 25000 705000

I would like to output:

1 25000 605000

I would also like to include a majority rule, e.g. if 10 out of 20 samples share a minimal region, output that region. That is, the number of samples that must contain a region for it to be printed should be configurable.

Does anyone have a Python solution for this?










This question is not really about Unix/Linux, but about programming (coding, algorithms), so it's more appropriate for Stack Overflow than for this site.
– filbranden, 10 hours ago

Also note that people at Stack Exchange will typically not want to do your work/homework for you. These are volunteers here, who are happy to help, but you need to show you're making an effort too. So try to solve this on your own and, when you get stumped, ask a specific question about what is happening that is unexpected. You're more likely to get useful answers (and to learn!) that way.
– filbranden, 10 hours ago















Tags: python, bioinformatics

asked 12 hours ago by lindak

















1 Answer














I'm not sure whether this is a question for the Unix & Linux Stack Exchange. It sounds more like a general programming question.

However, I'd encourage you to look into using pandas.

You can import a sample file as a dataframe, specifying a tab delimiter as follows:

import pandas as pd
# '\t' is the tab character; use sep=r'\s+' if the columns are separated by spaces
df = pd.read_csv('/tmp/samplefile.csv', sep='\t')

If you know that startpos will always be smaller than endpos, and that the rows being compared overlap on the same chromosome, you could find the output you're looking for by taking the maximum of df['startpos'] and the minimum of df['endpos'].
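The max/min idea covers the case where every sample must share the region. For the majority rule in the question, one possible approach is a sweep over the region endpoints that reports every stretch covered by at least a chosen number of samples. The following is only a minimal sketch using the standard library: the names `read_regions`, `common_regions`, and `min_samples` are hypothetical, and it assumes whitespace-separated files with a header line and that regions within a single sample do not overlap each other.

```python
from collections import defaultdict


def read_regions(path):
    """Read one sample file into a list of (chrom, start, end) tuples.

    Assumes whitespace-separated columns and a 'chr startpos endpos' header.
    """
    regions = []
    with open(path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.split()
            if len(fields) >= 3:
                regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def common_regions(samples, min_samples):
    """Return regions covered by at least `min_samples` of the samples.

    `samples` holds one list of (chrom, start, end) tuples per sample.
    Assumes regions within a single sample do not overlap each other.
    """
    events = defaultdict(list)
    for regions in samples:
        for chrom, start, end in regions:
            events[chrom].append((start, 1))   # a region opens
            events[chrom].append((end, -1))    # a region closes

    result = []
    for chrom, points in events.items():
        depth = 0            # number of samples covering the current position
        region_start = None
        # Tuple sorting puts closes (-1) before opens (+1) at equal positions,
        # so regions that merely touch are not counted as overlapping.
        for pos, delta in sorted(points):
            depth += delta
            if region_start is None and depth >= min_samples:
                region_start = pos
            elif region_start is not None and depth < min_samples:
                if pos > region_start:
                    result.append((chrom, region_start, pos))
                region_start = None
    return result


samples = [
    [("1", 14900, 818000)],   # sample 1 from the question
    [("1", 15000, 605000)],   # sample 2
    [("1", 25000, 705000)],   # sample 3
]
print(common_regions(samples, min_samples=3))  # [('1', 25000, 605000)]
print(common_regions(samples, min_samples=2))  # [('1', 15000, 705000)]
```

With real files, `samples` could be built as `[read_regions(p) for p in paths]`, and `min_samples` set to, say, 10 for a 10-out-of-20 rule. Setting it to the number of samples reproduces the strict intersection.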






answered 10 hours ago by mttpgn (edited 9 hours ago)


























