Extracting Text between Two Strings in a Huge Ordered Text File


We have a huge text file containing millions of ordered, timestamped observations. Given a start point and an end point, we need a fast method to extract the observations in that period.



For instance, this could be part of the file:



"2018-04-05 12:53:00",28,13.6,7.961,1746,104.7878,102.2,9.78,29.1,0,2.432,76.12,955,38.25,249.9,362.4,281.1,0.04
"2018-04-05 12:54:00",29,13.59,7.915,1738,104.2898,102.2,10.01,29.53,0,1.45,200.3,952,40.63,249.3,361.4,281.1,0.043
"2018-04-05 12:55:00",30,13.59,7.907,1734,104.0326,102.2,10.33,28.79,0,2.457,164.1,948,41.39,249.8,361.3,281.1,0.044
"2018-04-05 12:56:00",31,13.59,7.937,1718,103.0523,102.2,10.72,31.42,0,1.545,8.22,941,42.06,249.4,361.1,281.1,0.045
"2018-04-05 12:57:00",32,13.59,7.975,1719,103.1556,102.2,10.68,29.26,0,2.541,0.018,940,41.95,249.1,360.1,281.1,0.045
"2018-04-05 12:58:00",33,13.59,8,1724,103.4344,102.2,10.35,29.58,0,1.908,329.8,942,42.65,249.5,361.4,281.1,0.045
"2018-04-05 12:59:00",34,13.59,8,1733,103.9831,102.2,10.23,30.17,0,2.59,333.1,948,42.21,250.2,362,281.2,0.045
"2018-04-05 13:00:00",35,13.59,7.98,1753,105.1546,102.2,10.17,29.06,0,3.306,332.4,960,42,250.4,362.7,281.1,0.044
"2018-04-05 13:01:00",36,13.59,7.964,1757,105.3951,102.2,10.24,30.75,0,2.452,0.012,962,42.03,250.4,362.4,281.1,0.044
"2018-04-05 13:02:00",37,13.59,7.953,1757,105.4047,102.2,10.31,31.66,0,3.907,2.997,961,41.1,250.6,362.4,281.1,0.043
"2018-04-05 13:03:00",38,13.59,7.923,1758,105.4588,102.2,10.28,29.64,0,4.336,50.19,962,40.85,250.3,362.6,281.1,0.042
"2018-04-05 13:04:00",39,13.59,7.893,1757,105.449,102.1,10.27,30.42,0,1.771,12.98,962,41.73,249.8,362.1,281.1,0.043
"2018-04-05 13:05:00",40,13.6,7.89,1757,105.4433,102.1,10.46,29.54,0,2.296,93.7,962,43.02,249.9,361.7,281,0.045
"2018-04-05 13:06:00",41,13.59,7.915,1756,105.3322,102.1,10.52,29.53,0,0.632,190.8,961,43.64,249.3,361.5,281,0.045
"2018-04-05 13:07:00",42,13.6,7.972,1758,105.4697,102.1,10.77,29.49,0,0.376,322.5,961,44.69,249.1,360.9,281.1,0.046
"2018-04-05 13:08:00",43,13.6,8.05,1754,105.233,102.1,11.26,28.66,0,0.493,216.8,959,44.8,248.4,360.1,281.2,0.047


If we want the datapoints between "2018-04-05 13:00:00" and "2018-04-05 13:05:00", the output should be:



"2018-04-05 13:00:00",35,13.59,7.98,1753,105.1546,102.2,10.17,29.06,0,3.306,332.4,960,42,250.4,362.7,281.1,0.044
"2018-04-05 13:01:00",36,13.59,7.964,1757,105.3951,102.2,10.24,30.75,0,2.452,0.012,962,42.03,250.4,362.4,281.1,0.044
"2018-04-05 13:02:00",37,13.59,7.953,1757,105.4047,102.2,10.31,31.66,0,3.907,2.997,961,41.1,250.6,362.4,281.1,0.043
"2018-04-05 13:03:00",38,13.59,7.923,1758,105.4588,102.2,10.28,29.64,0,4.336,50.19,962,40.85,250.3,362.6,281.1,0.042
"2018-04-05 13:04:00",39,13.59,7.893,1757,105.449,102.1,10.27,30.42,0,1.771,12.98,962,41.73,249.8,362.1,281.1,0.043
"2018-04-05 13:05:00",40,13.6,7.89,1757,105.4433,102.1,10.46,29.54,0,2.296,93.7,962,43.02,249.9,361.7,281,0.045


Regular tools like grep, sed, or awk are not optimized for sorted files, so they are not fast enough for this task. A tool that uses binary search would be ideal for this type of problem.
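
For example, a full-scan awk filter along the following lines produces the right output on the sample, but it has to read every line of the file, which is exactly what we want to avoid (shown only as a baseline sketch):

awk -F, '$1 >= "\"2018-04-05 13:00:00\"" && $1 <= "\"2018-04-05 13:05:00\""' file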










text-processing grep sort

asked Apr 5 at 0:10 by vahid-dan (edited 2 days ago)


2 Answers

For very large files, you could exploit the natural order of the prefix timestamp and use the look utility to perform a fast binary search for the largest common prefix of the start and end strings. This can then be followed by awk/sed post-processing to extract the lines of interest from look's output.

In bash:

export start='"2018-04-05 13:00:00"'
export end='"2018-04-05 13:05:00"'
# determine the common prefix ("2018-04-05 13:0 in this example)
common_prefix=$(awk 'BEGIN {
    start = ENVIRON["start"]; end = ENVIRON["end"]
    len = length(start) > length(end) ? length(end) : length(start)
    i = 1
    while (i <= len && substr(start, i, 1) == substr(end, i, 1))
        ++i
    print substr(start, 1, i - 1)
}' </dev/null)
# The -b option forces look to do a binary search.
# My version of look on Ubuntu needs this flag to be passed;
# some other versions of look perform a binary search by default and do not support -b.
look -b "$common_prefix" file | awk '$0 ~ "^"ENVIRON["start"],$0 ~ "^"ENVIRON["end"]'
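
Run against the sample above (with the data saved in a file literally named file, as the last line assumes), this should print the six lines from "2018-04-05 13:00:00" through "2018-04-05 13:05:00" shown in the question.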





answered 2 days ago by iruvar (edited 2 days ago)

























          • Thanks. It works but it is not fast enough for a huge file, let's say a 100GB+ file.

            – vahid-dan
            2 days ago











          • @vahid-dan, check the updated solution

            – iruvar
            2 days ago











          • Thanks, @iruvar! Worked like a charm. :-)

            – vahid-dan
            2 days ago
































Print lines between "2018-04-05 13:00:00" and "2018-04-05 13:05:00":

sed -n '/2018-04-05 13:00:00/,/2018-04-05 13:05:00/p' file

or, matching the quote characters as they appear in the file:

sed -n '/"2018-04-05 13:00:00"/,/"2018-04-05 13:05:00"/p' file

Grep for the start date "2018-04-05 13:00:00" and output the next 5 lines (= 5 minutes); -m1 stops searching after the first match.

grep -m1 -A5 '2018-04-05 13:00:00' file
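
If the observations really are one per minute, the -A count could also be derived from the two timestamps instead of being hard-coded (a rough sketch, assuming GNU date is available):

start='2018-04-05 13:00:00'
end='2018-04-05 13:05:00'
lines=$(( ($(date -d "$end" +%s) - $(date -d "$start" +%s)) / 60 ))
grep -m1 -A"$lines" "$start" file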





answered 2 days ago by Freddy























          • Thanks. It works but it is not fast enough for a huge file, let's say a 100GB+ file.

            – vahid-dan
            2 days ago











          • If you need to extract lots of lines it might make sense to split the file and note the first and last line of each part. Or feed a database with the timestamps as SQL TIMESTAMP or DATETIME and index (b-tree).

            – Freddy
            2 days ago












          • Thank you @freddy. Actually, the file is merged from several smaller files but I'm not sure working with several smaller files is faster than a single large file. We are looking for a general solution that can be applied to any size of data.

            – vahid-dan
            2 days ago
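
For the database route Freddy suggests in the comments, a minimal sketch with SQLite might look like the following (assumptions: sqlite3 is installed, the data sits in a file named file.csv with no header row, and it has the 18 columns shown in the question; the column names c2..c18 are made up):

# one-time import and index build (sketch only)
sqlite3 obs.db <<'EOF'
CREATE TABLE obs(ts TEXT, c2, c3, c4, c5, c6, c7, c8, c9, c10,
                 c11, c12, c13, c14, c15, c16, c17, c18);
.mode csv
.import file.csv obs
CREATE INDEX obs_ts ON obs(ts);
EOF

# later range queries use the ts index instead of scanning the whole file
sqlite3 -csv obs.db "SELECT * FROM obs
                     WHERE ts BETWEEN '2018-04-05 13:00:00' AND '2018-04-05 13:05:00';"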











