Extracting Text between Two Strings in a Huge Ordered Text File
We have a huge text file containing millions of ordered, timestamped observations. Given a start point and an end point, we need a fast method to extract the observations in that period.
For instance, this could be part of the file:
"2018-04-05 12:53:00",28,13.6,7.961,1746,104.7878,102.2,9.78,29.1,0,2.432,76.12,955,38.25,249.9,362.4,281.1,0.04
"2018-04-05 12:54:00",29,13.59,7.915,1738,104.2898,102.2,10.01,29.53,0,1.45,200.3,952,40.63,249.3,361.4,281.1,0.043
"2018-04-05 12:55:00",30,13.59,7.907,1734,104.0326,102.2,10.33,28.79,0,2.457,164.1,948,41.39,249.8,361.3,281.1,0.044
"2018-04-05 12:56:00",31,13.59,7.937,1718,103.0523,102.2,10.72,31.42,0,1.545,8.22,941,42.06,249.4,361.1,281.1,0.045
"2018-04-05 12:57:00",32,13.59,7.975,1719,103.1556,102.2,10.68,29.26,0,2.541,0.018,940,41.95,249.1,360.1,281.1,0.045
"2018-04-05 12:58:00",33,13.59,8,1724,103.4344,102.2,10.35,29.58,0,1.908,329.8,942,42.65,249.5,361.4,281.1,0.045
"2018-04-05 12:59:00",34,13.59,8,1733,103.9831,102.2,10.23,30.17,0,2.59,333.1,948,42.21,250.2,362,281.2,0.045
"2018-04-05 13:00:00",35,13.59,7.98,1753,105.1546,102.2,10.17,29.06,0,3.306,332.4,960,42,250.4,362.7,281.1,0.044
"2018-04-05 13:01:00",36,13.59,7.964,1757,105.3951,102.2,10.24,30.75,0,2.452,0.012,962,42.03,250.4,362.4,281.1,0.044
"2018-04-05 13:02:00",37,13.59,7.953,1757,105.4047,102.2,10.31,31.66,0,3.907,2.997,961,41.1,250.6,362.4,281.1,0.043
"2018-04-05 13:03:00",38,13.59,7.923,1758,105.4588,102.2,10.28,29.64,0,4.336,50.19,962,40.85,250.3,362.6,281.1,0.042
"2018-04-05 13:04:00",39,13.59,7.893,1757,105.449,102.1,10.27,30.42,0,1.771,12.98,962,41.73,249.8,362.1,281.1,0.043
"2018-04-05 13:05:00",40,13.6,7.89,1757,105.4433,102.1,10.46,29.54,0,2.296,93.7,962,43.02,249.9,361.7,281,0.045
"2018-04-05 13:06:00",41,13.59,7.915,1756,105.3322,102.1,10.52,29.53,0,0.632,190.8,961,43.64,249.3,361.5,281,0.045
"2018-04-05 13:07:00",42,13.6,7.972,1758,105.4697,102.1,10.77,29.49,0,0.376,322.5,961,44.69,249.1,360.9,281.1,0.046
"2018-04-05 13:08:00",43,13.6,8.05,1754,105.233,102.1,11.26,28.66,0,0.493,216.8,959,44.8,248.4,360.1,281.2,0.047
If we want the datapoints between "2018-04-05 13:00:00" and "2018-04-05 13:05:00", the output should be:
"2018-04-05 13:00:00",35,13.59,7.98,1753,105.1546,102.2,10.17,29.06,0,3.306,332.4,960,42,250.4,362.7,281.1,0.044
"2018-04-05 13:01:00",36,13.59,7.964,1757,105.3951,102.2,10.24,30.75,0,2.452,0.012,962,42.03,250.4,362.4,281.1,0.044
"2018-04-05 13:02:00",37,13.59,7.953,1757,105.4047,102.2,10.31,31.66,0,3.907,2.997,961,41.1,250.6,362.4,281.1,0.043
"2018-04-05 13:03:00",38,13.59,7.923,1758,105.4588,102.2,10.28,29.64,0,4.336,50.19,962,40.85,250.3,362.6,281.1,0.042
"2018-04-05 13:04:00",39,13.59,7.893,1757,105.449,102.1,10.27,30.42,0,1.771,12.98,962,41.73,249.8,362.1,281.1,0.043
"2018-04-05 13:05:00",40,13.6,7.89,1757,105.4433,102.1,10.46,29.54,0,2.296,93.7,962,43.02,249.9,361.7,281,0.045
Regular tools like grep, sed, or awk are not optimized for searching sorted files, so they are not fast enough here. A tool that uses a binary search would be ideal for this type of problem.
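A minimal sketch of that binary-search idea in plain bash (illustrative only: it assumes the file is sorted by its leading quoted timestamp, no line is longer than 4096 bytes, GNU coreutils, and that both boundary timestamps actually occur in the file; the name data.csv is a placeholder):

file=data.csv
start='"2018-04-05 13:00:00"'
end='"2018-04-05 13:05:00"'

size=$(stat -c %s "$file")             # GNU stat; use `stat -f %z` on BSD/macOS
lo=0 hi=$size
while (( lo < hi )); do
    mid=$(( (lo + hi) / 2 ))
    # first complete line at or after byte offset $mid (record 1 may be partial)
    line=$(tail -c +"$((mid + 1))" "$file" | head -c 8192 | sed -n '2p')
    if [[ -z "$line" || "$line" < "$start" ]]; then
        lo=$(( mid + 1 ))              # the requested range starts further right
    else
        hi=$mid                        # the requested range starts at or before here
    fi
done

# back off a little so the boundary line cannot be cut in half, then let awk
# trim the edges exactly and quit as soon as the end line has been printed
skip=$(( lo > 8192 ? lo - 8192 : 0 ))
tail -c +"$((skip + 1))" "$file" |
  awk -v s="$start" -v e="$end" '
      index($0, s) == 1 { found = 1 }
      found
      index($0, e) == 1 { exit }'

Only a few dozen small reads are needed to locate the start, so the cost no longer grows with the size of the file.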
text-processing grep sort
asked Apr 5 at 0:10 by vahid-dan
2 Answers
For very large files, you could exploit the natural ordering of the timestamp prefix and use the look utility to perform a fast binary search for the largest common prefix of the start and end strings. This can then be followed by awk/sed post-processing to extract the lines of interest from look's output. In bash:
export start='"2018-04-05 13:00:00"'
export end='"2018-04-05 13:05:00"'
#determine common prefix ("2018-04-05 13:0 in this example)
common_prefix=$(awk 'BEGIN {
    start=ENVIRON["start"]; end=ENVIRON["end"];
    len=length(start) > length(end) ? length(end) : length(start);
    i=1;
    while (i <= len && substr(ENVIRON["start"], i, 1) == substr(ENVIRON["end"], i, 1))
        ++i
    print(substr(start, 1, i-1))
}' </dev/null
)
#the -b option to look forces binary search.
#My version of look on Ubuntu needs this flag to be passed,
#some other versions of look perform a binary search by default and do not support -b.
look -b "$common_prefix" file | awk '$0 ~ "^"ENVIRON["start"],$0 ~ "^"ENVIRON["end"]'
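A quick sanity check, not part of the answer: with the bounds exported above, the computed prefix is the one the comment mentions, so look hands awk only the small "13:0x" block instead of the whole file.

echo "$common_prefix"     # prints: "2018-04-05 13:0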
Thanks. It works but it is not fast enough for a huge file, let's say a 100GB+ file.
– vahid-dan
2 days ago
@vahid-dan, check the updated solution
– iruvar
2 days ago
Thanks, @iruvar! Worked like a charm. :-)
– vahid-dan
2 days ago
Print lines between "2018-04-05 13:00:00" and "2018-04-05 13:05:00":
sed -n '/2018-04-05 13:00:00/,/2018-04-05 13:05:00/p' file
or
sed -n /"2018-04-05 13:00:00"/,/"2018-04-05 13:05:00"/p file
Grep for the start date "2018-04-05 13:00:00" and output the next 5 lines (= 5 minutes); -m1 stops searching after the first match:
grep -m1 -A5 '2018-04-05 13:00:00' file
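A variation not in this answer: because the leading timestamps sort lexicographically, a plain awk comparison on the first field selects the same range even when the exact boundary timestamps are missing from the file (a case where the sed range above would print nothing or run on to the end), and it quits as soon as the range has been passed:

awk -F, -v s='"2018-04-05 13:00:00"' -v e='"2018-04-05 13:05:00"' '$1 > e { exit } $1 >= s' file

It still reads linearly from the top of the file, though, so it does not address the binary-search requirement.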
Thanks. It works but it is not fast enough for a huge file, let's say a 100GB+ file.
– vahid-dan
2 days ago
If you need to extract lots of lines it might make sense to split the file and note the first and last line of each part. Or feed a database with the timestamps as SQL TIMESTAMP or DATETIME and index (b-tree).
– Freddy
2 days ago
Thank you @freddy. Actually, the file is merged from several smaller files but I'm not sure working with several smaller files is faster than a single large file. We are looking for a general solution that can be applied to any size of data.
– vahid-dan
2 days ago
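A rough sketch of the database route suggested above, using sqlite3 (illustrative only; obs.db, the table name, and the generic column names are made up, and the 18 columns match the sample lines in the question):

sqlite3 obs.db <<'SQL'
CREATE TABLE obs(ts TEXT, c2, c3, c4, c5, c6, c7, c8, c9,
                 c10, c11, c12, c13, c14, c15, c16, c17, c18);
.mode csv
.import data.csv obs
CREATE INDEX obs_ts ON obs(ts);   -- the b-tree index suggested above
SELECT * FROM obs
WHERE ts BETWEEN '2018-04-05 13:00:00' AND '2018-04-05 13:05:00';
SQL

Once the data is loaded, every later range query uses the index instead of rescanning the raw file.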