Java Programming

How best to manipulate a log file that is not well-formed

807605, Jun 24 2007 (edited Aug 3 2007)
I have a text file that looks something like this:
[21/06/07] System DEBUG * BA_LOG_OUTPUT
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text <?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
[21/06/07] System DEBUG * BA_LOG_OUTPUT
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text <?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tony</to>
<from>James</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
[21/06/07] System DEBUG * BA_LOG_OUTPUT
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text
System DEBUG Random text  Random text <?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Amy</to>
<from>Tobi</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
I am trying to read the file so that it picks out the well-formed XML and puts it through a SAX parser. For example, it should pick out the following documents before passing them to the SAX parser:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
 
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tony</to>
<from>James</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
 
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Amy</to>
<from>Tobi</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
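
Once one of the documents above has been isolated as a String, handing it to SAX could look roughly like the sketch below. The class name NoteSaxDemo, the method parse, and the "element=text" output format are made up for illustration and are not from the original post; the sketch also assumes the string starts exactly at the XML declaration, with no log text in front of it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class NoteSaxDemo {

    // Parses one XML document held in a String and returns an "element=text"
    // entry for every element that directly ends with non-blank text.
    static List<String> parse(String xml) {
        final List<String> out = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (value.length() > 0) {
                    out.add(qName + "=" + value);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Reading from a character stream, so the parser does not need to decode bytes.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Covers SAXException (input not well-formed) and parser setup problems.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<note>\n<to>Tove</to>\n<from>Jani</from>\n</note>\n";
        System.out.println(parse(xml));
    }
}
```

A real handler would do whatever the application needs in endElement; collecting strings here only makes the callbacks visible.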
There is a pattern in the file: the line
<?xml version="1.0" encoding="UTF-8"?>
always starts on the 4th line after the BA_LOG_OUTPUT line.
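	import java.io.BufferedReader;
	import java.util.regex.Matcher;
	import java.util.regex.Pattern;

	// Assumes br is a BufferedReader field already opened on the log file.
	public void readFile(){
		String line = null;
		Pattern p = Pattern.compile( "BA_LOG_OUTPUT" );
		// "[" must be escaped: on its own it opens a character class and
		// Pattern.compile throws PatternSyntaxException. Not used yet.
		Pattern p1 = Pattern.compile( "\\[" );
		try{
			while( ( line = br.readLine() ) != null ){
				Matcher m = p.matcher(line);
				// The marker appears at most once per line, so if is enough.
				if( m.find() ){
					// Advance to the 4th line after the marker, where the XML starts.
					for(int i = 0; i < 4 && line != null; i++ ){
						line = br.readLine();
					}
					if( line != null ){
						System.out.println(line);
					}
				}
			}
		}catch( Exception e ){
			System.err.println( e.getMessage() );
		}
	}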
In the code above, I can find where I want to start reading from, but how do I change the code so that it reads the rest of the XML, saves it on a stack (which will eventually be passed to the SAX parser), and stops when it finds the pattern [?

In other words: find the next occurrence of the pattern BA_LOG_OUTPUT in the file, read the whole XML document that starts on the 4th line after it, save it on the stack, and stop reading when the pattern [ is found.

It should continue to do the same throughout the file.
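
One possible way to do this is sketched below, under some assumptions. Instead of counting four lines after BA_LOG_OUTPUT, the sketch looks for the text <?xml inside a line, which is a different technique from the one described above, and it treats any line starting with [ as the end of the current document. The class name LogXmlExtractor and the method extract are hypothetical, and a List is used instead of a stack so that the documents stay in file order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class LogXmlExtractor {

    private static final String XML_START = "<?xml";

    // Returns every XML document embedded in the log, in file order.
    static List<String> extract(BufferedReader br) throws IOException {
        List<String> docs = new ArrayList<String>();
        StringBuilder current = null; // non-null while inside an XML document
        String line;
        while ((line = br.readLine()) != null) {
            // A line starting with "[" is the next log record: close the document.
            if (current != null && line.startsWith("[")) {
                docs.add(current.toString());
                current = null;
            }
            if (current != null) {
                current.append(line).append('\n');
            } else {
                // Drop the log prefix that shares a line with the XML declaration.
                int start = line.indexOf(XML_START);
                if (start >= 0) {
                    current = new StringBuilder(line.substring(start)).append('\n');
                }
            }
        }
        if (current != null) {
            docs.add(current.toString()); // last document has no "[" line after it
        }
        return docs;
    }

    // Convenience overload for log text that is already in memory.
    static List<String> extract(String logText) {
        try {
            return extract(new BufferedReader(new StringReader(logText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each returned string begins at the XML declaration, so it can be wrapped in a StringReader and an InputSource and passed to a SAX parser one document at a time. If XML text could ever contain a line starting with [, the end condition would need to be stricter, for example by matching the full date prefix.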
Post Details
Locked on Aug 31 2007
Added on Jun 24 2007