A fistful of readers

October 23, 2007 by Michael

After presenting a working InputStream on a ByteBuffer, I have two more readers for you out there.

First, the StringBufferReader, which efficiently reads data from a StringBuffer. One can use new java.io.StringReader(sb.toString()), but that converts the whole StringBuffer (sb) to a String, wasting a whole lot of memory if the string is big enough. If you can ensure that the StringBuffer won’t be modified while it is being read, use the following code:

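import java.io.IOException;
import java.io.Reader;
 
/**
 * Reads characters directly from a StringBuffer without
 * converting it to a String first. The buffer must not be
 * modified while the reader is in use.
 */
public class StringBufferReader extends Reader {
	private int pos;
	private final StringBuffer sb;
	private boolean closed;
 
	public StringBufferReader(final StringBuffer sb) {
		this.sb = sb;
		this.pos = 0;
		this.closed = false;
	}
 
	@Override
	public void close() throws IOException {
		this.closed = true;
	}
 
	@Override
	public int read(char[] cbuf, int off, int len) throws IOException {
		if(closed)
			throw new IOException("Reader is closed");
		// A zero-length request is not the end of the stream
		if(len == 0)
			return 0;
		int remaining = sb.length() - pos;
		if(remaining <= 0)
			return -1;
		int count = Math.min(len, remaining);
		// Copy straight from the StringBuffer into the caller's array
		sb.getChars(pos, pos + count, cbuf, off);
		pos += count;
		return count;
	}
}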

It uses no intermediate buffer.
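The call that makes this possible is StringBuffer.getChars, which copies a range of characters straight into a destination array. As a minimal sketch (the class GetCharsDemo and the helper copyRange are made up for illustration and are not part of the reader above):

```java
public class GetCharsDemo {
    // Hypothetical helper: copies len chars starting at index from
    // out of the StringBuffer, without calling sb.toString().
    static char[] copyRange(StringBuffer sb, int from, int len) {
        char[] dst = new char[len];
        // getChars(srcBegin, srcEnd, dst, dstBegin): srcEnd is exclusive
        sb.getChars(from, from + len, dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("Hello, readers");
        System.out.println(new String(copyRange(sb, 7, 7))); // prints "readers"
    }
}
```

The reader's read method does the same thing, only with the caller's cbuf as the destination and pos as the running start index.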

In the same context I stumbled upon the Byte Order Mark (BOM) in some UTF-8 files (especially ones that were created with tools under Microsoft Windows).

The readers and streams in the JRE don’t skip the BOM, and neither does org.dom4j.io.SAXReader, so parsing such XML files or strings fails with something like “Content not allowed in prolog”. Enter my simple BOMSkippingReader:
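To see where the BOM ends up, here is a small sketch that decodes a few UTF-8 bytes with a plain InputStreamReader. The class name BomDemo and the helper firstChar are only illustrative. The three BOM bytes EF BB BF come out as the single character U+FEFF, which is 65279 in decimal:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: decodes the bytes as UTF-8 and returns
    // the first decoded character, or -1 for empty input.
    static int firstChar(byte[] utf8Bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] withoutBom = "<a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstChar(withBom));    // prints 65279
        System.out.println(firstChar(withoutBom)); // prints 60, the '<'
    }
}
```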

import java.io.IOException;
import java.io.Reader;
 
/**
 * This reader skips a possible Byte Order Mark at the 
 * start of UTF-8 input, which Java doesn't do itself.
 * @author michael
 *
 */
public class BOMSkippingReader extends Reader {
	private final Reader decorated;	
	private final boolean rewrite;
	private final int firstchar;
	private int pos = 0;
 
	public BOMSkippingReader(final Reader decorated) throws IOException {
		this.decorated = decorated;
 
		this.firstchar = decorated.read();
		// 65279 is U+FEFF, the decoded BOM; -1 means the input is empty,
		// so in both cases there is nothing to hand back
		this.rewrite = firstchar != -1 && firstchar != 65279;
	}
 
	@Override
	public void close() throws IOException {
		decorated.close();
	}
 
	@Override
	public int read(char[] cbuf, int off, int len) throws IOException {
		if(len == 0)
			return 0;
		int redone = 0;
		// Hand the saved first character back exactly once
		while(rewrite && pos < 1) {
			cbuf[off + pos++] = (char) firstchar;
			++redone;
		}
		if(redone == len)
			return redone;
		int count = decorated.read(cbuf, off + redone, len - redone);
		// Don't let an end-of-stream -1 swallow the character handed back above
		if(count < 0)
			return redone == 0 ? -1 : redone;
		return redone + count;
	}
}

If the underlying reader starts with a BOM, it’s skipped; otherwise the first character, which was already consumed in the constructor, is handed back on the first read. Works well for me, and the small while loop doesn’t have much impact on performance in my app.
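For comparison, the same idea can probably be had from the JDK's own java.io.PushbackReader, which allows putting a character back after peeking at it. This is a different technique from the class above, sketched here under the assumption that a one-character pushback buffer is enough. The names PushbackBomSkipper, skipBom and readAll are made up for the example:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

public class PushbackBomSkipper {
    private static final int BOM = 0xFEFF; // 65279

    // Hypothetical helper: returns a reader positioned after a leading BOM, if any.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        // Put the character back unless it is a BOM or the input is empty
        if (first != -1 && first != BOM) {
            pr.unread(first);
        }
        return pr;
    }

    // Hypothetical helper: drains a reader into a String, for the demo only.
    static String readAll(Reader r) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(skipBom(new StringReader("\uFEFF<a/>")))); // prints <a/>
        System.out.println(readAll(skipBom(new StringReader("<a/>"))));       // prints <a/>
    }
}
```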
