Wednesday, August 22, 2012

Cassandra 1.1 - Reading and Writing from SSTable Perspective

To keep things simple, I will stick to reading and writing the value of one column within a single row, on a single-node deployment.

Writing

We will store one column, identified by row key and column name.

Each Thrift insert request blocks until the data is stored in the commit log and the memtable - that is all; other operations (like replication) are asynchronous. Additionally, the client can provide a consistency level, in which case the call blocks until the required replicas respond, but aside from this, a write operation can be seen as a simple append.
The commit log is required because the memtable exists only in memory: after a system crash, Cassandra recreates the memtables from the commit log.

The memtable can be seen as a dedicated cache created individually for each column family. It is based on ConcurrentSkipListMap, so neither reads nor inserts block.
The memtable contains all recent inserts, and each new insert for the same key and column overwrites the existing one. Multiple updates to a single column result in multiple entries in the commit log, but a single entry in the memtable.
The memtable is flushed to disk when predefined criteria are met, such as maximum size, timeout, or number of mutations. Flushing a memtable creates an SSTable; because an SSTable is immutable, it can simply be written to disk sequentially.
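
The relation between commit log and memtable can be illustrated with a small sketch. This is only a toy model, not Cassandra's implementation: the class name, the in-memory list standing in for the commit log, and the "rowKey:columnName" key layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of the single-node write path: append to a commit log, then update the memtable. */
final class WritePathSketch {

    // Stand-in for the commit log: append-only, one entry per mutation (the real one lives on disk).
    private final List<String> commitLog = new ArrayList<>();

    // Memtable of one column family, keyed by "rowKey:columnName" (hypothetical key layout).
    private final ConcurrentSkipListMap<String, String> memtable = new ConcurrentSkipListMap<>();

    synchronized void insert(String rowKey, String column, String value) {
        String key = rowKey + ":" + column;
        commitLog.add(key + "=" + value); // sequential append, never overwritten
        memtable.put(key, value);         // replaces an older value of the same column
    }

    synchronized int commitLogEntries() {
        return commitLog.size();
    }

    int memtableEntries() {
        return memtable.size();
    }

    String read(String rowKey, String column) {
        return memtable.get(rowKey + ":" + column);
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch();
        node.insert("user1", "email", "a@example.org");
        node.insert("user1", "email", "b@example.org");
        System.out.println(node.commitLogEntries()); // 2
        System.out.println(node.memtableEntries());  // 1
    }
}
```

Two updates of the same column leave two entries in the log, but only the latest value in the memtable.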


Compaction

The compaction process merges several SSTables into one. The idea is to clean up deleted data and to merge the different modifications of a single column. Before compaction, several SSTables could contain a value for a single column; after compaction, only one does.
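
A minimal sketch of this merge, under simplifying assumptions: each SSTable is modelled as a sorted map from a column key to a timestamped cell, a null value marks a deletion, and deleted columns are dropped immediately. All names are hypothetical, and a real compaction works on files and has more rules about when deleted data may be removed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CompactionSketch {

    /** One stored version of a column; value == null marks a deletion. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges several "SSTables" (column key -> cell) into one, keeping the newest cell per column. */
    static TreeMap<String, Cell> compact(List<TreeMap<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (TreeMap<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Clean up deleted data: columns whose newest version is a deletion disappear.
        merged.values().removeIf(cell -> cell.value == null);
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, Cell> older = new TreeMap<>();
        older.put("row1:email", new Cell("old@example.org", 1L));
        TreeMap<String, Cell> newer = new TreeMap<>();
        newer.put("row1:email", new Cell("new@example.org", 2L));
        System.out.println(compact(Arrays.asList(older, newer)).get("row1:email").value);
    }
}
```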

Reading

We will try to find the value of a single column within one row.

First the memtable is searched. It works like a write-through cache: a hit on it provides the most recent data - within a single instance of course, not in the whole cluster.

As the second step, Cassandra searches the SSTables, but only those within a single column family. SSTables are grouped by column family; this is also reflected on disk, where the SSTables of each column family are stored together in a dedicated folder.

Each SSTable contains a row bloom filter, which is built on row keys, not on column names. This lets Cassandra quickly verify whether a given SSTable could contain a particular row at all. Row bloom filters are always held in memory, so checking them is fast. False positives are also much less of a problem now, because recent Cassandra versions have improved hashing and increased the size of the bit masks.
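
To show what such a filter can and cannot answer, here is a minimal bloom filter sketch. The class name, the sizing parameters, and the simple seeded hash are my own assumptions for illustration; Cassandra's filter uses different hashing and sizing.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class RowBloomFilterSketch {

    private final BitSet bits;
    private final int size;
    private final int hashCount;

    RowBloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Marks the row key as present by setting one bit per hash function. */
    void add(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(rowKey, i));
        }
    }

    /** false: the row is definitely absent; true: the row may be present (false positives possible). */
    boolean mightContain(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    // Seeded FNV-1a style hash, chosen only to keep the example short.
    private int index(String key, int seed) {
        int h = 0x811C9DC5 ^ (seed * 0x9E3779B9);
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return Math.floorMod(h, size);
    }

    public static void main(String[] args) {
        RowBloomFilterSketch filter = new RowBloomFilterSketch(1024, 3);
        filter.add("user1");
        System.out.println(filter.mightContain("user1")); // true: added keys are never missed
    }
}
```

A negative answer lets the read skip the SSTable completely; a positive answer only means the SSTable is worth looking into.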

So ... Cassandra has scanned all possible SSTables within the particular column family and found those with a positive bloom filter for the row key. However, the fact that a given SSTable contains a given row does not necessarily mean that it also contains the given column. Cassandra needs to "look into the SSTable" to check whether it also contains the column. But it does not have to blindly scan all SSTables with a positive bloom filter on the row key. First it sorts them by last modification time (the max timestamp from the metadata). Now it has to find the first (youngest) SSTable which contains our column. It is still possible that this particular column is also stored in other SSTables, but those are definitely older, and therefore not interesting. This optimization was introduced in Cassandra 1.1 (CASSANDRA-2498); previous versions needed to go over all SSTables.
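
The lookup order described above could be sketched like this. It is a simplified model with hypothetical names: an SSTable is just an in-memory map with a max timestamp, and the bloom filter check is replaced by an exact lookup of the row key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReadPathSketch {

    /** In-memory stand-in for one SSTable of a column family. */
    static final class SSTable {
        final long maxTimestamp;
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        SSTable(long maxTimestamp) {
            this.maxTimestamp = maxTimestamp;
        }

        SSTable put(String rowKey, String column, String value) {
            rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
            return this;
        }

        // Stand-in for the row bloom filter (exact here, so no false positives).
        boolean mightContainRow(String rowKey) {
            return rows.containsKey(rowKey);
        }

        String readColumn(String rowKey, String column) {
            Map<String, String> row = rows.get(rowKey);
            return row == null ? null : row.get(column);
        }
    }

    /** Returns the value from the youngest SSTable containing the column, or null if none does. */
    static String read(List<SSTable> sstables, String rowKey, String column) {
        List<SSTable> candidates = new ArrayList<>();
        for (SSTable table : sstables) {
            if (table.mightContainRow(rowKey)) {
                candidates.add(table);
            }
        }
        // Youngest first, so the search can stop at the first hit.
        candidates.sort(Comparator.comparingLong((SSTable t) -> t.maxTimestamp).reversed());
        for (SSTable table : candidates) {
            String value = table.readColumn(rowKey, column);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```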

Cassandra has found all SSTables with a positive bloom filter on the row key and has sorted them by last modification time. Now it needs to find the one which actually has our column - it's time to look inside the SSTable:
First Cassandra reads the row keys from index.db and finds our row key using binary search. The found key contains the offset of the column index. This index holds two pieces of information: the file offset of each column value, and a bloom filter built on column names. Cassandra checks the bloom filter for the column name and, if it is positive, tries to read the column value - that is all.

For the record:
  • index.db contains sorted row keys, not the column index as the name would suggest - the column index can be found in data.db, at a dedicated offset, which is stored together with each row key.
  • An SSTable has one bloom filter built on row keys. Additionally, each row has its own bloom filter, built on column names. An SSTable containing 100 rows will have 101 bloom filters.
  • In order to find a given column in an SSTable, Cassandra will not immediately access the column index; it will first check the key cache - a hit leads directly from the row key to the column index. In this case only one disk access is required - to read the column value.
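
The interplay of key cache and index.db can be sketched as follows. This is a toy model under my own assumptions: the sorted row keys and their offsets are plain arrays, the key cache is a HashMap, and a counter shows how often the (expensive) index search really happens.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class IndexLookupSketch {

    private final String[] sortedRowKeys;     // models the sorted row keys of index.db
    private final long[] columnIndexOffsets;  // offset of each row's column index in data.db
    private final Map<String, Long> keyCache = new HashMap<>();
    int indexSearches = 0;                    // counts lookups that had to search the index

    IndexLookupSketch(String[] sortedRowKeys, long[] columnIndexOffsets) {
        this.sortedRowKeys = sortedRowKeys;
        this.columnIndexOffsets = columnIndexOffsets;
    }

    /** Returns the offset of the row's column index, or -1 if the row is not in this SSTable. */
    long columnIndexOffset(String rowKey) {
        Long cached = keyCache.get(rowKey);
        if (cached != null) {
            return cached; // key cache hit: no index search needed
        }
        indexSearches++;
        int position = Arrays.binarySearch(sortedRowKeys, rowKey);
        if (position < 0) {
            return -1;
        }
        keyCache.put(rowKey, columnIndexOffsets[position]);
        return columnIndexOffsets[position];
    }

    public static void main(String[] args) {
        IndexLookupSketch index = new IndexLookupSketch(
                new String[] {"alice", "bob", "carol"}, new long[] {0L, 128L, 512L});
        System.out.println(index.columnIndexOffset("bob")); // 128
        System.out.println(index.columnIndexOffset("bob")); // 128, served from the key cache
        System.out.println(index.indexSearches);            // 1
    }
}
```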

Conclusion

Bloom filters for rows are always in memory, so accessing them is fast. But accessing the column index might require extra disk reads (row keys and column index), and this applies per SSTable.
Reading can get really slow if Cassandra needs to scan a large number of SSTables and the key cache is disabled or not loaded yet.

Cassandra sorts all SSTables by modification time, which at least optimizes the case where a single column is stored in many locations. On the other hand, it might need to go over many SSTables to find an "old" column. The key cache increases performance significantly in such a situation.

Row keys for each SSTable are stored in a separate file called index.db. During startup Cassandra "goes over those files" in order to warm up. Cassandra uses memory-mapped files, so there is hope that, having read the files during startup, the first access to them will be served from memory.
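
For readers not familiar with memory-mapped files in Java, this is roughly what mapping a file looks like with the standard NIO API. It is only a sketch with a hypothetical class name and does not show how Cassandra itself opens index.db; load() is a best-effort hint and gives no guarantee that the content stays in memory.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapWarmupSketch {

    /** Maps the whole file read-only and asks the OS to bring its pages into memory. */
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.load(); // best effort only
            return buffer; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index", ".db");
        Files.write(file, new byte[] {1, 2, 3});
        MappedByteBuffer buffer = mapAndLoad(file);
        System.out.println(buffer.capacity()); // 3
    }
}
```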





Thursday, August 2, 2012

RESTEasy Spring Integration - Tutorial Part 2 - Exception Handling

The second part describes exception handling in the REST scope - a few design patterns and ways of implementing them. I will also give some examples of anti-patterns - just for contrast.

In general, JEE projects divide exceptions into three groups: checked exceptions, unchecked (runtime) exceptions, and throwables.

  • Checked exceptions are visible in the method signature, and therefore they must be handled by the developer. When a method declares such an exception, it means that someone can possibly process it and do something useful with it. For example, the application can display a message to the user that the entered password is incorrect, or that the given account was not found. It is very important NOT to overuse checked exceptions - do not define every possible event as a checked exception, because this decreases method readability. When defining a checked exception, always ask yourself: is it possible to react in a special way to this particular event? Can it be handled? And for the record - creating a log entry does not count ;) Also, when you notice frequent catch(Exception e){....} blocks around your code, it means that something went wrong during the design phase - people just do not care about the defined exceptions, so maybe they should be runtime exceptions? I am assuming that generic catch blocks are rare in correctly written code and, where they exist, are well documented so that others can understand the reason.
  • Unchecked exceptions are not visible in the method signature - these are all the possible events which break the current process and about which nothing can be done. In most cases a log entry is created, the monitoring framework is notified, and the GUI displays a generic error message.
  • Throwables should never be handled by the developer - a catch(Throwable e){....} block is not acceptable. Errors like LinkageError or OutOfMemoryError must be handled by the container.
Remember that the decision whether a given exception is runtime or checked depends on the context. For example:
  • DBConnectionError on a high-level business interface is definitely a runtime exception - here we do not even notice that a database is part of our transaction - it is encapsulated behind the service interface, which, like every proper high-level interface, hides implementation details. On the other hand, a connection pool may be able to handle such an exception by reestablishing the connection to another DB instance - on this level DBConnectionError would be a checked exception, because it is clearly the connection pool's responsibility to react to this error.
  • Null-Pointer-Like-Exceptions are runtime in every scope - such an exception breaks the given transaction and that is all - no one will catch it, so it is not visible in the method signature and, as usual, not expected.
  • AccountNotFoundException is always checked - it does not matter on which level you are. The account was not found, and therefore a certain action cannot be executed; in that case the developer can create the missing account or, in the worst case, notify the user that he actually does not exist.
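
The DBConnectionError case can be sketched in plain Java. All types below are hypothetical and exist only for this illustration: the same failure is a checked exception on the connection pool level, where failover is possible, and is translated into a runtime exception on the business level, where nothing useful can be done with it.

```java
final class ExceptionLevelsSketch {

    /** Checked on the connection pool level: the pool is expected to react to it. */
    static class DBConnectionError extends Exception {
        DBConnectionError(String message) {
            super(message);
        }
    }

    /** Unchecked on the business level: callers cannot do anything useful with it. */
    static class ServiceUnavailableException extends RuntimeException {
        ServiceUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    interface Connector {
        String connect(String host) throws DBConnectionError;
    }

    /** Pool level: handles the checked exception by trying another DB instance. */
    static String connectWithFailover(Connector connector, String primary, String backup)
            throws DBConnectionError {
        try {
            return connector.connect(primary);
        } catch (DBConnectionError e) {
            return connector.connect(backup);
        }
    }

    /** Business level: the signature does not reveal that a database is involved. */
    static String loadAccountName(Connector connector) {
        try {
            return "account via " + connectWithFailover(connector, "db1", "db2");
        } catch (DBConnectionError e) {
            throw new ServiceUnavailableException("Account service unavailable", e);
        }
    }
}
```
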
Enough theory - let's concentrate on exception handling in the REST scope. We will extend the project from the previous post with a few error use cases:
  • Checked exception: MessageForbidenException, which is thrown when the hi-message begins with "jo!". In this case the REST interface returns status code "400 Bad Request" and sets the response header "ERR_CODE=MESSAGE_FORBIDDEN"
  • Checked exception: IncorrectLengthException, which is thrown when the message length is not between 3 and 10 characters. In this case the REST interface returns status code "400 Bad Request" and sets the response header "ERR_CODE=INCORRECT_LENGTH"
  • We would also like to log all other possible exceptions (runtime exceptions) - there are several reasons for that, such as creating a dedicated log entry or notifying the monitoring framework. In case of any such exception, we would like to set the status code "500 Internal Server Error" and the HTTP header EX_CLASS, which contains the exception's class name; the message body should contain the stack trace (this might be a security issue for some systems - be careful here).
There are two approaches to implementing exception handling: the traditional one with try-catch blocks, or exception mappers registered as providers.
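
The listings in this post use a few small supporting types which are not shown here: the two checked exceptions and the enums Headers and RespCodes. The sketch below is my assumption of how they could look, reconstructed only from the way they are used; the real definitions are part of the project download and may differ.

```java
/** Thrown when the message begins with "jo!" (assumed shape). */
class MessageForbidenException extends Exception {
    MessageForbidenException(String message) {
        super(message);
    }
}

/** Thrown when the message length is not between 3 and 10 characters (assumed shape). */
class IncorrectLengthException extends Exception {
    IncorrectLengthException(String message) {
        super(message);
    }
}

/** Names of the custom response headers; name() yields the header name. */
enum Headers {
    ERR_CODE, EX_CLASS
}

/** Error codes sent as the value of the ERR_CODE header. */
enum RespCodes {
    MESSAGE_FORBIDDEN, INCORRECT_LENGTH
}
```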

Project update

We will extend the project from the previous tutorial by adding new exceptions, exception mappers, and a new REST resource: HelloRestServiceCatch.
The existing REST resource HelloRestService remains almost unchanged; its methods throw the new exceptions, and those are handled by exception mappers.
HelloRestServiceCatch, in contrast, contains try-catch exception handling, so both approaches can be easily compared.

This is the updated project structure:


and this is the simple validation logic responsible for throwing exceptions (built into the existing service):
@Named
public class HelloSpringService {

    public String sayTextHello(String msg) throws MessageForbidenException, 
    IncorrectLengthException {
        verifyIncommingMessage(msg);

        return msg + "--> Hello";
    }

    public HelloResponse sayJavaBeanHello(HelloMessage msg) throws MessageForbidenException, 
        IncorrectLengthException {
        verifyIncommingMessage(msg.getMsg());

        return new HelloResponse(msg.getMsg() + "--> Hello " + 
                (msg.getGender() == Gender.MALE ? "Sir" : "Madam"), new Date());
    }

    private void verifyIncommingMessage(String msg) throws MessageForbidenException, 
        IncorrectLengthException {
        if (msg == null) {
            throw new IncorrectLengthException("Empty message not allowed");
        }
        msg = msg.trim();
        int msgLength = msg.length();
        if (msgLength < 3 || msgLength > 10) {
            throw new IncorrectLengthException("Message length not between 3 and 10 characters");
        }

        if (msg.toLowerCase().startsWith("jo!")) {
            throw new MessageForbidenException("Jo! is not allowed");
        }
    }
}
Curl examples triggering new exceptions:
>> POST REQUEST <<
curl -iX POST -H "Content-Type: application/json" -d '{"msg":"Jo!","gender":"MALE"}' \
   http://localhost:8080/resteasy_spring_p2/rest/Hello/catch/javabean

HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
ERR_CODE: MESSAGE_FORBIDDEN
Content-Type: application/json
Content-Length: 18
Connection: close

>> RESPONSE <<
Jo! is not allowed

>> POST REQUEST <<
curl -iX POST -H "Content-Type: application/json" -d '{"msg":"Hi","gender":"MALE"}' \
   http://localhost:8080/resteasy_spring_p2/rest/Hello/catch/javabean

HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
ERR_CODE: INCORRECT_LENGTH
Content-Type: application/json
Content-Length: 46
Connection: close

>> RESPONSE <<
Message length not between 3 and 10 characters

>> POST REQUEST <<
curl -iX POST -H "Content-Type: application/json" -d '{"msg":"Hi","gender":"FRED"}' \
   http://localhost:8080/resteasy_spring_p2/rest/Hello/javabean

HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
EX_CLASS: org.codehaus.jackson.map.JsonMappingException
Content-Type: application/json
Content-Length: 4689
Connection: close

>> RESPONSE <<
Can not construct instance of org.mmiklas.resttutorial.model.Gender 
from String value 'FRED': value not one of declared Enum instance names.....

Traditional try-catch Approach

The traditional try-catch approach has several problems, and well .... no advantages:
  • Different methods throw the same exception and have similar catch blocks to handle it. This code could be extracted into a separate method, but you still need to rely on the developer to call it. In the end there is no guarantee that the same exception will always be handled in the same way. The goal of consistent error handling is to map a given exception to the same REST representation - MessageForbidenException always results in 400.
  • Catching Exception is mostly bad practice. If we define a new checked exception, we also expect that developers will handle it in a dedicated way - at the beginning the code should not compile, since there is no catch block for this new exception. But with catch(Exception e){....} the code compiles, and our new exception is treated as a general error and not in a special way, as expected. We might not even notice that.
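
The second problem is easy to reproduce in isolation. In the hypothetical sketch below, MessageForbidenException was added to verify() after statusFor() had been written; the code still compiles, because the generic catch block silently absorbs the new exception and reports it as a general error.

```java
final class GenericCatchSketch {

    static class IncorrectLengthException extends Exception { }

    // Added later; no caller was forced to adapt because of catch (Exception e).
    static class MessageForbidenException extends Exception { }

    static void verify(String msg) throws IncorrectLengthException, MessageForbidenException {
        if (msg.length() < 3 || msg.length() > 10) {
            throw new IncorrectLengthException();
        }
        if (msg.toLowerCase().startsWith("jo!")) {
            throw new MessageForbidenException();
        }
    }

    /** Returns the HTTP status code this handler would produce. */
    static int statusFor(String msg) {
        try {
            verify(msg);
            return 200;
        } catch (IncorrectLengthException e) {
            return 400;
        } catch (Exception e) {
            // MessageForbidenException ends up here and becomes a 500 instead of a 400.
            return 500;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("Hello")); // 200
        System.out.println(statusFor("Hi"));    // 400
        System.out.println(statusFor("jo! x")); // 500
    }
}
```

Without the last catch block the compiler would reject statusFor() until the new exception is handled explicitly.
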
This is the REST service implementation using try-catch statements:
@Named
@Path("/Hello/catch")
public class HelloRestServiceCatch {

    private final static Logger LOG = Logger.getAnonymousLogger();

    @Inject
    private HelloSpringService halloService;

    // curl http://localhost:8080/resteasy_spring_p2/rest/Hello/catch/text?msg=Hi%20There
    @GET
    @Path("text")
    @Produces(MediaType.APPLICATION_FORM_URLENCODED)
    public Response sayTextHello(@QueryParam("msg") String msg) {
        try {
            String resp = halloService.sayTextHello(msg);
            return Response.ok(resp).build();

        } catch (MessageForbidenException e) {
            return handleMessageForbidenException(e);

        } catch (IncorrectLengthException e) {
            return handleIncorrectLengthException(e);

        } catch (Exception e) {
            return handleException(e);
        }
    }

    // curl -X POST -H "Content-Type: application/json" -d '{"msg":"Hi There","gender":"MALE"}'
    // http://localhost:8080/resteasy_spring_p2/rest/Hello/catch/javabean
    @POST
    @Path("javabean")
    @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
    @Consumes({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
    public Response sayJavaBeanHello(HelloMessage msg) {
        try {
            HelloResponse resp = halloService.sayJavaBeanHello(msg);
            return Response.ok(resp).build();

        } catch (MessageForbidenException e) {
            return handleMessageForbidenException(e);

        } catch (IncorrectLengthException e) {
            return handleIncorrectLengthException(e);

        } catch (Exception e) {
            return handleException(e);
        }
    }

    private Response handleException(Exception e) {
        LOG.log(Level.WARNING, e.getMessage(), e);

        return Response.status(Status.INTERNAL_SERVER_ERROR).header(Headers.EX_CLASS.name(), 
                e.getClass().getCanonicalName())
                .entity(e.getMessage() + " - " + getStackTrace(e)).build();
    }

    private Response handleIncorrectLengthException(IncorrectLengthException e) {
        return Response.status(Status.BAD_REQUEST).header(Headers.ERR_CODE.name(), 
                RespCodes.INCORRECT_LENGTH.name()).entity(e.getMessage()).build();
    }

    private Response handleMessageForbidenException(MessageForbidenException e) {
        return Response.status(Status.BAD_REQUEST).header(Headers.ERR_CODE.name(), 
                RespCodes.MESSAGE_FORBIDDEN.name()).entity(e.getMessage()).build();
    }

    private String getStackTrace(Exception ex) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw, true);
        ex.printStackTrace(pw);
        return sw.getBuffer().toString();
    }
}

Exception Mappers

The approach with exception mappers has a few nice features:
  • Clean code - the REST implementation is not cluttered with try-catch blocks
  • No code repetition
  • Centralized and automatic mapping from Java exceptions to REST representation
  • Each exception will always result in the same REST response - this improves interface stability and integrity
  • No Exception catch blocks - Checkstyle will be happy about that ;)
How does it work?
The REST resource class does not catch exceptions; they are simply declared in the method signature. By default RESTEasy would convert such an exception to "500 Internal Server Error", since it was not handled by the application logic.
In order to handle exceptions we need to register exception mappers. Each one handles a single exception and also its subclasses - if there is no dedicated handler for a particular subclass. For example: we can register a handler for Exception and it will get called for every possible exception, including runtime exceptions, but we can still register a handler for IncorrectLengthException (a subclass of Exception), and this handler will be called when IncorrectLengthException occurs. This was not always the case - in older RESTEasy versions, an exception mapper registered on a parent class would be called for all child exceptions, so it was not possible to register an exception mapper on Exception because the other mappers would not be called.
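
The selection rule described above (the most specific registered mapper wins) can be modelled in a few lines of plain Java. This is not RESTEasy's code, just a sketch of the idea with a hypothetical class name: walk up the class hierarchy of the thrown exception and use the first registered mapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class MapperLookupSketch {

    private final Map<Class<?>, Function<Throwable, String>> mappers = new HashMap<>();

    void register(Class<? extends Throwable> type, Function<Throwable, String> mapper) {
        mappers.put(type, mapper);
    }

    /** Uses the mapper registered for the closest superclass of the thrown exception. */
    String handle(Throwable thrown) {
        for (Class<?> type = thrown.getClass(); type != null; type = type.getSuperclass()) {
            Function<Throwable, String> mapper = mappers.get(type);
            if (mapper != null) {
                return mapper.apply(thrown);
            }
        }
        return "500 (no mapper registered)";
    }

    public static void main(String[] args) {
        MapperLookupSketch lookup = new MapperLookupSketch();
        lookup.register(Exception.class, t -> "500");
        lookup.register(IllegalArgumentException.class, t -> "400");
        System.out.println(lookup.handle(new IllegalArgumentException())); // 400
        System.out.println(lookup.handle(new IllegalStateException()));    // 500
    }
}
```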

RESTEasy looks for classes marked with @Provider and implementing ExceptionMapper<E>, where E declares the exception class handled by this mapper. Normally RESTEasy would scan the class path, but since we are using the Spring integration, it asks Spring instead. This is the reason why mapper classes are annotated with both @Named and @Provider.

Updated REST Resource (only exceptions in method signature):
@Named
@Path("/Hello")
public class HelloRestService {

    @Inject
    private HelloSpringService halloService;

    // curl http://localhost:8080/resteasy_spring_p2/rest/Hello/text?msg=Hi%20There
    @GET
    @Path("text")
    @Produces(MediaType.APPLICATION_FORM_URLENCODED)
    public Response sayTextHello(@QueryParam("msg") String msg) 
            throws MessageForbidenException, IncorrectLengthException {
        String resp = halloService.sayTextHello(msg);
        return Response.ok(resp).build();
    }

    // curl -X POST -H "Content-Type: application/json" -d '{"msg":"Hi There","gender":"MALE"}'
    // http://localhost:8080/resteasy_spring_p2/rest/Hello/javabean
    @POST
    @Path("javabean")
    @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
    @Consumes({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
    public Response sayJavaBeanHello(HelloMessage msg) 
            throws MessageForbidenException, IncorrectLengthException {
        HelloResponse resp = halloService.sayJavaBeanHello(msg);
        return Response.ok(resp).build();
    }
}
Exception mappers:
@Provider
@Named
public class IncorrectLengthExceptionMapper implements 
                                    ExceptionMapper<IncorrectLengthException> {

    public Response toResponse(IncorrectLengthException e) {
        return Response.status(Status.BAD_REQUEST).header(Headers.ERR_CODE.name(), 
                RespCodes.INCORRECT_LENGTH.name()).entity(e.getMessage()).build();
    }

}

@Provider
@Named
public class MessageForbidenExceptionMapper implements 
                                    ExceptionMapper<MessageForbidenException> {

    public Response toResponse(MessageForbidenException e) {
        return Response.status(Status.BAD_REQUEST).header(Headers.ERR_CODE.name(), 
                RespCodes.MESSAGE_FORBIDDEN.name()).entity(e.getMessage()).build();
    }

}

@Provider
@Named
public class UnhandledExceptionMapper implements ExceptionMapper<Exception> {

    private final static Logger LOG = Logger.getAnonymousLogger();

    public Response toResponse(Exception e) {
        LOG.log(Level.WARNING, e.getMessage(), e);

        return Response.status(Status.INTERNAL_SERVER_ERROR).
                header(Headers.EX_CLASS.name(), e.getClass().getCanonicalName()).
                    entity(e.getMessage() + " - " + getStackTrace(e)).build();
    }

    private String getStackTrace(Exception ex) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw, true);
        ex.printStackTrace(pw);
        return sw.getBuffer().toString();
    }
}

Project Source Download

resteasy_spring_p2.zip

https://github.com/maciejmiklas/resteasy_spring_p2.git