Prof. Li Erran Li
Adjunct Professor
Columbia University
503-West, 122nd Street
lierranli@cs.columbia.edu
… through our website the number of sensors, the person registered with our app, and the number of samples to be requested from the sensors. Once this information is obtained, the Django server sends a request message containing the metadata from the website to the Twilio API. The Twilio API routes the message to the Android app, which then queries the sensors named in the message and collects their data. The Android app then sends the information to the RDS (MySQL) database that we maintain in the cloud on an Amazon EC2 machine. Finally, our Django server extracts the relevant data from the RDS database and analyses it on the server to derive meaning from the data according to our use case. This information is then displayed on our website for the user to check.

We debated whether to offload the processing of data, i.e. extracting relevant information, from the Android app to the server. We followed our debate with two test cases to measure how much power is consumed under both approaches:

1. Analysing and processing sensor data in the Android app
2. Analysing and processing sensor data in the server

The first set of experiments used the PowerTutor app on the Android phone to measure how much power was consumed as we worked through single queries on S2AS, first using just the Android app and then offloading the processing to the server. We picked four random locations of our choice and conducted this experiment four times.

PowerTutor is an application for Google phones that displays the power consumed by major system components such as CPU, network interface, display, and GPS receiver and different applications. The application allows software developers to see the impact of design changes on power efficiency. Application users can also use it to determine how their actions are impacting battery life. PowerTutor uses a power consumption model built by direct measurements during careful control of device power management states. This model generally provides power consumption estimates within 5% of actual values. A configurable display for power consumption history is provided. It also provides users with a text-file based output containing detailed results. You can use PowerTutor to monitor the power consumption of any application. PowerTutor's power model was built on HTC G1, HTC G2 and Nexus One.

Each hardware component in a smartphone has a couple of power states that influence the phone's power consumption. For example, the CPU has CPU utilization and frequency level, and OLED/LCD has brightness levels. The power model in PowerTutor is constructed by correlating the measured power consumption with these hardware power states. The power states we have considered in the power model include:

- CPU: CPU utilization and frequency level.
- OLED/LCD: For hardware with an LCD screen, or an OLED screen without root, we consider brightness level; for hardware with an OLED screen with root access, we consider brightness together with pixel information displayed on the screen.
- Wifi: Uplink channel rate, uplink data rate and packets transmitted per second.
- 3G: Packets transmitted per second and power states.
- GPS: Number of satellites detected, power states of the GPS device (active, sleep, off).
- Audio: Power states of the Audio device (on, off).

The second set of experiments consisted of physically keeping track of the battery consumption as we ran a single query and then multiple queries on our app, first with just the Android app and then with the entire processing and analysis of data offloaded to the remote server that we maintained on the EC2 machine. We ran each query for a full hour, for a total of two hours: one hour running the single query and one hour running the multiple query, while keeping track of battery and performance.

The results from both test cases, across multiple locations, matched what we had envisioned.

Comparing single and multiple queries processed locally on the Android device with the same queries processed on the remote server that we set up in the cloud using Django, we saw a very slight drop in power consumption with the remote server. A small drop was expected once the power spent on sending the sensor data from the device to the cloud database over the network is factored in. Even though the reduction in power consumption was far smaller than we had envisioned, it was still a more efficient model than processing information in the Android app itself. We therefore proceeded with our model of offloading all the processing to a remote server rather than doing it all in our local database.

4.2. Thresholds

We developed three use cases for our platform to provide examples of how our architecture can be used for a number of applications. We plan to develop an API for S2AS so that in future it can easily be integrated into any project, abstracting its users from the interior details.

One problem we faced was that, while processing the data coming from the sensors, we needed to extract information or meaning from the data. We therefore calculated thresholds for each sensor's data so that it corresponds as accurately as possible to an activity.

For example, in the case of an accelerometer we had a lot of raw data coming in, in terms of:

1. x : force acting on the x axis minus the gravity
2. y : force acting on the y axis minus the gravity
3. z : force acting on the z axis minus the gravity

These values are raw data and correspond to no unique activity. We proceeded to calculate the G_force on the object using the x, y and z values, which corresponds to the force acting on an object along the x, y and z axes minus the gravity and reveals the push on it.

G_force = (x^2 + y^2 + z^2) / (9.8)^2

The above is the actual force acting on the body. Once we had calculated this value, we wanted to identify specific incidents that correspond to certain thresholds of G_force. Without any reference, we went through major testing cycles to calculate the optimum value of G_force for specific incidents. We identified three scenarios for which we aimed
to calculate the most optimal threshold. After about 30 hours of extensive testing we arrived at the following values for our three scenarios:

1. Accident (car): x >= 3.0 (G_force)
2. Jerk: 1.2 <= y <= 2.1
3. Walking: 0.8 < z < 1.1

where:

1. x – Accidents
2. y – Jerk
3. z – Walking

Thus we settled on optimum values for three scenarios, which our use cases used to reveal information about those scenarios. Similarly, we had to work out thresholds for most other sensors and attach specific incidents to them for our use cases. We would like to clarify that although we worked on the G_force, it was only to illustrate our use case. We do not aim to spend our time extracting information from these sensors but to provide the sensors to all users who might want to use them.

4.3. Query Optimizations

Our most important optimization for providing the sensor data to the server in the most efficient form was query optimization, which saves on computational expense. We devised four different optimization models of our own in search of the best one. The models are described below in more detail:

… single/multiple query scenario where we combine the inputs, which in this case are the data coming from the sensors. We use the Boolean operator ‘AND’ wherever necessary and the ‘OR’ operator to evaluate the rest. The logic of this analysis is that we calculate the first input of the multiple query, and then:

1. In an ‘AND’ query we calculate the first input, and if it is false we ignore the rest of the query, no matter how long it is, because the result is now always going to be false. We tested this against our normal query evaluation, where we calculate all the inputs one by one, and then checked the power consumption. Our analysis is based on both of the test cases mentioned earlier, i.e. checking with PowerTutor and physically checking the battery.
2. In an ‘OR’ query we calculate the first term, and if it is true we ignore the rest of the query, because the result is going to be true in any case and hence we do not need to calculate it. We compared this to our simple model where we compute all inputs.

The following figure gives the actual power readings for the ‘AND’ - ‘OR’ query model explained above.
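The ‘AND’/‘OR’ early-exit logic described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code that produced the measurements: each query input is modelled as a hypothetical zero-argument function that reads one sensor and tests it against its threshold, and the function names `evaluate_all` and `evaluate_short_circuit` are invented for this example. Both functions also return how many inputs were evaluated, as a rough stand-in for the work saved.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model of one query input: a function that reads a sensor
# and returns True if its threshold condition holds.
Predicate = Callable[[], bool]


def _check_op(op: str) -> None:
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")


def evaluate_all(op: str, inputs: Sequence[Predicate]) -> Tuple[bool, int]:
    """Baseline model: evaluate every input, then combine the results."""
    _check_op(op)
    values = [check() for check in inputs]
    result = all(values) if op == "AND" else any(values)
    return result, len(values)


def evaluate_short_circuit(op: str, inputs: Sequence[Predicate]) -> Tuple[bool, int]:
    """Early-exit model: stop as soon as the outcome is decided."""
    _check_op(op)
    evaluated = 0
    for check in inputs:
        evaluated += 1
        value = check()
        if op == "AND" and not value:
            return False, evaluated  # one false input makes the AND false
        if op == "OR" and value:
            return True, evaluated   # one true input makes the OR true
    # Nothing decided early: every AND input was true, every OR input false.
    return op == "AND", evaluated


checks = [lambda: False, lambda: True, lambda: True]
print(evaluate_all("AND", checks))            # (False, 3)
print(evaluate_short_circuit("AND", checks))  # (False, 1)
```

In the ‘AND’ example the early-exit model touches one input instead of three, which is where the saving comes from when each input requires a sensor reading.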
RDS is a service offered by Amazon that makes scaling a relational database on the cloud easy. RDS in this project was used to store data in five tables.

“myApp_wreckwatch” table was used to store accelerometer information such as displacement along the 3 co-ordinate axes, time of reading and a unique id for the reading.

“myApp_contacts” table was used to store contact information, i.e. contact phone numbers.

“myApp_users_userscellnumber” table was used to store users and contact information, i.e. the relation between each user and his corresponding contacts.

“myApp_users” table is used to store a unique id for each user of the app.

“myApp_battery” table is used to store battery levels along with time information.

Two Amazon EC2 machines were used. A Tomcat servlet was deployed on a Windows EC2 machine, which was used for the communication between the Android client and RDS. The other one was a Linux EC2 machine that was used for deploying the Django application.

b) Django: The Django web server, written in Python, handles the computation of the readings from the phone. For our project we only wanted the database part of the Django web framework. Therefore we used a restricted model of Django called stand-alone Django.

Django was used in the design to reduce the computational complexity overhead on the client and therefore reduce battery consumption.

c) Android client: The app is built for the Android platform. The app is responsible for collecting data from the sensors on the phone. After registration, the app starts a service in the background that monitors the sensors.

d) Twilio Cloud Communications: The Python API of Twilio was used to send calls and SMS messages to the contacts of the user in case of an emergency. This API integrates well with the backend Django framework to provide this functionality.

6. Challenges associated

There are many challenges associated with implementing this idea in the form of a running service. Some of those that we faced are listed below. Some others that may be cumbersome when implementing apps or end-user products that use the Sensor-as-a-service feature are also listed below.

6.1. Challenges during design:

The main challenge during the architecture design phase was to weigh the two options that seemed logically the best. They were whether to follow the pull based approach or the push based approach. This is the main crux of the architecture. Referring to the WreckWatch paper, which is the inspiration behind our test dummy app, we found that through their testing the authors concluded that for simpler upload tasks it is better to have a pull based approach than a push based one. This was based on the inference that in the case of the push based approach (where the app constantly pushes sensor data to the
cloud database), the client-side cell phone battery usage and power consumption would be wasted whenever there is no need for the data to be posted to the database because no job requested by a user needs the sensors. To avoid this, we decided to follow the pull based approach (where the backend server on the cloud requests particular sensor information for a particular time period at a particular sampling rate, all computed on the backend), which saves all the unnecessary power consumption on the sensing devices.

6.2. Challenges during discussion about use cases:

We believe this was the most important challenge in our process, in which we had to consider the apps that could be built and the range of use cases that could be covered by the architecture of the design. Given the broad span of use cases that we had in mind, it was a huge hurdle to settle on the right methods to implement the architecture.

Firstly, the variance among the separate modules of the architecture poses problems for generalizing a method that several different use cases can share and re-use. This led to separate tracks of varied versions of the same module being formulated to take care of the different possible sample spaces in that model. For example, when accessing the accelerometer, sampling had to be fast to achieve reasonable accuracy, whereas for the battery there was a separate heuristic for sampling the monitoring events. For example, there had to be a separate listener to listen for updates in battery status.

Secondly, the data format for the different use cases that we formulated had a potential of coming in different shapes and sizes, meaning that at times a lot of knowledge can be conveyed by very little information, whereas at other times draining the entire battery to get the appropriate data was still insufficient. These inconsistencies led to a wide range of case-handling mechanisms, in the form of separate sensing modules each having their own separate data upload modules. In short, the main challenge is generalizing an entire single process such as sensing, uploading or processing. These cannot be generalized because of the different and extremely wide nature of the use cases that are made possible by offering the sensors of smartphones as a service.

6.3. Challenges during implementation:

As stated above, the most important challenge was figuring out the exact architecture and module structure of the framework that we would implement, so as to satisfy as many test cases as possible and achieve wide coverage of the different possible sensor domains.

The approach that we pursued was to have no computation done on the phone at all. Computation here means deciding which sensors are required, how much of that sensor data to sample, for how long, and in what order to upload it to the database on the cloud; all of this was decided on the server backend, which is again on the cloud. With a simple message-passing mechanism, better performance monitoring, and further tuning of this basic idea of light computation at the client end and quicker data upload and processing at the backend, the system can be optimized even further, which is the next big challenge.

6.4. Challenges during testing:
This was quite a unique challenge requiring a separate set of mechanisms for testing. The main testing that needed to be done covered the following characteristics of the dummy app that we built:

6.4.1. Connecting Android phones to Amazon’s RDS:

This was a major hurdle in the project, as uploading the data in the fastest possible manner had to be of utmost priority. Initially, the approach that we followed was to connect the RDS directly to the Android device through SQL connectors for Java. On experimentation, it turned out that the connection and the drivers that run the connection from Java to AWS RDS are too heavy for light devices like a smartphone with limited computing resources. Because of this, the data upload rates were far too low to judge from them whether or not an accident, or any other event for that matter, had occurred.

The solution we arrived at was to have an intermediary servlet connector, hosted on the cloud web server on an AWS EC2 machine, which directly enters the data that it receives into the RDS. Hence, the connectivity overhead is now borne by the well-resourced virtual machine on the cloud. As for transferring the sensor data from the phone to this intermediary web service, a simple lightweight HTTP request was enough. This request had a JSON object embedded within it which carried all the necessary data.

This resolved the major challenge that we faced, and we achieved far better (almost 20 times faster) data transfer rates of around 5 accelerometer records (the costliest sensor upload unit) per second, compared with 0.2 records per second in the case of a direct RDS connection.

6.4.2. Django backend server on Amazon’s EC2:

Separate from the EC2 web service that had servlets for handling data uploads to the RDS database, we had to run an EC2 instance that hosts the web-app that the end user would use to access the data in many possible ways. This web-app was hosted on an EC2 server on the cloud, which makes it highly scalable, and we implemented it in Django. The web-app was required to process the data that was being uploaded onto the RDS and send alerts whenever required. Since only the database part of Django was being utilized, it needed to be implemented as a stand-alone Django script. A web-app implemented with the Django framework follows the MVC (Model, View and Controller) design. The models.py file consists of the database part, while views.py consists of the business logic and the different HTML pages, which are loaded according to how a given web request is processed in views.py. The challenging part was to figure out how to make the back-end work like a normal web-app without a views.py file, as there would be no web request that could be processed in views.py.

This was successfully achieved by running the backend as a stand-alone application, which does not require a separate views.py file with the business logic and a web request to process it. The models.py file contains all the tables present in the database, and the program that needs to be run continuously is placed in the manage.py file. This resolved the problems we were facing because of there not being an explicit web request for the web-app.

We faced a lot of challenges while working on Django, as we had to tailor Django to our needs and the project's needs. Instead of a full-blown Django we only required the database part of Django, therefore we have used the
database and the Django functionality within the framework.

Because this is a very special case, there is not enough documentation available for stand-alone Django. We therefore faced a lot of challenges with documentation availability and had to resort to a lot of testing and experiments to get what we actually wanted from the Django aspect of the project. Yet it was a very satisfying experience which covered all the parts of Django.

6.4.3. Accelerometer threshold:

Setting the right threshold value for detecting jerks and G-forces on the phone, while it is being operated normally with the sensors activated, was another testing issue that we faced. According to the WreckWatch paper, the authors calculated that an observed G-Force value of 3.5 was necessary to conclude that the observation came from a crash or accident, which is a serious jerk that we were unable to replicate. As a result, we lowered the threshold and had to perform several experiments with the phone being tossed around to get data good enough to trigger the accident event. For our dummy test app and the demonstration settings of our backend prototype, we set the threshold lower, at around 1.75, which is good enough to detect moderate to heavy jerks (simulated by throwing the cell phone).

Getting this parameter right was one of the main challenges as well.

6.4.4. Microphone and GPS data:

In general, getting the data from all the sensors (both hardware and indirect) mentioned in section 2 above was relatively structured and concrete. However, the same could not be said for the data that we record from the microphone and GPS.

In the case of the microphone data, recording first starts the Voice Recorder app on the Android phone, which means there is no simpler or less power-consuming way to record surrounding sound. Further, the data that we received posed huge analytical challenges: analysing the frequency or amplitude, and syncing the data in exact time with the accelerometer data, so as to eliminate false positives and false negatives and confirm that a crash has occurred.

In the case of the GPS, which was the most power-consuming operation, the main challenge was to devise a plan to pull data from the GPS sensor in the phone as rarely as possible. To do this, we thought of using the accelerometer data to check whether there is a constant movement in the “x” direction (x represents horizontal movement parallel to the ground). If there is a constant movement, then there is a high chance that the GPS location of the device has shifted, so in that case we propose to fetch data from the GPS sensor. We did not implement this as it was out of the scope of our dummy app, but we propose it as one of the very good use cases that can be implemented using the Sensor-as-a-service platform as the data source.

7. Results

The results that we obtained during and on concluding our project were very satisfactory and in line with what we had in mind when we set out on the project. The results can be listed in order as:

1. We successfully achieved a performance boost by offloading our analysis and processing of the data to our remote server.
   The performance increase compared with normal power consumption was minimal, but there was still some increase.
2. Our optimization of the query analysis improved our performance by a large margin. Using our models of query analysis, ‘AND’ and ‘OR’ queries gave us a performance boost of up to 60% and 12% respectively. We coupled that with our cheapest-sensor-first model for an additional performance increase of 10%.
3. We were able to provide three use cases to justify the use of our platform in various projects. We have identified various other use cases that can use our project and the API once it is built on top of our platform. These use cases range from very simple applications to very broad, sophisticated ones, as described in our Use Cases section.
4. We were able to achieve an accuracy of about 77% over all three test cases we implemented, using up to 40 samples per test case and exhaustive testing.
5. We have used Amazon cloud services, namely Amazon S3, CloudFront, Amazon EC2 and the Amazon RDS database, for storing data in the MySQL database that we have in our instance. The most positive thing about this was that we could scale the infrastructure up and down almost at need. We had an unlimited amount of storage and infrastructure at our disposal as we needed it.
6. Although, as we would like to make clear, we are not focusing at all on the security of the app, and we assume that a user who has registered with our service is ready to provide his sensor information at all times to anyone who requests it, we have also given the user the option to block his sensor service, so that whenever he does not want to stream his sensors he can stop it manually.

Please find below a few graphs that illustrate the accuracy for our use cases:

The graph above shows the accuracy vs. number of samples plot for the battery alert use case.

The graph above shows the accuracy vs. number of samples plot for the WreckWatch use case.

The above table indicates the average power consumption for the listed sensor for the following cases:
(L) Single Q – Single query on local host.
(S) Single Q – Single query on the server.
(S) Multiple Q – Multiple queries on the server.
(L) Multiple Q – Multiple queries on local host.

We repeated the experiment about 50-60 times to test whether an accident/jerk is detected correctly. For this, we had to observe the RDS tables and check whether the data was updated correctly and whether the analysis on that data satisfied the necessary criteria of an accident/jerk.
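The accident/jerk criterion checked in these runs can be sketched in Python as follows. This is a minimal illustration rather than the project's code. It uses the G_force formula from Section 4.2, assuming the sum of squares is grouped before dividing by (9.8)^2, and the three threshold ranges listed there. The function names `g_force` and `classify` and the label "unclassified" are hypothetical.

```python
GRAVITY = 9.8  # the constant used in the Section 4.2 formula


def g_force(x: float, y: float, z: float) -> float:
    """G_force as in Section 4.2, with the numerator grouped (an assumption):
    (x^2 + y^2 + z^2) / 9.8^2."""
    return (x * x + y * y + z * z) / (GRAVITY * GRAVITY)


def classify(g: float) -> str:
    """Map a G_force value to one of the three scenarios from Section 4.2."""
    if g >= 3.0:
        return "accident"
    if 1.2 <= g <= 2.1:
        return "jerk"
    if 0.8 < g < 1.1:
        return "walking"
    return "unclassified"  # hypothetical label for values outside all ranges


# x, y, z are the per-axis accelerometer values described in Section 4.2.
reading = (12.0, 9.0, 8.0)
g = g_force(*reading)
print(classify(g))  # accident
```

Values that fall between the listed ranges (for example between 2.1 and 3.0) are not assigned to any scenario in Section 4.2, which is why the sketch returns a separate label for them.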
8. Testing

Various aspects of the application have been tested thoroughly. Testing can be broadly divided into a few categories. These include client-side testing, sending data to RDS, back-end processing, and Twilio messages and calls.

One of the challenges we faced was that we could not check sensor and battery level readings on the emulator. The testing had to be done on a real Android phone.

As data was being sent to the RDS, we had to ensure that processing was performed on the latest data. For this, we passed the time-stamp of the readings along with the sensor reading values. This way the analysis is accurate, since alerts (calls and messages) are sent immediately.

The G-Force formula that we use was originally intended for detecting real accidents. Hence, the G-Force threshold to detect such impacts is very high. For our testing purposes, we had to reduce this threshold from 3.5 to an optimal value of 1.4, so that small jerks could be detected. Multiple experiments with jerks of different magnitudes led us to choose this value. We also had to choose the threshold such that detection of very small jerks would be avoided.

This optimal value was good enough for small jerks such as dropping the phone from a certain height. This greatly helped in testing, because it would otherwise have been impractical to create a real accident.

The code for sending messages and calls using the Twilio API was tested by sending SMS messages and calls to friends and team members. Friends of each teammate were registered as contacts for that teammate, who was registered with the app. This way, we could make sure that alerts go to the correct contacts and not to everyone whose details exist in the database.

The user’s contacts get notified when his phone discharges below a certain threshold. For this, the value chosen was 10%. This part of the testing could not be done on the emulator, since the battery of the emulator is always set to 50%. We used a real Android phone for this testing, let its battery fall below 10%, and observed the data being populated in the RDS tables. The threshold was then changed to 20% and 30% to make sure that it works for any value.

9. Concluding Remarks:

With the growing popularity of smartphones, providing sensor information as a service on the Cloud could give app developers endless opportunities to develop interesting and useful use cases.

One thing that needs to be kept in mind is that although making data available on the Cloud is beneficial in many ways, it comes with the huge responsibility of keeping that data secure. The availability of the data should be controlled such that the privacy and the security of the user are in no way compromised.
Certain measures that Batsignal can take in this direction are as follows. A login feature should be included in the app so that only the user can select the alert services and his emergency contacts, and so that the supplied information cannot be changed by anybody else. A must-add feature is seeking permission to send alerts from the person who has been registered as someone’s emergency contact. This would make sure that the app is not misused to unnecessarily flood someone’s phone with messages and calls.

Apart from security, certain other features can be added to ensure that Batsignal is one of the sought-after apps on the app store. For example, in the WreckWatch alert, apart from the accelerometer data we can also use microphone data to check for heightened noise levels after a high G force is detected by the server. If that were recorded, it could be said with higher certainty that the user might have met with an accident. Another feature that could be added would be to record the user’s GPS data to find out his exact location and, along with a call to the emergency contacts, also make a call to the nearest medical help available.

… precautions are considered, very useful use cases can be implemented.

10. References:

[1]. WreckWatch: Automatic Traffic Accident Detection and Notification with Smartphones
http://www.dre.vanderbilt.edu/~jules/wreckwatchj.pdf

[2]. Stack Overflow – A lot of difficulties that we faced while programming in Android were cleared by referring to this website. It was also helpful while programming the backend, where the Django framework needed to be used just to connect to the database and process its values.
www.stackoverflow.com

[3]. Sandy Walsh (Blog) – This was referred to in order to understand how to use the Django framework for a stand-alone application.
http://www.sandywalsh.com/2010/05/packaging-django-application-as-stand.html

[4]. Django – The official website for the framework.

http://aws.amazon.com/documentation/rds/
http://www.coderanch.com/t/366326/Servlets/java/Writing-JSON-data-client
http://aws.amazon.com/documentation/
http://developer.android.com/guide/topics/sensors/sensors_overview.html
http://www.twilio.com/docs
http://ziyang.eecs.umich.edu/projects/powertutor/documentation.html