
Work Outline

Thinking about where the work was on the iOS app, it broke down into these areas :-

1. Data: courses, lessons, the server JSON API, on-device caching, removing redundant lessons, and logon.

2. Downloading the JSON and mp3s. You'll need a decent HTTP client that gives you access to cookies, HTTP headers, download progress and error handling; it also needs to handle SSL.

3. Scanning the mp3 files for silence. For this you'll need to get at the actual PCM frame data inside the mp3 file. Hopefully the platform APIs can do this for you, or you've got a serious amount of work ahead of you. I can send you details of the algorithm and parameters that work well to detect the start and end of the silence. You'll need to persistently store this index of gaps so that during playback you know when one is coming up. It's quite CPU intensive, so you do it once per file and then save the result.

4. Listening. While the app is listening for speech, you'll need access to raw microphone frame data at 8-16kHz. I have a good algorithm that detects speech against the background noise, so you can use it whilst driving (don't tell Aran !).

5. General UI and audio playback. This should be fairly easy, but correctly switching between audio playback and audio recording, and shutting audio off correctly during all possible events (receiving a call, terminating the app, screen lock etc.) takes a bit of care to get right (on iOS at least).

6. Posting completion to Twitter and Facebook. Easy - just uses platform APIs.

The hardest bits are 3 and 4. The UI is easy, but there's more work there than you might think to handle background downloading and then updating the UI afterwards. Stu should have the graphics from the iOS app and some sample JSON files to get started. I'll start by documenting the data structures and the JSON APIs because you'll need these early on, and then move on to the audio stuff.

Application URLs

For the jQuery Mobile HTML5 landing page that shows on the 2nd tab in the app :-

    #define CONTENT_URL @"http://site.saysomethingin.com/mobile"

For the main course content JSON API :-

    #define COURSES_URL @"https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers"

The logon URL :-

    #define LOGON_SUBMIT_URL @"https://site.saysomethingin.com/api/v1/sessions"

The Landing Page

The landing page will have to change at some point, because we'll probably want a different landing page for Android than for iOS: in the useful external resources section there is an App Store link to a Cymraeg/English dictionary app, which wouldn't make a lot of sense on Android.

The landing page has tailored content if you are logged in - currently just your name, but this will grow to include things like forum posts. It knows you are logged in by checking your session cookie. The cookie is set by the logon operation and, on iOS, this value is remembered, so the WebView automatically inherits it, which is great. From my knowledge of Android, I don't think this happens automatically, so as part of the logon logic you will need to retrieve the cookie value (it's actually the same as the 'token', which we will come to soon) and then, when you start up the WebView, manually add an HTTP header for the cookie with the correct name and token value. This goes on the initial GET request for the landing page URL. Sorry, I don't know what the cookie name is, so you'll have to sniff the traffic to find it.

Every time the app returns to the 2nd tab, it always loads the initial landing page URL - this provides a way to get back to the home page even if you've navigated off somewhere.

There is another useful feature in the iOS app: if you rotate the screen to landscape, the app removes the tab bar so you get full screen - this is useful for watching Hwb videos. It puts the tab bar back if you rotate back to portrait. This was awkward to get working on iOS but hopefully might be easier on Android. And that's all there is to the 2nd tab !

The Logon Process

If the user has supplied a username and password on the settings screen, the app logs on i) before requesting the courses files and ii) every time the WebView is shown.
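The manual cookie header described above might look something like this. This is only a sketch in plain Java: the real cookie name is unknown (it has to be sniffed from the traffic, as noted), so SESSION_COOKIE_NAME below is a placeholder, and the Android-specific call is only mentioned in a comment.

```java
import java.util.HashMap;
import java.util.Map;

public class LandingPageHeaders {
    // PLACEHOLDER: the real cookie name is unknown and must be found by
    // sniffing the traffic after a successful logon.
    static final String SESSION_COOKIE_NAME = "session_cookie";

    // Build the extra headers for the initial GET of the landing page.
    // On Android you would pass this map to
    // WebView.loadUrl(url, additionalHttpHeaders).
    public static Map<String, String> cookieHeaders(String token) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Cookie", SESSION_COOKIE_NAME + "=" + token);
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(cookieHeaders("abc123"));
    }
}
```

Note this only covers the initial GET; if the landing page makes further requests, Android's CookieManager may be a better fit than per-request headers.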
It remembers when it last logged on and skips the process if that was within 1800 seconds (half an hour), to avoid pointless traffic on the server, as the token and cookie values will still be valid. Currently the server never times users out, but a 60 minute timeout may be implemented in the future.

Logon consists of sending an HTTP POST request with the following parameters @{@"email_or_username": username, @"password": password} to the logon URL above (note SSL). This will then return a JSON response containing the following attributes :-

    valid: boolean true or false (indicates whether logon was successful)
    person: string (which I don't use)
    token: string (crucially important, as it needs to be passed on all future data requests - and you may need to set this value in a cookie for the web view for the landing page)

Data Calls

The main entry point into the server API is https://site.saysomethingin.com/api/v1/ ... h-speakers - try calling it from a browser (Chrome with the JSONView extension is good). This lists the available courses that the user is able to access - it knows the user by checking for a HTTP header like this :-

    Authorization: Token token="token_value"

where the token value is the one you received from the logon operation (described in the Application URLs post). Unauthenticated users can access course 1 and the weekly practices. Subscribers can access all 3 courses, plus the weekly and daily practices.

The JSON response consists of various sections, but the main one you are interested in is 'courses' :-

    courses: [
        {
            identifier: "course-1-cyen",
            title: "Course 1",
            access_level: "open",
            url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/courses/course-1-cyen",
            page_url: "https://site.saysomethingin.com/communities/welsh-for-english-speakers/courses/course-1-cyen",
            media_items: [ ]
        },
        {
            identifier: "weekly-practices-cyen",
            title: "Weekly practices",
            access_level: "member",
            url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/courses/weekly-practices-cyen",
            page_url: "https://site.saysomethingin.com/communities/welsh-for-english-speakers/courses/weekly-practices-cyen",
            media_items: [ ]
        }
    ]

This is an array of the courses the user can access. You'll need to parse it (= good JSON parser required) and keep an array of course titles and identifiers for use on the settings screen, where the user selects the active course. Persist this JSON in local storage so the app can work without an internet connection.

For each of the courses in the array, you'll need to fire off another HTTP request (with the Authorization token) for the address in the url: attribute. This will retrieve a list of lessons. Each time the app starts up or becomes active from the background, it checks whether at least an hour has gone by since it last checked, and if so fires off the 'courses' request again. It then iterates through each course to retrieve the lessons and, if any have changed, saves the updated JSON files locally and updates the user interface.
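To make the two request formats above concrete, here is a small plain-Java sketch: form-encoding the logon POST body (field names email_or_username and password are from the iOS app) and building the Authorization header value from the returned token. The class name is mine, and the HTTP client and JSON parsing are deliberately left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

public class SsiwApi {
    // Form-encode the logon parameters for the HTTP POST body
    // (Content-Type: application/x-www-form-urlencoded), sent over SSL
    // to LOGON_SUBMIT_URL.
    public static String encodeForm(Map<String, String> params) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) body.append('&');
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always available", e);
        }
    }

    // Build the Authorization header value sent on every data request,
    // using the token returned by the logon call:
    //   Authorization: Token token="token_value"
    public static String authorizationValue(String token) {
        return "Token token=\"" + token + "\"";
    }

    public static void main(String[] args) {
        System.out.println(authorizationValue("token_value"));
    }
}
```

Parse the JSON logon response for valid, person and token, then pass authorizationValue(token) as the Authorization header on the courses, lessons and mp3 requests.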
I'll cover the individual course detail in the next post. For each course, I have an instance of a Course class with these properties :-

    Course {
        NSString * identifier;
        NSString * title;
        NSString * description;
        NSArray  * lessons;
    }

and each lesson looks like this :-

    Lesson {
        NSString * identifier;     // comes from JSON
        NSString * title;          // comes from JSON
        NSString * notes;          // HTML formatted notes - comes from JSON
        NSString * url;            // of the mp3 file - comes from JSON
        int        listened;       // number of times the lesson has been completed - local persistence
        Boolean    doneSocialPost; // has the user posted completion to Twitter or Facebook - local persistence
        double     position;       // current position in the lesson in seconds e.g. 324.2 - local persistence
        NSArray  * gaps;           // index of gaps in the mp3, always pairs: start and end of gap - local persistence
    }

There is some mapping and logic required to get from the JSON to this, and some validation is needed so that the app doesn't get screwed if a mistake is made on data entry on the server side.

The mp3 files are downloaded to the app's caches directory on iOS, which means they could be automatically removed if the device gets low on storage. Since each lesson is around 15MB in size, storage could well be an issue on low-end Android handsets, as many only have 256MB of internal memory. You will probably need to ensure that these are saved to the SD card, or the user won't thank you for rendering their phone inoperable. I'm guessing there isn't an equivalent of an automatic storage cache on Android, so instead you'll probably need to offer the option to manually delete lessons, which the iOS app doesn't have.
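Alongside manual deletion, a periodic tidy-up pass that removes files no longer referenced by any course (redundant lessons come up again below) could look like this sketch. The ".gaps" extension and the class name are my assumptions; use whatever naming the gap files actually get.

```java
import java.io.File;
import java.util.Set;

public class LessonTidyUp {
    // Delete any .mp3 or .gaps file in the storage directory whose base
    // name is not in the set of identifiers referenced by the current
    // courses. Returns the number of files deleted.
    // ASSUMPTION: gap files use a ".gaps" extension and both file types
    // are named after the lesson's identifier, as described in these notes.
    public static int deleteUnreferenced(File dir, Set<String> referencedIds) {
        File[] files = dir.listFiles();
        if (files == null) return 0;  // dir missing or not a directory
        int deleted = 0;
        for (File f : files) {
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            if (dot < 0) continue;
            String base = name.substring(0, dot);
            String ext = name.substring(dot + 1);
            if ((ext.equals("mp3") || ext.equals("gaps"))
                    && !referencedIds.contains(base)) {
                if (f.delete()) deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) { }
}
```

Run it after each courses refresh, once the set of referenced identifiers is known, rather than on a timer.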
Here's a sample of the course JSON :-

    {
        identifier: "weekly-practices-cyen",
        title: "Weekly practices",
        access_level: "member",
        url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/courses/weekly-practices-cyen",
        page_url: "https://site.saysomethingin.com/communities/welsh-for-english-speakers/courses/weekly-practices-cyen",
        media_items: [ ],
        content_text: "<h1>Weekly practices</h1>Last uploaded on 13th January 2013.<br>",
        lessons: [
            {
                identifier: "weekly-practices-listening-cyen",
                title: "Listening",
                access_level: "member",
                content_type: "lesson",
                url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/courses/weekly-practices-cyen/lessons/weekly-practices-listening-cyen",
                page_url: "https://site.saysomethingin.com/communities/welsh-for-english-speakers/courses/weekly-practices-cyen/lessons/weekly-practices-listening-cyen",
                course_url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/courses/weekly-practices-cyen",
                media_items: [
                    {
                        identifier: "delistenx2-2701783",
                        title: "Southern Welsh Listening Practice",
                        content_type: "audio/mp3",
                        media_type: "audio",
                        file_size: "2734144",
                        notes: "",
                        region: "south-cyen",
                        file_name: "DeListenx2.mp3",
                        url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/media_items/delistenx2-2701783/file_path"
                    },
                    {
                        identifier: "goglistenx2-2701909",
                        title: "Northern Welsh Listening Practice",
                        content_type: "audio/mp3",
                        media_type: "audio",
                        file_size: "2702811",
                        notes: "",
                        region: "north-cyen",
                        file_name: "GogListenx2.mp3",
                        url: "https://site.saysomethingin.com/api/v1/communities/welsh-for-english-speakers/media_items/goglistenx2-2701909/file_path"
                    }
                ]
            }
        ]
    }

I use the course identifier as the primary key: it forms the filename for the local JSON file and is also used in the app to indicate which course is currently active. The title is used as the course title at the top of the lessons list (with the region in brackets) and as the description on the course selector in the settings screen.

Then you want the array of lessons. The lesson identifier is crucially important and is unique across all courses; I use it for the filenames of the mp3 and gap files. If the identifier changes, then you need to download the lesson again, because its content will have changed (e.g. the weekly practices). When this happens you will be left with redundant mp3 and gap files on the device that are no longer referenced by any course - you need a tidy-up job that runs periodically and deletes any unreferenced mp3 and gap files to free up space. The title is used on the lesson list screen and on the play screen.

The next important bit is media_items: these are the actual mp3 files (and associated notes). Depending on which region is active (North [north-cyen] or South [south-cyen]), you choose the appropriate download url and notes to populate your lesson object. If there is no media file for the chosen region, then delete the lesson from the list - this does happen from time to time. When the user requests the download of the mp3 file, remember to use the Authorization token on the GET request, or it will be refused if it's for course 2 or 3. Just for clarity, I use the media item identifier as the primary key for the lesson rather than the lesson identifier. I also check that the media_type is audio before using the media item, just in case other types get added at some point.

Gap Detection

The player needs a 'gaps' file, which is basically an array of doubles (floating point values). Each pair of values corresponds to a gap in the mp3 file, i.e. a lengthy silence intended for the user to be speaking. The first value in the pair is where the gap starts (in seconds) and the second is where it finishes. If listen mode is on during playback, when the player gets to the first value it pauses playback and starts the listener. When the listener has finished, the player advances to the second value and resumes playback, then gets ready for the next pair in the array. So you need the index of gaps before you start playback.

When you download the mp3 file, as soon as the download completes, the app shows an 'unpacking' icon - what is actually happening here is the gap detection that creates the gaps file. The mp3 files consist of stereo audio at either 22kHz or 44kHz - be warned that there are lots of variations in recording levels and quality, which is why it was hard to get this working reliably.
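The playback behaviour just described - pause at the start of each gap, run the listener, then resume at the end of the gap - can be sketched as a tiny scheduler over the pairs array. The class and method names here are my own, not from the iOS app; the actual player and listener are left to the platform audio APIs.

```java
public class GapSchedule {
    private final double[] gaps; // pairs: start1, end1, start2, end2, ...
    private int next;            // index of the next gap start

    public GapSchedule(double[] gaps) { this.gaps = gaps; }

    // Called from the playback progress callback. If playback has reached
    // the start of the next gap, returns the resume position (the end of
    // that gap): the caller should pause the player, run the listener,
    // then seek to the returned position and resume. Returns -1 otherwise.
    public double onProgress(double positionSeconds) {
        if (next >= gaps.length) return -1;
        if (positionSeconds >= gaps[next]) {
            double resumeAt = gaps[next + 1];
            next += 2;
            return resumeAt;
        }
        return -1;
    }

    // When the user seeks, realign to the first gap that hasn't already
    // finished before the new position.
    public void seek(double positionSeconds) {
        next = 0;
        while (next < gaps.length && gaps[next + 1] <= positionSeconds) next += 2;
    }
}
```

With listen mode off, simply never consult the schedule and let playback run through the silences.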
Each 'frame' consists of a 16 bit signed value for the left channel and one for the right channel, and there will be 44,100 of these per second of audio (if the sample rate is 44kHz). The method I use is to compute a 100 period simple moving average (SMA) over the sum of the absolute values of the left and right channels :-

    UInt32 level = abs(buffer[2 * j]) + abs(buffer[2 * j + 1]);

Be careful of numeric overflows when doing this. A simple moving average is just the average of the current value and the previous 99 values, calculated for each frame. This effectively removes any noise and gives you a smoothed level value. Don't calculate this naively for each frame, as re-summing 100 values every time would be very expensive. Instead, keep an array of the last 100 values and a running total: for each frame, subtract the oldest item from the total, add the current value to the total and the array, then just divide the total by 100.

So now you have a smoothed level, and you need to decide whether that level is low enough to designate silence. The app has a 'threshold' value: if the smoothed level falls below it, the frame is considered silence, otherwise it's not. The threshold is recalculated with each frame like this :-

    threshold = min_level + ((max_level - min_level) / 20);

where min_level is the lowest level value seen so far in the file and max_level is the highest. These are the initial values :-

    UInt32 max_level = 0;
    UInt32 min_level = 10000;

OK, so the level has fallen into the silence range. Now we count how many frames elapse before it returns to noise again, and if this count is greater than the minimum pause value :-

    int minPause = (int) (((double) format.mSampleRate) * 2.25);

i.e. 2.25 seconds, then we have found a gap ! You can work out your offset in seconds by dividing the frame number by the sample rate, e.g. if you're on frame 88200 at 44.1kHz then you're exactly 2 seconds into the file. We store the gap as the start offset + 0.25 seconds, and the end offset with no adjustment. If you don't add the 0.25 seconds, you get an annoying clip on the end of Aran's speech. Just repeat until all the gaps have been found in the file.

You need to be careful in the implementation: you can't load all of the frames from the mp3 into memory at once, as it will be too big. So you read them in chunks - I read them in 200 chunks, i.e. just divide the total number of frames by 200 and read that many on each fetch. This also makes it easy to update the progress bar in 0.5% increments.

Good luck, and I can send you a list of gaps for one of the lessons if you want to compare results when you're ready. Knowing that this algorithm works well, the main dangers are numeric overflows and having to deal with some pretty horrible low-level APIs - maybe it's easier on Android.
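Putting the pieces above together, here is a plain-Java sketch of the detector over an in-memory buffer of decoded frames. It is a sketch under assumptions, not the iOS implementation: I've tracked min/max on the smoothed level (the original may use the raw level), I don't handle a gap that runs to the end of the file, and the chunked reading and progress updates are omitted - a real version would feed chunks through the same state rather than take one big array.

```java
import java.util.ArrayList;
import java.util.List;

public class GapDetector {
    // Scan interleaved 16-bit stereo frames and return gap pairs in
    // seconds: [start1, end1, start2, end2, ...]. Uses a 100-period SMA
    // of |left| + |right| (circular buffer + running total, O(1) per
    // frame), an adaptive threshold between the min and max smoothed
    // levels seen so far, a minimum pause of 2.25 seconds, and 0.25
    // seconds added to each gap start to avoid clipping speech.
    public static List<Double> findGaps(short[] buffer, int sampleRate) {
        List<Double> gaps = new ArrayList<>();
        final int period = 100;
        long[] window = new long[period];
        long total = 0;
        int pos = 0;
        long maxLevel = 0, minLevel = 10000;   // initial values from the notes
        int minPause = (int) (sampleRate * 2.25);
        int silentFrames = 0;
        int frames = buffer.length / 2;        // 2 shorts (L, R) per frame
        for (int j = 0; j < frames; j++) {
            // longs throughout, so the running total cannot overflow
            long level = Math.abs(buffer[2 * j]) + Math.abs(buffer[2 * j + 1]);
            total += level - window[pos];
            window[pos] = level;
            pos = (pos + 1) % period;
            long smoothed = total / period;
            if (smoothed > maxLevel) maxLevel = smoothed;
            if (smoothed < minLevel) minLevel = smoothed;
            long threshold = minLevel + (maxLevel - minLevel) / 20;
            if (smoothed < threshold) {
                silentFrames++;
            } else {
                if (silentFrames >= minPause) {
                    double end = (double) j / sampleRate;
                    double start = (double) (j - silentFrames) / sampleRate + 0.25;
                    gaps.add(start);
                    gaps.add(end);
                }
                silentFrames = 0;
            }
        }
        return gaps;
    }

    public static void main(String[] args) { }
}
```

Serialise the returned pairs as the lesson's gaps file, then feed them to the player so it knows when each pause is coming up.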
