This summer feels like it's going to be a scorcher.
I need to restart playing Persona 4 on the PS2. I was about 20 hours into the game when I stopped playing in order to do something else. The game is a time sink but it has great mechanics and a cool story. I dusted the PS2 off but I have not connected it as yet because I have been distracted by 80s monster movies and westerns like HUMANOIDS FROM THE DEEP (1980) and Death Rides a Horse.
In April 2018 I finished NewoFox, a game I wrote in the C programming language on Wii hardware. I am still tweaking it a little bit but for the most part it is done and runs at 60fps. I wrote a postmortem on it as well.
I went to get my car aligned and apparently it needed a whole bunch of front end parts and shocks, which had me doing a lot of online searching for parts since my car is a bit rare in this part of the world. I am still going to need 4 tires because of the damage caused by the bad shocks.
In Jan 2018 I finished the JMGov search project. The next phase of it will be to automate how it figures out that "JPS" == "Jamaica Public Service" with nothing but artificial intelligence. This is going to be a challenge because my current database is 200 MB and my computer is already at its limits.
I have a bunch of new tech articles in draft. I need to get them out before they burn up in the oven. I tend to go too deep in the metaphors, write too much or make nasty spelling errors, so I am taking my time to fine-tune all my articles before releasing them. Current drafts include "Technology, Tax Offices, Uber and Customer-Centricity", "Open Source, AI, Mind Control and Code Slaves" and "Ideas and bullshit circles". I am writing all these in parallel, which kinda makes them take longer, but it is a lot to cover and I have to actually use my brain to write them as opposed to simply clicking the retweet button. I have been slacking off on much of my writing but work/life keeps getting in the way.
I spent a weekend and did a mid-year blog design refresh, 2018v2, as seen in the attached photos. I am going back to a plain white background with big text, almost like CNN. The text is 110% bigger than default and roomier. Despite how simple it might seem, the theme incorporates years of CSS knowledge and looks exactly the same in pretty much every browser in the history of the world.
I need to also start work on a new web analytics system for the blog to replace google analytics but I am not sure when I will jump on that grenade.
I also updated the algorithm for the news portal. Featured articles should be more varied and frequent now. You can also follow it on twitter @ twitter.com/softnewsmag because featured articles and categories are automatically pushed to the twitter feed using the dlvr.it service (free account, limited to 10 posts a day). It also has 500 followers on facebag but I am not sure that counts for anything. I need to get the twitter base up since it's just at 60.
I think that is almost everything I have worked on in the past 6 months and it will probably run me to the end of the year unless I get sidetracked by something more interesting.
In my last article I covered what I did to achieve the first phase of tackling the search problem. When you do not know how big a problem is, it is best to start out with the low hanging fruit so you can make some positive progress really quickly. In this article I am going to go over features that I will need to implement if I ever hope to make a large scale government search engine.
Indexing, words, letters, and phrases
Right now I search the entire database whenever a user enters a search term. This is really fast - for now. I can do this because the database is fairly small (4 MB, 3500+ entries) and limited to a specific set of government websites. If I want to go really large scale, like Caribbean-wide, I would need to set up a persistent index that would allow me to search smaller parts of the dataset. You might ask why I did not do that in the first place, and the answer is that geeks never finish anything because they over-optimize too early. I purposely seek to avoid such progress traps. Indexing is a big problem.
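The persistent index mentioned above would most likely be an inverted index: a map from each word to the set of entries containing it, so a query only touches matching entries instead of scanning everything. The project's actual code and schema are not shown here, so this is just a minimal sketch in Python under my own assumed names:

```python
# Minimal sketch of an inverted index, assuming entries are a dict of
# id -> text. Maps each word to the set of entry ids that contain it.
from collections import defaultdict

def build_index(entries):
    """entries: dict of id -> text. Returns word -> set of ids."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index

def search(index, query):
    """Return the ids of entries containing every word in the query."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    if not sets:
        return set()
    return set.intersection(*sets)
```

A real version would persist the index to disk and tokenize more carefully (punctuation, stemming), but the lookup shape stays the same: one set per query word, intersected.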
Storing common search results
It would save a whole lot of time if I kept a result cache for every search that has already been done, but caching is a sign of weakness. If you start caching too early you might miss potential bugs in your search logic. As the database grows bigger and more processing intensive, there would need to be some form of cache to reduce the amount of re-work that the search engine has to do for common searches.
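The shape of such a cache is simple: key on the normalized query, compute on a miss, evict when full. This is a sketch only; the class and method names are mine, not the project's:

```python
# Minimal sketch of a query-result cache keyed on the normalized
# search term, so repeated common searches skip the expensive scan.
class SearchCache:
    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.store = {}

    def get_or_compute(self, term, compute):
        # Normalize case and whitespace so "JPS  bill" and "jps bill"
        # share one cache slot.
        key = " ".join(term.lower().split())
        if key not in self.store:
            if len(self.store) >= self.max_entries:
                # Evict the oldest entry (dicts preserve insertion order).
                self.store.pop(next(iter(self.store)))
            self.store[key] = compute(key)
        return self.store[key]
```

The eviction here is crude first-in-first-out; an LRU policy (or just `functools.lru_cache` on the search function) would be the more common choice.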
Deep Cross matching and word association
Currently the "do you mean?" tool tip that appears when you get no results uses a sounddex() function to attempt to identify a mis-spelt words. I will have to figure out how to combine words and cross reference all the words that are closely associated so that I can infer meaning without hand picking them myself. This would have to be done on pure many-to-many logic basis or what people call nowadays "Artificial Intelligence" or machine learning. This would mostly be an offline process since I may have to make several trips through the database to map the relationships.
Mining User data/behavior
The common trick nowadays is to monitor which results the user clicks and use that to determine relevance. There are many downsides to this but it helps make the search engine seem smarter than it actually is. But this is only useful if you are getting millions of hits per hour.
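In its simplest form the trick is just a counter per (query, url) pair that nudges the result order. A sketch, with names of my own invention:

```python
# Minimal sketch of click-based re-ranking: count clicks per
# (query, url) pair and sort results by that count, highest first.
from collections import Counter

class ClickRanker:
    def __init__(self):
        self.clicks = Counter()

    def record_click(self, query, url):
        self.clicks[(query.lower(), url)] += 1

    def rerank(self, query, urls):
        q = query.lower()
        # sorted() is stable, so unclicked results keep their original order.
        return sorted(urls, key=lambda u: -self.clicks[(q, u)])
```

As noted above, with low traffic the counts are too sparse to mean much, which is why this only pays off at scale.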
Freshness, ranking, Dates and Updates
Currently whenever I refresh the search index I delete the whole HTTP web page cache. I will need a way to identify when a page has been updated - the time or freshness of a web page. This will bring into play issues with websites that update more frequently and spammy blogs/websites like jis.gov.jm which are engineered not to maintain any form of history or structure. The HTTP web cache currently totals 152 MB on disk across 6000 files. Each file equates to a unique URL; I will have to devise a way to routinely cycle through the most frequently updated websites and gather more data without running out of hard drive space or picking up irrelevant crap.
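One cheap way to detect updates without trusting server dates is to store a hash of each cached page body and only re-index a URL when the hash changes (conditional GETs via `If-Modified-Since`/`ETag` headers are the polite complement when servers support them). A sketch, not the project's code:

```python
# Minimal sketch of change detection: fingerprint each page body and
# only treat the cached copy as stale when the fingerprint differs.
import hashlib

def page_fingerprint(body: bytes) -> str:
    """Return a stable hex digest of a fetched page body."""
    return hashlib.sha256(body).hexdigest()

def needs_refresh(old_fingerprint: str, new_body: bytes) -> bool:
    """True when the newly fetched body differs from the cached one."""
    return page_fingerprint(new_body) != old_fingerprint
```

Storing the digest next to each of the 6000 cache files would let a refresh pass skip rewriting (and re-indexing) unchanged pages instead of deleting the whole cache.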
Multi-Portal entry points
Currently I start crawling at the gov.jm portal only. This simplifies my robot and ensures that it does not go rogue off to a big unrelated website with 1 million links. But eventually I will have to create my own portal because the people that built gov.jm seem to have left it to stagnate, which is the case with many such government brochure websites. New websites like www.nidsfacts.com/ and jamaicaeye.gov.jm/ cannot be found on it at all. I will either have to devise a new way of detecting changes in the gov.jm domain or go deeper into the dark web using some sort of blockchain.
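The "do not go rogue" rule plus a hand-picked whitelist for stragglers like www.nidsfacts.com could look something like this (the whitelist contents and function name are my own assumptions):

```python
# Minimal sketch of a crawl filter: follow a link only if its host is
# under the .gov.jm domain or on a hand-picked whitelist of extra
# portals that gov.jm itself does not link to.
from urllib.parse import urlparse

EXTRA_PORTALS = {"www.nidsfacts.com"}  # assumed whitelist, edit as needed

def should_crawl(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    return host == "gov.jm" or host.endswith(".gov.jm") or host in EXTRA_PORTALS
```

Multiple entry points then just means seeding the crawl queue with every whitelisted portal instead of gov.jm alone, while the filter keeps the robot from wandering off-domain.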
Ranking important pages
I will have to develop a way to rank individual websites against each other in terms of importance, relative to each other as well as to particular keywords, so I can speed up keyword searches by searching the highest ranked sites first. This is different from the indexing mentioned above. The purpose of this is to ensure that relevant sites do not get drowned out by noisy and spammy websites.
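A crude starting point for site-level importance is counting inbound links between the crawled sites, a poor man's stand-in for PageRank-style scoring. This sketch assumes the crawler can emit (from_site, to_site) link pairs, which is my assumption, not a stated feature:

```python
# Minimal sketch of site importance: rank each site by how many other
# sites link to it, ignoring self-links.
from collections import Counter

def rank_sites(links):
    """links: iterable of (from_site, to_site) pairs.
    Returns sites sorted most-linked-to first."""
    inbound = Counter(to for frm, to in links if frm != to)
    return [site for site, _ in inbound.most_common()]
```

Searching sites in this order means the heavily referenced government sites get scanned before the noisy ones, which also gives a natural place to cut off spammy stragglers.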
I am probably missing a few but these are right off the top of my head. The government search engine is not feature complete but people seem to find it useful in its current state. Give it a whirl and let me know how it can be improved: owensoft.net/project/jmgov/
NB. Certain search terms/words like "jamaica" or "andrew" are still difficult to rank because they are either too common or have double meanings.
Kinda late in the year for a redesign, but the old design was over 2 years old and I figured that I would get an early start since I have both the motivation and the time (which is a rare occurrence in life).
This time around I am going for simple, clean grays inspired by xxiivv and v-os.ca. No new mastheads as yet but I might add some in the future if I feel up to it.
I fixed some bugs here and there as well. Registered users can switch back to the old theme if they so desire.
The thing about software is that you never truly finish writing it and it is never in a state of "perfection" except for that moment right after you write a new line of code and no errors pop up. Right after that it starts going downhill as you discover more edge cases and faults.
I am going to go for a simpler setup now, and probably fix some long-time annoyances along the way. Vue.js? Attached are screenshots of the blog as at Oct 2017 using a browser plugin called "full page screen capture".
I have never been big on design or beauty but I have grown to like whitespace over time. As far as the code goes it is very simple. I think I will skip CSS modules this time around. *I notice that the screenshots don't come out well; it might be time to increase the max resolution on my images.