Archive for category internet

Capturing users’ IP addresses in Apache httpd and Tomcat logs behind an ELB

When an Elastic Load Balancer (ELB) handles a connection, your server sees the ELB’s own (internal/private/10.x) address as the source instead of the client’s. The ELB passes the client’s address along with the request in the X-Forwarded-For header. To capture it, you need to log X-Forwarded-For instead of the source IP.
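To make the idea concrete, here is a minimal Python sketch of the same decision a logger has to make: use X-Forwarded-For when it is present, otherwise fall back to the source address. The function name `client_ip` and its arguments are made up for illustration; they are not part of Apache, Tomcat, or any AWS library. It also assumes the left-most entry in the header is the original client, which is only as trustworthy as whatever set the header.

```python
def client_ip(headers, remote_addr):
    """Pick the address to log for one request (hypothetical helper).

    headers:     dict of request header names to values
    remote_addr: the source address the server saw on the socket
                 (the ELB's 10.x address when the request came via the ELB)
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "").strip()
    if not forwarded:
        # No header at all: the request did not come through the load
        # balancer, so the socket address is the best we have.
        return remote_addr
    # The header can hold a comma-separated chain of addresses; this sketch
    # assumes the left-most one is the original client.
    return forwarded.split(",")[0].strip()
```

In Apache httpd itself the matching piece is a LogFormat that uses %{X-Forwarded-For}i in place of %h; the links below cover the actual server configuration.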

Here are 2 links discussing the problem. The first covers a basic Apache & Tomcat setup; the second raises a point about direct access not getting logged and has an Apache httpd specific solution.

http://blog.kenweiner.com/2009/09/amazon-elb-capturing-client-ip-address.html

http://blog.grahampoulter.com/2011/10/how-to-log-client-ip-from-apache-behind.html

Thanks @grahampoulter and @kweiner

 

 


Entrepreneur Links

I’ve always thought of myself as an entrepreneur. My first business was set up at a family reunion when I was a boy, and was a candy distribution network through my cousins. It ended in all of them getting some free candy, and me being out all of my $20 birthday money. I enjoy thinking up business ideas, planning them out, and sharing them with friends. I have not launched many businesses, and my only “real” business venture was to start my own Internet Service Provider (ISP) in 1996. I learned a lot, and ended up selling it to a larger ISP.
Today the barrier to starting a business is lower than ever. There are lots of names for the advantages that modern technology brings new startups, and ‘The Lean Startup’ is probably the one most commonly referred to.

I was browsing makeuseof.com and found these articles that have interesting links for those starting up new businesses.

3 Free Tools to Plan and Visualise Your Start-Up Business
10 Awesome & Inspiring Blogs for Entrepreneurs and Business Owners
10 Best Websites That Show You How to Start a Business


Guest Blogger Brad Zobrist – How he would implement a Bitcasa

First, I think it’s good to clarify what I understand Bitcasa is trying to do and, if I were their architect, how I would do it.

This of course is all guessing and speculation.

They are trying to store all of their clients’ data forever, cache all recently accessed data locally, and predictively cache other data, all while expiring old, infrequently accessed data from the cache.

Features they have or will be implementing are:

  • All data is backed up to a cloud. All data is deduplicated with all other client data
  • All data is NOT accessible by any other client or them
  • Files and / or folders can be shared seamlessly with other users.
  • Local hard drive space is used as a cache by predictive algorithms

Here is how I would implement each one of these to create an overall architecture similar to what I think Bitcasa is providing. At a high level I would deduplicate all data at the block level across ALL clients, have each client encrypt (with the client’s key) the references to the blocks that make up its files and folders, store those encrypted references in a per-user file database or key->value store, compress and encrypt (with Bitcasa’s master key) new, unknown blocks, and then write those blocks back to the Bitcasa cloud storage. Bitcasa would retain a master hash table of all known blocks. Each client would send a list of its block hashes, and the master block hash table would respond, telling the client to send only the new, unknown blocks back to storage, compressed and encrypted.

So, breaking it down, here is how I would address each feature:

  • All data is backed up to a cloud & All data is deduplicated with all other client data.

All un-encrypted blocks (not files) are hashed, and those hashes are sent back to the master hash table at Bitcasa. You then get back a list of which “new” blocks need to be backed up; duplicate blocks can be thrown away.

  • All data is NOT accessible by any other client.

The next level is to take the blocks / hashes that make up a file and create a list of which blocks make up that file, encrypted with the user’s client key. You may even need to take the block size down to the sector level to get away from files that could fit in a single block or segment size.

  • Files and / or folders can be shared seamlessly with other users.

Because files / folders are stored only as an encrypted reference to the blocks that make them up, you simply need to give the new client the list of those blocks.

  • Local hard drive space is used as a cache by predictive algorithms

An additional piece of code monitors which files / hashes / blocks are being accessed and knows whether they’re cached locally or need to be pulled remotely. I believe the predictive part is where most of their patents are, but unfortunately we won’t be able to find out about them for ~18 months; read “Bitcasa gets an early start on IP acquisition”.

Well, that’s what I think.
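The block-level deduplication step described above can be sketched in a few lines of Python. This is only an illustration of the guess, not Bitcasa’s actual design: the 4096-byte block size, the `MasterBlockTable` class, and the `backup` / `restore` functions are all hypothetical names, and compression and both layers of encryption are left out to keep it short.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, picked only for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class MasterBlockTable:
    """Stand-in for the server-side table of every block hash seen so far."""

    def __init__(self):
        self.blocks = {}  # block hash -> stored block

    def unknown(self, hashes):
        """Tell a client which of its hashes the store has never seen."""
        return {h for h in hashes if h not in self.blocks}

    def store(self, block_hash, block):
        self.blocks[block_hash] = block


def backup(data, table):
    """Back up one file; return (manifest, number of blocks uploaded)."""
    blocks = split_blocks(data)
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = table.unknown(hashes)  # only these have to be sent
    uploaded = 0
    for block_hash, block in zip(hashes, blocks):
        if block_hash in needed:
            table.store(block_hash, block)
            needed.discard(block_hash)  # a block repeated in the file is sent once
            uploaded += 1
    # The manifest (ordered list of hashes) is what the client would
    # encrypt with its own key and keep in its file database.
    return hashes, uploaded


def restore(manifest, table):
    """Rebuild a file from its manifest of block hashes."""
    return b"".join(table.blocks[h] for h in manifest)
```

In this sketch, sharing a file with another user amounts to handing over the manifest, since the blocks themselves are already in the shared store.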

Thoughts?


Your IE new tab has been hijacked (but not by me)

The short version is that you are at my site because someone hijacked your new or private tab in Internet Explorer.

Please do not blame me, my web servers have been slammed with lots of traffic from this and it will cost me money, and I did not cause it.

Here is how to fix it. Below is a detailed description of what I think happened.

You will need to use the Windows Registry Editor to fix it, or you can download this registry file and double click on it.

Open Regedit and go to : HKLM\Software\Microsoft\Internet Explorer\AboutURLs
On the right pane double click on the tabs value and change it to : res://ieframe.dll/tabswelcome.htm.

I use Amazon Web Services to host this blog. One of those services is a load balancer called an Elastic Load Balancer (ELB).

I point my site at an Amazon name (www-jonzobrist-com-954435911.us-east-1.elb.amazonaws.com) and Amazon handles the IP addresses and networking. The upside is I get great scalability for very low cost. Sometimes people set up their load balancer incorrectly, and point a hostname (in this case gg.blogpear.com) directly at one of the IP addresses in their load balancer pool. This is wrong because Amazon can change at any time which IP address gets assigned to which load balancer, and they do not guarantee you will ever get that IP back.

Someone, who is probably a malicious hacker type, hijacked the new tab or private tab page in your Internet Explorer browser. They pointed it at the DNS name gg.blogpear.com, and pointed that DNS name at an IP address on Amazon’s ELB. Somehow Amazon gave me that IP for my www.jonzobrist.com pool, so I got all the traffic.

This killed my web servers quickly, and it took me most of the day to recover. I did so initially by setting up rules to return a quick 403 (Forbidden, i.e. permission denied) error to all the requests. Then, as I investigated further, I figured out (I think) what happened. So now you get redirected to this page, and hopefully you will get your computer cleaned up and we can all move on without too much trouble.
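For the technically curious, here is a minimal Python sketch of that kind of “quick 403” rule. It assumes the unwanted requests can be recognised by their Host header, which is an assumption made for this example; the actual rules used on the server are not shown here. The wrapper name `reject_unknown_hosts` and the `EXPECTED_HOSTS` set are hypothetical.

```python
# Assumed list of hostnames this server should answer for (illustrative).
EXPECTED_HOSTS = {"www.jonzobrist.com", "jonzobrist.com"}


def reject_unknown_hosts(app, expected=EXPECTED_HOSTS):
    """Wrap a WSGI app so requests for any other hostname get a cheap 403."""

    def wrapper(environ, start_response):
        # Drop an optional ":port" suffix and ignore case before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host not in expected:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Known hostname: hand the request to the real site untouched.
        return app(environ, start_response)

    return wrapper
```

The point of answering this early is that the expensive part of the site (PHP, the database) never runs for traffic that was never meant for it.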

I recommend you also get a virus scanner, something like Avast, which is free for the non pro version. Download it from Avast.

I would also recommend you download and use Google Chrome and Mozilla Firefox, as they are both more secure (and generally better) web browsers.

I hope this helps!

-Jon Zobrist <jon@jonzobrist.com> http://www.jonzobrist.com/


Snowed in? 15 Things you should be using online CCOD – 9.6.2011

There are a ton of cool things to do on the Internet. New doors are open to everyone. I’m surprised how often we take it for granted that everyone is in on the latest trend in tech. Here is my humble addition to a list of things that I think people should be using online.

 

1. Twitter – News *stream*, or should I say FLOOD. Follow smart people, get smart (filtered) news and info. Want to blow your news mind? Get tweetdeck and put in a search for any hot topic. (Don’t follow #earthquake unless you want to feel constant fear).

2. Facebook – Connect with your family and friends. Be benign on Facebook! The Internet is public, immortal, and Facebook does hate your privacy.

3. Amazon AWS/EC2 – What, you don’t need a virtual server? You sure about that? Not for your blog? Not even if it scales infinitely? Not even if it’s free?

4. WordPress – Joomla and Drupal are cool, but WordPress is the king of the web page CMS.

5. Gmail – Seriously, stop deleting your email, get a gmail account. Use your own domains (Google Apps is still free for < 10 users).

6. Google Docs – If you haven’t had 10 people all editing the same spreadsheet at the same time you have not Cloud’d it up.

7. Cloud Music (Google Music, Amazon Music Locker, iCloud, Soundcloud, Spotify) – This is new, try them all out, find new music, sync your own.

8. Google – Search done right. Everyone has been playing catchup for a while now, and I’m sure that one day they will, but until then, google.com

9. Snopes – The Internet means rapid access to information sharing, but many people share false information. Sites like snopes.com

10. Shopping - Deal sites like slickdeals.net, fatwallet.com, woot and more track deals as they happen, often with good comments on how to maximize them. The people on some of these sites are mad geniuses when it comes to getting the most for your buck.

11. Skype – Everyone has it, get on and video chat your friends in other countries for free. Ride this one until Microsoft torpedoes it, and we all move to Google Chat, which you should be on already via your Gmail account.

12. Linux – If you are even slightly technically inclined, Linux opens the door to you (for free) to everything from high end movie effects to computer forensics. Get started with a Live CD from Ubuntu (your computer is probably 64-bit, and you probably want the desktop version; you can boot the CD and use Linux without doing anything to your computer). And NO, it does not run Office or any Windows program, but it does run thousands of cool programs.

13. Photo sites (Picasa, Flickr, Smugmug) – There is no reason you should be burning a photo CD to send to your friends and family. Get an upload utility, and start putting your photos on the ‘net. You don’t have to share them, and I would highly recommend NOT sharing them publicly unless they are very public information. I do not post pictures with faces in them without permission from the person owning the face, and, in general, don’t do this.

14. Education (Khan Academy, Alison.com, MIT OpenCourseWare, Instructables, k12) – There are too many to name, and pretty much access to infinite information is its own education. Don’t think that just because a skill isn’t directly computer related that you can’t learn how to do it online, and maybe for free.

15. Wikipedia - What is a wikipedia? Well, a wiki is a website that anyone can edit the pages of, so, Wikipedia is an encyclopedia that anyone can edit. Not always right, but rarely uninformative.

 

Well, I hope this helps. Please send me your lists or additions (comment below, or email to jon@jonzobrist.com).

 


Thinking about hosting a WordPress site on S3

I recently moved my Joomla backed consulting website completely to Amazon S3, and have been very happy with the results. I would like to do something similar for my personal blog site at jonzobrist.com, however I would like it to be more dynamic, or at least easily update-able.

For my Joomla site, I did a complete mirror to static html and then uploaded all of that to S3, in a bucket with the same name as the site’s (www.bluesun.net), and changed DNS to point to the CNAME for that bucket’s HTTP address. This involved running wget -r -k -E -p -U Mozilla http://www.bluesun.net, editing the files wget copied to all point at the right places for things like menus, etc, and then uploading the files to Amazon S3.

My goal here is to recreate that in a more automated way, so that I can have a main site that is dynamic, but most, if not all, of the content is served from a static repository on S3. The expected outcome I think will be to take a site that costs around $15-20/month and make it cost < $1 /month. And, if I get some huge surge of traffic, to handle the load gracefully, and scale into the many terabytes of serving up data affordably.

A few quick thoughts/notes:

First, if you don’t set permissions on newly uploaded items on S3, they get your default, which is usually no public access. However, if you upload a new version of a file, it keeps the permissions the previous version had.

Second, you cannot host a naked domain (in this case http://bluesun.net) on Amazon S3. This is more a limitation of the standards, which say you shouldn’t (DNS does not allow a CNAME record at the root of a domain). It means that you need something to redirect your naked domain to your web server. A lot of people don’t do this at all, but I think it’s a good thing to do. I think the details of this limitation will actually come in handy in my hybrid dynamic/static WordPress site.

Third, it makes a lot of sense to compress objects, and setting the right headers on the object will, I believe, get S3 to automatically serve it up in a way a browser can understand. Most of the things that make up web pages (HTML and JavaScript) are text based and compress very well. On the other side, images used on the web are generally already very compressed.
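Here is a rough Python sketch of that third point: gzip the text-based files before upload and record the headers that would need to be stored with each object. The function name `prepare_object` and the `COMPRESSIBLE` list are made up for this example, and the upload step itself is left out. It also assumes every visitor’s browser accepts gzip, since as far as I know S3 simply serves back the stored bytes and headers rather than compressing per request.

```python
import gzip
import mimetypes

# Content types worth compressing (an illustrative list, not exhaustive).
COMPRESSIBLE = ("text/", "application/javascript", "application/json")


def prepare_object(filename, data):
    """Return (body, headers) for one file of the static mirror."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    headers = {"Content-Type": content_type}
    if content_type.startswith(COMPRESSIBLE):
        # Text compresses well; the header tells the browser to gunzip it.
        body = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"
    else:
        # Images and similar formats are already compressed; store as-is.
        body = data
    return body, headers
```

Each file from the wget mirror would go through this once, and the returned headers would be set as metadata on the uploaded object.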

Fourth, having a hybrid site means you will still have some dynamic objects and this will mean manually processing (or manually setting up automated processing) html files to separate dynamic from static content.

Fifth, I’m a huge fan of things like Google Analytics, which are hosted by Google, and only included in my site as a static snippet of code that pulls more code direct from their servers. I would love to have something similar for comments and other user generated content that messes up the static website paradigm. I think technologies like AJAX can really shine here.

Brief background: my site (jonzobrist.com) is a standard WordPress install, currently running on an EC2 Micro Instance running Ubuntu 10.04, with Apache/PHP/MySQL all on one machine. It’s an EBS backed instance, and I snapshot the root volume. I don’t really make updates more than once or twice a week, and none of my content needs to be pushed live in any kind of urgent manner. That said, I use WP to Twitter to auto tweet new posts, so I need to be able to force an update, or handle not having new content on S3 gracefully. I don’t get a particularly large number of visitors, lately about 1,000 a month. My main motivation for doing this is to see if it can be done, so I can do it for other sites I support.

Here is a graphical representation of what I think it will look like when done.

Diagram of a static WordPress site on Amazon S3

Then I just need to push all the very static content to CloudFront for CDN!

What do you think?


Moved my consulting website to Amazon S3

It’s a Joomla site, but I rarely have updated it, so I just made a static mirror of it with wget, then uploaded it to S3!

http://www.bluesun.net

Amazing how easy it is. I want to make either a WordPress plugin or a set of scripts so I can keep my WordPress site dynamic locally (like a stage and master copy), and then, when I want to push updates, have it update a static directory and put the files in S3. Ideally this would also push to CloudFront.

Now my website is up all the time!

In addition, my web site is cheap to run and requires no server (other than a core http://bluesun.net/ redirect, which many DNS hosts will do for free).

Plus how secure is that? Static HTML files on S3? You can download them, but that’s about it, unless you’re trying to hack Amazon, and good luck with that.

I can’t wait to see what it costs for the very few visitors I get to get things straight from S3, and in the future, CloudFront.


Free QR Code Generator, via Gina Trapani @ smarterware.org

Thanks to Gina Trapani at SmarterWare I found a cool, free, 2D barcode generator!

Her article about 2D barcodes is here

And the QR code generator is here

And here is the QR code it generated for my URL!


Amazon keeps doing it again and again! AWS route53

Amazon almost makes me laugh whenever I sign up my small business for one of their “Amazon Web Services” (AWS).

This time, it’s their new DNS hosting called “Amazon Route53”.

Check out the screen shot of the pricing. Seriously? $1/month plus a whopping $.50 per BILLION queries. Seriously? A Billion? Almost 1 in 10 people on the planet would have to make a single DNS request to dent my pocket book a whopping half a dollar?

[Screenshot: Amazon Route53 pricing]

Well, Amazon will indeed have begrudgingly wrested my half dollar from my hand that first month I get my billion queries’ worth.


Dear Google/Twitter/Facebook/Amazon/Microsoft

I think it’s time for a new solution to the problem of collaborative development communication.

I use Twitter for my news source. I believe strongly that it is a more powerful news medium than anything out there. The reason for this is similar to why Google’s PageRank has been so successful. It tapped into the best source of web site judgement at the time, people’s opinions and use of other websites, as shared by their personal websites. The power of this, I believe, was that you had a human intelligence refer to another piece of information in a context that lent partial judgement of value. Twitter is doing this on a much larger scale, and in real time. Twitter is providing collaborative, human-filtered news and information, delivered in a real time stream. This is not simply tapping into the categorical type of filtering by news source, but a categorical type of filtering based on an un-defined and unique personal phenotype. So I’m a tech geek, I follow another tech geek, he/she is interested in cloud computing, but also shares their interests in things that I probably would like, and possibly have not been previously exposed to. I follow this person and get tweets of news, commentary, and highly valuable, pre-qualified connections. Sorry, I get long winded about Twitter as a powerful human filtered news platform. I will write a separate post about this.

Twitter data paths are very one directional, and in a large top down tree/root structure. Replies, comments, and retweets try to be a feedback loop, but for me the implementation isn’t enough of a conversation to consider as an upstream path. That is, even when you’re sending data/info/judgements back “upstream” it really feels like it’s just an alteration on the branch that is put back into the running stream. Twitter data paths are also very transient in that if you don’t see it almost real-time, you miss it. For news, Twitter makes up for this with a shotgun effect: the interesting news is re-filtered and arrives in your stream often enough that you will hopefully see it. Critical news, such as the tsunami, is deemed valid by almost anyone, shared repeatedly, and practically impossible to miss.

I use Skype and chat (Google, AIM, MSN, Yahoo) for real time conversations with people. I have noticed recently that more and more people have picked up on using Twitter type tags when talking directly to someone in a group chat. These kinds of group chats are similar to Twitter’s feed in that they are real time. They differ from Twitter in that they are generally grouped by people and/or topic, usually set up around a topic or team. They are one directional in that they flow forwards, and it is often not simple to graft people into the full stream; that is, newcomers to the conversation only get current and ongoing information. When these groups get bigger, the most common type of Twitter tag is the person’s name, to get their attention. People will say @JonZ in the stream. This creates a topical stream of information flow that is collaborative and easily searchable historically, but lacking as a shared store of information for more than the few people who keep their history and have been in the stream for a long enough time. Skype group chats trump Google Groups type topical discussions because they are more personalized and very much more real time.

I use e-mail (Gmail) for formal communication, for sharing larger pieces of information, and for asynchronous communications where the time scale is hours to days. I e-mail pieces of information to people so they can keep them in their permanent archive, and don’t have to come back and ask me for them. I communicate with vendors, customers, and outside sources primarily via e-mail. E-mail is ubiquitous in the current business world, and for people who are less technically savvy, e-mail is currently the extent of their tools for digital communication.

I use wikis, Google Docs, web pages, SharePoint, svn, git, forums, blogs, and other online http type resources as repositories of information for things that are few write to many read type, or documentation, guides, code, and shared static objects such as presentations, drawings, spreadsheets, documents, etc. These are great for organizing and sharing the rote knowledge gained from collaborative efforts, but lacking for interpersonal communication. Throwing up a chat history does not directly map to a repository like this, and an additional layer of filtering and data processing needs to happen. Often this is done by going over the information and simply organizing and presenting it. There is a high barrier to entry, as people are more reluctant to “publish” their opinions or thoughts than they are to chat them to a colleague.

I reluctantly use the phone. I like voice calls for short, direct communication where nuance is important, or a personal touch is required to avoid problems. I think the telephone call is largely deprecated in today’s business world.

Now, I’m seeing people starting to use Twitter tags in group Skype chats. The chats are topical, say “project 101”. A group of people are hammering out details of the project, or its implementation, or the testing of it. I realize that there is all this valuable, human filtered information in this Skype thread that could be processed and posted to a shared resource like a wiki. Doing that would mean re-reading the chat, summarizing key points, and formatting them for presentation. I’m thinking, I wonder if this discussion could be moved to Twitter. Well, it wouldn’t really apply because everyone would have to put a topic in each tweet like #project101. It wouldn’t work since it’s hard to have a private group conversation on Twitter. The business and legal teams probably wouldn’t be too thrilled if we started sharing our internal data with a 3rd party either. So, I start to wonder about setting up our own, internal Twitter. Seems easy enough, I’ve done and seen html discussion threads. But then we’d end up with a wiki of a Twitter style Skype thread. It wouldn’t be anything more than a cut and paste of a conversation, so we’re missing the summary; all we’re adding is some transparency and sharing amongst the company. Also, we’re only sharing words, links and sometimes files in Skype. What I really wanted was a Twitter for business that ended up automatically as a knowledge-base-documentation wiki thing. I want to capture the vast amounts of information, account for the human intelligence that’s already been applied to get the information up there in the first place, and disseminate it to appropriate parts of the company. Wouldn’t it also be cool if it could include standard objects that could be on one of them cool new HTML 5 web pages, like video, images, drawings, links, and all kinds of connections?
Existing methodologies for web page data classification and extraction could be applied to these, and the human layer of organically creating the content and judging/filtering its value could all be applied to the same data stream.

Now, what would this look like? When I started to think about it, it would probably look like Google Wave, with some kind of data analysis and formatting/presenting layer applied to the streams to turn them into those cool wiki-esque web pages.

I thought Wave would take off when I heard about it. I have a Wave account, and have used it once for actual collaboration. I think the reason it didn’t take off is that it was an over-complicated chat platform that didn’t seduce the tech savvy crowd, since they were already doing things like that, and failed with the less tech savvy crowd because it came off as a confusing replacement for something they didn’t see a point in having.

If I had the time, I would set up a business centric Wave server, have it spit out analyzed information dump / wiki pages, and provide a Twitter like fire hose of all discussions going on in the company.

I think that would be the fastest path to bringing internal corporate communication into a smarter web 2.0 productivity HTML 5 whiz bang. And of course it would need to be buzz word friendly.

I hope this was a clear enunciation of the vision I see.
I was, originally, going to make pictures and drawings and embed them in it, but I think at this point it’s probably more productive to simply share the idea, and let the powerful imaginations out there envision their own version.
What do you think?

Please feel free to comment here, or e-mail me
jon@jonzobrist.com
