Archive for category Uncategorized

Guest Blogger Brad Zobrist – How he would implement a Bitcasa

First, I think it’s good to clarify what I understand Bitcasa is trying to do, and how I would do it if I were their architect.

This of course is all guessing and speculation.

They are trying to store all client data forever, cache all recently accessed data locally, and predictively cache other data, all while expiring old, infrequently accessed data from the cache.

Features they have or will be implementing are:

  • All data is backed up to a cloud. All data is deduplicated with all other client data
  • All data is NOT accessible by any other client or them
  • Files and / or folders can be shared seamlessly with other users.
  • Local hard drive space is used as a cache by predictive algorithms

Here is how I would implement each of these to create an overall architecture similar to what I think Bitcasa is providing. At a high level, I would deduplicate all data at the block level across ALL clients. Each client would encrypt (with the client’s key) the references to the blocks that make up its files and folders, and store those encrypted references in a per-user file database or key->value store. New, unknown blocks would be compressed and encrypted (with Bitcasa’s master key), then written to the Bitcasa cloud storage. Bitcasa would retain a master hash table of all known blocks: each client would send a list of its block hashes, and the master hash table would respond by telling the client to send only the new, unknown blocks, compressed and encrypted, back to storage.

So, breaking it down, here is how I would address each feature:

  • All data is backed up to a cloud & All data is deduplicated with all other client data.

All un-encrypted blocks (not files) are hashed and those hashes are sent back to the master hash table at Bitcasa and then you get a list of what “new” blocks need to be backed up and duplicate blocks can be thrown away.
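The hash exchange described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not Bitcasa's actual design: the 4096-byte block size, the use of SHA-256, and the names `MasterHashTable` and `backup` are all hypothetical, and the compression and encryption steps are left out to keep the sketch short.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the post does not specify one


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block):
    """Hash one unencrypted block; SHA-256 is an assumption, not a known choice."""
    return hashlib.sha256(block).hexdigest()


class MasterHashTable:
    """Hypothetical stand-in for the server-side table of all known blocks."""

    def __init__(self):
        self.blocks = {}  # hash -> block contents

    def unknown(self, hashes):
        """Return the subset of hashes the server has never seen."""
        return {h for h in hashes if h not in self.blocks}

    def store(self, digest, block):
        self.blocks[digest] = block


def backup(data, table):
    """Send hashes first, then upload only the blocks the server asks for.

    Returns the ordered list of block hashes (the file's block reference list)
    and the number of blocks that actually had to be uploaded.
    """
    blocks = split_blocks(data)
    hashes = [block_hash(b) for b in blocks]
    needed = table.unknown(hashes)
    uploaded = 0
    for digest, block in zip(hashes, blocks):
        if digest in needed:
            table.store(digest, block)
            needed.discard(digest)  # a block repeated within one file goes up once
            uploaded += 1
    return hashes, uploaded
```

Backing up a file made of blocks A, B, A uploads only two blocks, and a second client whose file contains block B uploads nothing for it. The returned hash list is what a client would keep (encrypted) as its reference to the file.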

  • All data is NOT accessible by any other client.

The next level is to take the blocks / hashes that make up a file and create a list of them, encrypted with the user’s client key, that records which blocks make up that file. You may even need to take the block level down to a sector level to deal with files that could fit in a single block or segment size.

  • Files and / or folders can be shared seamlessly with other users.

Because files / folders are only stored as an encrypted reference to the blocks that make them up, you simply need to give the new client the list of those blocks.

  • Local hard drive space is used as a cache by predictive algorithms

An additional piece of code monitors which files / hashes / blocks are being accessed and knows whether they’re cached locally or need to be pulled remotely.  I believe the predictive part is where most of their patents are, but unfortunately we won’t be able to find out about them for ~18 months; see “Bitcasa gets an early start on IP acquisition”.
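As a rough illustration of the non-predictive half of that cache, here is a minimal sketch of a local block cache that tracks access, knows whether a block is cached locally, and expires the least recently used block when it runs out of room. Plain LRU eviction is my own stand-in; whatever predictive algorithm Bitcasa uses is unknown to me. The `BlockCache` name and the `fetch_remote` callback are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Local cache of blocks keyed by hash, with least-recently-used expiry."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity          # maximum number of blocks kept locally
        self.fetch_remote = fetch_remote  # hypothetical callback: hash -> block bytes
        self._blocks = OrderedDict()      # oldest access first, newest last
        self.hits = 0
        self.misses = 0

    def is_cached(self, digest):
        return digest in self._blocks

    def get(self, digest):
        if digest in self._blocks:
            # Cached locally: mark it as recently used and return it.
            self._blocks.move_to_end(digest)
            self.hits += 1
            return self._blocks[digest]
        # Not cached: pull it from remote storage and remember it.
        self.misses += 1
        block = self.fetch_remote(digest)
        self._blocks[digest] = block
        if len(self._blocks) > self.capacity:
            # Expire the block that has gone longest without being accessed.
            self._blocks.popitem(last=False)
        return block
```

A predictive layer would sit on top of something like this, calling `get` ahead of time for blocks it expects to be needed.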

Well, that’s what I think?

Thoughts?


Your IE new tab has been hijacked (but not by me)

The short version is that you are at my site because someone hijacked your new or private tab in Internet Explorer.

Please do not blame me, my web servers have been slammed with lots of traffic from this and it will cost me money, and I did not cause it.

Here is how to fix it. Below is a detailed description of what I think happened.

You will need to use the Windows Registry Editor to fix it, or you can download this registry file and double-click on it.

Open Regedit and go to: HKLM\Software\Microsoft\Internet Explorer\AboutURLs
In the right pane, double-click on the tabs value and change it to: res://ieframe.dll/tabswelcome.htm
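If you would rather build the registry file yourself, the sketch below generates one from the key and value named above. It is my own guess at what such a file would contain, not necessarily the same as the download; the function names and the output filename are made up for illustration.

```python
# Key and value taken from the manual steps above (HKLM spelled out in full).
REG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\AboutURLs"
DEFAULT_TABS_URL = "res://ieframe.dll/tabswelcome.htm"


def build_reg_file(key=REG_KEY, value=DEFAULT_TABS_URL):
    """Return the text of a .reg file that resets the "tabs" value.

    The value is written as-is; a value containing backslashes or double
    quotes would need escaping, which this sketch does not handle.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '"tabs"="' + value + '"',
        "",
    ]
    return "\r\n".join(lines)


def write_reg_file(path="fix-ie-tabs.reg"):
    """Write the file with Windows line endings; the filename is hypothetical."""
    with open(path, "w", newline="") as handle:
        handle.write(build_reg_file())
```

Calling `write_reg_file()` produces a file you can inspect in a text editor before double-clicking it.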

I use Amazon Web Services to host this blog. One of those services is a load balancer called an Elastic Load Balancer (ELB).

I point my site at an Amazon name (www-jonzobrist-com-954435911.us-east-1.elb.amazonaws.com) and Amazon handles the IP addresses and networking. The upside is I get great scalability for very low cost. Sometimes people set up their load balancer incorrectly, and point a hostname (in this case gg.blogpear.com) directly at one of the IP addresses in their load balancer pool. This is wrong because Amazon can change which IP address gets assigned to which load balancer at any time, and they do not guarantee you will ever get that IP back.

Someone, probably a malicious hacker type, hijacked the new tab or private tab page in your Internet Explorer browser. They pointed it at the DNS name gg.blogpear.com, and pointed that DNS name at an IP address on Amazon’s ELB. Somehow Amazon gave me that IP for my www.jonzobrist.com pool, so I got all the traffic.

This killed my web servers quickly, and took me most of the day to recover from. I did so initially by setting up rules to return a quick 403 (permission denied) error to all the requests. Then, as I investigated further, I figured out (I think) what happened. So now you get redirected to this page, and hopefully you will get your computer cleaned up and we can all move on without too much trouble.

I recommend you also get a virus scanner, something like Avast, which is free for the non pro version. Download it from Avast.

I would also recommend you download and use Google Chrome and Mozilla Firefox, as they are both more secure (and generally better) web browsers.

I hope this helps!

-Jon Zobrist <jon@jonzobrist.com> http://www.jonzobrist.com/


Don’t forget to register with Amazon to get your free Micro EC2 cloud computer for a year Nov 1 2010!

New customers can get a free micro instance for 12 months starting November 1, 2010.

http://aws.amazon.com/free/

Great way to try out cloud computing for free.
And the micro instance, unlike the small instance, supports a 64-bit OS.
I recommend the official Ubuntu 10.04 LTS AMIs; you can find the latest AMI IDs here, and use them when launching your micro instance.

http://uec-images.ubuntu.com/releases/10.04/release/

A few notes:
Micro instances seem to get low priority for CPU, and seem to hang at times.
You will still pay for services you use over the free amount.
Amazon counts CPU usage by whether the virtual instance is running, not by how many CPU cycles you actually use.
I believe you could have several virtual computers, run them 1 at a time as micro instances, and just pay for your extra EBS storage.
The API is powerful and rocks. Download Amazon’s tools, but note they require Java. I also use this Perl/curl interface to it: http://timkay.com/aws/


LCOD – 5.26.10 – Compare 2 directories

md5sum * | md5sum

This will return an md5sum which will look something like
9277826461d2cb19731f6201c6b2c6b3 -

Run it in 2 directories. If the sums of the sums match, the files (names and contents) are identical.
If not, you may want to rsync between them with something like
rsync -avz -e ssh localdir/ user@remotehost:/remotedir/
or
rsync -avz -e ssh user@remotehost:/remotedir/ localdir/
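For boxes without md5sum handy, the same sum-of-sums idea can be sketched in Python. This is my own rough equivalent, not part of the one-liner: the function names are made up, it skips subdirectories and dotfiles (which the shell's * also leaves out), and because it sorts names its own way the result is only meant to be compared against another run of this script, not against the output of md5sum itself.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """MD5 of one file, read in chunks so large files do not fill memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_fingerprint(directory):
    """MD5 of the "hash  name" listing for every regular file in a directory."""
    listing = hashlib.md5()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.name.startswith("."):
            line = file_md5(path) + "  " + path.name + "\n"
            listing.update(line.encode())
    return listing.hexdigest()
```

Two directories match when `directory_fingerprint(dir_a) == directory_fingerprint(dir_b)`; as with the shell version, a renamed file counts as a difference.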


LCOD – 4.8.05 – Getting rid of Ads/spyware with Squid proxy

Ok, this one is kind of a 3-step process, and I’m not going to put excessive details here about anything. I’ll link to sites that have already done that for us all!

The basic idea is to set up a squid proxy server, and have it use a hosts file that blocks lots of advertising, tracking and malware type sites. Also, I’ll post a quick howto on setting up a thttpd webserver with a simple page, so your redirected ad sites show blank images instead of ugly 404 errors or squid cache errors.

First, set up squid on your Linux box or router. You can run “emerge squid”, “apt-get install squid”, or “yum install squid”, or just follow the instructions and install it manually. Squid can run on your desktop Linux box, with other users on your network manually configured to reach the web through your proxy, or it can run on your Linux/FreeBSD firewall, where you can either force users to use it or use iptables to proxy transparently.

A good quick howto on configuring squid is here

http://www.linuxhomenetworking.com/linux-adv/squid.htm

Note that in squid you MUST set up an ACL (access control list) or you will get permission denied. You do NOT have to force squid users to put in a user/password, but you can if you like. All of this is covered in that quick howto.

Next, go grab the hosts file from here

http://www.mvps.org/winhelp2002/

They update this pretty frequently, so this is something you may want to do monthly to keep the latest losers off of your network. Briefly, what this does is redirect all traffic for known bad people to your local box. We’ll install it on your squid box, so squid will look to a local webserver, where we’ll put up default pages and error pages that make redirected ads look like blank spots instead of ugly errors.
So, grab the zip file and unzip it, and you’ll have a file called HOSTS. Now you can use your favorite text editor (vi?) to copy everything in it to your /etc/hosts file, or you can just type cat HOSTS >> /etc/hosts
Make SURE you use 2 greater-than signs (>>) and NOT 1 greater-than sign (>), since that would overwrite your hosts file. Now, sometimes this gets Windows-style newlines into your /etc/hosts file, and sometimes it doesn’t. Anyone know why? Please post a reply with an explanation.
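One way to sidestep the newline question is to strip any Windows-style line endings while appending. Here is a minimal sketch of that, assuming the downloaded HOSTS file may or may not contain CRLF line endings; the `append_hosts` name is made up, and you would need to run it as root to write to the real /etc/hosts.

```python
from pathlib import Path


def append_hosts(source, target):
    """Append source to target (like >>), converting CRLF/CR line endings to LF.

    Returns the number of lines appended. The target is opened in append
    mode, so its existing contents are never overwritten.
    """
    raw = Path(source).read_bytes()
    text = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if text and not text.endswith(b"\n"):
        text += b"\n"
    with open(target, "ab") as handle:
        handle.write(text)
    return text.count(b"\n")
```

For example, `append_hosts("HOSTS", "/etc/hosts")` does the same job as the cat command above, minus any stray carriage returns.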

So now you have squid set up, and thanks to the /etc/hosts file on the squid server redirecting lots of bad people to 127.0.0.1, you’re seeing lots of HTTP 404 errors or squid cache errors. If you’re seeing 404 errors, you have a webserver running on the squid box. If you don’t want to be running a webserver and didn’t mean to install it, you can probably run /etc/init.d/apache2 stop (or /etc/init.d/apache stop, or /etc/init.d/httpd stop), then go find your runlevel startup dir (/etc/rc5.d or /etc/rc3.d) and remove the link to your webserver, or on Gentoo run rc-update del apache2 default.

Now, if you have apache/your own webserver and you don’t want to disable it you could go figure out how to setup custom 404 error pages and put up the page I’ll make below and be done. Or, if you don’t mind seeing the errors instead of ads you could be done. But I like to keep the pages I’m surfing looking good, so I run a small web server with some simple html files that are blank, here’s what you do..

First install thttpd

http://www.acme.com/software/thttpd/

or use your package manager (emerge thttpd, yum install thttpd, or apt-get install thttpd).
Find the default directory thttpd serves files from, then grab this file and save it as index.html in that directory

http://www.submarinefund.com/lcod/files/blank.index.html

also grab this 1×1 pixel transparent gif image and put it in the same dir

http://www.submarinefund.com/lcod/files/1.gif

Now in the same directory, mkdir errors, cp index.html errors/err404.html, and cp 1.gif errors/
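If you end up repeating this on more than one box, the three commands above can be scripted. The sketch below does the same mkdir and two copies; the `install_blank_pages` name is made up, and the directory you pass in is whatever your thttpd install actually serves from, with index.html and 1.gif already saved there.

```python
import shutil
from pathlib import Path


def install_blank_pages(docroot):
    """Create docroot/errors and copy the blank page and 1x1 gif into it."""
    root = Path(docroot)
    errors = root / "errors"
    errors.mkdir(exist_ok=True)                                # mkdir errors
    shutil.copyfile(root / "index.html", errors / "err404.html")  # cp index.html errors/err404.html
    shutil.copyfile(root / "1.gif", errors / "1.gif")          # cp 1.gif errors/
    return errors
```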

Fire up thttpd (/etc/init.d/thttpd start) and all your ad-supported webpages should have nice blank spots all over them.

You can also feel good that you’re not getting nailed with tracking cookies from big brother monitoring companies.


LCOD – 10.23.04 – DVD rippers / DivX/XviD re-encoders

I’ve always used dvdrip to rip DVDs to AVI
http://www.exit1.org/dvdrip/

I stream them to my ps2 using BroadQ (now gone!)

http://www.broadq.com/

It works with 1 minor, annoying problem. BroadQ doesn’t support lots of the resolutions in use out there, so while I can watch my DVD archive, I can’t watch things like sample videos I download off the net. So I’m installing ripmake, which looks like an awesome tool to handle the job of re-encoding files easily and painlessly.

http://www.lallafa.de/bp/ripmake.html
