Micah Beck

Min Kao Building, Room 433
1520 Middle Drive
Knoxville, TN 37996-225

Email: mbeck at utk.edu
Office: (865) 974-3548
Fax: (865) 974-5483

My vita.
Associate Professor
Dept. of Electrical Eng. and Computer Sci.
University of Tennessee, Knoxville

Does ChatGPT Have Something Up Its Sleeve?

Let's apply information theory to generative AI. My hypothesis is that LLMs like ChatGPT contain within their data structures the information equivalent of a (possibly compressed) version of portions of their training sets. I'm not saying that there is a bitwise copy of any particular portion of the training set. Just that the information content of the parts of the training set on which there is statistical agreement is present in some form.

This hypothesis implies that LLMs are answering questions about this subset of the training set in a manner that is equivalent to choosing a good answer from their training set and then rewriting it to hide its source. Of course, what LLMs actually do is different - they decompose the source into tokens and probabilities and then reconstruct answers in a randomized fashion. So they never actually "look at" an element of the training set in writing their answer. The suggestion is that for this subset, the decomposition and reconstruction is equivalent to looking at the source and rewriting it.

The suggestion is that LLMs are doing something akin to card counting in Blackjack. The card counter looks for situations where their statistical methods enable them to play very well, and is otherwise an ordinary player. LLMs may be dazzling us when they detect situations in which they "know the right answer" because they have its information content stored, and otherwise are passable at being coherent and generally saying relevant things.

Unfortunately, we cannot analyze the stored information in ChatGPT to see if it contains the information equivalent of copies of its training set because we are not allowed to see either the stored data structure or the training set. "Move along, there's nothing to see here!"

How We Ruined The Internet

In this paper we examine an assumption that underpinned the development of the Internet architecture, namely that a loosely synchronous point-to-point datagram delivery service could adequately meet the needs of all network applications, including those which deliver content and services to a mass audience at global scale. We examine how the inability of the Networking community to provide a public and affordable mechanism to support such asynchronous point-to-multipoint applications led to the development of private overlay infrastructure, namely CDNs and Cloud networks, whose architecture stands at odds with the Open Data Networking goals of the early Internet advocates. We argue that the contradiction between those initial goals and the monopolistic commercial imperatives of hypergiant overlay infrastructure operators is an important reason for the apparent contradiction posed by the negative impact of their most profitable applications (e.g., social media) and strategies (e.g., targeted advertisement).

How We Ruined The Internet
Micah Beck, Terry Moore
arXiv:2306.01101, June 2023
Submitted to Communications of the ACM

Breaking Up A Digital Monopoly

The dominating power of today's global data monopolies - most prominently Google, Facebook, and Amazon - has alarmed people around the world. Governments have been moved to seek ways to rein in such monopolies and establish reasonable conditions for competition in the services they offer. Their business models (e.g., targeted advertising) also raise major issues of personal security and privacy, so measures that control their tendencies toward monopoly may also help to address the threats they pose to civil and political liberties. We propose a regulatory strategy that addresses the naturally monopolistic nature of these services by isolating the core acquired data collection and management functions. Acquired data is derived from the discourse of society at large and so the public retains a legitimate ownership interest in it. Our proposal requires companies to compete by innovation rather than through monopolistic control over data.

Breaking Up A Digital Monopoly
Micah Beck and Terry Moore
Communications of the ACM, June 2023, Vol. 66 No. 6, Pages 38-41.

News & Information

The Hedge Podcast Episode 150: Universal Broadband

A discussion of whether a less synchronous form of broadband connectivity could be more cheaply and easily deployed to the entire world.


Recent Publications

Is Universal Broadband Service Impossible?
Micah Beck, Terry Moore
IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS 2022). Denver, CO. Oct 19-21, 2022.

Compact unary coding for bosonic states as efficient as conventional binary encoding for fermionic states
Hatem Barghathi, Caleb Usadi, Micah Beck, and Adrian Del Maestro
Physical Review B 105, L121116, 29 March 2022

Deployment Scalability in Exposed Buffer Processing
Micah Beck
17th IEEE International Conference on Mobile Ad-Hoc and Smart Systems (MASS 2020)
Delhi NCR, India, December 10-13, 2020 (Virtual conference)

IEEE MASS 2020 presentation, December 2020

"On The Hourglass Model"
Micah Beck
Communications of the ACM, July 2019, Vol. 62 No. 7, Pages 48-57.

Communications of the ACM, July 2019

"Interoperable Convergence of Storage, Networking and Computation"
Micah Beck, Terry Moore, Piotr Luszczek, Anthony Danalis
Future of Information and Communication Conference, 14-15 March 2019, San Francisco.

"Data Logistics: Toolkit and Applications"
Micah Beck, Nancy French, Ezra Kissel, Terry Moore, Martin Swany
GOODTECHS 2019 - 5th EAI International Conference on Smart Objects and Technologies for Social Good, 9/25-27/2019, Valencia.

White Papers and Presentations

Exposed Buffer Architecture
Micah Beck
arXiv:2209.03488, September 2022

Universal Digital Services Through Basic Broadband
Micah Beck, Terry Moore
arXiv:2107.12269, July 2021

Cybercosm: New Foundations for a Converged Science Data Ecosystem
Mark Asch, François Bodin, Micah Beck, Terry Moore, Michela Taufer, Jean-Pierre Vilotte
arXiv:2105.10680, June 2021

The Programmable Network Stack
Micah Beck
White paper included in US-Japan Workshop on Programmable Networking
November 16-19, 2020

Exposed Buffer Architecture for Continuum Convergence
Micah Beck & Terry Moore
arXiv:2008.00989, Aug 2020

A discussion on LinkedIn

The Hedge Podcast Episode 27: New directions in network and computing systems
On this episode of the Hedge, Micah Beck joins us to discuss a paper he wrote recently considering a new model of compute, storage, and networking.


Clark's "Funnel" Reconsidered
Micah Beck & Terry Moore
White paper included in the report of the FABRIC Community Visioning Workshop
April 15 & 16, 2020, Chicago IL

Location, Location, Location: The Exposed Buffer Approach to Problems of Data Logistics
Micah Beck & Martin D. Swany
White paper presented at the Large Scale Networking (LSN) Workshop on Huge Data
April 13 & 14, 2020, Chicago IL

"Pervasively Distributed CyberInfrastructure for Yottascale Data Ecosystems"
Micah Beck, Terry Moore
Presented at ASPLOS 2018 Workshop on Inter-disciplinary Research Challenges in Computer Systems, 3/24-25/2018.

"In Case of Rapture, Can I Have Your Data?"
Micah Beck, talk presented at DLF Forum 2017, 3/23-25/2017.

Some Past Projects

The Sea Squirt

So, yes, in common parlance, the sea squirt eats its own brain, such as it is. But since the sea squirt no longer needs its brain to help it swim around or to see, this isn't a great loss to the creature. It needs to use this now superfluous body material to help develop its digestive, reproductive, and circulatory organs.


Classes Recently Taught

Fall 2023

Fall 2022

Spring 2022