WEBVTT
00:00:09.230 --> 00:00:16.730
This is a podcast about One Health: the idea that the health of humans, animals, plants and the environment we all share is intrinsically linked.
00:00:17.201 --> 00:00:21.091
Coming to you from the University of Texas Medical Branch and the Galveston National Laboratory.
00:00:21.440 --> 00:00:22.923
This is Infectious Science.
00:00:22.923 --> 00:00:24.768
Where enthusiasm for science
00:00:25.370 --> 00:00:26.032
is contagious.
00:00:30.580 --> 00:00:33.668
Everybody, welcome to this episode of Infectious Science.
00:00:33.668 --> 00:00:36.485
We are excited to be back for this week's episode.
00:00:36.485 --> 00:00:38.915
So we're going to be talking about infodemiology.
00:00:38.915 --> 00:00:43.875
So we are joined by two guests today: Dr Heather Duncan and Dr Patrick Murphy.
00:00:43.875 --> 00:00:48.345
Heather, would you introduce yourself and tell us how you got into epidemiology?
00:00:49.006 --> 00:00:49.387
Sure.
00:00:49.387 --> 00:01:02.308
So I originally started my career as a literary studies scholar and decided a couple of years ago that I was really interested in public health and wanted to be an epidemiologist.
00:01:02.308 --> 00:01:14.564
So I went back to school and got my master's in public health and I am now finishing up my first year of my PhD in this field, and I initially had no idea that infodemiology existed.
00:01:14.564 --> 00:01:29.144
So I came to the field because of my former interest in studying digital media as a humanities scholar, and then, when I realized that was also happening in epidemiology, I became pretty interested in it.
00:01:29.144 --> 00:01:33.174
So the work that I'm doing now is more focused on infodemiology.
00:01:33.960 --> 00:01:34.703
Very cool, Dr Murphy.
00:01:34.703 --> 00:01:36.168
How did you come to infodemiology?
00:01:37.060 --> 00:01:38.403
Surprisingly through Heather.
00:01:38.403 --> 00:01:44.022
I had never heard of it until she brought it up and I think it's fascinating.
00:01:44.022 --> 00:01:49.346
It has just so many possibilities and we're all about linking things together.
00:01:49.367 --> 00:02:19.135
This is a great way to do that. Yeah, and Patrick and I co-own a science and health communications and medical writing business, so we also came at it from that angle as well. One of the things that we're interested in working on together is using the field of infodemiology to develop something that can hopefully one day be utilized by smaller health departments, basically to do more with surveillance than they're currently capable of doing.
00:02:19.135 --> 00:02:20.097
Gotcha.
00:02:20.236 --> 00:02:21.121
Very cool, I'd say.
00:02:21.121 --> 00:02:29.134
When I first heard of infodemiology, I immediately thought of epidemiology and kind of the study of how diseases are moving through populations and case numbers, things like that.
00:02:29.134 --> 00:02:39.375
So I know, when I originally was talking to you about the episode, I was like: oh, is infodemiology just basically how misinformation can spread in the virtual world, just as pathogens do in reality?
00:02:39.375 --> 00:02:58.143
But it's actually broader than that, and the definition that you sent back to me reflects that. I just want to make it clear to our listeners so that, as we go forward, when we're talking about infodemiology, we're talking about the science of the distribution and determinants of information, particularly via the internet, in a population, with the ultimate aim to inform public health and public policy.
00:02:58.143 --> 00:03:00.626
So can you get into?
00:03:00.626 --> 00:03:02.228
Because there's different terms within it.
00:03:02.228 --> 00:03:03.028
Can you get into?
00:03:08.637 --> 00:03:10.299
But can you get into what is infoveillance?
00:03:10.299 --> 00:03:11.180
Or like, what's an infodemic?
00:03:11.180 --> 00:03:13.822
Yeah, that definition that you just gave.
00:03:13.822 --> 00:03:34.163
I just want to give credit, because he's known as, like, the father of this field, I guess, if you want to use parental terms. Gunther Eysenbach is the one who came up with that definition in the early 2000s, and it's essentially the exact classical definition of epidemiology, which is the distribution and determinants of disease, with an eye towards fixing them, basically.
00:03:34.705 --> 00:03:48.502
But we're specifically interested in digital spaces and you're right, there's a huge breadth, I would say basically everything that epidemiology does as well, and actually even a little bit more.
00:03:48.502 --> 00:03:53.152
I think it's just that we are interested instead of human beings.
00:03:53.152 --> 00:04:11.789
Our unit of study is digital content, basically, and that could be anything from a tweet to a Google search, a unit of information essentially. And within infodemiology, you mentioned these two terms. Infoveillance is a play on the word surveillance.
00:04:11.789 --> 00:04:17.615
So I'll start with infoveillance, because this is where the field started out.
00:04:17.615 --> 00:04:40.081
Well, to tell this more as a story: basically, the earliest infodemiology studies were by people in the mid to late 90s who were starting to notice that websites were beginning to pop up providing health information, and so some of the first questions people had were like: is this good quality information?
00:04:40.081 --> 00:04:43.425
How do we decide whether a website is trustworthy?
00:04:43.425 --> 00:04:49.495
And nowadays, that's such a, I guess, a simplistic question compared to the way our digital landscape looks.
00:04:49.495 --> 00:04:53.841
But that's all it was initially.
00:04:54.002 --> 00:05:01.324
So Gunther Eysenbach, I don't know exactly how he entered the field, but he became interested in these very early studies.
00:05:01.324 --> 00:05:04.192
Except he wanted to ask a different question.
00:05:04.192 --> 00:05:08.990
He wasn't just interested in evaluating the quality of health websites.
00:05:08.990 --> 00:05:14.024
He wanted to know can we somehow use the internet to do disease surveillance?
00:05:14.024 --> 00:05:26.401
Because there's a lot of different things that epidemiologists do, but surveillance is one of the most crucial because it's how we know that a disease outbreak is occurring, and especially if it's something infectious.
00:05:26.401 --> 00:05:30.853
Obviously that's one of the core functions of our public health departments.
00:05:30.853 --> 00:05:40.983
So he wanted to know can we use specifically Google searches to find out if there is an outbreak occurring?
00:05:40.983 --> 00:05:45.658
And he and his colleagues had a chance to really try this out.
00:05:45.879 --> 00:05:51.850
I don't know if you guys remember the swine flu outbreak of 2009, 2010.
00:05:51.850 --> 00:06:10.168
So that was the first time that he really got a chance to apply this idea that maybe we could look at people's search behavior and use that to figure out, faster than our sentinel surveillance networks, whether a flu outbreak is taking hold in a specific community.
00:06:10.168 --> 00:06:19.797
So that's where the idea of infoveillance was born, and those early studies by Eysenbach and his colleagues were pretty successful.
00:06:19.797 --> 00:06:30.973
They validated their data against the CDC's data and they found that it very closely matched the predictions that the CDC was making with their data.
00:06:30.973 --> 00:06:38.112
Except it was much more timely, because when we're talking about the internet, this stuff doesn't take particularly long to process.
00:06:38.112 --> 00:06:46.473
It doesn't have to go through all of the bureaucracy and pass from person to person the way that traditional surveillance does.
00:06:46.473 --> 00:06:55.809
So that was a huge advantage. And at this time, also, this was shortly after 9/11, which was also when the anthrax attacks took place.
00:06:55.809 --> 00:07:02.649
So the Department of Defense was also really interested in this technology and in surveillance specifically.
00:07:03.069 --> 00:07:03.851
Out of curiosity.
00:07:03.851 --> 00:07:15.466
So I guess people are really cognizant now, probably more than we were back in like 2009 or 2010, about how your information is protected and what's private and what's not.
00:07:15.466 --> 00:07:21.911
So it's interesting to me that this is just widely available information, that you could just conglomerate all of these Google searches.
00:07:21.911 --> 00:07:31.142
And when you're doing that, are people Googling like symptoms of flu or like symptoms of COVID, or is it like looking up news articles that are related to it in the local community?
00:07:31.142 --> 00:07:35.382
What were their kind of search features, I guess, to determine: yes, there's an outbreak?
00:07:35.862 --> 00:07:43.863
Regarding news, that's almost like a separate area and I don't know that it would necessarily fall so much under infodemiology.
00:07:43.863 --> 00:08:00.514
But there are communication scholars who monitor media and they use their own algorithms to pull headlines and to analyze data, and infodemiologists do incorporate some of that data, but there's like a whole other branch of study that focuses on mainstream media.
00:08:00.514 --> 00:08:03.225
Yeah, Patrick has some things to say.
00:08:03.225 --> 00:08:09.492
I think about more like modern day concerns about privacy and particularly like where AI is concerned as well.
00:08:09.492 --> 00:08:39.245
But at this time, for the first study that Eysenbach did trying to do this flu infoveillance thing, it was actually really genius. Because nowadays it's very easy. Well, I shouldn't say that; it is and isn't easy to get this data, because there are challenges that I can get into a little bit later. But at the time there was no Google Trends, and there really weren't tools for doing data scraping and then putting it into a usable data set format that you could analyze.
00:08:39.245 --> 00:08:47.605
So what Eysenbach did, which I think is just genius.
00:08:47.605 --> 00:08:49.090
I really admire the way that this study was designed.
00:08:49.090 --> 00:08:50.615
So what he did was he had a pretty small budget.
00:08:50.615 --> 00:09:07.504
I want to say it was somewhere around $500 or something, and he bought a Google ad. When you buy a digital ad, and I'm not sure what it's called now, but I think back then it was called Google AdSense, you could target specific geographical regions.
00:09:07.504 --> 00:09:19.082
There were some rudimentary things you could do to try to get to your audience, and so he set those all very general, and then he used as his metric the number of clicks.
00:09:19.082 --> 00:09:32.573
So it was an advertisement that said something about flu resources or what to do if you have the flu, and I think that when people clicked on it it just took them to like a WebMD type site or something like that.
00:09:32.573 --> 00:09:36.571
So it was harmless, basically, if people clicked on this ad.
00:09:36.571 --> 00:09:52.682
But he used the number of clicks in different areas and then was able to again validate that data against the CDC's, and found that, actually, yeah, the places where people were clicking were the places where flu was known to be in high circulation.
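The validation idea described here, comparing regional click counts with official surveillance numbers, can be sketched in a few lines of Python. This is a hypothetical illustration, not the study's actual code: the weekly click counts and CDC influenza-like-illness (ILI) rates below are invented.

```python
# Hypothetical sketch of validating click-based infoveillance against
# official surveillance: correlate weekly ad-click counts for one region
# with CDC influenza-like-illness (ILI) rates. All numbers are invented.
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

ad_clicks = [120, 150, 310, 560, 480, 300, 180]  # clicks on a flu-info ad
cdc_ili = [1.1, 1.4, 2.9, 5.2, 4.6, 2.8, 1.7]    # % of visits for ILI

print(f"r = {pearson(ad_clicks, cdc_ili):.2f}")
```

A coefficient near 1 is the kind of close agreement with CDC data that made those early studies convincing.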
00:09:53.144 --> 00:10:03.091
Today, as I said, it's a little bit easier to do things, but also more challenging in some ways as far as privacy concerns.
00:10:03.091 --> 00:10:15.187
A lot of infodemiology nowadays uses social media, because this is unfortunately or fortunately, I guess, depending on your perspective, the number one source of health information for Americans.
00:10:15.187 --> 00:10:28.729
And with social media, unless you have specific privacy settings on your account so that only people that you've approved can see your messages, that stuff's public like it's out there.
00:10:28.729 --> 00:10:48.491
There's nothing to stop anyone from collecting that data at any time, which might make some people uncomfortable, to which I would say: check your privacy settings. But for the most part, there aren't really a ton of privacy concerns, particularly with infoveillance, because anybody can theoretically go and look at that stuff.
00:10:48.879 --> 00:11:05.293
The thing that's challenging, though, is that you need a tool to collect that data, because you can't just have a person sit down and scroll through a Twitter feed and pull whatever tweets they come across that you think relate to the flu or COVID or whatever.
00:11:05.293 --> 00:11:07.745
I mean you could, but it wouldn't be very effective.
00:11:07.745 --> 00:11:10.091
You need an algorithm to go through.
00:11:10.091 --> 00:11:11.462
It's called data scraping.
00:11:11.462 --> 00:11:36.366
You need something to scrape that data from whatever part of the internet you're interested in, but the problem is that tools for doing that tend to come and go very quickly, and that makes it difficult to replicate or reproduce these studies, because you often can't use the exact same tool for data scraping that another scholar used when they published their study.
00:11:36.366 --> 00:11:52.293
You're already introducing new elements, and a lot of this stuff is very black boxed because it's proprietary and companies don't want people to know how their algorithms work, so that can make it very challenging for the infodemiologist.
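The keyword-based collection step described here can be illustrated with a toy filter. This is not any real scraping tool's API (those come and go, as noted); the posts and keyword list are invented, and real pipelines would tokenize rather than match raw substrings.

```python
# Toy illustration of keyword filtering over already-collected posts,
# the step a pipeline performs after scraping content from a platform.
# Posts and keywords are invented; substring matching is deliberately
# simple (so "flu" also matches "influenza").
FLU_KEYWORDS = {"flu", "fever", "chills", "body aches", "cough"}

def flag_posts(posts, keywords=FLU_KEYWORDS):
    """Return the subset of posts mentioning any tracked keyword."""
    return [p for p in posts if any(kw in p.lower() for kw in keywords)]

posts = [
    "Woke up with chills and body aches, staying home today",
    "Beautiful weather in Galveston this weekend!",
    "Is it too late to get a flu shot?",
]
for p in flag_posts(posts):
    print("FLAGGED:", p)
```

In practice the hard part is the scraping itself, which is exactly the replicability problem described above: swap the collection tool and you change what reaches this filter.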
00:11:52.919 --> 00:12:03.205
I think that's all really cool, and not something that I knew about. And I think it's particularly interesting that certainly the apps that we use for social media have changed greatly, and are people even Googling things anymore?
00:12:03.205 --> 00:12:09.542
Are they just searching them on TikTok or something, and so I think that the way that you collect that information probably has to change with that.
00:12:09.542 --> 00:12:18.864
So I could see that different apps would also introduce differences, or you're going to have different information available on them, based on what people are doing or what they're interested in using them for.
00:12:18.864 --> 00:12:20.990
It is pretty scary to me to think about that.
00:12:20.990 --> 00:12:25.145
Social media is like the number one source of health information for Americans.
00:12:25.145 --> 00:12:29.193
That's really, if you just take a second to think about it, it's really wild.
00:12:29.193 --> 00:12:34.683
And so then you're talking about how this data is collected and how it's scraped, and that this is just public information.
00:12:34.683 --> 00:12:39.163
It's out there, people can use it, but what then are infodemiologists looking for?
00:12:39.163 --> 00:12:43.691
What are they interested in scraping from these platforms to then draw conclusions from?
00:12:44.211 --> 00:12:44.532
Yeah.
00:12:44.532 --> 00:12:59.922
So, like I said before, infodemiology as a field is pretty much as wide as epidemiology, although I will say there are certain areas that are overrepresented, not in a bad way, just that that's where a lot of these techniques have been developed.
00:12:59.922 --> 00:13:17.673
So I would say, as far as what infodemiologists are doing: they're doing everything from trying to predict disease outbreaks, like we've discussed before, to things like how people are reacting to health guidance, right?
00:13:17.673 --> 00:13:19.743
Are they having positive reactions?
00:13:19.743 --> 00:13:21.365
Are they having negative reactions?
00:13:21.365 --> 00:13:22.485
Are they confused?
00:13:22.485 --> 00:13:33.245
Trying to study, and this actually, I think, does get into a slightly more controversial area of this field, whether we can screen people on social media for things like suicide risk.
00:13:33.687 --> 00:13:45.929
And then also there's a whole area now that is interested in, I use the term mis-disinformation because it's just faster to say, I think, looking at how information flows through social media.
00:13:45.929 --> 00:13:49.928
Who is primarily responsible for creating disinformation?
00:13:49.928 --> 00:13:52.576
Who's magnifying it, right?
00:13:52.576 --> 00:13:58.715
So again, we're interested in human behavior and human reactions to this stuff.
00:13:58.715 --> 00:14:10.034
And we're also interested in can we take people's behavior and what they're saying about themselves and what they're searching for and predict what sort of diseases or risks they might be at and that sort of thing.
00:14:10.535 --> 00:14:16.413
So I've looked at some systematic reviews that have tried to divide up the field, to look at who's doing what research.
00:14:16.413 --> 00:14:18.354
What's the bulk of it focused on.
00:14:18.354 --> 00:14:37.115
A lot of it still is focused on flu, which is kind of where it started, and of course during COVID that expanded to cover COVID as well. And then there are people interested in studying health communications and trying to use data from social media to determine what's most effective.
00:14:37.115 --> 00:14:45.078
That's, I think, a very small area, but one that is definitely growing and that has received more attention because of the COVID pandemic.
00:14:45.078 --> 00:14:55.895
So, yeah, there's really all kinds of neat things, and there's definitely a huge overlap in infodemiology with social science, because a lot of these things are concerns of social scientists as well.
00:14:56.456 --> 00:15:07.817
So can we talk a bit about accuracy for any of these for predicting flu or for predicting mental health states, because I could see that certainly what a wonderful tool if you can improve health with this.
00:15:07.817 --> 00:15:19.913
But I could also see this sort of having more of a negative side if we don't get it right and so like, then there's complications if you don't get it right, and that's true for any science, right?
00:15:19.913 --> 00:15:19.993
Not
00:15:20.013 --> 00:15:20.192
just this.
00:15:20.192 --> 00:15:21.436
So could you talk about what that looks like?
00:15:21.436 --> 00:15:43.952
Yeah, so as far as accuracy goes, I mean, there's a big cautionary tale here as well regarding Google Flu Trends, which was sort of an example of where things fell apart and didn't work the way that they were supposed to. But in terms of accuracy, it's pretty good. Scary good, to be honest.
00:15:43.952 --> 00:16:01.981
So this field has really been in existence for, depending on when you want to say it started, anywhere from 25 to almost 30 years. So it's still relatively new, but we have enough of a body of literature at this point to be able to say that, yeah, especially with things like respiratory diseases:
00:16:01.981 --> 00:16:07.066
We can get it pretty close to what our traditional surveillance methods are telling us.
00:16:07.066 --> 00:16:14.144
Those have their problems too, but this stuff has been validated again and again and it's pretty decent.
00:16:14.446 --> 00:16:22.182
I think that where people start to get uncomfortable, and the mental health stuff is a big area where this starts to come up,
00:16:22.182 --> 00:16:31.573
for me and a lot of people who work in the area of, like, psychiatric epidemiology, is that it's not so much about being accurate as it is this:
00:16:31.573 --> 00:16:37.303
What do we do once we've identified someone who's say a suicide risk?
00:16:37.303 --> 00:16:50.143
Because it's one thing to be able to say okay, this syntax or these terms appearing in a social media post, or maybe even the frequency of posting or who they're engaging with online.
00:16:50.143 --> 00:17:00.077
We know that these might be markers for suicidal ideation, but once you've done that, are there then resources to connect that person to?
00:17:00.118 --> 00:17:01.662
Can we do something about it?
00:17:01.662 --> 00:17:06.681
Because if we can't do something about it, it's almost like an ethical breach.
00:17:06.681 --> 00:17:08.231
What do you do with that information?
00:17:08.231 --> 00:17:15.074
Is it actually helpful, or is this purely an academic exercise and then we can't actually do anything to make the problem better?
00:17:15.074 --> 00:17:32.500
So that's where I think the big questions still are, and, of course, we're now in this age where AI is being integrated with everything and AI is making infodemiology even more accurate, but there are also some things that are a little bit uncomfortable about that as well.
00:17:32.990 --> 00:17:49.698
So I'm definitely curious to learn more about how AI is changing infodemiology, as far as accuracy or just the sheer amount of information that can be scraped, because that's a whole other thing: people utilizing a platform and then us being able to look at that user-generated data.
00:17:49.698 --> 00:17:54.618
But I'm curious with this idea of connection to resources, particularly what you're saying with mental health.
00:17:54.618 --> 00:18:05.355
So I worked as a peer counselor at Cornell University, where I did my undergrad, because there was just a huge lack of access to care, and so a student organization formed where you actually trained for two years.
00:18:05.355 --> 00:18:14.217
Then you were certified as a peer counselor. Because in undergrad there's certainly a large amount of change going on and things like that, being able to address peers' mental health concerns was really important.
00:18:14.618 --> 00:18:24.377
But I could definitely see where so you're analyzing this you're saying like this syntax or this particular pattern of engagement might be more indicative of something like suicidal ideation.
00:18:24.377 --> 00:18:30.656
I could see there being this disconnect of struggling to connect someone to resources and then being like how did you find this out?
00:18:30.656 --> 00:18:44.511
Your data was great, and I could see someone feeling very violated by that, and so that's such a quandary that I hadn't considered that like you might be able to say you're at risk for this or that, whether it's flu or whether it's suicide, but then how do you connect people to resources?
00:18:44.511 --> 00:18:48.582
And I think that's always this like perennial, like wicked problem in public health.
00:18:48.582 --> 00:18:53.557
You can know something but like, how do you help to actually be part of the solution and resolve?
00:18:53.637 --> 00:18:53.698
it.
00:18:53.857 --> 00:18:55.101
Or is it that just now,
00:18:55.101 --> 00:18:57.384
this is coming up in the field, which is relatively new,
00:18:57.384 --> 00:19:00.713
being 25, 30 years old? Is there now this kind of:
00:19:00.713 --> 00:19:03.320
we know this, but how do we actually make the connection?
00:19:03.662 --> 00:19:11.144
Yeah, and as far as I know, I don't think anyone really has a perfect answer for that.
00:19:11.144 --> 00:19:30.884
To get really dystopian: I know that there have been attempts at having, rather than a human person, an AI chatbot reach out to someone who seems to be at risk, and I know that there have been studies looking at whether therapy can be done that way.
00:19:30.884 --> 00:19:41.155
I think there's a lot of excitement right now about artificial intelligence and about big data, and I think the excitement is maybe a little bit premature in a lot of ways.
00:19:41.155 --> 00:20:20.500
I think that there's a lot of things that we can do, though that's not really my area of expertise, although I do work with some people that are in that area. But I think that the infodemiology studies that have been done with, specifically, mental health, I believe a majority of those have been done with people who knew they were in the study, so they basically signed up to have their content monitored by a researcher.
00:20:20.780 --> 00:20:21.682
That makes me feel better.
00:20:21.769 --> 00:20:23.513
Yeah, and so I don't know.
00:20:23.513 --> 00:20:31.202
I mean, there probably are studies out there that are just looking at whatever people put on Facebook or Twitter or Instagram or whatever.
00:20:31.202 --> 00:20:35.256
But yeah, and I think also this is slightly off topic.
00:20:35.256 --> 00:20:49.050
But another thing that gets more complicated too is when platforms like TikTok and YouTube are also big sources of health information and places where people are connecting with each other to talk about things like mental health.
00:20:49.050 --> 00:20:53.843
But there's more nuance because you've got a video element in addition to a text element.
00:20:53.843 --> 00:21:05.603
And that's again where I think AI comes in, because I think we're going to begin relying on AI more and more to try and interpret things like visual cues and body language and things like that.
00:21:05.769 --> 00:21:12.218
Because there's such a rich data environment on those platforms, a lot of things can get lost.
00:21:12.218 --> 00:21:25.278
Like, sarcasm is notoriously difficult for artificial intelligence to process and understand. And the more times that you scrape data, it then has to be stored somewhere, and people are going to be accessing it.
00:21:25.278 --> 00:21:31.701
Every time people access stuff, there's a risk of someone that you don't want accessing that stuff getting to it.
00:21:31.701 --> 00:21:44.953
So I think that we are likely to see some sort of like major event where people's health data unfortunately gets breached and leaked and like the consequences of that could be pretty far reaching.
00:21:44.953 --> 00:21:50.934
So it's something to watch, because I'm sure we will be using that technology to do those things.
00:21:50.934 --> 00:21:55.298
I think it'll be interesting to see the directions that this stuff goes in the next like decade or so.
00:21:57.211 --> 00:22:21.366
I've been in some classes where that's been discussed as, like, a bioethics conundrum: yeah, people are at risk, so you can treat them and prescreen them, but also maybe their insurance no longer wants to cover them because they're at risk, right? I think sometimes, actually a lot of times, technology moves faster than legislation, and if you don't have laws around how something is being used, it's essentially very unregulated, and that's certainly probably riskier in the long run for us than having some type of regulation around it.
00:22:21.366 --> 00:22:27.362
But there's also that aspect of once you start regulating things, there's probably less growth on like how far you can go and do things.
00:22:27.362 --> 00:22:39.180
I'm kind of curious to touch more on something. With what you're talking about with infodemiology, when I think of something like gathering user-generated data, even if you're using something like AI,
00:22:39.180 --> 00:22:44.326
The data sets it's trained on really are what's super important for what it's going to catch for nuance.
00:22:45.111 --> 00:22:50.063
Is that being addressed in infodemiology, like the cultural relevance of what someone searches? I grew up in New York.
00:22:50.063 --> 00:23:12.239
Someone in New York probably is searching on what happens if I have the flu and clicking on an ad, but there's places where that's not necessarily happening, right? The content you're generating online, or the searches you're generating, is different depending on where you're from and how you've been brought up using it. Or I can even think of a generational difference in how technology is used.
00:23:12.239 --> 00:23:15.657
So is that something that also is being factored into infodemiology?
00:23:15.657 --> 00:23:20.843
Like this almost cultural relevance of: how good is the data we're getting?
00:23:21.450 --> 00:23:25.980
Yeah, it's tricky because it sort of depends on your research question, right?
00:23:25.980 --> 00:23:30.809
I mean, I think maybe a good example of this would be like the way that people express their symptoms.
00:23:30.809 --> 00:23:36.583
That might vary according to, like, culture and age group and maybe education level.
00:23:36.583 --> 00:23:44.442
As far as I know, that is where I think the human side of this has not been replaced, right?
00:23:44.442 --> 00:23:48.817
Yes, ais can be trained on multiple different data sets.
00:23:48.817 --> 00:23:57.492
They can learn over time, but as far as designing studies, it's still human beings that are sitting down for the most part and going.
00:23:57.633 --> 00:24:01.306
Okay, what are the ways that people talk about being sick?
00:24:01.306 --> 00:24:08.541
Are young people using new slang words to talk about, like sneezing or coughing, or you know?
00:24:08.541 --> 00:24:41.932
And I think that there's also a very strong awareness among people in this field that if you are not someone who is, like, chronically online I forget the exact term that people use for that but if you're not someone who's really submerged in digital spaces, certain areas of this field are going to be very challenging for you to break into and to do really good work in, because you really do have to know what's happening in online spaces, and I think a lot of us too are probably like what you would call lurkers.
00:24:41.932 --> 00:24:47.323
Like we're people that sort of sit in the background and watch what other people do.
00:24:47.569 --> 00:24:48.875
The people watchers of the internet.
00:24:49.450 --> 00:24:52.436
Yeah, exactly, and I know that's my own position.
00:24:52.436 --> 00:24:59.065
I have a lot of social media accounts and I spend a lot of time on social platforms, but not necessarily engaging.
00:24:59.065 --> 00:25:04.419
I really am just watching and reading and listening, trying to get some of my own insights.
00:25:04.419 --> 00:25:06.871
But, yeah, and I think also, Camille,
00:25:06.911 --> 00:25:26.430
Another thing that question raises for me is the issue of whether a sample taken online is ever truly representative of the population, and initially, especially in the early days, that was a huge concern because not everyone had access to the internet not in the United States, certainly not in the world.
00:25:26.951 --> 00:25:44.400
But I think that with time passing and with the internet becoming this obligatory part of life that you need to be online in some capacity, that concern has decreased a bit because even in parts of the world with less internet coverage, people are finding ways to get online.
00:25:44.400 --> 00:25:46.630
A lot of developing nations are trying to get online.
00:25:46.630 --> 00:25:57.163
People will primarily access through their phones or through a mobile device of some sort, so I think that aspect of it is becoming less of a concern with time.
00:25:57.163 --> 00:26:14.996
But also, as with most research, english language content is vastly overrepresented in this field as well, although again, that's also changing, because one thing that AI is really good at is learning different languages and being able to process content in different languages.
00:26:14.996 --> 00:26:32.186
So there are increasingly more studies that are using not only English language content but other languages as well, and so I think that we are improving our ability to capture a true signal in that regard.
00:26:32.186 --> 00:26:33.270
But it's certainly still a concern, I would say.
00:26:33.792 --> 00:26:47.962
And so, speaking of the current context and how it has started to shape infodemiology: something that we think about a lot in science and health spaces is these mis- and disinformation campaigns that are going on, right?
00:26:47.962 --> 00:26:50.784
Some of this is intentional.
00:26:50.784 --> 00:26:56.259
So what can infodemiology do to teach us about what's going on, but also to help intervene?
00:26:56.259 --> 00:27:09.298
Right, because this is really impacting health outcomes for a lot of people, and your access to information and whether or not that information is valid or perceived as trustworthy to you matters a lot when it comes to your health.
00:27:09.319 --> 00:27:20.336
Yeah, this is one of my big interests as well: what can we do, as people or as public health professionals and epidemiologists, to counteract this?
00:27:20.336 --> 00:27:26.513
And I think one thing is that we really need to understand what we're dealing with better than we currently do.
00:27:26.513 --> 00:27:39.997
There's a study that came out during the COVID-19 pandemic that was really interesting, and it was produced by the Center for Countering Digital Hate.
00:27:39.997 --> 00:27:50.730
But it was this study where they basically tracked down who was responsible for producing a majority of the disinformation that was going around.
00:27:50.730 --> 00:28:04.036
I think it was Twitter-specific, not 100% sure, but they found that there were essentially 12 people behind 65% of the disinformation that was circulating on the internet.
00:28:04.036 --> 00:28:06.000
Wow, yeah.
00:28:06.361 --> 00:28:40.461
And also, to go back to the flu infoveillance stuff: the paper I'm thinking of is by Chew and Eysenbach, but they did some analysis of mis- and disinformation then as well, because that was where that started. And they found that, although there was a perception among internet users that there was this vast amount of misinformation going around, when they actually analyzed content, only about 5% was flagged as mis- or disinformation.
00:28:40.962 --> 00:29:08.433
So that's not to say that the problem today isn't much bigger, because I think it is, and I think that, whereas back during the swine flu H1N1 pandemic, I don't think there was quite the same degree of intent, right, I think nowadays there are bad actors, so to speak, that are very much intentionally cranking out content that is false and that is misleading and that serves to divide people and cause people to turn against each other.
00:29:09.055 --> 00:29:30.031
But I think that we really need to do more research to try and, first of all, just characterize the problem, because without that study that found 12 people were responsible for the majority of the content going around, this problem would seem really huge and amorphous, like there's not much we can do to get our hands on it.
00:29:30.172 --> 00:29:35.123
But actually, if you target those 12 people, maybe there is something that you can do about it.
00:29:35.123 --> 00:29:45.770
Yeah, I think, unfortunately, we're going in the wrong direction right now, like with Meta recently announcing that they're no longer going to do fact checking on their platforms.
00:29:45.770 --> 00:29:52.723
But I do think that we have some models as far as this goes for fighting back against it.
00:29:52.723 --> 00:30:16.780
I can get into that a little bit more if you guys are interested, but it's a little bit beyond the purview of infodemiology specifically, which to me is more about characterizing what the problem is, how big it is, and what types of mis- and disinformation there are. And I think from there we can begin to develop interventions and then test those interventions to see what works and what doesn't.
00:30:17.309 --> 00:30:19.695
Yeah, no, I think that's really powerful.
00:30:19.695 --> 00:30:20.778
This is really cool.
00:30:20.778 --> 00:30:29.596
I knew nothing about infodemiology, so this has been very cool to learn about it, and so what in particular is your project within infodemiology?
00:30:29.596 --> 00:30:30.439
What are you working on?
00:30:30.829 --> 00:30:48.813
What I'm interested in doing, and what I would really like to focus on in the next few years: I think I mentioned earlier that one of my big concerns is everything that's going on with the transition to the new administration, and just the general trend over the past five to ten years.
00:30:48.813 --> 00:31:17.444
I think that public health agencies as a whole need to prepare for the fact that there may be less and less government support for what they're doing, and a lot of these traditional surveillance activities are very manpower- and time-intensive; they take a lot of resources and staff. This is a problem in New York State for our local and city health departments: they get tons of data.
00:31:17.444 --> 00:31:26.112
There is no shortage of data out there on every type of health problem that there is, but there aren't people to process and analyze that data.
00:31:26.112 --> 00:31:30.240
There's a lack of people with the time and a lack of skill as well.
00:31:30.240 --> 00:32:03.398
Especially at the local level, they're being asked increasingly to do something with it, and so I really want to develop tools that are automated and easy to use, that would incorporate infodemiology into other forms of surveillance that are already validated and established. Wastewater monitoring, specifically, is one of them, because wastewater monitoring also has its own set of challenges and problems and quality issues.
00:32:03.789 --> 00:32:08.342
But I'm interested in finding these sort of passive sources of data.
00:32:08.342 --> 00:32:32.923
Now, I know wastewater monitoring isn't passive per se, but infodemiology can be a very passive activity if you have the automation and the algorithms and the tools set up. And so I really want to develop data dashboards that these smaller health departments and health agencies could use, and it doesn't have to only be public health; nonprofits, even private entities, could use these tools.
00:32:32.923 --> 00:32:43.756
But I want to develop things that basically take these exciting new technologies that we have and make them accessible and put them in the hands of people who can use them for good.
00:32:43.756 --> 00:32:52.636
So that's our goal as an organization and that's hopefully what I will be working on for the next couple of years as I finish up my PhD.
00:32:53.369 --> 00:32:54.071
That is very cool.
00:32:54.071 --> 00:32:56.579
We'll have to say we interviewed you before you got famous.
00:32:56.579 --> 00:32:58.214
That is really cool.
00:33:00.053 --> 00:33:03.561
Oh gosh, I don't want to be famous, especially not on the internet, please.
00:33:05.309 --> 00:33:07.857
That's why we don't have video recording yet for our podcast.
00:33:07.857 --> 00:33:10.751
We're not ready to be perceived.
00:33:10.751 --> 00:33:11.816
Thank you so much.
00:33:11.816 --> 00:33:12.991
This was fantastic.
00:33:12.991 --> 00:33:17.916
I feel like our listeners will get a lot out of this, especially because, as you're saying,
00:33:17.916 --> 00:33:29.414
things are expensive, and we're at a point where we can automate more of our data collection. We have more of an opportunity to do that with AI, and the technology is always shifting forward.
00:33:29.414 --> 00:33:41.491
So I think that is potentially the way things might end up going: for how we surveil for different health conditions, but also for how we decide where information is coming from and how trustworthy it is, and helping us figure out what those connections are.
00:33:41.491 --> 00:33:42.855
So thank you so much.